The GW/UMD CLPsych 2016 Shared Task System

Ayah Zirikly, Varun Kumar, Philip Resnik

Suicide is the third leading cause of death for young people, and in an average U.S. high school classroom, 30% of students have experienced a long period of feeling hopeless, 20% have been bullied, 16.7% have seriously considered suicide, and 6.7% have actually made a suicide attempt. The 2016 ACL Workshop on Computational Linguistics and Clinical Psychology (CLPsych) included a shared task focusing on classification of posts to ReachOut, an online information and support service that…

CLPsych 2018 Shared Task: Predicting Current and Future Psychological Health from Childhood Essays

This shared task represents one of the first attempts to evaluate the use of early language to predict future health and has the potential to support a wide variety of clinical health care tasks, from early assessment of lifetime risk for mental health problems, to optimal timing for targeted interventions aimed at both prevention and treatment.

Are You Really Okay? A Transfer Learning-based Approach for Identification of Underlying Mental Illnesses

A novel state-of-the-art transfer learning-based approach that learns from linguistic feature spaces of previous conditions and predicts unknown ones, offering promising evidence that language models can harness learned patterns from known mental health conditions to aid in their prediction of others that may lie latent.

Lightme: analysing language in internet support groups for mental health

Using a dataset from a mental health forum for young people, it is possible to build a competitive triage classifier with features derived only from the textual content of the post.

CLPsych 2016 Shared Task: Triaging content in online peer-support forums

This paper introduces a new shared task that aims to directly support the moderators of a youth mental health forum by asking participants to automatically triage posts into one of four severity labels: green, amber, red or crisis.

Using contextual information for automatic triage of posts in a peer-support forum

A machine learning approach to the triage of posts that develops and implements a large variety of new features from both the content and the context of posts, such as previous messages, interaction with other users, and the author's history.

Expert, Crowdsourced, and Machine Assessment of Suicide Risk via Online Postings

Evaluation of risk-level annotations by experts yields what is, to the authors' knowledge, the first demonstration of reliability in risk assessment by clinicians based on social media postings.

Broadening horizons: the case for capturing function and the role of health informatics in its use

How activity and participation information can be more effectively captured and how health informatics methodologies, including natural language processing (NLP), can enable automatically locating, extracting, and organizing this information on a large scale, supporting standardization and utilization with minimal additional provider burden are described.



Predicting Depression via Social Media

It is found that social media contains useful signals for characterizing the onset of depression in individuals, as measured through decrease in social activity, raised negative affect, highly clustered egonetworks, heightened relational and medicinal concerns, and greater expression of religious involvement.


It is shown how the combined strength and wisdom of the crowds can be used to generate a large, high‐quality, word–emotion and word–polarity association lexicon quickly and inexpensively.

Ensemble based systems in decision making

R. Polikar. IEEE Circuits and Systems Magazine, 2006.

Conditions under which ensemble based systems may be more beneficial than their single classifier counterparts, algorithms for generating individual components of the ensemble systems, and various procedures through which the individual classifiers can be combined are reviewed.

An Empirical Study of Learning from Imbalanced Data Using Random Forest

A comprehensive suite of experiments analyzing the performance of the random forest (RF) learner implemented in Weka is discussed, providing an extensive empirical evaluation of RF learners built from imbalanced data.

The Stanford CoreNLP Natural Language Processing Toolkit

The design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis, are described, and it is suggested that this follows from a simple, approachable design, straightforward interfaces, the inclusion of robust and good quality analysis components, and not requiring use of a large amount of associated baggage.

Distributed Representations of Words and Phrases and their Compositionality

This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.

The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods

We are in the midst of a technological revolution whereby, for the first time, researchers can link daily word use to a broad array of real-world behaviors. This article reviews several computerized

Support-vector networks

Mach. Learn., 20(3):273–297, September 1995.