• Publications
A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents
This work proposes the first model for abstractive summarization of single, longer-form documents (e.g., research papers), consisting of a new hierarchical encoder that models the discourse structure of a document, and an attentive discourse-aware decoder to generate the summary.
CEDR: Contextualized Embeddings for Document Ranking
This work investigates how two pretrained contextualized language models (ELMo and BERT) can be used for ad-hoc document ranking. It proposes a joint approach that incorporates BERT's classification vector into existing neural models and shows that it outperforms state-of-the-art ad-hoc ranking baselines.
Depression and Self-Harm Risk Assessment in Online Forums
This work introduces a large-scale general forum dataset consisting of users with self-reported depression diagnoses matched with control users. It also proposes methods for identifying posts in support communities that may indicate a risk of self-harm, and demonstrates that they outperform strong previously proposed methods.
Hate speech detection: Challenges and solutions
This work identifies and examines challenges faced by online automatic approaches for hate speech detection in text, and proposes a multi-view SVM approach that achieves near state-of-the-art performance, while being simpler and producing more easily interpretable decisions than neural methods.
SMHD: a Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions
This paper investigates the creation of high-precision patterns to identify self-reported diagnoses of nine different mental health conditions, and obtains high-quality labeled data without the need for manual labelling.
Scientific Article Summarization Using Citation-Context and Article’s Discourse Structure
The proposed summarization approach for scientific articles, which takes advantage of citation context and the document discourse model, is shown to improve over existing summarization approaches (by more than 30% over the best-performing baseline) in terms of ROUGE scores on the TAC2014 scientific summarization dataset.
ADRTrace: Detecting Expected and Unexpected Adverse Drug Reactions from User Reviews on Social Media Sites
We automatically extract adverse drug reactions (ADRs) from consumer reviews provided on various drug social media sites to identify adverse reactions not reported by the United States Food and Drug
Ambiguity measure feature-selection algorithm
The ambiguity measure (AM) feature-selection algorithm, which selects the most unambiguous features from the feature set, is presented, and it is shown that the training time for the SVM algorithm can be reduced while still improving the accuracy of the text classifier.
Fusion of effective retrieval strategies in the same information retrieval system
It is shown that data fusion of highly effective retrieval strategies alone shows little or no improvement in retrieval effectiveness, and a detailed analysis of the performance of modern data fusion approaches is presented.
Disproving the fusion hypothesis: an analysis of data fusion via effective information retrieval strategies
It is demonstrated that for fusion to improve effectiveness, the result sets being fused must contain a significant number of unique relevant documents, and for this improvement to be visible, these unique relevant documents must be highly ranked.