Cadec: A corpus of adverse drug event annotations
TLDR
A new rich annotated corpus of medical forum posts on patient-reported Adverse Drug Events (ADEs), which contains text that is largely written in colloquial language and often deviates from formal English grammar and punctuation rules.
Evaluating topic models for digital libraries
TLDR
This large-scale user study has over 70 human subjects evaluate and score almost 500 topics learned from collections spanning a wide range of genres and domains, and shows that a scoring model based on the pointwise mutual information of word pairs, using Wikipedia, Google and MEDLINE as external data sources, performs well at predicting human scores.
External evaluation of topic models
TLDR
The authors' PMI score, computed using word-pair co-occurrence statistics from external data sources, has relatively good agreement with human scoring and it is shown that the ability to identify less useful topics can improve the results of a topic-based document similarity metric.
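As a rough illustration of the idea behind a PMI-based topic score, here is a minimal sketch. It is not the authors' implementation: the function name `pmi_topic_score`, the use of document-level co-occurrence, the tiny in-memory reference corpus (standing in for an external source such as Wikipedia), the choice to average over word pairs, and the handling of pairs that never co-occur are all assumptions made for the example.

```python
import math
from itertools import combinations


def pmi_topic_score(topic_words, reference_docs):
    """Mean pointwise mutual information over all pairs of topic words.

    Hypothetical helper: probabilities are estimated as the fraction of
    reference documents containing a word (or both words of a pair).
    """
    doc_sets = [set(doc.lower().split()) for doc in reference_docs]
    n_docs = len(doc_sets)
    if n_docs == 0:
        return 0.0

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    pair_scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            # Assumption: a pair that never co-occurs contributes 0
            # instead of negative infinity.
            pair_scores.append(0.0)
        else:
            pair_scores.append(math.log(p12 / (p1 * p2)))

    return sum(pair_scores) / len(pair_scores) if pair_scores else 0.0


docs = [
    "heart attack patient",
    "heart surgery patient",
    "stock market crash",
    "market stock price",
]
coherent = pmi_topic_score(["heart", "patient"], docs)
incoherent = pmi_topic_score(["heart", "stock"], docs)
```

In this toy corpus the pair that always appears together scores higher than the pair that never does, which is the behaviour a topic coherence score of this kind relies on.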
Classifying microblogs for disasters
TLDR
This work addresses the issue of filtering massive amounts of Twitter data to identify high-value messages related to disasters, and to further classify disaster-related messages into those pertaining to particular disaster types, such as earthquake, flooding, fire, or storm.
Location extraction from disaster-related microblogs
TLDR
This work investigates the feasibility of applying Named Entity Recognizers to extract locations from microblogs, at the level of both geo-location and point-of-interest, and shows that such tools, once retrained on microblog data, have great potential to detect the "where" information, even at the granularity of point-of-interest.
Best Topic Word Selection for Topic Labelling
TLDR
This paper proposes a number of features intended to capture the best topic word, and shows that, in combination as inputs to a reranking model, they are able to consistently achieve results above the baseline of simply selecting the highest-ranked topic word.
An Effective Transition-based Model for Discontinuous NER
TLDR
This work proposes a simple, effective transition-based model with generic neural encoding for discontinuous NER that can effectively recognize discontinuous mentions without sacrificing the accuracy on continuous mentions.
Automatic Diagnosis Coding of Radiology Reports: A Comparison of Deep Learning and Conventional Classification Methods
TLDR
This work identifies optimal parameters for setting up a convolutional neural network for autocoding, with results comparable to those of conventional methods.
Concept Extraction to Identify Adverse Drug Reactions in Medical Forums: A Comparison of Algorithms
TLDR
This study is the first to systematically examine the effect of popular concept extraction methods in the area of signal detection for adverse reactions, and shows that the choice of algorithm or controlled vocabulary has a significant impact on concept extraction, which will impact the overall signal detection process.
Text and Data Mining Techniques in Adverse Drug Reaction Detection
TLDR
To highlight the importance of the contributions made by computer scientists in this area so far, the existing approaches are categorized and reviewed and, most importantly, areas where more research should be undertaken are identified.