• Publications
Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1
TLDR
The findings indicate that automated systems can be very effective for this task, but that de-identification is not yet a solved problem.
Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2
The second track of the 2014 i2b2/UTHealth natural language processing shared task focused on identifying medical risk factors related to Coronary Artery Disease (CAD) in the narratives of…
Challenges in Synthesizing Surrogate PHI in Narrative EMRs
TLDR
This chapter discusses the challenges associated with generating realistic surrogates and describes the algorithms used to prepare the 2014 i2b2/UTHealth shared task corpus for distribution and use in a natural language processing task focused on de-identification.
E-petitioning as Collective Political Action in We the People
In this study, we aim to reveal patterns of e-petition co-signing behavior that are indicative of political mobilization of online “communities” in the case of We the People (WtP), the first…
Introducing textual analysis tools for policy informatics: a case study of e-petitions
TLDR
This paper introduces textual analysis tools (such as NER and topic modeling), extracts three types of novel variables (informativeness, named entities, and 21 topics) from We the People petition texts, and shows that informativeness, named location, and several topics are significantly correlated with the log of the signature counts.
Understanding Citizens' Direct Policy Suggestions to the Federal Government: A Natural Language Processing and Topic Modeling Approach
TLDR
The results imply that topic modeling has the potential to enable the interpretation of large quantities of citizen-generated policy suggestions through a largely automated process, with potential application to research on e-participation and policy informatics.
Examining political mobilization of online communities through e-petitioning behavior in We the People
This study aims to reveal patterns of e-petition co-signing behavior that are indicative of the political mobilization of online “communities”. We discuss the case of We the People, a US national…
A systematic comparison of feature space effects on disease classifier performance for phenotype identification of five diseases
TLDR
It is shown that inclusion of semantically-informed features does not statistically improve performance for these models, and that the addition of training data has weak effects of mixed statistical significance across disease classes, suggesting that larger corpora are not necessary to achieve relatively high performance with these models.
This message will self‐destruct: The growing role of obscurity and self‐destructing data in digital communication
TLDR
The emergence of a new trend in social networking and communication – self-destructing data – sets the stage for obscurity to return to the information marketplace in force and signals a growing awareness of the public eye in modern social networking communication.