OBJECTIVE De-identified medical records are critical to biomedical research. Text de-identification software exists, including "resynthesis" components that replace real identifiers with synthetic identifiers. The goal of this research is to evaluate the effectiveness of resynthesis and to examine the bias it may introduce in de-identification software.
PURPOSE Medical records must often be stripped of patient identifiers, or de-identified, before being shared. De-identification by humans is time-consuming, and existing software is limited in its generality. The open source MITRE Identification Scrubber Toolkit (MIST) provides an environment to support rapid tailoring of automated de-identification to…
In this paper, we propose to model and analyze the changes an entity undergoes in terms of changes in the words that co-occur with it over time. We propose an in-depth analysis of how this co-occurrence changes over time, how the change influences the state (semantic, role) of the entity, and how the change may correspond to events occurring…
We present a novel scheme to apply factored phrase-based SMT to a language pair with very disparate morphological structures. Our approach relies on syntactic analysis on the source side (English) and then encodes a wide variety of local and non-local syntactic structures as complex structural tags which appear as additional factors in the training data.
We introduce a controlled natural language for biomedical queries, called BIOQUERYCNL, and present an algorithm to convert a biomedical query in this language into a program in answer set programming (ASP), a formal framework to automate reasoning about knowledge. BIOQUERYCNL allows users to express complex queries (possibly containing nested relative…
One major difficulty in performing ad-hoc search on microblogs such as Twitter is the limited vocabulary of each document due to its short length. In this paper, two approaches to addressing this issue are presented. The first is query expansion through pseudo-relevance feedback, and the second is document expansion of tweets using web documents linked from…
Authority-based approaches are widely used in expert retrieval from social media. However, most of these approaches are applied either to topic-independent networks or to more topic-dependent networks that still contain topic-irrelevant users as nodes and interactions as edges. Therefore, authority estimation over these graphs is still not topic-specific…
Selecting the most relevant factors from genetic profiles that can optimally characterize cellular states is of crucial importance in identifying complex disease genes and biomarkers for disease diagnosis and assessing drug efficiency. In this paper, we present an approach using a genetic algorithm for a feature subset selection problem that can be used in…