Reyyan Yeniterzi

Learn More
PURPOSE Medical records must often be stripped of patient identifiers, or de-identified, before being shared. De-identification by humans is time-consuming, and existing software is limited in its generality. The open source MITRE Identification Scrubber Toolkit (MIST) provides an environment to support rapid tailoring of automated de-identification to(More)
We present a novel scheme to apply factored phrase-based SMT to a language pair with very disparate morphological structures. Our approach relies on syntactic analysis on the source side (English) and then encodes a wide variety of local and non-local syntactic structures as complex structural tags which appear as additional factors in the training data. On(More)
OBJECTIVE De-identified medical records are critical to biomedical research. Text de-identification software exists, including "resynthesis" components that replace real identifiers with synthetic identifiers. The goal of this research is to evaluate the effectiveness and examine possible bias introduced by resynthesis on de-identification software. (More)
In this paper, we propose to model and analyze changes that occur to an entity in terms of changes in the words that co-occur with the entity over time. We propose to do an in-depth analysis of how this co-occurrence changes over time, how the change influences the state (semantic, role) of the entity, and how the change may correspond to events occurring(More)
Selecting the most relevant factors from genetic profiles that can optimally characterize cellular states is of crucial importance in identifying complex disease genes and biomarkers for disease diagnosis and assessing drug efficiency. In this paper, we present an approach using a genetic algorithm for a feature subset selection problem that can be used in(More)
We introduce a controlled natural language for biomedical queries, called BIOQUERYCNL, and present an algorithm to convert a biomedical query in this language into a program in answer set programming (ASP)—a formal framework to automate reasoning about knowledge. BIOQUERYCNL allows users to express complex queries (possibly containing nested relative(More)
Authority-based approaches are widely used in expert retrieval from social media. However, most of these approaches are applied to either topic-independent networks, or more topic-dependent networks which still contain topic-irrelevant users as nodes and interactions as edges. Therefore, authority estimation over these graphs is still not topic-specific(More)
One major difficulty in performing ad-hoc search on microblogs such as Twitter is the limited vocabulary of each document due their short length. In this paper, two approaches to addressing this issue are presented. The first is query expansion through pseudo-relevance feedback and the other is document expansion of tweets using web documents linked from(More)