The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes
A corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts, which is also a good resource for the linguistic analysis of scientific and clinical texts.
The CoNLL-2010 Shared Task: Learning to Detect Hedges and their Scope in Natural Language Text
This paper provides a general overview of the CoNLL-2010 Shared Task, including the annotation protocols of the training and evaluation datasets, the exact task definitions, the evaluation metrics employed and the overall results.
The BioScope corpus: annotation for negation, uncertainty and their scope in biomedical texts
A corpus annotation project that has produced the BioScope corpus, a freely available resource for research on handling negation and uncertainty in biomedical texts, which consists of medical free texts, biological full papers and biological scientific abstracts.
What helps where – and why? Semantic relatedness for knowledge transfer
This work addresses the question of how to automatically decide which information to transfer between classes without any human intervention, and taps into linguistic knowledge bases to provide the semantic link between sources (what) and targets (where) of knowledge transfer.
Hedge Classification in Biomedical Texts with a Weakly Supervised Selection of Keywords
This paper demonstrates the importance of hedge classification experimentally in two real-life scenarios, namely the ICD-9-CM coding of radiology reports and gene name entity extraction from scientific texts, and develops a maxent-based solution for both the free text and scientific text processing tasks.
Cross-Genre and Cross-Domain Detection of Semantic Uncertainty
A unified subcategorization of semantic uncertainty is introduced, since different domain applications can apply different uncertainty categories, and domain adaptation for training the models offers an efficient solution for cross-domain and cross-genre semantic uncertainty recognition.
The Multilingual Amazon Reviews Corpus
This paper proposes using mean absolute error (MAE) instead of classification accuracy for this task, since MAE accounts for the ordinal nature of the ratings.
Methods and results of the Hungarian WordNet project
This paper presents a complete outline of the results of the Hungarian WordNet (HuWN) project: the construction process of the general vocabulary Hungarian WordNet ontology, its validation and
Automatic construction of rule-based ICD-9-CM coding systems
The results demonstrate that hand-crafted systems – which proved to be successful in ICD-9-CM coding – can be reproduced by replacing several laborious steps in their construction with machine learning models.
State-of-the-art anonymization of medical records using an iterative machine learning framework.
A de-identification model is developed that can successfully remove personal health information (PHI) from discharge records to make them conform to the guidelines of the Health Insurance Portability and Accountability Act.