Dina Vishnyakova

Learn More
We report the Gene Normalization (GN) challenge in BioCreative III where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully and 500 partially annotated articles were prepared. A total of 507 articles were selected as the test set. Due to the high annotation cost, it was(More)
Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to efficiently find the appropriate information in a reasonable time. In this perspective, question-answering (QA) engines are designed to display answers, which were automatically extracted from the(More)
The available curated data lag behind current biological knowledge contained in the literature. Text mining can assist biologists and curators to locate and access this knowledge, for instance by characterizing the functional profile of publications. Gene Ontology (GO) category assignment in free text already supports various applications, such as powering(More)
In the intellectual property field two tasks are of high relevance: prior art searching and patent classification. Prior art search is fundamental for many strategic issues such as patent granting, freedom to operate and opposition. Accurate classification of patent documents according to the IPC code system is vital for the interoperability between(More)
For two years, the TREC Chemical Track aims at evaluating participant systems in chemical patent searching. In 2010, it continued with the two tasks from 2009: Prior Art search (PA) and Technology Survey (TS). The BiTeM group participated in both tasks and obtained satisfactory results, relying on a large panel of strategies which were evaluated within the(More)
For the third year, the BiTeM group participated in the TREC Chemical IR Track. For this campaign, we applied strategies that already showed their effectiveness, as the Citations Feedback, which takes benefit from the citations of the retrieved documents in order to re-arrange the ranking. But we also investigated a new inter-lingua model built with(More)
UNLABELLED Gene function curation of the literature with Gene Ontology (GO) concepts is one particularly time-consuming task in genomics, and the help from bioinformatics is highly requested to keep up with the flow of publications. In 2004, the first BioCreative challenge already designed a task of automatic GO concepts assignment from a full text. At this(More)
We report on the original integration of an automatic text categorization pipeline, so-called ToxiCat (Toxicogenomic Categorizer), that we developed to perform biomedical documents classification and prioritization in order to speed up the curation of the Comparative Toxicogenomics Database (CTD). The task can be basically described as a binary(More)
The BiTeM group participated in the first TREC Medical Records Track in 2011 relying on a strong background in medical records processing and medical terminologies. For this campaign, we submitted a baseline run, computed with a simple free-text index in the Terrier platform, which achieved fair results (0.468 for P10). We also performed automatic text(More)