This article describes the algorithms implemented in the Essie search engine that is currently serving several Web sites at the National Library of Medicine. Essie is a phrase-based search engine with term and concept query expansion and probabilistic relevancy ranking. Essie's design is motivated by an observation that query terms are often conceptually …
The lack of discipline and consistency in gene naming poses a formidable challenge to researchers in locating relevant information sources in the genomics literature. The research presented here primarily focuses on how to find the MEDLINE® citations that describe functions of particular genes. We developed new methods and extended current techniques that …
We conducted a study of user queries to the National Library of Medicine Web site over a three-month period. Our purpose was to study the nature and scope of these queries in order to understand how to improve users' access to the information they are seeking on our site. The results show that the queries are primarily medical in content (94%), with only a …
The NLM LHC team approached the cohort selection task of the 2011 Medical Records Track as a question answering problem. We developed 60 training topics and then manually converted those topics to question frames. We started with the evidence-based medicine well-formed question frame and expanded it to explicitly capture temporal and causal relations. We …
One of the NLM experimental approaches to the 2007 Genomics track question answering task followed the track evaluation design: we attempted to identify exact answers in the form of semantic relations between biomedical entities named in questions and the potential answer types, and then marked the passages containing those relations as containing the answers. …
This paper presents our approach to retargeting information retrieval systems designed and/or optimized for retrieval of MEDLINE citations to the task of finding relevant passages in the text of scientific articles. To continue using our TREC 2005 fusion approach, we needed a common representation for the full-text biomedical articles to be shared by …
ClinicalTrials.gov is a Web-based system intended for a diverse audience, including patients, family members and other members of the public. Throughout the system design and development process, our decisions have been driven by usability concerns. We first describe the overall design of the site, including the home page, which provides a site overview and …
The Lexical Systems Group at the National Library of Medicine (NLM) has developed a Part-of-Speech (POS) tagger to be freely distributed with the SPECIALIST NLP Tools. dTagger is specifically designed for use with the SPECIALIST lexicon, but it can be used with an arbitrary tag set. It is capable of single- or multi-word chunking. It is trainable with …
Retrieving and annotating relevant information sources in the genomics literature are difficult but common tasks undertaken by biologists. The research presented here addresses these issues by exploring methods for retrieving MEDLINE® citations that answer real biologists' information needs and by addressing the initial tasks required to annotate MEDLINE …