Learn More
Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of different methods were used and the results varied with a highest achieved F1 score of 0.8721. Here we present brief descriptions of all(More)
With the development of large data banks of protein and nucleic acid sequences, the need for efficient methods of searching such banks for sequences similar to a given sequence has become evident. We present an algorithm for the global comparison of sequences based on matching k-tuples of sequence elements for a fixed k. The method results in substantial(More)
We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document is about a particular topic is computed from term frequencies, modeled as Poisson distributions. Unlike previous probabilistic retrieval models, we do not attempt to estimate relevance–but(More)
The immense growth in the volume of research literature and experimental data in the field of molecular biology calls for efficient automatic methods to capture and store information. In recent years, several groups have worked on specific problems in this area, such as automated selection of articles pertinent to molecular biology, or automated extraction(More)
We consider the classic Stein (1965) model for stochastic neuronal firing under random synaptic input. Our treatment includes the additional effect of synaptic reversal potentials. We develop and employ two numerical methods (in addition to Monte Carlo simulations) to study the relation of the various parameters of the model to the shape of the theoretical(More)
Named entity recognition (NER) is an important first step for text mining the biomedical literature. Evaluating the performance of biomedical NER systems is impossible without a standardized test corpus. The annotation of such a corpus for gene/protein name NER is a difficult process due to the complexity of gene/protein names. We describe the construction(More)
This paper investigates the effectiveness of using MeSH® in PubMed through its automatic query expansion process: Automatic Term Mapping (ATM). We run Boolean searches based on a collection of 55 topics and about 160,000 MEDLINE® citations used in the 2006 and 2007 TREC Genomics Tracks. For each topic, we first automatically construct a query by selecting(More)
A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential to facilitate the search for and extraction of information(More)