Biology has now become an information science, and researchers are increasingly dependent on expert-curated biological databases to organize the findings from the published literature. We report here on a series of experiments related to the application of natural language processing to aid in the curation process for FlyBase. We focused on listing the …
BACKGROUND: Our goal in BioCreAtIvE has been to assess the state of the art in text mining, with emphasis on tasks that reflect real biological applications, e.g., the curation process for model organism databases. This paper summarizes BioCreAtIvE task 1B, the "Normalized Gene List" task, which was inspired by the gene list supplied for each …
Machine-learning-based entity extraction requires a large corpus of annotated training data to achieve acceptable results. However, the cost of expert annotation of relevant data, coupled with issues of inter-annotator variability, makes it expensive and time-consuming to create the necessary corpora. We report here on a simple method for the automatic creation …
Most C. elegans sensory neuron types consist of a single bilateral pair of neurons and respond to a unique set of sensory stimuli. Although genes required for the development and function of individual sensory neuron types have been identified in forward genetic screens, these approaches are unlikely to identify genes that, when mutated, result in subtle or …
Neuronal identities are specified by the combinatorial functions of activators and repressors of gene expression. Members of the well-conserved Olf/EBF (O/E) transcription factor family have been shown to play important roles in neuronal and non-neuronal development and differentiation. O/E proteins are highly expressed in the olfactory epithelium, and O/E …
BACKGROUND: The biological research literature is a major repository of knowledge. As the amount of literature increases, it will become harder to find the information of interest on a particular topic. There has been an increasing amount of work on text mining this literature, but comparing this work is hard because of a lack of standards for making …
BACKGROUND: We prepared and evaluated training and test materials for an assessment of text mining methods in molecular biology. The goal of the assessment was to evaluate the ability of automated systems to generate a list of unique gene identifiers from PubMed abstracts for the three model organisms Fly, Mouse, and Yeast. This paper describes the …
We have developed a challenge task for the second BioCreAtIvE (Critical Assessment of Information Extraction in Biology) that requires participating systems to provide lists of the EntrezGene (formerly LocusLink) identifiers for all human genes and proteins mentioned in a MEDLINE abstract. We are distributing 281 annotated abstracts and another 5,000 …