Learn More
Systems that extract structured information from natural language passages have been highly successful in specialized domains. The time is opportune for developing analogous applications for molecular biology and genomics. We present a system, GENIES, that extracts and structures information about cellular pathways from the biological literature in(More)
We describe a system which automatically identifies gene and protein names in journal articles, an important and non-trivial first step in knowledge extraction of protein and gene actions. Our system uses a database of gene and protein names and is based on BLAST [Altschul et al., Nucleic Acids Res. 25 (1997) 3389-3402], a popular tool for DNA and protein(More)
BACKGROUND The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name(More)
We recently proposed that competitive endogenous RNAs (ceRNAs) sequester microRNAs to regulate mRNA transcripts containing common microRNA recognition elements (MREs). However, the functional role of ceRNAs in cancer remains unknown. Loss of PTEN, a tumor suppressor regulated by ceRNA activity, frequently occurs in melanoma. Here, we report the discovery of(More)
The immense growth in the volume of research literature and experimental data in the field of molecular biology calls for efficient automatic methods to capture and store information. In recent years, several groups have worked on specific problems in this area, such as automated selection of articles pertinent to molecular biology, or automated extraction(More)
INTRODUCTION In this work, we introduce the concept of semantic role labeling to the medical domain. We report first results of porting and adapting an existing resource, Propbank, to the medical field. Propbank is an adjunct to Penn Treebank that provides semantic annotation of predicates and the roles played by their arguments. The main aim of this work(More)
We characterized the mutational landscape of melanoma, the form of skin cancer with the highest mortality rate, by sequencing the exomes of 147 melanomas. Sun-exposed melanomas had markedly more ultraviolet (UV)-like C>T somatic mutations compared to sun-shielded acral, mucosal and uveal melanomas. Among the newly identified cancer genes was PPP6C, encoding(More)
Sophisticated information technologies are needed for effective data acquisition and integration from a growing body of the biomedical literature. Successful term identification is key to getting access to the stored literature information, as it is the terms (and their relationships) that convey knowledge across scientific articles. Due to the complexities(More)
UNLABELLED Yale Image Finder (YIF) is a publicly accessible search engine featuring a new way of retrieving biomedical images and associated papers based on the text carried inside the images. Image queries can also be issued against the image caption, as well as words in the associated paper abstract and title. A typical search scenario using YIF is as(More)
MOTIVATION In order to aid in hypothesis-driven experimental gene discovery, we are designing a computer application for the automatic retrieval of signal transduction data from electronic versions of scientific publications using natural language processing (NLP) techniques, as well as for visualizing and editing representations of regulatory systems.(More)