• Publications
Rapid similarity searches of nucleic acid and protein data banks.
  • W. Wilbur, D. Lipman
  • Biology, Medicine
  • Proceedings of the National Academy of Sciences…
  • 1 February 1983
With the development of large data banks of protein and nucleic acid sequences, the need for efficient methods of searching such banks for sequences similar to a given sequence has become evident. We…
  • Citations: 1,155 (39 highly influential)
Overview of BioCreative II gene mention recognition
Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task participants designed systems to identify substrings in sentences corresponding to gene name…
  • Citations: 243 (25 highly influential)
SplicePort—An interactive splice-site analysis tool
SplicePort is a web-based tool for splice site analysis that allows the user to make splice-site predictions for submitted sequences based on these features.
  • Citations: 207 (24 highly influential)
GENETAG: a tagged corpus for gene/protein named entity recognition
We describe the construction and annotation of GENETAG, a corpus of 20K MEDLINE® sentences for gene/protein name NER.
  • Citations: 244 (24 highly influential)
BioC: a minimalist approach to interoperability for biomedical text processing
We propose an interchange data format to represent, store and exchange data in a simple manner between different language processing systems and text mining tools.
  • Citations: 166 (22 highly influential)
MedPost: a part-of-speech tagger for bioMedical text
We present a part-of-speech tagger that achieves over 97% accuracy on MEDLINE citations.
  • Citations: 247 (19 highly influential)
PubMed related articles: a probabilistic topic-based model for content similarity
  • J. Lin, W. Wilbur
  • Computer Science, Medicine
  • BMC Bioinformatics
  • 30 October 2007
We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed.
  • Citations: 213 (19 highly influential)
Database resources of the National Center for Biotechnology Information
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's website.
  • Citations: 304 (18 highly influential)
New directions in biomedical text annotation: definitions, guidelines and corpus construction
We present our guidelines defining a text annotation task, along with annotation results from multiple independently produced annotations, demonstrating the feasibility of the task.
  • Citations: 170 (15 highly influential)
Tagging gene and protein names in biomedical text
We propose to approach the detection of gene and protein names in scientific abstracts as part-of-speech tagging, the most basic form of linguistic corpus annotation.
  • Citations: 342 (13 highly influential)