PubMed Phrases, an open set of coherent phrases for searching biomedical literature

@article{Kim2018PubMedPA,
  title={PubMed Phrases, an open set of coherent phrases for searching biomedical literature},
  author={Sun Kim and Lana Yeganova and Donald C. Comeau and W. John Wilbur and Zhiyong Lu},
  journal={Scientific Data},
  year={2018},
  volume={5}
}
In biomedicine, key concepts are often expressed by multiple words (e.g., ‘zinc finger protein’). Previous work has shown that treating a sequence of words as a meaningful unit, where applicable, is not only important for human understanding but also beneficial for automatic information seeking. Here we present a collection of PubMed® Phrases that are beneficial for information retrieval and human comprehension. We define these phrases as coherent chunks that are logically connected. To collect the…

PMCVec: Distributed phrase representation for biomedical text processing

A novel MEDLINE topic indexing method using image presentation

PubMed Author-assigned Keyword Extraction (PubMedAKE) Benchmark

Experimental results using state-of-the-art baseline methods illustrate the need for developing automatic keyword extraction methods for biomedical literature.

Clinical Phrase Mining with Language Models

Experimental results on the MIMIC-III dataset show that the proposed CliniPhrase method can outperform the current state-of-the-art techniques by up to 18% in terms of F1 measure while being very efficient (up to 48 times faster).

MeSH-based dataset for measuring the relevance of text retrieval

This work selects a suitable subset of MeSH terms as queries and utilizes MeSH term assignments as pseudo-relevance rankings for retrieval evaluation; the proposed evaluation framework is then used to better understand how to combine heterogeneous sources of textual information.

Robust Representation Learning of Biomedical Names

The idea behind the approach is to consider and encode contextual meaning, conceptual meaning, and the similarity between synonyms during the representation learning process, resulting in high practical utility in real-world applications.

A reference set of curated biomedical data and metadata from clinical case reports

A standardized metadata template and MACCR set are developed that render CCRs more findable, accessible, interoperable, and reusable while serving as valuable resources for key user groups, including researchers, physician investigators, clinicians, data scientists, and those shaping government policies for clinical trials.

A graph-based method for reconstructing entities from coordination ellipsis in medical text

RECEEM improves concept normalization for medical coordinated elliptical expressions in a variety of biomedical corpora; in the evaluation, it outperformed existing methods and significantly enhanced the performance of 2 notable NLP systems for mapping coordination ellipses.

Fast searches of large collections of single cell data using scfind

Using transcriptome data from mouse cell atlases, scfind can be used to evaluate marker genes, to perform in silico gating, and to identify both cell-type specific and housekeeping genes, and a subquery optimization routine is developed to ensure that long and complex queries return meaningful results.

References

How to interpret PubMed queries and why it matters

An automated retrieval evaluation method is developed, based on machine learning techniques, that enables us to evaluate and compare various retrieval outcomes and shows that the class of records that contain all the search terms, but not the phrase, qualitatively differs from the class of records containing the phrase.

Summarizing Topical Contents from PubMed Documents Using a Thematic Analysis

A method is proposed that finds sub-topics, referred to as themes, and computes representative titles based on the set of documents in each theme; it outperformed LDA and MeSH® terms.

Extracting noun phrases for all of MEDLINE

The extraction of noun phrases from MEDLINE is discussed, using a general parser not tuned specifically for any medical domain, and it is claimed that a generic parser can effectively extract all the different phrases across the entire medical literature.

Research Paper: Corpus-based Statistical Screening for Phrase Identification

Six different scoring methods are found, each of which proves effective in identifying UMLS-quality phrases in a large subset of MEDLINE; they are applicable both to word pairs and word triples.

Corpus-based statistical screening for phrase identification.

  • W. Kim, W. Wilbur
  • Computer Science
    Journal of the American Medical Informatics Association : JAMIA
  • 2000
Statistical scoring methods provide a promising approach to the extraction of useful phrases from a natural language database for the purpose of indexing or providing hyperlinks in text.

Understanding PubMed® user search behavior through log analysis

This investigation was conducted through the analysis of one month of log data, consisting of more than 23 million user sessions and more than 58 million user queries, which provided insight into PubMed users’ needs and their behavior.

Meshable: searching PubMed abstracts by utilizing MeSH and MeSH-derived topical terms

A web interface is introduced which allows users to enter queries to find MeSH terms closely related to the queries and can be effectively used to find full names of abbreviations and to disambiguate user queries.

Retro: concept-based clustering of biomedical topical sets

Retro is a novel clustering algorithm that extracts meaningful clusters, along with concise and descriptive titles, from small and homogeneous document collections, and is superior to existing methods in terms of cluster quality.