A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature

@article{Nye2018ACW,
  title={A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature},
  author={Benjamin E. Nye and Junyi Jessy Li and Roma Patel and Yinfei Yang and Iain James Marshall and Ani Nenkova and Byron C. Wallace},
  journal={Proceedings of the conference. Association for Computational Linguistics. Meeting},
  year={2018},
  volume={2018},
  pages={197--207}
}
We present a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials. Annotations include demarcations of text spans that describe the Patient population enrolled, the Interventions studied and to what they were Compared, and the Outcomes measured (the ‘PICO’ elements). These spans are further annotated at a more granular level, e.g., individual interventions within them are marked and mapped onto a structured medical vocabulary. We… 
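The multi-level scheme described in the abstract (coarse PICO spans that contain finer-grained spans, some of them mapped to a vocabulary entry) can be pictured as a small nested data structure. The following is a minimal sketch under stated assumptions, not the corpus's actual file format: the `Span` class, the label strings, the `concept_id` value, and the example sentence are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """A labeled token span; all names here are illustrative assumptions."""
    start: int                        # token index, inclusive
    end: int                          # token index, exclusive
    label: str                        # e.g. a coarse PICO label or a finer sub-label
    concept_id: Optional[str] = None  # hypothetical vocabulary identifier
    children: List["Span"] = field(default_factory=list)

    def text(self, tokens: List[str]) -> str:
        """Return the surface text covered by this span."""
        return " ".join(tokens[self.start:self.end])

    def add_child(self, child: "Span") -> None:
        """Attach a finer-grained span, which must lie inside this span."""
        if not (self.start <= child.start and child.end <= self.end):
            raise ValueError("child span must lie inside its parent span")
        self.children.append(child)


# Invented example sentence, not taken from the corpus.
tokens = ("Adults with type 2 diabetes received metformin or placebo ; "
          "HbA1c was measured").split()

population = Span(0, 5, "Participants")
interventions = Span(6, 9, "Interventions")
outcome = Span(10, 11, "Outcomes")

# Finer-grained annotations nested inside the coarse intervention span.
interventions.add_child(Span(6, 7, "Intervention/Drug", concept_id="EXAMPLE:0001"))
interventions.add_child(Span(8, 9, "Intervention/Control"))

for span in (population, interventions, outcome):
    print(span.label, "->", span.text(tokens))
    for child in span.children:
        print("   ", child.label, "->", child.text(tokens), child.concept_id)
```

Keeping the fine-grained spans as children of a coarse span makes the containment constraint explicit, which is one plausible way to keep the two annotation levels consistent.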
A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine
TLDR
This resource is adequate for experiments with state-of-the-art approaches to biomedical named entity recognition and is generalizable to other languages with similar available sources.
Effective Crowd-Annotation of Participants, Interventions, and Outcomes in the Text of Clinical Trial Reports
TLDR
Cohen’s kappa agreement between crowd annotations and gold-standard annotations is computed. Both sentence-based approaches outperform a baseline in which entire abstracts are annotated, and supporting annotators with tailored task-instance examples performs best.
Trialstreamer: Mapping and Browsing Medical Evidence in Real-Time
TLDR
Trialstreamer, a living database of clinical trial reports, is introduced. Its evidence extraction component extracts from biomedical abstracts the key pieces of information that clinicians need when appraising the literature, along with the relations between them.
A manual corpus of annotated main findings of clinical case reports
TLDR
It is envisioned that case reports in PubMed may be automatically indexed by main finding, so that users can carry out information queries for specific main findings (rather than general topics)—and given one case report, a user can retrieve those having the most similar main findings.
What Does the Evidence Say? Models to Help Make Sense of the Biomedical Literature
TLDR
Work is highlighted on developing tasks, corpora, and models to support semi-automated evidence retrieval and extraction that can consume articles describing clinical trials and automatically extract from these key clinical variables and findings, and estimate their reliability.
PICO Entity Extraction For Preclinical Animal Literature
TLDR
BERT pre-trained on PubMed abstracts is the best for both PICO sentence classification and PICO entity recognition in the preclinical abstracts, and self-training yields better performance for identifying comparators and strains.
Data Mining in Clinical Trial Text: Transformers for Classification and Question Answering Tasks
TLDR
This paper contributes to solving problems related to ambiguity in PICO sentence prediction tasks, as well as highlighting how annotations for training named entity recognition systems are used to train a high-performing, but nevertheless flexible architecture for question answering in systematic review automation.
Understanding Clinical Trial Reports: Extracting Medical Entities and Their Relations
TLDR
This work considers the end-to-end task of extracting treatments and outcomes from full-text articles describing clinical trials and inferring the reported results for the former with respect to the latter, and proposes a new method motivated by how trial results are typically presented that outperforms these purely data-driven baselines.
Automated tabulation of clinical trial results: A joint entity and relation extraction approach with transformer-based language representations
TLDR
This paper investigates automating evidence table generation by decomposing the problem across two language processing tasks: named entity recognition, which identifies key entities within text, such as drug names, and relation extraction, which maps their relationships for separating them into ordered tuples.
Assessment of contextualised representations in detecting outcome phrases in clinical trials
TLDR
A consensus is reached on which contextualised representations are best suited for detecting outcome phrases in clinical trial abstracts, and the best model outperforms the scores published on the original EBM-NLP dataset leaderboard.

References

Extracting PICO Sentences from Clinical Trial Reports using Supervised Distant Supervision
TLDR
A novel method is proposed that uses a small amount of direct supervision to better exploit a large corpus of distantly labeled instances, learning to pseudo-annotate articles using the available distant supervision (DS). This approach is shown to tend to outperform existing methods for automated PICO extraction.
Evaluation of PICO as a Knowledge Representation for Clinical Questions
TLDR
The PICO framework is primarily centered on therapy questions, and is less suitable for representing other types of clinical information needs, and its value as a tool to assist physicians practicing EBM is reaffirmed.
Development of a Corpus for Evidence Based Medicine Summarisation
TLDR
A corpus is completed for developing multi-document, query-focused summarisation, a key approach to the NLP problems that arise in the practice of Evidence Based Medicine.
Positional Language Models for Clinical Information Retrieval
TLDR
The distribution of PECO elements throughout the relevant documents is analysed, and a language modeling approach that uses these distributions as a weighting strategy is proposed. It improves MAP and P@5 compared to the state-of-the-art method.
Answering Clinical Questions with Knowledge-Based and Statistical Techniques
TLDR
A series of knowledge extractors are developed, which employ a combination of knowledge-based and statistical techniques, for automatically identifying clinically relevant aspects of MEDLINE abstracts, and which significantly outperforms the already competitive PubMed baseline.
Automatic Summarization of Results from Clinical Trials
TLDR
A novel method for automatically creating EBM-oriented summaries from research abstracts of randomized controlled trials (RCTs) is presented, which extracts descriptions of the treatment groups and outcomes, as well as various associated quantities, and then calculates summary statistics.
Sentence retrieval for abstracts of randomized controlled trials
G. Chung, BMC Medical Informatics Decis. Mak., 2009
TLDR
Results indicate that some of the methodological elements of RCTs are identifiable at the sentence level in both structured and unstructured abstract reports, which is promising in that sentences labeled automatically could potentially form concise summaries, assist in information retrieval and finer-grained extraction.
A Statistical Relational Learning Approach to Identifying Evidence Based Medicine Categories
TLDR
This paper presents an approach to automatically annotate sentences in medical abstracts with these labels using kLog, a new language for statistical relational learning with kernels, and shows a clear improvement with respect to state-of-the-art systems.
ExaCT: automatic extraction of clinical trial characteristics from journal publications
TLDR
An automatic information extraction system is presented that assists users with locating and extracting key trial characteristics from full-text journal articles reporting on randomized controlled trials (RCTs). It can be extended to handle other characteristics and document types.
A corpus of potentially contradictory research claims from cardiovascular research abstracts
TLDR
A methodology for constructing a corpus containing contradictory research claims from the biomedical literature is described and the corpus is made available to enable further research into this area and support the development of automated approaches to contradiction identification.