Overview of the TREC 2012 Medical Records Track


The TREC Medical Records track fosters research that allows electronic health records to be retrieved based on the semantic content of free-text fields. The ability to find records by matching semantic content will enhance clinical care and support the secondary use of medical records in clinical trials and epidemiological studies. TREC 2012 is the sophomore year of the track, which attracted 24 participating research groups. The track repeated the cohort-finding task from its initial year. This task is an ad hoc search task in which systems search a set of de-identified clinical reports to identify cohorts for (possible) clinical studies. A topic statement for the task describes the criteria for inclusion in a study, and a system returns a list of “visits” ordered by the likelihood that the inclusion criteria are satisfied. Physicians created fifty topics and performed relevance judgments for the track. Top-performing groups each used some sort of vocabulary normalization device specific to the medical domain, supporting the hypothesis that language use within electronic health records is sufficiently different from general use to warrant domain-specific processing. Such devices must be used carefully, however, as multiple groups also demonstrated that aggressive use harms baseline performance. Exploiting human expertise through manual query construction proved most effective.

Today’s electronic health record (EHR) systems generally provide access to records based on structured fields, that is, data elements in the record that have been coded to allow effective access. Yet the majority of the content of a record is often in the provider’s notes and other free-text fields that are not so structured. Free text allows providers to express nuance and exceptional circumstances that are precluded, by definition, from being captured in coded fields. Thus EHR system ease-of-use and record quality concerns argue for the continuing use of free text, provided that its content can be effectively searched. The TREC Medical Records track was established to focus a research community on the problem of enabling content-based access to the free-text fields of EHRs and to build the infrastructure necessary for such research.

1 The Medical Records Track Task

The lack of sharable test corpora has been cited as a major impediment to progress in applying natural language processing techniques to clinical text [1]. The TREC Medical Records track looks to help fill this void in the face of pragmatic concerns that constrain what can be done. Due to the sensitive nature of medical records, data constraints are the overarching factor for the Medical Records track. This section first describes the data set used in the track and then motivates the retrieval task.

Cite this paper

@inproceedings{Voorhees2012OverviewOT,
  title={Overview of the TREC 2012 Medical Records Track},
  author={Ellen M. Voorhees and William R. Hersh},
  booktitle={TREC},
  year={2012}
}