George Hripcsak

This paper analyzes a Question & Answer site for programmers, Stack Overflow, that dramatically improves on the utility and performance of Q&A systems for technical domains. Over 92% of Stack Overflow questions about expert topics are answered, in a median time of 11 minutes. Using a mixed methods approach that combines statistical data analysis…
OBJECTIVE: Develop a knowledge-based representation for a controlled terminology of clinical information to facilitate creation, maintenance, and use of the terminology. DESIGN: The Medical Entities Dictionary (MED) is a semantic network, based on the Unified Medical Language System (UMLS), with a directed acyclic graph to represent multiple hierarchies…
Information retrieval studies that involve searching the Internet or marking phrases usually lack a well-defined number of negative cases. This prevents the use of traditional interrater reliability metrics like the kappa statistic to assess the quality of expert-generated gold standards. Such studies often quantify system performance as precision, recall,…
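As a minimal sketch of why precision and recall remain usable when the number of negative cases is undefined, the snippet below computes both from true positives, false positives, and false negatives only. The function name and the counts are hypothetical and chosen purely for illustration; they do not come from the study.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision and recall from positive-case counts only.

    tp: items marked by both the system and the gold standard
    fp: items marked by the system but not by the gold standard
    fn: items in the gold standard that the system missed
    No count of true negatives is needed, unlike the kappa statistic.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Hypothetical counts for illustration only.
precision, recall = precision_recall(tp=30, fp=10, fn=20)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The sketch assumes at least one system-marked item and one gold-standard item; otherwise the denominators are zero.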
OBJECTIVE: The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method. METHODS: An existing NLP system, MedLEE, was adapted to automatically generate codes. The method involves matching of structured output…
The Arden Syntax, a language designed for writing and sharing task-specific knowledge for Medical Logic Modules (MLMs), has recently been accepted as a standard by the ASTM. The syntax is concerned with the critical task of sharing medical knowledge bases across many institutions. Because of the relative lack of agreement on vocabularies and data standards…
Agreement measures are used frequently in reliability studies that involve categorical data. Simple measures like observed agreement and specific agreement can reveal a good deal about the sample. Chance-corrected agreement in the form of the kappa statistic is used frequently based on its correspondence to an intraclass correlation coefficient and the ease…
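The measures named above can be illustrated with a short sketch for two raters and a binary category, using the standard textbook definitions of observed agreement, positive and negative specific agreement, and Cohen's kappa. The function name, the cell labels `a` through `d`, and the counts are assumptions made for this example, not values from the paper.

```python
def agreement_measures(a: int, b: int, c: int, d: int) -> dict[str, float]:
    """Agreement measures for a 2x2 table of two raters.

    a: both raters positive      b: rater 1 positive, rater 2 negative
    c: rater 1 negative, rater 2 positive      d: both raters negative
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Specific agreement: agreement conditional on one rating category.
    positive_specific = 2 * a / (2 * a + b + c)
    negative_specific = 2 * d / (2 * d + b + c)
    # Chance agreement expected from each rater's marginal proportions.
    r1_pos = (a + b) / n
    r2_pos = (a + c) / n
    expected = r1_pos * r2_pos + (1 - r1_pos) * (1 - r2_pos)
    kappa = (observed - expected) / (1 - expected)
    return {
        "observed": observed,
        "positive_specific": positive_specific,
        "negative_specific": negative_specific,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only.
result = agreement_measures(a=40, b=10, c=5, d=45)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these counts, observed agreement is 0.85 and kappa is 0.70, which shows how the chance correction lowers the raw figure. The sketch does not guard against degenerate tables where a denominator is zero.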
The Arden Syntax for sharing medical knowledge bases is described. Its current focus is on knowledge that is represented as a set of independent modules that can provide therapeutic suggestions, alerts, diagnosis scores, etc. The syntax is based largely upon HELP and the Regenstrief Medical Record System. Each module, called a Medical Logic Module or MLM,…