Hinrich Schütze

Learn More
Introduction to Information Retrieval is the first textbook with a coherent treatment of classical and web information retrieval, including web search and the related areas of text classification and text clustering. Written from a computer science perspective, it gives an up-to-date treatment of all aspects of the design and implementation of systems for(More)
In 1993, Eugene Charniak published a slim volume entitled Statistical Language Learning. At the time, empirical techniques to natural language processing were on the rise — in that year, Computational Linguistics published a special issue on such methods — and Charniak’s text was the first to treat the emerging field. Nowadays, the revolution has become the(More)
This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a high-dimensional, real-valued space in which closeness corresponds to semantic similarity. Similarity in Word(More)
This paper presents a new method for computing a thesaurus from a text corpus. Each word is rep­ resented as a vector in a multi-dimensional space that captures cooccurrence information. Words are defined to be similar if they have similar cooccur­ rence patterns. Two different methods for using these thesaurus vectors in information retrieval are shown to(More)
In this paper, we compare learning techniques based on statistical classification to traditional methods of relevance feedback for the document routing problem. We consider three classification techniques which have decision rules that are derived via explicit error minimization: linear discriminant analysis, logistic regression, and neural networks. We(More)
How to model a pair of sentences is a critical issue in many natural language processing (NLP) tasks such as answer selection (AS), paraphrase identification (PI) and textual entailment (TE). Most prior work (i) deals with one individual task by fine-tuning a specific system; (ii) models each sentence separately, without considering the impact of the other(More)
MOTIVATION New high-throughput technologies have accelerated the accumulation of knowledge about genes and proteins. However, much knowledge is still stored as written natural language text. Therefore, we have developed a new method, GAPSCORE, to identify gene and protein names in text. GAPSCORE scores words based on a statistical model of gene names that(More)