Learn More
1 Boolean retrieval 1 2 The term vocabulary and postings lists 19 3 Dictionaries and tolerant retrieval 49 4 Index construction 67 5 Index compression 85 6 Scoring, term weighting and the vector space model 109 7 Computing scores in a complete search system 135 8 Evaluation in information retrieval 151 9 Relevance feedback and query expansion 177 10 XML(More)
In 1993, Eugene Charniak published a slim volume entitled Statistical Language Learning. At the time, empirical techniques to natural language processing were on the rise — in that year, Computational Linguistics published a special issue on such methods — and Charniak's text was the first to treat the emerging field. Nowadays, the revolution has become the(More)
This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a high-dimensional, real-valued space in which closeness corresponds to semantic similarity. Similarity in Word(More)
We present AutoExtend, a system to learn embeddings for synsets and lexemes. It is flexible in that it can take any word embed-dings as input and does not need an additional training corpus. The synset/lexeme embeddings obtained live in the same vector space as the word embeddings. A sparse tensor formalization guarantees efficiency and parallelizability.(More)
In this paper, we compare learning techniques based on statistical classification to traditional methods of relevance feedback for the document routing problem. We consider three classification techniques which have decision rules that are derived via explicit error minimization: linear discriminant analysis, logistic regression, and neural networks. We(More)