Learn More
1 Boolean retrieval 1 2 The term vocabulary and postings lists 19 3 Dictionaries and tolerant retrieval 49 4 Index construction 67 5 Index compression 85 6 Scoring, term weighting and the vector space model 109 7 Computing scores in a complete search system 135 8 Evaluation in information retrieval 151 9 Relevance feedback and query expansion 177 10 XML(More)
In 1993, Eugene Charniak published a slim volume entitled Statistical Language Learning. At the time, empirical techniques to natural language processing were on the rise — in that year, Computational Linguistics published a special issue on such methods — and Charniak's text was the first to treat the emerging field. Nowadays, the revolution has become the(More)
This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a high-dimensional, real-valued space in which closeness corresponds to semantic similarity. Similarity in Word(More)
Training higher-order conditional random fields is prohibitive for huge tag sets. We present an approximated conditional random field using coarse-to-fine decoding and early updating. We show that our implementation yields fast and accurate morphological taggers across six languages with different morphological properties and that across languages(More)
We present AutoExtend, a system to learn embeddings for synsets and lexemes. It is flexible in that it can take any word embed-dings as input and does not need an additional training corpus. The synset/lexeme embeddings obtained live in the same vector space as the word embeddings. A sparse tensor formalization guarantees efficiency and parallelizability.(More)
How to model a pair of sentences is a critical issue in many natural language processing (NLP) tasks such as answer selection (AS), paraphrase identification (PI) and tex-tual entailment (TE). Most prior work (i) deals with one individual task by fine-tuning a specific system; (ii) models each sentence separately , without considering the impact of the(More)
In this paper, we compare learning techniques based on statistical classification to traditional methods of relevance feedback for the document routing problem. We consider three classification techniques which have decision rules that are derived via explicit error minimization: linear discriminant analysis, logistic regression, and neural networks. We(More)