With the growing number of non-English-language web searchers, the efficient handling of non-English Web documents and user queries is becoming a major issue for search engines. The main aim of this review paper is to make researchers aware of the existing problems in monolingual non-English Web retrieval by providing an overview of …
This paper deals with the application of natural language processing techniques to the field of information retrieval. Specifically, we propose the application of morphological families for single-term conflation in order to reduce the linguistic variety of indexed documents written in Spanish. A system for automatic generation of morphological families …
This article presents two new approaches for term indexing which are particularly appropriate for languages with a rich lexis and morphology, such as Spanish, and need few resources to be applied. At word level, productive derivational morphology is used to conflate semantically related words. At sentence level, an approximate grammar is used to conflate …
In recent years, there has been a considerable amount of interest in using Natural Language Processing in Information Retrieval research, with specific implementations ranging from word-level morphological analysis through syntactic parsing to conceptual-level semantic analysis. In particular, different degrees of phrase-level syntactic information have been …
In this, our first participation in CLEF, we have applied Natural Language Processing techniques for single-word and multi-word term conflation. We have tested several approaches at different levels of text processing in our experiments: firstly, we have lemmatized the text to avoid inflectional variation; secondly, we have expanded the queries through …
This paper describes our participation at RepLab 2014, a competitive evaluation for reputation monitoring on Twitter. The following tasks were addressed: (1) categorisation of tweets with respect to standard reputation dimensions and (2) characterisation of Twitter profiles, which includes: (2.1) identifying the type of those profiles, such as journalist or …
This article describes the application of lemmatization and shallow parsing as a linguistically-based alternative to stemming in Text Retrieval, with the aim of managing linguistic variation at both word level and phrase level. Several alternatives for selecting the index terms among the syntactic dependencies detected by the parser are evaluated. Though …
In this paper we consider a set of natural language processing techniques that can be used to analyze large amounts of text, focusing on an advanced tokenizer which accounts for a number of complex linguistic phenomena, as well as for pre-tagging tasks such as proper noun recognition. We also show the results of several experiments performed in order to …
This work intends to capture the concept of similarity between phrases. The algorithm is based on a dynamic programming approach integrating both the edit distance between parse trees and single-term similarity. Our work stresses the use of the underlying grammatical structure, which serves as a guide in the computation of semantic similarity between words. …