We present a method for creating a comparable text corpus from two document collections in different languages. The collections can be very different in origin. In this study, we build a comparable corpus from articles by a Swedish news agency and a U.S. newspaper. The keys with best resolution power were extracted from the documents of one collection, the ...
This paper reviews literature on dictionary-based cross-language information retrieval (CLIR) and presents CLIR research done at the University of Tampere (UTA). The main problems associated with dictionary-based CLIR, as well as appropriate methods to deal with them, are discussed. We will present the structured query model by Pirkola and report ...
We will present a novel two-step fuzzy translation technique for cross-lingual spelling variants. In the first stage, transformation rules are applied to source words to render them more similar to their target language equivalents. The rules are generated automatically using translation dictionaries as source data. In the second stage, the intermediate ...
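The two-stage idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the rule list `RULES` is hand-written (the abstract says the real rules are generated automatically from translation dictionaries), the function names are hypothetical, and because the abstract is cut off before it describes the second stage, character-bigram Dice similarity is swapped in here as one common fuzzy matching choice, not as the authors' method.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical Finnish-to-English transformation rules (assumption:
# hand-picked for this example, not mined from a dictionary).
RULES: List[Tuple[str, str]] = [("yk", "ych"), ("ia", "y")]


def apply_rules(word: str, rules: Sequence[Tuple[str, str]] = RULES) -> str:
    """Stage 1: rewrite the source word toward target-language spelling."""
    for source_part, target_part in rules:
        word = word.replace(source_part, target_part)
    return word


def bigrams(word: str) -> Set[str]:
    """Character bigrams of the word, padded so word edges count too."""
    padded = f"_{word}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient over the bigram sets of two words."""
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


def fuzzy_translate(word: str, target_lexicon: Sequence[str],
                    top_k: int = 3) -> List[str]:
    """Stage 2 (assumed): rank target words by similarity to the
    intermediate form produced in stage 1."""
    intermediate = apply_rules(word.lower())
    ranked = sorted(target_lexicon,
                    key=lambda target: dice(intermediate, target),
                    reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    lexicon = ["physiology", "psychology", "psychiatry"]
    print(apply_rules("psykologia"))
    print(fuzzy_translate("psykologia", lexicon))
```

Applying the rules first means the fuzzy matcher compares forms that already share most of their spelling, so regular cross-language differences no longer count against the correct candidate.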
We used a dictionary-based approach, and performed tests in the bilingual track with three language pairs. All the source languages are compound languages, i.e., languages rich in compound words. A compound word refers to a multi-word expression where the component words are written together. Our main efforts were to develop techniques for the processing ...
Real-life information retrieval takes place in sessions, where users search by iterating between various cognitive, perceptual and motor subtasks through an interactive interface. The sessions may follow diverse strategies, which, together with the interface characteristics, affect user effort (cost), experience and session effectiveness. In this paper we ...
There is overwhelming evidence suggesting that real users of IR systems often prefer extremely short queries (one or two individual words) but try out several queries if needed. Such behavior is fundamentally different from the process modeled in traditional test collection-based IR evaluation, which is based on using more verbose queries and only ...
We present a deductive data model for concept-based query expansion. It is based on three abstraction levels: the conceptual, linguistic and occurrence levels. Concepts and relationships among them are represented at the conceptual level. The expression level represents natural language expressions for concepts. Each expression has one or more matching ...
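A rough sketch of how expansion could pass through the three levels follows. It is only an illustration under stated assumptions: the toy tables `NARROWER`, `EXPRESSIONS` and `PATTERNS` and both function names are hypothetical, a plain transitive closure over a narrower-concept relation stands in for the deductive model, and since the abstract is cut off before describing the occurrence level, that level is assumed here to map each expression to surface strings.

```python
from typing import Dict, List

# Conceptual level (assumed toy data): concept -> narrower concepts.
NARROWER: Dict[str, List[str]] = {
    "vehicle": ["car", "bicycle"],
    "car": ["taxi"],
}

# Expression level (assumed): concept -> natural language expressions.
EXPRESSIONS: Dict[str, List[str]] = {
    "vehicle": ["vehicle"],
    "car": ["car", "automobile"],
    "bicycle": ["bicycle", "bike"],
    "taxi": ["taxi", "cab"],
}

# Occurrence level (assumed): expression -> strings to match in text.
PATTERNS: Dict[str, List[str]] = {
    "automobile": ["automobile", "automobiles"],
    "taxi": ["taxi", "taxis"],
}


def expand_concept(concept: str) -> List[str]:
    """Collect the concept and everything reachable via NARROWER."""
    seen: List[str] = []
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(NARROWER.get(current, []))
    return seen


def expand_query(concept: str) -> List[str]:
    """Expand a concept into search strings, level by level."""
    terms: List[str] = []
    for c in expand_concept(concept):
        for expression in EXPRESSIONS.get(c, [c]):
            # Fall back to the expression itself if no pattern is listed.
            for pattern in PATTERNS.get(expression, [expression]):
                if pattern not in terms:
                    terms.append(pattern)
    return terms


if __name__ == "__main__":
    print(expand_query("car"))
```

Keeping the levels in separate tables means a new synonym or a new matching string can be added without touching the concept hierarchy.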
In this study, the basic framework and performance analysis results are presented for the three-year development process of the dictionary-based UTACLIR system. Tests ranging from bilingual CLIR to transitive translation are reported. The development process of the UTACLIR query translation system will be regarded from the point of view of a ...