The information retrieval community is becoming increasingly interested in machine learning techniques, of which text categorization is an application. This paper describes how we have applied an existing similarity-based learning algorithm, Charade, to the text categorization problem and compares the results with those obtained using decision tree…
West Group participated in the non-English monolingual retrieval task for French and German. Our primary interest was to investigate whether retrieval of German or French documents was any different from the retrieval of English documents. We focused on two aspects: stemming for both languages and compound breaking for German, and studied several query…
Thomson Legal and Regulatory participated in the monolingual track for all five languages and in the bilingual track with Spanish-English runs. Our monolingual runs for Dutch, Spanish and Italian use settings and rules derived from our runs in French and German last year. Our bilingual runs compared merging strategies for query translation resources.
Thomson Legal and Regulatory participated in the monolingual, the bilingual and the multilingual tracks. Our monolingual runs added Swedish to the languages we had submitted in previous participations. Our bilingual and multilingual efforts used English as the query language. We experimented with dictionaries and similarity thesauri for the bilingual task,…
Thomson Legal and Regulatory participated in the CLIR task of the NTCIR-4 workshop. We submitted formal runs for monolingual retrieval in Japanese, Chinese and Korean. Our bilingual runs from Chinese and Korean to Japanese rely on English as a pivot language. During our monolingual experiments, we compared building stopword lists using query logs to…
Thomson Legal and Regulatory participated in the CLIR task of the NTCIR-3 workshop. We submitted formal runs for monolingual retrieval in Japanese and Chinese, and for bilingual retrieval from English to Japanese. Our main focus was on Japanese retrieval. We compared word-based and character-based indexing, as well as query formulation using characters and…
Thomson Legal and Regulatory participated in the CLEF-2004 monolingual and bilingual tracks. Monolingual experiments included Portuguese, Russian and Finnish. We investigated a new query structure to handle Finnish compounds. Our main focus was bilingual search from German to French. Our approach used query translation and post-translation pseudo-relevance…