Data Set Used
The information retrieval community is becoming increasingly interested in machine learning techniques, of which text categorization is an application. This paper describes how we have applied an existing similarity-based learning algorithm, Charade, to the text categorization problem and compares the results with those obtained using decision tree…
West Group participated in the non-English monolingual retrieval task for French and German. Our primary interest was to investigate whether retrieval of German or French documents was any different from the retrieval of English documents. We focused on two aspects: stemming for both languages and compound breaking for German, and studied several query…
Over the past few years, text categorization has emerged as an application domain for machine learning techniques. Several approaches have already been proposed. This paper does not present yet another technique. It is rather an attempt to unify the approaches encountered so far. Moreover, this survey of the state of the art enables us to stress a shortcoming in earlier…
Thomson Legal and Regulatory participated in the monolingual track for all five languages and in the bilingual track with Spanish-English runs. Our monolingual runs for Dutch, Spanish and Italian use settings and rules derived from our runs in French and German last year. Our bilingual runs compared merging strategies for query translation resources.
Thomson Legal and Regulatory participated in the monolingual, the bilingual and the multilingual tracks. Our monolingual runs added Swedish to the languages we had submitted in previous participations. Our bilingual and multilingual efforts used English as the query language. We experimented with dictionaries and similarity thesauri for the bilingual task,…
Thomson Legal and Regulatory participated in the CLIR task of the NTCIR-4 workshop. We submitted formal runs for monolingual retrieval in Japanese, Chinese and Korean. Our bilingual runs from Chinese and Korean to Japanese rely on English as a pivot language. During our monolingual experiments, we compared building stopword lists using query logs to…
Thomson Legal and Regulatory participated in the CLIR task of the NTCIR-3 workshop. We submitted formal runs for monolingual retrieval in Japanese and Chinese, and for bilingual retrieval from English to Japanese. Our main focus was on Japanese retrieval. We compared word-based and character-based indexing, as well as query formulation using characters and…