Cross-language information retrieval (CLIR), where queries and documents are in different languages, has of late become one of the major topics within the information retrieval community. This paper proposes a Japanese/English CLIR system, where we combine query translation and retrieval modules. We currently target the retrieval of technical documents,...
This paper proposes a method to analyze Japanese anaphora, in which zero pronouns (omitted obligatory cases) are used to refer to preceding entities (antecedents). Unlike the case of general coreference resolution, zero pronouns have to be detected prior to resolution because they are not expressed in discourse. Our method integrates two probability...
In this paper, we propose a method to extract descriptions of technical terms from Web pages in order to utilize the World Wide Web as an encyclopedia. We use linguistic patterns and HTML text structures to extract text fragments containing term descriptions. We also use a language model to discard extraneous descriptions, and a clustering method to...
While recent retrieval techniques do not limit the number of index terms, out-of-vocabulary (OOV) words are crucial in speech recognition. Aiming at retrieving information with spoken queries, we fill the gap between speech recognition and text retrieval in terms of the vocabulary size. Given a spoken query, we generate a transcription and detect OOV words...
This paper proposes methods for extracting loanwords from Cyrillic Mongolian corpora and producing a Japanese–Mongolian bilingual dictionary. We extract loanwords from Mongolian corpora using our own handcrafted rules. To complement the rule-based extraction, we also extract words in Mongolian corpora that are phonetically similar to Japanese Katakana words...
We propose an associative document retrieval method, in which a document is used as a query to search for other similar documents. Because a long document usually includes more than one topic, we first analyze a query document to extract multiple subtopics. For each subtopic element, a sub-query is produced and similar documents are retrieved with a...
Speech recognition has of late become a practical technology for real-world applications. Aiming at speech-driven text retrieval, which facilitates retrieving information with spoken queries, we propose a method to integrate speech recognition and retrieval methods. Since users speak contents related to a target collection, we adapt statistical language...
BMIR-JP is the first complete Japanese test collection available for use in evaluating information retrieval systems. It contains sixty queries and the IDs of 5080 newspaper articles in the fields of economics and engineering. The queries are classified into five categories, based on the functions the system is likely to use to interpret them correctly and...
This paper proposes a Japanese/English cross-language information retrieval (CLIR) system targeting technical documents. Our system first translates a given query containing technical terms into the target language, and then retrieves documents relevant to the translated query. The translation of technical terms is still problematic in that technical terms...