Cross-language information retrieval (CLIR), where queries and documents are in different languages, has of late become one of the major topics within the information retrieval community. This paper proposes a Japanese/English CLIR system, in which we combine query translation and retrieval modules. We currently target the retrieval of technical documents, …
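A minimal sketch of a query-translation module feeding a retrieval module, assuming a toy bilingual dictionary, pre-segmented Japanese query terms, and plain tf-idf scoring. The dictionary contents and the names `translate_query` and `rank` are hypothetical illustrations, not the system described in the paper.

```python
import math
from collections import Counter

# Hypothetical toy bilingual dictionary: Japanese term -> English candidates.
BILINGUAL_DICT = {
    "情報": ["information"],
    "検索": ["retrieval", "search"],
    "翻訳": ["translation"],
}


def translate_query(ja_terms):
    """Replace each (pre-segmented) Japanese term with all of its English candidates."""
    en_terms = []
    for term in ja_terms:
        # Terms missing from the dictionary are simply dropped in this sketch.
        en_terms.extend(BILINGUAL_DICT.get(term, []))
    return en_terms


def rank(query_terms, docs):
    """Score each English document by sum(tf * idf) over query terms; best first."""
    n = len(docs)
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        scores[doc_id] = sum(
            tf[t] * math.log((n + 1) / (df[t] + 1)) for t in query_terms if t in tf
        )
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

For example, the query `["情報", "検索"]` translates to `["information", "retrieval", "search"]`, and a document containing both "information" and "retrieval" outranks one that contains only "search". Keeping every translation candidate is a crude way to cover translation ambiguity; a real system would disambiguate or weight candidates.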
This paper proposes a method to analyze Japanese anaphora, in which zero pronouns (omitted obligatory cases) are used to refer to preceding entities (antecedents). Unlike the case of general coreference resolution, zero pronouns have to be detected prior to resolution because they are not expressed in discourse. Our method integrates two probability …
In 1999, researchers extended X-ray crystallography to allow the imaging of noncrystalline specimens by measuring the X-ray diffraction pattern of a noncrystalline specimen and then directly phasing it using the oversampling method with iterative algorithms. Since then, the field has evolved in three important directions. The first is the 3D …
In this paper, we propose a method to extract descriptions of technical terms from Web pages in order to utilize the World Wide Web as an encyclopedia. We use linguistic patterns and HTML text structures to extract text fragments containing term descriptions. We also use a language model to discard extraneous descriptions, and a clustering method to …
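A minimal sketch of the pattern-matching step only, assuming plain text already stripped of HTML and a few hypothetical English definitional patterns (the paper targets Japanese pages, and its actual patterns are not given here). The language-model filtering and clustering steps are not shown.

```python
import re

# Hypothetical English definitional patterns; {term} is filled in per query term.
PATTERNS = [
    r"{term} is (?:a|an|the) [^.]+\.",
    r"{term} refers to [^.]+\.",
    r"{term}, (?:which|that) is [^.]+\.",
]


def extract_descriptions(term, text):
    """Return sentence fragments that look like definitions of `term`."""
    found = []
    for pattern in PATTERNS:
        # Escape the term so characters like '+' or '.' are matched literally.
        regex = re.compile(pattern.format(term=re.escape(term)), re.IGNORECASE)
        found.extend(m.group(0) for m in regex.finditer(text))
    return found
```

A sentence such as "CLIR is hard." is ignored because it does not fit any definitional pattern. Candidates that do match would still go through the later filtering stages.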
Cross-language information retrieval (CLIR), where queries and documents are in different languages, requires translating queries and/or documents so as to standardize both into a common representation. For this purpose, machine translation is an effective approach. However, the computational cost is prohibitive in translating large-scale …
While recent retrieval techniques do not limit the number of index terms, out-of-vocabulary (OOV) words are crucial in speech recognition. Aiming at retrieving information with spoken queries, we fill the gap between speech recognition and text retrieval in terms of the vocabulary size. Given a spoken query, we generate a transcription and detect OOV words …
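A minimal sketch of the vocabulary gap and of one simple way to flag OOV regions in a transcription. It assumes the recognizer output is a list of (word, confidence) pairs and that low confidence signals a possible OOV word; both the input format and the 0.5 threshold are hypothetical, since the snippet does not state how detection is actually done.

```python
def vocabulary_gap(index_terms, asr_lexicon):
    """Index terms the recognizer can never output, i.e. OOV from the ASR's viewpoint."""
    return sorted(set(index_terms) - set(asr_lexicon))


def flag_oov_regions(transcription, threshold=0.5):
    """Return positions of words whose confidence falls below a hypothetical threshold.

    transcription: list of (word, confidence) pairs, an assumed ASR output format.
    """
    return [i for i, (_, conf) in enumerate(transcription) if conf < threshold]
```

The first function shows why the gap matters: any index term outside the recognizer's lexicon can only reach the retrieval engine if the OOV region is detected and handled separately.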
This paper proposes methods for extracting loanwords from Cyrillic Mongolian corpora and producing a Japanese–Mongolian bilingual dictionary. We extract loanwords from Mongolian corpora using our own handcrafted rules. To complement the rule-based extraction, we also extract words in Mongolian corpora that are phonetically similar to Japanese Katakana words …
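A minimal sketch of phonetic-similarity matching, assuming both the Cyrillic Mongolian words and the Katakana words have already been romanized, and using normalized Levenshtein distance as a stand-in similarity measure. The measure, the romanizations, and the 0.5 threshold are assumptions, not the paper's method.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """1.0 for identical strings, lower as the normalized edit distance grows."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def match_loanwords(mongolian_words, katakana_words, threshold=0.5):
    """Pair each romanized Mongolian word with its most similar romanized Katakana word."""
    pairs = []
    for m in mongolian_words:
        best = max(katakana_words, key=lambda k: similarity(m, k))
        if similarity(m, best) >= threshold:
            pairs.append((m, best))
    return pairs
```

For instance, "kompyuter" (компьютер) and "konpyuuta" (コンピューター) differ by four edits out of nine characters, a similarity of about 0.56. They are paired at a threshold of 0.5 but not at 0.6, so the threshold directly trades recall against precision.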
We propose an associative document retrieval method, in which a document is used as a query to search for other similar documents. Because a long document usually includes more than one topic, we first analyze a query document to extract multiple subtopics. For each subtopic element, a sub-query is produced and similar documents are retrieved with a …
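A minimal sketch of retrieval with per-subtopic sub-queries. It assumes subtopics are simply blank-line-separated paragraphs, documents are scored by Jaccard term overlap, and results are merged by keeping each document's best sub-query score. The segmentation, scoring, and fusion rule are all hypothetical stand-ins for the paper's own subtopic analysis.

```python
def segment(document):
    """Hypothetical subtopic segmentation: one subtopic per blank-line-separated paragraph."""
    return [p for p in document.split("\n\n") if p.strip()]


def jaccard(query_terms, doc_text):
    """Term-overlap score between a sub-query and a document."""
    doc_terms = set(doc_text.lower().split())
    union = query_terms | doc_terms
    return len(query_terms & doc_terms) / len(union) if union else 0.0


def associative_search(query_document, collection, top_k=3):
    """Issue one sub-query per subtopic; keep each document's best sub-query score."""
    best = {}
    for subtopic in segment(query_document):
        sub_query = set(subtopic.lower().split())
        for doc_id, text in collection.items():
            best[doc_id] = max(best.get(doc_id, 0.0), jaccard(sub_query, text))
    ranked = sorted(best, key=lambda d: best[d], reverse=True)
    return [d for d in ranked if best[d] > 0][:top_k]
```

With a query document covering both speech recognition and Mongolian loanwords, each subtopic pulls in its own matching document. A single query built from the whole text would dilute both topics.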
BMIR-JP is the first complete Japanese test collection available for use in evaluating information retrieval systems. It contains sixty queries and the IDs of 5080 newspaper articles in the fields of economics and engineering. The queries are classified into five categories, based on the functions the system is likely to use to interpret them correctly and …