Hiroyuki Shinnou

This paper describes a system which uses a decision tree to find and classify names in Japanese texts. The decision tree uses part of speech, character type, and special dictionary information to determine the probability that a particular type of name opens or closes at a given position in the text. The output is generated from the consistent sequence of …
Spectral clustering is a powerful clustering method for document data sets. However, spectral clustering needs to solve an eigenvalue problem for a matrix derived from the similarity matrix of the data set. Therefore, it is not practical to use spectral clustering on a large data set. To overcome this problem, we propose a method to …
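As a rough illustration of the eigenvalue step that the spectral clustering abstract above identifies as the bottleneck, the sketch below performs a textbook two-way spectral partition with NumPy. It is not the method proposed in the paper (the abstract is truncated before describing it); the function name `spectral_bipartition` and the toy similarity matrix are assumptions made for this example, and every item is assumed to have a positive total similarity.

```python
import numpy as np

def spectral_bipartition(S):
    """Split items into two clusters from a symmetric similarity matrix S.

    Hypothetical helper for illustration only; assumes every row sum is > 0.
    """
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)                      # degree (total similarity) of each item
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Normalized Laplacian: L = I - D^{-1/2} S D^{-1/2}
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    # Dense eigendecomposition: roughly cubic in the number of items,
    # which is why large data sets become impractical.
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                # eigenvector of the 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int)       # the sign pattern gives the two clusters

# Toy similarity matrix (made up): items 0-2 are similar, items 3-4 are similar.
S = np.array([
    [1.0, 0.9, 0.8, 0.0, 0.1],
    [0.9, 1.0, 0.7, 0.1, 0.0],
    [0.8, 0.7, 1.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 1.0, 0.9],
    [0.1, 0.0, 0.0, 0.9, 1.0],
])
labels = spectral_bipartition(S)
```

Which cluster receives label 0 or 1 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful. For more than two clusters, the usual extension embeds each item with the first k eigenvectors and runs k-means on the rows.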
In this paper, we improve an unsupervised learning method using the Expectation-Maximization (EM) algorithm proposed by Nigam et al. for text classification problems in order to apply it to word sense disambiguation (WSD) problems. The improved method stops the EM algorithm at the optimum iteration number. To estimate that number, we propose two methods. In …
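In natural language processing, the approach of formulating individual problems as classification problems and solving them with inductive learning methods has been highly successful. However, this approach has a major drawback: the training data required for inductive learning must be prepared. To address this, seed-type learning, in which the accuracy of classification rules obtained from a small amount of labeled training data is improved using a large amount of unlabeled training data, has appeared in a number of recent studies. Here we attempt to apply Co-training, the central method of this kind, to word sense disambiguation rules. Co-training, however, requires two mutually independent feature sets. In practice this independence condition is hard to satisfy, so the accuracy of the resulting rules plateaus. In this paper, to avoid this problem, the selection of additional examples …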
In this paper, we describe a system that divides example sentences (the data set) into clusters based on the meaning of the target word, using a semi-supervised clustering technique. In this task, estimating the number of clusters (the number of meanings) is critical. Our system primarily concentrates on this aspect. First, a user assigns the system an …