Spectral clustering is a powerful clustering method for document data sets. However, it requires solving an eigenvalue problem for a matrix derived from the similarity matrix of the data set, which makes it impractical for large data sets. To overcome this problem, we propose a method to…
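To show where the eigenvalue cost comes from, here is a minimal sketch of the textbook two-way spectral split (in the spirit of normalized cuts). It is not the method proposed in this paper. It builds M = D^{-1/2} W D^{-1/2} from a dense similarity matrix W and finds the second eigenvector of M by power iteration. The function name, the iteration count, the deterministic starting vector, and the assumption that every row of W has a positive sum are all illustrative choices. Each iteration touches all n² entries of M, which is why this step scales poorly to large data sets.

```python
import math


def spectral_bipartition(W, iters=200):
    """Two-way spectral split of a symmetric, non-negative similarity matrix W.

    Assumes every row of W has a positive sum. Uses the second eigenvector of
    M = D^{-1/2} W D^{-1/2}, found by power iteration; each iteration is a
    dense O(n^2) matrix-vector product.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    s = [1.0 / math.sqrt(d) for d in deg]
    M = [[s[i] * W[i][j] * s[j] for j in range(n)] for i in range(n)]

    # The leading eigenvector of M (eigenvalue 1) is proportional to D^{1/2} 1.
    v1 = [math.sqrt(d) for d in deg]
    n1 = math.sqrt(sum(x * x for x in v1))
    v1 = [x / n1 for x in v1]

    # Power iteration on M + I (eigenvalues shifted into [0, 2]) while projecting
    # out v1, so the iterate converges to the second eigenvector.
    v = [math.sin(i + 1.0) for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(v, v1))
        v = [a - dot * b for a, b in zip(v, v1)]
        w = [sum(M[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # The sign pattern of the second eigenvector splits the points in two.
    return [1 if x >= 0 else 0 for x in v]


# Two groups of three "documents": strong similarity inside each group,
# weak similarity across groups.
W = [[0.0 if i == j else (1.0 if (i < 3) == (j < 3) else 0.01)
      for j in range(6)] for i in range(6)]
labels = spectral_bipartition(W)
print(labels)  # [1, 1, 1, 0, 0, 0]
```

A production system would call a sparse eigensolver rather than this loop. The quadratic cost per matrix-vector product remains, however, whenever the similarity matrix is dense.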
This paper describes a system which uses a decision tree to find and classify names in Japanese texts. The decision tree uses part-of-speech, character type, and special dictionary information to determine the probability that a particular type of name opens or closes at a given position in the text. The output is generated from…
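The abstract names the decision tree's information sources but does not define them. As a small illustration of one source, character type, the sketch below assigns each character a coarse category from its Unicode block and collects the categories around a position. The category set, the ±1 window, and the names `char_type` and `type_window` are assumptions. They are not the system's actual feature definitions, and the decision tree itself is not sketched.

```python
def char_type(ch):
    """Coarse character-type category (an assumed scheme, by Unicode block)."""
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF:      # CJK Unified Ideographs (kanji)
        return "kanji"
    if 0x3040 <= cp <= 0x309F:      # Hiragana block
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:      # Katakana block, including the long-vowel mark
        return "katakana"
    if ch.isdigit():                # ASCII and full-width digits
        return "digit"
    if (ch.isascii() and ch.isalpha()) or 0xFF21 <= cp <= 0xFF3A or 0xFF41 <= cp <= 0xFF5A:
        return "latin"              # ASCII and full-width Latin letters
    return "other"


def type_window(text, i):
    """Character types at positions i-1, i and i+1, padded with 'boundary'.

    A hypothetical feature tuple for asking whether a name opens at position i.
    """
    def at(k):
        return char_type(text[k]) if 0 <= k < len(text) else "boundary"
    return (at(i - 1), at(i), at(i + 1))


text = "山田さんはソニーに勤める"
print(type_window(text, 5))  # ('hiragana', 'katakana', 'katakana')
```

A change of character type, such as the switch from hiragana to katakana at position 5, is the kind of cue that can signal that a name opens at that position.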
In this paper, we propose a new ensemble document clustering method. The novelty of our method is the use of Non-negative Matrix Factorization (NMF) in the generation phase and a weighted hypergraph in the integration phase. In our experiment, we compared our method with several other clustering methods, and our method achieved the best results.
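As an illustration of the generation phase, the sketch below derives one base clustering with NMF. It factors a non-negative term-document matrix X ≈ WH using the Lee and Seung multiplicative updates for the squared Frobenius error, then assigns each document to its largest coefficient in H. Running it with different seeds yields several base clusterings. The helper `hyperedges` then turns every cluster into one hyperedge, as in standard hypergraph-based ensemble clustering. The edge weights are omitted, and the paper's weighting and integration steps are not reproduced. The iteration count, seeds, and all function names are assumptions.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Factor X (terms x docs) as W (terms x k) times H (k x docs).

    Lee and Seung multiplicative updates for the squared Frobenius error;
    eps guards against division by zero.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[r][j] * num[r][j] / (den[r][j] + eps) for j in range(n)] for r in range(k)]
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][r] * num[i][r] / (den[i][r] + eps) for r in range(k)] for i in range(m)]
    return W, H


def nmf_clusters(X, k, seed=0):
    """One base clustering: each document goes to its largest NMF coefficient."""
    _, H = nmf(X, k, seed=seed)
    return [max(range(k), key=lambda r: H[r][j]) for j in range(len(X[0]))]


def hyperedges(clusterings):
    """Each cluster of each base clustering becomes one (unweighted) hyperedge."""
    edges = []
    for labels in clusterings:
        for c in sorted(set(labels)):
            edges.append({j for j, lab in enumerate(labels) if lab == c})
    return edges


# Term-document counts: documents 0-1 share terms 0-1, documents 2-3 share terms 2-3.
X = [[3, 2, 0, 0],
     [2, 3, 0, 0],
     [0, 0, 3, 2],
     [0, 0, 2, 3]]
base = [nmf_clusters(X, 2, seed=s) for s in range(3)]
edges = hyperedges(base)
print(edges)
```

Different seeds can number the same clusters differently. The hypergraph view sidesteps that label-correspondence problem, because it records only which documents share a cluster.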
In this paper, we improve the unsupervised learning method proposed by Nigam et al., which applies the Expectation-Maximization (EM) algorithm to text classification, so that it can be applied to word sense disambiguation (WSD). The improved method stops the EM algorithm at the optimum iteration number. To estimate that number, we propose two methods.…
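Nigam et al.'s EM starts from a small labeled set. It then alternates between soft-labeling the unlabeled examples (E-step) and re-estimating a naive Bayes model from all examples (M-step). The sketch below runs that basic loop for a fixed number of iterations `n_iter`, which is the quantity the improved method estimates. The two estimation methods themselves are not reproduced. The multinomial naive Bayes details, the Laplace smoothing, the names, and the toy contexts for the English word "bank" are assumptions made for illustration.

```python
import math
from collections import Counter


def train_nb(docs, weights, classes, alpha=1.0):
    """Multinomial naive Bayes from soft labels (weights[i] maps class -> weight)."""
    vocab = {w for d in docs for w in d}
    prior = {c: alpha for c in classes}
    counts = {c: Counter() for c in classes}
    for d, wt in zip(docs, weights):
        for c, p in wt.items():
            prior[c] += p
            for tok in d:
                counts[c][tok] += p
    z = sum(prior.values())
    log_prior = {c: math.log(prior[c] / z) for c in classes}
    log_lik = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik


def posterior(model, doc):
    """Class posterior for one document; words outside the vocabulary are skipped."""
    log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(log_lik[c].get(w, 0.0) for w in doc) for c in log_prior}
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exp.values())
    return {c: v / total for c, v in exp.items()}


def em_wsd(labeled, unlabeled, classes, n_iter):
    """Semi-supervised EM; labeled examples keep their hard labels throughout."""
    fixed = [{c: 1.0} for _, c in labeled]
    docs = [d for d, _ in labeled] + unlabeled
    model = train_nb([d for d, _ in labeled], fixed, classes)  # start: labeled only
    for _ in range(n_iter):
        soft = [posterior(model, d) for d in unlabeled]         # E-step
        model = train_nb(docs, fixed + soft, classes)            # M-step
    result = []
    for d in unlabeled:
        post = posterior(model, d)
        result.append(max(post, key=post.get))
    return result


labeled = [(["river", "water"], "shore"), (["money", "loan"], "finance")]
unlabeled = [["river", "fish"], ["loan", "interest"], ["water", "fish"], ["money", "interest"]]
print(em_wsd(labeled, unlabeled, ["shore", "finance"], n_iter=3))
# ['shore', 'finance', 'shore', 'finance']
```

On real data, accuracy can rise and then fall as the iterations proceed. That behavior is what makes the choice of `n_iter` matter.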
Inductive learning is effective for a variety of natural language processing problems. However, it requires expensive training data, and the quality of the learned rules often depends on the quality of that training data. In this paper, we propose a method to automatically detect errors in training data in order to improve its quality. We consider a…
In this paper, we describe a system that uses a semi-supervised clustering technique to divide example sentences (the data set) into clusters based on the meaning of the target word. In this task, estimating the number of clusters (that is, the number of meanings) is critical, and our system concentrates primarily on this aspect. First, a user assigns the system an…
In this paper, we propose a practical method to detect homophone errors in Japanese texts. Detecting such errors is very important for Japanese revision systems because Japanese texts frequently contain them. To detect homophone errors, we need only solve the homophone problem, that is, decide which member of a homophone set fits the context. We can use the decision list to do…
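A decision list in the style of Yarowsky can be sketched as follows. Each context word votes for one homophone with a smoothed log-likelihood-ratio score, the rules are sorted by score, and the first rule whose word appears in the context decides. To turn disambiguation into error detection, the sketch flags the written form when the deciding rule prefers a different homophone with a score above a threshold. Several details are assumptions rather than the paper's settings: the smoothing constant `alpha`, the `threshold`, the bag-of-words context, and the toy examples for 機械 (machine) and 機会 (opportunity), both read kikai.

```python
import math
from collections import defaultdict


def build_decision_list(examples, alpha=0.1):
    """examples: (context_words, correct_homophone) pairs.

    Returns rules (score, word, homophone), strongest first, where the score
    is a smoothed log ratio of 'word seen with this homophone' to 'word seen
    with any other homophone'.
    """
    counts = defaultdict(lambda: defaultdict(float))
    homophones = set()
    for context, correct in examples:
        homophones.add(correct)
        for w in set(context):
            counts[w][correct] += 1
    rules = []
    for w, by_h in counts.items():
        total = sum(by_h.values())
        for h in homophones:
            here = by_h.get(h, 0.0)
            rules.append((math.log((here + alpha) / (total - here + alpha)), w, h))
    rules.sort(reverse=True)
    return rules


def detect_error(rules, context, written, threshold=1.0):
    """Return the homophone suggested instead of `written`, or None."""
    ctx = set(context)
    for score, w, h in rules:
        if w in ctx:                      # the first matching rule decides
            if h != written and score >= threshold:
                return h
            return None
    return None


examples = [
    (["工場", "操作"], "機械"),
    (["工場", "動かす"], "機械"),
    (["絶好", "逃す"], "機会"),
    (["与える", "絶好"], "機会"),
]
rules = build_decision_list(examples)
print(detect_error(rules, ["工場", "操作"], "機会"))  # 機械
print(detect_error(rules, ["工場"], "機械"))          # None
```

The threshold trades recall for precision. A revision system would typically prefer missing an error over flagging correct text.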
In natural language processing, it is effective to convert problems into classification problems and to solve them with an inductive learning method. However, this strategy needs labeled training data, which is fairly expensive to obtain. To overcome this problem, several learning methods that use unlabeled training data have been proposed. Co-training is…
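Co-training (Blum and Mitchell) assumes each example has two views, for instance the words on a web page and the words in links pointing to it. It trains one classifier per view. Each classifier then labels the unlabeled examples it is most confident about, and those examples join the labeled pool that both classifiers use. The sketch below is a simplified version. It uses a small naive Bayes model per view and adds, per view and per round, the single most confident example for each class. The classifier, the pool sizes, the names, and the toy data are assumptions.

```python
import math
from collections import Counter


def train(examples, alpha=1.0):
    """Naive Bayes over token lists; returns predict(tokens) -> (label, confidence)."""
    label_counts = Counter(y for _, y in examples)
    counts = {y: Counter() for y in label_counts}
    for toks, y in examples:
        counts[y].update(toks)
    vocab_size = len({t for toks, _ in examples for t in toks}) + 1  # +1 for unseen tokens
    n = len(examples)

    def predict(toks):
        scores = {}
        for y in label_counts:
            denom = sum(counts[y].values()) + alpha * vocab_size
            scores[y] = math.log(label_counts[y] / n) + sum(
                math.log((counts[y][t] + alpha) / denom) for t in toks)
        top = max(scores.values())
        z = sum(math.exp(s - top) for s in scores.values())
        best = max(scores, key=scores.get)
        return best, 1.0 / z  # posterior probability of the winning label

    return predict


def co_train(labeled, unlabeled, rounds):
    """labeled: ((view0, view1), label) pairs; unlabeled: (view0, view1) pairs."""
    labeled, pool = list(labeled), list(unlabeled)
    classes = sorted({y for _, y in labeled})
    for _ in range(rounds):
        for view in (0, 1):
            h = train([(x[view], y) for x, y in labeled])  # classifier for this view only
            for c in classes:
                scored = [(h(x[view]), i) for i, x in enumerate(pool)]
                cands = [(conf, i) for (y, conf), i in scored if y == c]
                if cands:
                    _, i = max(cands)
                    labeled.append((pool.pop(i), c))  # most confident example for class c
    return labeled


labeled = [((["goal", "match"], ["sports"]), "sport"),
           ((["vote", "party"], ["politics"]), "politics")]
unlabeled = [(["goal", "team"], ["scores"]),
             (["party", "senate"], ["news"]),
             (["coach", "team"], ["scores"]),
             (["senate", "bill"], ["news"])]
result = co_train(labeled, unlabeled, rounds=1)
for (view0, view1), label in result[2:]:
    print(view0, view1, label)
```

In this toy run, the second-view words "scores" and "news" carry no information in the seed set. They become evidence only after the first-view classifier labels the examples that contain them.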