Chinatsu Aone

Clustering is a powerful technique for large-scale topic discovery from text. It involves two phases: first, feature extraction maps each document or record to a point in high-dimensional space; then, clustering algorithms automatically group the points into a hierarchy of clusters. We describe an unsupervised, near-linear time text clustering system that…
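The two phases named in this abstract can be illustrated with a minimal sketch. It is not the authors' system: it assumes plain term-frequency features and a naive greedy agglomerative merge (quadratic pair search per step, not near-linear), and the names `extract_features`, `cosine`, and `cluster` are hypothetical.

```python
import math
import re
from collections import Counter


def extract_features(text):
    """Phase 1: map a document to a sparse term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs):
    """Phase 2: repeatedly merge the two most similar clusters.

    Returns the hierarchy as nested tuples of document indices.
    Assumes `docs` is non-empty. Illustration only: each step scans
    all cluster pairs, so this does not scale to large collections.
    """
    # Each active cluster is (subtree, summed term-frequency vector).
    active = [(i, extract_features(d)) for i, d in enumerate(docs)]
    while len(active) > 1:
        pairs = ((i, j) for i in range(len(active)) for j in range(i + 1, len(active)))
        i, j = max(pairs, key=lambda p: cosine(active[p[0]][1], active[p[1]][1]))
        merged = ((active[i][0], active[j][0]), active[i][1] + active[j][1])
        active = [c for k, c in enumerate(active) if k not in (i, j)] + [merged]
    return active[0][0]


docs = [
    "stock market prices fall",
    "market prices rise on stock news",
    "team wins football match",
    "football team loses match",
]
tree = cluster(docs)
```

Here the two finance documents and the two football documents end up in separate subtrees before the final merge joins them at the root.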
We present an application of kernel methods to extracting relations from unstructured natural language sources. We introduce kernels defined over shallow parse representations of text, and design efficient algorithms for computing the kernels. We use the devised kernels in conjunction with Support Vector Machine and Voted Perceptron learning algorithms for…
We describe one approach to building an automatically trainable anaphora resolution system. In this approach, we use Japanese newspaper articles tagged with discourse information as training examples for a machine learning algorithm which employs the C4.5 decision tree algorithm (Quinlan, 1993). Then, we evaluate and compare the results of several…
We describe a trainable and scalable summarization system which utilizes features derived from information retrieval, information extraction, and NLP techniques and on-line resources. The system combines these features using a trainable feature combiner learned from summary examples through a machine learning algorithm. We demonstrate system scalability…
We describe a scalable summarization system which takes advantage of robust NLP technology such as corpus-based statistical NLP techniques, information extraction, and readily available on-line resources. The system attempts to compensate for the bottlenecks of traditional frequency-based, knowledge-based, or discourse-based summarization approaches by utilizing…