Clustering is a powerful technique for large-scale topic discovery from text. It involves two phases: first, feature extraction maps each document or record to a point in high-dimensional space; then, clustering algorithms automatically group the points into a hierarchy of clusters. We describe an unsupervised, near-linear-time text clustering system that…
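The two phases named above can be sketched as follows. This is a minimal illustration that assumes whitespace tokenization, L2-normalized term-frequency vectors, and naive average-link agglomerative clustering. It is cubic or worse in the number of documents and is not the near-linear system the abstract describes; all function names are hypothetical.

```python
import math
from collections import Counter


def extract_features(docs):
    """Phase 1: map each document to an L2-normalized term-frequency vector (dict term -> weight)."""
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        norm = math.sqrt(sum(c * c for c in counts.values()))
        vectors.append({t: c / norm for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def average_link(a, b, vectors):
    """Mean pairwise similarity between the members of clusters a and b."""
    return sum(cosine(vectors[i], vectors[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(vectors):
    """Phase 2: naive average-link agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, similarity) tuples;
    the sequence of merges is the cluster hierarchy (a dendrogram).
    """
    clusters = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = average_link(clusters[x], clusters[y], vectors)
                if best is None or sim > best[0]:
                    best = (sim, x, y)
        sim, x, y = best
        merges.append((clusters[x], clusters[y], sim))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges


docs = ["cats purr softly", "cats purr loudly", "stocks fell sharply", "stocks fell again"]
for left, right, sim in agglomerate(extract_features(docs)):
    print(left, right, round(sim, 3))
```

Each merge record pairs two clusters with their average cosine similarity; reading the list from the end recovers the hierarchy top-down.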
We present an application of kernel methods to extracting relations from unstructured natural language sources. We introduce kernels defined over shallow parse representations of text and design efficient algorithms for computing them. We use these kernels in conjunction with the Support Vector Machine and Voted Perceptron learning algorithms for…
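To make the learning side concrete, here is a minimal sketch of a kernelized voted perceptron. The paper's kernels over shallow parses are not reproduced: `bag_kernel`, the (chunk type, role) tuples, and the toy person-affiliation examples are hypothetical stand-ins, and any two-argument kernel function could be plugged in instead.

```python
from collections import Counter


def bag_kernel(a, b):
    """Hypothetical stand-in kernel: dot product of bag-of-(chunk type, role) counts."""
    ca, cb = Counter(a), Counter(b)
    return sum(n * cb[t] for t, n in ca.items())


def sign(s):
    return 1 if s > 0 else -1


def train_voted_perceptron(examples, kernel, epochs=5):
    """Kernelized voted perceptron.

    The k-th prediction vector v_k is the sum of y_j * phi(x_j) over the first k mistakes,
    so it is stored implicitly as the list of mistakes; counts[k] is how long v_k survived.
    """
    mistakes = []  # (x, y) for every training mistake, in order
    counts = [0]   # counts[0] belongs to the initial zero vector v_0
    for _ in range(epochs):
        for x, y in examples:
            score = sum(yj * kernel(xj, x) for xj, yj in mistakes)  # v_k . phi(x)
            if y * score > 0:
                counts[-1] += 1
            else:
                mistakes.append((x, y))
                counts.append(1)
    return mistakes, counts


def predict(model, kernel, x):
    """Weighted vote of every intermediate vector: sign(sum_k c_k * sign(v_k . phi(x)))."""
    mistakes, counts = model
    score = 0.0                     # v_0 . phi(x)
    vote = counts[0] * sign(score)
    for (xj, yj), c in zip(mistakes, counts[1:]):
        score += yj * kernel(xj, x)  # extend v_{k-1} . phi(x) to v_k . phi(x)
        vote += c * sign(score)
    return sign(vote)


train = [
    ([("PNP", "person"), ("VP", "verb"), ("PNP", "org")], +1),
    ([("PNP", "person"), ("VP", "verb"), ("PNP", "location")], -1),
    ([("PNP", "person"), ("PP", "prep"), ("PNP", "org")], +1),
    ([("PNP", "location"), ("PP", "prep"), ("PNP", "location")], -1),
]
model = train_voted_perceptron(train, bag_kernel)
```

Because the learner only touches examples through `kernel(xj, x)`, swapping in a structured kernel over shallow parse trees would leave the training and prediction loops unchanged.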
We describe one approach to building an automatically trainable anaphora resolution system. In this approach, we use Japanese newspaper articles tagged with discourse information as training examples for a machine learning algorithm that employs Quinlan's C4.5 decision tree algorithm (Quinlan, 1993). Then, we evaluate and compare the results of several…
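C4.5 grows a tree by repeatedly choosing the attribute with the best gain ratio (information gain divided by split information). The sketch below computes only that criterion for categorical features; it omits tree construction, pruning, continuous attributes, and missing-value handling. The feature names (`sem_match`, `particle`) and the labels are hypothetical and are not the features used in the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(rows, labels, feature):
    """C4.5-style split criterion: information gain divided by split information."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    gain = entropy(labels) - remainder
    split_info = entropy([row[feature] for row in rows])  # penalizes many-valued features
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical features for (anaphor, candidate antecedent) pairs.
rows = [
    {"sem_match": "yes", "particle": "wa"},
    {"sem_match": "yes", "particle": "ga"},
    {"sem_match": "no", "particle": "wa"},
    {"sem_match": "no", "particle": "ga"},
]
labels = ["coref", "coref", "not", "not"]
best = max(rows[0], key=lambda f: gain_ratio(rows, labels, f))
print(best)
```

Here `sem_match` separates the labels perfectly, so it would be chosen as the root test, while `particle` carries no information about the label.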
In this paper, we report on our use of zero morphemes in Unification-Based Combinatory Categorial Grammar. After illustrating the benefits of this approach with several examples, we describe the algorithm for compiling zero morphemes into unary rules, which allows us to use zero morphemes more efficiently in natural language processing. Then, we discuss…
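The idea of compiling a zero morpheme into a unary rule can be illustrated under strong simplifying assumptions: categories are plain CCG categories without unification features, and zero morphemes combine only by function application. Under those assumptions an empty-form entry X/Y (or X\Y) always turns an adjacent Y into an X over the same span, so it can be replaced by a unary rule Y → X. The data structures, function names, and the zero-determiner entry are hypothetical; this is not the paper's algorithm.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Functor:
    """A complex category: `result` seeking `argument` ("/" = to the right, "\\" = to the left)."""
    result: "Category"
    slash: str
    argument: "Category"


Category = Union[str, Functor]  # atomic categories are plain strings such as "NP"


@dataclass(frozen=True)
class UnaryRule:
    daughter: Category
    mother: Category


def compile_zero_morphemes(lexicon):
    """Turn each empty-form functor entry X/Y or X\\Y into the unary rule Y => X.

    Because the zero morpheme contributes no string, applying it to an adjacent Y
    yields an X over exactly the same span, which is what the unary rule encodes.
    """
    rules, remaining = [], []
    for form, cat in lexicon:
        if form == "" and isinstance(cat, Functor):
            rules.append(UnaryRule(daughter=cat.argument, mother=cat.result))
        else:
            remaining.append((form, cat))
    return rules, remaining


def apply_unary(rules, cat):
    """All categories that a constituent of category `cat` can be rewritten to."""
    return [r.mother for r in rules if r.daughter == cat]


lexicon = [
    ("dogs", "N"),
    ("", Functor("NP", "/", "N")),      # hypothetical zero determiner for bare plurals
    ("bark", Functor("S", "\\", "NP")),
]
rules, rest = compile_zero_morphemes(lexicon)
```

One plausible source of the efficiency gain is that a parser can try `apply_unary` on completed constituents instead of hypothesizing an empty lexical item at every position in the input.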
We describe a scalable summarization system that takes advantage of robust NLP technology, such as corpus-based statistical NLP techniques, information extraction, and readily available on-line resources. The system attempts to compensate for the bottlenecks of traditional frequency-based, knowledge-based, or discourse-based summarization approaches by utilizing…
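For contrast, the traditional frequency-based approach mentioned above can be sketched in a few lines: score each sentence by the average document frequency of its content words and extract the top k. This is a generic baseline under assumed regex tokenization and a toy stopword list, not the authors' system; the function names are hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "is", "was", "it"}


def content_words(sentence):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def frequency_summary(text, k=1):
    """Return the k sentences with the highest average word frequency, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


text = "The parser failed. The parser failed again on long input. Lunch was good."
print(frequency_summary(text, k=1))
```

Nothing in this scoring looks at meaning or discourse structure; it simply rewards sentences built from repeated words.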
This paper discusses automatic acquisition of predicate-argument mapping information from multilingual texts. The lexicon of our NLP system abstracts the language-dependent portion of predicate-argument mapping information from the core meaning of verb senses (i.e., semantic concepts as defined in the knowledge base). We represent this mapping information in…
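The separation between core meaning and language-dependent mapping can be pictured as two kinds of records: a concept with its semantic roles, and per-language verb entries that only state which grammatical slot fills which role. The sketch assumes this concrete layout, and every class, field, and function name in it is hypothetical; the acquisition of the mappings from multilingual text, which is the paper's actual topic, is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Language-independent core meaning of a verb sense, as defined in the knowledge base."""
    name: str
    roles: tuple


@dataclass(frozen=True)
class VerbEntry:
    """Language-dependent lexicon entry: which grammatical slot realizes which concept role."""
    lemma: str
    language: str
    concept: Concept
    mapping: tuple  # (grammatical function, role) pairs


def map_arguments(entry, parsed_args):
    """Convert {grammatical function: filler} into {concept role: filler}."""
    slot_to_role = dict(entry.mapping)
    roles = {}
    for function, filler in parsed_args.items():
        role = slot_to_role.get(function)
        if role is None or role not in entry.concept.roles:
            raise ValueError(f"{entry.lemma}: no role for {function!r}")
        roles[role] = filler
    return roles


GIVE = Concept("GIVE", ("agent", "theme", "recipient"))
give_en = VerbEntry("give", "en", GIVE,
                    (("subject", "agent"), ("object", "theme"), ("indirect_object", "recipient")))
# Japanese case particles: ga (nominative), wo (accusative), ni (dative)
ageru_ja = VerbEntry("ageru", "ja", GIVE, (("ga", "agent"), ("wo", "theme"), ("ni", "recipient")))

print(map_arguments(ageru_ja, {"ga": "Taro", "wo": "book", "ni": "Hanako"}))
```

Verbs in different languages that point at the same concept yield identical role assignments, so reasoning over the knowledge base never needs to see the language-specific slots.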