Learn More
A new approach for learning Bayesian belief networks from raw data is presented The approach is based on Rissanen s Minimal Description Length MDL principle which is particularly well suited for this task Our approach does not require any prior assumptions about the distribution being learned In particular our method can learn unrestricted multiply(More)
This paper describes the functionality of MEAD, a comprehensive, public domain, open source, multidocument multilingual summarization environment that has been thus far downloaded by more than 500 organizations. MEAD has been used in a variety of summarization applications ranging from summarization for mobile devices to Web page summarization within a(More)
We present a large-scale meta evaluation of eight evaluation measures for both single-document and multi-document summarizers. To this end we built a corpus consisting of (a) 100 Million automatic summaries using six summarizers and baselines at ten summary lengths in both English and Chinese, (b) more than 10,000 manual abstracts and extracts, and (c) 200(More)
ÐThe nearest-neighbor algorithm and its derivatives have been shown to perform well for pattern classification. Despite their high classification accuracy, they suffer from high storage requirement, computational cost, and sensitivity to noise. We develop a new framework, called ICPL (Integrated Concept Prototype Learner), which integrates(More)
One common predictive modeling challenge occurs in text mining problems is that the training data and the operational (testing) data are drawn from different underlying distributions. This poses a great difficulty for many statistical learning methods. However, when the distribution in the source domain and the target domain are not identical but related,(More)
ÐWe develop an automatic text categorization approach and investigate its application to text retrieval. The categorization approach is derived from a combination of a learning paradigm known as instance-based learning and an advanced document retrieval technique known as retrieval feedback. We demonstrate the effectiveness of our categorization approach(More)
This article introduces a named entity matching model that makes use of both semantic and phonetic evidence. The matching of semantic and phonetic information is captured by a unified framework via a bipartite graph model. By considering various technical challenges of the problem, including order insensitivity and partial matching, this approach is less(More)