Learn More
A new approach for learning Bayesian belief networks from raw data is presented. The approach is based on Rissanen's Minimal Description Length (MDL) principle, which is particularly well suited for this task. Our approach does not require any prior assumptions about the distribution being learned. In particular, our method can learn unrestricted(More)
This paper describes the functionality of MEAD, a comprehensive, public domain, open source, multidocument multilingual summariza-tion environment that has been thus far downloaded by more than 500 organizations. MEAD has been used in a variety of summarization applications ranging from summarization for mobile devices to Web page summarization within a(More)
One common predictive modeling challenge occurs in text mining problems is that the training data and the operational (testing) data are drawn from different underlying distributions. This poses a great difficulty for many statistical learning methods. However, when the distribution in the source domain and the target domain are not identical but related,(More)
We explore the issue of refining an exis­ tent Bayesian network structure using new data which might mention only a subset of the variables. Most previous works have only considered the refinement of the net­ work's conditional probability parameters, and have not addressed the issue of refi n­ ing the network's structure. We develop a new approach for(More)
We present a large-scale meta evaluation of eight evaluation measures for both single-document and multi-document summarizers. To this end we built a corpus consisting of (a) 100 Million automatic summaries using six summarizers and baselines at ten summary lengths in both English and Chinese, (b) more than 10,000 manual abstracts and extracts, and (c) 200(More)
In previous work we developed a method of learning Bayesian Network models from raw data. This method relies on the well known minimal description length (MDL) principle. The MDL principle is particularly well suited to this task as it allows us to tradeoo, in a principled way, the accuracy of the learned network against its practical usefulness. In this(More)
As a microblogging service, Twitter is playing a more and more important role in our life. Users follow various accounts , such as friends or celebrities, to get the most recent information. However, as one follows more and more people, he/she may be overwhelmed by the huge amount of status updates. Twitter messages are only displayed by time recency, which(More)
We propose an abstraction-based multi-document summarization framework that can construct new sentences by exploring more fine-grained syntactic units than sentences , namely, noun/verb phrases. Different from existing abstraction-based approaches , our method first constructs a pool of concepts and facts represented by phrases from the input documents.(More)