Automated summarization methods can be defined as "language-independent" if they are not based on any language-specific knowledge. Such methods can be used for multilingual summarization, defined by Mani (2001) as "processing several languages, with summary in the same language as input." In this paper, we introduce MUSE, a language-independent …
In this paper, we address the problem of analyzing and classifying web documents into several major categories (classes) in a given domain using a domain ontology. We present an ontology-based web content mining methodology whose main stages include collecting a training set of labeled documents from a given domain, building a classification model …
The increasing trend of cross-border globalization and acculturation requires text summarization techniques to work equally well for multiple languages. However, only some of the automated summarization methods can be defined as “language-independent,” i.e., not based on any language-specific knowledge. Such methods can be used for multilingual …
Automatic headline generation is a sub-task of document summarization with many reported applications. In this study, we present a sequence-prediction technique for learning how editors title their news stories. The technique models the problem as a discrete optimization task in a feature-rich space. In this space, the global optimum can be found …
Many news sites today let the internet audience read the most recent news and see what other people think about it. Most sites do not organize comments well and do not filter irrelevant content. Because of this limitation, readers who want to know other people's opinions on a specific topic have to manually follow relevant comments, …
In this paper, we introduce DegExt, a graph-based language-independent keyphrase extractor, which extends the keyword extraction method described in [6]. We compare DegExt with two state-of-the-art approaches to keyphrase extraction: GenEx [11] and TextRank [8]. Our experiments on a collection of benchmark summaries show that DegExt outperforms TextRank and …
In this paper, we address the problem of analyzing and classifying web documents in a given domain by information filtering agents. We present an ontology-based web content mining methodology whose main stages include creating an ontology for the specified domain, collecting a training set of labeled documents, building a classification model in …
The MUSEEC (MUltilingual SEntence Extraction and Compression) summarization tool implements several extractive summarization techniques – at the level of complete and compressed sentences – that can be applied, with some minor adaptations, to documents in multiple languages. The current version of MUSEEC provides the following summarization methods: (1) …