Automatic headline generation is a sub-task of document summarization with many reported applications. In this study, we present a sequence-prediction technique for learning how editors title their news stories. The introduced technique models the problem as a discrete optimization task in a feature-rich space. In this space the global optimum can be found…
Automated summarization methods can be defined as "language-independent," if they are not based on any language-specific knowledge. Such methods can be used for multilingual summarization defined by Mani (2001) as "processing several languages, with summary in the same language as input." In this paper, we introduce MUSE, a language-independent…
In this paper, we introduce DegExt, a graph-based language-independent keyphrase extractor, which extends the keyword extraction method described in [6]. We compare DegExt with two state-of-the-art approaches to keyphrase extraction: GenEx [11] and TextRank [8]. Our experiments on a collection of benchmark summaries show that DegExt outperforms TextRank and…
In this paper, we deal with the problem of analyzing and classifying web documents into several major categories/classes in a given domain using a domain ontology. We present an ontology-based web content mining methodology whose main stages include collecting a training set of labeled documents from a given domain, building a classification model…
The increasing trend of cross-border globalization and acculturation requires text summarization techniques to work equally well for multiple languages. However, only some of the automated summarization methods can be defined as “language-independent,” i.e., not based on any language-specific knowledge. Such methods can be used for multilingual…
The MUSEEC (MUltilingual SEntence Extraction and Compression) summarization tool implements several extractive summarization techniques – at the level of complete and compressed sentences – that can be applied, with some minor adaptations, to documents in multiple languages. The current version of MUSEEC provides the following summarization methods: (1)…
The trend toward the growing multi-linguality of the Internet requires text summarization techniques that work equally well in multiple languages. Only some of the automated summarization methods proposed in the literature, however, can be defined as "language-independent," as they are not based on any morphological analysis of the summarized text. In…