Juan-Manuel Torres-Moreno

Learn More
This paper presents two corpora produced within the RPM2 project: a multi-document summarization corpus and a sentence compression corpus. Both corpora are in French. The first one is the only one we know in this language. It contains 20 topics with 20 documents each. A first set of 10 documents per topic is summarized and then the second set is used to(More)
Resumen: Hoy en día el análisis discursivo automático es un tema de investigación relevante. Sin embargo, no existen analizadores del discurso para textos en español. El primer paso para desarrollar esta herramienta es la segmentación discursiva. En este artículo presentamos DiSeg, el primer segmentador discursivo para el español que utiliza el marco de la(More)
We present SMMR, a scalable sentence scoring method for query-oriented update summarization. Sentences are scored thanks to a criterion combining query relevance and dissimilarity with already read documents (history). As the amount of data in history increases, non-redundancy is prioritized over query-relevance. We show that SMMR achieves promising results(More)
We study correlation of rankings of text summarization systems using evaluation methods with and without human models. We apply our comparison framework to various well-established content-based evaluation measures in text sum-marization such as coverage, Responsiveness , Pyramids and ROUGE studying their associations in various text summarization tasks(More)
Since information in electronic form is already a standard, and that the variety and the quantity of information become increasingly large, the methods of summarizing or automatic condensation of texts is a critical phase of the analysis of texts. This article describes Cortex a system based on numerical methods, which allows obtaining a condensation of a(More)
In this article we present the RST Spanish Treebank, the first corpus annotated with rhetorical relations for this language. We describe the characteristics of the corpus, the annotation criteria, the annotation procedure, the inter-annotator agreement, and other related aspects. Moreover, we show the interface that we have developed to carry out searches(More)
Availability of labeled language resources, such as annotated corpora and domain dependent labeled language resources is crucial for experiments in the field of Natural Language Processing. Most often, due to lack of resources, manual verification and annotation of electronic text material is a prerequisite for the development of NLP tools. In the context(More)
The evolution of the job market has resulted in traditional methods of recruitment becoming insucient. As it is now necessary to handle volumes of information (mostly in the form of free text) that are impossible to process manually, an analysis and assisted categorization are essential to address this issue. In this paper, we present a combination of the(More)