• Corpus ID: 16471324

Document Summarization Based on Data Reconstruction

@inproceedings{He2012DocumentSB,
  title={Document Summarization Based on Data Reconstruction},
  author={Zhanying He and Chun Chen and Jiajun Bu and C. Wang and Lijun Zhang and Deng Cai and Xiaofei He},
  booktitle={AAAI},
  year={2012}
}
Document summarization is of great value to many real-world applications, such as snippet generation for search results and news headline generation. Traditionally, document summarization is implemented by extracting sentences that cover the main topics of a document with minimum redundancy. In this paper, we take a different perspective, that of data reconstruction, and propose a novel framework named Document Summarization based on Data Reconstruction (DSDR). Specifically, our approach…

Unsupervised document summarization from data reconstruction perspective
TopicDSDR: Combining Topic Decomposition and Data Reconstruction for Summarization
TLDR
A novel model that combines data reconstruction and topic decomposition to summarize the documents, named TopicDSDR, is proposed, which can not only best reconstruct the original documents but also capture the semantic similarity and main topics.
Multi-Document Summarization Based on Two-Level Sparse Representation Model
TLDR
This paper tackles the problem of extracting summary sentences from multi-document sets by applying sparse coding techniques and presents a novel framework for this challenging problem, based on the data reconstruction and sentence denoising assumption.
An Unsupervised Multi-Document Summarization Framework Based on Neural Document Model
TLDR
A document-level reconstruction framework named DocRebuild is proposed, which reconstructs the documents with summary sentences through a neural document model and selects summary sentences to minimize the reconstruction error.
Multi-Document Extractive Summarization Using Window-Based Sentence Representation
TLDR
This paper proposes a new technique, namely window-based sentence representation (WSR), to obtain the features of sentences using pre-trained word vectors, developed based on the Extreme Learning Machine (ELM).
Recent advances in document summarization
TLDR
Significant contributions made in recent years are emphasized, including progress on modern sentence extraction approaches that improve concept coverage, information diversity and content coherence, as well as attempts from summarization frameworks that integrate sentence compression, and more abstractive systems that are able to produce completely new sentences.
Automatic Document Summarization via Deep Neural Networks
TLDR
This paper proposes a new framework for document summarization via Deep Neural Networks (DNNs), feeding the sentences as input to the visible layer of the DNNs and designing a sentence extraction algorithm to construct the summary.
Document summarization using dictionary learning
  • Remya R. K. Menon, N. Aswathy
  • Computer Science
    2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI)
  • 2017
TLDR
This paper focuses on document summarization using dictionary learning and sparse coding techniques, considering the ordering of sentences and redundancy of documents, and uses Singular Value Decomposition (SVD) and Orthogonal Matching Pursuit (OMP) for sparse coding.
Automatically Determining a Proper Length for Multi-Document Summarization: A Bayesian Nonparametric Approach
TLDR
A Bayesian nonparametric model for multi-document summarization is proposed in order to automatically determine the proper lengths of summaries; the original document is "reconstructed" by a Bayesian framework that selects sentences to form a good summary.

References

SHOWING 1-10 OF 37 REFERENCES
Document Summarization Using Conditional Random Fields
TLDR
A Conditional Random Fields (CRF) based framework is presented to keep the merits of the above two kinds of approaches while avoiding their disadvantages and can take the outcomes of previous methods as features and seamlessly integrate them.
A compositional context sensitive multi-document summarizer: exploring the factors that influence summarization
TLDR
The research shows that a frequency based summarizer can achieve performance comparable to that of state-of-the-art systems, but only with a good composition function; context sensitivity improves performance and significantly reduces repetition.
A Hybrid Hierarchical Model for Multi-Document Summarization
TLDR
This paper formulates extractive summarization as a two-step learning problem, building a generative model for pattern discovery and a regression model for inference based on the lexical and structural characteristics of the sentences.
Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization
TLDR
A new multi-document summarization framework based on sentence-level semantic analysis and symmetric non-negative matrix factorization is proposed, which aims to create a compressed summary while retaining the main characteristics of the original set of documents.
A Class of Submodular Functions for Document Summarization
TLDR
A class of submodular functions meant for document summarization tasks which combine two terms, one which encourages the summary to be representative of the corpus, and the other which positively rewards diversity, which means that an efficient scalable greedy optimization scheme has a constant factor guarantee of optimality.
Exploring Content Models for Multi-Document Summarization
TLDR
The final model, HierSum, utilizes a hierarchical LDA-style model (Blei et al., 2004) to represent content specificity as a hierarchy of topic vocabulary distributions, yields state-of-the-art ROUGE performance, and, in pairwise user evaluation, strongly outperforms the state-of-the-art discriminative system of Toutanova et al. (2007).
The PYTHY Summarization System: Microsoft Research at DUC 2007
PYTHY is a trainable extractive summarization engine that learns a log-linear sentence ranking model by maximizing three metrics of sentence goodness: two of the metrics are based on ROUGE scores
Generic text summarization using relevance measure and latent semantic analysis
TLDR
This paper proposes two generic text summarization methods that create text summaries by ranking and extracting sentences from the original documents, and uses the latent semantic analysis technique to identify semantically important sentences, for summary creations.
Comments-oriented document summarization: understanding documents with readers' feedback
TLDR
The proposed summarization methods utilizing comments showed significant improvement over those not using comments, and the methods using the feature-biased sentence extraction approach were observed to outperform those using the uniform-document approach.
Multi-document Summarization Based on Cluster Using Non-negative Matrix Factorization
TLDR
A new summarization method, which uses non-negative matrix factorization (NMF) and K-means clustering, is introduced to extract meaningful sentences from multiple documents and has better performance than other methods using LSA, K-means, and NMF.