A Two-Phase Approach for Abstractive Podcast Summarization

  • Chujie Zheng, Kunpeng Zhang, H. Wang, Ling Fan
  • Published 2020
  • Computer Science
  • ArXiv
  • Podcast summarization differs from summarization of other data formats, such as news, patents, and scientific papers, in that podcasts are often longer, conversational, colloquial, and full of sponsorship and advertising information, which poses great challenges for existing models. In this paper, we focus on abstractive podcast summarization and propose a two-phase approach: sentence selection and seq2seq learning. Specifically, we first select important sentences from the noisy long…
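
The two-phase structure described in the abstract can be illustrated with a minimal sketch. The abstract is truncated before it says how sentences are selected, so everything specific here is an assumption made for illustration: the word-frequency score, the `AD_CUES` word list, and the function names `select_sentences` and `summarize` are hypothetical and are not the authors' method. The seq2seq model is passed in as a plain callable so that any trained abstractive summarizer could be substituted.

```python
import re
from collections import Counter
from typing import Callable, List

# Hypothetical cue words for sponsorship/advertising sentences (assumption,
# not taken from the paper).
AD_CUES = {"sponsor", "sponsored", "promo", "discount", "coupon"}


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def select_sentences(transcript: str, budget: int) -> List[str]:
    """Phase 1 (stand-in): keep the `budget` highest-scoring non-ad sentences.

    The score is the mean corpus frequency of a sentence's words, a simple
    placeholder for whatever importance measure a real system would use.
    """
    sentences = split_sentences(transcript)
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    freq = Counter(word for tokens in tokenized for word in tokens)

    # Drop empty sentences and sentences containing an ad cue word.
    candidates = [
        i for i, tokens in enumerate(tokenized)
        if tokens and not (AD_CUES & set(tokens))
    ]

    def score(i: int) -> float:
        tokens = tokenized[i]
        return sum(freq[word] for word in tokens) / len(tokens)

    top = sorted(candidates, key=score, reverse=True)[:budget]
    # Restore the original order so the condensed text stays coherent.
    return [sentences[i] for i in sorted(top)]


def summarize(transcript: str, seq2seq: Callable[[str], str], budget: int = 3) -> str:
    """Phase 2: feed the condensed transcript to a seq2seq summarizer."""
    condensed = " ".join(select_sentences(transcript, budget))
    return seq2seq(condensed)
```

Splitting the work this way keeps the input to the seq2seq model short, which matters because podcast transcripts are typically much longer than the inputs such models are trained to accept.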
