Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training

  • Yifan Gao, Qingyu Yin, Zheng Li, Rui Meng, Tong Zhao, Bing Yin, Irwin King, Michael R. Lyu
Keyphrase generation is the task of automatically predicting keyphrases given a piece of long text. Despite its recent flourishing, keyphrase generation for non-English languages has not been extensively investigated. In this paper, we call attention to a new setting named multilingual keyphrase generation and we contribute two new datasets, EcommerceMKP and AcademicMKP, covering six languages. Technically, we propose a retrieval-augmented method for multilingual keyphrase generation to mitigate the…

Pre-trained Language Models for Keyphrase Generation: A Thorough Empirical Study

An in-depth empirical study of how in-domain PLMs can be used to build strong and data-efficient keyphrase generation models. It investigates important design choices, including in-domain PLMs, PLMs with different pre-training objectives, using PLMs under a parameter budget, and different formulations for present keyphrases.

Fine-grained Multi-lingual Disentangled Autoencoder for Language-agnostic Representation Learning

  • Zetian Wu, Zhongkai Sun, Zhengyang Zhao, Sixing Lu, Chengyuan Ma, Chenlei Guo
  • Computer Science
  • 2022
A novel Fine-grained Multilingual Disentangled Autoencoder (FMDA) is proposed to disentangle fine-grained semantic information from language-specific information in a multi-lingual setting; it is capable of successfully extracting the disentangled template semantic and residual semantic representations.



Title-Guided Encoding for Keyphrase Generation

This work introduces a new model called Title-Guided Network (TG-Net) for the automatic keyphrase generation task, based on the encoder-decoder architecture with two new features: the title is additionally employed as a query-like input, and a title-guided encoder gathers the relevant information from the title to each word in the document.

Incorporating Linguistic Constraints into Keyphrase Generation

A parallel Seq2Seq network with coverage attention is proposed to alleviate the overlapping phrase problem. A coverage vector is introduced to keep track of the attention history and to decide whether parts of the source text have already been covered by previously generated keyphrases.
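
As a rough illustration of the coverage idea summarized above, the sketch below accumulates attention weights across decoding steps into a running coverage vector. The attention values, the helper name `update_coverage`, and the threshold of 1.0 are all hypothetical choices made for this example; they are not taken from the paper.

```python
def update_coverage(coverage, attention):
    """Add the current step's attention weights to the running coverage vector."""
    return [c + a for c, a in zip(coverage, attention)]


# Hypothetical attention distributions over a 4-token source, one per decoding step.
attention_steps = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.6],
]

coverage = [0.0] * 4
for attention in attention_steps:
    coverage = update_coverage(coverage, attention)

# Assumed rule for this sketch: a token counts as covered once its
# accumulated attention reaches 1.0.
covered = [c >= 1.0 for c in coverage]
```

In a trained model the coverage vector would typically be fed back into the attention computation rather than thresholded; the threshold here only makes the bookkeeping visible.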

An Integrated Approach for Keyphrase Generation via Exploring the Power of Retrieval and Extraction

A new multi-task learning framework that jointly learns an extractive model and a generative model for keyphrase generation is proposed, and a neural-based merging module is proposed to combine and re-rank the predicted keyphrases from the enhanced generative model, the extractive model, and the retrieved keyphrases.

Deep Keyphrase Generation

Empirical analysis on six datasets demonstrates that the proposed generative model for keyphrase prediction with an encoder-decoder framework not only achieves a significant performance boost on extracting keyphrases that appear in the source text, but can also generate absent keyphrases based on the semantic meaning of the text.

Select, Extract and Generate: Neural Keyphrase Generation with Layer-wise Coverage Attention

SEG-Net is proposed, a neural keyphrase generation model that is composed of a selector that selects the salient sentences in a document and an extractor-generator that jointly extracts and generates keyphrases from the selected sentences.

Semi-Supervised Learning for Neural Keyphrase Generation

  • Hai Ye, Lu Wang
  • Computer Science
  • 2018
Experimental results show that the semi-supervised learning-based methods outperform a state-of-the-art model trained with labeled data only.

Reinforced Keyphrase Generation with BERT-based Sentence Scorer

  • R. Liu, Zheng Lin, Peng Fu, Weiping Wang
  • Computer Science
  • 2020 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)
  • 2020
This work applies a BERT-based sentence scorer to estimate the importance of each sentence and introduces a fine-grained token-level reinforcement learning (RL) reward based on prefix matching to improve training and boost model performance.

Keyphrase Extraction with Span-based Feature Representations

A novel Span Keyphrase Extraction model is proposed that extracts span-based feature representations of keyphrases directly from all the content tokens and further learns to capture the interaction between keyphrases in one document to get better ranking results.

One2Set: Generating Diverse Keyphrases as a Set

This work proposes a new training paradigm, One2Set, and a novel model that utilizes a fixed set of learned control codes as conditions to generate a set of keyphrases in parallel. It also proposes a K-step label assignment mechanism via bipartite matching, which greatly increases the diversity and reduces the repetition rate of generated keyphrases.
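
To make the bipartite matching step more concrete, here is a minimal sketch of minimum-cost assignment between prediction slots and target keyphrases. It is a generic illustration, not the paper's K-step mechanism: the function name `assign_targets` and the cost matrix are hypothetical, and the brute-force search over permutations is only practical for very small inputs (a real system would more likely use the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations


def assign_targets(cost):
    """Return a tuple where entry i is the target index assigned to slot i.

    cost[i][j] is the (assumed) cost of pairing slot i with target j.
    The matrix must be square; every permutation is tried, so keep it small.
    """
    n = len(cost)
    return min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )


# Hypothetical costs, e.g. negative log-likelihood of each target under each slot.
cost = [
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.7],
    [0.6, 0.4, 0.3],
]

assignment = assign_targets(cost)
total_cost = sum(cost[i][j] for i, j in enumerate(assignment))
```

Each slot receives exactly one target and no target is used twice, which is the property that discourages several slots from producing the same keyphrase.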

Keyphrase Generation with Fine-Grained Evaluation-Guided Reinforcement Learning

A new fine-grained evaluation metric is proposed to improve the RL framework. It considers different granularities (token-level F1 score, edit distance, duplication, and prediction quantities) and can effectively ease the synonym problem and lead to higher-quality predictions.
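
For readers unfamiliar with token-level F1, the sketch below computes it for a single predicted keyphrase against a single gold keyphrase. This is the standard bag-of-tokens definition and only one ingredient of the metric described above; the function name `token_f1`, the lowercasing, and the whitespace tokenization are assumptions made for this example.

```python
from collections import Counter


def token_f1(predicted, gold):
    """Token-level F1 between two phrases, using whitespace tokens."""
    pred_tokens = predicted.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both phrases.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


score = token_f1("neural keyphrase generation", "keyphrase generation")
```

Unlike exact match, this score gives partial credit to a prediction that shares some tokens with the gold phrase, which is why it is a natural fit for a finer-grained reward.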