Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training
@article{Gao2022RetrievalAugmentedMK,
title={Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training},
author={Yifan Gao and Qingyu Yin and Zheng Li and Rui Meng and Tong Zhao and Bing Yin and Irwin King and Michael R. Lyu},
journal={ArXiv},
year={2022},
volume={abs/2205.10471}
}

Keyphrase generation is the task of automatically predicting keyphrases given a piece of long text. Despite its recent flourishing, keyphrase generation in non-English languages has not been widely investigated. In this paper, we call attention to a new setting named multilingual keyphrase generation and we contribute two new datasets, EcommerceMKP and AcademicMKP, covering six languages. Technically, we propose a retrieval-augmented method for multilingual keyphrase generation to mitigate the…
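The retrieval-augmented idea sketched in the abstract — retrieve keyphrases from similar documents and condition the generator on them alongside the source text — can be illustrated with a minimal toy pipeline. This is an illustrative sketch, not the paper's actual method: the token-overlap retriever, the in-memory example store, and the `[SEP]` concatenation are all assumptions standing in for a trained dense retriever and a sequence-to-sequence generator.

```python
from collections import Counter

def tokenize(text):
    # Lowercase whitespace tokenization; a real multilingual system
    # would use a language-appropriate subword tokenizer.
    return text.lower().split()

def overlap_score(query_tokens, doc_tokens):
    # Bag-of-words token overlap as a crude stand-in for a learned
    # dense retriever's similarity score.
    q, d = Counter(query_tokens), Counter(doc_tokens)
    return sum((q & d).values())

def retrieve_keyphrases(query, store, top_k=1):
    # Rank stored (document, keyphrases) pairs by overlap with the
    # query and collect the keyphrases of the top-k neighbours.
    q_tokens = tokenize(query)
    ranked = sorted(
        store,
        key=lambda item: overlap_score(q_tokens, tokenize(item[0])),
        reverse=True,
    )
    phrases = []
    for _doc, kps in ranked[:top_k]:
        phrases.extend(kps)
    return phrases

def build_generator_input(query, retrieved):
    # Concatenate source text with retrieved keyphrases — one common
    # way retrieval-augmented generators consume external evidence.
    return query + " [SEP] " + " ; ".join(retrieved)

# Hypothetical labelled store, e.g. product descriptions with keyphrases.
store = [
    ("wireless bluetooth headphones with noise cancelling",
     ["bluetooth headphones", "noise cancelling"]),
    ("stainless steel kitchen knife set",
     ["kitchen knife", "knife set"]),
]

query = "over-ear bluetooth headphones with long battery life"
augmented = build_generator_input(query, retrieve_keyphrases(query, store))
print(augmented)
# → over-ear bluetooth headphones with long battery life [SEP] bluetooth headphones ; noise cancelling
```

In the paper's actual setup, retriever and generator are trained iteratively so that retrieval quality and generation quality improve each other; the sketch above only shows the inference-time data flow.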
4 Citations
KPEval: Towards Fine-grained Semantic-based Evaluation of Keyphrase Extraction and Generation Systems
- 2023
Computer Science
ArXiv
This work proposes a comprehensive evaluation framework consisting of six critical dimensions: naturalness, faithfulness, saliency, coverage, diversity, and utility, and finds that the best model differs across dimensions, with pre-trained language models achieving the best results in most of them.
General-to-Specific Transfer Labeling for Domain Adaptable Keyphrase Generation
- 2022
Computer Science
ArXiv
A three-stage pipeline is proposed, which gradually guides KPG models' learning focus from general syntactical features to domain-related semantics, in a data-efficient manner, and can produce good-quality keyphrases in new domains and achieve consistent improvements after adaptation with limited in-domain annotated data.
Pre-trained Language Models for Keyphrase Generation: A Thorough Empirical Study
- 2022
Computer Science
ArXiv
An in-depth empirical study of how in-domain BERT-like PLMs compare with general-purpose pre-trained language models, and of how different design choices affect the performance of PLM-based models, shows that PLMs deliver competitive high-resource performance and state-of-the-art low-resource performance.
Fine-grained Multi-lingual Disentangled Autoencoder for Language-agnostic Representation Learning
- 2022
Computer Science
MMNLU
A novel Fine-grained Multilingual Disentangled Autoencoder (FMDA) is proposed to disentangle fine-grained semantic information from language-specific information in a multilingual setting, and it successfully extracts the disentangled template-semantic and residual-semantic representations.
60 References
Title-Guided Encoding for Keyphrase Generation
- 2019
Computer Science
AAAI
This work introduces a new model called Title-Guided Network (TG-Net) for the automatic keyphrase generation task, based on the encoder-decoder architecture with two new features: the title is additionally employed as a query-like input, and a title-guided encoder gathers the relevant information from the title for each word in the document.
An Integrated Approach for Keyphrase Generation via Exploring the Power of Retrieval and Extraction
- 2019
Computer Science
NAACL
A new multi-task learning framework is proposed that jointly learns an extractive model and a generative model for keyphrase generation, along with a neural merging module that combines and re-ranks the keyphrases predicted by the enhanced generative model, the extractive model, and retrieval.
Incorporating Linguistic Constraints into Keyphrase Generation
- 2019
Computer Science
ACL
A parallel Seq2Seq network with coverage attention is proposed to alleviate the overlapping-phrase problem; the coverage vector keeps track of the attention history and decides whether parts of the source text have already been covered by previously generated keyphrases.
Deep Keyphrase Generation
- 2017
Computer Science
ACL
Empirical analysis on six datasets demonstrates that the proposed generative model for keyphrase prediction with an encoder-decoder framework not only achieves a significant performance boost on extracting keyphrases that appear in the source text, but also can generate absent keyphrases based on the semantic meaning of the text.
Select, Extract and Generate: Neural Keyphrase Generation with Layer-wise Coverage Attention
- 2021
Computer Science
ACL
SEG-Net is proposed, a neural keyphrase generation model that is composed of a selector that selects the salient sentences in a document and an extractor-generator that jointly extracts and generates keyphrases from the selected sentences.
Structure-Augmented Keyphrase Generation
- 2021
Computer Science
EMNLP
The empirical results validate that the proposed structure augmentation and augmentation-aware encoding/decoding can improve keyphrase generation in both scenarios, outperforming the state-of-the-art.
Semi-Supervised Learning for Neural Keyphrase Generation
- 2018
Computer Science
EMNLP
Experimental results show that the semi-supervised learning-based methods outperform a state-of-the-art model trained with labeled data only.
Reinforced Keyphrase Generation with BERT-based Sentence Scorer
- 2020
Computer Science
2020 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)
This work applies a BERT-based sentence scorer to estimate the importance of each sentence and introduces a fine-grained token-level reinforcement learning (RL) reward based on prefix matching to enhance the training procedure and boost the model's performance.
Keyphrase Extraction with Span-based Feature Representations
- 2020
Computer Science
ArXiv
A novel Span Keyphrase Extraction model is proposed that extracts span-based feature representations of keyphrases directly from all the content tokens and further learns to capture the interaction between keyphrases in one document to achieve better ranking results.
Automatic Keyphrase Extraction by Bridging Vocabulary Gap
- 2011
Computer Science
CoNLL
The method considers that a document and its keyphrases describe the same object but are written in two different languages, and it outperforms existing unsupervised methods on precision, recall, and F-measure.