A review of keyphrase extraction

Authors: Eirini Papagiannopoulou and Grigorios Tsoumakas
Journal: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

Keyphrase extraction is a textual information processing task concerned with the automatic extraction of representative and characteristic phrases from a document that express all the key aspects of its content. Keyphrases constitute a succinct conceptual summary of a document, which is very useful in digital information management systems for semantic indexing, faceted search, document clustering and classification. This article introduces keyphrase extraction, provides a well‐structured… 

Multi-Document Keyphrase Extraction: A Literature Review and the First Dataset

The first literature review and the first dataset for the task, MK-DUC-01, are presented; the dataset can serve as a new benchmark, and several keyphrase extraction baselines are tested on it, with their results reported.

Multi-Document Keyphrase Extraction: Dataset, Baselines and Review

The first dataset for the task, MK-DUC-01, is presented; it can serve as a new benchmark, and multiple keyphrase extraction baselines are tested on the authors' data.

CorpusRank: Corpus Information in Unsupervised Keyphrase Extraction

This work extends their method by considering information from other documents in the corpus, and introduces a new keyphrase extraction algorithm, CorpusRank, which is based on document-phrase similarity drawn from the embedding space of BERT.

GLEAKE: Global and Local Embedding Automatic Keyphrase Extraction

GLEAKE utilizes single and multi-word embedding techniques to explore the syntactic and semantic aspects of the candidate phrases and then combines them into a series of embedding-based graphs to refine the most significant phrases as a final set of keyphrases.

Unsupervised Keyphrase Extraction via Interpretable Neural Networks

This work proposes INSPECT, a self-explaining neural framework that identifies influential keyphrases by measuring the predictive impact of input phrases on the downstream task of topic classification. It also suggests a new use of interpretable neural networks as an intrinsic component in NLP systems, and not only as a tool for explaining model predictions to humans.

Automatic Keyword Extraction From Text Documents

This chapter provides an overview of keyword indexing and elaborates on keyword extraction techniques, providing the general motivations behind the supervised and the unsupervised keyword extraction and enumerating several pioneering and state-of-the-art techniques.

Enhancing keyphrase extraction from academic articles with their reference information

With the development of Internet technology, the phenomenon of information overload is becoming more and more obvious. It takes a lot of time for users to obtain the information they need. However,

LDKP: A Dataset for Identifying Keyphrases from Long Scientific Documents

Identifying keyphrases (KPs) from text documents is a fundamental task in natural language processing and information retrieval. The vast majority of the benchmark datasets for this task are from the

Keyphrases Concentrated Area Identification from Academic Articles as Feature of Keyphrase Extraction: A New Unsupervised Approach

The experimental results show that the proposed unsupervised keyphrase concentrated area (KCA) identification approach effectively recognizes the KCA from articles as well as significantly enhances the current keyphrase extraction methods based on various text sizes, languages, and domains.



Automatic keyphrase extraction from scientific articles

The task is outlined, the overall ranking of the submitted systems is presented, and the improvements to the state-of-the-art in keyphrase extraction are discussed.

KEA: practical automatic keyphrase extraction

This paper uses a large test corpus to evaluate Kea’s effectiveness in terms of how many author-assigned keyphrases are correctly identified, and describes the system, which is simple, robust, and publicly available.
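
The evaluation idea mentioned above, counting how many author-assigned keyphrases an extractor recovers, can be sketched as follows. This is a minimal illustration, not Kea's actual evaluation code: the function name `exact_match_scores` and the lowercase-and-strip normalization are assumptions made here, and published evaluations often also stem the phrases before matching.

```python
def exact_match_scores(predicted, gold):
    """Compare extracted keyphrases with author-assigned ones by exact match.

    Normalization (lowercasing, stripping whitespace) is an assumption of
    this sketch; it is not taken from the summarized paper.
    """
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}

    # Number of author-assigned keyphrases that were correctly identified.
    correct = len(pred & ref)

    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return correct, precision, recall, f1
```

For example, calling `exact_match_scores(["Machine Learning", "text mining"], ["machine learning", "indexing"])` counts one correct keyphrase out of two predictions and two author-assigned phrases.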

Local word vectors guiding keyphrase extraction

Improving Keyphrase Extraction Using Wikipedia Semantics

This paper proposes a novel automatic keyphrase extraction algorithm using semantic features mined from online Wikipedia, which first identifies candidate keyphrases based on lexical methods and then constructs a semantic graph that connects candidate keyphrases with document topics.

Clustering to Find Exemplar Terms for Keyphrase Extraction

This work proposes an unsupervised method for keyphrase extraction that outperforms state-of-the-art graph-based ranking methods (TextRank) by 9.5% in F1-measure and guarantees that the document is semantically covered by the exemplar terms.

A ranking approach to keyphrase extraction

Experimental results on three datasets show that Ranking SVM significantly outperforms the baseline methods of SVM and Naive Bayes, indicating that it is better to exploit learning to rank techniques in keyphrase extraction.

Human-competitive tagging using automatic keyphrase extraction

This paper demonstrates how documents can be tagged automatically with a state-of-the-art keyphrase extraction algorithm, and improves performance in this new domain using a new algorithm, "Maui", that utilizes semantic information extracted from Wikipedia.

How Document Pre-processing affects Keyphrase Extraction Performance

This work re-assesses the performance of several keyphrase extraction models and measures their robustness against increasingly sophisticated levels of document preprocessing.

PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents

An unsupervised model for keyphrase extraction from scholarly documents that incorporates information from all positions of a word’s occurrences into a biased PageRank, which achieves remarkable improvements over PageRank models that do not take into account word positions.
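
The idea summarized above, a PageRank biased by the positions of every occurrence of a word, can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the authors' implementation: the function name `position_biased_pagerank`, the `window`, `damping`, and `iterations` parameters, and the inverse-position weighting are choices made for this sketch, and it scores single words only (no candidate filtering or phrase formation).

```python
from collections import defaultdict


def position_biased_pagerank(words, window=2, damping=0.85, iterations=50):
    """Score words with a PageRank whose restart vector favors words that
    occur early and often. `words` is a pre-tokenized, pre-filtered list."""
    # Undirected co-occurrence graph: the edge weight counts how often two
    # different words appear within `window` tokens of each other.
    edges = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                edges[w][words[j]] += 1.0
                edges[words[j]][w] += 1.0

    # Position bias (assumed form): sum the inverse of every 1-indexed
    # position at which a word occurs, then normalize to a distribution.
    bias = defaultdict(float)
    for pos, w in enumerate(words, start=1):
        bias[w] += 1.0 / pos
    total = sum(bias.values())
    bias = {w: b / total for w, b in bias.items()}

    # Power iteration with the position bias as the restart distribution.
    scores = {w: 1.0 / len(bias) for w in bias}
    for _ in range(iterations):
        updated = {}
        for w in bias:
            incoming = sum(
                scores[u] * edges[u][w] / sum(edges[u].values())
                for u in edges[w]
            )
            updated[w] = (1 - damping) * bias[w] + damping * incoming
        scores = updated
    return scores
```

Calling `position_biased_pagerank(tokens)` on a tokenized document returns a word-to-score mapping; sorting it by score gives a ranking in which a word that appears first and again later tends to outrank words that appear only late in the text.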

Keyphrase Annotation with Graph Co-Ranking

This paper proposes a new method to perform both keyphrase extraction and keyphrase assignment in an integrated and mutually reinforcing manner, and shows statistically significant improvements compared to state-of-the-art methods for both keyphrase extraction and keyphrase assignment.