Corpus ID: 233289404

Back to the Basics: A Quantitative Analysis of Statistical and Graph-Based Term Weighting Schemes for Keyword Extraction

  title={Back to the Basics: A Quantitative Analysis of Statistical and Graph-Based Term Weighting Schemes for Keyword Extraction},
  author={Asahi Ushio and Federico Liberatore and Jos{\'e} Camacho-Collados},
Term weighting schemes are widely used in Natural Language Processing and Information Retrieval. In particular, term weighting is the basis for keyword extraction. However, there are relatively few evaluation studies that shed light on the strengths and shortcomings of each weighting scheme. In fact, in most cases researchers and practitioners resort to the well-known tf-idf as default, despite the existence of other suitable alternatives, including graph-based models. In this paper, we…
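As a reference point for the tf-idf default mentioned in the abstract, the following is a minimal Python sketch of tf-idf keyword ranking for one document in a collection. It assumes raw term counts for tf and the unsmoothed idf = log(N / df); `tokenize` and `tfidf_keywords` are hypothetical helper names, and this is not the implementation evaluated in the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens, no stopword removal or stemming.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the words of docs[doc_index] by tf-idf against the whole collection."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each word at least once.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Term frequency: raw count of each word in the target document.
    tf = Counter(tokenized[doc_index])
    scores = {w: count * math.log(n_docs / df[w]) for w, count in tf.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_k]]
```

For example, with `docs = ["the graph ranks the words", "the cat sat on the mat", "the graph of the web"]`, `tfidf_keywords(docs, 0)` returns `["ranks", "words", "graph"]`; "the" occurs in every document, so its idf, and therefore its score, is zero.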


YAKE! Keyword extraction from single documents using multiple local features
This paper describes YAKE!, a lightweight unsupervised automatic keyword extraction method that relies on statistical text features extracted from single documents to select the most relevant keywords of a text.
TopicRank: Graph-Based Topic Ranking for Keyphrase Extraction
This paper presents TopicRank, a graph-based keyphrase extraction method that relies on a topical representation of the document and significantly outperforms state-of-the-art methods on three datasets.
TextRank: Bringing Order into Text
TextRank, a graph-based ranking model for text processing, is introduced and it is shown how this model can be successfully used in natural language applications.
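The core idea of TextRank applied to keywords can be sketched in a few lines of Python: treat words as vertices, link words that co-occur within a small window, and iterate the PageRank-style update S(v) = (1 - d) + d * sum over neighbours u of S(u) / deg(u). This is a simplified sketch under stated assumptions: the caller supplies already-filtered tokens (the original method restricts candidates, for example by part of speech), edges are unweighted, a fixed iteration count stands in for a convergence test, and `textrank` is a hypothetical name.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Rank words by PageRank over an undirected co-occurrence graph (sketch)."""
    # Link distinct words whose positions differ by less than `window`
    # (window=2 links adjacent tokens only).
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                neighbors[w].add(tokens[j])
                neighbors[tokens[j]].add(w)
    nodes = sorted(neighbors)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Each word passes its current score equally to all of its neighbours.
        score = {
            n: (1 - damping)
            + damping * sum(score[m] / len(neighbors[m]) for m in neighbors[n])
            for n in nodes
        }
    # Highest score first; ties broken alphabetically.
    return sorted(nodes, key=lambda n: (-score[n], n))
```

With `tokens = ["graph", "ranking", "model", "graph", "text"]`, "graph" is linked to three other words and ranks first, while "text", which has a single neighbour, ranks last.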
The PageRank Citation Ranking : Bringing Order to the Web
This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.
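PageRank is typically computed by power iteration, which is what makes it tractable for large numbers of pages. The minimal Python sketch below uses the common textbook formulation with uniform teleportation (damping factor 0.85) and spreads the rank of pages without out-links evenly over all pages; these choices and the name `pagerank` are assumptions, not details taken from the report.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank; `links` maps each page to the pages it links to."""
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Assumption: rank held by pages with no out-links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for q in targets:
                # A page splits its damped rank equally among its out-links.
                new[q] += damping * rank[p] / len(targets)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank
```

For `pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]})`, page C, which receives links from A, B, and D, gets the highest rank, and D, which has no in-links, keeps only the teleport share (1 - 0.85) / 4 = 0.0375. The ranks always sum to 1.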
Joint Keyphrase Chunking and Salience Ranking with BERT
BERT-JointKPE is presented, a multi-task BERT-based model for keyphrase extraction that employs a chunking network to identify high-quality phrases and a ranking network to learn their salience in the document.
Open Domain Web Keyphrase Extraction Beyond Language Modeling
Experimental results on OpenKP confirm the effectiveness of BLING-KPE and the contributions of its neural architecture, visual features, and search-log weak supervision. Zero-shot evaluations on DUC-2001 demonstrate that learning from open-domain data generalizes better than learning from a specific domain.
PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents
PositionRank is an unsupervised model for keyphrase extraction from scholarly documents that incorporates information from all positions of a word's occurrences into a biased PageRank, achieving remarkable improvements over PageRank models that do not take word positions into account.
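The position bias can be sketched as a personalized PageRank over a weighted word co-occurrence graph, where each word's teleport probability is proportional to the sum of the inverse positions of its occurrences, so words that appear early and often are favoured. This Python sketch is a simplification under stated assumptions: candidate filtering, phrase scoring, and the paper's exact window size are omitted or assumed, and `position_rank` is a hypothetical name.

```python
from collections import defaultdict


def position_rank(tokens, window=2, damping=0.85, iterations=100):
    """Biased PageRank over a co-occurrence graph with a position prior (sketch)."""
    # Assumption: tokens are pre-filtered candidate words; phrases are not built.
    # Edge weight = number of times two distinct words co-occur within the window.
    weight = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            u = tokens[j]
            if u != w:
                weight[w][u] += 1
                weight[u][w] += 1
    # Position prior: sum of 1/position over every occurrence (1-based positions).
    bias = defaultdict(float)
    for pos, w in enumerate(tokens, start=1):
        bias[w] += 1.0 / pos
    nodes = sorted(weight)
    total = sum(bias[n] for n in nodes)
    prior = {n: bias[n] / total for n in nodes}
    out = {n: sum(weight[n].values()) for n in nodes}
    score = dict(prior)
    for _ in range(iterations):
        # Teleport according to the position prior instead of uniformly.
        score = {
            n: (1 - damping) * prior[n]
            + damping * sum(weight[m][n] / out[m] * score[m] for m in weight[n])
            for n in nodes
        }
    # Highest score first; ties broken alphabetically.
    return sorted(nodes, key=lambda n: (-score[n], n))
```

On the path `["alpha", "beta", "gamma"]`, "alpha" and "gamma" each have exactly one neighbour, yet "alpha" ranks above "gamma" because it occurs earlier; the result is `["beta", "alpha", "gamma"]`.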
Keyword extraction from emails
A new dataset for keyword extraction from emails is introduced, and supervised and unsupervised methods for keyword extraction from emails are evaluated.
Topical Word Importance for Fast Keyphrase Extraction
This work proposes an improvement to a state-of-the-art keyphrase extraction algorithm, Topical PageRank (TPR), which incorporates topical information from topic models. The improvement increases speed drastically and makes the algorithm usable on large text collections with large topic models, without altering the performance of the original algorithm.
An improved term weighting scheme for text classification
An improved term weighting scheme called term frequency-inverse exponential frequency (TF-IEF) and several of its variants are proposed. Experimental results show that the proposed term weighting schemes perform better than the compared schemes.