Corpus ID: 14116376

Corpus-independent Generic Keyphrase Extraction Using Word Embedding Vectors

@inproceedings{Wang2015CorpusindependentGK,
  title={Corpus-independent Generic Keyphrase Extraction Using Word Embedding Vectors},
  author={Rui Wang},
  year={2015}
}
Keyphrase extraction from a given document is a difficult task that requires not only local statistical information but also extensive background knowledge. In this paper, we propose a graph-based ranking approach that uses information supplied by word embedding vectors as the background knowledge. We first introduce a weighting scheme that computes informativeness and phraseness scores of words using the information supplied by both word embedding vectors and local statistics. Keyphrase…
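As a rough illustration of the general idea in the abstract (a word graph whose edges combine local statistics with embedding-based background knowledge, ranked by a random walk), the following minimal Python sketch may help. It is not the paper's weighting scheme, which the truncated abstract does not spell out. The edge weight used here (co-occurrence count times cosine similarity), the function names `build_graph` and `weighted_pagerank`, the window size, and the toy vectors are all assumptions made for this example.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(tokens, vectors, window=3):
    """Undirected word graph.

    Assumed edge weight: co-occurrence count within `window` tokens
    multiplied by the (non-negative) embedding similarity of the two words.
    """
    cooc = defaultdict(int)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word and word in vectors and other in vectors:
                cooc[tuple(sorted((word, other)))] += 1

    graph = defaultdict(dict)
    for (a, b), count in cooc.items():
        weight = count * max(cosine(vectors[a], vectors[b]), 0.0)
        if weight > 0:
            graph[a][b] = weight
            graph[b][a] = weight
    return graph


def weighted_pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a weighted undirected graph."""
    nodes = list(graph)
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    total_weight = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbour passes on its score in proportion to edge weight.
            incoming = sum(
                score[nbr] * w / total_weight[nbr]
                for nbr, w in graph[node].items()
            )
            updated[node] = (1 - damping) / n + damping * incoming
        score = updated
    return score


# Toy input: hypothetical 2-d "embeddings", not real pretrained vectors.
tokens = ["keyphrase", "extraction", "graph", "ranking", "keyphrase", "graph"]
vectors = {
    "keyphrase": [0.9, 0.1],
    "extraction": [0.8, 0.3],
    "graph": [0.2, 0.9],
    "ranking": [0.3, 0.8],
}
scores = weighted_pagerank(build_graph(tokens, vectors))
ranked = sorted(scores, key=scores.get, reverse=True)
```

In a real setting the toy vectors would be replaced by pretrained word embeddings, and the ranked words would still need to be assembled into candidate phrases, a step this sketch leaves out.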

Simple Unsupervised Keyphrase Extraction using Sentence Embeddings
TLDR
This paper tackles keyphrase extraction from single documents with EmbedRank, a novel unsupervised method that leverages sentence embeddings, achieves higher F-scores than graph-based state-of-the-art systems on standard datasets, and is suitable for real-time processing of large amounts of Web data.
Local word vectors guiding keyphrase extraction
TLDR
This work presents a novel unsupervised method for keyphrase extraction, whose main innovation is the use of local word embeddings (in particular GloVe vectors), i.e., embeddings trained on the single document under consideration.
Keyphrase Extraction by Integrating Multidimensional Information
Traditional supervised keyphrase extraction models depend on the features of labelled keyphrases, while prevailing unsupervised models mainly rely on the structure of the word graph, with candidate words…
EmbedRank: Unsupervised Keyphrase Extraction using Sentence Embeddings
TLDR
An unsupervised method for keyphrase extraction from single documents that leverages sentence embeddings is introduced, and it is shown that this embedding-based method is not only simpler, but also more effective than graph-based state-of-the-art systems, achieving higher F-scores on standard datasets.
MIKE: Keyphrase Extraction by Integrating Multidimensional Information
TLDR
This work proposes a random-walk parametric model, MIKE, that learns the latent representation for a candidate keyphrase that captures the mutual influences among all information, and simultaneously optimizes the parameters and ranking scores of candidates in the word graph.
Exploring Word Embeddings in CRF-based Keyphrase Extraction from Research Papers
TLDR
This paper explores keyphrase extraction formulated as sequence labeling and utilizes the power of Conditional Random Fields in capturing label dependencies through a transition parameter matrix consisting of the transition probabilities from one label to the neighboring label.
Incorporating Expert Knowledge into Keyphrase Extraction
TLDR
This paper learns keyphrase taggers for research papers using token-based features incorporating linguistic, surface-form, and document-structure information through sequence labeling and demonstrates that, using document features alone, the tagger trained with Conditional Random Fields performs on par with existing state-of-the-art systems.
A New Scheme for Scoring Phrases in Unsupervised Keyphrase Extraction
TLDR
This work proposes a new scheme for scoring phrases which calculates the final score using the average of the scores of individual words weighted by the frequency of the phrase in the document.
Unsupervised Keyphrase Extraction by Jointly Modeling Local and Global Context
  • Xinnian Liang, Shuangzhi Wu, Mu Li, Zhoujun Li
  • Computer Science
  • ArXiv
  • 2021
Embedding-based methods are widely used for unsupervised keyphrase extraction (UKE) tasks. Generally, these methods simply calculate similarities between phrase embeddings and document embedding,…
Word centrality constrained representation for keyphrase extraction
TLDR
This work proposes a new extraction model that introduces a centrality constraint to enrich the word representation of a bidirectional long short-term memory network and outperforms existing state-of-the-art approaches.

References

SHOWING 1-10 OF 44 REFERENCES
Automatic Keyphrase Extraction via Topic Decomposition
TLDR
A Topical PageRank (TPR) is built on a word graph to measure word importance with respect to different topics, and experiments show that TPR outperforms state-of-the-art keyphrase extraction methods on two datasets under various evaluation metrics.
Clustering to Find Exemplar Terms for Keyphrase Extraction
TLDR
This work proposes an unsupervised method for keyphrase extraction that outperforms state-of-the-art graph-based ranking methods (TextRank) by 9.5% in F1-measure and guarantees that the document is semantically covered by these exemplar terms.
Extracting key terms from noisy and multitheme documents
TLDR
Evaluations of the method show that it outperforms existing methods, producing key terms with higher precision and recall, and appears to be substantially more effective on noisy and multi-theme documents than existing methods.
Learning Algorithms for Keyphrase Extraction
TLDR
The experimental results support the claim that a custom-designed algorithm (GenEx), incorporating specialized procedural domain knowledge, can generate better keyphrases than a general-purpose algorithm (C4.5).
Keyword Extraction Based on PageRank
TLDR
A keyword extraction algorithm based on WordNet and PageRank that applies UW-PageRank to the rough graph for word sense disambiguation, prunes the graph, and finally applies UW-PageRank again to the pruned graph to extract keywords.
Domain-specific keyphrase extraction
TLDR
A Keyphrase Identification Program (KIP) is described, which extracts document keyphrases by using prior positive samples of human-identified domain keyphrases to assign weights to the candidate keyphrases.
Single Document Keyphrase Extraction Using Neighborhood Knowledge
TLDR
This paper proposes to use a small number of nearest neighbor documents to provide more knowledge to improve single document keyphrase extraction.
KeyGraph: automatic indexing by co-occurrence graph based on building construction metaphor
TLDR
KeyGraph presents an algorithm for extracting keywords representing the asserted main point in a document, without relying on external devices such as natural-language processing tools or a document corpus, based on the segmentation of a graph.
Keyword extraction from a single document using word co-occurrence statistical information
TLDR
A new keyword extraction algorithm is presented that applies to a single document without using a corpus and shows performance comparable to tf-idf.
HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID
TLDR
The SemEval task 5 was an opportunity for experimenting with the key term extraction module of GROBID, a system for extracting and generating bibliographical information from technical and scientific documents, and bagged decision trees appeared to be the most efficient machine learning algorithm for generating a list of ranked key term candidates.