Learning Algorithms for Keyphrase Extraction

@article{Turney2004LearningAF,
  title={Learning Algorithms for Keyphrase Extraction},
  author={Peter D. Turney},
  journal={Information Retrieval},
  year={2004},
  volume={2},
  pages={303--336}
}
Many academic journals ask their authors to provide a list of about five to fifteen keywords, to appear on the first page of each article. Since these key words are often phrases of two or more words, we prefer to call them keyphrases. There is a wide variety of tasks for which keyphrases are useful, as we discuss in this paper. We approach the problem of automatically extracting keyphrases from text as a supervised learning task. We treat a document as a set of phrases, which the learning…
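The supervised framing in the abstract can be illustrated with a small sketch. It assumes one common reading of "a document as a set of phrases": every short word n-gram is a candidate, and a candidate is labeled positive when it matches an author-supplied keyphrase. The stopword list, the two features, and the function names `candidate_phrases` and `build_examples` are hypothetical choices made for illustration; they are not taken from the paper and do not reproduce its algorithms.

```python
import re

# Assumption: a tiny illustrative stopword list, not the one used in the paper.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "are", "as", "we", "on"}


def candidate_phrases(text, max_len=3):
    """Return ({phrase: (count, first_word_index)}, total_word_count)."""
    words = re.findall(r"[a-z]+", text.lower())
    stats = {}
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Skip candidates that begin or end with a stopword.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrase = " ".join(gram)
            count, first = stats.get(phrase, (0, i))
            stats[phrase] = (count + 1, first)
    return stats, len(words)


def build_examples(text, author_keyphrases):
    """Turn one document into labeled examples: (phrase, features, label)."""
    stats, n_words = candidate_phrases(text)
    gold = {k.lower() for k in author_keyphrases}
    examples = []
    for phrase, (count, first) in sorted(stats.items()):
        features = {
            "frequency": count,
            "relative_first_position": first / n_words,
        }
        # Positive example if the candidate matches an author keyphrase.
        examples.append((phrase, features, phrase in gold))
    return examples


text = ("Keyphrase extraction is a supervised learning task. "
        "Keyphrase extraction helps readers.")
examples = build_examples(text, ["Keyphrase Extraction"])
print([phrase for phrase, _, label in examples if label])
# prints ['keyphrase extraction']
```

The resulting `(phrase, features, label)` triples could be passed to any standard classifier; which learner and which features work best is the question the paper itself investigates.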
Mining the Web for Lexical Knowledge to Improve Keyphrase Extraction: Learning from Labeled and Unlabeled Data
TLDR
New features that are conceptually related to keyphrase-frequency are introduced and experiments are presented that show that the new features result in improved keyphrase extraction, although they are neither domain-specific nor training-intensive.
Learning to Extract Significant Phrases from Text
Prospective readers can quickly determine whether a document is relevant to their information need if the significant phrases (or keyphrases) in this document are provided. Although keyphrases are…
Automatic Keyphrase Extraction by Bridging Vocabulary Gap
TLDR
The method assumes that a document and its keyphrases both describe the same object but are written in two different languages, and it outperforms existing unsupervised methods on precision, recall, and F-measure.
Extracting Discriminative Keyphrases with Learned Semantic Hierarchies
TLDR
This paper proposes to use the hierarchical semantic structure between candidate keyphrases to promote keyphrases that have the right level of specificity to clearly distinguish the target document from others, and shows how this helps identify key expertise of authors from their papers, as well as competencies covered by online courses within different domains.
DIKEA: Domain-Independent Keyphrase Extraction Algorithm
TLDR
Experiments show that the new domain-independent keyphrase extraction system (DIKEA) clearly outperforms KEA and closely matches the performance of KEA++, without requiring any domain-specific knowledge such as KEA's vocabulary list.
A Supervised Learning Approach for Automatic Keyphrase Extraction
Keyphrases, also referred to as keywords, represent semantic metadata and play an important role in capturing the main theme of a large text collection. Although authors provide a…
Automatic Extraction and Learning of Keyphrases from Scientific Articles
TLDR
This paper introduces various baseline extraction methods and integrates these methods using different machine learning methods to investigate automatic extraction and learning of keyphrases from scientific articles written in English.
Finding nuggets in documents: A machine learning approach
TLDR
A Keyphrase Identification Program (KIP) is described, which extracts document keyphrases by using prior positive samples of human-identified phrases to assign weights to the candidate keyphrases; its learning function can enrich the glossary database by automatically adding newly identified keyphrases to the database.
Improving Keyphrase Extraction Using LL-Ranking
TLDR
The obtained results show that the proposed limitations help to significantly increase the quality of extracted keyphrases in terms of Precision and F1.
Incorporating Expert Knowledge into Keyphrase Extraction
TLDR
This paper learns keyphrase taggers for research papers through sequence labeling, using token-based features that incorporate linguistic, surface-form, and document-structure information, and demonstrates that, using document features alone, the tagger trained with Conditional Random Fields performs on par with existing state-of-the-art systems.

References

SHOWING 1-10 OF 79 REFERENCES
Learning to Extract Keyphrases from Text
TLDR
The experimental results support the claim that a specialized learning algorithm (GenEx) can generate better keyphrases than a general-purpose learning algorithm (C4.5) and the non-learning algorithms that are used in commercial software (Word 97 and Search 97).
Extraction of Keyphrases from Text: Evaluation of Four Algorithms
TLDR
An empirical evaluation of four algorithms for automatically extracting keywords and keyphrases from documents finds NRC’s Extractor yields the best match with the manually generated keyphrases.
Domain-Specific Keyphrase Extraction
TLDR
This paper shows that a simple procedure for keyphrase extraction based on the naive Bayes learning scheme performs comparably to the state of the art, and explains how this procedure's performance can be boosted by automatically tailoring the extraction process to the particular document collection at hand.
Learning Algorithms for Keyphrase Extraction
TLDR
Many academic journals ask their authors to provide a list of about five to fifteen keywords, to appear on the first page of each article.
Compound Key Word Generation from Document Databases Using A Hierarchical Clustering ART Model
  • A. Muñoz
  • Computer Science
  • Intell. Data Anal.
  • 1997
TLDR
This paper addresses the specific problem of creating semantic term associations from a text database by using a hierarchical model made up of Fuzzy Adaptive Resonance Theory (ART) neural networks to cluster isolated words into semantic classes.
Learning user information interests through extraction of semantically significant phrases
TLDR
This paper describes an intelligent agent developed to address this problem, similar to research systems under development for comparable tasks, and presents the solution in the context of a Lotus Notes system consisting of electronic mail, bulletin boards, news services, and databases.
Extraction of Index Words from Manuals
TLDR
An automatic method for extracting index words from manuals, which optimizes Rijsbergen's E calculated from recall and precision, is described and experimentally evaluated.
Improving browsing in digital libraries with keyphrase indexes
TLDR
A new kind of search engine, Keyphind, is built that is explicitly designed to support browsing and provides a keyphrase index, allowing users to interact with the collection at the level of topics and subjects rather than words and documents.
New Methods in Automatic Extracting
TLDR
New methods of automatically extracting documents for screening purposes, i.e. the computer selection of sentences having the greatest potential for conveying to the reader the substance of the document, indicate that the three newly proposed components dominate the frequency component in the production of better extracts.