ELSKE: efficient large-scale keyphrase extraction

Authors: Johannes Knittel, Steffen Koch, Thomas Ertl
Published in: Proceedings of the 21st ACM Symposium on Document Engineering
Keyphrase extraction methods can provide insights into large collections of documents such as social media posts. Existing methods, however, are less suited for the real-time analysis of streaming data, because they are computationally too expensive or require restrictive constraints regarding the structure of keyphrases. We propose an efficient approach to extract keyphrases from large document collections and show that the method also performs competitively on individual documents. 


Real-Time Visual Analysis of High-Volume Social Media Posts

This paper presents an interactive system for the real-time visual analysis of streaming social media data on a large scale. The system works with non-geolocated posts and avoids extensive preprocessing such as event detection.

Theory entity extraction for social and behavioral sciences papers using distant supervision

This paper proposes an automated framework based on distant supervision that leverages entity mentions from Wikipedia to build a ground-truth corpus of more than 4500 automatically annotated sentences containing theory/model mentions. The corpus is used to train models for theory extraction in social and behavioral sciences (SBS) papers.



Single Document Keyphrase Extraction Using Neighborhood Knowledge

This paper proposes to use a small number of nearest neighbor documents to provide more knowledge to improve single document keyphrase extraction.

Automatic keyphrase extraction: a survey and trends

A comprehensive review of recent research efforts on the AKPE task and its related techniques is provided, including a comparison study of the best-performing techniques, an account of why some perform better than others, and recommendations to improve each stage of the AKPE process.

Large Dataset for Keyphrase Extraction

A large dataset for machine learning-based automatic keyphrase extraction, built from 2,000 scientific papers from the computer science domain published by ACM, is proposed; results show improved keyphrase recognition accuracy for refined texts.

Conundrums in Unsupervised Keyphrase Extraction: Making Sense of the State-of-the-Art

This paper presents a systematic evaluation and analysis of state-of-the-art unsupervised keyphrase extraction algorithms on a variety of standard evaluation datasets to gain a better understanding of these algorithms.

TopicRank: Graph-Based Topic Ranking for Keyphrase Extraction

This paper presents TopicRank, a graph-based keyphrase extraction method that relies on a topical representation of the document and significantly outperforms state-of-the-art methods on three datasets.

Open Domain Web Keyphrase Extraction Beyond Language Modeling

Experimental results on OpenKP confirm the effectiveness of BLING-KPE and the contributions of its neural architecture, visual features, and search log weak supervision. Zero-shot evaluations on DUC-2001 demonstrate the improved generalization ability of learning from open-domain data compared to a specific domain.

Deep Keyphrase Generation

Empirical analysis on six datasets demonstrates that the proposed generative model for keyphrase prediction, built on an encoder-decoder framework, not only achieves a significant performance boost on extracting keyphrases that appear in the source text but can also generate absent keyphrases based on the semantic meaning of the text.

Keyphrase Extraction in Scientific Publications

In the evaluation using a corpus of 120 scientific publications multiply annotated for keyphrases, the system significantly outperformed Kea at the p < .05 level.

Keyphrase Generation with Correlation Constraints

A new sequence-to-sequence architecture for keyphrase generation named CorrRNN is proposed, which captures correlation among multiple keyphrases in two ways and significantly outperforms the state-of-the-art method on benchmark datasets in terms of both accuracy and diversity.

DAKE: Document-Level Attention for Keyphrase Extraction

This paper proposes Document-level Attention for Keyphrase Extraction (DAKE), which comprises Bidirectional Long Short-Term Memory networks that capture hidden semantics in text, a document-level attention mechanism that incorporates document-level contextual information, and gating mechanisms that help determine the influence of additional contextual information on the fusion with local contextual information.