High-Precision Extraction of Emerging Concepts from Scientific Literature
@article{King2020HighPrecisionEO, title={High-Precision Extraction of Emerging Concepts from Scientific Literature}, author={Daniel King and Doug Downey and Daniel S. Weld}, journal={Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval}, year={2020} }
Identification of new concepts in scientific literature can help power faceted search, scientific trend analysis, knowledge-base construction, and more, but current methods are lacking. Manual identification can't keep up with the torrent of new publications, while the precision of existing automatic techniques is too low for many applications. We present an unsupervised concept extraction method for scientific literature that achieves much higher precision than previous work. Our approach…
7 Citations
The impact of preprint servers in the formation of novel ideas
- Computer Science
- bioRxiv
- 2020
A Bayesian method is developed to estimate when a phrase first appears in the literature; at present most phrases appear first in traditional journals, but a number of phrases appear first on preprint servers.
ACCoRD: A Multi-Document Approach to Generating Diverse Descriptions of Scientific Concepts
- Computer Science
- ArXiv
- 2022
ACCoRD, an end-to-end system tackling the novel task of generating sets of descriptions of scientific concepts, is presented, and a user study demonstrates that users prefer descriptions produced by the system and prefer multiple descriptions to a single “best” description.
Enhancing relevant concepts extraction for ontology learning using domain time relevance
- Computer Science
- Inf. Process. Manag.
- 2023
Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search
- Computer Science
- ArXiv
- 2022
PINOCCHIO is presented, a new decoding method that improves the consistency of a transformer-based abstractive summarizer by constraining beam search to avoid hallucinations.
Metrics and Mechanisms: Measuring the Unmeasurable in the Science of Science
- Education
- J. Informetrics
- 2022
Towards Personalized Descriptions of Scientific Concepts
- Computer Science
- 2021
This paper proposes generating personalized scientific concept descriptions tailored to the user’s expertise and context, outlines a complete architecture for the task, and releases an expert-annotated resource, ACCoRD.
README: A Literature Survey Assistant
- Computer Science
- 2020
Literature review is an integral element of academic research, enabling researchers to learn about and build on existing work. Traditionally, this involves manually going through various published…
References
TSE-NER: An Iterative Approach for Long-Tail Entity Extraction in Scientific Publications
- Computer Science
- SEMWEB
- 2018
An iterative approach for training NER and NET classifiers in scientific publications that relies on minimal human input, namely a small seed set of instances for the targeted entity type, is presented.
Extracting Keyphrases from Research Papers Using Citation Networks
- Computer Science
- AAAI
- 2014
This work proposes CiteTextRank for keyphrase extraction from research articles, a graph-based algorithm that incorporates evidence from both a document's content as well as the contexts in which the document is referenced within a citation network.
A frequent keyword-set based algorithm for topic modeling and clustering of research papers
- Computer Science
- 2011 3rd Conference on Data Mining and Optimization (DMO)
- 2011
A novel and efficient approach to detect topics in a large corpus of research papers using closed frequent keyword-set to form topics and a modified PageRank algorithm that assigns an authoritative score to each research paper by considering the sub-graph in which the research paper appears.
Construction of the Literature Graph in Semantic Scholar
- Computer Science
- NAACL
- 2018
This paper reduces literature graph construction to familiar NLP tasks, points out research challenges due to differences from standard formulations of these tasks, and reports empirical results for each task.
Detecting research topics via the correlation between graphs and texts
- Computer Science
- KDD '07
- 2007
This paper presents a unique approach that uses the correlation between the distribution of a term that represents a topic and the link distribution in the citation graph where the nodes are limited to the documents containing the term.
Phrases as subtopical concepts in scholarly text
- Computer Science
- JCDL '11
- 2011
This work presents a method to extract "phrase" phrases from a text corpus, and rank them using a citation network measure, the compensated normalized link count (CNLC), which measures the extent to which they are propagated along the citation structure of articles.
A review of keyphrase extraction
- Computer Science
- WIREs Data Mining Knowl. Discov.
- 2020
This article introduces keyphrase extraction, provides a well‐structured review of the existing work, offers interesting insights on the different evaluation approaches, highlights open issues and presents a comparative experimental study of popular unsupervised techniques on five datasets.
Self-taught hashing for fast similarity search
- Computer Science
- SIGIR
- 2010
This paper proposes a novel Self-Taught Hashing (STH) approach to semantic hashing: it first finds the optimal l-bit binary codes for all documents in the given corpus via unsupervised learning, and then trains l classifiers via supervised learning to predict the l-bit code for any previously unseen query document.
Bursty and Hierarchical Structure in Streams
- Computer Science
- Data Mining and Knowledge Discovery
- 2004
The goal of the present work is to develop a formal approach for modeling such “bursts,” in such a way that they can be robustly and efficiently identified, and can provide an organizational framework for analyzing the underlying content.