Construction of the Literature Graph in Semantic Scholar

@inproceedings{Ammar2018ConstructionOT,
  title={Construction of the Literature Graph in Semantic Scholar},
  author={Waleed Ammar and Dirk Groeneveld and Chandra Bhagavatula and Iz Beltagy and Miles Crawford and Doug Downey and Jason Dunkelberger and Ahmed Elgohary and Sergey Feldman and Vu A. Ha and Rodney Michael Kinney and Sebastian Kohlmeier and Kyle Lo and Tyler C. Murray and Hsu-Han Ooi and Matthew E. Peters and Joanna L. Power and Sam Skjonsberg and Lucy Lu Wang and Christopher Wilhelm and Zheng Yuan and Madeleine van Zuylen and Oren Etzioni},
  booktitle={NAACL},
  year={2018}
}
We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph consists of more than 280M nodes, representing papers, authors, entities and various interactions between them (e.g., authorships, citations, entity mentions). We reduce literature graph construction into familiar NLP tasks (e.g., entity extraction and linking), point out research challenges due to… 

GrapAL: Querying Semantic Scholar's Literature Graph
TLDR
This paper describes the basic elements of GrapAL and how to use it, along with several use cases such as finding experts on a given topic for peer reviewing, discovering indirect connections between biomedical entities, and computing citation-based metrics.
Triple Classification for Scholarly Knowledge Graph Completion
TLDR
This work presents exBERT, a method for leveraging pre-trained transformer language models to perform scholarly knowledge graph completion, and presents two scholarly datasets as resources for the research community, collected from public KGs and online resources.
End-to-End NLP Knowledge Graph Construction
TLDR
This paper applies the SciNLP-KG framework to 30,000 NLP papers from ACL Anthology to build a large-scale KG, which can facilitate automatically constructing scientific leaderboards for the NLP community and indicates that the resulting KG contains high-quality information.
Open Information Extraction for Knowledge Graph Construction
TLDR
The proposed OIE4KGC approach takes a document corpus and identifies triples within this corpus which are then processed to generate a literature knowledge graph.
Scalable, Semi-Supervised Extraction of Structured Information from Scientific Literature
TLDR
A novel, scalable, semi-supervised method for extracting relevant structured information from the vast available raw scientific literature by extracting the fundamental concepts of “aim”, “method” and “result” from scientific articles and using them to construct a knowledge graph.
From Books to Knowledge Graphs
TLDR
A bottom-up approach to support publishers in creating and maintaining their own publication knowledge graphs in the open domain is proposed by releasing a pipeline able to extract structured information from the bibliographies and indexes of AHSS publications, disambiguate, normalize and export it as linked data.
GrapAL: Connecting the Dots in Scientific Literature
TLDR
This paper describes the basic elements of GrapAL and how to use it, along with several use cases such as finding experts on a given topic for peer reviewing, discovering indirect connections between biomedical entities, and computing citation-based metrics.
Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction
TLDR
The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links and supports construction of a scientific knowledge graph, which is used to analyze information in scientific literature.
Improving Access to Scientific Literature with Knowledge Graphs
TLDR
A scholarly knowledge graph can be used to give a condensed overview on the state-of-the-art addressing a particular research quest, for example as a tabular comparison of contributions according to various characteristics of the approaches.
Soft Marginal TransE for Scholarly Knowledge Graph Completion
TLDR
The TransE embedding model is reconciled for a specific link prediction task on scholarly metadata and the results show a significant shift in the accuracy and performance evaluation of the model on a dataset with scholarly metadata.

References

Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding
TLDR
Explicit Semantic Ranking is introduced, a new ranking technique that leverages knowledge graph embedding, represents queries and documents in the entity space, and ranks them based on their semantic connections from their knowledge graph embedding.
Swanson linking revisited: Accelerating literature-based discovery across domains using a conceptual influence graph
TLDR
It is demonstrated that this deep reading and search system reduces the effort needed to uncover “undiscovered public knowledge”, and that with the aid of this tool a domain expert was able to drastically reduce her model building time from months to two days.
TabEL: Entity Linking in Web Tables
TLDR
TabEL differs from previous work by weakening the assumption that the semantics of a table can be mapped to pre-defined types and relations found in the target KB, and enforces soft constraints in the form of a graphical model that assigns higher likelihood to sets of entities that tend to co-occur in Wikipedia documents and tables.
Content-Based Citation Recommendation
TLDR
It is shown empirically that, although adding metadata improves the performance on standard metrics, it favors self-citations, which are less useful in a citation recommendation setup; an online portal for citation recommendation based on this method is also released.
TAGME: on-the-fly annotation of short text fragments (by wikipedia entities)
We designed and implemented TAGME, a system that is able to efficiently and judiciously augment a plain-text with pertinent hyperlinks to Wikipedia pages. The specialty of TAGME with respect to known
Design Challenges for Entity Linking
TLDR
This work analyzes differences between several versions of the EL problem, presents a simple yet effective, modular, unsupervised system, called Vinculum, for entity linking, and elucidates key aspects of the system, including mention extraction, candidate generation, entity type prediction, entity coreference, and coherence.
Identifying Meaningful Citations
TLDR
This work introduces the novel task of identifying important citations in scholarly literature, i.e., citations that indicate that the cited work is used or extended in the new effort, and proposes a supervised classification approach that addresses this task with a battery of features.
SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations from Scientific Publications
We describe the SemEval task of extracting keyphrases and relations between them from scientific documents, which is crucial for understanding which publications describe which processes, tasks and
Exploiting MeSH indexing in MEDLINE to generate a data set for word sense disambiguation
TLDR
A method that can be used to automatically develop a WSD test collection using the Unified Medical Language System (UMLS) Metathesaurus and the manual MeSH indexing of MEDLINE is presented and allows the evaluation of WSD algorithms in the biomedical domain.
CHEMDNER: The drugs and chemical names extraction challenge
TLDR
This task allowed a comparative assessment of the performance of various methodologies using a carefully prepared collection of manually labeled text, prepared by specially trained chemists, as Gold Standard data; the tools and resources resulting from this effort are expected to have an impact on future developments of chemical text mining applications.