S2ORC: The Semantic Scholar Open Research Corpus

@inproceedings{Lo2020S2ORCTS,
  title={S2ORC: The Semantic Scholar Open Research Corpus},
  author={Kyle Lo and Lucy Lu Wang and Mark Neumann and Rodney Michael Kinney and Daniel S. Weld},
  booktitle={ACL},
  year={2020}
}
We introduce S2ORC, a large corpus of 81.1M English-language academic papers spanning many academic disciplines. The corpus consists of rich metadata, paper abstracts, resolved bibliographic references, as well as structured full text for 8.1M open access papers. Full text is annotated with automatically-detected inline mentions of citations, figures, and tables, each linked to their corresponding paper objects. In S2ORC, we aggregate papers from hundreds of academic publishers and digital…
52 Citations
SChuBERT: Scholarly Document Chunks with BERT-encoding boost Citation Count Prediction
Enhancing Scientific Papers Summarization with Citation Graph
A Large-Scale Analysis of Cross-lingual Citations in English Papers
Acknowledgement Entity Recognition in CORD-19 Papers
Can We Automate Scientific Reviewing?
A Text Mining Approach to Discovering COVID-19 Relevant Factors
TLDR: Extreme Summarization of Scientific Documents
Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases
Text mining approaches for dealing with the rapidly expanding literature on COVID-19
