Infrastructure for Rapid Open Knowledge Network Development

Michael J. Cafarella, Michael Anderson, Iz Beltagy, Arie Cattan, Sarah E. Chasins, Ido Dagan, Doug Downey, Oren Etzioni, Sergey Feldman, Tian Gao, Tom Hope, Kexin Huang, Sophie Johnson, Daniel King, Kyle Lo, Yuze Lou, Matthew Shapiro, Dinghao Shen, Shivashankar Subramanian, Lucy Lu Wang, Yuning Wang, Yitong Wang, Daniel S. Weld, Jenny M. Vo-Phamhi, Anna Zeng, Jiayun Zou
AI Mag.
The past decade has witnessed a growth in the use of knowledge graph technologies for advanced data search, data integration, and query-answering applications. The leading example of a public, general-purpose open knowledge network (aka knowledge graph) is Wikidata, which has demonstrated remarkable advances in quality and coverage over this time. Proprietary knowledge graphs drive some of the leading applications of the day including, for example, Google Search, Alexa, Siri, and Cortana. Open… 
2 Citations
Knowledge Graphs: Introduction, History, and Perspectives
KGs are introduced, and important areas of application that have gained recent prominence are discussed; KGs are situated in the context of prior work in AI; and a few contrasting perspectives that help in understanding KGs in relation to related technologies are presented.


UniProt: a worldwide hub of protein knowledge
The UniProt Knowledgebase is a collection of sequences and annotations for over 120 million proteins across all branches of life. It has greatly expanded the number of Reference Proteomes that it provides, and in particular has focussed on improving the number of viral Reference Proteomes.
Web-scale information extraction in KnowItAll: (preliminary results)
KnowItAll, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner, is introduced.
DBpedia: A Nucleus for a Web of Open Data
The extraction of the DBpedia datasets is described, along with how the resulting information is published on the Web for human and machine consumption and how DBpedia could serve as a nucleus for an emerging Web of open data.
S2AND: A Benchmark and Evaluation System for Author Name Disambiguation
This work presents S2AND, a unified benchmark dataset for AND on scholarly papers, as well as an open-source reference model implementation, and releases the unified dataset, model code, trained models, and evaluation suite to the research community.
Yago: a core of semantic knowledge
YAGO builds on entities and relations and currently contains more than 1 million entities and 5 million facts, including the Is-A hierarchy as well as non-taxonomic relations between entities (such as hasWonPrize).
Rousillon: Scraping Distributed Hierarchical Web Data
This work presents Rousillon, a programming system for writing complex web automation scripts by demonstration, and develops novel relation selection and generalization algorithms that can be used to scrape hierarchically structured data from across many different webpages.
LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis
The core LayoutParser library comes with a set of simple and intuitive interfaces for applying and customizing DL models for layout detection, character recognition, and many other document processing tasks, and it incorporates a community platform for sharing both pre-trained models and full document digitization pipelines.
Name Disambiguation in AMiner: Clustering, Maintenance, and Human in the Loop
A novel representation learning method that incorporates both global and local information is proposed, and an end-to-end cluster size estimation method that is significantly better than the traditional BIC-based method is presented.
SPECTER: Document-level Representation Learning using Citation-informed Transformers
This work proposes SPECTER, a new method to generate document-level embeddings of scientific papers based on pretraining a Transformer language model on a powerful signal of document-level relatedness: the citation graph. It shows that SPECTER outperforms a variety of competitive baselines on the benchmark.
SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts
This work presents a new task of hierarchical CDCR for concepts in scientific papers, with the goal of jointly inferring coreference clusters and the hierarchy between them, and creates SciCo, an expert-annotated dataset for this task.