KGTK: A Toolkit for Large Knowledge Graph Manipulation and Analysis

@inproceedings{Ilievski2020KGTKAT,
  title={KGTK: A Toolkit for Large Knowledge Graph Manipulation and Analysis},
  author={Filip Ilievski and Daniel Garijo and Hans Chalupsky and Naren Teja Divvala and Yixiang Yao and Craig Milo Rogers and Rongpeng Li and Jun Liu and Amandeep Singh and Daniel Schwabe and Pedro A. Szekely},
  booktitle={International Semantic Web Conference (ISWC)},
  year={2020}
}
Knowledge graphs (KGs) have become the preferred technology for representing, sharing and adding knowledge to modern AI applications. While KGs have become a mainstream technology, the RDF/SPARQL-centric toolset for operating with them at scale is heterogeneous, difficult to integrate and only covers a subset of the operations that are commonly needed in data science applications. In this paper, we present KGTK, a data science-centric toolkit to represent, create, transform, enhance and analyze… 
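KGTK's core abstraction is a plain TSV "edge file": one edge per row, with at least the columns `node1`, `label`, and `node2` (plus an optional `id`), which the toolkit's commands read and write. The following is a minimal sketch of parsing that format in plain Python, not the KGTK library itself; the sample rows (Wikidata-style identifiers) are illustrative only.

```python
import csv
import io

# A toy KGTK-style edge file: tab-separated, one edge per row,
# with node1 / label / node2 / id columns. Sample data is invented.
sample = """node1\tlabel\tnode2\tid
Q42\tP31\tQ5\tQ42-P31-Q5
Q42\tP106\tQ36180\tQ42-P106-Q36180
"""

def read_edges(text):
    """Parse a KGTK-style edge file into (node1, label, node2) triples."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [(row["node1"], row["label"], row["node2"]) for row in reader]

edges = read_edges(sample)
print(edges[0])  # ('Q42', 'P31', 'Q5')
```

Because the format is ordinary TSV, KGTK operations compose with standard data science tooling (pandas, command-line pipelines) without an RDF/SPARQL stack.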
Commonsense Knowledge in Wikidata
TLDR
This paper investigates whether Wikidata contains commonsense knowledge which is complementary to existing commonsense sources, and proposes three recommended actions to improve the coverage and quality of Wikidata-CS further.
Multilayer graphs: a unified data model for graph databases
TLDR
This proposal, called the multilayer graph model, presents a simple and flexible data model for graphs that can naturally support popular graph formats such as RDF, RDF* and property graphs, while at the same time being powerful enough to naturally store information from complex knowledge graphs, such as Wikidata.
OntoMerger: An Ontology Integration Library for Deduplicating and Connecting Knowledge Graph Nodes
TLDR
This paper introduces OntoMerger and illustrates its functionality on a real-world biomedical KG and provides analytic and data testing functionalities that can be used to fine-tune the inputs, further reducing duplication, and to increase connectivity of the output graph.
Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning
TLDR
The qualitative results of the proposed method in a downstream task of image generation showed that more realistic images are generated using the Commonsense knowledge-based scene graphs, depicting the effectiveness of commonsense knowledge infusion in improving the performance and expressiveness of scene graph generation for visual understanding and reasoning tasks.
A Birds Eye View on Knowledge Graph Embeddings, Software Libraries, Applications and Challenges
TLDR
Existing KGC approaches are discussed, including the state-of-the-art Knowledge Graph Embeddings (KGE), not only on static graphs but also for the latest trends such as multimodal, temporal, and uncertain knowledge graphs.
Data Models for Annotating Biomedical Scholarly Publications: the Case of CORD-19
TLDR
This systematic review provides an analysis of the data models that have been applied to semantic annotation projects for the scholarly publications available in the CORD-19 dataset, an open database of the full texts of scholarly publications about COVID-19.
From Data to Knowledge Graphs: A Multi-Layered Method to Model User's Visual Analytics Workflow for Analytical Purposes
TLDR
This paper presents Visual Analytic Knowledge Graph (VAKG), a conceptual framework that generalizes existing knowledge models and ontologies by focusing on how humans relate to computer processes temporally and how this relates to the workflow’s state space.
Nipping in the Bud: Detection, Diffusion and Mitigation of Hate Speech on Social Media
TLDR
This article presents methodological challenges that hinder building automated hate mitigation systems and discusses a series of proposed solutions to limit the spread of hate speech on social media.

References

Showing 10 of 30 references.
CORD-19: The COVID-19 Open Research Dataset
TLDR
The mechanics of dataset construction are described, highlighting challenges and key design decisions, an overview of how CORD-19 has been used, and several shared tasks built around the dataset are described.
ConceptNet 5.5: An Open Multilingual Graph of General Knowledge
TLDR
A new version of the linked open data resource ConceptNet is presented that is particularly well suited to be used with modern NLP techniques such as word embeddings, with state-of-the-art results on intrinsic evaluations of word relatedness that translate into improvements on applications of word vectors, including solving SAT-style analogies.
Web-Scale Querying through Linked Data Fragments
TLDR
This paper introduces Linked Data Fragments, a publishing method that enables servers to maintain availability rates as high as any regular HTTP server, allowing querying to scale reliably to much larger numbers of clients.
Consolidating Commonsense Knowledge
TLDR
This paper proposes principles and a representation model and applies this approach to consolidate seven separate sources into a first integrated Common Sense Knowledge Graph (CSKG), presents statistics of CSKG, and reports initial investigations of its utility on four QA datasets.
CORD-19 Named Entities Knowledge Graph (CORD19-NEKG)
Scalable Zero-shot Entity Linking with Dense Entity Retrieval
TLDR
This paper introduces a simple and effective two-stage approach for zero-shot linking, based on fine-tuned BERT architectures, and shows that it performs well in the non-zero-shot setting, obtaining the state-of-the-art result on TACKBP-2010.
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
TLDR
This work proposes a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can be fine-tuned with good performances on a wide range of tasks like its larger counterparts, and introduces a triple loss combining language modeling, distillation and cosine-distance losses.
RoBERTa: A Robustly Optimized BERT Pretraining Approach
TLDR
It is found that BERT was significantly undertrained, and can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
PyTorch-BigGraph: A Large-scale Graph Embedding System
TLDR
PyTorch-BigGraph (PBG), an embedding system that incorporates several modifications to traditional multi-relation embedding systems that allow it to scale to graphs with billions of nodes and trillions of edges, is presented.