KGTK: A Toolkit for Large Knowledge Graph Manipulation and Analysis

@inproceedings{Ilievski2020KGTKAT,
  title={KGTK: A Toolkit for Large Knowledge Graph Manipulation and Analysis},
  author={Filip Ilievski and Daniel Garijo and Hans Chalupsky and Naren Teja Divvala and Yixiang Yao and Craig Milo Rogers and Rongpeng Li and Jun Liu and Amandeep Singh and Daniel Schwabe and Pedro A. Szekely},
  booktitle={International Semantic Web Conference (ISWC)},
  year={2020}
}
Knowledge graphs (KGs) have become the preferred technology for representing, sharing and adding knowledge to modern AI applications. While KGs have become a mainstream technology, the RDF/SPARQL-centric toolset for operating with them at scale is heterogeneous, difficult to integrate and only covers a subset of the operations that are commonly needed in data science applications. In this paper, we present KGTK, a data science-centric toolkit to represent, create, transform, enhance and analyze…
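KGTK represents a knowledge graph as a TSV file of edges with at least the columns node1 / label / node2, which its commands (e.g. `kgtk filter`) operate on. A minimal sketch of reading and filtering that format in plain Python, with made-up sample edges:

```python
import csv
import io

# A tiny, illustrative KGTK edge file: each row is one edge
# (node1, label, node2); real files may carry extra qualifier columns.
sample = """node1\tlabel\tnode2
Q42\tP31\tQ5
Q42\tP106\tQ36180
Q5\tP279\tQ215627
"""

# Keep only edges whose property is P31 ("instance of" in Wikidata),
# roughly what a KGTK filter on the predicate column does.
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
instance_edges = [row for row in reader if row["label"] == "P31"]

print(instance_edges)  # [{'node1': 'Q42', 'label': 'P31', 'node2': 'Q5'}]
```

Because the format is plain TSV, any tabular tool (pandas, awk, SQLite) can interoperate with KGTK pipelines.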


Creating and Querying Personalized Versions of Wikidata on a Laptop
This paper introduces KGTK Kypher, a query language and processor that allows users to create personalized variants of Wikidata on a laptop, and presents several use cases that illustrate the types of analyses that Kypher enables users to run on the full Wikidata KG on a laptop, combining data from external resources such as DBpedia.
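Kypher compiles Cypher-like graph patterns into SQL over an SQLite copy of the KGTK edge file. A minimal sketch of that idea in plain Python (table name and edges are illustrative, not the actual Kypher schema):

```python
import sqlite3

# Load a few illustrative edges into an in-memory SQLite table,
# mimicking how Kypher imports a KGTK edge file before querying.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edges (node1 TEXT, label TEXT, node2 TEXT)")
con.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [("Q42", "P31", "Q5"), ("Q42", "P106", "Q36180"), ("Q1", "P31", "Q5")],
)

# The graph pattern (x)-[:P31]->(Q5) -- "everything that is an
# instance of Q5" -- becomes a simple SQL selection on the edge table.
rows = con.execute(
    "SELECT node1 FROM edges WHERE label = ? AND node2 = ?", ("P31", "Q5")
).fetchall()
print(sorted(r[0] for r in rows))  # ['Q1', 'Q42']
```

Running on a single indexed file is what makes laptop-scale querying of a large KG subset practical.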
A Study of the Quality of Wikidata
A framework to detect and analyze low-quality statements in Wikidata by shedding light on the current practices exercised by the community is developed, revealing challenges with duplicate entities, missing triples, violated type rules, and taxonomic distinctions.
User-friendly Comparison of Similarity Algorithms on Wikidata
A user-friendly interface is presented that allows flexible computation of similarity between Qnodes in Wikidata, and a REST API is provided that can compute most similar neighbors for any Qnode in Wikidata.
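Entity similarity of this kind is typically computed over vector embeddings of the Qnodes, with cosine similarity as the usual measure. A self-contained sketch with made-up vectors (the real service uses learned embeddings):

```python
import math

# Hypothetical 3-d embeddings for a few Qnodes (illustrative values only).
embeddings = {
    "Q42":  [0.9, 0.1, 0.3],
    "Q5":   [0.8, 0.2, 0.4],
    "Q146": [0.1, 0.9, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Most similar neighbor of Q42 among the other Qnodes.
best = max((q for q in embeddings if q != "Q42"),
           key=lambda q: cosine(embeddings["Q42"], embeddings[q]))
print(best)  # 'Q5'
```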
Extracting a Knowledge Base of Mechanisms from COVID-19 Papers
This work pursues the construction of a knowledge base of mechanisms, a fundamental concept across the sciences which encompasses activities, functions and causal relations, ranging from cellular processes to economic impacts, by developing a broad, unified schema.
Text mining approaches for dealing with the rapidly expanding literature on COVID-19
This review discusses the corpora, modeling resources, systems and shared tasks that have been introduced for COVID-19, and lists 39 systems that provide functionality such as search, discovery, visualization and summarization over the COVID-19 literature.
Analyzing Race and Country of Citizenship Bias in Wikidata
There is an overrepresentation of white individuals and those with citizenship in Europe and North America; the rest of the groups are generally underrepresented.
Machine-Assisted Script Curation
Machine-Assisted Script Curator automates portions of the script creation process with suggestions for event types, links to Wikidata, and sub-events that may have been forgotten.
Deep Learning applications for COVID-19
This survey explores how Deep Learning has battled the COVID-19 pandemic, provides directions for future research on COVID-19, and evaluates the current state of Deep Learning, concluding with key limitations of Deep Learning for COVID-19 applications.
Commonsense Knowledge in Wikidata
This paper investigates whether Wikidata contains commonsense knowledge which is complementary to existing commonsense sources, and proposes three recommended actions to improve the coverage and quality of Wikidata-CS further.
COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation
A novel and comprehensive knowledge discovery framework, COVID-KG, to extract fine-grained multimedia knowledge elements (entities, relations and events) from scientific literature and exploit the constructed multimedia knowledge graphs (KGs) for question answering and report generation.

References

Showing 1-10 of 35 references
Querying Wikidata: Comparing SPARQL, Relational and Graph Databases
In this paper, we experimentally compare the efficiency of various database engines for the purposes of querying the Wikidata knowledge-base, which can be conceptualised as a directed edge-labelled…
SPARQL Web-Querying Infrastructure: Ready for Action?
It is found that only one-third of endpoints make descriptive meta-data available, making it difficult to locate or learn about their content and capabilities, and patchy support is found for established SPARQL features like ORDER BY as well as for new SPARQL 1.1 features.
Exchange and Consumption of Huge RDF Data
This paper shows how to enhance the exchanged HDT with additional structures to support some basic forms of SPARQL query resolution without the need of "unpacking" the data.
HDTQ: Managing RDF Datasets in Compressed Space
This work introduces HDTQ (HDT Quads), an extension of HDT that is able to represent quadruples (or quads) while still being highly compact and queryable, and is a competitive alternative to well-established systems.
LOD Lab: Scalable Linked Data Processing
This presentation explains how Linked Open Data, one of the biggest knowledge bases ever built, is an ideal test-bed for knowledge representation and reasoning, given its heterogeneous nature and complexity.
Web-Scale Querying through Linked Data Fragments
This paper introduces Linked Data Fragments, a publishing method that enables servers to maintain availability rates as high as any regular HTTP server, allowing querying to scale reliably to much larger numbers of clients.
DBpedia: A Nucleus for a Web of Open Data
The extraction of the DBpedia datasets is described, along with how the resulting information is published on the Web for human and machine consumption, and how DBpedia could serve as a nucleus for an emerging Web of open data.
ConceptNet 5.5: An Open Multilingual Graph of General Knowledge
A new version of the linked open data resource ConceptNet is presented that is particularly well suited to be used with modern NLP techniques such as word embeddings, with state-of-the-art results on intrinsic evaluations of word relatedness that translate into improvements on applications of word vectors, including solving SAT-style analogies.
LOD-a-lot - A Queryable Dump of the LOD Cloud
LOD-a-lot democratizes access to the Linked Open Data (LOD) Cloud by serving more than 28 billion unique triples from 650K datasets over a single self-indexed file, enabling Web-scale repeatable experimentation and research even by standard laptops.
MFIBlocks: An effective blocking algorithm for entity resolution
A blocking approach is introduced that avoids selecting a blocking key altogether, relieving the user from this difficult task, and is based on maximal frequent itemsets selection, allowing early evaluation of block quality based on the overall commonality of its members.
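For context, the baseline that MFIBlocks improves on is key-based blocking: records sharing a hand-picked blocking key land in the same block, and only within-block pairs are compared. A minimal sketch with made-up records (the choice of "city" as the key is exactly the manual step MFIBlocks avoids):

```python
from collections import defaultdict

# Illustrative records: (id, name, city).
records = [
    ("r1", "John Smith", "NYC"),
    ("r2", "Jon Smith", "NYC"),
    ("r3", "Alice Jones", "LA"),
]

# Group records by the blocking key (here: city).
blocks = defaultdict(list)
for rid, name, city in records:
    blocks[city].append(rid)

# Only pairs inside the same block become candidates for matching.
candidate_pairs = [
    (a, b)
    for ids in blocks.values()
    for i, a in enumerate(ids)
    for b in ids[i + 1:]
]
print(candidate_pairs)  # [('r1', 'r2')]
```

Blocking reduces the quadratic all-pairs comparison to the pairs most likely to match, at the cost of possibly missing matches split across blocks.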