Publications
An Unsupervised Algorithm for Learning Blocking Schemes
TLDR
Pairwise comparison of data objects is a requisite step in many data mining applications, but it has quadratic complexity.
  • 53
  • 11
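Blocking avoids the quadratic all-pairs comparison by grouping records on a blocking key and comparing only records that share a block. The following is a minimal generic sketch of that idea, not the unsupervised scheme-learning algorithm from the paper. The key function `name_prefix` (first three letters of a name) and the toy records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs(records):
    """Naive approach: every pair of records, n*(n-1)/2 comparisons."""
    return list(combinations(range(len(records)), 2))


def blocked_pairs(records, key):
    """Group record indices by a blocking key; emit pairs only within a block."""
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[key(rec)].append(i)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


def name_prefix(rec):
    """Hypothetical blocking key: first three letters of the name, lowercased."""
    return rec["name"][:3].lower()


records = [
    {"name": "Jonathan Smith"},
    {"name": "Jon Smith"},
    {"name": "Maria Garcia"},
    {"name": "Mario Garcia"},
    {"name": "Wei Zhang"},
]

print(len(all_pairs(records)))              # 10
print(blocked_pairs(records, name_prefix))  # [(0, 1), (2, 3)]
```

Here blocking cuts ten comparisons down to two candidate pairs. The quality of the result depends entirely on the key, which is why learning a good blocking scheme is worthwhile.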
Semi-supervised Instance Matching Using Boosted Classifiers
TLDR
This paper presents a minimally supervised instance matching approach that is able to deliver competitive performance using only 2% training data and little parameter tuning.
  • 30
  • 5
Information Extraction in Illicit Web Domains
TLDR
We propose a lightweight, feature-agnostic Information Extraction (IE) paradigm specifically designed for illicit domains that employ atypical language models and suffer from concept drift.
  • 22
  • 3
An unsupervised instance matcher for schema-free RDF data
TLDR
This article presents an unsupervised system that performs instance matching between entities in schema-free Resource Description Framework (RDF) files.
  • 25
  • 2
Using contexts and constraints for improved geotagging of human trafficking webpages
TLDR
Extracting geographical tags from webpages is a well-motivated application in many domains.
  • 12
  • 2
Domain-Specific Knowledge Graph Construction
  • M. Kejriwal
  • Computer Science
  • SpringerBriefs in Computer Science
  • 5 March 2019
  • 10
  • 2
A DNF Blocking Scheme Learner for Heterogeneous Datasets
TLDR
We present an unsupervised algorithmic pipeline for learning DNF blocking schemes on RDF graph datasets, as well as structurally heterogeneous tables.
  • 9
  • 2
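A DNF blocking scheme is a disjunction of conjunctions of predicates, where each predicate pairs an attribute with an indexing function. Two records become candidates if they share a key under at least one conjunction. The sketch below only shows how an already-given scheme might be applied; it does not attempt the unsupervised learning of the scheme that the paper contributes. The indexing functions `tokens` and `exact`, the scheme itself, and the records (one lacks a `phone` attribute, to mimic structural heterogeneity) are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations, product


def tokens(value):
    """Hypothetical common-token indexing function: lowercase whitespace tokens."""
    return set(value.lower().split())


def exact(value):
    """Hypothetical exact-match indexing function on the normalized value."""
    return {value.strip().lower()}


# Hypothetical scheme: (name, tokens) AND (city, exact)  OR  (phone, exact)
scheme = [
    [("name", tokens), ("city", exact)],
    [("phone", exact)],
]


def candidate_pairs(records, scheme):
    pairs = set()
    for term_id, conjunction in enumerate(scheme):
        blocks = defaultdict(list)
        for i, rec in enumerate(records):
            # Skip records lacking an attribute this conjunction needs.
            if any(attr not in rec for attr, _ in conjunction):
                continue
            key_sets = [fn(rec[attr]) for attr, fn in conjunction]
            # A conjunction key combines one key from every predicate.
            for combo in product(*key_sets):
                blocks[(term_id, combo)].append(i)
        # Disjunction: union of within-block pairs across all conjunctions.
        for members in blocks.values():
            pairs.update(combinations(members, 2))
    return sorted(pairs)


records = [
    {"name": "Acme Corp", "city": "Austin", "phone": "555-0100"},
    {"name": "ACME Corporation", "city": "austin"},
    {"name": "Beta LLC", "city": "Dallas", "phone": "555-0100"},
    {"name": "Gamma Inc", "city": "Austin", "phone": "555-0199"},
]

print(candidate_pairs(records, scheme))  # [(0, 1), (0, 2)]
```

Records 0 and 1 meet through the first conjunction (shared token "acme" and city "austin"), and records 0 and 2 meet through the second (same phone), so only two of the six possible pairs are compared.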
A two-step blocking scheme learner for scalable link discovery
TLDR
A two-step procedure for learning a link-discovery blocking scheme is presented.
  • 19
  • 1
P4ML: A Phased Performance-Based Pipeline Planner for Automated Machine Learning
While many problems could benefit from recent advances in machine learning, significant time and expertise are required to design customized solutions to each problem. Prior attempts to automate …
  • 10
  • 1
Supervised typing of big graphs using semantic embeddings
TLDR
We propose a supervised algorithm for generating type embeddings in the same semantic vector space as a given set of entity embeddings, and apply it to a type recommendation task.
  • 7
  • 1
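One simple way to place types in the same vector space as entities is to take each type's embedding as the centroid of the normalized embeddings of its known instances, then recommend types for an untyped entity by cosine similarity. The sketch below assumes exactly that centroid construction; it is an illustrative baseline, not necessarily the paper's algorithm. The two-dimensional vectors, entity names, and the `Person`/`City` types are hypothetical toy data.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def type_embeddings(entity_vecs, entity_types):
    """Assumed construction: centroid of each type's L2-normalized entity vectors."""
    sums, counts = {}, {}
    for ent, types in entity_types.items():
        vec = normalize(entity_vecs[ent])
        for t in types:
            acc = sums.setdefault(t, [0.0] * len(vec))
            for k, x in enumerate(vec):
                acc[k] += x
            counts[t] = counts.get(t, 0) + 1
    return {t: [x / counts[t] for x in acc] for t, acc in sums.items()}


def recommend(vec, type_vecs, k=1):
    """Rank types by cosine similarity to an entity vector; return the top k."""
    ranked = sorted(type_vecs, key=lambda t: cosine(vec, type_vecs[t]), reverse=True)
    return ranked[:k]


entity_vecs = {"alice": [1.0, 0.1], "bob": [0.9, 0.0],
               "paris": [0.0, 1.0], "rome": [0.1, 0.9]}
entity_types = {"alice": ["Person"], "bob": ["Person"],
                "paris": ["City"], "rome": ["City"]}

types = type_embeddings(entity_vecs, entity_types)
print(recommend([0.8, 0.2], types))  # ['Person']
```

Because type vectors live in the entity space, recommending types for a new entity needs only its existing embedding, with no retraining of the entity embeddings.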