Corpus ID: 4644547

Abstractive Tabular Dataset Summarization via Knowledge Base Semantic Embeddings

Authors: Paul Azunre, Craig Corcoran, David Sullivan, Garrett Honke, Rebecca Ruppel, Sandeep Verma, Jonathon Morgan
This paper describes an abstractive summarization method for tabular data which employs a knowledge base semantic embedding to generate the summary. Assuming the dataset contains descriptive text in headers, columns and/or some augmenting metadata, the system employs the embedding to recommend a subject/type for each text segment. Recommendations are aggregated into a small collection of super types considered to be descriptive of the dataset by exploiting the hierarchy of types in a pre… 
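The two steps named in the abstract (recommend a type for each text segment, then aggregate recommendations into super types via a type hierarchy) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy vectors, the type names, the `PARENT` hierarchy, and the helper functions `recommend_type`, `top_ancestor`, and `summarize` are hypothetical and are not taken from the paper, which may use a different embedding, similarity measure, and aggregation rule.

```python
import math
from collections import Counter

# Hypothetical 3-D vectors standing in for a knowledge base semantic embedding.
WORD_VECTORS = {
    "goals": (0.9, 0.1, 0.0),
    "striker": (1.0, 0.0, 0.1),
    "rebounds": (0.8, 0.3, 0.0),
    "capital": (0.0, 1.0, 0.1),
    "population": (0.1, 0.9, 0.0),
}
# Hypothetical vectors for candidate types in the same space.
TYPE_VECTORS = {
    "SoccerPlayer": (1.0, 0.0, 0.0),
    "BasketballPlayer": (0.8, 0.4, 0.0),
    "City": (0.0, 1.0, 0.0),
}
# Hypothetical type hierarchy, stored as child -> parent.
PARENT = {
    "SoccerPlayer": "Athlete",
    "BasketballPlayer": "Athlete",
    "Athlete": "Person",
    "City": "Place",
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_type(segment):
    """Return the type closest to the mean vector of the segment's known words."""
    known = [WORD_VECTORS[w] for w in segment.lower().split() if w in WORD_VECTORS]
    if not known:
        return None  # no word in the segment has an embedding
    mean = tuple(sum(dim) / len(known) for dim in zip(*known))
    return max(TYPE_VECTORS, key=lambda t: cosine(mean, TYPE_VECTORS[t]))


def top_ancestor(type_name):
    """Walk the hierarchy upward until a type with no parent is reached."""
    while type_name in PARENT:
        type_name = PARENT[type_name]
    return type_name


def summarize(segments, k=2):
    """Count one vote per segment for its top-level super type; keep the k most voted."""
    votes = Counter()
    for segment in segments:
        recommended = recommend_type(segment)
        if recommended is not None:
            votes[top_ancestor(recommended)] += 1
    return votes.most_common(k)


headers = ["striker goals", "goals", "rebounds", "capital", "population"]
print(summarize(headers))
```

Rolling every recommendation up to its top-level ancestor is the simplest possible aggregation rule; a real system would more likely stop at an intermediate level of the hierarchy so that the summary stays specific.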

Keep It or Not: Word Level Quality Estimation for Post-Editing

Experimental results show that the Doc2Vec model outperforms the BoW model on the word-level QE task, and the paper derives the threshold on which OK/BAD decisions for MT words are based.

Distil: A Mixed-Initiative Model Discovery System for Subject Matter Experts (Demo)

We present in-progress work on Distil, a mixed-initiative system to enable non-experts with subject matter expertise to generate data-driven models using an interactive analytic question first



Applying Universal Schemas for Domain Specific Ontology Expansion

This paper investigates the use of universal schemas (Riedel et al., 2013) as a mechanism for ontology maintenance on top of two unique data sources: 14 million full-text scientific articles and chapters, plus a hand-curated medical ontology of 1 million concepts.

TYPifier: Inferring the type semantics of structured data

This work formulates the problem of inferring the type semantics of structured data as a clustering problem, discusses the features needed to obtain several solutions based on existing clustering methods, and presents TYPifier, a novel clustering algorithm that, in experiments, yields better typification results than the baseline clustering solutions.

Entity Typing Using Distributional Semantics and DBpedia

A gap in coverage between what is in the text and what can be annotated with fine-grained types is created, and this allows semantic querying over a dataset, for example selecting all politicians or football players.

Supervised typing of big graphs using semantic embeddings

The algorithm is agnostic to the derivation of the underlying entity embeddings, generalizes well to hundreds of types and achieves near-linear scaling on Big Graphs containing many millions of triples and instances by virtue of an incremental execution.

Knowledge Base Completion Using Embeddings and Rules

This paper proposes a novel approach that incorporates rules seamlessly into embedding models for KB completion, and formulates inference as an integer linear programming (ILP) problem, with the objective function generated from embedding models and the constraints translated from rules.

RDF2Vec: RDF Graph Embeddings for Data Mining

RDF2Vec is presented, an approach that takes language modeling techniques for unsupervised feature extraction from sequences of words and adapts them to RDF graphs; the work shows that feature vector representations of general knowledge graphs such as DBpedia and Wikidata can be easily reused for different tasks.

Unsupervised Learning of Sentence Embeddings Using Compositional n-Gram Features

This work presents a simple but efficient unsupervised objective to train distributed representations of sentences, which outperforms the state-of-the-art unsupervised models on most benchmark tasks, highlighting the robustness of the produced general-purpose sentence embeddings.

Distributed Representations of Sentences and Documents

Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.

GloVe: Global Vectors for Word Representation

A new global log-bilinear regression model is presented that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.

Distributed Representations of Words and Phrases and their Compositionality

This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.