WebChild: harvesting and organizing commonsense knowledge from the web

@inproceedings{Tandon2014WebChildHA,
  title={WebChild: harvesting and organizing commonsense knowledge from the web},
  author={Niket Tandon and Gerard de Melo and Fabian M. Suchanek and Gerhard Weikum},
  booktitle={Proceedings of the 7th ACM international conference on Web search and data mining},
  year={2014}
}
This paper presents a method for automatically constructing a large commonsense knowledge base, called WebChild, from Web contents. Our method is based on semi-supervised Label Propagation over graphs of noisy candidate assertions. We automatically derive seeds from WordNet and by pattern matching from Web text collections. The Label Propagation algorithm provides us with domain sets and range sets for 19 different relations, and with confidence-ranked assertions between WordNet senses. Large…
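
The abstract names semi-supervised Label Propagation over a graph of candidate assertions but gives no algorithmic detail. The sketch below shows a generic, textbook-style label propagation with clamped seeds; it is not the WebChild implementation. The function name `propagate_labels`, the edge weights, the node names, and the two labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=20):
    """Generic label propagation (illustrative, not the paper's exact method).

    edges: dict mapping (node_u, node_v) -> non-negative weight (undirected).
    seeds: dict mapping node -> label; seed scores stay clamped at 1.0.
    Returns dict node -> {label: score}.
    """
    # Build a symmetric weighted adjacency structure.
    neighbors = defaultdict(dict)
    for (u, v), weight in edges.items():
        neighbors[u][v] = weight
        neighbors[v][u] = weight

    labels = sorted(set(seeds.values()))
    nodes = set(neighbors) | set(seeds)

    # Start every node at zero, then set each seed to 1.0 for its own label.
    scores = {n: {lab: 0.0 for lab in labels} for n in nodes}
    for node, label in seeds.items():
        scores[node][label] = 1.0

    for _ in range(iterations):
        updated = {}
        for node in nodes:
            adjacent = neighbors.get(node, {})
            total = sum(adjacent.values())
            if node in seeds or total == 0:
                # Seeds are clamped; isolated nodes keep their scores.
                updated[node] = dict(scores[node])
                continue
            # Each non-seed node takes the weighted average of its neighbors.
            updated[node] = {
                lab: sum(w * scores[m][lab] for m, w in adjacent.items()) / total
                for lab in labels
            }
        scores = updated
    return scores


# Hypothetical toy graph: a chain of four candidate nodes with seeds at the ends.
toy_edges = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "d"): 1.0}
toy_seeds = {"a": "in_domain", "d": "out_of_domain"}
toy_scores = propagate_labels(toy_edges, toy_seeds)
```

In this toy chain, the non-seed node next to a seed ends up scoring higher for that seed's label, which is the sense in which propagated scores can serve as a confidence ranking.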

Refined Commonsense Knowledge from Large-Scale Web Contents

TLDR
This paper presents a method called Ascent++ to automatically build a large-scale knowledge base (KB) of CSK assertions, with refined expressiveness and both better precision and recall than prior works.

Acquiring Comparative Commonsense Knowledge from the Web

TLDR
This paper relies on open information extraction methods to obtain large amounts of comparisons from the Web and develops a joint optimization model for cleaning and disambiguating this knowledge with respect to WordNet, which relies on integer linear programming and semantic coherence scores.

WebChild 2.0: Fine-Grained Commonsense Knowledge Distillation

TLDR
This paper presents a system based on a series of algorithms to distill fine-grained disambiguated commonsense knowledge from massive amounts of text.

Commonsense in Parts: Mining Part-Whole Relations from the Web and Image Tags

TLDR
A new method for automatically acquiring part-whole commonsense from Web contents and image tags at an unprecedented scale, yielding many millions of assertions, while specifically addressing the four shortcomings of prior work.

Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases

Equipping machines with comprehensive knowledge of the world's entities and their relationships has been a long-standing goal of AI. Over the last decade, large-scale knowledge bases, also known as

Advanced Semantics for Commonsense Knowledge Extraction

TLDR
This paper presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works.

Commonsense Properties from Query Logs and Question Answering Forums

TLDR
Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources that focuses on salient properties that are typically associated with certain objects or concepts, is presented.

Domain specific commonsense relation extraction from bag of concepts metadata

TLDR
This paper proposes an approach to collect commonsense relations for specific domains by mining knowledge of global structure and internal association in the bag of concepts from the metadata of data collections, and extracts commonsense relations of concepts from the social tags of image datasets to show the efficiency of the solution.

Mining Verb-Oriented Commonsense Knowledge

TLDR
This paper proposes a knowledge-driven approach to mine verb-oriented commonsense knowledge from verb phrases with the help of a taxonomy. It designs an entropy-based filter to cope with noisy input verb phrases and proposes a joint model, based on minimum description length and a neural language model, to generate verb-oriented commonsense knowledge.

References

Deriving a Web-Scale Common Sense Fact Database

TLDR
This paper shows how to gather large amounts of common sense facts from Web n-gram data, using seeds from the ConceptNet collection, and shows that this approach extends ConceptNet by many orders of magnitude at comparable levels of precision.

WebSets: extracting sets of entities from the web using unsupervised information extraction

TLDR
This work describes an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus that relies on a novel approach for clustering terms found in HTML tables, and then assigning concept names to these clusters using Hearst patterns.

Yago: a core of semantic knowledge

TLDR
YAGO builds on entities and relations and currently contains more than 1 million entities and 5 million facts, which includes the Is-A hierarchy as well as non-taxonomic relations between entities (such as hasWonPrize).

Identifying Relations for Open Information Extraction

TLDR
Two simple syntactic and lexical constraints on binary relations expressed by verbs are introduced in the ReVerb Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOE^pos.

ConceptNet — A Practical Commonsense Reasoning Tool-Kit

ConceptNet is a freely available commonsense knowledge base and natural-language-processing tool-kit which supports many practical textual-reasoning tasks over real-world documents including

ClausIE: clause-based open information extraction

TLDR
ClausIE is a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text using a small set of domain-independent lexica, operates sentence by sentence without any post-processing, and requires no training data.

An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet

TLDR
This paper presents an adaptation of Lesk's dictionary-based word sense disambiguation algorithm that uses the lexical database WordNet as the source of glosses for this approach, and attains an overall accuracy of 32%.

Commonsense Knowledge Extraction Using Concepts Properties

TLDR
This paper presents a semantically grounded method for extracting commonsense knowledge that is able to extract thousands of commonsense facts with little human interaction and high accuracy.

Bootstrapping a Game with a Purpose for Commonsense Collection

TLDR
An architecture that combines the best of both worlds: a game with a purpose induces humans to clean up data automatically extracted by text mining, and bootstrapping (i.e., training the text miner on the output of the game) improves the subsequent performance of the text miner.

Good, Great, Excellent: Global Inference of Semantic Intensities

TLDR
A primarily unsupervised approach that uses semantics from Web-scale data to rank words by assigning them positions on a continuous scale, which achieves substantial improvements over previous work on both pairwise and rank correlation metrics.