Distilling Task Knowledge from How-To Communities

  title={Distilling Task Knowledge from How-To Communities},
  author={Cuong Xuan Chu and Niket Tandon and Gerhard Weikum},
  journal={Proceedings of the 26th International Conference on World Wide Web},
  • C. Chu, Niket Tandon, G. Weikum
  • Published 3 April 2017
  • Computer Science
  • Proceedings of the 26th International Conference on World Wide Web
Knowledge graphs have become a fundamental asset for search engines. A fair amount of user queries seek information on problem-solving tasks such as building a fence or repairing a bicycle. However, knowledge graphs completely lack this kind of how-to knowledge. This paper presents a method for automatically constructing a formal knowledge base on tasks and task-solving steps, by tapping the contents of online communities such as WikiHow. We employ Open-IE techniques to extract noisy candidates… 


Task2KB: A Public Task-Oriented Knowledge Base

Task2KB is a novel knowledge base constructed from data crawled from WikiHow, an online knowledge resource offering instructional articles on a wide range of tasks; it encapsulates various types of task-related information and attributes.

Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases

Equipping machines with comprehensive knowledge of the world's entities and their relationships has been a long-standing goal of AI. Over the last decade, large-scale knowledge bases, also known as

Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data

This work develops a simple and efficient method that links steps in an article to other articles with similar goals, recursively constructing an open-domain hierarchical knowledge-base of procedures based on wikiHow, a website containing more than 110k instructional articles.

Know-How in Programming Tasks: From Textual Tutorials to Task-Oriented Knowledge Graph

The resulting knowledge graph, TaskKG, includes a hierarchical taxonomy of activities, three types of activity relationships, and five types of activity attributes. It enables activity-centric knowledge search and is promising in helping developers find correct answers to programming how-to questions.

Information to Wisdom: Commonsense Knowledge Extraction and Compilation

This tutorial presents state-of-the-art methodologies towards the compilation and consolidation of commonsense knowledge (CSK), covering text-extraction-based, multi-modal and Transformer-based techniques, with special focus on the issues of web search and ranking, as of relevance to the WSDM community.

What Computers Should Know, Shouldn't Know, and Shouldn't Believe

Automatically constructed knowledge bases are a powerful asset for search, analytics, recommendations and data integration, with intensive use at big industrial stakeholders, forming the Web of Linked Open Data.

Reasoning about Goals, Steps, and Temporal Ordering with WikiHow

This work proposes a suite of reasoning tasks on two types of relations between procedural events: goal-step relations and step-step temporal relations, and introduces a dataset targeting these two relations based on wikiHow, a website of instructional how-to articles.

Procedural Knowledge Mining - A New Method for Extracting Best Practices by Applying Machine Learning on Data Graph

This work presents a new method for formalizing good practices extracted from the web, and extracting the best practice for a given request by applying the techniques of artificial learning and text summary on data graphs.

MyFixit: An Annotated Dataset, Annotation Tool, and Baseline Methods for Information Extraction from Repair Manuals

This paper introduces a semi-structured dataset of repair manuals and proposes methods that can serve as baselines for information extraction (IE) from the instructional text in repair manuals, including an unsupervised method based on a bags-of-n-grams similarity for extracting the needed tools in each repair step, and a deep-learning-based sequence labeling model for extracting the identity of disassembled parts.



Leveraging Procedural Knowledge for Task-oriented Search

Textual and structural features are proposed to identify key search phrases from task descriptions; similar features are then adapted to extract wikiHow-style procedural knowledge descriptions from search queries and relevant text snippets.

Acquiring Comparative Commonsense Knowledge from the Web

This paper relies on open information extraction methods to obtain large amounts of comparisons from the Web and develops a joint optimization model for cleaning and disambiguating this knowledge with respect to WordNet, which relies on integer linear programming and semantic coherence scores.

Open Information Extraction: The Second Generation

The second generation of Open IE systems is described; these systems rely on a novel model of how relations and their arguments are expressed in English sentences to double precision/recall compared with previous systems such as TEXTRUNNER and WOE.

Open Language Learning for Information Extraction

Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary

Knowlywood: Mining Activity Knowledge From Hollywood Narratives

A pipeline for semantic parsing and knowledge distillation is developed, to systematically compile semantically refined activity frames, mined from about two million scenes of movies, TV series, and novels.

Creating Causal Embeddings for Question Answering with Minimal Supervision

This work argues that a better approach is to look for answers that are related to the question in a relevant way, according to the information need of the question, which may be determined through task-specific embeddings, and implements causality as a use case.

Cross Sentence Inference for Process Knowledge

This work extends standard within sentence joint inference to inference across multiple sentences, which promotes role assignments that are compatible across different descriptions of the same process.

POLY: Mining Relational Paraphrases from Multilingual Sentences

A new method for building language resources that systematically organize paraphrases for binary relations is presented, together with the resource itself, called POLY, which shows significant improvements in precision and recall over the prior work on PATTY and DEFIE.

PATTY: A Taxonomy of Relational Patterns with Semantic Types

PATTY is a large resource for textual patterns that denote binary relations between entities that are semantically typed and organized into a subsumption taxonomy that harnesses the rich type system and entity population of large knowledge bases.

Question-Answer Driven Semantic Role Labeling: Using Natural Language to Annotate Natural Language

The results show that non-expert annotators can produce high-quality QA-SRL data and establish baseline performance levels for future work on this task. The work also introduces simple classifier-based models for predicting which questions to ask and what their answers should be.