FlashRelate: extracting relational data from semi-structured spreadsheets using examples

@article{Barowy2015FlashRelateER,
  title={FlashRelate: extracting relational data from semi-structured spreadsheets using examples},
  author={Daniel W. Barowy and Sumit Gulwani and Ted Hart and Benjamin G. Zorn},
  journal={Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation},
  year={2015}
}
  • Daniel W. Barowy, Sumit Gulwani, Ted Hart, Benjamin G. Zorn
  • Published 3 June 2015
  • Computer Science
  • Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation
With hundreds of millions of users, spreadsheets are one of the most important end-user applications. Spreadsheets are easy to use and allow users great flexibility in storing data. This flexibility comes at a price: users often treat spreadsheets as a poor man's database, leading to creative solutions for storing high-dimensional data. The trouble arises when users need to answer queries with their data. Data manipulation tools make strong assumptions about data layouts and cannot read these…
Metadata Extraction for Low-Quality Semi-structured Spreadsheets
TLDR
This work produces an automated relation extraction tool that lets ordinary users without previous experience extract structured relational tables from spreadsheets, and introduces a framework that automatically extracts relational data (tables) from spreadsheets and converts low-quality data into high-quality data.
NOAH: Interactive Spreadsheet Exploration with Dynamic Hierarchical Overviews
TLDR
The user studies demonstrate that NOAH makes it more intuitive, easier, and faster to navigate spreadsheet data compared to traditional spreadsheets like Microsoft Excel and spreadsheet plug-ins like Pivot Table, for a variety of exploration tasks.
FIDEX: filtering spreadsheet data using examples
TLDR
This work presents a system, FIDEX, that can efficiently learn desired data filtering expressions from a small set of positive and negative string examples, and designs an expressive DSL to represent disjunctive filter expressions needed for several real-world data filtering tasks.
Query processing of schema design problems for data-driven renormalization
TLDR
This thesis formally defines two kinds of queries (the point query and the stable interval query) to help users make design decisions, and proposes two index structures that not only represent a list of FDs concisely but also process the queries efficiently.
Rule-based spreadsheet data transformation from arbitrary to relational tables
TLDR
A novel table object model and a rule-based language for table analysis and interpretation are presented; the model is intended to represent the physical and logical structure of an arbitrary table in the transformation process.
Efficiently Transforming Tables for Joinability
TLDR
This work studies the problem of efficiently joining textual data under the condition that the join columns are not formatted the same and cannot be equi-joined, but they become joinable under some transformations, and shows that an efficient algorithm can be developed based on the common characteristics of the joined columns.
TabbyXL: Rule-Based Spreadsheet Data Extraction and Transformation
TLDR
A table object model and domain-specific language of table analysis and interpretation rules are determined and a tool for transforming spreadsheet data from arbitrary to relational tables is considered.
Foofah: Transforming Data By Example
TLDR
This paper develops a technique to synthesize data transformation programs by example, reducing this burden by allowing the analyst to describe the transformation with a small input-output example pair, without being concerned with the transformation steps required to get there.
BlinkFill: Semi-supervised Programming By Example for Syntactic String Transformations
TLDR
A data structure, InputDataGraph, is developed to succinctly represent a large set of logical patterns that are shared across the input data, and is used to efficiently learn substring expressions in a new PBE system, BlinkFill.
Example-Driven User Intent Discovery: Empowering Users to Cross the SQL Barrier Through Query by Example
TLDR
It is found that SQUID eliminates the barriers in studying the database schema, formalizing task semantics, and writing syntactically correct SQL queries, and thus, substantially alleviates the need for technical expertise in data exploration.

References

SHOWING 1-10 OF 52 REFERENCES
Automatic web spreadsheet data extraction
TLDR
A system that automatically extracts relational data from spreadsheets, thereby enabling relational spreadsheet integration and a novel view of how users organize their data in spreadsheets is presented.
Senbazuru: A Prototype Spreadsheet Database Management System
TLDR
It is demonstrated that Senbazuru, a prototype spreadsheet database management system (SSDBMS), is able to extract relational information from spreadsheets, which opens up opportunities for integration among spreadsheets and with other relational sources.
FlashExtract: a framework for data extraction by examples
TLDR
This work presents a general framework FlashExtract to extract relevant data from semi-structured documents using examples, and describes instantiation of the framework to three different domains: text files, webpages, and spreadsheets.
Quicksilver: Automatic Synthesis of Relational Queries
TLDR
This paper presents Quicksilver, a programming-by-demonstration solution that derives queries from user inputs and is designed to be easy and intuitive for users who are not familiar with database theory.
Spreadsheet data manipulation using examples
TLDR
This work presents a programming by example methodology that allows end users to automate such repetitive tasks over large spreadsheet data by designing a domain-specific language and developing a synthesis algorithm that can learn programs in that language from user-provided examples.
Spreadsheet table transformations from examples
TLDR
An automatic technique that takes from a user an example of how the user needs to transform a table of data, and provides to the user a program that implements the transformation described by the example, and presents a language of programs TableProg that can describe transformations that real users require.
Automating string processing in spreadsheets using input-output examples
TLDR
The design of a string programming/expression language that supports restricted forms of regular expressions, conditionals and loops is described, along with an algorithm, based on several novel concepts, for synthesizing a desired program in this language from input-output examples.
From spreadsheets to relational databases and back
TLDR
This paper presents techniques and tools to transform spreadsheets into relational databases and back, implements the data refinement rules, and constructs Haskell-based tools to manipulate, optimize and refactor Excel-like spreadsheets.
RoadRunner: automatic data extraction from data-intensive web sites
TLDR
The main intuition is that, in a data-intensive web site, pages can be classified in a small number of classes, such that pages belonging to the same class share a rather tight structure, and a novel technique is studied, which automatically generates a common wrapper by exploiting similarities and differences among pages of the same class.
Generating Finite-State Transducers for Semi-Structured Data Extraction from the Web
TLDR
This paper presents SoftMealy, a novel wrapper representation formalism based on a finite-state transducer and contextual rules that can wrap a wide range of semistructured Web pages because FSTs can encode each different attribute permutation as a path.