Scalable test data generation from multidimensional models

@inproceedings{Torlak2012ScalableTD,
  title={Scalable test data generation from multidimensional models},
  author={Emina Torlak},
  booktitle={SIGSOFT FSE},
  year={2012}
}
  • E. Torlak
  • Published in SIGSOFT FSE, 11 November 2012
  • Computer Science
Multidimensional data models form the core of modern decision support software. The need for this kind of software is significant, and it continues to grow with the size and variety of datasets being collected today. Yet real multidimensional instances are often unavailable for testing and benchmarking, and existing data generators can only produce a limited class of such structures. In this paper, we present a new framework for scalable generation of test data from a rich class of… 
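Frameworks in this space treat generation as finding instances that satisfy the constraints of a multidimensional model. As a loose illustration of that general idea (a sketch only, not the paper's actual encoding or engine), the following uses the Z3 SMT solver to produce a tiny instance of a hypothetical Store → City → Region rollup hierarchy:

```python
# Loose illustration only: ask an SMT solver for any instance of a tiny
# multidimensional model -- a Store -> City -> Region rollup hierarchy --
# that satisfies the model's constraints. Names and constraints are
# hypothetical, not from the paper.
from z3 import Int, Solver, And, Or, sat  # pip install z3-solver

N_STORES, N_CITIES, N_REGIONS = 4, 3, 2

# city_of[s] maps store s to its city; region_of[c] maps city c to its region.
city_of = [Int(f"city_of_{i}") for i in range(N_STORES)]
region_of = [Int(f"region_of_{i}") for i in range(N_CITIES)]

solver = Solver()
for v in city_of:
    solver.add(And(0 <= v, v < N_CITIES))
for v in region_of:
    solver.add(And(0 <= v, v < N_REGIONS))

# Example model constraints: no level of the hierarchy may be empty.
for c in range(N_CITIES):
    solver.add(Or([v == c for v in city_of]))
for r in range(N_REGIONS):
    solver.add(Or([v == r for v in region_of]))

if solver.check() == sat:
    m = solver.model()
    print("store -> city: ", [m[v].as_long() for v in city_of])
    print("city -> region:", [m[v].as_long() for v in region_of])
```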

Citations

Practical Model-driven Data Generation for System Testing
TLDR
This work presents a novel approach that combines metaheuristic search with Satisfiability Modulo Theories (SMT) constraint solving, and indicates that it offers substantial benefits over the state of the art in applicability and scalability.
Applying combinatorial test data generation to big data applications
TLDR
The experience shows that combinatorial testing can be effectively applied to big data applications: the test data sets created with this approach for the two ETL applications are only a small fraction of the original data source, yet detected all the faults found with the original data source.
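Combinatorial test data generation typically means covering every pairwise (or, more generally, t-way) combination of input-parameter values with far fewer rows than the full cross product. A minimal greedy pairwise sketch over hypothetical ETL input columns (illustrative only, not the paper's tooling):

```python
# Greedy pairwise-coverage sketch: pick a small set of test rows so that every
# pair of values from any two columns appears together in at least one row.
from itertools import combinations, product

params = {                     # hypothetical ETL input columns
    "format":   ["csv", "json", "avro"],
    "encoding": ["utf8", "latin1"],
    "size":     ["empty", "small", "huge"],
}
cols = list(params)

def pairs_of(row):             # row is a dict: column -> value
    return {((a, row[a]), (b, row[b])) for a, b in combinations(cols, 2)}

uncovered = set().union(*(
    pairs_of(dict(zip(cols, vals))) for vals in product(*params.values())
))

rows = []
while uncovered:
    # Choose the candidate row covering the most still-uncovered pairs.
    best = max(
        (dict(zip(cols, vals)) for vals in product(*params.values())),
        key=lambda row: len(pairs_of(row) & uncovered),
    )
    uncovered -= pairs_of(best)
    rows.append(best)

print(len(rows), "rows instead of", 3 * 2 * 3)   # typically 9 instead of 18
for r in rows:
    print(r)
```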
Just can't get enough: Synthesizing Big Data
TLDR
This work presents an automatic approach to synthesizing data from existing data sources, enabling fully automatic generation of large amounts of complex, realistic, synthetic data.
Issues in big data testing and benchmarking
TLDR
Describes initial solutions and challenges with respect to big data generation: methods for creating realistic, privacy-aware, and arbitrarily scalable data sets, workloads, and benchmarks from real-world data.
Reversing statistics for scalable test databases generation
TLDR
RSGen is proposed, an approach to generating datasets from customer metadata (schema, integrity constraints, and statistics) that enables generation of data closely matching the customer environment while remaining fast, scalable, and extensible.
Automated Synthesis and Dynamic Analysis of Tradeoff Spaces for Object-Relational Mapping
TLDR
An approach is proposed to the problem of producing software systems that achieve acceptable tradeoffs among multiple non-functional properties; it combines the synthesis of spaces of design alternatives from logical specifications with dynamic analysis of each point in the resulting spaces.
Toward tractable instantiation of conceptual data models using non-semantics-preserving model transformations
TLDR
This work extends the set of ORM models that can be transformed to ORM− models by using a class of non-semantics-preserving transformations called constraint strengthening, and formalizes the approach as a special case of Stevens’ model transformation framework.
Touchstone: Generating Enormous Query-Aware Test Databases
TLDR
Presents the design and implementation of a new data generator, Touchstone, which adopts a random sampling algorithm to instantiate query parameters and a new data generation schema to generate the test database, achieving fully parallel data generation, linear scalability, and austere memory consumption.
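The query-aware idea is easy to see in miniature: given generated data and a target result cardinality, a range-predicate parameter can be instantiated from an order statistic of the column. A toy sketch under those assumptions (one numeric column, a one-sided predicate; not Touchstone's actual algorithm):

```python
# Toy query-aware parameter instantiation: choose v so that
#   SELECT * FROM t WHERE price <= v
# returns (about) the requested number of rows over the generated column.
import random

def instantiate_le_param(column, target_rows):
    """Return v such that `x <= v` holds for ~target_rows values in column."""
    assert 1 <= target_rows <= len(column)
    return sorted(column)[target_rows - 1]    # k-th order statistic

prices = [round(random.uniform(1, 100), 2) for _ in range(10_000)]
v = instantiate_le_param(prices, target_rows=250)
print(v, sum(p <= v for p in prices))         # ~250 (ties can add a few rows)
```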
Is this Real?: Generating Synthetic Data that Looks Real
TLDR
This evaluation of Synner demonstrates its effectiveness at generating realistic data when compared with Mockaroo, a popular data generation tool, and with hired developers who coded data generation scripts for a fee.
An Experimental Approach and Monitoring Tools for Evaluating a Dynamic Cubing System
TLDR
The authors propose an approach and supporting tools to evaluate the performance and assess the effectiveness of a dynamic cubing model, and identify the tools needed to develop an experimental evaluation strategy.

References

SHOWING 1-10 OF 49 REFERENCES
Scalable analysis of conceptual data models
TLDR
This study extends ORM− with support for two further ORM constructs, objectification and a restricted class of external uniqueness constraints, which significantly improve the ability to analyze the ORM models that developers create with the new tool.
Scalable satisfiability checking and test data generation from modeling diagrams
TLDR
This work defines a restricted subset of ORM that allows efficient reasoning yet contains the constraints overwhelmingly used in practice, shows that deciding whether these constraints are consistent is solvable in polynomial time, and produces a highly efficient checker.
Data generation using declarative constraints
TLDR
It is argued that a natural, expressive, and declarative mechanism for specifying data characteristics is through cardinality constraints; a cardinality constraint specifies that the output of a query over the generated database have a certain cardinality.
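As a concrete (hypothetical) example, a cardinality constraint might require that SELECT * FROM orders WHERE status = 'open' return exactly 250 of 1,000 generated rows; a naive generator can satisfy such a constraint directly:

```python
# Hypothetical cardinality constraint: the query
#   SELECT * FROM orders WHERE status = 'open'
# over the generated table must return exactly 250 of 1,000 rows.
import random

N_ROWS, N_OPEN = 1000, 250

statuses = ["open"] * N_OPEN + [
    random.choice(["shipped", "cancelled"]) for _ in range(N_ROWS - N_OPEN)
]
random.shuffle(statuses)
orders = [{"id": i, "status": s} for i, s in enumerate(statuses)]

assert sum(r["status"] == "open" for r in orders) == N_OPEN
```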
Generating consistent test data: Restricting the search space by a generator formula
TLDR
A new two-step approach is proposed that operationalizes the generator formula by translating it into a sequence of operators and then executing that sequence to construct the test database; it introduces two powerful operators, the generation operator and the test-and-repair operator.
A foundation for capturing and querying complex multidimensional data
Constraint-based test database generation for SQL queries
TLDR
An approach is proposed for automatically generating a test database for a set of SQL queries using a test criterion specifically tailored to the SQL language (SQLFpc), producing a test database of reduced size with high coverage and mutation score.
A survey on summarizability issues in multidimensional modeling
Capturing summarizability with integrity constraints in OLAP
TLDR
A sound and complete algorithm is given for solving the implication problem for dimension constraints, using heuristics based on the structure of the dimension and of the constraints to speed up execution.
Quickly generating billion-record synthetic databases
TLDR
Presents several database generation techniques for building billion-record SQL databases with C programs running on a shared-nothing computer system of a hundred processors and a thousand discs.
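One well-known trick from this line of work is to emit the keys 1..N exactly once each in scrambled order without materializing a permutation: step through the multiplicative group modulo a prime p > N and skip values above N. A sketch with small illustrative constants:

```python
# Emit 1..n exactly once each, in pseudo-random order, with O(1) state:
# iterate x -> x * g (mod p) where p is prime, p > n, and g generates the
# multiplicative group mod p, skipping values above n. Constants below are
# illustrative: 1009 is prime and 11 is a generator of its group.
def scrambled_keys(n, p=1009, g=11):
    x = g
    for _ in range(p - 1):        # the group has exactly p - 1 elements
        if x <= n:
            yield x
        x = (x * g) % p

keys = list(scrambled_keys(1000))
assert sorted(keys) == list(range(1, 1001))   # every key once, scrambled order
```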
Flexible Database Generators
TLDR
This paper presents a flexible, easy-to-use, and scalable framework for database generation, discusses how to map several proposed synthetic distributions to this framework, and reports preliminary results.