Probabilistic Databases

@inproceedings{Suciu2011ProbabilisticD,
  title={Probabilistic Databases},
  author={Dan Suciu and Dan Olteanu and Christopher R{\'e} and Christoph E. Koch},
  booktitle={Probabilistic Databases},
  year={2011}
}
Probabilistic databases are databases where the value of some attributes or the presence of some records are uncertain and known only with some probability. Applications in many areas such as information extraction, RFID and scientific data management, data cleaning, data integration, and financial risk assessment produce large volumes of uncertain data, which are best modeled and processed by a probabilistic database. This book presents the state of the art in representation formalisms and… 
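To make the possible-worlds reading of the abstract concrete, here is a minimal sketch. It assumes the simplest model, a tuple-independent table in which each record is present independently with a given marginal probability. The table `R`, the query `q`, and the helper names are hypothetical. A Boolean query's probability is the total probability of the worlds in which it holds. Enumerating worlds is exponential in the number of tuples, so this only illustrates the semantics and is not an efficient evaluation method.

```python
from itertools import product

# Hypothetical tuple-independent table: (tuple, marginal probability).
# Assumption: each tuple is present independently of all others.
R = [(("a", 1), 0.9), (("b", 2), 0.5), (("c", 3), 0.2)]

def possible_worlds(table):
    """Yield (world, probability) for every subset of the tuples."""
    for mask in product([False, True], repeat=len(table)):
        world, prob = [], 1.0
        for present, (tup, p) in zip(mask, table):
            if present:
                world.append(tup)
                prob *= p
            else:
                prob *= 1.0 - p
        yield world, prob

def query_probability(table, query):
    """P(query) = sum of the probabilities of the worlds where the query is true."""
    return sum(prob for world, prob in possible_worlds(table) if query(world))

# Boolean query: "is there a tuple whose value is at least 2?"
q = lambda world: any(v >= 2 for _, v in world)
print(round(query_probability(R, q), 4))  # prints 0.6
```

The answer can be checked by hand: the query fails only when both `("b", 2)` and `("c", 3)` are absent, which happens with probability 0.5 × 0.8 = 0.4.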
Query processing techniques in probabilistic databases
TLDR
This paper discusses various queries on probabilistic databases, including joins, top-k, skyline, and aggregates, and outlines the most popular probabilistic database systems: MayBMS and Trio.
Scaling Probabilistic Databases
TLDR
This project aims to fill the prevalent gap between the two fields of Databases and Machine Learning by scaling probabilistic databases to a distributed setting, which is a topic that so far has not been addressed in the literature.
Local structure and determinism in probabilistic databases
TLDR
This paper develops a novel approach for efficiently evaluating probabilistic queries over correlated databases where correlations are represented using a factor graph, a class of graphical models widely used for capturing correlations and performing statistical inference.
Normal Forms and Normalization for Probabilistic Databases under Sharp Constraints
TLDR
It is shown that well-known syntactic normal form conditions capture probabilistic databases with desirable update behavior, and that standard normalization procedures can be applied to standard representations of probabilistic databases to obtain database schemata that satisfy the normal form condition and can be updated efficiently.
Efficient Updates of Uncertain Databases
TLDR
The goal of this work is to begin investigating the update problem for U-relations, to tackle the evaluation of queries with anti-joins over this formalism, and to introduce an extension of the U-relation representation system that may lead to an exponential decrease in the size of the representation of an updated uncertain database.
A probabilistic relational database model and algebra
TLDR
A probabilistic relational database model, called PRDB, is introduced for representing and querying uncertain information about objects in practice, and a set of properties of the probabilistic relational algebraic operations in PRDB is also formulated and proven.
Ranking Query Answers in Probabilistic Databases: Complexity and Efficient Algorithms
TLDR
This paper investigates the problem of ranking query answers in probabilistic databases and gives a dichotomy for ranking in the case of conjunctive queries without repeating relation symbols: it is either in polynomial time or NP-hard. Surprisingly, the syntactic characterisation of tractable queries is not the same as for probability computation.
On the Connections between Relational and XML Probabilistic Data Models
TLDR
Translations between relational and XML models, based on the notion of compact representations of probability distributions over possible worlds, are detailed in this article, and interesting open issues about the connections are presented.
Conditioning Probabilistic Relational Data with Referential Constraints
TLDR
This paper devises and presents polynomial algorithms for conditioning probabilistic relational databases with referential constraints, in which the formulae of tuples are independent events, in order to achieve some tractability results.
Most Probable Explanations for Probabilistic Database Queries
TLDR
This work investigates problems relative to a variety of query languages, ranging from conjunctive queries to ontology-mediated queries, and provides a detailed complexity analysis.

References

SHOWING 1-10 OF 152 REFERENCES
Representing and Querying Correlated Tuples in Probabilistic Databases
  • P. Sen, A. Deshpande
  • Computer Science
    2007 IEEE 23rd International Conference on Data Engineering
  • 2007
TLDR
This work develops an efficient strategy for query evaluation over probabilistic databases by casting the query processing problem as an inference problem in an appropriately constructed probabilistic graphical model, and presents several optimizations specific to probabilistic databases that enable efficient query evaluation.
Read-once functions and query evaluation in probabilistic databases
TLDR
This paper develops novel, more efficient factorization algorithms that directly construct the read-once expression for a result tuple Boolean formula (if one exists), for a large subclass of queries (specifically, conjunctive queries without self-joins).
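Why a read-once expression helps can be sketched briefly. If every tuple variable occurs exactly once and tuples are independent, an AND combines independent events, so its probability is the product of the child probabilities. An OR has probability one minus the product of the complements. The sketch below only *evaluates* a read-once expression that is already given; it does not implement the paper's factorization algorithm. The nested-tuple formula encoding and all function names are hypothetical.

```python
from itertools import product

# Hypothetical formula encoding: a variable name (str), or
# ("and", f1, f2, ...) / ("or", f1, f2, ...).

def prob_read_once(f, p):
    """Probability of f; correct only if every variable occurs at most once in f."""
    if isinstance(f, str):
        return p[f]
    op, *children = f
    child_probs = [prob_read_once(c, p) for c in children]
    if op == "and":  # independent conjuncts: multiply
        result = 1.0
        for q in child_probs:
            result *= q
        return result
    if op == "or":  # independent disjuncts: 1 - product of complements
        miss = 1.0
        for q in child_probs:
            miss *= 1.0 - q
        return 1.0 - miss
    raise ValueError(f"unknown operator {op!r}")

def evaluate(f, assignment):
    """Truth value of f under a variable -> bool assignment."""
    if isinstance(f, str):
        return assignment[f]
    op, *children = f
    values = [evaluate(c, assignment) for c in children]
    return all(values) if op == "and" else any(values)

def prob_brute_force(f, p):
    """Exact probability by enumerating all assignments (exponential)."""
    names = sorted(p)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        a = dict(zip(names, bits))
        w = 1.0
        for n in names:
            w *= p[n] if a[n] else 1.0 - p[n]
        if evaluate(f, a):
            total += w
    return total

# Lineage of a join answer: (r1 AND s1) OR (r1 AND s2); r1 occurs twice.
p = {"r1": 0.5, "s1": 0.5, "s2": 0.5}
dnf = ("or", ("and", "r1", "s1"), ("and", "r1", "s2"))
factored = ("and", "r1", ("or", "s1", "s2"))  # equivalent read-once form

print(prob_read_once(factored, p))  # 0.375
print(prob_brute_force(dnf, p))     # 0.375
print(prob_read_once(dnf, p))       # 0.4375 (wrong: r1 is not read-once here)
```

The last line shows why the factorization matters. Applying the independence rules to the unfactored formula double-counts `r1` and overestimates the probability.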
Indexing correlated probabilistic databases
TLDR
This paper develops efficient data structures and indexes for supporting inference and decision support queries over large-scale, correlated databases, and presents a comprehensive experimental study illustrating the benefits of the approach to query processing in probabilistic databases.
PrDB: managing and exploiting rich correlations in probabilistic databases
TLDR
This work defines a probabilistic database model, PrDB, that uses graphical models, a state-of-the-art probabilistic modeling technique developed within the statistics and machine learning communities, to model uncertain data, and shows how the use of shared correlations, together with a novel inference algorithm based on bisimulation, can speed up query processing significantly.
Implementing NOT EXISTS Predicates over a Probabilistic Database
TLDR
This paper presents an approach for supporting queries with NOT EXISTS in a probabilistic database management system, by leveraging the existing query processing infrastructure, and describes how this technique was integrated with MystiQ, and how it incorporated the top-k multi-simulation and safe-plans optimizations.
Efficient Top-k Query Evaluation on Probabilistic Data
TLDR
This paper describes a novel approach that efficiently computes and ranks the top-k answers to a SQL query on a probabilistic database: it runs several Monte-Carlo simulations in parallel, one for each candidate answer, and approximates each probability only to the extent needed to compute the top-k answers correctly.
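The following is a simplified sketch in the spirit of this idea, not the paper's algorithm. Each candidate answer has a DNF lineage over independent tuples, and its probability is estimated by Monte Carlo sampling. After each round, only candidates whose confidence intervals still cross the top-k boundary are simulated further. The interval is a Hoeffding-style bound with a union bound over candidates. It is not corrected for repeated looks across rounds, so the stopping rule is heuristic. All names and parameters (`batch`, `delta`, `max_rounds`) are hypothetical.

```python
import math
import random

def sample_lineage(dnf, p, rng):
    """One trial: draw every tuple independently, then test the DNF lineage."""
    world = {t: rng.random() < pt for t, pt in p.items()}
    return any(all(world[t] for t in clause) for clause in dnf)

def top_k_multisim(candidates, p, k, delta=0.05, batch=200, max_rounds=100, seed=0):
    """candidates: name -> DNF lineage (list of clauses, each a list of tuple ids).
    p: tuple id -> marginal probability (tuples assumed independent)."""
    rng = random.Random(seed)
    hits = {c: 0 for c in candidates}
    trials = {c: 0 for c in candidates}
    log_term = math.log(2 * len(candidates) / delta)

    def bounds(c):
        # Hoeffding-style interval around the current estimate.
        est = hits[c] / trials[c]
        half = math.sqrt(log_term / (2 * trials[c]))
        return est - half, est + half

    to_refine = sorted(candidates)  # round one simulates every candidate
    for _ in range(max_rounds):
        for c in to_refine:
            for _ in range(batch):
                hits[c] += sample_lineage(candidates[c], p, rng)
            trials[c] += batch
        ranked = sorted(candidates, key=lambda c: hits[c] / trials[c], reverse=True)
        top, rest = ranked[:k], ranked[k:]
        if not rest:
            return top
        low_top = min(bounds(c)[0] for c in top)
        high_rest = max(bounds(c)[1] for c in rest)
        if low_top > high_rest:
            return top  # intervals are separated: stop simulating
        # Refine only the candidates whose intervals overlap the boundary.
        to_refine = [c for c in top if bounds(c)[0] <= high_rest] + \
                    [c for c in rest if bounds(c)[1] >= low_top]
    return top

p = {"t1": 0.9, "t2": 0.6, "t3": 0.1, "t4": 0.5}
candidates = {"A": [["t1"]], "B": [["t2"]], "C": [["t3"]], "D": [["t3", "t4"]]}
print(top_k_multisim(candidates, p, k=2))
```

Candidates far from the boundary, such as `C` and `D`, stop consuming samples early. This is the main saving over estimating every probability to the same precision.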
Creating probabilistic databases from duplicated data
TLDR
A flexible modular framework for scalably creating a probabilistic database from a dirty relation of duplicated data is presented, and the challenges raised in applying this framework to large relations of string data are outlined.
Semantics and evaluation of top-k queries in probabilistic databases
  • Xi Zhang, Jan Chomicki
  • Computer Science
    2008 IEEE 24th International Conference on Data Engineering Workshop
  • 2008
TLDR
Three intuitive postulates for the semantics of top-k queries in probabilistic databases are formulated, and a new semantics, Global-Topk, is introduced that satisfies those postulates to a large degree.
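A small sketch of how such a semantics can be computed follows. It assumes that Global-Topk returns the k tuples with the highest probability of appearing in the top-k answer of a possible world. It also assumes independent tuples with distinct scores; ties are not handled. A tuple is in a world's top-k exactly when it is present and fewer than k higher-scored tuples are present. A single pass in score order that tracks the distribution of that count therefore suffices. The function names and the example data are hypothetical.

```python
def topk_membership_probs(tuples, k):
    """tuples: list of (name, score, prob); independent tuples, distinct scores.
    Returns name -> P(tuple is among the top-k of a random possible world)."""
    ordered = sorted(tuples, key=lambda t: t[1], reverse=True)
    # dist[j] = P(exactly j higher-scored tuples are present), for j < k;
    # mass for counts >= k is dropped because those worlds exclude later tuples.
    dist = [1.0] + [0.0] * (k - 1)
    result = {}
    for name, _, prob in ordered:
        result[name] = prob * sum(dist)  # present AND fewer than k above it
        new = [0.0] * k
        for j, q in enumerate(dist):
            new[j] += q * (1.0 - prob)  # tuple absent: count unchanged
            if j + 1 < k:
                new[j + 1] += q * prob  # tuple present: count grows
        dist = new
    return result

def global_topk(tuples, k):
    """The k tuples with the highest top-k membership probability."""
    probs = topk_membership_probs(tuples, k)
    return sorted(probs, key=probs.get, reverse=True)[:k]

tuples = [("t1", 100, 0.3), ("t2", 90, 0.9), ("t3", 80, 0.8), ("t4", 70, 0.95)]
print(global_topk(tuples, 2))  # ['t2', 't3']
```

In this example the highest-scored tuple `t1` is left out because it is unlikely to exist (membership probability 0.3). `t3` has membership probability 0.8 × 0.73 = 0.584.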
A probabilistic relational algebra for the integration of information retrieval and database systems
TLDR
The concept of vague predicates, which yield probabilistic weights instead of Boolean values, is introduced, thus allowing for queries with vague selection conditions and implementing uncertainty and vagueness in combination with the relational model.
Exploiting shared correlations in probabilistic databases
TLDR
This work shows how data characteristics can be leveraged to make the query evaluation process more efficient, and introduces a new data structure, called the random variable elimination graph (rv-elim graph) that can be built from the PGM obtained from query evaluation.