PCPs and the Hardness of Generating Private Synthetic Data

@inproceedings{Ullman2011PCPsAT,
  title={PCPs and the Hardness of Generating Private Synthetic Data},
  author={Jonathan Ullman and Salil P. Vadhan},
  booktitle={TCC},
  year={2011}
}
Assuming the existence of one-way functions, we show that there is no polynomial-time, differentially private algorithm A that takes a database D ∈ ({0,1}^d)^n and outputs a "synthetic database" D̂ all of whose two-way marginals are approximately equal to those of D. (A two-way marginal is the fraction of database rows x ∈ {0,1}^d with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS '07), who gave an algorithm running in time poly(n, 2^d). Our…
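
To make the definition of a two-way marginal concrete, here is a minimal Python sketch. It is only an illustration of the definition in the abstract, not code from the paper; the function names and the toy database are assumptions introduced for this example.

```python
from itertools import combinations, product

def two_way_marginal(db, i, j, a, b):
    """Fraction of rows x in db with x[i] == a and x[j] == b."""
    if not db:
        raise ValueError("database must contain at least one row")
    return sum(1 for x in db if x[i] == a and x[j] == b) / len(db)

def all_two_way_marginals(db):
    """Map (i, j, a, b) to its marginal, for every column pair i < j
    and every pair of bit values (a, b)."""
    d = len(db[0])
    return {
        (i, j, a, b): two_way_marginal(db, i, j, a, b)
        for i, j in combinations(range(d), 2)
        for a, b in product((0, 1), repeat=2)
    }

# Hypothetical toy database with n = 4 rows and d = 3 binary columns.
D = [(0, 1, 1), (1, 1, 0), (0, 1, 0), (1, 0, 1)]
marginals = all_two_way_marginals(D)
```

There are 4 · C(d, 2) such marginals, so writing them all down takes time polynomial in n and d. The hardness result concerns producing a private synthetic database that approximately matches all of them, not evaluating them.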

Faster Algorithms for Privately Releasing Marginals

TLDR
To the authors' knowledge, this work gives the first algorithm capable of privately releasing marginal queries with non-trivial worst-case accuracy guarantees in time substantially smaller than the number of k-way marginal queries, which is d^{Θ(k)} (for k ≪ d).

Faster private release of marginals on small databases

TLDR
To the best of the authors' knowledge, this is the first algorithm capable of privately answering marginal queries with a non-trivial worst-case accuracy guarantee for databases containing poly(d, k) records in time exp(o(d)).

Fingerprinting codes and the price of approximate differential privacy

TLDR
The results rely on the existence of short fingerprinting codes (Boneh and Shaw, CRYPTO'95; Tardos, STOC'03), which are closely connected to the sample complexity of differentially private data release.

Using Convex Relaxations for Efficiently and Privately Releasing Marginals

TLDR
This work presents a polynomial-time algorithm that matches the best known information-theoretic bounds when k = 2 and achieves average error at most Õ(√n · d^{⌈k/2⌉/4}), an improvement over previous work when k is small and when error o(n) is desirable.

Faster Algorithms for Privately Releasing Marginals

TLDR
To the authors' knowledge, this work gives an algorithm that runs in time d^{O(√k)} and releases a private summary capable of answering any k-way marginal query with at most ± .

Differentially Private Data Releasing for Smooth Queries with Synthetic Database Output

TLDR
This work develops an $\epsilon$-differentially private mechanism for the class of $K$-smooth queries that outputs a synthetic database, achieves an accuracy of $O(n^{-\frac{K}{2d+K}}/\epsilon)$, and runs in polynomial time.

Answering n^{2+o(1)} counting queries with differential privacy is hard

TLDR
It is proved that if one-way functions exist, then there is no algorithm that takes as input a database db ∈ dbset and k = Ω̃(n^2) arbitrary efficiently computable counting queries, runs in time poly(d, n), and returns an approximate answer to each query, while satisfying differential privacy.

A learning theory approach to noninteractive database privacy

TLDR
It is shown that, ignoring computational constraints, it is possible to release synthetic databases that are useful for accurately answering large classes of queries while preserving differential privacy, and a relaxation of the utility guarantee is given.

Strong Hardness of Privacy from Weak Traitor Tracing

TLDR
The hardness result for a polynomial size query set resp.

New Oracle-Efficient Algorithms for Private Synthetic Data Release

TLDR
Three new algorithms for constructing differentially private synthetic data are presented---a sanitized version of a sensitive dataset that approximately preserves the answers to a large collection of statistical queries that are computationally efficient when given access to an optimization oracle.
...

References

Practical privacy: the SuLQ framework

TLDR
This work considers a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries, and modifies the privacy analysis to handle real-valued functions f and arbitrary row types, greatly improving the bounds on noise required for privacy.

Calibrating Noise to Sensitivity in Private Data Analysis

TLDR
The study is extended to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f, which is the amount that any single argument to f can change its output.
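
The idea of calibrating noise to sensitivity can be sketched with the Laplace mechanism, which adds noise of scale sensitivity/ε to the true answer. The sketch below is a minimal illustration under stated assumptions, not the paper's own code: the function names and the `rng` parameter are hypothetical, and the sampler uses floating-point arithmetic, so it is not suitable for production use.

```python
import random

def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) noise.

    The difference of two i.i.d. exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=random):
    """Return true_answer plus Laplace noise of scale sensitivity/epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return true_answer + laplace_noise(sensitivity / epsilon, rng)

def noisy_count(db, predicate, epsilon, rng=random):
    """Noisy counting query.

    Changing a single row changes the count by at most 1,
    so the sensitivity is taken to be 1.
    """
    true_count = sum(1 for row in db if predicate(row))
    return laplace_mechanism(true_count, 1.0, epsilon, rng)
```

For example, `noisy_count(rows, lambda r: r[0] == 1, epsilon=0.5)` returns the number of rows whose first column is 1, perturbed by Laplace noise of scale 2.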

Interactive privacy via the median mechanism

TLDR
The median mechanism is the first privacy mechanism capable of identifying and exploiting correlations among queries in an interactive setting, and an efficient implementation is given, with running time polynomial in the number of queries, the database size, and the domain size.

Cryptographic limitations on learning Boolean formulae and finite automata

TLDR
It is proved that a polynomial-time learning algorithm for Boolean formulae, deterministic finite automata or constant-depth threshold circuits would have dramatic consequences for cryptography and number theory and is applied to obtain strong intractability results for approximating a generalization of graph coloring.

The complexity of properly learning simple concept classes

Differential Privacy

  • C. Dwork
  • Computer Science
    Encyclopedia of Cryptography and Security
  • 2006
TLDR
A general impossibility result is given showing that a formalization of Dalenius' goal along the lines of semantic security cannot be achieved, which suggests a new measure, differential privacy, which, intuitively, captures the increased risk to one's privacy incurred by participating in a database.

Boosting and Differential Privacy

TLDR
This work obtains an $O(\epsilon^2)$ bound on the expected privacy loss from a single $\epsilon$-differentially private mechanism, and gets stronger bounds on the expected cumulative privacy loss due to multiple mechanisms, each of which provides $\epsilon$-differential privacy or one of its relaxations, and each of which operates on (potentially) different, adaptively chosen, databases.

Computationally Sound Proofs

  • S. Micali
  • Computer Science, Mathematics
    SIAM J. Comput.
  • 2000
TLDR
If a special type of computationally sound proof exists, it is shown that Blum's notion of program checking can be meaningfully broadened so as to prove that $\mathcal{NP}$-complete languages are checkable.

Universal arguments and their applications

  • B. Barak, Oded Goldreich
  • Computer Science, Mathematics
    Proceedings 17th IEEE Annual Conference on Computational Complexity
  • 2002
TLDR
It is shown that universal arguments can be constructed based on standard intractability assumptions that refer to polynomial-size circuits (rather than assumptions referring to subexponential-size circuits as used in the construction of CS-proofs).

Privacy-Preserving Datamining on Vertically Partitioned Databases

TLDR
Under a rigorous definition of breach of privacy, Dinur and Nissim proved that unless the total number of queries is sub-linear in the size of the database, a substantial amount of noise is required to avoid a breach, rendering the database almost useless.