Likelihood-Based Data Squashing: A Modeling Approach to Instance Construction

  title={Likelihood-Based Data Squashing: A Modeling Approach to Instance Construction},
  author={David Madigan and Nandini Raghavan and William DuMouchel and Martha Nason and Christian Posse and Greg Ridgeway},
  journal={Data Mining and Knowledge Discovery},
Squashing is a lossy data compression technique that preserves statistical information. Specifically, squashing compresses a massive dataset to a much smaller one so that outputs from statistical analyses carried out on the smaller (squashed) dataset reproduce outputs from the same statistical analyses carried out on the original dataset. Likelihood-based data squashing (LDS) differs from a previously published squashing algorithm insofar as it uses a statistical model to squash the data. The…
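To make the general idea of squashing concrete, here is a minimal sketch in Python. It is not the LDS algorithm, whose details are not given in the truncated abstract above. It assumes a toy setting in which a single predictor takes only a few distinct values, so rows sharing a value can be collapsed into one weighted pseudo-point. The function names `weighted_linear_fit` and `squash_by_x` and the simulated data are hypothetical and chosen for illustration only.

```python
import random


def weighted_linear_fit(xs, ys, ws):
    """Weighted least-squares fit of y = a + b*x; returns (intercept, slope)."""
    total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def squash_by_x(xs, ys):
    """Collapse rows sharing an x value into one pseudo-point (x, mean y, count)."""
    groups = {}
    for x, y in zip(xs, ys):
        y_sum, count = groups.get(x, (0.0, 0))
        groups[x] = (y_sum + y, count + 1)
    sq_x = sorted(groups)
    sq_y = [groups[x][0] / groups[x][1] for x in sq_x]
    sq_w = [groups[x][1] for x in sq_x]
    return sq_x, sq_y, sq_w


# Simulated "massive" dataset (an assumption for this sketch):
# 10,000 rows, predictor restricted to the values 0..4.
random.seed(0)
xs = [float(random.randint(0, 4)) for _ in range(10000)]
ys = [1.0 + 2.0 * x + random.gauss(0.0, 1.0) for x in xs]

# Fit on the full data (unit weights) and on the squashed, weighted data.
full_fit = weighted_linear_fit(xs, ys, [1.0] * len(xs))
sq_x, sq_y, sq_w = squash_by_x(xs, ys)
squashed_fit = weighted_linear_fit(sq_x, sq_y, sq_w)

print(len(xs), "rows ->", len(sq_x), "pseudo-points")
print("full fit:    ", full_fit)
print("squashed fit:", squashed_fit)
```

In this toy case the two fits agree up to floating-point rounding, because the weighted group means carry all the information the regression coefficients depend on. Other outputs, such as the residual variance, would not be reproduced by these pseudo-points, which is one reason squashing is described as lossy.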

Publications referenced by this paper.

Squashing flat files flatter

  • W. DuMouchel, C. Volinsky, T. Johnson, C. Cortes, D. Pregibon
  • Proceedings of the Fifth ACM Conference on…
  • 1999

Report of the working group on storage I/O issues in large-scale computing

  • G. A. Gibson, J. S. Vitter, J. Wilkes
  • ACM Computing Surveys
  • 1996

Empirical Model Building and Response Surfaces

  • G.E.P. Box, N. R. Draper
  • John Wiley & Sons, New York, NY, USA, Bradley, P…
  • 1987