We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon's basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown …
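The model-selection use of MDL summarized in this abstract can be illustrated with a small sketch: pick a polynomial degree by minimizing a crude two-part code length, L(model) + L(data | model), in bits. The Gaussian residual term and the BIC-style half-log-n cost per parameter are simplifying assumptions of this sketch, not the paper's exact stochastic-complexity formula.

```python
import numpy as np

def two_part_mdl(x, y, max_degree=6):
    """Choose a polynomial degree by a crude two-part MDL score:
    L(data | model) + L(model), both measured in bits."""
    n = len(y)
    best_degree, best_bits = 0, np.inf
    for k in range(max_degree + 1):
        coeffs = np.polyfit(x, y, k)
        resid = y - np.polyval(coeffs, x)
        rss = float(resid @ resid)
        # L(data | model): n/2 * log2(residual variance), Gaussian noise model
        data_bits = 0.5 * n * np.log2(max(rss / n, 1e-12))
        # L(model): (k+1)/2 * log2(n) bits, one half-log-n per parameter
        model_bits = 0.5 * (k + 1) * np.log2(n)
        if data_bits + model_bits < best_bits:
            best_bits, best_degree = data_bits + model_bits, k
    return best_degree
```

On data generated from a quadratic with small noise, the criterion trades the shrinking residual term against the growing parameter cost and typically settles on degree 2.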
Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most classification algorithms are designed only for memory-resident data, limiting their suitability for mining large data sets. This paper discusses issues in building a scalable classifier and …
Abstract: In this paper an irreducible parameterization for a finite memory source is constructed in the form of a tree machine. A universal information source for the set of finite memory sources is constructed by a predictive modification of an earlier studied algorithm, Context. It is shown that this universal source incorporates any minimal data-generating …
The so-called denoising problem, relative to normal models for noise, is formalized such that 'noise' is defined as the incompressible part of the data, while the compressible part defines the meaningful, information-bearing signal. Such a decomposition is effected by minimization of the ideal code length, called for by the Minimum Description Length (MDL) …
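The decomposition this abstract describes, signal as the compressible part and noise as the incompressible remainder, can be imitated with a toy criterion: in an orthonormal basis, keep the k largest coefficients, where k minimizes a crude two-part code length. The unitary-FFT basis and the k log2(n) parameter cost below are assumptions of this sketch, not the paper's normal-model MDL criterion.

```python
import numpy as np

def mdl_denoise(y):
    """Toy MDL denoising: keep the k largest coefficients in a unitary
    FFT basis, with k chosen to minimize a crude two-part code length
    n/2 * log2(residual variance) + k * log2(n); the dropped,
    incompressible remainder is treated as noise."""
    n = len(y)
    c = np.fft.fft(y, norm="ortho")           # unitary, energy-preserving
    order = np.argsort(np.abs(c))[::-1]       # coefficients by magnitude
    energy = np.abs(c[order]) ** 2
    tail = np.cumsum(energy[::-1])[::-1]      # tail[k] = energy dropped if we keep k
    best_k, best_bits = 0, np.inf
    for k in range(n):                        # k = n (keep everything) is never optimal here
        rss = tail[k]
        data_bits = 0.5 * n * np.log2(max(rss / n, 1e-12))
        model_bits = k * np.log2(n)           # ~one "parameter" per kept coefficient
        if data_bits + model_bits < best_bits:
            best_bits, best_k = data_bits + model_bits, k
    keep = np.zeros(n, dtype=bool)
    keep[order[:best_k]] = True
    return np.fft.ifft(np.where(keep, c, 0), norm="ortho").real
```

For a sinusoid in white noise, the signal energy concentrates in a couple of frequency bins, so the criterion keeps those bins, discards the rest as noise, and the reconstruction lands much closer to the clean signal than the noisy input.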
Inspired by theoretical results on universal modeling, a general framework for sequential modeling of gray-scale images is proposed and applied to lossless compression. The model is based on stochastic complexity considerations and is implemented with a tree structure. It is efficiently estimated by a modification of the universal algorithm Context. Several …
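Sequential, predictively estimated models of the kind mentioned here rest on sequential probability assignments such as the Krichevsky-Trofimov (KT) estimator. The sketch below computes the ideal code length a single, context-free KT estimator assigns to a binary string; it is one building block of context-tree universal codes, not the algorithm Context itself.

```python
import math

def kt_code_length(bits):
    """Ideal code length (in bits) that the Krichevsky-Trofimov
    sequential estimator assigns to a binary string: each symbol is
    coded with -log2 of its predictive probability given past counts."""
    zeros = ones = 0
    total = 0.0
    for b in bits:
        # KT predictive probability: (count + 1/2) / (total + 1)
        if b == 1:
            p = (ones + 0.5) / (zeros + ones + 1.0)
            ones += 1
        else:
            p = (zeros + 0.5) / (zeros + ones + 1.0)
            zeros += 1
        total += -math.log2(p)
    return total
```

A highly compressible string (all zeros) gets a code length well under its raw length, while a string the estimator cannot predict (alternating bits, with this memoryless model) costs more, which is exactly the behavior a context tree exploits by running one such estimator per context.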