We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon's basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown …
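To make the normalized maximized likelihood (NML) coding concrete, here is a minimal sketch for the simplest model class, the Bernoulli model on binary strings. There the normalizer can be summed exactly by grouping all strings of length n by their number of ones, and the stochastic complexity of a string is its NML code length: minus the log of its maximized likelihood plus the log of the normalizer. The function names are hypothetical, and the direct summation is only practical for moderate n (large n would call for log-domain arithmetic).

```python
import math

def bernoulli_ml_prob(k: int, n: int) -> float:
    """Maximized likelihood max_theta theta^k (1 - theta)^(n - k), attained at theta = k/n."""
    if n == 0:
        return 1.0
    p = k / n
    # Python evaluates 0.0 ** 0 as 1.0, matching the convention 0^0 = 1.
    return (p ** k) * ((1 - p) ** (n - k))

def nml_normalizer(n: int) -> float:
    """C_n: sum of the maximized likelihood over all 2^n binary strings,
    grouped by the count k of ones (comb(n, k) strings share each value)."""
    return sum(math.comb(n, k) * bernoulli_ml_prob(k, n) for k in range(n + 1))

def stochastic_complexity_bits(seq) -> float:
    """NML code length in bits: -log2 P_ML(seq) + log2 C_n (hypothetical helper)."""
    n, k = len(seq), sum(seq)
    return -math.log2(bernoulli_ml_prob(k, n)) + math.log2(nml_normalizer(n))

if __name__ == "__main__":
    print(stochastic_complexity_bits([0, 1, 1, 0, 1, 1, 1, 0]))
```

Because the NML probabilities of all 2^n strings of a given length sum to one, the result is a valid code length; the log2 C_n term is the price paid for not knowing the parameter in advance.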
Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most classification algorithms are designed only for memory-resident data, which limits their suitability for mining large data sets. This paper discusses issues in building a scalable classifier and …
The so-called denoising problem, relative to normal models for noise, is formalized such that "noise" is defined as the incompressible part of the data, while the compressible part defines the meaningful information-bearing signal. Such a decomposition is effected by minimization of the ideal code length, called for by the Minimum Description Length (MDL) …
The NML (Normalized Maximum Likelihood) universal model has certain minimax optimal properties, but it has two shortcomings: the normalizing coefficient can be evaluated in closed form only for special model classes, and it does not define a random process, so it cannot be used for prediction. We present a universal conditional NML …
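The prediction shortcoming can be illustrated with a sequential variant of NML for the Bernoulli class: instead of normalizing the maximized likelihood over whole strings of a fixed length, it normalizes only over the two possible next symbols. That yields proper conditional probabilities and therefore a random process that can be used for prediction. This is a minimal sketch under that assumption; the conditional NML model in the abstract above may differ in its details, and the function names are hypothetical.

```python
import math

def bernoulli_ml_prob(k: int, n: int) -> float:
    # Maximized Bernoulli likelihood of a string with k ones out of n (theta = k/n).
    if n == 0:
        return 1.0
    p = k / n
    return (p ** k) * ((1 - p) ** (n - k))

def snml_prob_one(past) -> float:
    # Assumed sequential-NML rule: compare the maximized likelihood of the past
    # extended by a 1 against the past extended by a 0, then normalize over the two.
    n, k = len(past), sum(past)
    w1 = bernoulli_ml_prob(k + 1, n + 1)
    w0 = bernoulli_ml_prob(k, n + 1)
    return w1 / (w0 + w1)

def snml_code_length_bits(seq) -> float:
    # Chain the one-step predictions: -log2 of the product of the conditionals.
    total = 0.0
    for t, x in enumerate(seq):
        p1 = snml_prob_one(seq[:t])
        total -= math.log2(p1 if x == 1 else 1.0 - p1)
    return total
```

After a single observed 1, for example, snml_prob_one([1]) gives 0.8, somewhat less extreme than the maximum likelihood estimate of 1.0, so an unseen 0 never receives zero probability.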
In this paper, an irreducible parameterization for a finite-memory source is constructed in the form of a tree machine. A universal information source for the set of finite-memory sources is constructed by a predictive modification of an earlier studied algorithm, Context. It is shown that this universal source incorporates any minimal data-generating …
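As a simplified illustration of predictive coding for a finite-memory source, the sketch below codes a binary string with a fixed-order context model, estimating each symbol's probability from the counts seen so far in the same context with the add-1/2 (Krichevsky-Trofimov) estimator. This is not the Context algorithm itself, which also selects the context tree from the data; the fixed order, the estimator, the 1-bit cost for the first symbols, and the function name are all assumptions.

```python
import math
from collections import defaultdict

def kt_context_code_length_bits(seq, order: int) -> float:
    """Predictive code length (bits) of a 0/1 sequence under a fixed-order
    context model with add-1/2 counts per context (hypothetical helper)."""
    counts = defaultdict(lambda: [0, 0])  # context tuple -> [zeros, ones]
    total = 0.0
    for t, x in enumerate(seq):
        if t < order:
            # Assumption: symbols without a full context are coded at probability 1/2.
            total += 1.0
            continue
        ctx = tuple(seq[t - order:t])
        c0, c1 = counts[ctx]
        p1 = (c1 + 0.5) / (c0 + c1 + 1.0)  # predict from past counts only
        total -= math.log2(p1 if x == 1 else 1.0 - p1)
        counts[ctx][x] += 1  # update after coding, so the decoder can mirror it
    return total
```

On an alternating string such as 0101..., the order-1 model needs far fewer bits than the order-0 model, because each context becomes nearly deterministic; choosing such contexts from the data, rather than fixing the order in advance, is roughly the role a tree-based universal source plays.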