This paper describes lossless compression algorithms for multisets of sequences, taking advantage of the multiset's unordered structure. Multisets are a generalization of sets, where members are…

This thesis makes several contributions to the field of data compression. Lossless data compression algorithms shorten the description of input objects, such as sequences of text, in a way that…

The Boolean Satisfiability Problem (SAT) belongs to the class of NP-complete problems, meaning that there is no known deterministic algorithm that can solve an arbitrary problem instance in less than…

Most of the world's digital data is currently encoded in a sequential form, and compression methods for sequences have been studied extensively. However, there are many types of non-sequential data…

This article makes several improvements to the classic PPM algorithm, resulting in a new algorithm with superior compression effectiveness on human text. The key differences of our algorithm to…

Technological advances have led to an explosion in the number of available biological datasets. These include measurements on a genomic scale such as extensive genotype data and the profiling of…

The majority of online content is written in languages other than English, and is most commonly encoded in UTF-8, the world's dominant Unicode character encoding. Traditional compression algorithms…