• Corpus ID: 10019208

Near-Optimal Compression of Probabilistic Counting Sketches for Networking Applications

Björn Scheuermann and Martin Mauve
Sketches—data structures for probabilistic, duplicate insensitive counting—are central building blocks of a number of recently proposed network protocols, for example in the context of wireless sensor networks. They can be used to perform robust, distributed data aggregation in a broad range of settings and applications. However, the structure of these sketches is very redundant, making effective compression vital if they are to be transmitted over a network. Here, we propose lossless… 
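To make the abstract's terms concrete, here is a minimal sketch of a Flajolet-Martin style bitmap, the kind of duplicate-insensitive counter the abstract refers to. It does not attempt the compression this paper proposes. The bitmap width, the SHA-256 hash, and all function names are assumptions chosen for illustration and are not taken from the paper.

```python
import hashlib

SKETCH_BITS = 32   # assumed bitmap width, chosen only for illustration
PHI = 0.77351      # correction constant from Flajolet and Martin's analysis


def _bit_index(item: str) -> int:
    """Map an item to a bit index that is geometrically distributed."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    h = int.from_bytes(digest[:8], "big")
    if h == 0:
        return SKETCH_BITS - 1
    # (h & -h) isolates the lowest set bit; bit_length() - 1 is its index.
    return min((h & -h).bit_length() - 1, SKETCH_BITS - 1)


def insert(sketch: int, item: str) -> int:
    """Set the item's bit; inserting the same item again changes nothing."""
    return sketch | (1 << _bit_index(item))


def merge(a: int, b: int) -> int:
    """Bitwise OR yields the sketch of the union of both input sets."""
    return a | b


def estimate(sketch: int) -> float:
    """Estimate the distinct count from the index of the lowest zero bit."""
    r = 0
    while sketch & (1 << r):
        r += 1
    return (2 ** r) / PHI


def build(items) -> int:
    """Hypothetical helper: fold an iterable of items into one sketch."""
    sketch = 0
    for item in items:
        sketch = insert(sketch, item)
    return sketch


if __name__ == "__main__":
    s = build(f"node-{i}" for i in range(1000))
    # Low-order bits tend to be a run of ones and high-order bits a run of
    # zeros, which is the redundancy that makes compression attractive.
    print(format(s, f"0{SKETCH_BITS}b"), estimate(s))
```

A single bitmap gives only a coarse estimate; practical schemes typically average many such bitmaps, which multiplies the amount of redundant data to transmit.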

A survey of sketches in traffic measurement: Design, Optimization, Application and Implementation
This work introduces the preparation of flows for measurement, then details the most recent investigations of design, aggregation, decoding, application and implementation of sketches for network measurement, covering more than 90 sketch designs and optimization strategies.
Sketch for traffic measurement: design, optimization, application and implementation
This work introduces the preparation of flows for measurement, then details the most recent investigations of design, aggregation, decoding, application and implementation of sketches for network measurement, and conducts an in-depth study of the existing literature.
Non-Mergeable Sketching for Cardinality Estimation
It is proved that the Martingale transform is optimal in the non-mergeable world, and that the Fishmonger sketch in particular is optimal among linearizable sketches, with an MVP of $H_0/2 \approx 1.63$.
How to Make Private Distributed Cardinality Estimation Practical, and Get Differential Privacy for Free
It is revealed that if the cardinality to be estimated is large enough, the protocol can achieve (ε,δ)-differential privacy automatically, without requiring any additional manipulation of the output, which signifies a new approach for achieving differential privacy that departs from the mainstream approach.
High-Speed Per-Flow Traffic Measurement with Probabilistic Multiplicity Counting
Probabilistic Multiplicity Counting (PMC) is presented, a novel data structure that is capable of accounting traffic per flow probabilistically and provides very accurate traffic statistics.
Approximating Private Set Union/Intersection Cardinality With Logarithmic Complexity
Efficient approximate protocols, whose accuracy can be tuned according to application requirements, are proposed; they are derived from the PSU-CA protocol with virtually no cost and can hide its output.
Cardinality Estimation for Elephant Flows
For many practical applications, it is a fundamental problem to estimate the flow cardinalities over big network data consisting of numerous flows (especially a large quantity of mouse flows mixed
Cardinality Estimation for Elephant Flows: A Compact Solution Based on Virtual Register Sharing
A unified framework of virtual estimators is proposed that allows the idea of sharing to apply to an array of cardinality estimation solutions, e.g., HyperLogLog and PCSA, achieving far better memory efficiency than the best existing work.
A probabilistic method for cooperative hierarchical aggregation of data in VANETs
This work proposes soft-state sketches, an extension of Flajolet-Martin sketches, as a probabilistic approximation for the hierarchical aggregation of observations in dissemination-based, distributed traffic information systems, which is duplicate insensitive and results in a very flexible aggregate construction and a high quality of the aggregates.
Distributed super point cardinality estimation under sliding time window for high speed network
  • Jie Xu
  • Computer Science
  • 2018
The algorithm proposed in this paper can detect super points and estimate their cardinalities under a sliding time window in real time; the paper also devises a novel reversible hash function scheme to restore super points from a pool of AT.


Approximate aggregation techniques for sensor databases
This work generalizes well-known duplicate-insensitive sketches for approximating COUNT to handle SUM, presents and analyzes methods for using sketches to produce accurate results with low communication and computation overhead, and presents an extensive experimental validation of the methods.
Counting by Coin Tossings
This text is an informal review of several randomized algorithms that have appeared over the past two decades and have proved instrumental in extracting efficiently quantitative characteristics of
Synopsis diffusion for robust aggregation in sensor networks
This paper presents a general framework for achieving significantly more accurate and reliable answers by combining energy-efficient multi-path routing schemes with techniques that avoid double-counting, and demonstrates the significant robustness, accuracy, and energy-efficiency improvements of synopsis diffusion over previous approaches.
HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm
This extended abstract describes and analyses a near-optimal probabilistic algorithm, HYPERLOGLOG, dedicated to estimating the number of distinct elements (the cardinality) of very large data
Bitmap Algorithms for Counting Active Flows on High-Speed Links
A family of bitmap algorithms that address the problem of counting the number of distinct header patterns (flows) seen on a high-speed link and can be used to detect DoS attacks and port scans and to solve measurement problems.
Probabilistic Counting Algorithms for Data Base Applications
A class of probabilistic counting algorithms with which one can estimate the number of distinct elements in a large collection of data in a single pass using only a small additional storage and only a few operations per element scanned is introduced.
Efficient and decentralized computation of approximate global state
The need for efficient computation of approximate global state lies at the heart of a wide range of problems in distributed systems and solving these problems can radically improve the design of robust, efficient and self-managed distributed systems.
Probabilistic aggregation for data dissemination in VANETs
An algorithm for the hierarchical aggregation of observations in dissemination-based, distributed traffic information systems that overcomes two central problems of existing aggregation schemes for VANET applications and contains a modified Flajolet-Martin sketch as a probabilistic approximation.
Loglog counting of large cardinalities
Using an auxiliary memory smaller than the size of this abstract, the LOGLOG algorithm makes it possible to estimate in a single pass and within a few percents the number of different words in the
Order statistics and estimating cardinalities of massive data sets
  • F. Giroire
  • Computer Science, Mathematics
    Discret. Appl. Math.
  • 2009
A new class of algorithms to estimate the cardinality of very large multisets using constant memory and doing only one pass on the data is introduced here. It is based on order statistics rather than