Approximating the number of differences between remote sets

@article{Agarwal2006ApproximatingTN,
  title={Approximating the number of differences between remote sets},
  author={Sachin Agarwal and Ari Trachtenberg},
  journal={2006 IEEE Information Theory Workshop - ITW '06 Punta del Este},
  year={2006},
  pages={217--221}
}
  • S. Agarwal, A. Trachtenberg
  • Published 13 March 2006
  • Computer Science
  • 2006 IEEE Information Theory Workshop - ITW '06 Punta del Este
We consider the problem of approximating the number of differences between sets held on remote hosts using minimum communication. Efficient solutions to this problem are important for streamlining a variety of communication sensitive network applications, including data synchronization in mobile networks, gossip protocols and content delivery networks. Using tools from the field of interactive communication, we show that this problem requires about as much communication as the problem of… 

Nye's Trie and Floret Estimators: Techniques for Detecting and Repairing Divergence in the SCADS Distributed Storage Toolkit
TLDR
The floret estimator is introduced, a novel sublinear-space set summarization structure used to estimate the cardinalities of set difference, union, and intersection operations in the SCADS system.
Error management and detection in computer networks using Bloom filters
TLDR
A new Bloom-filter-based method for error management and detection in computer networks is presented; the results show that a Bloom filter with an input string of length x is able to detect all errors except those with a length of exactly 2x, 4x, ... etc.
InfoPuzzle: Exploring Group Decision Making in Mobile Peer-to-Peer Databases
As Internet-based services and mobile computing devices, such as smartphones and tablets, become ubiquitous, society's reliance on them to accomplish critical and time-sensitive tasks, such as
Collaborative Data Gathering in Wireless Sensor Networks Using Measurement Co-Occurrence
  • K. Kalpakis, Shilang Tang
  • Computer Science
    2007 International Conference on Sensor Technologies and Applications (SENSORCOMM 2007)
  • 2007
TLDR
This work proposes a novel collaborative data gathering approach utilizing data co-occurrence, which is different from data correlation, that offers a trade-off between communication costs of data gathering versus errors at estimating the sensor measurements at the base station.
Avaliação Empírica de Técnicas de Comparação Privada Aplicadas na Resolução de Entidades
TLDR
This work evaluates whether Homomorphic Asymmetric Cryptography (HAC) can improve the accuracy of comparisons involving non-textual private data; the results indicate that using HAC in non-textual data comparison can improve the accuracy of PPRL.
An Approach to Obfuscate Password-Based Authentication
This work explores how various approaches are being used to strengthen password-based authentication mechanism by obfuscating user password credential (positive identification) while allowing access
Pseudo-Passwords and Non-textual Approaches
This chapter describes various complementary approaches of passwords, namely, Honeywords, Cracking-Resistant Password Vaults using Natural Encoders, Bloom Filter, and non-textual and graphical
A Web Crawler-based Consensus Analysis System for Cross-Border Products
TLDR
An Internet consensus analysis system and data processing method for cross-border products that uses a web crawler to obtain consensus relating to the products from the Internet and can analyze the emotional tendency of Internet public opinion using an emotional dictionary.
Automatic Chinese Topic Term Spelling Correction in Online Pinyin Input
TLDR
This paper proposes a novel Chinese spelling correction model that directly targets the original keyboard input and integrates this model into an online Chinese input method to improve the spelling suggestion feature.

References

SHOWING 1-10 OF 27 REFERENCES
Summary cache: a scalable wide-area web cache sharing protocol
TLDR
This paper demonstrates the benefits of cache sharing, measures the overhead of the existing protocols, and proposes a new protocol called "summary cache", which reduces the number of intercache protocol messages, reduces the bandwidth consumption, and eliminates 30% to 95% of the protocol CPU overhead, all while maintaining almost the same cache hit ratios as ICP.
Compressed bloom filters
A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Although Bloom filters allow false positives, for many applications
Informed content delivery across adaptive overlay networks
TLDR
This work makes the case for an erasure-resilient encoding of the content, and demonstrates the performance benefits of informed content-delivery mechanisms and how they complement existing overlay network architectures.
On the scalability of data synchronization protocols for PDAs and mobile devices
TLDR
This survey examines a number of popular and representative synchronization protocols, such as Palm's HotSync, Pumatech's Intellisync and the industry-wide SyncML, and compares them to a novel synchronization approach, CPISync, which addresses some of their scalability concerns.
Coding for computing
TLDR
It is shown that if only the sender can transmit, the number of bits required is a conditional entropy of a naturally defined graph.
Space/time trade-offs in hash coding with allowable errors
TLDR
Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
Finding Similar Files in a Large File System
TLDR
Application of sif can be found in file management, information collecting, program reuse, file synchronization, data compression, and maybe even plagiarism detection.
Serial computations of Levenshtein distances
TLDR
This chapter focuses on the problem of evaluating a longest common subsequence, which is expressively equivalent to the simple form of the Levenshtein distance.
The zero-error side information problem and chromatic numbers (Corresp.)
TLDR
A discrete random variable X is to be transmitted by means of a discrete signal so that the probability of error must be exactly zero, and the problem is to minimize the signal's alphabet size.