# Performance in Practice of String Hashing Functions

@inproceedings{Ramakrishna1997PerformanceIP, title={Performance in Practice of String Hashing Functions}, author={M. V. Ramakrishna and Justin Zobel}, booktitle={DASFAA}, year={1997} }

String hashing is a fundamental operation, used in countless applications where fast access to distinct strings is required. In this paper we describe a class of string hashing functions and explore its performance. In particular, using experiments with both small sets of keys and a large key set from a text database, we show that it is possible to achieve performance close to that theoretically predicted for hashing functions. We also consider criteria for choosing a hashing function and use…
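
The abstract does not spell out the class of functions. A shift-add-xor family, in which each character c updates the state as h ← h XOR ((h << L) + (h >> R) + c), is often attributed to this paper. The sketch below is a hedged illustration of that idea only. The seed (31), the shift amounts L = 5 and R = 2, the 32-bit word size, and the function name `shift_add_xor` are assumptions made for the example, not values taken from the text.

```python
MASK32 = 0xFFFFFFFF  # assumption: the hash state is kept to 32 bits


def shift_add_xor(key: str, table_size: int, seed: int = 31,
                  left: int = 5, right: int = 2) -> int:
    """Hypothetical shift-add-xor string hash reduced to a table index.

    seed, left and right are illustrative parameters, not the paper's.
    """
    h = seed & MASK32
    for byte in key.encode("utf-8"):
        # Shift the state left and right, add the next byte, and xor
        # the (32-bit truncated) sum back into the state.
        h ^= ((h << left) + (h >> right) + byte) & MASK32
    # Reduce the 32-bit state to a slot in a table of table_size slots.
    return h % table_size
```

Varying `seed`, `left` and `right` gives different members of the family, which is what makes it possible to pick one by experiment on a representative key set.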

## 86 Citations

Choosing Best Hashing Strategies and Hash Functions

- Computer Science, Mathematics
- 2009 IEEE International Advance Computing Conference
- 2009

The paper gives guidelines for choosing the most suitable hashing method and hash function for a particular problem, and presents six classes of hash functions that cover most problems.

Strongly Universal String Hashing is Fast

- Computer Science
- Comput. J.
- 2014

Fast strongly universal string hashing families are presented: they can process data at a rate of 0.2 CPU cycle per byte and it is found that these families—though they require a large buffer of random numbers—are often faster than popular hash functions with weaker theoretical guarantees.
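
The "large buffer of random numbers" refers to families such as multilinear hashing, where each input word s_i gets its own random 64-bit coefficient m_{i+1} and the hash is the top 32 bits of (m_0 + Σ m_{i+1}·s_i) mod 2^64. The following is a minimal sketch of that formula, not the paper's optimized code. The class and helper names are hypothetical, `random.Random` stands in for a proper source of random keys, and appending the byte length as a final word is an assumption added here so that zero-padded strings of different lengths do not collide trivially.

```python
import random

MASK64 = (1 << 64) - 1


class MultilinearHash:
    """Sketch of multilinear hashing over 32-bit words (hypothetical API)."""

    def __init__(self, max_words: int, seed=None):
        rng = random.Random(seed)
        # One random 64-bit coefficient per input position, plus m_0.
        self.keys = [rng.getrandbits(64) for _ in range(max_words + 1)]

    def hash_words(self, words) -> int:
        if len(words) + 1 > len(self.keys):
            raise ValueError("input longer than the random key buffer")
        acc = self.keys[0]
        for m, s in zip(self.keys[1:], words):
            # Arithmetic mod 2**64; each s must fit in 32 bits.
            acc = (acc + m * s) & MASK64
        # Keep the high 32 bits of the 64-bit accumulator.
        return acc >> 32


def string_to_words(key: str):
    """Pack UTF-8 bytes into little-endian 32-bit words, then append the length."""
    data = key.encode("utf-8")
    n = len(data)
    data += b"\x00" * (-n % 4)  # zero-pad to a multiple of 4 bytes
    words = [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]
    words.append(n)  # assumption: length word to separate padded inputs
    return words
```

The buffer must be at least as long as the longest input, which is the memory cost the summary mentions.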

The universality of iterated hashing over variable-length strings

- Computer Science, Mathematics
- Discret. Appl. Math.
- 2012

Fast and Compact Hash Tables for Integer Keys

- Computer Science
- ACSC
- 2009

This paper explains how to efficiently implement an array hash table for integers and demonstrates, through careful experimental evaluations, which hash table offers the best performance for maintaining a large dictionary of integers in-memory, on a current cache-oriented processor.
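
An array hash table replaces each slot's linked list with one contiguous array of keys, so a lookup scans adjacent memory instead of chasing pointers. The sketch below shows only that layout; the class name and the plain `key % num_slots` hash are illustrative assumptions, and Python itself will not reproduce the cache effects the paper measures.

```python
from array import array


class IntArrayHashTable:
    """Sketch: each slot holds a contiguous array of 64-bit keys."""

    def __init__(self, num_slots: int = 1024):
        self.num_slots = num_slots
        self.slots = [None] * num_slots  # slot -> array('q') or None

    def _slot(self, key: int) -> int:
        # Hypothetical hash: simple modulo of the integer key.
        return key % self.num_slots

    def insert(self, key: int) -> bool:
        """Add key; return False if it was already present."""
        i = self._slot(key)
        bucket = self.slots[i]
        if bucket is None:
            self.slots[i] = array("q", [key])
            return True
        if key in bucket:  # linear scan over contiguous storage
            return False
        bucket.append(key)  # grow the array in place
        return True

    def contains(self, key: int) -> bool:
        bucket = self.slots[self._slot(key)]
        return bucket is not None and key in bucket
```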

Cache-Conscious Collision Resolution in String Hash Tables

- Computer Science
- SPIRE
- 2005

Two alternatives to the standard representation of string hash tables are explored: the simple expedient of including the string in its node, and the more drastic step of replacing each list of nodes by a contiguous array of characters.
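
The "more drastic step" can be sketched as follows: every slot is a single byte array in which strings are stored back to back, each preceded by its length. The 2-byte big-endian length prefix, the use of `zlib.crc32` as the hash, and the class name are assumptions for this illustration, not details from the paper.

```python
import zlib


class ArrayStringHashTable:
    """Sketch: each slot is one contiguous byte array of length-prefixed strings."""

    def __init__(self, num_slots: int = 1024):
        self.slots = [bytearray() for _ in range(num_slots)]

    def _slot(self, key: bytes) -> int:
        # zlib.crc32 is a stand-in hash chosen for this sketch.
        return zlib.crc32(key) % len(self.slots)

    @staticmethod
    def _scan(bucket: bytearray, key: bytes) -> bool:
        pos = 0
        while pos < len(bucket):
            # Read the 2-byte length, then compare the stored string.
            length = int.from_bytes(bucket[pos:pos + 2], "big")
            start = pos + 2
            if bucket[start:start + length] == key:
                return True
            pos = start + length  # skip to the next stored string
        return False

    def insert(self, text: str) -> bool:
        """Append text to its slot unless present; return True if added."""
        key = text.encode("utf-8")
        if len(key) > 0xFFFF:
            raise ValueError("key too long for a 2-byte length prefix")
        bucket = self.slots[self._slot(key)]
        if self._scan(bucket, key):
            return False
        bucket.extend(len(key).to_bytes(2, "big") + key)
        return True

    def contains(self, text: str) -> bool:
        key = text.encode("utf-8")
        return self._scan(self.slots[self._slot(key)], key)
```

Scanning one array avoids a pointer dereference per node, which is the source of the cache savings the summary describes.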

Redesigning the string hash table, burst trie, and BST to exploit cache

- Computer Science
- JEAL
- 2011

Two alternatives to the standard representation of strings are explored: the simple expedient of including the string in its node, and, for linked lists, the more drastic step of replacing each list of nodes by a contiguous array of characters.

Performance of Data Structures for Small Sets of Strings

- Computer Science
- ACSC
- 2002

This paper tests the performance of the same data structures on small sets of strings, in the context of document processing for index construction, and shows that the new structures, in particular the burst trie, are the most efficient choice for this task.

Coding schemes variation and its impact on string hashing

- Computer Science
- Comput. Stand. Interfaces
- 2002

String hashing for collection-based compression

- Computer Science
- 2015

A CBC system, cobald, was developed which employs a two-step scheme: a preliminary long-range delta encoding step using the fingerprint index, followed by a compression of the delta file by a standard compression utility.
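
A fingerprint index of this kind is typically built from a rolling hash, so that every window of a file can be fingerprinted in constant time per byte. The sketch below uses a Karp–Rabin polynomial hash; the base 257, the modulus 2^61 − 1, the window and step parameters, and all function names are illustrative assumptions, and this is not a description of how cobald itself is implemented.

```python
BASE = 257           # assumption: polynomial base
MOD = (1 << 61) - 1  # assumption: large prime modulus


def rolling_fingerprints(data: bytes, window: int):
    """Yield (offset, fingerprint) for every window-length substring."""
    if window <= 0 or len(data) < window:
        return
    high = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for b in data[:window]:
        h = (h * BASE + b) % MOD
    yield 0, h
    for i in range(window, len(data)):
        h = (h - data[i - window] * high) % MOD  # drop the oldest byte
        h = (h * BASE + data[i]) % MOD           # append the newest byte
        yield i - window + 1, h


def build_index(reference: bytes, window: int, step: int) -> dict:
    """Map sampled fingerprints of the reference to their first offset."""
    index = {}
    for off, fp in rolling_fingerprints(reference, window):
        if off % step == 0:
            index.setdefault(fp, off)
    return index


def find_matches(reference: bytes, target: bytes, window: int, step: int):
    """Return (target_offset, reference_offset) pairs of matching windows."""
    index = build_index(reference, window, step)
    matches = []
    for off, fp in rolling_fingerprints(target, window):
        ref_off = index.get(fp)
        # Compare the bytes to rule out fingerprint collisions.
        if ref_off is not None and reference[ref_off:ref_off + window] == target[off:off + window]:
            matches.append((off, ref_off))
    return matches
```

Matches found this way are the candidates a delta encoder could extend and emit as copy instructions before a standard compressor runs.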

Burst tries: a fast, efficient data structure for string keys

- Computer Science
- TOIS
- 2002

These experiments show that the burst trie is particularly effective for the skewed frequency distributions common in text collections, and dramatically outperforms all other data structures for the task of managing strings while maintaining sort order.
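
A burst trie keeps strings in small containers hanging off trie nodes; when a container grows past a limit, it "bursts" into a new trie node whose children are smaller containers holding the remaining suffixes. The minimal sketch below shows insertion, bursting, and lookup. Using a Python `set` as the container, the `burst_limit` parameter, and the class names are simplifying assumptions; the paper's containers and bursting heuristics may differ, and sorted traversal is omitted.

```python
class _Container:
    __slots__ = ("suffixes",)

    def __init__(self):
        self.suffixes = set()  # assumption: a set stands in for the container


class _TrieNode:
    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = {}     # byte value -> _TrieNode or _Container
        self.terminal = False  # True if a key ends exactly at this node


class BurstTrie:
    def __init__(self, burst_limit: int = 4):
        self.root = _TrieNode()
        self.burst_limit = burst_limit

    def insert(self, key: bytes) -> None:
        node, i = self.root, 0
        while i < len(key):
            child = node.children.get(key[i])
            if child is None:
                child = node.children[key[i]] = _Container()
            if isinstance(child, _TrieNode):
                node, i = child, i + 1
                continue
            # Store the rest of the key in the container.
            child.suffixes.add(key[i + 1:])
            if len(child.suffixes) > self.burst_limit:
                node.children[key[i]] = self._burst(child)
            return
        node.terminal = True

    @staticmethod
    def _burst(container: _Container) -> _TrieNode:
        # Redistribute suffixes by first byte; an oversized child
        # container bursts later, on its next insertion.
        new_node = _TrieNode()
        for suffix in container.suffixes:
            if not suffix:
                new_node.terminal = True
                continue
            sub = new_node.children.setdefault(suffix[0], _Container())
            sub.suffixes.add(suffix[1:])
        return new_node

    def contains(self, key: bytes) -> bool:
        node, i = self.root, 0
        while i < len(key):
            child = node.children.get(key[i])
            if child is None:
                return False
            if isinstance(child, _Container):
                return key[i + 1:] in child.suffixes
            node, i = child, i + 1
        return node.terminal
```

Frequent prefixes quickly burst into trie nodes while rare ones stay in cheap containers, which is one intuition for the structure's good behavior on skewed text distributions.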

## References

Showing 10 of 26 references.

Distribution-dependent hashing functions and their characteristics

- Computer Science
- SIGMOD '75
- 1975

A study of the performance measures obtained during tests of "Distribution-dependent" hashing functions indicates that in certain cases, distribution-dependent methods perform better than the division method.
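
The division method used as the baseline here maps an integer key k to k mod m. The minimal sketch below assumes string keys are first read as big-endian base-256 integers (an illustrative choice; the function names are hypothetical). It also shows the standard caution against a table size m = 2^p: with m = 256 only the final byte of the key affects the address, while a prime such as 251 mixes in every byte.

```python
def string_to_int(key: str) -> int:
    # Read the UTF-8 bytes of the key as one big-endian integer.
    return int.from_bytes(key.encode("utf-8"), "big")


def division_hash(key: str, m: int) -> int:
    """Division method: address = k mod m."""
    return string_to_int(key) % m
```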

Hashing practice: analysis of hashing and universal hashing

- Computer Science
- SIGMOD '88
- 1988

This paper considers the problem of achieving analytical performance of hashing techniques in practice with reference to successful search lengths, unsuccessful search lengths and the expected worst case performance (expected length of the longest probe sequence).
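
For separate chaining under the uniform-hashing assumption, the textbook predictions are about 1 + (n − 1)/(2m) key comparisons for a successful search and α = n/m for an unsuccessful one. A small sketch, assuming keys are appended at the tail of their chain and using hypothetical function names, computes both quantities from the bucket each key landed in, so the output of a real hash function can be compared with theory.

```python
import random


def chaining_search_costs(bucket_of, num_buckets):
    """Average key comparisons for separate chaining.

    bucket_of[i] is the bucket of the i-th key. The j-th key of a chain
    costs j comparisons to find; an unsuccessful search scans a whole
    chain of a uniformly chosen bucket.
    """
    lengths = [0] * num_buckets
    for b in bucket_of:
        lengths[b] += 1
    n = len(bucket_of)
    successful = sum(L * (L + 1) // 2 for L in lengths) / n
    unsuccessful = sum(lengths) / num_buckets
    return successful, unsuccessful


def uniform_hashing_prediction(n, m):
    """Expected (successful, unsuccessful) costs under uniform hashing."""
    return 1 + (n - 1) / (2 * m), n / m
```

In practice `bucket_of` would come from hashing a real key set; drawing buckets with `random.Random` simulates an ideal hash.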

Selecting a hashing algorithm

- Computer Science
- Softw. Pract. Exp.
- 1990

The results of investigations into the performance of some widely used hashing algorithms are presented and it is shown that some of these algorithms are far from optimal.

File organization using composite perfect hashing

- Computer Science
- ACM Trans. Database Syst.
- 1989

This work proposes and analyzes a composite perfect hashing scheme for large external files that guarantees retrieval of any record in a single disk access and supports efficient range searches in addition to being a completely dynamic file organization scheme.

Expected Worst-Case Performance of Hash Files

- Computer Science
- Comput. J.
- 1982

The following problem is studied: consider a hash file and the longest probe sequence that occurs when retrieving a record. How long is this probe sequence expected to be? The approach taken differs…

Expected Length of the Longest Probe Sequence in Hash Code Searching

- Computer Science
- JACM
- 1981

An investigation is made of the expected value of the maximum number of accesses needed to locate any element in a hashing file under various collision resolution schemes, showing that the actual behavior of the worst case in hash tables is quite good on the average.
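
One way to see the longest chain empirically is to drop n keys into n buckets with an ideal random hash and record the largest chain. The sketch below does this and, for a rough sense of scale only, computes ln n / ln ln n, the well-known growth rate of the maximum load in this setting (constant factors and lower-order terms are ignored, so the two numbers are not expected to agree exactly). The function names are hypothetical.

```python
import math
import random


def longest_chain(n_keys: int, n_buckets: int, rng: random.Random) -> int:
    """Length of the longest chain after hashing keys uniformly at random."""
    counts = [0] * n_buckets
    for _ in range(n_keys):
        counts[rng.randrange(n_buckets)] += 1
    return max(counts)


def growth_scale(n: int) -> float:
    """ln n / ln ln n: the order of growth of the maximum load."""
    return math.log(n) / math.log(math.log(n))
```

Even at n = 10,000 the longest chain is typically only a handful of keys, consistent with the summary's conclusion that worst-case behavior is good on average.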

Phonetic string matching: lessons from information retrieval

- Computer Science
- SIGIR '96
- 1996

The parallels between information retrieval and phonetic matching are explained, and the new phonetic matching techniques described are compared with existing techniques to demonstrate that the new techniques are superior.

Practical performance of Bloom filters and parallel free-text searching

- Computer Science
- CACM
- 1989

The performance of hash transformations with reference to the filter error rate is the focus of this article.
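
A Bloom filter sets k bit positions per inserted item and reports "possibly present" only if all k bits are set, so the quality of the hash transformations directly drives the false-positive rate. The sketch below derives the k positions by double hashing from one BLAKE2b digest (a common technique, swapped in here as an assumption, not the article's method) and includes the standard estimate (1 − e^(−kn/m))^k for n items in m bits. Class and function names are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Split one 128-bit digest into two 64-bit values and combine
        # them as h1 + i*h2 to simulate k independent hash functions.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force h2 odd
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True may be a false positive.
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def expected_false_positive_rate(num_bits: int, num_hashes: int, num_items: int) -> float:
    """Standard estimate (1 - e^(-k n / m))^k."""
    return (1 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes
```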

General performance analysis of key-to-address transformation methods using an abstract file concept

- Computer Science
- CACM
- 1973

This paper presents a new approach to the analysis of performance of the various key-to-address transformation methods. In this approach the keys in a file are assumed to have been selected from the…

Algorithms in C

- Computer Science
- 1990

Algorithms in C is a comprehensive repository of algorithms, complete with code, with extensive treatment of searching and advanced data structures, sorting, string processing, computational geometry, graph problems, and mathematical algorithms.