# Additive Feature Hashing

@article{Andrecut2021AdditiveFH, title={Additive Feature Hashing}, author={Mircea Andrecut}, journal={ArXiv}, year={2021}, volume={abs/2102.03943} }

The hashing trick is a machine learning technique used to encode categorical features into a numerical vector representation of pre-defined fixed length. It works by using the categorical hash values as vector indices, and updating the vector values at those indices. Here we discuss a different approach based on additive-hashing and the "almost orthogonal" property of high-dimensional random vectors. That is, we show that additive feature hashing can be performed directly by adding the hash…
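
To make the contrast described in the abstract concrete, here is a minimal Python sketch. It is not the paper's implementation: the abstract is truncated, so the additive construction below is an assumption, in which each feature's hash seeds a dense pseudo-random ±1 vector and those vectors are summed. The names `_digest`, `hashing_trick`, `additive_hashing`, and `dim` are hypothetical, and the signed update in `hashing_trick` is one common variant of the hashing trick rather than something the abstract specifies.

```python
import hashlib

import numpy as np


def _digest(token: str) -> int:
    """Stable 64-bit integer hash (hashlib, because Python's hash() is salted per run)."""
    return int.from_bytes(hashlib.sha256(token.encode("utf-8")).digest()[:8], "little")


def hashing_trick(tokens, dim):
    """Hashing trick: the hash value selects ONE index, and that entry is updated."""
    v = np.zeros(dim)
    for t in tokens:
        h = _digest(t)
        sign = 1.0 if h & 1 else -1.0  # assumption: signed variant, lowest bit picks the sign
        v[(h >> 1) % dim] += sign
    return v


def additive_hashing(tokens, dim):
    """Additive sketch (assumed construction): the hash seeds a dense +/-1 vector, which is added."""
    v = np.zeros(dim)
    for t in tokens:
        rng = np.random.default_rng(_digest(t))  # same token -> same pseudo-random vector
        v += rng.choice([-1.0, 1.0], size=dim)
    return v
```

Under these assumptions, the "almost orthogonal" property shows up as a membership test: for a large `dim`, the dot product between one token's vector and the summed vector, divided by `dim`, should be close to 1 when the token was added and close to 0 when it was not.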

## References

Feature hashing for large scale multitask learning

- Computer Science
- ICML '09
- 2009

This paper provides exponential tail bounds for feature hashing, shows that the interaction between random subspaces is negligible with high probability, and demonstrates the feasibility of this approach with experimental results for a new use case: multitask learning.

A New Paradigm for Collision-Free Hashing: Incrementality at Reduced Cost

- Mathematics, Computer Science
- EUROCRYPT
- 1997

This paper presents a simple new paradigm for the design of collision-free hash functions. Any function emanating from this paradigm is incremental: rather than re-computing the hash of x′ from scratch, one can quickly "update" the old hash value to the new one, in time proportional to the amount of modification made in x to get x′.

Incremental Cryptography: The Case of Hashing and Signing

- Computer Science
- CRYPTO
- 1994

The idea is that having once applied the transformation to some document M, the time to update the result upon modification of M should be "proportional" to the "amount of modification" done to M.

High-Dimensional Vector Semantics

- Computer Science, Mathematics
- ArXiv
- 2018

This paper shows that the intriguing “almost orthogonal” property of high-dimensional random vectors can be used to “memorize” random vectors by simply adding them, and provides an efficient probabilistic solution to the set membership problem.

Collaborative Spam Filtering with the Hashing Trick

- Computer Science
- 2009

There is substantial deviation in users’ notions of what constitutes spam and ham, and these realities make it extremely difficult to assemble a single, global spam classifier.

An Introduction to Random Indexing

- Computer Science
- 2005

The Random Indexing word space approach is introduced, which presents an efficient, scalable and incremental alternative to standard word space methods.

Random indexing of text samples for latent semantic analysis

- Computer Science
- 2000

Pentti Kanerva, Jan Kristoferson, Anders Holst (RWCP Theoretical Foundation SICS Laboratory, Swedish Institute of Computer Science). Latent Semantic Analysis is a method of computing vectors that capture meaning, starting from a words-by-contexts matrix.

Contributions to the study of SMS spam filtering: new collection and results

- Computer Science
- DocEng '11
- 2011

This paper offers a new real, public, non-encoded SMS spam collection, the largest one as far as the authors know, and compares the performance achieved by several established machine learning methods.

The WiLI benchmark dataset for written language identification

- Computer Science
- ArXiv
- 2018

This paper describes the WiLI-2018 benchmark dataset for monolingual written natural language identification. WiLI-2018 is a publicly available, free of charge dataset of short text extracts from…

Introduction to Information Retrieval

- Computer Science
- J. Assoc. Inf. Sci. Technol.
- 2010