Corpus ID: 17269653

Language Recognition using Random Indexing

Aditya Joshi, Johan T. Halseth, P. Kanerva
Random Indexing is a simple implementation of Random Projections with a wide range of applications. It can solve a variety of problems with good accuracy without introducing much complexity. Here we use it for identifying the language of text samples. We present a novel method of generating language representation vectors using letter blocks. Further, we show that the method is easily implemented and requires little computational power and space. Experiments on a number of model parameters…
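The abstract above can be illustrated with a minimal sketch. This listing does not give implementation details, so everything concrete here is an assumption made for illustration: the dimensionality `DIM`, the random +1/-1 letter vectors, encoding letter position by rotation, the block size `n=3`, cosine similarity for comparison, and all function names. It is not presented as the authors' implementation.

```python
import numpy as np

DIM = 10_000                                # assumed vector dimensionality
ALPHABET = "abcdefghijklmnopqrstuvwxyz "    # assumed alphabet: a-z plus space


def make_letter_vectors(seed=0):
    """Assign each letter a fixed random vector of +1/-1 entries."""
    rng = np.random.default_rng(seed)
    values = np.array([-1, 1], dtype=np.int8)
    return {ch: rng.choice(values, size=DIM) for ch in ALPHABET}


def encode_text(text, letters, n=3):
    """Sum the vectors of all n-letter blocks in the text.

    Each block is the elementwise product of its letter vectors, with
    each letter rotated according to its position in the block
    (hypothetical encoding choice, so that letter order matters).
    """
    cleaned = "".join(ch for ch in text.lower() if ch in letters)
    total = np.zeros(DIM, dtype=np.int64)
    for i in range(len(cleaned) - n + 1):
        block = np.ones(DIM, dtype=np.int64)
        for pos, ch in enumerate(cleaned[i:i + n]):
            block *= np.roll(letters[ch], n - 1 - pos)
        total += block
    return total


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def classify(sample, profiles, letters, n=3):
    """Return the language whose profile vector is closest to the sample."""
    query = encode_text(sample, letters, n)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))
```

Under these assumptions, a language profile is `encode_text` applied to training text for that language, and `classify` compares an unknown sample against a dictionary of such profiles. Memory use is one vector of length `DIM` per language, regardless of how much training text is seen.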
Temporal Random Indexing of Context Vectors Applied to Event Detection
A novel RI representation in which imposing a probability distribution on the number of randomized entries yields a class of RI representations, together with an algorithm, log-linear in the size of the word corpus, that tracks the semantic relationship of a query word to other words in order to suggest events relevant to that word.
Hyperdimensional Computing for Text Classification
Hyperdimensional computing explores the emulation of cognition by computing with hypervectors as an alternative to computing with numbers. Hypervectors are high-dimensional, holographic, and…
Bit-Selection Control for Energy-Efficient Handwritten Digits Recognition Hyperdimensional Computing Architecture
A proposed bit-selection control trims redundant bits in the associative memory that do not contribute any information during classification, leading to improved throughput and energy savings without sacrificing accuracy.
Hyper-dimensional computing for a visual question-answering system that is trainable end-to-end
All the operations in the system, namely creating the knowledge base and evaluating the questions against it, are differentiable, thereby making the system easily trainable in an end-to-end fashion.
Hyperdimensional biosignal processing: A case study for EMG-based hand gesture recognition
This work describes the use of HDC in a smart prosthetic application, namely hand gesture recognition from a stream of electromyography (EMG) signals, and enhances the encoder to adaptively mitigate the effect of gesture-timing uncertainties across different subjects endogenously.
Semantic technologies for detecting names of new drugs on darknets
There is an emerging international phenomenon of drugs that are sold without any control on online marketplaces. An example of a former online marketplace is Silk Road, best known as a platform for…
Sentiment analysis for improving healthcare system for women
The system proposes a feedback mechanism wherein sentiment analysis is performed on surveys and tweets based on prevailing health issues among adult women in India and the social opinion on…


An Introduction to Random Indexing
The Random Indexing word space approach is introduced, which presents an efficient, scalable and incremental alternative to standard word space methods.
Efficient Estimation of Word Representations in Vector Space
Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
Random indexing of text samples for latent semantic analysis
Pentti Kanerva, Jan Kristoferson, and Anders Holst (RWCP Theoretical Foundation SICS Laboratory, Swedish Institute of Computer Science, Box 1263, SE-16429 Kista, Sweden). Latent Semantic Analysis is a method of computing vectors that capture word meaning from a words-by-contexts matrix…
Dimensionality reduction by random mapping: fast similarity computation for clustering
  • Samuel Kaski
  • Computer Science
  • 1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98CH36227)
  • 1998
It is demonstrated that the document classification accuracy obtained after the dimensionality has been reduced using a random mapping method will be almost as good as the original accuracy if the final dimensionality is sufficiently large.
Computing with 10,000-bit words
  • P. Kanerva
  • Computer Science
  • 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton)
  • 2014
The paper describes a 10,000-bit architecture that resembles von Neumann's and is suited for statistical learning from data. It is used in cognitive modeling and natural-language processing, where it is referred to by names such as Holographic Reduced Representation, Vector Symbolic Architecture, Random Indexing, Semantic Indexing and Semantic Pointer Architecture.
Latent semantic indexing: a probabilistic analysis
It is proved that under certain conditions LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance.
Visualizing Data using t-SNE
We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic…
Sparse Distributed Memory
Pentti Kanerva's Sparse Distributed Memory presents a mathematically elegant theory of human long term memory that resembles the cortex of the cerebellum, and provides an overall perspective on neural systems.
A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.
How do people know as much as they do with as little information as they get? The problem takes many forms; learning vocabulary from text is an especially dramatic and convenient case for research. A…
A mathematical theory of communication
  • C. Shannon
  • Computer Science, Mathematics
  • Bell Syst. Tech. J.
  • 1948
In this final installment of the paper we consider the case where the signals or the messages or both are continuously variable, in contrast with the discrete nature assumed until now. To a…