• Publications
Indexing by Latent Semantic Analysis
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.
How do people know as much as they do with as little information as they get? The problem takes many forms; learning vocabulary from text is an especially dramatic and convenient case for research. ...
Using Linear Algebra for Intelligent Information Retrieval
A lexical match between words in users’ requests and those in or assigned to documents in a database helps retrieve textual materials from scientific databases.
A Bayesian Approach to Filtering Junk E-Mail
This work examines methods for the automated construction of filters to eliminate such unwanted messages from a user’s mail stream, and shows the efficacy of such filters in a real-world usage scenario, arguing that this technology is mature enough for deployment.
Improving the retrieval of information from external sources
A statistical method called latent semantic indexing is described. It models the implicit higher-order structure in the association of words and objects and improves retrieval performance by up to 30%.
Latent Semantic Analysis.
Latent Semantic Analysis is a statistical approach introduced to improve information retrieval. It consists of reducing the dimensionality of the information retrieval problem ...
Hierarchical classification of Web content
  • S. Dumais, Hao Chen
  • Computer Science
  • SIGIR '00
  • 1 July 2000
This paper explores the use of hierarchical structure for classifying a large, heterogeneous collection of web content using support vector machine (SVM) classifiers, which have been shown to be efficient and effective for classification, but not previously explored in the context of hierarchical classification.
Inductive learning algorithms and representations for text categorization
Compares the effectiveness of five different automatic learning algorithms for text categorization in terms of learning speed, real-time classification speed, and classification accuracy.