Publications
Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis
TLDR
This work proposes Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia, yielding substantial improvements in the correlation of computed relatedness scores with human judgments.
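As a rough illustration of the ESA representation summarized above, the Python sketch below builds a concept vector for a text from a term-to-concept weight index and compares two texts by cosine similarity. The toy `term_to_concepts` table and its weights are hypothetical stand-ins for the Wikipedia-derived index used in the paper.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical inverted index: term -> {Wikipedia concept: weight}.
# In ESA these weights come from the full Wikipedia corpus; the values
# below are toy numbers for illustration only.
term_to_concepts = {
    "bank":  {"Bank (finance)": 0.9, "River bank": 0.4},
    "money": {"Bank (finance)": 0.8, "Currency": 0.7},
    "river": {"River bank": 0.9, "River": 0.8},
}

def interpret(text):
    """Represent a text as a weighted vector over Wikipedia concepts."""
    vector = defaultdict(float)
    for term in text.lower().split():
        for concept, weight in term_to_concepts.get(term, {}).items():
            vector[concept] += weight
    return vector

def relatedness(text_a, text_b):
    """Cosine similarity between the concept vectors of two texts."""
    va, vb = interpret(text_a), interpret(text_b)
    dot = sum(w * vb[c] for c, w in va.items() if c in vb)
    norm = sqrt(sum(w * w for w in va.values())) * sqrt(sum(w * w for w in vb.values()))
    return dot / norm if norm else 0.0

print(relatedness("bank money", "river bank"))  # nonzero: the texts share concepts
```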
Wikipedia-based Semantic Interpretation for Natural Language Processing
TLDR
This work proposes a novel method, Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language text; it represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence.
A word at a time: computing word relatedness using temporal semantic analysis
TLDR
This paper proposes a new semantic relatedness model, Temporal Semantic Analysis (TSA), which captures temporal information in word semantics as a vector of concepts over a corpus of temporally ordered documents.
Overcoming the Brittleness Bottleneck using Wikipedia: Enhancing Text Categorization with Encyclopedic Knowledge
TLDR
It is proposed to enrich document representation through automatic use of a vast compendium of human knowledge--an encyclopedia, and empirical results confirm that this knowledge-intensive representation brings text categorization to a qualitatively new level of performance across a diverse collection of datasets.
Feature Generation for Text Categorization Using World Knowledge
TLDR
Machine learning algorithms for text categorization are enhanced with generated features based on domain-specific and common-sense knowledge, addressing two main problems of natural language processing: synonymy and polysemy.
Text categorization with many redundant features: using aggressive feature selection to make SVMs competitive with C4.5
TLDR
A novel measure of feature redundancy is developed and used to analyze a large collection of datasets, showing that on problems plagued with numerous redundant features C4.5 performs significantly better than SVM, while aggressive feature selection allows SVM to beat C4.5 by a narrow margin.
Concept-Based Information Retrieval Using Explicit Semantic Analysis
TLDR
This article introduces a new concept-based retrieval approach based on Explicit Semantic Analysis (ESA), a recently proposed method that augments keyword-based text representation with concept-based features automatically extracted from massive human knowledge repositories such as Wikipedia.
Learning causality for news events prediction
TLDR
A new methodology for modeling and predicting future news events using machine learning and data mining techniques is presented; the Pundit algorithm generalizes examples of causality pairs to infer a causality predictor.
Learning Implicit Transfer for Person Re-identification
TLDR
The implicit approach models camera transfer by a binary relation R = {(x, y) | x and y describe the same person seen from cameras A and B, respectively}, which implies that the camera transfer function is a multi-valued mapping rather than a single-valued transformation.
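Read as code, the relation view above amounts to a pairwise membership test rather than a one-to-one transfer function. In the hypothetical sketch below, a toy distance threshold stands in for the learned model; the point is only that a single descriptor from camera A can stand in relation R with several descriptors from camera B.

```python
import numpy as np

def in_relation(x, y, threshold=0.5):
    """Hypothetical membership test for the relation R: do descriptor x (camera A)
    and descriptor y (camera B) describe the same person? A toy Euclidean-distance
    threshold stands in for a learned pair classifier."""
    return float(np.linalg.norm(np.asarray(x) - np.asarray(y))) < threshold

# Because R is a relation rather than a function, one descriptor x may match
# several candidates y: the "multi-valued mapping" noted in the summary.
x = [0.2, 0.8]
candidates = {"y1": [0.25, 0.78], "y2": [0.9, 0.1], "y3": [0.18, 0.83]}
matches = [name for name, y in candidates.items() if in_relation(x, y)]
print(matches)  # ['y1', 'y3']: more than one y stands in relation R with x
```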
Selective Sampling for Nearest Neighbor Classifiers
TLDR
The proposed LSS algorithm performs lookahead selective sampling of examples for nearest neighbor classifiers, selecting the example with the highest utility by taking its effect on the resulting classifier into account.
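A rough sketch of the lookahead idea, under the assumption that a candidate's utility is estimated by averaging, over its possible labels, the accuracy of the resulting 1-nearest-neighbor classifier on a held-out validation set. This utility estimate and the scikit-learn classifier are illustrative choices, not the paper's exact formulation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def lookahead_select(labeled_X, labeled_y, unlabeled_X, val_X, val_y, labels=(0, 1)):
    """Pick the unlabeled example whose labeling is expected to help the resulting
    1-NN classifier most, where utility is approximated by validation accuracy
    averaged over the candidate's possible labels (uniform assumption)."""
    best_idx, best_utility = None, -np.inf
    for i, x in enumerate(unlabeled_X):
        utilities = []
        for y in labels:  # lookahead: tentatively assign each possible label to x
            X_new = np.vstack([labeled_X, x])
            y_new = np.append(labeled_y, y)
            clf = KNeighborsClassifier(n_neighbors=1).fit(X_new, y_new)
            utilities.append(clf.score(val_X, val_y))
        utility = float(np.mean(utilities))
        if utility > best_utility:
            best_idx, best_utility = i, utility
    return best_idx

# Toy usage: two labeled points, three candidates to query next.
labeled_X, labeled_y = np.array([[0.0], [1.0]]), np.array([0, 1])
unlabeled_X = np.array([[0.5], [0.1], [0.9]])
val_X, val_y = np.array([[0.2], [0.4], [0.6], [0.8]]), np.array([0, 0, 1, 1])
print(lookahead_select(labeled_X, labeled_y, unlabeled_X, val_X, val_y))
```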