An Efficient k-Means Clustering Algorithm: Analysis and Implementation
TLDR
This work presents a simple and efficient implementation of Lloyd's k-means clustering algorithm, which it calls the filtering algorithm, and establishes the practical efficiency of the algorithm's running time.
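A minimal sketch of the baseline Lloyd iteration that the filtering algorithm speeds up. It is written in plain Python and, as an assumption, draws its initial centers at random from the data, since the TLDR says nothing about initialization. It deliberately uses a brute-force nearest-center scan rather than the kd-tree filtering itself.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points: List[Point], k: int, iters: int = 100,
                 seed: int = 0) -> List[Point]:
    """Plain Lloyd iteration: assign each point to its nearest center,
    then move each center to the centroid of its assigned points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumed initialization: k data points
    for _ in range(iters):
        # Assignment step: this brute-force scan over all centers is the
        # part that a kd-tree based filtering step is meant to accelerate.
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        counts = [0] * k
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            counts[j] += 1
            for d, x in enumerate(p):
                sums[j][d] += x
        # Update step: move each center to its cluster's centroid
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(s / counts[j] for s in sums[j]) if counts[j] else centers[j]
            for j in range(k)
        ]
        if new_centers == centers:
            break  # converged: the centers no longer move
        centers = new_centers
    return centers
```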
Using “Annotator Rationales” to Improve Machine Learning for Text Categorization
TLDR
It is hypothesized that, in some situations, providing rationales is a more fruitful use of an annotator's time than annotating more examples. The work presents a learning method that exploits the rationales during training to boost performance significantly on a sample task, namely sentiment classification of movie reviews.
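The TLDR does not spell out how the rationales are used. A minimal sketch of one way to exploit them, assuming a "contrast example" formulation: a copy of each training document is built with its rationale spans deleted, and a linear classifier is penalized unless it is at least a margin `mu` more confident on the original than on that stripped copy. The bag-of-words scoring, the function names and the value of `mu` are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def contrast_example(tokens: List[str],
                     rationales: List[Tuple[int, int]]) -> List[str]:
    """Copy of `tokens` with every rationale span [start, end) deleted."""
    masked = {i for start, end in rationales for i in range(start, end)}
    return [tok for i, tok in enumerate(tokens) if i not in masked]


def bow_score(weights: Dict[str, float], tokens: List[str]) -> float:
    """Linear bag-of-words score w . x, where x holds token counts."""
    return sum(weights.get(tok, 0.0) for tok in tokens)


def contrast_hinge(weights: Dict[str, float], tokens: List[str],
                   contrast: List[str], label: int, mu: float = 1.0) -> float:
    """Hinge penalty for violating label * (w . x - w . v) >= mu: the model
    should be at least `mu` more confident on the original document x
    than on its rationale-stripped contrast v."""
    margin = label * (bow_score(weights, tokens) - bow_score(weights, contrast))
    return max(0.0, mu - margin)
```

For example, deleting the rationale "truly superb" from "the acting was truly superb" leaves "the acting was". A model that relies on the rationale words scores the original well above the contrast, so it incurs no penalty at a small margin.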
A local search approximation algorithm for k-means clustering
TLDR
This work considers the question of whether there exists a simple and practical approximation algorithm for k-means clustering, and presents a local improvement heuristic based on swapping centers in and out that yields a (9+ε)-approximation algorithm.
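A minimal single-swap sketch of the local search idea. It assumes, hypothetically, that the candidate centers are the data points themselves. A swap is accepted only if it lowers the k-means cost (the sum of squared distances to the nearest center) by at least a factor of (1 - eps). This is a simplification: the TLDR does not detail the swap rule, and this single-swap version is not claimed to achieve the (9+ε) bound quoted above.

```python
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points: List[Point], centers: List[Point]) -> float:
    """Sum over points of the squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def single_swap_search(points: List[Point], candidates: List[Point],
                       k: int, eps: float = 1e-3) -> Tuple[List[Point], float]:
    """Swap one center out and one unused candidate in whenever that cuts
    the cost by a factor of at least (1 - eps); stop at a local optimum."""
    centers = list(candidates[:k])  # arbitrary initial solution
    cost = kmeans_cost(points, centers)
    improved = True
    while improved:
        improved = False
        for i, cand in product(range(k), candidates):
            if cand in centers:
                continue
            trial = centers[:i] + [cand] + centers[i + 1:]
            trial_cost = kmeans_cost(points, trial)
            if trial_cost < (1 - eps) * cost:
                centers, cost = trial, trial_cost
                improved = True
                break  # restart the scan from the improved solution
    return centers, cost
```

Requiring a relative improvement of at least (1 - eps), rather than any decrease at all, keeps the number of accepted swaps bounded.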
A visibility matching tone reproduction operator for high dynamic range scenes
TLDR
A tone reproduction operator is presented that preserves visibility in high dynamic range scenes and introduces a new histogram adjustment technique, based on the population of local adaptation luminances in a scene, that incorporates models for human contrast sensitivity, glare, spatial acuity and color sensitivity.
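A minimal sketch of naive histogram adjustment, the starting point that the technique above builds on. Each pixel's log luminance is mapped through the cumulative histogram onto the display's log luminance range. It is a simplification under stated assumptions: it histograms individual pixels rather than local adaptation luminances, and it omits any contrast limiting as well as the contrast sensitivity, glare, acuity and color models the TLDR mentions. The display range of 1 to 100 and the bin count are hypothetical defaults.

```python
import math
from typing import List


def histogram_adjust(world_lum: List[float], ld_min: float = 1.0,
                     ld_max: float = 100.0, n_bins: int = 100) -> List[float]:
    """Map world luminances to display luminances through the cumulative
    histogram of log luminance (naive histogram adjustment)."""
    logs = [math.log(max(lum, 1e-9)) for lum in world_lum]  # avoid log(0)
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant image

    def bin_of(b: float) -> int:
        return min(int((b - lo) / width), n_bins - 1)

    counts = [0] * n_bins
    for b in logs:
        counts[bin_of(b)] += 1
    # Cumulative distribution P(b), evaluated at the top of each bin.
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / len(logs))
    # Spread the scene's brightness distribution over the display's log range.
    log_dmin, log_dmax = math.log(ld_min), math.log(ld_max)
    return [math.exp(log_dmin + (log_dmax - log_dmin) * cdf[bin_of(b)])
            for b in logs]
```

Because the mapping follows the cumulative distribution, luminance ranges that many pixels occupy receive more of the display range than sparsely populated ones.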
A Modality Lexicon and its use in Automatic Tagging
This work is supported, in part, by the Johns Hopkins Human Language Technology Center of Excellence. Any opinions, findings, and conclusions or recommendations expressed in this material are those
Comparing Real and Synthetic Images: Some Ideas about Metrics
TLDR
Numerical techniques for comparing real and synthetic luminance images are explored and components of a perceptually based metric using ideas from the image compression literature are introduced.
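The TLDR names no specific formula. As a hypothetical baseline only, and not the perceptual metric the paper introduces, two luminance images can be compared pixel by pixel with root-mean-square (RMS) error, either on raw luminance or on log luminance. The log version weighs a given ratio between real and synthetic values equally at any brightness, which loosely reflects the roughly logarithmic response of vision.

```python
import math
from typing import List, Sequence


def rms_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square difference between two equally sized images,
    each given as a flat sequence of pixel values."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("images must be non-empty and the same size")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def log_rms_error(real: Sequence[float], synth: Sequence[float],
                  floor: float = 1e-6) -> float:
    """RMS difference of log10 luminance; `floor` (a hypothetical default)
    guards against log(0) for black pixels."""
    def to_log(img: Sequence[float]) -> List[float]:
        return [math.log10(max(v, floor)) for v in img]
    return rms_error(to_log(real), to_log(synth))
```

A synthetic image that is uniformly ten times too bright has a log RMS error of exactly 1, whatever the scene's absolute brightness. Its raw RMS error, by contrast, grows with the scene's brightness.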
Named Entity Recognition using Hundreds of Thousands of Features
TLDR
An approach to named entity recognition that uses support vector machines to capture transition probabilities in a lattice, trained on the CoNLL-2003 Shared Task training data.
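The TLDR does not say how the lattice is decoded. Assuming each transition (previous tag, current tag) at each token position carries a real-valued score, for example a hypothetical per-transition SVM output, the highest-scoring tag sequence can be recovered with Viterbi dynamic programming. The tag names, the `<s>` start symbol and the score layout below are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (previous tag, current tag) -> score


def viterbi(tags: List[str], lattice: List[Scores],
            start: str = "<s>") -> List[str]:
    """Return the tag sequence with the highest total transition score.

    lattice[t][(prev, cur)] scores moving from `prev` at position t-1 to
    `cur` at position t; at t = 0 the previous tag is `start`. Missing
    transitions are treated as impossible (-inf)."""
    if not lattice:
        return []
    best = {start: 0.0}              # best score of any path ending in each tag
    back: List[Dict[str, str]] = []  # back[t][cur] = best previous tag
    for trans in lattice:
        new_best, ptr = {}, {}
        for cur in tags:
            score, prev = max((best[p] + trans.get((p, cur), float("-inf")), p)
                              for p in best)
            new_best[cur], ptr[cur] = score, prev
        best = new_best
        back.append(ptr)
    # Trace back from the best final tag; back[0] only points at `start`.
    path = [max(best, key=best.get)]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    return path[::-1]
```

For a two-token sentence where starting with a person tag scores 2.0 and moving from person to outside scores 1.0, the decoder returns `["PER", "O"]` (total 3.0) over any lower-scoring alternative.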
Use of Modality and Negation in Semantically-Informed Syntactic MT
TLDR
The resulting system significantly outperformed a linguistically naive baseline model and reached the highest scores yet reported on the NIST 2009 Urdu–English test set, which supports the hypothesis that both syntactic and semantic information can improve translation quality.
JHU/APL at TREC 2001: Experiments in Filtering and in Arabic, Video, and Web Retrieval
TLDR
A first attempt was made to hold a content-based video retrieval track at TREC, a new suite of tools for image analysis and multimedia retrieval was developed, and a first attempt was made at Arabic-language retrieval, emphasizing a language-neutral approach.
The analysis of a simple k-means clustering algorithm
TLDR
This paper presents a simple and efficient implementation of Lloyd's k-means clustering algorithm, which differs from most other approaches in that it precomputes a kd-tree data structure for the data points rather than for the center points.
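A minimal sketch of why precomputing the tree over the data points pays off, under stated assumptions. A kd-tree is built once over the points, and each cell caches its bounding box, point count and vector sum. During an assignment step, candidate centers are pruned per cell with a box-vertex test, so a cell left with a single candidate is assigned wholesale from its cached statistics. The leaf size, the split rule (widest side, median) and names such as `filter_assign` are hypothetical choices for illustration, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class Node:
    lo: List[float]    # lower corner of the cell's bounding box
    hi: List[float]    # upper corner of the cell's bounding box
    count: int         # number of data points in the cell
    wsum: List[float]  # vector sum of those points (count * centroid)
    points: List[Point] = field(default_factory=list)  # filled only at leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[Point], leaf_size: int = 4) -> Node:
    """Build the kd-tree once over the data points, caching per-cell stats."""
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    wsum = [sum(p[d] for p in points) for d in range(dim)]
    node = Node(lo, hi, len(points), wsum)
    if len(points) <= leaf_size:
        node.points = list(points)
        return node
    d = max(range(dim), key=lambda i: hi[i] - lo[i])  # split the widest side
    pts = sorted(points, key=lambda p: p[d])
    mid = len(pts) // 2
    node.left, node.right = build(pts[:mid], leaf_size), build(pts[mid:], leaf_size)
    return node


def is_farther(z: Point, zstar: Point, node: Node) -> bool:
    """True if no point of the cell is closer to z than to zstar: test the
    box vertex that lies furthest in the direction z - zstar."""
    v = [h if a > b else l for a, b, l, h in zip(z, zstar, node.lo, node.hi)]
    return sq_dist(z, v) >= sq_dist(zstar, v)


def filter_assign(node: Node, cands: List[int], centers: List[Point],
                  sums: List[List[float]], counts: List[int]) -> None:
    """Accumulate each center's cluster count and vector sum, pruning
    candidate centers that cannot own any point in the current cell."""
    if node.left is None:  # leaf: assign its points one by one
        for p in node.points:
            j = min(cands, key=lambda c: sq_dist(p, centers[c]))
            counts[j] += 1
            sums[j] = [s + x for s, x in zip(sums[j], p)]
        return
    mid = [(l + h) / 2 for l, h in zip(node.lo, node.hi)]
    zstar = min(cands, key=lambda c: sq_dist(mid, centers[c]))
    cands = [c for c in cands
             if c == zstar or not is_farther(centers[c], centers[zstar], node)]
    if len(cands) == 1:  # the whole cell belongs to zstar: use cached stats
        counts[zstar] += node.count
        sums[zstar] = [s + w for s, w in zip(sums[zstar], node.wsum)]
        return
    for child in (node.left, node.right):
        filter_assign(child, cands, centers, sums, counts)
```

One Lloyd update then sets each center to `sums[j] / counts[j]`. The tree depends only on the data points, so it is built once and reused unchanged across iterations while the centers move.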