• Publications
Constrained K-means Clustering with Background Knowledge
Clustering is traditionally viewed as an unsupervised method for data analysis. However, in some cases information about the problem domain is available in addition to the data instances themselves.
  • 2,272
  • 211
Clustering with Instance-Level Constraints
Clustering algorithms conduct a search through the space of possible organizations of a data set. In this paper, we propose two types of instance-level clustering constraints: must-link and …
  • 604
  • 55
Noun Phrase Coreference as Clustering
This paper introduces a new, unsupervised algorithm for noun phrase coreference resolution. It differs from existing methods in that it views coreference resolution as a clustering task. In an …
  • 202
  • 19
Mining GPS Traces for Map Refinement
Despite the increasing popularity of route guidance systems, current digital maps are still inadequate for many advanced applications in automotive safety and convenience. Among the drawbacks are the …
  • 191
  • 15
Measuring Constraint-Set Utility for Partitional Clustering Algorithms
Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves performance, …
  • 190
  • 14
Machine Learning that Matters
  • K. Wagstaff
  • Computer Science, Mathematics
  • ICML
  • 18 June 2012
Much of current machine learning (ML) research has lost its connection to problems of import to the larger world of science and society. From this perspective, there exist glaring limitations in the …
  • 153
  • 13
Alpha seeding for support vector machines
A key practical obstacle in applying support vector machines to many large-scale data mining tasks is that SVMs generally scale quadratically (or worse) in the number of examples or support vectors.
  • 87
  • 11
Multidocument Summarization via Information Extraction
We present and evaluate the initial version of RIPTIDES, a system that combines information extraction, extraction-based summarization, and natural language generation to support user-directed …
  • 104
  • 8
Multiple-Instance Regression with Structured Data
We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of …
  • 25
  • 6
VAST: An ASKAP Survey for Variables and Slow Transients
The Australian Square Kilometre Array Pathfinder (ASKAP) will give us an unprecedented opportunity to investigate the transient sky at radio wavelengths. In this paper we present VAST, an ASKAP …
  • 57
  • 6