Delbert Dueck

Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such "exemplars" can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this works well only if that initial choice is close to a good solution. We devised a method…
Unsupervised categorization of images or image parts is often needed for image and video summarization or as a preprocessing step in supervised methods for classification, tracking and segmentation. While many metric-based techniques have been applied to this problem in the vision community, often, the most natural measures of similarity (e.g., number of…
Clustering is a fundamental problem in machine learning and has been approached in many ways. Two general and quite different approaches include iteratively fitting a mixture model (e.g., using EM) and linking together pairs of training cases that have high affinity (e.g., using spectral methods). Pair-wise clustering algorithms need not compute sufficient…
AFFINITY PROPAGATION: CLUSTERING DATA BY PASSING MESSAGES. Delbert Dueck, Doctor of Philosophy, Graduate Department of Electrical & Computer Engineering, University of Toronto, 2009. Clustering data by identifying a subset of representative examples is important for detecting patterns in data and in processing sensory signals. Such “exemplars” can be found by…
MOTIVATION: We address the problem of multi-way clustering of microarray data using a generative model. Our algorithm, probabilistic sparse matrix factorization (PSMF), is a probabilistic extension of a previous hard-decision algorithm for this problem. PSMF allows for varying levels of sensor noise in the data, uncertainty in the hidden prototypes used to…
Many kinds of data can be viewed as consisting of a set of vectors, each of which is a noisy combination of a small number of noisy prototype vectors. Physically, these prototype vectors may correspond to different hidden variables that play a role in determining the measured data. For example, a gene’s expression is influenced by the presence of…
A key problem of interest to biologists and medical researchers is the selection of a subset of queries or treatments that provide maximum utility for a population of targets. For example, when studying how gene deletion mutants respond to each of thousands of drugs, it is desirable to identify a small subset of genes that nearly uniquely define a drug…
One of the key components of sequencing technologies [1] is proper separation of a single species/strain/allele of the target sequence (e.g., gene) from a sample. Traditionally, this is achieved chemically, for example through the use of specific primer sequences. However, it is possible that multiple related species are picked up with the same primer. This…