• Publications
  • Influence
Deep Canonical Correlation Analysis
We introduce Deep Canonical Correlation Analysis (DCCA), a method to learn complex nonlinear transformations of two views of data such that the resulting representations are highly linearly ...
  • 855
  • 215
A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models
We describe the maximum-likelihood parameter estimation problem and how the Expectation-Maximization (EM) algorithm can be used for its solution. We first describe the abstract form of the EM ...
  • 2,770
  • 164
On Deep Multi-View Representation Learning
We consider learning representations (features) in the setting in which we have access to multiple unlabeled views of the data for representation learning while only one view is available at test ...
  • 366
  • 72
A Class of Submodular Functions for Document Summarization
We design a class of submodular functions meant for document summarization tasks. These functions each combine two terms, one which encourages the summary to be representative of the corpus, and the ...
  • 562
  • 64
Unsupervised pattern discovery in human chromatin structure through genomic segmentation
We trained Segway, a dynamic Bayesian network method, simultaneously on chromatin data from multiple experiments, including positions of histone modifications, transcription-factor binding and open ...
  • 415
  • 50
An integrated encyclopedia of DNA elements in the human genome
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has ...
  • 2,367
  • 43
MVA Processing of Speech Features
In this paper, we investigate a technique consisting of mean subtraction, variance normalization and time sequence filtering. Unlike other techniques, it applies auto-regression moving-average (ARMA) ...
  • 228
  • 40
Multi-document Summarization via Budgeted Maximization of Submodular Functions
We treat the text summarization problem as maximizing a submodular function under a budget constraint. We show, both theoretically and empirically, a modified greedy algorithm can efficiently solve ...
  • 331
  • 37
Factored Language Models and Generalized Parallel Backoff
We introduce factored language models (FLMs) and generalized parallel backoff (GPB). An FLM represents words as bundles of features (e.g., morphological classes, stems, data-driven clusters, etc.), ...
  • 323
  • 28
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology
Modern microprocessors can achieve high performance on linear algebra kernels but this currently requires extensive machine-specific hand tuning. We have developed a methodology whereby near-peak ...
  • 425
  • 22