• Publications
COVID-CT-Dataset: A CT Scan Dataset about COVID-19
TLDR
We build a publicly available CT scan dataset about COVID-19, to foster the development of AI methods that use CT to screen and test for COVID-19.
  • 142
  • 27
Petuum: A New Platform for Distributed Machine Learning on Big Data
  • E. Xing, Q. Ho, +7 authors Y. Yu
  • Computer Science
  • IEEE Transactions on Big Data
  • 30 December 2013
TLDR
We propose a general-purpose framework, Petuum, that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions.
  • 199
  • 24
Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters
TLDR
We present Poseidon, an efficient communication architecture for distributed DL on GPUs.
  • 148
  • 24
On the Automatic Generation of Medical Imaging Reports
TLDR
We propose a co-attention mechanism to localize regions containing abnormalities and generate narrations for them, and develop a hierarchical LSTM model to generate long paragraphs.
  • 112
  • 20
Petuum: A New Platform for Distributed Machine Learning on Big Data
TLDR
We propose a general-purpose framework, Petuum, that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions.
  • 189
  • 18
Integrating Document Clustering and Topic Modeling
TLDR
We propose a multi-grain clustering topic model (MGCTM) which integrates document clustering and topic modeling into a unified framework and jointly performs the two tasks to achieve the overall best performance.
  • 133
  • 13
Towards Automated ICD Coding Using Deep Learning
TLDR
We use character-aware neural language models to generate hidden representations of written diagnosis descriptions and ICD codes, and design an attention mechanism to address the mismatch between the numbers of descriptions and corresponding codes.
  • 57
  • 13
Incorporating Word Correlation Knowledge into Topic Modeling
TLDR
We build a Markov Random Field (MRF) regularized Latent Dirichlet Allocation (LDA) model, which defines an MRF on the latent topic layer of LDA to encourage words labeled as similar to share the same topic label.
  • 69
  • 7
Crypto-Nets: Neural Networks over Encrypted Data
TLDR
We use homomorphic encryption in the following protocol: the data owner encrypts the data and sends the ciphertexts to the third party to obtain a prediction from a trained model.
  • 86
  • 6
Diversifying Restricted Boltzmann Machine for Document Modeling
TLDR
We propose the Diversified RBM, which diversifies the hidden units so that they cover not only the dominant topics but also those in the long-tail region.
  • 63
  • 5