• Publications
  • Influence
Compressing Pre-trained Language Models by Matrix Decomposition
A two-stage model-compression method to reduce a model’s inference time cost by first decomposing the matrices in the model into smaller matrices and then performing feature distillation on the internal representation to recover from the decomposition.
Transfer Learning Between Related Tasks Using Expected Label Proportions
A novel application of the XR framework for transfer learning between related tasks, where knowing the labels of task A provides an estimation of the label proportion of task B, and a stochastic batched approximation procedure is proposed.