A Mutual Information Maximization Perspective of Language Representation Learning

@article{Kong2019AMI,
  title={A Mutual Information Maximization Perspective of Language Representation Learning},
  author={Lingpeng Kong and Cyprien de Masson d'Autume and Wang Ling and Lei Yu and Zihang Dai and Dani Yogatama},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.08350}
}
We show state-of-the-art word representation learning methods maximize an objective function that is a lower bound on the mutual information between different parts of a word sequence (i.e., a sentence). Our formulation provides an alternative perspective that unifies classical word embedding models (e.g., Skip-gram) and modern contextual embeddings (e.g., BERT, XLNet). In addition to enhancing our theoretical understanding of these methods, our derivation leads to a principled framework that…
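
The abstract does not reproduce the bound itself. As a rough illustration only (not the paper's exact objective), the sketch below computes the InfoNCE estimator, a standard contrastive lower bound on mutual information of the kind the paper builds on. The function name, the dot-product critic, and the toy Gaussian "views" are assumptions made here for demonstration.

import numpy as np

def infonce_lower_bound(scores):
    """InfoNCE estimate of mutual information from an (N, N) score matrix.

    scores[i, j] is a critic score f(x_i, y_j); diagonal entries pair each
    x_i with its positive y_i, off-diagonal entries act as negatives.
    Returns a lower bound on I(X; Y) in nats, capped at log(N).
    """
    n = scores.shape[0]
    # Row-wise log-softmax: log-probability the critic assigns to the true pair.
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # Average diagonal log-probability plus log N gives the InfoNCE bound.
    return np.log(n) + np.mean(np.diag(log_probs))

# Toy example: correlated Gaussian "views" scored with a dot-product critic.
rng = np.random.default_rng(0)
x = rng.normal(size=(128, 16))
y = x + 0.1 * rng.normal(size=(128, 16))   # positive pairs share structure
bound = infonce_lower_bound(x @ y.T)
print(f"InfoNCE lower bound on I(X; Y): {bound:.3f} nats (max log N = {np.log(128):.3f} nats)")

In this view, Skip-gram's negative sampling and BERT-style masked prediction both amount to choosing particular "parts" of the sequence as the two views x and y and a particular parameterization of the critic.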
