Published 2019

A Mutual Information Maximization Perspective of Language Representation Learning

@inproceedings{2019AMI,
  title={A Mutual Information Maximization Perspective of Language Representation Learning},
  author={},
  year={2019}
}
    We show state-of-the-art word representation learning methods maximize an objective function that is a lower bound on the mutual information between different parts of a word sequence (i.e., a sentence). Our formulation provides an alternative perspective that unifies classical word embedding models (e.g., Skip-gram) and modern contextual embeddings (e.g., BERT, XLNet). In addition to enhancing our theoretical understanding of these methods, our derivation leads to a principled framework that…
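
    As a point of reference, the lower bound the abstract refers to is typically of the InfoNCE form (an assumption here; the excerpt does not state the paper's exact objective). For two views A and B of a sentence, a critic f, and K candidate samples b_1, ..., b_K (one drawn jointly with a, the rest negatives from the marginal), the bound reads

        I(A; B) \ge \mathbb{E}\left[\log \frac{e^{f(a,\, b)}}{\frac{1}{K}\sum_{j=1}^{K} e^{f(a,\, b_j)}}\right]

    On one plausible reading of the unification described above, Skip-gram with negative sampling and masked-language-model training differ only in the choice of views and critic: a center word paired with a context word in the former, a masked position's context paired with the masked token in the latter.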
