Corpus ID: 227239041

Profile Prediction: An Alignment-Based Pre-Training Task for Protein Sequence Models

  • Pascal Sturmfels, J. Vig, Ali Madani, Nazneen Rajani
  • Published 2020
  • Computer Science, Biology
  • ArXiv
  • For protein sequence datasets, unlabeled data has greatly outpaced labeled data due to the high cost of wet-lab characterization. Recent deep-learning approaches to protein prediction have shown that pre-training on unlabeled data can yield useful representations for downstream tasks. However, the optimal pre-training strategy remains an open question. Instead of strictly borrowing from natural language processing (NLP) in the form of masked or autoregressive language modeling, we introduce a…
    1 Citation


    MSA Transformer


    UDSMProt: universal deep sequence models for protein classification
    Evaluating Protein Transfer Learning with TAPE
    ProteinNet: a standardized data set for machine learning of protein structure
    Learning protein sequence embeddings using information from structure
    ProGen: Language Modeling for Protein Generation