Bioinformatics approaches for improved recombinant protein production in Escherichia coli: protein solubility prediction

@article{Chang2014BioinformaticsAF,
  title={Bioinformatics approaches for improved recombinant protein production in Escherichia coli: protein solubility prediction},
  author={Catherine Ching Han Chang and Jiangning Song and Beng Ti Tey and Ramakrishnan Nagasundara Ramanan},
  journal={Briefings in bioinformatics},
  year={2014},
  volume={15},
  number={6},
  pages={953--962}
}
The solubility of recombinant protein expressed in Escherichia coli often represents the production yield. However, to date, instances of successful high-yield production of soluble recombinant proteins in the E. coli expression system remain scarce. This is mainly due to the difficulty of improving the overall production capacity, as most well-established strategies involve a series of trial-and-error steps with no guarantee of success. One way to concurrently improve the… 

Prediction of soluble heterologous protein expression levels in Escherichia coli from sequence-based features and its potential in biopharmaceutical process development
TLDR
The potential utility of this emergent technology to increase the efficiency of BD strategies and thereby to reduce the cost of establishing a process for soluble protein expression is critically examined.
Codon usage clusters correlation: towards protein solubility prediction in heterologous expression systems in E. coli
TLDR
A strong positive correlation between solubility and the degree of conservation of codons usage clusters is observed for two independent datasets and supports the notion that codon usage may dictate translation rate and modulate co-translational folding.
Recombinant expression of insoluble enzymes in Escherichia coli: a systematic review of experimental design and its manufacturing implications
TLDR
An absence of a coherent strategy with disparate practices being used to promote solubility is identified and the potential to approach recombinant expression systematically, with the aid of modern bioinformatics, modelling, and ‘omics’ based systems-level analysis techniques is discussed.
Stepwise optimization of recombinant protein production in Escherichia coli utilizing computational and experimental approaches
TLDR
A stepwise methodology linking the factors from both levels for optimizing the production of soluble recombinant protein in E. coli is proposed, which can facilitate the optimization of gene- and protein-based factors using in silico tools.
Improvement of solubility and yield of recombinant protein expression in E. coli using a two-step system
TLDR
The aim of this study was to balance the rate of protein production and the imposed cellular stresses using a two-step expression system and showed that expression yield and soluble/insoluble ratio of GFP have been increased 5 and 2.5 times in comparison with the single step process, respectively.
EPSOL: sequence-based protein solubility prediction using multidimensional embedding
TLDR
EPSOL, a novel deep learning architecture for the prediction of protein solubility in an E. coli expression system, which automatically obtains comprehensive protein feature representations using multidimensional embedding, is presented and outperformed all existing sequence-based solubility predictors.
Periscope: quantitative prediction of soluble protein expression in the periplasm of Escherichia coli
TLDR
A predictor with a two-stage architecture is presented to predict the real-valued expression level of a target protein in the periplasm, and results indicate that the occurrence of dipeptide glutamine and aspartic acid is the most important feature for the classification model.
A review of machine learning methods to predict the solubility of overexpressed recombinant proteins in Escherichia coli
TLDR
This paper presents an extensive review of the existing models to predict protein solubility in the Escherichia coli recombinant protein overexpression system and concludes that some of the models, with acceptable prediction performance and convenient user interfaces, can be considered valuable tools to predict recombinant protein solubility results before performing real laboratory experiments, thus saving labour, time and cost.
In silico screening and heterologous expression of soluble dimethyl sulfide monooxygenases of microbial origin in Escherichia coli.
TLDR
The in silico gene screening methodology established from this study can increase the success rate of producing soluble and functional enzymes while avoiding the laborious trial and error involved in the screening of a large pool of genes available.

References

SHOWING 1-10 OF 43 REFERENCES
Learning to predict expression efficacy of vectors in recombinant protein production
TLDR
It is shown that a machine learning approach to the prediction of the efficacy of a vector for a target protein in a recombinant protein production system is promising and may complement traditional knowledge-driven study of the efficacy.
PROSO II – a new method for protein solubility prediction
TLDR
A novel machine‐learning‐based model called PROSO II which makes use of new classification methods and growth in experimental data to improve coverage and accuracy of solubility predictions and constitutes a substantial improvement in protein solubility predictions.
Protein solubility: sequence based prediction and experimental verification
TLDR
A machine-learning approach called PROSO is presented to assess the chance of a protein to be soluble upon heterologous expression in Escherichia coli based on its amino acid composition and possesses improved discriminatory capacity.
New fusion protein systems designed to give soluble expression in Escherichia coli.
Three native E. coli proteins-NusA, GrpE, and bacterioferritin (BFR)-were studied in fusion proteins expressed in E. coli for their ability to confer solubility on a target insoluble protein at the
SOLpro: accurate sequence-based prediction of protein solubility
TLDR
A sequence-based prediction method able to accurately predict the propensity of a protein to be soluble on overexpression could be used to prioritize targets in large-scale proteomics projects and to identify mutations likely to increase the solubility of insoluble proteins.
Predicting the solubility of recombinant proteins in Escherichia coli.
TLDR
A statistical model that uses binomial logistic regression for predicting the solubility of heterologous proteins expressed in E. coli in either soluble or insoluble form is described.
Understanding the relationship between the primary structure of proteins and its propensity to be soluble on overexpression in Escherichia coli
TLDR
Thermostability, in vivo half‐life, Asn, Thr, and Tyr content, and tripeptide composition of a protein are correlated to the propensity of a protein to be soluble on overexpression in E. coli.
Engineering soluble proteins for structural genomics
TLDR
The utility of using a green fluorescent protein (GFP) folding reporter assay to evolve an enzymatically active, soluble variant of a hyperthermophilic protein, and determining its structure by X-ray crystallography is demonstrated, which provides insight into the substrate specificity of the enzyme and the improved solubility of the variant.
A support vector machine-based method for predicting the propensity of a protein to be soluble or to form inclusion body on overexpression in Escherichia coli
TLDR
Six physicochemical properties together with residue and dipeptide compositions have been used to develop a support vector machine-based classifier to predict the overexpression status in E. coli, and it performs reasonably well in predicting the propensity of a protein to be soluble or to form inclusion bodies.
Sequence-based prediction of protein solubility.