Construction of precise support vector machine based models for predicting promoter strength
Cytochromes P450 (CYPs) are crucial targets when predicting the ADME (absorption, distribution, metabolism, and excretion) properties of drugs in development. In particular, CYP-mediated drug-drug interactions are responsible for major failures in the drug design process. Accurate and robust screening filters are thus needed to predict interactions of potent compounds with CYPs as early as possible in the process. In recent years, a growing number of 3D structures of various CYP isoforms have been solved, opening the door to accurate structure-based studies of these interactions. Nevertheless, the ligand-based approach remains popular. This success can be explained by the growing amount of available data and the satisfactory performance of existing machine learning (ML) methods. The aim of this contribution is to give an overview of recent achievements in ML applications to CYP datasets. In particular, popular methods such as support vector machines, decision trees, artificial neural networks, k-nearest neighbors, and partial least squares will be compared, along with the quality of the datasets and the descriptors used. Consensus of different methods will also be discussed. The models, which often reach 90% accuracy, will be analyzed to highlight the key descriptors that enable good prediction of CYP binding.