A data-driven approach to predict Small-for-Gestational-Age infants


This work studies the problem of identifying risk factors for Small for Gestational Age (SGA) and building classifiers for SGA prediction. SGA infants have recently received increasing attention because the condition can cause difficulties that persist throughout their lives. Some experts have begun to study the risk factors of SGA onset using traditional statistical methods. Others have used logistic regression (LR) to construct SGA prediction models. Meanwhile, machine learning has evolved and is envisioned as a tool that could identify babies with SGA. This work tests several feature selection methods. Based on the risk factors they select, it trains support vector machine, random forest, and LR models and evaluates them via 10-fold cross-validation in terms of precision and area under the receiver operating characteristic curve. The results show that sparse LR, one of the wrapper algorithms, is the most effective feature selection method. In addition, this work compares data-driven factors with knowledge-driven factors and shows that feature selection is necessary and effective. Among the trained classifiers, the LR model achieves the best performance on the data-driven factors.

Keywords: Classification; feature selection; machine learning; prediction model; small for gestational age
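The evaluation protocol named in the abstract (10-fold cross-validation scored by precision and area under the ROC curve) can be sketched in plain Python. This is an illustrative sketch only, not the authors' code: the function names, the stratified fold assignment, the 0.5 decision threshold, and the toy midpoint classifier standing in for the SVM, random forest, and LR models are all assumptions.

```python
import math
import random


def precision(y_true, y_pred):
    """Precision = TP / (TP + FP); 0.0 when nothing is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k=10, seed=0):
    """Deal shuffled indices of each class round-robin into k test folds.

    Stratification is an assumption here; it keeps both classes in every
    fold so that AUC is defined on each one.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validate(X, y, fit, score, k=10, threshold=0.5):
    """Return (mean precision, mean AUC) over k held-out folds.

    fit(X_train, y_train) returns a model; score(model, x) returns a float
    where larger means "more likely positive".
    """
    folds = stratified_folds(y, k)
    precisions, aucs = [], []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        s = [score(model, X[i]) for i in test]
        y_te = [y[i] for i in test]
        precisions.append(precision(y_te, [int(v >= threshold) for v in s]))
        aucs.append(roc_auc(y_te, s))
    return sum(precisions) / len(folds), sum(aucs) / len(folds)


# Hypothetical stand-in classifier: thresholds the first feature at the
# midpoint of the two class means and squashes the distance to (0, 1).
def fit_toy(X, y):
    mean0 = sum(x[0] for x, t in zip(X, y) if t == 0) / y.count(0)
    mean1 = sum(x[0] for x, t in zip(X, y) if t == 1) / y.count(1)
    return (mean0 + mean1) / 2.0


def score_toy(midpoint, x):
    return 1.0 / (1.0 + math.exp(-(x[0] - midpoint)))
```

Calling `cross_validate(X, y, fit_toy, score_toy)` on a labelled dataset returns the two averaged metrics. Any model exposing a fit step and a scoring step could be substituted for the toy functions, which is how the three classifier families in the abstract would be compared on the same folds.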

DOI: 10.1109/ICNSC.2016.7479016


Cite this paper

@inproceedings{Sun2016ADA, title={A data-driven approach to predict Small-for-Gestational-Age infants}, author={Jingchao Sun and Lu Liu and Jianqiang Li and Ji-Jiang Yang and Shi Chen and Qing Wang and MengChu Zhou and Rong Lia and Bo Liu and Jing Bi}, booktitle={ICNSC}, year={2016} }