Lian Yan

When the goal is to achieve the best correct classification rate, cross entropy and mean squared error are typical cost functions used to optimize classifier performance. However, for many real-world classification problems, the ROC curve is a more meaningful performance measure. We demonstrate that minimizing cross entropy or mean squared error does not…
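For readers unfamiliar with ROC-based evaluation, the area under the ROC curve (AUC) can be computed as the fraction of positive/negative pairs that a classifier ranks correctly. The sketch below is a minimal illustration of that standard definition, not the method of the paper summarized above; the function name `auc_pairwise` and the sample data are hypothetical.

```python
def auc_pairwise(scores, labels):
    """Return the AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    Labels are assumed to be 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: one positive is ranked above both negatives,
# the other positive is ranked below both.
example_auc = auc_pairwise([0.9, 0.8, 0.3, 0.6], [1, 0, 1, 0])
print(example_auc)
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based computation would be preferable for large data sets.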
We propose a new method for learning a general statistical inference engine, operating on discrete and mixed discrete/continuous feature spaces. Such a model allows inference on any of the discrete features, given values for the remaining features. Applications are, e.g., to medical diagnosis with multiple possible diseases, fault diagnosis, information…
In order to effectively use machine learning algorithms, e.g., neural networks, for the analysis of survival data, the correct treatment of censored data is crucial. The concordance index (CI) is a typical metric for quantifying the predictive ability of a survival model. We propose a new algorithm that directly uses the CI as the objective function to…
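To make the role of censoring concrete, the following sketch computes a concordance index in the commonly used pairwise form: a pair is comparable only when the subject with the shorter observed time actually experienced the event. This is an illustrative evaluation routine under stated assumptions, not the optimization algorithm the abstract proposes. The function name, the convention that a larger prediction means longer predicted survival, and the decision to skip pairs with tied times are all assumptions made here.

```python
def concordance_index(times, events, preds):
    """Pairwise concordance index for right-censored survival data.

    times  : observed times (event time or censoring time)
    events : 1 if the event was observed, 0 if the subject was censored
    preds  : predicted scores; larger is assumed to mean longer survival

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]. Pairs with tied times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if preds[i] < preds[j]:
                    concordant += 1.0
                elif preds[i] == preds[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: the second subject is censored at time 4.
ci = concordance_index([2, 4, 6, 8], [1, 0, 1, 1], [1.0, 3.0, 5.0, 4.0])
print(ci)
```

In this example four pairs are comparable and three are ordered correctly; the censored subject contributes only as the longer-lived member of a pair.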
Classification has been commonly used in many data mining projects in the financial service industry. For instance, to predict collectability of accounts receivable, a binary class label is created based on whether a payment is received within a certain period. However, optimization of the classifier does not necessarily lead to maximization of return on…
A lift curve, with the true positive rate on the y-axis and the customer pull (or contact) rate on the x-axis, is often used to depict the model performance in many data mining applications, especially in the area of customer relationship management (CRM). Typically, these applications concern only the model accuracy at a relatively small pull or…
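The lift curve described above can be traced by ranking customers by model score and recording, at each cutoff, the share of customers contacted and the share of all positives captured so far. The sketch below is a minimal illustration of that definition; the function name `lift_curve` and the sample data are hypothetical and not taken from the paper.

```python
def lift_curve(scores, labels):
    """Return (pull_rate, true_positive_rate) points for a lift curve.

    Customers are ranked by descending score. After contacting the top k,
    pull_rate is k / n and true_positive_rate is the fraction of all
    positives found among those k. Labels are assumed to be 0 or 1.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")

    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = []
    captured = 0
    for k, idx in enumerate(order, start=1):
        captured += labels[idx]
        points.append((k / len(scores), captured / total_pos))
    return points


# Hypothetical scores for four customers, two of whom are positives.
curve = lift_curve([0.9, 0.1, 0.8, 0.4], [1, 0, 0, 1])
print(curve)
```

Reading only the first few points of the returned list corresponds to evaluating the model at a small pull rate, which is the operating region the abstract refers to.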