Staal A. Vinterbo

Learn More
The problem of disseminating a data set for machine learning while controlling the disclosure of data source identity is described using a commuting diagram of functions. This formalization is used to present and analyze an optimization problem balancing privacy and data utility requirements. The analysis points to the application of a generalization(More)
Monitoring vital signs and locations of certain classes of ambulatory patients can be useful in overcrowded emergency departments and at disaster scenes, both on-site and during transportation. To be useful, such monitoring needs to be portable and low cost, and have minimal adverse impact on emergency personnel, e.g., by not raising an excessive number of(More)
MOTIVATION Interpretation of classification models derived from gene-expression data is usually not simple, yet it is an important aspect in the analytical process. We investigate the performance of small rule-based classifiers based on fuzzy logic in five datasets that are different in size, laboratory origin and biomedical domain. RESULTS The(More)
Privacy concerns are among the major barriers to efficient secondary use of information and data on humans. Differential privacy is a relatively recent measure that has received much attention in machine learning as it quantifies individual risk using a strong cryptographically motivated notion of privacy. At the core of differential privacy lies the(More)
This paper evaluates the variable selection performed by several machine-learning techniques on a myocardial infarction data set. The focus of this work is to determine which of 43 input variables are considered relevant for prediction of myocardial infarction. The algorithms investigated were logistic regression (with stepwise, forward, and backward(More)
We analyze the discriminatory power of k-nearest neighbors, logistic regression, artificial neural networks (ANNs), decision tress, and support vector machines (SVMs) on the task of classifying pigmented skin lesions as common nevi, dysplastic nevi, or melanoma. Three different classification tasks were used as benchmarks: the dichotomous problem of(More)
One of the common limitations of expert systems for medical diagnosis is that they make an implicit assumption that multiple disorders do not co-occur in a single patient. The need for this simplifying assumption stems from the fact that finding minimal sets of disorders that cover all symptoms for a given patient is generally computationally intractable(More)
A set S that has a non-empty intersection with every set in a collection of sets C is called a hitting set of C. If no elements can be removed from S without violating the hitting set property, then we say that S is minimal. Several interesting problems can in part be formulated as that of having to ®nd one or more minimal hitting sets. Many of these(More)
Actual use of regression models in clinical practice depends on model simplicity. Reducing the number of variables in a model contributes to this goal. The quality of a particular selection of variables for a logistic regression model can be defined in terms of the number of variables selected and the model's discriminatory performance, as measured by the(More)
We investigate the use of perceptrons for classification of microarray data where we use two datasets that were published in [Nat. Med. 7 (6) (2001) 673] and [Science 286 (1999) 531]. The classification problem studied by Khan et al. is related to the diagnosis of small round blue cell tumours (SRBCT) of childhood which are difficult to classify both(More)