György J. Simon

A precursor to many attacks on networks is often a reconnaissance operation, more commonly referred to as a scan. Despite the vast amount of attention focused on methods for scan detection, the state-of-the-art methods suffer from a high rate of false alarms and a low rate of scan detection. In this paper, we formalize the problem of scan detection as a data…
Type-2 Diabetes Mellitus is a growing epidemic that often leads to severe complications. Effective preventive measures exist and identifying patients at high risk of diabetes is a major health-care need. The use of association rule mining (ARM) is advantageous, as it was specifically developed to identify associations between risk factors in an…
The common neurodegenerative pathologies underlying dementia are Alzheimer's disease (AD), Lewy body disease (LBD) and frontotemporal lobar degeneration (FTLD). Our aim was to identify patterns of atrophy unique to each of these diseases using antemortem structural MRI scans of pathologically confirmed dementia cases and build an MRI-based differential…
Associative classification is a predictive modeling technique that constructs a classifier based on class association rules (also known as predictive association rules; PARs). PARs are association rules where the consequent of the rule is a class label. Associative classification has gained substantial research attention because it successfully joins the…
Mining patterns from electronic health-care records (EHRs) can potentially lead to better and more cost-effective treatments. We aim to find the groups of ICD-9 diagnosis codes from EHRs that can predict the improvement of urinary incontinence in home health care (HHC) patients and are also interpretable to domain experts. In this paper, we propose two…
Prediabetes is the most important risk factor for developing type-2 diabetes mellitus, an important and growing epidemic. Prediabetes is often associated with comorbidities including hypercholesterolemia. While statin drugs are indicated to treat hypercholesterolemia, recent reports suggest a possible increased risk of developing overt diabetes associated…
In this paper, we propose a method in which the labeling of the data set is carried out in a semi-supervised manner with user-specified guarantees about the quality of the labeling. In our scheme, we assume that for each class, we have some heuristics available, each of which can identify instances of one particular class. The heuristics are assumed to have…
Estimating the number of false negatives for a classifier when the true outcome of the classification is ascertained only for a limited number of instances is an important problem, with a wide range of applications from epidemiology to computer/network security. The frequently applied method is random sampling. However, when the target (positive) class of…
Sepsis incidents doubled from 2000 through 2008, and hospitalizations for these diagnoses increased by 70%. The use of the Surviving Sepsis Campaign (SSC) guidelines can lead to earlier diagnosis and treatment; however, the effectiveness of the SSC guidelines in preventing complications for this population is unclear. The overall purpose of this…