Behrouz Minaei-Bidgoli

Clustering ensembles combine multiple partitions of the given data into a single clustering solution of better quality. Inspired by the success of supervised boosting algorithms, we devise an adaptive scheme for integration of multiple non-independent clusterings. Individual partitions in the ensemble are sequentially generated by clustering specially...
Association rule discovery is an ever-increasing area of interest in data mining. Finding rules for attributes with numerical values is still a challenging point in the process of association rule discovery. Most popular methods for association rule mining cannot be applied to numerical data without data discretization. There have been efforts to...
1 Behrouz Minaei-Bidgoli, Michigan State University, Department of Computer Science, Genetic Algorithms Research and Applications Group (GARAGe), minaeibi@cse.msu.edu; 2 Deborah A. Kashy, Michigan State University, Department of Psychology, kashyd@msu.edu; 3 Gerd Kortemeyer, Michigan State University, Division of Science and Math Education, korte@lite.msu.edu...
This paper presents an approach for classifying students in order to predict their final grade based on features extracted from logged data in a web-based education system. A combination of multiple classifiers leads to a significant improvement in classification performance. Through weighting the feature vectors using a Genetic Algorithm we can optimize...
Combination of multiple clusterings is an important task in the area of unsupervised learning. Inspired by the success of supervised bagging algorithms, we propose a resampling scheme for integration of multiple independent clusterings. Individual partitions in the ensemble are sequentially generated by clustering specially selected subsamples of the given...
In this paper, the efficacy of seven data classification methods, namely Decision Tree (DT), k-Nearest Neighbor (k-NN), Logistic Regression (LogR), Naïve Bayes (NB), C4.5, Support Vector Machine (SVM) and Linear Classifier (LC), has been compared with regard to the Area Under Curve (AUC) metric. The effects of parameters including size of the dataset, kind of the...
The combination of multiple clusterings is a difficult problem in the practice of distributed data mining. Both the cluster generation mechanism and the partition integration process influence the quality of the combinations. We propose a data resampling approach for building cluster ensembles that are both robust and stable. In particular, we investigate...
Many stability measures, such as Normalized Mutual Information (NMI), have been proposed to validate a set of partitionings. It is quite possible that a set of partitionings contains one (or more) high-quality cluster(s) but is still adjudged a bad cluster by a stability measure and, as a result, is completely neglected. Inspired by evaluation...
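As a rough illustration of the NMI measure mentioned in the abstract above, the following sketch computes NMI between two partitions of the same objects. The function names are hypothetical, and normalizing by the geometric mean of the two entropies is an assumption: it is one common convention, and the abstract does not say which variant the authors use.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a cluster labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same objects."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI normalized by sqrt(H(a) * H(b)); this normalization is an assumption."""
    if len(a) != len(b):
        raise ValueError("partitions must label the same objects")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        # A single-cluster partition has zero entropy; treat two such
        # partitions as identical and any other pairing as unrelated.
        return 1.0 if h_a == h_b else 0.0
    return mutual_information(a, b) / sqrt(h_a * h_b)
```

Because NMI depends only on how objects are grouped, relabeling the clusters of a partition does not change the score, which is why it is suitable for comparing partitions produced by different clustering runs.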
One of the main tasks related to multiword expressions (MWEs) is compound verb identification. There has been much work on unsupervised identification of multiword verbs in many languages, but there has not yet been any conspicuous work on the Persian language. Persian multiword verbs (known as compound verbs) are a kind of light verb construction (LVC)...