The goal of clustering is to detect the presence of distinct groups in a data set and assign group labels to the observations. Nonparametric clustering is based on the premise that the observations may be regarded as a sample from some underlying density in feature space and that groups correspond to modes of this density. The goal then is to find the modes…

High density clusters can be characterized by the connected components of a level set L(λ) = {x : p(x) > λ} of the underlying probability density function p generating the data, at some appropriate level λ ≥ 0. The complete hierarchical clustering can be characterized by a cluster tree T = ⋃_λ L(λ). In this paper, we study the behavior of a density level set…
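The level-set definition above can be illustrated with a minimal sketch. It assumes a one-dimensional, piecewise constant (histogram) density, so the connected components of L(λ) are simply maximal runs of adjacent bins whose height exceeds λ. The function name `level_set_components` and the bin heights are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import List, Tuple


def level_set_components(density: List[float], lam: float) -> List[Tuple[int, int]]:
    """Return maximal runs of consecutive bins whose density exceeds lam.

    Each run is an (inclusive start, inclusive end) pair of bin indices and
    stands in for one connected component of L(lam) = {x : p(x) > lam}.
    """
    components = []
    start = None
    for i, p in enumerate(density):
        if p > lam:
            if start is None:
                start = i  # a new component begins at this bin
        elif start is not None:
            components.append((start, i - 1))  # the current component ends
            start = None
    if start is not None:
        components.append((start, len(density) - 1))
    return components


# Hypothetical bimodal histogram heights over 8 equal-width bins.
heights = [0.05, 0.30, 0.20, 0.02, 0.01, 0.15, 0.25, 0.02]

for lam in (0.0, 0.03, 0.22):
    print(lam, level_set_components(heights, lam))
```

Raising λ from 0.0 to 0.03 splits the single component into two, one around each mode. Recording such splits across all levels is what a cluster tree summarizes.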

- Samuel L Ventura, Rebecca Nugent, Erica R H Fuchs
- 2015

To date, methods used to disambiguate inventors in the United States Patent and Trademark Office (USPTO) database have been rule- and threshold-based (requiring and leveraging expert knowledge) or semi-supervised algorithms trained on statistically generated artificial labels. Using a large, hand-disambiguated set of 98,762 labeled USPTO inventor records…

A fundamental goal of educational research is identifying students' current stage of skill mastery (complete/partial/none). In recent years a number of cognitive diagnosis models have become a popular means of estimating student skill knowledge. However, these models become difficult to estimate as the number of students, items, and skills grows. There…

While students' skill set profiles can be estimated with formal cognitive diagnosis models [8], their computational complexity makes simpler proxy skill estimates attractive [1, 4, 6]. These estimates can be clustered to generate groups of similar students. Often hierarchical agglomerative clustering or k-means clustering is utilized, requiring, for K…

- Rebecca Nugent, Werner Stuetzle
- 2008

We present a plug-in method for estimating the cluster tree of a density. The method takes advantage of the ability to exactly compute the level sets of a piecewise constant density estimate. We then introduce clustering with confidence, an automatic pruning procedure that assesses significance of splits (and thereby clusters) in the cluster tree; the only…

In educational research, a fundamental goal is identifying which skills students have mastered, which skills they have not, and which skills they are in the process of mastering. As the number of examinees, items, and skills increases, the estimation of even simple cognitive diagnosis models becomes difficult. We adopt a faster, simpler approach: cluster a…

- Brian Nelson, Rebecca Nugent, André A Rupp
- 2012

This special issue of JEDM was dedicated to bridging work done in the disciplines of educational and psychological assessment and educational data mining (EDM) via the assessment design and implementation framework of evidence-centered design (ECD). It consisted of a series of five papers: one conceptual paper on ECD, three applied case studies that use ECD…

In educational research, a fundamental goal is identifying which skills students have mastered, which skills they have not, and which skills they are in the process of mastering. As the number of examinees, items, and skills increases, the estimation of even simple cognitive diagnosis models becomes difficult. To address this, we introduce a capability…

- Daniel J McDonald, Matt Harrison, Surya Tokdar +24 others
- 2015

In this thesis, I derive generalization error bounds (bounds on the expected inaccuracy of the predictions) for time series forecasting models. These bounds allow forecasters to select among competing models, and to declare that, with high probability, their chosen model will perform well, without making strong assumptions about the data generating…