We carried out metagenomic shotgun sequencing and a metagenome-wide association study (MGWAS) of fecal, dental and salivary samples from a cohort of individuals with rheumatoid arthritis (RA) and healthy controls. Concordance was observed between the gut and oral microbiomes, suggesting overlap in the abundance and function of species at different body…
While Distance Weighted Discrimination (DWD) is an appealing approach to classification in high dimensions, it was designed for balanced datasets. In the case of unequal costs, biased sampling, or unbalanced data, there are major improvements available, using appropriately weighted versions of DWD (wDWD). A major contribution of this paper is the…
In multicategory classification, standard techniques typically treat all classes equally. This treatment can be problematic when the dataset is unbalanced in the sense that certain classes have very small class proportions compared to others. The minority classes may be ignored or discounted during the classification process due to their small proportions.
Classification is an important topic in statistics and machine learning with great potential in many real applications. In this paper, we investigate two popular large-margin classification methods, Support Vector Machine (SVM) and Distance Weighted Discrimination (DWD), in two contexts: high-dimensional, low-sample-size data and imbalanced data.
Set classification problems arise when classification tasks are based on sets of observations as opposed to individual observations. In set classification, a classification rule is trained with N sets of observations, where each set is labeled with class information, and the prediction of a class label is performed also with a set of observations. Data sets…
The primary objectives of this paper are: (1) to apply Statistical Learning Theory (SLT), specifically Partial Least Squares (PLS) and Kernelized PLS (K-PLS), to the universal "feature-rich/case-poor" (also known as "large p, small n", or "high-dimension, low-sample-size") microarray problem by eliminating those features (or probes) that do not contribute to…
Predicting the recurrence of non-small cell lung cancer remains a clinical challenge. The current best practice employs heuristic decisions based on the TNM classification scheme that many believe can be improved upon. Much research has recently been devoted to searching for gene signatures derived from gene expression microarrays for this challenge, but a…