Corpus ID: 18923178

Classification of Grouped Data

  Mahdi Azmandian, G. Kowshik, Anqi Wu and Yu-Han Chang
Data often occurs in groups, such as a set of noisy measurements or multiple samples collected from some organism. Classical machine learning tends to consider datasets that consist of individually labeled points, rather than datasets where sets of similarly-labeled points may be grouped together. First we introduce this problem formulation, and then propose a set of potential solutions, which we broadly call a Distribution-Based Approach (DBA). DBA leverages the structure of the problem, and… 



Analyzing the effectiveness and applicability of co-training

It is demonstrated that, when learning from labeled and unlabeled data, algorithms that explicitly leverage a natural independent split of the features outperform algorithms that do not, and may even outperform algorithms that use no split at all.

A Survey on Transfer Learning

The relationship between transfer learning and other related machine learning techniques, such as domain adaptation, multitask learning, sample selection bias, and covariate shift, is discussed.

Combining labeled and unlabeled data with co-training

A PAC-style analysis is provided for a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views, allowing inexpensive unlabeled data to augment a much smaller set of labeled examples.
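The two-view idea behind co-training can be sketched in code. In the toy loop below, a nearest-centroid learner stands in for whatever base classifier one would actually use (the names and the confidence rule are illustrative, not the paper's exact procedure): each round, each view's classifier labels its most confident unlabeled points, and those points join the shared labeled pool for both views.

```python
import numpy as np

class CentroidClassifier:
    """Toy per-view learner: predicts the class whose centroid is nearest."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_conf(self, X):
        # distances to every class centroid
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        pred = self.classes_[d.argmin(axis=1)]
        # confidence = margin between the two nearest centroids
        sorted_d = np.sort(d, axis=1)
        conf = sorted_d[:, 1] - sorted_d[:, 0]
        return pred, conf

def co_train(X1_l, X2_l, y_l, X1_u, X2_u, rounds=3, k=2):
    """Each round, each view's learner labels its k most confident
    unlabeled points; the newly labeled points enter both views' pools."""
    pool = list(range(len(X1_u)))  # indices of still-unlabeled points
    for _ in range(rounds):
        if not pool:
            break
        newly = []
        for view in (0, 1):
            Xl = X1_l if view == 0 else X2_l
            Xu = (X1_u if view == 0 else X2_u)[pool]
            pred, conf = CentroidClassifier().fit(Xl, y_l).predict_conf(Xu)
            for t in np.argsort(-conf)[:k]:
                newly.append((pool[t], pred[t]))
        for idx, lab in newly:
            if idx in pool:  # skip duplicates picked by both views
                pool.remove(idx)
                X1_l = np.vstack([X1_l, X1_u[idx]])
                X2_l = np.vstack([X2_l, X2_u[idx]])
                y_l = np.append(y_l, lab)
    return X1_l, X2_l, y_l
```

The key design point is that the two views are conditionally independent given the label, so a point that one view labels confidently is informative training data for the other view's classifier.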

Boosting for transfer learning

This paper presents a novel transfer learning framework called TrAdaBoost, which extends boosting-based learning algorithms, and shows that this method allows an accurate model to be learned from only a tiny amount of new data plus a large amount of old data, even when the new data alone are not sufficient to train a model.
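The mechanism TrAdaBoost adds to boosting can be illustrated with a single round's weight update: misclassified old-domain (source) examples are down-weighted by a fixed factor, so examples that conflict with the new task gradually lose influence, while misclassified new-domain (target) examples are up-weighted AdaBoost-style. The sketch below is a simplified reading of that rule, not the paper's full pseudocode; the function name and the numerical guards are our own.

```python
import math

def tradaboost_reweight(w_src, w_tgt, err_src, err_tgt, n_src, n_rounds):
    """One illustrative TrAdaBoost-style weight update.

    err_*: per-example 0/1 errors of the current weak learner.
    Misclassified source examples are down-weighted by a fixed beta_src;
    misclassified target examples are up-weighted AdaBoost-style.
    """
    # weighted error of the weak learner, measured on the target data only
    eps = sum(w * e for w, e in zip(w_tgt, err_tgt)) / sum(w_tgt)
    eps = min(max(eps, 1e-6), 0.499)  # keep the update well-defined

    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_src) / n_rounds))
    beta_tgt = eps / (1.0 - eps)

    new_src = [w * beta_src ** e for w, e in zip(w_src, err_src)]   # shrink if wrong
    new_tgt = [w * beta_tgt ** (-e) for w, e in zip(w_tgt, err_tgt)]  # grow if wrong
    return new_src, new_tgt
```

Note the asymmetry: source weights can only shrink, so old-domain examples that the current hypothesis keeps getting wrong are effectively filtered out over the boosting rounds.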

Self-taught learning: transfer learning from unlabeled data

An approach to self-taught learning that uses sparse coding to construct higher-level features from the unlabeled data, forming a succinct input representation that significantly improves classification performance.

Multitask Learning

Prior work on MTL is reviewed, new evidence is presented that MTL in backprop nets discovers task relatedness without the need for supervisory signals, and new results are reported for MTL with k-nearest neighbor and kernel regression.

Transductive Learning via Spectral Graph Partitioning

This work proposes an algorithm that robustly achieves good generalization performance and can be trained efficiently, shows a connection to transductive Support Vector Machines, and shows that an effective Co-Training algorithm arises as a special case.

Trajectory Outlier Detection: A Partition-and-Detect Framework

A novel partition-and-detect framework for trajectory outlier detection is proposed, which partitions a trajectory into a set of line segments and then detects outlying line segments as trajectory outliers.

Trajectory clustering: a partition-and-group framework

A new partition-and-group framework for clustering trajectories is proposed, which partitions a trajectory into a set of line segments and then groups similar line segments together into clusters; on this framework, the trajectory clustering algorithm TRACLUS is developed, which discovers common sub-trajectories from real trajectory data.
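The partition phase of such a framework can be illustrated with a deliberately simplified rule: keep a trajectory point as a characteristic point whenever the heading changes sharply, so that the segments between kept points are roughly straight. TRACLUS itself selects characteristic points by minimizing an MDL cost rather than a fixed angle threshold, so the rule below is only a stand-in for illustration.

```python
import math

def _heading(a, b):
    """Direction of the segment a -> b, in radians."""
    return math.atan2(b[1] - a[1], b[0] - a[0])

def partition(traj, angle_thresh_deg=30.0):
    """Simplified partition step: keep an interior point as a characteristic
    point when the heading turns by more than angle_thresh_deg there.
    Returns the characteristic points; consecutive pairs form the segments
    that the grouping phase would then cluster."""
    if len(traj) < 3:
        return list(traj)
    cps = [traj[0]]
    for i in range(1, len(traj) - 1):
        turn = abs(math.degrees(_heading(traj[i], traj[i + 1]) -
                                _heading(traj[i - 1], traj[i]))) % 360
        if min(turn, 360 - turn) > angle_thresh_deg:
            cps.append(traj[i])
    cps.append(traj[-1])
    return cps
```

Partitioning first is what lets the grouping phase find common sub-trajectories: two trajectories that share only a stretch of their routes still contribute similar line segments to the same cluster.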

Is Learning The n-th Thing Any Easier Than Learning The First?

  • S. Thrun
  • Computer Science, Education
  • 1995
It is shown that across the board, lifelong learning approaches generalize consistently more accurately from less training data, by their ability to transfer knowledge across learning tasks.