Cost-Sensitive Batch Mode Active Learning: Designing Astronomical Observation by Optimizing Telescope Time and Telescope Choice

Authors: Xide Xia, Pavlos Protopapas, Finale Doshi-Velez
Astronomers and telescope operators must make decisions about what to observe given limited telescope time. To optimize this decision-making process, we present a batch, cost-sensitive, active learning approach that exploits structure in the unlabeled dataset, accounts for label uncertainty, and minimizes annotation costs. We first cluster the unlabeled instances in feature space. We next introduce an uncertainty-reducing selection criterion that encourages the batch of selected instances to… 
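The abstract is truncated, so the exact selection criterion is not available here. As a rough illustration of the general idea only (clustered instances, label uncertainty, and per-observation cost), the following minimal sketch greedily picks a batch under a cost budget. The function names, the entropy-per-cost score, and the one-pick-per-cluster rule are assumptions made for illustration and are not the authors' method; cluster assignments are assumed to come from a separate clustering step, and costs are assumed to be positive.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_batch(
    class_probs: List[Sequence[float]],
    cluster_ids: List[int],
    costs: List[float],
    budget: float,
) -> List[int]:
    """Hypothetical greedy batch selection under a total cost budget.

    Instances are ranked by predictive entropy per unit cost (an assumed
    score, not the paper's criterion). At most one instance is taken per
    cluster so that the batch is spread across feature space.
    """
    scores = [entropy(p) / c for p, c in zip(class_probs, costs)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    chosen: List[int] = []
    used_clusters = set()
    spent = 0.0
    for i in order:
        if cluster_ids[i] in used_clusters:
            continue  # this cluster is already represented in the batch
        if spent + costs[i] > budget:
            continue  # observing this instance would exceed the budget
        chosen.append(i)
        used_clusters.add(cluster_ids[i])
        spent += costs[i]
    return chosen


# Toy example: four objects in two clusters, with made-up telescope costs.
probs = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.5, 0.5]]
clusters = [0, 0, 1, 1]
costs = [1.0, 1.0, 1.0, 4.0]
batch = select_batch(probs, clusters, costs, budget=2.0)
print(batch)
```

In this toy setup the expensive object (index 3) is as uncertain as index 0, but its cost lowers its score, so the cheaper uncertain object from the same cluster is preferred.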

Deep Similarity-Based Batch Mode Active Learning with Exploration-Exploitation

The proposed deep neural network based algorithm outperforms the baselines with both higher classification accuracy and faster convergence rate on a variety of classification tasks: MNIST classification, opinion polarity detection, and heart failure prediction.

Budgeted Batch Mode Active Learning with Generalized Cost and Utility Functions

This paper presents a learning framework that actively selects an optimal set of examples within a given budget, based on given utility and cost functions, and proposes a novel utility function based on the Facility Location problem that considers three important characteristics of utility: diversity, density, and point utility.

Optimizing spectroscopic follow-up strategies for supernova photometric classification with active learning

A framework for spectroscopic follow-up design for optimizing supernova photometric classification is reported, which achieves 2.3 times higher purity and comparable figure of merit results after only 180 d of observation, or 800 queries, and is able to double purity results.

An Information Theory Approach on Deciding Spectroscopic Follow-ups

This work proposes a methodology in a probabilistic setting that determines a priori which objects are worth taking a spectrum of to obtain better insights, and provides a general framework for follow-up strategies that can be extended beyond classification and to other forms of follow-up beyond spectroscopy.

PS3: Partition-Based Skew-Specialized Sampling for Batch Mode Active Learning in Imbalanced Text Data

A novel partition-based batch mode active learning framework that pre-selects a subset of most uncertain data items and then selects a representative set from this uncertainty space to tackle the class-skew problem, and uses a data-driven skew-specialized cluster representation with a higher potential to “cherry pick” minority classes.

Galaxy Zoo: Probabilistic Morphology through Bayesian CNNs and Active Learning

By combining human and machine intelligence, Galaxy Zoo will be able to classify surveys of any conceivable scale on a timescale of weeks, providing massive and detailed morphology catalogues to support research into galaxy evolution.

Active Anomaly Detection for time-domain discoveries

The first application of adaptive machine learning to the identification of anomalies in a data set of non-periodic astronomical light curves is presented, providing the first evidence that AAD algorithms can play a central role in the search for new physics in the era of large scale sky surveys.

A Survey of Deep Active Learning

A formal classification method for the existing work in deep active learning is provided, along with a comprehensive and systematic overview, to investigate whether AL can be used to reduce the cost of sample annotation while retaining the powerful learning capabilities of DL.

Active Learning Strategy for COVID-19 Annotated Dataset

A novel discriminative batch-mode active learning method (DS3) is proposed to allow faster and more effective COVID-19 data annotation, and the results of significance testing verify the effectiveness of DS3 and its superiority over baseline active learning algorithms.




It is argued that AL—where the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up—is an effective approach and is appropriate for many astronomical applications.

Active Learning by Querying Informative and Representative Examples

The proposed QUIRE approach provides a systematic way for measuring and combining the informativeness and representativeness of an unlabeled instance by incorporating the correlation among labels and is extended to multi-label learning by actively querying instance-label pairs.

Batch mode active sampling based on marginal probability distribution matching

A novel criterion is proposed which achieves good generalization performance of a classifier by specifically selecting a set of query samples that minimizes the difference in distribution between the labeled and the unlabeled data, after annotation.

Guided Feature Labeling for Budget-Sensitive Learning Under Extreme Class Imbalance

This work presents an alternative to active feature labeling, Guided Feature Labeling; in this paradigm, human domain experts are tasked with feature labeling.

Fast and optimal nonparametric sequential design for astronomical observations

A Bayesian model averaging perspective is taken to learn astronomical objects, employing a Bayesian nonparametric approach to accommodate the deviation from convex combinations of known log-SEDs.

Incorporating Diversity in Active Learning with Support Vector Machines

This work presents a new approach that is especially designed to construct batches and incorporates a diversity measure that has low computational requirements making it feasible for large scale problems with several thousands of examples.

Paired Sampling in Density-Sensitive Active Learning

A new paired-sampling density-sensitive method is developed that embodies two key principles: balanced sampling on both sides of the decision boundary, and exploiting the natural grouping of unlabeled data to establish a more meaningful non-Euclidean distance function with respect to estimated category membership.

Dual Strategy Active Learning

A dynamic approach, called DUAL, is proposed in which the strategy selection parameters are adaptively updated based on estimated future residual error reduction after each actively sampled point, so as to outperform static strategies over a large operating range.

The Value of Unlabeled Data for Classification Problems

It is demonstrated that Fisher information matrices can be used to judge the asymptotic value of unlabeled data, and this methodology is applied to both passive partially supervised learning and active learning.

Committee-Based Sampling For Training Probabilistic Classifiers