Feature selection strategies for classifying high dimensional astronomical data sets

Ciro Donalek, A. Arunkumar, S. George Djorgovski, Ashish A. Mahabal, Matthew J. Graham, Thomas J. Fuchs, Michael J. Turmon, Ninan Sajeeth Philip, Michael T Yang, Giuseppe Longo
2013 IEEE International Conference on Big Data

The amount of data collected in many scientific fields is increasing, and all of them share a common task: extracting knowledge from massive, multi-parametric data sets as rapidly and efficiently as possible. This is especially true in astronomy, where synoptic sky surveys are opening new research frontiers in time-domain astronomy and posing several new object classification challenges in multidimensional spaces; given the high number of parameters available for each object, feature selection…

Survey of Object-Based Data Reduction Techniques in Observational Astronomy

The main goal of this article is to describe existing datasets on which algorithms are frequently tested, to characterize and classify available data reduction algorithms and identify promising solutions capable of addressing present and future challenges in astronomy.

Automated Real-Time Classification and Decision Making in Massive Data Streams from Synoptic Sky Surveys

This work is developing a set of machine learning tools to detect, classify and plan a response to transient events for astronomy applications, using the Catalina Real-time Transient Survey (CRTS) as a scientific and methodological testbed.

Computational Intelligence Challenges and Applications on Large-Scale Astronomical Time Series Databases

An overview of machine learning and computational intelligence applications to time-domain astronomy (TDA) is presented, focusing on the LSST; future big data challenges and new lines of research in TDA are identified and discussed from the viewpoint of computational intelligence and machine learning.

Real-time data mining of massive data streams from synoptic sky surveys

Return of the features

A forward selection method is presented to compute, evaluate, and characterize better-performing features for regression and classification problems; the work demonstrates that the feature sets determined with this approach significantly improve the performance of the regression models compared with the classic features from the literature.

Classical Machine Learning Techniques in the Search of Extrasolar Planets

A supervised learning approach is presented to refine the results produced by a case-by-case analysis of light curves, harnessing the generalization power of machine learning techniques to predict the currently unclassified light curves.

Data Driven Discovery in Astrophysics

Some aspects of the current state of data-intensive astronomy, its methods, and some outstanding data analysis challenges are reviewed, including recent examples of novel machine learning tools.

Featureless Classification of Light Curves

This work represents time series by a density model and argues that this new representation has potential in tasks beyond classification, e.g., unsupervised learning.

Refining Exoplanet Detection Using Supervised Learning and Feature Engineering

A supervised learning approach is presented to refine the results produced by a case-by-case analysis of light curves, harnessing the generalization power of machine learning techniques to predict the currently unclassified light curves.

Machine Learning in Astronomy: a practical overview

This document summarizes the topics of supervised and unsupervised learning algorithms presented during the IAC Winter School 2018, and provides practical information on the application of such tools to astronomical datasets.



Machine-assisted discovery of relationships in astronomy

High-volume feature-rich data sets are becoming the bread-and-butter of 21st century astronomy but present significant challenges to scientific discovery. In particular, identifying scientifically


A methodology for variable-star classification, drawing from modern machine-learning techniques, which is effective for identifying samples of specific science classes and presents the first astronomical use of hierarchical classification methods to incorporate a known class taxonomy in the classifier.

Flashes in a star stream: Automated classification of astronomical transient events

This work is exploring a variety of novel techniques, mostly Bayesian, to respond to the challenges of automated, rapid classification of transient events detected in the modern synoptic sky surveys, using the ongoing CRTS sky survey as a testbed.

Automated supervised classification of variable stars - I. Methodology

An overview is given of the presently known stellar variability classes in terms of some relevant stellar parameters; the resulting class descriptions serve as the basis for an automated "supervised classification" of large databases.

New Approaches to Object Classification in Synoptic Sky Surveys

Digital synoptic sky surveys pose several new object classification challenges. In surveys where real-time detection and classification of transient events is a science driver, there is a need for an

Discovery, classification, and scientific exploration of transient events from the Catalina Real-time Transient Survey

The Catalina Real-time Transient Survey, which discovers and publishes transient events at optical wavelengths in real time, thus benefiting the entire community, is described, with a focus on the challenges of the automated classification and prioritization of transient events.

Sky Surveys

Sky surveys represent a fundamental data basis for astronomy. We use them to map in a systematic way the universe and its constituents, and to discover new types of objects or phenomena. We review

Automated supervised classification of variable stars in the CoRoT programme. Method and application

Aims: In this work, we describe the pipeline for the fast supervised classification of light curves observed by the CoRoT exoplanet CCDs. We present the classification results obtained for

Wrappers for Feature Subset Selection

Analysis of RR Lyrae Stars in the Northern Sky Variability Survey

We use data from the Northern Sky Variability Survey (NSVS), obtained from the first-generation Robotic Optical Transient Search Experiment (ROTSE-I), to identify and study RR Lyrae variable stars in