Imbalance Learning for Variable Star Classification

Zafiirah Hosenie, R. J. Lyon, Ben W. Stappers, Arrykrishna Mootoovaloo, Vanessa McBride

The accurate automated classification of variable stars into their respective sub-types is difficult. Machine learning based solutions often fall foul of the imbalanced learning problem, which causes poor generalisation performance in practice, especially on rare variable star sub-types. In previous work, we attempted to overcome such deficiencies via the development of a hierarchical machine learning classifier. This 'algorithm-level' approach to tackling imbalance yielded promising results…

Benchmark of Data Preprocessing Methods for Imbalanced Classification

A benchmark of 16 preprocessing methods on six cybersecurity datasets and 17 public imbalanced datasets from other domains is presented. Among the main findings: most of the time, a data preprocessing method that improves classification performance exists.

MeerCRAB: MeerLICHT classification of real and bogus transients using deep learning

MeerCRAB, a deep learning pipeline based on a convolutional neural network architecture, is presented; it is designed to filter out so-called “bogus” detections from true astrophysical sources in the transient detection pipeline of the MeerLICHT telescope.

Automatic Catalog of RRLyrae from ~14 million VVV Light Curves: How far can we go with traditional machine-learning?

The results indicate that, for classification based on broad-band photometric data, color is an informative feature type for the RRL target class and should always be considered in automatic ML classification methods.

Improving Intrusion Detection for Imbalanced Network Traffic using Generative Deep Learning

A conditional tabular generative adversarial network (CTGAN) model is combined with common machine learning algorithms to construct more effective detection systems while addressing the imbalance issue; the results show that CTGAN can improve the performance of imbalance learning for intrusion detection with SVM and DT.

Drifting Features: Detection and evaluation in the context of automatic RRLs identification in VVV

A new strategy is developed to cope with small changes in the data over long angular distances or long periods of time, which cannot be easily detected by statistical methods; Drifting Features can be efficiently identified using ML methods.

Alert Classification for the ALeRCE Broker System: The Light Curve Classifier

This classifier corresponds to the first attempt to classify multiple classes of stochastic variables (including core- and host-dominated active galactic nuclei, blazars, young stellar objects, and cataclysmic variables) in addition to different classes of periodic and transient sources, using real data.

A survey on machine learning based light curve analysis for variable astronomical sources

This survey reviews important developments in light curve analysis over the past years, summarizes the basic concepts in machine learning and their applications in light curve analysis, and concludes with perspectives and challenges for light curve analysis in the near future.

Deep Attention-based Supernovae Classification of Multiband Light Curves

A deep attention model (TimeModAttn) is proposed to classify multiband light curves of different SN types, avoiding photometric or hand-crafted feature computations, missing-value assumptions, and explicit imputation/interpolation methods.

Discovery of five new Galactic symbiotic stars in the VPHAS+ survey

We report the validation of a recently proposed infrared selection criterion for symbiotic stars (SySts). Spectroscopic data were obtained for seven candidates, selected from the SySt candidates of

Modeling the Multiwavelength Variability of Mrk 335 Using Gaussian Processes

The optical and UV variability of the majority of active galactic nuclei may be related to the reprocessing of rapidly changing X-ray emission from a more compact region near the central black hole.



Comparing Multi-class, Binary and Hierarchical Machine Learning Classification schemes for variable stars

A new hierarchical structure is developed and a new set of classification features is proposed, enabling the accurate identification of subtypes of Cepheids, RR Lyrae and eclipsing binary stars in CRTS data.

The class imbalance problem: A systematic study

The assumption that the class imbalance problem affects not only decision tree systems but also other classification systems, such as Neural Networks and Support Vector Machines, is investigated.

Streaming Classification of Variable Stars

A streaming probabilistic classification model that uses a set of newly designed features that work incrementally to achieve high classification performance, staying an order of magnitude faster than traditional classification approaches.

Automatic Survey-invariant Classification of Variable Stars

A full probabilistic model is proposed that represents the joint distribution of features from two surveys, as well as a probabilistic transformation of the features from one survey to the other; it represents the features of each domain as a Gaussian mixture and models the transformation as a translation, rotation, and scaling of each separate component.

Uncertain Classification of Variable Stars: Handling Observational GAPS and Noise

A novel method is proposed that increases the performance of automatic classifiers of variable stars by incorporating the deviations that scarcity of observations produces, and finds that RR Lyrae stars can be classified with ~80% accuracy just by observing the first 5% of the whole lightcurves’ observations in the MACHO and OGLE catalogs.

SMOTE: Synthetic Minority Over-sampling Technique

A combination of over-sampling the minority (abnormal) class and under-sampling the majority class can achieve better classifier performance (in ROC space); the method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.

A package for the automated classification of periodic variable stars

A machine learning package for the classification of periodic variable stars is presented. The study investigates how performance varies with the number of data points and the duration of observations, and finds that recall and precision do not vary significantly if there are more than 80 data points and the duration is more than a few weeks.

Machine learning search for variable stars

It is found that the considered machine learning classifiers are more efficient (they find more variables and fewer false candidates) than traditional techniques that consider individual variability indices or their linear combination.

Photometric Supernova Classification With Machine Learning

A multi-faceted classification pipeline, combining existing and new approaches, finds that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.

STACCATO: a novel solution to supernova photometric classification with biased training sets

A novel method called STACCATO (SynThetically Augmented Light Curve ClassificATiOn) synthetically augments a biased training set by generating additional training data from the fitted GPs, and increases performance, as measured by the area under the Receiver Operating Characteristic curve (AUC).