Improving machine learning-derived photometric redshifts and physical property estimates using unlabelled observations

Andrew Humphrey, P. A. C. Cunha, Ana Paulino-Afonso, Stergios Amarantidis, R. Carvajal, Jean Michel Gomes, Israel Matute, Polychronis Papaderos
Monthly Notices of the Royal Astronomical Society

In the era of huge astronomical surveys, machine learning offers promising solutions for the efficient estimation of galaxy properties. The traditional, ‘supervised’ paradigm for the application of machine learning involves training a model on labelled data, and using this model to predict the labels of previously unlabelled data. The semi-supervised ‘pseudo-labelling’ technique offers an alternative paradigm, allowing the model training algorithm to learn from both labelled data and as-yet… 
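
The pseudo-labelling idea described in the abstract can be illustrated with a minimal sketch. The abstract is truncated above, so this is not the authors' pipeline: the 1-D nearest-centroid classifier, the function names (`fit_centroids`, `predict`, `pseudo_label_fit`) and the toy data are all assumptions, chosen only to show the loop of training on labelled data, predicting labels for unlabelled data, and retraining on both.

```python
from statistics import mean


def fit_centroids(xs, ys):
    """Fit a 1-D nearest-centroid classifier: one mean feature value per label."""
    return {label: mean(x for x, y in zip(xs, ys) if y == label)
            for label in set(ys)}


def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def pseudo_label_fit(x_lab, y_lab, x_unlab):
    """Hypothetical helper: supervised fit, then refit with pseudo-labels."""
    # Step 1: supervised fit on the labelled data only.
    base = fit_centroids(x_lab, y_lab)
    # Step 2: use that model to predict pseudo-labels for the unlabelled data.
    pseudo = [predict(base, x) for x in x_unlab]
    # Step 3: refit on the union of labelled and pseudo-labelled data.
    final = fit_centroids(list(x_lab) + list(x_unlab), list(y_lab) + pseudo)
    return base, final


# Toy data (assumed for illustration only).
x_lab = [0.0, 1.0, 4.0, 5.0]
y_lab = ["low", "low", "high", "high"]
x_unlab = [0.5, 1.5, 2.0, 6.0, 7.0]

base, final = pseudo_label_fit(x_lab, y_lab, x_unlab)
print(base)
print(final)
```

In this toy case the pseudo-labelled points move the centroids from 0.5 and 4.5 to 1.0 and 5.5, so the refitted model reflects the distribution of the unlabelled sample as well as the labelled one.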


LightGBM: A Highly Efficient Gradient Boosting Decision Tree

It is proved that, because data instances with larger gradients play a more important role in the computation of information gain, GOSS (Gradient-based One-Side Sampling) can obtain a quite accurate estimate of the information gain with a much smaller data size. The resulting gradient boosting decision tree implementation is called LightGBM.
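
The sampling step behind GOSS (Gradient-based One-Side Sampling) can be sketched in a few lines. This is a simplified illustration and not LightGBM's own code: the function name `goss_sample`, the parameters `a` and `b` (fractions of large-gradient and randomly sampled small-gradient instances) and the toy gradients are assumptions. The sketch keeps the instances with the largest absolute gradients, draws a random subset of the rest, and up-weights that subset by `(1 - a) / b` so that the small-gradient instances are not under-represented when information gain is estimated.

```python
import random


def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """Return {instance index: weight} for a GOSS-style subsample (sketch)."""
    n = len(gradients)
    # Rank instances by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_n = round(a * n)
    rand_n = round(b * n)
    top, rest = order[:top_n], order[top_n:]
    # Randomly sample from the small-gradient remainder.
    rng = random.Random(seed)
    sampled = rng.sample(rest, rand_n)
    # Large-gradient instances keep weight 1; sampled small-gradient
    # instances are amplified to compensate for the discarded ones.
    weights = {i: 1.0 for i in top}
    weights.update({i: (1.0 - a) / b for i in sampled})
    return weights


# Toy gradients (assumed): |gradient| grows with the index.
grads = [(-1) ** i * i / 20 for i in range(20)]
weights = goss_sample(grads, a=0.2, b=0.1, seed=0)
print(len(weights), "of", len(grads), "instances kept")
```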

Pseudo-Label : The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks

A simple and efficient method of semi-supervised learning for deep neural networks is proposed. The network is trained in a supervised fashion with labeled and unlabeled data simultaneously, which favors a low-density separation between classes.
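
One way to picture training "with labeled and unlabeled data simultaneously" is a loss with two terms. The sketch below is an assumption-laden illustration rather than the paper's implementation: the function names and the weight `alpha` are hypothetical, and the pseudo-label is taken to be the class with the highest predicted probability. The unlabeled term shrinks as predictions become more confident, which is what pushes the decision boundary away from dense regions of the data.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability assigned to the given class index."""
    return -math.log(probs[label])


def pseudo_label(probs):
    """Pseudo-label: index of the class with the highest predicted probability."""
    return max(range(len(probs)), key=lambda k: probs[k])


def combined_loss(labeled, unlabeled_probs, alpha):
    """Supervised loss plus alpha times the loss against pseudo-labels.

    labeled: list of (predicted probabilities, true class index) pairs.
    unlabeled_probs: list of predicted probability vectors without labels.
    alpha: hypothetical weight balancing the two terms.
    """
    supervised = sum(cross_entropy(p, y) for p, y in labeled) / len(labeled)
    unsupervised = sum(cross_entropy(p, pseudo_label(p))
                       for p in unlabeled_probs) / len(unlabeled_probs)
    return supervised + alpha * unsupervised


# Toy predictions (assumed for illustration only).
labeled = [([0.5, 0.5], 0)]
unlabeled_probs = [[0.25, 0.75]]
loss = combined_loss(labeled, unlabeled_probs, alpha=2.0)
print(loss)
```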

Data Structures for Statistical Computing in Python

pandas is a new library which aims to facilitate working with data sets common to finance, statistics, and other related fields and to provide a set of fundamental building blocks for implementing statistical models.

Dask: Parallel Computation with Blocked algorithms and Task Scheduling

This work couples blocked algorithms with dynamic and memory-aware task scheduling to achieve a parallel and out-of-core NumPy clone, and shows how this extends the effective scale of modern hardware to larger datasets.
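
The idea of a blocked algorithm can be sketched without Dask itself. The example below does not use Dask's API: the function names are hypothetical, and a standard-library thread pool stands in for a real task scheduler (it is not memory aware). It splits the data into blocks, computes a partial result per block as an independent task, and combines the partial results at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(values, block_size):
    """Cut a sequence into consecutive blocks of at most block_size items."""
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def block_stats(block):
    """Per-block task: partial sum and element count."""
    return sum(block), len(block)


def blocked_mean(values, block_size, workers=2):
    """Mean computed from per-block partial results (sketch)."""
    blocks = split_into_blocks(values, block_size)
    # Each block is an independent task, so the tasks can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(block_stats, blocks))
    # Combine step: only the small partial results are needed here.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


result = blocked_mean(list(range(10)), block_size=3)
print(result)
```

Because only one block has to be held by each task, the same pattern applies when the full array does not fit in memory and blocks are read from disk one at a time.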

Unions Team including Pan-Starrs Team, & CFIS (2014)