Applying machine learning to catalogue matching in astrophysics

  • David Rohde, Michael J. Drinkwater, Marcus R. Gallagher, Timothy C. Downs, Marianne Doyle
  • Monthly Notices of the Royal Astronomical Society
We present the results of applying automated machine learning techniques to the problem of matching different object catalogues in astrophysics. In this study, we take two partially matched catalogues where one of the two catalogues has a large positional uncertainty. The two catalogues we used here were taken from the H I Parkes All Sky Survey (HIPASS) and SuperCOSMOS optical survey. Previous work had matched 44 per cent (1887 objects) of HIPASS to the SuperCOSMOS catalogue. A supervised…


Applying the Support Vector Machine Method to Matching IRAS and SDSS Catalogues
  • Chen Cao
  • Computer Science
  • Data Sci. J.
  • 2007
Results of applying a machine learning technique, the Support Vector Machine, to the astronomical problem of matching the Infra-Red Astronomical Satellite (IRAS) and Sloan Digital Sky Survey (SDSS) object catalogues show a good identification performance, better than that derived from classical cross-matching algorithms.
Matching of catalogues by probabilistic pattern classification
We consider the statistical problem of catalogue matching from a machine learning perspective with the goal of producing probabilistic outputs, and using all available information. A framework is…
Astronomical catalogue matching as a mixture model problem
It is demonstrated that by employing a predictive Bayesian formalism it is possible to use all available information to assist in obtaining the most reliable matches and still obtain undistorted conclusions.
Bayesian Matching for X-Ray and Infrared Sources in the MYStIX Project
Identifying the infrared counterparts of X-ray sources in Galactic plane fields such as those of the MYStIX project presents particular difficulties due to the high density of infrared sources. This…
Support vector machines and kd-tree for separating quasars from large survey data bases
We compare the performance of two automated classification algorithms, k-dimensional tree (kd-tree) and support vector machines (SVMs), to separate quasars from stars in the data bases of the Sloan…
Scientific Data Mining in Astronomy
  • K. Borne
  • Computer Science, Physics
  • Next Generation of Data Mining
  • 2008
To facilitate data-driven discoveries in astronomy, a new data-oriented research paradigm for astronomy and astrophysics is envisioned: astroinformatics, which is described as both a research approach and an educational imperative for modern data-intensive astronomy.
Early universe cosmology and its observational effects on the cosmic microwave background
This Thesis is written in three parts. The first part describes the analytic calculation of the unequal-time correlator of cosmic strings and superstrings. The first efficient constraint analysis of…
Automatic classification for WDMS with Isomap and SVM
An unsupervised learning algorithm for Nonlinear Dimensionality Reduction (NLDR) named Isometric Feature Mapping (Isomap) is discussed, and a classification model is generated by training an SVM with a low-dimensional dataset from SDSS-DR10; this model can be applied to carry out large-scale data mining.
Use of Neural Networks for the Identification of New z > 3.6 QSOs from FIRST–SDSS DR5
We aim to obtain a complete sample of z ≥ 3.6 radio QSOs from FIRST sources having star-like counterparts in the SDSS DR5 photometric survey (rAB ≤ 20.2). The starting sample of FIRST–DR5 pairs…
AstroDAS: Sharing Assertions Across Astronomy Catalogues Through Distributed Annotation
The prototype for the Astronomy Distributed Annotation System (AstroDAS) complements the existing OpenSkyQuery tools for federated database queries, and provides web service methods to allow clients to create and store mapping annotations as relational database tuples on annotation servers.


The HIPASS catalogue - I. Data presentation
The H I Parkes All-Sky Survey (HIPASS) catalogue forms the largest uniform catalogue of H I sources compiled to date, with 4315 sources identified purely by their H I content. The catalogue data…
The HIPASS catalogue - II. Completeness, reliability and parameter accuracy
The H I Parkes All Sky Survey (HIPASS) is a blind extragalactic H I 21-cm emission-line survey covering the whole southern sky from declination −90° to +25°. The HIPASS catalogue (HICAT),…
The SuperCOSMOS Sky Survey – II. Image detection, parametrization, classification and photometry
In this, the second in a series of three papers concerning the SuperCOSMOS Sky Survey, we describe the methods for image detection, parametrization, classification and photometry. We demonstrate the…
Data mining for multi-wavelength cross-referencing
In this paper, we deal with FOCA ultraviolet data and their cross-referencing with the DPOSS optical catalog, through data mining techniques. While traditional cross-referencing consists in…
Wide field imaging – I. Applications of neural networks to object detection and star/galaxy classification
Astronomical wide-field imaging performed with new large-format CCD detectors poses data reduction problems of unprecedented scale, which are difficult to deal with using traditional interactive…
The SuperCOSMOS Sky Survey – I. Introduction and description
In this, the first in a series of three papers concerning the SuperCOSMOS Sky Survey (SSS), we give an introduction and user guide to the survey programme. We briefly describe other wide-field…
Neural networks in astronomy
This review is aimed at both astronomers and computer scientists (who often know little about potentially interesting applications), and focuses on some of the most interesting fields of application, namely: object extraction and classification, time series analysis, noise identification, and data mining.
The parkes half-jansky flat-spectrum sample
We present a new sample of Parkes half-jansky flat-spectrum radio sources, having made a particular effort to find any previously unidentified sources. The sample contains 323 sources selected…
Ensembles of Classifiers for Morphological Galaxy Classification
We compare the use of three algorithms for performing automated morphological galaxy classification using a sample of 800 galaxies. Classifiers are created using a single training set as well as…
SExtractor: Software for source extraction
We present the automated techniques we have developed for new software that optimally detects, deblends, measures and classifies sources from astronomical images: SExtractor (Source Extractor). We…