DAME: A Web Oriented Infrastructure for Scientific Data Mining & Exploration

@article{Brescia2010DAMEAW,
  title={DAME: A Web Oriented Infrastructure for Scientific Data Mining \& Exploration},
  author={Massimo Brescia and Giuseppe Longo and S. George Djorgovski and Stefano Cavuoti and Raffaele D’Abrusco and Ciro Donalek and Alessandro Di Guido and Michelangelo Fiore and Mauro Garofalo and Omar Laurino and Ashish A. Mahabal and Francesco Manna and Alfonso Nocella and Giovanni D'Angelo and Maurizio Paolillo},
  journal={ArXiv},
  year={2010},
  volume={abs/1010.4843}
}
Nowadays, many scientific areas share the same need: to deal with massive and distributed datasets and to perform complex knowledge extraction tasks on them. This simple consideration is behind the international efforts to build virtual organizations such as, for instance, the Virtual Observatory (VObs). DAME (DAta Mining & Exploration) is an innovative, general-purpose, Web-based, VObs-compliant, distributed data mining infrastructure specialized in Massive Data Sets exploration…

Data mining and knowledge discovery resources for astronomy in the web 2.0 age

TLDR
The DAME (DAta Mining and Exploration) Program exposes a series of web-based services to perform scientific investigation on astronomical massive data sets to become a prototype of an efficient data mining framework in the data-centric era.

Data mining and Knowledge Discovery Resources for Astronomy in the Web 2.0

TLDR
The engineering design and requirements of the DAME (DAta Mining & Exploration) Program are projected towards a new paradigm of Web based resources, which reflect the final goal to become a prototype of an efficient data mining framework in the data-centric era.

Extracting Knowledge From Massive Astronomical Data Sets

TLDR
This paper briefly outlines some general problems encountered when applying DM/KDD methods to astrophysical problems, and describes the DAME (DAta Mining & Exploration) web application, specifically tailored to work on MDS, which can be effectively applied also to smaller data sets.

The detection of globular clusters in galaxies as a data mining problem

TLDR
An extensive set of experiments revealed that the use of accurate structural parameters does improve the result, but only by ∼5%.

VOGCLUSTERS: an example of DAME web application

We present the alpha release of the VOGCLUSTERS web application, specialized for data and text mining on globular clusters. It is one of the Web 2.0 technology-based services of Data Mining & Exploration (DAME)…

Genetic Algorithm Modeling with GPU Parallel Computing Technology

TLDR
A multi-purpose genetic algorithm, designed and implemented with GPGPU/CUDA parallel computing technology, is derived from a multi-core CPU serial implementation that has already been scientifically tested and validated on astrophysical massive data classification problems through a web application resource (DAMEWARE).
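
As a rough illustration only (not the authors' DAMEWARE implementation), the appeal of GPGPU for genetic algorithms is that each generation's fitness evaluation is embarrassingly parallel across the population. The NumPy sketch below shows that vectorized structure on a toy objective; the population size, mutation scale, and fitness function are all arbitrary choices for illustration.

```python
# Minimal sketch of the data-parallel structure a GPGPU genetic algorithm exploits:
# the whole population's fitness is evaluated in one vectorized call (on a GPU this
# would map to one kernel launch per generation).
import numpy as np

rng = np.random.default_rng(0)

def fitness(pop):
    # Toy objective (negative sphere function): all individuals evaluated at once.
    return -np.sum(pop ** 2, axis=1)

def evolve(pop_size=256, n_genes=16, n_gen=100, mut_sigma=0.1):
    pop = rng.normal(size=(pop_size, n_genes))
    for _ in range(n_gen):
        f = fitness(pop)                                   # embarrassingly parallel step
        parents = pop[np.argsort(f)[-pop_size // 2:]]      # truncation selection
        children = parents + rng.normal(scale=mut_sigma,
                                        size=parents.shape)  # Gaussian mutation
        pop = np.vstack([parents, children])
    return pop[np.argmax(fitness(pop))]

print(evolve())
```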

Photometric classification of emission line galaxies with machine-learning methods

TLDR
The results of the experiments show that the application of self-adaptive data mining algorithms trained on spectroscopic data sets and applied to carefully chosen photometric parameters represents a viable alternative to the classical methods that employ time-consuming spectroscopy observations.
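A minimal sketch of the general workflow described here, assuming scikit-learn and synthetic stand-in data (the real spectroscopic and photometric catalogues are not given on this page): the classifier is trained on objects that have spectroscopic class labels, but only photometric parameters are used as features, so the trained model can afterwards be applied to purely photometric samples.

```python
# Hypothetical illustration: train on spectroscopically labelled objects using only
# photometric features, then evaluate on held-out data. Synthetic arrays stand in
# for the real catalogues.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
colours = rng.normal(size=(5000, 4))                        # stand-in photometric colours
labels = (colours[:, 0] + colours[:, 1] > 0).astype(int)    # stand-in spectroscopic classes

X_train, X_test, y_train, y_test = train_test_split(colours, labels, test_size=0.3)
clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```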

References

SHOWING 1-10 OF 47 REFERENCES

Mining Knowledge in Astrophysical Massive Data Sets

DAME: A Distributed Web Based Framework for Knowledge Discovery in Databases

TLDR
The result of the DAME project effort is a service-oriented architecture that uses appropriate standards, incorporates Cloud/Grid paradigms and Web services, and has as its main target the integration of interdisciplinary distributed systems within and across organizational domains.

KNIME: The Konstanz Information Miner

TLDR
Some of the design aspects of the underlying architecture of the Konstanz Information Miner are described, and it is briefly sketched how new nodes can be incorporated.

Scalability Of Machine Learning Algorithms

TLDR
The scalability of concept-learning algorithms was examined, showing that, although their worst-case computational complexity is over-quadratic, most of the examined algorithms can handle large amounts of data.

The Fourth Paradigm: Data-Intensive Scientific Discovery

This presentation will set out the eScience agenda by explaining the current scientific data deluge and the case for a “Fourth Paradigm” for scientific exploration. Examples of data-intensive science…

Map-Reduce for Machine Learning on Multicore

TLDR
This work shows that algorithms that fit the Statistical Query model can be written in a certain "summation form," which allows them to be easily parallelized on multicore computers and shows basically linear speedup with an increasing number of processors.
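A minimal sketch of the "summation form" idea, using ordinary least squares as the example learner (the paper covers many algorithms; the learner and the chunking below are illustrative choices): the sufficient statistics X^T X and X^T y are sums over examples, so partial sums can be computed on data chunks in parallel (map) and then added together (reduce).

```python
# Sketch of summation-form parallelization for OLS: each worker computes partial
# sums of X^T X and X^T y over its chunk (map); the partial sums are added (reduce);
# the final solve uses only the combined statistics.
import numpy as np
from functools import reduce
from multiprocessing import Pool

def partial_stats(chunk):
    X, y = chunk
    return X.T @ X, X.T @ y            # map: per-chunk partial sums

def combine(a, b):
    return a[0] + b[0], a[1] + b[1]    # reduce: add partial sums

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100_000, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + rng.normal(size=100_000)
    chunks = list(zip(np.array_split(X, 8), np.array_split(y, 8)))
    with Pool(4) as pool:
        XtX, Xty = reduce(combine, pool.map(partial_stats, chunks))
    print(np.linalg.solve(XtX, Xty))   # recovers the regression coefficients
```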

Engineering rich internet applications with a model-driven approach

TLDR
An evolutionary approach for incorporating a wealth of RIA features into an existing Web engineering methodology and notation is illustrated and the experience demonstrates that it is possible to model RIA application requirements at a high-level using a platform-independent notation, and generate the client-side and server-side code automatically.

Genetic Algorithms and Machine Learning

TLDR
There is no a priori reason why machine learning must borrow from nature, but many machine learning systems now borrow heavily from current thinking in cognitive science, and rekindled interest in neural networks and connectionism is evidence of serious mechanistic and philosophical currents running through the field.

RESTful Web Services

TLDR
This book shows how you can connect to the programmable web with the technologies you already use every day and harness the power of the Web for programmable applications: you just have to work with the Web instead of against it.
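As a hedged illustration of the RESTful style described here (the endpoint below is invented for illustration, not a documented DAME URL): resources are identified by URLs, operations map onto HTTP verbs, and responses are machine-readable, so a client needs nothing beyond an ordinary HTTP library.

```python
# Hypothetical REST client sketch: plain HTTP verbs against resource URLs.
# The base URL and the JSON fields are assumptions, not a real service contract.
import requests

BASE = "https://example.org/dame/api"          # hypothetical service root

resp = requests.get(f"{BASE}/experiments", params={"status": "completed"})
resp.raise_for_status()
for exp in resp.json():                        # assume a JSON list of resources
    print(exp["id"], exp["model"])

# Creating a resource is a POST to the collection URL.
new = requests.post(f"{BASE}/experiments", json={"model": "MLP", "dataset": "gc.csv"})
print(new.status_code, new.headers.get("Location"))
```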

Interoperability of archives in the VO

TLDR
This work reports on standardization work currently going on in the AVO and AstroGRID projects in the following areas: exchange formats for tabular data; semantic definitions for quantities in tabular data; identification of users and authorization to use resources; query interfaces to archives; and catalogues of data resources.
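
As a small, hedged illustration of the tabular-exchange point (assuming VOTable, the IVOA standard usually meant here, and the astropy library; the column names are invented): a catalogue written to the exchange format by one tool can be read back unchanged by any compliant one.

```python
# Round-trip a small catalogue through the VOTable XML serialization with astropy.
import os
import tempfile
from astropy.table import Table

cat = Table({"ra": [150.1, 150.2], "dec": [2.1, 2.3], "g_mag": [21.4, 19.8]})

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "catalogue.xml")
    cat.write(path, format="votable")                 # serialize to VOTable XML
    round_tripped = Table.read(path, format="votable")  # any VO-compliant reader works
    print(round_tripped)
```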