Edmon Begoli

The big data phenomenon refers to the collection and processing of very large data sets, together with the systems and algorithms used to analyze them. Architectures for big data usually span multiple machines and clusters, and they commonly consist of multiple special-purpose subsystems. Coupled with the knowledge discovery ...
This paper describes an effort at the University of Tennessee's National Institute for Computational Sciences (NICS) to integrate Apache Spark into the widely used TORQUE HPC batch environment. The similarities and differences between the execution of a Spark program and that of an MPI program on a cluster are used to motivate how to implement Spark/TORQUE ...
Infrastructure-as-a-Service (IaaS) has revolutionized the way users commission computing infrastructure. Coupled with Big Data platforms (Hadoop, Cassandra), IaaS has democratized the ability to store and process massive datasets. For users who need to customize or create new Big Data stacks, however, readily available solutions do not yet exist. ...
Intended as a survey for practicing architects and researchers seeking an overview of state-of-the-art architectures for data analysis, this paper reviews the emerging data management and analytic platforms, including parallel databases, Hadoop-based systems, High Performance Computing (HPC) platforms and platforms popularly referred to ...
The Polystore architecture revisits the federated approach to accessing and querying standalone, independent databases in a uniform and optimized fashion, but this time in the context of heterogeneous data and specialized analyses. In light of this architectural philosophy, and in light of the major data architecture development efforts at the US ...
We present a service platform for schema-less exploration of data and discovery of patient-related statistics from healthcare data sets. The architecture of this platform is motivated by the need for fast, schema-less, and flexible approaches to SQL-based exploration and discovery of information embedded in the common, heterogeneously structured healthcare ...
The inaugural APKDD workshop presents state-of-the-art research and industry practices in analytic platforms and architectures for large-scale data collection and organization, comprehensive data analysis, improvement of collection and organization methods, and analysis of large data sets. Workshop participants will share presentations, empirical ...