Corpus ID: 8878263

Machine Learning and Cloud Computing: Survey of Distributed and SaaS Solutions

  • Daniel Pop
Applying popular machine learning algorithms to large amounts of data raised new challenges for ML practitioners. [...] The first direction is that of popular statistics tools and libraries (R system, Python) deployed in the cloud. A second line of products augments existing tools with plugins that allow users to create a Hadoop cluster in the cloud and run jobs on it. Next on the list are libraries of distributed implementations of ML algorithms, and on-premise deployments of complex systems for…
A Survey on Distributed Machine Learning
The challenges and opportunities of distributed machine learning over conventional (centralized) machine learning are outlined, the techniques used are discussed, and an overview of the available systems is provided.
Towards MLOps: A Case Study of ML Pipeline Platform
  • Yue Zhou, Yue Yu, Bo Ding
  • 2020 International Conference on Artificial Intelligence and Computer Engineering (ICAICE)
  • 2020
The development and deployment of machine learning (ML) applications differ significantly from traditional applications in many ways, which have led to an increasing need for efficient and reliable…
Advanced analytics through FPGA based query processing and deep reinforcement learning
Today, vast streams of structured and unstructured data have been incorporated in databases, and analytical processes are applied to discover patterns, correlations, trends and other useful…
A Public Cloud Based SOA Workflow for Machine Learning Based Recommendation Algorithms
A framework is proposed that simulates the architectural setup of a cloud environment and examines how it can leverage Apriori and sequential-pattern-based recommendation algorithms using R, together with a multi-layered application encompassing its backend architecture, a user interface built using responsive web design, and its development workflow.
Predictive Modelling for E-Commerce Data Classification Tasks: An Azure Machine Learning Approach
Predictive analytics is deeply integrated into current society. From spam email filtering, to predicting movies you like based on reviews, to categorizing products within e-commerce data, to…
Design and implementation of a framework for provisioning algorithms as a service
Evaluation results demonstrate that providing multiple scalability models and high-end web servers improves algorithm performance and achieves availability and reliability using the framework.
Architecture of a Scalable Platform for Monitoring Multiple Big Data Frameworks
This paper presents a distributed, scalable, highly available platform able to collect, store, query and process monitoring data obtained from multiple Big Data frameworks, along with its architecture and initial results.
Scalable Computing: Practice and Experience
The plethora of sensors deployed in Internet of Things (IoT) environments generate unprecedented volumes of data, thereby creating a data deluge. Data collected from these sensors can be used to…
An Approach to Failure Prediction in a Cloud Based Environment
This research will aid computer hardware companies and cloud service providers in designing a reliable fault-tolerant system by enabling better device selection, thereby improving system availability and minimizing unscheduled system downtime.
Key challenges and research direction in cloud storage
Comparisons of the methods and approaches researchers have used to tackle data availability through data replication, data partitioning, data management, and data placement in the context of cloud-enabled technologies are tabulated.


NIMBLE: a toolkit for the implementation of parallel data mining and machine learning algorithms on mapreduce
NIMBLE is presented, a portable infrastructure that has been specifically designed to enable the rapid implementation of parallel ML-DM algorithms and currently runs on top of Hadoop, which is an open-source MR implementation.
SystemML: Declarative machine learning on MapReduce
This paper proposes SystemML, in which ML algorithms are expressed in a higher-level language and are compiled and executed in a MapReduce environment, and describes and empirically evaluates a number of optimization strategies for efficiently executing these algorithms on Hadoop, an open-source MapReduce implementation.
Scaling up Machine Learning
Extensive coverage of parallelization of boosted trees, support vector machines, spectral clustering, belief propagation, and other popular learning algorithms accompanied by deep dives into several applications make the book equally useful for researchers, students, and practitioners.
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud
While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important…
Distributed GraphLab: A Framework for Machine Learning in the Cloud
This paper develops graph-based extensions to pipelined locking and data versioning to reduce network congestion and mitigate the effect of network latency, and introduces fault tolerance to the GraphLab abstraction using the classic Chandy-Lamport snapshot algorithm.
Scaling Up Machine Learning: Large-Scale Machine Learning Using DryadLINQ
The main motivation behind the development of DryadLINQ was to make it easier for non-specialists to write general purpose, scalable programs that can operate on very large input datasets. In order…
Parallel approaches to machine learning - A comprehensive survey
MapReduce is another important technique that has evolved during this period and, according to the literature, has proved to be an important aid in delivering performance of machine learning algorithms on GPUs.
Dryad: distributed data-parallel programs from sequential building blocks
The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
It is shown that excellent absolute performance can be attained--a general-purpose sort of 10^12 bytes of data executes in 319 seconds on a 240-computer, 960-disk cluster--as well as demonstrating near-linear scaling of execution time on representative applications as the authors vary the number of computers used for a job.
Snow: A Parallel Computing Framework for the R System
A simple parallel computing framework for the statistical programming language R that focuses on parallelization of familiar higher level mapping functions and emphasizes simplicity of use in order to encourage adoption by a wide range of R users is presented.