Corpus ID: 22964936

Towards a Unified Graph Model for Supporting Data Management and Usable Machine Learning

Authors: Guoliang Li, Meihui Zhang, B. Ooi
Journal: IEEE Data Eng. Bull.
Data management and machine learning are two important tasks in data science. However, they have so far been studied independently. We argue that they should complement each other. On the one hand, machine learning requires data management techniques to extract, integrate, and clean the data, and to support scalable and usable machine learning, making it user-friendly and easily deployable. On the other hand, data management relies on machine learning techniques to curate data and improve its…
[Figure labels: Model; Training data; Test data at day 1; Test data at day 100; Cloud; Hospital 1, Hospital 2, ..., Hospital n]
Recent advances in artificial intelligence (AI) and machine learning have created a general perception that AI could be used to solve complex problems, and in some situations over-hyped as a tool…
PANDA: Facilitating Usable AI Development
A new perspective on developing AI solutions is taken, and a solution for making AI usable is presented that will enable all subject matter experts (e.g., clinicians) to exploit AI like data scientists.
A metric on directed graphs and Markov chains based on hitting probabilities
This work introduces a metric on the state space of any ergodic, finite-state, time-homogeneous Markov chain and, in particular, on any Markov chain derived from a directed graph. It explores the nature of the metric, compares it to alternative methods, and demonstrates its utility for weak recovery of community structure in dense graphs.


Petuum: A New Platform for Distributed Machine Learning on Big Data
This work proposes a general-purpose framework, Petuum, that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions.
Spark: Cluster Computing with Working Sets
Spark can outperform Hadoop by 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time.
Infrastructure for Usable Machine Learning: The Stanford DAWN Project
This document outlines opportunities for infrastructure supporting usable, end-to-end machine learning applications in the context of the nascent DAWN (Data Analytics for What's Next) project at Stanford.
Human-in-the-loop Data Integration
A hybrid human-machine data integration framework that harnesses human ability to address this problem. The framework is applied initially to the problem of entity matching, and a crowd-powered database system, CDB, is developed.
TensorFlow: A system for large-scale machine learning
The TensorFlow dataflow model is described, and the compelling performance that TensorFlow achieves for several real-world applications is demonstrated.
GraphChi: Large-Scale Graph Computation on Just a PC
This work presents GraphChi, a disk-based system for computing efficiently on graphs with billions of edges, and builds on Parallel Sliding Windows to propose a new data structure, Partitioned Adjacency Lists, which is used to design an online graph database, GraphChi-DB.
Contextual crowd intelligence
A more intelligent database management system (DBMS) that captures knowledge to effectively address industry- and domain-specific applications. The challenges of building such a system are discussed through examples in healthcare predictive analysis.
Deep Learning at Scale and at Ease
This article designs a distributed deep learning platform called SINGA, which has an intuitive programming model based on the common layer abstraction of deep learning models, and shows that it outperforms many other state-of-the-art deep learning systems.
SINGA: A Distributed Deep Learning Platform
A distributed deep learning system, called SINGA, for training big models over large datasets, which supports a variety of popular deep learning models and provides different neural net partitioning schemes for training large models.
Graph Analytics Through Fine-Grained Parallelism
The topological properties of the underlying graph are explored to design and implement a highly effective concurrency control scheme for efficient synchronous processing in an in-memory graph analytical engine. The results show that the proposed hybrid synchronous scheduler significantly outperforms other synchronous schedulers in existing graph analytical engines, as well as BSP and asynchronous schedulers.