G-TADOC: Enabling Efficient GPU-Based Text Analytics without Decompression

  • Authors: Feng Zhang, Zaifeng Pan, Yanliang Zhou, Jidong Zhai, Xipeng Shen, O. Mutlu, Xiaoyong Du
  • Published 2021
  • Computer Science
  • 2021 IEEE 37th International Conference on Data Engineering (ICDE)
Text analytics directly on compression (TADOC) has proven to be a promising technology for big data analytics. GPUs are extremely popular accelerators for data analytics systems. Unfortunately, no work so far shows how to utilize GPUs to accelerate TADOC. We describe G-TADOC, the first framework that provides GPU-based text analytics directly on compression, effectively enabling efficient text analytics on GPUs without decompressing the input data. G-TADOC solves three major challenges. First…


Zwift: A Programming Framework for High Performance Text Analytics on Compressed Data
Zwift, the first programming framework for TADOC, consists of a domain-specific language, a compiler and runtime, and a utility library. Experiments show that Zwift significantly improves programming productivity while effectively unleashing the power of TADOC.
FineStream: Fine-Grained Window-Based Stream Processing on CPU-GPU Integrated Architectures
This paper proposes FineStream, a data stream system for efficient window-based stream processing on integrated architectures. It performs fine-grained workload scheduling between the CPU and GPU to take advantage of both architectures, and it provides an efficient mechanism for handling dynamic stream queries.
Spark-GPU: An accelerated in-memory data processing engine on clusters
This paper presents the design and implementation of Spark-GPU, which enables Spark to use the GPU's massively parallel processing ability to achieve both high performance and high throughput, and which improves the performance of machine learning workloads and SQL queries.
Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights
This work proposes the concept of compression-based direct processing to enable direct document analytics on compressed data and presents how the concept can be materialized on Sequitur, a compression algorithm that produces hierarchical grammar-like representations.
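
The idea of processing a hierarchical, grammar-like representation directly can be illustrated with a small word-count sketch. This is not the algorithm from any of the papers listed here: the grammar below is hand-written in the style of a Sequitur output rather than produced by Sequitur, and the names `GRAMMAR`, `word_counts`, and `expand` are hypothetical. It only shows why reusing per-rule results avoids expanding the text.

```python
from collections import Counter

# Hypothetical toy grammar (hand-written, not real Sequitur output).
# Each rule maps to a sequence of symbols: an int refers to another rule,
# a str is a literal word. Rule 0 is the root and encodes the whole text.
GRAMMAR = {
    0: [1, "c", 1, 2],
    1: [2, "a"],
    2: ["a", "b"],
}


def word_counts(grammar, root=0):
    """Count words in the encoded text without expanding it.

    Each rule's counts are computed once and reused wherever the rule
    is referenced, so repeated content is never re-scanned.
    """
    memo = {}

    def counts_of(rule_id):
        if rule_id in memo:
            return memo[rule_id]
        total = Counter()
        for symbol in grammar[rule_id]:
            if isinstance(symbol, int):
                total.update(counts_of(symbol))  # add the sub-rule's counts
            else:
                total[symbol] += 1
        memo[rule_id] = total
        return total

    return counts_of(root)


def expand(grammar, root=0):
    """Fully decompress the text; used here only to check the result."""
    words = []
    for symbol in grammar[root]:
        if isinstance(symbol, int):
            words.extend(expand(grammar, symbol))
        else:
            words.append(symbol)
    return words


if __name__ == "__main__":
    print(word_counts(GRAMMAR))
    print(Counter(expand(GRAMMAR)))
```

Both calls should report the same counts; the difference is that `word_counts` visits each rule body once, while `expand` materializes every repetition.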
Multi-GPU Graph Analytics
This work presents a single-node, multi-GPU programmable graph processing library that allows programmers to easily extend single-GPU graph algorithms to achieve scalable performance on large graphs with billions of edges, and achieves best-of-class performance across operations and datasets.
Gunrock: GPU Graph Analytics
The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries, and better performance than any other GPU high-level graph library.
TADOC: Text Analytics Directly on Compression
Experiments show that TADOC can save 90.8% storage space and 87.9% memory usage, while halving data processing times, on six data analytics tasks of various complexities.
SABER: Window-Based Hybrid Stream Processing for Heterogeneous Architectures
This paper describes Saber, a hybrid high-performance relational stream processing engine for CPUs and GPGPUs that increases processing throughput while maintaining low latency for a wide range of streaming SQL queries with both small and large window sizes.
MapD: a GPU-powered big data analytics and visualization platform
This paper presents MapD, a big data analytics platform that can query and visualize big data up to 100x faster than other systems, and leverages the massive parallelism of commodity GPUs to execute SQL queries over multi-billion row datasets with millisecond response times.
Enabling Efficient Random Access to Hierarchically-Compressed Data
A set of techniques is presented that successfully eliminates the limitation of direct data processing for random accesses and, for the first time, establishes the feasibility of effectively handling both data traversal operations and random data accesses on hierarchically-compressed data.