Corpus ID: 245650783

FamilySeer: Towards Optimized Tensor Codes by Exploiting Computation Subgraph Similarity

Shanjun Zhang, Mingzhen Li, Hailong Yang, Yi Liu, Zhongzhi Luan, Depei Qian
Deploying various deep learning (DL) models efficiently has boosted research on DL compilers. The difficulty of generating optimized tensor codes drives DL compilers to rely on auto-tuning approaches, and growing demands call for higher auto-tuning efficiency and quality. Currently, DL compilers partition the input DL models into several subgraphs and use auto-tuning to find the optimal tensor codes for these subgraphs. However, existing auto-tuning approaches usually…
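The abstract is cut off before it describes FamilySeer's actual technique. The following is therefore only a hypothetical Python sketch of the general idea the title suggests: subgraphs are grouped by similarity, and tuning results are reused within a group. Several parts are assumptions, not the paper's method:

- the similarity criterion, which here is an identical operator sequence;
- the tile-size search space `TILES`;
- the `toy_cost` function, which stands in for compiling and timing a kernel;
- the "search only the neighbours of the family's best tile" shortcut.

```python
from collections import defaultdict

# Hypothetical search space: candidate tile sizes for every subgraph.
TILES = (4, 8, 16, 32, 64)

def toy_cost(shape, tile):
    # Illustrative stand-in for compiling and timing a kernel: penalize
    # padding waste when the tile does not divide a dimension, and
    # penalize distance from a tile size of 16.
    waste = sum((-d) % tile for d in shape)
    return waste + abs(tile - 16)

def group_into_families(subgraphs):
    # Assumed similarity criterion: subgraphs with the same operator
    # sequence form one family, regardless of tensor shape.
    families = defaultdict(list)
    for name, ops, shape in subgraphs:
        families[ops].append((name, shape))
    return families

def tune_by_family(subgraphs):
    best, measurements = {}, 0
    for members in group_into_families(subgraphs).values():
        seed = None
        for name, shape in members:
            if seed is None:
                candidates = TILES  # first member: search the full space
            else:
                i = TILES.index(seed)
                # later members: only try neighbours of the family's best tile
                candidates = TILES[max(0, i - 1): i + 2]
            measurements += len(candidates)
            best[name] = min(candidates, key=lambda t: toy_cost(shape, t))
            if seed is None:
                seed = best[name]
    return best, measurements

subgraphs = [
    ("conv1", ("conv2d", "relu"), (64, 64)),
    ("conv2", ("conv2d", "relu"), (32, 32)),
    ("dense1", ("dense",), (24,)),
]
best, measurements = tune_by_family(subgraphs)
print(best, measurements)  # {'conv1': 16, 'conv2': 16, 'dense1': 8} 13
```

Tuning each subgraph independently over the full space would take 15 measurements. Reusing the `conv1` result for `conv2` brings the count down to 13 in this toy setting. A real tuner would measure compiled kernels and would need a more careful notion of similarity.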



Ansor: Generating High-Performance Tensor Programs for Deep Learning

This paper presents Ansor, a tensor program generation framework for deep learning applications that can find high-performance programs outside the search space of existing state-of-the-art approaches.

The Deep Learning Compiler: A Comprehensive Survey

This article presents a comprehensive survey of existing DL compilers by dissecting their commonly adopted design in detail, with emphasis on DL-oriented multi-level IRs and frontend/backend optimizations.

Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions

The paper contributes three components. The first is Tensor Comprehensions, a language close to the mathematics of deep learning that offers both imperative and declarative styles. The second is a polyhedral Just-In-Time compiler that converts a mathematical description of a deep learning DAG into a CUDA kernel with delegated memory management and synchronization. The third is a compilation cache populated by an autotuner.

TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers

This work introduces TenSet, a large-scale tensor program performance dataset, and provides comprehensive studies on how to learn and evaluate the cost models, including data collection, model architectures, loss functions, transfer learning, and evaluation metrics.

MetaTune: Meta-Learning Based Cost Model for Fast and Efficient Auto-tuning Frameworks

This work proposes MetaTune, a meta-learning-based cost model that uses pre-trained model parameters to predict the performance of optimized code more quickly and accurately. On average, it provides 8 to 13% better inference time for four CNN models with comparable or lower optimization time. In cross-platform cases it outperforms transfer learning by 10%.
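TenSet and MetaTune both center on learned cost models. These are models trained on measured (program features, run time) pairs and then used to rank untested candidates without running them. The sketch below illustrates only that basic role, using plain one-feature ordinary least squares. It is not MetaTune's meta-learning approach or any model studied in TenSet. The feature (`traffic`), the training data, and the candidate names are all hypothetical.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    slope, intercept = model
    return slope * x + intercept

def rank_candidates(model, candidates, feature):
    """Order candidate programs by predicted run time, fastest first,
    without compiling or running any of them."""
    return sorted(candidates, key=lambda c: predict(model, feature(c)))

# Hypothetical training data: one feature per measured program
# (e.g. an estimated memory-traffic count) and its measured time in ms.
measured_feature = [1, 2, 3, 4]
measured_time_ms = [3, 5, 7, 9]
model = fit_linear(measured_feature, measured_time_ms)

# Hypothetical untested candidates described by the same feature.
candidates = [
    {"name": "tile8", "traffic": 5},
    {"name": "tile16", "traffic": 2},
    {"name": "tile32", "traffic": 3.5},
]
ranked = rank_candidates(model, candidates, feature=lambda c: c["traffic"])
print([c["name"] for c in ranked])  # ['tile16', 'tile32', 'tile8']
```

An auto-tuner would then measure only the top few ranked candidates and add those measurements to the training set. This is where richer models, pre-training, or transfer learning come in.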

PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections

This work proposes PET, the first DNN framework that optimizes tensor programs with partially equivalent transformations and automated corrections. It also designs an efficient search algorithm that quickly discovers highly optimized programs by combining fully and partially equivalent optimizations at the tensor, operator, and graph levels.

A Learned Performance Model for Tensor Processing Units

It is shown that the learned model outperforms a heavily-optimized analytical performance model on two tasks—tile-size selection and operator fusion—and that it helps an autotuner discover faster programs in a setting where access to TPUs is limited or expensive.

Optimizing DNN computation graph using graph substitutions

This work formally defines the Optimizing Computation Graph using Graph Substitutions (OCGGS) problem and proves it to be NP-hard and Poly-APX-complete. It then develops two exact and efficient methods for the OCGGS problem.

Optimizing DNN Computation with Relaxed Graph Substitutions

The paper introduces a backtracking search algorithm over a set of relaxed graph substitutions to find optimized networks. It also uses a flow-based graph split algorithm to recursively split a computation graph into smaller subgraphs, which keeps the search efficient.
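A "relaxed" search means the search may temporarily accept substitutions that make the graph more expensive, which can unlock cheaper graphs later. The following is a minimal Python sketch of that idea on a toy problem. It makes several assumptions that are not taken from the paper:

- a graph is a linear tuple of operator names;
- each rewrite rule replaces a contiguous operator pattern;
- the operator costs are made up;
- a relaxation factor `alpha` admits any new graph whose cost is below `alpha` times the best cost found so far.

The sketch does not cover the flow-based graph split.

```python
import heapq

# Hypothetical operator costs and rewrite rules (pattern -> replacement).
COST = {"A": 2, "B": 2, "C": 5, "D": 3, "E": 1}
RULES = [(("A", "B"), ("C",)), (("C", "D"), ("E",))]

def cost(graph):
    return sum(COST[op] for op in graph)

def rewrites(graph):
    # Yield every graph reachable by applying one rule at one position.
    for pattern, replacement in RULES:
        k = len(pattern)
        for i in range(len(graph) - k + 1):
            if graph[i:i + k] == pattern:
                yield graph[:i] + replacement + graph[i + k:]

def relaxed_search(start, alpha):
    best = start
    seen = {start}
    queue = [(cost(start), start)]  # explore cheapest graphs first
    while queue:
        c, graph = heapq.heappop(queue)
        if c < cost(best):
            best = graph
        for new in rewrites(graph):
            # Relaxation: keep graphs up to alpha times the best cost so far.
            if new not in seen and cost(new) < alpha * cost(best):
                seen.add(new)
                heapq.heappush(queue, (cost(new), new))
    return best

start = ("A", "B", "D")                  # cost 7
print(relaxed_search(start, alpha=1.0))  # ('A', 'B', 'D')
print(relaxed_search(start, alpha=1.2))  # ('E',)
```

With `alpha=1.0` the search rejects the intermediate graph `('C', 'D')` (cost 8) and stays at cost 7. With `alpha=1.2` it passes through that more expensive graph and reaches `('E',)` at cost 1.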

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

This work introduces BERT, a language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. The pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.