SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size
TLDR
This work proposes a small DNN architecture called SqueezeNet, which achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters and can be compressed to less than 0.5MB (510x smaller than AlexNet).
The Landscape of Parallel Computing Research: A View from Berkeley
TLDR
The parallel landscape is framed with seven questions, and the following are recommended to explore the design space rapidly:
  • The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems.
  • The target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (million instructions per second) per watt, MIPS per area of silicon, and MIPS per development dollar.
FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search
TLDR
This work proposes a differentiable neural architecture search (DNAS) framework that uses gradient-based methods to optimize ConvNet architectures, avoiding enumerating and training individual architectures separately as in previous methods.
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
Recent research on deep convolutional neural networks (CNNs) has focused primarily on improving accuracy. For a given accuracy level, it is typically possible to identify multiple CNN architectures...
SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud
TLDR
An end-to-end pipeline called SqueezeSeg, based on convolutional neural networks (CNNs), takes a transformed LiDAR point cloud as input and directly outputs a point-wise label map, which is then refined by a conditional random field (CRF) implemented as a recurrent layer.
Dense Point Trajectories by GPU-Accelerated Large Displacement Optical Flow
TLDR
This paper provides a method for computing point trajectories based on a fast parallel implementation of a recent optical flow algorithm that tolerates fast motion and proves that the fixed point matrix obtained in the optical flow technique is positive semi-definite.
DenseNet: Implementing Efficient ConvNet Descriptor Pyramids
TLDR
DenseNet is presented, an open source system that computes dense, multiscale features from the convolutional layers of a CNN-based object classifier.
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
TLDR
The empirical results demonstrate the superior performance of LAMB across various tasks such as BERT and ResNet-50 training with very little hyperparameter tuning, and the optimizer enables use of very large batch sizes of 32868 without any degradation of performance.
Bus encoding to prevent crosstalk delay
  • Bret Victor, K. Keutzer
  • Engineering, Computer Science
  • IEEE/ACM International Conference on Computer…
  • 4 November 2001
TLDR
This paper finds that a 32-bit bus can be encoded with 40 wires using a code with memory or 46 wires with a memoryless code, in comparison to the 63 wires required with simple shielding.
SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud
TLDR
This work introduces a new model, SqueezeSegV2, which is more robust against dropout noise in LiDAR point clouds and therefore achieves a significant accuracy improvement, and a domain-adaptation training pipeline consisting of three major components: learned intensity rendering, geodesic correlation alignment, and progressive domain calibration.