EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
TLDR
A new scaling method is proposed that uniformly scales all dimensions of depth, width, and resolution using a simple yet highly effective compound coefficient, and its effectiveness is demonstrated by scaling up MobileNets and ResNet.
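The compound scaling rule can be sketched in a few lines: depth, width, and input resolution are all scaled by powers of a single coefficient phi. The alpha/beta/gamma constants below follow the paper's grid-searched values; the baseline depth/width/resolution numbers are illustrative placeholders, not an actual EfficientNet configuration.

```python
# Compound scaling sketch: depth, width, and resolution are scaled together by a
# single compound coefficient phi. The paper's grid search fixes alpha, beta,
# gamma subject to alpha * beta^2 * gamma^2 ~= 2.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution multipliers

def compound_scale(base_depth, base_width, base_resolution, phi):
    """Scale a baseline network's depth/width/resolution for a given phi."""
    depth = base_depth * (ALPHA ** phi)            # number of layers
    width = base_width * (BETA ** phi)             # number of channels
    resolution = base_resolution * (GAMMA ** phi)  # input image size
    return round(depth), round(width), round(resolution)

# Example: scale a placeholder baseline by phi = 3.
print(compound_scale(base_depth=16, base_width=64, base_resolution=224, phi=3))
```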
Searching for MobileNetV3
TLDR
This paper starts the exploration of how automated search algorithms and network design can work together, harnessing complementary approaches to improve the overall state of the art of MobileNets.
MnasNet: Platform-Aware Neural Architecture Search for Mobile
TLDR
An automated mobile neural architecture search (MNAS) approach is proposed that explicitly incorporates model latency into the main objective, so that the search can identify a model achieving a good trade-off between accuracy and latency.
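A sketch of the latency-aware objective described above: the search reward multiplies measured accuracy by a latency factor relative to a target budget. The exponent follows the paper's soft-constraint setting; the accuracy and latency numbers in the example are made up.

```python
def mnas_reward(accuracy, latency_ms, target_ms, w=-0.07):
    """Multi-objective reward: accuracy scaled by how far latency is from target.

    Latency below the target slightly increases the reward, latency above the
    target penalizes it, so the search trades accuracy against latency.
    """
    return accuracy * (latency_ms / target_ms) ** w

# A slower but more accurate candidate vs. a faster, slightly less accurate one.
print(mnas_reward(accuracy=0.76, latency_ms=90, target_ms=75))
print(mnas_reward(accuracy=0.75, latency_ms=70, target_ms=75))
```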
EfficientDet: Scalable and Efficient Object Detection
TLDR
This paper systematically studies neural network architecture design choices for object detection and proposes a weighted bi-directional feature pyramid network (BiFPN) together with a compound scaling method that uniformly scales the resolution, depth, and width of the backbone, feature network, and box/class prediction networks at the same time.
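The weighted fusion in BiFPN can be sketched as "fast normalized fusion": each incoming feature map gets a learnable non-negative weight, normalized by the sum of all weights. The feature shapes and weight values below are illustrative placeholders.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped feature maps with learnable, non-negative weights.

    Weights are clamped to be non-negative and normalized by their sum, which is
    BiFPN's cheaper alternative to a softmax over the fusion weights.
    """
    w = np.maximum(np.asarray(weights, dtype=np.float32), 0.0)
    w = w / (w.sum() + eps)
    return sum(wi * fi for wi, fi in zip(w, features))

# Example: fuse two 8x8 feature maps with placeholder weights.
p_td = np.random.rand(8, 8).astype(np.float32)
p_in = np.random.rand(8, 8).astype(np.float32)
print(fast_normalized_fusion([p_td, p_in], weights=[1.0, 0.5]).shape)
```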
MixConv: Mixed Depthwise Convolutional Kernels
TLDR
This paper proposes a new mixed depthwise convolution (MixConv), which naturally mixes up multiple kernel sizes in a single convolution and improves the accuracy and efficiency of existing MobileNets on both ImageNet classification and COCO object detection.
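A minimal PyTorch sketch of the mixed depthwise convolution: channels are partitioned into groups and each group is convolved depthwise with its own kernel size, then the outputs are concatenated. The grouping scheme and kernel sizes below are illustrative, not the exact MixConv configurations.

```python
import torch
import torch.nn as nn

class MixConvSketch(nn.Module):
    """Depthwise convolution that mixes several kernel sizes in one layer."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        splits = [channels // len(kernel_sizes)] * len(kernel_sizes)
        splits[0] += channels - sum(splits)  # absorb any remainder
        self.splits = splits
        self.convs = nn.ModuleList(
            nn.Conv2d(c, c, k, padding=k // 2, groups=c)  # depthwise: groups == channels
            for c, k in zip(splits, kernel_sizes)
        )

    def forward(self, x):
        chunks = torch.split(x, self.splits, dim=1)
        return torch.cat([conv(chunk) for conv, chunk in zip(self.convs, chunks)], dim=1)

# Example: a 16-channel feature map mixed over 3x3, 5x5, and 7x7 kernels.
out = MixConvSketch(16)(torch.randn(1, 16, 32, 32))
print(out.shape)  # torch.Size([1, 16, 32, 32])
```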
Adversarial Examples Improve Image Recognition
TLDR
This work proposes AdvProp, an enhanced adversarial training scheme that treats adversarial examples as additional training examples to prevent overfitting, and shows that AdvProp improves a wide range of models on various image recognition tasks, performing better as the models get bigger.
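A compressed sketch of the idea of treating adversarially perturbed images as extra training examples. It uses a single FGSM step for brevity and omits the paper's auxiliary batch-norm branch for adversarial inputs, so it is an illustration of the idea rather than the full AdvProp recipe; the tiny classifier in the example is a placeholder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def advprop_step_sketch(model, images, labels, optimizer, epsilon=1e-2):
    """One training step that also learns from adversarial examples."""
    # Build adversarial examples from the current model (single FGSM step).
    images_adv = images.clone().detach().requires_grad_(True)
    loss_adv_gen = F.cross_entropy(model(images_adv), labels)
    grad = torch.autograd.grad(loss_adv_gen, images_adv)[0]
    images_adv = (images_adv + epsilon * grad.sign()).detach()

    # Treat adversarial examples as additional training examples.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), labels) + F.cross_entropy(model(images_adv), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with a tiny placeholder classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(4, 3, 8, 8), torch.randint(0, 10, (4,))
print(advprop_step_sketch(model, x, y, opt))
```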
BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models
TLDR
BigNAS is proposed, an approach that challenges the conventional wisdom that post-processing of the weights is necessary to obtain good prediction accuracy; it trains a single set of shared weights on ImageNet and uses these weights to obtain child models whose sizes range from 200 to 1000 MFLOPs.
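A toy sketch of the weight-sharing idea behind obtaining child models from one set of shared weights: a smaller child is carved out of a bigger layer's weights instead of being trained from scratch. BigNAS's actual single-stage training recipe is not shown, and the layer sizes here are placeholders.

```python
import torch
import torch.nn as nn

def slice_linear(big, out_features, in_features):
    """Build a smaller Linear layer that reuses a slice of a bigger layer's weights.

    Child models in weight-sharing supernets are obtained by selecting a slice
    of the shared weights (here, the leading rows/columns) rather than retraining.
    """
    child = nn.Linear(in_features, out_features)
    with torch.no_grad():
        child.weight.copy_(big.weight[:out_features, :in_features])
        child.bias.copy_(big.bias[:out_features])
    return child

# Example: carve a 16->32 child out of a 64->128 shared layer.
shared = nn.Linear(64, 128)
print(slice_linear(shared, out_features=32, in_features=16))
```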
Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
TLDR
This work proposes Nyströmformer, a model that exhibits favorable scalability as a function of sequence length and performs favorably relative to other efficient self-attention methods.
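A NumPy sketch of the Nyström approximation of softmax attention: landmark queries and keys are taken as segment means, and the full n x n attention matrix is replaced by three small factors. The paper approximates the Moore-Penrose pseudoinverse iteratively; exact `pinv` is used here for simplicity, and the shapes are placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def nystrom_attention_sketch(Q, K, V, num_landmarks=8):
    """Approximate softmax attention with a Nystrom decomposition.

    The full n x n attention matrix is never formed; only n x m and m x m
    pieces are computed, with m = num_landmarks.
    """
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    # Landmarks: mean of each contiguous segment of queries / keys.
    Q_l = Q.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)
    K_l = K.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)

    kernel_1 = softmax(Q @ K_l.T * scale)    # (n, m)
    kernel_2 = softmax(Q_l @ K_l.T * scale)  # (m, m)
    kernel_3 = softmax(Q_l @ K.T * scale)    # (m, n)
    return kernel_1 @ np.linalg.pinv(kernel_2) @ (kernel_3 @ V)

# Example with a sequence length divisible by the number of landmarks.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 16)) for _ in range(3))
print(nystrom_attention_sketch(Q, K, V).shape)  # (64, 16)
```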
Smooth Adversarial Training
TLDR
Smooth Adversarial Training (SAT) uses smooth activation functions so that adversarial training can find harder adversarial examples and compute better gradient updates, improving adversarial robustness for "free", i.e., with no drop in accuracy and no increase in computational cost.
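The core change SAT studies can be illustrated by swapping a non-smooth activation for a smooth one. Softplus is used below as one example of such a function, applied to a placeholder model; the paper evaluates several smooth activations.

```python
import copy
import torch.nn as nn

def swap_relu_for_softplus(model):
    """Return a copy of `model` with every ReLU replaced by Softplus.

    A smooth activation has a well-defined gradient everywhere, so both the
    adversarial-example generation and the weight updates during adversarial
    training see better gradients. Only the activation changes.
    """
    model = copy.deepcopy(model)

    def _swap(module):
        for name, child in module.named_children():
            if isinstance(child, nn.ReLU):
                setattr(module, name, nn.Softplus())
            else:
                _swap(child)

    _swap(model)
    return model

# Example: a tiny placeholder MLP after the swap.
mlp = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))
print(swap_relu_for_softplus(mlp))
```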
Search to Distill: Pearls Are Everywhere but Not the Eyes
TLDR
This work presents a new architecture-aware knowledge distillation approach that finds student models (pearls for the teacher) best suited to distilling a given teacher model, leveraging Neural Architecture Search (NAS) equipped with a KD-guided reward to search for the best student architectures for that teacher.
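As a sketch of what a KD-guided reward involves, each candidate student is trained with a standard distillation loss against the fixed teacher, and its resulting validation accuracy (optionally combined with latency) serves as the search reward. The loss below is the standard soft-target formulation; the temperature and mixing weight are illustrative defaults, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.9):
    """Knowledge-distillation loss: soft teacher targets plus hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Example with random logits for a 10-class problem.
s, t = torch.randn(4, 10), torch.randn(4, 10)
y = torch.randint(0, 10, (4,))
print(kd_loss(s, t, y).item())
```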