Publications
MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU
TLDR
MobiRNN is presented, a mobile-specific optimization framework that offloads RNN execution to mobile GPUs and significantly decreases the latency of running RNN models on phones.
UIWear: Easily Adapting User Interfaces for Wearable Devices
TLDR
The UIWear system abstracts a logical model of the smartphone GUI, re-tailors the GUI for the wearable device based on the specified UI design, and compiles it into a companion app that performs comparably to or better than the corresponding hand-built companion app.
DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
TLDR
DeFormer, a decomposed transformer, is introduced; it substitutes the full self-attention with question-wide and passage-wide self-attentions in the lower layers, which allows for question-independent processing of the input text representations.
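The decomposition described in this TLDR can be illustrated with a minimal sketch (not the paper's implementation; a simplified single-head attention with no learned weights, and the function names are hypothetical): lower layers run self-attention over the question and passage separately, so passage representations can be precomputed independently of the question, and only the upper layers attend over the concatenated sequence.

```python
import numpy as np

def self_attention(x):
    # Single-head scaled dot-product self-attention over a (tokens, dim)
    # matrix; projection weights are omitted to keep the sketch minimal.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def deformer_style_encode(question, passage, n_lower, n_upper):
    # Lower layers: question and passage attend only within themselves,
    # so the passage branch is question-independent and cacheable.
    q, p = question, passage
    for _ in range(n_lower):
        q, p = self_attention(q), self_attention(p)
    # Upper layers: full self-attention over the concatenated sequence.
    h = np.concatenate([q, p], axis=0)
    for _ in range(n_upper):
        h = self_attention(h)
    return h

# Example: a 3-token question and a 5-token passage with dim 8
# produce an (8, 8) joint representation.
out = deformer_style_encode(np.random.rand(3, 8), np.random.rand(5, 8),
                            n_lower=2, n_upper=2)
```

Because the lower-layer passage branch never sees the question, its outputs can be computed once per passage and reused across queries, which is the source of the speedup the paper reports.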
Towards Accurate and Reliable Energy Measurement of NLP Models
TLDR
This work shows that existing software-based energy estimations are not accurate because they do not take into account hardware differences and how resource utilization affects energy consumption, and it quantifies the error using a hardware power meter.
DeQA: On-Device Question Answering
TLDR
DeQA is presented, a suite of latency- and memory-optimizations that adapts existing QA systems to run completely locally on mobile phones and provides at least a 13x speedup on average on the mobile phone across all three datasets.
IrEne: Interpretable Energy Prediction for Transformers
TLDR
IrEne is an interpretable and extensible energy prediction system that accurately predicts the inference energy consumption of a wide range of Transformer-based NLP models and can be used to conduct energy bottleneck analysis and to easily evaluate the energy impact of different architectural choices.
DeQA: On-Device Question Answering
Today there is no effective support for device-wide question answering on mobile devices. State-of-the-art QA models are deep learning behemoths designed for the cloud, which run extremely slowly and …
Are Mobile DNN Accelerators Accelerating DNNs?
TLDR
An in-depth study of one set of commercially-available mobile accelerators, the Intel Neural Compute Sticks (NCS), with a systematic measurement of the latency and energy of this accelerator under a variety of DNNs.
Demo: UIWear: Easily Adapting User Interfaces for Wearable Devices
TLDR
A working prototype of the system UIWear is shown that allows a developer to easily extend a smartphone application to other wearable interfaces and can create the same functionality as existing companion apps with an order-of-magnitude less programming effort.