Corpus ID: 219793039

BlazePose: On-device Real-time Body Pose tracking

  • Valentin Bazarevsky, I. Grishchenko, Karthik Raveendran, Tyler Lixuan Zhu, Fangfang Zhang, Matthias Grundmann
We present BlazePose, a lightweight convolutional neural network architecture for human pose estimation that is tailored for real-time inference on mobile devices. During inference, the network produces 33 body keypoints for a single person and runs at over 30 frames per second on a Pixel 2 phone. This makes it particularly suited to real-time use cases like fitness tracking and sign language recognition. Our main contributions include a novel body pose tracking solution and a lightweight body…
Body Weight Estimation using 2D Body Image
A novel computer-vision based method for body weight estimation using only 2D images of people is proposed, and the results obtained are much faster due to the reduced complexities of the proposed models, with facial models performing better than full body models.
Deep Learning based Virtual Point Tracking for Real-Time Target-less Dynamic Displacement Measurement in Railway Applications
This work proposes virtual point tracking for real-time target-less dynamic displacement measurement, incorporating deep learning techniques and domain knowledge to tackle this issue.
Automatic generation of a 3D sign language avatar on AR glasses given 2D videos of human signers
In this paper we present a prototypical implementation of a pipeline that allows the automatic generation of a German Sign Language avatar from 2D video material. The presentation is accompanied by…
Interactive Video Acquisition and Learning System for Motor Assessment of Parkinson's Disease
  • Yunyue Wei, Bingquan Zhu, Chen Hou, Chen Zhang, Yanan Sui
  • Computer Science
  • 2021
Diagnosis and treatment for Parkinson’s disease rely on the evaluation of motor functions, which is expensive and time consuming when performing at clinics. It is also difficult for patients to…
Learning When Agents Can Talk to Drivers Using the INAGT Dataset and Multisensor Fusion
This paper examines sensor fusion techniques for modeling opportunities for proactive speech-based in-car interfaces. We leverage the Is Now a Good Time (INAGT) dataset, which consists of automotive,…
Reducing latency and bandwidth for video streaming using keypoint extraction and digital puppetry
This paper proposes an alternative to the conventional codec through the implementation of a keypoint-centric encoder relying on the transmission of keypoint information from within a video feed, as shown in Figure 1.
BodySLAM: Opportunistic User Digitization in Multi-User AR/VR Experiences
The core idea behind BodySLAM is to use disparate camera views from users to digitize the body, hands and mouth of other people, and then relay that information back to the respective users.


OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields
OpenPose is released, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints, and the first combined body and foot keypoint detector, based on an internal annotated foot dataset.
BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs
These contributions include a lightweight feature extraction network inspired by, but distinct from MobileNetV1/V2, a GPU-friendly anchor scheme modified from Single Shot MultiBox Detector (SSD), and an improved tie resolution strategy alternative to non-maximum suppression.
Real-time Facial Surface Geometry from Monocular Video on Mobile GPUs
An end-to-end neural network-based model for inferring an approximate 3D mesh representation of a human face from single camera input for AR applications and demonstrates super-realtime inference speed on mobile GPUs and a high prediction quality that is comparable to the variance in manual annotations of the same image.
PifPaf: Composite Fields for Human Pose Estimation
The new PifPaf method, which uses a Part Intensity Field to localize body parts and a Part Association Field to associate body parts with each other to form full human poses, outperforms previous methods at low resolution and in crowded, cluttered and occluded scenes.
Stacked Hourglass Networks for Human Pose Estimation
This work introduces a novel convolutional network architecture for the task of human pose estimation that is described as a “stacked hourglass” network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions.
Deep High-Resolution Representation Learning for Human Pose Estimation
This paper proposes a network that maintains high-resolution representations through the whole process of human pose estimation and empirically demonstrates the effectiveness of the network through the superior pose estimation results over two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset.
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene…
On-device, real-time hand tracking with MediaPipe. on-device-real-time-hand-tracking-with html. [Online; accessed April 2020]
OpenPose 1.1.0 benchmark. 1-DynFGvoScvfWDA1P4jDInCkbD4lg0IKOYbXgEq0sK0 [Online; accessed March 2020]
Azure Kinect body tracking joints