• Publications
An Articulatory-Based Singing Voice Synthesis Using Tongue and Lips Imaging
TLDR
This study uses tongue and lip image sequences recorded during singing performance to predict vocal tract properties via Line Spectral Frequencies (LSF), and develops an articulatory-based singing voice synthesis using both models.
Mixup-Based Acoustic Scene Classification Using Multi-Channel Convolutional Neural Network
TLDR
This paper explores a multi-channel CNN for the classification task, which extracts features from different channels in an end-to-end manner, and also explores the mixup method, which can provide higher prediction accuracy and robustness than previous models.
Sample Mixed-Based Data Augmentation for Domestic Audio Tagging
TLDR
A convolutional recurrent neural network with an attention module and log-scaled mel spectrum features is applied to audio tagging as a baseline system. With the mixup approach, it achieves a state-of-the-art equal error rate (EER) of 0.10 on the DCASE 2016 Task 4 dataset, outperforming the baseline system without data augmentation.
Deep Convolutional Neural Network-Based Early Automated Detection of Diabetic Retinopathy Using Fundus Image
TLDR
This paper explores deep convolutional neural networks for the automatic classification of diabetic retinopathy from color fundus images, and obtains an accuracy of 94.5% on the authors' dataset, outperforming the results obtained with classical approaches.
Tongue contour extraction from ultrasound images based on deep neural network
TLDR
This article presents a method based on deep neural networks to automatically extract tongue contours from ultrasound images in a speech dataset, using a deep autoencoder trained to learn the relationship between an image and its related contour.
Robust contour tracking in ultrasound tongue image sequences
TLDR
A new contour-tracking algorithm is presented for ultrasound tongue image sequences, which can follow the motion of tongue contours over long durations with good robustness and can be useful in applications such as speech recognition where very long sequences must be analyzed in their entirety.
AIM 2020: Scene Relighting and Illumination Estimation Challenge
TLDR
The novel VIDIT dataset used in the AIM 2020 challenge and the different proposed solutions and final evaluation results over the 3 challenge tracks are presented.
Convolutional neural network-based automatic classification of midsagittal tongue gestural targets using B-mode ultrasound images.
TLDR
The CNN-based method achieves state-of-the-art performance, even though no pre-training of the CNN was carried out, and both speaker-dependent and speaker-independent tongue gestural target classification experiments are conducted.
Multi-Scale DenseNet-Based Electricity Theft Detection
TLDR
This paper presents a novel approach for automatic electricity theft detection using a multi-scale densely connected convolutional neural network (multi-scale DenseNet) to capture the long-term and short-term periodic features within the sequential data.
Mixup Based Privacy Preserving Mixed Collaboration Learning
TLDR
This paper proposes a novel model averaging method combined with mixup, which provides protection against inversion attacks, and conducts experiments using state-of-the-art deep network architectures on multiple types of datasets to show that it improves the classification accuracy of models.
...