Automatic Dialect Detection in Arabic Broadcast Speech
TLDR
This work investigates different approaches to dialect identification in Arabic broadcast speech. It uses phonetic and lexical features obtained from a speech recognition system and acoustic features obtained with the i-vector framework, and it combines these features using a multi-class Support Vector Machine (SVM).
A complete KALDI recipe for building Arabic speech recognition systems
TLDR
A prototype broadcast news system is built using 200 hours of GALE data that are publicly available through LDC. This is the first effort to share reproducible, sizable training and testing results for a Modern Standard Arabic (MSA) system.
Universal Adversarial Audio Perturbations
TLDR
The existence of universal adversarial perturbations that can fool a family of audio classification architectures is demonstrated for both targeted and untargeted attack scenarios, together with a proof that the proposed penalty method theoretically converges to a solution that corresponds to universal adversaries.
GPU accelerated acoustic likelihood computations
TLDR
Using a graphics processing unit (GPU) to compute acoustic likelihoods in a speech recognition system is shown to be 5x faster than a CPU SSE-based implementation, which leads to a speed-up of 35% on a large-vocabulary task.
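The summary does not say which acoustic model the likelihoods come from. Assuming a GMM-HMM recognizer (an assumption, not stated above), the core work is the log-likelihood of each feature frame under diagonal-covariance Gaussian mixtures; every frame is scored independently, which is what makes the computation a good fit for a GPU. A minimal CPU reference sketch, with hypothetical function names:

```python
import math

def gaussian_log_likelihood(x, mean, var):
    """Log density of frame x under one diagonal-covariance Gaussian."""
    total = 0.0
    for xd, md, vd in zip(x, mean, var):
        total += math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd
    return -0.5 * total

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of x under a diagonal-covariance GMM (log-sum-exp over components)."""
    comps = [math.log(w) + gaussian_log_likelihood(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(comps)  # subtract the max before exp() for numerical stability
    return peak + math.log(sum(math.exp(c - peak) for c in comps))

def score_frames(frames, weights, means, variances):
    """Score every frame; the calls are independent, so they can run in parallel."""
    return [gmm_log_likelihood(x, weights, means, variances) for x in frames]
```

On a GPU, the per-frame, per-component loop would typically be mapped onto parallel threads rather than executed sequentially as here.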
Segmentation of recordings based on partial transcriptions
TLDR
This paper presents the approach used to produce a training database from a set of recorded newscasts for which the authors had only inaccurate transcriptions, and develops a time-marking procedure that uses the speech recognition engine to improve segmentation accuracy.
Speaker adaptation using the i-vector technique for bottleneck features
TLDR
The effect of speaker adaptation based on the i-vector framework is studied in the context of stacked bottleneck features; the approach achieves an absolute WER improvement of 1.2% on an Arabic broadcast news task.
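The summary does not describe how the i-vector is fed to the bottleneck network. One common scheme, assumed here rather than taken from the paper, is to append the speaker's i-vector to every input frame so the network can condition on speaker characteristics. A minimal sketch with a hypothetical helper name:

```python
from typing import List

def append_ivector(frames: List[List[float]], ivector: List[float]) -> List[List[float]]:
    """Concatenate the same per-speaker i-vector onto every acoustic frame (hypothetical helper)."""
    return [list(frame) + list(ivector) for frame in frames]
```

Because the i-vector is constant for a speaker, every frame from that speaker receives the same extra dimensions.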
ETS System for AV+EC 2015 Challenge
TLDR
One of the main findings is that the frame stacking technique improves the quality of the predictions made by the model, and the improvements were also observed in all other modalities.
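Frame stacking is not defined in the summary. In its usual sense, each frame's feature vector is concatenated with those of its neighbours inside a context window, so a frame-level model sees temporal context. A minimal sketch, assuming a symmetric window of `context` frames on each side and repetition of the edge frames at the boundaries (both assumptions):

```python
def stack_frames(frames, context):
    """Concatenate each frame with its `context` neighbours on either side."""
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp: repeat first/last frame at edges
            window.extend(frames[idx])
        stacked.append(window)
    return stacked
```

For example, `stack_frames([[1], [2], [3]], 1)` returns `[[1, 1, 2], [1, 2, 3], [2, 3, 3]]`.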
Content-based video copy detection using nearest-neighbor mapping
TLDR
Nearest-neighbor (NN) mapping for video copy detection gives a minimal normalized detection cost rate (min NDCR) comparable to that achieved with audio copy detection on the same task.
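The summary does not describe the mapping itself. A generic reading, assumed here, is that each query frame's feature vector is mapped to its nearest reference frame, and a copy is indicated when many query frames map to reference frames with a consistent time offset. A brute-force sketch using squared Euclidean distance, with hypothetical function names:

```python
import math
from collections import Counter

def nearest_reference(query_vec, reference):
    """Index of the reference frame closest to query_vec (squared Euclidean distance)."""
    best_idx, best_dist = -1, math.inf
    for idx, ref_vec in enumerate(reference):
        dist = sum((q - r) ** 2 for q, r in zip(query_vec, ref_vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

def best_alignment(query, reference):
    """Most common (reference index - query index) offset and its vote count."""
    votes = Counter(nearest_reference(q, reference) - t for t, q in enumerate(query))
    return votes.most_common(1)[0]
```

A high vote count for a single offset suggests that the query is a time-shifted copy of that segment of the reference.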
CRIM's Content-Based Copy Detection System for TRECVID
TLDR
A new method for SIFT quantization is introduced that reduces computation time while maintaining good precision for the SIFT representation. It also allows easy parallel processing on a graphics processing unit, leading to a very fast search.
...
...