This paper presents a new technique for voiced/unvoiced (V/UV) discrimination based on extraction of the pitch period. Empirical mode decomposition (EMD) is employed for multi-band representation of the speech signal in the time domain. The fundamental oscillation in a speech segment is determined in the autocorrelation function (ACF) of the EMD space. A damped...
Many studies collect and store life logs for personal memory. This paper explains how a system can create someone's life log inexpensively in order to share daily life events with family or friends through social networks or messaging. In the modern world where people are usually busier than ever, family members are geographically distributed due...
The efficiency of the Hilbert spectrum (HS) in time-frequency representation (TFR) of audio signals is investigated in this paper. HS is derived by applying empirical mode decomposition (EMD), a newly developed data-adaptive method for nonlinear and non-stationary signal analysis, together with the Hilbert transform. EMD represents any time-domain signal as a sum of...
A novel structural representation of speech was recently proposed, in which the dimensions of inevitable non-linguistic features are diminished. This representation is called the acoustic universal structure. Every speech event is described as a distribution, and the distance between any two events is calculated as a normalized cross-correlation. Then, a global...
Non-linguistic factors such as morphological differences in vocal tracts inevitably affect acoustic features of speech. Recently, a new speech representation, called the structural representation, was proposed that is completely independent of these factors. In the representation, the absolute property of speech events is totally discarded and their...
This paper presents a text-independent speaker identification system using multi-band features with an artificial neural network. Linear predictive cepstrum coefficients (LPCCs) computed from sub-band signals with higher order statistics (HOS) are employed as the main features to represent the speaker characteristics. The multi-band representation of the...
Acoustic features in speech are inevitably affected by non-linguistic factors, which can easily degrade speech recognition performance. Recently, we proposed a structural and speaker-invariant representation of speech, where speech substances are completely discarded and only speech contrasts are extracted. After converting an input speech stream into N...
Most speech synthesizers have been developed as text (phoneme sequence) to speech converters and, in this framework, text input is a precondition for speech production. However, no child acquires spoken language by reading a given text aloud. Children are said to acquire spoken language by imitating the utterances of their parents...
This research presents an innovative system for adaptive speech denoising using Independent Component Analysis (ICA) and Voice Activity Detection (VAD). Designed for instantaneous mixtures (two sources and two microphones), the proposed system identifies the noise contained in each noisy mixture. For that type of noise, it applies the most suitable ICA method...