Harish Arsikere

Recent research has demonstrated the usefulness of subglottal resonances (SGRs) in speaker normalization. However, existing algorithms for estimating SGRs from speech signals have limited applicability: they are effective with isolated vowels only. This paper proposes a novel algorithm for estimating the first three SGRs (Sg1, Sg2, and Sg3) from continuous speech …
This paper presents an algorithm for automatically estimating speaker height. It is based on: (1) a recently proposed model of the subglottal system that explains the inverse relation observed between subglottal resonances and height, and (2) an improved version of our previous algorithm for automatically estimating the second subglottal resonance (Sg2). …
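To make the inverse relation concrete, here is a minimal sketch of a calibration-based height predictor that fits height against 1/Sg2. The calibration values, and the simple linear form, are hypothetical stand-ins; the paper's actual model and constants are not reproduced in this snippet.

```python
import numpy as np

# Illustrative only: fit height (cm) against 1/Sg2 (Hz^-1), reflecting the
# inverse relation between subglottal resonances and height described above.
# The calibration data below are made up, not the paper's measurements.
sg2_hz = np.array([1280.0, 1350.0, 1420.0, 1500.0, 1580.0])  # hypothetical Sg2 estimates
height_cm = np.array([183.0, 176.0, 170.0, 164.0, 158.0])    # hypothetical speaker heights

a, b = np.polyfit(1.0 / sg2_hz, height_cm, deg=1)            # height ~ a/Sg2 + b

def estimate_height(sg2):
    """Predict height (cm) from an Sg2 estimate (Hz) using the fitted line."""
    return a / sg2 + b

print(f"Predicted height for Sg2 = 1400 Hz: {estimate_height(1400.0):.1f} cm")
```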
This paper deals with the automatic estimation of the second subglottal resonance (Sg2) from natural speech spoken by adults, since our previous work focused only on estimating Sg2 from isolated diphthongs. A new database comprising speech and subglottal data of native American English (AE) speakers and bilingual Spanish/English speakers was used for the …
This letter investigates the use of MFCCs and GMMs for (1) improving the state of the art in speaker height estimation, and (2) rapidly estimating subglottal resonances (SGRs) without relying on formant and pitch tracking (unlike our previous algorithm in [1]). The proposed system comprises a set of height-dependent GMMs modeling static and dynamic MFCC …
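The general pattern of height-dependent GMMs scored on static-plus-dynamic MFCC features can be sketched as follows. This is illustrative only: the height bins, feature dimensions, and model orders are placeholders, and random matrices stand in for real MFCC(+delta) features, so none of the settings below are the letter's configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative only: one GMM per height bin, scored frame-by-frame on
# MFCC(+delta) feature matrices (rows = frames, columns = coefficients).
rng = np.random.default_rng(0)
train_feats = {  # stand-ins for real MFCC+delta matrices per height bin
    "short":  rng.normal(0.0, 1.0, size=(500, 26)),
    "medium": rng.normal(0.5, 1.0, size=(500, 26)),
    "tall":   rng.normal(1.0, 1.0, size=(500, 26)),
}
models = {lbl: GaussianMixture(n_components=4, random_state=0).fit(X)
          for lbl, X in train_feats.items()}

def predict_height_bin(feats):
    """Choose the height bin whose GMM gives the highest mean log-likelihood."""
    return max(models, key=lambda lbl: models[lbl].score(feats))

test = rng.normal(1.0, 1.0, size=(200, 26))  # resembles the "tall" training data
print(predict_height_bin(test))
```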
This letter focuses on the automatic estimation of the first subglottal resonance (Sg1). A database comprising speech and subglottal data of native American English speakers and bilingual Spanish/English speakers was used for the analysis. Data from 11 speakers (five males and six females) were used to derive an empirical relation among the first formant …
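The snippet is cut off before stating the form of the empirical relation, so the sketch below shows only the generic approach: a least-squares fit between paired measurements of the first formant (F1) and Sg1. The numbers, and the choice of a linear form, are assumptions made purely for illustration.

```python
import numpy as np

# Illustrative only: fit a line through hypothetical paired (F1, Sg1)
# measurements; the letter's actual relation is not reproduced here.
f1_hz  = np.array([520.0, 610.0, 700.0, 780.0, 860.0])  # hypothetical F1 values
sg1_hz = np.array([555.0, 570.0, 590.0, 605.0, 625.0])  # hypothetical Sg1 values

slope, intercept = np.polyfit(f1_hz, sg1_hz, deg=1)

def sg1_from_f1(f1):
    """Predict Sg1 (Hz) from a measured F1 (Hz) via the fitted line."""
    return slope * f1 + intercept

print(f"Sg1 estimate for F1 = 650 Hz: {sg1_from_f1(650.0):.0f} Hz")
```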
This paper presents a large-scale study of subglottal resonances (SGRs), the resonant frequencies of the tracheo-bronchial tree, and their relations to various acoustical and physiological characteristics of speakers. It presents data from a corpus of simultaneous microphone and accelerometer recordings of consonant-vowel-consonant (CVC) words …
The growth of Massive Open Online Courses (MOOCs) has been remarkable in the last few years. A significant amount of MOOC content is in the form of videos, and participants often use non-linear navigation to browse through a video. This paper proposes the design of a system that provides non-linear navigation in educational videos using features derived …
Previous studies of subglottal resonances have reported findings based on relatively few subjects, and the relations between these resonances, subglottal anatomy, and models of subglottal acoustics are not well understood. In this study, accelerometer signals of subglottal acoustics recorded during sustained [a:] vowels of 50 adult native speakers (25 …
Speech to personal assistants (e.g., reminders, calendar entries, messaging, voice search) is often uttered under cognitive load, causing nonfinal pausing that can result in premature recognition cut-offs. Prior research suggests that prepausal features can discriminate final from nonfinal pauses, but it does not reveal how speakers would behave if given …
Current speech-input systems typically use a nonspeech threshold for end-of-utterance detection. While usually sufficient for short utterances, the approach can cut speakers off during pauses in more complex utterances. We elicit personal-assistant speech (reminders, calendar entries, messaging, search) using a recognizer with a dramatically increased …
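For reference, the nonspeech-threshold endpointing described in the first sentence can be sketched as a simple energy-based silence timer. Everything below (frame size, energy threshold, timeout) is an illustrative placeholder, not a value used in the study.

```python
import numpy as np

def end_of_utterance(frames, energy_thresh=0.01, max_silence_frames=30):
    """Illustrative endpointer: declare end-of-utterance once the trailing run
    of low-energy (nonspeech) frames exceeds a timeout. `frames` is an
    iterable of 1-D sample arrays; all parameter values are placeholders."""
    silence_run = 0
    for i, frame in enumerate(frames):
        if np.mean(frame ** 2) < energy_thresh:     # nonspeech frame
            silence_run += 1
            if silence_run >= max_silence_frames:   # timeout reached
                return i                            # cut-off frame index
        else:
            silence_run = 0                         # speech resets the timer
    return None                                     # no cut-off triggered

# Tiny demo: 50 speech-like frames followed by 40 near-silent ones.
rng = np.random.default_rng(0)
frames = ([rng.normal(0, 0.3, 160) for _ in range(50)]
          + [rng.normal(0, 0.01, 160) for _ in range(40)])
print(end_of_utterance(frames))  # triggers inside the silent tail
```

Raising max_silence_frames mirrors the kind of dramatically increased cut-off setting described above, trading response latency for fewer premature cut-offs during nonfinal pauses.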