Rahila Parveen

Learn More
This paper reports the design and development of an 82 speaker Urdu speech corpus for speaker independent spontaneous speech recognition using the CMU Sphinx Open Source Toolkit for Speech Recognition. The corpus consists of 45 hours of spontaneous and read speech data from 82 speakers (42 male and 40 female), recorded over a microphone and a telephone(More)
This paper presents the development of acoustic and language models for robust Urdu speech recognition using the CMU Sphinx Open Source Toolkit for speech recognition. Three models have been developed incrementally, with the addition of speech data of up to two speakers per pass; one model using data from 40 female speakers only, one from 41 male speakers(More)
The paper presents a design schema and details of a new Urdu POS tagset. This tagset is designed due to challenges encountered in working with existing tagsets for Urdu. It uses tags that judiciously incorporate information about special morpho-syntactic categories found in Urdu. With respect to the overall naming schema and the basic divisions, the tagset(More)
  • 1