Mahmood Bijankhan

This paper addresses some of the issues encountered while building a written language resource (called ‘Peykare’) for contemporary Persian. After defining five linguistic varieties and 24 registers based on them, we collected the texts for Peykare in order to carry out a linguistic analysis, including cross-register differences.
Persian is an Indo-European language that has borrowed its script from Arabic, a member of the Semitic language family. Because the Persian and Arabic scripts are so similar, problems arise when electronic text is processed. This paper discusses some of the common problems encountered in practice while developing a corpus for Persian. The sources…
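As an illustration of the kind of script-level problem such a corpus can face, the sketch below normalizes two Arabic code points that commonly appear in Persian electronic text in place of their Persian counterparts. The abstract above does not list the specific problems it covers, so the choice of these two letters, the mapping table `ARABIC_TO_PERSIAN`, and the function `normalize_persian` are assumptions made for this example rather than the authors' procedure.

```python
# Hypothetical mapping, assumed for illustration: Arabic code points that
# often appear in Persian text, and the Persian code points usually preferred.
ARABIC_TO_PERSIAN = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}

# Build the translation table once so repeated calls stay cheap.
_TABLE = str.maketrans(ARABIC_TO_PERSIAN)


def normalize_persian(text: str) -> str:
    """Replace Arabic yeh and kaf with their Persian counterparts."""
    return text.translate(_TABLE)
```

Normalizing before tokenization or frequency counting keeps visually identical words from being treated as distinct strings; a real corpus pipeline would probably need a longer mapping than this one.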
Generating pronunciation variants of words is an important applied topic in speech research and is used extensively in automatic speech segmentation and recognition systems. Decision trees are widely used to model the pronunciation variants of words and sub-word units. In the case of word units and very large vocabularies, to train…
Small acoustic differences in duration, intensity and vowel formants were found between initial and final accented target words in Persian, alongside substantial differences in f0. On the basis of these data and the results of a perception experiment in which an f0 continuum was superimposed on a single source utterance, we conclude that Persian has a…
In this research, a text-to-speech system for the Farsi language has been implemented. The proposed synthesizer concatenates Farsi syllables using TD-PSOLA. This paper concentrates mainly on investigating pitch variations in Farsi sentences and on presenting some novel rules for modeling these variations. Based on the location of stressed…
This paper describes ongoing research to create an acoustic-phonetic telephone speech database for Farsi, called “Tfarsdat”. It is compared with two LDC Farsi corpora, OGI and CallFriend, in terms of corpus dialectology. So far, we have recorded about 8 hours of monologue calls containing spontaneous and read speech from 64 speakers belonging to one…
Persian clitic groups differ from words. Most importantly, a pitch accent (L+)H* is associated with the word-final (i.e. base-final) syllable of clitic groups, but with the word-final syllable of words, meaning that clitics remain outside the domain of the word. The pitch accent marks the stress, but we found no independent durational or spectral…
Morphological and syntactic annotation of multi-token units confronts several problems due to the concatenating nature of Persian script and the orthographic variation that results. In the present paper, through an analysis of the different collocation types of the tokens, the compositional, non-compositional and semi-compositional constructions are described and then, in…