Learn More
Combining diverse low-level features from multiple modalities has consistently improved performance over a range of video processing tasks, including event detection. In our work, we study graph based clustering techniques for integrating information from multiple modalities by identifying word clusters spread across the different modalities. We present(More)
Text-to-speech synthesis (TTS) is the final stage in the speech-tospeech (S2S) translation pipeline, producing an audible rendition of translated text in the target language. TTS systems typically rely on a lexicon to look up pronunciations for each word in the input text. This is problematic when the target language is dialectal Arabic, because the(More)
  • 1