Sercan Ömer Arik

We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks. Deep Voice lays the groundwork for truly end-to-end neural speech synthesis. The system comprises five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration…
We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron. We introduce Deep Voice 2, which is based on a…
Mode-division multiplexing (MDM) can increase the capacity of direct-detection short-reach systems in proportion to the number of modes employed. MDM requires compensation of modal crosstalk at the transmitter or receiver by multi-input multi-output (MIMO) signal processing. We show that the channel estimation required for the MIMO processing in a basis…
We present the fundamentals of multiple-input, multiple-output (MIMO) signal processing for mode-division multiplexing (MDM) in multimode fiber (MMF). As an introduction, we review current long-haul optical transmission systems and how continued traffic growth motivates study of new methods to increase transmission capacity per fiber. We describe the key…
Keyword spotting (KWS) constitutes a major component of human-technology interfaces. The goals for KWS are to maximize detection accuracy at a low false alarm (FA) rate while minimizing footprint size, latency, and complexity. Toward these goals, we study Convolutional Recurrent Neural Networks (CRNNs). Inspired by large-scale state-of-the-art…
Phase retrieval has important applications in optical imaging, communications and sensing. Lifting the dimensionality of the problem allows phase retrieval to be approximated as a convex optimization problem in a higher-dimensional space. Convex optimization-based phase retrieval has been shown to yield high accuracy, yet its low-complexity…
Efficient solutions for the classification of multi-view images can be built on graph-based algorithms when little information is known about the scene or cameras. Such methods typically require a pair-wise similarity measure between images, where a common choice is the Euclidean distance. However, the accuracy of the Euclidean distance as a similarity…
We review channel models for mode-division multiplexing (MDM) systems, the statistics derived from them, and their implications for system performance and complexity. We present the fundamentals of architectures and algorithms for multi-input multi-output (MIMO) equalization. With careful physical link design and judicious choice of signal processing…
Accommodating sustained exponential traffic growth in optical networks requires scaling the spatial dimension using space-division multiplexing. Numerous uncoupled spatial channels may be realized by activating multiple parallel fibers or cores in multicore fibers. A multiplicity of uncoupled spatial channels will render the granularity provided by multiple…