Yonatan Vaizman

Digital music has become prolific on the web in recent decades. Automated recommendation systems are essential for users to discover music they love and for artists to reach appropriate audiences. When manual annotations and user preference data are lacking (e.g., for new artists), these systems must rely on content-based methods. Besides powerful machine…
Emotional content is a major component of music. It has long been a research topic of interest to discover the acoustic patterns in music that carry that emotional information and enable performers to communicate emotional messages to listeners. Previous works looked in the audio signal for local cues, most of which assume monophonic music, and their…
We propose the multivariate autoregressive model for content-based music auto-tagging. At the song level, our approach leverages the multivariate autoregressive mixture (ARM) model, a generative time-series model for audio, which assumes each feature vector in an audio fragment is a linear function of the previous feature vectors. To tackle tag-model estimation,…
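The core modeling assumption above — that each feature vector is a linear function of its predecessors — can be illustrated with a minimal vector-autoregressive sketch. This is not the paper's ARM mixture or its estimation procedure; the function names and the plain least-squares fit below are illustrative assumptions only.

```python
import numpy as np

def fit_var(Y, p):
    """Fit an order-p multivariate AR model, y_t ~ A_1 y_{t-1} + ... + A_p y_{t-p} + b,
    by ordinary least squares. Y: (T, d) array of feature vectors."""
    T, d = Y.shape
    # Stack the p lagged vectors (plus a constant term) as regressors.
    X = np.hstack([Y[p - k - 1:T - k - 1] for k in range(p)])
    X = np.hstack([X, np.ones((T - p, 1))])
    targets = Y[p:]
    coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
    A = [coef[k * d:(k + 1) * d].T for k in range(p)]  # list of (d, d) matrices
    b = coef[-1]
    return A, b

def predict_next(Y, A, b):
    """One-step prediction from the last len(A) observed vectors."""
    return sum(A[k] @ Y[-1 - k] for k in range(len(A))) + b

# Usage: recover a known order-1 linear recurrence from synthetic data.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
Y = [rng.normal(size=2)]
for _ in range(500):
    Y.append(A_true @ Y[-1] + 0.01 * rng.normal(size=2))
Y = np.asarray(Y)
A_hat, b_hat = fit_var(Y, p=1)  # A_hat[0] should approximate A_true
```

A generative time-series model of this form summarizes an audio fragment by its learned transition matrices, rather than by a bag of independent frames.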
We developed automatic computational tools for the monitoring of pathological mental states, including characterization, detection, and classification. We show that simple temporal-domain features of speech can be used to correctly classify up to 80% of the speakers in a two-way classification task. We further show that some features correlate strongly…
A challenge for speech recognition in voice-controlled household devices, such as the Amazon Echo or Google Home, is robustness against interfering background speech. In this far-field speech recognition problem, another person or a media device in proximity can produce background speech that interferes with the device-directed speech. We expand on…
The ability to automatically recognize a person's behavioral context can contribute to health monitoring, aging care, and many other domains. Validating context recognition in the wild is crucial to promote practical applications that work in real-life settings. The authors collected more than 300,000 minutes of sensor data with context labels from…