This article presents two methods for the automatic detection of social events, both evaluated on the annotated set of pictures from the 2011 MediaEval benchmark [1]. The first method uses a set of web pages and a semantic space obtained by Latent Dirichlet Allocation (LDA, [2, 3]) to classify pictures from Flickr. The second approach uses the …
Various studies have highlighted that topic-based approaches provide a powerful representation of the spoken content of documents. Nonetheless, these documents may contain more than one main theme, and their automatic transcription inevitably contains errors. In this study, we propose an original and promising framework based on a compact representation of a textual …
In this paper, we present a method of tweet contextualization that uses a semantic space to extend the tweet vocabulary. This method is evaluated on the tweet contextualization benchmark. The contextualization is built from sentences of the English Wikipedia. The context is obtained by querying a baseline summarization system. The query is made with words from a …
In this paper, we study the impact of dialogue representations and classification methods on the task of theme identification in telephone conversation services with highly imperfect automatic transcriptions. Two dialogue representations are first compared: the classical Term Frequency-Inverse Document Frequency with Gini purity criteria (TF-IDF-Gini) …
Although current transcription systems can achieve high recognition performance, they still have great difficulty transcribing speech in very noisy environments. The transcription quality has a direct impact on classification tasks that use text features. In this paper, we propose to identify themes of telephone conversation services with the …
The paper introduces new features for describing possible focus variation in a human/human conversation. The application considered is a real-life telephone customer care service. The purpose is to hypothesize the dominant theme of conversations between a casual customer calling. Conversations are processed by an automatic speech recognition system that …
Mapping text documents into an LDA-based topic space is a classical way to extract a high-level representation of text documents. Unfortunately, LDA is highly sensitive to hyper-parameters related to the number of classes, or to the word and topic distributions, and there is no systematic way to pre-estimate optimal configurations. Moreover, various hyper-parameter …
The amount of information exchanged over the Internet is continuously growing, taking the form of short text messages on microblogging platforms such as Twitter. Due to the limited size of these types of messages, understanding them may require knowing the context in which they occur. In this paper, we propose a higher-level representation of short text …