Corpus ID: 8177828

Capturing Chat: Annotation and Tools for Multiparty Casual Conversation

Emer Gilmartin and Nick Campbell
Casual multiparty conversation is an understudied but very common genre of spoken interaction, whose analysis presents a number of challenges in terms of data scarcity and annotation. […] We also describe a visualization tool, STAVE, developed during the annotation process, which allows long stretches of talk or indeed entire conversations to be viewed, aiding preliminary identification of features and patterns worthy of analysis. It is hoped that this tool will be of use to other researchers…


Chats and Chunks: Annotation and Analysis of Multiparty Long Casual Conversations
This work assembles a collection of conversations from three existing corpora, describing the collection, organization, and annotation of structural chat and chunk phases in these conversations, and noting significant differences in the distribution of overlap, laughter and disfluency in chat and chunk phases.
Silence and overlap in chat and chunk phases of multiparty casual conversation
Casual conversation, ‘talk for the sake of talking’, is often multiparty, with no clear practical goal, and can last up to several hours. Longer conversations proceed in phases of chat and chunk,
Exploring Multiparty Casual Talk for Social Human-Machine Dialogue
Current knowledge on the structure of casual talk is outlined, investigations in the domain are described, the findings on timing, laughter, and disfluency in this domain are summarized, and how they can inform the design and implementation of truly social machine dialogue partners is discussed.
Chunks in Multiparty Conversation - Building Blocks for Extended Social Talk
Preliminary explorations of the structure of long multiparty casual conversations focus on a binary distinction between two types of interaction phases—chat and chunk, finding that chat and chunk segments show differences in the distribution of their duration.
Stitching Together the Conversation - Considerations in the Design of Extended Social Talk
The work towards modelling such talk is described, focussing on theories of casual talk, insights gleaned from human-human corpora, and implications for dialog system design.
Breath Holds in Chat and Chunk Phases of Multiparty Casual Conversation
Breathing has been associated with all the basic turn organisation categories, with breath holds, where a participant holds their breath for a short period of time, specifically claimed as markers of
AlloSat: A New Call Center French Corpus for Satisfaction and Frustration Analysis
This work presents a new corpus, named AlloSat, composed of real-life call center conversations in French and continuously annotated for frustration and satisfaction, with a rich annotation scheme that enables real-time investigation of the frustration/satisfaction axis.
Evaluation of Speaker Anonymization on Emotional Speech
The results show that the VPC baseline system does not suppress speakers’ emotions against informed attackers, and the emotion recognition performance is degraded by 15% relative to IEMOCAP data, similar to the degradation observed for automatic speech recognition used to evaluate the preservation of the linguistic information.
Software Support for Discourse-Based Textual Information Analysis: A Systematic Literature Review and Software Guidelines in Practice
A wide and interdisciplinary systematic literature review (SLR) of the information visualization and software solutions adopted to support pragmatic textual analysis is provided, along with some software guidelines for improving the detected weaknesses.


Disfluency in Multiparty Social Talk
Much research on disfluencies in spontaneous spoken interaction has been carried out on corpora of task-based conversations, resulting in greater understanding of the role of several phenomena.
The AMI meeting corpus
The corpus is being distributed using a web server designed to allow convenient browsing and download of multimedia content and associated annotations, as well as data collection, annotation and distribution.
D64: a corpus of richly recorded conversational interaction
The D64 corpus is a multimodal corpus recorded over two successive days and comprises annotations on conversational involvement, speech activity and pauses as well as information on the average degree of change in the movement of participants.
Using DiAML and ANVIL for multimodal dialogue annotations
This paper shows how interoperable dialogue act annotations can conveniently be obtained using the newly implemented facility in the ANVIL annotation tool to produce XML-based output directly in the DiAML format, thus supporting the creation of interoperable annotated corpora of multimodal dialogue.
The TRAINS 93 Dialogues
This report describes a corpus of task-oriented dialogues set in the TRAINS domain. A user collaborates with a planning assistant to accomplish some task involving manufacturing and shipping goods in
Analysing Casual Conversation
Introduction: Collecting and transcribing casual conversation 1. Making meanings in everyday talk 2. Relevant approaches to analysing casual conversation 3. The grammar of casual conversation:
Spontal: A Swedish Spontaneous Dialogue Corpus of Audio, Video and Motion Capture
The Spontal database of spontaneous Swedish dialogues, a corpus of 120 dialogues of at least 30 minutes each captured in high-quality audio, high-resolution video and with a motion capture system, is presented.
The HCRC Map Task Corpus
A corpus of unscripted, task-oriented dialogues which has been designed, digitally recorded, and transcribed to support the study of spontaneous speech on many levels is described.
ELAN: a Professional Framework for Multimodality Research
The efforts made to make ELAN a tool that meets the flexibility, efficiency and time accuracy requirements of annotation tools are described, with special attention to the developments in the area of time accuracy.
Disfluency in Swedish human–human and human–machine travel booking dialogues
This thesis studies disfluency in spontaneous Swedish speech, i.e., the occurrence of hesitation phenomena like eh, oh, truncated words, repetitions and repairs, mispronunciations, truncated words