Jean Carletta

Currently, computational linguists and cognitive scientists working in the area of discourse and dialogue argue that their subjective judgments are reliable using several different statistics, none of which are easily interpretable or comparable to each other. Meanwhile, researchers in content analysis have already experienced the same difficulties and come …
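The abstract above concerns chance-corrected agreement statistics for subjective coding. As a hedged illustration of the kind of statistic at issue, here is a minimal Cohen's kappa for two coders; the dialogue-move labels and data are invented for the example, not drawn from any of the corpora described here:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders' category labels."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: fraction of items the coders label identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement, from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two coders labelling ten dialogue moves (hypothetical data).
a = ["ack", "ack", "query", "inform", "ack", "query", "inform", "ack", "query", "ack"]
b = ["ack", "query", "query", "inform", "ack", "query", "ack", "ack", "query", "ack"]
print(round(cohens_kappa(a, b), 3))  # → 0.672
```

Raw percentage agreement here is 0.8, but kappa discounts the agreement the coders would reach by chance given their label distributions, which is why it is preferred for arguing reliability.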
The AMI Meeting Corpus is a multi-modal data set consisting of 100 hours of meeting recordings. It is being created in the context of a project that is developing meeting browsing technology and will eventually be released publicly. Some of the meetings it contains are naturally occurring, and some are elicited, particularly using a scenario in which the …
Dialogue work, like the rest of linguistics, has traditionally used isolated examples, either constructed or real. Now many researchers are beginning to try to code large dialogue corpora for higher-level dialogue structure in the hope of giving their findings a firmer basis. The purpose of this paper is to introduce and describe the reliability of a scheme …
Creating the AMI Meeting Corpus was an ambitious endeavour, probably more ambitious than the people who first thought of it realize even now. It contains 100 hours of meetings captured using a whole host of synchronized recording devices, and is designed to support work in speech and video processing, language engineering, corpus linguistics, and …
Multimodal corpora that show humans interacting via language are now relatively easy to collect. Current tools allow one either to apply sets of time-stamped codes to the data and consider their timing and sequencing, or to describe some specific linguistic structure that is present in the data, built over the top of some form of transcription. To further …
This research explores schemes for evaluating automatic summaries of business meetings, using the ICSI Meeting Corpus (Janin et al., 2003). Both automatic and subjective evaluations were carried out, with a central interest being whether or not the two types of evaluations correlate with each other. The evaluation metrics were used to compare and …
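The abstract above asks whether automatic and subjective evaluation scores correlate. A standard way to check this is a rank correlation over the per-summary scores; the sketch below computes Spearman's rho from first principles (assuming no tied scores), with wholly invented metric and rating values for illustration:

```python
def ranks(xs):
    """Rank positions (1 = smallest), assuming no ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def spearman(x, y):
    """Spearman rank correlation via the sum-of-squared-rank-differences formula."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical scores for five summaries: an automatic metric vs. human ratings.
auto = [0.31, 0.44, 0.22, 0.51, 0.40]
human = [3.0, 4.5, 2.0, 4.0, 3.5]
print(round(spearman(auto, human), 2))
```

A rho near 1 would suggest the automatic metric ranks summaries much as human judges do; a value near 0 would suggest the two kinds of evaluation disagree.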
We present an annotation scheme for information status (IS) in dialogue, and validate it on three Switchboard dialogues. We show that our scheme has good reproducibility, and compare it with previous attempts to code IS and related features. We eventually apply the scheme to 147 dialogues, thus producing a corpus that contains nearly 70,000 NPs annotated …