TexPrax: A Messaging Application for Ethical, Real-time Data Collection and Annotation

Lorenz Stangier, Ji-Ung Lee, Yuxi Wang, Marvin Müller, Nicholas R. J. Frick, Joachim Metternich, Iryna Gurevych

Collecting and annotating task-oriented dialog data is difficult, especially for highly specific domains that require expert knowledge. At the same time, informal communication channels such as instant messengers are increasingly used at work. As a result, much work-relevant information is disseminated through these channels and must be post-processed manually by the employees. To alleviate this problem, we present TexPrax, a messaging system to collect and annotate…

Investigating label suggestions for opinion mining in German Covid-19 social media

This work develops guidelines for a controlled annotation study with social science students and finds that suggestions from a model trained on a small, expert-annotated dataset already lead to a substantial improvement, in terms of both inter-annotator agreement and annotation quality, over students who do not receive any label suggestions.
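The summary above reports improvements in inter-annotator agreement. As an illustrative aside (not part of the cited work), a standard way to quantify agreement between two annotators beyond chance is Cohen's kappa; a minimal sketch, with function name and labels our own:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label distribution.
    """
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from the two marginal label distributions.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Example: agreement on 3 of 4 items, skewed label distributions.
print(cohens_kappa(["pos", "pos", "neg", "neg"],
                   ["pos", "neg", "neg", "neg"]))
```

Note the metric is undefined when chance agreement is 1 (both annotators always assign the same single label); production code should guard that case.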

Wissen aus betrieblichen Chats nachhaltig nutzen / Sustainable Use of Knowledge from Company Chats: Results of the Transfer Project "Text Analyses in Company Practice (TexPrax)"

Valuable knowledge is recorded in writing in numerous processes on the shop floor. This happens in formal processes such as shop floor management or in informal, direct…

End-to-End Learning of Flowchart Grounded Task-Oriented Dialogs

This work proposes a novel problem within end-to-end learning of task-oriented dialogs (TOD), in which the dialog system mimics a troubleshooting agent who helps a user by diagnosing their problem (e.g., a car not starting), and designs a neural model, FLONET, which uses a retrieval-augmented generation architecture to train the dialog agent.

Understanding Instant Messaging in the Workplace

The goal of this proposed study is to observe the use of WhatsApp at work within an existing framework, develop a model of IM usage in the workplace, and test and validate that model so as to make IM more workplace-friendly and encourage its full adoption as a medium of workplace communication.

“Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI

This paper defines, identifies, and presents empirical evidence on Data Cascades—compounding events causing negative, downstream effects from data issues—triggered by conventional AI/ML practices that undervalue data quality.

The INCEpTION Platform: Machine-Assisted and Knowledge-Oriented Interactive Annotation

INCEpTION is a new annotation platform for tasks including interactive and semantic annotation (e.g., concept linking, fact linking, knowledge base population, semantic frame annotation) that incorporates machine learning capabilities which actively assist and guide annotators.

SUS: A 'Quick and Dirty' Usability Scale

This chapter describes the System Usability Scale (SUS), a reliable, low-cost usability scale that can be used for global assessments of system usability.

Amazon Mechanical Turk: A Research Tool for Organizations and Information Systems Scholars

  • Kevin Crowston
  • Shaping the Future of ICT Research, 2012
A simple typology of research data collected using AMT is presented, and potential threats to reliability and validity are discussed along with possible ways to address them.

HuggingFace's Transformers: State-of-the-art Natural Language Processing

The Transformers library is an open-source library that consists of carefully engineered, state-of-the-art Transformer architectures under a unified API, together with a curated collection of pretrained models made by and available for the community.

Unsupervised Cross-lingual Representation Learning at Scale

It is shown that pretraining multilingual language models at scale leads to significant performance gains across a wide range of cross-lingual transfer tasks, and, for the first time, that multilingual modeling is possible without sacrificing per-language performance.