Chinese Whispers - an Efficient Graph Clustering Algorithm and its Application to Natural Language Processing Problems
- Chris Biemann
- Computer Science
- 9 June 2006
The performance of Chinese Whispers is measured on Natural Language Processing (NLP) problems as diverse as language separation, acquisition of syntactic word classes and word sense disambiguation.
UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures
- Daniel Bär, Chris Biemann, Iryna Gurevych, Torsten Zesch
- Computer Science
- International Workshop on Semantic Evaluation
- 7 June 2012
This work uses a simple log-linear regression model, trained on the training data, to combine multiple text similarity measures of varying complexity, which range from simple character and word n-grams and common subsequences to complex features such as Explicit Semantic Analysis vector comparisons and aggregation of word similarity based on lexical-semantic resources.
Do Supervised Distributional Methods Really Learn Lexical Inference Relations?
- Omer Levy, Steffen Remus, Chris Biemann, Ido Dagan
- Computer Science
- North American Chapter of the Association for…
- 2015
This work investigates a collection of distributional representations of words used in supervised settings for recognizing lexical inference relations between word pairs, and shows that they do not actually learn a relation between two words, but an independent property of a single word in the pair.
WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations
- Seid Muhie Yimam, Iryna Gurevych, Richard Eckart de Castilho, Chris Biemann
- Computer Science
- Annual Meeting of the Association for…
- 1 August 2013
WebAnno offers annotation project management, freely configurable tagsets and the management of users in different roles, and its architecture allows additional modes of visualization and editing to be added when new kinds of annotations are to be supported.
HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection
- Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, Animesh Mukherjee
- Computer Science
- AAAI Conference on Artificial Intelligence
- 18 December 2020
This work introduces HateXplain, the first benchmark hate speech dataset covering multiple aspects of the issue, and uses existing state-of-the-art models, observing that models that use the human rationales for training perform better at reducing unintended bias towards target communities.
Corpus Portal for Search in Monolingual Corpora
- U. Quasthoff, Matthias Richter, Chris Biemann
- Computer Science
- International Conference on Language Resources…
- 1 May 2006
A simple and flexible schema for storing and presenting monolingual language resources is proposed to ease the application of algorithms for monolingual and interlingual studies.
TopicTiling: A Text Segmentation Algorithm based on LDA
- Martin Riedl, Chris Biemann
- Computer Science
- Annual Meeting of the Association for…
- 9 July 2012
This work presents a text segmentation algorithm called TopicTiling, which is based on the well-known TextTiling algorithm, segments documents using the Latent Dirichlet Allocation topic model, and is computationally less expensive than other LDA-based segmentation methods.
NoSta-D Named Entity Annotation for German: Guidelines and Dataset
- Darina Benikova, Chris Biemann, Marc Reznicek
- Computer Science
- International Conference on Language Resources…
- 1 May 2014
This work describes the approach to creating annotation guidelines based on linguistic and semantic considerations, and how they were iteratively refined and tested in the early stages of annotation to arrive at the largest publicly available dataset for German NER, consisting of over 31,000 manually annotated sentences from German Wikipedia and German online news.
A Report on the Complex Word Identification Shared Task 2018
- Seid Muhie Yimam, Chris Biemann, Marcos Zampieri
- Linguistics
- BEA@NAACL-HLT
- 24 April 2018
The second CWI shared task featured multilingual and multi-genre datasets divided into four tracks and two tasks, binary classification and probabilistic classification; a total of 12 teams submitted their results in different task/track combinations.
Making Sense of Word Embeddings
- Maria Pelevina, Nikolay Arefiev, Chris Biemann, Alexander Panchenko
- Computer Science
- Rep4NLP@ACL
- 1 August 2016
This work presents a simple yet effective approach that can induce a sense inventory from existing word embeddings via clustering of ego-networks of related words; an integrated WSD mechanism enables labeling of words in context with learned sense vectors.
...