Learn More
We are witnessing a paradigm shift in Human Language Technology (HLT) that may well have an impact on the field comparable to the statistical revolution: acquiring large-scale resources by exploiting collective intelligence. An illustration of this new approach is <i>Phrase Detectives</i>, an interactive online <i>game with a purpose</i> for creating(More)
Annotated corpora of the size needed for modern computational linguistics research cannot be created by small groups of hand annotators. One solution is to exploit collaborative work on the Web and one way to do this is through games like the ESP game. Applying this methodology however requires developing methods for teaching subjects the rules of the game(More)
Together with the rapidly growing amount of online data we register an immense need for intelligent search engines that access a restricted amount of data as found in intranets or other limited domains. This sort of search engines must go beyond simple keyword indexing/matching, but they also have to be easily adaptable to new domains without huge costs.(More)
This paper reports on the ongoing work of Phrase Detectives, an attempt to create a very large anaphorically annotated text corpus. Annotated corpora of the size needed for modern computational linguistics research cannot be created by small groups of hand-annotators however the ESP game and similar games with a purpose have demonstrated how it might be(More)
One of the more novel approaches to collaboratively creating language resources in recent years is to use online games to collect and validate data. The most significant challenges collaborative systems face are how to train users with the necessary expertise and how to encourage participation on a scale required to produce high quality data comparable with(More)
Despite the impressive progress made in recent years in all areas of natural language processing there are still tasks that do not perform well enough to be used in everyday applications. One example is anaphora resolution. The most promising approach to get significant improvements in this area is to create sufficiently large linguistically annotated(More)
We present ASemiNER, a semi-supervised algorithm for identifying Named Entities (NEs) in Arabic text. ASemiNER does not require annotated training data, or gazetteers. It also can be easily adapted to handle more than the three standard NE types (Person, Location, and Organisation). To our knowledge, our algorithm is the first study that intensively(More)
In this paper we present an overview of MultiLing 2015, a special session at SIG-dial 2015. MultiLing is a community-driven initiative that pushes the state-of-the-art in Automatic Summarization by providing data sets and fostering further research and development of summariza-tion systems. There were in total 23 participants this year submitting their(More)
The volume of information available on the Web is increasing rapidly. The need for systems that can automatically summarize documents is becoming ever more desirable. For this reason, text summarization has quickly grown into a major research area as illustrated by the DUC and TAC conference series. Summarization systems for Arabic are however still not as(More)