We are witnessing a paradigm shift in Human Language Technology (HLT) that may well have an impact on the field comparable to the statistical revolution: acquiring large-scale resources by exploiting collective intelligence. An illustration of this new approach is <i>Phrase Detectives</i>, an interactive online <i>game with a purpose</i> for creating…
This paper reports on the ongoing work of Phrase Detectives, an attempt to create a very large anaphorically annotated text corpus. Annotated corpora of the size needed for modern computational linguistics research cannot be created by small groups of hand-annotators; however, the ESP game and similar games with a purpose have demonstrated how it might be…
This paper presents an overview of automatic methods for building domain knowledge structures (domain models) from text collections. Applications of domain models have a long history within knowledge engineering and artificial intelligence. In the last couple of decades they have surfaced noticeably as a useful tool within natural language processing,…
Along with the rapidly growing amount of online data, we observe an immense need for intelligent search engines that access restricted collections of data, such as those found in intranets or other limited domains. Such search engines must go beyond simple keyword indexing and matching, but they must also be easily adaptable to new domains without huge costs.…
Improving Web search technology is a hot topic. One aspect that makes it so interesting is the fact that Web documents are typically not plain text files; instead, they contain a tremendous amount of implicit knowledge stored in the markup of the documents. Much of this need not be used in general Web search, because the search engine doesn't need to…
The Web provides a massive knowledge source. The same is true for intranets and other electronic document collections. However, much of that knowledge is encoded implicitly and cannot be applied directly without processing it into some more appropriate structures. Searching, browsing, and question answering, for example, could all benefit from domain-specific…
Large-scale linguistically annotated resources have become available in recent years. This is partly due to sophisticated automatic and semi-automatic approaches that work well on specific tasks such as part-of-speech tagging. For more complex linguistic phenomena such as anaphora, however, there are no tools that produce high-quality annotations without…
The ability to make progress in Computational Linguistics depends on the availability of large annotated corpora, but creating such corpora by hand annotation is very expensive and time-consuming; in practice, it is unfeasible to think of annotating more than one million words. However, the success of Wikipedia, the ESP game, and other projects shows that…
One of the most significant challenges facing systems of collective intelligence is how to encourage participation on the scale required to produce high-quality data. This paper details ongoing work with Phrase Detectives, an online game-with-a-purpose deployed on Facebook, and investigates user motivations for participation in social network gaming where…