Learning Morphology with Morfette
TLDR
Morfette is a modular, data-driven, probabilistic system which learns to perform joint morphological tagging and lemmatization from morphologically annotated corpora.
  • 132 citations
  • 31 highly influential citations
#hardtoparse: POS Tagging and Parsing the Twitterverse
TLDR
We evaluate the statistical dependency parser, Malt, on a new dataset of sentences taken from tweets.
  • 124 citations
  • 18 highly influential citations
QuestionBank: Creating a Corpus of Parse-Annotated Questions
TLDR
This paper describes the development of QuestionBank, a corpus of 4000 parse-annotated questions for (i) use in training parsers employed in QA, and (ii) evaluation of question parsing.
  • 99 citations
  • 15 highly influential citations
Long-Distance Dependency Resolution in Automatically Acquired Wide-Coverage PCFG-Based LFG Approximations
TLDR
This paper shows how finite approximations of long distance dependency (LDD) resolution can be obtained automatically for wide-coverage, robust, probabilistic Lexical-Functional Grammar (LFG) resources acquired from treebanks.
  • 139 citations
  • 13 highly influential citations
From News to Comment: Resources and Benchmarks for Parsing the Language of Web 2.0
TLDR
We evaluate four Wall-Street-Journal-trained statistical parsers on a new dataset containing 1,000 phrase structure trees for sentences from microblogs (tweets) and discussion forum posts.
  • 83 citations
  • 11 highly influential citations
Automatic Extraction of Arabic Multiword Expressions
TLDR
In this paper we investigate the automatic acquisition of Arabic Multiword Expressions (MWEs) from available data resources.
  • 73 citations
  • 11 highly influential citations
Statistical Post-Editing for a Statistical MT System
Statistical post-editing (SPE) techniques have been successfully applied to the output of Rule-Based MT (RBMT) systems. In this paper we investigate the impact of SPE on a standard Phrase-Based …
  • 53 citations
  • 10 highly influential citations
Dynamically structuring, updating and interrelating representations of visual and linguistic discourse context
TLDR
We develop a real-time, natural language virtual reality (NLVR) system (called LIVE, for Linguistic Interaction with Virtual Environments) based on both visual and linguistic salience.
  • 58 citations
  • 9 highly influential citations
Wide-Coverage Deep Statistical Parsing Using Automatic Dependency Structure Annotation
TLDR
A number of researchers have recently conducted experiments comparing deep hand-crafted wide-coverage with shallow treebank- and machine-learning-based parsers at the level of dependencies, using simple and automatic methods to convert tree output generated by the shallow parsers into dependencies.
  • 58 citations
  • 8 highly influential citations
An Extensive Empirical Evaluation of Character-Based Morphological Tagging for 14 Languages
TLDR
This paper investigates neural character-based morphological tagging for languages with complex morphology and large tag sets.
  • 31 citations
  • 8 highly influential citations