Knowledge Base Population (KBP) is an evaluation track of the Text Analysis Conference (TAC), a workshop series organized by the National Institute of Standards and Technology (NIST). The KBP evaluation includes three tasks that target information extraction and question answering technologies: Entity Linking, Slot Filling, and Cold Start. The Cold Start(More)
Knowledge Base Population (KBP) is an evaluation track of the Text Analysis Conference (TAC), a workshop series organized by the National Institute of Standards and Technology (NIST). In 2013, the KBP evaluations included five tasks targeting information extraction and question answering technologies: Slot Filling tasks were introduced in 2013 in an effort(More)
Location-based service (LBS) is a kind of ubiquitous application whose functions depend on the locations of clients. The core of LBS is an effective positioning system. Because wireless LAN (WLAN) is inexpensive and easy to access, using WLAN for indoor positioning has been widely studied recently. K nearest neighbors (KNN) is one of the basic deterministic(More)
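As a rough illustration of the KNN idea mentioned in this abstract, the sketch below assumes the common fingerprinting setup: a pre-surveyed radio map of received signal strength (RSS) vectors at known positions, with the estimate taken as the mean position of the k closest fingerprints. The function name `knn_locate`, the `RADIO_MAP` values, and the choice of Euclidean distance are assumptions made for illustration, not details taken from the paper.

```python
import math

def knn_locate(fingerprints, observed, k=3):
    """Estimate (x, y) as the mean position of the k reference points
    whose stored RSS vectors are closest (Euclidean) to `observed`."""
    # Rank reference points by signal-space distance to the observation.
    ranked = sorted(fingerprints, key=lambda fp: math.dist(fp[1], observed))
    nearest = ranked[:k]
    x = sum(pos[0] for pos, _ in nearest) / len(nearest)
    y = sum(pos[1] for pos, _ in nearest) / len(nearest)
    return (x, y)

# Hypothetical radio map: ((x, y) in metres, [RSS in dBm from three access points])
RADIO_MAP = [
    ((0.0, 0.0), [-40, -70, -80]),
    ((4.0, 0.0), [-55, -60, -75]),
    ((0.0, 4.0), [-60, -75, -55]),
    ((4.0, 4.0), [-70, -55, -50]),
]

# An observation close to the first fingerprint.
estimate = knn_locate(RADIO_MAP, [-42, -69, -79], k=1)
```

With `k=1` the estimate snaps to the single closest reference point; larger `k` averages over more neighbours, which smooths the estimate but pulls it toward the centre of the surveyed area.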
Incorporating linguistic knowledge into word alignment is becoming increasingly important for current approaches in statistical machine translation research. To improve automatic word alignment and ultimately machine translation quality, an annotation framework is jointly proposed by LDC (Linguistic Data Consortium) and IBM. The framework enriches word(More)
This paper describes recent efforts at the Linguistic Data Consortium at the University of Pennsylvania to create manual transcripts as a shared resource for human language technology research and evaluation. Speech recognition and related technologies in particular call for substantial volumes of transcribed speech for use in system development, and for human(More)
Parallel aligned treebanks (PAT) are linguistic corpora annotated with morphological and syntactic structures that are aligned at sentence as well as sub-sentence levels. They are valuable resources for improving machine translation (MT) quality. Recently, there has been an increasing demand for such data, especially for divergent language pairs. The(More)
Context-awareness is one of the major research areas of pervasive computing. Context plays an important role in such systems. In most existing work, researchers view context as all elements in the environment of an application and use it just as passive data. This kind of context is unfavorable to the quality of context-aware applications. During the(More)
This contribution describes an Arabic-English parallel word aligned treebank corpus from the Linguistic Data Consortium that is currently under production. Herein we primarily focus on efforts required to assemble the package and instructions for using it. It was crucial that word alignment be performed on tokens produced during treebanking to ensure(More)
Interest in syntactically annotated data for improving machine translation quality has spurred growing demand for parallel aligned treebank data. To meet this demand, the Linguistic Data Consortium (LDC) has created large-volume, multilingual, and multi-level aligned treebank corpora by aligning and integrating existing treebank annotation resources.(More)
We have been creating large-scale manual word alignment corpora for Arabic-English and Chinese-English language pairs in genres such as newswire, broadcast news and conversation, and web blogs. We are now meeting the challenge of word aligning further varieties of web data for Chinese and Arabic "dialects". Human word alignment annotation can be costly and(More)