Evaluating phonemic transcription of low-resource tonal languages for language documentation
Transcribing speech is an important part of language documentation, yet speech recognition technology has not been widely harnessed to aid linguists. We explore the use of a neural network…
Design and Implementation of the Online ILSP Greek Corpus
This paper presents the Hellenic National Corpus (HNC), the corpus of Modern Greek developed by the Institute for Language and Speech Processing (ILSP).
The META-SHARE Language Resources Sharing Infrastructure: Principles, Challenges, Solutions
This paper presents META-SHARE, an open resource exchange infrastructure, which aims to boost visibility, documentation, identification, openness and sharing, collaboration, preservation and interoperability of language data and basic language processing tools.
A Matching Technique In Example-Based Machine Translation
This paper addresses an important problem in Example-Based Machine Translation (EBMT), namely how to measure similarity between a sentence fragment and a set of stored examples.
The META-SHARE Metadata Schema for the Description of Language Resources
This paper presents a metadata model for the description of language resources proposed in the framework of the META-SHARE infrastructure, aiming to cover both datasets and tools/technologies used for their processing.
Named Entity Recognition in Greek Texts
We present a system that recognizes and classifies named entities (NE) in Greek text based on pattern matching techniques.
Efficiently extract recurring tree fragments from large treebanks
Citation for published version (APA): Sangati, F., Zuidema, W., & Bod, R. (2010). Efficiently extract recurring tree fragments from large treebanks. In N. Calzolari, K. Choukri, B. Maegaard, J. …
Example retrieval from a translation memory
Clustering of the translation memory is proposed to make the retrieval of similar translation examples more efficient; a second contribution is a text similarity metric based on both surface structure and content.
OpenMinTeD: A Platform Facilitating Text Mining of Scholarly Content
The OpenMinTeD platform aims to bring full text Open Access scholarly content from a wide range of providers together with Text and Data Mining (TDM) tools from various Natural Language Processing frameworks and TDM developers in an integrated environment.