There has been substantial recent interest in annotation schemes that can be applied consistently to many languages. Building on several recent efforts to unify morphological and syntactic annotation, the Universal Dependencies (UD) project seeks to introduce a cross-linguistically applicable part-of-speech tagset, feature inventory, and set of dependency…
This paper summarises the contributions of the teams at the Turku to the news translation tasks for translating from and to Finnish. Our models address the problem of treating morphology and data coverage in various ways. We introduce a new efficient tool for word alignment and discuss factorisations, gappy language models and re-inflection techniques for…
In this paper, we introduce several vector space manipulation methods that are applied to trained vector space models in a post-hoc fashion, and present an application of these techniques in semantic role labeling for Finnish and English. Specifically, we show that the vectors can be circularly shifted to encode syntactic information and subsequently…
In this paper, we report on the development of a large-scale Finnish Internet parsebank, currently consisting of 1.5 billion tokens in 116 million sentences. The data is fully morphologically and syntactically analyzed and it has been used to extract flat and syntactic n-gram collections, as well as verb-argument and noun-argument n-grams. Additionally,…
OBJECTIVES: In this paper, we study the development and domain-adaptation of statistical syntactic parsers for three different clinical domains in Finnish. METHODS AND MATERIALS: The materials include text from daily nursing notes written by nurses in an intensive care unit, physicians' notes from cardiology patients' health records, and daily nursing notes…
In this paper we introduce our system capable of producing semantic parses of sentences using three different annotation formats. The system was used to participate in the SemEval-2014 Shared Task on broad-coverage semantic dependency parsing and it was ranked third with an overall F1-score of 80.49%. The system has a pipeline architecture, consisting of…
In this paper we present our winning system in the WMT16 Shared Task on Cross-Lingual Pronoun Prediction, where the objective is to predict a missing target language pronoun based on the target and source sentences. Our system is a deep recurrent neural network, which reads both the source language and target language context with a softmax layer making the…
This paper describes baseline systems for Finnish-English and English-Finnish machine translation using standard phrase-based and factored models including morphological features. We experiment with compound splitting and morphological segmentation and study the effect of adding noisy out-of-domain data to the parallel and the monolingual training data.…
We present a syntactic analysis query toolkit geared specifically towards massive dependency parsebanks and morphologically rich languages. The query language allows arbitrary tree queries, including negated branches, and is suitable for querying analyses with rich morphological annotation. Treebanks of over a million words can be comfortably queried on a…
Recently, there has been great interest both in the development of cross-linguistically applicable annotation schemes and in the application of syntactic parsers at web scale to create parsebanks of online texts. The combination of these two trends to create massive, consistently annotated parsebanks in many languages holds enormous potential for the…