Eva Hajičová

The availability of annotated data (with as rich and “deep” annotation as possible) is desirable for any new development. Textual data are used in the so-called training phase of various empirical methods that address problems in the field of computational linguistics. While there are many methods that use texts in their plain (or raw) form (in most…
In the present paper we discuss some issues connected with the condition of projectivity in a dependency-based description of language (see Sgall, Hajičová, and Panevová (1986), Hajičová, Partee, and Sgall (1998)), with special regard to the annotation scheme of the Prague Dependency Treebank (PDT, see Hajič (1998)). After a short Introduction (Section…
The dichotomy of topic and focus, based, in the Praguean Functional Generative Description, on the scale of communicative dynamism, is relevant not only for a possible placement of the sentence in a context, but also for its semantic interpretation. An automatic identification of topic and focus may use the input information on word order, on the systemic…
The requirements on the depth and precision of annotation vary with the intended use of the corpus, but it is now commonly accepted that the standard annotations of surface structure are only the first steps in a more ambitious research program, aiming at the creation of advanced resources for the most diverse systems of natural language…
We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation. The English part consists of the Wall Street Journal (WSJ) section of the Penn Treebank. The Czech part was translated from the English source sentence by sentence. This paper gives…
We present the Prague Discourse Treebank 1.0, a collection of Czech texts annotated for various discourse-related phenomena "beyond the sentence boundary". The treebank contains manual annotations of (1) discourse connectives, their arguments and senses, (2) textual coreference, and (3) bridging anaphora, all carried out on 50k sentences of the treebank. …
The present paper reports on preparatory research for building a language corpus annotation scenario capturing the discourse relations in Czech. We primarily focus on the description of the syntactically motivated relations in discourse, basing our findings on the theoretical background of the Prague Dependency Treebank 2.0 and the Penn Discourse Treebank…