János Csirik

Learn More
Detecting uncertain and negative assertions is essential in most BioMedical Text Mining tasks where, in general, the aim is to derive factual knowledge from textual data. This article reports on a corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts (we call this corpus(More)
The CoNLL 2010 Shared Task was dedicated to the detection of uncertainty cues and their linguistic scope in natural language texts. The motivation behind this task was that distinguishing factual and uncertain information in texts is of essential importance in information extraction. This paper provides a general overview of the shared task, including the(More)
This article reports on a corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts (we call this corpus the BioScope corpus). The corpus consists of three parts, namely medical free texts, biological full papers and biological scientific abstracts. The dataset contains(More)
We present a sequence of new linear-time, bounded-space, on-line bin packing algorithms, the K -Bounded Best Fit algorithms (BBF K ). They are based on the Θ(n log  n) Best Fit algorithm in much the same way as the Next-K Fit algorithms are based on the Θ(n log  n) First Fit algorithm. Unlike the Next-K Fit algorithms, whose asymptotic worst-case ratios(More)
Different program slicing methods are used for maintenance, reverse engineering, testing and debugging. Slicing algorithms can be classified as static slicing and dynamic slicing methods. In several applications the computation of dynamic slices is more preferable since it can produce more precise results. In this paper we introduce a new forward global(More)
Herein, we present the process of developing the first Hungarian Dependency TreeBank. First, short references are made to dependency grammars we considered important in the development of our Treebank. Second, mention is made of existing dependency corpora for other languages. Third, we present the steps of converting the Szeged Treebank into(More)
The Szeged Corpus is a manually annotated natural language corpus currently comprising 1.2 million word entries, 145 thousand different word forms, and an additional 225 thousand punctuation marks. With this, it is the largest manually processed Hungarian textual database that serves as a reference material for research in natural language processing as(More)