In this paper we present general assumptions and goals of the LUNA (spoken Language UNderstanding in multilinguAl communication systems) project. We describe the process of collecting a Polish corpus of spoken dialogs and the accepted annotation schema of this corpus at several levels, from transcription of dialogs and morphosyntactic analysis, to semantic… (More)
The paper presents a collection of resources developed for Information Extraction (IE) from Polish texts. In particular, we mention two IE platforms adapted to Polish and several IE applications built on top of one of them: named entity recognition, creation of terminology lexicons, and data extraction from medical texts.
In this paper we present arguments that elaborating a rule based information extraction system is a good starting point for obtaining a semantic annotated corpus of medical data. Our claim is supported by evaluation results of the automatic annotation of a corpus containing hospital discharge reports of diabetic patients.
In this paper, we address the problem of Quality of Service (QoS) provisioning in Optical Burst Switching (OBS) networks. When examining the literature on OBS we can find several proposals of mechanisms dealing with QoS. As these mechanisms are evaluated in a specific node/network scenario each one, the results are usually not comparable. The aim of this… (More)
The paper focuses on resolving natural language issues which have been affecting performance of our system processing Polish medical data. In particular, we address phenomena such as ellipsis, anaphora, comparisons, coordination and negation occurring in mammogram reports. We propose practical data-driven solutions which allow us to improve the system's… (More)
In the paper, we address the problem of recognition of non-domain phrases in terminology lists obtained with an automatic term extraction tool. We focus on identification of multi-word phrases that are general terms and discourse function expressions. We tested several methods based on domain corpora comparison and a method based on contexts of phrases… (More)
In the paper, we propose a new method of identifying terms nested within candidates for the terms extracted from domain texts. The list of all terms is then ranked by the process of automatic term recognition. Our method of identifying nested terms is based on two aspects: grammatical correctness and normalised pointwise mutual information (NPMI) counted… (More)
The paper discusses a program for removing patient identification information from hospital discharge documents in order to make them available for scientific research e.g. information extraction system designing. The presented method allows de-anonymization of documents using a key-code file that is created on the basis of a patient's surname, forename and… (More)
In the paper we present the method of automatic recognition and annotation of proper names which occur in dialogs gathered at the Warsaw city transportation information center. We describe different types of proper names and how people use them in dialogs. We present rules of automatic recognition and lemmatization of proper names in the transportation… (More)