Automatic De-identification of Medical Texts in Spanish: the MEDDOCAN Track, Corpus, Guidelines, Methods and Evaluation of Results
This paper summarizes the settings, data and results of the first shared track on anonymization of medical documents in Spanish, the MEDDOCAN (Medical Document Anonymization) track, which relied on a carefully constructed synthetic corpus of clinical case documents following annotation guidelines for sensitive data based on the analysis of the EU General Data Protection Regulation.
CLARIN: Common Language Resources and Technology Infrastructure
This article presents CLARIN, a project that aims to promote the use of technological tools in research in the fields of the Humanities and Social Sciences.
PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity Recognition track
This work organized the first shared task on detecting drug and chemical entities in Spanish medical documents, named PharmaCoNER, and generated annotation guidelines together with a corpus of 1,000 manually annotated clinical case studies to foster the development of new resources for clinical and biomedical text mining systems of Spanish medical data.
Annotation of negation in the IULA Spanish Clinical Record Corpus
Paper presented at SemBEaR 2017: Computational Semantics Beyond Events and Roles, held on 4 April 2017 in Valencia, Spain.
The MeSpEN Resource for English-Spanish Medical Machine Translation and Terminologies: Census of Parallel Corpora, Glossaries and Term Translations
This article describes an exhaustive effort to identify and characterize heterogeneous types of documents and glossaries useful to build parallel corpora for Spanish-English medical machine translation systems.
The Spanish Resource Grammar
  • M. Marimon
  • Linguistics, Computer Science
  • 1 May 2010
This paper describes the Spanish Resource Grammar, an open-source, multi-purpose, broad-coverage, precise grammar for Spanish that integrates shallow processing functionalities (morphological analysis, and named entity recognition and classification) into the parsing process.
Towards the automatic merging of language resources
This work addresses the merging of two verb subcategorization frame (SCF) lexica for Spanish and presents a new method for automating the merging of resources, with the objective of reducing human intervention.
The Tibidabo Treebank
  • M. Marimon
  • Computer Science
  • Proces. del Leng. Natural
  • 30 September 2010
The existence of the Tibidabo treebank will facilitate research into the development and evaluation of a hybrid architecture combining symbolic and stochastic approaches to NLP, as well as investigations oriented to hybridization of shallow and deep techniques for NLP.
MultiVal - towards a multilingual valence lexicon
This work addresses a tool for creating a multilingual valence resource by converging or converting existing resources, as part of corpus annotation for less-resourced languages.
YATS: Yet Another Text Simplifier
Experimental results show good performance of the lexical simplification component when compared to a hard-to-beat baseline, good syntactic simplification accuracy, and, according to human assessment, improvements over the best reported results in the literature for a system with the same architecture as YATS.