Italy goes to Stanford: a collection of CoreNLP modules for Italian

@article{Aprosio2016ItalyGT,
  title={Italy goes to Stanford: a collection of CoreNLP modules for Italian},
  author={Alessio Palmero Aprosio and Giovanni Moretti},
  journal={ArXiv},
  year={2016},
  volume={abs/1609.06204}
}
In this paper we present Tint, an easy-to-use set of fast, accurate and extendable Natural Language Processing modules for Italian. It is based on Stanford CoreNLP and is freely available as standalone software or as a library that can be integrated into an existing project.

Citations

Tint, the Swiss-Army Tool for Natural Language Processing in Italian
TLDR
This paper presents the latest version of Tint, an open-source, fast and extendable Natural Language Processing suite for Italian based on Stanford CoreNLP. It includes a set of text processing components for fine-grained linguistic analysis, from tokenization to relation extraction.
CoreNLP-it: A UD Pipeline for Italian based on Stanford CoreNLP
TLDR
A collection of modules for Italian language processing based on CoreNLP and Universal Dependencies, easily adaptable to any new language that has a UD treebank.
Tint 2.0: an All-inclusive Suite for NLP in Italian
TLDR
This paper presents Tint 2.0, the new release of an open-source, fast and extendable Natural Language Processing suite for Italian based on Stanford CoreNLP. The release improves some of the existing NLP modules and adds a set of new text processing components for fine-grained linguistic analysis that were not previously available.
Automatic Text Preprocessing for Intelligent Dialog Agents
TLDR
A new text preprocessing pipeline based on a hybrid approach that combines rule-based and stochastic methods is described. It comprises a Style Correction Module, a Clitic Decomposition Module, and a POS Tagging and Lemmatization Module.
Cross-Lingual Abstract Meaning Representation Parsing
TLDR
It is shown that it is possible to use AMR annotations for English as a semantic representation for sentences written in other languages, and a method to evaluate the parsers that does not require gold standard data in the target languages is proposed.
MUSST: A Multilingual Syntactic Simplification Tool
We describe MUSST, a multilingual syntactic simplification tool. The tool supports sentence simplifications for English, Italian and Spanish, and can be easily extended to other languages. Our ...
Word Sense Disambiguation technique based on Semantic Networks and Expectation Maximization
TLDR
The goal of this thesis is to develop a Word Sense Disambiguation module that improves the user experience when interacting with a conversational agent.
Natural language processing to classify named entities of the Brazilian Union Official Diary
TLDR
This article proposes and analyzes the construction of a corpus for named entity extraction, using the Brazilian Union Official Diary as its source of information, and presents a study of a set of natural language processing tools.
Personalized PageRank with Syntagmatic Information for Multilingual Word Sense Disambiguation
TLDR
SyntagRank, a next-generation knowledge-based WSD system, leverages the disambiguated pairs of co-occurring words in SyntagNet, a lexical-semantic combination resource, to perform state-of-the-art knowledge-based WSD in a multilingual setting.
MicroNeel: Combining NLP Tools to Perform Named Entity Detection and Linking on Microposts
TLDR
This paper presents MicroNeel, a system for Named Entity Recognition and Entity Linking on Italian microposts that participated in the NEEL-IT task at EVALITA 2016.

References

SHOWING 1-10 OF 29 REFERENCES
The Stanford CoreNLP Natural Language Processing Toolkit
TLDR
This paper describes the design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis. The authors suggest that its wide adoption follows from a simple, approachable design, straightforward interfaces, robust and good-quality analysis components, and the absence of a large amount of associated baggage.
Converting Italian Treebanks: Towards an Italian Stanford Dependency Treebank
The paper addresses the challenge of converting MIDT, an existing dependency-based Italian treebank resulting from the harmonization and merging of smaller resources, into the Stanford Dependencies
The TextPro Tool Suite
TLDR
This paper presents TextPro, a suite of modular Natural Language Processing (NLP) tools for analyzing Italian and English texts. It is designed to integrate and reuse state-of-the-art NLP components developed by researchers at FBK.
An Ensemble Model for the EVALITA 2011 Dependency Parsing Task
TLDR
This paper compares the results of different parsing algorithms implemented in MaltParser with an ensemble model made available by Mihai Surdeanu. It finds that the ensemble model achieved the best results, so that model was selected for the official submission.
GATE: an Architecture for Development of Robust HLT applications
TLDR
This paper presents GATE, a framework and graphical development environment that enables users to develop and deploy language engineering components and resources in a robust fashion. Thanks to its thorough Unicode support, it can be used to develop applications and resources in multiple languages.
Building a Treebank for Italian: a Data-driven Annotation Schema
TLDR
This paper presents a data-driven annotation schema developed for an Italian treebank that ensures data coverage and consistent annotation of linguistic phenomena. It also describes the cyclical development of the schema, highlighting the richness and flexibility of the format.
Evaluation of Natural Language and Speech Tools for Italian: International Workshop, EVALITA 2011, Rome, January 24-25, 2012, Revised Selected Papers
TLDR
This volume collects the final and extended contributions presented at EVALITA 2011, the third edition of the evaluation campaign, and is organized in topical sections roughly corresponding to evaluation tasks.
Improving efficiency and accuracy in multilingual entity extraction
TLDR
This paper discusses some implementation and data processing challenges encountered while developing a new multilingual version of DBpedia Spotlight that is faster, more accurate and easier to configure, and compares the solution to the previous system.
Multilingual and cross-domain temporal tagging
TLDR
This paper presents HeidelTime, the authors' publicly available temporal tagger. It is easily extensible to further languages because it strictly separates source code from language resources such as patterns and rules.
I-CAB: the Italian Content Annotation Bank
TLDR
This paper presents the Italian Content Annotation Bank (I-CAB), a corpus of Italian news annotated with semantic information at different levels. It adopts the annotation schemes developed for the ACE Entity Detection and Time Expressions Recognition and Normalization tasks.