Stanza: A Python Natural Language Processing Toolkit for Many Human Languages

@inproceedings{Qi2020StanzaAP,
  title={Stanza: A Python Natural Language Processing Toolkit for Many Human Languages},
  author={Peng Qi and Yuhao Zhang and Yuhui Zhang and Jason Bolton and Christopher D. Manning},
  booktitle={ACL},
  year={2020}
}
We introduce Stanza, an open-source Python natural language processing toolkit supporting 66 human languages. Compared to existing widely used toolkits, Stanza features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition. We have trained Stanza on a total of 112 datasets, including the Universal Dependencies treebanks and…
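
The pipeline stages listed in the abstract can be illustrated with a minimal sketch. It assumes the `stanza` package is installed and that the default processors for the chosen language are acceptable. The helper names `analyze`, `word_rows`, and `entity_rows` are hypothetical and introduced only for illustration; they are not part of Stanza. The sketch is not the authors' reference usage, only one plausible way to read out lemmas, POS tags, dependency relations, and named entities.

```python
def word_rows(doc):
    """Flatten a parsed document into (text, lemma, upos, head, deprel) tuples.

    `doc` is expected to expose `sentences`, each with a `words` list whose
    items carry the attributes read below (as Stanza's Document objects do).
    """
    rows = []
    for sentence in doc.sentences:
        for word in sentence.words:
            rows.append((word.text, word.lemma, word.upos, word.head, word.deprel))
    return rows


def entity_rows(doc):
    """Return (text, type) pairs for the named entities found in `doc`."""
    return [(ent.text, ent.type) for ent in doc.ents]


def analyze(text, lang="en"):
    """Hypothetical helper: run the default Stanza pipeline for `lang`.

    The import is done lazily so the helpers above stay usable without
    Stanza installed. Calling this downloads models over the network.
    """
    import stanza  # third-party dependency: pip install stanza

    stanza.download(lang)        # fetch the default models for the language
    nlp = stanza.Pipeline(lang)  # build the default processor chain
    return nlp(text)


# Example usage (needs Stanza and a model download, so it is left commented out):
#   doc = analyze("Stanza was developed at Stanford University.")
#   for row in word_rows(doc):
#       print(row)
#   print(entity_rows(doc))
```

Keeping the read-out helpers separate from the pipeline construction means they can be exercised on any object with the same attribute layout, and the language code is the only thing that changes when switching among supported languages.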
