A functional toolkit for morphological and phonological processing, application to a Sanskrit tagger

  title={A functional toolkit for morphological and phonological processing, application to a Sanskrit tagger},
  author={G{\'e}rard P. Huet},
  journal={Journal of Functional Programming},
  pages={573--614}
  • G. Huet
  • Published 7 January 2005
  • Linguistics
  • Journal of Functional Programming
We present the Zen toolkit for morphological and phonological processing of natural languages. This toolkit is presented in literate programming style, in the Pidgin ML subset of the Objective Caml functional programming language. This toolkit is based on a systematic representation of finite state automata and transducers as decorated lexical trees. All operations on the state space data structures use the zipper technology, and a uniform sharing functor permits systematic maximum sharing as… 
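The abstract's two central ideas, lexicons stored as decorated lexical trees and maximal sharing of equal subtrees, can be illustrated with a minimal sketch. This is not the Zen toolkit's code: Zen is written in Objective Caml, whereas this sketch uses Python, and the names `insert`, `contains`, `share`, and `size` are hypothetical. It assumes a trie node is an immutable pair of an acceptance flag (the only decoration here) and a sorted tuple of arcs, and it uses a memo table as a stand-in for the sharing functor.

```python
EMPTY = (False, ())  # a node is (accepting flag, sorted tuple of (letter, child) arcs)

def insert(node, word):
    """Return a new trie containing `word`; the input trie is left unchanged."""
    accept, arcs = node
    if not word:
        return (True, arcs)
    children = dict(arcs)
    children[word[0]] = insert(children.get(word[0], EMPTY), word[1:])
    # Letters are unique per node, so sorting never compares the child subtrees.
    return (accept, tuple(sorted(children.items())))

def contains(node, word):
    """Follow one arc per letter and report whether the final node is accepting."""
    for letter in word:
        node = dict(node[1]).get(letter)
        if node is None:
            return False
    return node[0]

def share(node, memo):
    """Rebuild bottom-up so that structurally equal subtrees become one object."""
    accept, arcs = node
    rebuilt = (accept, tuple((letter, share(child, memo)) for letter, child in arcs))
    # setdefault returns the earlier representative when an equal node was already seen.
    return memo.setdefault(rebuilt, rebuilt)

def size(node):
    """Count nodes of the trie viewed as a tree, ignoring any sharing."""
    return 1 + sum(size(child) for _, child in node[1])

trie = EMPTY
for w in ["tap", "top", "cap", "cop"]:
    trie = insert(trie, w)

memo = {}
shared = share(trie, memo)
print(size(trie), len(memo))  # tree nodes versus distinct nodes after sharing
```

For these four words the unshared tree has 11 nodes, while the memo table holds only 4 distinct nodes, because the subtrees below `t` and `c` are identical. The shared structure is a directed acyclic graph that still answers `contains` queries exactly as the original trie does.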

Shallow syntax analysis in Sanskrit guided by semantic nets constraints

We present the state of the art of a computational platform for the analysis of classical Sanskrit. The platform comprises modules for phonology, morphology, segmentation and shallow syntax analysis,

Unsupervised Learning of Morphology and the Languages of the World

The idea is that concatenative affixation, i.e., how stems and affixes are strung together to form words, can be modelled simplistically, and case studies show how this naive model can be used for stemming, language identification, and bootstrapping language description.

Strengths and weaknesses of finite-state technology: a case study in morphological grammar development

  • S. Wintner
  • Computer Science
    Natural Language Engineering
  • 2008
This paper investigates the strengths and weaknesses of existing technology, focusing on various aspects of large-scale grammar development, and compares a finite-state implementation with an equivalent Java program with respect to ease of development, modularity, maintainability of the code, and space and time efficiency.

Knowledge representation of grammatical constructs of Sanskrit Language and modular architecture of ParGram

  • N. Tapaswi, S. Jain
  • Computer Science
    2013 International Conference on Advances in Technology and Engineering (ICATE)
  • 2013
This work regards the problem of obtaining a transfer grammar that reverses the meaning construction, taking into account the generation performance, as a problem of profound study of the design of meaning representation.

GF: A Multilingual Grammar Formalism

  • Aarne Ranta
  • Linguistics, Computer Science
    Lang. Linguistics Compass
  • 2009
The most ambitious multilingual grammar is the GF Resource Grammar Library, which implements the main grammar rules of 12 languages, and enables non-linguist programmers to build linguistically correct applications.

Computational linguistics resources for Indo-Iranian languages

  • S. Virk
  • Linguistics, Computer Science
  • 2013
This thesis describes the development of computational grammars for six Indo-Iranian languages, explores different lexical and syntactic aspects of these languages, and reports a mechanical development of a Hindi resource grammar starting from an Urdu resource grammar using the Grammatical Framework.

The GF Resource Grammar Library

The focus of this paper is on the linguistic aspects of the GF Resource Grammar Library—in particular, what syntactic structures are covered and how the problems arising in different languages have been solved.

Three Tools for Language Processing: BNF Converter, Functional Morphology, and Extract

Purely functional programming and meta programming based on declarative models are productive approaches to language processing and language resource building. Three tools are presented as evidence

Analysis of Sanskrit Text: Parsing and Semantic Relations

The proposed Sanskrit parser can create semantic nets for many classes of Sanskrit paragraphs and handles both external and internal sandhi in Sanskrit words.

On the Syntax and Translation of Finnish Discourse Clitics

  • Aarne Ranta
  • Linguistics, Computer Science
    Shall We Play the Festschrift Game?
  • 2012
A formal grammar to specify the syntax and morphology of Finnish discourse clitics, a set of morphemes which attach to words and express things like contrasting and reminding, is built.



Lexicon-directed segmentation and tagging in Sanskrit

  • G. Huet
  • Computer Science, Linguistics
  • 2003
A methodology for Sanskrit processing by computer, which analyses the linear structure of a Sanskrit sentence as a set of possible interpretations under sandhi analysis, and uses an original design for a finite-state transducers toolkit, based on functional programming principles.

A General Computational Model for Word-Form Recognition and Production

A language independent model for recognition and production of word forms is presented, based on a new way of describing morphological alternations that is capable of both analyzing and synthesizing word-forms.

Applications of Finite-State Transducers in Natural Language Processing

This paper is a review of some of the major applications of finite-state transducers in Natural Language Processing ranging from morphological analysis to finite-state parsing. The analysis and

Zen and the Art of Symbolic Computing: Light and Fast Applicative Algorithms for Computational Linguistics

  • G. Huet
  • Computer Science, Linguistics
  • 2003
Computational linguistics is an application of computer science which presents interesting challenges from the programming methodology point of view and demands a principled modular architecture with complex cooperation between the various layers.

A Stochastic Finite-State Word-Segmentation Algorithm for Chinese

This paper presents a stochastic finite-state model wherein the basic workhorse is the weighted finite-state transducer, and the model segments Chinese text into dictionary entries and words derived by various productive lexical processes, and provides pronunciations for these words.

Grammatical Framework

  • Aarne Ranta
  • Computer Science, Linguistics
    Journal of Functional Programming
  • 2004
This paper starts with a gradual introduction to GF, going through a sequence of simpler formalisms till the full power is reached, followed by a systematic presentation of the GF formalism and outlines of the main algorithms: partial evaluation and parser generation.

Rational Transductions for Phonetic Conversion and Phonology

The finite-state formal devices described in this chapter and tested in the context of phonetics and phonology proved to be both convenient for linguistic description and adapted for efficient implementation.

Linear Contexts and the Sharing Functor: Techniques for Symbolic Computation

This paper argues that zippers, i.e. unary contexts generalizing stacks, are concrete representations of linear functions on algebraic data types, and proposes a uniform sharing functor, which allows the fine-tuning of bucket balancing.

Regular Models of Phonological Rule Systems

This paper shows in detail how this framework applies to ordered sets of context-sensitive rewriting rules and also to grammars in Koskenniemi's two-level formalism.

Deterministic Part-of-Speech Tagging with Finite-State Transducers

A finite-state tagger is presented, inspired by the rule-based tagger, that operates in optimal time in the sense that the time to assign tags to a sentence corresponds to the time required to follow a single path in a deterministic finite-state machine.