• Corpus ID: 3476035

Sanskrit Sandhi Splitting using $\pmb{seq2(seq)^2}$

Rahul Aralikatte, Neelamadhav Gantayat, Naveen Panwar, Anush Sankaran, Senthil Mani
arXiv: Computation and Language
In Sanskrit, small words (morphemes) are combined through a morphophonological process called Sandhi to form compound words. Sandhi splitting is the process of splitting a given compound word into its constituent morphemes. Although rules governing the splitting of words exist, it is highly challenging to identify the location of the splits in a compound word, as the same compound word might be broken down in multiple ways to provide syntactically correct splits. …

Building a Word Segmenter for Sanskrit Overnight

This work proposes a deep sequence-to-sequence (seq2seq) model that takes only the sandhied string as input and predicts the unsandhied string, and that performs better than the current state of the art.

Sanskrit Morphological Analyser: Some Issues

Sanskrit has rich inflectional as well as derivational morphology, and despite the existence of a formally defined and well-described grammar, the construction of computational tools for the analysis of Sanskrit texts failed to gain momentum for a long time.

Building a Wide Coverage Sanskrit Morphological Analyzer : A Practical Approach

The complexity involved in building a wide-coverage analyzer for Sanskrit is pointed out, and a morphological analyzer built using the available e-resources and based on ad hoc principles is described.

SandhiKosh: A Benchmark Corpus for Evaluating Sanskrit Sandhi Tools

A Sanskrit benchmark called SandhiKosh is developed to evaluate the completeness and accuracy of Sanskrit Sandhi tools and it is demonstrated that these tools have substantial scope for improvement.

Sanskrit Compound Processor

This paper discusses the automatic segmentation and type identification of a compound using simple statistics that result from manually annotated data.

Long Short-Term Memory Neural Networks for Chinese Word Segmentation

A novel neural network model for Chinese word segmentation is proposed, which adopts the long short-term memory (LSTM) neural network to keep important previous information in the memory cell and avoid the window-size limit of local context.

Towards Computational Processing of Sanskrit

  • G. Huet
  • Linguistics, Computer Science
  • 2003
A solution to the tagging of verb phrases which correctly handles the non-associativity of external sandhi arising from the treatment of preverb ā is proposed, which involves a notion of phantom phoneme.

How to Construct Deep Recurrent Neural Networks

Two novel architectures of a deep RNN are proposed which are orthogonal to an earlier attempt of stacking multiple recurrent layers to build a deep RNN, and an alternative interpretation is provided using a novel framework based on neural operators.

Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation

Qualitatively, the proposed RNN Encoder‐Decoder model learns a semantically and syntactically meaningful representation of linguistic phrases.

Speech recognition with deep recurrent neural networks

This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.