• Publications
Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach
This work proposes a novel geometric approach for learning bilingual mappings given monolingual embeddings and a bilingual dictionary that outperforms previous approaches on the bilingual lexicon induction and cross-lingual word similarity tasks.
iNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages
This paper introduces NLP resources for 11 major Indian languages from two major language families, and creates datasets for the following tasks: Article Genre Classification, Headline Prediction, Wikipedia Section-Title Prediction, Cloze-style Multiple choice QA, Winograd NLI and COPA.
The IIT Bombay English-Hindi Parallel Corpus
The corpus, described as the largest publicly available English-Hindi parallel corpus, has been pre-processed for machine translation, and baseline phrase-based SMT and NMT translation results on it are reported.
Shata-Anuvadak: Tackling Multiway Translation of Indian Languages
We present a compendium of 110 Statistical Machine Translation systems built from parallel corpora of 11 Indian languages belonging to both Indo-Aryan and Dravidian families. We analyze the…
Overview of the 6th Workshop on Asian Translation
This paper presents the results of the shared tasks from the 6th workshop on Asian translation (WAT2019) including Ja↔En, Ja↔Zh scientific paper translation subtasks, Ja↔En, Ja↔Ko, Ja↔En patent…
A System for Compound Noun Multiword Expression Extraction for Hindi
A system for extracting Hindi compound noun multiword expressions (MWE) from a given corpus is described, based on linguistic and psycholinguistic principles, and the extraction methods use various statistical co-occurrence measures to exploit the statistical idiosyncrasy of MWEs.
Overview of the 7th Workshop on Asian Translation
This paper presents the results of the shared tasks from the 7th workshop on Asian translation (WAT2020). For WAT2020, 20 teams participated in the shared tasks and 14 teams submitted their…
AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages
The IndicNLP corpus, a large-scale, general-domain corpus containing 2.7 billion words for 10 Indian languages from two language families, is presented, and it is shown that the IndicNLP embeddings significantly outperform publicly available pre-trained embeddings on multiple evaluation tasks.
NICT’s Participation in WAT 2018: Approaches Using Multilingualism and Recurrently Stacked Layers
This paper describes the NMT systems for the translation tasks the authors participated in and notes that a single multilingual/bidirectional model (without ensembling) has the potential to achieve (near) state-of-the-art results for all the language pairs.
A Survey of Multilingual Neural Machine Translation
An in-depth survey of existing literature on multilingual neural machine translation (MNMT) is presented; the various approaches are categorized first by their central use-case and then by resource scenarios, underlying modeling principles, core issues, and challenges.