Citius at SemEval-2017 Task 2: Cross-Lingual Similarity from Comparable Corpora and Dependency-Based Contexts

@inproceedings{Gamallo2017CitiusAS,
  title={Citius at SemEval-2017 Task 2: Cross-Lingual Similarity from Comparable Corpora and Dependency-Based Contexts},
  author={Pablo Gamallo},
  booktitle={SemEval@ACL},
  year={2017}
}
  • Pablo Gamallo
  • Published in SemEval@ACL 1 August 2017
  • Linguistics, Computer Science
This article describes the distributional strategy submitted by the Citius team to SemEval-2017 Task 2. Although the team participated in two subtasks, monolingual and cross-lingual word similarity, the article focuses mainly on the cross-lingual subtask. Our method uses comparable corpora and syntactic dependencies to extract count-based and transparent bilingual distributional contexts. The evaluation of the results shows that our method is competitive with other cross-lingual… 
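The general idea of comparing words across languages through count-based dependency contexts can be illustrated with a small sketch. This is not the Citius system. The following are all hypothetical assumptions made for illustration:

- the toy corpora counts,
- the `(relation, lemma)` context format,
- the seed English-Spanish dictionary,
- the use of raw counts with cosine similarity (the actual system may use different weighting and similarity measures).

Source-language contexts are translated into the target language through the dictionary, so that words from both languages share one interpretable context space.

```python
import math
from collections import Counter

# Hypothetical toy data: dependency-based contexts, written as (relation, lemma)
# pairs with counts, extracted separately from an English and a Spanish corpus.
EN_CONTEXTS = {
    "car": Counter({("dobj_of", "drive"): 4, ("amod", "fast"): 2, ("amod", "red"): 1}),
}
ES_CONTEXTS = {
    "coche": Counter({("dobj_of", "conducir"): 3, ("amod", "rápido"): 2}),
    "perro": Counter({("subj_of", "ladrar"): 5, ("amod", "rápido"): 1}),
}

# Hypothetical seed bilingual dictionary (English -> Spanish) for context lemmas.
SEED_DICT = {"drive": "conducir", "fast": "rápido", "red": "rojo", "bark": "ladrar"}


def translate_contexts(contexts, dictionary):
    """Map source-language contexts into the target language.

    Contexts whose lemma is missing from the dictionary are dropped.
    """
    out = Counter()
    for (rel, lemma), count in contexts.items():
        if lemma in dictionary:
            out[(rel, dictionary[lemma])] += count
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cross_lingual_similarity(en_word, es_word):
    """Compare an English word with a Spanish word in the shared (Spanish) context space."""
    en_vec = translate_contexts(EN_CONTEXTS[en_word], SEED_DICT)
    return cosine(en_vec, ES_CONTEXTS[es_word])


if __name__ == "__main__":
    print(round(cross_lingual_similarity("car", "coche"), 3))  # 0.968
    print(round(cross_lingual_similarity("car", "perro"), 3))  # 0.086
```

Because every dimension is an explicit lexico-syntactic context, such as `("dobj_of", "conducir")`, the vectors stay transparent: one can inspect exactly which shared contexts make "car" and "coche" similar.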

SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity

Results show that systems that combine statistical knowledge from text corpora, in the form of word embeddings, with external knowledge from lexical resources are the best performers in both subtasks.

The Impact of Linguistic Knowledge in Different Strategies to Learn Cross-Lingual Distributional Models

This work incorporates different levels of linguistic knowledge into the process of building cross-lingual models for English and Spanish. The experiments show that syntactic information benefits traditional models based on text alignment but harms mapped cross-lingual embeddings.

The Impact of Linguistic Knowledge in Different Strategies to Learn Cross-Lingual Distributional Models

In recent years, with the emergence of neural networks and word embeddings, there has been growing interest in cross-lingual distributional models learned from monolingual corpora to…

Contextualized Translations of Phrasal Verbs with Distributional Compositional Semantics and Monolingual Corpora

Describes a compositional distributional method that uses monolingual corpora to generate contextualized senses of words and identify their appropriate translations in the target language, applied to translating phrasal verbs in context.

Supporting terminology extraction with dependency parses

The aim of this work was to improve term candidate selection by reducing the number of incorrect sequences using a dependency parser for Polish.

References

Showing 10 of 26 references.

SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity

Results show that systems that combine statistical knowledge from text corpora, in the form of word embeddings, with external knowledge from lexical resources are the best performers in both subtasks.

Learning Spanish-Galician Translation Equivalents Using a Comparable Corpus and a Bilingual Dictionary

The method proposed in this paper relies on seed patterns generated from external bilingual dictionaries and achieves results similar to those obtained from parallel corpora. The large amount of comparable corpora available on the Web can therefore be viewed as a never-ending source of lexicographic information.

A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets

This work proposes an automatic standardization for the construction of cross-lingual similarity datasets, and provides an evaluation, demonstrating its reliability and robustness.

Finding semantically related words in Dutch: co-occurrences versus syntactic contexts

Six vector-based techniques are used to retrieve semantically related nouns from a corpus of Dutch. A full syntactic context model clearly outperforms all other approaches, both in its overall performance and in the proportion of synonyms it discovers.

Comparing explicit and predictive distributional semantic models endowed with syntactic contexts

The results show that the traditional count-based model with syntactic dependencies outperforms other strategies, including dependency-based embeddings, but only for tasks focused on discovering similarity between words with the same function (i.e., near-synonyms).

Finding Terminology Translations from Non-parallel Corpora

We present a statistical word feature, the Word Relation Matrix, which can be used to find translated pairs of words and terms from non-parallel corpora, across language groups. Online dictionary…

Learning bilingual lexicons from comparable English and Spanish corpora

The current approach relies not on a bilingual dictionary but on the prior extraction of bilingual information from parallel corpora. It yields a significant improvement, with about 79% of word translations identified correctly.

Disambiguation of single noun translations extracted from bilingual comparable corpora

Describes a bilingual dictionary acquisition system that extracts translations from non-parallel but comparable corpora of a specific academic domain and disambiguates the extracted translations.

Improving Bilingual Lexicon Extraction from Comparable Corpora Using Window-Based and Syntax-Based Models

The reported results show that the combination of the two context representations significantly improves the performance of bilingual lexicon extraction compared to using each of the representations individually.

Text: now in 2D! A framework for lexical expansion with contextual similarity

Proposes a new metaphor of two-dimensional text for data-driven semantic modeling of natural language, which provides an entirely new angle on the representation of text: not only syntagmatic…