• Corpus ID: 5447407

B2SG: a TOEFL-like Task for Portuguese

Rodrigo Wilkens, Leonardo Zilio, Eduardo Ferreira, Aline Villavicencio

Resources such as WordNet are useful for NLP applications, but their manual construction consumes time and personnel and frequently results in low coverage. One alternative is the automatic construction of large resources from corpora, such as distributional thesauri containing semantically associated words. However, as these may contain noise, there is a strong need for automatic ways of evaluating the quality of the resulting resource. This paper introduces a gold standard that can aid in this…
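The abstract is cut off before it describes the task in detail. Assuming B2SG follows the usual TOEFL synonym format (a target word plus several candidates, exactly one of which is correct), a distributional thesaurus could be scored by picking, for each question, the candidate whose vector is most similar to the target. This is a hypothetical sketch: the sparse context vectors, Portuguese words, and function names are toy illustrations, not B2SG data or the authors' evaluation code.

```python
import math

# Hypothetical toy context vectors (context word -> weight); not B2SG data.
VECTORS = {
    "carro": {"dirigir": 3.0, "estrada": 2.0},
    "automóvel": {"dirigir": 2.0, "estrada": 2.0, "motor": 1.0},
    "casa": {"morar": 3.0, "porta": 1.0},
    "flor": {"jardim": 2.0, "perfume": 1.0},
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def answer_question(target, candidates, vectors):
    """Return the candidate most similar to the target, or None if it cannot be answered."""
    known = [c for c in candidates if c in vectors]
    if target not in vectors or not known:
        return None
    # max() keeps the first candidate on ties, so candidate order matters.
    return max(known, key=lambda c: cosine(vectors[target], vectors[c]))


def accuracy(questions, vectors):
    """Fraction of (target, candidates, gold) questions answered correctly."""
    correct = sum(
        1 for target, candidates, gold in questions
        if answer_question(target, candidates, vectors) == gold
    )
    return correct / len(questions)


print(answer_question("carro", ["casa", "automóvel", "flor"], VECTORS))  # prints: automóvel
```

In this sketch, questions involving out-of-vocabulary words count as errors; how the B2SG evaluation treats them is not stated in this excerpt.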

Tables from this paper

Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity

This paper, intended as a reference for those interested in computing word similarity in Portuguese, presents several approaches for this task. It is motivated by the recent availability of state-of-the-art distributional models of Portuguese words, which complement the several lexical knowledge bases that have been available for this language for longer.

Assessing Lexical-Semantic Regularities in Portuguese Word Embeddings

A new test, dubbed TALES, is created with an exclusive focus on Portuguese lexical-semantic relations acquired from lexical resources. The results suggest that word embeddings may be a useful source of information for enriching those resources, a possibility the authors also discuss.

TALES: Test Set of Portuguese Lexical-Semantic Relations for Assessing Word Embeddings

This paper describes the creation of a new test for assessing Portuguese word embeddings, dubbed TALES, with an exclusive focus on lexical-semantic relations acquired from lexical resources in Portuguese. It also reports on the performance of methods previously used for solving analogies, with pre-trained Portuguese word embeddings, when applied to the created dataset.
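The summary does not name the analogy methods that were evaluated. One widely used option is the vector-offset method often called 3CosAdd, which answers "a is to b as c is to ?" with the word whose vector is closest to b - a + c. The sketch below is a hedged illustration with hypothetical 3-dimensional toy vectors rather than pre-trained Portuguese embeddings, and it uses a classic gender analogy even though TALES itself targets lexical-semantic relations.

```python
import math

# Hypothetical toy 3-d vectors (dimensions ~ royalty, femaleness, personhood);
# real experiments would load pre-trained Portuguese embeddings instead.
EMBEDDINGS = {
    "homem": [0.0, 0.0, 1.0],
    "mulher": [0.0, 1.0, 1.0],
    "rei": [1.0, 0.0, 1.0],
    "rainha": [1.0, 1.0, 1.0],
    "carro": [0.2, 0.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def solve_analogy(a, b, c, embeddings):
    """3CosAdd: for 'a is to b as c is to ?', return argmax_d cos(d, b - a + c)."""
    target = [vb - va + vc for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    # The three question words are excluded from the candidates, as is customary.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


print(solve_analogy("homem", "rei", "mulher", EMBEDDINGS))  # prints: rainha
```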

BAHP: Benchmark of Assessing Word Embeddings in Historical Portuguese

BAHP, a benchmark for assessing word embeddings in Historical Portuguese, contains four types of tests: analogy, similarity, outlier detection, and coherence. The results demonstrate that the test sets are capable of measuring the quality of vector space models and can provide a holistic view of a model's ability to capture syntactic and semantic information.

Unsupervised Approaches for Computing Word Similarity in Portuguese

There are several valid approaches for computing word similarity in Portuguese, but none outperforms all the others in every single test. Distributional models seem to capture relatedness better, whereas lexical knowledge bases (LKBs) are better suited for computing genuine similarity.

A Survey on Portuguese Lexical Knowledge Bases: Contents, Comparison and Combination

This work analyses several lexical-semantic knowledge bases for Portuguese and shows that, rather than selecting a single LKB, it is generally worth combining the contents of all the open Portuguese LKBs to obtain better results.

Comparing and Combining Portuguese Lexical-Semantic Knowledge Bases

The open Portuguese LKBs are briefly analysed, with a focus on size and overlapping contents, and new LKBs are created from their redundant information.
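The summary only says that new LKBs are built from redundant information. One hedged reading is a vote over relation triples, keeping those attested in at least n of the open LKBs. The triple format, relation labels, threshold, and toy data below are assumptions for illustration, not the authors' procedure.

```python
from collections import Counter

# Hypothetical toy LKBs as sets of (word, relation, word) triples.
LKB_A = {("carro", "synonym_of", "automóvel"), ("cão", "hyponym_of", "animal")}
LKB_B = {("carro", "synonym_of", "automóvel"), ("rosa", "hyponym_of", "flor")}
LKB_C = {
    ("carro", "synonym_of", "automóvel"),
    ("cão", "hyponym_of", "animal"),
    ("casa", "synonym_of", "lar"),
}


def redundant_relations(lkbs, min_support):
    """Keep the triples attested in at least `min_support` of the given LKBs."""
    counts = Counter()
    for lkb in lkbs:
        counts.update(set(lkb))  # each triple counts at most once per LKB
    return {triple for triple, n in counts.items() if n >= min_support}


print(sorted(redundant_relations([LKB_A, LKB_B, LKB_C], min_support=2)))
```

Raising `min_support` trades coverage for reliability. Symmetric relations such as synonymy would also need their word order normalized before counting, which this sketch omits.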

CM2News: Towards a Corpus for Multilingual Multi-Document Summarization

This paper describes the ongoing construction of CM2News, a semantically annotated corpus for fostering research on multilingual multi-document summarization. The corpus is a result of the Sustento Project, which aims at generating linguistic knowledge for multi-document summarization.

Using the Outlier Detection Task to Evaluate Distributional Semantic Models

Embeddings are observed to outperform count-based representations when word contexts are defined as bag-of-words, whereas there are no sharp differences between the two models when contexts are defined as syntactic dependencies.
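The summary does not define how the outlier detection task is scored. A commonly used formulation is the compactness score of Camacho-Collados and Navigli: for each word in a set, average the pairwise similarities of the remaining words, and predict as the outlier the word whose removal leaves the most compact set. The sketch below assumes cosine similarity and uses hypothetical 2-dimensional toy vectors.

```python
import math
from itertools import permutations

# Hypothetical toy 2-d vectors: three fruits and one tool.
VECTORS = {
    "maçã": [1.0, 0.1],
    "banana": [0.9, 0.2],
    "laranja": [1.0, 0.2],
    "martelo": [0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def compactness_without(word, words, vectors):
    """Average pairwise similarity of the set once `word` is removed."""
    rest = [w for w in words if w != word]
    pairs = list(permutations(rest, 2))  # n(n-1) ordered pairs
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def detect_outlier(words, vectors):
    """The outlier is the word whose removal leaves the most compact set."""
    return max(words, key=lambda w: compactness_without(w, words, vectors))


print(detect_outlier(["maçã", "banana", "laranja", "martelo"], VECTORS))  # prints: martelo
```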



New Experiments in Distributional Representations of Synonymy

A TOEFL-like test is generated from WordNet, containing thousands of questions and composed only of words occurring with sufficient corpus frequency to support sound distributional comparisons; this leads to a similarity measure that significantly outperforms the best measures proposed to date.
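The summary describes the test construction only at a high level. The sketch below shows one hypothetical way such a WordNet-based item could be assembled: a sufficiently frequent synonym from the target's synset is the answer, and frequent non-synonyms serve as distractors. The toy synsets, frequency threshold, number of distractors, and function name are assumptions; the original procedure may impose further constraints (for example on part of speech) that are not modeled here.

```python
import random

# Hypothetical toy synsets and corpus frequencies.
SYNSETS = [{"carro", "automóvel", "viatura"}, {"casa", "lar"}, {"flor"}]
FREQUENCY = {"carro": 500, "automóvel": 300, "viatura": 5, "casa": 800,
             "lar": 100, "flor": 200, "árvore": 150, "mesa": 90}


def make_question(target, synsets, frequency, min_freq=50, n_distractors=3, rng=None):
    """Build one (target, options, answer) item, or None if the target is unusable."""
    rng = rng or random.Random(0)
    if frequency.get(target, 0) < min_freq:
        return None  # too rare for a sound distributional comparison
    synonyms = {w for s in synsets if target in s for w in s} - {target}
    answers = sorted(w for w in synonyms if frequency.get(w, 0) >= min_freq)
    # Distractors: frequent words that are neither the target nor one of its synonyms.
    pool = sorted(w for w, f in frequency.items()
                  if f >= min_freq and w != target and w not in synonyms)
    if not answers or len(pool) < n_distractors:
        return None
    answer = rng.choice(answers)
    options = rng.sample(pool, n_distractors) + [answer]
    rng.shuffle(options)
    return target, options, answer


print(make_question("carro", SYNSETS, FREQUENCY))
```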

OpenWordNet-PT: An Open Brazilian Wordnet for Reasoning

The reasons for a Brazilian Portuguese Wordnet are discussed, together with the process used to obtain a preliminary version of such a resource and possible steps towards improving that preliminary version.

BabelNet: Building a Very Large Multilingual Semantic Network

BabelNet is a very large, wide-coverage multilingual semantic network that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia; machine translation is also applied to enrich the resource with lexical information for all languages.

Towards the Automatic Creation of a Wordnet from a Term-Based Lexical Network

The work described here aims to create a wordnet automatically from a term-based semantic network. To this end, a clustering procedure is run over a synonymy network in order to obtain synsets. Then, the…

Multimodal Distributional Semantics

This work proposes a flexible architecture to integrate text- and image-based distributional information and shows, in a set of empirical tests, that the integrated model is superior to the purely text-based approach and provides somewhat complementary semantic information with respect to it.

Improving Word Representations via Global Context and Multiple Word Prototypes

A new neural network architecture is presented that learns word embeddings which better capture the semantics of words by incorporating both local and global document context, and accounts for homonymy and polysemy by learning multiple embeddings per word.

Automatic Retrieval and Clustering of Similar Words

A word similarity measure based on the distributional pattern of words allows the automatically constructed thesaurus to be significantly closer to WordNet than Roget's Thesaurus is.
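Lin's measure compares two words through the dependency features (relation, word) they share, each weighted by an information value I(w, r, w') derived from corpus statistics. A minimal sketch, assuming those values have already been computed from a parsed corpus; the toy features and numbers below are hypothetical.

```python
# Lin's measure: sim(w1, w2) = sum over shared features f of (I(w1, f) + I(w2, f))
#                divided by (sum of I(w1, f) over T(w1) + sum of I(w2, f) over T(w2)),
# where T(w) is the set of (relation, word) features with positive information.

# Hypothetical precomputed information values for two words.
CARRO = {("obj-of", "dirigir"): 2.0, ("mod", "novo"): 1.0, ("mod", "azul"): -0.5}
AUTOMOVEL = {("obj-of", "dirigir"): 1.5, ("mod", "elétrico"): 0.5}


def lin_similarity(features1, features2):
    """Lin's similarity between two words given their feature -> information maps."""
    t1 = {f: i for f, i in features1.items() if i > 0}  # T(w1)
    t2 = {f: i for f, i in features2.items() if i > 0}  # T(w2)
    shared = t1.keys() & t2.keys()
    numerator = sum(t1[f] + t2[f] for f in shared)
    denominator = sum(t1.values()) + sum(t2.values())
    return numerator / denominator if denominator else 0.0


print(lin_similarity(CARRO, AUTOMOVEL))  # prints: 0.7
```

Features with non-positive information (here `("mod", "azul")`) are ignored, which keeps the score between 0 and 1 and makes a word maximally similar to itself.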

EuroWordNet: A multilingual database with lexical semantic networks

  • P. Vossen
  • Springer Netherlands
  • 1998

Cross-Linguistic Alignment of Wordnets with an Inter-Lingual-Index (W. Peters et al.)

Community Evaluation and Exchange of Word Vectors at wordvectors.org

This work presents a website and suite of offline tools that facilitate evaluation of word vectors on standard lexical semantics benchmarks and permit exchange and archival by users who wish to find good vectors for their applications.
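Word similarity benchmarks of this kind are conventionally scored with Spearman's rank correlation between human ratings and model similarities. The sketch below is a hypothetical, self-contained implementation of that scoring, not the wordvectors.org tools; skipping out-of-vocabulary pairs and reporting coverage separately is an assumption, since evaluation suites handle missing words differently.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("Spearman's rho is undefined for constant input")
    return cov / (sd_x * sd_y)


def evaluate(benchmark, similarity):
    """Score (word1, word2, human_rating) pairs; `similarity` returns None for OOV pairs."""
    gold, predicted = [], []
    for w1, w2, rating in benchmark:
        score = similarity(w1, w2)
        if score is not None:
            gold.append(rating)
            predicted.append(score)
    return spearman(gold, predicted), len(gold) / len(benchmark)


# Hypothetical toy benchmark and model scores.
TOY_SCORES = {("carro", "automóvel"): 0.9, ("casa", "lar"): 0.7, ("flor", "mesa"): 0.1}
BENCHMARK = [("carro", "automóvel", 9.5), ("casa", "lar", 8.0),
             ("flor", "mesa", 1.0), ("gato", "xyz", 3.0)]


def toy_similarity(w1, w2):
    return TOY_SCORES.get((w1, w2))


print(evaluate(BENCHMARK, toy_similarity))
```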

A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.

A new general theory of acquired similarity and knowledge representation, latent semantic analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguistic phenomena.