• Corpus ID: 7020534

A Multilingual Database of Idioms

@inproceedings{Villavicencio2004AMD,
  title={A Multilingual Database of Idioms},
  author={Aline Villavicencio and Timothy Baldwin and Benjamin Waldron},
  booktitle={International Conference on Language Resources and Evaluation},
  year={2004}
}

This paper presents a possible architecture for a multilingual database of idioms. We discuss the challenges that idioms present to the creation of such a database and propose a possible encoding that maximises the amount of information that can be stored for different languages. Such a resource provides important information for linguistic, computational linguistic and psycholinguistic use, and allows for the comparison of different phenomena in different languages. This can provide the basis…

Lexical Encoding of MWEs

This paper presents an architecture for the lexical encoding of multiword expressions (MWEs) in a database. The architecture takes into account their flexibility, extends in a straightforward manner the one required for simplex words, and maximises the information contained in the description of multiwords.

The Collection of Distributionally Idiosyncratic Items: A Multilingual Resource for Linguistic Research

This work proposes a system which allows us to document the information about BWs from dictionaries and linguistic literature, together with corpus data and example queries for major text corpora, and points to other phraseologically oriented collections.

The Lexicon-Grammar of Italian Idioms

  • Simonetta Vietri
This paper presents the Lexicon-Grammar classification of Italian idioms that has been constructed on formal principles and, as such, can be exploited in information extraction. Among MWEs, idioms

The Lexicon-Grammar of Italian Idioms

This paper presents the Lexicon-Grammar classification of Italian idioms, which has been constructed on formal principles and can be exploited in information extraction. Two binary matrices, covering two classes of idioms, are also presented.

An Empirical Study for a Machine Aided Translation of French Prepositions 'à', 'de' and 'en' into English

This paper presents a study of ambiguous French prepositions, stressing their role as introducers of dependencies, in order to derive some heuristics for translating them into English, based on a

Computational Model of the Modern Georgian Language and Search Patterns for an Online Dictionary of Idioms

The paper describes the use of finite-state technology, specifically lexc and xfst, for the morphological analysis of Modern Georgian, and the application of a morphological transducer to the lemmatization and alphabetization problems observed in Georgian dictionaries.

Automatic Acquisition of Knowledge About Multiword Predicates

The results demonstrate that combining statistical approaches with linguistic information is beneficial, both for the acquisition of knowledge about metaphorical and idiomatic MWPs, and for the organization of such knowledge in a computational lexicon.

A Lexicon Module for a Grammar Development Environment

The database module presented addresses issues which have caused problems in the past and the power of a database architecture provides a number of practical advantages as well as a solid framework for future extension.

Variation in V+the+N idioms

The term ‘idiom’ can refer to two types of fixed expressions. First, in a narrow sense, idioms are ‘expressions whose idiomaticity is semantic; typical expressions are kick the bucket, spill the

References

Multiword Expressions: A Pain in the Neck for NLP

The various kinds of multiword expressions should be analyzed in distinct ways, including listing "words with spaces", hierarchically organized lexicons, restricted combinatoric rules, lexical selection, "idiomatic constructions" and simple statistical affinity.

Collins COBUILD dictionary of idioms

Clearly laid-out and easy-to-use, the Collins COBUILD Dictionary of Idioms will prove to be fascinating and invaluable to teachers and learners of English at all levels.

Multiword expressions: linguistic precision and reusability

The paper discusses how the lexicon of multiword expressions is encoded in a database and describes the implications for building a reusable lexical resource.

A constructional approach to idioms and word formation

This dissertation explores a constructional approach to various aspects of grammar, in particular idioms and derivational morphology, within the Head-Driven Phrase Structure Grammar (HPSG) framework, and shows that idioms frequently occur in non-canonical forms.

Avoidance of Idioms in a Second Language: The Effect of L1-L2 Degree of Similarity

The study investigates whether avoidance of L2 (English) idioms is determined by the degree of similarity to their L1 (Hebrew) counterparts. Four degrees of similarity were established through a

A Lexicon Module for a Grammar Development Environment

The database module presented addresses issues which have caused problems in the past and the power of a database architecture provides a number of practical advantages as well as a solid framework for future extension.

Noun-Noun Compound Machine Translation: A Feasibility Study on Shallow Processing

  • Takaaki Tanaka, Timothy Baldwin
  • Linguistics
    Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment
  • 2003
The results of a feasibility study on the ability of memory-based machine translation and word-to-word compositional machine translation to translate Japanese and English noun-noun compounds are described.

The nature of idioms

  • LinGO Working Paper No. 2002-04
  • 2002
