The Web as Corpus and Online Corpora for Legal Translations

Patrizia Giampieri, "The Web as Corpus and Online Corpora for Legal Translations", Comparative Legilinguistics, pp. 35–56.
Abstract: Legal language is hallmarked by a pedantic and user-unfriendly jargon whose constructs are anything but intuitive, not to mention the specificity of each legal system, which makes legal language unique to every country. Second language (L2) learners and scholars may therefore find it difficult to understand the language of the law, while translators may find legal lexical phrases and patterns rather intricate to deal with. The literature claims that a practical way to deepen language knowledge can be found…

The Web as a legal language resource
It will argue that the Web as corpus can be a reliable legal language resource, provided that an appropriate selection of tools is used and the query syntax is accurate.
The importance of Internet systematic search for legal translations
It is highlighted that a systematic approach to Google searches is necessary to deliver high-quality translation work; in particular, advanced searches must be performed, together with careful consultation of legal documents and of authoritative sources, such as experts' forums and sites.
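The "advanced search" meant here can be illustrated with a few standard Google operators — exact-phrase quotes, `site:`, `filetype:`, and the `*` wildcard. The queries below are illustrative examples of the technique, not drawn from the paper; the phrases and domains are assumptions chosen for the legal-translation scenario:

```text
"termination for convenience" site:gov.uk          # exact legal phrase, restricted to UK government sites
"notwithstanding * herein" site:eur-lex.europa.eu  # wildcard inside an exact phrase, searched on EUR-Lex
"force majeure" filetype:pdf contract              # published PDF documents, more likely to be authoritative
```

Restricting hits to institutional domains and published documents is what turns a raw web search into something closer to a curated comparable corpus.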
Legal corpora: A trial lesson with translators and lawyers
Legal translators are often confronted with the peculiarities of legal writing, especially if they have undergone little training in legal matters (Bhatia, 1997; Tiersma, 1999; Williams, 2011).
The BoLC for Legal Translations: A Trial Lesson
It will remark that despite some drawbacks, such as the absence of POS tagging and lemmatization, and a quite complex search syntax, the BoLC helps dispel doubts and deliver outstanding translation work.
Corpulyzer: A Novel Framework for Building Low Resource Language Corpora
This article proposes Corpulyzer, a novel framework for building low-resource language corpora, and demonstrates its efficacy by creating a high-quality, large-scale corpus for the Urdu language as a case study.
Data-driven learning in English for academic purposes class
The paper's findings highlight that the second translations were better as regards grammar and word choice, while sentence structures still showed the influence of the students' first language.


Concordancing the web: promise and problems, tools and techniques
KWiCFinder (KF), developed by the author to help realize the web's promise for language scholars and learners, is described and motivated in detail, and an initial solution to the pitfalls of 'webidence' in serious research is proposed.
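The core output of concordancers such as KWiCFinder — keyword-in-context (KWIC) lines, with the search term centred between fixed-width windows of context — can be sketched in a few lines of Python. This is a minimal illustration of the general KWIC technique under simple assumptions (plain text input, character-based windows), not KWiCFinder's actual implementation:

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines: each match of `keyword` bracketed,
    with up to `width` characters of left and right context."""
    lines = []
    for m in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

sample = ("The parties hereto agree that the agreement shall be governed by "
          "the laws of England. Each agreement must be signed in duplicate.")
for line in kwic(sample, "agreement"):
    print(line)
```

Aligning the keyword in a fixed column is what lets a translator scan dozens of hits at once for the collocations and patterns surrounding a legal term.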
The Quality of Legal Dictionaries: An Assessment
The quality of the different bilingual legal dictionaries between the languages of the Member States of the European Union is assessed, and it is concluded that most legal dictionaries must be classified as word lists, which implies that they are of dubious quality.
Web as Corpus
It is considered whether the web is indeed a corpus; a history of the theme is presented, in which it is viewed as a development of the empiricist turn that brought corpora center-stage in the course of the 1990s; and the chapter concludes with some thoughts on how the web could be put at the linguist's disposal rather more usefully than current search engines allow.
Min(d)ing English language data on the Web : What can Google tell us?
The biggest challenge of today is undoubtedly the growing body of text-based information available on the World Wide Web, which in fact forms the largest store of textual data in existence and, as such, constitutes a tantalizing resource for various linguistic purposes.
Exploring constructions on the web: a case study
It will be shown that, at least in the present case, the WebCorp software provides a more reliable means of retrieving data from the web than Google.
Creating General-Purpose Corpora Using Automated Search Engine Queries
The comparison shows that the news corpora differ from both representative and Internet corpora and cannot provide a window into modern language use in general, and that Google is a poor concordancer.
The Corpus of Contemporary American English as the first reliable monitor corpus of English
The Corpus of Contemporary American English is the first large, genre-balanced corpus of any language that has been designed and constructed from the ground up as a 'monitor corpus'.
Last Words: Googleology is Bad Science
The World Wide Web is enormous, free, immediately available, and largely linguistic; as we discover, on ever more fronts, that language analysis and generation benefit from big data, it becomes appealing to use the Web as a data source.
The Routledge Handbook of Corpus Linguistics
The Routledge Handbook of Corpus Linguistics is edited by Anne O'Keeffe (University of Limerick, Ireland) and Michael McCarthy (University of Nottingham, UK and Pennsylvania State University, USA).
Corpus Linguistics at Work
The book adopts and exemplifies the parameters of the corpus-driven approach and posits a new unit of linguistic description defined systematically in the light of corpus evidence.