• Publications
Evaluating corpora documentation with regards to the Ethics and Big Data Charter
The documentation of the most frequently mentioned language resources is analyzed for its coverage of the Charter, in order to show the benefit the Charter offers.
Crowdsourcing for Language Resource Development: Criticisms About Amazon Mechanical Turk Overpowering Use
This position paper discusses Amazon Mechanical Turk, the use of which has been growing steadily in language processing over the past few years, and proposes practical and organizational solutions to improve language resource development.
Covering various Needs in Temporal Annotation: a Proposal of Extension of ISO TimeML that Preserves Upward Compatibility
A critical analysis of the ISO TimeML standard shows that it suffers from weaknesses that should be corrected to fit a wider variety of needs in NLP and corpus linguistics.
Where the data are coming from? Ethics, crowdsourcing and traceability for Big Data in Human Language Technology
The authors want to warn the Big Data community about some recent usage of human computation, and foster some behaviours, especially concerning traceability, implemented in the form of a charter, the Ethics and Big Data Charter.
Yes, We Care! Results of the Ethics and Natural Language Processing Surveys
We present here the context and results of two surveys (a French one and an international one) concerning Ethics and NLP, which we designed and conducted between June and September 2015. These
From Legal Documents to Legal Document Management Systems; The Case of LegiCrowd (short paper)
It is argued that a full-fledged legal document management system relying on semantic representation is key to resolving conflict and facilitating the transparency of online legal documents. A quick overview is given of the LegiCrowd project, a crowdsourced approach to legal document annotation that paves the way towards such a solution.