Corpus ID: 218973784

An Evaluation Dataset for Identifying Communicative Functions of Sentences in English Scholarly Papers

Kenichi Iwatsuki, Florian Boudin, Akiko Aizawa
Formulaic expressions, such as ‘in this paper we propose’, are used by authors of scholarly papers to perform communicative functions; the communicative function of the present example is ‘stating the aim of the paper’. Collecting such expressions and pairing them with their communicative functions would be highly valuable for various tasks, particularly for writing assistance. However, such collection and pairing in a principled and automated manner would require high-quality annotated data…
Communicative-Function-Based Sentence Classification for Construction of an Academic Formulaic Expression Database
This study considers a fully automated construction of a CF-labelled FE database using the top–down approach, in which the CF labels are first assigned to sentences, and then the FEs are extracted.
Extraction and Evaluation of Formulaic Expressions Used in Scholarly Papers
A new approach that is robust to variation in the spans and forms of formulaic expressions is proposed, along with a new extraction method that uses named entities and dependency structures to remove the non-formulaic part from a sentence.


Using Formulaic Expressions in Writing Assistance Systems
This work proposes a new framework for semantic searches of FEs and a new method to leverage both existing dictionaries and domain sentence corpora, and expands an existing FE dictionary to consider building a more comprehensive and domain-specific FE dictionary.
Rhetorical Move Detection in English Abstracts: Multi-label Sentence Classifiers and their Annotated Corpora
MAZEA (Multi-label Argumentative Zoning for English Abstracts), a multi-label classifier that automatically identifies rhetorical moves in abstracts while allowing a given sentence to be assigned as many labels as appropriate, is presented.
A Function-First Approach to Identifying Formulaic Language in Academic Writing.
There is currently much interest in creating pedagogically-oriented descriptions of formulaic language. Research in this area has typically taken what we call a ‘form-first’ approach, in…
A phrase-frame list for social science research article introductions
This study aimed to contribute to recent corpus-based efforts in compiling lists of academic expressions by deriving a pedagogically useful list of phrase-frames for a specific part-genre,…
An Academic Formulas List: New Methods in Phraseology Research
This research creates an empirically derived, pedagogically useful list of formulaic sequences for academic speech and writing, comparable with the Academic Word List (Coxhead 2000), called the…
Lexical Bundles in L1 and L2 Academic Writing.
Some high-frequency expressions in published texts, such as in the context of all over the world, were underused in both student corpora, while the L2 student writers overused certain expressions which native academics rarely used.
As can be seen: Lexical bundles and disciplinary variation
An important component of fluent linguistic production is control of the multi-word expressions referred to as clusters, chunks or bundles. These are extended collocations which appear more…
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.
Comparing patterns of L1 versus L2 English academic professionals: Lexical bundles in Telecommunications research journals
Examining the structural and functional types of lexical bundles employed by L1 English and L1 Chinese professionals writing for English medium Telecommunications journals shows that L1 and L2 professionals employ bundles with different structural characteristics serving similar functions.
Building a Lexicon of Formulaic Language for Language Learners
This work presents two enhancements: the use of a new measure to promote the identification of lexicalized sequences, and an expansion to include sequences with gaps. It shows that good performance in the second enhancement depends crucially on the first, and that the resulting lexicon conforms much more closely to human judgment of formulaic language than alternatives.