Corpus-based Study of the Rhetorical Organization and Lexical Realization of Scientific Research Abstracts

  • John Blake

Student thesis: Doctoral Thesis, Doctor of Philosophy


A key difficulty for novice writers when drafting scientific research abstracts is adherence to the discourse community expectation of generic integrity. This is especially the case for writers with English as an Additional Language. In order to meet these generic expectations, it is necessary to understand what the disciplinary norms are in terms of rhetorical move structure and the language used to realize those moves. Therefore, this research aims to describe the rhetorical organization and lexical realization in a corpus of scientific research abstracts.

A balanced corpus of research abstracts (n = 1,000) from top-tier journals in ten scientific disciplines was created. This corpus contained 7,200 sentences, each of which was manually annotated using rhetorical moves as pragmatic units. Tailor-made online annotation tools were created to ensure the quality of the annotations. Specialist informants and double annotators were used to verify annotation accuracy.

Tailor-made scripts were created to identify, count, classify and extract the annotated rhetorical moves. The individual moves within each discipline were counted. The frequencies were compared and contrasted. Patterns for adjacent pairs of rhetorical moves and the full rhetorical move sequences were investigated. The permutations of rhetorical move sequences for each abstract were extracted. These sequence permutations were also compared and contrasted, revealing three dimensions in which differences occur: linearity, cyclicity and variation.
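The counting steps described above can be illustrated with a minimal sketch. It is not the thesis's own scripts: the toy data, the move labels as plain strings, the helper name `collapse`, and the decision to merge consecutive sentences carrying the same move are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy data: each abstract is a list of move labels, one per sentence.
abstracts = [
    ["INTRODUCTION", "PURPOSE", "METHOD", "RESULT", "DISCUSSION"],
    ["PURPOSE", "METHOD", "RESULT", "METHOD", "RESULT"],
    ["INTRODUCTION", "PURPOSE", "METHOD", "METHOD", "RESULT", "DISCUSSION"],
]

def collapse(moves):
    """Merge consecutive sentences that carry the same move label (an assumption)."""
    out = []
    for move in moves:
        if not out or out[-1] != move:
            out.append(move)
    return out

# Frequency of each individual move, counted per sentence.
move_counts = Counter(move for abstract in abstracts for move in abstract)

# Frequency of adjacent pairs of moves within each collapsed sequence.
pair_counts = Counter(
    pair
    for abstract in abstracts
    for pair in zip(collapse(abstract), collapse(abstract)[1:])
)

# Frequency of each full move sequence across the corpus.
sequence_counts = Counter(tuple(collapse(abstract)) for abstract in abstracts)
```

Counting full sequences as tuples makes each distinct permutation a dictionary key, so the number of permutations in a discipline is simply `len(sequence_counts)` for that discipline's abstracts.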

Analysis showed substantial differences in usage among the scientific disciplines. Permutations of move sequences varied in terms of linearity, which was based on the assumption of an expected order of INTRODUCTION, PURPOSE, METHOD, RESULTS followed by DISCUSSION. Some move sequences included the fronting of rhetorical moves, such as placing the RESULT MOVE before the METHOD MOVE. Cyclicity was present in disciplines that were concerned with the development and evaluation of algorithms and artifacts. In these disciplines, adjacent pairs of moves were often repeated, such as METHOD-RESULT, METHOD-RESULT, with the first pair of moves describing the development phase and the second describing the evaluation phase. The third dimension was the variation of permutations found within each discipline. This study found immense variation in applied scientific and engineering disciplines, in stark contrast to the formulaic abstracts typical of medical research. Slightly under 200 different move sequence permutations were uncovered in the corpus. Analysis was conducted to investigate the similarities and differences in rhetorical organization on the three dimensions of linearity, cyclicity and variation. Based on these results, a Borromean Rings framework was devised to map the disciplinary generic conventions of rhetorical organization of research abstracts onto a linguistic landscape with three dimensions. The theoretical implications and practical application of these results are elucidated.
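One way to make the linearity and cyclicity dimensions concrete is sketched below. The operational definitions are assumptions for illustration, not the thesis's own measures: a sequence counts as linear if it never steps backwards in the expected order, and as cyclic if any adjacent pair of moves recurs. The label `RESULT` stands in for the RESULTS move, and the function names are hypothetical.

```python
from collections import Counter

# Expected order taken from the abstract; "RESULT" is used as the label here.
EXPECTED_ORDER = ["INTRODUCTION", "PURPOSE", "METHOD", "RESULT", "DISCUSSION"]
RANK = {move: index for index, move in enumerate(EXPECTED_ORDER)}

def is_linear(sequence):
    """True if the moves never step backwards relative to the expected order."""
    ranks = [RANK[move] for move in sequence]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))

def repeated_pairs(sequence):
    """Return the adjacent pairs of moves that occur more than once."""
    pairs = Counter(zip(sequence, sequence[1:]))
    return {pair for pair, count in pairs.items() if count > 1}

fronted = ["RESULT", "METHOD", "DISCUSSION"]        # RESULT placed before METHOD
cyclic = ["METHOD", "RESULT", "METHOD", "RESULT"]   # development, then evaluation

print(is_linear(fronted))      # False: METHOD follows RESULT
print(repeated_pairs(cyclic))  # {('METHOD', 'RESULT')}
```

Under these definitions a cyclic sequence is always non-linear, so the two checks are related rather than independent. The third dimension, variation, would be the count of distinct sequences per discipline.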

Lexical realization within moves within disciplines was also investigated using keyness and grammatical tenses as proxies for lexis and grammar. Cluster analysis was used to reduce the dimensionality and identify the extent to which keyness and grammatical tense usage are move-specific and/or discipline-specific. The cluster analysis grouped the ten disciplines. The resultant clusters were very similar to the results of analysis using the Borromean Rings framework. The main difference was that cluster analysis classified the disciplines slightly more finely in one branch of the dendrogram. Dispersion of key words varied greatly across moves and disciplines.
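The abstract does not state which keyness statistic was used. As an illustration only, the sketch below computes the log-likelihood statistic that is widely used for keyness in corpus linguistics, comparing a word's frequency in one move (the target) against the rest of the corpus (the reference). The function name, parameter names and example counts are hypothetical.

```python
import math

def log_likelihood(freq_target, freq_ref, size_target, size_ref):
    """Log-likelihood keyness of a word in a target corpus versus a reference corpus.

    freq_* are the word's raw frequencies; size_* are total token counts.
    """
    combined = freq_target + freq_ref
    total = size_target + size_ref
    # Expected frequencies if the word were spread evenly across both corpora.
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    for observed, expected in ((freq_target, expected_target), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score

# Hypothetical counts: a word occurring 30 times in 1,000 tokens of one move
# and 10 times in 1,000 tokens of all other moves.
score = log_likelihood(30, 10, 1000, 1000)
```

A score above roughly 3.84 corresponds to p < 0.05 with one degree of freedom, although keyness studies often apply a stricter cut-off.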

Both key words and grammatical tenses showed move-specific and discipline-specific collocations and colligations. Disciplinary variation is pervasive, but patterns of collocation and colligation are perceptible. Knowledge of these patterns can help novice writers of scientific research abstracts climb the cline of competence and learn how to draft abstracts that meet the generic expectations of their community of practice.
Date of Award: Mar 2021
Original language: English
Supervisor: Krzysztof Kredens
