Evaluating collocation in spoken dialogic corpora

  • Love, R. (Speaker)
  • Isobelle Clarke (Speaker)
  • Mark McGlashan (Speaker)

Activity: Talk or presentation types: Oral presentation


The study of collocation is a fundamental approach in the corpus linguistics toolkit. Many studies use the ‘collocation window’ method, which measures collocation within a range, for example, five words to the left (5L) and right (5R) of the given node word (Gablasova et al., 2017: 158). This approach is facilitated by mainstream corpus tools – for example Sketch Engine – that allow the user to define the collocational span according to their research interests.

At a span of 5L and 5R, the tool searches for collocational patterns within strings of up to 11 tokens in length (five either side of the node, plus the node). For corpora where each text is individually authored, collocation windows occur within texts. Yet, when files within corpora consist of multiple texts authored by multiple authors, some collocation windows occur across text boundaries. Dialogic spoken corpora are a case in point, tending to comprise transcripts of conversations between two or more speakers. In this context, each corpus text contains many utterance boundaries as the speakers produce turns. In casual conversation, speaker turns may routinely amount to just a few tokens in length; therefore, the collocation window method – when applied to dialogic corpora – routinely searches windows that span across utterance boundaries. The outcome of this is that collocations ‘co-produced’ by two speakers (producing one word each of a collocation pair) are in no way distinguished from collocations produced solely by individual speakers. Consequently, in a case study of collocation in the Spoken BNC2014, this talk shows the effect of restricting the collocational span to utterance boundaries in comparison to spans of 5L and 5R.
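The contrast described above can be illustrated with a minimal sketch. It is not the authors' procedure or the implementation used by any corpus tool; the function name `window_collocates`, the `respect_boundaries` flag, and the toy transcript are all invented for illustration. It counts raw co-occurrences only and applies no association measure.

```python
from collections import Counter


def window_collocates(utterances, node, span=5, respect_boundaries=False):
    """Count tokens co-occurring with `node` within `span` tokens either side.

    utterances: list of (speaker, tokens) pairs in transcript order.
    respect_boundaries: if True, each window is clipped to the utterance
    containing the node; if False, windows run over the whole transcript.
    """
    counts = Counter()
    if respect_boundaries:
        # Treat each utterance as its own search space.
        segments = [tokens for _, tokens in utterances]
    else:
        # Flatten the transcript so windows can cross speaker turns.
        segments = [[tok for _, tokens in utterances for tok in tokens]]
    for tokens in segments:
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts


# Invented toy transcript (hypothetical data, not from the Spoken BNC2014).
transcript = [
    ("A", ["do", "you", "want", "a", "cup", "of"]),
    ("B", ["tea", "yes", "please"]),
    ("A", ["strong", "tea", "or", "weak"]),
]

across = window_collocates(transcript, "tea", span=5)
within = window_collocates(transcript, "tea", span=5, respect_boundaries=True)
```

In this toy data, "cup" is counted as a collocate of "tea" only when the window is allowed to cross the turn boundary, because speaker A produces "cup" and speaker B produces "tea". Restricting the window to the utterance removes that co-produced pair while keeping "strong", which the same speaker produces alongside "tea".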

Gablasova, D., Brezina, V., & McEnery, A. M. (2017). Collocations in corpus-based language learning research: identifying, comparing and interpreting the evidence. Language Learning, 67(Suppl. 1), 155-179. https://doi.org/10.1111/lang.12225
Period: 3 Jul 2023 to 6 Jul 2023
Event title: Corpus Linguistics International Conference 2023
Event type: Conference
Conference number: 12
Location: Lancaster, United Kingdom
Degree of Recognition: International