The Spoken BNC2014: Designing and building a spoken corpus of everyday conversations

Robbie Love, Claire Dembry, Andrew Hardie, Vaclav Brezina, Tony McEnery

Research output: Contribution to journal › Article › peer-review


This paper introduces the Spoken British National Corpus 2014 (Spoken BNC2014), an
11.5-million-word corpus of orthographically transcribed conversations
among L1 speakers of British English from across the UK, recorded between
2012 and 2016. After showing, through a survey of the recent history of corpora of
spoken British English, that the compilation of this new corpus is justified, we describe
the main stages of the Spoken BNC2014's creation: design, data and metadata
collection, transcription, XML encoding, and annotation. In doing so, we aim
to (i) encourage users of the corpus to approach the data with sensitivity to the
many methodological issues we identified and attempted to overcome while compiling
the Spoken BNC2014, and (ii) inform (future) compilers of spoken corpora of the
innovations we implemented in an attempt to make the construction of corpora
representing spontaneous speech in informal contexts more tractable, both
logistically and practically, than in the past.
Original language: English
Pages (from–to): 319–344
Journal: International Journal of Corpus Linguistics
Issue number: 3
Publication status: Published, 31 Dec 2017

Bibliographical note

© John Benjamins Publishing Company
This is an open access article under a CC BY license.



Cite this