Cross-genre individual variation in language use: A study of 112 idiolects

Research output: Unpublished contribution to conference › Unpublished Conference Paper › peer-review

Abstract

The idea of authorship attribution is based on two assumptions: (i) that some language users have unique linguistic styles, or quantifiable ‘idiolects’, and (ii) that features characteristic of those styles are likely to recur with a relatively stable frequency in an individual’s linguistic output. Studies of individual linguistic variation tend to use sociolinguistically homogeneous data focusing on one genre, and the few existing cross-genre studies are typically limited to two genres (e.g. Kestemont et al. 2012; Stamatatos 2013). The study reported in this paper takes a different approach: one hundred and twelve participants shared with us natural language samples from six discourse types. We collected emails, text messages, university essays, oral interview data, oral image description data, and digital data of Google search behaviour. Each participant’s dataset thus spans a wide range not only of genres but also of communication channels, contexts, and language input modes. The individual datasets consist of roughly 10,000 words each, amounting to a total corpus size of over a million words. Using stylometric classification tools, we measured within-author and between-author variability and obtained results indicating very low levels of individual stability across genres. We offer a sociolinguistically-based interpretation of the results and discuss their implications for forensic authorship analysis.

References:
Kestemont, M., Luyckx, K., Daelemans, W. and Crombez, T., 2012. Cross-genre authorship verification using unmasking. English Studies, 93(3), pp.340-356.

Stamatatos, E., 2013. On the robustness of authorship attribution based on character n-gram features. Journal of Law and Policy, 21(2), pp.421-439.
Original language: English
Publication status: Published - 18 Jul 2022
Event: Fourth European Conference of the International Association of Forensic and Legal Linguistics - Porto, Portugal
Duration: 18 Jul 2022 to 21 Jul 2022

Conference

Conference: Fourth European Conference of the International Association of Forensic and Legal Linguistics
Country/Territory: Portugal
City: Porto
Period: 18/07/22 to 21/07/22

Keywords

  • forensic linguistics
  • authorship analysis
  • idiolect
