The impact in forensic voice comparison of lack of calibration and of mismatched conditions between the known-speaker recording and the relevant-population sample recordings

Research output: Contribution to journal › Article › peer-review

Abstract

In a 2017 New South Wales case, a forensic practitioner conducted a forensic voice comparison using a Gaussian mixture model – universal background model (GMM-UBM). The practitioner did not report the results of empirical tests of the performance of this system under conditions reflecting those of the case under investigation. The practitioner trained the model for the numerator of the likelihood ratio using the known-speaker recording, but trained the model for the denominator of the likelihood ratio (the UBM) using high-quality audio recordings, not recordings which reflected the conditions of the known-speaker recording. There was therefore a difference in the mismatch between the numerator model and the questioned-speaker recording versus the mismatch between the denominator model and the questioned-speaker recording. In addition, the practitioner did not calibrate the output of the system. The present paper empirically tests the performance of a replication of the practitioner’s system. It also tests a system in which the UBM was trained on known-speaker-condition data and which was empirically calibrated. The performance of the former system was very poor, and the performance of the latter was substantially better.
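The roles of the numerator model, the denominator model (the UBM), and calibration can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the practitioner's system or the paper's replication: features are one-dimensional rather than multivariate, the models are passed in as hand-written parameters rather than trained, and the function names (`gmm_loglik`, `score`, `fit_calibration`) and all numbers are hypothetical. Calibration is sketched as a logistic-regression shift and scale of the raw score, fitted with equal weight on same-speaker and different-speaker training scores.

```python
import math

def gmm_loglik(x, weights, means, variances):
    """Mean per-frame log-likelihood of 1-D features x under a 1-D GMM."""
    total = 0.0
    for xi in x:
        comps = [
            math.log(w) - 0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for w, m, v in zip(weights, means, variances)
        ]
        mx = max(comps)  # log-sum-exp for numerical stability
        total += mx + math.log(sum(math.exp(c - mx) for c in comps))
    return total / len(x)

def score(x, speaker_gmm, ubm):
    """Uncalibrated score: log-likelihood under the known-speaker model
    (numerator) minus log-likelihood under the UBM (denominator)."""
    return gmm_loglik(x, *speaker_gmm) - gmm_loglik(x, *ubm)

def fit_calibration(same_scores, diff_scores, lr=0.05, steps=5000):
    """Fit log LR = a * score + b by logistic regression (gradient descent),
    weighting the same-speaker and different-speaker classes equally."""
    a, b = 1.0, 0.0
    for _ in range(steps):
        ga = gb = 0.0
        for scores, y in ((same_scores, 1.0), (diff_scores, 0.0)):
            for s in scores:
                p = 1.0 / (1.0 + math.exp(-(a * s + b)))
                ga += (p - y) * s / len(scores)
                gb += (p - y) / len(scores)
        a -= lr * ga
        b -= lr * gb
    return a, b

# Hypothetical models as (weights, means, variances) and toy questioned-speaker features.
speaker_gmm = ([1.0], [0.0], [1.0])
ubm = ([0.5, 0.5], [-3.0, 3.0], [1.0, 1.0])
questioned = [0.0, 0.1, -0.2]
raw = score(questioned, speaker_gmm, ubm)

# Hypothetical scores from test pairs recorded under case-like conditions.
same_scores = [2.0, 3.0, 1.0, 4.0, -0.5]
diff_scores = [-3.0, -1.0, -2.0, 0.5, -4.0]
a, b = fit_calibration(same_scores, diff_scores)
calibrated_log_lr = a * raw + b  # natural-log likelihood ratio
```

In this sketch the calibration scores would need to come from recordings that reflect the case conditions; fitting `a` and `b` on mismatched data would reproduce the kind of problem the abstract describes.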
Original language: English
Pages (from-to): e1-e7
Journal: Forensic Science International
Volume: 283
Early online date: 19 Dec 2017
Publication status: Published - 1 Feb 2018

Bibliographical note

© 2017, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International.

Keywords

  • Forensic voice comparison
  • Automatic speaker recognition
  • GMM-UBM
  • Likelihood ratio
  • Validation
  • Calibration
  • Admissibility
