Phoneme Aware Speech Synthesis via Fine Tune Transfer Learning with a Tacotron Spectrogram Prediction Network

Jordan J. Bird, Anikó Ekárt, Diego R. Faria

Research output: Chapter in Book/Published conference output > Conference publication


The implications of realistic human speech imitation are both promising and potentially dangerous. In this work, a pre-trained Tacotron Spectrogram Feature Prediction Network is fine-tuned with two 1.6 h speech datasets for 100,000 learning iterations, producing two individual models. The two speech datasets are identical in content apart from their textual representation: one follows standard written English, whereas the second is an English phonetic representation, which allows the effect of representation on the learning process to be studied. To test imitative abilities post-training, thirty lines of speech are recorded from a human to be imitated. The models then attempt to produce these voice lines themselves, and the acoustic fingerprints of the outputs are compared to those of the real human speech. On average, English notation achieves 27.36% similarity to the human speaker, whereas phonetic English notation achieves 35.31%. This suggests that representing English through the International Phonetic Alphabet provides more useful training data than written English. These experiments therefore suggest that a phonetic-aware paradigm would improve speech synthesis in much the same way as it has improved speech recognition.
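The dataset setup and the averaged comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the word-to-IPA lookup `TOY_IPA`, the helper names `to_phonetic`, `build_transcript_pair`, and `mean_similarity`, and the example sentence are all hypothetical, and the record does not say which phonetic dictionary or fingerprinting tool was used.

```python
# Illustrative sketch only; every name below is an assumption, not the paper's code.

# Hypothetical two-word lexicon standing in for a full English-to-IPA dictionary.
TOY_IPA = {
    "hello": "həˈloʊ",
    "world": "wɜːld",
}


def to_phonetic(line, lexicon=TOY_IPA):
    """Replace each known word with its IPA form; unknown words are kept as-is."""
    return " ".join(lexicon.get(word.lower(), word) for word in line.split())


def build_transcript_pair(lines):
    """Return (written English, phonetic English) pairs for the same content."""
    return [(line, to_phonetic(line)) for line in lines]


def mean_similarity(scores):
    """Average a list of per-line similarity percentages."""
    if not scores:
        raise ValueError("at least one similarity score is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for english, phonetic in build_transcript_pair(["Hello world"]):
        print(english, "|", phonetic)
```

Each pair shares the same audio clip and differs only in its text, which is the property the two fine-tuning datasets are described as having. Given per-line similarity scores from whatever fingerprint comparison is used, `mean_similarity` would produce the kind of per-model average reported in the abstract.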
Original language: English
Title of host publication: Advances in Computational Intelligence Systems - Contributions Presented at the 19th UK Workshop on Computational Intelligence, 2019
Editors: Zhaojie Ju, Dalin Zhou, Alexander Gegov, Longzhi Yang, Chenguang Yang
Number of pages: 12
ISBN (Electronic): 978-3-030-29933-0
ISBN (Print): 978-3-030-29932-3
Publication status: Published - 30 Aug 2019
Event: 19th UK Workshop on Computational Intelligence: UKCI 2019 - Portsmouth, United Kingdom
Duration: 4 Sept 2019 to 6 Sept 2019

Publication series

Name: Advances in Intelligent Systems and Computing
ISSN (Print): 2194-5357
ISSN (Electronic): 2194-5365


Conference: 19th UK Workshop on Computational Intelligence
Country/Territory: United Kingdom


  • Fine tune learning
  • Fingerprint analysis
  • Phonetic awareness
  • Speech synthesis
  • Tacotron

