AI is great, isn’t it? Tone direction and illocutionary force delivery of tag questions in Amazon’s AI NTTS Polly

Author: Alfonso Carlos Rodríguez Fernández-Peña

Affiliation: Universidad de Oviedo, Oviedo, Spain (ROR https://ror.org/006gksa02)

Journal: Estudios de fonética experimental

ISSN: 2385-3573, 1575-5533

Year of publication: 2023

Issue: 32

Pages: 227-242

Type: Article

DOI: 10.1344/EFE-2023-32-227-242 (open access)


Abstract

This work provides a descriptive analysis of tone direction, and of the illocutionary force it carries, in question tags delivered by Polly, Amazon’s neural text-to-speech (NTTS) system. We included three types of tag questions, namely reverse-polarity tags (both positive and negative), copy tags and command tags, with 10 input sentences for each. The data comprised 600 utterances produced by the British and American English voices available in Amazon’s NTTS at the time of the study. The audio files were examined with the speech analysis software Praat to identify the tone pattern of each utterance and to check whether it conveyed the intended illocutionary force. The results show that Amazon’s AI speech synthesis technology is not yet fully reliable when rendering features of natural spontaneous speech such as question tags: it produces a high rate of utterances carrying an unintended pragmatic load.
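The abstract describes identifying the tone pattern of each tag by inspecting the audio in Praat. Direction matters because, in standard descriptions of English intonation, a falling reverse-polarity tag is usually read as seeking agreement and a rising one as a genuine question. As a rough illustration only, the sketch below shows how an automatic first pass might label the net pitch movement on a tag as rising, falling or level from a list of F0 values (for example, values exported from a pitch track). The semitone conversion is standard; the least-squares slope heuristic, the 1-semitone threshold and the function names are assumptions made for this example and are not the procedure used in the article.

```python
import math
from typing import Sequence


def hz_to_semitones(f0_hz: float, ref_hz: float) -> float:
    """Express a frequency in semitones relative to a reference frequency."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def tone_direction(f0_hz: Sequence[float], threshold_st: float = 1.0) -> str:
    """Label the net pitch movement of a tag as 'rise', 'fall' or 'level'.

    Assumptions (illustrative, not taken from the article):
    - f0_hz holds evenly spaced F0 samples covering the tag only;
    - values <= 0 mark unvoiced frames and are skipped;
    - a net change of at least threshold_st semitones counts as a movement.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced F0 samples")

    # Semitones relative to the first voiced sample, so the measure does not
    # depend on the voice's overall pitch level in Hz.
    st = [hz_to_semitones(f, voiced[0]) for f in voiced]

    # Ordinary least-squares slope of semitones against frame index.
    n = len(st)
    mean_x = (n - 1) / 2
    mean_y = sum(st) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(st))
    den = sum((i - mean_x) ** 2 for i in range(n))
    # Net change predicted by the fitted line across the whole tag.
    net_change = num / den * (n - 1)

    if net_change >= threshold_st:
        return "rise"
    if net_change <= -threshold_st:
        return "fall"
    return "level"


if __name__ == "__main__":
    print(tone_direction([200, 210, 220, 235]))     # rise
    print(tone_direction([240, 220, 0, 200, 180]))  # fall (0 = unvoiced frame)
```

Because it reduces the contour to a single net direction, this heuristic would flatten complex tones such as fall-rise or rise-fall, which is one reason inspection of each utterance in Praat, as described in the abstract, remains the more reliable check.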

Bibliographic References

  • Boersma, P., & Weenink, D. (1992–2023). Praat: Doing phonetics by computer (Version 6.3.18) [Computer program]. http://www.praat.org/
  • Cattell, R. (1973). Negative transportation and tag questions. Language, 49, 612–639. https://doi.org/10.2307/412354
  • Cohen, M. H., Giangola, J. P., & Balogh, J. (2004). Voice user interface design. Addison-Wesley Professional.
  • Collins, B., & Mees, I. M. (2013). Practical phonetics and phonology: A resource book for students. Routledge. https://doi.org/10.4324/9780203080023
  • Cruttenden, A. (2014). Gimson’s Pronunciation of English. Routledge. https://doi.org/10.4324/9780203784969
  • Estebas Vilaplana, E. (2014). Teach yourself English pronunciation: An interactive course for Spanish speakers. Universidad Nacional de Educación a Distancia.
  • Gómez González, M. A., & Sánchez Roura, M. T. (2016). English pronunciation for speakers of Spanish: from theory to practice. Walter de Gruyter. https://doi.org/10.1515/9781501510977
  • Kay, P. (2006). Pragmatic aspects of grammatical constructions. In L. R. Horn & G. Ward (Eds.), The handbook of pragmatics (pp. 675–700). Blackwell Publishing.
  • Kim, H., Kim, S., & Yoon, S. (2022). Guided-TTS: A diffusion model for text-to-speech via classifier guidance. Proceedings of Machine Learning Research, 162 [Proceedings of the 39th International Conference on Machine Learning], 11119–11133.
  • Kons, Z., Shechtman, S., Sorin, A., Hoory, R., Rabinovitz, C., & da Silva Morais, E. (2018). Neural TTS voice conversion. In 2018 IEEE Spoken Language Technology Workshop (SLT) (pp. 290–296). IEEE. https://doi.org/10.1109/SLT.2018.8639550
  • Lakoff, R. (1969). A syntactic argument for negative transportation. In R. I. Binnick, A. Davidson, G. M. Green, & J. L. Morgan (Eds.), Papers from the 5th Regional Meeting of the Chicago Linguistic Society (pp. 140–147). Department of Linguistics, University of Chicago.
  • Leech, G., & Svartvik, J. (1994). A communicative grammar of English. Longman. https://doi.org/10.4324/9781315836041
  • Mateo, M. (2014). Exploring pragmatics and phonetics for successful translation. VIAL (Vigo International Journal of Applied Linguistics), 11, 111–135.
  • McCawley, J. D. (1988). The syntactic phenomena of English. University of Chicago Press.
  • Mott, B. (2011). English phonetics and phonology for Spanish speakers. Publicacions i Edicions de la Universitat de Barcelona.
  • Parrott, M. (2010). Grammar for English language teachers. Cambridge University Press. https://doi.org/10.1017/9781009406536
  • Roach, P. (2009). English phonetics and phonology: A practical course. Cambridge University Press.
  • Rodríguez Fernández-Peña, A. C. (2022). La equivalencia pragmática de las 3Ts en inglés y español. LynX: Panorámica de estudios lingüísticos, Extra 25 [Gramática Contrastiva: Métodos y Perspectivas, ed. M. A. Lledó], 177–218.
  • Sadock, J. M. (1974). Toward a linguistic theory of speech acts. Academic Press.
  • Shen, J., Pang, R., Weiss, R. J., Schuster, M., Jaitly, N., Yang, Z., Chen, Z., Zhang, Y., & Skerry-Ryan, R. (2018). Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4779–4783). IEEE. https://doi.org/10.1109/ICASSP.2018.8461368
  • Swan, M. (2005). Practical English usage. Oxford University Press.
  • Tench, P. (2009). The pronunciation of grammar [Conference presentation]. 3rd International Congress on English Grammar. Salem, TN, India.
  • Thomson, A. J., & Martinet, A. V. (1986). A practical English grammar. Oxford University Press.
  • van den Oord, A., Vinyals, O., & Kavukcuoglu, K. (2017). Neural discrete representation learning. In I. Guyon, U. von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing systems 30 (NIPS 2017) (pp. 6306–6315). Curran Associates Inc.
  • Vince, M., & Emmerson, P. (2003). First Certificate language practice. Macmillan Education.
  • Wells, J. C. (2006). English intonation: An introduction. Cambridge University Press.