Biography
João Cabral is a research fellow at Trinity College Dublin, in the School of Computer Science and Statistics, as part of the ADAPT Centre. He received B.Sc. and M.Sc. degrees in Electrical and Computer Engineering from Instituto Superior Técnico (IST), Lisbon, Portugal, in 2003 and 2006, respectively. He was awarded a Ph.D. degree in Computer Science and Informatics from the University of Edinburgh, U.K., in 2010, funded by a European Commission Marie Curie Fellowship under the Early Stage Research Training (EST) scheme. Before joining Trinity College Dublin in 2013, he worked from 2010 as a postdoctoral research fellow at University College Dublin, as part of the CNGL research centre.
Publications and Further Research Outputs
Peer-Reviewed Publications
Darragh Higgins, Katja Zibrek, Joao Cabral, Donal Egan, Rachel McDonnell, Sympathy for the digital: Influence of synthetic voice on affinity, social presence and empathy for photorealistic virtual humans, Computers & Graphics, 2022
Katja Zibrek, Joao Cabral, Rachel McDonnell, Does Synthetic Voice Alter Social Response to a Photorealistic Character in Virtual Reality?, Motion, Interaction and Games (MIG), Virtual Event, Switzerland, Association for Computing Machinery, 2021, pp1 - 6
Beatriz Raposo de Medeiros, João Paulo Cabral, Alexsandro R. Meireles, and Andre A. Baceti, A comparative study of fundamental frequency stability between speech and singing, Speech Communication, 128, 2021, p15 - 23
João P. Cabral and Alexsandro R. Meireles, Transformation of voice quality in singing using glottal source features, Workshop on Speech, Music and Mind 2019 (SMM 2019), Vienna, Austria, 14 September 2019, ISCA, 2019, pp31 - 35
Leigh Clark, Philip Doyle, Diego Garaialde, Emer Gilmartin, Stephan Schlögl, Jens Edlund, Matthew Aylett, Cosmin Munteanu, João P. Cabral, and Benjamin R. Cowan, The State of Speech in HCI: Trends, Themes and Challenges, Interacting with Computers, 2019
Benjamin R. Cowan, Philip Doyle, Justin Edwards, Diego Garaialde, Ali Hayes-Brady, Holly P. Branigan, João Cabral, Leigh Clark, What's in an accent? The impact of accented synthetic speech on lexical choice in human-machine dialogue, the 1st International Conference on Conversational User Interfaces, Dublin, Ireland, 2019
João P. Cabral, Estimation of the Asymmetry Parameter of the Glottal Flow Waveform Using the Electroglottographic Signal, INTERSPEECH 2018, Hyderabad, India, 2-6 September, 2018
Beatriz R. de Medeiros and João P. Cabral, Acoustic distinctions between speech and singing: Is singing acoustically more stable than speech?, Speech Prosody, Poznań, Poland, 13-16 June, 2018, pp542 - 546
Leigh Clark, João Cabral, Benjamin Cowan, The CogSIS Project: Examining the Cognitive Effects of Speech Interface Synthesis, British Human Computer Interaction Conference, Belfast, 2-6 July, 2018
João P. Cabral, Benjamin R. Cowan, Katja Zibrek, Rachel McDonnell, The Influence of Synthetic Voice on the Evaluation of a Virtual Character, Interspeech 2017, Stockholm, Sweden, 20-24 August, ISCA, 2017, pp229 - 233
João P. Cabral, Christian Saam, Eva Vanmassenhove, Stephen Bradley, Fasih Haider, The ADAPT entry to the Blizzard Challenge 2016, Blizzard Challenge 2016 Workshop, Cupertino, CA, USA, 2016
Eva Vanmassenhove, João P. Cabral, Fasih Haider, Prediction of Emotions from Text using Sentiment Analysis for Expressive Speech Synthesis, 9th ISCA Workshop on Speech Synthesis, Sunnyvale, CA, USA, 13-15 September, 2016, pp22 - 27
João P. Cabral, Yuyun Huang, Christy Elias, Ketong Su and Nick Campbell, Interface for Monitoring of Engagement from Audio-Visual Cues, The 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing, Vienna, Austria, 11-13 September, ISCA, 2015
Séamus Lawless, Peter Lavin, Mostafa Bayomi, João P. Cabral and M. Rami Ghorab, Text Summarization and Speech Synthesis for the Automated Generation of Personalized Audio Presentations, 20th International Conference on Application of Natural Language to Information Systems (NLDB), Passau, Germany, June 17-19, Springer, 2015, pp307 - 320
Christy Elias, João P. Cabral and Nick Campbell, Audio features for the Classification of Engagement, Workshop on Engagement in Social Intelligent Virtual Agents, Delft, Netherlands, 25th August 2015, 2015, pp8 - 12
Yuyun Huang, Christy Elias, João P. Cabral, Atul Nautiyal, Christian Saam and Nick Campbell, Towards Classification of Engagement in Human Interaction with Talking Robots, Communications in Computer and Information Science, 17th International Conference on Human-Computer Interaction, Los Angeles, USA, 2-7 August 2015, 528, Springer, 2015, pp741 - 746
Éva Székely, Zeeshan Ahmed, Shannon Hennig, João P. Cabral and Julie Carson-Berndsen, Predicting synthetic voice style from facial expressions. An application for augmented conversations, Speech Communication, 57, 2014, p63 - 75
Zeeshan Ahmed and João P. Cabral, HMM-Based Speech Synthesiser For The Urdu Language, Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU), Saint Petersburg, Russia, 14 May, 2014, pp92 - 97
João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, Glottal Spectral Separation for Speech Synthesis, IEEE Journal of Selected Topics in Signal Processing: Special Issue on Statistical Parametric Speech Synthesis, 8, (2), 2014, p195 - 208
João P. Cabral, Nick Campbell, Sree Ganesh, Mina Kheirkhah, Emer Gilmartin, Fasih Haider, Eamonn Kenny, Andrew Murphy, Neasa Ní Chiaráin, Thomas Pellegrini and Odei Rey Orozko, MILLA: A Multimodal Interactive Language Learning Agent, SemDial 2014, Edinburgh, United Kingdom, September 1st-3rd, 2014
João P. Cabral, Uniform Concatenative Excitation Model for Synthesising Speech without Voiced/Unvoiced Classification, INTERSPEECH, Lyon, France, August 2013, International Speech Communication Association (ISCA), 2013, pp1082 - 1085
João P. Cabral and Julie Carson-Berndsen, Towards a Better Representation of Glottal Pulse Shape Characteristics in Modelling the Envelope Modulation of Aspiration Noise, Lecture Notes in Computer Science: Advances in Nonlinear Speech Processing, NOLISP International Conference, Mons, Belgium, 19-21 June, edited by Thomas Drugman and Thierry Dutoit, 7911, 2013, pp67 - 74
Udochukwu Kalu Ogbureke, João P. Cabral and Julie Carson-Berndsen, Explicit Duration Modelling in HMM-based Speech Synthesis using a Hybrid Hidden Markov Model-Multilayer Perceptron, SAPA - SCALE Conference, Portland, USA, 7-8 September, 2012
Éva Székely, Zeeshan Ahmed, João P. Cabral and Julie Carson-Berndsen, WinkTalk: A Demonstration of a Multimodal Speech Synthesis Platform Linking Facial Expressions to Expressive Synthetic Voices, The Third Workshop on Speech and Language Processing for Assistive Technologies, Montreal, Canada, 7 June, Association for Computational Linguistics, 2012, pp5 - 8
Amalia Zahra, João P. Cabral, Mark Kane and Julie Carson-Berndsen, Automatic Classification of Pronunciation Errors Using Decision Trees and Speech Recognition Technology, International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden, 6-8 June, 2012, pp65 - 69
João P. Cabral, Mark Kane, Zeeshan Ahmed, Mohamed Abou-Zleikha, Éva Székely, Amalia Zahra, Udochukwu Kalu Ogbureke, Peter Cahill, Julie Carson-Berndsen, Stephan Schlögl, Rapidly Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz, International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey, 21-27 May, 2012, pp23 - 25
Udochukwu Kalu Ogbureke, João P. Cabral and Julie Carson-Berndsen, Explicit Duration Modelling in HMM-based Speech Synthesis Using Continuous Hidden Markov Model, International Conference on Information Sciences, Signal Processing and their Applications (ISSPA 2012), Montreal, Canada, 3-5 July, IEEE, 2012, pp700 - 705
Udochukwu Kalu Ogbureke, João P. Cabral and Julie Carson-Berndsen, Using Noisy Speech to Study the Robustness of a Continuous F0 Modelling Method in HMM-based Speech Synthesis, International Conference on Speech Prosody, Shanghai, China, 22-25 May, 2012, pp67 - 70
Udochukwu Kalu Ogbureke, João P. Cabral and Julie Carson-Berndsen, Using multilayer perceptron for voicing strength estimation in HMM-based speech synthesis, International Conference on Information Sciences, Signal Processing and their Applications (ISSPA 2012), Montreal, Canada, 2-5 July, IEEE, 2012, pp683 - 688
Mark Kane, João P. Cabral, Amalia Zahra and Julie Carson-Berndsen, Introducing Difficulty-Levels in Pronunciation Learning, International Speech Communication Association Special Interest Group on Speech and Language Technology in Education (SLaTE), Venice, Italy, 24-26 August, International Speech Communication Association (ISCA), 2011, pp37 - 40
João P. Cabral, John Kane, Christer Gobl and Julie Carson-Berndsen, Evaluation of glottal epoch detection algorithms on different voice types, INTERSPEECH, Florence, Italy, 28-31 August, International Speech Communication Association (ISCA), 2011, pp1989 - 1992
Éva Székely, João P. Cabral, Peter Cahill and Julie Carson-Berndsen, Clustering Expressive Speech Styles in Audiobooks Using Glottal Source Parameters, INTERSPEECH, Florence, Italy, International Speech Communication Association (ISCA), 2011, pp2409 - 2412
João P. Cabral, Steve Renals, Junichi Yamagishi and Korin Richmond, HMM-based speech synthesiser using the LF-model of the glottal source, International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22-27 May, IEEE, 2011, pp4704 - 4707
João Paulo Cabral, HMM-based Speech Synthesis Using an Acoustic Glottal Source Model, The University of Edinburgh, 2010
João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, An HMM-based speech synthesiser using glottal post-filtering, 7th ISCA Speech Synthesis Workshop (SSW7), Kyoto, Japan, 2010, pp365 - 370
J. Sebastian Andersson, João P. Cabral, Leonardo Badino, Junichi Yamagishi and Robert A.J. Clark, Glottal source and prosodic prominence modelling in HMM-based speech synthesis for the Blizzard Challenge 2009, The Blizzard Challenge 2009, Edinburgh, UK, 4 September, 2009
João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, HMM-based speech synthesis with an acoustic glottal source model, The First Young Researchers Workshop in Speech Technology, Dublin, Ireland, 25 April, 2009
João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, Glottal Spectral Separation for Parametric Speech Synthesis, INTERSPEECH 2008, Brisbane, Australia, 22-26 September, International Speech Communication Association (ISCA), 2008, pp1829 - 1832
Guilherme Raimundo, João P. Cabral, Celso Melo, Luís C. Oliveira, Ana Paiva, Isabel Trancoso, Telling Stories with a Synthetic Character: Understanding Inter-modalities Relations, Lecture Notes in Computer Science: Verbal and Nonverbal Communication Behaviours, COST Action 2102 International Workshop on Verbal and Nonverbal Communication Behaviours, Vietri sul Mare, Italy, 29-31 March, 4775, Springer Berlin Heidelberg, 2007, pp310 - 323
João P. Cabral, Steve Renals, Korin Richmond and Junichi Yamagishi, Towards an Improved Modeling of the Glottal Source in Statistical Parametric Speech Synthesis, 6th ISCA Workshop on Speech Synthesis (SSW6), Bonn, Germany, 22-24 August, International Speech Communication Association (ISCA), 2007, pp113 - 118
João P. Cabral, Transforming Prosody and Voice Quality to Generate Emotions in Speech, Instituto Superior Técnico (IST), 2006
João P. Cabral and Luís C. Oliveira, EmoVoice: A System to Generate Emotions in Speech, INTERSPEECH, Pittsburgh, USA, 17-21 September, International Speech Communication Association (ISCA), 2006, pp1798 - 1801
João P. Cabral, Luís C. Oliveira, Guilherme Raimundo, Ana Paiva, What voice do we expect from a synthetic character?, 11th International Conference on Speech and Computer (SPECOM 2006), St. Petersburg, Russia, 26-29 June, 2006
João P. Cabral and Luís C. Oliveira, Pitch-Synchronous Time-Scaling for High-Frequency Excitation Regeneration, INTERSPEECH, Lisbon, Portugal, 4-8 September, International Speech Communication Association (ISCA), 2005, pp1513 - 1516
João P. Cabral and Luís C. Oliveira, Pitch-Synchronous Time-Scaling for Prosodic and Voice Quality Transformations, INTERSPEECH, Lisbon, Portugal, 4-8 September, International Speech Communication Association (ISCA), 2005, pp1137 - 1140
João P. Cabral, Evaluation of Methods for Excitation Regeneration in Bandwidth Extension of Speech, Instituto Superior Técnico (IST) and Royal Institute of Technology (KTH), 2003
Non-Peer-Reviewed Publications
Peter Cahill, Udochukwu Ogbureke, João Cabral, Éva Székely, Mohamed Abou-Zleikha, Zeeshan Ahmed and Julie Carson-Berndsen, UCD Blizzard Challenge 2011 Entry, Blizzard Challenge Workshop 2011, Turin, Italy, 2 September, 2011
Research Expertise
Description
My main research work is on Text-To-Speech synthesis (TTS) and the development of innovative commercial applications of this research, such as expressive AI voices for audiobooks, spoken dialogue systems, and animation. I am also interested in the analysis of emotion and affect in speech. I have extensive expertise in the analysis and modelling of glottal source parameters. These features are important in TTS for transforming voice quality, such as producing breathy or tense voices, and for conveying emotion. Other areas of expertise include speech signal processing, statistical learning algorithms for speech processing, and deep learning.
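For illustration, the glottal source modelling mentioned above can be sketched with the Liljencrants-Fant (LF) model, on which several of the publications listed here build. The following is a minimal Python sketch (assuming NumPy; the parameter values are illustrative defaults, not values taken from the publications) that generates one LF glottal flow derivative pulse. Loosely, a short return phase (small ta) gives a tenser, brighter voice quality, while a longer one gives a breathier quality:

    import numpy as np

    def lf_pulse(fs=16000, f0=110.0, Ee=1.0, tp=0.45, te=0.6, ta=0.03):
        """One Liljencrants-Fant (LF) glottal flow derivative pulse.
        tp, te and ta are timing parameters given as fractions of the
        period T0: tp is the instant of peak glottal flow, te the instant
        of the maximum negative flow derivative (-Ee), and ta the return
        phase time constant (illustrative default values)."""
        T0 = 1.0 / f0
        tp_s, te_s, ta_s, tc_s = tp * T0, te * T0, ta * T0, T0
        wg = np.pi / tp_s  # open-phase sinusoid frequency (flow peaks at tp)

        # Solve eps from eps*ta = 1 - exp(-eps*(tc - te)) by fixed-point iteration
        eps = 1.0 / ta_s
        for _ in range(100):
            eps = (1.0 - np.exp(-eps * (tc_s - te_s))) / ta_s

        # Area of the return phase (independent of alpha)
        t2 = np.linspace(te_s, tc_s, 4000)
        ret = -(Ee / (eps * ta_s)) * (np.exp(-eps * (t2 - te_s))
                                      - np.exp(-eps * (tc_s - te_s)))
        ret_area = np.sum(ret) * (t2[1] - t2[0])

        # Pick alpha by bisection so the full pulse integrates to ~zero,
        # i.e. the glottal flow returns to its baseline at closure
        def total_area(alpha):
            t1 = np.linspace(0.0, te_s, 4000)
            E0 = -Ee / (np.exp(alpha * te_s) * np.sin(wg * te_s))
            open_phase = E0 * np.exp(alpha * t1) * np.sin(wg * t1)
            return np.sum(open_phase) * (t1[1] - t1[0]) + ret_area

        lo, hi = -100.0, 20000.0  # heuristic bracket for these settings
        for _ in range(80):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if total_area(mid) > 0 else (lo, mid)
        alpha = 0.5 * (lo + hi)

        # Sample one period of the flow derivative
        t = np.arange(int(fs * T0)) / fs
        E0 = -Ee / (np.exp(alpha * te_s) * np.sin(wg * te_s))
        return np.where(t <= te_s,
                        E0 * np.exp(alpha * t) * np.sin(wg * t),
                        -(Ee / (eps * ta_s)) * (np.exp(-eps * (t - te_s))
                                                - np.exp(-eps * (tc_s - te_s))))

Concatenating such pulses at the target F0 and passing them through a vocal tract filter yields a crude source-filter synthesiser; the systems described in the publications above instead combine such source models with statistical parametric synthesis.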
Projects
- Title
- Expressive Speech Synthesis: VoiceTune
- Summary
- Research project to develop expressive Text-to-Speech commercial applications for industry. The project aims to validate a prototype product/service and demonstrate its commercial value to companies that need expressive AI voice solutions.
- Funding Agency
- Enterprise Ireland
- Date From
- 2020
- Date To
- 2022
- Title
- CogSIS - Cognitive Effects of Speech Interface Synthesis
- Summary
- Through the growth of intelligent personal assistants, pervasive and wearable computing, and robot-based technologies, speech interfaces are set to become a common dialogue partner. Technological challenges around the production of natural synthetic voices have been widely researched. Yet comparatively little is understood about how synthesis affects user experience, in particular how design decisions around naturalness (e.g. the accent used and expressivity) impact the assumptions we make about speech interfaces as communicative actors (i.e. our partner models). Our ground-breaking project examines the psychological consequences of synthesis design decisions on the relationship between humans and speech technology. It fuses knowledge, concepts and methods from psycholinguistics, experimental psychology and human-computer interaction (e.g. perspective taking and partner modelling research in human-human dialogue, controlled experiments, questionnaires) with speech technology (generation of natural speech synthesis) to 1) understand how synthesis design choices, specifically accent and expressivity, impact a user's partner model, 2) examine how these choices interact with context, and 3) assess their impact on language production.
- Funding Agency
- Irish Research Council
- Date From
- 2017
- Date To
- 2018
- Title
- Production, Perception and Cognition in the intersection between speech and singing
- Summary
- Although we know how to intuitively distinguish between speech and singing, there are portions of each in which one perceives the coexistence of both, which suggests a gradation rather than an abrupt change in phonation and other aspects of vocal production. The aim of this research project is to focus on aspects of the production and perception of speech and singing in order to answer the question: are speech and singing completely different phenomena? Experimental studies are conducted that include the collection of a corpus of spoken and sung data, measurement of acoustic differences between the two types of data, a perception test in which listeners designate each presented stimulus as speech or singing, and machine learning experiments to further study the acoustic differences between the two sound categories (a brief illustration of one such acoustic measure follows this entry). The results are analysed taking into account cognitive aspects of speech and song.
- Funding Agency
- São Paulo Research Foundation (FAPESP)
- Date From
- 2016
- Date To
- 2017
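As an illustration of the kind of acoustic measurement this project involves, here is a minimal Python sketch (assuming the librosa library is available; the file names and the specific measure are hypothetical, not those used in the published study) that compares fundamental frequency (F0) stability between a spoken and a sung recording:

    import numpy as np
    import librosa  # assumed available; any F0 tracker would serve

    def f0_stability_cents(path, fmin=65.0, fmax=1000.0):
        """Standard deviation, in cents, of the voiced F0 track around its
        median: a crude proxy for intra-utterance pitch stability."""
        y, sr = librosa.load(path, sr=None)
        f0, voiced, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
        f0 = f0[voiced & ~np.isnan(f0)]
        cents = 1200.0 * np.log2(f0 / np.median(f0))
        return float(np.std(cents))

    # Hypothetical file names, for illustration only
    print("speech :", f0_stability_cents("spoken_utterance.wav"))
    print("singing:", f0_stability_cents("sung_phrase.wav"))

Lower values indicate a steadier pitch track; in practice such a measure would be computed over many recordings and combined with other acoustic features before drawing any conclusions.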
Recognition
Memberships
Member of the International Speech Communication Association (ISCA)
Member of the Marie Curie Fellows Association (MCFA)