Keynote Speeches

All dates and times in the Technical Program are given in Hong Kong Time (GMT+8).

Keynote Speaker | Title | Date | Time
Prof. Satoshi Nakamura, Nara Institute of Science and Technology | Towards More Human-like Machine Speech Chain | January 25, 2021 | 09:00 - 10:00
Prof. Qingfang Zhang, Renmin University of China | Spoken Word Production in Chinese: Behavioural and Electrophysiological Study | January 26, 2021 | 11:15 - 12:15
Prof. Björn W. Schuller, Imperial College London / University of Augsburg | Is Speech the New Blood? On Digital Diagnostics and Donations | January 26, 2021 | 15:45 - 16:45

Keynote Speaker: Prof. Satoshi Nakamura, Nara Institute of Science and Technology
Topic: Towards More Human-like Machine Speech Chain
Date: January 25, 2021
Time: 9:00am – 10:00am

Despite the close relationship between speech perception and production, research in automatic speech recognition (ASR) and text-to-speech synthesis (TTS) has been conducted more or less independently. In human communication, on the other hand, a closed-loop mechanism with auditory feedback from the speaker’s mouth to her ear is known as the Speech Chain. With this human mechanism in mind, we first proposed the Machine Speech Chain in 2017 as a closed-loop mechanism connecting ASR and TTS. Recently, we extended the Machine Speech Chain to the incremental Machine Speech Chain to realize an online, real-time chain. Furthermore, we extended the Machine Speech Chain to the Multimodal Chain to model the human multimodal closed-loop mechanism.

Satoshi Nakamura is Professor at the Graduate School of Science and Technology, Nara Institute of Science and Technology, Japan; Project Leader of the Tourism Information Analytics Team at the RIKEN Center for Advanced Intelligence Project (AIP); Honorary Professor at Karlsruhe Institute of Technology, Germany; and ATR Fellow. He received his B.S. from Kyoto Institute of Technology in 1981 and his Ph.D. from Kyoto University in 1992. He was Associate Professor at the Graduate School of Information Science at Nara Institute of Science and Technology from 1994 to 2000. He was Director of ATR Spoken Language Communication Research Laboratories from 2000 to 2008 and Vice President of ATR from 2007 to 2008. He was Director-General of Keihanna Research Laboratories and Executive Director of the Knowledge-Creating Communication Research Center, National Institute of Information and Communications Technology, Japan, from 2009 to 2010. He is currently Director of the Augmented Human Communication Laboratory and a Full Professor at the Graduate School of Information Science at Nara Institute of Science and Technology. His research interests are the modeling and systems of speech-to-speech translation and speech recognition. He is one of the leaders of speech-to-speech translation research and has served in various speech-to-speech translation research projects around the world, including C-STAR, IWSLT, and A-STAR. He received the Yamashita Research Award, the Kiyasu Award from the Information Processing Society of Japan, the Telecom System Award, the AAMT Nagao Award, the Docomo Mobile Science Award in 2007, and the ASJ Award for Distinguished Achievements in Acoustics. He received the Commendation for Science and Technology from the Minister of Education, Science and Technology, and the Commendation for Science and Technology from the Minister of Internal Affairs and Communications. He also received the LREC Antonio Zampolli Award in 2012.
He served as an elected Board Member of the International Speech Communication Association (ISCA) from 2011 to 2019, as an IEEE Signal Processing Magazine Editorial Board Member from 2012 to 2015, and as an IEEE SPS Speech and Language Technical Committee Member from 2013 to 2016. He has been an IEEE Fellow since 2016 and an ISCA Fellow since 2020.

Keynote Speaker: Prof. Qingfang Zhang, Renmin University of China
Topic: Spoken Word Production in Chinese: Behavioural and Electrophysiological Study
Date: January 26, 2021
Time: 11:15am – 12:15pm

Spoken word production involves conceptual preparation, lexical access, phonological encoding, phonetic encoding, and articulation. There are two controversial issues in the field of speech production: one is the relation between lexical access and phonological encoding, and the other is the proximate unit of phonological encoding. On the first issue, studies have revealed an interactive pattern in alphabetic languages (e.g., English, Dutch) but a discrete pattern in Chinese, a non-alphabetic language. On the second issue, studies have demonstrated that phonemes are the proximate units in alphabetic languages, whereas syllables are the proximate units in Chinese. In this talk, I will present behavioral and electrophysiological evidence and discuss the implications of these findings across languages.

Qingfang Zhang is a professor at the Department of Psychology, Renmin University of China, whose research focuses on language production. She has been an associate editor of Frontiers in Psychology: Cognitive Science since 2015. She has published more than 70 journal papers on language production in a wide array of journals, including Brain and Language, Psychophysiology, Neuropsychologia, Applied Psycholinguistics, Language and Speech, Neuroscience, Brain Research, Behavioural Brain Research, Frontiers in Human Neuroscience, Frontiers in Psychology: Language Science, and Acta Psychologica Sinica (in Chinese).

Keynote Speaker: Prof. Björn W. Schuller, Imperial College London / University of Augsburg
Topic: Is Speech the New Blood? On Digital Diagnostics and Donations
Date: January 26, 2021
Time: 3:45pm – 4:45pm

For mental health, features of spoken language such as accelerated speech and talkativeness found their way into modern diagnostic manuals decades ago, for example for bipolar disorder. Given the key role of the lungs and vocal tract in voice production, it also seems intuitive that respiratory diseases are “audible” in spoken language. Overall, the cognitive, motor, and physical aspects involved in the complex production mechanisms of speech can be richly exploited for human health state monitoring – also in automated ways. This talk will discuss spoken language’s potential to serve as the “new blood” for contactless pre-diagnosis of selected medical conditions – potentially even from a distance. From Autism and Alzheimer’s via COVID-19 to Parkinson’s and Rett Syndrome, one finds repeated reports of success of machine learning for such diagnostics based on voice acoustics and linguistics. Recent deep learning has helped boost robustness in the presence of noise and packet loss, while learning representations and even network topologies fully automatically. Deep data augmentation, transfer learning, and lifelong learning increasingly make it possible to learn from little data. For on-device local processing, squeezing enables running analyses even on wearables. However, more effort is needed on reinforced and self-supervised adaptation to novel users and hardware. Further, explaining results becomes essential – potentially involving natural language generation for verbalisation. As data is still key, its sharing via federated learning and the like, as well as the ongoing smart collection of “speech donations” exploiting collaborative learning, remain to be put into practice. With that, turning the dream of a “speech test” for health analysis into reality seems within reach.

Björn W. Schuller received his diploma, doctoral degree, habilitation, and Adjunct Teaching Professorship in Machine Intelligence and Signal Processing, all in EE/IT, from TUM in Munich/Germany. He is Full Professor of Artificial Intelligence and the Head of GLAM at Imperial College London/UK, Full Professor and Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg/Germany, co-founding CEO and current CSO of audEERING – an Audio Intelligence company based near Munich and in Berlin/Germany, and permanent Visiting Professor at HIT/China, amongst other Professorships and Affiliations. Previous positions include Full Professor at the University of Passau/Germany and Researcher at Joanneum Research in Graz/Austria and at CNRS-LIMSI in Orsay/France. He is a Fellow of the IEEE and Golden Core Awardee of the IEEE Computer Society, Fellow of the BCS, Fellow of the ISCA, President-Emeritus of the AAAC, and Senior Member of the ACM. He has (co-)authored 900+ publications (30k+ citations, h-index=85), is Field Chief Editor of Frontiers in Digital Health, and was Editor in Chief of the IEEE Transactions on Affective Computing, amongst manifold further commitments and service to the community. His 30+ awards include having been honoured as one of 40 extraordinary scientists under the age of 40 by the WEF in 2015. He has served as Coordinator/PI in 15+ European Projects, is an ERC Starting and DFG Reinhart-Koselleck Grantee, and is a consultant to companies such as Barclays, GN, Huawei, and Samsung.