Program Details

All dates and times in the Technical Program are given in Hong Kong Standard Time (GMT+8).

Sunday, January 24, 2021
13:30 - 15:00 Tutorial 1
Chair: Jun Du
T-1: AI for Sound: Large-scale Robust Audio Tagging with Audio and Visual Multimodality
Qiuqiang Kong, ByteDance; Juncheng Li, Carnegie Mellon University
13:30 - 15:00 Tutorial 2
Chair: Hung-Yi Lee
T-2: Pushing the Frontier of Neural Text to Speech
Xu Tan, Microsoft Research Asia (MSRA)
15:30 - 17:00 Tutorial 3
Chair: Jun Du
T-3: Audio-Visual Speech Source Separation
Qingju Liu, Cambridge Huawei Research Centre
15:30 - 17:00 Tutorial 4
Chair: Nengheng Zheng
T-4: Neural Mechanisms Underlying Speech Perception in Noise
Nai Ding, Zhejiang University
Monday, January 25, 2021
08:30 - 09:00 Opening Ceremony
09:00 - 10:00 Keynote Speech 1
Chair: Brian Mak
Towards More Human-like Machine Speech Chain
Prof. Satoshi Nakamura, Nara Institute of Science and Technology
10:15 - 12:15 Oral Session O1-A: Linguistics, Phonetics, Phonology, and Language Modeling
Chairs: Aijun Li, Lawrence Cheung
O1-A-1: Articulatory and Acoustic Features of Mandarin /ɹ/: A Preliminary Study
Shuwen Chen, The Chinese University of Hong Kong; Peggy Mok, The Chinese University of Hong Kong
O1-A-2: Spoken Language Understanding with Sememe Knowledge as Domain Knowledge
Sixia Li, Japan Advanced Institute of Science and Technology; Jianwu Dang, Japan Advanced Institute of Science and Technology & Tianjin University; Longbiao Wang, Tianjin University
O1-A-3: Consonantal effects of aspiration on onset f0 in Cantonese
Xinran Ren, The Chinese University of Hong Kong; Peggy Mok, The Chinese University of Hong Kong
O1-A-4: Complex Patterns of Tonal Realization in Taifeng Chinese
Xiaoyan Zhang, Beijing International Studies University; Aijun Li, Institute of Linguistics, Chinese Academy of Social Sciences; Zhiqiang Li, University of San Francisco
O1-A-5: The Acoustic Correlates and Time Span of the Non-modal Phonation in Kunshan Wu Chinese
Wenwei Xu, The Chinese University of Hong Kong; Peggy Mok, The Chinese University of Hong Kong
O1-A-6: Acoustical Characteristics of the Cantonese Vowels and Tones Produced by Hearing Impaired Speakers
Wai-Sum Lee, City University of Hong Kong; Irene Ching-Yin Tsoi, City University of Hong Kong
10:15 - 12:15 Oral Session O1-B: Speaker, Language, and Emotion Recognition
Chairs: Wei Rao, Tom Ko
O1-B-1: Unsupervised Cross-Lingual Speech Emotion Recognition Using Domain Adversarial Neural Network
Xiong Cai, Shenzhen International Graduate School, Tsinghua University; Zhiyong Wu, Shenzhen International Graduate School, Tsinghua University & The Chinese University of Hong Kong; Kuo Zhong, Shenzhen International Graduate School, Tsinghua University; Bin Su, Shenzhen International Graduate School, Tsinghua University; Dongyang Dai, Shenzhen International Graduate School, Tsinghua University; Helen Meng, Shenzhen International Graduate School, Tsinghua University & The Chinese University of Hong Kong
O1-B-2: Channel Interdependence Enhanced Speaker Embeddings for Far-Field Speaker Verification
Ling-jun Zhao, The Hong Kong Polytechnic University; Man-Wai Mak, The Hong Kong Polytechnic University
O1-B-3: Estimating Mutual Information in Prosody Representation for Emotional Prosody Transfer in Speech Synthesis
Guangyan Zhang, The Chinese University of Hong Kong; Shirong Qiu, The Chinese University of Hong Kong; Ying Qin, The Chinese University of Hong Kong; Tan Lee, The Chinese University of Hong Kong
O1-B-4: Revisiting the Statistics Pooling Layer in Deep Speaker Embedding Learning
Shuai Wang, Shanghai Jiao Tong University; Yexin Yang, Shanghai Jiao Tong University; Yanmin Qian, Shanghai Jiao Tong University; Kai Yu, Shanghai Jiao Tong University
O1-B-5: Adversarial Training for Multi-domain Speaker Recognition
Qing Wang, Northwestern Polytechnical University; Wei Rao, Tencent Media Lab; Pengcheng Guo, Northwestern Polytechnical University; Lei Xie, Northwestern Polytechnical University
O1-B-6: Towards Realizing Sign Language to Emotional Speech Conversion by Deep Learning
Weizhe Wang, Northwest Normal University; Hongwu Yang, Northwest Normal University & National and Provincial Joint Engineering Laboratory of Learning Analysis Technology in Online Education
13:30 - 15:30 Oral Session O2-A: Speech Prosody, Production and Perception
Chairs: Wentao Gu, Peggy Mok
O2-A-1: Multi-Scale Model for Mandarin Tone Recognition
Linkai Peng, Beijing Language and Culture University; Wang Dai, Beijing Language and Culture University; Dengfeng Ke, Beijing Language and Culture University; Jinsong Zhang, Beijing Language and Culture University
O2-A-2: A Comparison Study on the Alignment of Prosodic and Semantic Units and Its Effects on F0 Shifting in L1 and L2 English Spontaneous Speech
Yuqing Zhang, Beijing Language and Culture University; Zhu Li, Beijing Language and Culture University; Jinsong Zhang, Beijing Language and Culture University
O2-A-3: Automatic Speaker-level Pronunciation Assessment of L2 Speech Using Posterior Probabilities from Multiple Utterances
Guolei Jiang, Northeastern University & SpeechX Limited; Chunhong Liao, SpeechX Limited; Kun Li, SpeechX Limited; Pengfei Liu, SpeechX Limited; Linying Jiang, Northeastern University; Helen Meng, The Chinese University of Hong Kong
O2-A-4: Prosodic Profiles of the Mandarin Speech Conveying Ironic Compliment
Shanpeng Li, Nanjing Normal University & Nanjing University of Science and Technology; Wentao Gu, Nanjing Normal University
O2-A-5: Speaker Charisma Analyzed through the Cultural Lens
Anna Gutnyk, Nanjing Normal University; Oliver Niebuhr, University of Southern Denmark; Wentao Gu, Nanjing Normal University
O2-A-6: Effects of Mandarin Tones on Acoustic Cue Weighting Patterns for Prominence
Wei Zhang, McGill University; Meghan Clayards, McGill University; Jinsong Zhang, Beijing Language and Culture University
13:30 - 15:30 Oral Session O2-B: Speech Recognition
Chairs: Haihua Xu, Hung-yi Lee
O2-B-1: An Investigation of Positional Encoding in Transformer-based End-to-end Speech Recognition
Fengpeng Yue, Southern University of Science and Technology; Tom Ko, Southern University of Science and Technology
O2-B-2: Non-autoregressive Deliberation-Attention based End-to-End ASR
Changfeng Gao, Institute of Acoustics, Chinese Academy of Sciences & University of Chinese Academy of Sciences; Gaofeng Cheng, Institute of Acoustics, Chinese Academy of Sciences & University of Chinese Academy of Sciences; Jun Zhou, Institute of Acoustics, Chinese Academy of Sciences & University of Chinese Academy of Sciences; Pengyuan Zhang, Institute of Acoustics, Chinese Academy of Sciences & University of Chinese Academy of Sciences; Yonghong Yan, Institute of Acoustics, Chinese Academy of Sciences & University of Chinese Academy of Sciences
O2-B-3: Improving Attention-based End-to-end ASR by Incorporating an N-gram Neural Network
Junyi Ao, Southern University of Science and Technology; Tom Ko, Southern University of Science and Technology
O2-B-4: Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning
Zhiping Zeng, Nanyang Technological University & Huya AI; Van Tung Pham, Nanyang Technological University; Haihua Xu, Nanyang Technological University; Yerbolat Khassanov, Nanyang Technological University & Nazarbayev University; Eng Siong Chng, Nanyang Technological University; Chongjia Ni, Alibaba Group; Bin Ma, Alibaba Group
O2-B-5: UNet++-Based Multi-Channel Speech Dereverberation and Distant Speech Recognition
Tuo Zhao, University of Missouri; Yunxin Zhao, University of Missouri; Shaojun Wang, PAII Inc.; Mei Han, PAII Inc.
O2-B-6: Context-Aware RNNLM Rescoring For Conversational Speech Recognition
Kun Wei, Northwestern Polytechnical University; Pengcheng Guo, Northwestern Polytechnical University; Hang Lv, Northwestern Polytechnical University; Zhen Tu, Zhuiyi Technology; Lei Xie, Northwestern Polytechnical University
15:45 - 16:30 Poster Session P1
Chairs: Ming Li, Jun Du
P1-1: Capsule Network based End-to-end System for Detection of Replay Attacks
Meidan Ouyang, National University of Singapore; Rohan Kumar Das, National University of Singapore; Jichen Yang, National University of Singapore; Haizhou Li, National University of Singapore
P1-2: Impact of Mismatched Spectral Amplitude Levels on Vowel Identification in Simulated Electric-acoustic Hearing
Changjie Pan, Southern University of Science and Technology; Fei Chen, Southern University of Science and Technology
P1-3: Audio Caption in a Car Setting with a Sentence-Level Loss
Xuenan Xu, Shanghai Jiao Tong University; Heinrich Dinkel, Shanghai Jiao Tong University; Mengyue Wu, Shanghai Jiao Tong University; Kai Yu, Shanghai Jiao Tong University
P1-4: PLDA-based Speaker Verification in Multi-Enrollment Scenario using Expected Vector Approach
Meet Soni, Tata Consultancy Services; Ashish Panda, Tata Consultancy Services
P1-5: A New Method for Improving Generative Adversarial Networks in Speech Enhancement
Fan Yang, Institute of Acoustics, Chinese Academy of Sciences & University of Chinese Academy of Sciences; Junfeng Li, Institute of Acoustics, Chinese Academy of Sciences & University of Chinese Academy of Sciences; Yonghong Yan, Institute of Acoustics, Chinese Academy of Sciences & University of Chinese Academy of Sciences & Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences
P1-6: Prosody and Dialogue Act: A Perceptual Study on Chinese Interrogatives
Gan Huang, Institute of Linguistics, Chinese Academy of Social Sciences; Aijun Li, Institute of Linguistics, Chinese Academy of Social Sciences & University of Chinese Academy of Social Sciences; Sichen Zhang, University of Chinese Academy of Social Sciences; Liang Zhang, China University of Political Science and Law
P1-7: Context-dependent Label Smoothing Regularization for Attention-based End-to-End Code-Switching Speech Recognition
Zheying Huang, Institute of Acoustics, Chinese Academy of Sciences & University of Chinese Academy of Sciences; Peng Li, National Computer Network Emergency Response Technical Team Coordination Center of China; Ji Xu, Institute of Acoustics, Chinese Academy of Sciences & University of Chinese Academy of Sciences; Pengyuan Zhang, Institute of Acoustics, Chinese Academy of Sciences & University of Chinese Academy of Sciences; Yonghong Yan, Institute of Acoustics, Chinese Academy of Sciences & University of Chinese Academy of Sciences & Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences
P1-8: MoEVC: A Mixture of Experts Voice Conversion System With Sparse Gating Mechanism for Online Computation Acceleration
Yu-Tao Chang, National Chiao Tung University; Yuan-Hong Yang, National Chiao Tung University; Yu-Huai Peng, Academia Sinica; Syu-Siang Wang, Academia Sinica; Tai-Shih Chi, National Chiao Tung University; Yu Tsao, Academia Sinica; Hsin-Min Wang, Academia Sinica
P1-9: Comparing the Rhythm of Instrumental Music and Vocal Music in Mandarin and English
Lujia Yang, Shanghai Jiao Tong University; Hongwei Ding, Shanghai Jiao Tong University
P1-10: Tone Realization in Mandarin Speech: A Large Corpus based Study of Disyllabic Words
Yaru Wu, Université Paris-Saclay, CNRS, LIMSI & Laboratoire de Phonétique et Phonologie, CNRS, Sorbonne Nouvelle; Lori Lamel, Université Paris-Saclay, CNRS, LIMSI; Martine Adda-Decker, Université Paris-Saclay, CNRS, LIMSI & Laboratoire de Phonétique et Phonologie, CNRS, Sorbonne Nouvelle
P1-11: On Adaptive LASSO-based Sparse Time-Varying Complex AR Speech Analysis
Keiichi Funaki, University of the Ryukyus
P1-12: Speech Emotion Recognition Based on Acoustic Segment Model
Siyuan Zheng, University of Science and Technology of China; Jun Du, University of Science and Technology of China; Hengshun Zhou, University of Science and Technology of China; Xue Bai, University of Science and Technology of China; Chin-Hui Lee, Georgia Institute of Technology; Shipeng Li, Shenzhen Institute of Artificial Intelligence and Robotics for Society
P1-13: Approaches to Improving Recognition of Underrepresented Named Entities in Hybrid ASR Systems
Tingzhi Mao, Xinjiang University; Yerbolat Khassanov, Nanyang Technological University & Nazarbayev University; Van Tung Pham, Nanyang Technological University; Haihua Xu, Nanyang Technological University; Hao Huang, Xinjiang University; Eng Siong Chng, Nanyang Technological University
16:00 - 17:00 Tencent Session: The Frontiers of Speech Front-End Processing
Panelists: Shidong Shang, Tencent Ethereal Audio Lab; Dan Su, Tencent AI Lab; Junfeng Li, Institute of Acoustics, Chinese Academy of Sciences; Fei Xiang, Xiaomi; Guilin Ma, iFlytek
16:45 - 17:30 Poster Session P2
Chairs: Qing Wang, Caicai Zhang
P2-1: RNN-transducer With Language Bias for End-to-end Mandarin-English Code-switching Speech Recognition
Shuai Zhang, University of Chinese Academy of Sciences & Institute of Automation, Chinese Academy of Sciences; Jiangyan Yi, Institute of Automation, Chinese Academy of Sciences; Zhengkun Tian, University of Chinese Academy of Sciences & Institute of Automation, Chinese Academy of Sciences; Jianhua Tao, University of Chinese Academy of Sciences & Institute of Automation, Chinese Academy of Sciences & CAS Center for Excellence in Brain Science and Intelligence Technology; Ye Bai, University of Chinese Academy of Sciences & Institute of Automation, Chinese Academy of Sciences
P2-2: A Practical Way to Improve Automatic Phonetic Segmentation Performance
Wenjie Peng, Beijing Language and Culture University; Yingming Gao, TU Dresden; Binghuai Lin, Tencent Science and Technology Ltd.; Jinsong Zhang, Beijing Language and Culture University
P2-3: Speaker Embedding Augmentation with Noise Distribution Matching
Xun Gong, Shanghai Jiao Tong University; Zhengyang Chen, Shanghai Jiao Tong University; Yexin Yang, Shanghai Jiao Tong University; Shuai Wang, Shanghai Jiao Tong University; Lan Wang, Shenzhen Institutes of Advanced Technology; Yanmin Qian, Shanghai Jiao Tong University
P2-4: Hierarchically Attending Time-Frequency and Channel Features for Improving Speaker Verification
Chenglong Wang, University of Science and Technology of China & Institute of Automation, Chinese Academy of Sciences; Jiangyan Yi, Institute of Automation, Chinese Academy of Sciences; Jianhua Tao, Institute of Automation, Chinese Academy of Sciences & University of Chinese Academy of Sciences & CAS Center for Excellence in Brain Science and Intelligence Technology; Ye Bai, Institute of Automation, Chinese Academy of Sciences & University of Chinese Academy of Sciences; Zhengkun Tian, Institute of Automation, Chinese Academy of Sciences & University of Chinese Academy of Sciences
P2-5: Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS
Chunyu Qiang, Institute of Automation, Chinese Academy of Sciences & University of Chinese Academy of Sciences; Jianhua Tao, Institute of Automation, Chinese Academy of Sciences & University of Chinese Academy of Sciences & CAS Center for Excellence in Brain Science and Intelligence Technology; Ruibo Fu, Institute of Automation, Chinese Academy of Sciences & University of Chinese Academy of Sciences; Zhengqi Wen, Institute of Automation, Chinese Academy of Sciences; Jiangyan Yi, Institute of Automation, Chinese Academy of Sciences; Tao Wang, Institute of Automation, Chinese Academy of Sciences & University of Chinese Academy of Sciences; Shiming Wang, Institute of Automation, Chinese Academy of Sciences & University of Science and Technology of China
P2-6: Frequency-specific Brain Network Dynamics during Perceiving Real words and Pseudowords
Taiyang Guo, Japan Advanced Institute of Science and Technology; Jianwu Dang, Japan Advanced Institute of Science and Technology & Tianjin University; Gaoyan Zhang, Tianjin University; Bin Zhao, Japan Advanced Institute of Science and Technology & Tianjin University; Masashi Unoki, Japan Advanced Institute of Science and Technology
P2-7: An Experimental Research on Tonal Errors in Monosyllables of Standard Spoken Chinese Language Produced by Uyghur Learners
Qiuyuan Li, Chinese Academy of Social Sciences; Yuan Jia, Chinese Academy of Social Sciences
P2-8: Automatic Detection of Word-Level Reading Errors in Non-native English Speech Based on ASR Output
Ying Qin, ETS Research & The Chinese University of Hong Kong; Yao Qian, ETS Research & Microsoft; Anastassia Loukina, ETS Research; Patrick Lange, ETS Research; Abhinav Misra, ETS Research; Keelan Evanini, ETS Research; Tan Lee, The Chinese University of Hong Kong
P2-9: Acoustic Word Embedding System for Code-Switching Query-by-example Spoken Term Detection
Murong Ma, Duke Kunshan University; Haiwei Wu, Duke Kunshan University; Xuyang Wang, AI Lab of Lenovo Research; Lin Yang, AI Lab of Lenovo Research; Junjie Wang, AI Lab of Lenovo Research; Ming Li, Duke Kunshan University & Wuhan University
P2-10: Syllable-Based Acoustic Modeling with Lattice-Free MMI for Mandarin Speech Recognition
Jie Li, Kwai; Zhiyun Fan, Institute of Automation, Chinese Academy of Sciences; Xiaorui Wang, Kwai; Yan Li, Kwai
P2-11: Production of Tone 3 Sandhi by Advanced Korean Learners of Mandarin
Xin Li, Fudan University; Yin Huang, Fudan University; Yunheng Xu, Fudan University; Linxin Yi, Fudan University; Yuming Yuan, Fudan University; Min Xiang, Fudan University
P2-12: Low-complexity Post-processing Method for Speech Enhancement
Feng Bao, Tencent Media Lab; Yuepeng Li, Tencent Media Lab; Shidong Shang, Tencent Media Lab
Tuesday, January 26, 2021
09:00 - 11:00 Oral Session O3-A: Special Session: Aging in Spoken Language Processing
Chairs: Bin Li, Gang Peng
O3-A-1: Improved End-to-End Dysarthric Speech Recognition via Meta-learning Based Model Re-initialization
Disong Wang, The Chinese University of Hong Kong; Jianwei Yu, The Chinese University of Hong Kong; Xixin Wu, University of Cambridge; Lifa Sun, SpeechX Limited; Xunying Liu, The Chinese University of Hong Kong; Helen Meng, The Chinese University of Hong Kong
O3-A-2: Usability and practicality of speech recording by mobile phones for phonetic analysis
Yihan Guan, City University of Hong Kong; Bin Li, City University of Hong Kong
O3-A-3: Age-Related Decline of Classifier Usage in Southwestern Mandarin
Yun Feng, The Hong Kong Polytechnic University; Yan Feng, The Hong Kong Polytechnic University; Chenwei Xie, The Hong Kong Polytechnic University; William Shi-Yuan Wang, The Hong Kong Polytechnic University
O3-A-4: Age-Invariant Speaker Embedding for Diarization of Cognitive Assessments
Sean Shensheng Xu, The Hong Kong Polytechnic University; Man-Wai Mak, The Hong Kong Polytechnic University; Ka Ho Wong, The Chinese University of Hong Kong; Helen Meng, The Chinese University of Hong Kong; Timothy C.Y. Kwok, The Chinese University of Hong Kong
O3-A-6: Improves Neural Acoustic Word Embeddings Query-by-Example Spoken Term Detection with Wav2vec Pretraining and Circle Loss
Zhaoqi Li, Institute of Acoustics, Chinese Academy of Sciences & University of Chinese Academy of Sciences; Wu Long, Institute of Acoustics, Chinese Academy of Sciences & University of Chinese Academy of Sciences; Ta Li, Institute of Acoustics, Chinese Academy of Sciences & University of Chinese Academy of Sciences; Yonghong Yan, Institute of Acoustics, Chinese Academy of Sciences & University of Chinese Academy of Sciences & Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences

09:00 - 11:00 Oral Session O3-B: Speech Enhancement and Hearing Aids
Chairs: Changchun Bao, Huijun Ding
O3-B-1: An Attention-augmented Fully Convolutional Neural Network for Monaural Speech Enhancement
Zezheng Xu, Beijing University of Posts and Telecommunications; Ting Jiang, Beijing University of Posts and Telecommunications; Chao Li, Beijing University of Posts and Telecommunications; Jiacheng Yu, Beijing University of Posts and Telecommunications
O3-B-2: GAN-Based Inter-Channel Amplitude Ratio Decoding in Multi-Channel Speech Coding
Jinru Zhu, Beijing University of Technology; Changchun Bao, Beijing University of Technology
O3-B-3: Sams-Net: A Sliced Attention-based Neural Network for Music Source Separation
Tingle Li, Duke Kunshan University & Tiangong University; Jiawei Chen, Tiangong University; Haowen Hou, Tencent Inc.; Ming Li, Duke Kunshan University & Wuhan University
O3-B-4: Rapid Word Learning of Children with Cochlear Implants: Phonological Structure and Mutual Exclusivity
Yu-Chen Hung, Children's Hearing Foundation; Tzu-Hui Lin, Children's Hearing Foundation
O3-B-5: Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning
Cunhang Fan, Institute of Automation, Chinese Academy of Sciences & University of Chinese Academy of Sciences; Bin Liu, Institute of Automation, Chinese Academy of Sciences; Jianhua Tao, Institute of Automation, Chinese Academy of Sciences & CAS Center for Excellence in Brain Science and Intelligence Technology & University of Chinese Academy of Sciences; Jiangyan Yi, Institute of Automation, Chinese Academy of Sciences; Zhengqi Wen, Institute of Automation, Chinese Academy of Sciences; Leichao Song, Institute of Automation, Chinese Academy of Sciences & University of Chinese Academy of Sciences
O3-B-6: Phase Spectrum Recovery for Enhancing Low-Quality Speech Captured by Laser Microphones
Chang Liu, University of Science and Technology of China; Yang Ai, University of Science and Technology of China; Zhenhua Ling, University of Science and Technology of China
11:15 - 12:15 Keynote Speech 2
Chair: Aijun Li
Spoken Word Production in Chinese: Behavioural and Electrophysiological Study
Prof. Qingfang Zhang, Renmin University of China
13:30 - 15:30 Oral Session O4-A: Speech Synthesis and Voice Conversion
Chairs: Xurong Xie, Lei Xie
O4-A-1: Non-parallel Sequence-to-Sequence Voice Conversion for Arbitrary Speakers
Ying Zhang, Kwai; Hao Che, Kwai; Xiaorui Wang, Kwai
O4-A-2: ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders
Yu Gu, ByteDance AI Lab; Xiang Yin, ByteDance AI Lab; Yonghui Rao, ByteDance AI Lab; Yuan Wan, ByteDance AI Lab; Benlai Tang, ByteDance AI Lab; Yang Zhang, ByteDance AI Lab; Jitong Chen, ByteDance AI Lab; Yuxuan Wang, ByteDance AI Lab; Zejun Ma, ByteDance AI Lab
O4-A-3: Exploring Cross-lingual Singing Voice Synthesis Using Speech Data
Yuewen Cao, The Chinese University of Hong Kong; Songxiang Liu, The Chinese University of Hong Kong; Shiyin Kang, Huya Inc.; Na Hu, Tencent AI Lab; Peng Liu, Tencent AI Lab; Xunying Liu, The Chinese University of Hong Kong; Dan Su, Tencent AI Lab; Dong Yu, Tencent AI Lab; Helen Meng, The Chinese University of Hong Kong
O4-A-4: Towards Fine-Grained Prosody Control for Voice Conversion
Zheng Lian, National Laboratory of Pattern Recognition, CASIA & University of Chinese Academy of Sciences; Rongxiu Zhong, National Laboratory of Pattern Recognition, CASIA & University of Chinese Academy of Sciences; Zhengqi Wen, National Laboratory of Pattern Recognition, CASIA; Bin Liu, National Laboratory of Pattern Recognition, CASIA; Jianhua Tao, National Laboratory of Pattern Recognition, CASIA & CAS Center for Excellence in Brain Science and Intelligence Technology & University of Chinese Academy of Sciences
O4-A-5: Accent and Speaker Disentanglement in Many-to-many Voice Conversion
Zhichao Wang, Northwestern Polytechnical University; Wenshuo Ge, Northwestern Polytechnical University; Xiong Wang, Northwestern Polytechnical University; Shan Yang, Northwestern Polytechnical University; Wendong Gan, iQIYI Inc; Haitao Chen, iQIYI Inc; Hai Li, iQIYI Inc; Lei Xie, Northwestern Polytechnical University; Xiulin Li, Databaker (Beijing) Technology Co., Ltd
O4-A-6: Controllable Emotion Transfer for End-To-End Speech Synthesis
Tao Li, Northwestern Polytechnical University; Shan Yang, Northwestern Polytechnical University; Liumeng Xue, Northwestern Polytechnical University; Lei Xie, Northwestern Polytechnical University
13:30 - 15:30 Oral Session O4-B: Spoken Language Technology
Chairs: Longbiao Wang, Fei Chen
O4-B-1: Order-aware Pairwise Intoxication Detection
Meng Ge, Tianjin University; Ruixiong Zhang, Didi Chuxing; Wei Zou, Didi Chuxing; Xiangang Li, Didi Chuxing; Cheng Gong, Didi Chuxing; Longbiao Wang, Tianjin University; Jianwu Dang, Tianjin University & Japan Advanced Institute of Science and Technology
O4-B-2: Transformer-based Empathetic Response Generation Using Dialogue Situation and Advanced-Level Definition of Empathy
Yi-Hsuan Wang, National Cheng Kung University; Jia-Hao Hsu, National Cheng Kung University; Chung-Hsien Wu, National Cheng Kung University; Tsung-Hsien Yang, Telecommunication Laboratories Chunghwa Telecom Co., Ltd.
O4-B-3: A Model Ensemble Approach for Sound Event Localization and Detection
Qing Wang, University of Science and Technology of China; Huaxin Wu, iFLYTEK; Zijun Jing, iFLYTEK; Feng Ma, iFLYTEK; Yi Fang, iFLYTEK; Yuxuan Wang, University of Science and Technology of China; Tairan Chen, University of Science and Technology of China; Jia Pan, iFLYTEK; Jun Du, University of Science and Technology of China; Chin-Hui Lee, Georgia Institute of Technology
O4-B-4: Dialogue Act Recognition using Branch Architecture with Attention Mechanism for Imbalanced Data
Mengfei Wu, Tianjin University; Longbiao Wang, Tianjin University; Yuke Si, Tianjin University; Jianwu Dang, Tianjin University & Japan Advanced Institute of Science and Technology
O4-B-5: An Eye-tracking Study of Transposed-letter Effect in English Word Recognition by Mandarin Speakers
Huan Lei, Japan Advanced Institute of Science and Technology; Jianwu Dang, Japan Advanced Institute of Science and Technology; Yu Chen, Tianjin University of Technology
O4-B-6: Automatic Extraction of Semantic Patterns in Dialogs using Convex Polytopic Model
Jingyan Zhou, The Chinese University of Hong Kong; Xiaoying Zhang, The Chinese University of Hong Kong; Xiaohan Feng, The Chinese University of Hong Kong; King Keung Wu, SpeechX Limited; Helen Meng, The Chinese University of Hong Kong
15:45 - 16:45 Keynote Speech 3
Chair: Tan Lee
Is Speech the New Blood? On Digital Diagnostics and Donations
Prof. Björn W. Schuller, Imperial College London / University of Augsburg
16:45 - 17:30 Closing Ceremony