Overview
Interspeech is a technical conference focused on speech processing and applications, emphasizing interdisciplinary approaches that address all aspects of speech science and technology, from basic theory to advanced applications.
Accepted publications
Tutorials
INTERSPEECH 2023 Tutorial on Resource-Efficient and Cross-Modal Learning Toward Foundation Models
August 20
In this tutorial, the first session will introduce the theoretical advantages of large-scale pre-trained foundation models through universal approximation theory and show how to update large-scale speech and acoustic models efficiently using parameter-efficient learning. The second session will cover effective cross-modal pre-training of representations across the visual, speech, and language modalities; such representations can be learned without aligned data across modalities and can also benefit tasks within individual modalities. Finally, the third session will explore multimedia-processing applications that benefit from pre-trained acoustic and language models, with benchmark performance.
Amazon organizers: Huck Yang and Shalini Ghosh
Website: https://interspeech2023.org/tutorials/#toggle-id-3
ISCA SPSC Symposium 2023
Colocated with Interspeech 2023
2023 ISCA SPSC Symposium
August 19, 2023
The third edition of the Symposium on Security & Privacy in Speech Communication focuses on speech and voice as the means through which we express ourselves. Because speech can be used to command virtual assistants, to convey emotion, or to identify oneself, the symposium encourages participants to address the question of how we can strengthen security and privacy for speech representations in user-centric human/machine interaction. Interdisciplinary exchange is therefore in high demand, and the symposium aims to bring together researchers and practitioners across multiple disciplines, specifically signal processing, cryptography, security, human-computer interaction, law, ethics, and anthropology.
- August 23, 2023: Senior principal scientist Jasha Droppo on the shared architectures of large language models and spectrum quantization text-to-speech models, and other convergences between the two fields.
- August 16, 2023: Learning to represent truncated sentences with semantic graphs improves models' ability to infer missing content.
- December 20, 2022: Ariadna Sanchez, a scientist who works in polyglot text-to-speech, draws on her musical background to help find novel solutions.