ICASSP 2025 Keynote - Selected References
TDNNs, aka Convolutional Nets |
---|
Alex Waibel; Phoneme Recognition Using Time-Delay Neural Networks. Meeting of the Institute of Electrical, Information and Communication Engineers (IEICE), Tokyo, Japan, December 1987 |
A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, K. Lang; Phoneme Recognition Using Time-Delay Neural Networks. ATR Interpreting Telephony Research Laboratories, Technical Report TR-I0006, October 30, 1987 |
Alex Waibel, Toshiyuki Hanazawa, Geoffrey Hinton, Kiyohiro Shikano, Kevin J. Lang; Phoneme Recognition Using Time-Delay Neural Networks. IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 37, No. 3; March 1989 |
John B. Hampshire II, Alex H. Waibel; |
Haffner, Patrick; Waibel, Alex; Multi-state time delay networks for continuous speech recognition. Advances in neural information processing systems 4, NIPS, 1991 |
Overview Papers |
---|
Waibel, Alex; Interactive translation of conversational speech. IEEE, 1996 |
Waibel, Alex; Fügen, Christian; Spoken language translation. IEEE Signal Processing Magazine, 2008 |
Waibel, Alex; Multimodal Dialogue Processing for Machine Translation. The Handbook of Multimodal-Multisensor Interfaces, Volume 3, Chapter 14, Association for Computing Machinery and Morgan & Claypool Publishers; June 25, 2019 |
Foundations: Neural Models for Speech and Language |
---|
Waibel, Alex; Sawai, Hidefumi; Shikano, Kiyohiro; Modularity and scaling in large phonemic neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1989 |
Waibel, Alexander; Neural network approaches for speech recognition. Advances in Speech Signal Processing, 1992 |
Waibel, Alex; Modular construction of time-delay neural networks for speech recognition. Neural Computation, MIT Press, 1989 |
Suhm, Bernhard; Waibel, Alex; Towards better language models for spontaneous speech. Proceedings of the ICSLP, 1994 |
Haffner, Patrick; Waibel, Alex; Multi-state time delay networks for continuous speech recognition. Advances in neural information processing systems, 1991 |
Hampshire, John; Waibel, Alex; Connectionist architectures for multi-speaker phoneme recognition. Advances in neural information processing systems, 1989 |
Wang, Ye-Yi; Waibel, Alex; A connectionist model for dialog processing. ICASSP, 1991 |
Wang, Y; Waibel, Alex; Connectionist transfer in machine translation. Proceedings of the International Conference on Recent Advances in Natural Language Processing, Tzigov Chark, Bulgaria, 1995 |
Ha, Thanh-Le; Niehues, Jan; Waibel, Alexander; Toward multilingual neural machine translation with universal encoder and decoder. arXiv preprint arXiv:1611.04798, 2016 |
Sperber, Matthias; Neubig, Graham; Niehues, Jan; Waibel, Alex; Neural lattice-to-sequence models for uncertain inputs. arXiv preprint arXiv:1704.00559, 2017 |
Ha, Thanh-Le; Niehues, Jan; Waibel, Alexander; Effective strategies in zero-shot neural machine translation. arXiv preprint arXiv:1711.07893, 2017 |
Sperber, Matthias; Niehues, Jan; Neubig, Graham; Stüker, Sebastian; Waibel, Alex; Self-attentional acoustic models. Interspeech 2018 |
Alex Waibel; Phoneme Recognition Using Time-Delay Neural Networks. Meeting of the Institute of Electrical, Information and Communication Engineers (IEICE), Tokyo, Japan, December 1987 |
Pham, Ngoc-Quan; Nguyen, Thai-Son; Niehues, Jan; Müller, Markus; Stüker, Sebastian; Waibel, Alexander; Very deep self-attention networks for end-to-end speech recognition. arXiv preprint arXiv:1904.13377, 2019 |
Sperber, Matthias; Neubig, Graham; Pham, Ngoc-Quan; Waibel, Alex; Self-attentional models for lattice inputs. arXiv preprint arXiv:1906.01617, 2019 |
Pham, Ngoc-Quan; Nguyen, Tuan-Nam; Stüker, Sebastian; Waibel, Alexander; Efficient weight factorization for multilingual speech recognition. Interspeech 2021 |
Nguyen, Thai-Son; Stueker, Sebastian; Niehues, Jan; Waibel, Alex; Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation. ICASSP-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020 |
Mullov, Carlos; Pham, Ngoc-Quan; Waibel, Alexander; Unsupervised transfer learning in multilingual neural machine translation with cross-lingual word embeddings. arXiv preprint arXiv:2103.06689, 2021 |
Pham, Ngoc-Quan; Nguyen, Tuan-Nam; Stüker, Sebastian; Waibel, Alexander; Efficient weight factorization for multilingual speech recognition. arXiv preprint arXiv:2105.03010, 2021 |
Bärmann, Leonard; Peller-Konrad, Fabian; Constantin, Stefan; Asfour, Tamim; Waibel, Alex; Deep episodic memory for verbalization of robot experience. IEEE Robotics and Automation Letters, 2021 |
Christian Huber, Juan Hussain, Tuan-Nam Nguyen, Kaihang Song, Sebastian Stüker, Alexander Waibel; Supervised Adaptation of Sequence-to-Sequence Speech Recognition Systems using Batch-Weighting. The 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing - AACL-IJCNLP, 2020 |
Speech Translation Systems |
---|
Waibel, Alexander; Lane, Ian R; System and methods for maintaining speech-to-speech translation in the field. US Patent 8,204,739, 2012 |
Waibel, Alex; Badran, Ahmed; Black, Alan W; Frederking, Robert; Gates, Donna; Lavie, Alon; Levin, Lori; Lenzo, Kevin; Tomokiyo, Laura Mayfield; Reichert, Juergen; Speechalator: Two-way speech-to-speech translation in your hand. Companion Volume of the Proceedings of HLT-NAACL 2003-Demonstrations, 2003 |
Waibel, Alex; Jain, Ajay; McNair, Arthur; Tebelskis, Joe; Osterholtz, Louise; Saito, Hiroaki; Schmidbauer, Otto; Sloboda, Tilo; Woszczyna, Monika; JANUS: Speech-to-speech translation using connectionist and non-connectionist techniques. Advances in neural information processing systems, 1991 |
Stüker, Sebastian; Paulik, Matthias; Kolss, Muntsin; Fügen, Christian; Waibel, Alex; Speech translation enhanced ASR for European Parliament speeches - on the influence of ASR performance on speech translation. IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP, 2007 |
Jain, AN; McNair, AE; Waibel, A; Saito, H; Hauptmann, AG; Tebelskis, J; Connectionist and symbolic processing in speech-to-speech translation: the JANUS system. Proceedings of Machine Translation Summit III: Papers, 1991 |
Kolss, Muntsin; Vogel, Stephan; Waibel, Alex; Stream decoding for simultaneous spoken language translation. Interspeech, 2008 |
Eck, Matthias; Lane, Ian; Zhang, Ying; Waibel, Alex; Jibbigo: Speech-to-speech translation on mobile devices. IEEE Spoken Language Technology Workshop, 2010 |
Agarwal, Milind; Agarwal, Sweta; Anastasopoulos, Antonios; Bentivogli, Luisa; Bojar, Ondřej; Borg, Claudia; Carpuat, Marine; Cattoni, Roldano; Cettolo, Mauro; Chen, Mingda; Findings of the IWSLT 2023 evaluation campaign. Association for Computational Linguistics, 2023 |
Koneru, Sai; Nguyen, Thai-Binh; Pham, Ngoc-Quan; Liu, Danni; Li, Zhaolin; Waibel, Alexander; Niehues, Jan; Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024. arXiv preprint arXiv:2406.16777, 2024 |
Enes Yavuz Ugan, Mohammed Mediani, Omar Al Jawabra, Aya Khader, Yining Liu, Alexander Waibel; Modular Design of a Front-End and Back-End Speech-to-Speech Translation Application for Psychiatric Treatment of Refugees, IEEE GHTC23, 2023 |
Simultaneous Speech Translation |
---|
Waibel, Alexander; Fuegen, Christian; Simultaneous translation of open domain lectures and speeches. US Patent 8,090,570, 2012 |
Fügen, Christian; Waibel, Alex; Kolss, Muntsin; Simultaneous translation of lectures and speeches. Machine translation, Springer Netherlands, 2007 |
Bails, Jennifer; No longer lost in translation, Pittsburgh Tribune Review, Oct. 18, 2005 |
WTAE News: Report on New Speech to Speech Technologies, USA, Oct. 2025 |
Cho, Eunah; Niehues, Jan; Waibel, Alex; Segmentation and punctuation prediction in speech language translation using a monolingual translation system. Proceedings of the 9th International Workshop on Spoken Language Translation: Papers, 2012 |
Cho, Eunah; Niehues, Jan; Kilgour, Kevin; Waibel, Alex; Punctuation insertion for real-time spoken language translation. Proceedings of the 12th International Workshop on Spoken Language Translation: Papers, 2015 |
Niehues, Jan; Nguyen, Thai Son; Cho, Eunah; Ha, Thanh-Le; Kilgour, Kevin; Müller, Markus; Sperber, Matthias; Stüker, Sebastian; Waibel, Alex; Dynamic Transcription for Low-Latency Speech Translation. Interspeech, 2016 |
Niehues, Jan; Cho, Eunah; Ha, Thanh-Le; Waibel, Alex; Pre-translation for neural machine translation. arXiv preprint arXiv:1610.05243, 2016 |
Cho, Eunah; Niehues, Jan; Waibel, Alex; NMT-Based Segmentation and Punctuation Insertion for Real-Time Spoken Language Translation. Interspeech, 2017 |
Niehues, Jan; Pham, Ngoc-Quan; Ha, Thanh-Le; Sperber, Matthias; Waibel, Alex; Low-latency neural speech translation. Interspeech 2018 |
Nguyen, Thai Son; Niehues, Jan; Cho, Eunah; Ha, Thanh-Le; Kilgour, Kevin; Müller, Markus; Sperber, Matthias; Stueker, Sebastian; Waibel, Alex; Low latency ASR for simultaneous speech translation. arXiv preprint arXiv:2003.09891, 2020 |
Nguyen, Thai-Son; Stüker, Sebastian; Waibel, Alex; Super-human performance in online low-latency recognition of conversational speech. arXiv preprint arXiv:2010.03449, 2020 |
Huber, Christian; Dinh, Tu Anh; Mullov, Carlos; Pham, Ngoc Quan; Nguyen, Thai Binh; Retkowski, Fabian; Constantin, Stefan; Ugan, Enes Yavuz; Liu, Danni; Li, Zhaolin; End-to-end evaluation for low-latency simultaneous speech translation. arXiv preprint arXiv:2308.03415, 2023 |
Multilingual Speech |
---|
Müller, Markus; Stüker, Sebastian; Waibel, Alex; Neural language codes for multilingual acoustic models. arXiv preprint arXiv:1807.01956, 2018 |
Müller, Markus; Stüker, Sebastian; Waibel, Alex; Neural codes to factor language in multilingual speech recognition. ICASSP-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019 |
Pham, Ngoc-Quan; Waibel, Alex; Niehues, Jan; Adaptive multilingual speech recognition with pretrained models. arXiv preprint arXiv:2205.12304, 2022 |
Huber, Christian; Ugan, Enes Yavuz; Waibel, Alexander; Code-switching without switching: Language agnostic end-to-end speech translation. arXiv preprint arXiv:2210.01512, 2022 |
Muller, Markus; Waibel, Alexander; Neural modulation codes for multilingual and style dependent speech and language processing. US Patent App. 17/312,496, 2022 |
Ugan, Enes Yavuz; Huber, Christian; Hussain, Juan; Waibel, Alexander; Language-agnostic Code-Switching in End-To-End Speech Recognition. CoRR, 2022 |
Ugan, Enes Yavuz; Pham, Ngoc-Quan; Waibel, Alex; DECM: Evaluating bilingual ASR performance on a code-switching/mixing benchmark. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024 |
Enes Yavuz Ugan, Ngoc-Quan Pham, Leonard Bärmann, Alex Waibel; |
Conversational Speech Generation |
---|
Ruede, Robin; Müller, Markus; Stüker, Sebastian; Waibel, Alex; Yeah, right, uh-huh: a deep learning backchannel predictor. Advanced social interaction with agents: 8th international workshop on spoken dialog systems, Springer International Publishing, 2019 |
Ruede, Robin; Müller, Markus; Stüker, Sebastian; Waibel, Alex; Enhancing Backchannel Prediction Using Word Embeddings. Interspeech 2017 |
Multimodal Processing |
---|
Yang, Jie; Waibel, Alex; A real-time face tracker. Proceedings Third IEEE Workshop on Applications of Computer Vision. WACV, 1996 |
Yang, Jie; Waibel, Alex; Tracking human faces in real-time. Carnegie-Mellon University. Department of Computer Science, 1995 |
Duchnowski, Paul; Meier, Uwe; Waibel, Alex; See me, hear me: integrating automatic speech recognition and lip-reading. ICSLP, 1994 |
Waibel, Alex; Stiefelhagen, Rainer; Carlson, Rolf; Casas, Joseph; Kleindienst, Jan; Lamel, Lori; Lanz, Oswald; Mostefa, Djamel; Omologo, Maurizio; Pianesi, Fabio; Computers in the human interaction loop. Handbook of Ambient Intelligence and Smart Environments, 2010 |
Stiefelhagen, Rainer; Yang, Jie; Waibel, Alex; Estimating focus of attention based on gaze and sound. Proceedings of the 2001 workshop on Perceptive user interfaces, 2001 |
Bub, Udo; Hunke, Martin; Waibel, Alex; Knowing who to listen to in speech recognition: Visually guided beamforming. International Conference on Acoustics, Speech, and Signal Processing, 1995 |
Yang, Jie; Gao, Jiang; Zhang, Ying; Chen, Xilin; Waibel, Alex; An automatic sign recognition and translation system. Proceedings of the 2001 workshop on Perceptive user interfaces, 2001 |
Zhang, Jing; Chen, Xilin; Hanneman, Andreas; Yang, Jie; Waibel, Alex; A robust approach for recognition of text embedded in natural scenes. International Conference on Pattern Recognition, 2002 |
Yang, Jie; Gao, Jiang; Zhang, Ying; Waibel, Alex; Towards automatic sign translation. Proceedings of the first international conference on Human language technology research, 2001 |
Waibel, Alexander; Translation and integration of presentation materials with cross-lingual multi-media support. US Patent 9,678,953, 2017 |
Waibel, Alex; Steusloff, Hartwig; Stiefelhagen, Rainer; CHIL: Computers in the human interaction loop, Proceedings of the 5th International Workshop on Image Analysis for Multimedia Interactive Services, WIAMIS 2004, Lisboa, Portugal, April 21-24, 2004 |
Waibel, Alexander; Multimodal dialogue processing for machine translation. The Handbook of Multimodal-Multisensor Interfaces: Language Processing, Software, Commercialization, and Emerging Directions-Volume 3, 2019 |
Continually Learning |
---|
Waibel, Alexander; Lane, Ian R; Enhanced speech-to-speech translation system and methods for adding a new word, US Patent 8,972,268, 2015 |
Suhm, Bernhard; Woszczyna, Monika; Waibel, Alex; Detection and transcription of new words, Eurospeech, 1993 |
Eck, Matthias; Vogel, Stephan; Waibel, Alex; Low cost portability for statistical machine translation based on n-gram frequency and tf-idf, Proceedings of the Second International Workshop on Spoken Language Translation, 2005 |
Eck, Matthias; Vogel, Stephan; Waibel, Alex; Communicating Unknown Words in Machine Translation, LREC, 2008 |
Waibel, Alex; Schultz, Tanja; Vogel, Stephan; Fügen, C; Honal, Matthias; Kolss, Muntsin; Reichert, Jürgen; Stüker, S; Towards language portability in statistical speech translation, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004 |
Waibel, Alex; Paulik, Matthias; Systems and methods for training statistical speech translation systems from speech utilizing a universal speech recognizer, US Patent 8,898,052, 2014 |
Pham, Ngoc-Quan; Niehues, Jan; Waibel, Alex; Towards one-shot learning for rare-word translation with external experts, arXiv preprint arXiv:1809.03182, 2018 |
Pham, Ngoc-Quan; Niehues, Jan; Ha, Thanh-Le; Waibel, Alex; Improving zero-shot translation with language-independent constraints, arXiv preprint arXiv:1906.08584, 2019 |
Pham, Ngoc-Quan; Niehues, Jan; Waibel, Alexander; Towards continually learning new languages, arXiv preprint arXiv:2211.11703, 2022 |
Eck, Matthias; Vogel, Stephan; Waibel, Alex; Low cost portability for statistical machine translation based on n-gram coverage, Proceedings of Machine Translation Summit: Papers, 2005 |
Huber, Christian; Hussain, Juan; Stüker, Sebastian; Waibel, Alexander; Instant one-shot word-learning for context-specific neural sequence-to-sequence speech recognition, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021 |
Waibel, Alexander; Translation training with cross-lingual multi-media support, US Patent 11,256,882, 2022 |
Huber, Christian; Kumar, Rishu; Bojar, Ondřej; Waibel, Alexander; Short-Term Word-Learning in a Dynamically Changing Environment, arXiv preprint arXiv:2203.15404, 2022 |
Bärmann, Leonard; Kartmann, Rainer; Peller-Konrad, Fabian; Niehues, Jan; Waibel, Alex; Asfour, Tamim; Incremental learning of humanoid robot behavior from natural interaction and large language models, Frontiers in Robotics and AI, 2024 |
Huber, Christian; Waibel, Alexander; Continuously Learning New Words in Automatic Speech Recognition, ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025 |
Mullov, Carlos; Pham, Ngoc-Quan; Waibel, Alexander; Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages, arXiv preprint arXiv:2408.02290, 2024 |
Christian Huber, Alexander Waibel; Handling Numeric Expressions in Automatic Speech Recognition. |
Emotion in Speech |
---|
Polzin, Thomas S; Waibel, Alexander; Emotion-sensitive human-computer interfaces. ISCA tutorial and research workshop (ITRW) on speech and emotion, 2000 |
Dellaert, Frank; Polzin, Thomas; Waibel, Alex; Recognizing emotion in speech. Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP, 1996 |
Polzin, Thomas S; Waibel, Alex; Detecting emotions in speech. Proceedings of the CMC, 1998 |
Error Repair and Correction |
---|
Waibel, Alexander; Suhm, Bernhard; McNair, Arthur; Method and apparatus for correcting and repairing machine-transcribed input using independent or cross-modal secondary input. US Patent 5,855,000, 1998 |
Suhm, Bernhard; Myers, Brad; Waibel, Alex; Multimodal error correction for speech user interfaces. ACM transactions on computer-human interaction (TOCHI), 2001 |
Waibel, Alex H; McNair, Arthur E; Locating and correcting erroneously recognized portions of utterances by rescoring based on two n-best lists. US Patent 5,712,957, 1998 |
Soltau, Hagen; Waibel, Alex; On the influence of hyperarticulated speech on recognition performance. ICSLP, 1998 |
Suhm, Bernhard; Myers, Brad; Waibel, Alex; Interactive recovery from speech recognition errors in speech user interfaces. Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP, 1996 |
Soltau, Hagen; Waibel, Alex; Specialized acoustic models for hyperarticulated speech. IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings, 2000 |
Soltau, Hagen; Metze, Florian; Waibel, Alex; Compensating for hyperarticulation by modeling articulatory properties. Interspeech, 2002 |
Hurst, Wolfgang; Yang, Jie; Waibel, Alex; Error repair in human handwriting: an intelligent user interface for automatic online handwriting recognition. Proceedings. IEEE International Joint Symposia on Intelligence and Systems, 1998 |
Suhm, Bernhard; Waibel, Alex; Exploiting repair context in interactive error recovery. Eurospeech, 1997 |
Constantin, Stefan; Waibel, Alex; Error correction and extraction in request dialogs. 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022), 2020 |
Constantin, Stefan; Eyiokur, Fevziye Irem; Yaman, Dogucan; Bärmann, Leonard; Waibel, Alex; Multimodal Error Correction with Natural Language and Pointing Gestures. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023 |
Stefan Constantin, Alex Waibel; Comparison of Error Correction and Extraction Approaches. Signals and Communication Technology (Springer) - Practical Solutions for Diverse Real-World NLP Applications, 2024 |
Meeting Recognition and Processing |
---|
Waibel, Alex; Bett, Michael; Metze, Florian; Ries, Klaus; Schaaf, Thomas; Schultz, Tanja; Soltau, Hagen; Yu, Hua; Zechner, Klaus; Advances in automatic meeting record creation and access. IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings, 2001 |
Waibel, Alex; Bett, Michael; Finke, Michael; Stiefelhagen, Rainer; Meeting browser: Tracking and summarizing meetings. Proceedings of the DARPA broadcast news workshop, 1998 |
Stiefelhagen, Rainer; Yang, Jie; Waibel, Alex; Modeling focus of attention for meeting indexing based on multiple cues. IEEE Transactions on Neural Networks, 2002 |
Stiefelhagen, Rainer; Yang, Jie; Waibel, Alex; Modeling focus of attention for meeting indexing. Proceedings of the seventh ACM international conference on Multimedia, 1999 |
Bett, Michael; Gross, Ralph; Yu, Hua; Zhu, Xiaojin; Pan, Yue; Yang, Jie; Waibel, Alex; Multimodal Meeting Tracker. RIAO, Paris, France, 2000 |
Gross, Ralph; Bett, Michael; Yu, Hua; Zhu, Xiaojin; Pan, Yue; Yang, Jie; Waibel, Alex; Towards a multimodal meeting record. IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia, 2000 |
Waibel, Alex; Yu, Hua; Schultz, Tanja; Pan, Yue; Bett, Michael; Westphal, Martin; Soltau, Hagen; Schaaf, Thomas; Metze, Florian; Advances in meeting recognition. Proceedings of the First International Conference on Human Language Technology Research, 2001 |
Springer, Shane Paul; Waibel, Alexander; Providing instant processing of virtual meeting recordings. US Patent App. 17/732,891, 2023 |
Robust Speech |
---|
Nguyen, Thai-Binh; Waibel, Alexander; Convoifilter: A case study of doing cocktail party speech recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), 2024 |
Nguyen, Thai-Binh; Waibel, Alexander; MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models. ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025 |
Accent Conversion |
---|
Nguyen, Tuan-Nam; Pham, Ngoc-Quan; Waibel, Alexander; Accent Conversion using Pre-trained Model and Synthesized Data from Voice Conversion, Interspeech 2022 |
Nguyen, Tuan-Nam; Pham, Ngoc-Quan; Waibel, Alexander; Syntacc: Synthesizing multi-accent speech by weight factorization, ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023 |
Nguyen, Tuan Nam; Waibel, Alexander; Accent conversion for virtual conferences, |
Nguyen, Tuan Nam; Pham, Ngoc Quan; Waibel, Alexander; Accent conversion using discrete units with parallel data synthesized from controllable accented TTS, arXiv preprint arXiv:2410.03734, 2024 |
Nguyen, Tuan Nam; Akti, Seymanur; Pham, Ngoc Quan; Waibel, Alexander; Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS, ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025 |
Nguyen, Tuan Nam; Waibel, Alexander; Synthesizing multi-accent speech using adaptive weights, US Patent App. 18/205,287, 2024 |
Speech Summarization |
---|
Hori, Chiori; Furui, Sadaoki; Malkin, Rob; Yu, Hua; Waibel, Alex; Automatic speech summarization applied to English broadcast news speech. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002 |
Zechner, Klaus; Waibel, Alex; DIASUMM: Flexible summarization of spontaneous dialogues in unrestricted domains. COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics, 2000 |
Hori, Chiori; Furui, Sadaoki; Malkin, Rob; Yu, Hua; Waibel, Alex; Automatic summarization of English broadcast news speech. Proceedings of the second international conference on Human Language Technology Research, 2002 |
Retkowski, Fabian; Waibel, Alexander; From text segmentation to smart chaptering: A novel benchmark for structuring video transcriptions. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 406–419, St. Julian’s, Malta. Association for Computational Linguistics, 2024 |
Retkowski, Fabian; Waibel, Alexander; Zero-Shot Strategies for Length-Controllable Summarization. arXiv preprint arXiv:2501.00233, 2024 |
Retkowski, Fabian; Züfle, Maike; Sudmann, Andreas; Pfau, Dinah; Niehues, Jan; Waibel, Alexander; From Speech to Summary: A Comprehensive Survey of Speech Summarization. arXiv preprint arXiv:[]., 2025 |
Face Dubbing |
---|
Ritter, Max; Meier, Uwe; Yang, Jie; Waibel, Alex; Face translation: A multimodal translation agent. AVSP, 1999 |
Waibel, Alexander; Behr, Moritz; Yaman, Dogucan; Eyiokur, Fevziye Irem; Nguyen, Tuan-Nam; Mullov, Carlos; Demirtas, Mehmet Arif; Kantarci, Alperen; Constantin, Stefan; Ekenel, Hazim Kemal; Face-dubbing++: Lip-synchronous, voice preserving translation of videos. IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), 2023 |
Yaman, Dogucan; Eyiokur, Fevziye Irem; Bärmann, Leonard; Akti, Seymanur; Ekenel, Hazım Kemal; Waibel, Alexander; Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation. Proceedings Of The IEEE/CVF Conference On Computer Vision And Pattern Recognition, 2024 |
Yaman, Dogucan; Eyiokur, Fevziye Irem; Bärmann, Leonard; Ekenel, Hazım Kemal; Waibel, Alexander; Audio-driven Talking Face Generation with Stabilized Synchronization Loss. European Conference on Computer Vision, 2024 |
Eyiokur, Fevziye Irem; Huber, Christian; Nguyen, Thai-Binh; Nguyen, Tuan-Nam; Retkowski, Fabian; Ugan, Enes Yavuz; Yaman, Dogucan; Waibel, Alexander; Titanic Calling: Low Bandwidth Video Conference from the Titanic Wreck. Frontiers in Robotics and AI, 2024 |