About me

I earned my Ph.D. from Xi’an Jiaotong University under the guidance of Prof. Jihua Zhu. During my doctoral studies, I was a visiting researcher at the Multimedia Computing Group of Delft University of Technology under the guidance of Prof. Odette Scharenborg, and at ASLP of Northwestern Polytechnical University under the guidance of Prof. Lei Xie. I currently work as a researcher at a video game company in Shanghai. My primary focus is speech generation, including text-to-speech synthesis and voice conversion.

Selected Publications

Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie, et al. MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2023.
Xinsheng Wang, Qicong Xie, Jihua Zhu, Lei Xie, and Odette Scharenborg. AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Persons [J]. IEEE Transactions on Multimedia (TMM), 2022.
Tao Li, Xinsheng Wang, Qicong Xie, Zhichao Wang, Lei Xie. Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2022.
Yi Lei, Shan Yang, Xinsheng Wang, Lei Xie. MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2022.
Xinsheng Wang, Tingting Qiao, Jihua Zhu, et al. Generating Images From Spoken Descriptions [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2021.
Xinsheng Wang, Justin van der Hout, Jihua Zhu, et al. Synthesizing spoken descriptions of images [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2021.

Zhichao Wang, Xinsheng Wang, Lei Xie, et al. Delivering Speaking Style in Low-Resource Voice Conversion with Multi-Factor Constraints. ICASSP, 2023.
Yi Lei, Shan Yang, Xinsheng Wang, Qicong Xie, et al. UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis. AAAI, 2023.
Heyang Xue, Xinsheng Wang, Yongmao Zhang, et al. Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher. Interspeech, 2022.
Xinsheng Wang, Yu Wang, et al. Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis. Interspeech, 2022.
Xinsheng Wang, Siyuan Feng, Jihua Zhu. Show and Speak: Directly Synthesize Spoken Description of Images. ICASSP, 2021.
Xinsheng Wang, Ruijian Jia, Shanmin Pang, et al. Look, Listen and Infer. ACM MM, 2021.
Liming Wang, Xinsheng Wang, Mark Hasegawa-Johnson, et al. Align or attend? Toward more efficient and accurate spoken word discovery using speech-to-image retrieval. ICASSP, 2021.
Xinsheng Wang, Tian Tian, Jihua Zhu, Odette Scharenborg. Learning Fine-grained Semantics in Spoken Language Using Visual Grounding. IEEE ISCAS, 2021.
Xinsheng Wang, Tingting Qiao, Jihua Zhu, et al. S2IGAN: Speech-to-Image Generation via Adversarial Learning. Interspeech, 2020.