Please use this identifier to cite or link to this item:
http://hdl.handle.net/10119/17027
Title: | Non-parallel Voice Conversion based on Hierarchical Latent Embedding Vector Quantized Variational Autoencoder |
Authors: | Ho, Tuan Vu; Akagi, Masato |
Keywords: | Voice Conversion Challenge 2020; cross-lingual; variational autoencoder; hierarchical structure |
Issue date: | 2020-10-30 |
Publisher: | International Speech Communication Association |
Journal: | Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020 |
Start page: | 140 |
End page: | 144 |
DOI: | 10.21437/VCC_BC.2020-20 |
Abstract: | This paper proposes a hierarchical latent embedding structure for the Vector Quantized Variational Autoencoder (VQVAE) to improve the performance of non-parallel voice conversion (NPVC). Previous NPVC studies based on the vanilla VQVAE use a single codebook to encode linguistic information at a fixed temporal scale. However, linguistic structure comprises several semantic levels (e.g., phoneme, syllable, word) that span different temporal scales, which a single fixed-scale codebook may fail to capture; as a result, the converted speech may contain unnatural pronunciations that degrade its naturalness. To tackle this problem, we propose a hierarchical latent embedding structure comprising several vector quantization blocks that operate at different temporal scales. When trained on a multi-speaker database, the proposed model can encode voice characteristics into a speaker embedding vector, which can be used in one-shot learning settings. Results from objective and subjective tests indicate that the proposed model outperforms the conventional VQVAE-based model in both intra-lingual and cross-lingual conversion tasks. The official results of the Voice Conversion Challenge 2020 show that the proposed model achieved the highest naturalness among autoencoder-based models in both tasks. Our implementation is being made available at https://github.com/tuanvu92/VCC2020. |
Rights: | Copyright (C) 2020 International Speech Communication Association. Ho, T.V., Akagi, M. (2020) Non-parallel Voice Conversion based on Hierarchical Latent Embedding Vector Quantized Variational Autoencoder. Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp.140-144, DOI: 10.21437/VCC_BC.2020-20. http://dx.doi.org/10.21437/VCC_BC.2020-20 |
URI: | http://hdl.handle.net/10119/17027 |
Text version: | publisher |
Appears in collections: | b11-1. Conference Papers and Presentations (Conference Papers) |
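The abstract describes several vector quantization blocks that operate at different temporal scales. The following minimal Python sketch illustrates that general idea only. It assumes average pooling for downsampling, nearest-neighbour codebook lookup under squared Euclidean distance, and upsampling by repetition. The function names, pooling factors and toy codebooks are hypothetical and are not taken from the paper or its repository.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def avg_pool(frames: List[Vector], factor: int) -> List[Vector]:
    """Average consecutive groups of `factor` frames (a coarser temporal scale)."""
    pooled = []
    for start in range(0, len(frames), factor):
        group = frames[start:start + factor]
        dim = len(group[0])
        pooled.append([sum(f[d] for f in group) / len(group) for d in range(dim)])
    return pooled


def quantize(frames: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Map each frame to its nearest codebook entry; return indices and code vectors."""
    indices = [min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
               for f in frames]
    return indices, [codebook[i] for i in indices]


def hierarchical_vq(frames: List[Vector], codebooks: List[List[Vector]],
                    factors: List[int]) -> List[Tuple[List[int], List[Vector]]]:
    """Hypothetical multi-scale VQ: one codebook per temporal scale.

    For each level, pool the frames by `factor`, quantize the pooled frames,
    then repeat each code `factor` times so it lines up with the input frame rate.
    """
    levels = []
    for codebook, factor in zip(codebooks, factors):
        idx, quantized = quantize(avg_pool(frames, factor), codebook)
        upsampled = [q for q in quantized for _ in range(factor)][:len(frames)]
        levels.append((idx, upsampled))
    return levels


if __name__ == "__main__":
    frames = [[0.0], [0.1], [1.0], [0.9]]          # toy 1-D "encoder outputs"
    books = [[[0.0], [1.0]], [[0.0], [1.0]], [[0.0], [0.5], [1.0]]]
    for idx, _ in hierarchical_vq(frames, books, [1, 2, 4]):
        print(idx)
```

In this toy run, the finest level assigns one code per frame, the middle level one code per pair of frames, and the coarsest level a single code for the whole sequence. This is the sense in which each codebook summarizes a different temporal span.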
Files in this item:
File | Description | Size | Format |
3400.pdf | | 667 KB | Adobe PDF |
All items stored in this system are protected by copyright.