JAIST Repository: Non-parallel Voice Conversion based on Hierarchical Latent Embedding Vector Quantized Variational Autoencoder

Please use this identifier to cite or link to this item: https://hdl.handle.net/10119/17027

Title:	Non-parallel Voice Conversion based on Hierarchical Latent Embedding Vector Quantized Variational Autoencoder
Authors:	Ho, Tuan Vu; Akagi, Masato
Keywords:	Voice Conversion Challenge 2020; cross-lingual; variational autoencoder; hierarchical structure
Issue Date:	2020-10-30
Publisher:	International Speech Communication Association
Journal Title:	Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020
Start Page:	140
End Page:	144
DOI:	10.21437/VCC_BC.2020-20
Abstract:	This paper proposes a hierarchical latent embedding structure for the Vector Quantized Variational Autoencoder (VQVAE) to improve the performance of non-parallel voice conversion (NPVC) models. Previous studies on NPVC based on the vanilla VQVAE use a single codebook to encode linguistic information at a fixed temporal scale. However, linguistic structure contains different semantic levels (e.g., phoneme, syllable, word) that span various temporal scales. As a result, the converted speech may contain unnatural pronunciations, which can degrade its naturalness. To tackle this problem, we propose a hierarchical latent embedding structure comprising several vector quantization blocks that operate at different temporal scales. When trained with a multi-speaker database, our proposed model can encode voice characteristics into a speaker embedding vector, which can be used in one-shot learning settings. Results from objective and subjective tests indicate that our proposed model outperforms the conventional VQVAE-based model in both intra-lingual and cross-lingual conversion tasks. The official results from Voice Conversion Challenge 2020 show that our proposed model achieved the highest naturalness among autoencoder-based models in both tasks. Our implementation is being made available at https://github.com/tuanvu92/VCC2020.
Rights:	Copyright (C) 2020 International Speech Communication Association. Ho, T.V., Akagi, M. (2020) Non-parallel Voice Conversion based on Hierarchical Latent Embedding Vector Quantized Variational Autoencoder. Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp.140-144, DOI: 10.21437/VCC_BC.2020-20. http://dx.doi.org/10.21437/VCC_BC.2020-20
URI:	https://hdl.handle.net/10119/17027
Material Type:	publisher
Appears in Collections:	b11-1. Conference Papers
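
The abstract above describes several vector quantization blocks operating at different temporal scales. The following is a minimal sketch of that idea only, not the authors' implementation (which is linked in the abstract). It assumes average pooling for temporal downsampling, random untrained codebooks, and nearest-neighbour lookup under squared Euclidean distance; the names `nearest_code`, `average_pool`, `hierarchical_quantize`, and the pooling factors are hypothetical and chosen purely for illustration.

```python
import random


def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared Euclidean)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def average_pool(frames, factor):
    """Downsample a frame sequence in time by averaging groups of `factor` frames."""
    pooled = []
    for start in range(0, len(frames), factor):
        group = frames[start:start + factor]
        dim = len(group[0])
        pooled.append([sum(f[d] for f in group) / len(group) for d in range(dim)])
    return pooled


def hierarchical_quantize(frames, codebooks, factors):
    """Quantize `frames` once per temporal scale, each scale with its own codebook.

    Returns one sequence of codebook indices per scale. This is an
    illustrative simplification (assumption), not the model from the paper.
    """
    indices_per_scale = []
    for codebook, factor in zip(codebooks, factors):
        pooled = average_pool(frames, factor)
        indices_per_scale.append([nearest_code(v, codebook) for v in pooled])
    return indices_per_scale


# Toy input: 16 frames of 4-dimensional features (hypothetical sizes).
random.seed(0)
frames = [[random.gauss(0, 1) for _ in range(4)] for _ in range(16)]

# One untrained codebook of 8 entries per scale; factors 1, 2, 4 are assumptions.
codebooks = [[[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
             for _ in range(3)]
factors = [1, 2, 4]

indices = hierarchical_quantize(frames, codebooks, factors)
print([len(seq) for seq in indices])  # [16, 8, 4]
```

Coarser scales produce shorter index sequences, so each codebook summarizes a longer stretch of speech. A trained model would learn the codebooks and the downsampling encoders jointly rather than use fixed pooling.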

Files in This Item:

File	Description	Size	Format
3400.pdf		667 kB	Adobe PDF

All items held in this repository are protected by copyright.
