JAIST Repository >
b. 情報科学研究科・情報科学系 >
b10. 学術雑誌論文等 >
b10-1. 雑誌掲載論文 >

このアイテムの引用には次の識別子を使用してください: http://hdl.handle.net/10119/11535

タイトル: Improving Naturalness of HMM-Based TTS Trained with Limited Data by Temporal Decomposition
著者: PHUNG, Trung-Nghia
PHAN, Thanh-Son
VU, Thang Tat
LUONG, Mai Chi
AKAGI, Masato
キーワード: text to speech
HMM-based TTS
hybrid TTS
limited data
temporal decomposition
発行日: 2013-11-01
出版者: 電子情報通信学会
誌名: IEICE TRANSACTIONS on Information and Systems
巻: E96-D
号: 11
開始ページ: 2417
終了ページ: 2426
抄録: The most important advantage of HMM-based TTS is its highly intelligible. However, speech synthesized by HMM-based TTS is muffled and far from natural, especially under limited data conditions, which is mainly caused by its over-smoothness. Therefore, the motivation for this paper is to improve the naturalness of HMM-based TTS trained under limited data conditions while preserving its intelligibility. To achieve this motivation, a hybrid TTS between HMM-based TTS and the modified restricted Temporal Decomposition (MRTD), named HTD in this paper, was proposed. Here, TD is an interpolation model of decomposing a spectral or prosodic sequence of speech into sparse event targets and dynamic event functions, and MRTD is one simplified version of TD. With a determination of event functions close to the concept of co-articulation in speech, MRTD can synthesize smooth speech and the smoothness in synthesized speech can be adjusted by manipulating event targets of MRTD. Previous studies have also found that event functions of MRTD can represent linguistic information of speech, which is important to perceive speech intelligibility, while sparse event targets can convey the non-linguistics information, which is important to perceive the naturalness of speech. Therefore, prosodic trajectories and MRTD event functions of the spectral trajectory generated by HMM-based TTS were kept unchanged to preserve the high and stable intelligibility of HMM-based TTS. Whereas MRTD event targets of the spectral trajectory generated by HMM-based TTS were rendered with an original speech database to enhance the naturalness of synthesized speech. Experimental results with small Vietnamese datasets revealed that the proposed HTD was equivalent to HMM-based TTS in terms of intelligibility but was superior to it in terms of naturalness. Further discussions show that HTD had a small footprint. Therefore, the proposed HTD showed its strong efficiency under limited data conditions.
Rights: Copyright (C)2013 IEICE. Trung-Nghia PHUNG, Thanh-Son PHAN, Thang Tat VU, Mai Chi LUONG, and Masato AKAGI, IEICE TRANSACTIONS on Information and Systems, E96-D(11), 2013, 2417-2426. http://www.ieice.org/jpn/trans_online/
URI: http://hdl.handle.net/10119/11535
資料タイプ: publisher
出現コレクション:b10-1. 雑誌掲載論文 (Journal Articles)

このアイテムのファイル:

ファイル 記述 サイズ形式
IEICE-D2013_Nghia.pdf2048KbAdobe PDF見る/開く

当システムに保管されているアイテムはすべて著作権により保護されています。

 


お問い合わせ先 : 北陸先端科学技術大学院大学 研究推進課図書館情報係