JAIST Repository: Emotional speech synthesis system based on a three-layered model using a dimensional approach

Please use this identifier to cite or link to this item: https://hdl.handle.net/10119/14744

Title:	Emotional speech synthesis system based on a three-layered model using a dimensional approach
Authors:	Xue, Yawen; Hamada, Yasuhiro; Akagi, Masato
Issue Date:	2015-12-19
Publisher:	Institute of Electrical and Electronics Engineers (IEEE)
Journal:	2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
Start page:	505
End page:	514
DOI:	10.1109/APSIPA.2015.7415323
Abstract:	This paper proposes an emotional speech synthesis system based on a three-layered model using a dimensional approach. Most previous studies of emotional speech synthesis using the dimensional approach focused only on the relationship between acoustic features and emotion dimensions (valence and activation). However, people do not perceive emotion directly from acoustic features. Hence, the acoustic features have been particularly difficult to predict, and the affective impression of the synthesized speech is far from the intended one. The ultimate goal of this research is to improve the accuracy of acoustic feature estimation and of the modification rules in order to synthesize affective speech that is closer to the intended point in the dimensional emotion space. The proposed system is composed of three layers: acoustic features, semantic primitives, and emotion dimensions. A fuzzy inference system (FIS) is used to connect the three layers. The acoustic features related to each semantic primitive are selected for synthesizing the emotional speech. On the basis of morphing rules, the estimated acoustic features can be applied to synthesize emotional speech. Listening tests were carried out to verify whether the synthesized speech gives the intended impression in the dimensional emotion space. The results show that the proposed method not only raises the accuracy of the estimated acoustic features but also yields modification rules that work well for the synthesized speech, thereby improving the quality of the synthesized speech.
Rights:	Copyright (C) 2015 APSIPA. This material is posted here with permission of APSIPA. Yawen Xue, Yasuhiro Hamada and Masato Akagi, 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015, 505-514. http://dx.doi.org/10.1109/APSIPA.2015.7415323
URI:	https://hdl.handle.net/10119/14744
Material type:	author
Appears in Collections:	b11-1. Conference Papers and Presentation Materials
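The abstract states that a fuzzy inference system links the three layers but gives no membership functions, rule base, or feature list. The sketch below only illustrates, under stated assumptions, how a fuzzy inference chain could run in the synthesis direction (emotion dimensions → semantic primitive → acoustic feature). Everything in it is hypothetical: the zero-order Sugeno inference type, the [-2, 2] rating scale, the triangular fuzzy sets, the single primitive "bright", the rule outputs, and the F0 scaling ratio, as well as the names `tri`, `sugeno`, `SETS`, `BRIGHT_RULES`, `F0_RULES`, and `estimate_f0_ratio`. The paper's actual FIS may differ in type and structure.

```python
from itertools import product

def tri(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets on a [-2, 2] rating scale (assumption, not from the paper).
# Inputs are assumed to stay within [-2, 2]; outside it no rule fires.
SETS = {"low": (-4.0, -2.0, 0.0), "mid": (-2.0, 0.0, 2.0), "high": (0.0, 2.0, 4.0)}
CENTERS = {name: abc[1] for name, abc in SETS.items()}

def sugeno(inputs, rules):
    """Zero-order Sugeno inference: min as fuzzy AND, weighted average of crisp rule outputs."""
    num = den = 0.0
    for antecedent, z in rules.items():
        # Firing strength of this rule: the weakest membership among its inputs.
        w = min(tri(x, *SETS[name]) for x, name in zip(inputs, antecedent))
        num += w * z
        den += w
    return num / den if den > 0 else 0.0

# Layer 3 -> layer 2 (hypothetical): (valence, activation) -> "bright" primitive.
BRIGHT_RULES = {
    (v, a): 0.7 * CENTERS[v] + 0.3 * CENTERS[a]
    for v, a in product(CENTERS, repeat=2)
}

# Layer 2 -> layer 1 (hypothetical): "bright" -> mean F0 scaling ratio.
F0_RULES = {("low",): 0.85, ("mid",): 1.0, ("high",): 1.15}

def estimate_f0_ratio(valence, activation):
    """Chain the two hypothetical FIS stages: emotion dimensions -> acoustic feature."""
    bright = sugeno((valence, activation), BRIGHT_RULES)
    return sugeno((bright,), F0_RULES)

if __name__ == "__main__":
    for point in [(-2.0, -2.0), (0.0, 0.0), (1.0, 0.5), (2.0, 2.0)]:
        print(point, round(estimate_f0_ratio(*point), 3))
```

With these made-up rules, the neutral point (0, 0) leaves F0 unchanged (ratio 1.0), while (-2, -2) lowers it to about 0.85. Routing the estimate through an intermediate primitive, instead of mapping valence and activation straight to F0, mirrors the abstract's argument that listeners do not perceive emotion directly from acoustic features.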

Files in this item:

File	Description	Size	Format
APSIPA2015_Xue.pdf		1383 KB	Adobe PDF

All items stored in this system are protected by copyright.

Contact: Japan Advanced Institute of Science and Technology (JAIST), Research Promotion Section, Academic Information Unit (ir-sys[at]ml.jaist.ac.jp)