
Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/14744

Title: Emotional speech synthesis system based on a three-layered model using a dimensional approach
Authors: Xue, Yawen; Hamada, Yasuhiro; Akagi, Masato
Issue Date: 2015-12-19
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Conference name: 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
Start page: 505
End page: 514
DOI: 10.1109/APSIPA.2015.7415323
Abstract: This paper proposes an emotional speech synthesis system based on a three-layered model using a dimensional approach. Most previous studies of emotional speech synthesis using the dimensional approach focused only on the relationship between acoustic features and emotion dimensions (valence and activation). However, people do not perceive emotion directly from acoustic features. Hence, the acoustic features have been particularly difficult to predict, and the affective impression of the synthesized sound is far from that intended. The ultimate goal of this research is to improve the accuracy of acoustic feature estimation and of the modification rules in order to synthesize affective speech closer to that intended in the dimensional emotion space. The proposed system is composed of three layers: acoustic features, semantic primitives, and emotion dimensions. A Fuzzy Inference System (FIS) is used to connect the three layers. The acoustic features related to each semantic primitive are selected for synthesizing the emotional speech. On the basis of morphing rules, the estimated acoustic features can be applied to synthesize emotional speech. Listening tests were carried out to verify whether the synthesized speech gives the intended impression in the dimensional emotion space. Results show that the accuracy of the estimated acoustic features is raised and that the modification rules work well for the synthesized speech, so the proposed method improves the quality of the synthesized speech.
Rights: Copyright (C) 2015 APSIPA. This material is posted here with permission of APSIPA. Yawen Xue, Yasuhiro Hamada and Masato Akagi, 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015, 505-514. http://dx.doi.org/10.1109/APSIPA.2015.7415323
URI: http://hdl.handle.net/10119/14744
Material Type: author
Appears in Collections: b11-1. 会議発表論文・発表資料 (Conference Papers)

Files in This Item:

File: APSIPA2015_Xue.pdf
Size: 1383Kb
Format: Adobe PDF
