Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/18112
Title: | Quality Improvement of Vietnamese HMM-Based Speech Synthesis System Based on Decomposition of Naturalness and Intelligibility using Non-negative Matrix Factorization |
Authors: | Dinh, Anh-Tuan; Phan, Thanh-Son; Akagi, Masato |
Keywords: | Hidden Markov model; non-negative matrix factorization; naturalness-intelligibility decomposition |
Issue Date: | 2016-11-12 |
Publisher: | Springer International Publishing |
Magazine name: | Advances in Information and Communication Technology |
Start page: | 490 |
End page: | 499 |
DOI: | 10.1007/978-3-319-49073-1_53 |
Abstract: | Hidden Markov model (HMM)-based synthesized speech is intelligible but not natural, especially under limited-data conditions, because the speech spectra and F0 envelope are over-smoothed. One solution is to use voice conversion methods to convert over-smoothed speech parameters into natural ones. Although conventional conversion methods transform the speech spectra and F0 envelope toward natural ones to improve naturalness, they introduce unexpected distortions that degrade the otherwise acceptable intelligibility of synthesized speech, e.g., by destroying tonal information. The aim of this study is to develop a method that improves naturalness without compromising acceptable intelligibility, by employing our novel asymmetric bilinear model (ABM) involving non-negative matrix factorization (NMF) to separate the naturalness and intelligibility of synthesized speech. Subjective evaluations carried out on Vietnamese data confirm that the achieved synthesis quality is higher than that of other methods under limited-data conditions. Moreover, the proposed method is capable of modifying the over-smoothed F0 envelope without destroying tonal information. |
Rights: | Copyright (C) 2017 Springer International Publishing AG. This is the author-created version of Springer, Anh-Tuan Dinh, Thanh-Son Phan & Masato Akagi, Advances in Information and Communication Technology, 2017, 490–499. The final publication is available at http://link.springer.com, https://doi.org/10.1007/978-3-319-49073-1_53 |
URI: | http://hdl.handle.net/10119/18112 |
Material Type: | author |
Appears in Collections: | b10-1. 雑誌掲載論文 (Journal Articles)
|
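For readers unfamiliar with the non-negative matrix factorization (NMF) named in the abstract, the following is a minimal, generic sketch of NMF using the standard multiplicative-update rules for the squared-error objective. It is not the authors' asymmetric bilinear model, and it does not reproduce their naturalness-intelligibility decomposition. The function name `nmf`, its parameters, and the random toy matrix are assumptions chosen purely for illustration; no speech data is involved.

```python
import numpy as np


def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (m x n) as W @ H with W, H >= 0.

    Generic multiplicative updates for the Frobenius-norm objective.
    Illustrative only; not the model described in the paper.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Strictly positive random initialisation (hypothetical choice).
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update rescales the factor element-wise, so it stays non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(V, W, H):
    """Relative Frobenius reconstruction error of the factorization."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)


# Toy non-negative matrix with an exact rank-3 non-negative structure.
toy_rng = np.random.default_rng(1)
V = toy_rng.random((20, 3)) @ toy_rng.random((3, 50))

W, H = nmf(V, rank=3)
rel_err = relative_error(V, W, H)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In a speech setting, the columns of `V` would typically be non-negative spectral frames, but how the paper assigns factors to naturalness and intelligibility is described only in the full text, not in this record.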