Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/18158

Title: Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection
Authors: Li, Kai
Li, Sheng
Lu, Xugang
Akagi, Masato
Liu, Meng
Zhang, Lin
Zeng, Chang
Wang, Longbiao
Dang, Jianwu
Unoki, Masashi
Keywords: fake audio detection
data augmentation
McAdams coefficients
speaker anonymization
Issue Date: 2022-09
Publisher: International Speech Communication Association
Journal Title: Proc. InterSpeech 2022
Start Page: 664
End Page: 668
DOI: 10.21437/Interspeech.2022-10088
Abstract: Fake audio detection (FAD) is a technique for distinguishing synthetic speech from natural speech. In most FAD systems, it is essential to remove irrelevant features from acoustic speech while keeping only robust discriminative features. Intuitively, speaker information entangled in acoustic speech should be suppressed for the FAD task. In a deep neural network (DNN)-based FAD system in particular, the model may learn speaker information from the training dataset and then fail to generalize well to the testing dataset. In this paper, we propose using a speaker anonymization (SA) technique to suppress speaker information in acoustic speech before it is input to a DNN-based FAD system. We adopted the McAdams-coefficient-based SA (MC-SA) algorithm, with the expectation that the entangled speaker information will not be involved in the DNN-based FAD learning. Based on this idea, we implemented a light convolutional neural network bidirectional long short-term memory (LCNN-BLSTM)-based FAD system and conducted experiments on the Audio Deep Synthesis Detection Challenge (ADD2022) datasets. The results showed that removing the speaker information from acoustic speech yielded a 17.66% relative performance improvement on the first track of ADD2022.
Rights: Copyright (C) 2022 International Speech Communication Association. Kai Li, Sheng Li, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang, Masashi Unoki, Proc. InterSpeech2022, 2022, pp.664-668. doi:10.21437/Interspeech.2022-10088
URI: http://hdl.handle.net/10119/18158
Resource Type: publisher
Appears in Collections: b11-1. Conference Papers and Presentation Materials (Conference Papers)
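
Note on the method named in the abstract: McAdams-coefficient-based anonymization is usually described as moving the angles of the complex poles of a frame's linear-prediction (LPC) filter from phi to phi raised to the power alpha, where alpha is the McAdams coefficient, while leaving real poles and all pole magnitudes unchanged. The following is a minimal sketch of that single step only, under that assumption. The function name `mcadams_shift_poles` and the default `alpha=0.8` are illustrative and are not taken from the paper; this record does not state the authors' exact configuration, and framing, LPC analysis, and resynthesis are omitted.

```python
import cmath
import math


def mcadams_shift_poles(poles, alpha=0.8):
    """Return LPC poles with complex-pole angles moved from phi to phi**alpha.

    poles: iterable of real or complex pole values for one analysis frame.
    alpha: McAdams coefficient (0.8 is an illustrative value, not the paper's).
    Real poles are kept as they are; pole magnitudes are preserved.
    """
    shifted = []
    for p in poles:
        # Poles on the real axis are not modified.
        if abs(p.imag) < 1e-12:
            shifted.append(complex(p))
            continue
        radius, phi = cmath.polar(p)
        # Apply the power to |phi| and restore the sign so that
        # conjugate pole pairs stay conjugate after the shift.
        new_phi = math.copysign(abs(phi) ** alpha, phi)
        shifted.append(cmath.rect(radius, new_phi))
    return shifted
```

With `alpha` below 1, pole angles smaller than 1 radian move up and angles larger than 1 radian move down, which shifts formant positions and thereby alters speaker-dependent spectral cues.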

Files in This Item:

File: li22o_interspeech.pdf
Size: 295Kb
Format: Adobe PDF

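All items in this repository are protected by copyright.

Contact: Japan Advanced Institute of Science and Technology, Research Promotion Division, Library Information Section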