|
JAIST Repository >
b. 情報科学研究科・情報科学系 >
b11. 会議発表論文・発表資料等 >
b11-1. 会議発表論文・発表資料 >
このアイテムの引用には次の識別子を使用してください:
http://hdl.handle.net/10119/18158
|
タイトル: | Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection |
著者: | Li, Kai Li, Sheng Lu, Xugang Akagi, Masato Liu, Meng Zhang, Lin Zeng, Chang Wang, Longbiao Dang, Jianwu Unoki, Masashi |
キーワード: | fake audio detection data augmentation McAdams coefficients speaker anonymization |
発行日: | 2022-09 |
出版者: | International Speech Communication Association |
誌名: | Proc. InterSpeech 2022 |
開始ページ: | 664 |
終了ページ: | 668 |
DOI: | 10.21437/Interspeech.2022-10088 |
抄録: | Fake audio detection (FAD) is a technique to distinguish synthetic speech from natural speech. In most FAD systems, removing irrelevant features from acoustic speech while keeping only robust discriminative features is essential. Intuitively, speaker information entangled in acoustic speech should be suppressed
for the FAD task. Particularly in a deep neural network (DNN)-based FAD system, the learning system may learn speaker information from a training dataset and cannot generalize well on a testing dataset. In this paper, we propose to use the speaker anonymization (SA) technique to suppress speaker information from acoustic speech before inputting it into a DNN-based FAD system. We adopted the McAdamscoefficient-based SA (MC-SA) algorithm, and this is expected that the entangled speaker information will not be involved in the DNN-based FAD learning. Based on this idea, we implemented
a light convolutional neural network bidirectional long short-term memory (LCNN-BLSTM)-based FAD system and conducted experiments on the Audio Deep Synthesis Detection Challenge (ADD2022) datasets. The results showed that removing the speaker information from acoustic speech improved the relative performance in the first track of ADD2022 by 17.66%. |
Rights: | Copyright (C) 2022 International Speech Communication Association. Kai Li, Sheng Li, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang, Masashi Unoki, Proc. InterSpeech2022, 2022, pp.664-668. doi:10.21437/Interspeech.2022-10088 |
URI: | http://hdl.handle.net/10119/18158 |
資料タイプ: | publisher |
出現コレクション: | b11-1. 会議発表論文・発表資料 (Conference Papers)
|
このアイテムのファイル:
ファイル |
記述 |
サイズ | 形式 |
li22o_interspeech.pdf | | 295Kb | Adobe PDF | 見る/開く |
|
当システムに保管されているアイテムはすべて著作権により保護されています。
|