JAIST Repository: More Human-Like Gameplay by Blending Policies From Supervised and Reinforcement Learning

Please use the following identifier to cite this item: https://hdl.handle.net/10119/20010

Title:	More Human-Like Gameplay by Blending Policies From Supervised and Reinforcement Learning
Authors:	Ogawa, Tatsuyoshi; Hsueh, Chu-Hsuan; Ikeda, Kokolo
Keywords:	Board game; human-likeness; player modeling; reinforcement learning; supervised learning
Issue Date:	2024-07-11
Publisher:	Institute of Electrical and Electronics Engineers (IEEE)
Journal Title:	IEEE Transactions on Games
Volume:	16
Issue:	4
Start Page:	831
End Page:	843
DOI:	10.1109/TG.2024.3424668
Abstract:	Modeling human players’ behaviors in games is a key challenge for making natural computer players, evaluating games, and generating content. To achieve better human–computer interaction, researchers have tried various methods to create human-like artificial intelligence. In chess and Go, supervised learning with deep neural networks is known as one of the most effective ways to predict human moves. However, for many other games (e.g., Shogi), it is hard to collect a similar amount of game records, resulting in poor move-matching accuracy of the supervised learning. We propose a method to compensate for the weakness of the supervised learning policy by Blending it with an AlphaZero-like reinforcement learning policy. Experiments on Shogi showed that the Blend method significantly improved the move-matching accuracy over supervised learning models. Experiments on chess and Go with a limited number of game records also showed similar results. The Blend method was effective with both medium and large numbers of games, particularly the medium case. We confirmed the robustness of the Blend model to the parameter and discussed the mechanism why the move-matching accuracy improves. In addition, we showed that the Blend model performed better than existing work that tried to improve the move-matching accuracy.
Rights:	Copyright (c) 2024 Author(s). Tatsuyoshi Ogawa, Chu-Hsuan Hsueh, and Kokolo Ikeda, IEEE Transactions on Games, vol. 16, no. 4, pp. 831-843. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Original publication is available on IEEE Xplore via https://doi.org/10.1109/TG.2024.3424668.
URI:	https://hdl.handle.net/10119/20010
Resource Type:	publisher
Appears in Collections:	d10-1. Journal Articles

Files in This Item:

File	Description	Size	Format
T-IKEDA-K-0930-2.pdf		804Kb	Adobe PDF
