Updated: 2025/01/09

イマガワ タカヒサ
今川 孝久
IMAGAWA Takahisa
Scopus publication data
Total papers: 0   Total citations: 0   h-index: 4

Citation Count is the number of citations received by papers published in the given year.

Affiliation
Graduate School of Computer Science and Systems Engineering, Department of Intelligent Systems Engineering
Position
Assistant Professor

Research fields

  • Informatics / Intelligent informatics / Artificial intelligence, reinforcement learning, planning

Degrees

  • The University of Tokyo  -  Ph.D.   March 2018

  • The University of Tokyo  -  Master's degree   March 2015

  • The University of Tokyo  -  Bachelor's degree (Liberal Arts)   March 2013

Employment history at the university

  • February 2024 - Present   Kyushu Institute of Technology   Graduate School of Computer Science and Systems Engineering   Department of Intelligent Systems Engineering   Assistant Professor

Papers

  • Off-Policy Meta-Reinforcement Learning with Belief-Based Task Inference  Peer-reviewed  International journal

    Imagawa T., Hiraoka T., Tsuruoka Y.

    IEEE Access   Vol. 10   pp. 49494-49507   January 2022

    Role: Lead author   Language: English   Publication type: Research paper (academic journal)

    Meta-reinforcement learning (RL) addresses the problem of sample inefficiency in deep RL by using experience obtained in past tasks for solving a new task. However, most existing meta-RL methods require partially or fully on-policy data, which hinders the improvement of sample efficiency. To alleviate this problem, we propose a novel off-policy meta-RL method, embedding learning and uncertainty evaluation (ELUE). An ELUE agent is characterized by the learning of what we call a task embedding space, an embedding space for representing the features of tasks. The agent learns a belief model over the task embedding space and trains a belief-conditional policy and Q-function. The belief model is designed to be agnostic to the order in which task information is obtained, thereby reducing the difficulty of task embedding learning. For a new task, the ELUE agent collects data with the pretrained policy and updates its belief on the basis of the belief model. Thanks to the belief update, the performance of the agent improves with a small amount of data. In addition, the agent updates the parameters of its policy and Q-function so that it can adjust the pretrained relationships when there are enough data. We demonstrate that ELUE outperforms state-of-the-art meta-RL methods through experiments on meta-RL benchmarks.

    DOI: 10.1109/ACCESS.2022.3170582

    Scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85129665822&origin=inward
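
    A minimal illustrative sketch of the belief-based task inference idea described in the abstract above, not the authors' implementation: a permutation-invariant encoder maps collected transitions to a Gaussian belief over a task embedding, and the policy is conditioned on that belief. The module names, layer sizes, and the mean-pooling choice are assumptions made for illustration; PyTorch is assumed.

    import torch
    import torch.nn as nn

    class BeliefEncoder(nn.Module):
        """Maps a set of transitions to a Gaussian belief over a task embedding.

        Mean pooling over transitions makes the belief agnostic to the order in
        which task information is obtained, as the abstract describes.
        """
        def __init__(self, transition_dim: int, embed_dim: int, hidden: int = 128):
            super().__init__()
            self.per_transition = nn.Sequential(
                nn.Linear(transition_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            self.to_mean = nn.Linear(hidden, embed_dim)
            self.to_logstd = nn.Linear(hidden, embed_dim)

        def forward(self, transitions: torch.Tensor):
            # transitions: (num_transitions, transition_dim) collected in the new task
            pooled = self.per_transition(transitions).mean(dim=0)  # order-invariant
            return self.to_mean(pooled), self.to_logstd(pooled)

    class BeliefConditionedPolicy(nn.Module):
        """Policy network that takes the state and the current belief (mean, log-std)."""
        def __init__(self, state_dim: int, embed_dim: int, action_dim: int, hidden: int = 128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + 2 * embed_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, action_dim), nn.Tanh(),
            )

        def forward(self, state, belief_mean, belief_logstd):
            return self.net(torch.cat([state, belief_mean, belief_logstd], dim=-1))

    if __name__ == "__main__":
        enc = BeliefEncoder(transition_dim=10, embed_dim=5)
        pi = BeliefConditionedPolicy(state_dim=8, embed_dim=5, action_dim=2)
        mean, logstd = enc(torch.randn(32, 10))    # belief from 32 collected transitions
        action = pi(torch.randn(8), mean, logstd)  # belief-conditioned action
        print(action.shape)

    A belief-conditional Q-function would follow the same conditioning pattern; the off-policy training loop itself (SAC-style updates and belief updates during adaptation) is omitted here.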

  • Dropout Q-Functions for Doubly Efficient Reinforcement Learning  Peer-reviewed  International journal

    Hiraoka T., Imagawa T., Hashimoto T., Onishi T., Tsuruoka Y.

    ICLR 2022 - 10th International Conference on Learning Representations   January 2022

    Role: Last author   Language: English   Publication type: Research paper (international conference proceedings)

    Randomized ensembled double Q-learning (REDQ) (Chen et al., 2021b) has recently achieved state-of-the-art sample efficiency on continuous-action reinforcement learning benchmarks. This superior sample efficiency is made possible by using a large Q-function ensemble. However, REDQ is much less computationally efficient than non-ensemble counterparts such as Soft Actor-Critic (SAC) (Haarnoja et al., 2018a). To make REDQ more computationally efficient, we propose DroQ, a variant of REDQ that uses a small ensemble of dropout Q-functions. Our dropout Q-functions are simple Q-functions equipped with dropout connection and layer normalization. Despite its simplicity of implementation, our experimental results indicate that DroQ is doubly (sample- and computationally) efficient. It achieves sample efficiency comparable to REDQ, much better computational efficiency than REDQ, and computational efficiency comparable to that of SAC.

    Scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85146871335&origin=inward
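
    A minimal sketch of the "dropout Q-function" described in the abstract above: an ordinary MLP Q-function with a dropout connection and layer normalization after each hidden linear layer, used in a small ensemble. The layer sizes, dropout rate, and ensemble size are illustrative assumptions, not the paper's exact hyperparameters; PyTorch is assumed.

    import torch
    import torch.nn as nn

    class DropoutQFunction(nn.Module):
        """Q-function equipped with dropout and layer normalization."""
        def __init__(self, state_dim: int, action_dim: int, hidden: int = 256, dropout: float = 0.01):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden),
                nn.Dropout(dropout),   # dropout connection
                nn.LayerNorm(hidden),  # layer normalization
                nn.ReLU(),
                nn.Linear(hidden, hidden),
                nn.Dropout(dropout),
                nn.LayerNorm(hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),  # scalar Q-value
            )

        def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
            return self.net(torch.cat([state, action], dim=-1))

    if __name__ == "__main__":
        # A small ensemble (two members here) instead of REDQ's large ensemble.
        ensemble = [DropoutQFunction(state_dim=17, action_dim=6) for _ in range(2)]
        s, a = torch.randn(4, 17), torch.randn(4, 6)
        # A SAC/REDQ-style target would take the minimum over the ensemble members.
        q_min = torch.min(torch.stack([q(s, a) for q in ensemble]), dim=0).values
        print(q_min.shape)  # torch.Size([4, 1])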

  • Meta-Model-Based Meta-Policy Optimization  Peer-reviewed  International journal

    Hiraoka T., Imagawa T., Tangkaratt V., Osa T., Onishi T., Tsuruoka Y.

    Proceedings of Machine Learning Research   Vol. 157   pp. 129-144   January 2021

    Role: Corresponding author   Language: English   Publication type: Research paper (international conference proceedings)

    Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.

    Scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85137101456&origin=inward
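
    The abstract above describes a model-based meta-RL scheme. Below is a high-level, runnable skeleton of that general loop (collect real data across tasks, fit a shared model, generate short model rollouts, improve the policy), written only to illustrate the structure. Every function here is a hypothetical placeholder, not the authors' M3PO implementation, and the performance-guarantee analysis is not represented in code.

    import random

    def collect_real_data(task, policy, steps=100):
        """Collect (state, action, next_state, reward) tuples from a task (stubbed here)."""
        return [(random.random(), policy(random.random()), random.random(), random.random())
                for _ in range(steps)]

    def fit_meta_model(dataset):
        """Fit a dynamics model shared across tasks; here a trivial resampling stand-in."""
        def model(state, action):
            _, _, next_state, reward = random.choice(dataset)
            return next_state, reward
        return model

    def model_rollout(model, start_state, policy, horizon=5):
        """Generate a short rollout from the learned model (MBPO-style branched rollout)."""
        trajectory, state = [], start_state
        for _ in range(horizon):
            action = policy(state)
            state, reward = model(state, action)
            trajectory.append((state, action, reward))
        return trajectory

    def improve_policy(policy, model_data):
        """Placeholder policy update; a real method would run actor-critic steps here."""
        return policy

    if __name__ == "__main__":
        tasks = ["task_a", "task_b"]       # hypothetical meta-training tasks
        policy = lambda state: 0.0         # trivial initial policy
        dataset = []
        for _ in range(3):                 # outer meta-training iterations
            for task in tasks:
                dataset += collect_real_data(task, policy)
            meta_model = fit_meta_model(dataset)
            rollouts = [model_rollout(meta_model, s, policy) for (s, _, _, _) in dataset[:10]]
            policy = improve_policy(policy, rollouts)
        print(len(dataset), "real transitions,", len(rollouts), "model rollouts")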

  • Learning Robust Options by Conditional Value at Risk Optimization  Peer-reviewed  International journal

    Hiraoka T., Imagawa T., Mori T., Onishi T., Tsuruoka Y.

    Advances in Neural Information Processing Systems   Vol. 32   January 2019

    Role: Corresponding author   Language: English   Publication type: Research paper (international conference proceedings)

    Options are generally learned by using an inaccurate environment model (or simulator), which contains uncertain model parameters. While there are several methods to learn options that are robust against the uncertainty of model parameters, these methods only consider either the worst case or the average (ordinary) case for learning options. This limited consideration of the cases often produces options that do not work well in the unconsidered case. In this paper, we propose a conditional value at risk (CVaR)-based method to learn options that work well in both the average and worst cases. We extend the CVaR-based policy gradient method proposed by Chow and Ghavamzadeh (2014) to deal with robust Markov decision processes and then apply the extended method to learning robust options. We conduct experiments to evaluate our method in multi-joint robot control tasks (HopperIceBlock, Half-Cheetah, and Walker2D). Experimental results show that our method produces options that 1) give better worst-case performance than the options learned only to minimize the average-case loss, and 2) give better average-case performance than the options learned only to minimize the worst-case loss.

    Scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85090171999&origin=inward
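
    The method above is built around a conditional value at risk (CVaR) objective over losses induced by uncertain model parameters. Below is a small numerical sketch of just that statistic: CVaR_alpha is the mean of the worst alpha-fraction of losses, so alpha = 1 recovers the average case while small alpha focuses on the worst cases. The toy loss samples and alpha values are illustrative assumptions; the paper's actual contribution, a CVaR-based policy gradient for robust MDPs and option learning, is not reproduced here.

    import numpy as np

    def cvar(losses: np.ndarray, alpha: float) -> float:
        """Mean of the worst (largest) alpha-fraction of the losses."""
        sorted_losses = np.sort(losses)[::-1]            # largest losses first
        k = max(1, int(np.ceil(alpha * len(losses))))    # size of the worst tail
        return float(sorted_losses[:k].mean())

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Losses of one candidate option evaluated under 1000 sampled model parameters.
        losses = rng.normal(loc=1.0, scale=0.3, size=1000)
        # alpha = 1.0 is the average-case loss; small alpha approaches the worst case.
        for alpha in (1.0, 0.5, 0.1, 0.01):
            print(f"CVaR_{alpha}: {cvar(losses, alpha):.3f}")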