OSA Takayuki



Associate Professor


2-4 Hibikino, Wakamatsu-ku, Kitakyushu-shi, Fukuoka

Research Fields, Keywords


Undergraduate Education

  • 2007.03   The University of Tokyo   Faculty of Engineering   Graduated   JAPAN

Post Graduate Education

  • 2015.03  The University of Tokyo  Graduate School, Division of Engineering  Doctoral Program  Completed  JAPAN

Degree

  • The University of Tokyo -  Doctor of Engineering  2015.03

Biography in Kyutech

  • 2019.03

    Kyushu Institute of Technology   Graduate School of Life Science and Systems Engineering   Department of Human Intelligence Systems   Associate Professor

Biography before Kyutech

  • 2018.04

    The University of Tokyo   School of Engineering   Specially Appointed Lecturer   JAPAN

  • 2017.04

    The University of Tokyo   Specially Appointed Assistant Professor   JAPAN

  • 2015.04

    TU Darmstadt   Postdoctoral Researcher   GERMANY

  • 2010.04

    Terumo Corp.   JAPAN

Academic Society Memberships

  • 2014.09

    The Robotics Society of Japan  JAPAN

  • 2010.03



Publications (Article)

  • Hierarchical Stochastic Optimization with Application to Parameter Tuning for Electronically Controlled Transmissions

    IEEE Robotics and Automation Letters  ( IEEE )  5 ( 2 ) 628 - 635   2020.01  [Refereed]


  • Manipulation Planning with Multimodal End Point Optimization for Obtaining Multiple Solutions

    Osa Takayuki, Sato Masaya, Moriki Kazuya, Sugiyama Satoshi, Sugita Naohiko, Nakao Masayuki

    Journal of the Robotics Society of Japan  ( The Robotics Society of Japan )  37 ( 8 ) 718 - 725   2019.10  [Refereed]


    Motion planning for robotic manipulation is an essential component for automating various tasks. In this study, we discuss optimization-based motion planning methods for robotic manipulation. Optimization-based methods can compute smooth and collision-free trajectories at relatively low computational cost. Although existing methods are often designed to output a single solution, the objective function is often multimodal and there exist multiple solutions that achieve a given task. In such a task, obtaining multiple solutions gives the user the opportunity to choose one of the solutions based on factors that are not encoded in the objective function. To address this issue, we propose a motion planning framework that finds multiple solutions. The proposed method is validated in simulated environments with a four-link manipulator in 2D space and a 6-DoF manipulator in 3D space.


  • Hierarchical reinforcement learning via advantage-weighted information maximization

        2019.05  [Refereed]

    USA  New Orleans 


    Real-world tasks are often highly structured. Hierarchical reinforcement learning (HRL) has attracted research interest as an approach for leveraging the hierarchical structure of a given task in reinforcement learning (RL). However, identifying the hierarchical policy structure that enhances the performance of RL is not a trivial task. In this paper, we propose an HRL method that learns a latent variable of a hierarchical policy using mutual information maximization. Our approach can be interpreted as a way to learn a discrete and latent representation of the state-action space. To learn option policies that correspond to modes of the advantage function, we introduce advantage-weighted importance sampling. In our HRL method, the gating policy learns to select option policies based on an option-value function, and these option policies are optimized based on the deterministic policy gradient method. This framework is derived by leveraging the analogy between a monolithic policy in standard RL and a hierarchical policy in HRL by using a deterministic option policy. Experimental results indicate that our HRL approach can learn a diversity of options and that it can enhance the performance of RL in continuous control tasks.


  • Hierarchical reinforcement learning of multiple grasping strategies with human instructions

      32 ( 18 ) 955 - 968   2018.09  [Refereed]


    Grasping is an essential component of robotic manipulation and has been investigated for decades. Prior work on grasping often assumes that a sufficient amount of training data is available for learning and planning robotic grasps. However, constructing such an exhaustive training dataset is very challenging in practice, and it is desirable that a robotic system can autonomously learn and improve its grasping strategy. Although recent work has presented autonomous data collection through trial and error, such methods are often limited to a single grasp type, e.g. the vertical pinch grasp. To address these issues, we present a hierarchical policy search approach for learning multiple grasping strategies. To leverage human knowledge, multiple grasping strategies are initialized with human demonstrations. In addition, a database of grasping motions and point clouds of objects is autonomously built upon a set of grasps given by a user. The problem of selecting the grasp location and grasp policy is formulated as a bandit problem in our framework. We applied our reinforcement learning approach to grasping both rigid and deformable objects. The experimental results show that our framework autonomously learns and improves its performance through trial and error and can grasp previously unseen objects with high accuracy.


  • Sample and Feedback Efficient Hierarchical Reinforcement Learning from Human Preferences

        596 - 601   2018.09  [Refereed]


    While reinforcement learning has led to promising results in robotics, defining an informative reward function is challenging. Prior work considered including the human in the loop to jointly learn the reward function and the optimal policy. Generating samples from a physical robot and requesting human feedback are both taxing efforts for which efficiency is critical. We propose to learn reward functions from both the robot and the human perspectives to improve on both efficiency metrics. Learning a reward function from the human perspective increases feedback efficiency by assuming that humans rank trajectories according to a low-dimensional outcome space. Learning a reward function from the robot perspective circumvents the need for a dynamics model while retaining the sample efficiency of model-based approaches. We provide an algorithm that incorporates bi-perspective reward learning into a general hierarchical reinforcement learning framework and demonstrate the merits of our approach on a toy task and a simulated robot grasping task.



Publications (Books)

  • An Algorithmic Perspective on Imitation Learning

    Takayuki Osa, Joni Pajarinen, Gerhard Neumann, J. Andrew Bagnell, Pieter Abbeel, Jan Peters ( Joint Work )

    Now Publishers  2018.03

Conference Presentations (Oral, Poster)

  • Hierarchical Stochastic Optimization with Application to Parameter Tuning for Electronically Controlled Transmissions

    Hiroyuki Karasawa, Tomohiro Kanemaki, Kei Oomae, Rui Fukui, Masayuki Nakao, Takayuki Osa

    IEEE International Conference on Robotics and Automation (ICRA)  (Paris, France)  2020.05  -  2020.06  IEEE

  • Automatic Aligned Winding of Large-Diameter Wire Imitating an Operator's Machine Operation: Development of a Scale Model and Proposal of a Learning Algorithm

    仁保隆嘉, 長隆之, 森木和也, 鈴木翔大, 杉田直彦, 中尾政之

    The 25th Robotics Symposia  (Hanabishi Hotel, Yunokawa Onsen, Hakodate, Hokkaido)  2020.03  -  2020.03 

  • Learning Transmission Control Parameters via Hierarchical Stochastic Optimization and Evaluation on a Real Machine

    唐澤宏之, 金牧知宏, 大前圭, 福井類, 中尾政之, 長隆之

    The 25th Robotics Symposia  (Hanabishi Hotel, Yunokawa Onsen, Hakodate, Hokkaido)  2020.03  -  2020.03 

  • Reducing Overestimation Bias in Multi-Agent Domains Using Double Centralized Critics

    Johannes Ackermann, Takayuki Osa, Masashi Sugiyama

    Deep Reinforcement Learning Workshop NeurIPS 2019  (Vancouver, Canada)  2019.12  -  2019.12 

  • Trajectory Optimization via Density Estimation

    Takayuki Osa

    The 37th Annual Conference of the Robotics Society of Japan  (Waseda University)  2019.09  -  2019.09  The Robotics Society of Japan


Lectures

  • How should we design a robot learning system?

    Workshop on Robot Learning: Control and Interaction in the Real World at NeurIPS 2019   2019.12.14 

  • Motion Planning via Imitation Learning and Reinforcement Learning

    The 120th Robotics Seminar  (Chuo University, Korakuen Campus)  2019.06.27  The Robotics Society of Japan

  • Tutorial: "Reinforcement Learning"

    The 2018 Annual Conference of the Japanese Society for Artificial Intelligence   2018.06.05 

Honors and Awards

  • 2014 IEEE Robotics and Automation Society Japan Chapter Young Award (ICRA2014)

    2014.06     JAPAN

    Winner: Takayuki Osa

Grants-in-Aid for Scientific Research

  • Hierarchical Reinforcement Learning for Autonomous Motion Planning with Real Robots

    Grant-in-Aid for Young Scientists (B)

    Project Year:  2019.04  -  2023.03

    Project Number:  19K20370

Career of Research Abroad

  • RoMaNS - Robotic Manipulation for Nuclear Sort and Segregation

    TU Darmstadt  Project Year:  2015.04  -  2017.03.31

  • Automation of Robotic Surgery Using Visual Information

    Technical University Munich  Project Year:  2008.09  -  2009.09


Activities of Academic Societies and Committees

  • 2019.04

    The Robotics Society of Japan  

  • 2019.02

    Neural Information Processing Systems (NeurIPS)   Reviewer for Neural Information Processing Systems (NeurIPS) 2019

  • 2018.10

    International Conference on Machine Learning (ICML)   Reviewer for the Thirty-Sixth International Conference on Machine Learning (ICML 2019)

  • 2018.04

    Neural Information Processing Systems (NeurIPS)   Reviewer for Neural Information Processing Systems (NIPS) 2018

  • 2018.02

    Conference on Robot Learning (CoRL)   Area Chair for the 2018 Conference on Robot Learning
