October 21, 2019: Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time

October 21, 2019
4:00 p.m.
210 Robeson Hall
Dr. John Valasek, Texas A&M University
Faculty Host: Dr. Craig Woolsey

Abstract: Reinforcement Learning (RL) has yielded many recent successes in solving complex tasks, meeting and exceeding the capabilities of human counterparts in video game environments (Mnih et al. 2015), robotic manipulators (Andrychowicz et al. 2018), and various open-source simulated scenarios (Lillicrap et al. 2015). However, RL approaches are sample-inefficient and slow to converge to this level of performance, and they are significantly limited by the need to explore potential strategies through trial and error. The resulting behavior, which is initially random and slow to reach proficiency, is poorly suited to many settings, such as physically embodied ground and air vehicles, or scenarios where sufficient capability must be achieved in a short time span. In such situations, an untrained agent's random exploration of the state space can lead to unsafe behavior and catastrophic failure of a physical system, potentially resulting in unacceptable damage or downtime. Similarly, slow convergence of the agent’s performance requires a very large number of interactions with the environment, which is often prohibitively difficult or infeasible for physical systems that are subject to energy constraints, component failures, and operation in dynamic or adverse environments.

This presentation introduces Cycle-of-Learning (CoL), a framework that uses an actor-critic architecture with a loss function combining behavior cloning and 1-step Q-learning losses, together with an off-policy pre-training step from human demonstrations. This enables a transition from behavior cloning to reinforcement learning without performance degradation and improves reinforcement learning in terms of both overall performance and training time. The approach is shown to outperform state-of-the-art techniques for combining behavior cloning and reinforcement learning in both dense and sparse reward scenarios. Results also suggest that directly including the behavior cloning loss on demonstration data helps to ensure stable learning and to ground future policy updates.
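
As a rough illustration of how a behavior cloning term and a 1-step Q-learning term might be combined into a single loss, here is a minimal Python sketch. It is not the speaker's implementation: the function names, the scalar actions, the mean-squared-error form of each term, and the weights `w_bc` and `w_q` are all assumptions made for this example, and the abstract does not specify any of them.

```python
def behavior_cloning_loss(policy_actions, expert_actions):
    """Mean squared error between policy actions and demonstrated actions.

    Assumes scalar actions for simplicity (hypothetical choice).
    """
    n = len(expert_actions)
    return sum((p - e) ** 2 for p, e in zip(policy_actions, expert_actions)) / n


def one_step_q_loss(q_values, rewards, next_q_values, dones, gamma=0.99):
    """Mean squared 1-step temporal-difference error.

    The target is r + gamma * Q(s', a') for non-terminal transitions
    and r alone for terminal ones.
    """
    total = 0.0
    for q, r, q_next, done in zip(q_values, rewards, next_q_values, dones):
        target = r + (0.0 if done else gamma * q_next)
        total += (target - q) ** 2
    return total / len(q_values)


def combined_loss(bc_loss, q_loss, w_bc=1.0, w_q=1.0):
    """Weighted sum of the two terms; the weights are hypothetical."""
    return w_bc * bc_loss + w_q * q_loss


if __name__ == "__main__":
    bc = behavior_cloning_loss([1.0, 2.0], [1.0, 4.0])
    q1 = one_step_q_loss([1.0], [1.0], [2.0], [False], gamma=0.5)
    print(combined_loss(bc, q1, w_bc=0.5, w_q=2.0))
```

In a sketch like this, keeping the behavior cloning term in the sum during later training is what would let the demonstration data continue to constrain policy updates, which is the effect the abstract attributes to including that loss directly.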

Bio: John Valasek is Director of the Vehicle Systems & Control Laboratory (https://vscl.tamu.edu), Thaman Professor of Undergraduate Teaching Excellence, Professor of Aerospace Engineering, and a member of the Honors Faculty at Texas A&M University (TAMU). He has been actively conducting autonomy and flight controls research on manned and unmanned air vehicles in both industry and academia for 33 years. John was previously a Flight Control Engineer for the Northrop Corporation, Aircraft Division, in the Flight Controls Research Group and on the AGM-137 Tri-Services Standoff Attack Missile (TSSAM) program. At TAMU since 1997, John was the Founding Director of the Center for Autonomous Vehicles and Sensor Systems (CANVASS), a College of Engineering level, multi-department center in which he organized and led funding efforts in underwater, ground, air, and space autonomous systems. John holds two patents: one for autonomous air refueling of Unmanned Air Systems (UAS), and a second for the design of a UAS. He is an author of four books, most recently Advances in Computational Intelligence and Autonomy for Aerospace Systems (AIAA, 2018). John has served as Chair of Committee for 49 completed graduate degrees, including recipients of the University Award for Outstanding Accomplishment in Research at the Doctoral Level (2013) and the Masters Level (2018). He is the recipient of the 2014 ASEE/AIAA John Leeland Atwood Award for national outstanding aerospace educator. From 2006 to 2008 he served as the National President of Sigma Gamma Tau (SGT), and he has been faculty advisor to the TAMU student branch of SGT from 2000 to the present. John is a Fellow of AIAA, Senior Member of IEEE, Chair Elect of the AIAA Intelligent Systems Technical Committee, and an Associate Editor of the Journal of Guidance, Control, and Dynamics. John earned the B.S. degree in Aerospace Engineering from California State Polytechnic University, Pomona, and the M.S. and Ph.D. in Aerospace Engineering from the University of Kansas.
