Quadrupedal robots are increasingly deployed for load-carrying tasks across diverse terrains. While Model Predictive Control (MPC)-based methods can account for payload variations, they often depend on predefined gait schedules or trajectory generators, limiting their adaptability in unstructured environments. To address these limitations, we propose an Adaptive Reinforcement Learning (RL) framework that enables quadrupedal robots to dynamically adapt to both varying payloads and diverse terrains. The framework consists of a nominal policy responsible for baseline locomotion and an adaptive policy that learns corrective actions to preserve stability and improve command tracking under payload variations. We validate the proposed approach through large-scale simulation experiments in Isaac Gym and real-world hardware deployment on a Unitree Go1 quadruped. The controller was tested on flat ground, slopes, and stairs under both static and dynamic payload changes. Across all settings, our adaptive controller consistently outperformed the baseline controller in tracking body height and velocity commands, demonstrating enhanced robustness and adaptability without requiring explicit gait design or manual tuning.
Traditional load adaptation methods for quadrupeds are model-based force controllers that regulate ground reaction forces (GRFs) at the stance feet based on the desired height and payload variations. They typically model the quadruped as a single rigid body (SRB) and compute optimal GRFs at the contact points, relying on gait or trajectory generators that predefine foot contact schedules from the gait and commanded velocity. A switch-based controller applies force control to the stance legs and PD control to the swing legs, making the system sensitive to early or delayed contacts on unstructured terrain, which can cause instability. In contrast, RL-based controllers directly output desired joint positions tracked by a PD controller (see the sketch below), eliminating phase-based switching and enabling implicit adaptation to terrain and contact variations. Domain randomization is commonly used in RL-based methods to improve robustness to disturbances by introducing parameter variations during training; however, an excessive randomization range often leads to conservative policies that prioritize robustness over optimal performance. This highlights the need for adaptive policies that dynamically adjust to payload changes and terrain variations.
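For illustration, below is a minimal sketch of how an RL policy's joint-position targets can be tracked by a per-joint PD loop without any contact-phase switching. The gains, action scaling, and variable names are assumptions for the sketch, not values from this work; only the joint count (the Go1 has 12 actuated joints) is taken from the platform.

```python
import numpy as np

# Illustrative PD joint-position tracking of policy actions.
# KP, KD, and ACTION_SCALE are assumed values, not the paper's.
KP, KD = 20.0, 0.5          # proportional / derivative gains [N*m/rad, N*m*s/rad]
ACTION_SCALE = 0.25         # maps raw policy output to joint-angle offsets [rad]
NUM_JOINTS = 12             # Unitree Go1 has 12 actuated joints

def pd_torques(action, q, qd, q_default):
    """Convert a policy action into joint torques via PD tracking.

    action    : raw policy output, shape (12,)
    q, qd     : measured joint positions and velocities, shape (12,)
    q_default : nominal standing joint configuration, shape (12,)
    """
    q_des = q_default + ACTION_SCALE * action   # desired joint positions
    return KP * (q_des - q) - KD * qd           # same law for stance and swing legs

# Example: a zero action simply holds the default posture.
tau = pd_torques(np.zeros(NUM_JOINTS), np.zeros(NUM_JOINTS),
                 np.zeros(NUM_JOINTS), np.zeros(NUM_JOINTS))
```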
We introduce an Adaptive RL framework for locomotion under varying payload conditions by augmenting the nominal policy with an adaptive corrective policy. The framework is trained in a two-phase process: the nominal policy is first trained under normal conditions, followed by an adaptive policy that provides corrective actions without the need for explicit payload parameter estimation.
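As a minimal sketch of one plausible composition of the two policies, the snippet below adds a scaled corrective action from the adaptive policy to the nominal policy's output, with the nominal policy frozen during the second training phase. The network sizes, observation and action dimensions, residual scale, and the additive form itself are illustrative assumptions, not the exact architecture used here.

```python
import torch
import torch.nn as nn

# Assumed dimensions and residual scale for illustration only.
OBS_DIM, ACT_DIM, RESIDUAL_SCALE = 45, 12, 0.1

def mlp(in_dim, out_dim, hidden=(256, 128)):
    layers, d = [], in_dim
    for h in hidden:
        layers += [nn.Linear(d, h), nn.ELU()]
        d = h
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

nominal_policy = mlp(OBS_DIM, ACT_DIM)    # Phase 1: trained under nominal conditions
adaptive_policy = mlp(OBS_DIM, ACT_DIM)   # Phase 2: trained under payload variations

def act(obs):
    """Compose baseline locomotion with a corrective residual action."""
    with torch.no_grad():                  # nominal policy stays frozen in phase 2
        a_nominal = nominal_policy(obs)
    a_corrective = adaptive_policy(obs)    # learns to counteract payload effects
    return a_nominal + RESIDUAL_SCALE * a_corrective
```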
We compared tracking performance against DreamWaQ, an RL-based controller trained with the base mass randomized in the range [0, 10] kg, which we refer to as the Baseline controller. Since our proposed method is also RL-based, this allows a fair comparison of adaptive performance under mass variations.
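To make the baseline's randomization concrete, here is a minimal sketch of sampling an added base mass per simulated environment from the [0, 10] kg range; the number of environments and the variable names are illustrative assumptions.

```python
import numpy as np

NUM_ENVS = 4096                      # assumed number of parallel environments
PAYLOAD_RANGE_KG = (0.0, 10.0)       # base-mass randomization range from the setup

rng = np.random.default_rng(0)
added_base_mass = rng.uniform(*PAYLOAD_RANGE_KG, size=NUM_ENVS)  # kg, one per env
# In simulation, each offset would be added to the robot's trunk mass at reset.
```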
The GIFs show the baseline versus the adaptive controller as the payload is progressively increased up to 10 kg. The baseline controller struggles to maintain stable locomotion, exhibiting noticeable foot scuffing and instability at higher payloads. In contrast, the adaptive controller successfully compensates for the added weight, maintaining balance and coordination throughout.
The proposed Adaptive RL framework was successfully deployed on a Unitree Go1 robot and evaluated across diverse terrains, including flat ground, slopes, and stairs, under both varying static payloads and dynamic payload scenarios, such as freely moving iron balls placed in a tray mounted on the robot's base. Across all tested conditions, our policy consistently outperformed the baseline controller in accurately tracking both the commanded body height and velocity.