GRoQ-LoCO: Generalist and Robot-agnostic Quadruped Locomotion Control using Offline Datasets

Narayanan PP, Sarvesh Prasanth Venkatesan, Srinivas Kantha Reddy, Shishir Kolathaya

Indian Institute of Science (IISc), Bangalore

(Preprint under review)

Abstract

Recent advancements in large-scale offline training have demonstrated the potential of generalist policy learning for complex robotic tasks. However, applying these principles to legged locomotion remains a challenge due to continuous dynamics and the need for real-time adaptation across diverse terrains and robot morphologies. In this work, we propose GRoQ-LoCO, a scalable, attention-based framework that learns a single generalist locomotion policy across multiple quadruped robots and terrains, relying solely on offline datasets. Our approach leverages expert demonstrations of two distinct locomotion behaviors, stair traversal (non-periodic gaits) and flat-terrain traversal (periodic gaits), collected across multiple quadruped robots, to train a generalist model that fuses both behaviors. Crucially, our framework operates directly on proprioceptive data from all robots without incorporating any robot-specific encodings. The policy is directly deployable on an Intel i7 NUC, producing low-latency control outputs without any test-time optimization. Our extensive experiments demonstrate strong zero-shot transfer across highly diverse quadruped robots and terrains, including hardware deployment on the Unitree Go1, a commercially available 12 kg robot. Notably, we evaluate challenging cross-robot training setups where different locomotion skills are unevenly distributed across robots, yet observe successful transfer of both flat walking and stair traversal behaviors to all robots at test time. We also show preliminary walking on Stoch 5, a 70 kg quadruped, on flat and outdoor terrains without requiring any fine-tuning. These results highlight the potential for robust generalist locomotion across diverse robots and terrains.

Contributions

A Generalist Locomotion Controller: We develop a single policy that controls multiple distinct quadrupedal robots without requiring robot-specific information.
Offline Multi-Behavior Learning: We demonstrate that purely offline training on diverse motion data produces a policy with periodic gaits and multi-terrain traversability.
Zero-Shot Transfer and Robustness: Our framework achieves strong zero-shot transfer across diverse quadruped robots and terrains, including hardware deployment on commercial platforms like the Unitree Go1 and the Stoch 5, without requiring fine-tuning.

Training Architecture

Data Generation — Figure 1: Offline data generation pipeline used in GRoQ-LoCO, illustrating trajectory collection from expert RL policies on diverse terrains and robot morphologies.
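The collection step in Figure 1 can be pictured as rolling out each expert, logging (observation, action) pairs, and pooling the logs from every robot without attaching a robot label. The sketch below is a minimal illustration under stated assumptions: `collect_trajectory`, `pool_datasets`, the toy expert, the toy dynamics, and the robot names are all hypothetical stand-ins and do not reproduce the authors' simulation pipeline.

```python
from typing import Callable, Dict, List, Tuple

Observation = List[float]
Action = List[float]
Trajectory = List[Tuple[Observation, Action]]


def collect_trajectory(
    expert: Callable[[Observation], Action],
    step: Callable[[Observation, Action], Observation],
    initial_obs: Observation,
    horizon: int,
) -> Trajectory:
    """Roll out an expert for `horizon` steps and record (observation, action) pairs."""
    trajectory: Trajectory = []
    obs = list(initial_obs)
    for _ in range(horizon):
        action = expert(obs)
        trajectory.append((list(obs), list(action)))
        obs = step(obs, action)
    return trajectory


def pool_datasets(per_robot: Dict[str, List[Trajectory]]) -> List[Trajectory]:
    """Merge trajectories from all robots and drop the robot name,
    so no robot-specific label is passed on to the learner."""
    pooled: List[Trajectory] = []
    for name in sorted(per_robot):
        pooled.extend(per_robot[name])
    return pooled


# Toy stand-ins (assumptions): a proportional "expert" and additive dynamics.
def toy_expert(obs: Observation) -> Action:
    return [-0.5 * x for x in obs]


def toy_step(obs: Observation, action: Action) -> Observation:
    return [x + a for x, a in zip(obs, action)]


per_robot = {
    "robot_a": [
        collect_trajectory(toy_expert, toy_step, [1.0, -1.0], horizon=5),
        collect_trajectory(toy_expert, toy_step, [0.5, 0.5], horizon=5),
    ],
    "robot_b": [
        collect_trajectory(toy_expert, toy_step, [2.0, 0.0], horizon=5),
    ],
}
pooled = pool_datasets(per_robot)
```

Pooling like this only works when every robot exposes observations and actions of the same dimensionality. The sketch assumes that and does not check it.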

Model Diagram — Figure 2: Model architecture of GRoQ-LoCO, showing the sequential processing pipeline with observation encoding, causal attention, GRU-based temporal modeling, and MLP action prediction.

Expert demonstrations are collected in Isaac Gym simulation across morphologically diverse quadruped platforms using two specialized controllers: a periodic gait controller for flat terrain and a non-periodic controller for stairs. GRoQ-LoCO places causal multi-head attention before and after a GRU core to enhance temporal representation learning. The attention layers contextually refine the observation embeddings and the GRU hidden states, and an MLP head then predicts smooth, adaptive actions across diverse robot morphologies and terrains. Zero-shot generalization is showcased on robots including Unitree Go1, B1, and B2, Stoch5, Stoch3, Aliengo, X30, and Lite3.
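To make the "causal" part of the attention concrete, the following is a minimal sketch of single-head causal self-attention over a short history of observation embeddings. It is deliberately simplified and is not the GRoQ-LoCO model: the query, key, and value projections are assumed to be the identity, there is only one head, and the GRU core and MLP action head are omitted. The function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(embeddings: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention with a causal mask.

    Step t attends only to steps 0..t, so its output never depends on
    later observations. Q, K, and V are the raw embeddings (identity
    projections), which is a simplifying assumption.
    """
    dim = len(embeddings[0])
    scale = 1.0 / math.sqrt(dim)
    outputs: List[Vector] = []
    for t, query in enumerate(embeddings):
        visible = embeddings[: t + 1]  # causal mask: drop future steps
        scores = [
            scale * sum(q * k for q, k in zip(query, key)) for key in visible
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, visible)) for d in range(dim)]
        )
    return outputs


history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
refined = causal_self_attention(history)

# Changing only the last step leaves the earlier outputs untouched.
perturbed = causal_self_attention(history[:2] + [[5.0, -3.0]])
assert perturbed[:2] == refined[:2]
```

The final assertion checks the property that matters for real-time control: the output at a given time step can be computed from past and present observations alone.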

Experiments

We evaluate our generalist policy across five settings, spanning multiple quadruped robots and terrain types—flat ground, stairs, and slopes—with data sparsely distributed across this space.

Zero-Shot Skill and Robot Transfer. Our policy generalizes to unseen robot morphologies and terrains without any fine-tuning, demonstrating effective knowledge transfer. Robots not seen during training successfully walk, climb stairs, and traverse slopes in a zero-shot setting.

Out-of-Distribution Generalization. Despite training only on stairs up to 17 cm step height, policies succeed on both smooth and irregular inclines up to 40° and on unseen stair geometries beyond the training range, showing emergent base stabilization and terrain adaptation.
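For a rough sense of scale, an incline angle can be converted to a rise-over-run grade and compared with the pitch of a staircase. The helper names below are illustrative. The source states only the step height, so the tread depth (run) is an assumed input, and pitch is at best a coarse proxy for terrain difficulty.

```python
import math


def slope_grade(angle_deg: float) -> float:
    """Rise over run for an incline at the given angle."""
    return math.tan(math.radians(angle_deg))


def stair_pitch_deg(rise_cm: float, run_cm: float) -> float:
    """Pitch of the line through the step edges, in degrees."""
    return math.degrees(math.atan2(rise_cm, run_cm))


def run_for_pitch(rise_cm: float, pitch_deg: float) -> float:
    """Tread depth at which a stair of the given rise has the given pitch."""
    return rise_cm / math.tan(math.radians(pitch_deg))


grade_40 = slope_grade(40.0)               # about 0.84, i.e. an 84% grade
matching_run = run_for_pitch(17.0, 40.0)   # about 20.3 cm
```

Under these assumptions, a 17 cm step would need a tread of roughly 20 cm to be as steep overall as a 40° incline.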

Cross-Robot Skill Sharing. Robots that only contributed flat-terrain data exhibit robust stair traversal. This highlights effective generalization enabled by shared locomotion structure.

Real-World Deployment Without Tuning. Policies trained entirely in simulation are deployed directly on hardware (Go1, Stoch5), demonstrating successful flat, stair, and slope traversal in real-world conditions.

Check out our paper for more details.

Results

Zero-shot stair climbing across morphologies

Go1 Zero-shot on stairs and slopes

Stoch5 Zero-Shot Flat Terrain Locomotion (Preliminary Hardware Results)
