Recent advancements in large-scale offline training have demonstrated the potential of generalist policy learning for complex robotic tasks. However, applying these principles to legged locomotion remains a challenge due to continuous dynamics and the need for real-time adaptation across diverse terrains and robot morphologies. In this work, we propose GRoQ-LoCO, a scalable, attention-based framework that learns a single generalist locomotion policy across multiple quadruped robots and terrains, relying solely on offline datasets. Our approach leverages expert demonstrations of two distinct locomotion behaviors, stair traversal (non-periodic gaits) and flat-terrain traversal (periodic gaits), collected across multiple quadruped robots, to train a generalist model that fuses both behaviors. Crucially, our framework operates directly on proprioceptive data from all robots without incorporating any robot-specific encodings. The policy is directly deployable on an Intel i7 NUC, producing low-latency control outputs without any test-time optimization. Our extensive experiments demonstrate strong zero-shot transfer across highly diverse quadruped robots and terrains, including hardware deployment on the Unitree Go1, a commercially available 12 kg robot. Notably, we evaluate challenging cross-robot training setups where different locomotion skills are unevenly distributed across robots, yet observe successful transfer of both flat walking and stair traversal behaviors to all robots at test time. We also show preliminary walking on Stoch 5, a 70 kg quadruped, on flat and outdoor terrains without requiring any fine-tuning. These results highlight the potential for robust generalist locomotion across diverse robots and terrains.
Expert demonstrations are collected in Isaac Gym simulation across morphologically diverse quadruped platforms using two specialized controllers: a periodic gait controller for flat terrain and a non-periodic controller for stairs. GRoQ-LoCO places causal multi-head attention before and after a GRU core to enhance temporal representation learning. Observation embeddings and GRU hidden states are contextually refined to produce smooth and adaptive actions across diverse robot morphologies and terrains. Zero-shot generalization is showcased on robots including Unitree Go1, B1 and B2, Stoch5, Stoch3, Aliengo, X30, and Lite3.
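To make the attention-GRU-attention ordering concrete, here is a minimal, dependency-free sketch in plain Python. It is an illustration under stated assumptions, not the released implementation: the class name `AttnGRUPolicy`, the layer sizes, the random weights, and the 12-dimensional action output are all hypothetical, and the attention is simplified to a single head without learned query/key/value projections. What it does show is the structure described above: observation embeddings pass through causal attention, then a GRU, then a second causal attention over the hidden states, so the action at step t depends only on observations up to step t.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear(weights, x):
    """Apply a bias-free linear map given as a list of rows."""
    return [dot(row, x) for row in weights]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def causal_attention(seq):
    """Single-head scaled dot-product self-attention with a causal mask.

    Step t attends only to steps 0..t. Learned projections and multiple
    heads are omitted to keep the sketch short (a simplification).
    """
    scale = math.sqrt(len(seq[0]))
    out = []
    for t, query in enumerate(seq):
        past = seq[: t + 1]  # causal mask: future steps are never visible
        scores = [dot(query, key) / scale for key in past]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, past)) / total
                    for j in range(len(query))])
    return out


class AttnGRUPolicy:
    """Hypothetical attention -> GRU -> attention policy with random weights."""

    def __init__(self, obs_dim, hidden_dim, act_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.embed = mat(hidden_dim, obs_dim)            # observation embedding
        self.w_update = mat(hidden_dim, 2 * hidden_dim)  # GRU update gate
        self.w_reset = mat(hidden_dim, 2 * hidden_dim)   # GRU reset gate
        self.w_cand = mat(hidden_dim, 2 * hidden_dim)    # GRU candidate state
        self.head = mat(act_dim, hidden_dim)             # action head

    def gru_step(self, x, h):
        """One GRU update (bias terms omitted); x + h is list concatenation."""
        z = [sigmoid(v) for v in linear(self.w_update, x + h)]
        r = [sigmoid(v) for v in linear(self.w_reset, x + h)]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in linear(self.w_cand, x + gated)]
        return [(1 - zi) * ci + zi * hi for zi, ci, hi in zip(z, cand, h)]

    def forward(self, obs_seq):
        """Map a proprioceptive observation history to one action per step."""
        embedded = [linear(self.embed, obs) for obs in obs_seq]
        refined = causal_attention(embedded)   # attention before the GRU
        h = [0.0] * self.hidden_dim
        hidden = []
        for x in refined:
            h = self.gru_step(x, h)
            hidden.append(h)
        context = causal_attention(hidden)     # attention after the GRU
        return [linear(self.head, c) for c in context]


# Usage with made-up sizes: 5 time steps of 6-dimensional observations.
policy = AttnGRUPolicy(obs_dim=6, hidden_dim=8, act_dim=12)
history = [[0.1 * (t + i) for i in range(6)] for t in range(5)]
actions = policy.forward(history)
```

Because both attention stages are causal and the GRU is recurrent, the same forward pass can be evaluated step by step at deployment time, which is one plausible reason such a design suits low-latency onboard control.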
We evaluate our generalist policy across five settings, spanning multiple quadruped robots and terrain types—flat ground, stairs, and slopes—with data sparsely distributed across this space.
Zero-Shot Skill and Robot Transfer. Our policy generalizes to unseen robot morphologies and terrains without any fine-tuning, demonstrating effective knowledge transfer. Robots not seen during training successfully walk, climb stairs, and traverse slopes in a zero-shot setting.
Out-of-Distribution Generalization. Despite training only on stairs up to 17 cm step height, policies succeed on both smooth and irregular inclines up to 40° and on unseen stair geometries beyond the training range, showing emergent base stabilization and terrain adaptation.
Cross-Robot Skill Sharing. Robots that only contributed flat-terrain data exhibit robust stair traversal. This highlights effective generalization enabled by shared locomotion structure.
Real-World Deployment Without Tuning. Policies trained entirely in simulation are deployed directly on hardware (Go1, Stoch5), demonstrating successful flat, stair, and slope traversal in real-world conditions.
Check out our paper for more details.
@misc{pp2025groqlocogeneralistrobotagnosticquadruped,
title={GRoQ-LoCO: Generalist and Robot-agnostic Quadruped Locomotion Control using Offline Datasets},
author={Narayanan PP and Sarvesh Prasanth Venkatesan and Srinivas Kantha Reddy and Shishir Kolathaya},
year={2025},
eprint={2505.10973},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2505.10973},
}