mirror of
https://github.com/huggingface/lerobot.git
synced 2026-06-03 04:11:24 +00:00
Add GR00T N1.7 policy configuration, checkpoint compatibility, processor parity, LIBERO documentation, and focused tests. Co-authored-by: Ryan Halabi <ryhalabi@nvidia.com>
# GR00T Policy

GR00T is an NVIDIA foundation model family for generalized humanoid robot reasoning and skills. It is a cross-embodiment policy that accepts multimodal input, including language, images, and proprioception, to perform manipulation tasks in diverse environments.

LeRobot integrates GR00T through the `groot` policy type. The default model family is GR00T N1.5, and GR00T N1.7 can be selected with `policy.model_version=n1.7`.

## Model Overview

NVIDIA Isaac GR00T N1.5 is an upgraded version of the GR00T N1 foundation model. GR00T N1.7 extends the family with a Cosmos-Reason2/Qwen3-VL backbone and N1.7 checkpoints for SimplerEnv, DROID, and LIBERO.

Developers and researchers can post-train GR00T with their own real or synthetic data to adapt it for specific humanoid robots or tasks.

GR00T uses pre-trained vision and language encoders with a flow matching action transformer to model a chunk of actions conditioned on vision, language, and proprioception.

<img
  src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/lerobot-groot-paper1%20(1).png"
  alt="An overview of GR00T"
  width="80%"
/>

Its strong performance comes from being trained on an expansive and diverse humanoid dataset, which includes:

- Real captured data from robots.
- Synthetic data generated using NVIDIA Isaac GR00T Blueprint.
- Internet-scale video data.

This approach allows the model to be highly adaptable through post-training for specific embodiments, tasks, and environments.

## Installation Requirements

Install LeRobot with the GR00T extra:

```bash
pip install "lerobot[groot]"
```

GR00T is intended for NVIDIA GPU-accelerated systems. The `groot` extra installs the policy dependencies, including `transformers`, `diffusers`, `peft`, `dm-tree`, and Flash Attention where available. If Flash Attention is unavailable or incompatible, LeRobot falls back to SDPA attention in supported GR00T paths, with lower expected throughput.

For a source checkout, follow the Environment Setup in the [Installation Guide](./installation), then install the extra:

```bash
uv sync --locked --extra groot
```

If you need to install Flash Attention manually for your CUDA/PyTorch build, use the wheel or source build recommended by the [Flash Attention project](https://github.com/Dao-AILab/flash-attention).

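After installation, a quick import check can show whether Flash Attention is usable in the active environment. The snippet below is a minimal sketch: `flash_attn` is the module name shipped by the Flash Attention project, while the `PYTHON` override and the `FLASH_ATTN_STATUS` variable are made up for this example. A "not importable" result suggests the SDPA fallback would apply, but the sketch does not inspect which attention path LeRobot actually selects.

```bash
# Report whether the optional flash_attn module can be imported.
# PYTHON is a hypothetical override for this sketch; it defaults to `python`.
PYTHON="${PYTHON:-python}"

if "$PYTHON" -c "import flash_attn" >/dev/null 2>&1; then
  FLASH_ATTN_STATUS="available"
else
  FLASH_ATTN_STATUS="not importable"
fi
echo "flash_attn: $FLASH_ATTN_STATUS"
```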
## Usage

To use GR00T N1.5 in your LeRobot configuration, specify the policy type:

```bash
--policy.type=groot
```

To use GR00T N1.7:

```bash
--policy.type=groot \
--policy.model_version=n1.7
```
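When a script has to switch between the two versions, the flags above can be wrapped in a small helper. This is a minimal sketch: the function name `groot_policy_flags` is hypothetical and not part of LeRobot, and it emits only the two flags shown above.

```bash
# Hypothetical helper (not part of LeRobot): print the policy flags for a
# given GR00T version. Omitting the argument selects the default, N1.5.
groot_policy_flags() {
  local version="${1:-n1.5}"
  case "$version" in
    n1.5) echo "--policy.type=groot" ;;
    n1.7) echo "--policy.type=groot --policy.model_version=n1.7" ;;
    *)    echo "unknown GR00T version: $version" >&2; return 1 ;;
  esac
}

groot_policy_flags n1.7
```

The output can be spliced into a command line with unquoted command substitution, for example `lerobot-train $(groot_policy_flags n1.7) ...`.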
## Training

### Training Command Example

Here's a complete training command for finetuning the base GR00T model on your own dataset:

```bash
# Using a multi-GPU setup
accelerate launch \
  --multi_gpu \
  --num_processes=$NUM_GPUS \
  $(which lerobot-train) \
  --output_dir=$OUTPUT_DIR \
  --save_checkpoint=true \
  --batch_size=$BATCH_SIZE \
  --steps=$NUM_STEPS \
  --save_freq=$SAVE_FREQ \
  --log_freq=$LOG_FREQ \
  --policy.push_to_hub=true \
  --policy.type=groot \
  --policy.repo_id=$REPO_ID \
  --policy.tune_diffusion_model=false \
  --dataset.repo_id=$DATASET_ID \
  --wandb.enable=true \
  --wandb.disable_artifact=true \
  --job_name=$JOB_NAME
```

For N1.7, add:

```bash
--policy.model_version=n1.7
```
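The training command reads its settings from shell variables, so they need to be defined first. The sketch below shows one way to do that. Every value is a placeholder chosen for illustration, not a recommended setting, and the repository names are hypothetical.

```bash
# Placeholder values for the variables used by the training command.
# None of these are recommended defaults; adjust them to your hardware and data.
NUM_GPUS=2
OUTPUT_DIR=./outputs/groot_finetune
BATCH_SIZE=16
NUM_STEPS=30000
SAVE_FREQ=5000
LOG_FREQ=100
REPO_ID="your-hf-username/groot-finetuned"   # hypothetical Hub repo for the trained policy
DATASET_ID="your-hf-username/your-dataset"   # hypothetical LeRobot dataset on the Hub
JOB_NAME="groot_finetune"

# Fail early if any variable was left empty.
for name in NUM_GPUS OUTPUT_DIR BATCH_SIZE NUM_STEPS SAVE_FREQ LOG_FREQ REPO_ID DATASET_ID JOB_NAME; do
  eval "value=\${$name}"
  if [ -z "$value" ]; then
    echo "missing value for $name" >&2
    exit 1
  fi
done
echo "training config ok: $NUM_STEPS steps, batch size $BATCH_SIZE on $NUM_GPUS GPU(s)"
```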
## Performance Results

### LIBERO Benchmark Results

> [!NOTE]
> Follow the [LIBERO](./libero) setup instructions before running `lerobot-eval`.

GR00T has demonstrated strong performance on the LIBERO benchmark suite. To validate the LeRobot implementation, we finetuned the GR00T N1.5 model for 30k steps on the LIBERO dataset and compared the results to the GR00T reference results.

| Benchmark          | LeRobot Implementation | GR00T Reference |
| ------------------ | ---------------------- | --------------- |
| **LIBERO Spatial** | 82.0%                  | 92.0%           |
| **LIBERO Object**  | 99.0%                  | 92.0%           |
| **LIBERO Long**    | 82.0%                  | 76.0%           |
| **Average**        | 87.0%                  | 87.0%           |

These results demonstrate GR00T's strong generalization capabilities across diverse robotic manipulation tasks. To reproduce these results, follow the instructions in the [LIBERO](./libero) section.

### GR00T N1.7 LIBERO Checkpoints

NVIDIA publishes GR00T N1.7 LIBERO checkpoints at [`nvidia/GR00T-N1.7-LIBERO`](https://huggingface.co/nvidia/GR00T-N1.7-LIBERO), with one subdirectory per LIBERO suite:

| Suite          | Checkpoint subdirectory |
| -------------- | ----------------------- |
| LIBERO Spatial | `libero_spatial`        |
| LIBERO Object  | `libero_object`         |
| LIBERO Goal    | `libero_goal`           |
| LIBERO 10      | `libero_10`             |

Preliminary LeRobot integration results:

| Suite          | Status | Success rate | n_episodes |
| -------------- | ------ | -----------: | ---------: |
| LIBERO Spatial | ✓      |         ~95% |         XX |
| LIBERO Object  | ✓      |          XX% |         XX |
| LIBERO Goal    | ✓      |          XX% |         XX |
| LIBERO 10      | ✓      |          XX% |         XX |
| **Average**    | ✓      |      **XX%** |     **XX** |

Replace the `XX` placeholders with final eval artifacts before merge.

Download the suite checkpoint locally, then point `--policy.base_model_path` at the downloaded subdirectory. `--policy.path` is reserved for LeRobot checkpoints that contain a LeRobot `config.json` with a `type` field.

```bash
huggingface-cli download nvidia/GR00T-N1.7-LIBERO \
  --include "libero_spatial/*" \
  --local-dir ./GR00T-N1.7-LIBERO

lerobot-eval \
  --policy.type=groot \
  --policy.model_version=n1.7 \
  --policy.base_model_path=./GR00T-N1.7-LIBERO/libero_spatial \
  --policy.embodiment_tag=libero_sim \
  --env.type=libero \
  --env.task=libero_spatial \
  --eval.n_episodes=50
```
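The same download and evaluation steps can be repeated for each suite in the checkpoint table. The loop below is a sketch under one assumption this document does not confirm: that `--env.task` accepts the same name as each checkpoint subdirectory, as it does for `libero_spatial`. By default it only prints the commands. `RUN` and the `run` helper are made up for this example, and setting `RUN=1` executes the commands.

```bash
# Sketch: evaluate every GR00T N1.7 LIBERO checkpoint in turn.
# Assumption: each suite's --env.task value equals its checkpoint subdirectory name.
CKPT_ROOT=./GR00T-N1.7-LIBERO

run() {
  # Print the command; execute it only when RUN=1 (hypothetical switch for this sketch).
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

for suite in libero_spatial libero_object libero_goal libero_10; do
  run huggingface-cli download nvidia/GR00T-N1.7-LIBERO \
    --include "${suite}/*" \
    --local-dir "$CKPT_ROOT"

  run lerobot-eval \
    --policy.type=groot \
    --policy.model_version=n1.7 \
    --policy.base_model_path="$CKPT_ROOT/$suite" \
    --policy.embodiment_tag=libero_sim \
    --env.type=libero \
    --env.task="$suite" \
    --eval.n_episodes=50
done
```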

Use at least 50 episodes per suite (`--eval.n_episodes=50` or higher) when reporting success rates.
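As a rough guide to what the episode count buys, a success rate measured over n independent episodes has a binomial standard error of sqrt(p(1-p)/n). The sketch below evaluates that formula. The function name `success_rate_se` is hypothetical, and the 95% success rate is an illustrative input rather than a measured result.

```bash
# Binomial standard error of a success rate: sqrt(p * (1 - p) / n).
# p = 0.95 is an illustrative assumption, not a measured result.
success_rate_se() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3f\n", sqrt(p * (1 - p) / n) }'
}

success_rate_se 0.95 50    # prints 0.031
success_rate_se 0.95 500   # prints 0.010
```

Under these assumptions, 50 episodes leave an uncertainty of about 3 percentage points, and shrinking it to about 1 point takes roughly ten times as many episodes.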

### Evaluate on your hardware setup

Once you have trained your model with your own parameters, you can run inference on your downstream task. Follow the instructions in [Policy Deployment (lerobot-rollout)](./inference). For example:

```bash
# Replace <user> with your Hugging Face username.
# --policy.path points at your trained model.
# Optionally add: --dataset.camera_encoder.vcodec=auto
# Comments are kept out of the command because an inline comment ends a continued line.
lerobot-rollout \
  --strategy.type=sentry \
  --strategy.upload_every_n_episodes=5 \
  --robot.type=bi_so_follower \
  --robot.left_arm_port=/dev/ttyACM1 \
  --robot.right_arm_port=/dev/ttyACM0 \
  --robot.id=bimanual_follower \
  --robot.cameras='{ right: {"type": "opencv", "index_or_path": 0, "width": 640, "height": 480, "fps": 30},
    left: {"type": "opencv", "index_or_path": 2, "width": 640, "height": 480, "fps": 30},
    top: {"type": "opencv", "index_or_path": 4, "width": 640, "height": 480, "fps": 30},
  }' \
  --display_data=true \
  --dataset.repo_id=<user>/eval_groot-bimanual \
  --dataset.single_task="Grab and handover the red cube to the other arm" \
  --dataset.streaming_encoding=true \
  --dataset.encoder_threads=2 \
  --policy.path=<user>/groot-bimanual \
  --duration=600
```

## License

GR00T N1.5 follows NVIDIA's license terms, consistent with the original [GR00T repository](https://github.com/NVIDIA/Isaac-GR00T). GR00T N1.7 is released under the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).