Compare commits

..

11 Commits

Author SHA1 Message Date
Jade Choghari
03cce79c88 Merge branch 'main' into feat/behavior-1k 2025-12-04 18:50:56 +01:00
Michel Aractingi
3918ab7882 Merge branch 'main' into feat/behavior-1k 2025-11-03 13:28:31 +01:00
Michel Aractingi
65b0e73ae4 * refactor behaviour1k_lerobot_dataset.py
* add example scripts to load behaviour 1k data in `load_behaviour1k_dataset.py`
2025-11-03 12:23:12 +00:00
Jade Choghari
ca7c5fcdfe remove tester 2025-10-30 18:14:09 +01:00
Jade Choghari
28f8098df4 fix style 2025-10-30 18:12:50 +01:00
Jade Choghari
db7d501281 remove comments 2025-10-30 18:12:06 +01:00
Jade Choghari
88380fe34e update changes 2025-10-30 18:11:27 +01:00
Jade Choghari
154abfd233 update
Signed-off-by: Jade Choghari <chogharijade@gmail.com>
2025-10-27 17:52:21 +01:00
Jade Choghari
dc14266762 add
Signed-off-by: Jade Choghari <chogharijade@gmail.com>
2025-10-27 16:44:58 +01:00
Michel Aractingi
fd623e0cc5 Modify convert_to_lerobot_v3 script for behaviours dataset to take a single task id and create a dataset outof it 2025-10-24 17:06:21 +02:00
Michel Aractingi
a52e88d349 add scripts for convert behavior-1k to datasetv3 2025-10-24 14:17:30 +02:00
56 changed files with 1745 additions and 3721 deletions

View File

@@ -31,8 +31,7 @@ jobs:
name: Upload Preview and Comment
if: >
github.event.workflow_run.event == 'pull_request' &&
github.event.workflow_run.conclusion == 'success' &&
github.repository == 'huggingface/lerobot'
github.event.workflow_run.conclusion == 'success'
uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@main
with:
package_name: lerobot

View File

@@ -42,9 +42,7 @@ jobs:
# This job builds and deploys the official documentation.
build_main_docs:
name: Build Main Docs
if: >
(github.event_name == 'push' || github.event_name == 'workflow_dispatch') &&
github.repository == 'huggingface/lerobot'
if: github.event_name == 'push' || github.event_name == 'workflow_dispatch'
permissions:
contents: read
uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
@@ -60,7 +58,7 @@ jobs:
# The result of this job triggers the 'Upload PR Documentation' workflow.
build_pr_docs:
name: Build PR Docs
if: github.event_name == 'pull_request' && github.repository == 'huggingface/lerobot'
if: github.event_name == 'pull_request'
permissions:
contents: read
pull-requests: write

View File

@@ -45,6 +45,7 @@ permissions:
env:
UV_VERSION: "0.8.0"
PYTHON_VERSION: "3.10"
DOCKER_IMAGE_NAME: huggingface/lerobot-gpu
# Ensures that only the latest commit for a PR or branch is built, canceling older runs.
concurrency:

View File

@@ -43,7 +43,6 @@ jobs:
name: Build CPU Docker for Nightly
runs-on:
group: aws-general-8-plus
if: github.repository == 'huggingface/lerobot'
outputs:
image_tag: ${{ env.DOCKER_IMAGE_NAME_CPU }}
steps:
@@ -78,7 +77,6 @@ jobs:
name: Build GPU Docker for Nightly
runs-on:
group: aws-general-8-plus
if: github.repository == 'huggingface/lerobot'
outputs:
image_tag: ${{ env.DOCKER_IMAGE_NAME_GPU }}
steps:

View File

@@ -29,7 +29,6 @@ jobs:
build-and-publish:
name: Build and publish Python distributions
runs-on: ubuntu-latest
if: github.repository == 'huggingface/lerobot'
outputs:
version: ${{ steps.extract_info.outputs.tag_version }}
permissions:

View File

@@ -45,7 +45,6 @@ jobs:
stale:
name: Close Stale Issues and PRs
runs-on: ubuntu-latest
if: github.repository == 'huggingface/lerobot'
permissions:
actions: write
contents: write # only for delete-branch option

View File

@@ -43,7 +43,6 @@ jobs:
full-tests:
name: Full Unbound Tests
runs-on: ubuntu-latest
if: github.repository == 'huggingface/lerobot'
env:
MUJOCO_GL: egl
HF_HOME: /mnt/cache/.cache/huggingface

View File

@@ -92,10 +92,6 @@
- local: phone_teleop
title: Phone
title: "Teleoperators"
- sections:
- local: torch_accelerators
title: PyTorch accelerators
title: "Supported Hardware"
- sections:
- local: notebooks
title: Notebooks

View File

@@ -1,42 +0,0 @@
# PyTorch accelerators
LeRobot supports multiple hardware acceleration options for both training and inference.
These options include:
- **CPU**: CPU executes all computations, no dedicated accelerator is used
- **CUDA**: acceleration with NVIDIA & AMD GPUs
- **MPS**: acceleration with Apple Silicon GPUs
- **XPU**: acceleration with Intel integrated and discrete GPUs
## Getting Started
To use particular accelerator, a suitable version of PyTorch should be installed.
For CPU, CUDA, and MPS backends follow instructions provided on [PyTorch installation page](https://pytorch.org/get-started/locally).
For XPU backend, follow instructions from [PyTorch documentation](https://docs.pytorch.org/docs/stable/notes/get_start_xpu.html).
### Verifying the installation
After installation, accelerator availability can be verified by running
```python
import torch
print(torch.<backend_name>.is_available()) # <backend_name> is cuda, mps, or xpu
```
## How to run training or evaluation
To select the desired accelerator, use the `--policy.device` flag when running `lerobot-train` or `lerobot-eval`. For example, to use MPS on Apple Silicon, run:
```bash
lerobot-train
--policy.device=mps ...
```
```bash
lerobot-eval \
--policy.device=mps ...
```
However, in most cases, presence of an accelerator is detected automatically and `policy.device` parameter can be omitted from CLI commands.

View File

@@ -4,12 +4,11 @@ This guide covers the complete setup process for the Unitree G1 humanoid, from i
## About the Unitree G1
We offer support for both 29 and 23 DOF G1. We introduce:
We offer support for both 29 and 23 DOF G1. In this first PR we introduce:
- **`unitree g1` robot class, handling low level communication with the humanoid**
- **ZMQ socket bridge** for remote communication over WiFi, allowing one to deploy policies remotely instead of over ethernet or directly on the Orin
- **GR00T locomotion policy** for bipedal walking and balance
- **MuJoCo simulation mode** for testing policies without the physical robot
---
@@ -192,10 +191,6 @@ Press `Ctrl+C` to stop the policy.
---
## Extra: Running in Simulation Mode (MuJoCo)
You can now test and develop policies without a physical robot using MuJoCo. to do so set `is_simulation=True` in config.
## Additional Resources
- [Unitree SDK Documentation](https://github.com/unitreerobotics/unitree_sdk2_python)

View File

@@ -11,14 +11,13 @@ LeRobot provides several utilities for manipulating datasets:
3. **Merge Datasets** - Combine multiple datasets into one. The datasets must have identical features, and episodes are concatenated in the order specified in `repo_ids`
4. **Add Features** - Add new features to a dataset
5. **Remove Features** - Remove features from a dataset
6. **Convert to Video** - Convert image-based datasets to video format for efficient storage
The core implementation is in `lerobot.datasets.dataset_tools`.
An example script detailing how to use the tools API is available in `examples/dataset/use_dataset_tools.py`.
## Command-Line Tool: lerobot-edit-dataset
`lerobot-edit-dataset` is a command-line script for editing datasets. It can be used to delete episodes, split datasets, merge datasets, add features, remove features, and convert image datasets to video format.
`lerobot-edit-dataset` is a command-line script for editing datasets. It can be used to delete episodes, split datasets, merge datasets, add features, and remove features.
Run `lerobot-edit-dataset --help` for more information on the configuration of each operation.
@@ -87,71 +86,9 @@ lerobot-edit-dataset \
--operation.feature_names "['observation.images.top']"
```
#### Convert to Video
Convert an image-based dataset to video format, creating a new LeRobotDataset where images are stored as videos. This is useful for reducing storage requirements and improving data loading performance. The new dataset will have the exact same structure as the original, but with images encoded as MP4 videos in the proper LeRobot format.
```bash
# Local-only: Save to a custom output directory (no hub push)
lerobot-edit-dataset \
--repo_id lerobot/pusht_image \
--operation.type convert_to_video \
--operation.output_dir /path/to/output/pusht_video
# Save with new repo_id (local storage)
lerobot-edit-dataset \
--repo_id lerobot/pusht_image \
--new_repo_id lerobot/pusht_video \
--operation.type convert_to_video
# Convert and push to Hugging Face Hub
lerobot-edit-dataset \
--repo_id lerobot/pusht_image \
--new_repo_id lerobot/pusht_video \
--operation.type convert_to_video \
--push_to_hub true
# Convert with custom video codec and quality settings
lerobot-edit-dataset \
--repo_id lerobot/pusht_image \
--operation.type convert_to_video \
--operation.output_dir outputs/pusht_video \
--operation.vcodec libsvtav1 \
--operation.pix_fmt yuv420p \
--operation.g 2 \
--operation.crf 30
# Convert only specific episodes
lerobot-edit-dataset \
--repo_id lerobot/pusht_image \
--operation.type convert_to_video \
--operation.output_dir outputs/pusht_video \
--operation.episode_indices "[0, 1, 2, 5, 10]"
# Convert with multiple workers for parallel processing
lerobot-edit-dataset \
--repo_id lerobot/pusht_image \
--operation.type convert_to_video \
--operation.output_dir outputs/pusht_video \
--operation.num_workers 8
```
**Parameters:**
- `output_dir`: Custom output directory (optional - by default uses `new_repo_id` or `{repo_id}_video`)
- `vcodec`: Video codec to use - options: `h264`, `hevc`, `libsvtav1` (default: `libsvtav1`)
- `pix_fmt`: Pixel format - options: `yuv420p`, `yuv444p` (default: `yuv420p`)
- `g`: Group of pictures (GOP) size - lower values give better quality but larger files (default: 2)
- `crf`: Constant rate factor - lower values give better quality but larger files, 0 is lossless (default: 30)
- `fast_decode`: Fast decode tuning option (default: 0)
- `episode_indices`: List of specific episodes to convert (default: all episodes)
- `num_workers`: Number of parallel workers for processing (default: 4)
**Note:** The resulting dataset will be a proper LeRobotDataset with all cameras encoded as videos in the `videos/` directory, with parquet files containing only metadata (no raw image data). All episodes, stats, and tasks are preserved.
### Push to Hub
Add the `--push_to_hub true` flag to any command to automatically upload the resulting dataset to the Hugging Face Hub:
Add the `--push_to_hub` flag to any command to automatically upload the resulting dataset to the Hugging Face Hub:
```bash
lerobot-edit-dataset \
@@ -159,7 +96,7 @@ lerobot-edit-dataset \
--new_repo_id lerobot/pusht_after_deletion \
--operation.type delete_episodes \
--operation.episode_indices "[0, 2, 5]" \
--push_to_hub true
--push_to_hub
```
There is also a tool for adding features to a dataset that is not yet covered in `lerobot-edit-dataset`.

View File

@@ -24,7 +24,7 @@ Built from pure Transformer encoders, X-VLA scales naturally with model size and
<img
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/xvla-architecture2.png"
alt="XVLA Architecture 2"
style="width: 60%; height: auto;"
style="width: 32%; max-width: 450px; height: auto;"
/>
</p>
@@ -120,7 +120,7 @@ Adapted for Google Robot platforms.
### Recommended Training Configuration
When fine-tuning X-VLA for a new embodiment or task, we recommend not freezing the VLM, and also setting the `policy.dtype=bfloat16` to not hit OOM errors.
When fine-tuning X-VLA for a new embodiment or task, we recommend the following freezing strategy:
```bash
lerobot-train \
@@ -129,26 +129,25 @@ lerobot-train \
--job_name=xvla_training \
--policy.path="lerobot/xvla-base" \
--policy.repo_id="HF_USER/xvla-your-robot" \
--policy.dtype=bfloat16 \
--policy.action_mode=auto \
--steps=20000 \
--steps=3000 \
--policy.device=cuda \
--policy.freeze_vision_encoder=false \
--policy.freeze_language_encoder=false \
--policy.train_policy_transformer=true \
--policy.train_soft_prompts=true \
--policy.freeze_vision_encoder=True \
--policy.freeze_language_encoder=True \
--policy.train_policy_transformer=True \
--policy.train_soft_prompts=True \
--policy.action_mode=YOUR_ACTION_MODE
```
### Training Parameters Explained
| Parameter | Default | Description |
| -------------------------- | ------- | ---------------------------------------------- |
| `freeze_vision_encoder` | `false` | Do not freeze the VLM vision encoder weights |
| `freeze_language_encoder` | `false` | Do not freeze the VLM language encoder weights |
| `train_policy_transformer` | `true` | Allow policy transformer layers to train |
| `train_soft_prompts` | `true` | Allow soft prompts to train |
| Parameter | Default | Description |
| -------------------------- | ------- | ---------------------------------------- |
| `freeze_vision_encoder` | `True` | Freeze the VLM vision encoder weights |
| `freeze_language_encoder` | `True` | Freeze the VLM language encoder weights |
| `train_policy_transformer` | `True` | Allow policy transformer layers to train |
| `train_soft_prompts` | `True` | Allow soft prompts to train |
**💡 Best Practice**: For Phase II adaptation to new embodiments, do not freeze the VLM encoders and also train the policy transformer and soft prompts.
**💡 Best Practice**: For Phase II adaptation to new embodiments, freeze the VLM encoders and only train the policy transformer and soft prompts. This provides excellent sample efficiency with minimal compute.
### Example: Training on Bimanual Robot
@@ -158,15 +157,14 @@ lerobot-train \
--output_dir=./outputs/xvla_bimanual \
--job_name=xvla_so101_training \
--policy.path="lerobot/xvla-base" \
--policy.dtype=bfloat16 \
--policy.repo_id="YOUR_USERNAME/xvla-biso101" \
--steps=3000 \
--policy.device=cuda \
--policy.action_mode=so101_bimanual \
--policy.freeze_vision_encoder=false \
--policy.freeze_language_encoder=false \
--policy.train_policy_transformer=true \
--policy.train_soft_prompts=true
--policy.freeze_vision_encoder=True \
--policy.freeze_language_encoder=True \
--policy.train_policy_transformer=True \
--policy.train_soft_prompts=True
```
💡 **Best Performance:** If you have sufficient computational resources and want to achieve best X-VLA finetuning performance, you should follow the official finetuning strategy:
@@ -174,7 +172,71 @@ lerobot-train \
**🔥 Full-finetune all components with a custom learning-rate scheme**
To ensure stable optimization, the Vision-Language Model (VLM) must be trained with only 1/10 of the base learning rate, while all other components use the full LR.
This LR ratio is crucial for achieving strong and stable finetuning performance. This is already done for you by default.
This LR ratio is crucial for achieving strong and stable finetuning performance.
To enable this behavior, you must:
1. Implement a custom optimizer and register it in your training config
```
from dataclasses import dataclass, asdict
from lerobot.optim.optimizers import OptimizerConfig
import torch
@OptimizerConfig.register_subclass("xvla-adamw")
@dataclass
class XVLAAdamW(OptimizerConfig):
lr: float = 1e-4
betas: tuple[float, float] = (0.9, 0.99)
eps: float = 1e-8
weight_decay: float = 0.0
grad_clip_norm: float = 10.0
def build(self, params: dict) -> torch.optim.Optimizer:
"""
Expect `named_parameters()` as input.
Apply lr = lr / 10 for all VLM-related parameters.
"""
assert isinstance(params, dict), \
"Custom LR optimizer requires `named_parameters()` as inputs."
kwargs = asdict(self)
kwargs.pop("grad_clip_norm")
vlm_group, other_group = [], []
for name, p in params.items():
if not p.requires_grad:
continue
if "vlm" in name.lower():
vlm_group.append(p)
else:
other_group.append(p)
param_groups = [
{"params": vlm_group, "lr": self.lr * 0.1, "weight_decay": self.weight_decay * 0.1},
{"params": other_group, "lr": self.lr, "weight_decay": self.weight_decay},
]
return torch.optim.AdamW(param_groups, **kwargs)
```
2. Modify X-VLAs get_optim_params to return named parameters
Replace:
```
def get_optim_params(self) -> dict:
"""Return only trainable parameters for optimization."""
return filter(lambda p: p.requires_grad, self.parameters())
```
with:
```
def get_optim_params(self):
"""Return trainable named parameters."""
return filter(lambda kv: kv[1].requires_grad, self.named_parameters())
```
This ensures the optimizer receives a dict of named parameters, allowing it to correctly detect VLM modules and apply the 1/10 LR rule.
❕Note
Completely matching the official reported performance may require an additional warm-up LR schedule for soft-prompts, which can bring minor improvements.
@@ -264,26 +326,6 @@ domain_id = 3
The domain_id is automatically added to observations by the `XVLAAddDomainIdProcessorStep` in the preprocessing pipeline.
The `lerobot/xvla-base` model has been trained on the following domain IDs. It is recommended to choose one that most resembles your robot/configuration:
#### Fine-tuning Datasets
| Dataset Name | Domain ID |
| ---------------- | --------- |
| Bridge | 0 |
| RT1 | 1 |
| Calvin | 2 |
| libero | 3 |
| widowx-air | 4 |
| AIR-AGILEX-HQ | 5 |
| robotwin2_abs_ee | 6 |
| robotwin2_clean | 6 |
| robocasa-human | 7 |
| VLABench | 8 |
| AGIBOT-challenge | 9 |
| AIR-AGILEX | 10 |
| AIRBOT | 18 |
### 3. Processor Steps
X-VLA requires specific preprocessing and postprocessing steps for proper operation.

View File

@@ -0,0 +1,464 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
BehaviorLeRobotDatasetV3: A wrapper around LeRobotDataset v3.0 for loading BEHAVIOR-1K data.
This wrapper extends LeRobotDataset to support BEHAVIOR-1K specific features:
- Modality and camera selection (rgb, depth, seg_instance_id)
- Efficient chunk streaming mode with keyframe access
- Additional BEHAVIOR-1K metadata (cam_rel_poses, task_info, etc.)
"""
import logging
from collections.abc import Callable
from pathlib import Path
import datasets
import numpy as np
from behaviour_1k_constants import ROBOT_CAMERA_NAMES, ROBOT_TYPE
from torch.utils.data import Dataset, get_worker_info
from lerobot.datasets.lerobot_dataset import CODEBASE_VERSION, LeRobotDataset, LeRobotDatasetMetadata
from lerobot.datasets.utils import (
check_delta_timestamps,
get_delta_indices,
get_safe_version,
hf_transform_to_torch,
)
from lerobot.datasets.video_utils import decode_video_frames, get_safe_default_codec
from lerobot.utils.constants import HF_LEROBOT_HOME
logger = logging.getLogger(__name__)
class BehaviorLeRobotDatasetMetadata(LeRobotDatasetMetadata):
"""
Extended metadata class for BEHAVIOR-1K datasets.
Adds support for:
- Modality and camera filtering
- Custom metainfo and annotation paths
"""
def __init__(
self,
repo_id: str,
root: str | Path | None = None,
revision: str | None = None,
force_cache_sync: bool = False,
metadata_buffer_size: int = 10,
modalities: set[str] | None = None,
cameras: set[str] | None = None,
):
self.modalities = set(modalities) if modalities else {"rgb", "depth", "seg_instance_id"}
self.camera_names = set(cameras) if cameras else {"head", "left_wrist", "right_wrist"}
assert self.modalities.issubset({"rgb", "depth", "seg_instance_id"}), (
f"Modalities must be subset of ['rgb', 'depth', 'seg_instance_id'], got {self.modalities}"
)
assert self.camera_names.issubset(set(ROBOT_CAMERA_NAMES[ROBOT_TYPE])), (
f"Camera names must be subset of {list(ROBOT_CAMERA_NAMES[ROBOT_TYPE])}, got {self.camera_names}"
)
super().__init__(repo_id, root, revision, force_cache_sync, metadata_buffer_size)
@property
def filtered_features(self) -> dict[str, dict]:
"""Return only features matching selected modalities and cameras."""
features = {}
for name, feature_info in self.features.items():
if not name.startswith("observation.images."):
features[name] = feature_info
continue
parts = name.split(".")
if len(parts) >= 4:
modality = parts[2]
camera = parts[3]
if modality in self.modalities and camera in self.camera_names:
features[name] = feature_info
return features
@property
def video_keys(self) -> list[str]:
"""Return only video keys for selected modalities and cameras."""
all_video_keys = super().video_keys
filtered_keys = []
for key in all_video_keys:
parts = key.split(".")
if len(parts) >= 4:
modality = parts[2]
camera = parts[3]
if modality in self.modalities and camera in self.camera_names:
filtered_keys.append(key)
return filtered_keys
def get_metainfo_path(self, ep_index: int) -> Path:
"""Get path to episode metainfo file."""
if "metainfo_path" in self.info:
fpath = self.info["metainfo_path"].format(episode_index=ep_index)
return Path(fpath)
return None
def get_annotation_path(self, ep_index: int) -> Path:
"""Get path to episode annotation file."""
if "annotation_path" in self.info:
fpath = self.info["annotation_path"].format(episode_index=ep_index)
return Path(fpath)
return None
class BehaviorLeRobotDatasetV3(LeRobotDataset):
"""
BEHAVIOR-1K wrapper for LeRobotDataset v3.0.
Each BEHAVIOR-1K dataset contains a single task (e.g., behavior1k-task0000).
See https://huggingface.co/collections/lerobot/behavior-1k for all available tasks.
Key features:
- Modality and camera selection
- Efficient chunk streaming with keyframe access (recommended for B1K with GOP=250)
- Support for BEHAVIOR-1K specific observations (cam_rel_poses, task_info, task_index)
"""
def __init__(
self,
repo_id: str,
root: str | Path | None = None,
episodes: list[int] | None = None,
image_transforms: Callable | None = None,
delta_timestamps: dict[list[float]] | None = None,
tolerance_s: float = 1e-4,
revision: str | None = None,
force_cache_sync: bool = False,
download_videos: bool = True,
video_backend: str | None = None,
batch_encoding_size: int = 1,
# BEHAVIOR-1K specific arguments
modalities: list[str] | None = None,
cameras: list[str] | None = None,
check_timestamp_sync: bool = True,
chunk_streaming_using_keyframe: bool = True,
shuffle: bool = True,
seed: int = 42,
):
"""
Initialize BEHAVIOR-1K dataset.
Args:
repo_id: HuggingFace repository ID (e.g., "lerobot/behavior1k-task0000")
root: Local directory for dataset storage
episodes: List of episode indices to load (for train/val split)
image_transforms: Torchvision v2 transforms for images
delta_timestamps: Temporal offsets for history/future frames
tolerance_s: Tolerance for timestamp synchronization
revision: Git revision/branch to load
force_cache_sync: Force re-download from hub
download_videos: Whether to download video files
video_backend: Video decoder ('pyav' or 'torchcodec')
batch_encoding_size: Batch size for video encoding
modalities: List of modalities to load (None = all: rgb, depth, seg_instance_id)
cameras: List of cameras to load (None = all: head, left_wrist, right_wrist)
check_timestamp_sync: Verify timestamp synchronization (can be slow)
chunk_streaming_using_keyframe: Use keyframe-based streaming (STRONGLY RECOMMENDED for B1K)
shuffle: Shuffle chunks in streaming mode
seed: Random seed for shuffling
"""
Dataset.__init__(self)
self.repo_id = repo_id
if root:
self.root = Path(root)
else:
dataset_name = repo_id.split("/")[-1] if "/" in repo_id else repo_id
self.root = HF_LEROBOT_HOME / dataset_name
self.image_transforms = image_transforms
self.delta_timestamps = delta_timestamps
self.tolerance_s = tolerance_s
self.revision = revision if revision else CODEBASE_VERSION
self.video_backend = video_backend if video_backend else get_safe_default_codec()
self.delta_indices = None
self.batch_encoding_size = batch_encoding_size
self.episodes_since_last_encoding = 0
self.seed = seed
self.image_writer = None
self.episode_buffer = None
self.writer = None
self.latest_episode = None
self._current_file_start_frame = None
self.root.mkdir(exist_ok=True, parents=True)
if modalities is None:
modalities = ["rgb", "depth", "seg_instance_id"]
if "seg_instance_id" in modalities:
assert chunk_streaming_using_keyframe, (
"For performance, seg_instance_id requires chunk_streaming_using_keyframe=True"
)
if "depth" in modalities:
assert self.video_backend == "pyav", "Depth videos require video_backend='pyav'"
if cameras is None:
cameras = ["head", "left_wrist", "right_wrist"]
self.meta = BehaviorLeRobotDatasetMetadata(
repo_id=self.repo_id,
root=self.root,
revision=self.revision,
force_cache_sync=force_cache_sync,
modalities=modalities,
cameras=cameras,
)
if episodes is not None:
self.episodes = sorted([i for i in episodes if i < len(self.meta.episodes)])
else:
self.episodes = list(range(len(self.meta.episodes)))
logger.info(f"Total episodes: {len(self.episodes)}")
self._chunk_streaming_using_keyframe = chunk_streaming_using_keyframe
if self._chunk_streaming_using_keyframe:
if not shuffle:
logger.warning("Chunk streaming enabled but shuffle=False. This may reduce randomness.")
self.chunks = self._get_keyframe_chunk_indices()
self.current_streaming_chunk_idx = None if shuffle else 0
self.current_streaming_frame_idx = None if shuffle else self.chunks[0][0] if self.chunks else 0
self.obs_loaders = {}
self._should_obs_loaders_reload = True
self._lazy_loading = False
self._recorded_frames = self.meta.total_frames
self._writer_closed_for_reading = False
try:
if force_cache_sync:
raise FileNotFoundError
self.hf_dataset = self.load_hf_dataset()
except (AssertionError, FileNotFoundError, NotADirectoryError):
self.revision = get_safe_version(self.repo_id, self.revision)
self.download_episodes(download_videos)
self.hf_dataset = self.load_hf_dataset()
if self.delta_timestamps is not None:
check_delta_timestamps(self.delta_timestamps, self.meta.fps, self.tolerance_s)
self.delta_indices = get_delta_indices(self.delta_timestamps, self.meta.fps)
@property
def fps(self) -> int:
"""Frames per second."""
return self.meta.fps
@property
def features(self) -> dict:
"""Dataset features (filtered by modalities/cameras)."""
return self.meta.filtered_features
@property
def num_episodes(self) -> int:
"""Number of episodes."""
return len(self.episodes)
@property
def num_frames(self) -> int:
"""Total number of frames."""
return len(self.hf_dataset)
def get_episodes_file_paths(self) -> list[str]:
"""
Get download patterns for requested episodes.
Returns glob patterns for download rather than specific file paths.
Note: Unlike the base LeRobotDataset, this method cannot filter downloads to only
requested episodes because:
1. BEHAVIOR-1K episode indices are encoded (e.g., 10010 for task 1, episode 10)
2. Episodes are chunked across multiple parquet/video files
3. The parquet files are organized by chunk, not by episode
Therefore, we download full data/meta/video directories and rely on
`self.load_hf_dataset()` to filter to requested episodes from the loaded data.
"""
allow_patterns = ["data/**", "meta/**"]
# Filter by modalities and cameras for video patterns
if len(self.meta.video_keys) > 0:
if len(self.meta.modalities) != 3 or len(self.meta.camera_names) != 3:
# Only download specific modality/camera combinations
for modality in self.meta.modalities:
for camera in self.meta.camera_names:
allow_patterns.append(f"**/observation.images.{modality}.{camera}/**")
else:
# Download all videos (no filtering needed)
allow_patterns.append("videos/**")
return allow_patterns
def download_episodes(self, download_videos: bool = True) -> None:
"""
Download episodes with modality/camera filtering.
Follows the same pattern as base LeRobotDataset.download() but uses
get_episodes_file_paths() which returns patterns for modality/camera filtering.
"""
ignore_patterns = None if download_videos else "videos/"
files = self.get_episodes_file_paths()
self.pull_from_repo(allow_patterns=files, ignore_patterns=ignore_patterns)
def pull_from_repo(
self,
allow_patterns: list[str] | str | None = None,
ignore_patterns: list[str] | str | None = None,
) -> None:
"""Pull dataset from HuggingFace Hub."""
from huggingface_hub import snapshot_download
logger.info(f"Pulling dataset {self.repo_id} from HuggingFace Hub...")
snapshot_download(
self.repo_id,
repo_type="dataset",
revision=self.revision,
local_dir=self.root,
allow_patterns=allow_patterns,
ignore_patterns=ignore_patterns,
)
def load_hf_dataset(self) -> datasets.Dataset:
"""Load dataset from parquet files."""
from datasets import load_dataset
path = str(self.root / "data")
hf_dataset = load_dataset("parquet", data_dir=path, split="train")
hf_dataset.set_transform(hf_transform_to_torch)
return hf_dataset
def _get_keyframe_chunk_indices(self, chunk_size: int = 250) -> list[tuple[int, int, int]]:
"""
Divide episodes into chunks based on GOP size (keyframe interval).
For BEHAVIOR-1K, GOP size is 250 frames for efficient storage.
Returns:
List of (start_index, end_index, local_start_index) tuples
"""
chunks = []
offset = 0
for ep_array_idx in self.episodes:
# self.episodes contains array indices, so access directly
ep = self.meta.episodes[ep_array_idx]
length = ep["length"]
local_starts = list(range(0, length, chunk_size))
local_ends = local_starts[1:] + [length]
for local_start, local_end in zip(local_starts, local_ends, strict=True):
chunks.append((offset + local_start, offset + local_end, local_start))
offset += length
return chunks
def __getitem__(self, idx: int) -> dict:
"""Get item by index, with optional chunk streaming."""
if not self._chunk_streaming_using_keyframe:
item = self.hf_dataset[idx]
for key in self.meta.video_keys:
if key in self.features:
ep_idx = item["episode_index"].item()
timestamp = item["timestamp"].item()
video_path = self.root / self.meta.get_video_file_path(ep_idx, key)
frames = decode_video_frames(
video_path, [timestamp], self.tolerance_s, self.video_backend
)
item[key] = frames.squeeze(0)
if self.image_transforms is not None:
for key in self.features:
if key.startswith("observation.images."):
item[key] = self.image_transforms(item[key])
if "task_index" in item:
task_idx = item["task_index"].item()
try:
item["task"] = self.meta.tasks.iloc[task_idx].name
except (IndexError, AttributeError):
item["task"] = f"task_{task_idx}"
return item
return self._get_item_streaming(idx)
def _get_item_streaming(self, idx: int) -> dict:
"""Get item in chunk streaming mode."""
if self.current_streaming_chunk_idx is None:
worker_info = get_worker_info()
worker_id = 0 if worker_info is None else worker_info.id
rng = np.random.default_rng(self.seed + worker_id)
rng.shuffle(self.chunks)
self.current_streaming_chunk_idx = rng.integers(0, len(self.chunks)).item()
self.current_streaming_frame_idx = self.chunks[self.current_streaming_chunk_idx][0]
if self.current_streaming_frame_idx >= self.chunks[self.current_streaming_chunk_idx][1]:
self.current_streaming_chunk_idx += 1
if self.current_streaming_chunk_idx >= len(self.chunks):
self.current_streaming_chunk_idx = 0
self.current_streaming_frame_idx = self.chunks[self.current_streaming_chunk_idx][0]
self._should_obs_loaders_reload = True
item = self.hf_dataset[self.current_streaming_frame_idx]
ep_idx = item["episode_index"].item()
if self._should_obs_loaders_reload:
for loader in self.obs_loaders.values():
if hasattr(loader, "close"):
loader.close()
self.obs_loaders = {}
self.current_streaming_episode_idx = ep_idx
self._should_obs_loaders_reload = False
for key in self.meta.video_keys:
if key in self.features:
timestamp = item["timestamp"].item()
video_path = self.root / self.meta.get_video_file_path(ep_idx, key)
frames = decode_video_frames(video_path, [timestamp], self.tolerance_s, self.video_backend)
item[key] = frames.squeeze(0)
if self.image_transforms is not None:
for key in self.features:
if key.startswith("observation.images."):
item[key] = self.image_transforms(item[key])
if "task_index" in item:
task_idx = item["task_index"].item()
try:
item["task"] = self.meta.tasks.iloc[task_idx].name
except (IndexError, AttributeError):
item["task"] = f"task_{task_idx}"
self.current_streaming_frame_idx += 1
return item
def __len__(self) -> int:
"""Total number of frames."""
return len(self.hf_dataset)

View File

@@ -0,0 +1,350 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from collections import OrderedDict
import numpy as np
import torch as th
ROBOT_TYPE = "R1Pro"
FPS = 30
ROBOT_CAMERA_NAMES = {
"A1": {
"external": "external::external_camera",
"wrist": "external::wrist_camera",
},
"R1Pro": {
"left_wrist": "robot_r1::robot_r1:left_realsense_link:Camera:0",
"right_wrist": "robot_r1::robot_r1:right_realsense_link:Camera:0",
"head": "robot_r1::robot_r1:zed_link:Camera:0",
},
}
# Camera resolutions and corresponding intrinstics
HEAD_RESOLUTION = (720, 720)
WRIST_RESOLUTION = (480, 480)
# TODO: Fix A1
CAMERA_INTRINSICS = {
"A1": {
"external": np.array(
[[306.0, 0.0, 360.0], [0.0, 306.0, 360.0], [0.0, 0.0, 1.0]], dtype=np.float32
), # 240x240
"wrist": np.array(
[[388.6639, 0.0, 240.0], [0.0, 388.6639, 240.0], [0.0, 0.0, 1.0]], dtype=np.float32
), # 240x240
},
"R1Pro": {
"head": np.array(
[[306.0, 0.0, 360.0], [0.0, 306.0, 360.0], [0.0, 0.0, 1.0]], dtype=np.float32
), # 720x720
"left_wrist": np.array(
[[388.6639, 0.0, 240.0], [0.0, 388.6639, 240.0], [0.0, 0.0, 1.0]], dtype=np.float32
), # 480x480
"right_wrist": np.array(
[[388.6639, 0.0, 240.0], [0.0, 388.6639, 240.0], [0.0, 0.0, 1.0]], dtype=np.float32
), # 480x480
},
}
# Dataset features for BEHAVIOR-1K LeRobotDataset v3.0
BEHAVIOR_DATASET_FEATURES = {
# Actions
"action": {
"dtype": "float32",
"shape": (23,), # 23-dimensional action space for R1Pro
"names": None,
},
# Proprioception
"observation.state": {
"dtype": "float32",
"shape": (256,), # Full proprioception state
"names": None,
},
# Camera relative poses
"observation.cam_rel_poses": {
"dtype": "float32",
"shape": (21,), # 3 cameras * 7 (pos + quat)
"names": None,
},
# Task information
"observation.task_info": {
"dtype": "float32",
"shape": (None,), # Variable size
"names": None,
},
# RGB images
"observation.images.rgb.head": {
"dtype": "video",
"shape": [720, 720, 3],
"names": ["height", "width", "channels"],
},
"observation.images.rgb.left_wrist": {
"dtype": "video",
"shape": [480, 480, 3],
"names": ["height", "width", "channels"],
},
"observation.images.rgb.right_wrist": {
"dtype": "video",
"shape": [480, 480, 3],
"names": ["height", "width", "channels"],
},
# Depth images
"observation.images.depth.head": {
"dtype": "video",
"shape": [720, 720, 1],
"names": ["height", "width", "channels"],
},
"observation.images.depth.left_wrist": {
"dtype": "video",
"shape": [480, 480, 1],
"names": ["height", "width", "channels"],
},
"observation.images.depth.right_wrist": {
"dtype": "video",
"shape": [480, 480, 1],
"names": ["height", "width", "channels"],
},
# Segmentation instance ID images
"observation.images.seg_instance_id.head": {
"dtype": "video",
"shape": [720, 720, 1],
"names": ["height", "width", "channels"],
},
"observation.images.seg_instance_id.left_wrist": {
"dtype": "video",
"shape": [480, 480, 1],
"names": ["height", "width", "channels"],
},
"observation.images.seg_instance_id.right_wrist": {
"dtype": "video",
"shape": [480, 480, 1],
"names": ["height", "width", "channels"],
},
}
# Action indices
ACTION_QPOS_INDICES = {
"A1": OrderedDict(
{
"arm": np.s_[0:6],
"gripper": np.s_[6:7],
}
),
"R1Pro": OrderedDict(
{
"base": np.s_[0:3],
"torso": np.s_[3:7],
"left_arm": np.s_[7:14],
"left_gripper": np.s_[14:15],
"right_arm": np.s_[15:22],
"right_gripper": np.s_[22:23],
}
),
}
# Proprioception configuration
PROPRIOCEPTION_INDICES = {
"A1": OrderedDict(
{
"joint_qpos": np.s_[0:8],
"joint_qpos_sin": np.s_[8:16],
"joint_qpos_cos": np.s_[16:24],
"joint_qvel": np.s_[24:32],
"joint_qeffort": np.s_[32:40],
"eef_0_pos": np.s_[40:43],
"eef_0_quat": np.s_[43:47],
"grasp_0": np.s_[47:48],
"gripper_0_qpos": np.s_[48:50],
"gripper_0_qvel": np.s_[50:52],
}
),
"R1Pro": OrderedDict(
{
"joint_qpos": np.s_[
0:28
], # Full robot joint positions, the first 6 are base joints, which is NOT allowed in standard track
"joint_qpos_sin": np.s_[
28:56
], # Full robot joint positions, the first 6 are base joints, which is NOT allowed in standard track
"joint_qpos_cos": np.s_[
56:84
], # Full robot joint positions, the first 6 are base joints, which is NOT allowed in standard track
"joint_qvel": np.s_[84:112],
"joint_qeffort": np.s_[112:140],
"robot_pos": np.s_[140:143], # Global pos, this is NOT allowed in standard track
"robot_ori_cos": np.s_[143:146], # Global ori, this is NOT allowed in standard track
"robot_ori_sin": np.s_[146:149], # Global ori, this is NOT allowed in standard track
"robot_2d_ori": np.s_[149:150], # 2D global ori, this is NOT allowed in standard track
"robot_2d_ori_cos": np.s_[150:151], # 2D global ori, this is NOT allowed in standard track
"robot_2d_ori_sin": np.s_[151:152], # 2D global ori, this is NOT allowed in standard track
"robot_lin_vel": np.s_[152:155],
"robot_ang_vel": np.s_[155:158],
"arm_left_qpos": np.s_[158:165],
"arm_left_qpos_sin": np.s_[165:172],
"arm_left_qpos_cos": np.s_[172:179],
"arm_left_qvel": np.s_[179:186],
"eef_left_pos": np.s_[186:189],
"eef_left_quat": np.s_[189:193],
"gripper_left_qpos": np.s_[193:195],
"gripper_left_qvel": np.s_[195:197],
"arm_right_qpos": np.s_[197:204],
"arm_right_qpos_sin": np.s_[204:211],
"arm_right_qpos_cos": np.s_[211:218],
"arm_right_qvel": np.s_[218:225],
"eef_right_pos": np.s_[225:228],
"eef_right_quat": np.s_[228:232],
"gripper_right_qpos": np.s_[232:234],
"gripper_right_qvel": np.s_[234:236],
"trunk_qpos": np.s_[236:240],
"trunk_qvel": np.s_[240:244],
"base_qpos": np.s_[244:247], # Base joint position, this is NOT allowed in standard track
"base_qpos_sin": np.s_[247:250], # Base joint position, this is NOT allowed in standard track
"base_qpos_cos": np.s_[250:253], # Base joint position, this is NOT allowed in standard track
"base_qvel": np.s_[253:256],
}
),
}
# Proprioception indices
PROPRIO_QPOS_INDICES = {
"A1": OrderedDict(
{
"arm": np.s_[0:6],
"gripper": np.s_[6:8],
}
),
"R1Pro": OrderedDict(
{
"torso": np.s_[6:10],
"left_arm": np.s_[10:24:2],
"right_arm": np.s_[11:24:2],
"left_gripper": np.s_[24:26],
"right_gripper": np.s_[26:28],
}
),
}
# Joint limits (lower, upper)
JOINT_RANGE = {
"A1": {
"arm": (
th.tensor([-2.8798, 0.0, -3.3161, -2.8798, -1.6581, -2.8798], dtype=th.float32),
th.tensor([2.8798, 3.1415, 0.0, 2.8798, 1.6581, 2.8798], dtype=th.float32),
),
"gripper": (th.tensor([0.00], dtype=th.float32), th.tensor([0.03], dtype=th.float32)),
},
"R1Pro": {
"base": (
th.tensor([-0.75, -0.75, -1.0], dtype=th.float32),
th.tensor([0.75, 0.75, 1.0], dtype=th.float32),
),
"torso": (
th.tensor([-1.1345, -2.7925, -1.8326, -3.0543], dtype=th.float32),
th.tensor([1.8326, 2.5307, 1.5708, 3.0543], dtype=th.float32),
),
"left_arm": (
th.tensor([-4.4506, -0.1745, -2.3562, -2.0944, -2.3562, -1.0472, -1.5708], dtype=th.float32),
th.tensor([1.3090, 3.1416, 2.3562, 0.3491, 2.3562, 1.0472, 1.5708], dtype=th.float32),
),
"left_gripper": (th.tensor([0.00], dtype=th.float32), th.tensor([0.05], dtype=th.float32)),
"right_arm": (
th.tensor([-4.4506, -3.1416, -2.3562, -2.0944, -2.3562, -1.0472, -1.5708], dtype=th.float32),
th.tensor([1.3090, 0.1745, 2.3562, 0.3491, 2.3562, 1.0472, 1.5708], dtype=th.float32),
),
"right_gripper": (th.tensor([0.00], dtype=th.float32), th.tensor([0.05], dtype=th.float32)),
},
}
EEF_POSITION_RANGE = {
"A1": {
"0": (th.tensor([0.0, -0.7, 0.0], dtype=th.float32), th.tensor([0.7, 0.7, 0.7], dtype=th.float32)),
},
"R1Pro": {
"left": (
th.tensor([0.0, -0.65, 0.0], dtype=th.float32),
th.tensor([0.65, 0.65, 2.5], dtype=th.float32),
),
"right": (
th.tensor([0.0, -0.65, 0.0], dtype=th.float32),
th.tensor([0.65, 0.65, 2.5], dtype=th.float32),
),
},
}
TASK_NAMES_TO_INDICES = {
# B10
"turning_on_radio": 0,
"picking_up_trash": 1,
"putting_away_Halloween_decorations": 2,
"cleaning_up_plates_and_food": 3,
"can_meat": 4,
"setting_mousetraps": 5,
"hiding_Easter_eggs": 6,
"picking_up_toys": 7,
"rearranging_kitchen_furniture": 8,
"putting_up_Christmas_decorations_inside": 9,
# B20
"set_up_a_coffee_station_in_your_kitchen": 10,
"putting_dishes_away_after_cleaning": 11,
"preparing_lunch_box": 12,
"loading_the_car": 13,
"carrying_in_groceries": 14,
"bringing_in_wood": 15,
"moving_boxes_to_storage": 16,
"bringing_water": 17,
"tidying_bedroom": 18,
"outfit_a_basic_toolbox": 19,
# B30
"sorting_vegetables": 20,
"collecting_childrens_toys": 21,
"putting_shoes_on_rack": 22,
"boxing_books_up_for_storage": 23,
"storing_food": 24,
"clearing_food_from_table_into_fridge": 25,
"assembling_gift_baskets": 26,
"sorting_household_items": 27,
"getting_organized_for_work": 28,
"clean_up_your_desk": 29,
# B40
"setting_the_fire": 30,
"clean_boxing_gloves": 31,
"wash_a_baseball_cap": 32,
"wash_dog_toys": 33,
"hanging_pictures": 34,
"attach_a_camera_to_a_tripod": 35,
"clean_a_patio": 36,
"clean_a_trumpet": 37,
"spraying_for_bugs": 38,
"spraying_fruit_trees": 39,
# B50
"make_microwave_popcorn": 40,
"cook_cabbage": 41,
"chop_an_onion": 42,
"slicing_vegetables": 43,
"chopping_wood": 44,
"cook_hot_dogs": 45,
"cook_bacon": 46,
"freeze_pies": 47,
"canning_food": 48,
"make_pizza": 49,
}
TASK_INDICES_TO_NAMES = {v: k for k, v in TASK_NAMES_TO_INDICES.items()}

View File

@@ -0,0 +1,605 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Convert Behavior Dataset to LeRobotDataset v3.0 format"""
import argparse
import json
import logging
import shutil
from pathlib import Path
import jsonlines
import pandas as pd
import pyarrow as pa
import tqdm
from datasets import Dataset, Features, Image
from lerobot.datasets.compute_stats import aggregate_stats
from lerobot.datasets.utils import (
DEFAULT_CHUNK_SIZE,
DEFAULT_DATA_FILE_SIZE_IN_MB,
DEFAULT_DATA_PATH,
DEFAULT_VIDEO_FILE_SIZE_IN_MB,
DEFAULT_VIDEO_PATH,
LEGACY_EPISODES_PATH,
LEGACY_EPISODES_STATS_PATH,
LEGACY_TASKS_PATH,
cast_stats_to_numpy,
flatten_dict,
get_file_size_in_mb,
get_parquet_file_size_in_mb,
get_parquet_num_frames,
load_info,
update_chunk_file_indices,
write_episodes,
write_info,
write_stats,
write_tasks,
)
from lerobot.datasets.video_utils import concatenate_video_files, get_video_duration_in_s
from lerobot.utils.utils import init_logging
# script to convert one single task to v3.1
# TASK = 1
NEW_ROOT = Path("/fsx/jade_choghari/tmp/bb")
def get_total_episodes_task(local_dir: Path, task_id: int, task_ranges: dict, step) -> int:
"""
Calculates the total number of episodes for a single, specified task.
"""
# Simply load the episodes for the task and count them.
episodes = legacy_load_episodes_task(
local_dir=local_dir, task_id=task_id, task_ranges=task_ranges, step=step
)
return len(episodes)
NUM_CAMERAS = 9
def get_total_frames_task(local_dir, meta_path, task_id: int, task_ranges: dict, step: int) -> int:
episodes_metadata = legacy_load_episodes_task(
local_dir=local_dir, task_id=task_id, task_ranges=task_ranges, step=step
)
total_frames = 0
# like 'duration'
for ep in episodes_metadata.values():
duration_s = ep["length"]
total_frames += int(duration_s)
return total_frames
def convert_info(
root, new_root, data_file_size_in_mb, video_file_size_in_mb, meta_path, task_id: int, task_ranges, step
):
info = load_info(root)
info["codebase_version"] = "v3.0"
del info["total_videos"]
info["data_files_size_in_mb"] = data_file_size_in_mb
info["video_files_size_in_mb"] = video_file_size_in_mb
info["data_path"] = DEFAULT_DATA_PATH
info["video_path"] = DEFAULT_VIDEO_PATH if info["video_path"] is not None else None
info["fps"] = int(info["fps"])
for key in info["features"]:
if info["features"][key]["dtype"] == "video":
# already has fps in video_info
continue
info["features"][key]["fps"] = info["fps"]
info["total_episodes"] = get_total_episodes_task(root, task_id, task_ranges, step)
info["total_videos"] = info["total_episodes"] * NUM_CAMERAS
info["total_frames"] = get_total_frames_task(root, meta_path, task_id, task_ranges, step)
info["total_tasks"] = 1
write_info(info, new_root)
def load_jsonlines(fpath: Path) -> list[any]:
with jsonlines.open(fpath, "r") as reader:
return list(reader)
def legacy_load_tasks(local_dir: Path) -> tuple[dict, dict]:
tasks = load_jsonlines(local_dir / LEGACY_TASKS_PATH)
# return tasks dict such that
tasks = {item["task_index"]: item["task"] for item in sorted(tasks, key=lambda x: x["task_index"])}
task_to_task_index = {task: task_index for task_index, task in tasks.items()}
return tasks, task_to_task_index
def convert_tasks(root, new_root, task_id: int):
tasks, _ = legacy_load_tasks(root)
if task_id not in tasks:
raise ValueError(f"Task ID {task_id} not found in tasks (available: {list(tasks.keys())})")
tasks = {task_id: tasks[task_id]}
task_indices = tasks.keys()
task_strings = tasks.values()
df_tasks = pd.DataFrame({"task_index": task_indices}, index=task_strings)
write_tasks(df_tasks, new_root)
def concat_data_files(paths_to_cat, new_root, chunk_idx, file_idx, image_keys):
# TODO(rcadene): to save RAM use Dataset.from_parquet(file) and concatenate_datasets
dataframes = [pd.read_parquet(file) for file in paths_to_cat]
# Concatenate all DataFrames along rows
concatenated_df = pd.concat(dataframes, ignore_index=True)
path = new_root / DEFAULT_DATA_PATH.format(chunk_index=chunk_idx, file_index=file_idx)
path.parent.mkdir(parents=True, exist_ok=True)
if len(image_keys) > 0:
schema = pa.Schema.from_pandas(concatenated_df)
features = Features.from_arrow_schema(schema)
for key in image_keys:
features[key] = Image()
schema = features.arrow_schema
else:
schema = None
concatenated_df.to_parquet(path, index=False, schema=schema)
def get_image_keys(root):
info = load_info(root)
features = info["features"]
image_keys = [key for key, ft in features.items() if ft["dtype"] == "image"]
return image_keys
def convert_data(root: Path, new_root: Path, data_file_size_in_mb: int, task_index: int):
task_dir_name = f"task-00{task_index}"
data_dir = root / "data" / task_dir_name
ep_paths = sorted(data_dir.glob("*.parquet"))
image_keys = get_image_keys(root)
ep_idx = 0
chunk_idx = 0
file_idx = 0
size_in_mb = 0
num_frames = 0
paths_to_cat = []
episodes_metadata = []
logging.info(f"Converting data files from {len(ep_paths)} episodes")
for ep_path in tqdm.tqdm(ep_paths, desc="convert data files"):
ep_size_in_mb = get_parquet_file_size_in_mb(ep_path)
ep_num_frames = get_parquet_num_frames(ep_path)
ep_metadata = {
"episode_index": ep_idx,
"data/chunk_index": chunk_idx,
"data/file_index": file_idx,
"dataset_from_index": num_frames,
"dataset_to_index": num_frames + ep_num_frames,
}
size_in_mb += ep_size_in_mb
num_frames += ep_num_frames
episodes_metadata.append(ep_metadata)
ep_idx += 1
if size_in_mb < data_file_size_in_mb:
paths_to_cat.append(ep_path)
continue
if paths_to_cat:
concat_data_files(paths_to_cat, new_root, chunk_idx, file_idx, image_keys)
# Reset for the next file
size_in_mb = ep_size_in_mb
paths_to_cat = [ep_path]
chunk_idx, file_idx = update_chunk_file_indices(chunk_idx, file_idx, DEFAULT_CHUNK_SIZE)
# Write remaining data if any
if paths_to_cat:
concat_data_files(paths_to_cat, new_root, chunk_idx, file_idx, image_keys)
return episodes_metadata
def convert_videos_of_camera(
root: Path, new_root: Path, video_key: str, video_file_size_in_mb: int, task_index: int
):
# Access old paths to mp4
# videos_dir = root / "videos"
# ep_paths = sorted(videos_dir.glob(f"*/{video_key}/*.mp4"))
task_dir_name = f"task-00{task_index}"
videos_dir = root / "videos" / task_dir_name / video_key
ep_paths = sorted(videos_dir.glob("*.mp4"))
print("ep_paths", ep_paths)
ep_idx = 0
chunk_idx = 0
file_idx = 0
size_in_mb = 0
duration_in_s = 0.0
paths_to_cat = []
episodes_metadata = []
for ep_path in tqdm.tqdm(ep_paths, desc=f"convert videos of {video_key}"):
ep_size_in_mb = get_file_size_in_mb(ep_path)
ep_duration_in_s = get_video_duration_in_s(ep_path)
# Check if adding this episode would exceed the limit
if size_in_mb + ep_size_in_mb >= video_file_size_in_mb and len(paths_to_cat) > 0:
# Size limit would be exceeded, save current accumulation WITHOUT this episode
concatenate_video_files(
paths_to_cat,
new_root
/ DEFAULT_VIDEO_PATH.format(video_key=video_key, chunk_index=chunk_idx, file_index=file_idx),
)
# Update episodes metadata for the file we just saved
for i, _ in enumerate(paths_to_cat):
past_ep_idx = ep_idx - len(paths_to_cat) + i
episodes_metadata[past_ep_idx][f"videos/{video_key}/chunk_index"] = chunk_idx
episodes_metadata[past_ep_idx][f"videos/{video_key}/file_index"] = file_idx
# Move to next file and start fresh with current episode
chunk_idx, file_idx = update_chunk_file_indices(chunk_idx, file_idx, DEFAULT_CHUNK_SIZE)
size_in_mb = 0
duration_in_s = 0.0
paths_to_cat = []
# Add current episode metadata
ep_metadata = {
"episode_index": ep_idx,
f"videos/{video_key}/chunk_index": chunk_idx, # Will be updated when file is saved
f"videos/{video_key}/file_index": file_idx, # Will be updated when file is saved
f"videos/{video_key}/from_timestamp": duration_in_s,
f"videos/{video_key}/to_timestamp": duration_in_s + ep_duration_in_s,
}
episodes_metadata.append(ep_metadata)
# Add current episode to accumulation
paths_to_cat.append(ep_path)
size_in_mb += ep_size_in_mb
duration_in_s += ep_duration_in_s
ep_idx += 1
# Write remaining videos if any
if paths_to_cat:
concatenate_video_files(
paths_to_cat,
new_root
/ DEFAULT_VIDEO_PATH.format(video_key=video_key, chunk_index=chunk_idx, file_index=file_idx),
)
# Update episodes metadata for the final file
for i, _ in enumerate(paths_to_cat):
past_ep_idx = ep_idx - len(paths_to_cat) + i
episodes_metadata[past_ep_idx][f"videos/{video_key}/chunk_index"] = chunk_idx
episodes_metadata[past_ep_idx][f"videos/{video_key}/file_index"] = file_idx
return episodes_metadata
def get_video_keys(root):
info = load_info(root)
features = info["features"]
video_keys = [key for key, ft in features.items() if ft["dtype"] == "video"]
return video_keys
def convert_videos(root: Path, new_root: Path, video_file_size_in_mb: int, task_id: int):
logging.info(f"Converting videos from {root} to {new_root}")
video_keys = get_video_keys(root)
if len(video_keys) == 0:
return None
video_keys = sorted(video_keys)
eps_metadata_per_cam = []
for camera in video_keys:
eps_metadata = convert_videos_of_camera(root, new_root, camera, video_file_size_in_mb, task_id)
eps_metadata_per_cam.append(eps_metadata)
num_eps_per_cam = [len(eps_cam_map) for eps_cam_map in eps_metadata_per_cam]
if len(set(num_eps_per_cam)) != 1:
raise ValueError(f"All cams dont have same number of episodes ({num_eps_per_cam}).")
episods_metadata = []
num_cameras = len(video_keys)
num_episodes = num_eps_per_cam[0]
for ep_idx in tqdm.tqdm(range(num_episodes), desc="convert videos"):
# Sanity check
ep_ids = [eps_metadata_per_cam[cam_idx][ep_idx]["episode_index"] for cam_idx in range(num_cameras)]
ep_ids += [ep_idx]
if len(set(ep_ids)) != 1:
raise ValueError(f"All episode indices need to match ({ep_ids}).")
ep_dict = {}
for cam_idx in range(num_cameras):
ep_dict.update(eps_metadata_per_cam[cam_idx][ep_idx])
episods_metadata.append(ep_dict)
return episods_metadata
def infer_task_episode_ranges(episodes_jsonl_path: Path) -> dict:
"""
Parse the Behavior-1K episodes.jsonl metadata and infer contiguous episode ranges per unique task.
Returns a dict:
{ task_id: { "task_string": ..., "ep_start": ..., "ep_end": ... } }
"""
task_ranges = {}
task_id = 0
current_task_str = None
ep_start = None
ep_end = None
with open(episodes_jsonl_path) as f:
for line in f:
if not line.strip():
continue
ep = json.loads(line)
ep_idx = ep["episode_index"]
task_str = ep["tasks"][0] if ep["tasks"] else "UNKNOWN"
if current_task_str is None:
current_task_str = task_str
ep_start = ep_idx
ep_end = ep_idx
elif task_str == current_task_str:
ep_end = ep_idx
else:
# close previous task group
task_ranges[task_id] = {
"task_string": current_task_str,
"ep_start": ep_start,
"ep_end": ep_end,
}
task_id += 1
# start new one
current_task_str = task_str
ep_start = ep_idx
ep_end = ep_idx
# store last task
if current_task_str is not None:
task_ranges[task_id] = {
"task_string": current_task_str,
"ep_start": ep_start,
"ep_end": ep_end,
}
return task_ranges
def legacy_load_episodes_task(local_dir: Path, task_id: int, task_ranges: dict, step: int = 10) -> dict:
"""
Load only the episodes belonging to a specific task, inferred automatically from episode ranges.
Args:
local_dir (Path): Root path containing legacy meta/episodes.jsonl
task_id (int): Which task to load (key from the inferred task_ranges dict)
task_ranges (dict): Mapping from infer_task_episode_ranges()
step (int): Episode index step (Behavior-1K = 10)
"""
all_episodes = legacy_load_episodes(local_dir)
# get the range for this task
if task_id not in task_ranges:
raise ValueError(f"Task id {task_id} not found in task_ranges")
ep_start = task_ranges[task_id]["ep_start"]
ep_end = task_ranges[task_id]["ep_end"]
task_episode_indices = range(ep_start, ep_end + step, step)
return {i: all_episodes[i] for i in task_episode_indices if i in all_episodes}
def legacy_load_episodes(local_dir: Path) -> dict:
episodes = load_jsonlines(local_dir / LEGACY_EPISODES_PATH)
return {item["episode_index"]: item for item in sorted(episodes, key=lambda x: x["episode_index"])}
def legacy_load_episodes_stats(local_dir: Path) -> dict:
episodes_stats = load_jsonlines(local_dir / LEGACY_EPISODES_STATS_PATH)
return {
item["episode_index"]: cast_stats_to_numpy(item["stats"])
for item in sorted(episodes_stats, key=lambda x: x["episode_index"])
}
def legacy_load_episodes_stats_task(local_dir: Path, task_id: int, task_ranges: dict, step: int = 10) -> dict:
all_stats = legacy_load_episodes_stats(local_dir)
if task_id not in task_ranges:
raise ValueError(f"Task id {task_id} not found in task_ranges")
ep_start = task_ranges[task_id]["ep_start"]
ep_end = task_ranges[task_id]["ep_end"]
task_episode_indices = range(ep_start, ep_end + step, step)
return {i: all_stats[i] for i in task_episode_indices if i in all_stats}
def generate_episode_metadata_dict(
episodes_legacy_metadata, episodes_metadata, episodes_stats, episodes_videos=None
):
num_episodes = len(episodes_metadata)
episodes_legacy_metadata_vals = list(episodes_legacy_metadata.values())
episodes_stats_vals = list(episodes_stats.values())
episodes_stats_keys = list(episodes_stats.keys())
for i in range(num_episodes):
ep_legacy_metadata = episodes_legacy_metadata_vals[i]
ep_metadata = episodes_metadata[i]
ep_stats = episodes_stats_vals[i]
ep_ids_set = {
ep_legacy_metadata["episode_index"],
ep_metadata["episode_index"],
episodes_stats_keys[i],
}
if episodes_videos is None:
ep_video = {}
else:
ep_video = episodes_videos[i]
ep_ids_set.add(ep_video["episode_index"])
# we skip this check because ep_ids have a step of 10, whereas we convert with a step of 1
# if len(ep_ids_set) != 1:
# raise ValueError(f"Number of episodes is not the same ({ep_ids_set}).")
ep_dict = {**ep_metadata, **ep_video, **ep_legacy_metadata, **flatten_dict({"stats": ep_stats})}
ep_dict["meta/episodes/chunk_index"] = 0
ep_dict["meta/episodes/file_index"] = 0
yield ep_dict
def convert_episodes_metadata(
root, new_root, episodes_metadata, task_id: int, task_ranges, episodes_video_metadata=None
):
logging.info(f"Converting episodes metadata from {root} to {new_root}")
# filter by task
episodes_legacy_metadata = legacy_load_episodes_task(root, task_id=task_id, task_ranges=task_ranges)
episodes_stats = legacy_load_episodes_stats_task(root, task_id=task_id, task_ranges=task_ranges)
num_eps_set = {len(episodes_legacy_metadata), len(episodes_metadata)}
if episodes_video_metadata is not None:
num_eps_set.add(len(episodes_video_metadata))
if len(num_eps_set) != 1:
raise ValueError(f"Number of episodes is not the same ({num_eps_set}).")
ds_episodes = Dataset.from_generator(
lambda: generate_episode_metadata_dict(
episodes_legacy_metadata, episodes_metadata, episodes_stats, episodes_video_metadata
)
)
write_episodes(ds_episodes, new_root)
stats = aggregate_stats(list(episodes_stats.values()))
write_stats(stats, new_root)
def convert_dataset_local(
data_path: Path,
new_repo: Path,
task_id: int,
data_file_size_in_mb: int = DEFAULT_DATA_FILE_SIZE_IN_MB,
video_file_size_in_mb: int = DEFAULT_VIDEO_FILE_SIZE_IN_MB,
force_conversion: bool = False,
):
"""
Convert a local dataset to v3.x format, task-by-task, without using the Hugging Face Hub.
Args:
data_path (Path): path to local dataset root (e.g. /fsx/.../2025-challenge-demos)
new_repo (Path): path where converted dataset will be written (e.g. /fsx/.../behavior1k_v3)
task_id (int): which task to convert (index)
data_file_size_in_mb (int): max size per data chunk
video_file_size_in_mb (int): max size per video chunk
force_conversion (bool): overwrite existing conversion if True
"""
root = Path(data_path)
new_root = Path(new_repo)
# Clean up if needed
if new_root.exists() and force_conversion:
shutil.rmtree(new_root)
new_root.mkdir(parents=True, exist_ok=True)
print(f"🔹 Starting conversion for task {task_id}")
print(f"Input root: {root}")
print(f"Output root: {new_root}")
# Infer task episode ranges
episodes_meta_path = root / "meta" / "episodes.jsonl"
task_ranges = infer_task_episode_ranges(episodes_meta_path)
convert_info(
root,
new_root,
data_file_size_in_mb,
video_file_size_in_mb,
episodes_meta_path,
task_id,
task_ranges,
step=10,
)
convert_tasks(root, new_root, task_id)
episodes_metadata = convert_data(root, new_root, data_file_size_in_mb, task_index=task_id)
episodes_videos_metadata = convert_videos(root, new_root, video_file_size_in_mb, task_id=task_id)
convert_episodes_metadata(
root,
new_root,
episodes_metadata,
task_id=task_id,
task_ranges=task_ranges,
episodes_video_metadata=episodes_videos_metadata,
)
print(f"✅ Conversion complete for task {task_id}")
print(f"Converted dataset written to: {new_root}")
if __name__ == "__main__":
import argparse
from pathlib import Path
init_logging()
parser = argparse.ArgumentParser(
description="Convert Behavior-1K tasks to LeRobot v3 format (local only)"
)
parser.add_argument(
"--data-path",
type=str,
required=True,
help="Path to the local Behavior-1K dataset (e.g. /fsx/francesco_capuano/.cache/behavior-1k/2025-challenge-demos)",
)
parser.add_argument(
"--new-repo",
type=str,
required=True,
help="Path to the output directory for the converted dataset",
)
parser.add_argument(
"--task-id",
type=int,
required=True,
help="Task index to convert (e.g. 0, 1, 2, ...)",
)
parser.add_argument(
"--data-file-size-in-mb",
type=int,
default=DEFAULT_DATA_FILE_SIZE_IN_MB,
help=f"Maximum size per data chunk (default: {DEFAULT_DATA_FILE_SIZE_IN_MB})",
)
parser.add_argument(
"--video-file-size-in-mb",
type=int,
default=DEFAULT_VIDEO_FILE_SIZE_IN_MB,
help=f"Maximum size per video chunk (default: {DEFAULT_VIDEO_FILE_SIZE_IN_MB})",
)
parser.add_argument(
"--force-conversion",
action="store_true",
help="Force overwrite of existing conversion output if present.",
)
args = parser.parse_args()
convert_dataset_local(
data_path=Path(args.data_path),
new_repo=Path(args.new_repo),
task_id=args.task_id,
data_file_size_in_mb=args.data_file_size_in_mb,
video_file_size_in_mb=args.video_file_size_in_mb,
force_conversion=args.force_conversion,
)

View File

@@ -0,0 +1,130 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Test script to verify BEHAVIOR-1K dataset loading with v3.0 wrapper.
"""
import argparse
import logging
from behavior_lerobot_dataset_v3 import BehaviorLeRobotDatasetV3
from lerobot.utils.utils import init_logging
init_logging()
def load_behavior1k_dataset(repo_id, root):
"""Test basic dataset loading."""
logging.info("=" * 80)
logging.info("Testing BEHAVIOR-1K dataset loading")
logging.info("=" * 80)
logging.info(f"\n1. Loading dataset with repo_id: {repo_id}")
dataset = BehaviorLeRobotDatasetV3(
repo_id=repo_id,
root=root,
modalities=["rgb"],
cameras=["head"],
chunk_streaming_using_keyframe=False,
check_timestamp_sync=False,
)
logging.info("\n2. Dataset loaded successfully!")
logging.info(f" - Number of episodes: {dataset.num_episodes}")
logging.info(f" - Number of frames: {dataset.num_frames}")
logging.info(f" - FPS: {dataset.fps}")
logging.info(f" - Features: {list(dataset.features)}")
return dataset
def load_behavior1k_dataset_with_multiple_modalities(repo_id, root):
"""Test loading multiple modalities and cameras."""
logging.info("\n" + "=" * 80)
logging.info("Testing multi-modality loading with repo_id: {repo_id}")
logging.info("=" * 80)
logging.info(f"\n1. Loading dataset with RGB + Depth with repo_id: {repo_id}")
dataset = BehaviorLeRobotDatasetV3(
repo_id=repo_id,
root=root,
modalities=["rgb", "depth"],
cameras=["head", "left_wrist", "right_wrist"],
chunk_streaming_using_keyframe=False,
check_timestamp_sync=False,
video_backend="pyav",
)
logging.info(f"\n2. Dataset loaded with modalities: {list(dataset.features)}")
logging.info(f" - Total features: {len(dataset.features)}")
rgb_keys = [k for k in dataset.features if "rgb" in k]
depth_keys = [k for k in dataset.features if "depth" in k]
logging.info(f" - RGB features: {rgb_keys}")
logging.info(f" - Depth features: {depth_keys}")
logging.info("\n3. SUCCESS! Multi-modality loading works.")
return dataset
def stream_behavior1k_dataset(repo_id, root):
"""Test chunk streaming mode."""
logging.info("\n" + "=" * 80)
logging.info("Testing chunk streaming mode")
logging.info("=" * 80)
logging.info("\n1. Loading dataset with chunk streaming...")
dataset = BehaviorLeRobotDatasetV3(
repo_id=repo_id,
root=root,
modalities=["rgb"],
cameras=["head"],
chunk_streaming_using_keyframe=True,
shuffle=True,
seed=42,
check_timestamp_sync=False,
)
logging.info("\n2. Dataset loaded in streaming mode")
logging.info(f" - Number of chunks: {len(dataset.chunks)}")
logging.info(f" - First chunk range: {dataset.chunks[0]}")
logging.info("\n3. Testing frame access in streaming mode...")
for i in range(min(3, len(dataset))):
frame = dataset[i]
logging.info(
f" - Frame {i}: episode_index={frame['episode_index'].item()}, "
f"task_index={frame['task_index'].item()}"
)
logging.info("\n4. SUCCESS! Chunk streaming works.")
return dataset
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--repo-id", type=str, default=None)
parser.add_argument("--root", type=str, default=None)
args = parser.parse_args()
load_behavior1k_dataset(args.repo_id, args.root)
load_behavior1k_dataset_with_multiple_modalities(args.repo_id, args.root)
stream_behavior1k_dataset(args.repo_id, args.root)

View File

@@ -1,454 +0,0 @@
#!/usr/bin/env python3
"""
WBT (Whole Body Tracking) Dance Policy for Unitree G1
Uses ONNX model with motion data baked in.
Pattern matches gr00t_locomotion.py - uses UnitreeG1 robot class.
Usage:
python examples/unitree_g1/dance.py
"""
import argparse
import json
import logging
import threading
import time
from xml.etree import ElementTree
import numpy as np
import onnx
import onnxruntime as ort
import pinocchio as pin
from lerobot.robots.unitree_g1.config_unitree_g1 import UnitreeG1Config
from lerobot.robots.unitree_g1.unitree_g1 import UnitreeG1
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# =============================================================================
# CONFIGURATION
# =============================================================================
DANCE_ONNX_PATH = "examples/unitree_g1/fastsac_g1_29dof_dancing.onnx"
CONTROL_DT = 0.02 # 50 Hz
NUM_DOFS = 29
# Default joint positions (holosoma training defaults)
DEFAULT_DOF_POS = np.array([
-0.312, 0.0, 0.0, 0.669, -0.363, 0.0, # Left leg (6)
-0.312, 0.0, 0.0, 0.669, -0.363, 0.0, # Right leg (6)
0.0, 0.0, 0.0, # Waist (3)
0.2, 0.2, 0.0, 0.6, 0.0, 0.0, 0.0, # Left arm (7)
0.2, -0.2, 0.0, 0.6, 0.0, 0.0, 0.0, # Right arm (7)
], dtype=np.float32)
# Stiff hold KP/KD (for initialization)
STIFF_KP = np.array([
150, 150, 200, 200, 40, 40,
150, 150, 200, 200, 40, 40,
200, 200, 100,
100, 100, 100, 100, 50, 50, 50,
100, 100, 100, 100, 50, 50, 50,
], dtype=np.float32)
STIFF_KD = np.array([
2.5, 2.5, 2.5, 2.5, 2.5, 2.5,
2.5, 2.5, 2.5, 2.5, 2.5, 2.5,
5.0, 5.0, 5.0,
2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5,
2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5,
], dtype=np.float32)
# Joints to freeze at 0 with high KP
FROZEN_JOINTS = [13, 14, 20, 21, 27, 28]
FROZEN_KP = 500.0
FROZEN_KD = 5.0
# =============================================================================
# QUATERNION UTILITIES
# =============================================================================
def quat_inverse(q):
return np.concatenate((q[:, 0:1], -q[:, 1:]), axis=1)
def quat_mul(a, b):
a, b = a.reshape(-1, 4), b.reshape(-1, 4)
w1, x1, y1, z1 = a[..., 0], a[..., 1], a[..., 2], a[..., 3]
w2, x2, y2, z2 = b[..., 0], b[..., 1], b[..., 2], b[..., 3]
ww = (z1 + x1) * (x2 + y2)
yy = (w1 - y1) * (w2 + z2)
zz = (w1 + y1) * (w2 - z2)
xx = ww + yy + zz
qq = 0.5 * (xx + (z1 - x1) * (x2 - y2))
w = qq - ww + (z1 - y1) * (y2 - z2)
x = qq - xx + (x1 + w1) * (x2 + w2)
y = qq - yy + (w1 - x1) * (y2 + z2)
z = qq - zz + (z1 + y1) * (w2 - x2)
return np.stack([w, x, y, z]).T.reshape(a.shape)
def subtract_frame_transforms(q01, q02):
return quat_mul(quat_inverse(q01), q02)
def matrix_from_quat(q):
r, i, j, k = q[..., 0], q[..., 1], q[..., 2], q[..., 3]
two_s = 2.0 / (q * q).sum(-1)
o = np.stack((
1 - two_s * (j*j + k*k), two_s * (i*j - k*r), two_s * (i*k + j*r),
two_s * (i*j + k*r), 1 - two_s * (i*i + k*k), two_s * (j*k - i*r),
two_s * (i*k - j*r), two_s * (j*k + i*r), 1 - two_s * (i*i + j*j),
), -1)
return o.reshape(q.shape[:-1] + (3, 3))
def xyzw_to_wxyz(xyzw):
return np.concatenate([xyzw[:, -1:], xyzw[:, :3]], axis=1)
def quat_to_rpy(q):
w, x, y, z = q
roll = np.arctan2(2*(w*x + y*z), 1 - 2*(x**2 + y**2))
pitch = np.arcsin(np.clip(2*(w*y - z*x), -1, 1))
yaw = np.arctan2(2*(w*z + x*y), 1 - 2*(y**2 + z**2))
return roll, pitch, yaw
def rpy_to_quat(rpy):
roll, pitch, yaw = rpy
cy, sy = np.cos(yaw*0.5), np.sin(yaw*0.5)
cp, sp = np.cos(pitch*0.5), np.sin(pitch*0.5)
cr, sr = np.cos(roll*0.5), np.sin(roll*0.5)
return np.array([cr*cp*cy + sr*sp*sy, sr*cp*cy - cr*sp*sy,
cr*sp*cy + sr*cp*sy, cr*cp*sy - sr*sp*cy])
# =============================================================================
# PINOCCHIO FK
# =============================================================================
DOF_NAMES = (
"left_hip_pitch_joint", "left_hip_roll_joint", "left_hip_yaw_joint",
"left_knee_joint", "left_ankle_pitch_joint", "left_ankle_roll_joint",
"right_hip_pitch_joint", "right_hip_roll_joint", "right_hip_yaw_joint",
"right_knee_joint", "right_ankle_pitch_joint", "right_ankle_roll_joint",
"waist_yaw_joint", "waist_roll_joint", "waist_pitch_joint",
"left_shoulder_pitch_joint", "left_shoulder_roll_joint", "left_shoulder_yaw_joint", "left_elbow_joint",
"left_wrist_roll_joint", "left_wrist_pitch_joint", "left_wrist_yaw_joint",
"right_shoulder_pitch_joint", "right_shoulder_roll_joint", "right_shoulder_yaw_joint", "right_elbow_joint",
"right_wrist_roll_joint", "right_wrist_pitch_joint", "right_wrist_yaw_joint",
)
class PinocchioFK:
"""Pinocchio forward kinematics for torso_link orientation."""
def __init__(self, urdf_text: str):
root = ElementTree.fromstring(urdf_text)
for parent in root.iter():
for child in list(parent):
if child.tag.split("}")[-1] in {"visual", "collision"}:
parent.remove(child)
xml_text = '<?xml version="1.0"?>\n' + ElementTree.tostring(root, encoding="unicode")
self.model = pin.buildModelFromXML(xml_text, pin.JointModelFreeFlyer())
self.data = self.model.createData()
pin_names = [n for n in self.model.names if n not in ["universe", "root_joint"]]
self.idx_map = np.array([DOF_NAMES.index(n) for n in pin_names])
self.ref_frame_id = self.model.getFrameId("torso_link")
logger.info(f"Pinocchio FK: {len(pin_names)} joints, torso_link frame={self.ref_frame_id}")
def get_torso_quat(self, pos, quat_wxyz, dof_pos):
"""Get torso_link orientation in world frame."""
quat_xyzw = np.array([quat_wxyz[1], quat_wxyz[2], quat_wxyz[3], quat_wxyz[0]])
config = np.concatenate([pos, quat_xyzw, dof_pos[self.idx_map]])
pin.framesForwardKinematics(self.model, self.data, config)
coeffs = pin.Quaternion(self.data.oMf[self.ref_frame_id].rotation).coeffs()
return np.array([coeffs[3], coeffs[0], coeffs[1], coeffs[2]]).reshape(1, 4)
# =============================================================================
# DANCE CONTROLLER
# =============================================================================
class DanceController:
"""
Handles WBT dance policy for the Unitree G1 robot.
This controller manages:
- 29-joint observation processing
- Pinocchio FK for torso orientation
- Policy inference with motion data from ONNX
"""
def __init__(self, policy, robot, pinocchio_fk, motor_kp, motor_kd, action_scale):
self.policy = policy
self.robot = robot
self.pinocchio_fk = pinocchio_fk
self.motor_kp = motor_kp
self.motor_kd = motor_kd
self.action_scale = action_scale
self.obs_dim = policy.get_inputs()[0].shape[1]
self.last_action = np.zeros((1, NUM_DOFS), dtype=np.float32)
self.motion_command = None
self.ref_quat_xyzw = None
self.timestep = 0
self.yaw_offset = 0.0
# Get initial motion data from ONNX
dummy = np.zeros((1, self.obs_dim), dtype=np.float32)
outs = self.policy.run(["joint_pos", "joint_vel", "ref_quat_xyzw"],
{"obs": dummy, "time_step": np.array([[0]], dtype=np.float32)})
self.motion_command = np.concatenate(outs[0:2], axis=1)
self.ref_quat_xyzw = outs[2]
self.motion_start_pose = outs[0].flatten()
# Thread management
self.dance_running = False
self.dance_thread = None
logger.info(f"DanceController: obs_dim={self.obs_dim}, action_scale={action_scale}")
def capture_yaw_offset(self):
"""Capture robot's current yaw for relative tracking."""
robot_state = self.robot.lowstate_buffer.get_data()
if robot_state and self.pinocchio_fk:
quat = np.array(robot_state.imu_state.quaternion, dtype=np.float32)
dof = np.array([robot_state.motor_state[i].q for i in range(NUM_DOFS)], dtype=np.float32)
torso_q = self.pinocchio_fk.get_torso_quat(np.zeros(3), quat, dof)
_, _, self.yaw_offset = quat_to_rpy(torso_q.flatten())
logger.info(f"Captured yaw offset: {np.degrees(self.yaw_offset):.1f}°")
def _remove_yaw_offset(self, quat_wxyz):
"""Remove stored yaw offset from orientation."""
if abs(self.yaw_offset) < 1e-6:
return quat_wxyz
yaw_q = rpy_to_quat((0, 0, -self.yaw_offset)).reshape(1, 4)
return quat_mul(yaw_q, quat_wxyz)
def run_step(self):
"""Single dance step - reads state, runs policy, sends commands."""
robot_state = self.robot.lowstate_buffer.get_data()
if robot_state is None:
return
# Read robot state
quat = np.array(robot_state.imu_state.quaternion, dtype=np.float32)
ang_vel = np.array(robot_state.imu_state.gyroscope, dtype=np.float32)
dof_pos = np.array([robot_state.motor_state[i].q for i in range(NUM_DOFS)], dtype=np.float32)
dof_vel = np.array([robot_state.motor_state[i].dq for i in range(NUM_DOFS)], dtype=np.float32)
# Compute motion_ref_ori_b using FK
if self.pinocchio_fk:
torso_q = self.pinocchio_fk.get_torso_quat(np.zeros(3), quat, dof_pos)
torso_q = self._remove_yaw_offset(torso_q)
motion_ori = xyzw_to_wxyz(self.ref_quat_xyzw)
rel_quat = subtract_frame_transforms(torso_q, motion_ori)
ori_b = matrix_from_quat(rel_quat)[..., :2].reshape(1, -1)
else:
ori_b = np.zeros((1, 6), dtype=np.float32)
dof_rel = (dof_pos - DEFAULT_DOF_POS).reshape(1, -1)
# Build observation (alphabetical order)
obs_dict = {
"actions": self.last_action,
"base_ang_vel": ang_vel.reshape(1, 3),
"dof_pos": dof_rel,
"dof_vel": dof_vel.reshape(1, -1),
"motion_command": self.motion_command,
"motion_ref_ori_b": ori_b,
}
obs = np.concatenate([obs_dict[k].astype(np.float32) for k in sorted(obs_dict.keys())], axis=1)
obs = np.clip(obs, -100, 100)
# Run policy
outs = self.policy.run(["actions", "joint_pos", "joint_vel", "ref_quat_xyzw"],
{"obs": obs, "time_step": np.array([[self.timestep]], dtype=np.float32)})
action = np.clip(outs[0], -100, 100)
self.motion_command = np.concatenate(outs[1:3], axis=1)
self.ref_quat_xyzw = outs[3]
self.last_action = action.copy()
# Compute target positions
target_pos = DEFAULT_DOF_POS + action.flatten() * self.action_scale
# Send commands
for i in range(NUM_DOFS):
if i in FROZEN_JOINTS:
self.robot.msg.motor_cmd[i].q = 0.0
self.robot.msg.motor_cmd[i].kp = FROZEN_KP
self.robot.msg.motor_cmd[i].kd = FROZEN_KD
else:
self.robot.msg.motor_cmd[i].q = float(target_pos[i])
self.robot.msg.motor_cmd[i].kp = self.motor_kp[i]
self.robot.msg.motor_cmd[i].kd = self.motor_kd[i]
self.robot.msg.motor_cmd[i].qd = 0
self.robot.msg.motor_cmd[i].tau = 0
self.robot.send_action(self.robot.msg)
self.timestep += 1
def _dance_thread_loop(self):
"""Background thread that runs the dance policy."""
logger.info("Dance thread started")
while self.dance_running:
start_time = time.time()
try:
self.run_step()
except Exception as e:
logger.error(f"Error in dance loop: {e}")
import traceback
traceback.print_exc()
elapsed = time.time() - start_time
sleep_time = max(0, CONTROL_DT - elapsed)
time.sleep(sleep_time)
logger.info("Dance thread stopped")
def start_dance_thread(self):
"""Start the dance control thread."""
if self.dance_running:
logger.warning("Dance thread already running")
return
# Reset state for fresh start
self.timestep = 0
self.last_action.fill(0)
# Re-get initial motion data
dummy = np.zeros((1, self.obs_dim), dtype=np.float32)
outs = self.policy.run(["joint_pos", "joint_vel", "ref_quat_xyzw"],
{"obs": dummy, "time_step": np.array([[0]], dtype=np.float32)})
self.motion_command = np.concatenate(outs[0:2], axis=1)
self.ref_quat_xyzw = outs[2]
self.capture_yaw_offset()
logger.info("Starting dance control thread...")
self.dance_running = True
self.dance_thread = threading.Thread(target=self._dance_thread_loop, daemon=True)
self.dance_thread.start()
def stop_dance_thread(self):
"""Stop the dance control thread."""
if not self.dance_running:
return
logger.info("Stopping dance control thread...")
self.dance_running = False
if self.dance_thread:
self.dance_thread.join(timeout=2.0)
logger.info("Dance control thread stopped")
def reset_to_motion_pose(self, duration: float = 3.0):
"""Move robot to initial motion pose over given duration."""
logger.info(f"Moving to dance start pose ({duration}s)...")
robot_state = self.robot.lowstate_buffer.get_data()
init_pos = np.array([robot_state.motor_state[i].q for i in range(NUM_DOFS)], dtype=np.float32)
target_pos = self.motion_start_pose
num_steps = int(duration / CONTROL_DT)
for step in range(num_steps):
alpha = step / num_steps
interp = init_pos * (1 - alpha) + target_pos * alpha
for i in range(NUM_DOFS):
if i in FROZEN_JOINTS:
self.robot.msg.motor_cmd[i].q = 0.0
self.robot.msg.motor_cmd[i].kp = FROZEN_KP
self.robot.msg.motor_cmd[i].kd = FROZEN_KD
else:
self.robot.msg.motor_cmd[i].q = float(interp[i])
self.robot.msg.motor_cmd[i].kp = STIFF_KP[i]
self.robot.msg.motor_cmd[i].kd = STIFF_KD[i]
self.robot.msg.motor_cmd[i].qd = 0
self.robot.msg.motor_cmd[i].tau = 0
self.robot.msg.crc = self.robot.crc.Crc(self.robot.msg)
self.robot.lowcmd_publisher.Write(self.robot.msg)
time.sleep(CONTROL_DT)
logger.info("At dance start pose!")
# =============================================================================
# MAIN
# =============================================================================
def load_dance_policy(onnx_path: str):
"""Load dance policy and extract metadata."""
logger.info(f"Loading dance policy: {onnx_path}")
policy = ort.InferenceSession(onnx_path)
model = onnx.load(onnx_path)
metadata = {p.key: json.loads(p.value) for p in model.metadata_props}
motor_kp = np.array(metadata.get("kp", STIFF_KP), dtype=np.float32)
motor_kd = np.array(metadata.get("kd", STIFF_KD), dtype=np.float32)
action_scale = float(metadata.get("action_scale", 1.0))
urdf_text = metadata.get("robot_urdf", None)
logger.info(f" Obs dim: {policy.get_inputs()[0].shape[1]}")
logger.info(f" Action scale: {action_scale}")
logger.info(f" KP range: [{motor_kp.min():.1f}, {motor_kp.max():.1f}]")
# Build Pinocchio FK if URDF available
pinocchio_fk = None
if urdf_text:
logger.info(" Building Pinocchio FK from URDF...")
pinocchio_fk = PinocchioFK(urdf_text)
else:
logger.warning(" No URDF in metadata - FK will not work!")
return policy, pinocchio_fk, motor_kp, motor_kd, action_scale
def main():
parser = argparse.ArgumentParser(description="WBT Dance Policy for Unitree G1")
parser.add_argument("--onnx", type=str, default=DANCE_ONNX_PATH, help="Path to dance ONNX model")
parser.add_argument("--sim", action="store_true", help="Run in simulation mode")
args = parser.parse_args()
print("=" * 70)
print("💃 WBT DANCE POLICY")
print("=" * 70)
# Load policy
policy, pinocchio_fk, motor_kp, motor_kd, action_scale = load_dance_policy(args.onnx)
# Initialize robot
logger.info("Initializing robot...")
config = UnitreeG1Config()
robot = UnitreeG1(config)
logger.info("Robot connected!")
# Create controller
controller = DanceController(policy, robot, pinocchio_fk, motor_kp, motor_kd, action_scale)
try:
# Move to start pose
controller.reset_to_motion_pose(duration=3.0)
# Start dancing
controller.start_dance_thread()
logger.info("Dancing! Press Ctrl+C to stop.")
print("-" * 70)
# Log status periodically
while True:
time.sleep(2.0)
logger.info(f"timestep={controller.timestep}")
except KeyboardInterrupt:
print("\n\nStopping...")
finally:
controller.stop_dance_thread()
robot.disconnect()
print("\nDone!")
if __name__ == "__main__":
main()

View File

@@ -1,479 +0,0 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Example: Holosoma Whole-Body Locomotion (23-DOF and 29-DOF)
This example demonstrates loading Holosoma whole-body locomotion policies
and running them on the Unitree G1 robot.
Supports both:
- 23-DOF native policies (82D observations, 23D actions)
- 29-DOF policies (100D observations, 29D actions)
"""
import argparse
import logging
import threading
import time
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from lerobot.robots.unitree_g1.config_unitree_g1 import UnitreeG1Config
from lerobot.robots.unitree_g1.unitree_g1 import UnitreeG1
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# =============================================================================
# 29-DOF Configuration
# =============================================================================
# fmt: off
HOLOSOMA_29DOF_DEFAULT_ANGLES = np.array([
-0.312, 0.0, 0.0, 0.669, -0.363, 0.0, # left leg
-0.312, 0.0, 0.0, 0.669, -0.363, 0.0, # right leg
0.0, 0.0, 0.0, # waist (yaw, roll, pitch)
0.2, 0.2, 0.0, 0.6, 0.0, 0.0, 0.0, # left arm
0.2, -0.2, 0.0, 0.6, 0.0, 0.0, 0.0, # right arm
], dtype=np.float32)
HOLOSOMA_29DOF_KP = np.array([
40.179238471, 99.098427777, 40.179238471, 99.098427777, 28.501246196, 28.501246196, # left leg
40.179238471, 99.098427777, 40.179238471, 99.098427777, 28.501246196, 28.501246196, # right leg
40.179238471, 28.501246196, 28.501246196, # waist
14.250623098, 14.250623098, 14.250623098, 14.250623098, 14.250623098, 16.778327481, 16.778327481, # left arm
14.250623098, 14.250623098, 14.250623098, 14.250623098, 14.250623098, 16.778327481, 16.778327481, # right arm
], dtype=np.float32)
HOLOSOMA_29DOF_KD = np.array([
2.557889765, 6.308801854, 2.557889765, 6.308801854, 1.814445687, 1.814445687, # left leg
2.557889765, 6.308801854, 2.557889765, 6.308801854, 1.814445687, 1.814445687, # right leg
2.557889765, 1.814445687, 1.814445687, # waist
0.907222843, 0.907222843, 0.907222843, 0.907222843, 0.907222843, 1.068141502, 1.068141502, # left arm
0.907222843, 0.907222843, 0.907222843, 0.907222843, 0.907222843, 1.068141502, 1.068141502, # right arm
], dtype=np.float32)
# =============================================================================
# 23-DOF Configuration (native G1-23: no waist_roll/pitch, no wrist_pitch/yaw)
# Derived from 29-DOF Holosoma values
# =============================================================================
# Joint order: 6 left leg, 6 right leg, 1 waist_yaw, 5 left arm, 5 right arm
HOLOSOMA_23DOF_DEFAULT_ANGLES = np.array([
-0.312, 0.0, 0.0, 0.669, -0.363, 0.0, # left leg (from 29-DOF)
-0.312, 0.0, 0.0, 0.669, -0.363, 0.0, # right leg (from 29-DOF)
0.0, # waist_yaw only (from 29-DOF)
0.2, 0.2, 0.0, 0.6, 0.0, # left arm first 5 joints (from 29-DOF)
0.2, -0.2, 0.0, 0.6, 0.0, # right arm first 5 joints (from 29-DOF)
], dtype=np.float32)
HOLOSOMA_23DOF_KP = np.array([
40.179238471, 99.098427777, 40.179238471, 99.098427777, 28.501246196, 28.501246196, # left leg
40.179238471, 99.098427777, 40.179238471, 99.098427777, 28.501246196, 28.501246196, # right leg
40.179238471, # waist_yaw
14.250623098, 14.250623098, 14.250623098, 14.250623098, 14.250623098, # left arm
14.250623098, 14.250623098, 14.250623098, 14.250623098, 14.250623098, # right arm
], dtype=np.float32)
HOLOSOMA_23DOF_KD = np.array([
2.557889765, 6.308801854, 2.557889765, 6.308801854, 1.814445687, 1.814445687, # left leg
2.557889765, 6.308801854, 2.557889765, 6.308801854, 1.814445687, 1.814445687, # right leg
2.557889765, # waist_yaw
0.907222843, 0.907222843, 0.907222843, 0.907222843, 0.907222843, # left arm
0.907222843, 0.907222843, 0.907222843, 0.907222843, 0.907222843, # right arm
], dtype=np.float32)
# Maps 23-DOF policy index → 29-DOF motor index
# 23-DOF: legs(0-11), waist_yaw(12), L_arm(13-17), R_arm(18-22)
# 29-DOF: legs(0-11), waist(12-14), L_arm(15-21), R_arm(22-28)
DOF_23_TO_MOTOR_MAP = [
0, 1, 2, 3, 4, 5, # left leg → motor 0-5
6, 7, 8, 9, 10, 11, # right leg → motor 6-11
12, # waist_yaw → motor 12
15, 16, 17, 18, 19, # left arm (skip wrist_pitch/yaw) → motor 15-19
22, 23, 24, 25, 26, # right arm (skip wrist_pitch/yaw) → motor 22-26
]
# fmt: on
# Control parameters
LOCOMOTION_CONTROL_DT = 0.02 # 50Hz
LOCOMOTION_ACTION_SCALE = 0.25
ANG_VEL_SCALE = 0.25
DOF_POS_SCALE = 1.0
DOF_VEL_SCALE = 0.05
GAIT_PERIOD = 1.0
DEFAULT_HOLOSOMA_REPO_ID = "nepyope/holosoma_locomotion"
def load_holosoma_policy(
repo_id: str = DEFAULT_HOLOSOMA_REPO_ID,
policy_name: str = "fastsac",
local_path: str | None = None,
) -> tuple[ort.InferenceSession, int]:
"""Load Holosoma policy and detect observation dimension.
Returns:
(policy, obs_dim) tuple where obs_dim is 82 (23-DOF) or 100 (29-DOF)
"""
if local_path is not None:
logger.info(f"Loading policy from local path: {local_path}")
policy_path = local_path
else:
logger.info(f"Loading policy from Hugging Face Hub: {repo_id}")
policy_path = hf_hub_download(repo_id=repo_id, filename=f"{policy_name}_g1_29dof.onnx")
policy = ort.InferenceSession(policy_path)
# Detect observation dimension from model input shape
input_shape = policy.get_inputs()[0].shape
obs_dim = input_shape[1] if len(input_shape) > 1 else input_shape[0]
logger.info(f"Policy loaded successfully")
logger.info(f" Input: {policy.get_inputs()[0].name}, shape: {input_shape} → obs_dim={obs_dim}")
logger.info(f" Output: {policy.get_outputs()[0].name}, shape: {policy.get_outputs()[0].shape}")
return policy, obs_dim
class HolosomaLocomotionController:
"""
Handles Holosoma whole-body locomotion for Unitree G1.
Supports both 23-DOF (82D obs) and 29-DOF (100D obs) policies.
"""
def __init__(self, policy, robot, config, obs_dim: int = 100):
self.policy = policy
self.robot = robot
self.config = config
self.obs_dim = obs_dim
# Detect policy type from observation dimension
self.is_23dof = (obs_dim == 82)
self.num_dof = 23 if self.is_23dof else 29
# Velocity commands
self.locomotion_cmd = np.array([0.0, 0.0, 0.0], dtype=np.float32)
# State variables sized for policy type
self.qj = np.zeros(self.num_dof, dtype=np.float32)
self.dqj = np.zeros(self.num_dof, dtype=np.float32)
self.locomotion_action = np.zeros(self.num_dof, dtype=np.float32)
self.locomotion_obs = np.zeros(obs_dim, dtype=np.float32)
self.last_unscaled_action = np.zeros(self.num_dof, dtype=np.float32)
# Select config based on DOF
if self.is_23dof:
self.default_angles = HOLOSOMA_23DOF_DEFAULT_ANGLES
self.kp = HOLOSOMA_23DOF_KP
self.kd = HOLOSOMA_23DOF_KD
self.motor_map = DOF_23_TO_MOTOR_MAP
else:
self.default_angles = HOLOSOMA_29DOF_DEFAULT_ANGLES
self.kp = HOLOSOMA_29DOF_KP
self.kd = HOLOSOMA_29DOF_KD
self.motor_map = list(range(29)) # Identity map for 29-DOF
# Phase state for gait
self.phase = np.zeros((1, 2), dtype=np.float32)
self.phase[0, 0] = 0.0
self.phase[0, 1] = np.pi
self.phase_dt = 2 * np.pi / (50.0 * GAIT_PERIOD)
self.is_standing = False
self.counter = 0
self.locomotion_running = False
self.locomotion_thread = None
logger.info(f"HolosomaLocomotionController initialized")
logger.info(f" Mode: {'23-DOF (82D obs)' if self.is_23dof else '29-DOF (100D obs)'}")
logger.info(f" Action dim: {self.num_dof}")
def holosoma_locomotion_run(self):
"""Main locomotion loop - handles both 23-DOF and 29-DOF."""
self.counter += 1
if self.counter == 1:
print("\n" + "=" * 60)
print(f"🚀 RUNNING HOLOSOMA {self.num_dof}-DOF LOCOMOTION POLICY")
print(f" {self.obs_dim}D observations → {self.num_dof}D actions")
print("=" * 60 + "\n")
robot_state = self.robot.get_observation()
if robot_state is None:
return
# Remote controller
if robot_state.wireless_remote is not None:
self.robot.remote_controller.set(robot_state.wireless_remote)
else:
self.robot.remote_controller.lx = 0.0
self.robot.remote_controller.ly = 0.0
self.robot.remote_controller.rx = 0.0
self.robot.remote_controller.ry = 0.0
# Deadzone
ly = self.robot.remote_controller.ly if abs(self.robot.remote_controller.ly) > 0.1 else 0.0
lx = self.robot.remote_controller.lx if abs(self.robot.remote_controller.lx) > 0.1 else 0.0
rx = self.robot.remote_controller.rx if abs(self.robot.remote_controller.rx) > 0.1 else 0.0
self.locomotion_cmd[0] = ly
self.locomotion_cmd[1] = -lx
self.locomotion_cmd[2] = -rx
# Read joint states using motor map
for i in range(self.num_dof):
motor_idx = self.motor_map[i]
self.qj[i] = robot_state.motor_state[motor_idx].q
self.dqj[i] = robot_state.motor_state[motor_idx].dq
# IMU
quat = robot_state.imu_state.quaternion
ang_vel = np.array(robot_state.imu_state.gyroscope, dtype=np.float32)
gravity_orientation = self.robot.get_gravity_orientation(quat)
# Scale observations
qj_obs = (self.qj - self.default_angles) * DOF_POS_SCALE
dqj_obs = self.dqj * DOF_VEL_SCALE
ang_vel_scaled = ang_vel * ANG_VEL_SCALE
# Phase update
cmd_norm = np.linalg.norm(self.locomotion_cmd[:2])
ang_cmd_norm = np.abs(self.locomotion_cmd[2])
if cmd_norm < 0.01 and ang_cmd_norm < 0.01:
self.phase[0, :] = np.pi * np.ones(2)
self.is_standing = True
elif self.is_standing:
self.phase = np.array([[0.0, np.pi]], dtype=np.float32)
self.is_standing = False
else:
phase_tp1 = self.phase + self.phase_dt
self.phase = np.fmod(phase_tp1 + np.pi, 2 * np.pi) - np.pi
sin_phase = np.sin(self.phase[0, :])
cos_phase = np.cos(self.phase[0, :])
# Build observation (format depends on DOF)
if self.is_23dof:
# 82D: [23 actions, 3 ang_vel, 1 cmd_yaw, 2 cmd_lin, 2 cos, 23 pos, 23 vel, 3 grav, 2 sin]
self.locomotion_obs[0:23] = self.last_unscaled_action
self.locomotion_obs[23:26] = ang_vel_scaled
self.locomotion_obs[26] = self.locomotion_cmd[2]
self.locomotion_obs[27:29] = self.locomotion_cmd[:2]
self.locomotion_obs[29:31] = cos_phase
self.locomotion_obs[31:54] = qj_obs
self.locomotion_obs[54:77] = dqj_obs
self.locomotion_obs[77:80] = gravity_orientation
self.locomotion_obs[80:82] = sin_phase
else:
# 100D: [29 actions, 3 ang_vel, 1 cmd_yaw, 2 cmd_lin, 2 cos, 29 pos, 29 vel, 3 grav, 2 sin]
self.locomotion_obs[0:29] = self.last_unscaled_action
self.locomotion_obs[29:32] = ang_vel_scaled
self.locomotion_obs[32] = self.locomotion_cmd[2]
self.locomotion_obs[33:35] = self.locomotion_cmd[:2]
self.locomotion_obs[35:37] = cos_phase
self.locomotion_obs[37:66] = qj_obs
self.locomotion_obs[66:95] = dqj_obs
self.locomotion_obs[95:98] = gravity_orientation
self.locomotion_obs[98:100] = sin_phase
# Policy inference
obs_input = self.locomotion_obs.reshape(1, -1).astype(np.float32)
ort_inputs = {self.policy.get_inputs()[0].name: obs_input}
ort_outs = self.policy.run(None, ort_inputs)
raw_action = ort_outs[0].squeeze()
clipped_action = np.clip(raw_action, -100.0, 100.0)
self.last_unscaled_action = clipped_action.copy()
self.locomotion_action = clipped_action * LOCOMOTION_ACTION_SCALE
# Debug
if self.counter <= 3:
print(f"\n[Holosoma Debug #{self.counter}]")
print(f" Phase: ({self.phase[0, 0]:.3f}, {self.phase[0, 1]:.3f})")
print(f" Cmd: ({self.locomotion_cmd[0]:.2f}, {self.locomotion_cmd[1]:.2f}, {self.locomotion_cmd[2]:.2f})")
print(f" Action range: [{raw_action.min():.3f}, {raw_action.max():.3f}]")
# Compute target positions
target_dof_pos = self.default_angles + self.locomotion_action
# Send commands to motors via motor map
for i in range(self.num_dof):
motor_idx = self.motor_map[i]
self.robot.msg.motor_cmd[motor_idx].q = target_dof_pos[i]
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = self.kp[i]
self.robot.msg.motor_cmd[motor_idx].kd = self.kd[i]
self.robot.msg.motor_cmd[motor_idx].tau = 0
# For 23-DOF: zero out missing joints (waist_roll/pitch, wrist_pitch/yaw)
if self.is_23dof:
missing_motors = [13, 14, 20, 21, 27, 28] # waist_roll, waist_pitch, wrist_pitch/yaw
for motor_idx in missing_motors:
self.robot.msg.motor_cmd[motor_idx].q = 0.0
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = 40.0
self.robot.msg.motor_cmd[motor_idx].kd = 2.0
self.robot.msg.motor_cmd[motor_idx].tau = 0
self.robot.send_action(self.robot.msg)
def _locomotion_thread_loop(self):
logger.info("Locomotion thread started")
while self.locomotion_running:
start_time = time.time()
try:
self.holosoma_locomotion_run()
except Exception as e:
logger.error(f"Error in locomotion loop: {e}")
import traceback
traceback.print_exc()
elapsed = time.time() - start_time
sleep_time = max(0, LOCOMOTION_CONTROL_DT - elapsed)
time.sleep(sleep_time)
logger.info("Locomotion thread stopped")
def start_locomotion_thread(self):
if self.locomotion_running:
logger.warning("Locomotion thread already running")
return
logger.info("Starting locomotion control thread...")
self.locomotion_running = True
self.locomotion_thread = threading.Thread(target=self._locomotion_thread_loop, daemon=True)
self.locomotion_thread.start()
logger.info("Locomotion control thread started!")
def stop_locomotion_thread(self):
if not self.locomotion_running:
return
logger.info("Stopping locomotion control thread...")
self.locomotion_running = False
if self.locomotion_thread:
self.locomotion_thread.join(timeout=2.0)
logger.info("Locomotion control thread stopped")
def reset_robot(self):
"""Move joints to default position."""
logger.info(f"Moving {self.num_dof} joints to default position...")
total_time = 3.0
num_step = int(total_time / self.robot.control_dt)
robot_state = self.robot.get_observation()
# Record current positions
init_dof_pos = np.zeros(self.num_dof, dtype=np.float32)
for i in range(self.num_dof):
motor_idx = self.motor_map[i]
init_dof_pos[i] = robot_state.motor_state[motor_idx].q
# Interpolate to target
for step in range(num_step):
alpha = step / num_step
for i in range(self.num_dof):
motor_idx = self.motor_map[i]
target = self.default_angles[i]
self.robot.msg.motor_cmd[motor_idx].q = init_dof_pos[i] * (1 - alpha) + target * alpha
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = self.kp[i]
self.robot.msg.motor_cmd[motor_idx].kd = self.kd[i]
self.robot.msg.motor_cmd[motor_idx].tau = 0
# Zero missing joints for 23-DOF
if self.is_23dof:
for motor_idx in [13, 14, 20, 21, 27, 28]:
self.robot.msg.motor_cmd[motor_idx].q = 0.0
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = 40.0
self.robot.msg.motor_cmd[motor_idx].kd = 2.0
self.robot.msg.motor_cmd[motor_idx].tau = 0
self.robot.msg.crc = self.robot.crc.Crc(self.robot.msg)
self.robot.lowcmd_publisher.Write(self.robot.msg)
time.sleep(self.robot.control_dt)
logger.info(f"Reached default position ({self.num_dof} joints)")
# Hold for 2 seconds
logger.info("Holding default position for 2 seconds...")
hold_steps = int(2.0 / self.robot.control_dt)
for _ in range(hold_steps):
for i in range(self.num_dof):
motor_idx = self.motor_map[i]
self.robot.msg.motor_cmd[motor_idx].q = self.default_angles[i]
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = self.kp[i]
self.robot.msg.motor_cmd[motor_idx].kd = self.kd[i]
self.robot.msg.motor_cmd[motor_idx].tau = 0
if self.is_23dof:
for motor_idx in [13, 14, 20, 21, 27, 28]:
self.robot.msg.motor_cmd[motor_idx].q = 0.0
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = 40.0
self.robot.msg.motor_cmd[motor_idx].kd = 2.0
self.robot.msg.motor_cmd[motor_idx].tau = 0
self.robot.msg.crc = self.robot.crc.Crc(self.robot.msg)
self.robot.lowcmd_publisher.Write(self.robot.msg)
time.sleep(self.robot.control_dt)
logger.info("Ready to start locomotion!")
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Holosoma Locomotion Controller for Unitree G1")
parser.add_argument("--repo-id", type=str, default=DEFAULT_HOLOSOMA_REPO_ID)
parser.add_argument("--policy", type=str, default="fastsac", choices=["fastsac", "ppo"])
parser.add_argument("--local-path", type=str, default=None, help="Path to local ONNX file")
args = parser.parse_args()
# Load policy and detect dimensions
policy, obs_dim = load_holosoma_policy(
repo_id=args.repo_id,
policy_name=args.policy,
local_path=args.local_path,
)
# Initialize robot
config = UnitreeG1Config()
robot = UnitreeG1(config)
# Initialize controller with detected obs_dim
controller = HolosomaLocomotionController(
policy=policy,
robot=robot,
config=config,
obs_dim=obs_dim,
)
try:
#controller.reset_robot()
controller.start_locomotion_thread()
logger.info(f"Robot initialized with Holosoma {'23-DOF' if obs_dim == 82 else '29-DOF'} policy")
logger.info("Use remote controller: LY=fwd/back, LX=left/right, RX=rotate")
logger.info("Press Ctrl+C to stop")
while True:
time.sleep(1.0)
except KeyboardInterrupt:
print("\nStopping locomotion...")
controller.stop_locomotion_thread()
print("Done!")

View File

@@ -1,607 +0,0 @@
#!/usr/bin/env python3
"""
Locomotion ↔ Dance Toggle for Unitree G1
Press Enter to instantly switch between locomotion and dance modes.
- Starts in LOCOMOTION mode (joystick control)
- Press Enter → DANCE mode (resets to frame 0)
- Press Enter → LOCOMOTION mode
- Repeat...
Auto-recovery feature:
- If robot tilts beyond threshold during dance, auto-switches to locomotion
- When robot recovers (tilt below recovery threshold), resumes dance from where it left off
Usage:
python examples/unitree_g1/locomotion_to_dance.py
python examples/unitree_g1/locomotion_to_dance.py --tilt-threshold 25 --recovery-threshold 10
"""
import argparse
import json
import logging
import select
import sys
import threading
import time
from xml.etree import ElementTree
import numpy as np
import onnx
import onnxruntime as ort
import pinocchio as pin
from huggingface_hub import hf_hub_download
from lerobot.robots.unitree_g1.config_unitree_g1 import UnitreeG1Config
from lerobot.robots.unitree_g1.unitree_g1 import UnitreeG1
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# =============================================================================
# CONFIGURATION
# =============================================================================
NUM_DOFS = 29
CONTROL_DT = 0.02 # 50Hz
# Locomotion config
DEFAULT_HOLOSOMA_REPO_ID = "nepyope/holosoma_locomotion"
LOCOMOTION_ACTION_SCALE = 0.25
ANG_VEL_SCALE = 0.25
DOF_POS_SCALE = 1.0
DOF_VEL_SCALE = 0.05
GAIT_PERIOD = 1.0
# Dance config
DANCE_ONNX_PATH = "examples/unitree_g1/fastsac_g1_29dof_dancing.onnx"
FROZEN_JOINTS = [13, 14, 20, 21, 27, 28]
FROZEN_KP = 500.0
FROZEN_KD = 5.0
# fmt: off
# 29-DOF defaults (holosoma training)
DEFAULT_29DOF_ANGLES = np.array([
-0.312, 0.0, 0.0, 0.669, -0.363, 0.0, # left leg
-0.312, 0.0, 0.0, 0.669, -0.363, 0.0, # right leg
0.0, 0.0, 0.0, # waist
0.2, 0.2, 0.0, 0.6, 0.0, 0.0, 0.0, # left arm
0.2, -0.2, 0.0, 0.6, 0.0, 0.0, 0.0, # right arm
], dtype=np.float32)
DEFAULT_29DOF_KP = np.array([
40.179, 99.098, 40.179, 99.098, 28.501, 28.501,
40.179, 99.098, 40.179, 99.098, 28.501, 28.501,
40.179, 28.501, 28.501,
14.251, 14.251, 14.251, 14.251, 14.251, 16.778, 16.778,
14.251, 14.251, 14.251, 14.251, 14.251, 16.778, 16.778,
], dtype=np.float32)
DEFAULT_29DOF_KD = np.array([
2.558, 6.309, 2.558, 6.309, 1.814, 1.814,
2.558, 6.309, 2.558, 6.309, 1.814, 1.814,
2.558, 1.814, 1.814,
0.907, 0.907, 0.907, 0.907, 0.907, 1.068, 1.068,
0.907, 0.907, 0.907, 0.907, 0.907, 1.068, 1.068,
], dtype=np.float32)
# 23-DOF config (no waist_roll/pitch, no wrist_pitch/yaw)
DEFAULT_23DOF_ANGLES = np.array([
-0.312, 0.0, 0.0, 0.669, -0.363, 0.0, # left leg
-0.312, 0.0, 0.0, 0.669, -0.363, 0.0, # right leg
0.0, # waist_yaw only
0.2, 0.2, 0.0, 0.6, 0.0, # left arm (5 joints)
0.2, -0.2, 0.0, 0.6, 0.0, # right arm (5 joints)
], dtype=np.float32)
DEFAULT_23DOF_KP = np.array([
40.179, 99.098, 40.179, 99.098, 28.501, 28.501,
40.179, 99.098, 40.179, 99.098, 28.501, 28.501,
40.179,
14.251, 14.251, 14.251, 14.251, 14.251,
14.251, 14.251, 14.251, 14.251, 14.251,
], dtype=np.float32)
DEFAULT_23DOF_KD = np.array([
2.558, 6.309, 2.558, 6.309, 1.814, 1.814,
2.558, 6.309, 2.558, 6.309, 1.814, 1.814,
2.558,
0.907, 0.907, 0.907, 0.907, 0.907,
0.907, 0.907, 0.907, 0.907, 0.907,
], dtype=np.float32)
# 23-DOF policy index → 29-DOF motor index
DOF_23_TO_MOTOR = [
0, 1, 2, 3, 4, 5, # left leg
6, 7, 8, 9, 10, 11, # right leg
12, # waist_yaw
15, 16, 17, 18, 19, # left arm (skip wrist_pitch/yaw)
22, 23, 24, 25, 26, # right arm (skip wrist_pitch/yaw)
]
MISSING_23DOF_MOTORS = [13, 14, 20, 21, 27, 28]
# fmt: on
# =============================================================================
# QUATERNION UTILITIES
# =============================================================================
def quat_inverse(q):
return np.concatenate((q[:, 0:1], -q[:, 1:]), axis=1)
def quat_mul(a, b):
a, b = a.reshape(-1, 4), b.reshape(-1, 4)
w1, x1, y1, z1 = a[..., 0], a[..., 1], a[..., 2], a[..., 3]
w2, x2, y2, z2 = b[..., 0], b[..., 1], b[..., 2], b[..., 3]
ww = (z1 + x1) * (x2 + y2)
yy = (w1 - y1) * (w2 + z2)
zz = (w1 + y1) * (w2 - z2)
xx = ww + yy + zz
qq = 0.5 * (xx + (z1 - x1) * (x2 - y2))
w = qq - ww + (z1 - y1) * (y2 - z2)
x = qq - xx + (x1 + w1) * (x2 + w2)
y = qq - yy + (w1 - x1) * (y2 + z2)
z = qq - zz + (z1 + y1) * (w2 - x2)
return np.stack([w, x, y, z]).T.reshape(a.shape)
def subtract_frame_transforms(q01, q02):
return quat_mul(quat_inverse(q01), q02)
def matrix_from_quat(q):
r, i, j, k = q[..., 0], q[..., 1], q[..., 2], q[..., 3]
two_s = 2.0 / (q * q).sum(-1)
o = np.stack((
1 - two_s * (j*j + k*k), two_s * (i*j - k*r), two_s * (i*k + j*r),
two_s * (i*j + k*r), 1 - two_s * (i*i + k*k), two_s * (j*k - i*r),
two_s * (i*k - j*r), two_s * (j*k + i*r), 1 - two_s * (i*i + j*j),
), -1)
return o.reshape(q.shape[:-1] + (3, 3))
def xyzw_to_wxyz(xyzw):
return np.concatenate([xyzw[:, -1:], xyzw[:, :3]], axis=1)
def quat_to_rpy(q):
w, x, y, z = q
roll = np.arctan2(2*(w*x + y*z), 1 - 2*(x**2 + y**2))
pitch = np.arcsin(np.clip(2*(w*y - z*x), -1, 1))
yaw = np.arctan2(2*(w*z + x*y), 1 - 2*(y**2 + z**2))
return roll, pitch, yaw
def rpy_to_quat(rpy):
roll, pitch, yaw = rpy
cy, sy = np.cos(yaw*0.5), np.sin(yaw*0.5)
cp, sp = np.cos(pitch*0.5), np.sin(pitch*0.5)
cr, sr = np.cos(roll*0.5), np.sin(roll*0.5)
return np.array([cr*cp*cy + sr*sp*sy, sr*cp*cy - cr*sp*sy,
cr*sp*cy + sr*cp*sy, cr*cp*sy - sr*sp*cy])
# =============================================================================
# PINOCCHIO FK
# =============================================================================
DOF_NAMES = (
"left_hip_pitch_joint", "left_hip_roll_joint", "left_hip_yaw_joint",
"left_knee_joint", "left_ankle_pitch_joint", "left_ankle_roll_joint",
"right_hip_pitch_joint", "right_hip_roll_joint", "right_hip_yaw_joint",
"right_knee_joint", "right_ankle_pitch_joint", "right_ankle_roll_joint",
"waist_yaw_joint", "waist_roll_joint", "waist_pitch_joint",
"left_shoulder_pitch_joint", "left_shoulder_roll_joint", "left_shoulder_yaw_joint", "left_elbow_joint",
"left_wrist_roll_joint", "left_wrist_pitch_joint", "left_wrist_yaw_joint",
"right_shoulder_pitch_joint", "right_shoulder_roll_joint", "right_shoulder_yaw_joint", "right_elbow_joint",
"right_wrist_roll_joint", "right_wrist_pitch_joint", "right_wrist_yaw_joint",
)
class PinocchioFK:
def __init__(self, urdf_text: str):
root = ElementTree.fromstring(urdf_text)
for parent in root.iter():
for child in list(parent):
if child.tag.split("}")[-1] in {"visual", "collision"}:
parent.remove(child)
xml_text = '<?xml version="1.0"?>\n' + ElementTree.tostring(root, encoding="unicode")
self.model = pin.buildModelFromXML(xml_text, pin.JointModelFreeFlyer())
self.data = self.model.createData()
pin_names = [n for n in self.model.names if n not in ["universe", "root_joint"]]
self.idx_map = np.array([DOF_NAMES.index(n) for n in pin_names])
self.ref_frame_id = self.model.getFrameId("torso_link")
def get_torso_quat(self, pos, quat_wxyz, dof_pos):
quat_xyzw = np.array([quat_wxyz[1], quat_wxyz[2], quat_wxyz[3], quat_wxyz[0]])
config = np.concatenate([pos, quat_xyzw, dof_pos[self.idx_map]])
pin.framesForwardKinematics(self.model, self.data, config)
coeffs = pin.Quaternion(self.data.oMf[self.ref_frame_id].rotation).coeffs()
return np.array([coeffs[3], coeffs[0], coeffs[1], coeffs[2]]).reshape(1, 4)
def get_torso_tilt(self, pos, quat_wxyz, dof_pos):
"""Get torso tilt angle from upright (degrees). Uses roll and pitch."""
torso_q = self.get_torso_quat(pos, quat_wxyz, dof_pos)
roll, pitch, _ = quat_to_rpy(torso_q.flatten())
# Tilt is the angle from vertical - combine roll and pitch
tilt_rad = np.sqrt(roll**2 + pitch**2)
return np.degrees(tilt_rad), np.degrees(roll), np.degrees(pitch)
# =============================================================================
# LOCOMOTION CONTROLLER
# =============================================================================
class LocomotionController:
"""Holosoma whole-body locomotion (23-DOF or 29-DOF)."""
def __init__(self, policy, robot, obs_dim: int):
self.policy = policy
self.robot = robot
self.obs_dim = obs_dim
# Detect DOF mode
self.is_23dof = (obs_dim == 82)
self.num_dof = 23 if self.is_23dof else 29
if self.is_23dof:
self.default_angles = DEFAULT_23DOF_ANGLES
self.kp = DEFAULT_23DOF_KP
self.kd = DEFAULT_23DOF_KD
self.motor_map = DOF_23_TO_MOTOR
logger.info("Locomotion: 23-DOF (82D obs)")
else:
self.default_angles = DEFAULT_29DOF_ANGLES
self.kp = DEFAULT_29DOF_KP
self.kd = DEFAULT_29DOF_KD
self.motor_map = list(range(29))
logger.info("Locomotion: 29-DOF (100D obs)")
self.cmd = np.zeros(3, dtype=np.float32)
self.qj = np.zeros(self.num_dof, dtype=np.float32)
self.dqj = np.zeros(self.num_dof, dtype=np.float32)
self.obs = np.zeros(obs_dim, dtype=np.float32)
self.last_action = np.zeros(self.num_dof, dtype=np.float32)
self.phase = np.array([[0.0, np.pi]], dtype=np.float32)
self.phase_dt = 2 * np.pi / (50.0 * GAIT_PERIOD)
self.is_standing = True
def run_step(self):
"""Single locomotion step."""
state = self.robot.lowstate_buffer.get_data()
if state is None:
return
# Joystick
if state.wireless_remote is not None:
self.robot.remote_controller.set(state.wireless_remote)
ly = self.robot.remote_controller.ly if abs(self.robot.remote_controller.ly) > 0.1 else 0.0
lx = self.robot.remote_controller.lx if abs(self.robot.remote_controller.lx) > 0.1 else 0.0
rx = self.robot.remote_controller.rx if abs(self.robot.remote_controller.rx) > 0.1 else 0.0
self.cmd[0], self.cmd[1], self.cmd[2] = ly, -lx, -rx
# Read joints via motor map
for i in range(self.num_dof):
self.qj[i] = state.motor_state[self.motor_map[i]].q
self.dqj[i] = state.motor_state[self.motor_map[i]].dq
# IMU
quat = state.imu_state.quaternion
ang_vel = np.array(state.imu_state.gyroscope, dtype=np.float32)
gravity = self.robot.get_gravity_orientation(quat)
# Scale
qj_obs = (self.qj - self.default_angles) * DOF_POS_SCALE
dqj_obs = self.dqj * DOF_VEL_SCALE
ang_vel_s = ang_vel * ANG_VEL_SCALE
# Phase
cmd_mag = np.linalg.norm(self.cmd[:2])
ang_mag = abs(self.cmd[2])
if cmd_mag < 0.01 and ang_mag < 0.01:
self.phase[0, :] = np.pi
self.is_standing = True
elif self.is_standing:
self.phase = np.array([[0.0, np.pi]], dtype=np.float32)
self.is_standing = False
else:
self.phase = np.fmod(self.phase + self.phase_dt + np.pi, 2*np.pi) - np.pi
sin_ph, cos_ph = np.sin(self.phase[0]), np.cos(self.phase[0])
# Build obs
if self.is_23dof:
self.obs[0:23] = self.last_action
self.obs[23:26] = ang_vel_s
self.obs[26] = self.cmd[2]
self.obs[27:29] = self.cmd[:2]
self.obs[29:31] = cos_ph
self.obs[31:54] = qj_obs
self.obs[54:77] = dqj_obs
self.obs[77:80] = gravity
self.obs[80:82] = sin_ph
else:
self.obs[0:29] = self.last_action
self.obs[29:32] = ang_vel_s
self.obs[32] = self.cmd[2]
self.obs[33:35] = self.cmd[:2]
self.obs[35:37] = cos_ph
self.obs[37:66] = qj_obs
self.obs[66:95] = dqj_obs
self.obs[95:98] = gravity
self.obs[98:100] = sin_ph
# Inference
obs_in = self.obs.reshape(1, -1).astype(np.float32)
ort_in = {self.policy.get_inputs()[0].name: obs_in}
raw_action = self.policy.run(None, ort_in)[0].squeeze()
clipped = np.clip(raw_action, -100.0, 100.0)
self.last_action = clipped.copy()
scaled = clipped * LOCOMOTION_ACTION_SCALE
target = self.default_angles + scaled
# Send commands
for i in range(self.num_dof):
motor_idx = self.motor_map[i]
self.robot.msg.motor_cmd[motor_idx].q = float(target[i])
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = self.kp[i]
self.robot.msg.motor_cmd[motor_idx].kd = self.kd[i]
self.robot.msg.motor_cmd[motor_idx].tau = 0
# Zero missing joints for 23-DOF
if self.is_23dof:
for idx in MISSING_23DOF_MOTORS:
self.robot.msg.motor_cmd[idx].q = 0.0
self.robot.msg.motor_cmd[idx].qd = 0
self.robot.msg.motor_cmd[idx].kp = 40.0
self.robot.msg.motor_cmd[idx].kd = 2.0
self.robot.msg.motor_cmd[idx].tau = 0
self.robot.send_action(self.robot.msg)
def reset(self):
"""Reset state for fresh start."""
self.last_action.fill(0)
self.phase = np.array([[0.0, np.pi]], dtype=np.float32)
self.is_standing = True
# =============================================================================
# DANCE CONTROLLER
# =============================================================================
class DanceController:
"""WBT dance policy with FK for torso tracking."""
def __init__(self, policy, robot, pinocchio_fk, motor_kp, motor_kd, action_scale):
self.policy = policy
self.robot = robot
self.pinocchio_fk = pinocchio_fk
self.motor_kp = motor_kp
self.motor_kd = motor_kd
self.action_scale = action_scale
self.obs_dim = policy.get_inputs()[0].shape[1]
self.last_action = np.zeros((1, NUM_DOFS), dtype=np.float32)
self.motion_command = None
self.ref_quat_xyzw = None
self.timestep = 0
self.yaw_offset = 0.0
logger.info(f"Dance: obs_dim={self.obs_dim}, action_scale={action_scale}")
def initialize(self, reset_to_frame_0: bool = True):
"""Initialize dance. If reset_to_frame_0=True, starts from frame 0. Otherwise resumes."""
if reset_to_frame_0:
self.timestep = 0
self.last_action.fill(0)
# Get initial motion data at frame 0
dummy = np.zeros((1, self.obs_dim), dtype=np.float32)
outs = self.policy.run(["joint_pos", "joint_vel", "ref_quat_xyzw"],
{"obs": dummy, "time_step": np.array([[0]], dtype=np.float32)})
self.motion_command = np.concatenate(outs[0:2], axis=1)
self.ref_quat_xyzw = outs[2]
logger.info("Dance: reset to frame 0")
else:
# Resume from current timestep - just update motion command for current frame
dummy = np.zeros((1, self.obs_dim), dtype=np.float32)
outs = self.policy.run(["joint_pos", "joint_vel", "ref_quat_xyzw"],
{"obs": dummy, "time_step": np.array([[self.timestep]], dtype=np.float32)})
self.motion_command = np.concatenate(outs[0:2], axis=1)
self.ref_quat_xyzw = outs[2]
logger.info(f"Dance: resuming from frame {self.timestep}")
# Capture yaw offset
state = self.robot.lowstate_buffer.get_data()
if state and self.pinocchio_fk:
quat = np.array(state.imu_state.quaternion, dtype=np.float32)
dof = np.array([state.motor_state[i].q for i in range(NUM_DOFS)], dtype=np.float32)
torso_q = self.pinocchio_fk.get_torso_quat(np.zeros(3), quat, dof)
_, _, self.yaw_offset = quat_to_rpy(torso_q.flatten())
logger.info(f"Dance yaw offset: {np.degrees(self.yaw_offset):.1f}°")
def _remove_yaw_offset(self, quat_wxyz):
if abs(self.yaw_offset) < 1e-6:
return quat_wxyz
yaw_q = rpy_to_quat((0, 0, -self.yaw_offset)).reshape(1, 4)
return quat_mul(yaw_q, quat_wxyz)
def run_step(self):
"""Single dance step."""
state = self.robot.lowstate_buffer.get_data()
if state is None:
return
quat = np.array(state.imu_state.quaternion, dtype=np.float32)
ang_vel = np.array(state.imu_state.gyroscope, dtype=np.float32)
dof_pos = np.array([state.motor_state[i].q for i in range(NUM_DOFS)], dtype=np.float32)
dof_vel = np.array([state.motor_state[i].dq for i in range(NUM_DOFS)], dtype=np.float32)
# FK for torso orientation
if self.pinocchio_fk:
torso_q = self.pinocchio_fk.get_torso_quat(np.zeros(3), quat, dof_pos)
torso_q = self._remove_yaw_offset(torso_q)
motion_ori = xyzw_to_wxyz(self.ref_quat_xyzw)
rel_quat = subtract_frame_transforms(torso_q, motion_ori)
ori_b = matrix_from_quat(rel_quat)[..., :2].reshape(1, -1)
else:
ori_b = np.zeros((1, 6), dtype=np.float32)
dof_rel = (dof_pos - DEFAULT_29DOF_ANGLES).reshape(1, -1)
# Build obs (alphabetical)
obs_dict = {
"actions": self.last_action,
"base_ang_vel": ang_vel.reshape(1, 3),
"dof_pos": dof_rel,
"dof_vel": dof_vel.reshape(1, -1),
"motion_command": self.motion_command,
"motion_ref_ori_b": ori_b,
}
obs = np.concatenate([obs_dict[k].astype(np.float32) for k in sorted(obs_dict.keys())], axis=1)
obs = np.clip(obs, -100, 100)
# Inference
outs = self.policy.run(["actions", "joint_pos", "joint_vel", "ref_quat_xyzw"],
{"obs": obs, "time_step": np.array([[self.timestep]], dtype=np.float32)})
action = np.clip(outs[0], -100, 100)
self.motion_command = np.concatenate(outs[1:3], axis=1)
self.ref_quat_xyzw = outs[3]
self.last_action = action.copy()
target = DEFAULT_29DOF_ANGLES + action.flatten() * self.action_scale
# Send commands
for i in range(NUM_DOFS):
if i in FROZEN_JOINTS:
self.robot.msg.motor_cmd[i].q = 0.0
self.robot.msg.motor_cmd[i].kp = FROZEN_KP
self.robot.msg.motor_cmd[i].kd = FROZEN_KD
else:
self.robot.msg.motor_cmd[i].q = float(target[i])
self.robot.msg.motor_cmd[i].kp = self.motor_kp[i]
self.robot.msg.motor_cmd[i].kd = self.motor_kd[i]
self.robot.msg.motor_cmd[i].qd = 0
self.robot.msg.motor_cmd[i].tau = 0
self.robot.send_action(self.robot.msg)
self.timestep += 1
# =============================================================================
# MAIN
# =============================================================================
def main():
parser = argparse.ArgumentParser(description="Locomotion ↔ Dance Toggle")
parser.add_argument("--loco-repo", type=str, default=DEFAULT_HOLOSOMA_REPO_ID)
parser.add_argument("--dance-onnx", type=str, default=DANCE_ONNX_PATH)
args = parser.parse_args()
print("=" * 70)
print("🚶 LOCOMOTION ↔ 💃 DANCE")
print("=" * 70)
print("Press ENTER to toggle between modes")
print("=" * 70)
# Load locomotion policy
logger.info("Loading locomotion policy...")
loco_path = hf_hub_download(repo_id=args.loco_repo, filename="fastsac_g1_29dof.onnx")
loco_policy = ort.InferenceSession(loco_path)
loco_obs_dim = loco_policy.get_inputs()[0].shape[1]
logger.info(f"Locomotion: {loco_obs_dim}D obs")
# Load dance policy
logger.info("Loading dance policy...")
dance_policy = ort.InferenceSession(args.dance_onnx)
dance_model = onnx.load(args.dance_onnx)
dance_meta = {p.key: json.loads(p.value) for p in dance_model.metadata_props}
dance_kp = np.array(dance_meta.get("kp", DEFAULT_29DOF_KP), dtype=np.float32)
dance_kd = np.array(dance_meta.get("kd", DEFAULT_29DOF_KD), dtype=np.float32)
dance_action_scale = float(dance_meta.get("action_scale", 1.0))
logger.info(f"Dance: {dance_policy.get_inputs()[0].shape[1]}D obs, scale={dance_action_scale}")
# Build Pinocchio FK
pinocchio_fk = None
if "robot_urdf" in dance_meta:
logger.info("Building Pinocchio FK...")
pinocchio_fk = PinocchioFK(dance_meta["robot_urdf"])
# Initialize robot
logger.info("Initializing robot...")
config = UnitreeG1Config()
robot = UnitreeG1(config)
logger.info("Robot connected!")
# Create controllers
loco_ctrl = LocomotionController(loco_policy, robot, loco_obs_dim)
dance_ctrl = DanceController(dance_policy, robot, pinocchio_fk, dance_kp, dance_kd, dance_action_scale)
# State
mode = "locomotion"
toggle_event = threading.Event()
shutdown = threading.Event()
# Input thread
def input_loop():
while not shutdown.is_set():
if select.select([sys.stdin], [], [], 0.1)[0]:
sys.stdin.readline()
toggle_event.set()
input_thread = threading.Thread(target=input_loop, daemon=True)
input_thread.start()
print("\n🚶 LOCOMOTION MODE - Use joystick to walk")
print(" Press ENTER to switch to DANCE")
print("-" * 70)
step = 0
try:
while not shutdown.is_set():
t0 = time.time()
# Check toggle
if toggle_event.is_set():
toggle_event.clear()
if mode == "locomotion":
mode = "dance"
dance_ctrl.initialize()
print("\n" + "=" * 70)
print("💃 DANCE MODE (frame 0)")
print(" Press ENTER to switch to LOCOMOTION")
print("=" * 70)
else:
mode = "locomotion"
loco_ctrl.reset()
print("\n" + "=" * 70)
print("🚶 LOCOMOTION MODE")
print(" Press ENTER to switch to DANCE")
print("=" * 70)
# Run controller
if mode == "locomotion":
loco_ctrl.run_step()
else:
dance_ctrl.run_step()
# Log
if step % 100 == 0:
if mode == "locomotion":
print(f"[LOCO ] step={step:5d} cmd=[{loco_ctrl.cmd[0]:.2f},{loco_ctrl.cmd[1]:.2f},{loco_ctrl.cmd[2]:.2f}]")
else:
print(f"[DANCE] step={step:5d} timestep={dance_ctrl.timestep}")
step += 1
elapsed = time.time() - t0
if elapsed < CONTROL_DT:
time.sleep(CONTROL_DT - elapsed)
except KeyboardInterrupt:
print("\n\nStopping...")
finally:
shutdown.set()
robot.disconnect()
print("Done!")
if __name__ == "__main__":
main()

View File

@@ -1,447 +0,0 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Example: Unitree RL 12-DOF Legs-Only Locomotion (TorchScript)
This example demonstrates loading a 12-DOF legs-only locomotion policy
(TorchScript .pt format) and running it on the Unitree G1 robot.
Key characteristics:
- Single TorchScript policy (.pt)
- 47D observations, 12D actions (legs only)
- Phase-based gait timing
- Arms and waist held at fixed positions
"""
import argparse
import logging
import threading
import time
import numpy as np
import torch
from huggingface_hub import hf_hub_download
from scipy.spatial.transform import Rotation as R
from lerobot.robots.unitree_g1.config_unitree_g1 import UnitreeG1Config
from lerobot.robots.unitree_g1.unitree_g1 import UnitreeG1
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# 12-DOF leg joint configuration
# Joint order: [L_hip_pitch, L_hip_roll, L_hip_yaw, L_knee, L_ankle_pitch, L_ankle_roll,
# R_hip_pitch, R_hip_roll, R_hip_yaw, R_knee, R_ankle_pitch, R_ankle_roll]
LEG_JOINT_INDICES = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
# Default leg angles for standing
DEFAULT_LEG_ANGLES = np.array([
-0.1, 0.0, 0.0, 0.3, -0.2, 0.0, # left leg
-0.1, 0.0, 0.0, 0.3, -0.2, 0.0, # right leg
], dtype=np.float32)
# KP/KD for leg joints
LEG_KPS = np.array([150, 150, 150, 300, 40, 40, 150, 150, 150, 300, 40, 40], dtype=np.float32)
LEG_KDS = np.array([6, 6, 6, 4, 2, 2, 6, 6, 6, 4, 2, 2], dtype=np.float32)
# Waist configuration (held at zero)
WAIST_JOINT_INDICES = [12, 13, 14] # yaw, roll, pitch
WAIST_KPS = np.array([250, 250, 250], dtype=np.float32)
WAIST_KDS = np.array([5, 5, 5], dtype=np.float32)
# Arm configuration (indices 15-28, held at initial position)
ARM_JOINT_INDICES = list(range(15, 29))
ARM_KPS = np.array([80, 80, 80, 80, 40, 40, 40, # left arm (shoulder + wrist)
80, 80, 80, 80, 40, 40, 40], dtype=np.float32) # right arm
ARM_KDS = np.array([3, 3, 3, 3, 1.5, 1.5, 1.5,
3, 3, 3, 3, 1.5, 1.5, 1.5], dtype=np.float32)
# Control parameters
LOCOMOTION_CONTROL_DT = 0.02 # 50Hz control rate
LOCOMOTION_ACTION_SCALE = 0.25
ANG_VEL_SCALE = 0.25
DOF_POS_SCALE = 1.0
DOF_VEL_SCALE = 0.05
CMD_SCALE = np.array([2.0, 2.0, 0.25], dtype=np.float32)
MAX_CMD = np.array([0.8, 0.5, 1.57], dtype=np.float32) # max vx, vy, yaw_rate
# Gait parameters
GAIT_PERIOD = 0.8 # seconds
DEFAULT_REPO_ID = "nepyope/unitree_rl_locomotion"
def load_torchscript_policy(
repo_id: str = DEFAULT_REPO_ID,
filename: str = "motion.pt",
) -> torch.jit.ScriptModule:
"""Load TorchScript locomotion policy from Hugging Face Hub.
Args:
repo_id: Hugging Face Hub repository ID containing the policy.
filename: Policy filename (default: motion.pt).
"""
logger.info(f"Loading TorchScript policy from Hugging Face Hub ({repo_id}/{filename})...")
policy_path = hf_hub_download(
repo_id=repo_id,
filename=filename,
)
policy = torch.jit.load(policy_path)
policy.eval()
logger.info("TorchScript policy loaded successfully")
return policy
class UnitreeRLLocomotionController:
"""
Handles 12-DOF legs-only locomotion control for the Unitree G1 robot.
This controller manages:
- Single TorchScript policy
- 47D observations (single frame)
- 12D action output (legs only)
- Arms and waist held at fixed positions
- Phase-based gait timing
"""
def __init__(self, policy, robot, config):
self.policy = policy
self.robot = robot
self.config = config
# Velocity commands (vx, vy, yaw_rate)
self.locomotion_cmd = np.array([0.0, 0.0, 0.0], dtype=np.float32)
# State variables (12 DOF legs)
self.qj = np.zeros(12, dtype=np.float32)
self.dqj = np.zeros(12, dtype=np.float32)
self.locomotion_action = np.zeros(12, dtype=np.float32)
self.locomotion_obs = np.zeros(47, dtype=np.float32)
# Initial arm positions (captured on reset)
self.initial_arm_positions = np.zeros(14, dtype=np.float32)
# Counter for phase calculation
self.counter = 0
# Thread management
self.locomotion_running = False
self.locomotion_thread = None
logger.info("UnitreeRLLocomotionController initialized")
logger.info(" Observation dim: 47, Action dim: 12 (legs only)")
def locomotion_run(self):
"""12-DOF legs-only locomotion policy loop."""
self.counter += 1
if self.counter == 1:
print("\n" + "=" * 60)
print("🚀 RUNNING UNITREE RL 12-DOF LOCOMOTION POLICY")
print(" 47D observations → 12D actions (legs only)")
print(" Arms and waist held at fixed positions")
print("=" * 60 + "\n")
# Get current observation
robot_state = self.robot.get_observation()
if robot_state is None:
return
# Get command from remote controller
if robot_state.wireless_remote is not None:
self.robot.remote_controller.set(robot_state.wireless_remote)
else:
self.robot.remote_controller.lx = 0.0
self.robot.remote_controller.ly = 0.0
self.robot.remote_controller.rx = 0.0
self.robot.remote_controller.ry = 0.0
self.locomotion_cmd[0] = self.robot.remote_controller.ly # forward/backward
self.locomotion_cmd[1] = self.robot.remote_controller.lx * -1 # left/right (inverted)
self.locomotion_cmd[2] = self.robot.remote_controller.rx * -1 # yaw (inverted)
# Get leg joint positions and velocities (12 DOF)
for i, motor_idx in enumerate(LEG_JOINT_INDICES):
self.qj[i] = robot_state.motor_state[motor_idx].q
self.dqj[i] = robot_state.motor_state[motor_idx].dq
# Get IMU data
quat = robot_state.imu_state.quaternion
ang_vel = np.array(robot_state.imu_state.gyroscope, dtype=np.float32)
# Scale observations
gravity_orientation = self.robot.get_gravity_orientation(quat)
qj_obs = (self.qj - DEFAULT_LEG_ANGLES) * DOF_POS_SCALE
dqj_obs = self.dqj * DOF_VEL_SCALE
ang_vel_scaled = ang_vel * ANG_VEL_SCALE
# Calculate phase
count = self.counter * LOCOMOTION_CONTROL_DT
phase = (count % GAIT_PERIOD) / GAIT_PERIOD
sin_phase = np.sin(2 * np.pi * phase)
cos_phase = np.cos(2 * np.pi * phase)
# Build 47D observation vector
# [0:3] - angular velocity (scaled)
# [3:6] - gravity orientation
# [6:9] - velocity command (scaled)
# [9:21] - joint positions (12D, relative to default)
# [21:33] - joint velocities (12D, scaled)
# [33:45] - previous actions (12D)
# [45] - sin_phase
# [46] - cos_phase
self.locomotion_obs[0:3] = ang_vel_scaled
self.locomotion_obs[3:6] = gravity_orientation
self.locomotion_obs[6:9] = self.locomotion_cmd * CMD_SCALE * MAX_CMD
self.locomotion_obs[9:21] = qj_obs
self.locomotion_obs[21:33] = dqj_obs
self.locomotion_obs[33:45] = self.locomotion_action
self.locomotion_obs[45] = sin_phase
self.locomotion_obs[46] = cos_phase
# Run policy inference (TorchScript)
obs_tensor = torch.from_numpy(self.locomotion_obs).unsqueeze(0).float()
with torch.no_grad():
action_tensor = self.policy(obs_tensor)
self.locomotion_action = action_tensor.squeeze().numpy()
# Transform action to target joint positions
target_leg_pos = DEFAULT_LEG_ANGLES + self.locomotion_action * LOCOMOTION_ACTION_SCALE
# Debug logging (first 3 iterations)
if self.counter <= 3:
print(f"\n[Unitree RL Debug #{self.counter}]")
print(f" Phase: {phase:.3f} (sin={sin_phase:.3f}, cos={cos_phase:.3f})")
print(f" Cmd (vx, vy, yaw): ({self.locomotion_cmd[0]:.2f}, {self.locomotion_cmd[1]:.2f}, {self.locomotion_cmd[2]:.2f})")
print(f" Action range: [{self.locomotion_action.min():.3f}, {self.locomotion_action.max():.3f}]")
# Send commands to LEG motors (0-11)
for i, motor_idx in enumerate(LEG_JOINT_INDICES):
self.robot.msg.motor_cmd[motor_idx].q = target_leg_pos[i]
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = LEG_KPS[i]
self.robot.msg.motor_cmd[motor_idx].kd = LEG_KDS[i]
self.robot.msg.motor_cmd[motor_idx].tau = 0
# Hold WAIST motors at zero (12, 13, 14)
for i, motor_idx in enumerate(WAIST_JOINT_INDICES):
self.robot.msg.motor_cmd[motor_idx].q = 0.0
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = WAIST_KPS[i]
self.robot.msg.motor_cmd[motor_idx].kd = WAIST_KDS[i]
self.robot.msg.motor_cmd[motor_idx].tau = 0
# Hold ARM motors at initial position (15-28)
for i, motor_idx in enumerate(ARM_JOINT_INDICES):
self.robot.msg.motor_cmd[motor_idx].q = self.initial_arm_positions[i]
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = ARM_KPS[i]
self.robot.msg.motor_cmd[motor_idx].kd = ARM_KDS[i]
self.robot.msg.motor_cmd[motor_idx].tau = 0
# Send command
self.robot.send_action(self.robot.msg)
def _locomotion_thread_loop(self):
"""Background thread that runs the locomotion policy at specified rate."""
logger.info("Locomotion thread started")
while self.locomotion_running:
start_time = time.time()
try:
self.locomotion_run()
except Exception as e:
logger.error(f"Error in locomotion loop: {e}")
import traceback
traceback.print_exc()
# Sleep to maintain control rate
elapsed = time.time() - start_time
sleep_time = max(0, LOCOMOTION_CONTROL_DT - elapsed)
time.sleep(sleep_time)
logger.info("Locomotion thread stopped")
def start_locomotion_thread(self):
if self.locomotion_running:
logger.warning("Locomotion thread already running")
return
logger.info("Starting locomotion control thread...")
self.locomotion_running = True
self.locomotion_thread = threading.Thread(target=self._locomotion_thread_loop, daemon=True)
self.locomotion_thread.start()
logger.info("Locomotion control thread started!")
def stop_locomotion_thread(self):
if not self.locomotion_running:
return
logger.info("Stopping locomotion control thread...")
self.locomotion_running = False
if self.locomotion_thread:
self.locomotion_thread.join(timeout=2.0)
logger.info("Locomotion control thread stopped")
def reset_robot(self):
"""Move legs to default standing position over 2 seconds (arms are captured and held)."""
logger.info("Moving legs to default position...")
total_time = 2.0
num_step = int(total_time / self.robot.control_dt)
# Get current state
robot_state = self.robot.get_observation()
# Capture initial arm positions (to hold during locomotion)
for i, motor_idx in enumerate(ARM_JOINT_INDICES):
self.initial_arm_positions[i] = robot_state.motor_state[motor_idx].q
logger.info(f"Captured initial arm positions: {self.initial_arm_positions[:4]}...")
# Record current leg positions
init_leg_pos = np.zeros(12, dtype=np.float32)
for i, motor_idx in enumerate(LEG_JOINT_INDICES):
init_leg_pos[i] = robot_state.motor_state[motor_idx].q
# Interpolate legs to default position
for step in range(num_step):
alpha = step / num_step
# Interpolate leg positions
for i, motor_idx in enumerate(LEG_JOINT_INDICES):
target_pos = DEFAULT_LEG_ANGLES[i]
self.robot.msg.motor_cmd[motor_idx].q = (
init_leg_pos[i] * (1 - alpha) + target_pos * alpha
)
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = LEG_KPS[i]
self.robot.msg.motor_cmd[motor_idx].kd = LEG_KDS[i]
self.robot.msg.motor_cmd[motor_idx].tau = 0
# Hold waist at zero
for i, motor_idx in enumerate(WAIST_JOINT_INDICES):
self.robot.msg.motor_cmd[motor_idx].q = 0.0
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = WAIST_KPS[i]
self.robot.msg.motor_cmd[motor_idx].kd = WAIST_KDS[i]
self.robot.msg.motor_cmd[motor_idx].tau = 0
# Hold arms at initial position
for i, motor_idx in enumerate(ARM_JOINT_INDICES):
self.robot.msg.motor_cmd[motor_idx].q = self.initial_arm_positions[i]
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = ARM_KPS[i]
self.robot.msg.motor_cmd[motor_idx].kd = ARM_KDS[i]
self.robot.msg.motor_cmd[motor_idx].tau = 0
self.robot.msg.crc = self.robot.crc.Crc(self.robot.msg)
self.robot.lowcmd_publisher.Write(self.robot.msg)
time.sleep(self.robot.control_dt)
logger.info("Reached default leg position")
# Hold position for 2 seconds
logger.info("Holding default position for 2 seconds...")
hold_time = 2.0
num_hold_steps = int(hold_time / self.robot.control_dt)
for _ in range(num_hold_steps):
# Hold legs at default
for i, motor_idx in enumerate(LEG_JOINT_INDICES):
self.robot.msg.motor_cmd[motor_idx].q = DEFAULT_LEG_ANGLES[i]
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = LEG_KPS[i]
self.robot.msg.motor_cmd[motor_idx].kd = LEG_KDS[i]
self.robot.msg.motor_cmd[motor_idx].tau = 0
# Hold waist at zero
for i, motor_idx in enumerate(WAIST_JOINT_INDICES):
self.robot.msg.motor_cmd[motor_idx].q = 0.0
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = WAIST_KPS[i]
self.robot.msg.motor_cmd[motor_idx].kd = WAIST_KDS[i]
self.robot.msg.motor_cmd[motor_idx].tau = 0
# Hold arms at initial position
for i, motor_idx in enumerate(ARM_JOINT_INDICES):
self.robot.msg.motor_cmd[motor_idx].q = self.initial_arm_positions[i]
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = ARM_KPS[i]
self.robot.msg.motor_cmd[motor_idx].kd = ARM_KDS[i]
self.robot.msg.motor_cmd[motor_idx].tau = 0
self.robot.msg.crc = self.robot.crc.Crc(self.robot.msg)
self.robot.lowcmd_publisher.Write(self.robot.msg)
time.sleep(self.robot.control_dt)
logger.info("Ready to start locomotion!")
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Unitree RL 12-DOF Locomotion Controller for Unitree G1")
parser.add_argument(
"--repo-id",
type=str,
default=DEFAULT_REPO_ID,
help=f"Hugging Face Hub repo ID for policy (default: {DEFAULT_REPO_ID})",
)
parser.add_argument(
"--filename",
type=str,
default="motion.pt",
help="Policy filename (default: motion.pt)",
)
args = parser.parse_args()
# Load policy
policy = load_torchscript_policy(repo_id=args.repo_id, filename=args.filename)
# Initialize robot
config = UnitreeG1Config()
robot = UnitreeG1(config)
# Initialize locomotion controller
locomotion_controller = UnitreeRLLocomotionController(
policy=policy,
robot=robot,
config=config,
)
# Reset robot and start locomotion thread
try:
locomotion_controller.reset_robot()
locomotion_controller.start_locomotion_thread()
# Log status
logger.info("Robot initialized with Unitree RL locomotion policy")
logger.info("Locomotion controller running in background thread")
logger.info("Use remote controller to command velocity:")
logger.info(" Left stick Y: forward/backward")
logger.info(" Left stick X: left/right")
logger.info(" Right stick X: rotate")
logger.info("Press Ctrl+C to stop")
# Keep robot alive
while True:
time.sleep(1.0)
except KeyboardInterrupt:
print("\nStopping locomotion...")
locomotion_controller.stop_locomotion_thread()
print("Done!")

View File

@@ -263,7 +263,6 @@ default.extend-ignore-identifiers-re = [
"ein",
"thw",
"inpt",
"ROBOTIS",
]
# TODO: Uncomment when ready to use

View File

@@ -26,4 +26,4 @@ DEFAULT_OBS_QUEUE_TIMEOUT = 2
SUPPORTED_POLICIES = ["act", "smolvla", "diffusion", "tdmpc", "vqbet", "pi0", "pi05"]
# TODO: Add all other robots
SUPPORTED_ROBOTS = ["so100_follower", "so101_follower", "bi_so100_follower", "omx_follower"]
SUPPORTED_ROBOTS = ["so100_follower", "so101_follower", "bi_so100_follower"]

View File

@@ -54,7 +54,6 @@ from lerobot.robots import ( # noqa: F401
bi_so100_follower,
koch_follower,
make_robot_from_config,
omx_follower,
so100_follower,
so101_follower,
)

View File

@@ -136,40 +136,21 @@ def update_meta_data(
df["_orig_chunk"] = df[orig_chunk_col].copy()
df["_orig_file"] = df[orig_file_col].copy()
# Get mappings for this video key
# Update chunk and file indices to point to destination
df[orig_chunk_col] = video_idx["chunk"]
df[orig_file_col] = video_idx["file"]
# Apply per-source-file timestamp offsets
src_to_offset = video_idx.get("src_to_offset", {})
src_to_dst = video_idx.get("src_to_dst", {})
# Apply per-source-file mappings
if src_to_dst:
# Map each episode to its correct destination file and apply offset
if src_to_offset:
# Apply offset based on original source file
for idx in df.index:
# Convert to Python int to avoid numpy type mismatch in dict lookup
src_key = (int(df.at[idx, "_orig_chunk"]), int(df.at[idx, "_orig_file"]))
# Get destination chunk/file for this source file
dst_chunk, dst_file = src_to_dst.get(src_key, (video_idx["chunk"], video_idx["file"]))
df.at[idx, orig_chunk_col] = dst_chunk
df.at[idx, orig_file_col] = dst_file
# Apply timestamp offset
offset = src_to_offset.get(src_key, 0)
df.at[idx, f"videos/{key}/from_timestamp"] += offset
df.at[idx, f"videos/{key}/to_timestamp"] += offset
elif src_to_offset:
# Fallback: use same destination for all, but apply per-file offsets
df[orig_chunk_col] = video_idx["chunk"]
df[orig_file_col] = video_idx["file"]
for idx in df.index:
# Convert to Python int to avoid numpy type mismatch in dict lookup
src_key = (int(df.at[idx, "_orig_chunk"]), int(df.at[idx, "_orig_file"]))
src_key = (df.at[idx, "_orig_chunk"], df.at[idx, "_orig_file"])
offset = src_to_offset.get(src_key, 0)
df.at[idx, f"videos/{key}/from_timestamp"] += offset
df.at[idx, f"videos/{key}/to_timestamp"] += offset
else:
# Fallback to simple offset (for backward compatibility)
df[orig_chunk_col] = video_idx["chunk"]
df[orig_file_col] = video_idx["file"]
df[f"videos/{key}/from_timestamp"] = (
df[f"videos/{key}/from_timestamp"] + video_idx["latest_duration"]
)
@@ -287,12 +268,6 @@ def aggregate_videos(src_meta, dst_meta, videos_idx, video_files_size_in_mb, chu
videos_idx[key]["episode_duration"] = 0
# Track offset for each source (chunk, file) pair
videos_idx[key]["src_to_offset"] = {}
# Track destination (chunk, file) for each source (chunk, file) pair
videos_idx[key]["src_to_dst"] = {}
# Initialize dst_file_durations if not present
# dst_file_durations tracks duration of each destination file
if "dst_file_durations" not in videos_idx[key]:
videos_idx[key]["dst_file_durations"] = {}
for key, video_idx in videos_idx.items():
unique_chunk_file_pairs = {
@@ -307,13 +282,9 @@ def aggregate_videos(src_meta, dst_meta, videos_idx, video_files_size_in_mb, chu
chunk_idx = video_idx["chunk"]
file_idx = video_idx["file"]
dst_file_durations = video_idx["dst_file_durations"]
current_offset = video_idx["latest_duration"]
for src_chunk_idx, src_file_idx in unique_chunk_file_pairs:
# Convert to Python int to ensure consistent dict keys
src_chunk_idx = int(src_chunk_idx)
src_file_idx = int(src_file_idx)
src_path = src_meta.root / DEFAULT_VIDEO_PATH.format(
video_key=key,
chunk_index=src_chunk_idx,
@@ -327,17 +298,14 @@ def aggregate_videos(src_meta, dst_meta, videos_idx, video_files_size_in_mb, chu
)
src_duration = get_video_duration_in_s(src_path)
dst_key = (chunk_idx, file_idx)
if not dst_path.exists():
# New destination file: offset is 0
videos_idx[key]["src_to_offset"][(src_chunk_idx, src_file_idx)] = 0
videos_idx[key]["src_to_dst"][(src_chunk_idx, src_file_idx)] = dst_key
# Store offset before incrementing
videos_idx[key]["src_to_offset"][(src_chunk_idx, src_file_idx)] = current_offset
dst_path.parent.mkdir(parents=True, exist_ok=True)
shutil.copy(str(src_path), str(dst_path))
# Track duration of this destination file
dst_file_durations[dst_key] = src_duration
videos_idx[key]["episode_duration"] += src_duration
current_offset += src_duration
continue
# Check file sizes before appending
@@ -345,11 +313,10 @@ def aggregate_videos(src_meta, dst_meta, videos_idx, video_files_size_in_mb, chu
dst_size = get_file_size_in_mb(dst_path)
if dst_size + src_size >= video_files_size_in_mb:
# Rotate to a new file - offset is 0
chunk_idx, file_idx = update_chunk_file_indices(chunk_idx, file_idx, chunk_size)
dst_key = (chunk_idx, file_idx)
# Rotate to a new file, this source becomes start of new destination
# So its offset should be 0
videos_idx[key]["src_to_offset"][(src_chunk_idx, src_file_idx)] = 0
videos_idx[key]["src_to_dst"][(src_chunk_idx, src_file_idx)] = dst_key
chunk_idx, file_idx = update_chunk_file_indices(chunk_idx, file_idx, chunk_size)
dst_path = dst_meta.root / DEFAULT_VIDEO_PATH.format(
video_key=key,
chunk_index=chunk_idx,
@@ -357,20 +324,16 @@ def aggregate_videos(src_meta, dst_meta, videos_idx, video_files_size_in_mb, chu
)
dst_path.parent.mkdir(parents=True, exist_ok=True)
shutil.copy(str(src_path), str(dst_path))
# Track duration of this new destination file
dst_file_durations[dst_key] = src_duration
# Reset offset for next file
current_offset = src_duration
else:
# Append to existing destination file
# Offset is the current duration of this destination file
current_dst_duration = dst_file_durations.get(dst_key, 0)
videos_idx[key]["src_to_offset"][(src_chunk_idx, src_file_idx)] = current_dst_duration
videos_idx[key]["src_to_dst"][(src_chunk_idx, src_file_idx)] = dst_key
# Append to existing video file - use current accumulated offset
videos_idx[key]["src_to_offset"][(src_chunk_idx, src_file_idx)] = current_offset
concatenate_video_files(
[dst_path, src_path],
dst_path,
)
# Update duration of this destination file
dst_file_durations[dst_key] = current_dst_duration + src_duration
current_offset += src_duration
videos_idx[key]["episode_duration"] += src_duration

View File

@@ -23,8 +23,6 @@ from lerobot.optim.schedulers import CosineDecayWithWarmupSchedulerConfig
from lerobot.policies.rtc.configuration_rtc import RTCConfig
from lerobot.utils.constants import OBS_IMAGES
DEFAULT_IMAGE_SIZE = 224
@PreTrainedConfig.register_subclass("pi0")
@dataclass
@@ -53,10 +51,7 @@ class PI0Config(PreTrainedConfig):
# Real-Time Chunking (RTC) configuration
rtc_config: RTCConfig | None = None
image_resolution: tuple[int, int] = (
DEFAULT_IMAGE_SIZE,
DEFAULT_IMAGE_SIZE,
) # see openpi `preprocessing_pytorch.py`
image_resolution: tuple[int, int] = (224, 224) # see openpi `preprocessing_pytorch.py`
# Add empty images. Used to add empty cameras when no image features are present.
empty_cameras: int = 0

View File

@@ -41,7 +41,7 @@ else:
PaliGemmaForConditionalGeneration = None
from lerobot.configs.policies import PreTrainedConfig
from lerobot.policies.pi0.configuration_pi0 import DEFAULT_IMAGE_SIZE, PI0Config
from lerobot.policies.pi0.configuration_pi0 import PI0Config
from lerobot.policies.pretrained import PreTrainedPolicy, T
from lerobot.policies.rtc.modeling_rtc import RTCProcessor
from lerobot.utils.constants import (
@@ -337,7 +337,6 @@ class PaliGemmaWithExpertModel(
action_expert_config,
use_adarms=None,
precision: Literal["bfloat16", "float32"] = "bfloat16",
image_size: int = DEFAULT_IMAGE_SIZE,
):
if use_adarms is None:
use_adarms = [False, False]
@@ -357,7 +356,6 @@ class PaliGemmaWithExpertModel(
vlm_config_hf.text_config.vocab_size = 257152
vlm_config_hf.text_config.use_adarms = use_adarms[0]
vlm_config_hf.text_config.adarms_cond_dim = vlm_config.width if use_adarms[0] else None
vlm_config_hf.vision_config.image_size = image_size
vlm_config_hf.vision_config.intermediate_size = 4304
vlm_config_hf.vision_config.projection_dim = 2048
vlm_config_hf.vision_config.projector_hidden_act = "gelu_fast"
@@ -521,17 +519,11 @@ class PI0Pytorch(nn.Module): # see openpi `PI0Pytorch`
paligemma_config = get_gemma_config(config.paligemma_variant)
action_expert_config = get_gemma_config(config.action_expert_variant)
if config.image_resolution[0] != config.image_resolution[1]:
raise ValueError(
f"PaliGemma expects square image resolution, invalid resolution: {config.image_resolution}"
)
self.paligemma_with_expert = PaliGemmaWithExpertModel(
paligemma_config,
action_expert_config,
use_adarms=[False, False],
precision=config.dtype,
image_size=config.image_resolution[0],
)
self.action_in_proj = nn.Linear(config.max_action_dim, action_expert_config.width)
@@ -820,13 +812,16 @@ class PI0Pytorch(nn.Module): # see openpi `PI0Pytorch`
)
dt = -1.0 / num_steps
dt = torch.tensor(dt, dtype=torch.float32, device=device)
x_t = noise
for step in range(num_steps):
time = 1.0 + step * dt
time_tensor = torch.tensor(time, dtype=torch.float32, device=device).expand(bsize)
time = torch.tensor(1.0, dtype=torch.float32, device=device)
while time >= -dt / 2:
expanded_time = time.expand(bsize)
def denoise_step_partial_call(input_x_t, current_timestep=time_tensor):
# Define a closure function to properly capture expanded_time
# This avoids the lambda expression (E731) and loop variable binding (B023) issues
def denoise_step_partial_call(input_x_t, current_timestep=expanded_time):
return self.denoise_step(
state=state,
prefix_pad_masks=prefix_pad_masks,
@@ -851,11 +846,15 @@ class PI0Pytorch(nn.Module): # see openpi `PI0Pytorch`
else:
v_t = denoise_step_partial_call(x_t)
x_t = x_t + dt * v_t
# Euler step
x_t += dt * v_t
# Record x_t and v_t after Euler step
if self.rtc_processor is not None and self.rtc_processor.is_debug_enabled():
self.rtc_processor.track(time=time, x_t=x_t, v_t=v_t)
time += dt
return x_t
def denoise_step(

View File

@@ -22,8 +22,6 @@ from lerobot.optim.optimizers import AdamWConfig
from lerobot.optim.schedulers import CosineDecayWithWarmupSchedulerConfig
from lerobot.policies.rtc.configuration_rtc import RTCConfig
DEFAULT_IMAGE_SIZE = 224
@PreTrainedConfig.register_subclass("pi05")
@dataclass
@@ -52,10 +50,7 @@ class PI05Config(PreTrainedConfig):
# Real-Time Chunking (RTC) configuration
rtc_config: RTCConfig | None = None
image_resolution: tuple[int, int] = (
DEFAULT_IMAGE_SIZE,
DEFAULT_IMAGE_SIZE,
) # see openpi `preprocessing_pytorch.py`
image_resolution: tuple[int, int] = (224, 224) # see openpi `preprocessing_pytorch.py`
# Add empty images. Used to add empty cameras when no image features are present.
empty_cameras: int = 0

View File

@@ -41,7 +41,7 @@ else:
PaliGemmaForConditionalGeneration = None
from lerobot.configs.policies import PreTrainedConfig
from lerobot.policies.pi05.configuration_pi05 import DEFAULT_IMAGE_SIZE, PI05Config
from lerobot.policies.pi05.configuration_pi05 import PI05Config
from lerobot.policies.pretrained import PreTrainedPolicy, T
from lerobot.policies.rtc.modeling_rtc import RTCProcessor
from lerobot.utils.constants import (
@@ -336,7 +336,6 @@ class PaliGemmaWithExpertModel(
action_expert_config,
use_adarms=None,
precision: Literal["bfloat16", "float32"] = "bfloat16",
image_size: int = DEFAULT_IMAGE_SIZE,
):
if use_adarms is None:
use_adarms = [False, False]
@@ -356,7 +355,6 @@ class PaliGemmaWithExpertModel(
vlm_config_hf.text_config.vocab_size = 257152
vlm_config_hf.text_config.use_adarms = use_adarms[0]
vlm_config_hf.text_config.adarms_cond_dim = vlm_config.width if use_adarms[0] else None
vlm_config_hf.vision_config.image_size = image_size
vlm_config_hf.vision_config.intermediate_size = 4304
vlm_config_hf.vision_config.projection_dim = 2048
vlm_config_hf.vision_config.projector_hidden_act = "gelu_fast"
@@ -520,17 +518,11 @@ class PI05Pytorch(nn.Module): # see openpi `PI0Pytorch`
paligemma_config = get_gemma_config(config.paligemma_variant)
action_expert_config = get_gemma_config(config.action_expert_variant)
if config.image_resolution[0] != config.image_resolution[1]:
raise ValueError(
f"PaliGemma expects square image resolution, invalid resolution: {config.image_resolution}"
)
self.paligemma_with_expert = PaliGemmaWithExpertModel(
paligemma_config,
action_expert_config,
use_adarms=[False, True],
precision=config.dtype,
image_size=config.image_resolution[0],
)
self.action_in_proj = nn.Linear(config.max_action_dim, action_expert_config.width)
@@ -795,13 +787,16 @@ class PI05Pytorch(nn.Module): # see openpi `PI0Pytorch`
)
dt = -1.0 / num_steps
dt = torch.tensor(dt, dtype=torch.float32, device=device)
x_t = noise
for step in range(num_steps):
time = 1.0 + step * dt
time_tensor = torch.tensor(time, dtype=torch.float32, device=device).expand(bsize)
time = torch.tensor(1.0, dtype=torch.float32, device=device)
while time >= -dt / 2:
expanded_time = time.expand(bsize)
def denoise_step_partial_call(input_x_t, current_timestep=time_tensor):
# Define a closure function to properly capture expanded_time
# This avoids the lambda expression (E731) and loop variable binding (B023) issues
def denoise_step_partial_call(input_x_t, current_timestep=expanded_time):
return self.denoise_step(
prefix_pad_masks=prefix_pad_masks,
past_key_values=past_key_values,
@@ -825,11 +820,15 @@ class PI05Pytorch(nn.Module): # see openpi `PI0Pytorch`
else:
v_t = denoise_step_partial_call(x_t)
x_t = x_t + dt * v_t
# Euler step
x_t += dt * v_t
# Record x_t and v_t after Euler step
if self.rtc_processor is not None and self.rtc_processor.is_debug_enabled():
self.rtc_processor.track(time=time, x_t=x_t, v_t=v_t)
time += dt
return x_t
def denoise_step(

View File

@@ -783,15 +783,18 @@ class VLAFlowMatching(nn.Module):
use_cache=self.config.use_cache,
fill_kv_cache=True,
)
num_steps = self.config.num_steps
dt = -1.0 / num_steps
dt = -1.0 / self.config.num_steps
dt = torch.tensor(dt, dtype=torch.float32, device=device)
x_t = noise
for step in range(num_steps):
time = 1.0 + step * dt
time_tensor = torch.tensor(time, dtype=torch.float32, device=device).expand(bsize)
time = torch.tensor(1.0, dtype=torch.float32, device=device)
def denoise_step_partial_call(input_x_t, current_timestep=time_tensor):
while time >= -dt / 2:
expanded_time = time.expand(bsize)
# Define a closure function to properly capture expanded_time
# This avoids the lambda expression (E731) and loop variable binding (B023) issues
def denoise_step_partial_call(input_x_t, current_timestep=expanded_time):
return self.denoise_step(
x_t=input_x_t,
prefix_pad_masks=prefix_pad_masks,
@@ -815,11 +818,15 @@ class VLAFlowMatching(nn.Module):
else:
v_t = denoise_step_partial_call(x_t)
x_t = x_t + dt * v_t
# Euler step
x_t += dt * v_t
# Record x_t and v_t after Euler step (other params are recorded in rtc_processor.denoise_step)
if self.rtc_processor is not None and self.rtc_processor.is_debug_enabled():
self.rtc_processor.track(time=time, x_t=x_t, v_t=v_t)
time += dt
return x_t
def denoise_step(

View File

@@ -1,21 +0,0 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# OMX is a fully open-source robot from ROBOTIS.
# More information at: https://ai.robotis.com/omx/introduction_omx.html
from .config_omx_follower import OmxFollowerConfig
from .omx_follower import OmxFollower

View File

@@ -1,39 +0,0 @@
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass, field
from lerobot.cameras import CameraConfig
from ..config import RobotConfig
@RobotConfig.register_subclass("omx_follower")
@dataclass
class OmxFollowerConfig(RobotConfig):
# Port to connect to the arm
port: str
disable_torque_on_disconnect: bool = True
# `max_relative_target` limits the magnitude of the relative positional target vector for safety purposes.
# Set this to a positive scalar to have the same value for all motors, or a dictionary that maps motor
# names to the max_relative_target value for that motor.
max_relative_target: float | dict[str, float] | None = None
# cameras
cameras: dict[str, CameraConfig] = field(default_factory=dict)
# Set to `True` for backward compatibility with previous policies/dataset
use_degrees: bool = False

View File

@@ -1,225 +0,0 @@
#!/usr/bin/env python
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import time
from functools import cached_property
from typing import Any
from lerobot.cameras.utils import make_cameras_from_configs
from lerobot.motors import Motor, MotorCalibration, MotorNormMode
from lerobot.motors.dynamixel import (
DriveMode,
DynamixelMotorsBus,
OperatingMode,
)
from lerobot.utils.errors import DeviceAlreadyConnectedError, DeviceNotConnectedError
from ..robot import Robot
from ..utils import ensure_safe_goal_position
from .config_omx_follower import OmxFollowerConfig
logger = logging.getLogger(__name__)
class OmxFollower(Robot):
"""
- [OMX](https://github.com/ROBOTIS-GIT/open_manipulator),
expansion, developed by Woojin Wie and Junha Cha from [ROBOTIS](https://ai.robotis.com/)
"""
config_class = OmxFollowerConfig
name = "omx_follower"
def __init__(self, config: OmxFollowerConfig):
super().__init__(config)
self.config = config
norm_mode_body = MotorNormMode.DEGREES if config.use_degrees else MotorNormMode.RANGE_M100_100
self.bus = DynamixelMotorsBus(
port=self.config.port,
motors={
"shoulder_pan": Motor(11, "xl430-w250", norm_mode_body),
"shoulder_lift": Motor(12, "xl430-w250", norm_mode_body),
"elbow_flex": Motor(13, "xl430-w250", norm_mode_body),
"wrist_flex": Motor(14, "xl330-m288", norm_mode_body),
"wrist_roll": Motor(15, "xl330-m288", norm_mode_body),
"gripper": Motor(16, "xl330-m288", MotorNormMode.RANGE_0_100),
},
calibration=self.calibration,
)
self.cameras = make_cameras_from_configs(config.cameras)
@property
def _motors_ft(self) -> dict[str, type]:
return {f"{motor}.pos": float for motor in self.bus.motors}
@property
def _cameras_ft(self) -> dict[str, tuple]:
return {
cam: (self.config.cameras[cam].height, self.config.cameras[cam].width, 3) for cam in self.cameras
}
@cached_property
def observation_features(self) -> dict[str, type | tuple]:
return {**self._motors_ft, **self._cameras_ft}
@cached_property
def action_features(self) -> dict[str, type]:
return self._motors_ft
@property
def is_connected(self) -> bool:
return self.bus.is_connected and all(cam.is_connected for cam in self.cameras.values())
def connect(self, calibrate: bool = True) -> None:
"""
For OMX robots that come pre-calibrated:
- If default calibration from package doesn't match motors, read from motors and save
- This allows using pre-calibrated robots without manual calibration
- If no calibration file exists, use factory default values (homing_offset=0, range_min=0, range_max=4095)
"""
if self.is_connected:
raise DeviceAlreadyConnectedError(f"{self} already connected")
self.bus.connect()
if not self.is_calibrated and calibrate:
logger.info(
"Mismatch between calibration values in the motor and the calibration file or no calibration file found"
)
self.calibrate()
for cam in self.cameras.values():
cam.connect()
self.configure()
logger.info(f"{self} connected.")
@property
def is_calibrated(self) -> bool:
return self.bus.is_calibrated
def calibrate(self) -> None:
self.bus.disable_torque()
logger.info(f"\nUsing factory default calibration values for {self}")
logger.info(f"\nWriting default configuration of {self} to the motors")
for motor in self.bus.motors:
self.bus.write("Operating_Mode", motor, OperatingMode.EXTENDED_POSITION.value)
for motor in self.bus.motors:
self.bus.write("Drive_Mode", motor, DriveMode.NON_INVERTED.value)
self.calibration = {}
for motor, m in self.bus.motors.items():
self.calibration[motor] = MotorCalibration(
id=m.id,
drive_mode=0,
homing_offset=0,
range_min=0,
range_max=4095,
)
self.bus.write_calibration(self.calibration)
self._save_calibration()
logger.info(f"Calibration saved to {self.calibration_fpath}")
def configure(self) -> None:
with self.bus.torque_disabled():
self.bus.configure_motors()
# Use 'extended position mode' for all motors except gripper, because in joint mode the servos
# can't rotate more than 360 degrees (from 0 to 4095) And some mistake can happen while assembling
# the arm, you could end up with a servo with a position 0 or 4095 at a crucial point
for motor in self.bus.motors:
if motor != "gripper":
self.bus.write("Operating_Mode", motor, OperatingMode.EXTENDED_POSITION.value)
# Use 'position control current based' for gripper to be limited by the limit of the current. For
# the follower gripper, it means it can grasp an object without forcing too much even tho, its
# goal position is a complete grasp (both gripper fingers are ordered to join and reach a touch).
# For the leader gripper, it means we can use it as a physical trigger, since we can force with
# our finger to make it move, and it will move back to its original target position when we
# release the force.
self.bus.write("Operating_Mode", "gripper", OperatingMode.CURRENT_POSITION.value)
# Set better PID values to close the gap between recorded states and actions
# TODO(rcadene): Implement an automatic procedure to set optimal PID values for each motor
self.bus.write("Position_P_Gain", "elbow_flex", 1500)
self.bus.write("Position_I_Gain", "elbow_flex", 0)
self.bus.write("Position_D_Gain", "elbow_flex", 600)
def setup_motors(self) -> None:
for motor in reversed(self.bus.motors):
input(f"Connect the controller board to the '{motor}' motor only and press enter.")
self.bus.setup_motor(motor)
print(f"'{motor}' motor id set to {self.bus.motors[motor].id}")
def get_observation(self) -> dict[str, Any]:
if not self.is_connected:
raise DeviceNotConnectedError(f"{self} is not connected.")
# Read arm position
start = time.perf_counter()
obs_dict = self.bus.sync_read("Present_Position")
obs_dict = {f"{motor}.pos": val for motor, val in obs_dict.items()}
dt_ms = (time.perf_counter() - start) * 1e3
logger.debug(f"{self} read state: {dt_ms:.1f}ms")
# Capture images from cameras
for cam_key, cam in self.cameras.items():
start = time.perf_counter()
obs_dict[cam_key] = cam.async_read()
dt_ms = (time.perf_counter() - start) * 1e3
logger.debug(f"{self} read {cam_key}: {dt_ms:.1f}ms")
return obs_dict
def send_action(self, action: dict[str, float]) -> dict[str, float]:
"""Command arm to move to a target joint configuration.
The relative action magnitude may be clipped depending on the configuration parameter
`max_relative_target`. In this case, the action sent differs from original action.
Thus, this function always returns the action actually sent.
Args:
action (dict[str, float]): The goal positions for the motors.
Returns:
dict[str, float]: The action sent to the motors, potentially clipped.
"""
if not self.is_connected:
raise DeviceNotConnectedError(f"{self} is not connected.")
goal_pos = {key.removesuffix(".pos"): val for key, val in action.items() if key.endswith(".pos")}
# Cap goal position when too far away from present position.
# /!\ Slower fps expected due to reading from the follower.
if self.config.max_relative_target is not None:
present_pos = self.bus.sync_read("Present_Position")
goal_present_pos = {key: (g_pos, present_pos[key]) for key, g_pos in goal_pos.items()}
goal_pos = ensure_safe_goal_position(goal_present_pos, self.config.max_relative_target)
# Send goal position to the arm
self.bus.sync_write("Goal_Position", goal_pos)
return {f"{motor}.pos": val for motor, val in goal_pos.items()}
def disconnect(self):
if not self.is_connected:
raise DeviceNotConnectedError(f"{self} is not connected.")
self.bus.disconnect(self.config.disable_torque_on_disconnect)
for cam in self.cameras.values():
cam.disconnect()
logger.info(f"{self} disconnected.")

View File

@@ -16,8 +16,6 @@
from dataclasses import dataclass, field
from lerobot.cameras import CameraConfig
from ..config import RobotConfig
_GAINS: dict[str, dict[str, list[float]]] = {
@@ -53,11 +51,5 @@ class UnitreeG1Config(RobotConfig):
control_dt: float = 1.0 / 250.0 # 250Hz
# launch mujoco simulation
is_simulation: bool = False
# socket config for ZMQ bridge
robot_ip: str = "172.18.129.215"
# cameras (optional)
cameras: dict[str, CameraConfig] = field(default_factory=dict)
robot_ip: str = "192.168.123.164"

View File

@@ -1,302 +0,0 @@
#!/usr/bin/env python
"""
Standalone keyboard control script for Unitree G1 robot.
This script provides keyboard-based velocity control for the G1 robot's
locomotion system. It can be run alongside the main robot control to
provide manual movement commands.
Usage:
python keyboard_control.py [--robot-ip IP] [--simulation]
Controls:
W/S: Forward/Backward
A/D: Strafe Left/Right
Q/E: Rotate Left/Right
R/F: Raise/Lower Height (GR00T policies only)
Z: Stop (zero all velocity commands)
ESC/Ctrl+C: Exit
"""
import argparse
import sys
import select
import time
import numpy as np
# Terminal handling for non-blocking keyboard input
try:
import termios
import tty
HAS_TERMIOS = True
except ImportError:
HAS_TERMIOS = False
print("Warning: termios not available. Keyboard controls require Linux/macOS.")
class KeyboardController:
"""Handles keyboard input and converts to locomotion commands."""
def __init__(self, callback=None):
"""
Initialize keyboard controller.
Args:
callback: Optional function called when commands change.
Signature: callback(vx, vy, yaw, height)
"""
self.callback = callback
self.running = False
# Locomotion commands
self.vx = 0.0 # Forward/backward velocity
self.vy = 0.0 # Left/right velocity (strafe)
self.yaw = 0.0 # Rotation rate
self.height = 0.74 # Base height (for GR00T policies)
# Command limits
self.vx_limit = (-0.8, 0.8)
self.vy_limit = (-0.5, 0.5)
self.yaw_limit = (-1.0, 1.0)
self.height_limit = (0.50, 1.00)
# Increments per keypress
self.vx_increment = 0.4
self.vy_increment = 0.25
self.yaw_increment = 0.5
self.height_increment = 0.05
self._old_terminal_settings = None
def get_commands(self) -> tuple[float, float, float, float]:
"""Get current command values as tuple (vx, vy, yaw, height)."""
return (self.vx, self.vy, self.yaw, self.height)
def get_commands_array(self) -> np.ndarray:
"""Get velocity commands as numpy array [vx, vy, yaw]."""
return np.array([self.vx, self.vy, self.yaw], dtype=np.float32)
def reset_commands(self):
"""Reset all commands to zero (stop)."""
self.vx = 0.0
self.vy = 0.0
self.yaw = 0.0
self._notify_callback()
def _clamp(self, value: float, limits: tuple[float, float]) -> float:
"""Clamp value to limits."""
return max(limits[0], min(limits[1], value))
def _notify_callback(self):
"""Call callback with current commands if set."""
if self.callback:
self.callback(self.vx, self.vy, self.yaw, self.height)
def process_key(self, key: str) -> bool:
"""
Process a single key press and update commands.
Args:
key: Single character key that was pressed.
Returns:
True if key was handled, False otherwise.
"""
key = key.lower()
handled = True
if key == 'w':
self.vx = self._clamp(self.vx + self.vx_increment, self.vx_limit)
elif key == 's':
self.vx = self._clamp(self.vx - self.vx_increment, self.vx_limit)
elif key == 'a':
self.vy = self._clamp(self.vy + self.vy_increment, self.vy_limit)
elif key == 'd':
self.vy = self._clamp(self.vy - self.vy_increment, self.vy_limit)
elif key == 'q':
self.yaw = self._clamp(self.yaw + self.yaw_increment, self.yaw_limit)
elif key == 'e':
self.yaw = self._clamp(self.yaw - self.yaw_increment, self.yaw_limit)
elif key == 'r':
self.height = self._clamp(self.height + self.height_increment, self.height_limit)
elif key == 'f':
self.height = self._clamp(self.height - self.height_increment, self.height_limit)
elif key == 'z':
self.reset_commands()
return True # Already notified in reset_commands
else:
handled = False
if handled:
self._notify_callback()
return handled
def _setup_terminal(self):
"""Set terminal to raw mode for single character input."""
if HAS_TERMIOS:
self._old_terminal_settings = termios.tcgetattr(sys.stdin)
tty.setcbreak(sys.stdin.fileno())
def _restore_terminal(self):
"""Restore terminal to original settings."""
if HAS_TERMIOS and self._old_terminal_settings is not None:
termios.tcsetattr(sys.stdin, termios.TCSADRAIN, self._old_terminal_settings)
self._old_terminal_settings = None
def run(self):
"""Run the keyboard listener loop (blocking)."""
if not HAS_TERMIOS:
print("Error: Keyboard controls require termios (Linux/macOS)")
return
self.running = True
self._print_controls()
try:
self._setup_terminal()
while self.running:
# Check for keyboard input with timeout
if select.select([sys.stdin], [], [], 0.1)[0]:
key = sys.stdin.read(1)
# Handle escape sequences (arrow keys, etc.)
if key == '\x1b': # ESC
self.running = False
break
if self.process_key(key):
self._print_status()
except KeyboardInterrupt:
print("\nInterrupted by user")
finally:
self._restore_terminal()
print("\nKeyboard controls stopped")
def stop(self):
"""Stop the keyboard listener."""
self.running = False
def _print_controls(self):
"""Print control instructions."""
print("\n" + "=" * 60)
print("KEYBOARD CONTROLS ACTIVE")
print("=" * 60)
print(" W/S: Forward/Backward")
print(" A/D: Strafe Left/Right")
print(" Q/E: Rotate Left/Right")
print(" R/F: Raise/Lower Height (±5cm)")
print(" Z: Stop (zero all commands)")
print(" ESC: Exit")
print("=" * 60 + "\n")
def _print_status(self):
"""Print current command status."""
print(f"[CMD] vx={self.vx:+.2f}, vy={self.vy:+.2f}, yaw={self.yaw:+.2f} | height={self.height:.3f}m")
class RobotKeyboardController(KeyboardController):
"""Keyboard controller that directly updates a robot's locomotion commands."""
def __init__(self, robot):
"""
Initialize with a UnitreeG1 robot instance.
Args:
robot: UnitreeG1 robot instance with locomotion_cmd attribute.
"""
super().__init__()
self.robot = robot
# Initialize from robot's current state if available
if hasattr(robot, 'locomotion_cmd'):
self.vx = robot.locomotion_cmd[0]
self.vy = robot.locomotion_cmd[1]
self.yaw = robot.locomotion_cmd[2]
if hasattr(robot, 'groot_height_cmd'):
self.height = robot.groot_height_cmd
def _notify_callback(self):
"""Update robot's locomotion commands directly."""
if hasattr(self.robot, 'locomotion_cmd'):
self.robot.locomotion_cmd[0] = self.vx
self.robot.locomotion_cmd[1] = self.vy
self.robot.locomotion_cmd[2] = self.yaw
if hasattr(self.robot, 'groot_height_cmd'):
self.robot.groot_height_cmd = self.height
def start_keyboard_control_thread(robot) -> tuple:
"""
Start keyboard controls for a robot in a background thread.
Args:
robot: UnitreeG1 robot instance.
Returns:
Tuple of (controller, thread) for later stopping.
"""
import threading
controller = RobotKeyboardController(robot)
thread = threading.Thread(target=controller.run, daemon=True)
thread.start()
return controller, thread
def stop_keyboard_control_thread(controller, thread, timeout: float = 2.0):
"""
Stop the keyboard control thread.
Args:
controller: KeyboardController instance.
thread: Thread running the controller.
timeout: Max time to wait for thread to stop.
"""
controller.stop()
thread.join(timeout=timeout)
def main():
"""Standalone keyboard control with optional robot connection."""
parser = argparse.ArgumentParser(description="Keyboard control for Unitree G1")
parser.add_argument("--standalone", action="store_true",
help="Run in standalone mode (just print commands, no robot)")
args = parser.parse_args()
if args.standalone:
# Standalone mode - just demonstrate keyboard input
def print_callback(vx, vy, yaw, height):
print(f" → Would send: vx={vx:+.2f}, vy={vy:+.2f}, yaw={yaw:+.2f}, height={height:.3f}")
controller = KeyboardController(callback=print_callback)
print("Running in STANDALONE mode (no robot connection)")
controller.run()
else:
print("To use with a robot, import and use RobotKeyboardController:")
print("")
print(" from lerobot.robots.unitree_g1.keyboard_control import (")
print(" RobotKeyboardController,")
print(" start_keyboard_control_thread,")
print(" stop_keyboard_control_thread")
print(" )")
print("")
print(" # Start keyboard controls")
print(" controller, thread = start_keyboard_control_thread(robot)")
print("")
print(" # ... robot runs ...")
print("")
print(" # Stop keyboard controls")
print(" stop_keyboard_control_thread(controller, thread)")
print("")
print("Or run with --standalone to test keyboard input without a robot.")
if __name__ == "__main__":
main()

View File

@@ -99,12 +99,11 @@ def state_forward_loop(
lowstate_sub: ChannelSubscriber,
lowstate_sock: zmq.Socket,
state_period: float,
shutdown_event: threading.Event,
) -> None:
"""Read observation from DDS and forward to ZMQ clients."""
last_state_time = 0.0
while not shutdown_event.is_set():
while True:
# read from DDS
msg = lowstate_sub.Read()
if msg is None:
@@ -129,10 +128,7 @@ def cmd_forward_loop(
) -> None:
"""Receive commands from ZMQ and forward to DDS."""
while True:
try:
payload = lowcmd_sock.recv()
except zmq.ContextTerminated:
break
payload = lowcmd_sock.recv()
msg_dict = json.loads(payload.decode("utf-8"))
topic = msg_dict.get("topic", "")
@@ -186,26 +182,30 @@ def main() -> None:
lowstate_sock.bind(f"tcp://0.0.0.0:{LOWSTATE_PORT}")
state_period = 0.002 # ~500 hz
shutdown_event = threading.Event()
# start observation forwarding in background thread
# start observation forwarding thread
t_state = threading.Thread(
target=state_forward_loop,
args=(lowstate_sub, lowstate_sock, state_period, shutdown_event),
args=(lowstate_sub, lowstate_sock, state_period),
daemon=True,
)
t_state.start()
print("bridge running (lowstate -> zmq, lowcmd -> dds)")
# start action forwarding thread
t_cmd = threading.Thread(
target=cmd_forward_loop,
args=(lowcmd_sock, lowcmd_pub_debug, crc),
daemon=True,
)
t_cmd.start()
# run command forwarding in main thread
print("bridge running (lowstate -> zmq, lowcmd -> dds)")
# keep main thread alive so daemon threads don't exit
try:
cmd_forward_loop(lowcmd_sock, lowcmd_pub_debug, crc)
while True:
time.sleep(1.0)
except KeyboardInterrupt:
print("shutting down bridge...")
finally:
shutdown_event.set()
ctx.term() # terminates blocking zmq.recv() calls
t_state.join(timeout=2.0)
if __name__ == "__main__":

View File

@@ -30,8 +30,12 @@ from unitree_sdk2py.idl.unitree_hg.msg.dds_ import (
)
from unitree_sdk2py.utils.crc import CRC
from lerobot.envs.factory import make_env
from lerobot.robots.unitree_g1.g1_utils import G1_29_JointIndex
from lerobot.robots.unitree_g1.unitree_sdk2_socket import (
ChannelFactoryInitialize,
ChannelPublisher,
ChannelSubscriber,
)
from ..robot import Robot
from .config_unitree_g1 import UnitreeG1Config
@@ -123,21 +127,7 @@ class UnitreeG1(Robot):
self.control_dt = config.control_dt
if config.is_simulation:
from unitree_sdk2py.core.channel import (
ChannelFactoryInitialize,
ChannelPublisher,
ChannelSubscriber,
)
else:
from lerobot.robots.unitree_g1.unitree_sdk2_socket import (
ChannelFactoryInitialize,
ChannelPublisher,
ChannelSubscriber,
)
# connect robot
self.ChannelFactoryInitialize = ChannelFactoryInitialize
self.connect()
# initialize direct motor control interface
@@ -148,8 +138,8 @@ class UnitreeG1(Robot):
self.lowstate_buffer = DataBuffer()
# initialize subscribe thread to read robot state
self._shutdown_event = threading.Event()
self.subscribe_thread = threading.Thread(target=self._subscribe_motor_state)
self.subscribe_thread.daemon = True
self.subscribe_thread.start()
while not self.is_connected:
@@ -184,7 +174,7 @@ class UnitreeG1(Robot):
self.remote_controller = self.RemoteController()
def _subscribe_motor_state(self): # polls robot state @ 250Hz
while not self._shutdown_event.is_set():
while True:
start_time = time.time()
msg = self.lowstate_subscriber.Read()
if msg is not None:
@@ -228,17 +218,10 @@ class UnitreeG1(Robot):
pass
def connect(self, calibrate: bool = True) -> None: # connect to DDS
if self.config.is_simulation:
self.ChannelFactoryInitialize(0, "lo")
self.mujoco_env = make_env("lerobot/unitree-g1-mujoco", trust_remote_code=True)
else:
self.ChannelFactoryInitialize(0)
ChannelFactoryInitialize(0)
def disconnect(self):
self._shutdown_event.set()
self.subscribe_thread.join(timeout=2.0)
if self.config.is_simulation:
self.mujoco_env["hub_env"][0].envs[0].kill_sim()
pass
def get_observation(self) -> dict[str, Any]:
return self.lowstate_buffer.get_data()

View File

@@ -28,10 +28,6 @@ def make_robot_from_config(config: RobotConfig) -> Robot:
from .koch_follower import KochFollower
return KochFollower(config)
elif config.type == "omx_follower":
from .omx_follower import OmxFollower
return OmxFollower(config)
elif config.type == "so100_follower":
from .so100_follower import SO100Follower

View File

@@ -40,7 +40,6 @@ from lerobot.robots import ( # noqa: F401
koch_follower,
lekiwi,
make_robot_from_config,
omx_follower,
so100_follower,
so101_follower,
)
@@ -50,7 +49,6 @@ from lerobot.teleoperators import ( # noqa: F401
homunculus,
koch_leader,
make_teleoperator_from_config,
omx_leader,
so100_leader,
so101_leader,
)

View File

@@ -18,8 +18,7 @@
Edit LeRobot datasets using various transformation tools.
This script allows you to delete episodes, split datasets, merge datasets,
remove features, and convert image datasets to video format.
When new_repo_id is specified, creates a new dataset.
and remove features. When new_repo_id is specified, creates a new dataset.
Usage Examples:
@@ -66,25 +65,6 @@ Remove camera feature:
--operation.type remove_feature \
--operation.feature_names "['observation.images.top']"
Convert image dataset to video format (saves locally):
python -m lerobot.scripts.lerobot_edit_dataset \
--repo_id lerobot/pusht_image \
--operation.type convert_to_video \
--operation.output_dir /path/to/output/pusht_video
Convert image dataset and save with new repo_id:
python -m lerobot.scripts.lerobot_edit_dataset \
--repo_id lerobot/pusht_image \
--new_repo_id lerobot/pusht_video \
--operation.type convert_to_video
Convert and push to hub:
python -m lerobot.scripts.lerobot_edit_dataset \
--repo_id lerobot/pusht_image \
--new_repo_id lerobot/pusht_video \
--operation.type convert_to_video \
--push_to_hub true
Using JSON config file:
python -m lerobot.scripts.lerobot_edit_dataset \
--config_path path/to/edit_config.json
@@ -92,13 +72,9 @@ Using JSON config file:
import logging
import shutil
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from pathlib import Path
import pandas as pd
from tqdm import tqdm
from lerobot.configs import parser
from lerobot.datasets.dataset_tools import (
delete_episodes,
@@ -106,10 +82,8 @@ from lerobot.datasets.dataset_tools import (
remove_feature,
split_dataset,
)
from lerobot.datasets.lerobot_dataset import LeRobotDataset, LeRobotDatasetMetadata
from lerobot.datasets.utils import write_stats, write_tasks
from lerobot.datasets.video_utils import encode_video_frames, get_video_info
from lerobot.utils.constants import HF_LEROBOT_HOME, OBS_IMAGE
from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.utils.constants import HF_LEROBOT_HOME
from lerobot.utils.utils import init_logging
@@ -137,23 +111,10 @@ class RemoveFeatureConfig:
feature_names: list[str] | None = None
@dataclass
class ConvertToVideoConfig:
type: str = "convert_to_video"
output_dir: str | None = None
vcodec: str = "libsvtav1"
pix_fmt: str = "yuv420p"
g: int = 2
crf: int = 30
fast_decode: int = 0
episode_indices: list[int] | None = None
num_workers: int = 4
@dataclass
class EditDatasetConfig:
repo_id: str
operation: DeleteEpisodesConfig | SplitConfig | MergeConfig | RemoveFeatureConfig | ConvertToVideoConfig
operation: DeleteEpisodesConfig | SplitConfig | MergeConfig | RemoveFeatureConfig
root: str | None = None
new_repo_id: str | None = None
push_to_hub: bool = False
@@ -297,415 +258,6 @@ def handle_remove_feature(cfg: EditDatasetConfig) -> None:
LeRobotDataset(output_repo_id, root=output_dir).push_to_hub()
def save_episode_images_for_video(
dataset: LeRobotDataset,
imgs_dir: Path,
img_key: str,
episode_index: int,
num_workers: int = 4,
) -> None:
"""Save images from a specific episode and camera to disk for video encoding.
Args:
dataset: The LeRobot dataset to extract images from
imgs_dir: Directory to save images to
img_key: The image key (camera) to extract
episode_index: Index of the episode to save
num_workers: Number of threads for parallel image saving
"""
# Create directory
imgs_dir.mkdir(parents=True, exist_ok=True)
# Get dataset without torch format for PIL image access
hf_dataset = dataset.hf_dataset.with_format(None)
# Select only this camera's images
imgs_dataset = hf_dataset.select_columns(img_key)
# Get episode start and end indices
from_idx = dataset.meta.episodes["dataset_from_index"][episode_index]
to_idx = dataset.meta.episodes["dataset_to_index"][episode_index]
# Get all items for this episode
episode_dataset = imgs_dataset.select(range(from_idx, to_idx))
# Define function to save a single image
def save_single_image(i_item_tuple):
i, item = i_item_tuple
img = item[img_key]
# Use frame-XXXXXX.png format to match encode_video_frames expectations
img.save(str(imgs_dir / f"frame-{i:06d}.png"), quality=100)
return i
# Save images with proper naming convention for encode_video_frames (frame-XXXXXX.png)
items = list(enumerate(episode_dataset))
with ThreadPoolExecutor(max_workers=num_workers) as executor:
futures = [executor.submit(save_single_image, item) for item in items]
for future in as_completed(futures):
future.result() # This will raise any exceptions that occurred
def encode_episode_videos(
dataset: LeRobotDataset,
new_meta: LeRobotDatasetMetadata,
episode_index: int,
vcodec: str,
pix_fmt: str,
g: int,
crf: int,
fast_decode: int,
temp_dir: Path,
num_image_workers: int = 4,
) -> dict[str, dict]:
"""Encode videos for a single episode and return video metadata.
Args:
dataset: Source dataset with images
new_meta: Metadata object for the new video dataset
episode_index: Episode index to process
vcodec: Video codec
pix_fmt: Pixel format
g: Group of pictures size
crf: Constant rate factor
fast_decode: Fast decode tuning
temp_dir: Temporary directory for images
num_image_workers: Number of workers for saving images
Returns:
Dictionary mapping video keys to their metadata (chunk_index, file_index, timestamps)
"""
hf_dataset = dataset.hf_dataset.with_format(None)
img_keys = [key for key in hf_dataset.features if key.startswith(OBS_IMAGE)]
video_metadata = {}
fps = int(dataset.fps) # Convert to int for PyAV compatibility
episode_length = dataset.meta.episodes["length"][episode_index]
episode_duration = episode_length / dataset.fps # Use original fps for duration calculation
for img_key in img_keys:
# Save images temporarily
imgs_dir = temp_dir / f"episode_{episode_index:06d}" / img_key
save_episode_images_for_video(dataset, imgs_dir, img_key, episode_index, num_image_workers)
# Determine chunk and file indices
# For simplicity, we'll put each episode in its own file
chunk_idx = episode_index // new_meta.chunks_size
file_idx = episode_index % new_meta.chunks_size
# Create video path in the new dataset structure
video_path = new_meta.root / new_meta.video_path.format(
video_key=img_key, chunk_index=chunk_idx, file_index=file_idx
)
video_path.parent.mkdir(parents=True, exist_ok=True)
# Encode video
encode_video_frames(
imgs_dir=imgs_dir,
video_path=video_path,
fps=fps,
vcodec=vcodec,
pix_fmt=pix_fmt,
g=g,
crf=crf,
fast_decode=fast_decode,
overwrite=True,
)
# Clean up temporary images
shutil.rmtree(imgs_dir)
# Store video metadata
video_metadata[img_key] = {
f"videos/{img_key}/chunk_index": chunk_idx,
f"videos/{img_key}/file_index": file_idx,
f"videos/{img_key}/from_timestamp": 0.0,
f"videos/{img_key}/to_timestamp": episode_duration,
}
return video_metadata
def convert_dataset_to_videos(
dataset: LeRobotDataset,
output_dir: Path,
repo_id: str | None = None,
vcodec: str = "libsvtav1",
pix_fmt: str = "yuv420p",
g: int = 2,
crf: int = 30,
fast_decode: int = 0,
episode_indices: list[int] | None = None,
num_workers: int = 4,
) -> LeRobotDataset:
"""Convert image-based dataset to video-based dataset.
Creates a new LeRobotDataset with videos instead of images, following the proper
LeRobot dataset structure with videos stored in chunked MP4 files.
Args:
dataset: The source LeRobot dataset with images
output_dir: Directory to save the new video dataset
repo_id: Repository ID for the new dataset (default: original_id + "_video")
vcodec: Video codec (default: libsvtav1)
pix_fmt: Pixel format (default: yuv420p)
g: Group of pictures size (default: 2)
crf: Constant rate factor (default: 30)
fast_decode: Fast decode tuning (default: 0)
episode_indices: List of episode indices to convert (None = all episodes)
num_workers: Number of threads for parallel processing (default: 4)
Returns:
New LeRobotDataset with videos
"""
# Check that it's an image dataset
if len(dataset.meta.video_keys) > 0:
raise ValueError(
f"This operation is for image datasets only. Video dataset provided: {dataset.repo_id}"
)
# Get all image keys
hf_dataset = dataset.hf_dataset.with_format(None)
img_keys = [key for key in hf_dataset.features if key.startswith(OBS_IMAGE)]
if len(img_keys) == 0:
raise ValueError(f"No image keys found in dataset {dataset.repo_id}")
# Determine which episodes to process
if episode_indices is None:
episode_indices = list(range(dataset.meta.total_episodes))
if repo_id is None:
repo_id = f"{dataset.repo_id}_video"
logging.info(
f"Converting {len(episode_indices)} episodes with {len(img_keys)} cameras from {dataset.repo_id}"
)
logging.info(f"Video codec: {vcodec}, pixel format: {pix_fmt}, GOP: {g}, CRF: {crf}")
# Create new features dict, converting image features to video features
new_features = {}
for key, value in dataset.meta.features.items():
if key not in img_keys:
new_features[key] = value
else:
# Convert image key to video format
new_features[key] = value.copy()
new_features[key]["dtype"] = "video" # Change dtype from "image" to "video"
# Video info will be updated after episodes are encoded
# Create new metadata for video dataset
new_meta = LeRobotDatasetMetadata.create(
repo_id=repo_id,
fps=dataset.meta.fps,
features=new_features,
robot_type=dataset.meta.robot_type,
root=output_dir,
use_videos=True,
chunks_size=dataset.meta.chunks_size,
data_files_size_in_mb=dataset.meta.data_files_size_in_mb,
video_files_size_in_mb=dataset.meta.video_files_size_in_mb,
)
# Create temporary directory for image extraction
temp_dir = output_dir / "temp_images"
temp_dir.mkdir(parents=True, exist_ok=True)
# Process each episode
all_episode_metadata = []
try:
for ep_idx in tqdm(episode_indices, desc="Converting episodes to videos"):
# Get episode metadata from source
src_episode = dataset.meta.episodes[ep_idx]
# Encode videos for this episode
video_metadata = encode_episode_videos(
dataset=dataset,
new_meta=new_meta,
episode_index=ep_idx,
vcodec=vcodec,
pix_fmt=pix_fmt,
g=g,
crf=crf,
fast_decode=fast_decode,
temp_dir=temp_dir,
num_image_workers=num_workers,
)
# Build episode metadata
episode_meta = {
"episode_index": ep_idx,
"length": src_episode["length"],
"dataset_from_index": ep_idx * src_episode["length"],
"dataset_to_index": (ep_idx + 1) * src_episode["length"],
}
# Add video metadata
for img_key in img_keys:
episode_meta.update(video_metadata[img_key])
# Add data chunk/file info (using same structure as source)
if "data/chunk_index" in src_episode:
episode_meta["data/chunk_index"] = src_episode["data/chunk_index"]
episode_meta["data/file_index"] = src_episode["data/file_index"]
all_episode_metadata.append(episode_meta)
# Copy and transform data files (removing image columns)
_copy_data_without_images(dataset, new_meta, episode_indices, img_keys)
# Save episode metadata
episodes_df = pd.DataFrame(all_episode_metadata)
episodes_path = new_meta.root / "meta" / "episodes" / "chunk-000" / "file-000.parquet"
episodes_path.parent.mkdir(parents=True, exist_ok=True)
episodes_df.to_parquet(episodes_path, index=False)
# Update metadata info
new_meta.info["total_episodes"] = len(episode_indices)
new_meta.info["total_frames"] = sum(ep["length"] for ep in all_episode_metadata)
new_meta.info["total_tasks"] = dataset.meta.total_tasks
new_meta.info["splits"] = {"train": f"0:{len(episode_indices)}"}
# Update video info for all image keys (now videos)
# We need to manually set video info since update_video_info() checks video_keys first
for img_key in img_keys:
if not new_meta.features[img_key].get("info", None):
video_path = new_meta.root / new_meta.video_path.format(
video_key=img_key, chunk_index=0, file_index=0
)
new_meta.info["features"][img_key]["info"] = get_video_info(video_path)
from lerobot.datasets.utils import write_info
write_info(new_meta.info, new_meta.root)
# Copy stats and tasks
if dataset.meta.stats is not None:
# Remove image stats
new_stats = {k: v for k, v in dataset.meta.stats.items() if k not in img_keys}
write_stats(new_stats, new_meta.root)
if dataset.meta.tasks is not None:
write_tasks(dataset.meta.tasks, new_meta.root)
finally:
# Clean up temporary directory
if temp_dir.exists():
shutil.rmtree(temp_dir)
logging.info(f"✓ Completed converting {dataset.repo_id} to video format")
logging.info(f"New dataset saved to: {output_dir}")
# Return new dataset
return LeRobotDataset(repo_id=repo_id, root=output_dir)
def _copy_data_without_images(
src_dataset: LeRobotDataset,
dst_meta: LeRobotDatasetMetadata,
episode_indices: list[int],
img_keys: list[str],
) -> None:
"""Copy data files without image columns.
Args:
src_dataset: Source dataset
dst_meta: Destination metadata
episode_indices: Episodes to include
img_keys: Image keys to remove
"""
from lerobot.datasets.utils import DATA_DIR
data_dir = src_dataset.root / DATA_DIR
parquet_files = sorted(data_dir.glob("*/*.parquet"))
if not parquet_files:
raise ValueError(f"No parquet files found in {data_dir}")
episode_set = set(episode_indices)
for src_path in tqdm(parquet_files, desc="Processing data files"):
df = pd.read_parquet(src_path).reset_index(drop=True)
# Filter to only include selected episodes
df = df[df["episode_index"].isin(episode_set)].copy()
if len(df) == 0:
continue
# Remove image columns
columns_to_drop = [col for col in img_keys if col in df.columns]
if columns_to_drop:
df = df.drop(columns=columns_to_drop)
# Get chunk and file indices from path
relative_path = src_path.relative_to(src_dataset.root)
chunk_dir = relative_path.parts[1]
file_name = relative_path.parts[2]
chunk_idx = int(chunk_dir.split("-")[1])
file_idx = int(file_name.split("-")[1].split(".")[0])
# Write to destination without pandas index
dst_path = dst_meta.root / f"data/chunk-{chunk_idx:03d}/file-{file_idx:03d}.parquet"
dst_path.parent.mkdir(parents=True, exist_ok=True)
df.to_parquet(dst_path, index=False)
def handle_convert_to_video(cfg: EditDatasetConfig) -> None:
# Note: Parser may create any config type with the right fields, so we access fields directly
# instead of checking isinstance()
dataset = LeRobotDataset(cfg.repo_id, root=cfg.root)
# Determine output directory and repo_id
# Priority: 1) new_repo_id, 2) operation.output_dir, 3) auto-generated name
output_dir_config = getattr(cfg.operation, "output_dir", None)
if cfg.new_repo_id:
# Use new_repo_id for both local storage and hub push
output_repo_id = cfg.new_repo_id
output_dir = Path(cfg.root) / cfg.new_repo_id if cfg.root else HF_LEROBOT_HOME / cfg.new_repo_id
logging.info(f"Saving to new dataset: {cfg.new_repo_id}")
elif output_dir_config:
# Use custom output directory for local-only storage
output_dir = Path(output_dir_config)
# Extract repo name from output_dir for the dataset
output_repo_id = output_dir.name
logging.info(f"Saving to local directory: {output_dir}")
else:
# Auto-generate name: append "_video" to original repo_id
output_repo_id = f"{cfg.repo_id}_video"
output_dir = Path(cfg.root) / output_repo_id if cfg.root else HF_LEROBOT_HOME / output_repo_id
logging.info(f"Saving to auto-generated location: {output_dir}")
logging.info(f"Converting dataset {cfg.repo_id} to video format")
new_dataset = convert_dataset_to_videos(
dataset=dataset,
output_dir=output_dir,
repo_id=output_repo_id,
vcodec=getattr(cfg.operation, "vcodec", "libsvtav1"),
pix_fmt=getattr(cfg.operation, "pix_fmt", "yuv420p"),
g=getattr(cfg.operation, "g", 2),
crf=getattr(cfg.operation, "crf", 30),
fast_decode=getattr(cfg.operation, "fast_decode", 0),
episode_indices=getattr(cfg.operation, "episode_indices", None),
num_workers=getattr(cfg.operation, "num_workers", 4),
)
logging.info("Video dataset created successfully!")
logging.info(f"Location: {output_dir}")
logging.info(f"Episodes: {new_dataset.meta.total_episodes}")
logging.info(f"Frames: {new_dataset.meta.total_frames}")
if cfg.push_to_hub:
logging.info(f"Pushing to hub as {output_repo_id}...")
new_dataset.push_to_hub()
logging.info("✓ Successfully pushed to hub!")
else:
logging.info("Dataset saved locally (not pushed to hub)")
@parser.wrap()
def edit_dataset(cfg: EditDatasetConfig) -> None:
operation_type = cfg.operation.type
@@ -718,12 +270,10 @@ def edit_dataset(cfg: EditDatasetConfig) -> None:
handle_merge(cfg)
elif operation_type == "remove_feature":
handle_remove_feature(cfg)
elif operation_type == "convert_to_video":
handle_convert_to_video(cfg)
else:
raise ValueError(
f"Unknown operation type: {operation_type}\n"
f"Available operations: delete_episodes, split, merge, remove_feature, convert_to_video"
f"Available operations: delete_episodes, split, merge, remove_feature"
)

View File

@@ -46,7 +46,6 @@ from lerobot.robots import ( # noqa: F401
RobotConfig,
koch_follower,
make_robot_from_config,
omx_follower,
so100_follower,
so101_follower,
)
@@ -55,7 +54,6 @@ from lerobot.teleoperators import ( # noqa: F401
gamepad,
koch_leader,
make_teleoperator_from_config,
omx_leader,
so100_leader,
so101_leader,
)

View File

@@ -97,11 +97,9 @@ from lerobot.robots import ( # noqa: F401
hope_jr,
koch_follower,
make_robot_from_config,
omx_follower,
so100_follower,
so101_follower,
)
from lerobot.robots.unitree_g1 import config_unitree_g1 # noqa: F401
from lerobot.teleoperators import ( # noqa: F401
Teleoperator,
TeleoperatorConfig,
@@ -109,7 +107,6 @@ from lerobot.teleoperators import ( # noqa: F401
homunculus,
koch_leader,
make_teleoperator_from_config,
omx_leader,
so100_leader,
so101_leader,
)
@@ -198,8 +195,9 @@ class RecordConfig:
cli_overrides = parser.get_cli_overrides("policy")
self.policy = PreTrainedConfig.from_pretrained(policy_path, cli_overrides=cli_overrides)
self.policy.pretrained_path = policy_path
# Note: teleop and policy can both be None for robots with built-in control (e.g. unitree_g1)
# This is validated in record() after the robot is instantiated
if self.teleop is None and self.policy is None:
raise ValueError("Choose a policy, a teleoperator or both to control the robot")
@classmethod
def __get_path_fields__(cls) -> list[str]:
@@ -272,12 +270,7 @@ def record_loop(
for t in teleop
if isinstance(
t,
(
so100_leader.SO100Leader
| so101_leader.SO101Leader
| koch_leader.KochLeader
| omx_leader.OmxLeader
),
(so100_leader.SO100Leader | so101_leader.SO101Leader | koch_leader.KochLeader),
)
),
None,
@@ -340,13 +333,6 @@ def record_loop(
base_action = robot._from_keyboard_to_base_action(keyboard_action)
act = {**arm_action, **base_action} if len(base_action) > 0 else arm_action
act_processed_teleop = teleop_action_processor((act, obs))
elif policy is None and teleop is None and dataset is not None:
# Observation-only recording (robot controls itself, e.g. unitree_g1)
# Record observations, extract action-relevant values (positions) from obs
# Filter obs_processed to only include keys that match action_features
action_keys = set(robot.action_features.keys())
action_values = {k: v for k, v in obs_processed.items() if k in action_keys}
robot_action_to_send = None
else:
logging.info(
"No policy or teleoperator provided, skipping action generation."
@@ -359,17 +345,15 @@ def record_loop(
if policy is not None and act_processed_policy is not None:
action_values = act_processed_policy
robot_action_to_send = robot_action_processor((act_processed_policy, obs))
elif teleop is not None:
else:
action_values = act_processed_teleop
robot_action_to_send = robot_action_processor((act_processed_teleop, obs))
# else: observation-only mode, action_values already set above
# Send action to robot (skip if observation-only mode)
if robot_action_to_send is not None:
# Action can eventually be clipped using `max_relative_target`,
# so action actually sent is saved in the dataset. action = postprocessor.process(action)
# TODO(steven, pepijn, adil): we should use a pipeline step to clip the action, so the sent action is the action that we input to the robot.
_sent_action = robot.send_action(robot_action_to_send)
# Send action to robot
# Action can eventually be clipped using `max_relative_target`,
# so action actually sent is saved in the dataset. action = postprocessor.process(action)
# TODO(steven, pepijn, adil): we should use a pipeline step to clip the action, so the sent action is the action that we input to the robot.
_sent_action = robot.send_action(robot_action_to_send)
# Write to dataset
if dataset is not None:

View File

@@ -58,7 +58,6 @@ from lerobot.robots import ( # noqa: F401
hope_jr,
koch_follower,
make_robot_from_config,
omx_follower,
so100_follower,
so101_follower,
)

View File

@@ -33,7 +33,6 @@ from lerobot.robots import ( # noqa: F401
koch_follower,
lekiwi,
make_robot_from_config,
omx_follower,
so100_follower,
so101_follower,
)
@@ -41,7 +40,6 @@ from lerobot.teleoperators import ( # noqa: F401
TeleoperatorConfig,
koch_leader,
make_teleoperator_from_config,
omx_leader,
so100_leader,
so101_leader,
)
@@ -49,8 +47,6 @@ from lerobot.teleoperators import ( # noqa: F401
COMPATIBLE_DEVICES = [
"koch_follower",
"koch_leader",
"omx_follower",
"omx_leader",
"so100_follower",
"so100_leader",
"so101_follower",

View File

@@ -75,7 +75,6 @@ from lerobot.robots import ( # noqa: F401
hope_jr,
koch_follower,
make_robot_from_config,
omx_follower,
so100_follower,
so101_follower,
)
@@ -88,7 +87,6 @@ from lerobot.teleoperators import ( # noqa: F401
keyboard,
koch_leader,
make_teleoperator_from_config,
omx_leader,
so100_leader,
so101_leader,
)

View File

@@ -1,18 +0,0 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .config_omx_leader import OmxLeaderConfig
from .omx_leader import OmxLeader

View File

@@ -1,30 +0,0 @@
#!/usr/bin/env python
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass
from ..config import TeleoperatorConfig
@TeleoperatorConfig.register_subclass("omx_leader")
@dataclass
class OmxLeaderConfig(TeleoperatorConfig):
# Port to connect to the arm
port: str
# Sets the arm in torque mode with the gripper motor set to this value. This makes it possible to squeeze
# the gripper and have it spring back to an open position on its own.
gripper_open_pos: float = 37.0

View File

@@ -1,165 +0,0 @@
#!/usr/bin/env python
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import time
from lerobot.motors import Motor, MotorCalibration, MotorNormMode
from lerobot.motors.dynamixel import (
DriveMode,
DynamixelMotorsBus,
OperatingMode,
)
from lerobot.utils.errors import DeviceAlreadyConnectedError, DeviceNotConnectedError
from ..teleoperator import Teleoperator
from .config_omx_leader import OmxLeaderConfig
logger = logging.getLogger(__name__)
class OmxLeader(Teleoperator):
"""
- [OMX](https://github.com/ROBOTIS-GIT/open_manipulator),
expansion, developed by Woojin Wie and Junha Cha from [ROBOTIS](https://ai.robotis.com/)
"""
config_class = OmxLeaderConfig
name = "omx_leader"
def __init__(self, config: OmxLeaderConfig):
super().__init__(config)
self.config = config
self.bus = DynamixelMotorsBus(
port=self.config.port,
motors={
"shoulder_pan": Motor(1, "xl330-m288", MotorNormMode.RANGE_M100_100),
"shoulder_lift": Motor(2, "xl330-m288", MotorNormMode.RANGE_M100_100),
"elbow_flex": Motor(3, "xl330-m288", MotorNormMode.RANGE_M100_100),
"wrist_flex": Motor(4, "xl330-m288", MotorNormMode.RANGE_M100_100),
"wrist_roll": Motor(5, "xl330-m288", MotorNormMode.RANGE_M100_100),
"gripper": Motor(6, "xl330-m077", MotorNormMode.RANGE_0_100),
},
calibration=self.calibration,
)
@property
def action_features(self) -> dict[str, type]:
return {f"{motor}.pos": float for motor in self.bus.motors}
@property
def feedback_features(self) -> dict[str, type]:
return {}
@property
def is_connected(self) -> bool:
return self.bus.is_connected
def connect(self, calibrate: bool = True) -> None:
if self.is_connected:
raise DeviceAlreadyConnectedError(f"{self} already connected")
self.bus.connect()
if not self.is_calibrated and calibrate:
logger.info(
"Mismatch between calibration values in the motor and the calibration file or no calibration file found"
)
self.calibrate()
self.configure()
logger.info(f"{self} connected.")
@property
def is_calibrated(self) -> bool:
return self.bus.is_calibrated
def calibrate(self) -> None:
self.bus.disable_torque()
logger.info(f"\nUsing factory default calibration values for {self}")
logger.info(f"\nWriting default configuration of {self} to the motors")
for motor in self.bus.motors:
self.bus.write("Operating_Mode", motor, OperatingMode.EXTENDED_POSITION.value)
for motor in self.bus.motors:
if motor == "gripper":
self.bus.write("Drive_Mode", motor, DriveMode.INVERTED.value)
else:
self.bus.write("Drive_Mode", motor, DriveMode.NON_INVERTED.value)
drive_modes = {motor: 1 if motor == "gripper" else 0 for motor in self.bus.motors}
self.calibration = {}
for motor, m in self.bus.motors.items():
self.calibration[motor] = MotorCalibration(
id=m.id,
drive_mode=drive_modes[motor],
homing_offset=0,
range_min=0,
range_max=4095,
)
self.bus.write_calibration(self.calibration)
self._save_calibration()
logger.info(f"Calibration saved to {self.calibration_fpath}")
def configure(self) -> None:
self.bus.disable_torque()
self.bus.configure_motors()
for motor in self.bus.motors:
if motor != "gripper":
# Use 'extended position mode' for all motors except gripper, because in joint mode the servos
# can't rotate more than 360 degrees (from 0 to 4095) And some mistake can happen while
# assembling the arm, you could end up with a servo with a position 0 or 4095 at a crucial
# point
self.bus.write("Operating_Mode", motor, OperatingMode.EXTENDED_POSITION.value)
# Use 'position control current based' for gripper to be limited by the limit of the current.
# For the follower gripper, it means it can grasp an object without forcing too much even tho,
# its goal position is a complete grasp (both gripper fingers are ordered to join and reach a touch).
# For the leader gripper, it means we can use it as a physical trigger, since we can force with our finger
# to make it move, and it will move back to its original target position when we release the force.
self.bus.write("Operating_Mode", "gripper", OperatingMode.CURRENT_POSITION.value)
# Set gripper's goal pos in current position mode so that we can use it as a trigger.
self.bus.enable_torque("gripper")
if self.is_calibrated:
self.bus.write("Goal_Position", "gripper", self.config.gripper_open_pos)
def setup_motors(self) -> None:
for motor in reversed(self.bus.motors):
input(f"Connect the controller board to the '{motor}' motor only and press enter.")
self.bus.setup_motor(motor)
print(f"'{motor}' motor id set to {self.bus.motors[motor].id}")
def get_action(self) -> dict[str, float]:
if not self.is_connected:
raise DeviceNotConnectedError(f"{self} is not connected.")
start = time.perf_counter()
action = self.bus.sync_read("Present_Position")
action = {f"{motor}.pos": val for motor, val in action.items()}
dt_ms = (time.perf_counter() - start) * 1e3
logger.debug(f"{self} read action: {dt_ms:.1f}ms")
return action
def send_feedback(self, feedback: dict[str, float]) -> None:
# TODO(rcadene, aliberts): Implement force feedback
raise NotImplementedError
def disconnect(self) -> None:
if not self.is_connected:
raise DeviceNotConnectedError(f"{self} is not connected.")
self.bus.disconnect()
logger.info(f"{self} disconnected.")

View File

@@ -41,10 +41,6 @@ def make_teleoperator_from_config(config: TeleoperatorConfig) -> Teleoperator:
from .koch_leader import KochLeader
return KochLeader(config)
elif config.type == "omx_leader":
from .omx_leader import OmxLeader
return OmxLeader(config)
elif config.type == "so100_leader":
from .so100_leader import SO100Leader

View File

@@ -29,7 +29,6 @@ from lerobot.datasets.dataset_tools import (
remove_feature,
split_dataset,
)
from lerobot.scripts.lerobot_edit_dataset import convert_dataset_to_videos
@pytest.fixture
@@ -1048,107 +1047,3 @@ def test_modify_features_preserves_file_structure(sample_dataset, tmp_path):
assert new_chunk_indices == original_chunk_indices, "Chunk indices should be preserved"
assert new_file_indices == original_file_indices, "File indices should be preserved"
assert "reward" in modified_dataset.meta.features
def test_convert_dataset_to_videos(tmp_path):
"""Test converting lerobot/pusht_image dataset to video format."""
from lerobot.datasets.lerobot_dataset import LeRobotDataset
# Load the actual lerobot/pusht_image dataset (only first 2 episodes for speed)
source_dataset = LeRobotDataset("lerobot/pusht_image", episodes=[0, 1])
output_dir = tmp_path / "pusht_video"
with (
patch("lerobot.datasets.lerobot_dataset.get_safe_version") as mock_get_safe_version,
patch("lerobot.datasets.lerobot_dataset.snapshot_download") as mock_snapshot_download,
):
mock_get_safe_version.return_value = "v3.0"
mock_snapshot_download.return_value = str(output_dir)
# Verify source dataset has images, not videos
assert len(source_dataset.meta.video_keys) == 0
assert "observation.image" in source_dataset.meta.features
# Convert to video dataset (only first 2 episodes for speed)
video_dataset = convert_dataset_to_videos(
dataset=source_dataset,
output_dir=output_dir,
repo_id="lerobot/pusht_video",
vcodec="libsvtav1",
pix_fmt="yuv420p",
g=2,
crf=30,
episode_indices=[0, 1],
num_workers=2,
)
# Verify new dataset has videos
assert len(video_dataset.meta.video_keys) > 0
assert "observation.image" in video_dataset.meta.video_keys
# Verify correct number of episodes and frames (2 episodes)
assert video_dataset.meta.total_episodes == 2
# Compare against the actual number of frames in the loaded episodes, not metadata total
assert len(video_dataset) == len(source_dataset)
# Verify video files exist
for ep_idx in range(video_dataset.meta.total_episodes):
for video_key in video_dataset.meta.video_keys:
video_path = video_dataset.root / video_dataset.meta.get_video_file_path(ep_idx, video_key)
assert video_path.exists(), f"Video file should exist: {video_path}"
# Verify we can load the dataset and access it
assert len(video_dataset) == video_dataset.meta.total_frames
# Test that we can actually get an item from the video dataset
item = video_dataset[0]
assert "observation.image" in item
assert "action" in item
# Cleanup
import shutil
if output_dir.exists():
shutil.rmtree(output_dir)
def test_convert_dataset_to_videos_subset_episodes(tmp_path):
"""Test converting only specific episodes from lerobot/pusht_image to video format."""
from lerobot.datasets.lerobot_dataset import LeRobotDataset
# Load the actual lerobot/pusht_image dataset (only first 3 episodes)
source_dataset = LeRobotDataset("lerobot/pusht_image", episodes=[0, 1, 2])
output_dir = tmp_path / "pusht_video_subset"
with (
patch("lerobot.datasets.lerobot_dataset.get_safe_version") as mock_get_safe_version,
patch("lerobot.datasets.lerobot_dataset.snapshot_download") as mock_snapshot_download,
):
mock_get_safe_version.return_value = "v3.0"
mock_snapshot_download.return_value = str(output_dir)
# Convert only episode 0 to video (subset of loaded episodes)
episode_indices = [0]
video_dataset = convert_dataset_to_videos(
dataset=source_dataset,
output_dir=output_dir,
repo_id="lerobot/pusht_video_subset",
episode_indices=episode_indices,
num_workers=2,
)
# Verify correct number of episodes
assert video_dataset.meta.total_episodes == len(episode_indices)
# Verify video files exist for selected episodes
assert len(video_dataset.meta.video_keys) > 0
assert "observation.image" in video_dataset.meta.video_keys
# Cleanup
import shutil
if output_dir.exists():
shutil.rmtree(output_dir)