mirror of
https://github.com/huggingface/lerobot.git
synced 2026-06-03 20:31:25 +00:00
* feat(envs): add RoboTwin 2.0 benchmark integration
- RoboTwinEnvConfig with 4-camera setup (head/front/left_wrist/right_wrist)
- Docker image with SAPIEN, mplib, CuRobo, pytorch3d (Python 3.12)
- CI workflow: 1-episode smoke eval with pepijn223/smolvla_robotwin
- RoboTwinProcessorStep for state float32 casting
- Camera rename_map: head_camera/front_camera/left_wrist -> camera1/2/3
* fix(robotwin): re-enable autograd for CuRobo planner warmup and take_action
lerobot_eval wraps the full rollout in torch.no_grad() (lerobot_eval.py:566),
but RoboTwin's setup_demo → load_robot → CuroboPlanner(...) runs
motion_gen.warmup(), which invokes Newton's-method trajectory optimization.
That optimizer calls cost.backward() internally, which raises
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
when autograd is disabled. take_action() hits the same planner path at every
step. Wrap both setup_demo and take_action in torch.enable_grad() so CuRobo's
optimizer can build its computation graph. Policy inference is unaffected —
rollout()'s inner torch.inference_mode() block around select_action() is
untouched, so we still don't allocate grad buffers during policy forward.
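A minimal sketch of the failure and the fix described above, assuming only that PyTorch is installed. `toy_planner_step` is a hypothetical stand-in for an optimizer that differentiates its own cost; it is not CuRobo's or RoboTwin's code, and it only reproduces the `backward()`-under-`no_grad` pattern.

```python
import torch


def toy_planner_step(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for an optimizer that calls backward() on its cost."""
    x = x.detach().requires_grad_(True)
    cost = (x**2).sum()
    cost.backward()  # needs a computation graph, so autograd must be enabled
    return x.grad


x = torch.ones(3)

with torch.no_grad():  # mimics the outer rollout context
    try:
        toy_planner_step(x)
        failed_under_no_grad = False
    except RuntimeError:
        # "element 0 of tensors does not require grad ..."
        failed_under_no_grad = True

    # Re-enabling grad locally lets the optimizer build its graph again.
    with torch.enable_grad():
        grad = toy_planner_step(x)
```

The inner `torch.enable_grad()` only affects the code it wraps, so the surrounding `no_grad` context applies again as soon as the block exits.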
* fix(robotwin): read nested get_obs() output and use aloha-agilex camera names
RoboTwin's base_task.get_obs() returns a nested dict:
{"observation": {cam: {"rgb": ..., "intrinsic_matrix": ...}},
"joint_action": {"left_arm": ..., "left_gripper": ...,
"right_arm": ..., "right_gripper": ...,
"vector": np.ndarray},
"endpose": {...}}
Our _get_obs was reading raw["{cam}_rgb"] / raw["{cam}"] and raw["joint_action"]
as if they were flat, so np.asarray(raw["joint_action"], dtype=float64) tripped
on a dict and raised
TypeError: float() argument must be a string or a real number, not 'dict'
Fix:
- Pull images from raw["observation"][cam]["rgb"]
- Pull joint state from raw["joint_action"]["vector"] (the flat array)
- Update the default camera tuple to (head_camera, left_camera, right_camera)
to match RoboTwin's actual wrist-camera names (envs/camera/camera.py:135-151)
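The access pattern from this fix can be sketched with a mock payload. This is an illustration only: `extract_obs` is a hypothetical helper, the payload values are placeholders, and plain lists stand in for the arrays RoboTwin returns.

```python
from typing import Any

CAMERA_NAMES = ("head_camera", "left_camera", "right_camera")


def extract_obs(raw: dict[str, Any], camera_names=CAMERA_NAMES):
    """Return (images, joint_vector) from a nested get_obs()-style dict."""
    cameras = raw["observation"]
    # Images live one level down, under each camera's "rgb" key.
    images = {cam: cameras[cam]["rgb"] for cam in camera_names if cam in cameras}
    # The flat joint state is the "vector" entry, not the dict itself.
    joint_vector = list(raw["joint_action"]["vector"])
    return images, joint_vector


# Mock payload with the nested shape described above (placeholder values).
mock_raw = {
    "observation": {
        cam: {"rgb": [[0, 0, 0]], "intrinsic_matrix": None} for cam in CAMERA_NAMES
    },
    "joint_action": {
        "left_arm": [0.0] * 6,
        "left_gripper": 0.0,
        "right_arm": [0.0] * 6,
        "right_gripper": 0.0,
        "vector": [0.0] * 14,
    },
    "endpose": {},
}

images, joint_vector = extract_obs(mock_raw)
```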
* refactor(robotwin): drop defensive dict guards, cache black fallback frame
_get_obs was guarding every dict access with isinstance(..., dict) in case
RoboTwin's get_obs returned something else — but the API contract
(envs/_base_task.py:437) always returns a dict, so the guards were silently
masking real failures behind plausible-looking zero observations. Drop them.
Also:
- Cache a single black fallback frame in __init__ instead of allocating
a fresh np.zeros((H, W, 3), uint8) for every missing camera on every
step — the "camera not exposed" set is static per env.
- Only allocate the zero joint_state on the fallback path (not unconditionally
before the real value overwrites it).
- Replace .flatten() with .ravel() (no copy when already 1-D).
- Fold the nested-dict schema comment and two identical torch.enable_grad()
rationales into a single Autograd section in the class docstring.
- Fix stale `left_wrist` camera name in the observation docstring.
* fix(robotwin): align observation_space dims with D435 camera output
lerobot_eval crashed in gym.vector's SyncVectorEnv.reset with:
ValueError: Output array is the wrong shape
because RoboTwinEnvConfig declared observation_space = (480, 640, 3) but
task_config/demo_clean.yml specifies head_camera_type=D435, which renders
(240, 320, 3). gym.vector.concatenate pre-allocates a buffer from the
declared space, so the first np.stack raises on shape mismatch.
Changes:
- Config defaults now 240×320 (the D435 dims in _camera_config.yml), with
a comment pointing at the source of truth.
- RoboTwinEnv.__init__ accepts observation_height/width as Optional and
falls back to setup_kwargs["head_camera_h/w"] so the env is self-consistent
even if the config is not in sync.
- Config camera_names / features_map use the actual aloha-agilex camera
names (head_camera, left_camera, right_camera). Drops the stale
"front_camera" and "left_wrist"/"right_wrist" entries that never matched
anything RoboTwin exposes.
- CI workflow's rename_map updated to match the new camera names.
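The shape mismatch can be reproduced outside gym with NumPy alone. The sketch below is an assumption-laden illustration: `stack_into_declared` is a hypothetical helper that mimics pre-allocating a batch buffer from a declared shape, not gym.vector's actual implementation.

```python
import numpy as np

declared_shape = (480, 640, 3)  # what the config used to declare
rendered_shape = (240, 320, 3)  # what the D435 camera actually renders

frames = [np.zeros(rendered_shape, dtype=np.uint8) for _ in range(2)]


def stack_into_declared(frames, shape):
    """Stack frames into a buffer pre-allocated from the declared shape."""
    out = np.zeros((len(frames), *shape), dtype=np.uint8)
    return np.stack(frames, out=out)


try:
    stack_into_declared(frames, declared_shape)
    mismatch_raised = False
except ValueError:
    # The buffer was sized for 480x640 frames but received 240x320 ones.
    mismatch_raised = True

# With the declared shape aligned to the rendered shape, stacking succeeds.
batched = stack_into_declared(frames, rendered_shape)
```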
* fix(robotwin): expose _max_episode_steps for lerobot_eval.rollout
rollout() does `env.call("_max_episode_steps")` (lerobot_eval.py:157) to
know when to stop stepping. LiberoEnv and MetaworldEnv set this attribute;
RoboTwinEnv was tracking the limit under `episode_length` only, so the call
raised AttributeError once CuRobo finished warming up.
* fix(robotwin): install av-dep so lerobot_eval can write rollout MP4s
write_video (utils/io_utils.py:53) lazily imports PyAV via require_package
and raises silently inside the video-writing thread when the extra is not
installed — so the eval itself succeeds with pc_success=100 but no MP4
ever lands in videos/, and the artifact upload reports "No files were
found". Add av-dep to the install line (same pattern as the RoboMME image).
* feat(robotwin): eval 5 diverse tasks per CI run with NL descriptions
Widen the smoke eval from a single task (beat_block_hammer) to five:
click_bell, handover_block, open_laptop, stack_blocks_two on top of the
original. Each gets its own rollout video in videos/<task>_0/ so the
dashboard can surface visually distinct behaviours.
extract_task_descriptions.py now has a RoboTwin branch that reads
`description/task_instruction/<task>.json` (already shipped in the clone
at /opt/robotwin) and pulls the `full_description` field. CI cds into
the clone before invoking the script so the relative path resolves.
parse_eval_metrics.py is invoked with the same 5-task list so the
metrics.json embeds one entry per task.
* ci: point benchmark eval checkpoints at the lerobot/ org mirrors
pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in
this branch (libero, metaworld, and the per-branch benchmark). The
checkpoints were mirrored into the lerobot/ org and that's the canonical
location going forward.
* refactor(robotwin): rebase docker image on huggingface/lerobot-gpu
Mirror the libero/metaworld/libero_plus/robomme pattern: start from the
nightly GPU image (apt deps, python, uv, venv, lerobot[all] already
there) and layer on only what RoboTwin 2.0 uniquely needs —
cuda-nvcc + cuda-cudart-dev (CuRobo builds from source), Vulkan libs +
NVIDIA ICD (SAPIEN renderer), sapien/mplib/open3d/pytorch3d/curobo
installs, the mplib + sapien upstream patches, and the TianxingChen
asset download.
Drops ~90 lines of duplicated base setup (CUDA FROM, apt python, uv
install, user creation, venv init, base lerobot install). 199 → 110.
Also repoint the docs + env docstring dataset link from
hxma/RoboTwin-LeRobot-v3.0 to the canonical lerobot/robotwin_unified.
* docs(robotwin): add robotwin to _toctree.yml under Benchmarks
doc-builder's TOC integrity check was rejecting the branch because
docs/source/robotwin.mdx existed but wasn't listed in _toctree.yml.
* fix(robotwin): defer YAML lookup and realign tests with current API
__init__ was eagerly calling _load_robotwin_setup_kwargs just to read
head_camera_h/w from the YAML. That import (`from envs import CONFIGS_PATH`)
required a real RoboTwin install, so constructing the env — and thus every
test in tests/envs/test_robotwin.py — blew up with ModuleNotFoundError
on fast-tests where RoboTwin isn't installed.
Replace the eager lookup with DEFAULT_CAMERA_H/W constants (240×320, the
D435 dims baked into task_config/demo_clean.yml). reset() still resolves
the full setup_kwargs lazily — that's fine because reset() is only
called inside the benchmark Docker image where RoboTwin is present.
Also resync the test file with the current env API:
- mock get_obs() as the real nested {"observation": {cam: {"rgb": …}},
"joint_action": {"vector": …}} shape
- patch both _load_robotwin_task and _load_robotwin_setup_kwargs
(_patch_load → _patch_runtime)
- drop `front_camera` / `left_wrist` from assertions — aloha-agilex
exposes head_camera + left_camera + right_camera, not those
- black-frame test now uses left_camera as the missing camera
- setup_demo call check loosened to the caller-provided seed/is_test
bits (full kwargs include the YAML-derived blob)
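The deferred-lookup pattern can be sketched without RoboTwin installed. Everything here is hypothetical scaffolding: `load_setup_kwargs` stands in for the RoboTwin-dependent loader and `LazyEnv` for the wrapper; only the 240×320 defaults come from the text above.

```python
DEFAULT_CAMERA_H, DEFAULT_CAMERA_W = 240, 320

_SETUP_CACHE: dict = {}
load_calls: list = []  # records when the expensive loader actually runs


def load_setup_kwargs(task_name: str) -> dict:
    """Hypothetical stand-in for the loader that needs a real RoboTwin install."""
    if task_name not in _SETUP_CACHE:
        load_calls.append(task_name)
        _SETUP_CACHE[task_name] = {"head_camera_h": 240, "head_camera_w": 320}
    return dict(_SETUP_CACHE[task_name])  # copy so callers can mutate freely


class LazyEnv:
    def __init__(self, task_name: str, height=None, width=None):
        self.task_name = task_name
        # Construction relies on constants only; nothing heavy is imported.
        self.height = height or DEFAULT_CAMERA_H
        self.width = width or DEFAULT_CAMERA_W

    def reset(self) -> dict:
        # The full kwargs are resolved on first use and cached afterwards.
        return load_setup_kwargs(self.task_name)


env = LazyEnv("click_bell")
calls_after_init = list(load_calls)
env.reset()
env.reset()
```

Constructing the env touches no external module, which is what lets tests build it on machines where the simulator is absent.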
* fix: integrate PR #3315 review feedback
- ci: add Docker Hub login step, add HF_USER_TOKEN guard on eval step
- docker: tie patches to pinned versions with removal guidance, remove
unnecessary HF_TOKEN for public dataset, fix hadolint warnings
- docs: fix paper link to arxiv, add teaser image, fix camera names
(4→3 cameras), fix observation dims (480x640→240x320)
* fix(docs): correct RoboTwin 2.0 paper arxiv link
* fix(docs): use correct RoboTwin 2.0 teaser image URL
* fix(docs): use plain markdown image to fix MDX build
* ci(robotwin): smoke-eval 10 tasks instead of 5
Broader coverage on the RoboTwin 2.0 benchmark CI job: bump the smoke
eval from 5 tasks to 10 (one episode each). Added tasks are all drawn
from ROBOTWIN_TASKS and mirror the shape/complexity of the existing
set (simple single-object or single-fixture manipulations).
Tasks now run: beat_block_hammer, click_bell, handover_block,
open_laptop, stack_blocks_two, click_alarmclock, close_laptop,
close_microwave, open_microwave, place_block.
`parse_eval_metrics.py` reads `overall` for multi-task runs so no
parser change is needed. Bumped the step name and the metrics label
to reflect the 10-task layout.
* fix(ci): swap 4 broken RoboTwin tasks in smoke eval
The smoke eval hit two upstream issues:
- `open_laptop`: bug in OpenMOSS/RoboTwin main — `check_success()` uses
`self.arm_tag`, but that attribute is only set inside `play_once()`
(the scripted-expert path). During eval `take_action()` calls
`check_success()` directly, hitting `AttributeError: 'open_laptop'
object has no attribute 'arm_tag'`.
- `close_laptop`, `close_microwave`, `place_block`: not present in
upstream RoboTwin `envs/` at all — our ROBOTWIN_TASKS tuple drifted
from upstream and these names leaked into CI.
Replace the four broken tasks with upstream-confirmed equivalents
that exist both in ROBOTWIN_TASKS and in RoboTwin's `envs/`:
`adjust_bottle`, `lift_pot`, `stamp_seal`, `turn_switch`.
New 10-task smoke set: beat_block_hammer, click_bell, handover_block,
stack_blocks_two, click_alarmclock, open_microwave, adjust_bottle,
lift_pot, stamp_seal, turn_switch.
* fix(robotwin): sync ROBOTWIN_TASKS + doc with upstream (50 tasks)
The local ROBOTWIN_TASKS tuple drifted from upstream
RoboTwin-Platform/RoboTwin. Users passing names like `close_laptop`,
`close_microwave`, `dump_bin`, `place_block`, `pour_water`,
`fold_cloth`, etc. got past our validator (the names were in the
tuple) but then crashed inside robosuite with a confusing error,
because those tasks don't exist in upstream `envs/`.
- Replace ROBOTWIN_TASKS with a verbatim mirror of upstream's
`envs/` directory: 50 tasks as of main (was 60 with many
stale entries). Added a `gh api`-based one-liner comment so
future bumps are mechanical.
- Update the `60 tasks` claims in robotwin.mdx and
RoboTwinEnvConfig's docstring to `50`.
- Replace the stale example-task table in robotwin.mdx with ten
upstream-confirmed examples, and flag `open_laptop` as
temporarily broken (its `check_success()` uses `self.arm_tag`
which is only set inside `play_once()`; eval-mode callers hit
AttributeError).
- Rebuild the "Full benchmark" command with the actual 50-task
list, omitting `open_laptop`.
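The validator only helps if the tuple matches upstream, as the following sketch illustrates. `KNOWN_TASKS` is a three-name subset chosen for illustration and `parse_tasks` is a hypothetical helper, not the function shipped in `robotwin.py`.

```python
KNOWN_TASKS = ("beat_block_hammer", "click_bell", "stack_blocks_two")  # illustrative subset


def parse_tasks(task: str, known=KNOWN_TASKS) -> list[str]:
    """Split a comma-separated task string and reject names not in `known`."""
    names = [t.strip() for t in task.split(",") if t.strip()]
    if not names:
        raise ValueError("`task` must contain at least one task name.")
    unknown = [t for t in names if t not in known]
    if unknown:
        raise ValueError(f"Unknown tasks: {unknown}. Available: {sorted(known)}")
    return names


accepted = parse_tasks(" click_bell , beat_block_hammer ")

try:
    parse_tasks("close_laptop")  # a stale name that upstream does not ship
    rejected = False
except ValueError:
    rejected = True
```

Failing at validation time gives the user the list of valid names instead of an import error from deep inside the simulator.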
* test(robotwin): lower task-count floor from 60 to 50
ROBOTWIN_TASKS was trimmed to 50 tasks (see comment in
`src/lerobot/envs/robotwin.py:48`), but the assertion still
required ≥60, causing CI failures. Align the test with the
current upstream task count.
* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs
Port of #3416 onto this branch.
* ci: gate Docker Hub login on secret availability
* fix: integrate PR #3315 review feedback
- envs(robotwin): default `observation_height/width` in
`create_robotwin_envs` to `DEFAULT_CAMERA_H/W` (240/320) so they
match the D435 dims baked into `task_config/demo_clean.yml`.
- envs(robotwin): resolve `task_config/demo_clean.yml` via
`CONFIGS_PATH` instead of a cwd-relative path; works regardless
of where `lerobot-eval` is invoked.
- envs(robotwin): replace `print()` calls in `create_robotwin_envs`
with `logger.info(...)` (module-level `logger = logging.getLogger`).
- envs(robotwin): use `_LazyAsyncVectorEnv` for the async path so
async workers start lazily (matches LIBERO / RoboCasa / VLABench).
- envs(robotwin): cast `agent_pos` space + joint-state output to
float32 end-to-end (was mixed float64/float32).
- envs(configs): use the existing `_make_vec_env_cls(use_async,
n_envs)` helper in `RoboTwinEnvConfig.create_envs`; drop the
`get_env_processors` override so RoboTwin uses the identity
processor inherited from `EnvConfig`.
- processor: delete `RoboTwinProcessorStep` — the float32 cast now
happens in the wrapper itself, so the processor is redundant.
- tests: drop the `TestRoboTwinProcessorStep` suite; update the
mock obs fixture to use float32 `joint_action.vector`.
- ci: hoist `ROBOTWIN_POLICY` and `ROBOTWIN_TASKS` to job-level
env vars so the task list and policy aren't duplicated across
eval / extract / parse steps.
- docker: pin RoboTwin + CuRobo upstream clones to commit SHAs
(`RoboTwin@0aeea2d6`, `curobo@ca941586`) for reproducibility.
489 lines
18 KiB
Python
#!/usr/bin/env python

# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations

import importlib
import logging
from collections import defaultdict
from collections.abc import Callable, Sequence
from functools import partial
from typing import Any

import gymnasium as gym
import numpy as np
import torch
from gymnasium import spaces

from lerobot.types import RobotObservation

from .utils import _LazyAsyncVectorEnv

logger = logging.getLogger(__name__)

# Camera names as exposed by RoboTwin 2.0's aloha-agilex embodiment. The wrapper
# reads each frame from get_obs()["observation"][<camera>]["rgb"].
ROBOTWIN_CAMERA_NAMES: tuple[str, ...] = (
    "head_camera",
    "left_camera",
    "right_camera",
)

ACTION_DIM = 14  # 7 DOF × 2 arms
ACTION_LOW = -1.0
ACTION_HIGH = 1.0
DEFAULT_EPISODE_LENGTH = 300
# D435 dims from task_config/_camera_config.yml (what demo_clean.yml selects).
DEFAULT_CAMERA_H = 240
DEFAULT_CAMERA_W = 320

# Task list from RoboTwin 2.0's `envs/` directory; mirrors upstream exactly
# (50 tasks as of main; earlier revisions had 60 with a different split).
# Keep this in sync with:
#   gh api /repos/RoboTwin-Platform/RoboTwin/contents/envs --paginate \
#     | jq -r '.[].name' | grep -E '\.py$' | grep -v '^_' | sed 's/\.py$//'
ROBOTWIN_TASKS: tuple[str, ...] = (
    "adjust_bottle",
    "beat_block_hammer",
    "blocks_ranking_rgb",
    "blocks_ranking_size",
    "click_alarmclock",
    "click_bell",
    "dump_bin_bigbin",
    "grab_roller",
    "handover_block",
    "handover_mic",
    "hanging_mug",
    "lift_pot",
    "move_can_pot",
    "move_pillbottle_pad",
    "move_playingcard_away",
    "move_stapler_pad",
    "open_laptop",
    "open_microwave",
    "pick_diverse_bottles",
    "pick_dual_bottles",
    "place_a2b_left",
    "place_a2b_right",
    "place_bread_basket",
    "place_bread_skillet",
    "place_burger_fries",
    "place_can_basket",
    "place_cans_plasticbox",
    "place_container_plate",
    "place_dual_shoes",
    "place_empty_cup",
    "place_fan",
    "place_mouse_pad",
    "place_object_basket",
    "place_object_scale",
    "place_object_stand",
    "place_phone_stand",
    "place_shoe",
    "press_stapler",
    "put_bottles_dustbin",
    "put_object_cabinet",
    "rotate_qrcode",
    "scan_object",
    "shake_bottle",
    "shake_bottle_horizontally",
    "stack_blocks_three",
    "stack_blocks_two",
    "stack_bowls_three",
    "stack_bowls_two",
    "stamp_seal",
    "turn_switch",
)


_ROBOTWIN_SETUP_CACHE: dict[str, dict[str, Any]] = {}


def _load_robotwin_setup_kwargs(task_name: str) -> dict[str, Any]:
    """Build the kwargs dict RoboTwin's setup_demo expects.

    Mirrors the config loading done by RoboTwin's ``script/eval_policy.py``:
    reads ``task_config/demo_clean.yml``, resolves the embodiment file from
    ``_embodiment_config.yml``, loads the robot's own ``config.yml``, and
    reads camera dimensions from ``_camera_config.yml``.

    Uses ``aloha-agilex`` single-robot dual-arm by default (the only embodiment
    used by beat_block_hammer and most smoke-test tasks).
    """
    if task_name in _ROBOTWIN_SETUP_CACHE:
        return dict(_ROBOTWIN_SETUP_CACHE[task_name])

    import os

    import yaml  # type: ignore[import-untyped]
    from envs import CONFIGS_PATH  # type: ignore[import-not-found]

    task_config = "demo_clean"
    with open(os.path.join(CONFIGS_PATH, f"{task_config}.yml"), encoding="utf-8") as f:
        args = yaml.safe_load(f)

    # Resolve embodiment: demo_clean.yml uses [aloha-agilex] (dual-arm single robot)
    with open(os.path.join(CONFIGS_PATH, "_embodiment_config.yml"), encoding="utf-8") as f:
        embodiment_types = yaml.safe_load(f)
    embodiment = args.get("embodiment", ["aloha-agilex"])
    if len(embodiment) == 1:
        robot_file = embodiment_types[embodiment[0]]["file_path"]
        args["left_robot_file"] = robot_file
        args["right_robot_file"] = robot_file
        args["dual_arm_embodied"] = True
    elif len(embodiment) == 3:
        args["left_robot_file"] = embodiment_types[embodiment[0]]["file_path"]
        args["right_robot_file"] = embodiment_types[embodiment[1]]["file_path"]
        args["embodiment_dis"] = embodiment[2]
        args["dual_arm_embodied"] = False
    else:
        raise ValueError(f"embodiment must have 1 or 3 items, got {len(embodiment)}")

    with open(os.path.join(args["left_robot_file"], "config.yml"), encoding="utf-8") as f:
        args["left_embodiment_config"] = yaml.safe_load(f)
    with open(os.path.join(args["right_robot_file"], "config.yml"), encoding="utf-8") as f:
        args["right_embodiment_config"] = yaml.safe_load(f)

    # Camera dimensions
    with open(os.path.join(CONFIGS_PATH, "_camera_config.yml"), encoding="utf-8") as f:
        camera_config = yaml.safe_load(f)
    head_cam = args["camera"]["head_camera_type"]
    args["head_camera_h"] = camera_config[head_cam]["h"]
    args["head_camera_w"] = camera_config[head_cam]["w"]

    # Headless overrides
    args["render_freq"] = 0
    args["task_name"] = task_name
    args["task_config"] = task_config

    _ROBOTWIN_SETUP_CACHE[task_name] = args
    return dict(args)


def _load_robotwin_task(task_name: str) -> type:
    """Dynamically import and return a RoboTwin 2.0 task class.

    RoboTwin tasks live in ``envs/<task_name>.py`` relative to the repository
    root and are expected to be on ``sys.path`` after installation.
    """
    try:
        module = importlib.import_module(f"envs.{task_name}")
    except ModuleNotFoundError as e:
        raise ModuleNotFoundError(
            f"Could not import RoboTwin task '{task_name}'. "
            "Ensure RoboTwin 2.0 is installed and its 'envs/' directory is on PYTHONPATH. "
            "See the RoboTwin installation guide: https://robotwin-platform.github.io/doc/usage/robotwin-install.html"
        ) from e
    task_cls = getattr(module, task_name, None)
    if task_cls is None:
        raise AttributeError(f"Task class '{task_name}' not found in envs/{task_name}.py")
    return task_cls


class RoboTwinEnv(gym.Env):
    """Gymnasium wrapper around a single RoboTwin 2.0 task.

    RoboTwin uses a custom SAPIEN-based API (``setup_demo`` / ``get_obs`` /
    ``take_action`` / ``check_success``) rather than the standard gym interface.
    This class bridges that API to Gymnasium so that ``lerobot-eval`` can drive
    RoboTwin exactly like LIBERO or Meta-World.

    The underlying SAPIEN environment is created lazily on the first ``reset()``
    call *inside the worker process*. This is required for
    ``gym.vector.AsyncVectorEnv`` compatibility: SAPIEN allocates EGL/GPU
    contexts that must not be forked from the parent process.

    Observations
    ------------
    The ``pixels`` dict uses the raw RoboTwin camera names as keys (e.g.
    ``"head_camera"``, ``"left_camera"``). ``preprocess_observation`` in
    ``envs/utils.py`` then converts these to ``observation.images.<cam>``.

    Actions
    -------
    14-dim float32 array in ``[-1, 1]`` (joint-space, 7 DOF per arm).

    Autograd
    --------
    ``setup_demo`` and ``take_action`` drive CuRobo's Newton trajectory
    optimizer, which calls ``cost.backward()`` internally. lerobot_eval wraps
    the rollout in ``torch.no_grad()``, so both call sites re-enable grad.
    """

    metadata = {"render_modes": ["rgb_array"], "render_fps": 25}

    def __init__(
        self,
        task_name: str,
        episode_index: int = 0,
        n_envs: int = 1,
        camera_names: Sequence[str] = ROBOTWIN_CAMERA_NAMES,
        observation_height: int | None = None,
        observation_width: int | None = None,
        episode_length: int = DEFAULT_EPISODE_LENGTH,
        render_mode: str = "rgb_array",
    ):
        super().__init__()
        self.task_name = task_name
        self.task = task_name  # used by add_envs_task() in utils.py
        self.task_description = task_name.replace("_", " ")
        self.episode_index = episode_index
        self._reset_stride = n_envs
        self.camera_names = list(camera_names)
        # Default to D435 dims (the camera type baked into task_config/demo_clean.yml).
        # The YAML-driven lookup is deferred to reset() so construction doesn't
        # import RoboTwin's `envs` module; fast-tests run without RoboTwin installed.
        self.observation_height = observation_height or DEFAULT_CAMERA_H
        self.observation_width = observation_width or DEFAULT_CAMERA_W
        self.episode_length = episode_length
        self._max_episode_steps = episode_length  # lerobot_eval.rollout reads this
        self.render_mode = render_mode

        self._env: Any | None = None  # deferred: created on first reset() inside worker
        self._step_count: int = 0
        self._black_frame = np.zeros((self.observation_height, self.observation_width, 3), dtype=np.uint8)

        image_spaces = {
            cam: spaces.Box(
                low=0,
                high=255,
                shape=(self.observation_height, self.observation_width, 3),
                dtype=np.uint8,
            )
            for cam in self.camera_names
        }
        self.observation_space = spaces.Dict(
            {
                "pixels": spaces.Dict(image_spaces),
                "agent_pos": spaces.Box(low=-np.inf, high=np.inf, shape=(ACTION_DIM,), dtype=np.float32),
            }
        )
        self.action_space = spaces.Box(
            low=ACTION_LOW, high=ACTION_HIGH, shape=(ACTION_DIM,), dtype=np.float32
        )

    def _ensure_env(self) -> None:
        """Create the SAPIEN environment on first use.

        Called inside the worker subprocess after fork(), so each worker gets
        its own EGL/GPU context rather than inheriting a stale one from the
        parent process (which causes crashes with AsyncVectorEnv).
        """
        if self._env is not None:
            return
        task_cls = _load_robotwin_task(self.task_name)
        self._env = task_cls()

    def _get_obs(self) -> RobotObservation:
        assert self._env is not None, "_get_obs called before _ensure_env()"
        raw = self._env.get_obs()
        cameras_raw = raw.get("observation", {})

        images: dict[str, np.ndarray] = {}
        for cam in self.camera_names:
            cam_data = cameras_raw.get(cam)
            img = cam_data.get("rgb") if cam_data else None
            if img is None:
                images[cam] = self._black_frame
                continue
            img = np.asarray(img, dtype=np.uint8)
            if img.ndim == 2:
                img = np.stack([img, img, img], axis=-1)
            elif img.shape[-1] != 3:
                img = img[..., :3]
            images[cam] = img

        ja = raw.get("joint_action") or {}
        vec = ja.get("vector")
        if vec is not None:
            arr = np.asarray(vec, dtype=np.float32).ravel()
            joint_state = (
                arr[:ACTION_DIM] if arr.size >= ACTION_DIM else np.zeros(ACTION_DIM, dtype=np.float32)
            )
        else:
            joint_state = np.zeros(ACTION_DIM, dtype=np.float32)

        return {"pixels": images, "agent_pos": joint_state}

    def reset(self, seed: int | None = None, **kwargs) -> tuple[RobotObservation, dict]:
        self._ensure_env()
        super().reset(seed=seed)
        assert self._env is not None  # set by _ensure_env() above

        actual_seed = self.episode_index if seed is None else seed
        setup_kwargs = _load_robotwin_setup_kwargs(self.task_name)
        setup_kwargs.update(seed=actual_seed, is_test=True)
        with torch.enable_grad():
            self._env.setup_demo(**setup_kwargs)
        self.episode_index += self._reset_stride
        self._step_count = 0

        obs = self._get_obs()
        return obs, {"is_success": False, "task": self.task_name}

    def step(self, action: np.ndarray) -> tuple[RobotObservation, float, bool, bool, dict[str, Any]]:
        assert self._env is not None, "step() called before reset()"
        if action.ndim != 1 or action.shape[0] != ACTION_DIM:
            raise ValueError(f"Expected 1-D action of shape ({ACTION_DIM},), got {action.shape}")

        with torch.enable_grad():
            if hasattr(self._env, "take_action"):
                self._env.take_action(action)
            else:
                self._env.step(action)

        self._step_count += 1

        is_success = bool(getattr(self._env, "eval_success", False))
        if not is_success and hasattr(self._env, "check_success"):
            is_success = bool(self._env.check_success())

        obs = self._get_obs()
        reward = float(is_success)
        terminated = is_success
        truncated = self._step_count >= self.episode_length

        info: dict[str, Any] = {
            "task": self.task_name,
            "is_success": is_success,
            "step": self._step_count,
        }
        if terminated or truncated:
            info["final_info"] = {
                "task": self.task_name,
                "is_success": is_success,
            }
            self.reset()

        return obs, reward, terminated, truncated, info

    def render(self) -> np.ndarray:
        self._ensure_env()
        obs = self._get_obs()
        # Prefer head camera for rendering; fall back to first available.
        if "head_camera" in obs["pixels"]:
            return obs["pixels"]["head_camera"]
        return next(iter(obs["pixels"].values()))

    def close(self) -> None:
        if self._env is not None:
            if hasattr(self._env, "close_env"):
                import contextlib

                with contextlib.suppress(TypeError):
                    self._env.close_env()
            self._env = None


# ---- Multi-task factory --------------------------------------------------------


def _make_env_fns(
    *,
    task_name: str,
    n_envs: int,
    camera_names: list[str],
    observation_height: int,
    observation_width: int,
    episode_length: int,
) -> list[Callable[[], RoboTwinEnv]]:
    """Return n_envs factory callables for a single task."""

    def _make_one(episode_index: int) -> RoboTwinEnv:
        return RoboTwinEnv(
            task_name=task_name,
            episode_index=episode_index,
            n_envs=n_envs,
            camera_names=camera_names,
            observation_height=observation_height,
            observation_width=observation_width,
            episode_length=episode_length,
        )

    return [partial(_make_one, i) for i in range(n_envs)]


def create_robotwin_envs(
    task: str,
    n_envs: int,
    env_cls: Callable[[Sequence[Callable[[], Any]]], Any] | None = None,
    camera_names: Sequence[str] = ROBOTWIN_CAMERA_NAMES,
    observation_height: int = DEFAULT_CAMERA_H,
    observation_width: int = DEFAULT_CAMERA_W,
    episode_length: int = DEFAULT_EPISODE_LENGTH,
) -> dict[str, dict[int, Any]]:
    """Create vectorized RoboTwin 2.0 environments.

    Returns:
        ``dict[task_name][0] -> VectorEnv``: one entry per task, each wrapping
        ``n_envs`` parallel rollouts.

    Args:
        task: Comma-separated list of task names (e.g. ``"beat_block_hammer"``
            or ``"beat_block_hammer,click_bell"``).
        n_envs: Number of parallel rollouts per task.
        env_cls: Vector env constructor (e.g. ``gym.vector.AsyncVectorEnv``).
        camera_names: Cameras to include in observations.
        observation_height: Pixel height for all cameras.
        observation_width: Pixel width for all cameras.
        episode_length: Max steps before truncation.
    """
    if env_cls is None or not callable(env_cls):
        raise ValueError("env_cls must be callable (e.g. gym.vector.AsyncVectorEnv).")
    if not isinstance(n_envs, int) or n_envs <= 0:
        raise ValueError(f"n_envs must be a positive int; got {n_envs}.")

    task_names = [t.strip() for t in str(task).split(",") if t.strip()]
    if not task_names:
        raise ValueError("`task` must contain at least one RoboTwin task name.")

    unknown = [t for t in task_names if t not in ROBOTWIN_TASKS]
    if unknown:
        raise ValueError(f"Unknown RoboTwin tasks: {unknown}. Available tasks: {sorted(ROBOTWIN_TASKS)}")

    logger.info(
        "Creating RoboTwin envs | tasks=%s | n_envs(per task)=%d",
        task_names,
        n_envs,
    )

    is_async = env_cls is gym.vector.AsyncVectorEnv
    cached_obs_space: spaces.Space | None = None
    cached_act_space: spaces.Space | None = None
    cached_metadata: dict[str, Any] | None = None

    out: dict[str, dict[int, Any]] = defaultdict(dict)
    for task_name in task_names:
        fns = _make_env_fns(
            task_name=task_name,
            n_envs=n_envs,
            camera_names=list(camera_names),
            observation_height=observation_height,
            observation_width=observation_width,
            episode_length=episode_length,
        )
        if is_async:
            lazy = _LazyAsyncVectorEnv(fns, cached_obs_space, cached_act_space, cached_metadata)
            if cached_obs_space is None:
                cached_obs_space = lazy.observation_space
                cached_act_space = lazy.action_space
                cached_metadata = lazy.metadata
            out[task_name][0] = lazy
        else:
            out[task_name][0] = env_cls(fns)
        logger.info("Built vec env | task=%s | n_envs=%d", task_name, n_envs)

    return {k: dict(v) for k, v in out.items()}