mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-30 10:21:24 +00:00
* feat(envs): add RoboMME benchmark integration

- RoboMME env wrapper with image/wrist_image/state observations
- Docker image with Vulkan, SAPIEN, mani-skill deps
- CI workflow: 1-episode smoke eval with pepijn223/smolvla_robomme
- preprocess_observation: handle image/wrist_image/state keys
- pyproject.toml: robomme extra

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(docker): rebase RoboMME image on huggingface/lerobot-gpu

Mirror the libero/metaworld pattern: start from the nightly GPU image (which already has apt deps, uv, venv, and lerobot[all] preinstalled) and only layer on what RoboMME uniquely needs — the Vulkan libs ManiSkill/SAPIEN requires, plus the robomme extra with the gymnasium/numpy overrides.

Drops 48 lines of duplicated base setup (CUDA FROM, python install, user creation, venv init, base apt deps) that the nightly image already provides. Net: 102 → 54 lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(robomme): drop prototype-branch note and move dataset to lerobot/robomme

- Remove the "Related work" block referencing the prototype branch feat/robomme-integration; the PR stands on its own.
- Point all dataset references at lerobot/robomme (docs, env module docstring, RoboMMEEnvConfig docstring) — this is the canonical HF location once the dataset is mirrored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(robomme): make docs build + fast tests green

1. Docs: add robomme to _toctree.yml under Benchmarks so doc-builder's TOC integrity check stops rejecting the new page.
2. Fast tests: robomme's mani-skill transitively pins numpy<2 which is unsatisfiable against the project's numpy>=2 base pin, so `uv sync` couldn't resolve a universal lockfile. Drop robomme as a pyproject extra entirely — it truly cannot coexist with the rest of the dep tree. The Dockerfile installs robomme directly from its git URL via `uv pip install --override`, which was already the runtime path. pyproject, docs, env docstrings, and the CI job comment all now point to the docker-only install.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(robomme): realign unit tests with current env API

The tests were written against an earlier env layout and never updated when the wrapper was refactored, so CI's fast-test job was failing with:

- KeyError: 'front_rgb' / 'wrist_rgb' — these were renamed to the lerobot-canonical 'image' / 'wrist_image' keys (matching the dataset columns and preprocess_observation's built-in fallbacks).
- AssertionError: 'robomme' not in result — create_robomme_envs now returns {task_name: {task_id: env}}, not {'robomme': {...}}, so comma-separated task lists work.
- ModuleNotFoundError: lerobot.envs.lazy_vec_env — LazyVectorEnv was removed; create_robomme_envs is straightforward synchronous now.

Rewrite the 7 failing cases against the current API, drop the three LazyVectorEnv tests, and add a multi-task test so the new comma-separated task parsing is covered. Stub install/teardown is moved into helpers (`_install_robomme_stub` / `_uninstall_robomme_stub`) so individual tests stop repeating six boilerplate lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci: point benchmark eval checkpoints at the lerobot/ org mirrors

pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in this branch (libero, metaworld, and the per-branch benchmark). The checkpoints were mirrored into the lerobot/ org and that's the canonical location going forward.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: integrate PR #3311 review feedback

- envs: rename obs keys to pixels/image, pixels/wrist_image, agent_pos
- envs: add __post_init__ for dynamic action_dim in RoboMMEEnv config
- envs: remove special-case obs conversion in utils.py (no longer needed)
- ci: add Docker Hub login, HF_USER_TOKEN guard, --env.task_ids=[0]
- scripts: extract_task_descriptions supports multiple task_ids
- docs: title to # RoboMME, add image, restructure eval section
- tests: update all key assertions to match new obs naming

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): use correct RoboMME teaser image URL

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci(robomme): smoke-eval 10 tasks instead of 5

Broader coverage on the RoboMME benchmark CI job: bump the smoke eval from 5 tasks to 10 (one episode each), all drawn from ROBOMME_TASKS. Tasks now run: PickXtimes, BinFill, StopCube, MoveCube, InsertPeg, SwingXtimes, VideoUnmask, ButtonUnmask, PickHighlight, PatternLock.

Updated the parse_eval_metrics.py `--task` label from the single `PickXtimes` stub to the full comma list so the metrics artifact reflects what was actually run. `parse_eval_metrics.py` already reads `overall` for multi-task runs, so no parser change is needed.

Made-with: Cursor

* fix(robomme): nest `pixels` as a dict so preprocess_observation picks it up

`_convert_obs` was returning flat keys (`pixels/image`, `pixels/wrist_image`). `preprocess_observation()` in envs/utils.py keys off the top-level `"pixels"` entry and, not finding it, silently dropped every image from the batch. The policy then saw zero image features and raised ValueError: All image features are missing from the batch.

Match the LIBERO layout: return `{"pixels": {"image": ..., "wrist_image": ...}, "agent_pos": ...}` and declare the same shape in `observation_space`.

Made-with: Cursor

* fix(robomme): align docs and tests with nested pixels obs layout

Addresses PR #3311 review feedback:

- Docs: correct observation keys to `pixels/image` / `pixels/wrist_image` (mapped to `observation.images.image` / `observation.images.wrist_image`) and drop the now-obsolete column-rename snippet.
- Tests: assert `result["pixels"]["image"]` instead of flat `pixels/image`, matching the nested layout required by `preprocess_observation()`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs

Port of #3416 onto this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: gate Docker Hub login on secret availability

Fork PRs cannot access `secrets.DOCKERHUB_LEROBOT_{USERNAME,PASSWORD}`, which made every benchmark job fail at the login step. Gate the login on the env-var expansion of the username so the step is skipped (not failed) when secrets are absent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(robomme): address review feedback

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
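The nested `pixels` layout described in the `fix(robomme): nest pixels` entry above can be illustrated with a small sketch. The `collect_images` helper is hypothetical: it only mimics the behaviour the commit message attributes to `preprocess_observation()` (looking up a top-level `"pixels"` entry) and is not the lerobot implementation. The frame and state values are placeholders.

```python
from __future__ import annotations


def collect_images(obs: dict) -> dict:
    """Hypothetical stand-in: map images found under a top-level "pixels" dict
    to `observation.images.<name>` feature names."""
    pixels = obs.get("pixels")
    if not isinstance(pixels, dict):
        # Flat keys such as "pixels/image" are not found here, so no image is kept.
        return {}
    return {f"observation.images.{name}": img for name, img in pixels.items()}


frame = [[0, 0, 0]]  # placeholder for an image array
state = [0.0] * 8    # placeholder for the agent state vector

# Flat layout (before the fix) versus nested layout (after the fix).
flat_obs = {"pixels/image": frame, "pixels/wrist_image": frame, "agent_pos": state}
nested_obs = {"pixels": {"image": frame, "wrist_image": frame}, "agent_pos": state}

flat_features = collect_images(flat_obs)
nested_features = collect_images(nested_obs)
print(sorted(flat_features))
print(sorted(nested_features))
```

Under this assumption the flat observation yields no image features at all, which is consistent with the "All image features are missing from the batch" error quoted in the commit message.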
174 lines
7.0 KiB
Python
#!/usr/bin/env python3
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Extract natural-language task descriptions for a benchmark suite.

Runs inside the benchmark Docker container (where the env library is installed)
immediately after lerobot-eval, writing a JSON file that parse_eval_metrics.py
picks up and embeds in metrics.json.

Output format: {"<suite>_<task_idx>": "<nl instruction>", ...}

Usage:
    python scripts/ci/extract_task_descriptions.py \\
        --env libero --task libero_spatial \\
        --output /tmp/eval-artifacts/task_descriptions.json
"""

from __future__ import annotations

import argparse
import json
import sys
from pathlib import Path


def _libero_descriptions(task_suite: str) -> dict[str, str]:
    from libero.libero import benchmark  # type: ignore[import-untyped]

    suite_dict = benchmark.get_benchmark_dict()
    if task_suite not in suite_dict:
        print(
            f"[extract_task_descriptions] Unknown LIBERO suite '{task_suite}'. "
            f"Available: {list(suite_dict.keys())}",
            file=sys.stderr,
        )
        return {}
    suite = suite_dict[task_suite]()
    return {f"{task_suite}_{i}": suite.get_task(i).language for i in range(suite.n_tasks)}


def _metaworld_descriptions(task_name: str) -> dict[str, str]:
    # MetaWorld tasks don't expose a separate NL description attribute;
    # use a cleaned version of the task name as the description.
    label = task_name.removeprefix("metaworld-").replace("-", " ").strip()
    return {f"{task_name}_0": label}


def _robotwin_descriptions(task_names: str) -> dict[str, str]:
    """Return descriptions for each requested RoboTwin task. Reads
    `description/task_instruction/<task>.json` from the RoboTwin clone
    (cwd is /opt/robotwin in CI). Falls back to the task name if missing."""
    out: dict[str, str] = {}
    root = Path("description/task_instruction")
    for name in (t.strip() for t in task_names.split(",") if t.strip()):
        desc_file = root / f"{name}.json"
        desc = name.replace("_", " ")
        if desc_file.is_file():
            data = json.loads(desc_file.read_text())
            full = data.get("full_description") or desc
            # Strip the schema placeholders ({A}, {a}) — keep the sentence readable.
            desc = full.replace("<", "").replace(">", "")
        out[f"{name}_0"] = desc
    return out


def _robocasa_descriptions(task_spec: str) -> dict[str, str]:
    """For each task in the comma-separated list, emit a cleaned-name label.

    RoboCasa episodes carry their language instruction in the env's
    `ep_meta['lang']`, populated per reset. Pulling it requires spinning
    up the full kitchen env per task (~seconds each); we use the task
    name as the key here and let the eval's episode info carry the
    actual instruction.
    """
    out: dict[str, str] = {}
    for task in (t.strip() for t in task_spec.split(",") if t.strip()):
        # Split CamelCase into words: "CloseFridge" → "close fridge".
        label = "".join(f" {c.lower()}" if c.isupper() else c for c in task).strip()
        out[f"{task}_0"] = label or task
    return out


_ROBOMME_DESCRIPTIONS = {
    "BinFill": "Fill the target bin with the correct number of cubes",
    "PickXtimes": "Pick the indicated cube the specified number of times",
    "SwingXtimes": "Swing the object the specified number of times",
    "StopCube": "Grasp and stop the moving cube",
    "VideoUnmask": "Pick the cube shown in the reference video",
    "VideoUnmaskSwap": "Pick the cube matching the reference video after a swap",
    "ButtonUnmask": "Press the button indicated by the reference",
    "ButtonUnmaskSwap": "Press the correct button after objects are swapped",
    "PickHighlight": "Pick the highlighted cube",
    "VideoRepick": "Repick the cube shown in the reference video",
    "VideoPlaceButton": "Place the cube on the button shown in the video",
    "VideoPlaceOrder": "Place cubes in the order shown in the video",
    "MoveCube": "Move the cube to the target location",
    "InsertPeg": "Insert the peg into the target hole",
    "PatternLock": "Unlock the pattern by pressing buttons in sequence",
    "RouteStick": "Route the stick through the required waypoints",
}


def _robomme_descriptions(task_names: str, task_ids: list[int] | None = None) -> dict[str, str]:
    """Return descriptions for each requested RoboMME task. Keys match the
    video filename pattern `<task>_<task_id>` used by the eval script."""
    if task_ids is None:
        task_ids = [0]
    out: dict[str, str] = {}
    for name in (t.strip() for t in task_names.split(",") if t.strip()):
        desc = _ROBOMME_DESCRIPTIONS.get(name, name)
        for tid in task_ids:
            out[f"{name}_{tid}"] = desc
    return out


def main() -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--env", required=True, help="Environment family (libero, metaworld, ...)")
    parser.add_argument("--task", required=True, help="Task/suite name (e.g. libero_spatial)")
    parser.add_argument(
        "--task-ids",
        type=str,
        default=None,
        help="Comma-separated task IDs (e.g. '0,1,2'). Default: [0]",
    )
    parser.add_argument("--output", required=True, help="Path to write task_descriptions.json")
    args = parser.parse_args()

    task_ids: list[int] | None = None
    if args.task_ids:
        task_ids = [int(x.strip()) for x in args.task_ids.split(",")]

    descriptions: dict[str, str] = {}
    try:
        if args.env == "libero":
            descriptions = _libero_descriptions(args.task)
        elif args.env == "metaworld":
            descriptions = _metaworld_descriptions(args.task)
        elif args.env == "robotwin":
            descriptions = _robotwin_descriptions(args.task)
        elif args.env == "robocasa":
            descriptions = _robocasa_descriptions(args.task)
        elif args.env == "robomme":
            descriptions = _robomme_descriptions(args.task, task_ids=task_ids)
        else:
            print(
                f"[extract_task_descriptions] No description extractor for env '{args.env}'.",
                file=sys.stderr,
            )
    except Exception as exc:
        print(f"[extract_task_descriptions] Warning: {exc}", file=sys.stderr)

    out_path = Path(args.output)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(descriptions, indent=2))
    print(f"[extract_task_descriptions] {len(descriptions)} descriptions → {out_path}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
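
As a sketch of the `<task>_<task_id>` key scheme used for RoboMME above, the snippet below rebuilds the keys for a comma-separated task list and a `--task-ids` style string. `build_keys` is a hypothetical helper written only for illustration; it mirrors the parsing in `_robomme_descriptions` and `main` but is not part of the script, and unlike `main` it skips empty ID tokens instead of raising.

```python
from __future__ import annotations


def build_keys(task_names: str, task_ids: str | None = None) -> list[str]:
    """Hypothetical helper: expand task names and IDs into `<task>_<task_id>` keys."""
    # Default to task ID 0 when no IDs are given, as the script does.
    ids = [int(x) for x in task_ids.split(",") if x.strip()] if task_ids else [0]
    names = [t.strip() for t in task_names.split(",") if t.strip()]
    # One key per (task, id) pair, tasks in the order given.
    return [f"{name}_{tid}" for name in names for tid in ids]


keys_default = build_keys("PickXtimes")
keys_multi = build_keys("PickXtimes, BinFill", "0,1")
print(keys_default)
print(keys_multi)
```

Each task name is paired with every requested ID, so two tasks and two IDs give four keys, all sharing the per-task description.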