mirror of
https://github.com/huggingface/lerobot.git
synced 2026-06-01 11:21:27 +00:00
* feat(envs): add LIBERO-plus robustness benchmark integration
- LiberoPlusEnv config (subclass of LiberoEnv, same gym interface)
- Docker image installing LIBERO-plus fork via PYTHONPATH
- CI workflow: 1-episode smoke eval with pepijn223/smolvla_libero_plus
- pyproject.toml: libero_plus extra
* fix(libero): use suite's perturbation-aware init_states loader
LIBERO-plus's Benchmark class exposes a `get_task_init_states(i)` method that
strips perturbation suffixes (`_table_N`, `_tb_N`, `_view_`, `_language_`,
`_light_`, `_add_`, `_level`) and loads the underlying base `.pruned_init`
file — the on-disk name for a perturbation variant doesn't exist as a file,
only the base does. lerobot's loader was bypassing that logic and trying to
read the suffix-bearing filename directly, which failed for every non-zero
task id and killed the eval before any rollout video could be written.
Delegate to the suite's method when it exists; fall back to the path-based
loader for vanilla LIBERO (which does not provide the method).
Also drop the hf-libero install + init_files copy from the LIBERO-plus
Dockerfile — the LIBERO-plus clone already ships both `bddl_files/` and
`init_files/` for all five suites, so the copy was unnecessary and the
`cp -r` into an existing dir produced a confusing nested layout.
* fix(libero): resolve LIBERO-plus perturbation init_states path ourselves
Delegating to `task_suite.get_task_init_states(i)` works for path resolution
but LIBERO-plus's method calls `torch.load(path)` without `weights_only=False`,
which fails on PyTorch 2.6+ because the pickled init_states contains numpy
objects not in the default allowlist:
_pickle.UnpicklingError: Weights only load failed.
WeightsUnpickler error: Unsupported global:
GLOBAL numpy.core.multiarray._reconstruct was not an allowed global.
Mirror LIBERO-plus's suffix-stripping logic (`_table_N`, `_tb_N`, `_view_`,
`_language_`, `_light_`, `_add_`, `_level`) in our own helper so we can pass
`weights_only=False` ourselves. Vanilla LIBERO task names don't contain any
of these patterns except for `_table_` when followed by the word `center`
(e.g. `pick_up_the_black_bowl_from_table_center_...`), and the regex
requires `_table_\d+` so semantic uses are preserved.
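The suffix handling described above can be sketched as follows. This is a minimal illustration, not the helper from this commit: the names `base_init_name` and `init_states_filename` are hypothetical, and it assumes that everything from the first perturbation marker onward is metadata that can be dropped to recover the base name.

```python
import re

# Hypothetical pattern: a perturbation marker plus everything after it.
# `_table_` and `_tb_` must be followed by digits, so semantic uses such as
# `..._from_table_center_...` are left alone.
_PERTURBATION_SUFFIX_RE = re.compile(
    r"_(?:table_\d+|tb_\d+|view_|language_|light_|add_|level).*$"
)


def base_init_name(task_name: str) -> str:
    """Return the task name with any perturbation suffix removed."""
    return _PERTURBATION_SUFFIX_RE.sub("", task_name)


def init_states_filename(task_name: str) -> str:
    """Return the base `.pruned_init` filename for a (possibly perturbed) task."""
    return f"{base_init_name(task_name)}.pruned_init"
```

With the resolved path in hand, the caller can load the file itself and pass `weights_only=False` to `torch.load`, which is the point of not delegating to the suite's method.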
* fix(libero-plus): download perturbation assets from Sylvest/LIBERO-plus
LIBERO-plus's bddl_base_domain.py resolves scene XMLs with
`os.path.join(DIR_PATH, "../assets")`, so the `assets` key in config.yaml
has no effect on scene lookup — MuJoCo always opens
`<clone>/libero/libero/assets/scenes/...`. With no such directory present,
every perturbation task fails on:
FileNotFoundError: No such file or directory:
.../libero-plus/libero/libero/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml
These textures, views, and extra objects ship only in the 6.4 GB `assets.zip`
published at `Sylvest/LIBERO-plus` (the LIBERO-plus README explicitly says
to download and unzip it into the package dir). Fetch it via `hf_hub_download`,
unzip into `${LIBERO_PLUS_ROOT}/`, install `unzip`, and point config.yaml at
the extracted dir so everything stays consistent. The download lives in its
own Docker layer so subsequent rebuilds reuse the cached assets.
Drops the lerobot/libero-assets snapshot_download — that mirror only has
vanilla LIBERO textures and is ignored for scene loading anyway.
* fix(libero-plus): flatten deep path prefix from Sylvest/LIBERO-plus assets.zip
The 6.4 GB zip ships with every entry prefixed by
`inspire/hdd/project/embodied-multimodality/public/syfei/libero_new/release/dataset/LIBERO-plus-0/assets/...`
(the author's internal filesystem layout, not the layout the LIBERO-plus
README promises), so the previous `unzip -d ${LIBERO_PLUS_ROOT}/` created
`${LIBERO_PLUS_ROOT}/inspire/.../assets/` — robosuite still opened
`${LIBERO_PLUS_ROOT}/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml`
and hit the same FileNotFoundError.
Extract to a scratch dir, then `mv` the nested `assets/` subtree to the
expected location. Verified the target file exists in the zip central
directory under that exact prefix.
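The extract-then-move step can be illustrated in Python. The Dockerfile itself uses `unzip` and `mv`; the function below is only a sketch of the same idea, and the name `extract_flattened_assets` and its behaviour of picking the shallowest `assets/` directory are assumptions made for the example.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_flattened_assets(zip_path: Path, target_root: Path) -> Path:
    """Extract `zip_path` to a scratch dir and move its nested `assets/` to `target_root/assets`."""
    target = target_root / "assets"
    if target.exists():
        # Moving into an existing directory would nest the tree one level deeper.
        raise FileExistsError(f"{target} already exists")
    with tempfile.TemporaryDirectory() as scratch:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(scratch)
        # Pick the shallowest directory named `assets`, whatever prefix the archive used.
        candidates = sorted(
            (p for p in Path(scratch).rglob("assets") if p.is_dir()),
            key=lambda p: len(p.parts),
        )
        if not candidates:
            raise FileNotFoundError(f"no assets/ directory found in {zip_path}")
        target_root.mkdir(parents=True, exist_ok=True)
        shutil.move(str(candidates[0]), str(target))
    return target
```

Searching for the directory instead of hardcoding the long prefix keeps the step working if the archive layout changes, which matches the later switch to a dynamic `find` in the Dockerfile.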
* refactor(libero): inline init_states resolver behind single regex
Collapse the three-style suffix stripper (split/re.sub/in) into one
compiled regex, drop the (Path, bool) tuple return, and move the
`_add_`/`_level` reshape branch into the caller so each branch loads
its own file and returns directly. Net: -11 lines, one fewer helper.
* refactor(libero-plus): rebase docker image on huggingface/lerobot-gpu
Mirror the libero/metaworld/robomme pattern: start from the nightly GPU
image (apt deps, python, uv, venv, lerobot[all] already there) and only
layer on what LIBERO-plus uniquely needs — its wand/ImageMagick build
deps, the non-extra runtime pips (robosuite==1.4.1, bddl, …), the
PYTHONPATH-shadowed fork, and the 6.4 GB assets.zip.
Drops ~50 lines of duplicated base setup (CUDA FROM, apt python, uv
install, user creation, venv init) the nightly already provides.
123 → 73 lines.
Also:
- Add libero_plus to docs/source/_toctree.yml under Benchmarks so
doc-builder's TOC integrity check stops failing.
- Repoint the docs dataset link from pepijn223/libero_plus_lerobot to
the canonical lerobot/libero_plus.
- Revert the stray uv.lock churn (revision/marker diff that crept in
from an unrelated resolve — unrelated to LIBERO-plus).
* fix(libero-plus): stop touching pyproject + uv.lock
The fast-tests job was rejecting the branch because pyproject.toml had a
[libero_plus] extra whose git dep wasn't represented in uv.lock.
The Docker image no longer needs the extra — it clones LIBERO-plus
directly and PYTHONPATH-shadows hf-libero. Drop [libero_plus] from
pyproject and restore pyproject.toml + uv.lock to exactly what's on
origin/main, so `uv sync --locked --extra test` is a no-op for this PR.
Also repoint the doc/CI/env comments that still mentioned the extra at
the Docker install path.
* fix(libero-plus): strip perturbation metadata from task descriptions
LIBERO-plus builds task.language by space-joining the perturbation-variant
filename, so every non-_language_ variant inherits a trailing blob like
"view 0 0 100 0 0 initstate 0 noise 45" or "add 16". That shows up in the
dashboard video labels and no longer matches the base instruction stored
in the training dataset.
Strip those tokens in extract_task_descriptions.py with an end-anchored
regex over the {view,initstate,noise,add,tb,table,light,level}(+digits)
vocabulary. The anchor preserves mid-sentence literal uses of those words
(e.g. "from table center and place it on the plate") — only the trailing
metadata chain is removed. _language_ variants carry real BDDL-sourced
text and are left untouched.
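The effect of the end anchor can be checked in isolation. The pattern below is copied from `scripts/ci/extract_task_descriptions.py` in this commit; the wrapper name `strip_perturbation_tail` and the sample instructions are chosen for the example.

```python
import re

# Same pattern as in extract_task_descriptions.py: one or more
# "<keyword> <digits...>" groups, anchored at the end of the string.
TAIL_RE = re.compile(
    r"(?:\s(?:view|initstate|noise|add|tb|table|light|level)(?:\s\d+)+)+$"
)


def strip_perturbation_tail(instruction: str) -> str:
    """Remove a trailing perturbation-metadata chain, if present."""
    return TAIL_RE.sub("", instruction).strip()


base = "pick up the black bowl from table center and place it on the plate"
print(strip_perturbation_tail(base + " view 0 0 100 0 0 initstate 0 noise 45"))
print(strip_perturbation_tail(base + " add 16"))
print(strip_perturbation_tail(base))
```

All three calls print the base instruction: the mid-sentence "table center" is not followed by digits, so it never starts a match, and only the trailing chain is removed.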
* ci: point benchmark eval checkpoints at the lerobot/ org mirrors
pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in
this branch (libero, metaworld, and the per-branch benchmark). The
checkpoints were mirrored into the lerobot/ org and that's the canonical
location going forward.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: integrate PR #3313 review feedback
- docs: fix paper link to arxiv, add benchmark image, add suite descriptions,
add LIBERO-plus replacement warning, restructure eval section to match
LIBERO doc style, fix policy I/O section, remove false try/except claim
- docker: fix shell grouping for hf-libero uninstall, replace hardcoded
asset path with dynamic find
- ci: add Docker Hub login step, add HF_USER_TOKEN guard on eval step
- envs: add is_libero_plus param to get_task_init_states so vanilla LIBERO
always takes the simple path
* fix(docs): use correct LIBERO-plus teaser image URL
* ci(libero-plus): drop redundant hf auth login step
The standalone login step ran `hf auth login` in a throwaway
`docker run --rm` container, so no credentials persisted. Auth is
already performed inside the eval step's container. Removing the
redundant step per PR #3313 review feedback.
* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs
Port of #3416 onto this branch. Without these attributes eval crashes
when calling `env.unwrapped.metadata["render_fps"]` with async vector
envs. Adds `metadata` / `unwrapped` to `_LazyAsyncVectorEnv` and
caches the metadata alongside obs/action spaces in the LIBERO and
MetaWorld factories.
* ci: gate Docker Hub login on secret availability
Fork PRs cannot access `secrets.DOCKERHUB_LEROBOT_{USERNAME,PASSWORD}`,
which made every benchmark job fail at the login step before any of
the actual build/eval work could run. Gate the login on the env-var
expansion of the username so the step is skipped (not failed) when
secrets are absent. Mirrors the existing pattern in the VLABench job.
* Update .github/workflows/benchmark_tests.yml
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* Update scripts/ci/extract_task_descriptions.py
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* Update .github/workflows/benchmark_tests.yml
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* Update docker/Dockerfile.benchmark.libero_plus
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* Update .github/workflows/benchmark_tests.yml
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* fix(libero-plus): address review feedback
* ci(libero-plus): fix YAML indentation in upload-artifact steps
The `uses:` key on two upload-artifact steps was at column 0 instead
of nested under the step, causing `pre-commit run check-yaml` to fail
with "expected <block end>, but found '<block mapping start>'".
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
191 lines
7.7 KiB
Python
#!/usr/bin/env python3
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Extract natural-language task descriptions for a benchmark suite.

Runs inside the benchmark Docker container (where the env library is installed)
immediately after lerobot-eval, writing a JSON file that parse_eval_metrics.py
picks up and embeds in metrics.json.

Output format: {"<suite>_<task_idx>": "<nl instruction>", ...}

Usage:
    python scripts/ci/extract_task_descriptions.py \\
        --env libero --task libero_spatial \\
        --output /tmp/eval-artifacts/task_descriptions.json
"""

from __future__ import annotations

import argparse
import json
import re
import sys
from pathlib import Path

# LIBERO-plus derives task.language by space-joining the perturbation-variant
# filename (grab_language_from_filename in libero/libero/benchmark/__init__.py),
# so non-_language_ variants inherit a trailing metadata blob like
# "view 0 0 100 0 0 initstate 0 noise 45" or "add 16". Strip those tokens so
# the description matches the base instruction used in the training dataset.
_LIBERO_PERTURBATION_TAIL_RE = re.compile(
    r"(?:\s(?:view|initstate|noise|add|tb|table|light|level)(?:\s\d+)+)+$"
)


def _strip_libero_perturbation_tail(instruction: str) -> str:
    return _LIBERO_PERTURBATION_TAIL_RE.sub("", instruction).strip()


def _libero_descriptions(task_suite: str) -> dict[str, str]:
    from libero.libero import benchmark  # type: ignore[import-untyped]

    suite_dict = benchmark.get_benchmark_dict()
    if task_suite not in suite_dict:
        print(
            f"[extract_task_descriptions] Unknown LIBERO suite '{task_suite}'. "
            f"Available: {list(suite_dict.keys())}",
            file=sys.stderr,
        )
        return {}
    suite = suite_dict[task_suite]()
    return {
        f"{task_suite}_{i}": _strip_libero_perturbation_tail(suite.get_task(i).language)
        for i in range(suite.n_tasks)
    }


def _metaworld_descriptions(task_name: str) -> dict[str, str]:
    # MetaWorld tasks don't expose a separate NL description attribute;
    # use a cleaned version of the task name as the description.
    label = task_name.removeprefix("metaworld-").replace("-", " ").strip()
    return {f"{task_name}_0": label}


def _robotwin_descriptions(task_names: str) -> dict[str, str]:
    """Return descriptions for each requested RoboTwin task. Reads
    `description/task_instruction/<task>.json` from the RoboTwin clone
    (cwd is /opt/robotwin in CI). Falls back to the task name if missing."""
    out: dict[str, str] = {}
    root = Path("description/task_instruction")
    for name in (t.strip() for t in task_names.split(",") if t.strip()):
        desc_file = root / f"{name}.json"
        desc = name.replace("_", " ")
        if desc_file.is_file():
            data = json.loads(desc_file.read_text())
            full = data.get("full_description") or desc
            # Strip the schema placeholders ({A}, {a}); keep the sentence readable.
            desc = full.replace("<", "").replace(">", "")
        out[f"{name}_0"] = desc
    return out


def _robocasa_descriptions(task_spec: str) -> dict[str, str]:
    """For each task in the comma-separated list, emit a cleaned-name label.

    RoboCasa episodes carry their language instruction in the env's
    `ep_meta['lang']`, populated per reset. Pulling it requires spinning
    up the full kitchen env per task (~seconds each); we use the task
    name as the key here and let the eval's episode info carry the
    actual instruction.
    """
    out: dict[str, str] = {}
    for task in (t.strip() for t in task_spec.split(",") if t.strip()):
        # Split CamelCase into words: "CloseFridge" → "close fridge".
        label = "".join(f" {c.lower()}" if c.isupper() else c for c in task).strip()
        out[f"{task}_0"] = label or task
    return out


_ROBOMME_DESCRIPTIONS = {
    "BinFill": "Fill the target bin with the correct number of cubes",
    "PickXtimes": "Pick the indicated cube the specified number of times",
    "SwingXtimes": "Swing the object the specified number of times",
    "StopCube": "Grasp and stop the moving cube",
    "VideoUnmask": "Pick the cube shown in the reference video",
    "VideoUnmaskSwap": "Pick the cube matching the reference video after a swap",
    "ButtonUnmask": "Press the button indicated by the reference",
    "ButtonUnmaskSwap": "Press the correct button after objects are swapped",
    "PickHighlight": "Pick the highlighted cube",
    "VideoRepick": "Repick the cube shown in the reference video",
    "VideoPlaceButton": "Place the cube on the button shown in the video",
    "VideoPlaceOrder": "Place cubes in the order shown in the video",
    "MoveCube": "Move the cube to the target location",
    "InsertPeg": "Insert the peg into the target hole",
    "PatternLock": "Unlock the pattern by pressing buttons in sequence",
    "RouteStick": "Route the stick through the required waypoints",
}


def _robomme_descriptions(task_names: str, task_ids: list[int] | None = None) -> dict[str, str]:
    """Return descriptions for each requested RoboMME task. Keys match the
    video filename pattern `<task>_<task_id>` used by the eval script."""
    if task_ids is None:
        task_ids = [0]
    out: dict[str, str] = {}
    for name in (t.strip() for t in task_names.split(",") if t.strip()):
        desc = _ROBOMME_DESCRIPTIONS.get(name, name)
        for tid in task_ids:
            out[f"{name}_{tid}"] = desc
    return out


def main() -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--env", required=True, help="Environment family (libero, metaworld, ...)")
    parser.add_argument("--task", required=True, help="Task/suite name (e.g. libero_spatial)")
    parser.add_argument(
        "--task-ids",
        type=str,
        default=None,
        help="Comma-separated task IDs (e.g. '0,1,2'). Default: [0]",
    )
    parser.add_argument("--output", required=True, help="Path to write task_descriptions.json")
    args = parser.parse_args()

    task_ids: list[int] | None = None
    if args.task_ids:
        task_ids = [int(x.strip()) for x in args.task_ids.split(",")]

    descriptions: dict[str, str] = {}
    try:
        # Membership test: comparing the string to a tuple with `==` is always False.
        if args.env in ("libero", "libero_plus"):
            descriptions = _libero_descriptions(args.task)
        elif args.env == "metaworld":
            descriptions = _metaworld_descriptions(args.task)
        elif args.env == "robotwin":
            descriptions = _robotwin_descriptions(args.task)
        elif args.env == "robocasa":
            descriptions = _robocasa_descriptions(args.task)
        elif args.env == "robomme":
            descriptions = _robomme_descriptions(args.task, task_ids=task_ids)
        else:
            print(
                f"[extract_task_descriptions] No description extractor for env '{args.env}'.",
                file=sys.stderr,
            )
    except Exception as exc:
        print(f"[extract_task_descriptions] Warning: {exc}", file=sys.stderr)

    out_path = Path(args.output)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(descriptions, indent=2))
    print(f"[extract_task_descriptions] {len(descriptions)} descriptions → {out_path}")
    return 0


if __name__ == "__main__":
    sys.exit(main())