ydy0615/lerobot-clone

mirror of https://github.com/huggingface/lerobot.git synced 2026-06-04 21:01:26 +00:00

Commit Graph

Author	SHA1	Message	Date

Pepijn

eba3ab3741

annotate: address review feedback — bug fixes, docs/code drift, naming, cleanup

Bugs
  * validator: don't re-raise on unknown style. The second column_for_style
    lookup (used to route persistent vs event) now sits in try/except so an
    unknown style is recorded by _check_column_routing and skipped instead
    of crashing the whole validation pass.
  * general_vqa._target_cameras: when restrict_to_default_camera is set but
    the configured camera_key isn't one the provider exposes, warn and fall
    back to all cameras instead of returning a phantom key that KeyErrors
    deep in frame decode.
  * interjections: clamp interjection timestamps to frame_timestamps[0]
    rather than a hardcoded 0.0 (datasets can start at non-zero t).
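
The camera fallback and timestamp clamp described above can be sketched
roughly as follows. This is a minimal illustration, not the code from the
commit: target_cameras and clamp_interjection_timestamp are hypothetical
stand-ins, their signatures are assumptions, and the clamp is assumed to be
a lower bound at the first frame timestamp.

```python
import logging
from typing import Sequence

logger = logging.getLogger(__name__)


def target_cameras(
    available: Sequence[str],
    camera_key: str,
    restrict_to_default_camera: bool,
) -> list[str]:
    """Hypothetical stand-in for general_vqa._target_cameras."""
    if not restrict_to_default_camera:
        return list(available)
    if camera_key in available:
        return [camera_key]
    # Configured key is not exposed by the provider: warn and use every
    # camera rather than return a key that fails later in frame decode.
    logger.warning(
        "camera_key %r not in %s; falling back to all cameras",
        camera_key,
        list(available),
    )
    return list(available)


def clamp_interjection_timestamp(t: float, frame_timestamps: Sequence[float]) -> float:
    """Assumed lower-bound clamp to the episode's first frame timestamp."""
    return max(t, frame_timestamps[0])
```

With a dataset whose first frame is at t=1.5, a candidate timestamp of 0.0
is moved to 1.5 instead of pointing before the first frame.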

Docs / code drift
  * annotation_pipeline.mdx: drop the phantom 'vocabulary discovery / phase
    0 / --vocabulary.* / canonical_vocabulary.json' section (none of it
    exists); describe the real describe->segment + coverage-stitch flow.
    Soften the src/lerobot/tools/ + TOOL_REGISTRY reference to 'not part of
    this PR' (matches tools.mdx, which already marks the runtime layer as
    not-yet-implemented). Fix the --push_to_hub/--new_repo_id wording. Note
    the default is now a single h200. Add a 'Contributing new modules'
    section inviting module / prompt / quality contributions.
  * executor docstring: six phases, no phantom phase 0.

run_hf_job.py
  * add the Apache 2.0 license header (was flagged repeatedly).
  * default to a single GPU: flavor=h200, parallel_servers=1, num_gpus=1
    (scale to h200x4 noted in the docstring).
  * pin the install to @main instead of the feature branch (won't break
    after merge).

Naming / cleanup
  * rename dest_repo_id -> new_repo_id across config / script / example /
    test to match the LeRobot dataset edit tools.
  * rename prompt templates module_N_*.txt -> descriptive (plan_*,
    interjections_*, vqa.txt) and update every load_prompt() call.
  * remove dead _messages_to_prompt (used only by the removed in-process
    backends).
  * declare _warned_decode_fail (frames) and _warned_no_camera (vqa) as
    real init=False dataclass fields instead of getattr monkey-patches.
  * scope bandit B607 to the two ffmpeg subprocess.run sites via
    '# nosec B607' and drop it from the global skip list.
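
The init=False field change above can be illustrated with a small sketch.
FrameProvider, camera_key and note_decode_failure are hypothetical names
chosen for the example; only the _warned_decode_fail field name comes from
the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class FrameProvider:  # hypothetical class name, for illustration only
    camera_key: str = "top"
    # Declared field: excluded from __init__ and repr, but always present
    # on the instance, so no getattr(self, "...", False) fallback is needed.
    _warned_decode_fail: bool = field(default=False, init=False, repr=False)

    def note_decode_failure(self) -> bool:
        """Return True only on the first failure, so the caller warns once."""
        if self._warned_decode_fail:
            return False
        self._warned_decode_fail = True
        return True
```

Because the flag is a declared field, type checkers see it, and passing it
to the constructor raises TypeError instead of silently setting state.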

Tests
  * fix stale canned-VLM markers ('ONE realistic interruption' ->
    'compact interjection', 'Update the memory' -> 'compressed semantic
    memory') and drop the dead 'concise hierarchical PLAN' plan responders
    (plan generation is deterministic now) in run_e2e_smoke,
    test_pipeline_recipe_render, test_modules.
  * run_e2e_smoke now asserts interjection + speech rows are produced so a
    stale marker can't silently pass again.
  * drop remaining 'PR 1' / 'PR 2' references from test comments / names.

Verified: tests/annotations + tests/datasets/test_language +
tests/scripts/test_lerobot_annotate (31 passed); make-style E2E smoke
(interjections=1 speech_atoms=2); pre-commit (ruff, mypy, bandit,
prettier) clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-06-03 18:30:46 +02:00

Pepijn

b9246ef61b

tests(annotations): guard on the 'dataset' extra so base fast-test tier skips cleanly

Fast Pytest Tests failed at COLLECTION in the base '--extra test' tier
with 'ModuleNotFoundError: No module named datasets':
tests/annotations/conftest.py imported the fixture dataset builder
(-> lerobot.datasets -> the HF 'datasets' lib + pandas/pyarrow), which
only ship under the 'dataset' extra, so the whole annotations package
crashed.

Fix uses the repo's proven module-level guard pattern (see
tests/datasets/test_language.py), NOT a conftest-level importorskip —
verified empirically that pytest.importorskip raised during conftest
*import* is treated as a collection ERROR (exit 1), while module-level
importorskip is a clean SKIP.
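
A small sketch of the guard behaviour, assuming pytest is installed.
guard_outcome is a hypothetical helper written for this example; it only
reports whether pytest.importorskip would skip or let the module proceed
in the current environment.

```python
import pytest


def guard_outcome(module_name: str) -> str:
    """Report what pytest.importorskip(module_name) does in this environment."""
    try:
        pytest.importorskip(module_name)
    except pytest.skip.Exception:
        # importorskip signals a skip by raising pytest's Skipped outcome.
        return "skip"
    return "run"


# In a real test module the guards sit at the top level, before the imports
# that need the optional extra, for example:
#
#     pytest.importorskip("datasets")
#     pytest.importorskip("pandas")
#     import pyarrow  # noqa: E402
```

Placed at the top of a test module, the guard makes pytest report that
module as skipped when the optional dependency is missing.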

  * conftest.py: import build_annotation_dataset LAZILY inside the
    fixtures so the conftest itself imports cleanly in every tier.
  * test_modules / test_validator / test_writer /
    test_pipeline_recipe_render: add module-level
    pytest.importorskip('datasets') + ('pandas') before the pyarrow /
    lerobot.* imports (# noqa: E402 to match the existing convention).
    pyarrow-importing modules place the guard before the pyarrow import.
  * tests/scripts/test_lerobot_annotate.py: same guard (its _push_to_hub
    path imports lerobot.datasets).

Result:
  - base / hardware / viz tiers (no dataset extra): annotation tests
    skip cleanly; the rest of the suite runs -> exit 0.
  - dataset tier: datasets present -> guards pass through -> annotation
    tests run with the stub VLM. The pipeline modules import only
    stdlib + relative + lerobot.datasets (no module-level datatrove /
    vllm / openai), so they import fine there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-06-03 15:57:04 +02:00

pepijn

2ea0da2d9f

fix(annotate): tag uploaded dataset revision

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-19 12:44:35 +00:00

3 Commits