lerobot-clone

mirror of https://github.com/huggingface/lerobot.git synced 2026-06-04 12:51:27 +00:00

Author	SHA1	Message	Date
Pepijn	c5042a6850	fix(annotate): stop action records + augmentation from corrupting RoboCasa labels Three compounding bugs made RoboCasa annotation produce off-task subtasks ('move stove to stove with left arm') and drifting augmentations ('wander around the kitchen' for 'Navigate to the stove'). 1. action_records.replace_subtask_text now defaults False. Overwriting the VLM's subtask text with a reconstruction of hallucinated {verb,object,arm,grasp,dest} fields is high-risk: navigation / non-manipulation tasks don't fit the schema and render to nonsense. Records are now additive by default (emit_record_row), never silently replacing subtask text. Flip replace_subtask_text on only for manipulation datasets verified to render cleanly. 2. _render_action_record_to_subtask_text drops a degenerate destination that just echoes the object (verb=move object=stove destination=stove -> 'move stove' instead of 'move stove to stove'). Also routes 'navigate' through the 'to <dest>' preposition family. 3. module_1_task_aug_axes.txt hardened: variants MUST preserve the goal/destination. Explicitly forbids 'Navigate to the stove' -> 'wander around the kitchen'. Only wording / arm / orientation / grasp may vary; verb meaning, object, and destination are fixed. examples/annotations/run_hf_job.py — corrected for RoboCasa: * derive_task_from_video=off (was =always). The dataset task string is authoritative and is what eval conditions on; =always threw it away, re-derived a hallucinated task from the video, and poisoned every downstream subtask/plan row. THIS was the dominant cause. * n_task_rephrasings=0 + task_aug_axes left off — RoboCasa eval uses exact task strings, so augmentation is unused/harmful. * action_records left off — manipulation schema doesn't fit atomic / navigation tasks. * plan_max_steps=6 to keep atomic-task decomposition tight. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-02 14:34:48 +02:00
Pepijn	98a519e7f2	fix(annotate): default frame provider to video keys, not image keys VideoFrameProvider derived its default camera and camera list from meta.camera_keys, which mixes image- and video-stored cameras. The clip/decode paths read videos/<key>/from_timestamp, which only exists for video keys, so an image-stored camera sorted first (e.g. observation.images.wrist) crashed the plan phase with a KeyError. Restrict the list and default to meta.video_keys. Add a regression test and point the example job at the dataset's actual video camera. Skip bandit B607 (ffmpeg/git are intentionally resolved via PATH). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-02 12:09:55 +02:00
Pepijn	5dbf0fac5f	annotations(steerable): remove Phase 0 canonical vocabulary discovery Drops the optional Phase 0 vocabulary-discovery feature entirely. With the new structured action records (Phase 1a + 1b) providing cross-episode consistency via the deterministic template renderer, the older vocabulary-constraint path is redundant and adds a second constraint mechanism that wasn't well-validated in practice. Removed: * src/lerobot/annotations/steerable_pipeline/vocabulary.py (Vocabulary dataclass + VocabularyDiscoveryModule + load_/ save_vocabulary helpers; canonical_vocabulary.json on-disk format) * src/lerobot/annotations/steerable_pipeline/prompts/module_0_vocabulary.txt (Phase 0 VLM prompt) * tests/annotations/test_vocabulary.py Pruned wiring across: * config.py: VocabularyConfig dataclass + AnnotationPipelineConfig. vocabulary field * executor.py: vocabulary attribute on Executor + _run_vocabulary_ phase method + Phase 0 phases.append call in run() * modules/plan_subtasks_memory.py: Vocabulary import + vocabulary attribute + _subtask_vocabulary_block / _memory_vocabulary_block helpers + _canonicalize_subtask / _normalize / _invalid_subtasks / _build_subtask_retry_message methods + vocabulary-gated retry path in _generate_subtasks + empty-episode warning + _NORMALIZE_ STRIP_TOKENS constant * prompts/module_1_subtasks.txt: {vocabulary_block} placeholder * prompts/module_1_memory.txt: {vocabulary_block} placeholder * __init__.py: Vocabulary / VocabularyDiscoveryModule / load_ vocabulary / save_vocabulary / vocabulary_path / VOCABULARY_ FILENAME re-exports * scripts/lerobot_annotate.py: VocabularyDiscoveryModule import + instantiation + executor argument * examples/annotations/run_hf_job.py: --vocabulary.enabled=false flag + docstring references + inline phase-0 comment The original free-form rephrasings path stays (PlanConfig. n_task_rephrasings still works when task_aug_axes.enabled=False). Action records remain the preferred mechanism for cross-episode subtask consistency. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-02 11:48:27 +02:00
pepijn	920c6ef5a2	docs(annotate): disable phase-0 vocabulary discovery by default in run_hf_job Heterogeneous datasets (different tasks/scenes across episodes) don't share a single small subtask + memory vocabulary, so the canonical vocabulary phase narrowed every episode to the wrong target distribution. Flip the example to free-form generation by default and document the ``--vocabulary.enabled=true`` switch for homogeneous datasets where the canonical vocabulary still helps the downstream policy. No pipeline-code changes: ``VocabularyConfig.enabled`` already gates phase 0 (see ``executor.py:_run_vocabulary_phase`` and ``VocabularyConfig`` docstring) and falls back to free-form generation. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-26 04:42:10 +00:00
pepijn	54221ceea2	feat(annotate): let the VLM decide vocabulary size Hardcoding ``n_subtask_target=10`` and ``n_memory_target=6`` baked task complexity into the config — a simple pick-and-place needs ~6, a multi-step recipe needs ~20. The VLM already sees the clips, so let it pick the count itself from what's recurring across episodes. Drop both knobs from ``VocabularyConfig`` and the ``module_0_vocabulary`` prompt template. The prompt now says "decide the count yourself based on what you see — the smallest set that still covers every recurring phase" and adds an "each label must recur across the demos" rule so the VLM filters out one-off motions. Update the launcher script + docs to remove the old knobs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-22 11:46:31 +00:00
pepijn	369ab17110	fix(annotate): update run_hf_job CLI args for renamed namespaces + phase 0 Three stale things in the launcher script: - ``--module_1/2/3.*`` no longer exist; review commit `fd18beb` renamed the CLI namespaces to ``--plan/interjections/vqa``. Forwarded all eight existing args to their new names. - ``--push_to_hub`` is now a bool; the destination repo lives at ``--dest_repo_id``. Split the single positional into both args. - ``openai`` was missing from the pip install list, which the prior review review (claude bot, 2026-05-08) flagged — the default vlm backend is ``openai`` so the job would have ImportError'd. Added. Also expose the new phase 0 (canonical vocabulary discovery) knobs explicitly: ``--vocabulary.sample_episodes``, ``--n_subtask_target``, ``--n_memory_target``. Defaults are sane (3 / 10 / 6) but worth flagging in the example so the operator knows what they're running. Update the docstring + section comments to match the current phase layout (vocabulary → plan → interjections → vqa → writer). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-22 11:43:06 +00:00
Pepijn	c5676ef1b3	feat(annotate): add dest_repo_id for separate push target Adds an optional `dest_repo_id` to AnnotationPipelineConfig. When set, `push_to_hub` uploads the annotated dataset there instead of overwriting the source `repo_id`, restoring separate source/destination repos. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:05:23 +02:00
Pepijn Kooijmans	fd18beb3a1	review: address CarolinePascal feedback - name the three modules everywhere (plan / interjections / vqa) instead of module_1/2/3 — config classes, config fields, executor params, staging keys and phase names now carry the module name - rename examples/annotation -> examples/annotations; add the Apache header to run_hf_job.py - drop the unused GeneralVqaModule._generate_one - remove "PR 1" references from comments/docstrings - frames.py: rely on the always-defined LeRobotDatasetMetadata.camera_keys - executor.py: read/write meta/info.json via load_info / write_info - reader.py: load meta/tasks.parquet via io_utils.load_tasks - make --push_to_hub a bool; push the annotated dataset back to --repo_id - move the on-disk test dataset builder into tests/fixtures (build_annotation_dataset); run_e2e_smoke reuses it - clarify in the docs that the vqa module grounds each pair on a single frame (K = per-tick anchor count) - hoist stdlib dynamic imports to module scope Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 12:03:25 +02:00

8 Commits