lerobot-clone

mirror of https://github.com/huggingface/lerobot.git synced 2026-06-02 20:01:25 +00:00

Author	SHA1	Message	Date
Pepijn	cd128cbbd5	annotate: add verb-scoped disambiguation rules to subtask prompt Adopt the one prompt technique Scale's dense-captioning study found reliably positive: targeted, verb-scoped, visually-grounded disambiguation rules. Their lesson was that such a rule must fire ONLY on the spatial situation it names (their narrow 'Stack vs Put' rule helped; an over-broad directional 'Scoop' rule bled into other verbs and hurt), so each rule here is phrased visually and scoped to one confusable pair: * stack-vs-put (on top of an object vs on a surface) * insert-vs-put (fitted slot vs surface) * pick-up/retrieve-vs-put (decide by which way the OBJECT moves: gripper closes + object moves with hand = pick up; gripper opens + object stays = put — directly targets Scale's dominant direction-flip failure) * pour-vs-put (tilt + flow vs untilted move) This is the highest-confidence, lowest-risk change from the Scale findings; our pipeline already aligns with their 'avoid' list (no temporal tokens, no overlays, no fancy sampling, no sequential context injection, uniform sampling, describe-don't-predict framing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-02 16:10:49 +02:00
Pepijn	1fb46ab300	annotate: cap embedded-frame budget to fit VLM context (fix 32k overflow) Switching the plan module to embedded frames (use_video_url=false) exposed a context overflow: at frames_per_second=2.0 with the old max_video_frames=128 default, a 480x640 episode embeds ~128 frames ≈ 33-39k vision tokens, over the model's 32768 context — every plan call died with 'Input length exceeds maximum context length' (HTTP 400), crashing the whole annotation job. The video_url path never hit this because the server downsampled; the embedded path sends every sampled frame, so the frame count is a hard token budget. Fix: * config default max_video_frames 128 -> 32 (~8-10k vision tokens, comfortable headroom for the prompt + describe/verify passes). Frames are still sampled UNIFORMLY across the whole episode, so longer episodes are subsampled, not truncated — full temporal coverage preserved, just coarser density. * run_hf_job.py: frames_per_second 2.0 -> 1.0, explicit --plan.max_video_frames=32, with a comment explaining the token budget and the 'do not raise toward 128 with embedded frames' rule. Only the plan module embeds the full episode; VQA (1 frame/tick) and interjections (4-frame window) were never at risk. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-02 16:02:25 +02:00
Pepijn	79f9a84407	annotate: make full-episode subtask coverage unconditional Remove the subtask_full_coverage config flag. Stitching subtask spans into a contiguous full-episode cover is now always applied in _generate_subtasks — a sparse / gap-ridden subtask timeline is never desirable for conditioning, so there's no reason to make it optional. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-02 15:36:23 +02:00
Pepijn	799d0e3bcc	annotate: stitch subtasks to full-episode coverage The verify pass prunes subtasks, which could leave the first subtask starting after t0 or leave gaps between spans, so the subtask timeline no longer tiled the episode and frames fell through with no active subtask label. New deterministic post-step (no VLM call), default on via PlanConfig.subtask_full_coverage: * first subtask start pulled back to the episode's first frame t0 (idle / approach before the first labelled action folds into it) * each subtask end snapped to the next subtask start (gaps closed) * last subtask end extended to the last frame t_last Runs after segment + verify in _generate_subtasks. Starts other than the first are left as the VLM/verify produced them (already frame-snapped + distinct), so the cover is contiguous and non-overlapping. Disable with --plan.subtask_full_coverage=false if a consumer wants sparse subtasks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-02 15:34:34 +02:00
Pepijn	1fe1463ae0	annotate: enable subtask describe->segment->verify chain by default Flip PlanConfig.subtask_describe_first and subtask_verify defaults False -> True. Every subtask annotation now runs the 3-call grounding + pruning chain by default, since the single-call path reliably hallucinates steps from the task text. Costs 2 extra VLM calls/episode; disable with --plan.subtask_describe_first=false / --plan.subtask_verify=false on easy datasets where fewer calls matter more than label fidelity. run_hf_job.py: drop the now-redundant explicit flags, leave a note that the chain is default-on and how to opt out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-02 15:13:50 +02:00
Pepijn	dcd368e1f8	annotate: multi-call subtask quality chain (describe -> segment -> verify) The single-call 'watch video -> emit subtask JSON' pattern makes the VLM commit to structured output before reasoning about what it saw, so it pattern-matches the task text and hallucinates steps. Split it into an opt-in multi-call chain that grounds first and prunes last. New PlanConfig flags (both default False -> single-call unchanged): * subtask_describe_first: a grounding pass narrates ONLY what is visible in the video (no subtask JSON yet). That description is injected into the segmentation prompt via a new {observation_block} placeholder, so the model segments its own grounded observations instead of the instruction text. +1 VLM call/episode. * subtask_verify: after segmentation, an adversarial pass re-watches the video and drops any candidate subtask it cannot see. Can only PRUNE (never add/rewrite/move) and fails open (keeps un-verified spans if the call returns nothing). +1 VLM call/episode. Implementation: * _generate_subtasks now orchestrates describe -> segment -> verify. * Factored span cleaning into _clean_spans (shared by segment + verify outputs); added _describe_episode and _verify_subtasks helpers. * New prompts module_1_subtask_describe.txt (returns {description}) and module_1_subtask_verify.txt (returns pruned {subtasks}). * module_1_subtasks.txt gains a {observation_block} slot at the top. run_hf_job.py enables both for the RoboCasa run (3 VLM calls/episode for subtasks). Combined with single-camera grounding + the embedded-frame path, this is the high-quality configuration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-02 15:12:46 +02:00
Pepijn	ba5d4c5cd8	annotate: kill subtask hallucination + single-camera grounding Two fixes for 'subtasks describe actions not in the video' plus a way to focus the whole pipeline on one camera. ANTI-HALLUCINATION 1. _episode_video_block: when use_video_url is set but clip extraction fails, FALL BACK to embedded frames instead of returning an empty block. An empty block left the VLM with zero visual grounding, so it invented subtasks from the task text alone — the likely root cause of hallucinated steps. Now logs a warning and embeds frames. 2. module_1_subtasks.txt gains a GROUNDING preamble (overrides all other rules): label only motion visible in specific frames; never invent/anticipate/pad; max_steps is a CEILING not a target; atomic demos may be exactly ONE subtask; the VIDEO is ground truth, not the instruction text. SINGLE-CAMERA GROUNDING * New VqaConfig.restrict_to_default_camera (default False). When True, the VQA module grounds on only the --vlm.camera_key stream instead of iterating every camera — matching the plan / interjection modules, which already use that single camera. Now the whole pipeline can focus on one view (e.g. observation.images.base). run_hf_job.py updated: * use_video_url=false + frames_per_second=2.0 — embed frames directly (most reliable; no silent text-only failure mode) with dense grounding. * vqa.restrict_to_default_camera=true — VQA on the single camera too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-02 15:08:25 +02:00
Pepijn	7454b4c993	annotate: remove action-record subtask-text replacement entirely Drops the replace_subtask_text option and the _render_action_record_to_subtask_text renderer. Action records are now strictly additive: when action_records.enabled=True the module emits style='action_record' rows (the typed {verb,object,arm,grasp,dest, mistake} schema) and NEVER rewrites the subtask text the policy conditions on. The render-back-to-text path was the source of corrupted subtasks (navigation tasks produced 'move stove to stove', manipulation tasks got spurious 'with left arm using pinch grip' suffixes). Reconstructing natural-language subtasks from hallucinated structured fields is inherently fragile, so the capability is removed rather than guarded. Removed: * ActionRecordsConfig.replace_subtask_text field * PlanSubtasksMemoryModule._render_action_record_to_subtask_text * the span['text'] = canonical_text overwrite in run_episode Updated docstrings + run_hf_job.py comment accordingly. emit_record_row (default True) is now the feature's only output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-02 14:42:36 +02:00
Pepijn	c5042a6850	fix(annotate): stop action records + augmentation from corrupting RoboCasa labels Three compounding bugs made RoboCasa annotation produce off-task subtasks ('move stove to stove with left arm') and drifting augmentations ('wander around the kitchen' for 'Navigate to the stove'). 1. action_records.replace_subtask_text now defaults False. Overwriting the VLM's subtask text with a reconstruction of hallucinated {verb,object,arm,grasp,dest} fields is high-risk: navigation / non-manipulation tasks don't fit the schema and render to nonsense. Records are now additive by default (emit_record_row), never silently replacing subtask text. Flip replace_subtask_text on only for manipulation datasets verified to render cleanly. 2. _render_action_record_to_subtask_text drops a degenerate destination that just echoes the object (verb=move object=stove destination=stove -> 'move stove' instead of 'move stove to stove'). Also routes 'navigate' through the 'to <dest>' preposition family. 3. module_1_task_aug_axes.txt hardened: variants MUST preserve the goal/destination. Explicitly forbids 'Navigate to the stove' -> 'wander around the kitchen'. Only wording / arm / orientation / grasp may vary; verb meaning, object, and destination are fixed. examples/annotations/run_hf_job.py — corrected for RoboCasa: * derive_task_from_video=off (was =always). The dataset task string is authoritative and is what eval conditions on; =always threw it away, re-derived a hallucinated task from the video, and poisoned every downstream subtask/plan row. THIS was the dominant cause. * n_task_rephrasings=0 + task_aug_axes left off — RoboCasa eval uses exact task strings, so augmentation is unused/harmful. * action_records left off — manipulation schema doesn't fit atomic / navigation tasks. * plan_max_steps=6 to keep atomic-task decomposition tight. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-02 14:34:48 +02:00
Pepijn	98a519e7f2	fix(annotate): default frame provider to video keys, not image keys VideoFrameProvider derived its default camera and camera list from meta.camera_keys, which mixes image- and video-stored cameras. The clip/decode paths read videos/<key>/from_timestamp, which only exists for video keys, so an image-stored camera sorted first (e.g. observation.images.wrist) crashed the plan phase with a KeyError. Restrict the list and default to meta.video_keys. Add a regression test and point the example job at the dataset's actual video camera. Skip bandit B607 (ffmpeg/git are intentionally resolved via PATH). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-02 12:09:55 +02:00
Pepijn	5dbf0fac5f	annotations(steerable): remove Phase 0 canonical vocabulary discovery Drops the optional Phase 0 vocabulary-discovery feature entirely. With the new structured action records (Phase 1a + 1b) providing cross-episode consistency via the deterministic template renderer, the older vocabulary-constraint path is redundant and adds a second constraint mechanism that wasn't well-validated in practice. Removed: * src/lerobot/annotations/steerable_pipeline/vocabulary.py (Vocabulary dataclass + VocabularyDiscoveryModule + load_/save_vocabulary helpers; canonical_vocabulary.json on-disk format) * src/lerobot/annotations/steerable_pipeline/prompts/module_0_vocabulary.txt (Phase 0 VLM prompt) * tests/annotations/test_vocabulary.py Pruned wiring across: * config.py: VocabularyConfig dataclass + AnnotationPipelineConfig.vocabulary field * executor.py: vocabulary attribute on Executor + _run_vocabulary_phase method + Phase 0 phases.append call in run() * modules/plan_subtasks_memory.py: Vocabulary import + vocabulary attribute + _subtask_vocabulary_block / _memory_vocabulary_block helpers + _canonicalize_subtask / _normalize / _invalid_subtasks / _build_subtask_retry_message methods + vocabulary-gated retry path in _generate_subtasks + empty-episode warning + _NORMALIZE_STRIP_TOKENS constant * prompts/module_1_subtasks.txt: {vocabulary_block} placeholder * prompts/module_1_memory.txt: {vocabulary_block} placeholder * __init__.py: Vocabulary / VocabularyDiscoveryModule / load_vocabulary / save_vocabulary / vocabulary_path / VOCABULARY_FILENAME re-exports * scripts/lerobot_annotate.py: VocabularyDiscoveryModule import + instantiation + executor argument * examples/annotations/run_hf_job.py: --vocabulary.enabled=false flag + docstring references + inline phase-0 comment The original free-form rephrasings path stays (PlanConfig.n_task_rephrasings still works when task_aug_axes.enabled=False). 
Action records remain the preferred mechanism for cross-episode subtask consistency. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-02 11:48:27 +02:00
Pepijn	2bfaf44db2	annotations(steerable): structured action records + 5-axis task augmentation EgoMimic-inspired additions to the plan module, both opt-in for back-compat. 1. PHASE 1a + 1b: per-subtask structured action records * cfg.action_records.enabled=True triggers, after Phase 1 subtask-span generation, one extra VLM call per subtask to extract a typed record: {verb, object, arm, grasp_type, destination, mistake} * A deterministic Python template (_render_action_record_to_subtask_text) renders the record back to canonical subtask text. When replace_subtask_text=True (default), this REPLACES the VLM's free-form text, eliminating cross-episode phrasing drift. * When emit_record_row=True (default), the structured record is also emitted as a row with style='action_record' (added to PERSISTENT_STYLES) so downstream training can consume the typed schema directly. * Verb + grasp vocabularies are configurable. Out-of-vocab values are rejected at extraction time. 2. STRUCTURED 5-AXIS TASK AUGMENTATION * cfg.task_aug_axes.enabled=True replaces the free-form n_task_rephrasings path with a structured prompt producing variants along 5 named axes: synonym_paraphrase (3) omit_arm (3) omit_orientation (2) omit_grasp_method (2) combined_omissions (2) Total ~12 variants. Axes with nothing to omit emit fewer entries. * Each variant is emitted as a task_aug row at t=0 (existing style). Inspired by https://github.com/GaTech-RL2/EgoVerse/tree/main/egomimic/scripts/language_process: they pay Scale AI annotators to fill a structured form and then generate language via a deterministic prompt. We get the same hallucination-reducing structure via one extra VLM call per subtask. 
Files: src/lerobot/datasets/language.py src/lerobot/annotations/steerable_pipeline/config.py src/lerobot/annotations/steerable_pipeline/modules/plan_subtasks_memory.py src/lerobot/annotations/steerable_pipeline/prompts/module_1_action_record.txt src/lerobot/annotations/steerable_pipeline/prompts/module_1_task_aug_axes.txt Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-02 11:35:35 +02:00
pepijn	1e7c0d6aa1	annotate(plan): force composite-action subtasks; ban ultra-fine splits Tighten ``module_1_subtasks.txt`` so the VLM emits one composite atomic action per subtask instead of decomposing every pick into ``move to X`` / ``grasp X`` / ``lift X``: - Lock the verb vocabulary to the composite set the low-level policy actually learns end-to-end: ``pick up`` (approach + grasp + lift), ``put``/``place`` (transport + release), ``push``, ``pull``, ``turn``, ``press``, ``open``, ``close``, ``pour``, ``insert``. ``go to`` is allowed only as a pure relocation between phases. - Add an explicit ``Forbidden ultra-fine splits`` block enumerating the patterns the VLM was tempted to emit (``move to X``, ``reach for X``, ``grasp X``, ``lift X``, ``release X``) and instructing it to fold each into its parent composite. - Rewrite the Good/Bad examples to match the composite contract; the previous ``"move to blue cube" / "grasp blue cube" / "lift blue cube"`` Good list was actively encouraging the over-segmentation pattern this prompt is supposed to prevent. - Tighten the duration rule: candidates shorter than ``min_subtask_seconds`` must be merged into a neighbour rather than emitted. Pairs with bumping the runtime floor to 3 s so composites have room to land. Pure prompt change: no code or schema change. Existing canonical-vocabulary retry path is unaffected (the new verb whitelist lives in prose, not in the validator). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-26 05:14:30 +00:00
pepijn	920c6ef5a2	docs(annotate): disable phase-0 vocabulary discovery by default in run_hf_job Heterogeneous datasets (different tasks/scenes across episodes) don't share a single small subtask + memory vocabulary, so the canonical vocabulary phase narrowed every episode to the wrong target distribution. Flip the example to free-form generation by default and document the ``--vocabulary.enabled=true`` switch for homogeneous datasets where the canonical vocabulary still helps the downstream policy. No pipeline-code changes: ``VocabularyConfig.enabled`` already gates phase 0 (see ``executor.py:_run_vocabulary_phase`` and ``VocabularyConfig`` docstring) and falls back to free-form generation. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-26 04:42:10 +00:00
Pepijn	c37b1fc7d0	Merge origin/feat/language-annotation-pipeline (8 fix(annotate) commits + vocabulary phase)	2026-05-25 15:47:25 +02:00
Pepijn	9020635b14	Merge branch 'main' into feat/language-annotation-pipeline Resolves conflicts from 32 commits on main: * docs/source/_toctree.yml — keep both new toc entries (annotation_pipeline + video_encoding_parameters). * docs/source/language_and_recipes.mdx — adopt main's section ordering (Layer 2 before "Temporal semantics") and float32 timestamp dtype to match the codebase. * src/lerobot/configs/__init__.py — keep both export sets (recipe + video encoder). * src/lerobot/datasets/dataset_metadata.py — drop redundant lazy imports (top-level imports cover both LANGUAGE_COLUMNS and DEFAULT_TOOLS); adopt main's @tools.setter for info.json write-back. * src/lerobot/datasets/feature_utils.py — call the real validate_feature_language() instead of returning "". * src/lerobot/datasets/language.py — float32 timestamps to match pa.float32() used in video_utils.py and the rest of the codebase. * src/lerobot/datasets/language_render.py — adopt main's unwrap_scalar() helper (drops two hand-rolled .item()/list unwrappers); float32 in docstring. * src/lerobot/processor/render_messages_processor.py — drop PR-local _scalar() helper, use shared unwrap_scalar(). * tests/datasets/test_language.py — adopt main's new float32 dtype + validate_feature_language warning tests. * tests/datasets/test_dataset_metadata.py — adopt main's new tools.setter persist/clear tests. * uv.lock — regenerated cleanly from main's resolver. 90 of 92 touched tests pass. Two pre-existing test failures (test_module1_plan_memory_subtask_smoke, test_module2_mid_episode_emits_paired_interjection_and_speech in tests/annotations/test_modules.py) are unrelated to this merge — that test file doesn't exist on main, so the failures originate on the branch and are addressed by the 8 newer fix(annotate) commits already on origin that will land in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 15:46:32 +02:00
pepijn	471b2b1b1d	fix(annotate): bump same-frame subtasks onto distinct frames If two consecutive VLM-emitted subtask spans have ``start`` timestamps that round to the same source frame after ``snap_to_frame`` (e.g. on short episodes the VLM sometimes nominates two ~adjacent action boundaries within one 30 Hz step), the writer emits two ``style=subtask`` rows at the identical persistent timestamp. The training-time renderer's default binding ``subtask: active_at(t, style=subtask)`` then raises: ValueError: Ambiguous resolver for style='subtask'; add role=..., tool_name=..., or camera=... to disambiguate. … and the whole training run dies on the first batch. Observed concretely on ``pepijn223/super_poulain_vocab2`` (job 22159979): episodes 3 and 30 each had two subtask rows at the same timestamp (``release yellow cube`` + ``retract arm`` snapping to the same frame). Add ``_dedupe_starts_to_distinct_frames`` to walk the cleaned span list and, whenever a snapped start collides with one already used, push the later span onto the next free frame timestamp. Both subtasks survive on distinct timestamps; the renderer can now disambiguate. If the episode genuinely has no later free frame (extremely unlikely — would require a same-timestamp collision on the very last frame of the episode), the later span is dropped with a warning rather than left to poison the render. New test ``test_plan_module_bumps_collocated_subtasks_to_distinct_frames`` locks in the contract; full vocabulary suite is 14/14 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-23 19:31:44 +00:00
pepijn	a15e16c072	fix(annotate): replace fuzzy subtask snapping with strict match + one-shot retry The Jaccard-overlap snap was warping VLM output into wrong canonical labels, e.g. an off-vocab "consult the wizard" span would silently become "grasp blue cube" if that scored highest. Even with a higher floor the operator can't tell which subtasks were paraphrases vs genuine mislabels in the resulting dataset. Replace with strict exact-match validation + a single targeted retry: 1. Generate subtasks as before. 2. If any returned subtask's normalised form (lowercased, articles stripped, whitespace collapsed) isn't in the canonical vocab, fire one retry call naming the offending strings and re-sending the full canonical list. The retry prompt requires byte-identical output from the vocab. 3. After the retry, validate again. Spans still off-vocab are dropped; no fuzzy snapping ever produces a different canonical label than the VLM actually emitted. 4. If every span ends up off-vocab even after the retry, warn loudly so the operator extends ``meta/canonical_vocabulary.json`` to cover the missing phase. The episode is left with empty subtasks rather than silently fabricated ones: visibility > sweep-under-the-rug. Promote ``_NORMALIZE_STRIP_TOKENS`` to a class constant and split the normalisation helper out so the retry-validation and the final canonicalisation share one source of truth. Tests: - test_plan_module_accepts_article_only_difference: "grasp the blue cube" still maps to canonical "grasp blue cube" (article-tolerant). - test_plan_module_retries_when_subtask_off_vocab: paraphrase triggers the retry which the VLM corrects in pass 2. - test_plan_module_drops_off_vocab_subtask_after_retry: VLM that refuses to correct → bad span dropped, in-vocab span kept. - test_plan_module_empty_when_all_off_vocab_after_retry: every span off-vocab → episode left empty (no warping). All 13 vocabulary tests pass. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-23 09:57:27 +00:00
pepijn	336af85c09	fix(annotate): never leave an episode with zero canonical subtasks When the canonical vocabulary is enabled and the VLM produces spans that don't overlap any canonical label, the previous Jaccard-floor (0.5) dropped them and the episode came out with no subtasks at all — invisible to the downstream policy. Observed on ``pepijn223/super_poulain_vocab``: some episodes had empty subtask columns because every VLM-emitted phrase scored below 0.5 against the discovered vocabulary. Two-pass canonicalisation: - First pass keeps the Jaccard floor (lowered from 0.5 → 0.25, to let mild paraphrases through) and drops everything below. - If that first pass leaves the episode with zero subtasks, fall back to a second pass that always snaps each VLM span to its nearest canonical label by Jaccard (no floor). The episode ends up with subtasks even when the vocabulary missed a phase — a slightly-wrong canonical label is still closer to the right motion than nothing at all. - Log loudly when the fallback fires so the operator can spot coverage gaps in ``meta/canonical_vocabulary.json``. - Log a per-episode count at INFO when some (but not all) spans were dropped so it's visible without spamming the run output. Promote the Jaccard floor + ignore-tokens to class constants so they're a single edit point. Add ``force=True`` parameter to ``_canonicalize_subtask`` for the no-floor fallback path. New test ``test_plan_module_snaps_when_all_off_vocab`` covers the fallback; existing ``test_plan_module_drops_off_vocab_subtask`` is adjusted to keep at least one in-vocab span so the floor path can still fire and is exercised. All 12 vocabulary tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-22 12:44:03 +00:00
pepijn	54221ceea2	feat(annotate): let the VLM decide vocabulary size Hardcoding ``n_subtask_target=10`` and ``n_memory_target=6`` baked task complexity into the config — a simple pick-and-place needs ~6, a multi-step recipe needs ~20. The VLM already sees the clips, so let it pick the count itself from what's recurring across episodes. Drop both knobs from ``VocabularyConfig`` and the ``module_0_vocabulary`` prompt template. The prompt now says "decide the count yourself based on what you see — the smallest set that still covers every recurring phase" and adds an "each label must recur across the demos" rule so the VLM filters out one-off motions. Update the launcher script + docs to remove the old knobs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-22 11:46:31 +00:00
pepijn	369ab17110	fix(annotate): update run_hf_job CLI args for renamed namespaces + phase 0 Three stale things in the launcher script: - ``--module_1/2/3.*`` no longer exist; review commit `fd18beb` renamed the CLI namespaces to ``--plan/interjections/vqa``. Forwarded all eight existing args to their new names. - ``--push_to_hub`` is now a bool; the destination repo lives at ``--dest_repo_id``. Split the single positional into both args. - ``openai`` was missing from the pip install list, which the prior review (claude bot, 2026-05-08) flagged: the default vlm backend is ``openai``, so the job would have ImportError'd. Added. Also expose the new phase 0 (canonical vocabulary discovery) knobs explicitly: ``--vocabulary.sample_episodes``, ``--n_subtask_target``, ``--n_memory_target``. Defaults are sane (3 / 10 / 6) but worth flagging in the example so the operator knows what they're running. Update the docstring + section comments to match the current phase layout (vocabulary → plan → interjections → vqa → writer). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-22 11:43:06 +00:00
pepijn	86a7edc590	feat(annotate): phase 0: derive canonical vocabulary from sample episodes The pipeline previously emitted near-unique subtask + memory phrasings per episode (free-form LLM rephrasing). On the downstream low-level policy that collapses the action expert's conditioning to noise: every episode pairs a different paraphrase with similar motions, so the expert learns a flat scene-prior that ignores the subtask string. Then at inference the high-level head invents yet another paraphrase and the expert produces tiny "uncertain hover" chunks. Add a vocabulary-discovery phase (phase 0) that runs once per dataset: - watches the first ``vocabulary.sample_episodes`` (default 3) episode videos as one Qwen-VL prompt, - asks the VLM to derive ~``n_subtask_target`` canonical imperative subtask labels and ~``n_memory_target`` first-person past-tense memory milestones that recur across the demos, - persists them to ``meta/canonical_vocabulary.json`` (human-inspectable, hand-editable), and - wires the resulting ``Vocabulary`` into the ``plan`` module so every per-episode subtask + memory call is constrained to those exact strings (both as prompt-side instructions and post-VLM validation: paraphrases snap to the closest canonical entry via token-set overlap; below a 0.5 Jaccard floor the subtask is dropped rather than warped into something semantically wrong). Operator workflow: - first run discovers the vocabulary, writes the JSON, and runs the ``plan`` module against it, - subsequent runs reuse the on-disk file (``reuse_existing=True`` default) so hand-edits stick, - set ``--vocabulary.enabled=False`` to fall back to free-form generation (the original behaviour). The discovery prompt forbids gerunds / third-person / adverbs and caps the lists to the requested counts, matching the Hi-Robot / π0.6-MEM convention of small per-environment vocabularies. 
The ``plan`` module's subtask + memory prompts grow a conditional ``{vocabulary_block}`` slot rendered only when a vocabulary is present; without one the templates collapse to their previous free-form form. Tests: 11 new unit tests under tests/annotations/test_vocabulary.py cover the on-disk round-trip, discovery against the fixture dataset, ``reuse_existing`` short-circuit, paraphrase canonicalisation, off-vocab subtask dropping, and the no-vocabulary pass-through path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-22 11:40:05 +00:00
Pepijn	8194897994	fix(deps): cap placo below 0.9.16 and harden kinematics import (#3647) * fix(deps): cap placo below 0.9.16 and harden kinematics import placo 0.9.16 links against liburdfdom_sensor.so.4, which is unavailable on Ubuntu 24.04 (noble ships urdfdom 3.x). Importing placo on that base crashes with: ImportError: liburdfdom_sensor.so.4.0: cannot open shared object file This broke nightly Latest Deps tests (CPU and GPU) when the lockfile upgrade picked placo 0.9.16, since lerobot.model.kinematics unconditionally imports placo when _placo_available is true, and that check (importlib.util.find_spec) cannot detect dlopen failures of transitive shared libraries, so unrelated subsystems (RL actor, gym_manipulator) became unimportable. Two changes: 1. Pin placo to <0.9.16 in pyproject.toml + regenerate uv.lock (0.9.16 → 0.9.15). Short-term unblock for nightly CI until system urdfdom 4.x is broadly available. 2. Harden the import guard in src/lerobot/model/kinematics.py: wrap 'import placo' in try/except ImportError so a missing transitive .so no longer crashes module import. RobotKinematics instantiation now raises an informative ImportError citing the underlying dlopen failure via _raise_if_placo_unusable(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(kinematics): hoist _placo_runtime_error to module scope for mypy Mypy walks the TYPE_CHECKING branch in which the runtime else-block is not executed, so _placo_runtime_error was only defined at runtime and mypy reported 'Name "_placo_runtime_error" is not defined' on the three references inside _raise_if_placo_unusable. Declare the symbol unconditionally at module scope with a default of None; the runtime import-failure branch still assigns to it. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * style(kinematics): drop verbose comments around placo import guard Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 12:03:07 +02:00
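The import-guard pattern from this commit can be sketched roughly as below. The helper names `optional_import` and `raise_if_unusable` are hypothetical and chosen for illustration; they are not the functions in `src/lerobot/model/kinematics.py`. The point is that `importlib.util.find_spec` only confirms a package is installed, while an actual import attempt also surfaces a transitive shared library that cannot be loaded.

```python
import importlib


def optional_import(name):
    """Return (module, None) on success, or (None, error) if the import fails.

    Hypothetical helper. An ImportError here also covers the case where the
    package is installed but one of its shared libraries cannot be loaded.
    """
    try:
        return importlib.import_module(name), None
    except ImportError as exc:
        return None, exc


def raise_if_unusable(name, module, error):
    """Defer the failure to first use and cite the underlying error."""
    if module is None:
        raise ImportError(f"{name} is installed but unusable: {error}") from error
```

Deferring the error this way keeps unrelated subsystems importable; only code that actually needs the optional dependency sees the failure.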
Haoming Song	9f437d86b6	fix(groot): align GR00TN15Config with transformers config dataclasses (#3606 ) * fix(gr00t): fix gr00t config dataclass init TypeError * fix(groot): guard strict config decorator without transformers for passing CI --------- Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>	2026-05-22 10:31:04 +02:00
Haoming Song	b74a551d38	fix(pi0, pi05): stabilize torch.compile and expand test coverage (#3610) * chore(gr00t): sync with #3606 for fixing gr00t config crash * fix(pi0&pi05): fix graph break caused by deepcopy of past_key_values in sample_actions * fix(pi0&pi05): fix frequent recompile caused by compute_layer_complete * feat(test): add compile test and benchmark for pi0 and pi05 * feat(test): add comprehensive testing for pi0 and pi05. Including processor, forward, sample action, etc.	2026-05-22 10:29:34 +02:00
Nikodem Bartnik	c0a2e9814d	fix examples (#3623) - Fixed broken API examples in LeRobot Imitation Learning Documentation - Teleoperation with cameras improved by adding a fixed frequency in the loop (without it the camera feed gets very slow) - Wrapped record example script in main() to avoid problems on Mac - Previously the teleoperation example was using SO-ARM and teleoperation with cameras was using Koch. I changed it to use SO-ARM in all of the examples. - Added section on how to train with HF Jobs - CLI and Python examples - Replaced lerobot-record with lerobot-rollout in policies examples	2026-05-21 22:14:07 +02:00
Khalil Meftah	bac4f61eae	refactor: support custom progress parquet overlays (#3640)	2026-05-21 14:32:10 +02:00
Virgileboat	f4b834844e	Feat/clean can bus (#3526) * change timeout for handshake * enforce last state read when query * change import order * fix(motors): flush stale robstride RX and harden feedback drain * robstride: remove redundant timeout and max_messages casts * bugfix + %-style * update exception catch	2026-05-21 11:44:04 +02:00
Pepijn	a0233f53f4	feat(annotate): default VLM to Qwen3.6-35B-A3B-FP8 Match the production target used in examples/annotations/run_hf_job.py. Per Scale Labs' dense-captioning ablations, model capacity dominates prompt-engineering gains; defaulting to the larger model avoids shipping a worst-tier configuration out of the box. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 11:46:59 +02:00
Roham Z. Nobari	dfdc48a7f1	fix(datasets): bound VideoDecoderCache to prevent OOM on large datasets (#3614 ) VideoDecoderCache used an unbounded dict keyed on absolute path, with no eviction in the standard LeRobotDataset path. With shuffled iteration over datasets that have many distinct mp4 files, every DataLoader worker accumulated one cached (VideoDecoder, fsspec file handle) pair per distinct path it had ever touched. Per-entry cost is ~3-5 MB of host RAM plus one open FD; at ~8 k entries this is roughly 30 GB per worker. This was hit in the wild during a SmolVLA training run on a 4,195-episode SO-101 dataset (8,390 mp4s, two cameras per episode). dmesg showed anon-rss climbing to 34.9 GB on a single pt_data_worker before the OOM killer fired ~30 min into training; with --num_workers=8 the per-worker peak halved to 17.9 GB, which is the expected inverse-scaling signature when the leak is per-decode and the workload is split across workers. The working workaround on the affected platform was --dataset.video_backend=pyav, because the pyav path opens/closes per call and never touches this cache. Switch the backing store to an OrderedDict and evict LRU entries when the cap is reached, closing the evicted file handle inside the lock so we do not leak FDs either. Default cap is DEFAULT_DECODER_CACHE_SIZE = 100, overridable via LEROBOT_VIDEO_DECODER_CACHE_SIZE or by passing max_size= to the constructor; max_size=None restores the legacy unbounded behaviour for callers that need it. 
Validation on the original failing workload (decode_video_frames_torchcodec called over real mp4s from the affected SO-101 dataset): unbounded: 300 files -> +1087 MB host RSS, cache=300, still climbing cap=50: 500 files -> +266 MB host RSS, cache=50, stable cap=50: 2000 calls -> +312 MB host RSS, cache=50, stable cap=100: 1000 calls -> +470 MB host RSS, cache=100, stable Three independent seeded runs at cap=50 agreed to within 1% (263 / 266 / 265 MB delta), and the 2000-call multi-pass run shows RSS plateaus after the cap is reached instead of drifting. Tests in tests/datasets/test_video_decoder_cache.py cover: default-is-bounded, size cap, LRU ordering, FD close on eviction, FD close on clear(), cache-hit invariance, max_size=None fallback, and env-var override. No regressions in test_video_encoding.py, test_streaming.py, or test_dataset_reader.py (73 prior tests still pass alongside the 8 new ones).	2026-05-19 16:54:25 +02:00
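The bounded-cache idea in this commit (an `OrderedDict` with LRU eviction that closes the evicted handle while holding the lock) can be sketched as below. `BoundedHandleCache` and its `opener` argument are hypothetical stand-ins, not the lerobot `VideoDecoderCache` API; the sketch assumes `max_size` is either `None` or at least 1.

```python
import threading
from collections import OrderedDict


class BoundedHandleCache:
    """Hypothetical LRU cache of closeable handles keyed by path."""

    def __init__(self, opener, max_size=100):
        self._opener = opener        # callable: path -> object with .close()
        self._max_size = max_size    # None restores unbounded behaviour
        self._entries = OrderedDict()
        self._lock = threading.Lock()

    def get(self, path):
        with self._lock:
            if path in self._entries:
                self._entries.move_to_end(path)  # mark as most recently used
                return self._entries[path]
            handle = self._opener(path)
            self._entries[path] = handle
            while self._max_size is not None and len(self._entries) > self._max_size:
                _, evicted = self._entries.popitem(last=False)  # least recently used
                evicted.close()  # release the file descriptor inside the lock
            return handle

    def __len__(self):
        with self._lock:
            return len(self._entries)
```

Closing the evicted handle under the same lock means no other thread can fetch an entry that is about to be closed, which is the reason the commit gives for doing the close there.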
四七	6a8878a639	fix(datasets): normalize shape=(1,) numeric values before HF encoding (#3344) * fix(datasets): normalize shape=(1,) numeric values before save * test(datasets): cover shape=(1,) int/bool and finalize Co-authored-by: Copilot <copilot@github.com>	2026-05-19 16:53:19 +02:00
Caroline Pascal	d38eb89f71	feat(video re-encoding): Adding utility and dataset edition tool for video re-encoding (#3611 ) * feat(utility): adding video re-encode utility * feat(edit): adding a new lerobot-edit-dataset tool to re-encode all the videos of a dataset * chore(format): formatting code * chore(review): fix Claude reviews * test(reencode dataset): adding missing test for reencode dataset	2026-05-19 14:46:14 +02:00
Pepijn	7ab4936b1b	Add extensive language support (#3467 ) * Add extensive language support * Address review: split persistent/event schemas, drop event timestamps - recipe.py: derive _VALID_ROLES/_VALID_STREAMS from MessageRole/MessageStream Literals - dataset_metadata.py: keep CODEBASE_VERSION at v3.0 - language.py: remove RESERVED_STYLES; split arrow/feature schemas into persistent (with timestamp) and event (without timestamp); add docstrings - language_render.py: events use frame-row timestamp implicitly; no per-event timestamp filtering or sorting - converters.py: drop unused subtask_key passthrough - add docstrings to new public APIs (recipe, render_messages_processor, collate) - update tests for split schemas; revert uv.lock Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add docstrings to all new helpers; revert uv.lock Covers private helpers in recipe.py, language.py, language_render.py, and render_messages_processor.py. Also reverts uv.lock to main (it was re-generated by `uv run` during local checks). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(language): add motion (persistent) and trace (event-only) styles Promote the previously-reserved motion/trace styles to first-class core styles. motion routes to language_persistent (it tracks robot state over time); trace routes to language_events (single-moment annotations). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(language): per-camera tagging on view-dependent styles Adds a nullable `camera` field to the language row struct (both persistent and event variants) so view-dependent styles like `vqa` can carry which `observation.images.` view they were grounded against. Without this, multi-camera datasets ended up with multiple `(vqa, role)` rows at the same timestamp that the resolver could not disambiguate. 
- `language.py`: add `camera` to PERSISTENT_ROW_FIELDS / EVENT_ROW_FIELDS, to both Arrow struct types and the HF datasets feature mappings; introduce VIEW_DEPENDENT_STYLES = {vqa, motion, trace} plus `is_view_dependent_style` and `validate_camera_field` helpers (camera required iff style is view-dependent). - `language_render.py`: thread an optional `camera=` kwarg through every resolver (`active_at`, `emitted_at`, `nth_prev`, `nth_next`) and through `_matching_rows` / `_select_`, so recipes can disambiguate per-camera VQA with `emitted_at(t, style=vqa, role=assistant, camera=...)`. Without a `camera` filter, multi-row matches keep raising the existing ambiguity error — which is the desired behaviour on multi-camera data. - `recipes/pi05_hirobot.yaml`: replace the single `ask_vqa` branch with `ask_vqa_top` and `ask_vqa_wrist` per-camera sub-recipes (each carrying the matching image block), keeping the original 0.20 budget and documenting the customization point for datasets with different cameras. - Tests: schema test asserts the new field order; new tests cover `is_view_dependent_style`, `validate_camera_field` (both required and forbidden directions), per-camera `emitted_at` filtering, and the ambiguity error when two cameras emit `(vqa, assistant)` at the same timestamp without a `camera=` filter. RenderMessagesStep + dataset passthrough fixtures updated to include the new field. - `docs/source/language_and_recipes.mdx`: document the `camera` field, the per-camera resolver pattern, and the canonical recipe convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(language): drop motion from VIEW_DEPENDENT_STYLES Motion primitives are described in robot-frame (joint / Cartesian) terms, not pixel space, so they are camera-agnostic. Only `vqa` (event) and `trace` (event, pixel-trajectory) are view-dependent. 
The `camera` field stays on PERSISTENT_ROW_FIELDS for schema symmetry — the validator, resolver, and HF feature mapping behave identically across the two columns regardless of which styles populate `camera` today — but persistent rows now always have `camera=None` in practice. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(language): task_aug style + automatic ${task} rephrasing rotation Adds task-prompt diversity (Xiao 2022 / CAST) without touching ``meta/tasks.parquet`` or forcing recipes to opt in. The plan reserved ``task_aug`` as a future style; this lands it now. - ``language.py``: add ``task_aug`` to ``CORE_STYLES`` and ``PERSISTENT_STYLES``. ``column_for_style("task_aug")`` returns ``language_persistent`` so PR 2 writers route it correctly. - ``language_render.py``: ``_resolve_task`` now consults the persistent slice for rows of ``style="task_aug", role="user"``. When any exist it picks one deterministically by ``sample_idx`` (blake2b-keyed, not Python's randomized hash) so an epoch sees every rephrasing of every episode while the same sample still resolves identically across reruns. Falls back to the canonical ``meta/tasks.parquet`` task when no rephrasings are present, so existing datasets and unannotated runs keep their behaviour. Explicit ``task=`` overrides still win. - Tests: rephrasing coverage across samples, determinism on repeat ``sample_idx``, fallback when persistent has no ``task_aug`` rows, and explicit override priority. Recipes get this for free: any ``${task}`` placeholder rotates through the available rephrasings. Recipes that want the literal canonical task can override the binding. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(language): tool catalog in meta/info.json + LeRobotDatasetMetadata.tools Stores OpenAI-style function schemas at ``meta/info.json["tools"]`` so datasets can declare which tools are available (today: just ``say``; tomorrow: per-dataset extensions). 
The ``DEFAULT_TOOLS`` constant fills in for unannotated datasets so chat-template consumers don't have to special-case anything. Three pieces: - ``language.py``: ``SAY_TOOL_SCHEMA`` and ``DEFAULT_TOOLS`` constants. Single source of truth — PR 2's writer and PR 3's runtime tool registry will both import from here instead of duplicating the dict. - ``dataset_metadata.py``: ``LeRobotDatasetMetadata.tools`` property reads ``info.json["tools"]`` and falls back to ``DEFAULT_TOOLS``. Returns deep-copied dicts so callers can mutate the result safely. - ``docs/source/tools.mdx``: spec page covering the catalog, per-row invocations, and the three-step "how to add a new tool" workflow (declare schema, implement, register). Linked from the docs toctree under the Datasets section. This lays the groundwork for PR 2's pipeline writing the catalog out during annotation, and PR 3's ``src/lerobot/tools/`` package shipping runnable implementations (one file per tool — first up: ``say.py`` wrapping Kyutai's pocket-tts). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Apply ruff and prettier formatting after merge Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(language): unify resolver dispatch and prune redundant test scaffolding * Drop the unused `events` kwarg from `active_at`/`nth_prev`/`nth_next`; only `emitted_at` actually consults events. The dispatcher in `_resolve_spec` now passes events conditionally. * Replace the dual `_persistent_sort_key`/`_event_sort_key` pair with a single `_row_sort_key` and drop the `sort_key` parameter from `_select_one`. Event rows lack `timestamp` (it is implicit in the frame) and now default to `0.0` for sort purposes — the `(style, role)` tiebreaker is unchanged. * Inline `_select_latest` into `active_at` (its only caller). * Collapse `emitted_at`'s dual-branch into one `_select_one` call. 
* Tighten `_validate_persistent_resolver` to a single `column_for_style(style) != LANGUAGE_PERSISTENT` check. * Parameterize `test_per_camera_blend_renders_both_views` over the two cameras and factor the sub-recipe builder into `_vqa_subrecipe` so the test no longer hand-rolls two near-identical recipe blocks. Net -98 LOC; behavior, public resolver names, and test expectations unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(language): always raise on ambiguous resolver matches `_select_one` previously skipped its ambiguity check whenever any of `role`/`tool_name`/`camera` was set, on the assumption that the caller had already pinned down a unique row. That left a real ambiguity hole for VQA: with two cameras emitting `(vqa, assistant)` at the same frame, `emitted_at(..., role="assistant")` silently picked the first sorted row instead of telling the recipe to add `camera=...`. The existing `test_emitted_at_raises_on_ambiguous_per_camera_vqa` test already encoded the desired behavior. Tighten the check: any time `len(rows) > 1` we now raise with the selectors echoed back, so users see exactly which fields they passed and that more is needed to disambiguate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: fix CI — collapse short ValueError to one line, refresh uv.lock * `ruff format` on CI (newer version) wants the short `camera=None` ValueError on a single line. * `uv.lock` was stale relative to `pyproject.toml`'s `datasets>=4.7.0` pin (and picked up upstream `s390x` marker fixes for cuda packages). CI runs `uv sync --locked` which rejected the divergence. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(language): keep base install green — drop processor re-export, gate dataset-extra tests `lerobot.processor` re-exported `RenderMessagesStep` at the package level, so importing anything from `lerobot.processor` pulled in `lerobot.datasets.language` → `lerobot.datasets/__init__.py` → `require_package("datasets")`, which fails in the Tier 1 base install that intentionally omits the `[dataset]` extra. The chain bricked collection for unrelated suites (`tests/policies/pi0_pi05/...`, `tests/envs/...`, etc.). * Stop re-exporting `RenderMessagesStep` from `lerobot.processor`. The only consumer (the test) already imports from the submodule. Document the deliberate omission in the module docstring. * Add `pytest.importorskip("datasets", ...)` (and `pandas` where needed) at the top of the four PR-added tests that exercise the language stack: - tests/datasets/test_language.py - tests/datasets/test_language_render.py - tests/processor/test_render_messages_processor.py - tests/utils/test_collate.py Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(language): address review — tools accessor, motion docs, conditional collate * `meta.tools` actually reads `info.json["tools"]`. `DatasetInfo` had no `tools` field, so `from_dict` silently dropped the key (it warned about unknown fields then discarded them) and the property always returned `DEFAULT_TOOLS`. Added `tools: list[dict] \| None` to the dataclass; `to_dict()` drops it when unset so existing datasets keep a clean `info.json`. Fixed the accessor to read `self.info.tools` (the previous `.get(...)` would have raised AttributeError on the dataclass anyway). Added regression tests: fallback when absent, round-trip from disk, and round-trip through `DatasetInfo.from_dict` / `to_dict`. * `motion` is not view-dependent — fix the docs. 
The mdx claimed rows of style `motion` must carry `camera`, but `VIEW_DEPENDENT_STYLES = {"vqa", "trace"}` and the validator agrees: motion primitives are joint/Cartesian-frame, not pixel-space. Updated both call-out paragraphs in `language_and_recipes.mdx`. * Conditional `collate_fn` swap. Added `meta.has_language_columns` and gate the `lerobot_collate_fn` swap in `lerobot_train.py` on it, so non-language datasets keep PyTorch's `default_collate`. Also added a pass-through test in `test_collate.py` that asserts on a plain tensor batch the custom collate matches `default_collate` key-for-key, plus a test for the `None`-sample drop path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * review: dedupe regex, centralize column names, harden collate, more tests * #2 — dedupe `_PLACEHOLDER_RE`. The same regex was compiled in `recipe.py` and `language_render.py`. Promote to module-level `PLACEHOLDER_RE` in `recipe.py` (its primary owner — declares template syntax) and import from `language_render.py`. * #3 — centralize language column names. `io_utils.py` had hardcoded `{"language_persistent", "language_events"}` literals at two sites. Replace with `LANGUAGE_COLUMNS` import so a future column rename can't silently desync. * #4 — defensive collate preserved-keys. `lerobot_collate_fn` silently filtered language fields from samples that didn't have them, which would hand downstream consumers a preserved list shorter than the tensor batch. Now: if any sample carries a key, every sample in the batch must carry it; otherwise raise a `ValueError` so the upstream rendering bug surfaces at the boundary. * #5 — `_scalar` rejects non-singleton lists. Previously a zero- or multi-element list fell through and triggered confusing `float([])` errors downstream. Now raises `ValueError` with the actual length. * #6 — refactor `_extract_complementary_data`. Replace 11 lines of `key = {... if ... 
else {}}` plus an 11-line splat dict with a single `_COMPLEMENTARY_KEYS` tuple iterated once. * #7 — document `EXTENDED_STYLES`. Was an empty `set()` with no comment. Add a docstring explaining it's an intentional extension point: downstream modules append project-local styles before `column_for_style` is called. * #9 — `tools.mdx` notes the runtime layer is future work. The page referenced `src/lerobot/tools/`, `registry.py`, and `get_tools(meta)` — none exist in this PR. Added a callout at the start of "How to add your own tool" plus a note on the implementations paragraph. * #10 — tests for YAML round-trip, malformed rows, blend validation. `test_recipe.py` grew from 1 case to 12 covering: blend-or-messages exclusivity, target-turn requirement, blend emptiness, weight presence/positivity, nested-blend rejection, `from_dict` with nested blends, `from_yaml` / `load_recipe` agreement, top-level non-mapping rejection. Added a malformed-row test for `_normalize_rows` that asserts non-dict entries raise `TypeError`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * review: emitted_at uses 0.1s tolerance; MessageTurn requires stream at construction * Float tolerance in `emitted_at` for persistent styles. The ``_timestamp(row) == t`` exact-equality check silently missed any caller that derived ``t`` arithmetically (e.g. ``frame_idx / fps``) even though the parquet timestamp would only differ by ULPs. Added ``EMITTED_AT_TOLERANCE_S = 0.1`` and check ``abs(...) <= tolerance`` instead, with a docstring explaining why exact equality wasn't enough and why 0.1 s is safe at typical 30–100 Hz control rates. Test asserts the new behavior at half-window (matches) and double-window (no match) using the constant so it stays in sync. * `MessageTurn.stream` is required at construction. 
It was typed ``MessageStream \| None = None`` so YAML could omit ``stream:`` and pass the dataclass invariant — but ``_validate_rendered`` rejected ``None`` streams later, surfacing the error at the first sample instead of at recipe load. Now ``__post_init__`` raises ``ValueError`` if ``stream`` is ``None``, with the list of valid streams in the message. The redundant late-stage check in ``_validate_rendered`` is replaced with a one-line comment that cites the upstream invariant. Test pins the new construction-time rejection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(tools): drop follow-up-PR references Reword the two callouts in `tools.mdx` to describe the runtime layer in present tense ("not part of the catalog layer shipped today", "those modules don't yet exist in the tree") instead of pointing at a specific follow-up PR. Keeps the doc honest about what works now without coupling it to a particular release order. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * review: address CarolinePascal feedback - language timestamps: float64 -> float32 to match LeRobotDataset frame timestamps (Arrow struct + HF feature) - dataset_metadata: hoist `.language` imports to module top — language.py has no lerobot imports, so there is no circular-import risk - dataset_metadata: add a `meta.tools` setter that persists the catalog to info.json and reloads `meta.info` - feature_utils: validate the `language` dtype instead of returning "" — warn (non-fatal) when a non-empty value is written at record time - centralize the scalar-unwrap helper as `lerobot.utils.utils.unwrap_scalar`, shared by render_messages_processor and language_render - docs: move `## Layer 2 — recipe anatomy` ahead of the resolver sections, which describe recipe bindings rather than dataset layout - language_render: note in EMITTED_AT_TOLERANCE_S that persistent rows change on a human-action timescale, not the camera frame rate Co-Authored-By: Claude Opus 
4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 14:46:11 +02:00
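The deterministic ``task_aug`` rotation described in this commit (pick one rephrasing by `sample_idx` using blake2b rather than Python's per-process randomized `hash`) might look roughly like this. The function name `pick_rephrasing` and the way the index is encoded into bytes are assumptions for illustration; the actual `_resolve_task` may key the hash differently.

```python
import hashlib


def pick_rephrasing(rephrasings, sample_idx):
    """Pick one rephrasing deterministically from sample_idx.

    Hypothetical helper: the same sample_idx always maps to the same entry,
    across processes and reruns, because blake2b is not salted per process.
    """
    if not rephrasings:
        raise ValueError("no rephrasings to choose from")
    digest = hashlib.blake2b(str(sample_idx).encode("utf-8"), digest_size=8).digest()
    return rephrasings[int.from_bytes(digest, "big") % len(rephrasings)]
```

Because the choice depends only on `sample_idx`, different samples spread across the available rephrasings while any single sample resolves identically on every rerun.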
pepijn	2ea0da2d9f	fix(annotate): tag uploaded dataset revision Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 12:44:35 +00:00
Pepijn	134a707c7a	feat(annotate): first-person memory narrative + shorter speech prompts - module_1_memory: rewrite as an explicit first-person, past-tense narrative ("I picked up...", "I opened...") matching the MEM (Torne 2026) running-memory style, instead of "one or two short sentences" with no person/tense guidance. - module_1_task_rephrasings: bias rephrasings toward short imperative. - module_2_initial_speech: prefer very short robot acknowledgements. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 14:17:30 +02:00
von Neumann 101	ca8c60a0ed	Set OpenCV fourcc after size and fps (#3620) * Set OpenCV fourcc after size and fps * Set OpenCV fourcc last on Windows * Add comment explaining DSHOW fourcc ordering	2026-05-19 14:06:41 +02:00
Pepijn	ce47075d6b	feat(annotate): deterministic plan, single-frame VQA, dataset tagging Port the steerable-pipeline refinements developed on feat/smolvla-on- steerable back into the annotation pipeline itself: - module_1_subtasks: imperative verb-first telegraphic labels with a consistent-object-noun rule and good/bad examples (no hard word cap). - _generate_plan: drop the VLM round-trip; the plan is now a deterministic numbered list of still-todo subtasks, re-emitted at every subtask boundary so it shrinks as work progresses. Removes module_1_plan.txt. - VqaConfig.K 3 -> 1: a VQA pair anchors exactly its emission frame, no stale-label temporal smear. - lerobot-annotate: tag the pushed dataset with its codebase_version so LeRobotDataset can resolve a revision and load it. - module_2_interjection: shorter, more natural mid-task cues. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 14:06:15 +02:00
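The deterministic plan described in this commit (a numbered list of the still-todo subtasks, re-emitted at each subtask boundary) can be sketched as below. `render_plan` and its `completed` counter are hypothetical; the exact text format produced by `_generate_plan` is not shown in this log.

```python
def render_plan(subtasks, completed):
    """Return a numbered list of the subtasks that are still to do.

    Hypothetical helper: `completed` is the number of subtasks already done,
    so the plan shrinks each time it is re-emitted at a subtask boundary.
    """
    remaining = subtasks[completed:]
    return "\n".join(f"{i}. {label}" for i, label in enumerate(remaining, start=1))
```

Since the output is a pure function of the subtask list and the progress counter, no VLM round-trip is needed to produce it.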
Pepijn	26013da699	feat(annotations): enforce imperative verb-first subtask phrasing Rewrite module_1_subtasks prompt to produce short imperative commands ("pick up the orange") instead of third-person narration ("the robot arm moves to the orange"). Drops the verbose "how, not what" rule and adds a good/bad few-shot table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 13:53:20 +02:00
Pepijn	3c15fd8537	feat(robots): natively integrate Seeed Studio reBot B601-DM arm (#3624 ) * feat(robots): natively integrate Seeed Studio reBot B601-DM arm Add first-class LeRobot support for the Seeed Studio reBot arm, replacing the out-of-tree `lerobot-robot-seeed-b601` / `lerobot-teleoperator-rebot-arm-102` plugin packages. New devices: - robot `rebot_b601_follower` — single-arm B601-DM follower (6-DOF + gripper, Damiao CAN motors via `motorbridge`) - robot `bi_rebot_b601_follower` — bimanual follower composing two single arms - teleoperator `rebot_102_leader` — single-arm StarArm102 / reBot Arm 102 leader (FashionStar UART servos via `motorbridge-smart-servo`) - teleoperator `bi_rebot_102_leader` — bimanual leader composing two single arms The bimanual variants reuse the single-arm classes and namespace each arm's observation/action keys with `left_` / `right_` prefixes, so a bimanual StarArm102 leader can teleoperate a bimanual reBot B601 follower. Optional SDK imports are guarded; a `rebot` extra installs `motorbridge` and `motorbridge-smart-servo`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: add reBot B601-DM calibration & dual-arm teleoperation guide Add docs/source/rebot_b601.mdx covering single-arm and bimanual calibration and teleoperation for the reBot B601-DM follower and reBot Arm 102 leader, with zero-position reference images from the Seeed Studio wiki. Register the page in the docs toctree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: fix reBot B601 MDX build (move JSON example out of <Tip>) The doc-builder parses `{...}` inside MDX component children as a Svelte expression, so the joint_directions JSON example broke the build. Move it into a top-level fenced code block. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: apply prettier formatting to reBot B601 page Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: remove duplicate colocated reBot B601 page docs/source/rebot_b601.mdx is the canonical, toctree-registered page; the colocated rebot_b601.md was a redundant thinner copy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: clarify 6-DOF leader fallback comment in reBot B601 follower Explain that holding wrist_yaw at zero is what lets a 6-DOF leader (e.g. so100_leader / so101_leader) teleoperate the 7-DOF follower. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: address Caroline's PR review on reBot B601 integration - leader: remove _validate_config (no other lerobot device validates its config; a key mismatch now surfaces as a plain KeyError) - leader: simplify _round_to_valid_range to direct modular arithmetic instead of a bidirectional search loop - leader: inline the single-use _clamp helper - follower & leader: write MotorCalibration range_min/range_max from the configured joint_limits / joint_ranges instead of a fixed [-90, 90] - docs: add a "Find the USB ports" section (lerobot-find-port) and move the brltty/permissions tip there; link the OpenArm page for SocketCAN adapter configuration Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 19:49:21 +02:00
Pepijn	f72b28738a	fix(annotate): default keyframe decode to ffmpeg CLI (thread-safe) The decoder chain tried torchcodec first, then ffmpeg. torchcodec is not thread-safe: under the executor's 16-wide concurrent decode in the interjections phase it SIGSEGVs (exit 139) before the ffmpeg fallback is ever reached — uncatchable, so it kills the whole job. Default the auto chain to ffmpeg only. Per-frame ffmpeg decode runs in an isolated child process: crash-safe and concurrency-safe (the plan phase already proved 16 parallel ffmpeg subprocesses are fine). torchcodec / pyav remain available via an explicit video_backend. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 16:40:29 +02:00
Pepijn	1bd53cc7da	fix(annotate): decode keyframes via ffmpeg CLI fallback PyAV segfaulted (exit 139) decoding the AV1 streams modern LeRobot datasets use — a SIGSEGV that the per-episode try/except cannot catch, killing the whole job when the interjections phase started. Replace the PyAV fallback with _decode_frames_ffmpeg, which shells out to the ffmpeg CLI: a full ffmpeg build decodes AV1, and a child-process crash is a catchable non-zero exit rather than a segfault. Decoder chain is now torchcodec -> ffmpeg. _decode_frames_av stays available behind video_backend="pyav" for callers that want it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 16:08:31 +02:00
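Decoding a keyframe by shelling out to the ffmpeg CLI, as this commit describes, could be sketched as follows. The function names and the exact flag set are assumptions and may differ from `_decode_frames_ffmpeg`; the sketch assumes an `ffmpeg` binary on `PATH` and returns PNG bytes for one frame near a timestamp.

```python
import subprocess


def build_ffmpeg_cmd(video_path, timestamp_s):
    """Build a command that writes one PNG frame near timestamp_s to stdout."""
    return [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-ss", f"{timestamp_s:.3f}",   # seek before opening the input
        "-i", str(video_path),
        "-frames:v", "1",              # emit a single video frame
        "-f", "image2pipe", "-c:v", "png",
        "pipe:1",
    ]


def decode_frame_png(video_path, timestamp_s, timeout_s=30.0):
    """Run ffmpeg in a child process; a decoder crash becomes a Python error."""
    result = subprocess.run(
        build_ffmpeg_cmd(video_path, timestamp_s),
        capture_output=True,
        timeout=timeout_s,
    )
    if result.returncode != 0 or not result.stdout:
        message = result.stderr.decode("utf-8", "replace")
        raise RuntimeError(f"ffmpeg failed ({result.returncode}): {message}")
    return result.stdout
```

Because the decode runs in a separate process, a segfault inside the decoder shows up as a non-zero exit code that ordinary `try`/`except` handling can catch, rather than killing the calling job.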
Pepijn	7128bb1769	fix(annotate): decode keyframes via PyAV directly The pyav fallback routed through lerobot's decode_video_frames(backend= "pyav"), which uses torchvision.io.VideoReader — removed in torchvision 0.23+. On modern torch stacks (e.g. vllm-openai with torchvision 0.26) both torchcodec and that path fail, leaving interjection/vqa prompts without visual context. Add _decode_frames_av: a self-contained PyAV decoder that picks the nearest frame per timestamp. It is the always-available tail of the decoder chain (torchcodec -> pyav) and the target of --video_backend=pyav. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:45:04 +02:00
Pepijn	31e0c15e55	fix(annotate): pyav fallback when torchcodec keyframe decode fails VideoFrameProvider decoded keyframes via torchcodec only. Some containers (e.g. vllm-openai) ship a torchcodec that cannot push packets to the decoder ("Operation not permitted"), silently degrading interjection/vqa prompts to no visual context. _decode now retries with pyav when the default backend raises, and a new `video_backend` config field lets callers pin the backend explicitly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:23:53 +02:00
Pepijn	c5676ef1b3	feat(annotate): add dest_repo_id for separate push target Adds an optional `dest_repo_id` to AnnotationPipelineConfig. When set, `push_to_hub` uploads the annotated dataset there instead of overwriting the source `repo_id`, restoring separate source/destination repos. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:05:23 +02:00
Quentin Lhoest	5ebbdf3d05	Mention the new Lance LeRobotDataset implementation in the docs (#3609 ) * Enhance documentation with Lance format details Added information about Lance format and `lerobot-lancedb` package for multimodal AI datasets. Signed-off-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>	2026-05-18 14:51:26 +02:00
Pepijn Kooijmans	9dfc9084e1	review: decode keyframes via video_utils.decode_video_frames Addresses three of CarolinePascal's frames.py comments (the fourth, the subprocess re-encode, waits on #3611): - replace the bespoke _decode_pyav_direct PyAV decoder with lerobot.datasets.video_utils.decode_video_frames (torchcodec backend, PyAV fallback) — torchvision's VideoReader removal no longer applies - frames flow through the provider as torch.Tensor (C, H, W uint8); PIL is materialised only at the VLM-message boundary in to_image_blocks / to_video_block, where the chat backends need it - _decode now returns exactly one frame per timestamp (or [] on failure), so frames_at pairs them with strict=True Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 14:00:38 +02:00
Khalil Meftah	6e035fb169	Update reward config and model card template (#3625)	2026-05-18 13:12:15 +02:00
Pepijn Kooijmans	fd18beb3a1	review: address CarolinePascal feedback - name the three modules everywhere (plan / interjections / vqa) instead of module_1/2/3 — config classes, config fields, executor params, staging keys and phase names now carry the module name - rename examples/annotation -> examples/annotations; add the Apache header to run_hf_job.py - drop the unused GeneralVqaModule._generate_one - remove "PR 1" references from comments/docstrings - frames.py: rely on the always-defined LeRobotDatasetMetadata.camera_keys - executor.py: read/write meta/info.json via load_info / write_info - reader.py: load meta/tasks.parquet via io_utils.load_tasks - make --push_to_hub a bool; push the annotated dataset back to --repo_id - move the on-disk test dataset builder into tests/fixtures (build_annotation_dataset); run_e2e_smoke reuses it - clarify in the docs that the vqa module grounds each pair on a single frame (K = per-tick anchor count) - hoist stdlib dynamic imports to module scope Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 12:03:25 +02:00
Haoming Song	01dcb4c292	fix(pi05): update pi05 with transformers v5.4.0 interface (#3603)	2026-05-15 11:37:05 +02:00
Caroline Pascal	bd9619dfc3	feat(encoding parameters): adding support for user provided video encoding parameters (#3455) * chore(video backend): renaming codec into video_backend in get_safe_default_video_backend() * feat(pyav utils): adding suport for PyAV encoding parameters validation * feat(VideoEncoderConfig): creating a VideoEncoderConfig to encapsulate encoding parameters * feat(VideoEncoderConfig): propagating the VideoEncoderConfig in the codebase * chore(docs): updating the docs * feat(metadata): adding encoding parameters in dataset metadata * fix(concatenation compatibility): adding compatibility check when concatenating video files * feat(VideoEncoderConfig init): making VideoEncoderConfig more robust and adaptable to multiple backends * feat(pyav checks): making pyav parameters checks more robust * chore(duplicate): removing duplicate get_codec_options definition * test(existing): adapting existing tests * test(new): adding new tests for encoding related features * chore(format): fixing formatting issues * chore(PyAV): cleaning up PyAV utils and encoding parameters checks to stick to the minimun required tooling. * chore(format): formatting code * chore(doctrings): updating docstrings * fix(camera_encoder_config): Removing camera_encoder_config from LeRobotDataset, as it's only required in LeRobotDatasetWriter. * feat(default values): applying a consistent naming convention for default RGB cameras video encoder parameters * fix(rollout): propagating VideoEncoderConfig to the latest recording modes * chore(format): formatting code, fixing error messages and variable names * fix(arguments order): reverting changes in arguments order in StreamingVideoEncoder * chore(relative imports): switching to relative local imports within lerobot.datasets * test(artifacts): cleaning up artifacts for the video encoding tests * chore(docs): updating docs * chore(fromat): formatting code * fix(imports): refactoring the file architecture to avoid circular imports. VideoEncoderConfig is now defined in lerobot.configs and lazily imports av at runtime. * fix(typos): fixing typos and small mistakes * test(factories): updating factories * feat(aggregate): updating dataset aggregation procedure. Encoding tuning paramters (crf, g,...) are ignored for validation and changed to None in the aggregated dataset if incompatible. * docs(typos): fixing typos * fix(deletion): reverting unwanted deletion * fix(typos): fixing multiple typos * feat(codec options): passing codec options to lerobot_edit_dataset episode deletion tool * typo(typo): typo * fix(typos): fixing remaining typos * chore(rename): renaming camera_encoder_config to camera_encoder * docs(clean): cleaning and formating docs * docs(dataset): addind details about datasets * chore(format): formatting code * docs(warning): adding warning regarding encoding parameters modification * fix(re-encoding): removing inconsistent re-encoding option in lerobot_edit_dataset * typos(typos): typos * chore(format): resolving prettier issues * fix(h264_nvenc): fixing crf handling for h264_nvenc * docs(clean): removing too technical parts of the docs * fix(imports): fixing imports at the __init__ level * fix(imports): fixing not very pretty imports in video config file	2026-05-14 23:46:42 +02:00


1584 Commits