lerobot-clone

mirror of https://github.com/huggingface/lerobot.git synced 2026-05-31 10:51:35 +00:00

Author	SHA1	Message	Date
Maximellerbach	ea738a20c0	fixing misconception about multiview / singleview handling	2026-05-27 11:16:37 +02:00
Maxime Ellerbach	5e2c4c22c8	Merge branch 'main' into feat/vla-jepa	2026-05-26 15:03:59 +02:00
Maximellerbach	1a536d1f71	smol fix to avoid having default CPU device when training	2026-05-26 15:03:29 +02:00
Haoming Song	5c98e80430	fix(gr00t): fix Eagle25VL model and processor crash in transformers>=5.4.0, <5.6.0 (#3652 ) Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-05-26 14:04:22 +02:00
Reece O'Mahoney	f65f3f7a4a	Fix policy.path in YAML configs (PR #3145 followup) (#3597 ) PR #3145 added YAML support for policy.path but left two bugs: 1. extract_path_fields_from_config only deleted config_data[field] when no sibling overrides existed. With siblings, the dict stayed in place and draccus crashed decoding it as PreTrainedConfig (no 'type' key). Sibling overrides go into _config_yaml_overrides and are applied later by from_pretrained(), so the field can always be removed. 2. wrap() updated config_path_cli to the cleaned temp file path but never propagated it to the draccus.parse fallback branch. cli_args still contained --config_path=<original>, so draccus read the original YAML with path: still present. Tests passed because they (a) called extract_path_fields_from_config directly and (b) included type: alongside path: in the YAML, sidestepping both bugs. Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-05-26 14:01:19 +02:00
Maximellerbach	d09410882d	adding instructions for different embodiement + fixing some tests	2026-05-26 11:51:54 +02:00
Maximellerbach	b75b3ce02d	fix qwen norm layer output libero eval is now as expected	2026-05-22 17:36:43 +02:00
Pepijn	8194897994	fix(deps): cap placo below 0.9.16 and harden kinematics import (#3647 ) * fix(deps): cap placo below 0.9.16 and harden kinematics import placo 0.9.16 links against liburdfdom_sensor.so.4, which is unavailable on Ubuntu 24.04 (noble ships urdfdom 3.x). Importing placo on that base crashes with: ImportError: liburdfdom_sensor.so.4.0: cannot open shared object file This broke nightly Latest Deps tests (CPU and GPU) when the lockfile upgrade picked placo 0.9.16, since lerobot.model.kinematics unconditionally imports placo when _placo_available is true, and that check (importlib.util.find_spec) cannot detect dlopen failures of transitive shared libraries — so unrelated subsystems (RL actor, gym_manipulator) became unimportable. Two changes: 1. Pin placo to <0.9.16 in pyproject.toml + regenerate uv.lock (0.9.16 → 0.9.15). Short-term unblock for nightly CI until system urdfdom 4.x is broadly available. 2. Harden the import guard in src/lerobot/model/kinematics.py: wrap 'import placo' in try/except ImportError so a missing transitive .so no longer crashes module import. RobotKinematics instantiation now raises an informative ImportError citing the underlying dlopen failure via _raise_if_placo_unusable(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(kinematics): hoist _placo_runtime_error to module scope for mypy Mypy walks the TYPE_CHECKING branch in which the runtime else-block is not executed, so _placo_runtime_error was only defined at runtime and mypy reported 'Name "_placo_runtime_error" is not defined' on the three references inside _raise_if_placo_unusable. Declare the symbol unconditionally at module scope with a default of None; the runtime import-failure branch still assigns to it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * style(kinematics): drop verbose comments around placo import guard Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 12:03:07 +02:00
Haoming Song	9f437d86b6	fix(groot): align GR00TN15Config with transformers config dataclasses (#3606 ) * fix(gr00t): fix gr00t config dataclass init TypeError * fix(groot): guard strict config decorator without transformers for passing CI --------- Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>	2026-05-22 10:31:04 +02:00
Haoming Song	b74a551d38	fix(pi0, pi05): stabilize torch.compile and expand test coverage (#3610 ) * chore(gr00t): sync with #3606 for fixing gr00t config crash * fix(pi0&pi05): fix graph break caused by deepcopy of past_key_values in sample_actions * fix(pi0&pi05): fix frequent recompile caused by compute_layer_complete * feat(test): add compile test and benchamrk for pi0 and pi05 * feat(test): add comprehensive testing for pi0 and pi05. Including processor, forward, sample action, etc.	2026-05-22 10:29:34 +02:00
Maxime Ellerbach	5495c10cdf	trying to close success rate gap	2026-05-22 07:59:47 +00:00
Nikodem Bartnik	c0a2e9814d	fix examples (#3623 ) - Fixed broken API examples in Lerobot Imitation Learning Documentation - Teleoperation with cameras improved by adding a fixed frequency in the loop (without it the cameras feed gets very slow) - Wrapped record example script in main() to avoid problems on Mac - Previously teleoperation example was using SO-ARM and teleoperation with cameras was using Koch. I changed it to use SO-ARM in all of the examples. - Added section on how to train with HF Jobs - CLI and Python examples - Replaced lerobot-record with lerobot-rollout in policies examples	2026-05-21 22:14:07 +02:00
Maximellerbach	1bcba9dec6	fixing training and exal examples	2026-05-21 16:35:50 +02:00
Maximellerbach	dd13eda002	adding guard for diffusers	2026-05-21 15:41:26 +02:00
Maximellerbach	c01a00a972	adressing dtype zeros issue	2026-05-21 15:40:17 +02:00
Maxime Ellerbach	f75a2ee2f5	fixing doc defaults args Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>	2026-05-21 15:37:37 +02:00
Khalil Meftah	bac4f61eae	refactor: support custom progress parquet overlays (#3640 )	2026-05-21 14:32:10 +02:00
Maximellerbach	0edbb68ec3	pre-commit cleanup	2026-05-21 13:29:14 +02:00
Maxime Ellerbach	47f8a50fa0	refactoring into using pre and post processor	2026-05-21 13:29:14 +02:00
Maxime Ellerbach	51e57789ba	lots of changes to make existing weights work, need to massively refactor the pre and post processing	2026-05-21 13:29:14 +02:00
Maximellerbach	27669724d2	removing missleading future_action_window_size to just use chunk_size	2026-05-21 13:29:14 +02:00
Maximellerbach	e32a552edb	allow different state dim and action dim	2026-05-21 13:29:14 +02:00
Maximellerbach	596fda60d7	trying out to re-init the action head to avoid pretraining dimension mismatch	2026-05-21 13:29:14 +02:00
Maximellerbach	3fcea935b2	make default params more aligned with paper and pretrained models - adding possibility of freezing qwen backbone and world model - added tests for weight loading	2026-05-21 13:29:14 +02:00
Maximellerbach	76b63ebb26	add one-shot script to convert ginwind/VLA-JEPA checkpoints to safetensors (will remove once migrated)	2026-05-21 13:29:14 +02:00
Maximellerbach	bbe4ba7a53	add VLA-JEPA documentation Covers architecture overview, pretrained checkpoints, config reference, training/eval commands for LIBERO-10, and guidance on fine-tuning for single-camera datasets.	2026-05-21 13:29:14 +02:00
Maximellerbach	593253e155	update VLA-JEPA tests for arch changes and action_is_pad - Switch conftest to use `action_model_type="DiT-test"` now that `action_num_heads` / `action_attention_head_dim` have been removed. - Add action_head tests covering fully-padded loss (zero) and equivalence of action_is_pad=None vs all-zeros mask. - Remove obsolete `test_native_to_lerobot_wm_only` test.	2026-05-21 13:29:14 +02:00
Maximellerbach	acf65faaff	propagate action_is_pad masking through VLA-JEPA policy pipeline Pass the `action_is_pad` tensor from the batch through to the action head so padded timesteps are excluded from the flow-matching loss.	2026-05-21 13:29:14 +02:00
Maximellerbach	82a05f9cb4	align VLA-JEPA architecture with original checkpoint - Remove stale `action_num_heads` / `action_attention_head_dim` config fields; DiT head dimensions are now always derived from the preset (DiT-B/L/test). - Add `num_target_vision_tokens` and `action_max_seq_len` config fields required by the action head's future-token embedding and positional embedding tables. - Fix default `qwen_model_name` to 2B (matches all released checkpoints). - Rename `ActionEncoder` attrs w1/w2/w3 → layer1/layer2/layer3 to match checkpoint key names; replace `nn.Sequential` decoder/state-encoder with `_MLP2` (layer1/layer2 naming). - Fix `VLAJEPAActionHead` to size ActionEncoder and StateEncoder at `inner_dim` (DiT input width) rather than `action_hidden_size` (DiT output width). - Rename `DiT.blocks` → `transformer_blocks` and `attn` → `attn1` to match checkpoint; add alternating cross/self attention (even blocks cross-attend to Qwen context, odd blocks self-attend). - Add `DiT-test` preset for unit tests. - Rewrite `ActionConditionedVideoPredictor` with explicit ViT-style blocks (`_PredictorBlock` with fused qkv) to match checkpoint structure; rename `encoder`/`norm`/`proj` → `predictor_blocks`/`predictor_norm`/`predictor_proj`.	2026-05-21 13:29:14 +02:00
Maximellerbach	d4abb9d562	adding more tests to ensure good coverage	2026-05-21 13:29:13 +02:00
Maximellerbach	090d392b19	some more fixes to be closer to the original implem	2026-05-21 13:29:13 +02:00
Maxime Ellerbach	e36d742d7d	adjusting obs steps, tublets size to match original implementation	2026-05-21 13:29:13 +02:00
Maxime Ellerbach	f8a1acb6c9	fixing wm_loss not propagating	2026-05-21 13:29:13 +02:00
Maxime Ellerbach	60347bc742	fix warnings with qwen processor kwargs	2026-05-21 13:29:13 +02:00
Maxime Ellerbach	c6ec8d00e3	fixing action and state dim	2026-05-21 13:29:13 +02:00
Maximellerbach	0edb693ee4	adding guards to avoid needing transformers and diffusers for type checking and basic tests	2026-05-21 13:29:13 +02:00
Maximellerbach	cdae1b9ad8	updating uv lock	2026-05-21 13:29:13 +02:00
Maximellerbach	80ecf7bf53	adding deps to pyproject.toml	2026-05-21 13:28:04 +02:00
Maximellerbach	5597d539e7	linting	2026-05-21 13:27:07 +02:00
ginwind	dfbedb71d7	(feat)policies: add VLA-JEPA	2026-05-21 13:27:07 +02:00
ginwind	ebe6c66263	support vla_jepa	2026-05-21 13:27:07 +02:00
ginwind	0e18bdaf7a	feat(policies): add VLA-JEPA	2026-05-21 13:27:07 +02:00
ginwind	d5944c410c	feat(policies): add VLA-JEPA	2026-05-21 13:27:07 +02:00
ginwind	0d37efdb4b	first commit	2026-05-21 13:27:07 +02:00
Virgileboat	f4b834844e	Feat/clean can bus (#3526 ) * change timeout for handshake * enforce last state read when querry * change import order * fix(motors): flush stale robstride RX and harden feedback drain * robstride: remove redundant timeout and max_messages casts * bugfix + %-style * update exception catch	2026-05-21 11:44:04 +02:00
Roham Z. Nobari	dfdc48a7f1	fix(datasets): bound VideoDecoderCache to prevent OOM on large datasets (#3614 ) VideoDecoderCache used an unbounded dict keyed on absolute path, with no eviction in the standard LeRobotDataset path. With shuffled iteration over datasets that have many distinct mp4 files, every DataLoader worker accumulated one cached (VideoDecoder, fsspec file handle) pair per distinct path it had ever touched. Per-entry cost is ~3-5 MB of host RAM plus one open FD; at ~8 k entries this is roughly 30 GB per worker. This was hit in the wild during a SmolVLA training run on a 4,195-episode SO-101 dataset (8,390 mp4s, two cameras per episode). dmesg showed anon-rss climbing to 34.9 GB on a single pt_data_worker before the OOM killer fired ~30 min into training; with --num_workers=8 the per-worker peak halved to 17.9 GB, which is the expected inverse-scaling signature when the leak is per-decode and the workload is split across workers. The working workaround on the affected platform was --dataset.video_backend=pyav, because the pyav path opens/closes per call and never touches this cache. Switch the backing store to an OrderedDict and evict LRU entries when the cap is reached, closing the evicted file handle inside the lock so we do not leak FDs either. Default cap is DEFAULT_DECODER_CACHE_SIZE = 100, overridable via LEROBOT_VIDEO_DECODER_CACHE_SIZE or by passing max_size= to the constructor; max_size=None restores the legacy unbounded behaviour for callers that need it. Validation on the original failing workload (decode_video_frames_torchcodec called over real mp4s from the affected SO-101 dataset): unbounded: 300 files -> +1087 MB host RSS, cache=300, still climbing cap=50: 500 files -> +266 MB host RSS, cache=50, stable cap=50: 2000 calls -> +312 MB host RSS, cache=50, stable cap=100: 1000 calls -> +470 MB host RSS, cache=100, stable Three independent seeded runs at cap=50 agreed to within 1% (263 / 266 / 265 MB delta), and the 2000-call multi-pass run shows RSS plateaus after the cap is reached instead of drifting. Tests in tests/datasets/test_video_decoder_cache.py cover: default-is-bounded, size cap, LRU ordering, FD close on eviction, FD close on clear(), cache-hit invariance, max_size=None fallback, and env-var override. No regressions in test_video_encoding.py, test_streaming.py, or test_dataset_reader.py (73 prior tests still pass alongside the 8 new ones).	2026-05-19 16:54:25 +02:00
四七	6a8878a639	fix(datasets): normalize shape=(1,) numeric values before HF encoding (#3344 ) * fix(datasets): normalize shape=(1,) numeric values before save * test(datasets): cover shape=(1,) int/bool and finalize Co-authored-by: Copilot <copilot@github.com>	2026-05-19 16:53:19 +02:00
Caroline Pascal	d38eb89f71	feat(video re-encoding): Adding utility and dataset edition tool for video re-encoding (#3611 ) * feat(utility): adding video re-encode utility * feat(edit): adding a new lerobot-edit-dataset tool to re-encode all the videos of a dataset * chore(format): formatting code * chore(review): fix Claude reviews * test(reencode dataset): adding missing test for reencode dataset	2026-05-19 14:46:14 +02:00
Pepijn	7ab4936b1b	Add extensive language support (#3467 ) * Add extensive language support * Address review: split persistent/event schemas, drop event timestamps - recipe.py: derive _VALID_ROLES/_VALID_STREAMS from MessageRole/MessageStream Literals - dataset_metadata.py: keep CODEBASE_VERSION at v3.0 - language.py: remove RESERVED_STYLES; split arrow/feature schemas into persistent (with timestamp) and event (without timestamp); add docstrings - language_render.py: events use frame-row timestamp implicitly; no per-event timestamp filtering or sorting - converters.py: drop unused subtask_key passthrough - add docstrings to new public APIs (recipe, render_messages_processor, collate) - update tests for split schemas; revert uv.lock Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add docstrings to all new helpers; revert uv.lock Covers private helpers in recipe.py, language.py, language_render.py, and render_messages_processor.py. Also reverts uv.lock to main (it was re-generated by `uv run` during local checks). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(language): add motion (persistent) and trace (event-only) styles Promote the previously-reserved motion/trace styles to first-class core styles. motion routes to language_persistent (it tracks robot state over time); trace routes to language_events (single-moment annotations). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(language): per-camera tagging on view-dependent styles Adds a nullable `camera` field to the language row struct (both persistent and event variants) so view-dependent styles like `vqa` can carry which `observation.images.` view they were grounded against. Without this, multi-camera datasets ended up with multiple `(vqa, role)` rows at the same timestamp that the resolver could not disambiguate. - `language.py`: add `camera` to PERSISTENT_ROW_FIELDS / EVENT_ROW_FIELDS, to both Arrow struct types and the HF datasets feature mappings; introduce VIEW_DEPENDENT_STYLES = {vqa, motion, trace} plus `is_view_dependent_style` and `validate_camera_field` helpers (camera required iff style is view-dependent). - `language_render.py`: thread an optional `camera=` kwarg through every resolver (`active_at`, `emitted_at`, `nth_prev`, `nth_next`) and through `_matching_rows` / `_select_`, so recipes can disambiguate per-camera VQA with `emitted_at(t, style=vqa, role=assistant, camera=...)`. Without a `camera` filter, multi-row matches keep raising the existing ambiguity error — which is the desired behaviour on multi-camera data. - `recipes/pi05_hirobot.yaml`: replace the single `ask_vqa` branch with `ask_vqa_top` and `ask_vqa_wrist` per-camera sub-recipes (each carrying the matching image block), keeping the original 0.20 budget and documenting the customization point for datasets with different cameras. - Tests: schema test asserts the new field order; new tests cover `is_view_dependent_style`, `validate_camera_field` (both required and forbidden directions), per-camera `emitted_at` filtering, and the ambiguity error when two cameras emit `(vqa, assistant)` at the same timestamp without a `camera=` filter. RenderMessagesStep + dataset passthrough fixtures updated to include the new field. - `docs/source/language_and_recipes.mdx`: document the `camera` field, the per-camera resolver pattern, and the canonical recipe convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(language): drop motion from VIEW_DEPENDENT_STYLES Motion primitives are described in robot-frame (joint / Cartesian) terms, not pixel space, so they are camera-agnostic. Only `vqa` (event) and `trace` (event, pixel-trajectory) are view-dependent. The `camera` field stays on PERSISTENT_ROW_FIELDS for schema symmetry — the validator, resolver, and HF feature mapping behave identically across the two columns regardless of which styles populate `camera` today — but persistent rows now always have `camera=None` in practice. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(language): task_aug style + automatic ${task} rephrasing rotation Adds task-prompt diversity (Xiao 2022 / CAST) without touching ``meta/tasks.parquet`` or forcing recipes to opt in. The plan reserved ``task_aug`` as a future style; this lands it now. - ``language.py``: add ``task_aug`` to ``CORE_STYLES`` and ``PERSISTENT_STYLES``. ``column_for_style("task_aug")`` returns ``language_persistent`` so PR 2 writers route it correctly. - ``language_render.py``: ``_resolve_task`` now consults the persistent slice for rows of ``style="task_aug", role="user"``. When any exist it picks one deterministically by ``sample_idx`` (blake2b-keyed, not Python's randomized hash) so an epoch sees every rephrasing of every episode while the same sample still resolves identically across reruns. Falls back to the canonical ``meta/tasks.parquet`` task when no rephrasings are present, so existing datasets and unannotated runs keep their behaviour. Explicit ``task=`` overrides still win. - Tests: rephrasing coverage across samples, determinism on repeat ``sample_idx``, fallback when persistent has no ``task_aug`` rows, and explicit override priority. Recipes get this for free: any ``${task}`` placeholder rotates through the available rephrasings. Recipes that want the literal canonical task can override the binding. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(language): tool catalog in meta/info.json + LeRobotDatasetMetadata.tools Stores OpenAI-style function schemas at ``meta/info.json["tools"]`` so datasets can declare which tools are available (today: just ``say``; tomorrow: per-dataset extensions). The ``DEFAULT_TOOLS`` constant fills in for unannotated datasets so chat-template consumers don't have to special-case anything. Three pieces: - ``language.py``: ``SAY_TOOL_SCHEMA`` and ``DEFAULT_TOOLS`` constants. Single source of truth — PR 2's writer and PR 3's runtime tool registry will both import from here instead of duplicating the dict. - ``dataset_metadata.py``: ``LeRobotDatasetMetadata.tools`` property reads ``info.json["tools"]`` and falls back to ``DEFAULT_TOOLS``. Returns deep-copied dicts so callers can mutate the result safely. - ``docs/source/tools.mdx``: spec page covering the catalog, per-row invocations, and the three-step "how to add a new tool" workflow (declare schema, implement, register). Linked from the docs toctree under the Datasets section. This lays the groundwork for PR 2's pipeline writing the catalog out during annotation, and PR 3's ``src/lerobot/tools/`` package shipping runnable implementations (one file per tool — first up: ``say.py`` wrapping Kyutai's pocket-tts). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Apply ruff and prettier formatting after merge Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(language): unify resolver dispatch and prune redundant test scaffolding * Drop the unused `events` kwarg from `active_at`/`nth_prev`/`nth_next`; only `emitted_at` actually consults events. The dispatcher in `_resolve_spec` now passes events conditionally. * Replace the dual `_persistent_sort_key`/`_event_sort_key` pair with a single `_row_sort_key` and drop the `sort_key` parameter from `_select_one`. Event rows lack `timestamp` (it is implicit in the frame) and now default to `0.0` for sort purposes — the `(style, role)` tiebreaker is unchanged. * Inline `_select_latest` into `active_at` (its only caller). * Collapse `emitted_at`'s dual-branch into one `_select_one` call. * Tighten `_validate_persistent_resolver` to a single `column_for_style(style) != LANGUAGE_PERSISTENT` check. * Parameterize `test_per_camera_blend_renders_both_views` over the two cameras and factor the sub-recipe builder into `_vqa_subrecipe` so the test no longer hand-rolls two near-identical recipe blocks. Net -98 LOC; behavior, public resolver names, and test expectations unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(language): always raise on ambiguous resolver matches `_select_one` previously skipped its ambiguity check whenever any of `role`/`tool_name`/`camera` was set, on the assumption that the caller had already pinned down a unique row. That left a real ambiguity hole for VQA: with two cameras emitting `(vqa, assistant)` at the same frame, `emitted_at(..., role="assistant")` silently picked the first sorted row instead of telling the recipe to add `camera=...`. The existing `test_emitted_at_raises_on_ambiguous_per_camera_vqa` test already encoded the desired behavior. Tighten the check: any time `len(rows) > 1` we now raise with the selectors echoed back, so users see exactly which fields they passed and that more is needed to disambiguate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: fix CI — collapse short ValueError to one line, refresh uv.lock * `ruff format` on CI (newer version) wants the short `camera=None` ValueError on a single line. * `uv.lock` was stale relative to `pyproject.toml`'s `datasets>=4.7.0` pin (and picked up upstream `s390x` marker fixes for cuda packages). CI runs `uv sync --locked` which rejected the divergence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(language): keep base install green — drop processor re-export, gate dataset-extra tests `lerobot.processor` re-exported `RenderMessagesStep` at the package level, so importing anything from `lerobot.processor` pulled in `lerobot.datasets.language` → `lerobot.datasets/__init__.py` → `require_package("datasets")`, which fails in the Tier 1 base install that intentionally omits the `[dataset]` extra. The chain bricked collection for unrelated suites (`tests/policies/pi0_pi05/...`, `tests/envs/...`, etc.). * Stop re-exporting `RenderMessagesStep` from `lerobot.processor`. The only consumer (the test) already imports from the submodule. Document the deliberate omission in the module docstring. * Add `pytest.importorskip("datasets", ...)` (and `pandas` where needed) at the top of the four PR-added tests that exercise the language stack: - tests/datasets/test_language.py - tests/datasets/test_language_render.py - tests/processor/test_render_messages_processor.py - tests/utils/test_collate.py Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(language): address review — tools accessor, motion docs, conditional collate * `meta.tools` actually reads `info.json["tools"]`. `DatasetInfo` had no `tools` field, so `from_dict` silently dropped the key (it warned about unknown fields then discarded them) and the property always returned `DEFAULT_TOOLS`. Added `tools: list[dict] \| None` to the dataclass; `to_dict()` drops it when unset so existing datasets keep a clean `info.json`. Fixed the accessor to read `self.info.tools` (the previous `.get(...)` would have raised AttributeError on the dataclass anyway). Added regression tests: fallback when absent, round-trip from disk, and round-trip through `DatasetInfo.from_dict` / `to_dict`. * `motion` is not view-dependent — fix the docs. The mdx claimed rows of style `motion` must carry `camera`, but `VIEW_DEPENDENT_STYLES = {"vqa", "trace"}` and the validator agrees: motion primitives are joint/Cartesian-frame, not pixel-space. Updated both call-out paragraphs in `language_and_recipes.mdx`. * Conditional `collate_fn` swap. Added `meta.has_language_columns` and gate the `lerobot_collate_fn` swap in `lerobot_train.py` on it, so non-language datasets keep PyTorch's `default_collate`. Also added a pass-through test in `test_collate.py` that asserts on a plain tensor batch the custom collate matches `default_collate` key-for-key, plus a test for the `None`-sample drop path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * review: dedupe regex, centralize column names, harden collate, more tests * #2 — dedupe `_PLACEHOLDER_RE`. The same regex was compiled in `recipe.py` and `language_render.py`. Promote to module-level `PLACEHOLDER_RE` in `recipe.py` (its primary owner — declares template syntax) and import from `language_render.py`. * #3 — centralize language column names. `io_utils.py` had hardcoded `{"language_persistent", "language_events"}` literals at two sites. Replace with `LANGUAGE_COLUMNS` import so a future column rename can't silently desync. * #4 — defensive collate preserved-keys. `lerobot_collate_fn` silently filtered language fields from samples that didn't have them, which would hand downstream consumers a preserved list shorter than the tensor batch. Now: if any sample carries a key, every sample in the batch must carry it; otherwise raise a `ValueError` so the upstream rendering bug surfaces at the boundary. * #5 — `_scalar` rejects non-singleton lists. Previously a zero- or multi-element list fell through and triggered confusing `float([])` errors downstream. Now raises `ValueError` with the actual length. * #6 — refactor `_extract_complementary_data`. Replace 11 lines of `key = {... if ... else {}}` plus an 11-line splat dict with a single `_COMPLEMENTARY_KEYS` tuple iterated once. * #7 — document `EXTENDED_STYLES`. Was an empty `set()` with no comment. Add a docstring explaining it's an intentional extension point: downstream modules append project-local styles before `column_for_style` is called. * #9 — `tools.mdx` notes the runtime layer is future work. The page referenced `src/lerobot/tools/`, `registry.py`, and `get_tools(meta)` — none exist in this PR. Added a callout at the start of "How to add your own tool" plus a note on the implementations paragraph. * #10 — tests for YAML round-trip, malformed rows, blend validation. `test_recipe.py` grew from 1 case to 12 covering: blend-or-messages exclusivity, target-turn requirement, blend emptiness, weight presence/positivity, nested-blend rejection, `from_dict` with nested blends, `from_yaml` / `load_recipe` agreement, top-level non-mapping rejection. Added a malformed-row test for `_normalize_rows` that asserts non-dict entries raise `TypeError`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * review: emitted_at uses 0.1s tolerance; MessageTurn requires stream at construction * Float tolerance in `emitted_at` for persistent styles. The ``_timestamp(row) == t`` exact-equality check silently missed any caller that derived ``t`` arithmetically (e.g. ``frame_idx / fps``) even though the parquet timestamp would only differ by ULPs. Added ``EMITTED_AT_TOLERANCE_S = 0.1`` and check ``abs(...) <= tolerance`` instead, with a docstring explaining why exact equality wasn't enough and why 0.1 s is safe at typical 30–100 Hz control rates. Test asserts the new behavior at half-window (matches) and double-window (no match) using the constant so it stays in sync. * `MessageTurn.stream` is required at construction. It was typed ``MessageStream \| None = None`` so YAML could omit ``stream:`` and pass the dataclass invariant — but ``_validate_rendered`` rejected ``None`` streams later, surfacing the error at the first sample instead of at recipe load. Now ``__post_init__`` raises ``ValueError`` if ``stream`` is ``None``, with the list of valid streams in the message. The redundant late-stage check in ``_validate_rendered`` is replaced with a one-line comment that cites the upstream invariant. Test pins the new construction-time rejection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(tools): drop follow-up-PR references Reword the two callouts in `tools.mdx` to describe the runtime layer in present tense ("not part of the catalog layer shipped today", "those modules don't yet exist in the tree") instead of pointing at a specific follow-up PR. Keeps the doc honest about what works now without coupling it to a particular release order. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * review: address CarolinePascal feedback - language timestamps: float64 -> float32 to match LeRobotDataset frame timestamps (Arrow struct + HF feature) - dataset_metadata: hoist `.language` imports to module top — language.py has no lerobot imports, so there is no circular-import risk - dataset_metadata: add a `meta.tools` setter that persists the catalog to info.json and reloads `meta.info` - feature_utils: validate the `language` dtype instead of returning "" — warn (non-fatal) when a non-empty value is written at record time - centralize the scalar-unwrap helper as `lerobot.utils.utils.unwrap_scalar`, shared by render_messages_processor and language_render - docs: move `## Layer 2 — recipe anatomy` ahead of the resolver sections, which describe recipe bindings rather than dataset layout - language_render: note in EMITTED_AT_TOLERANCE_S that persistent rows change on a human-action timescale, not the camera frame rate Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 14:46:11 +02:00
von Neumann 101	ca8c60a0ed	Set OpenCV fourcc after size and fps (#3620 ) * Set OpenCV fourcc after size and fps * Set OpenCV fourcc last on Windows * Add comment explaining DSHOW fourcc ordering	2026-05-19 14:06:41 +02:00

1 2 3 4 5 ...

1508 Commits