lerobot-clone

mirror of https://github.com/huggingface/lerobot.git synced 2026-06-03 20:31:25 +00:00

Author	SHA1	Message	Date
Pepijn	beb22afd81	review: dedupe regex, centralize column names, harden collate, more tests * #2 — dedupe `_PLACEHOLDER_RE`. The same regex was compiled in `recipe.py` and `language_render.py`. Promote to module-level `PLACEHOLDER_RE` in `recipe.py` (its primary owner — declares template syntax) and import from `language_render.py`. * #3 — centralize language column names. `io_utils.py` had hardcoded `{"language_persistent", "language_events"}` literals at two sites. Replace with `LANGUAGE_COLUMNS` import so a future column rename can't silently desync. * #4 — defensive collate preserved-keys. `lerobot_collate_fn` silently filtered language fields from samples that didn't have them, which would hand downstream consumers a preserved list shorter than the tensor batch. Now: if any sample carries a key, every sample in the batch must carry it; otherwise raise a `ValueError` so the upstream rendering bug surfaces at the boundary. * #5 — `_scalar` rejects non-singleton lists. Previously a zero- or multi-element list fell through and triggered confusing `float([])` errors downstream. Now raises `ValueError` with the actual length. * #6 — refactor `_extract_complementary_data`. Replace 11 lines of `key = {... if ... else {}}` plus an 11-line splat dict with a single `_COMPLEMENTARY_KEYS` tuple iterated once. * #7 — document `EXTENDED_STYLES`. Was an empty `set()` with no comment. Add a docstring explaining it's an intentional extension point: downstream modules append project-local styles before `column_for_style` is called. * #9 — `tools.mdx` notes the runtime layer is future work. The page referenced `src/lerobot/tools/`, `registry.py`, and `get_tools(meta)` — none exist in this PR. Added a callout at the start of "How to add your own tool" plus a note on the implementations paragraph. * #10 — tests for YAML round-trip, malformed rows, blend validation. `test_recipe.py` grew from 1 case to 12 covering: blend-or-messages exclusivity, target-turn requirement, blend emptiness, weight presence/positivity, nested-blend rejection, `from_dict` with nested blends, `from_yaml` / `load_recipe` agreement, top-level non-mapping rejection. Added a malformed-row test for `_normalize_rows` that asserts non-dict entries raise `TypeError`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 19:06:38 +02:00
Pepijn	5c30b14929	Merge remote-tracking branch 'origin/main' into feat/language-columns	2026-05-06 12:09:13 +02:00
Maxime Ellerbach	cb0a944941	refactor(datasets): replace untyped dict with typed DatasetInfo dataclass (#3472 ) * refactor(datasets): replace untyped dict with typed DatasetInfo dataclass Introduce typed DatasetInfo dataclass to replace untyped dict representation of info.json. Changes: - Add DatasetInfo dataclass with explicit fields and validation - Implement __post_init__ for shape conversion (list ↔ tuple) - Add dict-style compatibility layer (__getitem__, __setitem__, .get()) - Add from_dict() and to_dict() for JSON serialization - Update io_utils to use load_info/write_info with DatasetInfo - Update dataset utilities and metadata to use attribute access - Remove aggregate.py dict-style field access - Add tests fixture support for DatasetInfo Benefits: - Type safety with IDE auto-completion - Validation at construction time - Explicit schema documentation * fix pre-commit * update docstring inside DatasetInfo.from_dict() * sorts the unknown to have deterministic output Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * refactoring the last few old fieds * fix crop dataset roi type mismatch * use consistantly int for data and video_files_size_in_mb --------- Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> Co-authored-by: jjolla93 <jjolla93@gmail.com>	2026-04-28 18:40:30 +02:00
Pepijn	8833d735a1	Add extensive language support	2026-04-27 10:56:32 +02:00
Steven Palma	df0763a2bc	feat(dependencies): minimal default tag install (#3362 )	2026-04-12 20:03:04 +02:00
Steven Palma	d90e4bcfd3	refactor(dataset): modular files (#3171 ) * refactor(dataset): modular files * refactor(dataset): update imports across the codebase	2026-03-15 23:58:09 -07:00

6 Commits