lerobot-clone

mirror of https://github.com/huggingface/lerobot.git synced 2026-05-31 02:41:24 +00:00

Author	SHA1	Message	Date
Maximellerbach	9961eb6918	adding .mdx docs and shortening polivy_vla_jepa_README.md	2026-05-29 11:52:42 +02:00
Maximellerbach	e1852db71a	adding licences	2026-05-29 11:51:58 +02:00
Maximellerbach	c2b29c8ae0	removing conversion script	2026-05-28 12:05:35 +02:00
Maximellerbach	58eac863aa	fixing misconception about multiview / singleview handling	2026-05-28 12:05:35 +02:00
Maximellerbach	952e5146dc	smol fix to avoid having default CPU device when training	2026-05-28 12:05:35 +02:00
Maximellerbach	37fda2a6fc	adding instructions for different embodiement + fixing some tests	2026-05-28 12:05:35 +02:00
Maximellerbach	df7d5132d1	fix qwen norm layer output libero eval is now as expected	2026-05-28 12:05:35 +02:00
Maxime Ellerbach	8efa5cabe9	trying to close success rate gap	2026-05-28 12:05:35 +02:00
Maximellerbach	7e23859c55	fixing training and exal examples	2026-05-28 12:05:35 +02:00
Maximellerbach	a24f669deb	adding guard for diffusers	2026-05-28 12:05:35 +02:00
Maximellerbach	b7727b8a6c	adressing dtype zeros issue	2026-05-28 12:05:35 +02:00
Maxime Ellerbach	7db4414e6b	fixing doc defaults args Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>	2026-05-28 12:05:35 +02:00
Maximellerbach	7da594fda8	pre-commit cleanup	2026-05-28 12:05:35 +02:00
Maxime Ellerbach	01ce5d7af1	refactoring into using pre and post processor	2026-05-28 12:05:35 +02:00
Maxime Ellerbach	83ef59e020	lots of changes to make existing weights work, need to massively refactor the pre and post processing	2026-05-28 12:05:35 +02:00
Maximellerbach	997b713f14	removing missleading future_action_window_size to just use chunk_size	2026-05-28 12:05:35 +02:00
Maximellerbach	1e3d25f10e	allow different state dim and action dim	2026-05-28 12:05:35 +02:00
Maximellerbach	8c0efb8295	trying out to re-init the action head to avoid pretraining dimension mismatch	2026-05-28 12:05:35 +02:00
Maximellerbach	9ef0fd5433	make default params more aligned with paper and pretrained models - adding possibility of freezing qwen backbone and world model - added tests for weight loading	2026-05-28 12:05:35 +02:00
Maximellerbach	26d2ac48a8	add one-shot script to convert ginwind/VLA-JEPA checkpoints to safetensors (will remove once migrated)	2026-05-28 12:05:34 +02:00
Maximellerbach	0f29cd3167	add VLA-JEPA documentation Covers architecture overview, pretrained checkpoints, config reference, training/eval commands for LIBERO-10, and guidance on fine-tuning for single-camera datasets.	2026-05-28 12:05:34 +02:00
Maximellerbach	64c9570547	update VLA-JEPA tests for arch changes and action_is_pad - Switch conftest to use `action_model_type="DiT-test"` now that `action_num_heads` / `action_attention_head_dim` have been removed. - Add action_head tests covering fully-padded loss (zero) and equivalence of action_is_pad=None vs all-zeros mask. - Remove obsolete `test_native_to_lerobot_wm_only` test.	2026-05-28 12:05:34 +02:00
Maximellerbach	8b03d25fef	propagate action_is_pad masking through VLA-JEPA policy pipeline Pass the `action_is_pad` tensor from the batch through to the action head so padded timesteps are excluded from the flow-matching loss.	2026-05-28 12:05:34 +02:00
Maximellerbach	5e37d97631	align VLA-JEPA architecture with original checkpoint - Remove stale `action_num_heads` / `action_attention_head_dim` config fields; DiT head dimensions are now always derived from the preset (DiT-B/L/test). - Add `num_target_vision_tokens` and `action_max_seq_len` config fields required by the action head's future-token embedding and positional embedding tables. - Fix default `qwen_model_name` to 2B (matches all released checkpoints). - Rename `ActionEncoder` attrs w1/w2/w3 → layer1/layer2/layer3 to match checkpoint key names; replace `nn.Sequential` decoder/state-encoder with `_MLP2` (layer1/layer2 naming). - Fix `VLAJEPAActionHead` to size ActionEncoder and StateEncoder at `inner_dim` (DiT input width) rather than `action_hidden_size` (DiT output width). - Rename `DiT.blocks` → `transformer_blocks` and `attn` → `attn1` to match checkpoint; add alternating cross/self attention (even blocks cross-attend to Qwen context, odd blocks self-attend). - Add `DiT-test` preset for unit tests. - Rewrite `ActionConditionedVideoPredictor` with explicit ViT-style blocks (`_PredictorBlock` with fused qkv) to match checkpoint structure; rename `encoder`/`norm`/`proj` → `predictor_blocks`/`predictor_norm`/`predictor_proj`.	2026-05-28 12:05:34 +02:00
Maximellerbach	a71e0d34ad	adding more tests to ensure good coverage	2026-05-28 12:05:34 +02:00
Maximellerbach	d00b3e993a	some more fixes to be closer to the original implem	2026-05-28 12:05:34 +02:00
Maxime Ellerbach	596c72bfc6	adjusting obs steps, tublets size to match original implementation	2026-05-28 12:05:34 +02:00
Maxime Ellerbach	ea535ad98d	fixing wm_loss not propagating	2026-05-28 12:05:34 +02:00
Maxime Ellerbach	7368a0085a	fix warnings with qwen processor kwargs	2026-05-28 12:05:34 +02:00
Maxime Ellerbach	7dba4f19a9	fixing action and state dim	2026-05-28 12:05:34 +02:00
Maximellerbach	90d398ea59	adding guards to avoid needing transformers and diffusers for type checking and basic tests	2026-05-28 12:05:34 +02:00
Maximellerbach	16a4643000	updating uv lock	2026-05-28 12:05:34 +02:00
Maximellerbach	6fa78ca8b4	adding deps to pyproject.toml	2026-05-28 11:59:32 +02:00
Maximellerbach	ec4bf4e47f	linting	2026-05-28 11:59:32 +02:00
ginwind	da56489174	(feat)policies: add VLA-JEPA	2026-05-28 11:59:32 +02:00
ginwind	addb354296	support vla_jepa	2026-05-28 11:59:31 +02:00
ginwind	848ed3240e	feat(policies): add VLA-JEPA	2026-05-28 11:58:03 +02:00
ginwind	f7bb1795e7	feat(policies): add VLA-JEPA	2026-05-28 11:58:03 +02:00
ginwind	0355902fba	first commit	2026-05-28 11:58:03 +02:00
Haoquan Fang	24017e960c	Add MolmoAct2 policy (#3604) * add molmoact2 policy * add apache headers to molmoact2 files * simplify molmoact2 package imports * align molmoact2 feature validation with eo pattern * remove molmoact2 processor override from factory * guard molmoact2 transformers imports * guard molmoact2 processor transformers import * add scipy dependency to molmoact2 extra * use a single molmoact2 action queue * move molmoact2 config logic into config * fix molmoact2 hf image key resolution * load molmoact2 without remote code * lazy import molmoact2 scipy * format molmoact2 files * skip molmoact2 tests without optional deps * fix molmoact2 pre-commit checks * validate molmoact2 gripper range	2026-05-27 18:58:37 +02:00
Khalil Meftah	e86f5af5bf	feat(rewards): add TOPReward reward model (#3629) * feat(rewards): add TOPReward reward model * refactor(rewards): clean up TOPReward processor/model * fix(rewards/topreward): add missing input keys mm_token_type_ids * fix(rewards/topreward): fix pyproject extra typo and simplify processor (#3653) Add lerobot[topreward] extra to all in pyproject.toml, drop the redundant labels arg in scoring, and collapse the dead-branch shape check in the encoder processor. * optmize topreward input processing (#3660) --------- Co-authored-by: Cole <91766445+jcoleharrison@users.noreply.github.com> Co-authored-by: Haoming Song <haomingsong24@gmail.com>	2026-05-27 14:24:31 +02:00
Haoming Song	5c98e80430	fix(gr00t): fix Eagle25VL model and processor crash in transformers>=5.4.0, <5.6.0 (#3652) Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-05-26 14:04:22 +02:00
Reece O'Mahoney	f65f3f7a4a	Fix policy.path in YAML configs (PR #3145 followup) (#3597) PR #3145 added YAML support for policy.path but left two bugs: 1. extract_path_fields_from_config only deleted config_data[field] when no sibling overrides existed. With siblings, the dict stayed in place and draccus crashed decoding it as PreTrainedConfig (no 'type' key). Sibling overrides go into _config_yaml_overrides and are applied later by from_pretrained(), so the field can always be removed. 2. wrap() updated config_path_cli to the cleaned temp file path but never propagated it to the draccus.parse fallback branch. cli_args still contained --config_path=<original>, so draccus read the original YAML with path: still present. Tests passed because they (a) called extract_path_fields_from_config directly and (b) included type: alongside path: in the YAML, sidestepping both bugs. Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-05-26 14:01:19 +02:00
Pepijn	8194897994	fix(deps): cap placo below 0.9.16 and harden kinematics import (#3647 ) * fix(deps): cap placo below 0.9.16 and harden kinematics import placo 0.9.16 links against liburdfdom_sensor.so.4, which is unavailable on Ubuntu 24.04 (noble ships urdfdom 3.x). Importing placo on that base crashes with: ImportError: liburdfdom_sensor.so.4.0: cannot open shared object file This broke nightly Latest Deps tests (CPU and GPU) when the lockfile upgrade picked placo 0.9.16, since lerobot.model.kinematics unconditionally imports placo when _placo_available is true, and that check (importlib.util.find_spec) cannot detect dlopen failures of transitive shared libraries — so unrelated subsystems (RL actor, gym_manipulator) became unimportable. Two changes: 1. Pin placo to <0.9.16 in pyproject.toml + regenerate uv.lock (0.9.16 → 0.9.15). Short-term unblock for nightly CI until system urdfdom 4.x is broadly available. 2. Harden the import guard in src/lerobot/model/kinematics.py: wrap 'import placo' in try/except ImportError so a missing transitive .so no longer crashes module import. RobotKinematics instantiation now raises an informative ImportError citing the underlying dlopen failure via _raise_if_placo_unusable(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(kinematics): hoist _placo_runtime_error to module scope for mypy Mypy walks the TYPE_CHECKING branch in which the runtime else-block is not executed, so _placo_runtime_error was only defined at runtime and mypy reported 'Name "_placo_runtime_error" is not defined' on the three references inside _raise_if_placo_unusable. Declare the symbol unconditionally at module scope with a default of None; the runtime import-failure branch still assigns to it. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * style(kinematics): drop verbose comments around placo import guard Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 12:03:07 +02:00
Haoming Song	9f437d86b6	fix(groot): align GR00TN15Config with transformers config dataclasses (#3606) * fix(gr00t): fix gr00t config dataclass init TypeError * fix(groot): guard strict config decorator without transformers for passing CI --------- Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>	2026-05-22 10:31:04 +02:00
Haoming Song	b74a551d38	fix(pi0, pi05): stabilize torch.compile and expand test coverage (#3610) * chore(gr00t): sync with #3606 for fixing gr00t config crash * fix(pi0&pi05): fix graph break caused by deepcopy of past_key_values in sample_actions * fix(pi0&pi05): fix frequent recompile caused by compute_layer_complete * feat(test): add compile test and benchamrk for pi0 and pi05 * feat(test): add comprehensive testing for pi0 and pi05. Including processor, forward, sample action, etc.	2026-05-22 10:29:34 +02:00
Nikodem Bartnik	c0a2e9814d	fix examples (#3623) - Fixed broken API examples in Lerobot Imitation Learning Documentation - Teleoperation with cameras improved by adding a fixed frequency in the loop (without it the cameras feed gets very slow) - Wrapped record example script in main() to avoid problems on Mac - Previously teleoperation example was using SO-ARM and teleoperation with cameras was using Koch. I changed it to use SO-ARM in all of the examples. - Added section on how to train with HF Jobs - CLI and Python examples - Replaced lerobot-record with lerobot-rollout in policies examples	2026-05-21 22:14:07 +02:00
Khalil Meftah	bac4f61eae	refactor: support custom progress parquet overlays (#3640)	2026-05-21 14:32:10 +02:00
Virgileboat	f4b834844e	Feat/clean can bus (#3526) * change timeout for handshake * enforce last state read when querry * change import order * fix(motors): flush stale robstride RX and harden feedback drain * robstride: remove redundant timeout and max_messages casts * bugfix + %-style * update exception catch	2026-05-21 11:44:04 +02:00
Roham Z. Nobari	dfdc48a7f1	fix(datasets): bound VideoDecoderCache to prevent OOM on large datasets (#3614 ) VideoDecoderCache used an unbounded dict keyed on absolute path, with no eviction in the standard LeRobotDataset path. With shuffled iteration over datasets that have many distinct mp4 files, every DataLoader worker accumulated one cached (VideoDecoder, fsspec file handle) pair per distinct path it had ever touched. Per-entry cost is ~3-5 MB of host RAM plus one open FD; at ~8 k entries this is roughly 30 GB per worker. This was hit in the wild during a SmolVLA training run on a 4,195-episode SO-101 dataset (8,390 mp4s, two cameras per episode). dmesg showed anon-rss climbing to 34.9 GB on a single pt_data_worker before the OOM killer fired ~30 min into training; with --num_workers=8 the per-worker peak halved to 17.9 GB, which is the expected inverse-scaling signature when the leak is per-decode and the workload is split across workers. The working workaround on the affected platform was --dataset.video_backend=pyav, because the pyav path opens/closes per call and never touches this cache. Switch the backing store to an OrderedDict and evict LRU entries when the cap is reached, closing the evicted file handle inside the lock so we do not leak FDs either. Default cap is DEFAULT_DECODER_CACHE_SIZE = 100, overridable via LEROBOT_VIDEO_DECODER_CACHE_SIZE or by passing max_size= to the constructor; max_size=None restores the legacy unbounded behaviour for callers that need it. 
Validation on the original failing workload (decode_video_frames_torchcodec called over real mp4s from the affected SO-101 dataset): unbounded: 300 files -> +1087 MB host RSS, cache=300, still climbing cap=50: 500 files -> +266 MB host RSS, cache=50, stable cap=50: 2000 calls -> +312 MB host RSS, cache=50, stable cap=100: 1000 calls -> +470 MB host RSS, cache=100, stable Three independent seeded runs at cap=50 agreed to within 1% (263 / 266 / 265 MB delta), and the 2000-call multi-pass run shows RSS plateaus after the cap is reached instead of drifting. Tests in tests/datasets/test_video_decoder_cache.py cover: default-is-bounded, size cap, LRU ordering, FD close on eviction, FD close on clear(), cache-hit invariance, max_size=None fallback, and env-var override. No regressions in test_video_encoding.py, test_streaming.py, or test_dataset_reader.py (73 prior tests still pass alongside the 8 new ones).	2026-05-19 16:54:25 +02:00
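
The eviction scheme described in commit dfdc48a7f1 (OrderedDict backing store, LRU eviction at a size cap, evicted handle closed inside the lock, max_size=None for the legacy unbounded behaviour) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the lerobot implementation: the class name BoundedHandleCache, the opener callback, and the EXAMPLE_CACHE_SIZE environment variable are hypothetical names invented for this sketch, and io.StringIO stands in for a decoder or file handle.

```python
import io
import os
import threading
from collections import OrderedDict

DEFAULT_CACHE_SIZE = 100  # same default cap as quoted in the commit message
_UNSET = object()  # sentinel so that max_size=None can mean "unbounded"


class BoundedHandleCache:
    """Hypothetical LRU cache of open handles (illustration only)."""

    def __init__(self, opener, max_size=_UNSET):
        if max_size is _UNSET:
            # EXAMPLE_CACHE_SIZE is a made-up variable name for this sketch.
            max_size = int(os.environ.get("EXAMPLE_CACHE_SIZE", DEFAULT_CACHE_SIZE))
        if max_size is not None and max_size < 1:
            raise ValueError("max_size must be >= 1 or None")
        self._opener = opener
        self._max_size = max_size
        self._entries = OrderedDict()  # oldest entry first, newest last
        self._lock = threading.Lock()

    def get(self, path):
        with self._lock:
            if path in self._entries:
                # Cache hit: mark the entry as most recently used.
                self._entries.move_to_end(path)
                return self._entries[path]
            handle = self._opener(path)
            self._entries[path] = handle
            # Evict least recently used entries once the cap is exceeded,
            # closing each evicted handle while still holding the lock.
            while self._max_size is not None and len(self._entries) > self._max_size:
                _, evicted = self._entries.popitem(last=False)
                evicted.close()
            return handle

    def clear(self):
        with self._lock:
            for handle in self._entries.values():
                handle.close()
            self._entries.clear()

    def __len__(self):
        with self._lock:
            return len(self._entries)


# Usage: with a cap of 2, touching "a.mp4" again protects it from eviction.
cache = BoundedHandleCache(opener=lambda path: io.StringIO(path), max_size=2)
first = cache.get("a.mp4")
second = cache.get("b.mp4")
cache.get("a.mp4")  # refreshes "a.mp4", so "b.mp4" becomes the oldest entry
cache.get("c.mp4")  # exceeds the cap and evicts "b.mp4"
```

Closing the evicted handle under the lock keeps the entry count and the number of open handles in step, which is the property the commit's tests check ("FD close on eviction", "FD close on clear()").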

1512 Commits