lerobot-clone

Mirror of https://github.com/huggingface/lerobot.git, synced 2026-06-04 21:01:26 +00:00


Maxime Ellerbach 2e9cd87bbd feat(policies): add VLA-JEPA (#3568)

* first commit

* feat(policies): add VLA-JEPA

* feat(policies): add VLA-JEPA

* support vla_jepa

* (feat)policies: add VLA-JEPA

* linting

* adding deps to pyproject.toml

* updating uv lock

* adding guards to avoid needing transformers and diffusers for type checking and basic tests

* fixing action and state dim

* fix warnings with qwen processor kwargs

* fixing wm_loss not propagating

* adjusting obs steps, tubelet size to match original implementation

* some more fixes to be closer to the original implementation

* adding more tests to ensure good coverage

* align VLA-JEPA architecture with original checkpoint

- Remove stale `action_num_heads` / `action_attention_head_dim` config fields;
  DiT head dimensions are now always derived from the preset (DiT-B/L/test).
- Add `num_target_vision_tokens` and `action_max_seq_len` config fields required
  by the action head's future-token embedding and positional embedding tables.
- Fix default `qwen_model_name` to 2B (matches all released checkpoints).
- Rename `ActionEncoder` attrs w1/w2/w3 → layer1/layer2/layer3 to match
  checkpoint key names; replace `nn.Sequential` decoder/state-encoder with
  `_MLP2` (layer1/layer2 naming).
- Fix `VLAJEPAActionHead` to size ActionEncoder and StateEncoder at `inner_dim`
  (DiT input width) rather than `action_hidden_size` (DiT output width).
- Rename `DiT.blocks` → `transformer_blocks` and `attn` → `attn1` to match
  checkpoint; add alternating cross/self attention (even blocks cross-attend to
  Qwen context, odd blocks self-attend).
- Add `DiT-test` preset for unit tests.
- Rewrite `ActionConditionedVideoPredictor` with explicit ViT-style blocks
  (`_PredictorBlock` with fused qkv) to match checkpoint structure; rename
  `encoder`/`norm`/`proj` → `predictor_blocks`/`predictor_norm`/`predictor_proj`.

* propagate action_is_pad masking through VLA-JEPA policy pipeline

Pass the `action_is_pad` tensor from the batch through to the action head
so padded timesteps are excluded from the flow-matching loss.

* update VLA-JEPA tests for arch changes and action_is_pad

- Switch conftest to use `action_model_type="DiT-test"` now that
  `action_num_heads` / `action_attention_head_dim` have been removed.
- Add action_head tests covering fully-padded loss (zero) and equivalence
  of action_is_pad=None vs all-zeros mask.
- Remove obsolete `test_native_to_lerobot_wm_only` test.

* add VLA-JEPA documentation

Covers architecture overview, pretrained checkpoints, config reference,
training/eval commands for LIBERO-10, and guidance on fine-tuning for
single-camera datasets.

* add one-shot script to convert ginwind/VLA-JEPA checkpoints to safetensors (will remove once migrated)

* make default params more aligned with paper and pretrained models
- adding possibility of freezing qwen backbone and world model
- added tests for weight loading

* trying out to re-init the action head to avoid pretraining dimension mismatch

* allow different state dim and action dim

* removing misleading future_action_window_size to just use chunk_size

* lots of changes to make existing weights work, need to massively refactor the pre and post processing

* refactoring into using pre and post processor

* pre-commit cleanup

* fixing doc defaults args

Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>

* addressing dtype zeros issue

* adding guard for diffusers

* fixing training and eval examples

* trying to close success rate gap

* fix qwen norm layer output libero eval is now as expected

* adding instructions for different embodiments + fixing some tests

* smol fix to avoid having default CPU device when training

* fixing misconception about multiview / singleview handling

* removing conversion script

* adding licences

* adding .mdx docs and shortening polivy_vla_jepa_README.md

* removing useless pre-processor

* cleanup

* removing swish in favor of silu

* adding configuration gripper index and threshold

* fixing symlink

---------

Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
Co-authored-by: ginwind <ginwind@mail.ustc.edu.cn>

2026-06-04 19:22:51 +02:00
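The `action_is_pad` change in the commit message above excludes padded timesteps from the flow-matching loss, and the updated tests check that a fully padded chunk yields zero loss and that `action_is_pad=None` behaves like an all-zeros mask. The following is a minimal plain-Python sketch of that masking idea only; `masked_mse` is a hypothetical name, and the actual VLA-JEPA action head operates on batched tensors and may reduce the loss differently.

```python
from typing import Optional, Sequence


def masked_mse(
    pred: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    is_pad: Optional[Sequence[bool]] = None,
) -> float:
    """Mean squared error over the non-padded timesteps of one action chunk.

    pred / target: [chunk_size][action_dim]; is_pad: [chunk_size], True = padded.
    (Hypothetical helper illustrating the masking, not the VLA-JEPA code.)
    """
    if is_pad is None:
        # No mask given: every timestep counts, same as an all-False mask
        is_pad = [False] * len(pred)
    total, count = 0.0, 0
    for p_t, y_t, pad in zip(pred, target, is_pad):
        if pad:
            continue  # padded timesteps contribute nothing to the loss
        total += sum((p - y) ** 2 for p, y in zip(p_t, y_t))
        count += len(p_t)
    # A fully padded chunk gives zero loss instead of dividing by zero
    return total / count if count else 0.0


pred = [[0.0, 1.0], [2.0, 2.0], [9.0, 9.0]]
target = [[0.0, 0.0], [2.0, 0.0], [0.0, 0.0]]
print(masked_mse(pred, target, [False, False, True]))  # 1.25 (last step ignored)
```

Skipping padded steps, rather than zeroing their error and still dividing by the full chunk length, keeps the loss scale independent of how much padding an episode's final chunk contains.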


README.md

LeRobot aims to provide models, datasets, and tools for real-world robotics in PyTorch. The goal is to lower the barrier to entry so that everyone can contribute to and benefit from shared datasets and pretrained models.

🤗 A hardware-agnostic, Python-native interface that standardizes control across diverse platforms, from low-cost arms (SO-100) to humanoids.

🤗 A standardized, scalable LeRobotDataset format (Parquet + MP4 or images) hosted on the Hugging Face Hub, enabling efficient storage, streaming and visualization of massive robotic datasets.

🤗 State-of-the-art policies, shown to transfer to the real world, ready for training and deployment.

🤗 Comprehensive support for the open-source ecosystem to democratize physical AI.

Quick Start

LeRobot can be installed directly from PyPI.

pip install lerobot
lerobot-info

Important

For a detailed installation guide, please see the Installation Documentation.

Robots & Control

LeRobot provides a unified Robot class interface that decouples control logic from hardware specifics. It supports a wide range of robots and teleoperation devices.

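# MyRobot is a placeholder: substitute one of the supported robot classes and its config
from lerobot.robots.myrobot import MyRobot

# Connect to a robot
robot = MyRobot(config=...)
robot.connect()

# Read an observation, query a trained policy (loaded beforehand), and send the action.
# In practice, raw observations are converted by the policy's preprocessor first.
obs = robot.get_observation()
action = policy.select_action(obs)
robot.send_action(action)

# Release the hardware when done
robot.disconnect()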

Supported Hardware: SO100, LeKiwi, Koch, HopeJR, OMX, EarthRover, Reachy2, Gamepads, Keyboards, Phones, OpenARM, Unitree G1.

While these devices are natively integrated into the LeRobot codebase, the library is designed to be extensible: you can implement the Robot interface for your own custom robot and then use LeRobot's data collection, training, and visualization tools with it.
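Because control code only touches a few robot methods, any class that provides them can run in the same observe, act, send loop. The sketch below assumes a hypothetical `RobotLike` protocol limited to the calls in the example above plus a `disconnect` method; LeRobot's real `Robot` base class defines more members (for example calibration and feature descriptions), so this illustrates the decoupling rather than the library's interface. `FakeArm`, its joint names, and the `.pos` key suffix are illustrative assumptions.

```python
from typing import Any, Callable, Protocol


class RobotLike(Protocol):
    """Hypothetical minimal interface, not LeRobot's Robot base class."""
    def connect(self) -> None: ...
    def get_observation(self) -> dict[str, Any]: ...
    def send_action(self, action: dict[str, Any]) -> dict[str, Any]: ...
    def disconnect(self) -> None: ...


class FakeArm:
    """Simulated arm whose joints jump straight to the commanded positions."""

    def __init__(self, joints: list[str]) -> None:
        self.positions = {f"{j}.pos": 0.0 for j in joints}
        self.connected = False

    def connect(self) -> None:
        self.connected = True

    def get_observation(self) -> dict[str, Any]:
        if not self.connected:
            raise ConnectionError("call connect() first")
        return dict(self.positions)

    def send_action(self, action: dict[str, Any]) -> dict[str, Any]:
        if not self.connected:
            raise ConnectionError("call connect() first")
        # Keep only keys this robot knows about and report what was actually sent
        sent = {k: float(v) for k, v in action.items() if k in self.positions}
        self.positions.update(sent)
        return sent

    def disconnect(self) -> None:
        self.connected = False


def control_loop(robot: RobotLike, policy: Callable[[dict], dict], steps: int) -> dict[str, Any]:
    """Written against the protocol only, so any conforming robot works."""
    robot.connect()
    try:
        for _ in range(steps):
            robot.send_action(policy(robot.get_observation()))
        return robot.get_observation()
    finally:
        robot.disconnect()  # always release the hardware, even on errors


arm = FakeArm(["shoulder", "elbow"])
# Toy policy: move every joint +1.0 per step
final = control_loop(arm, lambda obs: {k: v + 1.0 for k, v in obs.items()}, steps=3)
print(final)  # {'shoulder.pos': 3.0, 'elbow.pos': 3.0}
```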

For detailed hardware setup guides, see the Hardware Documentation.

LeRobot Dataset

To address the data fragmentation problem in robotics, LeRobot uses the LeRobotDataset format.

Structure: Synchronized MP4 videos (or images) for vision and Parquet files for state/action data.
HF Hub Integration: Explore thousands of robotics datasets on the Hugging Face Hub.
Tools: Seamlessly delete episodes, split by indices/fractions, add/remove features, and merge multiple datasets.
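Splitting is safest at the episode level, so frames from one episode never land in both a training and a validation split. The following is a minimal sketch of a fraction-based split under that assumption; `split_episodes` is a hypothetical helper, not LeRobot's dataset-editing tooling, whose exact behavior may differ.

```python
import random


def split_episodes(
    episode_indices: list[int], fractions: dict[str, float], seed: int = 0
) -> dict[str, list[int]]:
    """Partition whole episodes into named splits by fraction (hypothetical helper)."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(episode_indices)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    names = list(fractions)
    splits: dict[str, list[int]] = {}
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = len(shuffled)  # last split takes the remainder, so nothing is dropped
        else:
            end = min(len(shuffled), start + round(fractions[name] * len(shuffled)))
        splits[name] = sorted(shuffled[start:end])
        start = end
    return splits


splits = split_episodes(list(range(10)), {"train": 0.8, "val": 0.2}, seed=42)
print({k: len(v) for k, v in splits.items()})  # {'train': 8, 'val': 2}
```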

from lerobot.datasets.lerobot_dataset import LeRobotDataset

# Load a dataset from the Hub
dataset = LeRobotDataset("lerobot/aloha_mobile_cabinet")

# Indexing returns a single frame (not an episode); video frames are decoded automatically
frame_index = 0
frame = dataset[frame_index]
print(f"{frame['action'].shape=}")

Learn more in the LeRobotDataset Documentation.
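Because `dataset[i]` addresses individual frames, working with whole episodes means knowing which contiguous block of frame indices each episode occupies. Assuming frames are stored episode by episode with a per-frame `episode_index` column, those blocks can be recovered as in this sketch; `episode_ranges` is a hypothetical helper, and LeRobot tracks episode boundaries in its own metadata.

```python
from itertools import groupby


def episode_ranges(episode_index_column: list[int]) -> dict[int, tuple[int, int]]:
    """Map each episode to its [start, end) range of global frame indices.

    Assumes frames of one episode are stored contiguously (hypothetical helper).
    """
    ranges: dict[int, tuple[int, int]] = {}
    start = 0
    for ep, frames in groupby(episode_index_column):
        length = sum(1 for _ in frames)
        if ep in ranges:
            raise ValueError(f"episode {ep} is not stored contiguously")
        ranges[ep] = (start, start + length)
        start += length
    return ranges


# Three episodes of 3, 2 and 4 frames laid out back to back
column = [0, 0, 0, 1, 1, 2, 2, 2, 2]
ranges = episode_ranges(column)
print(ranges)  # {0: (0, 3), 1: (3, 5), 2: (5, 9)}

# Global frame indices of episode 1, i.e. dataset[3] and dataset[4]
frames_of_ep1 = list(range(*ranges[1]))
```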

SoTA Models

LeRobot implements state-of-the-art policies in pure PyTorch, covering Imitation Learning, Reinforcement Learning, and Vision-Language-Action (VLA) models, with more coming soon. It also provides you with the tools to instrument and inspect your training process.

Training a policy takes a single command:

lerobot-train \
  --policy=act \
  --dataset.repo_id=lerobot/aloha_mobile_cabinet
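The flags follow dotted paths into a nested configuration (`--dataset.repo_id` sets `repo_id` inside the `dataset` section). If you launch runs from Python, a small helper can build the same argument list from a nested dict. `to_cli_flags` is a hypothetical sketch: it stringifies values with `str()` and does not validate names against LeRobot's actual config classes.

```python
def to_cli_flags(config: dict, prefix: str = "") -> list[str]:
    """Flatten a nested dict into dotted --key=value flags (hypothetical helper)."""
    flags: list[str] = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flags.extend(to_cli_flags(value, path))  # recurse into a sub-config
        else:
            flags.append(f"--{path}={value}")
    return flags


cmd = [
    "lerobot-train",
    *to_cli_flags({"policy": "act", "dataset": {"repo_id": "lerobot/aloha_mobile_cabinet"}}),
]
print(" ".join(cmd))
# lerobot-train --policy=act --dataset.repo_id=lerobot/aloha_mobile_cabinet
# With lerobot installed, subprocess.run(cmd, check=True) would launch the run.
```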

Category	Models
Imitation Learning	ACT, Diffusion, VQ-BeT, Multitask DiT Policy
Reinforcement Learning	HIL-SERL, TDMPC & QC-FQL (coming soon)
VLA Models	Pi0Fast, Pi0.5, GR00T N1.5, SmolVLA, XVLA

As with hardware, you can implement your own policy, use LeRobot's data collection, training, and visualization tools with it, and share your model on the HF Hub.

For detailed policy setup guides, see the Policy Documentation. For GPU/RAM requirements and expected training time per policy, see the Compute Hardware Guide.
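Several chunking policies (ACT among them) predict a short sequence of future actions at once and then hand them out one per control step. The sketch below isolates that pattern. `ChunkedPolicy`, `predict_chunk`, and `n_action_steps` are names chosen for illustration; this is not LeRobot's policy base class, which adds configuration, saving, and Hub integration on top.

```python
from collections import deque
from typing import Callable, Sequence


class ChunkedPolicy:
    """Serves one action per call from a chunk predicted every n_action_steps calls."""

    def __init__(
        self, predict_chunk: Callable[[dict], Sequence[list[float]]], n_action_steps: int
    ) -> None:
        if n_action_steps < 1:
            raise ValueError("n_action_steps must be >= 1")
        self.predict_chunk = predict_chunk  # model: observation -> [chunk_size][action_dim]
        self.n_action_steps = n_action_steps
        self._queue: deque = deque()

    def reset(self) -> None:
        """Drop queued actions, e.g. at the start of a new episode."""
        self._queue.clear()

    def select_action(self, observation: dict) -> list[float]:
        if not self._queue:
            chunk = self.predict_chunk(observation)
            # Execute only the first n_action_steps of the chunk, then re-plan
            self._queue.extend(chunk[: self.n_action_steps])
        return self._queue.popleft()


calls: list[int] = []


def fake_model(obs: dict) -> list[list[float]]:
    calls.append(obs["step"])
    return [[float(obs["step"] + i)] for i in range(4)]  # chunk of four 1-D actions


policy = ChunkedPolicy(fake_model, n_action_steps=2)
actions = [policy.select_action({"step": t}) for t in range(5)]
print(actions)  # [[0.0], [1.0], [2.0], [3.0], [4.0]]
print(calls)    # [0, 2, 4]  (the model runs only every second step)
```

Executing fewer steps than the chunk length trades extra model calls for more frequent re-planning from fresh observations.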

Inference & Evaluation

Evaluate your policies in simulation or on real hardware using the unified evaluation script. LeRobot supports standard benchmarks such as LIBERO and MetaWorld, with more to come.

# Evaluate a policy on the LIBERO benchmark
lerobot-eval \
  --policy.path=lerobot/pi0_libero_finetuned \
  --env.type=libero \
  --env.task=libero_object \
  --eval.n_episodes=10
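An evaluation run like the one above rolls the policy out for `--eval.n_episodes` episodes and aggregates the outcomes, most notably the success rate. The sketch below shows that aggregation for a single, non-vectorized environment. It assumes a Gymnasium-style `reset`/`step` signature and that success is reported through an `is_success` entry in `info`; `CoinFlipEnv` and `evaluate` are hypothetical and much simpler than `lerobot-eval`, which runs batched environments and records more metrics.

```python
import random
from typing import Any, Callable


class CoinFlipEnv:
    """Toy Gymnasium-style env: each step succeeds with probability p."""

    def __init__(self, p: float, max_steps: int, seed: int) -> None:
        self.p, self.max_steps = p, max_steps
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self) -> tuple[dict, dict]:
        self.t = 0
        return {"t": 0}, {}

    def step(self, action: Any) -> tuple[dict, float, bool, bool, dict]:
        self.t += 1
        success = self.rng.random() < self.p
        truncated = self.t >= self.max_steps and not success  # time limit reached
        return {"t": self.t}, float(success), success, truncated, {"is_success": success}


def evaluate(env: CoinFlipEnv, policy: Callable[[dict], Any], n_episodes: int) -> float:
    """Roll out n_episodes and return the fraction that ended in success."""
    successes = 0
    for _ in range(n_episodes):
        obs, info = env.reset()
        while True:
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            if terminated or truncated:
                break
        successes += bool(info.get("is_success", False))
    return successes / n_episodes


rate = evaluate(CoinFlipEnv(p=0.3, max_steps=5, seed=0), policy=lambda obs: 0, n_episodes=10)
print(f"success rate: {rate:.0%}")
```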

Learn how to implement your own simulation environment or benchmark and distribute it via the HF Hub by following the EnvHub Documentation.

Resources

Documentation: The complete guide to tutorials & API.
Chinese Tutorials: LeRobot+SO-ARM101 Chinese Tutorial by 同济子豪兄. A detailed guide covering assembly, teleoperation, dataset recording, training, and deployment. Verified by Seeed Studio and participants from 5 global hackathons.
Discord: Join the LeRobot server to discuss with the community.
X: Follow us on X to stay up-to-date with the latest developments.
Robot Learning Tutorial: A free, hands-on course to learn robot learning using LeRobot.

Citation

If you use LeRobot in your project, please cite the GitHub repository to acknowledge the ongoing development and contributors:

@misc{cadene2024lerobot,
    author = {Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Palma, Steven and Kooijmans, Pepijn and Aractingi, Michel and Shukor, Mustafa and Aubakirova, Dana and Russi, Martino and Capuano, Francesco and Pascal, Caroline and Choghari, Jade and Moss, Jess and Wolf, Thomas},
    title = {LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch},
    howpublished = "\url{https://github.com/huggingface/lerobot}",
    year = {2024}
}

If you are referencing our research or the academic paper, please also cite our ICLR publication:

ICLR 2026 Paper

@inproceedings{cadenelerobot,
  title={LeRobot: An Open-Source Library for End-to-End Robot Learning},
  author={Cadene, Remi and Alibert, Simon and Capuano, Francesco and Aractingi, Michel and Zouitine, Adil and Kooijmans, Pepijn and Choghari, Jade and Russi, Martino and Pascal, Caroline and Palma, Steven and Shukor, Mustafa and Moss, Jess and Soare, Alexander and Aubakirova, Dana and Lhoest, Quentin and Gallou\'edec, Quentin and Wolf, Thomas},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://arxiv.org/abs/2602.22818}
}

Contribute

We welcome contributions from everyone in the community! To get started, please read our CONTRIBUTING.md guide. Whether you're adding a new feature, improving documentation, or fixing a bug, your help and feedback are invaluable. We're incredibly excited about the future of open-source robotics and can't wait to work with you on what's next—thank you for your support!

Built by the LeRobot team at Hugging Face with ❤️