mirror of
https://github.com/huggingface/lerobot.git
synced 2026-06-04 04:41:24 +00:00
Add lerobot-eval-parallel and lerobot-eval-autotune entry points for multi-process evaluation. A single H100 running 4 shards of SmolVLA achieves ~100% GPU utilisation vs ~0.5% with the serial baseline. - EvalConfig: add shard_id / num_shards fields; validate ranges - lerobot_eval.py: _shard_episodes() splits n_episodes round-robin; eval_main uses per-shard n_episodes + seed offset; writes shard_K_of_N.json when num_shards > 1 - lerobot_eval_parallel.py: spawns K subprocesses with disjoint shard IDs, sets MUJOCO_GL and OMP_NUM_THREADS, merges results on completion - lerobot_eval_autotune.py: probes GPU VRAM, CPU cores, optional model footprint and env step time; derives optimal num_shards / batch_size / MUJOCO_GL; prints a paste-ready command - pyproject.toml: register lerobot-eval-parallel and lerobot-eval-autotune Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>