lerobot-clone/src/lerobot
Pepijn 5972a85ec7 feat(eval): episode sharding, parallel launcher, and autotune
Add lerobot-eval-parallel and lerobot-eval-autotune entry points for
multi-process evaluation. On a single H100, running SmolVLA as 4 shards
reaches ~100% GPU utilisation, versus ~0.5% for the serial baseline.

- EvalConfig: add shard_id / num_shards fields; validate ranges
- lerobot_eval.py: _shard_episodes() splits n_episodes round-robin;
  eval_main uses per-shard n_episodes + seed offset; writes
  shard_K_of_N.json when num_shards > 1
- lerobot_eval_parallel.py: spawns K subprocesses with disjoint shard
  IDs, sets MUJOCO_GL and OMP_NUM_THREADS, merges results on completion
- lerobot_eval_autotune.py: probes GPU VRAM, CPU cores, optional model
  footprint and env step time; derives optimal num_shards / batch_size /
  MUJOCO_GL; prints a paste-ready command
- pyproject.toml: register lerobot-eval-parallel and lerobot-eval-autotune
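
The round-robin split described above can be sketched as follows. This is a
minimal illustration, not the code in lerobot_eval.py: the helper names
shard_episode_indices and shard_result_filename are hypothetical, and both
zero-based shard IDs and the "seed + shard_id" offset are assumptions made
for the example.

```python
def shard_episode_indices(n_episodes: int, shard_id: int, num_shards: int) -> list[int]:
    """Return the episode indices assigned to one shard (round-robin).

    Hypothetical helper: shard k takes episodes k, k + N, k + 2N, ...
    so shard sizes differ by at most one episode.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must satisfy 0 <= shard_id < num_shards")
    return list(range(shard_id, n_episodes, num_shards))


def shard_seed(base_seed: int, shard_id: int) -> int:
    """Assumed seed offset: each shard gets a distinct seed."""
    return base_seed + shard_id


def shard_result_filename(shard_id: int, num_shards: int) -> str:
    """File name following the shard_K_of_N.json pattern (K assumed zero-based)."""
    return f"shard_{shard_id}_of_{num_shards}.json"


if __name__ == "__main__":
    n_episodes, num_shards = 10, 4
    for k in range(num_shards):
        episodes = shard_episode_indices(n_episodes, k, num_shards)
        print(k, shard_seed(1000, k), len(episodes), shard_result_filename(k, num_shards))
```

With this scheme the shards are disjoint and together cover every episode
exactly once, which is what lets the launcher merge per-shard results
without double counting.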

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 13:43:03 +02:00