Each _InferenceServer now gets its own preprocessor/postprocessor
instances, preventing RuntimeError from HuggingFace tokenizer's
non-thread-safe Rust borrow checker when multiple servers run
concurrently.
Made-with: Cursor
New eval.runtime=multiprocess spawns local lerobot-eval-worker
subprocesses instead of Docker containers. Supports eval.policy_servers
for parallel inference. Works on SLURM clusters and anywhere Docker
is unavailable.
Usage: lerobot-eval --eval.runtime=multiprocess \
--eval.instance_count=8 --eval.policy_servers=4 --eval.port=50051
Made-with: Cursor
Add eval.policy_servers parameter (default 1) that spawns N independent
policy inference servers on consecutive ports. Containers are round-robin
assigned across servers, enabling parallel GPU inference for small models
like SmolVLA (~1.4GB each).
Usage: --eval.policy_servers=4 --eval.instance_count=20
→ 4 model copies on GPU, 20 containers distributed across them.
Made-with: Cursor
Add docker_runtime.py (host-side) and lerobot_eval_worker.py (container-side)
for --eval.runtime=docker. Policy loads once on the host GPU; Docker containers
run env-only workers that call back via HTTP for action chunks, maximising GPU
utilisation across parallel benchmark tasks.
- _InferenceServer: HTTP server wrapping predict_action_chunk with a single lock
- run_eval_in_docker: spawns instance_count containers, collects + merges per-task
JSON, writes eval_info.json compatible with _aggregate_eval_from_per_task
- lerobot-eval-worker CLI: make_env → shard tasks → run episodes → write JSON
- EvalDockerConfig: add port field (default 50051)
- pyproject.toml: add lerobot-eval-worker entry point
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>