mirror of
https://github.com/huggingface/lerobot.git
synced 2026-06-03 20:31:25 +00:00
refactor(annotate): delegate distribution to HF Jobs; drop SLURM/local switch
The executor previously claimed it would "optionally hand off" to
datatrove's LocalPipelineExecutor or SlurmPipelineExecutor — but it
already runs phases inline in every code path, and HF Jobs (see
``examples/annotation/run_hf_job.py``) is the actual distribution
strategy. Stop pretending we have an executor selector.
* `executor.py`: drop `select_executor_class`, the "kind" log line, and
the references to LocalPipelineExecutor / SlurmPipelineExecutor.
Module docstring now says distribution is delegated to HF Jobs.
* `config.py`: drop `auto_threshold`, `force_local`, `slurm_partition`,
`slurm_gpus`, `slurm_time`, `workers`. `ExecutorConfig` keeps only
`episode_parallelism`. While here, prune the longer "why" docstrings
on every field down to the load-bearing bits — full story moves to
`docs/source/annotation_pipeline.mdx`.
* `pyproject.toml`: drop `datatrove>=0.4.0,<2.0.0` from the
`[annotations]` extra; the dep was only there for the (never used)
cluster executors. Comment block notes the new HF-Jobs delegation.
* `reader.py`, `lerobot_annotate.py`: drop their own datatrove /
flavor-namespace mentions.
* `docs/source/annotation_pipeline.mdx`:
- remove the flavor-namespace / sidecar paragraph (out of scope —
"multiple revisions = multiple copies" is dataset-level policy);
- remove the "writer drops the legacy `subtask_index` column" note
(already covered by PR 1's intentional-break call-out);
- remove the chat-template + `apply_chat_template(messages, tools=...)`
line (covered by Tools doc);
- replace the "executor picks Local vs Slurm" paragraph with
`--executor.episode_parallelism` and a pointer to HF Jobs;
- rewrite the style→recipe section to talk about "recipes" generically
instead of pinning a specific YAML;
- add a "Running on Hugging Face Jobs" section pointing at
`examples/annotation/run_hf_job.py`;
- add a "Running locally" example matching the CLI's docstring
(`uv run lerobot-annotate --root=... --vlm.model_id=...`);
- extend the paper-inspirations list with Pi0.7 and Steerable VLA
Policies (Zhao 2025) for Module 3.
Tests: same 3 pre-existing failures as before this commit (2 module
assertions still in flight; 1 carryover from PR 1). 41/44 pass.
Pre-commit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -26,9 +26,7 @@ episode containing:
|
||||
- ``frames_df``: pandas.DataFrame slice for the episode (only loaded on demand)
|
||||
|
||||
This shape lets each module operate per-episode without loading all parquet
|
||||
rows into memory at once. It deliberately does not depend on datatrove —
|
||||
datatrove integration wraps this generator inside a ``PipelineStep`` in
|
||||
:mod:`.executor`.
|
||||
rows into memory at once.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
Reference in New Issue
Block a user