feat: language annotation pipeline (PR 2/3)
Adds the steerable annotation pipeline (`lerobot-annotate`) that populates
the `language_persistent` and `language_events` columns introduced in
PR 1 directly into `data/chunk-*/file-*.parquet`. No flavor namespace,
no sidecar tree.
Modules produced:
- Module 1 (plan_subtasks_memory): Pi0.7-style subtasks, plan (init +
refresh on interjection), MEM-style memory at subtask boundaries.
- Module 2 (interjections_and_speech): t=0 speech-only acknowledgement,
mid-episode paired interjection + speech tool-call atom.
- Module 3 (general_vqa): bbox/keypoint/count/attribute/spatial pairs at
configurable cadence with one-retry JSON validation.
Writer enforces: per-episode persistent identity, exact-frame event
timestamps, column routing per `column_for_style`, dataset-level `tools`
column with the `say` schema, drops legacy `subtask_index`. Validator
runs against staged JSONL artifacts before the writer rewrites parquet.
Adds `lerobot-annotate` console script, `annotations` extra (datatrove +
optional vllm), `make annotation-e2e` opt-in smoke target, and
`docs/source/annotation_pipeline.mdx`.
Branched from PR 1 (`feat/language-columns`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:22:51 +02:00
|
|
|
|
# Annotation Pipeline
|
|
|
|
|
|
|
2026-06-04 11:48:59 +02:00
|
|
|
|
`lerobot-annotate` watches each episode's video with a vision-language
|
|
|
|
|
|
model (VLM) and writes natural-language annotations back into your
|
|
|
|
|
|
dataset. It fills the two language columns from the
|
feat: language annotation pipeline (PR 2/3)
Adds the steerable annotation pipeline (`lerobot-annotate`) that populates
the `language_persistent` and `language_events` columns introduced in
PR 1 directly into `data/chunk-*/file-*.parquet`. No flavor namespace,
no sidecar tree.
Modules produced:
- Module 1 (plan_subtasks_memory): Pi0.7-style subtasks, plan (init +
refresh on interjection), MEM-style memory at subtask boundaries.
- Module 2 (interjections_and_speech): t=0 speech-only acknowledgement,
mid-episode paired interjection + speech tool-call atom.
- Module 3 (general_vqa): bbox/keypoint/count/attribute/spatial pairs at
configurable cadence with one-retry JSON validation.
Writer enforces: per-episode persistent identity, exact-frame event
timestamps, column routing per `column_for_style`, dataset-level `tools`
column with the `say` schema, drops legacy `subtask_index`. Validator
runs against staged JSONL artifacts before the writer rewrites parquet.
Adds `lerobot-annotate` console script, `annotations` extra (datatrove +
optional vllm), `make annotation-e2e` opt-in smoke target, and
`docs/source/annotation_pipeline.mdx`.
Branched from PR 1 (`feat/language-columns`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:22:51 +02:00
|
|
|
|
[Language Columns and Recipes](./language_and_recipes) page —
|
2026-06-04 11:48:59 +02:00
|
|
|
|
`language_persistent` and `language_events` — straight into
|
refactor(annotate): delegate distribution to HF Jobs; drop SLURM/local switch
The executor previously claimed it would "optionally hand off" to
datatrove's LocalPipelineExecutor or SlurmPipelineExecutor — but it
already runs phases inline in every code path, and HF Jobs (see
``examples/annotation/run_hf_job.py``) is the actual distribution
strategy. Stop pretending we have an executor selector.
* `executor.py`: drop `select_executor_class`, the "kind" log line, and
the references to LocalPipelineExecutor / SlurmPipelineExecutor.
Module docstring now says distribution is delegated to HF Jobs.
* `config.py`: drop `auto_threshold`, `force_local`, `slurm_partition`,
`slurm_gpus`, `slurm_time`, `workers`. `ExecutorConfig` keeps only
`episode_parallelism`. While here, prune the longer "why" docstrings
on every field down to the load-bearing bits — full story moves to
`docs/source/annotation_pipeline.mdx`.
* `pyproject.toml`: drop `datatrove>=0.4.0,<2.0.0` from the
`[annotations]` extra; the dep was only there for the (never used)
cluster executors. Comment block notes the new HF-Jobs delegation.
* `reader.py`, `lerobot_annotate.py`: drop their own datatrove /
flavor-namespace mentions.
* `docs/source/annotation_pipeline.mdx`:
- remove the flavor-namespace / sidecar paragraph (out of scope —
"multiple revisions = multiple copies" is dataset-level policy);
- remove the "writer drops the legacy `subtask_index` column" note
(already covered by PR 1's intentional-break call-out);
- remove the chat-template + `apply_chat_template(messages, tools=...)`
line (covered by Tools doc);
- replace the "executor picks Local vs Slurm" paragraph with
`--executor.episode_parallelism` and a pointer to HF Jobs;
- rewrite the style→recipe section to talk about "recipes" generically
instead of pinning a specific YAML;
- add a "Running on Hugging Face Jobs" section pointing at
`examples/annotation/run_hf_job.py`;
- add a "Running locally" example matching the CLI's docstring
(`uv run lerobot-annotate --root=... --vlm.model_id=...`);
- extend the paper-inspirations list with Pi0.7 and Steerable VLA
Policies (Zhao 2025) for Module 3.
Tests: same 3 pre-existing failures as before this commit (2 module
assertions still in flight; 1 carryover from PR 1). 41/44 pass.
Pre-commit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:09:22 +02:00
|
|
|
|
`data/chunk-*/file-*.parquet`.
|
feat: language annotation pipeline (PR 2/3)
Adds the steerable annotation pipeline (`lerobot-annotate`) that populates
the `language_persistent` and `language_events` columns introduced in
PR 1 directly into `data/chunk-*/file-*.parquet`. No flavor namespace,
no sidecar tree.
Modules produced:
- Module 1 (plan_subtasks_memory): Pi0.7-style subtasks, plan (init +
refresh on interjection), MEM-style memory at subtask boundaries.
- Module 2 (interjections_and_speech): t=0 speech-only acknowledgement,
mid-episode paired interjection + speech tool-call atom.
- Module 3 (general_vqa): bbox/keypoint/count/attribute/spatial pairs at
configurable cadence with one-retry JSON validation.
Writer enforces: per-episode persistent identity, exact-frame event
timestamps, column routing per `column_for_style`, dataset-level `tools`
column with the `say` schema, drops legacy `subtask_index`. Validator
runs against staged JSONL artifacts before the writer rewrites parquet.
Adds `lerobot-annotate` console script, `annotations` extra (datatrove +
optional vllm), `make annotation-e2e` opt-in smoke target, and
`docs/source/annotation_pipeline.mdx`.
Branched from PR 1 (`feat/language-columns`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:22:51 +02:00
|
|
|
|
|
2026-06-04 11:48:59 +02:00
|
|
|
|
In short: point it at a LeRobot dataset, and it adds subtasks, plans,
|
|
|
|
|
|
memory, interjections, speech, and visual Q&A that a policy can be
|
|
|
|
|
|
trained on.
|
|
|
|
|
|
|
|
|
|
|
|
## How it fits together
|
|
|
|
|
|
|
|
|
|
|
|
```text
|
2026-06-04 11:59:31 +02:00
|
|
|
|
your dataset lerobot-annotate
|
|
|
|
|
|
(LeRobot v3.1)
|
|
|
|
|
|
│
|
|
|
|
|
|
▼
|
|
|
|
|
|
┌─────────────────────────────────────────────────────┐
|
|
|
|
|
|
│ read episodes │
|
|
|
|
|
|
└──────────────────────────┬──────────────────────────┘
|
|
|
|
|
|
│
|
|
|
|
|
|
┌────────────────────┼────────────────────┐
|
|
|
|
|
|
▼ ▼ ▼
|
|
|
|
|
|
┌──────────┐ ┌───────────────┐ ┌──────────┐ one shared Qwen-VL
|
|
|
|
|
|
│ plan │ │ interjections │ │ vqa │ ◀── server (vLLM, OpenAI
|
|
|
|
|
|
└────┬─────┘ └───────┬───────┘ └────┬─────┘ API) drives all three
|
|
|
|
|
|
└────────────────────┼─────────────────────┘
|
|
|
|
|
|
│ each module stages raw JSONL
|
|
|
|
|
|
▼ into .annotate_staging/
|
|
|
|
|
|
┌─────────────────┐
|
|
|
|
|
|
│ validator │ ◀── checks everything
|
|
|
|
|
|
└────────┬────────┘
|
|
|
|
|
|
▼
|
|
|
|
|
|
┌─────────────────┐
|
|
|
|
|
|
│ writer │
|
|
|
|
|
|
└────────┬────────┘
|
|
|
|
|
|
▼
|
|
|
|
|
|
data/chunk-*/file-*.parquet
|
|
|
|
|
|
(+ meta/info.json tools)
|
2026-06-04 11:48:59 +02:00
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
Three modules (`plan`, `interjections`, `vqa`) all talk to **one** shared
|
|
|
|
|
|
VLM. Each module stages its output to disk, a validator checks it, and a
|
|
|
|
|
|
single writer rewrites the dataset shards in place.
|
|
|
|
|
|
|
feat: language annotation pipeline (PR 2/3)
Adds the steerable annotation pipeline (`lerobot-annotate`) that populates
the `language_persistent` and `language_events` columns introduced in
PR 1 directly into `data/chunk-*/file-*.parquet`. No flavor namespace,
no sidecar tree.
Modules produced:
- Module 1 (plan_subtasks_memory): Pi0.7-style subtasks, plan (init +
refresh on interjection), MEM-style memory at subtask boundaries.
- Module 2 (interjections_and_speech): t=0 speech-only acknowledgement,
mid-episode paired interjection + speech tool-call atom.
- Module 3 (general_vqa): bbox/keypoint/count/attribute/spatial pairs at
configurable cadence with one-retry JSON validation.
Writer enforces: per-episode persistent identity, exact-frame event
timestamps, column routing per `column_for_style`, dataset-level `tools`
column with the `say` schema, drops legacy `subtask_index`. Validator
runs against staged JSONL artifacts before the writer rewrites parquet.
Adds `lerobot-annotate` console script, `annotations` extra (datatrove +
optional vllm), `make annotation-e2e` opt-in smoke target, and
`docs/source/annotation_pipeline.mdx`.
Branched from PR 1 (`feat/language-columns`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:22:51 +02:00
|
|
|
|
## What the pipeline produces
|
|
|
|
|
|
|
2026-06-04 11:48:59 +02:00
|
|
|
|
Each module emits a few kinds of annotation ("styles"), routed to one of
|
|
|
|
|
|
the two language columns:
|
feat: language annotation pipeline (PR 2/3)
Adds the steerable annotation pipeline (`lerobot-annotate`) that populates
the `language_persistent` and `language_events` columns introduced in
PR 1 directly into `data/chunk-*/file-*.parquet`. No flavor namespace,
no sidecar tree.
Modules produced:
- Module 1 (plan_subtasks_memory): Pi0.7-style subtasks, plan (init +
refresh on interjection), MEM-style memory at subtask boundaries.
- Module 2 (interjections_and_speech): t=0 speech-only acknowledgement,
mid-episode paired interjection + speech tool-call atom.
- Module 3 (general_vqa): bbox/keypoint/count/attribute/spatial pairs at
configurable cadence with one-retry JSON validation.
Writer enforces: per-episode persistent identity, exact-frame event
timestamps, column routing per `column_for_style`, dataset-level `tools`
column with the `say` schema, drops legacy `subtask_index`. Validator
runs against staged JSONL artifacts before the writer rewrites parquet.
Adds `lerobot-annotate` console script, `annotations` extra (datatrove +
optional vllm), `make annotation-e2e` opt-in smoke target, and
`docs/source/annotation_pipeline.mdx`.
Branched from PR 1 (`feat/language-columns`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:22:51 +02:00
|
|
|
|
|
2026-06-02 17:41:46 +02:00
|
|
|
|
| Style / atom | Column | Module |
|
|
|
|
|
|
| ------------------------------------------- | --------------------- | --------------- |
|
|
|
|
|
|
| `subtask` (Pi0.7-style "how, not what") | `language_persistent` | `plan` |
|
|
|
|
|
|
| `plan` (initial + refresh on interjection) | `language_persistent` | `plan` |
|
|
|
|
|
|
| `memory` (MEM-style compression) | `language_persistent` | `plan` |
|
2026-06-04 11:48:59 +02:00
|
|
|
|
| `task_aug` (rephrasings of the task) | `language_persistent` | `plan` |
|
2026-06-02 17:41:46 +02:00
|
|
|
|
| `interjection` | `language_events` | `interjections` |
|
|
|
|
|
|
| speech tool-call atom (`style=null`, `say`) | `language_events` | `interjections` |
|
|
|
|
|
|
| `vqa` (user / assistant pair) | `language_events` | `vqa` |
|
feat: language annotation pipeline (PR 2/3)
Adds the steerable annotation pipeline (`lerobot-annotate`) that populates
the `language_persistent` and `language_events` columns introduced in
PR 1 directly into `data/chunk-*/file-*.parquet`. No flavor namespace,
no sidecar tree.
Modules produced:
- Module 1 (plan_subtasks_memory): Pi0.7-style subtasks, plan (init +
refresh on interjection), MEM-style memory at subtask boundaries.
- Module 2 (interjections_and_speech): t=0 speech-only acknowledgement,
mid-episode paired interjection + speech tool-call atom.
- Module 3 (general_vqa): bbox/keypoint/count/attribute/spatial pairs at
configurable cadence with one-retry JSON validation.
Writer enforces: per-episode persistent identity, exact-frame event
timestamps, column routing per `column_for_style`, dataset-level `tools`
column with the `say` schema, drops legacy `subtask_index`. Validator
runs against staged JSONL artifacts before the writer rewrites parquet.
Adds `lerobot-annotate` console script, `annotations` extra (datatrove +
optional vllm), `make annotation-e2e` opt-in smoke target, and
`docs/source/annotation_pipeline.mdx`.
Branched from PR 1 (`feat/language-columns`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:22:51 +02:00
|
|
|
|
|
2026-06-04 11:48:59 +02:00
|
|
|
|
### How subtasks are generated
|
|
|
|
|
|
|
|
|
|
|
|
The `plan` module doesn't ask the VLM for subtasks in one shot. Instead
|
|
|
|
|
|
it uses a two-step **describe → segment** flow:
|
|
|
|
|
|
|
|
|
|
|
|
1. **Describe** — the VLM narrates only what it actually sees in the
|
|
|
|
|
|
chosen camera (no guessing about the task).
|
|
|
|
|
|
2. **Segment** — that description is fed back in, and the VLM splits the
|
|
|
|
|
|
episode into consecutive atomic subtasks.
|
|
|
|
|
|
|
|
|
|
|
|
The resulting spans are then stitched into a gap-free, full-episode
|
|
|
|
|
|
cover, so **every frame has exactly one active subtask**. See
|
annotate: address review feedback — bug fixes, docs/code drift, naming, cleanup
Bugs
* validator: don't re-raise on unknown style. The second column_for_style
lookup (used to route persistent vs event) now sits in try/except so an
unknown style is recorded by _check_column_routing and skipped instead
of crashing the whole validation pass.
* general_vqa._target_cameras: when restrict_to_default_camera is set but
the configured camera_key isn't one the provider exposes, warn and fall
back to all cameras instead of returning a phantom key that KeyErrors
deep in frame decode.
* interjections: clamp interjection timestamps to frame_timestamps[0]
rather than a hardcoded 0.0 (datasets can start at non-zero t).
Docs / code drift
* annotation_pipeline.mdx: drop the phantom 'vocabulary discovery / phase
0 / --vocabulary.* / canonical_vocabulary.json' section (none of it
exists); describe the real describe->segment + coverage-stitch flow.
Soften the src/lerobot/tools/ + TOOL_REGISTRY reference to 'not part of
this PR' (matches tools.mdx, which already marks the runtime layer as
not-yet-implemented). Fix the --push_to_hub/--new_repo_id wording. Note
the default is now a single h200. Add a 'Contributing new modules'
section inviting module / prompt / quality contributions.
* executor docstring: six phases, no phantom phase 0.
run_hf_job.py
* add the Apache 2.0 license header (was flagged repeatedly).
* default to a single GPU: flavor=h200, parallel_servers=1, num_gpus=1
(scale to h200x4 noted in the docstring).
* pin the install to @main instead of the feature branch (won't break
after merge).
Naming / cleanup
* rename dest_repo_id -> new_repo_id across config / script / example /
test to match the LeRobot dataset edit tools.
* rename prompt templates module_N_*.txt -> descriptive (plan_*,
interjections_*, vqa.txt) and update every load_prompt() call.
* remove dead _messages_to_prompt (used only by the removed in-process
backends).
* declare _warned_decode_fail (frames) and _warned_no_camera (vqa) as
real init=False dataclass fields instead of getattr monkey-patches.
* scope bandit B607 to the two ffmpeg subprocess.run sites via
'# nosec B607' and drop it from the global skip list.
Tests
* fix stale canned-VLM markers ('ONE realistic interruption' ->
'compact interjection', 'Update the memory' -> 'compressed semantic
memory') and drop the dead 'concise hierarchical PLAN' plan responders
(plan generation is deterministic now) in run_e2e_smoke,
test_pipeline_recipe_render, test_modules.
* run_e2e_smoke now asserts interjection + speech rows are produced so a
stale marker can't silently pass again.
* drop remaining 'PR 1' / 'PR 2' references from test comments / names.
Verified: tests/annotations + tests/datasets/test_language +
tests/scripts/test_lerobot_annotate (31 passed); make-style E2E smoke
(interjections=1 speech_atoms=2); pre-commit (ruff, mypy, bandit,
prettier) clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 18:30:46 +02:00
|
|
|
|
[`run_hf_job.py`](https://github.com/huggingface/lerobot/blob/main/examples/annotations/run_hf_job.py)
|
2026-06-04 11:48:59 +02:00
|
|
|
|
for the production settings (single camera, embedded frames, windowed
|
annotate: address review feedback — bug fixes, docs/code drift, naming, cleanup
Bugs
* validator: don't re-raise on unknown style. The second column_for_style
lookup (used to route persistent vs event) now sits in try/except so an
unknown style is recorded by _check_column_routing and skipped instead
of crashing the whole validation pass.
* general_vqa._target_cameras: when restrict_to_default_camera is set but
the configured camera_key isn't one the provider exposes, warn and fall
back to all cameras instead of returning a phantom key that KeyErrors
deep in frame decode.
* interjections: clamp interjection timestamps to frame_timestamps[0]
rather than a hardcoded 0.0 (datasets can start at non-zero t).
Docs / code drift
* annotation_pipeline.mdx: drop the phantom 'vocabulary discovery / phase
0 / --vocabulary.* / canonical_vocabulary.json' section (none of it
exists); describe the real describe->segment + coverage-stitch flow.
Soften the src/lerobot/tools/ + TOOL_REGISTRY reference to 'not part of
this PR' (matches tools.mdx, which already marks the runtime layer as
not-yet-implemented). Fix the --push_to_hub/--new_repo_id wording. Note
the default is now a single h200. Add a 'Contributing new modules'
section inviting module / prompt / quality contributions.
* executor docstring: six phases, no phantom phase 0.
run_hf_job.py
* add the Apache 2.0 license header (was flagged repeatedly).
* default to a single GPU: flavor=h200, parallel_servers=1, num_gpus=1
(scale to h200x4 noted in the docstring).
* pin the install to @main instead of the feature branch (won't break
after merge).
Naming / cleanup
* rename dest_repo_id -> new_repo_id across config / script / example /
test to match the LeRobot dataset edit tools.
* rename prompt templates module_N_*.txt -> descriptive (plan_*,
interjections_*, vqa.txt) and update every load_prompt() call.
* remove dead _messages_to_prompt (used only by the removed in-process
backends).
* declare _warned_decode_fail (frames) and _warned_no_camera (vqa) as
real init=False dataclass fields instead of getattr monkey-patches.
* scope bandit B607 to the two ffmpeg subprocess.run sites via
'# nosec B607' and drop it from the global skip list.
Tests
* fix stale canned-VLM markers ('ONE realistic interruption' ->
'compact interjection', 'Update the memory' -> 'compressed semantic
memory') and drop the dead 'concise hierarchical PLAN' plan responders
(plan generation is deterministic now) in run_e2e_smoke,
test_pipeline_recipe_render, test_modules.
* run_e2e_smoke now asserts interjection + speech rows are produced so a
stale marker can't silently pass again.
* drop remaining 'PR 1' / 'PR 2' references from test comments / names.
Verified: tests/annotations + tests/datasets/test_language +
tests/scripts/test_lerobot_annotate (31 passed); make-style E2E smoke
(interjections=1 speech_atoms=2); pre-commit (ruff, mypy, bandit,
prettier) clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 18:30:46 +02:00
|
|
|
|
subtask generation).
|
feat(annotate): phase 0 — derive canonical vocabulary from sample episodes
The pipeline previously emitted near-unique subtask + memory phrasings
per episode (free-form LLM rephrasing). On the downstream low-level
policy that collapses the action expert's conditioning to noise: every
episode pairs a different paraphrase with similar motions, so the
expert learns a flat scene-prior that ignores the subtask string —
then at inference the high-level head invents *yet another* paraphrase
and the expert produces tiny "uncertain hover" chunks.
Add a vocabulary-discovery phase (phase 0) that runs once per dataset:
- watches the first ``vocabulary.sample_episodes`` (default 3)
episode videos as one Qwen-VL prompt,
- asks the VLM to derive ~``n_subtask_target`` canonical imperative
subtask labels and ~``n_memory_target`` first-person past-tense
memory milestones that recur across the demos,
- persists them to ``meta/canonical_vocabulary.json`` (human-
inspectable, hand-editable), and
- wires the resulting ``Vocabulary`` into the ``plan`` module so
every per-episode subtask + memory call is constrained to those
exact strings (both as prompt-side instructions *and* post-VLM
validation: paraphrases snap to the closest canonical entry via
token-set overlap; below a 0.5 Jaccard floor the subtask is
dropped rather than warped into something semantically wrong).
Operator workflow:
- first run discovers the vocabulary, writes the JSON, and runs
the ``plan`` module against it,
- subsequent runs reuse the on-disk file (``reuse_existing=True``
default) so hand-edits stick,
- set ``--vocabulary.enabled=False`` to fall back to free-form
generation (the original behaviour).
The discovery prompt forbids gerunds / third-person / adverbs and
caps the lists to the requested counts, matching the Hi-Robot /
π0.6-MEM convention of small per-environment vocabularies. The
``plan`` module's subtask + memory prompts grow a conditional
``{vocabulary_block}`` slot rendered only when a vocabulary is
present; without one the templates collapse to their previous
free-form form.
Tests: 11 new unit tests under tests/annotations/test_vocabulary.py
cover the on-disk round-trip, discovery against the fixture dataset,
``reuse_existing`` short-circuit, paraphrase canonicalisation, off-
vocab subtask dropping, and the no-vocabulary pass-through path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-22 11:40:05 +00:00
|
|
|
|
|
2026-06-04 11:48:59 +02:00
|
|
|
|
### Tools
|
|
|
|
|
|
|
|
|
|
|
|
The writer does **not** add a `tools` column to the parquet. The tool
|
|
|
|
|
|
catalog lives in `meta/info.json["tools"]` instead (see [Tools](./tools)).
|
|
|
|
|
|
After every run, the pipeline makes sure the canonical `say` schema is in
|
|
|
|
|
|
that list, keeping any tools you declared beforehand.
|
2026-04-30 18:51:38 +02:00
|
|
|
|
|
2026-06-04 11:48:59 +02:00
|
|
|
|
Want to add your own tool? Edit `meta/info.json["tools"]` directly — the
|
|
|
|
|
|
pipeline preserves whatever is already there. That makes the tool visible
|
|
|
|
|
|
to the chat template, so the model can learn to _generate_ the call. The
|
|
|
|
|
|
runtime layer that actually _executes_ a generated call (the `Tool`
|
|
|
|
|
|
protocol / `TOOL_REGISTRY` under `src/lerobot/tools/`) is not part of
|
|
|
|
|
|
this PR — the [Tools](./tools) doc marks those pieces as
|
|
|
|
|
|
not-yet-implemented.
|
feat: language annotation pipeline (PR 2/3)
Adds the steerable annotation pipeline (`lerobot-annotate`) that populates
the `language_persistent` and `language_events` columns introduced in
PR 1 directly into `data/chunk-*/file-*.parquet`. No flavor namespace,
no sidecar tree.
Modules produced:
- Module 1 (plan_subtasks_memory): Pi0.7-style subtasks, plan (init +
refresh on interjection), MEM-style memory at subtask boundaries.
- Module 2 (interjections_and_speech): t=0 speech-only acknowledgement,
mid-episode paired interjection + speech tool-call atom.
- Module 3 (general_vqa): bbox/keypoint/count/attribute/spatial pairs at
configurable cadence with one-retry JSON validation.
Writer enforces: per-episode persistent identity, exact-frame event
timestamps, column routing per `column_for_style`, dataset-level `tools`
column with the `say` schema, drops legacy `subtask_index`. Validator
runs against staged JSONL artifacts before the writer rewrites parquet.
Adds `lerobot-annotate` console script, `annotations` extra (datatrove +
optional vllm), `make annotation-e2e` opt-in smoke target, and
`docs/source/annotation_pipeline.mdx`.
Branched from PR 1 (`feat/language-columns`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:22:51 +02:00
|
|
|
|
|
refactor(annotate): delegate distribution to HF Jobs; drop SLURM/local switch
The executor previously claimed it would "optionally hand off" to
datatrove's LocalPipelineExecutor or SlurmPipelineExecutor — but it
already runs phases inline in every code path, and HF Jobs (see
``examples/annotation/run_hf_job.py``) is the actual distribution
strategy. Stop pretending we have an executor selector.
* `executor.py`: drop `select_executor_class`, the "kind" log line, and
the references to LocalPipelineExecutor / SlurmPipelineExecutor.
Module docstring now says distribution is delegated to HF Jobs.
* `config.py`: drop `auto_threshold`, `force_local`, `slurm_partition`,
`slurm_gpus`, `slurm_time`, `workers`. `ExecutorConfig` keeps only
`episode_parallelism`. While here, prune the longer "why" docstrings
on every field down to the load-bearing bits — full story moves to
`docs/source/annotation_pipeline.mdx`.
* `pyproject.toml`: drop `datatrove>=0.4.0,<2.0.0` from the
`[annotations]` extra; the dep was only there for the (never used)
cluster executors. Comment block notes the new HF-Jobs delegation.
* `reader.py`, `lerobot_annotate.py`: drop their own datatrove /
flavor-namespace mentions.
* `docs/source/annotation_pipeline.mdx`:
- remove the flavor-namespace / sidecar paragraph (out of scope —
"multiple revisions = multiple copies" is dataset-level policy);
- remove the "writer drops the legacy `subtask_index` column" note
(already covered by PR 1's intentional-break call-out);
- remove the chat-template + `apply_chat_template(messages, tools=...)`
line (covered by Tools doc);
- replace the "executor picks Local vs Slurm" paragraph with
`--executor.episode_parallelism` and a pointer to HF Jobs;
- rewrite the style→recipe section to talk about "recipes" generically
instead of pinning a specific YAML;
- add a "Running on Hugging Face Jobs" section pointing at
`examples/annotation/run_hf_job.py`;
- add a "Running locally" example matching the CLI's docstring
(`uv run lerobot-annotate --root=... --vlm.model_id=...`);
- extend the paper-inspirations list with Pi0.7 and Steerable VLA
Policies (Zhao 2025) for Module 3.
Tests: same 3 pre-existing failures as before this commit (2 module
assertions still in flight; 1 carryover from PR 1). 41/44 pass.
Pre-commit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:09:22 +02:00
|
|
|
|
## Running on Hugging Face Jobs
|
|
|
|
|
|
|
2026-06-04 11:48:59 +02:00
|
|
|
|
Annotation runs on [Hugging Face Jobs](https://huggingface.co/docs/hub/en/jobs).
|
|
|
|
|
|
The repo ships a launcher script you copy and tweak for your dataset:
|
refactor(annotate): delegate distribution to HF Jobs; drop SLURM/local switch
The executor previously claimed it would "optionally hand off" to
datatrove's LocalPipelineExecutor or SlurmPipelineExecutor — but it
already runs phases inline in every code path, and HF Jobs (see
``examples/annotation/run_hf_job.py``) is the actual distribution
strategy. Stop pretending we have an executor selector.
* `executor.py`: drop `select_executor_class`, the "kind" log line, and
the references to LocalPipelineExecutor / SlurmPipelineExecutor.
Module docstring now says distribution is delegated to HF Jobs.
* `config.py`: drop `auto_threshold`, `force_local`, `slurm_partition`,
`slurm_gpus`, `slurm_time`, `workers`. `ExecutorConfig` keeps only
`episode_parallelism`. While here, prune the longer "why" docstrings
on every field down to the load-bearing bits — full story moves to
`docs/source/annotation_pipeline.mdx`.
* `pyproject.toml`: drop `datatrove>=0.4.0,<2.0.0` from the
`[annotations]` extra; the dep was only there for the (never used)
cluster executors. Comment block notes the new HF-Jobs delegation.
* `reader.py`, `lerobot_annotate.py`: drop their own datatrove /
flavor-namespace mentions.
* `docs/source/annotation_pipeline.mdx`:
- remove the flavor-namespace / sidecar paragraph (out of scope —
"multiple revisions = multiple copies" is dataset-level policy);
- remove the "writer drops the legacy `subtask_index` column" note
(already covered by PR 1's intentional-break call-out);
- remove the chat-template + `apply_chat_template(messages, tools=...)`
line (covered by Tools doc);
- replace the "executor picks Local vs Slurm" paragraph with
`--executor.episode_parallelism` and a pointer to HF Jobs;
- rewrite the style→recipe section to talk about "recipes" generically
instead of pinning a specific YAML;
- add a "Running on Hugging Face Jobs" section pointing at
`examples/annotation/run_hf_job.py`;
- add a "Running locally" example matching the CLI's docstring
(`uv run lerobot-annotate --root=... --vlm.model_id=...`);
- extend the paper-inspirations list with Pi0.7 and Steerable VLA
Policies (Zhao 2025) for Module 3.
Tests: same 3 pre-existing failures as before this commit (2 module
assertions still in flight; 1 carryover from PR 1). 41/44 pass.
Pre-commit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:09:22 +02:00
|
|
|
|
|
|
|
|
|
|
```bash
|
2026-05-18 12:03:25 +02:00
|
|
|
|
HF_TOKEN=hf_... uv run python examples/annotations/run_hf_job.py
|
refactor(annotate): delegate distribution to HF Jobs; drop SLURM/local switch
The executor previously claimed it would "optionally hand off" to
datatrove's LocalPipelineExecutor or SlurmPipelineExecutor — but it
already runs phases inline in every code path, and HF Jobs (see
``examples/annotation/run_hf_job.py``) is the actual distribution
strategy. Stop pretending we have an executor selector.
* `executor.py`: drop `select_executor_class`, the "kind" log line, and
the references to LocalPipelineExecutor / SlurmPipelineExecutor.
Module docstring now says distribution is delegated to HF Jobs.
* `config.py`: drop `auto_threshold`, `force_local`, `slurm_partition`,
`slurm_gpus`, `slurm_time`, `workers`. `ExecutorConfig` keeps only
`episode_parallelism`. While here, prune the longer "why" docstrings
on every field down to the load-bearing bits — full story moves to
`docs/source/annotation_pipeline.mdx`.
* `pyproject.toml`: drop `datatrove>=0.4.0,<2.0.0` from the
`[annotations]` extra; the dep was only there for the (never used)
cluster executors. Comment block notes the new HF-Jobs delegation.
* `reader.py`, `lerobot_annotate.py`: drop their own datatrove /
flavor-namespace mentions.
* `docs/source/annotation_pipeline.mdx`:
- remove the flavor-namespace / sidecar paragraph (out of scope —
"multiple revisions = multiple copies" is dataset-level policy);
- remove the "writer drops the legacy `subtask_index` column" note
(already covered by PR 1's intentional-break call-out);
- remove the chat-template + `apply_chat_template(messages, tools=...)`
line (covered by Tools doc);
- replace the "executor picks Local vs Slurm" paragraph with
`--executor.episode_parallelism` and a pointer to HF Jobs;
- rewrite the style→recipe section to talk about "recipes" generically
instead of pinning a specific YAML;
- add a "Running on Hugging Face Jobs" section pointing at
`examples/annotation/run_hf_job.py`;
- add a "Running locally" example matching the CLI's docstring
(`uv run lerobot-annotate --root=... --vlm.model_id=...`);
- extend the paper-inspirations list with Pi0.7 and Steerable VLA
Policies (Zhao 2025) for Module 3.
Tests: same 3 pre-existing failures as before this commit (2 module
assertions still in flight; 1 carryover from PR 1). 41/44 pass.
Pre-commit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:09:22 +02:00
|
|
|
|
```
|
|
|
|
|
|
|
2026-06-04 11:48:59 +02:00
|
|
|
|
[`run_hf_job.py`](https://github.com/huggingface/lerobot/blob/main/examples/annotations/run_hf_job.py)
|
|
|
|
|
|
starts a single-GPU `h200` job (bump it to `h200x4` for big datasets)
|
|
|
|
|
|
that:
|
refactor(annotate): delegate distribution to HF Jobs; drop SLURM/local switch
The executor previously claimed it would "optionally hand off" to
datatrove's LocalPipelineExecutor or SlurmPipelineExecutor — but it
already runs phases inline in every code path, and HF Jobs (see
``examples/annotation/run_hf_job.py``) is the actual distribution
strategy. Stop pretending we have an executor selector.
* `executor.py`: drop `select_executor_class`, the "kind" log line, and
the references to LocalPipelineExecutor / SlurmPipelineExecutor.
Module docstring now says distribution is delegated to HF Jobs.
* `config.py`: drop `auto_threshold`, `force_local`, `slurm_partition`,
`slurm_gpus`, `slurm_time`, `workers`. `ExecutorConfig` keeps only
`episode_parallelism`. While here, prune the longer "why" docstrings
on every field down to the load-bearing bits — full story moves to
`docs/source/annotation_pipeline.mdx`.
* `pyproject.toml`: drop `datatrove>=0.4.0,<2.0.0` from the
`[annotations]` extra; the dep was only there for the (never used)
cluster executors. Comment block notes the new HF-Jobs delegation.
* `reader.py`, `lerobot_annotate.py`: drop their own datatrove /
flavor-namespace mentions.
* `docs/source/annotation_pipeline.mdx`:
- remove the flavor-namespace / sidecar paragraph (out of scope —
"multiple revisions = multiple copies" is dataset-level policy);
- remove the "writer drops the legacy `subtask_index` column" note
(already covered by PR 1's intentional-break call-out);
- remove the chat-template + `apply_chat_template(messages, tools=...)`
line (covered by Tools doc);
- replace the "executor picks Local vs Slurm" paragraph with
`--executor.episode_parallelism` and a pointer to HF Jobs;
- rewrite the style→recipe section to talk about "recipes" generically
instead of pinning a specific YAML;
- add a "Running on Hugging Face Jobs" section pointing at
`examples/annotation/run_hf_job.py`;
- add a "Running locally" example matching the CLI's docstring
(`uv run lerobot-annotate --root=... --vlm.model_id=...`);
- extend the paper-inspirations list with Pi0.7 and Steerable VLA
Policies (Zhao 2025) for Module 3.
Tests: same 3 pre-existing failures as before this commit (2 module
assertions still in flight; 1 carryover from PR 1). 41/44 pass.
Pre-commit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:09:22 +02:00
|
|
|
|
|
2026-06-04 11:45:38 +02:00
|
|
|
|
1. installs `lerobot` (from `main`) plus the annotation extras,
|
2026-06-04 11:48:59 +02:00
|
|
|
|
2. boots one vLLM server per GPU (using the `vllm/vllm-openai` image) and
|
|
|
|
|
|
drives it over the OpenAI-compatible API,
|
2026-05-18 12:03:25 +02:00
|
|
|
|
3. runs the `plan` / `interjections` / `vqa` modules across the dataset
|
2026-06-04 11:48:59 +02:00
|
|
|
|
with `lerobot-annotate`,
|
|
|
|
|
|
4. with `--push_to_hub=true`, uploads the result to `--new_repo_id` (or
|
|
|
|
|
|
back to `--repo_id` in place if you leave that unset).
|
refactor(annotate): delegate distribution to HF Jobs; drop SLURM/local switch
The executor previously claimed it would "optionally hand off" to
datatrove's LocalPipelineExecutor or SlurmPipelineExecutor — but it
already runs phases inline in every code path, and HF Jobs (see
``examples/annotation/run_hf_job.py``) is the actual distribution
strategy. Stop pretending we have an executor selector.
* `executor.py`: drop `select_executor_class`, the "kind" log line, and
the references to LocalPipelineExecutor / SlurmPipelineExecutor.
Module docstring now says distribution is delegated to HF Jobs.
* `config.py`: drop `auto_threshold`, `force_local`, `slurm_partition`,
`slurm_gpus`, `slurm_time`, `workers`. `ExecutorConfig` keeps only
`episode_parallelism`. While here, prune the longer "why" docstrings
on every field down to the load-bearing bits — full story moves to
`docs/source/annotation_pipeline.mdx`.
* `pyproject.toml`: drop `datatrove>=0.4.0,<2.0.0` from the
`[annotations]` extra; the dep was only there for the (never used)
cluster executors. Comment block notes the new HF-Jobs delegation.
* `reader.py`, `lerobot_annotate.py`: drop their own datatrove /
flavor-namespace mentions.
* `docs/source/annotation_pipeline.mdx`:
- remove the flavor-namespace / sidecar paragraph (out of scope —
"multiple revisions = multiple copies" is dataset-level policy);
- remove the "writer drops the legacy `subtask_index` column" note
(already covered by PR 1's intentional-break call-out);
- remove the chat-template + `apply_chat_template(messages, tools=...)`
line (covered by Tools doc);
- replace the "executor picks Local vs Slurm" paragraph with
`--executor.episode_parallelism` and a pointer to HF Jobs;
- rewrite the style→recipe section to talk about "recipes" generically
instead of pinning a specific YAML;
- add a "Running on Hugging Face Jobs" section pointing at
`examples/annotation/run_hf_job.py`;
- add a "Running locally" example matching the CLI's docstring
(`uv run lerobot-annotate --root=... --vlm.model_id=...`);
- extend the paper-inspirations list with Pi0.7 and Steerable VLA
Policies (Zhao 2025) for Module 3.
Tests: same 3 pre-existing failures as before this commit (2 module
assertions still in flight; 1 carryover from PR 1). 41/44 pass.
Pre-commit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:09:22 +02:00
|
|
|
|
|
2026-06-04 11:48:59 +02:00
|
|
|
|
To use a different dataset, model, or hub repo, edit the `CMD` block in
|
|
|
|
|
|
the script. Every flag there maps directly to a `lerobot-annotate` flag
|
|
|
|
|
|
(run `lerobot-annotate --help` for the full list).
|
feat: language annotation pipeline (PR 2/3)
Adds the steerable annotation pipeline (`lerobot-annotate`) that populates
the `language_persistent` and `language_events` columns introduced in
PR 1 directly into `data/chunk-*/file-*.parquet`. No flavor namespace,
no sidecar tree.
Modules produced:
- Module 1 (plan_subtasks_memory): Pi0.7-style subtasks, plan (init +
refresh on interjection), MEM-style memory at subtask boundaries.
- Module 2 (interjections_and_speech): t=0 speech-only acknowledgement,
mid-episode paired interjection + speech tool-call atom.
- Module 3 (general_vqa): bbox/keypoint/count/attribute/spatial pairs at
configurable cadence with one-retry JSON validation.
Writer enforces: per-episode persistent identity, exact-frame event
timestamps, column routing per `column_for_style`, dataset-level `tools`
column with the `say` schema, drops legacy `subtask_index`. Validator
runs against staged JSONL artifacts before the writer rewrites parquet.
Adds `lerobot-annotate` console script, `annotations` extra (datatrove +
optional vllm), `make annotation-e2e` opt-in smoke target, and
`docs/source/annotation_pipeline.mdx`.
Branched from PR 1 (`feat/language-columns`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:22:51 +02:00
|
|
|
|
|
annotate: address review feedback — bug fixes, docs/code drift, naming, cleanup
Bugs
* validator: don't re-raise on unknown style. The second column_for_style
lookup (used to route persistent vs event) now sits in try/except so an
unknown style is recorded by _check_column_routing and skipped instead
of crashing the whole validation pass.
* general_vqa._target_cameras: when restrict_to_default_camera is set but
the configured camera_key isn't one the provider exposes, warn and fall
back to all cameras instead of returning a phantom key that KeyErrors
deep in frame decode.
* interjections: clamp interjection timestamps to frame_timestamps[0]
rather than a hardcoded 0.0 (datasets can start at non-zero t).
Docs / code drift
* annotation_pipeline.mdx: drop the phantom 'vocabulary discovery / phase
0 / --vocabulary.* / canonical_vocabulary.json' section (none of it
exists); describe the real describe->segment + coverage-stitch flow.
Soften the src/lerobot/tools/ + TOOL_REGISTRY reference to 'not part of
this PR' (matches tools.mdx, which already marks the runtime layer as
not-yet-implemented). Fix the --push_to_hub/--new_repo_id wording. Note
the default is now a single h200. Add a 'Contributing new modules'
section inviting module / prompt / quality contributions.
* executor docstring: six phases, no phantom phase 0.
run_hf_job.py
* add the Apache 2.0 license header (was flagged repeatedly).
* default to a single GPU: flavor=h200, parallel_servers=1, num_gpus=1
(scale to h200x4 noted in the docstring).
* pin the install to @main instead of the feature branch (won't break
after merge).
Naming / cleanup
* rename dest_repo_id -> new_repo_id across config / script / example /
test to match the LeRobot dataset edit tools.
* rename prompt templates module_N_*.txt -> descriptive (plan_*,
interjections_*, vqa.txt) and update every load_prompt() call.
* remove dead _messages_to_prompt (used only by the removed in-process
backends).
* declare _warned_decode_fail (frames) and _warned_no_camera (vqa) as
real init=False dataclass fields instead of getattr monkey-patches.
* scope bandit B607 to the two ffmpeg subprocess.run sites via
'# nosec B607' and drop it from the global skip list.
Tests
* fix stale canned-VLM markers ('ONE realistic interruption' ->
'compact interjection', 'Update the memory' -> 'compressed semantic
memory') and drop the dead 'concise hierarchical PLAN' plan responders
(plan generation is deterministic now) in run_e2e_smoke,
test_pipeline_recipe_render, test_modules.
* run_e2e_smoke now asserts interjection + speech rows are produced so a
stale marker can't silently pass again.
* drop remaining 'PR 1' / 'PR 2' references from test comments / names.
Verified: tests/annotations + tests/datasets/test_language +
tests/scripts/test_lerobot_annotate (31 passed); make-style E2E smoke
(interjections=1 speech_atoms=2); pre-commit (ruff, mypy, bandit,
prettier) clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 18:30:46 +02:00
|
|
|
|
## Contributing new modules
|
|
|
|
|
|
|
2026-06-04 11:48:59 +02:00
|
|
|
|
The pipeline is built to grow, and **contributions are very welcome** —
|
|
|
|
|
|
a brand-new module (say, trajectory traces or affordances), a new prompt
|
|
|
|
|
|
template, a smarter grounding flow, or quality fixes to the existing
|
|
|
|
|
|
`plan` / `interjections` / `vqa` modules.
|
|
|
|
|
|
|
|
|
|
|
|
Every module lives under
|
annotate: address review feedback — bug fixes, docs/code drift, naming, cleanup
Bugs
* validator: don't re-raise on unknown style. The second column_for_style
lookup (used to route persistent vs event) now sits in try/except so an
unknown style is recorded by _check_column_routing and skipped instead
of crashing the whole validation pass.
* general_vqa._target_cameras: when restrict_to_default_camera is set but
the configured camera_key isn't one the provider exposes, warn and fall
back to all cameras instead of returning a phantom key that KeyErrors
deep in frame decode.
* interjections: clamp interjection timestamps to frame_timestamps[0]
rather than a hardcoded 0.0 (datasets can start at non-zero t).
Docs / code drift
* annotation_pipeline.mdx: drop the phantom 'vocabulary discovery / phase
0 / --vocabulary.* / canonical_vocabulary.json' section (none of it
exists); describe the real describe->segment + coverage-stitch flow.
Soften the src/lerobot/tools/ + TOOL_REGISTRY reference to 'not part of
this PR' (matches tools.mdx, which already marks the runtime layer as
not-yet-implemented). Fix the --push_to_hub/--new_repo_id wording. Note
the default is now a single h200. Add a 'Contributing new modules'
section inviting module / prompt / quality contributions.
* executor docstring: six phases, no phantom phase 0.
run_hf_job.py
* add the Apache 2.0 license header (was flagged repeatedly).
* default to a single GPU: flavor=h200, parallel_servers=1, num_gpus=1
(scale to h200x4 noted in the docstring).
* pin the install to @main instead of the feature branch (won't break
after merge).
Naming / cleanup
* rename dest_repo_id -> new_repo_id across config / script / example /
test to match the LeRobot dataset edit tools.
* rename prompt templates module_N_*.txt -> descriptive (plan_*,
interjections_*, vqa.txt) and update every load_prompt() call.
* remove dead _messages_to_prompt (used only by the removed in-process
backends).
* declare _warned_decode_fail (frames) and _warned_no_camera (vqa) as
real init=False dataclass fields instead of getattr monkey-patches.
* scope bandit B607 to the two ffmpeg subprocess.run sites via
'# nosec B607' and drop it from the global skip list.
Tests
* fix stale canned-VLM markers ('ONE realistic interruption' ->
'compact interjection', 'Update the memory' -> 'compressed semantic
memory') and drop the dead 'concise hierarchical PLAN' plan responders
(plan generation is deterministic now) in run_e2e_smoke,
test_pipeline_recipe_render, test_modules.
* run_e2e_smoke now asserts interjection + speech rows are produced so a
stale marker can't silently pass again.
* drop remaining 'PR 1' / 'PR 2' references from test comments / names.
Verified: tests/annotations + tests/datasets/test_language +
tests/scripts/test_lerobot_annotate (31 passed); make-style E2E smoke
(interjections=1 speech_atoms=2); pre-commit (ruff, mypy, bandit,
prettier) clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 18:30:46 +02:00
|
|
|
|
`src/lerobot/annotations/steerable_pipeline/modules/`, shares the VLM
|
2026-06-04 11:48:59 +02:00
|
|
|
|
client and the keyframe cache, writes its raw output to the staging
|
|
|
|
|
|
tree, and plugs into the executor as its own phase. Got an idea? Open an
|
|
|
|
|
|
issue or PR on [the repo](https://github.com/huggingface/lerobot).
|
|
|
|
|
|
|
|
|
|
|
|
## How recipes consume the output
|
|
|
|
|
|
|
|
|
|
|
|
The annotations are meant to be read by recipes (see
|
|
|
|
|
|
[Language Columns and Recipes](./language_and_recipes)). Typically:
|
|
|
|
|
|
|
|
|
|
|
|
- low-level / high-level / memory-update branches read
|
|
|
|
|
|
`subtask` / `plan` / `memory` from `language_persistent`.
|
|
|
|
|
|
- an interjection-response branch reads `interjection` events plus the
|
|
|
|
|
|
paired speech atom (merged into one assistant turn via `tool_calls_from`)
|
|
|
|
|
|
and the matching `plan` refresh at the same timestamp.
|
|
|
|
|
|
- a VQA branch reads the `(vqa, user)` and `(vqa, assistant)` pairs from
|
|
|
|
|
|
`language_events`.
|
|
|
|
|
|
|
|
|
|
|
|
## Why state and events are split
|
|
|
|
|
|
|
|
|
|
|
|
Two ideas shape the design:
|
|
|
|
|
|
|
|
|
|
|
|
1. **Persistent state vs. exact events.** Persistent rows (`subtask`,
|
|
|
|
|
|
`plan`, `memory`) apply to the whole episode and answer "what's true
|
|
|
|
|
|
right now?". Event rows (`interjection`, `vqa`, speech) appear only on
|
|
|
|
|
|
the one frame whose timestamp matches. Timestamps are copied straight
|
|
|
|
|
|
from the source parquet — never recomputed in floating point.
|
|
|
|
|
|
2. **One VLM pass.** All three modules share a single VLM client (the
|
|
|
|
|
|
OpenAI-compatible client talking to the job's vLLM server), so you pay
|
|
|
|
|
|
for one model load per dataset, not three.
|
|
|
|
|
|
|
|
|
|
|
|
## Re-running a single module
|
|
|
|
|
|
|
|
|
|
|
|
Each module stages its raw output to
|
|
|
|
|
|
`<root>/.annotate_staging/episode_{N:06d}/<module>.jsonl`. This makes
|
|
|
|
|
|
prompt iteration cheap: re-running one module overwrites only its own
|
|
|
|
|
|
JSONL, then the writer recomposes the final parquet. Disable modules you
|
|
|
|
|
|
don't want with `--plan.enabled=false` (and likewise
|
|
|
|
|
|
`--interjections.enabled` / `--vqa.enabled`) to test one at a time.
|
|
|
|
|
|
|
|
|
|
|
|
## What the validator checks
|
|
|
|
|
|
|
|
|
|
|
|
Before the writer runs, `StagingValidator` confirms:
|
|
|
|
|
|
|
|
|
|
|
|
- every event row lands exactly on a real frame timestamp;
|
|
|
|
|
|
- no speech / interjection pairs are left orphaned;
|
feat: language annotation pipeline (PR 2/3)
Adds the steerable annotation pipeline (`lerobot-annotate`) that populates
the `language_persistent` and `language_events` columns introduced in
PR 1 directly into `data/chunk-*/file-*.parquet`. No flavor namespace,
no sidecar tree.
Modules produced:
- Module 1 (plan_subtasks_memory): Pi0.7-style subtasks, plan (init +
refresh on interjection), MEM-style memory at subtask boundaries.
- Module 2 (interjections_and_speech): t=0 speech-only acknowledgement,
mid-episode paired interjection + speech tool-call atom.
- Module 3 (general_vqa): bbox/keypoint/count/attribute/spatial pairs at
configurable cadence with one-retry JSON validation.
Writer enforces: per-episode persistent identity, exact-frame event
timestamps, column routing per `column_for_style`, dataset-level `tools`
column with the `say` schema, drops legacy `subtask_index`. Validator
runs against staged JSONL artifacts before the writer rewrites parquet.
Adds `lerobot-annotate` console script, `annotations` extra (datatrove +
optional vllm), `make annotation-e2e` opt-in smoke target, and
`docs/source/annotation_pipeline.mdx`.
Branched from PR 1 (`feat/language-columns`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:22:51 +02:00
|
|
|
|
- `plan` is refreshed at every interjection timestamp;
|
2026-06-04 11:48:59 +02:00
|
|
|
|
- `memory` rows fall on subtask boundaries (a warning, not an error);
|
|
|
|
|
|
- each VQA assistant `content` is valid JSON in one of the
|
feat: language annotation pipeline (PR 2/3)
Adds the steerable annotation pipeline (`lerobot-annotate`) that populates
the `language_persistent` and `language_events` columns introduced in
PR 1 directly into `data/chunk-*/file-*.parquet`. No flavor namespace,
no sidecar tree.
Modules produced:
- Module 1 (plan_subtasks_memory): Pi0.7-style subtasks, plan (init +
refresh on interjection), MEM-style memory at subtask boundaries.
- Module 2 (interjections_and_speech): t=0 speech-only acknowledgement,
mid-episode paired interjection + speech tool-call atom.
- Module 3 (general_vqa): bbox/keypoint/count/attribute/spatial pairs at
configurable cadence with one-retry JSON validation.
Writer enforces: per-episode persistent identity, exact-frame event
timestamps, column routing per `column_for_style`, dataset-level `tools`
column with the `say` schema, drops legacy `subtask_index`. Validator
runs against staged JSONL artifacts before the writer rewrites parquet.
Adds `lerobot-annotate` console script, `annotations` extra (datatrove +
optional vllm), `make annotation-e2e` opt-in smoke target, and
`docs/source/annotation_pipeline.mdx`.
Branched from PR 1 (`feat/language-columns`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:22:51 +02:00
|
|
|
|
bbox / keypoint / count / attribute / spatial shapes;
|
2026-06-04 11:48:59 +02:00
|
|
|
|
- every row goes to the column chosen by `column_for_style(style)`.
|
feat: language annotation pipeline (PR 2/3)
Adds the steerable annotation pipeline (`lerobot-annotate`) that populates
the `language_persistent` and `language_events` columns introduced in
PR 1 directly into `data/chunk-*/file-*.parquet`. No flavor namespace,
no sidecar tree.
Modules produced:
- Module 1 (plan_subtasks_memory): Pi0.7-style subtasks, plan (init +
refresh on interjection), MEM-style memory at subtask boundaries.
- Module 2 (interjections_and_speech): t=0 speech-only acknowledgement,
mid-episode paired interjection + speech tool-call atom.
- Module 3 (general_vqa): bbox/keypoint/count/attribute/spatial pairs at
configurable cadence with one-retry JSON validation.
Writer enforces: per-episode persistent identity, exact-frame event
timestamps, column routing per `column_for_style`, dataset-level `tools`
column with the `say` schema, drops legacy `subtask_index`. Validator
runs against staged JSONL artifacts before the writer rewrites parquet.
Adds `lerobot-annotate` console script, `annotations` extra (datatrove +
optional vllm), `make annotation-e2e` opt-in smoke target, and
`docs/source/annotation_pipeline.mdx`.
Branched from PR 1 (`feat/language-columns`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:22:51 +02:00
|
|
|
|
|
2026-06-04 11:48:59 +02:00
|
|
|
|
Any error aborts the writer. Pass `--skip_validation=true` to override
|
|
|
|
|
|
while debugging.
|
feat: language annotation pipeline (PR 2/3)
Adds the steerable annotation pipeline (`lerobot-annotate`) that populates
the `language_persistent` and `language_events` columns introduced in
PR 1 directly into `data/chunk-*/file-*.parquet`. No flavor namespace,
no sidecar tree.
Modules produced:
- Module 1 (plan_subtasks_memory): Pi0.7-style subtasks, plan (init +
refresh on interjection), MEM-style memory at subtask boundaries.
- Module 2 (interjections_and_speech): t=0 speech-only acknowledgement,
mid-episode paired interjection + speech tool-call atom.
- Module 3 (general_vqa): bbox/keypoint/count/attribute/spatial pairs at
configurable cadence with one-retry JSON validation.
Writer enforces: per-episode persistent identity, exact-frame event
timestamps, column routing per `column_for_style`, dataset-level `tools`
column with the `say` schema, drops legacy `subtask_index`. Validator
runs against staged JSONL artifacts before the writer rewrites parquet.
Adds `lerobot-annotate` console script, `annotations` extra (datatrove +
optional vllm), `make annotation-e2e` opt-in smoke target, and
`docs/source/annotation_pipeline.mdx`.
Branched from PR 1 (`feat/language-columns`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:22:51 +02:00
|
|
|
|
|
2026-06-04 11:48:59 +02:00
|
|
|
|
## Where each module's ideas come from
|
feat: language annotation pipeline (PR 2/3)
Adds the steerable annotation pipeline (`lerobot-annotate`) that populates
the `language_persistent` and `language_events` columns introduced in
PR 1 directly into `data/chunk-*/file-*.parquet`. No flavor namespace,
no sidecar tree.
Modules produced:
- Module 1 (plan_subtasks_memory): Pi0.7-style subtasks, plan (init +
refresh on interjection), MEM-style memory at subtask boundaries.
- Module 2 (interjections_and_speech): t=0 speech-only acknowledgement,
mid-episode paired interjection + speech tool-call atom.
- Module 3 (general_vqa): bbox/keypoint/count/attribute/spatial pairs at
configurable cadence with one-retry JSON validation.
Writer enforces: per-episode persistent identity, exact-frame event
timestamps, column routing per `column_for_style`, dataset-level `tools`
column with the `say` schema, drops legacy `subtask_index`. Validator
runs against staged JSONL artifacts before the writer rewrites parquet.
Adds `lerobot-annotate` console script, `annotations` extra (datatrove +
optional vllm), `make annotation-e2e` opt-in smoke target, and
`docs/source/annotation_pipeline.mdx`.
Branched from PR 1 (`feat/language-columns`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:22:51 +02:00
|
|
|
|
|
2026-06-04 11:48:59 +02:00
|
|
|
|
- **`plan` — subtasks.** Hi Robot ([Shi 2025](https://arxiv.org/abs/2502.19417))
|
|
|
|
|
|
for atom granularity ("pick up one piece of lettuce", "place bowl to
|
|
|
|
|
|
box"); Pi0.7 ([Physical Intelligence 2025](https://pi.website/pi07))
|
|
|
|
|
|
for "how, not what" detail.
|
|
|
|
|
|
- **`plan` — memory.** MEM ([Torne 2026](https://arxiv.org/abs/2603.03596)):
|
|
|
|
|
|
keep only the minimal relevant information — preserve outcomes, drop
|
|
|
|
|
|
specific attributes.
|
|
|
|
|
|
- **`interjections`.** Hi Robot's scenario taxonomy: negative task,
|
feat: language annotation pipeline (PR 2/3)
Adds the steerable annotation pipeline (`lerobot-annotate`) that populates
the `language_persistent` and `language_events` columns introduced in
PR 1 directly into `data/chunk-*/file-*.parquet`. No flavor namespace,
no sidecar tree.
Modules produced:
- Module 1 (plan_subtasks_memory): Pi0.7-style subtasks, plan (init +
refresh on interjection), MEM-style memory at subtask boundaries.
- Module 2 (interjections_and_speech): t=0 speech-only acknowledgement,
mid-episode paired interjection + speech tool-call atom.
- Module 3 (general_vqa): bbox/keypoint/count/attribute/spatial pairs at
configurable cadence with one-retry JSON validation.
Writer enforces: per-episode persistent identity, exact-frame event
timestamps, column routing per `column_for_style`, dataset-level `tools`
column with the `say` schema, drops legacy `subtask_index`. Validator
runs against staged JSONL artifacts before the writer rewrites parquet.
Adds `lerobot-annotate` console script, `annotations` extra (datatrove +
optional vllm), `make annotation-e2e` opt-in smoke target, and
`docs/source/annotation_pipeline.mdx`.
Branched from PR 1 (`feat/language-columns`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:22:51 +02:00
|
|
|
|
situated correction, specific constraint, preference. Speech is a
|
2026-06-04 11:48:59 +02:00
|
|
|
|
tool-call-only atom
|
|
|
|
|
|
(`tool_calls=[{type:function, function:{name:"say", arguments:{text:...}}}]`).
|
|
|
|
|
|
- **`vqa`.** ECoT ([Zawalski 2024](https://arxiv.org/abs/2407.08693)) for
|
|
|
|
|
|
grounded features (pixel bounding boxes `[x_min, y_min, x_max, y_max]`,
|
|
|
|
|
|
keypoints) and Steerable VLA Policies
|
|
|
|
|
|
([Zhao 2025](https://arxiv.org/abs/2509.07626)) for multi-abstraction
|
|
|
|
|
|
grounding. Pi0.7 also grounds answers across abstraction levels.
|
|
|
|
|
|
|
|
|
|
|
|
When improving a module, tweak its prompt template in
|
|
|
|
|
|
`src/lerobot/annotations/steerable_pipeline/prompts/` rather than
|
|
|
|
|
|
rewriting from scratch.
|
|
|
|
|
|
|
|
|
|
|
|
## Roughly how much it costs
|
|
|
|
|
|
|
|
|
|
|
|
Per episode, the pipeline makes about `max_steps` plan calls,
|
|
|
|
|
|
`max_interjections_per_episode` interjection calls, and
|
|
|
|
|
|
`vqa_emission_hz × episode_seconds` VQA calls. With the defaults (8
|
|
|
|
|
|
subtasks, 1 interjection, 1 Hz × 3 pairs) on a 30-second episode, that's
|
|
|
|
|
|
~50 VLM calls.
|
|
|
|
|
|
|
|
|
|
|
|
Storage stays small: `language_persistent` is at most tens of KB per
|
|
|
|
|
|
episode (parquet dictionary-encodes the one entry that repeats across
|
|
|
|
|
|
frames), and `language_events` is empty on most frames — its size scales
|
|
|
|
|
|
with the number of emissions, not `num_frames × num_emissions`.
|