mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-30 02:11:25 +00:00
annotate(plan): force composite-action subtasks; ban ultra-fine splits
Tighten ``module_1_subtasks.txt`` so the VLM emits one composite atomic action per subtask instead of decomposing every pick into ``move to X`` / ``grasp X`` / ``lift X``: - Lock the verb vocabulary to the composite set the low-level policy actually learns end-to-end: ``pick up`` (approach + grasp + lift), ``put``/``place`` (transport + release), ``push``, ``pull``, ``turn``, ``press``, ``open``, ``close``, ``pour``, ``insert``. ``go to`` is allowed only as a pure relocation between phases. - Add an explicit ``Forbidden ultra-fine splits`` block enumerating the patterns the VLM was tempted to emit (``move to X``, ``reach for X``, ``grasp X``, ``lift X``, ``release X``) and instructing it to fold each into its parent composite. - Rewrite the Good/Bad examples to match the composite contract; the previous ``"move to blue cube" / "grasp blue cube" / "lift blue cube"`` Good list was actively encouraging the over- segmentation pattern this prompt is supposed to prevent. - Tighten the duration rule: candidates shorter than ``min_subtask_seconds`` must be merged into a neighbour rather than emitted. Pairs with bumping the runtime floor to 3 s so composites have room to land. Pure prompt change — no code or schema change. Existing canonical- vocabulary retry path is unaffected (the new verb whitelist lives in prose, not in the validator). Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -8,14 +8,42 @@ the robot performs.
|
||||
|
||||
{vocabulary_block}Authoring rules — Hi Robot atom granularity, pi0.7-style short prompts:
|
||||
|
||||
- Each subtask = one atomic skill the low-level policy can execute.
|
||||
- Write each subtask as an IMPERATIVE COMMAND, starting with a verb:
|
||||
move, reach, pick up, grasp, place, put, push, pull, open, close,
|
||||
turn, press, lift, insert, pour...
|
||||
- Each subtask = one COMPOSITE atomic skill the low-level policy can
|
||||
execute end-to-end. A "skill" bundles its own approach motion with
|
||||
its terminal action — do NOT split the approach off as its own
|
||||
subtask. The whole-arm policy already learns to reach as part of
|
||||
every manipulation primitive.
|
||||
- Write each subtask as an IMPERATIVE COMMAND, starting with one of
|
||||
these verbs (extend only when none fits):
|
||||
pick up <obj> — approach + grasp + lift in one subtask
|
||||
put <obj> on/in <loc> — transport + release in one subtask
|
||||
place <obj> on/in <loc> — synonym of "put"; pick one and stay consistent
|
||||
push <obj> — contact + linear shove
|
||||
pull <obj> — contact + linear retract
|
||||
turn <knob/dial/handle> — rotary actuation
|
||||
press <button> — single-press contact
|
||||
open <drawer/door/lid> — full open motion
|
||||
close <drawer/door/lid> — full close motion
|
||||
pour <src> into <dst> — tilt + flow
|
||||
insert <obj> into <slot>— alignment + push-fit
|
||||
go to <loc> — ONLY when no grasp / actuation follows
|
||||
(e.g. a pure relocation between phases).
|
||||
If the next subtask grasps something at
|
||||
that location, drop "go to ..." and just
|
||||
write "pick up ..." instead.
|
||||
- Forbidden ultra-fine splits — the VLM is NOT allowed to emit these
|
||||
as standalone subtasks; fold them into the parent composite:
|
||||
"move to X" → fold into "pick up X" (or whatever follows)
|
||||
"reach for X" → fold into "pick up X"
|
||||
"grasp X" → fold into "pick up X"
|
||||
"lift X" → fold into "pick up X" (or "put X on Y" if it's
|
||||
the transport phase of a place)
|
||||
"release X" → fold into "put X on Y" (or "place X in Y")
|
||||
- Keep it SHORT — a verb phrase, not a sentence. Drop articles
|
||||
("the", "a") and adverbs ("carefully", "slowly"). Add a "how"
|
||||
detail (which hand, which grasp point) ONLY when it is needed to
|
||||
disambiguate.
|
||||
disambiguate. Every subtask must begin with one of the verbs
|
||||
above (no leading nouns, no "then", no "first").
|
||||
- NEVER use third person. Never write "the robot", "the arm", "the
|
||||
gripper moves", "it picks up" — the robot is implied. Command it,
|
||||
do not describe it.
|
||||
@@ -23,16 +51,22 @@ the robot performs.
|
||||
"cube", every subtask says "cube" — never switch to "block". If it
|
||||
says "box", never switch to "bin"/"container". Keep vocabulary
|
||||
consistent across the whole episode.
|
||||
- Good: "move to blue cube", "grasp blue cube", "lift blue cube",
|
||||
"place blue cube in box", "open drawer", "release yellow cube".
|
||||
- Bad: "the robot arm moves towards the blue cube" (third person,
|
||||
too long), "carefully pick up the cube" (adverb, article),
|
||||
"release the yellow block" ("block" when the task said "cube").
|
||||
- Good: "pick up blue cube", "put blue cube in box", "open drawer",
|
||||
"turn red knob", "press start button", "go to sink".
|
||||
- Bad: "move to blue cube" (approach as its own subtask — forbidden,
|
||||
must be folded into "pick up blue cube"); "the robot arm moves
|
||||
towards the blue cube" (third person, too long); "carefully pick
|
||||
up the cube" (adverb, article); "release the yellow block"
|
||||
("block" when the task said "cube", and "release" must be folded
|
||||
into a "put"/"place" subtask).
|
||||
- Subtasks are non-overlapping and cover the full episode in order.
|
||||
Choose the cut points yourself based on what you see in the video
|
||||
(gripper open/close events, contact, regrasps, transitions).
|
||||
- Each subtask spans at least {min_subtask_seconds} seconds.
|
||||
- Do not exceed {max_steps} subtasks total.
|
||||
- Each subtask spans at least {min_subtask_seconds} seconds. If a
|
||||
candidate span would be shorter, merge it into its neighbour
|
||||
rather than emitting it.
|
||||
- Do not exceed {max_steps} subtasks total. Fewer, larger composites
|
||||
are preferred over many micro-steps.
|
||||
- Every subtask's [start_time, end_time] must lie within
|
||||
[0.0, {episode_duration}] seconds.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user