lerobot-clone/examples/annotations/run_hf_job.py
Pepijn 518e191337 annotate: windowed subtask generation for constant temporal density
Long episodes no longer get sparse subtask coverage. Previously a long
episode was subsampled to max_video_frames=32 across its whole duration
(about 1 frame every 4 s for a 2-min clip). The new opt-in windowing
keeps a constant frames_per_second density by splitting the episode into
fixed-length windows and running the subtask chain once per window.

New PlanConfig.subtask_window_seconds (default 0.0 = off). When > 0 and
the episode is longer than one window:
  * episode is split into consecutive [w0, w1] windows of this length
  * each window's frames are sampled at frames_per_second (so a 32s
    window at 1 fps = 32 frames, filling but not exceeding the per-call
    context budget)
  * the full describe -> segment -> verify chain runs PER window, in
    window-relative time [0, L]; the resulting spans are then offset
    back to absolute episode time
  * all windows' spans are merged, frame-snap-deduped, and stitched into
    one contiguous whole-episode cover
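
A minimal sketch of the window arithmetic described above, not the code
in this commit. split_windows, sample_times and to_absolute are
hypothetical helper names, and the real implementation may treat the
last (shorter) window and frame rounding differently.

```python
import math


def split_windows(duration_s, window_s):
    """Split [0, duration_s] into consecutive (w0, w1) windows.

    Assumption: the last window is simply truncated at the episode end.
    """
    if window_s <= 0 or duration_s <= window_s:
        # Windowing off, or the episode fits in one window.
        return [(0.0, float(duration_s))]
    n = math.ceil(duration_s / window_s)
    return [
        (i * window_s, min((i + 1) * window_s, duration_s))
        for i in range(n)
    ]


def sample_times(w0, w1, fps):
    """Absolute timestamps inside [w0, w1) at a constant fps density."""
    n = max(1, math.floor((w1 - w0) * fps + 1e-9))
    return [w0 + i / fps for i in range(n)]


def to_absolute(spans, w0):
    """Shift window-relative (start, end, label) spans by the window start."""
    return [(start + w0, end + w0, label) for start, end, label in spans]


windows = split_windows(120.0, 32.0)
frames_first = sample_times(*windows[0], fps=1.0)
frames_last = sample_times(*windows[-1], fps=1.0)
shifted = to_absolute([(0.0, 5.0, "grasp")], windows[1][0])
```

For a 2-min episode with 32s windows this yields four windows, the
first three carrying 32 frames each at 1 fps and the last one fewer.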

Implementation:
  * _episode_video_block / _video_message / _describe_episode /
    _verify_subtasks gain an optional window=(w0,w1); when set they
    embed frames sampled in that absolute range at frames_per_second
    (the video_url path is skipped because it always covers the whole
    episode).
  * _clean_spans gains bounds= (overrides the clamp range, for
    window-relative spans) and dedupe= (lets per-window calls skip the
    frame-snap dedupe, which then runs once on the merged absolute set).
  * new _generate_subtasks_windowed + _subtasks_for_window orchestrate
    the loop; _generate_subtasks branches to them when window_s > 0.
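
A rough sketch of how clamping with bounds=, a deferred frame-snap
dedupe and the final stitch could fit together. This is an illustration
under assumptions, not the actual _clean_spans: clean_spans and stitch
are hypothetical names, spans are assumed to be (start, end, label)
tuples, and "frame-snap dedupe" is assumed to mean rounding boundaries
to the frame grid and dropping spans that collapse onto the same pair.

```python
def clean_spans(spans, bounds, fps, dedupe=True):
    """Clamp spans to bounds; optionally snap to the frame grid and dedupe."""
    lo, hi = bounds
    clamped = []
    for start, end, label in sorted(spans):
        start, end = max(lo, start), min(hi, end)
        if end > start:  # drop spans that fall outside the bounds
            clamped.append((start, end, label))
    if not dedupe:
        return clamped  # per-window call: defer snapping to the merge step
    snapped, seen = [], set()
    for start, end, label in clamped:
        s, e = round(start * fps) / fps, round(end * fps) / fps
        if e > s and (s, e) not in seen:
            seen.add((s, e))
            snapped.append((s, e, label))
    return snapped


def stitch(spans, duration_s):
    """Turn sorted spans into one contiguous cover of [0, duration_s]."""
    spans = sorted(spans)
    out, cursor = [], 0.0
    for i, (_, _, label) in enumerate(spans):
        # Each span runs until the next span starts; the last one runs to the end.
        end = duration_s if i == len(spans) - 1 else spans[i + 1][0]
        if end > cursor:
            out.append((cursor, end, label))
            cursor = end
    return out


per_window = clean_spans(
    [(-2.0, 5.2, "a"), (5.2, 40.0, "b")], bounds=(0.0, 32.0), fps=1.0, dedupe=False
)
merged = clean_spans(
    [(0.0, 5.2, "a"), (0.1, 4.9, "a2"), (5.2, 32.0, "b")],
    bounds=(0.0, 60.0), fps=1.0,
)
cover = stitch([(1.0, 10.0, "a"), (12.0, 30.0, "b"), (33.0, 50.0, "c")], 60.0)
```

Deferring the dedupe keeps per-window spans in their original
resolution, so duplicates are only judged once all windows share the
same absolute timeline.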

run_hf_job.py: --plan.subtask_window_seconds=32 (32s windows at 1 fps).
Cost scales with episode length (chain calls × ceil(duration/window)).
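
As a back-of-the-envelope check of that cost formula, the sketch below
counts model calls per episode. estimate_calls is a hypothetical helper,
and calls_per_chain=3 assumes exactly one call each for describe,
segment and verify, with no retries.

```python
import math


def estimate_calls(duration_s, window_s, calls_per_chain=3):
    """Chain calls x ceil(duration / window); one chain when windowing is off."""
    if window_s <= 0 or duration_s <= window_s:
        return calls_per_chain
    return calls_per_chain * math.ceil(duration_s / window_s)


calls_windowed = estimate_calls(120.0, 32.0)   # 4 windows
calls_short = estimate_calls(20.0, 32.0)       # fits in one window
calls_off = estimate_calls(120.0, 0.0)         # windowing disabled
```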

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-02 16:26:14 +02:00

5.6 KiB