#!/usr/bin/env python
# tests/scripts/test_lerobot_annotate.py

import json
from types import SimpleNamespace

import pytest

# ``lerobot.scripts.lerobot_annotate`` (and the ``_push_to_hub`` path this test
# exercises) imports ``lerobot.datasets``, which needs the HF ``datasets`` library
# that only ships with the ``dataset`` extra. A module-level importorskip makes
# tiers without it report a clean SKIP instead of a collection error.
pytest.importorskip("datasets", reason="datasets is required (install lerobot[dataset])")


def test_push_to_hub_tags_uploaded_dataset_revision(tmp_path, monkeypatch):
    """_push_to_hub should tag the commit it uploaded with the dataset's codebase_version."""
    from lerobot.scripts.lerobot_annotate import _push_to_hub

    root = tmp_path / "dataset"
    (root / "meta").mkdir(parents=True)
    (root / "meta" / "info.json").write_text(json.dumps({"codebase_version": "v3.0"}))

    calls = {}

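    # Minimal stand-in for huggingface_hub.HfApi: records the kwargs of each call
    # and returns a fake commit oid from upload_folder.
    class FakeHfApi: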
        def create_repo(self, **kwargs):
            calls["create_repo"] = kwargs

        def upload_folder(self, **kwargs):
            calls["upload_folder"] = kwargs
            return SimpleNamespace(oid="abc123")

        def create_tag(self, **kwargs):
            calls["create_tag"] = kwargs

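    # Patch the attribute on the huggingface_hub module itself. This relies on
    # _push_to_hub resolving HfApi at call time rather than at import time.
    monkeypatch.setattr("huggingface_hub.HfApi", FakeHfApi)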

    cfg = SimpleNamespace(
        repo_id="source/dataset",
        new_repo_id="annotated/dataset",
        push_private=True,
        push_commit_message=None,
    )

    _push_to_hub(root, cfg)

    assert calls["create_repo"] == {
        "repo_id": "annotated/dataset",
        "repo_type": "dataset",
        "private": True,
        "exist_ok": True,
    }
    assert calls["upload_folder"]["repo_id"] == "annotated/dataset"
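    # The tag must use codebase_version from meta/info.json and point at the
    # exact commit returned by upload_folder, not at the branch head.
    assert calls["create_tag"] == {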
        "repo_id": "annotated/dataset",
        "tag": "v3.0",
        "repo_type": "dataset",
        "exist_ok": True,
        "revision": "abc123",
    }
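

# A possible extra guard, sketched under assumptions: FakeHfApi accepts arbitrary
# **kwargs, so the test above would still pass if a keyword name drifted away from
# the real huggingface_hub.HfApi API. This check compares the asserted keyword
# names against the real method signatures. It assumes huggingface_hub is
# installed alongside ``datasets`` and that HfApi's method decorators preserve
# signatures via functools.wraps, which inspect.signature follows.
def test_fake_hf_api_kwargs_exist_on_real_hf_api():
    import inspect

    from huggingface_hub import HfApi

    # Keyword names taken from the assertions in the test above.
    expected = {
        "create_repo": {"repo_id", "repo_type", "private", "exist_ok"},
        "upload_folder": {"repo_id"},
        "create_tag": {"repo_id", "tag", "repo_type", "exist_ok", "revision"},
    }
    for method_name, kwargs in expected.items():
        params = inspect.signature(getattr(HfApi, method_name)).parameters
        missing = kwargs - set(params)
        assert not missing, f"HfApi.{method_name} does not accept {sorted(missing)}"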