Files
lerobot-clone/tests/annotations/test_frames.py

68 lines
2.6 KiB
Python
Raw Normal View History

review: fix dead-code bug, add thread safety, atomic writes, smaller cleanups **Critical: video_for_episode was unreachable dead code.** ``video_for_episode`` was indented inside ``_decode_pyav_direct``, after its ``return`` statement — Python parsed it as a nested function that never executed. Module 1's ``_episode_video_block`` calls ``self.frame_provider.video_for_episode(record, target_count)`` on the ``use_video_url=False`` path, which would have AttributeError'd on any real dataset. Tests passed only because they used ``_StubFrameProvider`` / ``_NullProvider`` which have the method. Moved it to be a proper method of ``VideoFrameProvider`` (right after ``frames_at``). **Thread safety on VideoFrameProvider.** The executor runs Module 1/2/3 phases under a ``ThreadPoolExecutor``, so the per-instance ``_cache`` dict and the one-shot ``_warned_decode_fail`` flag were exposed to concurrent reads/writes. Added a ``threading.Lock`` field, wrapped cache reads/writes and the warn-flag check-and-set in ``with self._lock:``. Stub fixtures unaffected. **episode_clip_path is now a method of VideoFrameProvider.** Used to be a free function reaching into ``provider._meta.episodes`` and ``provider._meta.get_video_file_path`` from outside the class. As a method it just uses ``self._meta``. The only caller (Module 1) updated; no external callers. **Atomic write in LanguageColumnsWriter.** ``pq.write_table(new_table, path)`` was overwriting the parquet shard in place — a crash mid-write would corrupt the file. Now writes to a sibling ``.tmp`` and ``Path.replace`` atomically. **Smaller items:** * ``executor.py`` docstring opened with "four phases" but listed six. Now says "six phases" to match. * ``[annotations]`` extra in ``pyproject.toml`` now includes ``openai>=1.40,<2.0``. Default ``VlmConfig.backend`` is ``"openai"``, so without it ``_make_openai_client`` would ImportError on a fresh ``uv sync --extra annotations``. * ``_snap_to_frame`` was duplicated identically in ``plan_subtasks_memory.py`` and ``interjections_and_speech.py``. Promoted to ``snap_to_frame`` in ``reader.py`` (next to ``EpisodeRecord``); both modules now import it. Backwards-compat alias not needed — no external callers. * ``EpisodeRecord.frames_df()`` was re-reading the full parquet on every call. Now memoizes via a private dataclass field so repeat calls from different modules pay the cost once. Method signature unchanged. * ``_extract_first_json_object`` had a redundant ``and not escape`` guard that was dead because the prior block already handled and reset ``escape``. Replaced with a comment explaining the invariant. **Pre-existing lint cleanups surfaced once these files entered pre-commit's scope:** * dead local ``client = clients[0]`` in ``_make_openai_client`` (the real round-robin uses ``clients[rr_counter[...]]``). * ``cmd = ... if "{port}" in cmd else f"...{port}"`` ternary collapse in ``_spawn_parallel_inference_servers``. * ``seek_pts = 0 if stream.time_base is None else int(...)`` ternary collapse in ``_decode_pyav_direct``. * ``# nosec B310`` on the localhost ``urllib.request.urlopen`` probe in ``_server_is_up`` — the URL is the user-configured local-server endpoint the CLI itself spawned, not arbitrary user input. **Test added.** ``tests/annotations/test_frames.py`` pins the regression on ``VideoFrameProvider``: asserts ``video_for_episode`` and ``episode_clip_path`` are callable methods (not nested dead code or free functions), and that the ``_lock`` field is a real ``threading.Lock``. Sweep: 64 passed, 2 failed (same pre-existing module-impl bugs as before this commit). Pre-commit clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:53:43 +02:00
#!/usr/bin/env python
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Unit tests for :class:`VideoFrameProvider` method bindings.
These were prompted by a real regression: ``video_for_episode`` was
indented one level too deep so it ended up nested *inside* the
``_decode_pyav_direct`` helper (after that function's ``return``
statement) silently dead code that meant production runs with
``use_video_url=False`` would ``AttributeError`` on
``self.frame_provider.video_for_episode(...)``. The existing module
tests didn't catch it because they exercise stub providers.
The tests below assert on the class itself (not on an instance), so a
future reindent regression flips them to red without needing a real
LeRobot dataset on disk.
"""
from __future__ import annotations
import pytest
pytest.importorskip("datasets", reason="datasets is required (install lerobot[dataset])")
from lerobot.annotations.steerable_pipeline.frames import ( # noqa: E402
VideoFrameProvider,
)
def test_video_for_episode_is_a_method_of_videoframeprovider():
"""``video_for_episode`` must be a bound method, not nested dead code."""
assert callable(getattr(VideoFrameProvider, "video_for_episode", None))
def test_episode_clip_path_is_a_method_of_videoframeprovider():
"""``episode_clip_path`` is now a method (was a free function reaching
into ``provider._meta`` from outside the class)."""
assert callable(getattr(VideoFrameProvider, "episode_clip_path", None))
def test_videoframeprovider_has_a_lock_for_concurrent_use():
"""A ``ThreadPoolExecutor`` runs Module 1/2/3 phases concurrently;
the cache + warn-flag accesses must be guarded.
"""
import threading
# Fresh-instance check via a minimal fake to avoid touching the hub.
# The lock is declared with ``init=False`` and has a default factory,
# so a constructed instance must own a real ``threading.Lock``.
lock_field = next(
(f for f in VideoFrameProvider.__dataclass_fields__.values() if f.name == "_lock"),
None,
)
assert lock_field is not None
assert lock_field.default_factory is threading.Lock