tests(annotations): fix stale canned-VLM markers + action_record style assertion

The annotation tests had never actually run in CI (collection failed on the missing 'datasets' extra). Now that they do, three stale assertions surfaced against the evolved pipeline:

* test_module1_plan_memory_subtask_smoke: the memory canned-responder marker 'Update the memory' no longer appears in module_1_memory.txt (now 'compressed semantic memory'), so the stub returned no memory row and the {subtask,plan,memory} subset check failed. Marker updated to match the current prompt.
* test_module2_mid_episode_emits_paired_interjection_and_speech: the interjection marker 'Write ONE interjection' is now 'Write ONE compact interjection' in module_2_interjection.txt, so 0 interjections were emitted. Marker updated.
* tests/datasets/test_language.py::test_style_registry_routes_columns: PERSISTENT_STYLES gained 'action_record' in this PR; add it to the expected set.

These are test/prompt-marker syncs; no production behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
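
To illustrate why a stale marker silently drops a row instead of raising an error, here is a minimal sketch of a substring-keyed canned responder. The function name `canned_response` and the prompt string are hypothetical, and the real stub in the test suite may be structured differently. Only the two marker phrases and the memory payload come from this commit.

```python
from typing import Any, Optional


def canned_response(prompt: str, canned: dict) -> Optional[Any]:
    """Return the payload of the first marker found in the prompt, else None.

    Hypothetical stand-in for the canned-VLM stub: markers are matched as
    plain substrings, so any rewording of the prompt can orphan a marker.
    """
    for marker, payload in canned.items():
        if marker in prompt:
            return payload
    return None


# Hypothetical prompt text; only the quoted phrase is taken from the commit.
prompt = "Produce a compressed semantic memory of the episode so far."

stale = {"Update the memory": {"memory": "wiped the counter once"}}
fresh = {"compressed semantic memory": {"memory": "wiped the counter once"}}

stale_result = canned_response(prompt, stale)  # marker no longer in the prompt
fresh_result = canned_response(prompt, fresh)  # marker matches current wording
```

Under these assumptions the stale marker yields no payload at all, which is how a downstream subset check such as {subtask,plan,memory} can fail with no error raised at the point of the mismatch.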
2026-06-04 04:41:24 +00:00 · 2026-06-03 16:21:17 +02:00
parent 273a8fc335
commit a18d969753
2 changed files with 6 additions and 8 deletions
--- a/tests/annotations/test_modules.py
+++ b/tests/annotations/test_modules.py
@@ -88,7 +88,7 @@ def test_module1_plan_memory_subtask_smoke(fixture_dataset_root: Path, tmp_path:
                    {"text": "place the sponge into the sink", "start": 0.8, "end": 1.1},
                ]
            },
-            "Update the memory": {"memory": "wiped the counter once"},
+            "compressed semantic memory": {"memory": "wiped the counter once"},
        },
    )
    module = PlanSubtasksMemoryModule(vlm=vlm, config=PlanConfig())
@@ -151,12 +151,10 @@ def test_module2_mid_episode_emits_paired_interjection_and_speech(
        {
            "acknowledgement the robot": {"text": "OK."},
            # Marker matches the distinctive line of
-            # ``module_2_interjection.txt``. The old marker
-            # ("ONE realistic interruption") came from a previous prompt
-            # version that asked for counterfactual interjections; the
-            # current design anchors on subtask boundaries instead, so
-            # the prompt and its marker changed.
-            "Write ONE interjection": {
+            # ``module_2_interjection.txt`` ("Write ONE compact
+            # interjection ..."). Keep this in sync with that prompt's
+            # wording — the canned responder matches on substring.
+            "Write ONE compact interjection": {
                "interjection": "now wipe the counter please",
                "speech": "On it.",
            },