tests(annotations): fix stale canned-VLM markers + action_record style assertion

The annotation tests had never actually run in CI (collection failed on
the missing 'datasets' extra); now that they do, three stale assertions
surfaced against the evolved pipeline:

  * test_module1_plan_memory_subtask_smoke: the memory canned-responder
    marker 'Update the memory' no longer appears in module_1_memory.txt
    (now 'compressed semantic memory'), so the stub returned no memory
    row and the {subtask,plan,memory} subset check failed. Marker
    updated to match the current prompt.
  * test_module2_mid_episode_emits_paired_interjection_and_speech: the
    interjection marker 'Write ONE interjection' is now 'Write ONE
    compact interjection' in module_2_interjection.txt, so 0 interjections
    were emitted. Marker updated.
  * tests/datasets/test_language.py::test_style_registry_routes_columns:
    PERSISTENT_STYLES gained 'action_record' in this PR; add it to the
    expected set.

These are test/prompt-marker syncs — no production behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Pepijn
2026-06-03 16:21:17 +02:00
parent 273a8fc335
commit a18d969753
2 changed files with 6 additions and 8 deletions

View File

@@ -88,7 +88,7 @@ def test_module1_plan_memory_subtask_smoke(fixture_dataset_root: Path, tmp_path:
{"text": "place the sponge into the sink", "start": 0.8, "end": 1.1},
]
},
"Update the memory": {"memory": "wiped the counter once"},
"compressed semantic memory": {"memory": "wiped the counter once"},
},
)
module = PlanSubtasksMemoryModule(vlm=vlm, config=PlanConfig())
@@ -151,12 +151,10 @@ def test_module2_mid_episode_emits_paired_interjection_and_speech(
{
"acknowledgement the robot": {"text": "OK."},
# Marker matches the distinctive line of
# ``module_2_interjection.txt``. The old marker
# ("ONE realistic interruption") came from a previous prompt
# version that asked for counterfactual interjections; the
# current design anchors on subtask boundaries instead, so
# the prompt and its marker changed.
"Write ONE interjection": {
# ``module_2_interjection.txt`` ("Write ONE compact
# interjection ..."). Keep this in sync with that prompt's
# wording — the canned responder matches on substring.
"Write ONE compact interjection": {
"interjection": "now wipe the counter please",
"speech": "On it.",
},