fix(imports): fixing av import in test_depth.py

tests(typos): fixing typos in tests
fix(info): fixing info metadata update when is_depth_map was set
2026-05-31 02:41:24 +00:00 · 2026-05-22 15:13:15 +02:00 · 2026-05-22 13:09:56 +02:00 · 2026-05-22 02:48:30 +02:00 · 2026-05-22 02:07:33 +02:00 · 2026-05-22 02:06:37 +02:00
94 changed files with 1630 additions and 6692 deletions
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -39,10 +39,8 @@
    title: Porting Large Datasets
  - local: using_dataset_tools
    title: Using the Dataset Tools
-  - local: language_and_recipes
-    title: Language Columns and Recipes
-  - local: tools
-    title: Tools
+  - local: dataset_subtask
+    title: Using Subtasks in the Dataset
  - local: video_encoding_parameters
    title: Video encoding parameters
  - local: streaming_video_encoding
@@ -73,8 +71,6 @@
 - sections:
  - local: sarm
    title: SARM
-  - local: topreward
-    title: TOPReward
  title: "Reward Models"
 - sections:
  - local: inference
@@ -147,8 +143,6 @@
    title: OMX
  - local: openarm
    title: OpenArm
-  - local: rebot_b601
-    title: reBot B601-DM
  title: "Robots"
 - sections:
  - local: phone_teleop
--- a/docs/source/dataset_subtask.mdx
+++ b/docs/source/dataset_subtask.mdx
@@ -0,0 +1,277 @@
+# Using Subtasks in LeRobot Datasets
+
+Subtask support in robotics datasets has proven effective in improving robot reasoning and understanding. Subtasks are particularly useful for:
+
+- **Hierarchical policies**: Building policies that include subtask predictions to visualize robot reasoning in real time
+- **Reward modeling**: Helping reward models understand task progression (e.g., SARM-style stage-aware reward models)
+- **Task decomposition**: Breaking down complex manipulation tasks into atomic, interpretable steps
+
+LeRobotDataset now supports subtasks as part of its dataset structure, alongside tasks.
+
+## What are Subtasks?
+
+While a **task** describes the overall goal (e.g., "Pick up the apple and place it in the basket"), **subtasks** break down the execution into finer-grained steps:
+
+1. "Approach the apple"
+2. "Grasp the apple"
+3. "Lift the apple"
+4. "Move to basket"
+5. "Release the apple"
+
+Each frame in the dataset can be annotated with its corresponding subtask, enabling models to learn and predict these intermediate stages.
+
+<img
+  src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/subtask-asset.png"
+  alt="An overview of subtask annotation showing how frames are labeled with intermediate subtask stages"
+  width="80%"
+/>
+
+<p>
+  <em>Figure: Overview of subtask annotation.</em>
+</p>
+
+**Reference:** _Subtask-learning based for robot self-assembly in flexible collaborative assembly in manufacturing_, Original Article, Published: 19 April 2022.
+
+## Dataset Structure
+
+Subtask information is stored in the dataset metadata:
+
+```
+my-dataset/
+├── data/
+│   └── ...
+├── meta/
+│   ├── info.json
+│   ├── stats.json
+│   ├── tasks.parquet
+│   ├── subtasks.parquet      # Subtask index → subtask string mapping
+│   └── episodes/
+│       └── ...
+└── videos/
+    └── ...
+```
+
+### Subtasks Parquet File
+
+The `meta/subtasks.parquet` file maps subtask indices to their natural language descriptions:
+
+| subtask_index | subtask (index column) |
+| ------------- | ---------------------- |
+| 0             | "Approach the apple"   |
+| 1             | "Grasp the apple"      |
+| 2             | "Lift the apple"       |
+| ...           | ...                    |
+
+### Frame-Level Annotations
+
+Each frame in the dataset can include a `subtask_index` field that references the subtasks parquet file:
+
+```python
+# Example frame data in the parquet file
+{
+    "index": 42,
+    "timestamp": 1.4,
+    "episode_index": 0,
+    "task_index": 0,
+    "subtask_index": 2,  # References "Lift the apple"
+    "observation.state": [...],
+    "action": [...],
+}
+```
+
+## Annotating Datasets with Subtasks
+
+We provide a Hugging Face Space for easily annotating any LeRobotDataset with subtasks:
+
+**[https://huggingface.co/spaces/lerobot/annotate](https://huggingface.co/spaces/lerobot/annotate)**
+
+After completing your annotation, click "Push to Hub" to upload your annotated dataset.
+
+You can also run the annotation Space locally by following the instructions at
+[github.com/huggingface/lerobot-annotate](https://github.com/huggingface/lerobot-annotate).
+
+## Loading Datasets with Subtasks
+
+When you load a dataset with subtask annotations, the subtask information is automatically available:
+
+```python
+from lerobot.datasets import LeRobotDataset
+
+# Load a dataset with subtask annotations
+dataset = LeRobotDataset("jadechoghari/collect-fruit-annotated")
+
+# Access a sample
+sample = dataset[100]
+
+# The sample includes both task and subtask information
+print(sample["task"])           # "Collect the fruit"
+print(sample["subtask"])        # "Lift the apple"
+print(sample["task_index"])     # tensor(0)
+print(sample["subtask_index"])  # tensor(2)
+```
+
+### Checking for Subtask Support
+
+You can check if a dataset has subtask annotations:
+
+```python
+# Check if subtasks are available
+has_subtasks = (
+    "subtask_index" in dataset.features
+    and dataset.meta.subtasks is not None
+)
+
+if has_subtasks:
+    print(f"Dataset has {len(dataset.meta.subtasks)} unique subtasks")
+    print("Subtasks:", list(dataset.meta.subtasks.index))
+```
+
+## Using Subtasks for Training
+
+### With the Tokenizer Processor
+
+The `TokenizerProcessorStep` automatically handles subtask tokenization for Vision-Language-Action (VLA) models:
+
+```python
+from lerobot.processor import TokenizerProcessorStep
+
+# Create a tokenizer processor step
+tokenizer_processor = TokenizerProcessorStep(
+    tokenizer_name_or_path="google/paligemma-3b-pt-224",
+    padding="max_length",
+    max_length=64,
+)
+
+# The processor will automatically tokenize subtasks if present in the batch
+# and add them to the observation under:
+# - "observation.subtask.tokens"
+# - "observation.subtask.attention_mask"
+```
+
+When subtasks are available in the batch, the tokenizer processor adds:
+
+- `observation.subtask.tokens`: Tokenized subtask text
+- `observation.subtask.attention_mask`: Attention mask for the subtask tokens
+
+### DataLoader with Subtasks
+
+```python
+import torch
+from lerobot.datasets import LeRobotDataset
+
+dataset = LeRobotDataset("jadechoghari/collect-fruit-annotated")
+
+dataloader = torch.utils.data.DataLoader(
+    dataset,
+    batch_size=16,
+    shuffle=True,
+)
+
+for batch in dataloader:
+    # Access subtask information in the batch
+    subtasks = batch["subtask"]  # List of subtask strings
+    subtask_indices = batch["subtask_index"]  # Tensor of subtask indices
+
+    # Use for training hierarchical policies or reward models
+    print(f"Batch subtasks: {set(subtasks)}")
+```
+
+## Example Datasets with Subtask Annotations
+
+Try loading a dataset with subtask annotations:
+
+```python
+from lerobot.datasets import LeRobotDataset
+
+# Example dataset with subtask annotations
+dataset = LeRobotDataset("jadechoghari/collect-fruit-annotated")
+
+# Explore the subtasks
+print("Available subtasks:")
+for subtask_name in dataset.meta.subtasks.index:
+    print(f"  - {subtask_name}")
+
+# Get subtask distribution
+subtask_counts = {}
+for i in range(len(dataset)):
+    sample = dataset[i]
+    subtask = sample["subtask"]
+    subtask_counts[subtask] = subtask_counts.get(subtask, 0) + 1
+
+print("\nSubtask distribution:")
+for subtask, count in sorted(subtask_counts.items(), key=lambda x: -x[1]):
+    print(f"  {subtask}: {count} frames")
+```
+
+## Use Cases
+
+### 1. Hierarchical Policy Training
+
+Train policies that predict both actions and the current subtask:
+
+```python
+from torch import nn
+
+
+class HierarchicalPolicy(nn.Module):
+    def __init__(self, encoder, hidden_dim, action_dim, num_subtasks):
+        super().__init__()
+        # `encoder` is any module mapping observations to (batch, hidden_dim) features
+        self.encoder = encoder
+        self.action_head = nn.Linear(hidden_dim, action_dim)
+        self.subtask_head = nn.Linear(hidden_dim, num_subtasks)
+
+    def forward(self, observations):
+        features = self.encoder(observations)
+        actions = self.action_head(features)
+        subtask_logits = self.subtask_head(features)  # one logit per subtask
+        return actions, subtask_logits
+```
+
+### 2. Stage-Aware Reward Modeling (SARM)
+
+Build reward models that understand task progression:
+
+```python
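+# SARM predicts:
+# - Stage: Which subtask is being executed (discrete)
+# - Progress: How far along the subtask (continuous 0-1)
+from torch import nn
+
+
+class SARMRewardModel(nn.Module):
+    def __init__(self, encoder, hidden_dim, num_stages):
+        super().__init__()
+        # Illustrative heads only; not the architecture from the SARM paper
+        self.encoder = encoder
+        self.stage_classifier = nn.Linear(hidden_dim, num_stages)
+        self.progress_regressor = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())
+
+    def forward(self, observations):
+        features = self.encoder(observations)
+        stage_logits = self.stage_classifier(features)
+        progress = self.progress_regressor(features)  # in [0, 1]
+        return stage_logits, progress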
+```
+
+### 3. Progress Visualization
+
+Monitor robot execution by tracking subtask progression:
+
+```python
+def visualize_execution(model, observations, subtask_names):
+    for t, obs in enumerate(observations):
+        action, subtask_logits = model(obs)
+        predicted_subtask = subtask_names[subtask_logits.argmax().item()]
+        print(f"t={t}: Executing '{predicted_subtask}'")
+```
+
+## API Reference
+
+### LeRobotDataset Properties
+
+| Property                    | Type                   | Description                                |
+| --------------------------- | ---------------------- | ------------------------------------------ |
+| `meta.subtasks`             | `pd.DataFrame \| None` | DataFrame mapping subtask names to indices |
+| `features["subtask_index"]` | `dict`                 | Feature spec for subtask_index if present  |
+
+### Sample Keys
+
+When subtasks are available, each sample includes:
+
+| Key             | Type           | Description                          |
+| --------------- | -------------- | ------------------------------------ |
+| `subtask_index` | `torch.Tensor` | Integer index of the current subtask |
+| `subtask`       | `str`          | Natural language subtask description |
+
+## Related Resources
+
+- [SARM Paper](https://arxiv.org/pdf/2509.25358) - Stage-Aware Reward Modeling for Long Horizon Robot Manipulation
+- [LeRobot Annotate Space](https://huggingface.co/spaces/lerobot/annotate) - Interactive annotation tool
+- [LeRobotDataset v3.0](./lerobot-dataset-v3) - Dataset format documentation
--- a/docs/source/language_and_recipes.mdx
+++ b/docs/source/language_and_recipes.mdx
@@ -1,147 +0,0 @@
-# Language columns and recipes
-
-Most LeRobot datasets ship with a single `task` string per episode — fine for
-short, single-instruction skills, but not enough for the longer-horizon,
-multi-modal robot policies the field is moving toward (high-level planning,
-memory, interjections, VQA, tool use). To support those policies without
-forking the dataset format, LeRobot extends `LeRobotDataset` with two optional
-language columns and a small recipe layer that turns those rows into
-chat-style training samples on the fly.
-
-The design splits cleanly into three layers:
-
-1. **Data in the dataset** — language annotations stored next to frames in
-   `data/chunk-*/file-*.parquet` as two optional columns (`language_persistent`
-   and `language_events`). Datasets without these columns keep their existing
-   behavior.
-2. **Recipe** — a YAML file that declares which annotation rows to bind and
-   how to lay them out as chat turns (`role`, `content`, optional images,
-   optional tool calls). Recipes are pure config; no Python required to add a
-   new one.
-3. **Training format** — at sample time, `RenderMessagesStep` resolves the
-   recipe against the per-frame annotations and emits HF-style `messages` plus
-   LeRobot-specific sidecars (`message_streams`, `target_message_indices`)
-   that policy processors consume.
-
-This page describes each layer in turn.
-
-## Layer 1 — language columns in the dataset
-
-The two optional columns live next to frame data in
-`data/chunk-*/file-*.parquet`:
-
- `language_persistent`: a list of rows broadcast across every frame in an episode for state that remains active, such as `subtask`, `plan`, and `memory`.
- `language_events`: a list of rows only on the exact frame where an event was emitted, such as `interjection`, `vqa`, and speech tool calls.
-
-Both columns share the same row shape (event rows omit `timestamp` because the
-frame the row sits on already provides it):
-
-```text
-role: string
-content: string | null
-style: string | null
-timestamp: float32        # persistent rows only
-camera: string | null     # observation.images.* feature key, view-dependent rows only
-tool_calls: list[Json] | null
-```
-
-The `camera` field tags rows whose `content` is grounded in a specific camera
-view. Rows of view-dependent styles (`vqa` and `trace`) MUST set `camera` to
-the matching `observation.images.*` feature key. Rows of every other style —
-including `motion`, which describes robot-frame primitives in joint / Cartesian
-terms — MUST leave `camera` as `null`. Pipeline writers and the validator
-enforce this via `validate_camera_field(style, camera)`.
-
-`meta/tasks.parquet` remains the canonical source for the task. The special `${task}` recipe binding always reads that task string and does not depend on language annotations.
-
-### Architecture
-
-The language stack itself has three internal modules backing layer 1:
-
-1. `lerobot.datasets.language` defines the schema, style registry, and `column_for_style`.
-2. `lerobot.datasets.language_render` resolves rows and renders messages.
-3. `RenderMessagesStep` turns dataset samples into `messages`, `message_streams`, and `target_message_indices`.
-
-`LeRobotDataset` stays recipe-agnostic. It passes `language_persistent` and `language_events` through when present, and unannotated datasets keep their existing behavior.
-
-## Layer 2 — recipe anatomy
-
-Recipes are YAML files backed by `TrainingRecipe` and `MessageTurn`. They
-declare which annotation rows to pull (via `bindings`) and how to compose them
-into chat turns (`messages`).
-
-```yaml
-messages:
-  - { role: user, content: "${task}", stream: high_level }
-  - { role: assistant, content: "${subtask}", stream: low_level, target: true }
-```
-
-A recipe can also branch into a weighted **blend** of sub-recipes. At sample
-time, exactly one branch is selected deterministically from the sample index,
-so different frames train different objectives (e.g. memory updates vs.
-low-level execution vs. VQA) without any Python wiring.
-
-### Temporal semantics
-
-Persistent styles are active after emission until replaced:
-
- `active_at(t, style=subtask)`
- `nth_prev(style=memory, offset=1)`
- `nth_next(style=subtask, offset=1)`
-
-Event styles only exist on their exact timestamp:
-
- `emitted_at(t, style=interjection)`
- `emitted_at(t, style=vqa, role=user, camera=observation.images.top)`
- `emitted_at(t, role=assistant, tool_name=say)`
-
-Exact event matching has no tolerance window, so writers must stamp event rows with frame timestamps from the parquet data.
-
-### View-dependent resolution
-
-For view-dependent styles (`vqa` and `trace`), the resolver gains a
-`camera=` filter parallel to `role=` and `tool_name=`. Datasets with multiple
-cameras typically emit one (`vqa`, `user`) + (`vqa`, `assistant`) pair per
-camera at the same timestamp; without `camera=`, those resolvers see two
-matches and raise an ambiguity error. Recipes consume each camera through its
-own binding plus a matching image block, e.g.
-
-```yaml
-ask_vqa_top:
-  bindings:
-    vqa_query: "emitted_at(t, style=vqa, role=user, camera=observation.images.top)"
-    vqa: "emitted_at(t, style=vqa, role=assistant, camera=observation.images.top)"
-  messages:
-    - role: user
-      stream: high_level
-      if_present: vqa_query
-      content:
-        - { type: image, feature: observation.images.top }
-        - { type: text, text: "${vqa_query}" }
-    - {
-        role: assistant,
-        content: "${vqa}",
-        stream: high_level,
-        target: true,
-        if_present: vqa,
-      }
-```
-
-Add one such sub-recipe per camera the dataset records.
-
-## Layer 3 — training format
-
-Rendered samples use HF-style chat messages plus LeRobot sidecars:
-
-```python
-sample["messages"]
-sample["message_streams"]
-sample["target_message_indices"]
-```
-
-The renderer does not apply a tokenizer chat template. Policy processors decide how to serialize the messages for their backbone, which keeps the same dataset usable across SmolVLA, Pi0.5, and any future VLM that expects OpenAI-style chat messages.
-
-## Graceful absence
-
-If both language columns are missing, `None`, or empty, `RenderMessagesStep` is a no-op.
-If an event-scoped branch is selected on a frame without the required event row, rendering returns `None`, allowing a loader to retry another sample.
--- a/docs/source/rebot_b601.mdx
+++ b/docs/source/rebot_b601.mdx
@@ -1,186 +0,0 @@
-# reBot B601-DM
-
-[reBot B601-DM](https://wiki.seeedstudio.com/rebot_arm_b601_dm_lerobot/) is an open-source, low-cost robot arm from Seeed Studio for embodied-AI and imitation learning. It comes as a **follower** arm (the `B601-DM`, a 6-DOF arm plus gripper driven by Damiao CAN motors) and a **leader** arm (the `StarArm102` / `reBot Arm 102`, driven by FashionStar UART smart servos) used to teleoperate it.
-
-This page covers **calibration** and **teleoperation** for both single-arm and bimanual (dual-arm) setups.
-
-<div style="display: flex; align-items: center; gap: 10px;">
-  <img
-    src="https://files.seeedstudio.com/wiki/robotics/projects/lerobot/b601dm_zeroposition.jpg"
-    alt="reBot B601-DM follower arm at its zero position"
-    width="48%"
-  />
-  <img
-    src="https://files.seeedstudio.com/wiki/robotics/projects/lerobot/102_zeroposition.jpg"
-    alt="reBot Arm 102 leader arm at its zero position"
-    width="48%"
-  />
-</div>
-
-_Left: the B601-DM follower at its zero position. Right: the reBot Arm 102 leader at its zero position. Images courtesy of [Seeed Studio](https://wiki.seeedstudio.com/rebot_arm_b601_dm_lerobot/)._
-
-## Install LeRobot 🤗
-
-Follow our [Installation Guide](./installation), then install the reBot support:
-
-```bash
-pip install -e ".[rebot]"
-```
-
-This pulls in `motorbridge` (CAN motor control for the B601-DM follower) and `motorbridge-smart-servo` (FashionStar UART servos for the reBot Arm 102 leader).
-
-## Registered device types
-
-| Type                     | Kind                                         |
-| ------------------------ | -------------------------------------------- |
-| `rebot_b601_follower`    | single-arm B601-DM follower robot            |
-| `bi_rebot_b601_follower` | bimanual (dual-arm) follower robot           |
-| `rebot_102_leader`       | single-arm reBot Arm 102 leader teleoperator |
-| `bi_rebot_102_leader`    | bimanual (dual-arm) leader teleoperator      |
-
-The bimanual types compose two single-arm instances and namespace each arm's
-observation/action keys with a `left_` / `right_` prefix. Per-arm settings are
-passed through nested `left_arm_config.*` / `right_arm_config.*` arguments.
-
-## Find the USB ports
-
-For each device, find the USB port associated with its motor bus using:
-
-```bash
-lerobot-find-port
-```
-
-<Tip warning={true}>
-  On Linux, remove `brltty` (`sudo apt remove brltty`) so it does not hold the
-  leader's USB serial port. You may also need to grant access to the serial
-  devices: `sudo chmod 666 /dev/ttyACM* /dev/ttyUSB*`.
-</Tip>
-
-## Calibration
-
-Neither arm stores a persistent hardware calibration: every time it connects, the motors are re-zeroed against the pose the arm is physically holding. Calibration simply records that zero pose. When prompted, **manually move the arm to its zero position** (the default sit-down pose shown above, gripper fully closed) and press <kbd>ENTER</kbd>.
-
-### Follower (B601-DM)
-
-<hfoptions id="calibrate-follower">
-<hfoption id="Single arm">
-
-```bash
-lerobot-calibrate \
-    --robot.type=rebot_b601_follower \
-    --robot.port=/dev/ttyACM0 \
-    --robot.id=follower \
-    --robot.can_adapter=damiao
-```
-
-</hfoption>
-<hfoption id="Dual arm">
-
-Connect the bimanual follower; calibration runs for the left arm, then the right arm.
-
-```bash
-lerobot-calibrate \
-    --robot.type=bi_rebot_b601_follower \
-    --robot.id=bi_follower \
-    --robot.left_arm_config.port=/dev/ttyACM0 \
-    --robot.left_arm_config.can_adapter=damiao \
-    --robot.right_arm_config.port=/dev/ttyACM1 \
-    --robot.right_arm_config.can_adapter=damiao
-```
-
-Per-arm calibration files are saved with `_left` / `_right` suffixes on the id.
-
-</hfoption>
-</hfoptions>
-
-### Leader (reBot Arm 102)
-
-<hfoptions id="calibrate-leader">
-<hfoption id="Single arm">
-
-```bash
-lerobot-calibrate \
-    --teleop.type=rebot_102_leader \
-    --teleop.port=/dev/ttyUSB0 \
-    --teleop.id=leader
-```
-
-</hfoption>
-<hfoption id="Dual arm">
-
-```bash
-lerobot-calibrate \
-    --teleop.type=bi_rebot_102_leader \
-    --teleop.id=bi_leader \
-    --teleop.left_arm_config.port=/dev/ttyUSB0 \
-    --teleop.right_arm_config.port=/dev/ttyUSB1
-```
-
-</hfoption>
-</hfoptions>
-
-## Teleoperation
-
-Once both arms are calibrated, drive the follower with the leader. The follower talks to its CAN bus through a Damiao serial bridge (`can_adapter=damiao`, the default) or a SocketCAN adapter (`can_adapter=socketcan`). See the [OpenArm page](./openarm) for more details on the SocketCAN adapter configuration.
-
-<hfoptions id="teleoperate">
-<hfoption id="Single arm">
-
-```bash
-lerobot-teleoperate \
-    --robot.type=rebot_b601_follower \
-    --robot.port=/dev/ttyACM0 \
-    --robot.id=follower \
-    --robot.can_adapter=damiao \
-    --teleop.type=rebot_102_leader \
-    --teleop.port=/dev/ttyUSB0 \
-    --teleop.id=leader
-```
-
-</hfoption>
-<hfoption id="Dual arm">
-
-The bimanual leader and follower reuse the single-arm classes; each arm is
-configured through nested `left_arm_config.*` / `right_arm_config.*` arguments,
-so a bimanual reBot Arm 102 leader drives a bimanual B601-DM follower.
-
-```bash
-lerobot-teleoperate \
-    --robot.type=bi_rebot_b601_follower \
-    --robot.id=bi_follower \
-    --robot.left_arm_config.port=/dev/ttyACM0 \
-    --robot.left_arm_config.can_adapter=damiao \
-    --robot.right_arm_config.port=/dev/ttyACM1 \
-    --robot.right_arm_config.can_adapter=damiao \
-    --teleop.type=bi_rebot_102_leader \
-    --teleop.id=bi_leader \
-    --teleop.left_arm_config.port=/dev/ttyUSB0 \
-    --teleop.right_arm_config.port=/dev/ttyUSB1
-```
-
-</hfoption>
-</hfoptions>
-
-<Tip>
-  The leader and follower share the same joint names (`shoulder_pan,
-  shoulder_lift, elbow_flex, wrist_flex, wrist_yaw, wrist_roll, gripper`), so
-  leader actions map directly onto the follower.
-</Tip>
-
-If the motion of a joint is reversed, flip its sign in the leader's `joint_directions` (the gripper also carries a scale to widen its range to the follower):
-
-```bash
-lerobot-teleoperate \
-    --robot.type=rebot_b601_follower \
-    --robot.port=/dev/ttyACM0 \
-    --robot.can_adapter=damiao \
-    --teleop.type=rebot_102_leader \
-    --teleop.port=/dev/ttyUSB0 \
-    --teleop.joint_directions='{"shoulder_pan":-1,"shoulder_lift":-1,"elbow_flex":1,"wrist_flex":1,"wrist_yaw":1,"wrist_roll":-1,"gripper":-6}'
-```
-
-## Recording datasets
-
-Swap `lerobot-teleoperate` for `lerobot-record` (with the same `--robot.*` / `--teleop.*` arguments, plus `--dataset.*`) to record demonstrations for training. See [Imitation Learning for Robots](./il_robots) for the full workflow.
-
-For hardware assembly and wiring, see the [Seeed Studio reBot wiki](https://wiki.seeedstudio.com/rebot_arm_b601_dm_lerobot/).
--- a/docs/source/tools.mdx
+++ b/docs/source/tools.mdx
@@ -1,210 +0,0 @@
-# Tools
-
-LeRobot v3.1 supports **tool calls** in policies — assistant messages can
-emit structured invocations like `say(text="OK, starting now")` that the
-runtime dispatches to a real implementation (TTS, controller, logger, …).
-
-This page covers:
-
-1. Where the tool catalog lives.
-2. How the annotation pipeline produces tool-call atoms.
-3. How to add your own tool.
-
-## Where tools are declared
-
-Two layers.
-
-**The catalog** — a list of OpenAI-style function schemas — lives at
-`meta/info.json["tools"]` on each dataset. Example:
-
-```json
-{
-  "features": { "...": "..." },
-  "tools": [
-    {
-      "type": "function",
-      "function": {
-        "name": "say",
-        "description": "Speak a short utterance to the user via the TTS executor.",
-        "parameters": {
-          "type": "object",
-          "properties": {
-            "text": {
-              "type": "string",
-              "description": "The verbatim text to speak."
-            }
-          },
-          "required": ["text"]
-        }
-      }
-    }
-  ]
-}
-```
-
-Read it via the dataset metadata accessor:
-
-```python
-from lerobot.datasets.dataset_metadata import LeRobotDatasetMetadata
-
-meta = LeRobotDatasetMetadata(repo_id="pepijn/super_poulain_final_annotations")
-tools = meta.tools     # list[dict] — OpenAI tool schemas
-```
-
-If the dataset's `info.json` doesn't declare any tools, `meta.tools`
-returns `DEFAULT_TOOLS` from `lerobot.datasets.language` — currently a
-single-entry list with the canonical `say` schema. So unannotated
-datasets and chat-template consumers keep working without any
-configuration:
-
-```python
-prompt_str = tokenizer.apply_chat_template(
-    sample["messages"],
-    tools=meta.tools,                 # works either way
-    add_generation_prompt=False,
-    tokenize=False,
-)
-```
-
-**The implementations** — runnable Python — will live under
-`src/lerobot/tools/`, one file per tool. The runtime dispatcher and
-the canonical `say` implementation (wrapping Kyutai's pocket-tts) are
-not part of the catalog layer described here; today this layer ships
-only the schema storage and the `DEFAULT_TOOLS` fallback constant.
-
-## Per-row tool _invocations_
-
-The catalog above describes _what can be called_. The actual _call_ — the
-function name plus the argument values — is stored per-row, on the
-assistant atoms in `language_events`:
-
-```python
-{
-  "role": "assistant",
-  "content": null,
-  "style": null,
-  "timestamp": 12.4,
-  "camera": null,
-  "tool_calls": [
-    { "type": "function",
-      "function": { "name": "say", "arguments": { "text": "On it." } } }
-  ]
-}
-```
-
-Recipes splice these into rendered messages via `tool_calls_from`:
-
-```yaml
-user_interjection_response:
-  bindings:
-    speech: "emitted_at(t, role=assistant, tool_name=say)"
-  messages:
-    - { role: user, content: "${task}", stream: high_level }
-    - {
-        role: assistant,
-        content: "${current_plan}",
-        stream: high_level,
-        target: true,
-        tool_calls_from: speech,
-      }
-```
-
-The model's training target is one assistant turn that carries both the
-plan text _and_ the `say` tool call. At inference, the runtime parses
-the generated text back into structured `tool_calls` and dispatches to
-the matching implementation.
-
-## How to add your own tool
-
-> **Note:** Steps 2 and 3 below describe the runtime layer
-> (`src/lerobot/tools/`, the `Tool` protocol, `TOOL_REGISTRY`,
-> `get_tools(meta)`) which is not part of the catalog layer shipped
-> today — those modules don't yet exist in the tree. Step 1 alone is
-> enough to make the tool visible to the chat template via
-> `meta.tools` so the model can learn to _generate_ the call;
-> executing the call at inference requires the runtime layer.
-
-Three steps. Concrete example: a `record_observation` tool the policy
-can call to capture an extra observation outside the regular control
-loop.
-
-### Step 1 — declare the schema
-
-Add an entry under `meta/info.json["tools"]`. Either edit the file
-directly on disk _before_ running the annotation pipeline (it'll be
-preserved) or hand it to `lerobot-annotate` via a config flag.
-
-```json
-{
-  "tools": [
-    { "type": "function", "function": { "name": "say", "...": "..." } },
-    {
-      "type": "function",
-      "function": {
-        "name": "record_observation",
-        "description": "Capture a high-resolution still image for the user.",
-        "parameters": {
-          "type": "object",
-          "properties": {
-            "label": {
-              "type": "string",
-              "description": "Short label for the saved image."
-            }
-          },
-          "required": ["label"]
-        }
-      }
-    }
-  ]
-}
-```
-
-The schema follows OpenAI's function-calling convention exactly, so the
-chat template can render it natively.
-
-### Step 2 — implement the call
-
-Create `src/lerobot/tools/record_observation.py`:
-
-```python
-from .base import Tool
-from typing import Any
-
-RECORD_OBSERVATION_SCHEMA: dict[str, Any] = { "...": "..." }   # mirrors the JSON above
-
-
-class RecordObservationTool:
-    name = "record_observation"
-    schema = RECORD_OBSERVATION_SCHEMA
-
-    def __init__(self, schema: dict | None = None, output_dir: str = "."):
-        self.output_dir = output_dir
-
-    def call(self, arguments: dict) -> str:
-        label = arguments["label"]
-        # ... save the latest camera frame to <output_dir>/<label>.png ...
-        return f"saved {label}.png"
-```
-
-One file per tool keeps dependencies isolated — `record_observation`
-might pull `pillow`, while `say` pulls `pocket-tts`. Users installing
-only the tools they need avoid heavy transitive deps.
-
-### Step 3 — register it
-
-Add to `src/lerobot/tools/registry.py`:
-
-```python
-from .record_observation import RecordObservationTool
-
-TOOL_REGISTRY["record_observation"] = RecordObservationTool
-```
-
-That's it. At runtime `get_tools(meta)` looks up each schema in
-`meta.tools`, instantiates the matching registered class, and returns
-a name → instance dict the dispatcher can route into.
-
-If you want to use a tool _without_ writing an implementation (e.g. for
-training-time chat-template formatting only), step 1 alone is enough —
-the model still learns to _generate_ the call. Steps 2 and 3 are only
-needed to actually _execute_ it at inference.
--- a/docs/source/topreward.mdx
+++ b/docs/source/topreward.mdx
@@ -1,177 +0,0 @@
-# TOPReward
-
-TOPReward is a **zero-shot reward model** that extracts token log-probabilities from an off-the-shelf vision-language model (VLM) as a robotic reward signal. Given a video trajectory and a task instruction, it returns the VLM's log-likelihood that the instruction is true — no fine-tuning required.
-
-**Paper**: [TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics](https://arxiv.org/abs/2602.19313)
-**Project**: [topreward.github.io](https://topreward.github.io/webpage/)
-**Original code**: [github.com/TOPReward/TOPReward](https://github.com/TOPReward/TOPReward)
-**Default backbone**: [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct)
-
-## Overview
-
-TOPReward asks a generic VLM how likely a task instruction is, **conditioned on the video** of a robot trying to complete that task. Concretely, given:
-
- A trajectory video (a sequence of frames).
- A task instruction (e.g. _"open the drawer"_).
-
-it builds a chat prompt of the form
-
-```text
-<video>
-"The above video shows a robot manipulation trajectory that completes the
- following task: <instruction> Decide whether the above statement is True
- or not. The answer is: True"
-```
-
-forwards it through the VLM, label-masks everything except the very last token, and reads back the log-probability of that token — by default the literal `"True"` that closes the suffix template. The resulting `log P("True" | video + prompt + instruction)` is the reward.
-
-Because the method only depends on a frozen VLM, TOPReward is **zero-shot**: there are no fine-tuned weights to host. The "model" in LeRobot is a small wrapper around `transformers`' `Qwen3VLForConditionalGeneration` plus the label-masking logic. The processor owns the tokeniser and builds the full chat prompt (EO-1/Robometer pattern).
-
-## What the LeRobot integration covers
-
-- Standard `reward_model.type=topreward` configuration through LeRobot.
-- VLM loading via the `transformers` `Qwen3VLForConditionalGeneration` API.
-- Prompt assembly + tokenisation in the processor (matching upstream `QwenClient.compute_instruction_reward`).
-- `compute_reward()` returns one scalar log-prob per sample.
-- LeRobot reward-model save/load — `save_pretrained` writes only `config.json` (the VLM is identified by `vlm_name`).
-- An offline labeling script that writes a `topreward_progress.parquet` (SARM-compatible schema) for RA-BC and overlay.
-
-The current LeRobot port supports the **Qwen3-VL client only**. Other upstream clients (Gemini, OpenAI, Gemma, Molmo) can be added as follow-up extras.
-
-## Installation Requirements
-
-1. Install LeRobot following the [Installation Guide](./installation).
-2. Install the TOPReward optional extra:
-
-```bash
-pip install -e ".[topreward]"
-```
-
-or, with `uv` from a source checkout:
-
-```bash
-uv sync --extra topreward
-```
-
-This pulls in `transformers`. The first time you run TOPReward, Hugging Face will also download the VLM weights from the Hub (~16 GB for Qwen3-VL-8B-Instruct). A GPU is strongly recommended.
-
-## Model Inputs and Outputs
-
-TOPReward expects:
-
-- A trajectory video or sequence of frames.
-- A natural-language task description.
-
-In LeRobot datasets the preprocessor reads:
-
-| Config field              | Default                     | Meaning                                       |
-| ------------------------- | --------------------------- | --------------------------------------------- |
-| `reward_model.image_key`  | `observation.images.top`    | Camera observation used by TOPReward          |
-| `reward_model.task_key`   | `task`                      | Key in complementary data for the task string |
-| `reward_model.max_frames` | `16`                        | Cap on frames per sample                      |
-| `reward_model.fps`        | `2.0`                       | Metadata passed to the Qwen video processor   |
-| `reward_model.vlm_name`   | `Qwen/Qwen3-VL-8B-Instruct` | Hugging Face Hub id of the underlying VLM     |
-
-The model returns:
-
-- `compute_reward(batch)`: one log-probability per sample. Higher = better task-video alignment. When `success_threshold` is finite, returns the binary thresholded value instead.
-
-## Usage
-
-### Load the reward model directly
-
-```python
-from lerobot.rewards.topreward import TOPRewardConfig, TOPRewardModel
-
-cfg = TOPRewardConfig(
-    vlm_name="Qwen/Qwen3-VL-8B-Instruct",
-    device="cuda",
-)
-reward_model = TOPRewardModel(cfg)
-```
-
-### Use the reward factory
-
-```python
-from lerobot.rewards import make_reward_model, make_reward_model_config, make_reward_pre_post_processors
-
-cfg = make_reward_model_config(
-    "topreward",
-    vlm_name="Qwen/Qwen3-VL-8B-Instruct",
-    device="cuda",
-    image_key="observation.images.top",
-)
-reward_model = make_reward_model(cfg)
-preprocessor, postprocessor = make_reward_pre_post_processors(cfg)
-```
-
-The preprocessor tokenises the full prompt (video + prefix + instruction suffix), writes Qwen-VL tensors + `prompt_length` under `observation.topreward.*`. The model reads those tensors, label-masks based on `prompt_length`, and extracts the log-prob reward.
-
-### Offline dataset labeling
-
-Write a `topreward_progress.parquet` for RA-BC training and overlay videos:
-
-```bash
-# Sparse-dense (15 anchors per episode, matches upstream)
-uv run python -m lerobot.rewards.topreward.compute_rabc_weights \
-    --dataset-repo-id lerobot/libero_10_image \
-    --num-samples 15 \
-    --device cuda
-```
-
-Then render the progress overlay for any episode:
-
-```bash
-uv run examples/dataset/create_progress_videos.py \
-    --repo-id lerobot/libero_10_image \
-    --episode 0 \
-    --progress-file topreward_progress.parquet \
-    --gif
-```
-
-## Configuration Notes
-
-### Prompt knobs
-
-The default prompt mirrors the upstream paper:
-
-```text
-prompt_prefix = "The above video shows a robot manipulation trajectory that completes the following task: "
-prompt_suffix_template = "{instruction} Decide whether the above statement is True or not. The answer is: True"
-```
-
-Both are exposed on `TOPRewardConfig` for ablation. The suffix template **must** contain `{instruction}`.
-
-### Chat template
-
-`add_chat_template=True` wraps the full prompt (including instruction) with the tokenizer's chat template before tokenisation. Default is `False`, matching the upstream paper's main experiments.
-
-## Limitations
-
-- The current LeRobot port is **inference-only and zero-shot**; `forward()` is not overridden and `is_trainable` returns `False`.
-- Only the **Qwen3-VL family** is supported; other upstream clients are out of scope.
-- TOPReward inherits the underlying VLM's biases.
-
-## References
-
-- [TOPReward project page](https://topreward.github.io/webpage/)
-- [TOPReward paper](https://arxiv.org/abs/2602.19313)
-- [Original TOPReward code](https://github.com/TOPReward/TOPReward)
-- [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct)
-
-## Citation
-
-```bibtex
-@article{chen2026topreward,
-  title={TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics},
-  author={Chen, Shirui and Harrison, Cole and Lee, Ying-Chun and Yang, Angela Jin and
-          Ren, Zhongzheng and Ratliff, Lillian J and Duan, Jiafei and Fox, Dieter and
-          Krishna, Ranjay},
-  journal={arXiv preprint arXiv:2602.19313},
-  year={2026}
-}
-```
-
-## License
-
-The original TOPReward codebase is MIT-licensed. The LeRobot port follows the LeRobot Apache 2.0 license; the wrapped Qwen3-VL weights are subject to the original Qwen license.
--- a/docs/source/video_encoding_parameters.mdx
+++ b/docs/source/video_encoding_parameters.mdx
@@ -82,7 +82,7 @@ After the first episode of a video stream is encoded, the encoder configuration
        "video.pix_fmt": "yuv420p",
        "video.fps": 30,
        "video.channels": 3,
-        "video.is_depth_map": false,
+        "is_depth_map": false,
        "video.g": 2,
        "video.crf": 30,
        "video.preset": "fast",
@@ -97,7 +97,7 @@ After the first episode of a video stream is encoded, the encoder configuration

 Two sources contribute to the `info` block:

-- **Stream-derived** (read back from the encoded MP4 with PyAV): `video.height`, `video.width`, `video.codec`, `video.pix_fmt`, `video.fps`, `video.channels`, `video.is_depth_map`, plus `audio.*` if an audio stream is present.
+- **Stream-derived** (read back from the encoded MP4 with PyAV): `video.height`, `video.width`, `video.codec`, `video.pix_fmt`, `video.fps`, `video.channels`, `is_depth_map`, plus `audio.*` if an audio stream is present.
 - **Encoder-derived** (taken from `VideoEncoderConfig`): `video.g`, `video.crf`, `video.preset`, `video.fast_decode`, `video.video_backend`, `video.extra_options`.

 <Tip>
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -95,7 +95,7 @@ dependencies = [

 # ── Feature-scoped extras ──────────────────────────────────
 dataset = [
-    "datasets>=4.7.0,<5.0.0",
+    "datasets>=4.0.0,<5.0.0",
    "pandas>=2.0.0,<3.0.0", # NOTE: Transitive dependency of datasets
    "pyarrow>=21.0.0,<30.0.0", # NOTE: Transitive dependency of datasets
    "lerobot[av-dep]",
@@ -151,8 +151,6 @@ pyserial-dep = ["pyserial>=3.5,<4.0"]
 deepdiff-dep = ["deepdiff>=7.0.1,<9.0.0"]
 pynput-dep = ["pynput>=1.7.8,<1.9.0"]
 pyzmq-dep = ["pyzmq>=26.2.1,<28.0.0"]
-motorbridge-dep = ["motorbridge>=0.3.2,<0.4.0"]
-motorbridge-smart-servo-dep = ["motorbridge-smart-servo>=0.0.4,<0.1.0"]

 # Motors
 feetech = ["feetech-servo-sdk>=1.0.0,<2.0.0", "lerobot[pyserial-dep]", "lerobot[deepdiff-dep]"]
@@ -176,9 +174,6 @@ unitree_g1 = [
    "lerobot[pygame-dep]",
 ]
 reachy2 = ["reachy2_sdk>=1.0.15,<1.1.0"]
-# Seeed Studio reBot B601-DM follower (motorbridge / CAN) + StarArm102 / reBot Arm 102
-# leader (motorbridge-smart-servo / FashionStar UART servos).
-rebot = ["lerobot[motorbridge-dep]", "lerobot[motorbridge-smart-servo-dep]"]
 kinematics = ["lerobot[placo-dep]"]
 intelrealsense = [
    "pyrealsense2>=2.55.1.6486,<2.57.0 ; sys_platform != 'darwin'",
@@ -209,7 +204,6 @@ groot = [
    "flash-attn>=2.5.9,<3.0.0 ; sys_platform != 'darwin'"
 ]
 sarm = ["lerobot[transformers-dep]", "pydantic>=2.0.0,<3.0.0", "faker>=33.0.0,<35.0.0", "lerobot[matplotlib-dep]", "lerobot[qwen-vl-utils-dep]"]
-topreward = ["lerobot[transformers-dep]"]
 xvla = ["lerobot[transformers-dep]"]
 eo1 = ["lerobot[transformers-dep]", "lerobot[qwen-vl-utils-dep]"]
 hilserl = ["lerobot[transformers-dep]", "lerobot[dataset]", "gym-hil>=0.1.13,<0.2.0", "lerobot[grpcio-dep]", "lerobot[placo-dep]"]
@@ -266,7 +260,6 @@ all = [
    "lerobot[lekiwi]",
    "lerobot[openarms]",
    "lerobot[reachy2]",
-    "lerobot[rebot]",
    "lerobot[kinematics]",
    "lerobot[intelrealsense]",
    "lerobot[diffusion]",
@@ -287,7 +280,6 @@ all = [
    "lerobot[libero]; sys_platform == 'linux'",
    "lerobot[metaworld]",
    "lerobot[sarm]",
-    "lerobot[topreward]",
    "lerobot[peft]",
    # "lerobot[unitree_g1]", TODO: Unitree requires specific installation instructions for unitree_sdk2
 ]
--- a/src/lerobot/cameras/opencv/camera_opencv.py
+++ b/src/lerobot/cameras/opencv/camera_opencv.py
@@ -199,13 +199,12 @@ class OpenCVCamera(Camera):
            DeviceNotConnectedError: If the camera is not connected.
        """

+        # Set FOURCC first (if specified) as it can affect available FPS/resolution options
+        if self.config.fourcc is not None:
+            self._validate_fourcc()
        if self.videocapture is None:
            raise DeviceNotConnectedError(f"{self} videocapture is not initialized")

-        set_fourcc_after_size_and_fps = platform.system() == "Windows"
-        if self.config.fourcc is not None and not set_fourcc_after_size_and_fps:
-            self._validate_fourcc()
-
        default_width = int(round(self.videocapture.get(cv2.CAP_PROP_FRAME_WIDTH)))
        default_height = int(round(self.videocapture.get(cv2.CAP_PROP_FRAME_HEIGHT)))

@@ -223,11 +222,6 @@ class OpenCVCamera(Camera):
        else:
            self._validate_fps()

-        if self.config.fourcc is not None and set_fourcc_after_size_and_fps:
-            # On Windows with DSHOW, changing the resolution can silently override the FOURCC setting.
-            # Set FOURCC last to make sure the requested pixel format is actually enforced.
-            self._validate_fourcc()
-
    def _validate_fps(self) -> None:
        """Validates and sets the camera's frames per second (FPS)."""

@@ -436,7 +430,7 @@ class OpenCVCamera(Camera):
        Internal loop run by the background thread for asynchronous reading.

        On each iteration:
-        1. Reads a color frame
+        1. Reads a color frame (blocking call)
        2. Stores result in latest_frame and updates timestamp (thread-safe)
        3. Sets new_frame_event to notify listeners

@@ -445,8 +439,9 @@ class OpenCVCamera(Camera):
        if self.stop_event is None:
            raise RuntimeError(f"{self}: stop_event is not initialized before starting read loop.")

+        stop_event = self.stop_event
        failure_count = 0
-        while not self.stop_event.is_set():
+        while not stop_event.is_set():
            try:
                raw_frame = self._read_from_hardware()
                processed_frame = self._postprocess_image(raw_frame)
@@ -484,6 +479,8 @@ class OpenCVCamera(Camera):

        if self.thread is not None and self.thread.is_alive():
            self.thread.join(timeout=2.0)
+            if self.thread.is_alive():
+                logger.warning(f"{self} read thread did not terminate within timeout.")

        self.thread = None
        self.stop_event = None
--- a/src/lerobot/cameras/realsense/camera_realsense.py
+++ b/src/lerobot/cameras/realsense/camera_realsense.py
@@ -332,8 +332,8 @@ class RealSenseCamera(Camera):
        from the camera hardware via the RealSense pipeline.

        Returns:
-            np.ndarray: The depth map as a NumPy array (height, width)
-                  of type `np.uint16` (raw depth values in millimeters) and rotation.
+            np.ndarray: The depth map as a NumPy array (height, width, 1)
+                  of type `np.uint16` (raw depth values in millimeters).

        Raises:
            DeviceNotConnectedError: If the camera is not connected.
@@ -465,8 +465,8 @@ class RealSenseCamera(Camera):
        Internal loop run by the background thread for asynchronous reading.

        On each iteration:
-        1. Reads a color frame with 500ms timeout
-        2. Stores result in latest_frame and updates timestamp (thread-safe)
+        1. Reads a color/depth frame (blocking call with 10s timeout)
+        2. Stores result in latest_color_frame/latest_depth_frame and updates timestamp (thread-safe)
        3. Sets new_frame_event to notify listeners

        Stops on DeviceNotConnectedError, logs other errors and continues.
@@ -474,8 +474,9 @@ class RealSenseCamera(Camera):
        if self.stop_event is None:
            raise RuntimeError(f"{self}: stop_event is not initialized before starting read loop.")

+        stop_event = self.stop_event
        failure_count = 0
-        while not self.stop_event.is_set():
+        while not stop_event.is_set():
            try:
                frame = self._read_from_hardware()
                color_frame_raw = frame.get_color_frame()
@@ -486,6 +487,8 @@ class RealSenseCamera(Camera):
                    depth_frame_raw = frame.get_depth_frame()
                    depth_frame = np.asanyarray(depth_frame_raw.get_data())
                    processed_depth_frame = self._postprocess_image(depth_frame, depth_frame=True)
+                    if processed_depth_frame.ndim == 2:  # (H, W) -> (H, W, 1)
+                        processed_depth_frame = processed_depth_frame[..., np.newaxis]

                capture_time = time.perf_counter()

@@ -522,6 +525,8 @@ class RealSenseCamera(Camera):

        if self.thread is not None and self.thread.is_alive():
            self.thread.join(timeout=2.0)
+            if self.thread.is_alive():  # pragma: no cover
+                logger.warning(f"{self} read thread did not terminate within timeout.")

        self.thread = None
        self.stop_event = None
@@ -532,7 +537,6 @@ class RealSenseCamera(Camera):
            self.latest_timestamp = None
            self.new_frame_event.clear()

-    # NOTE(Steven): Missing implementation for depth for now
    @check_if_not_connected
    def async_read(self, timeout_ms: float = 200) -> NDArray[Any]:
        """
@@ -575,7 +579,6 @@ class RealSenseCamera(Camera):

        return frame

-    # NOTE(Steven): Missing implementation for depth for now
    @check_if_not_connected
    def read_latest(self, max_age_ms: int = 500) -> NDArray[Any]:
        """Return the most recent (color) frame captured immediately (Peeking).
@@ -611,6 +614,71 @@ class RealSenseCamera(Camera):

        return frame

+    @check_if_not_connected
+    def async_read_depth(self, timeout_ms: float = 200) -> NDArray[Any]:
+        """Read the latest depth frame asynchronously.
+
+        Mirrors :meth:`async_read` but returns the depth stream rather than the
+        color stream. Output is ``np.uint16`` of shape ``(H, W, 1)`` in millimeters.
+
+        Raises:
+            DeviceNotConnectedError: If the camera is not connected.
+            RuntimeError: If ``use_depth`` is ``False`` for this camera, or if
+                the background read thread is not running.
+            TimeoutError: If no frame becomes available within ``timeout_ms``.
+        """
+        if not self.use_depth:
+            raise RuntimeError(f"{self}: cannot read depth — camera was configured with use_depth=False.")
+
+        if self.thread is None or not self.thread.is_alive():
+            raise RuntimeError(f"{self} read thread is not running.")
+
+        if not self.new_frame_event.wait(timeout=timeout_ms / 1000.0):
+            raise TimeoutError(f"Timed out waiting for depth frame from camera {self} after {timeout_ms} ms.")
+
+        with self.frame_lock:
+            depth_frame = self.latest_depth_frame
+            self.new_frame_event.clear()
+
+        if depth_frame is None:
+            raise RuntimeError(f"Internal error: Event set but no depth frame available for {self}.")
+
+        return depth_frame
+
+    @check_if_not_connected
+    def read_latest_depth(self, max_age_ms: int = 500) -> NDArray[Any]:
+        """Return the most recent depth frame (peeking).
+
+        Non-blocking counterpart of :meth:`read_latest` for the depth stream.
+        Output is ``np.uint16`` of shape ``(H, W, 1)`` in millimeters.
+
+        Raises:
+            DeviceNotConnectedError: If the camera is not connected.
+            RuntimeError: If ``use_depth`` is ``False`` for this camera, if the background
+                read thread is not running, or if no depth frame has been captured yet.
+            TimeoutError: If the latest depth frame is older than ``max_age_ms``.
+        """
+        if not self.use_depth:
+            raise RuntimeError(f"{self}: cannot read depth — camera was configured with use_depth=False.")
+
+        if self.thread is None or not self.thread.is_alive():
+            raise RuntimeError(f"{self} read thread is not running.")
+
+        with self.frame_lock:
+            depth_frame = self.latest_depth_frame
+            timestamp = self.latest_timestamp
+
+        if depth_frame is None or timestamp is None:
+            raise RuntimeError(f"{self} has not captured any depth frames yet.")
+
+        age_ms = (time.perf_counter() - timestamp) * 1e3
+        if age_ms > max_age_ms:
+            raise TimeoutError(
+                f"{self} latest depth frame is too old: {age_ms:.1f} ms (max allowed: {max_age_ms} ms)."
+            )
+
+        return depth_frame
+
    def disconnect(self) -> None:
        """
        Disconnects from the camera, stops the pipeline, and cleans up resources.
--- a/src/lerobot/cameras/zmq/camera_zmq.py
+++ b/src/lerobot/cameras/zmq/camera_zmq.py
@@ -249,8 +249,9 @@ class ZMQCamera(Camera):
        if self.stop_event is None:
            raise RuntimeError(f"{self}: stop_event is not initialized.")

+        stop_event = self.stop_event
        failure_count = 0
-        while not self.stop_event.is_set():
+        while not stop_event.is_set():
            try:
                frame = self._read_from_hardware()
                capture_time = time.perf_counter()
@@ -292,6 +293,8 @@ class ZMQCamera(Camera):

        if self.thread is not None and self.thread.is_alive():
            self.thread.join(timeout=2.0)
+            if self.thread.is_alive():
+                logger.warning(f"{self} read thread did not terminate within timeout.")

        self.thread = None
        self.stop_event = None
--- a/src/lerobot/configs/init.py
+++ b/src/lerobot/configs/init.py
@@ -24,7 +24,6 @@ Import them directly: ``from lerobot.configs.train import TrainPipelineConfig``
 from .dataset import DatasetRecordConfig
 from .default import DatasetConfig, EvalConfig, PeftConfig, WandBConfig
 from .policies import PreTrainedConfig
-from .recipe import MessageTurn, TrainingRecipe, load_recipe
 from .types import (
    FeatureType,
    NormalizationMode,
@@ -35,8 +34,10 @@ from .types import (
 from .video import (
    VALID_VIDEO_CODECS,
    VIDEO_ENCODER_INFO_KEYS,
+    DepthEncoderConfig,
    VideoEncoderConfig,
    camera_encoder_defaults,
+    depth_encoder_defaults,
 )

 __all__ = [
@@ -50,15 +51,14 @@ __all__ = [
    "DatasetRecordConfig",
    "DatasetConfig",
    "EvalConfig",
-    "MessageTurn",
    "PeftConfig",
    "PreTrainedConfig",
-    "TrainingRecipe",
    "WandBConfig",
-    "load_recipe",
    "VideoEncoderConfig",
+    "DepthEncoderConfig",
    # Defaults
    "camera_encoder_defaults",
+    "depth_encoder_defaults",
    # Constants
    "VALID_VIDEO_CODECS",
    "VIDEO_ENCODER_INFO_KEYS",
--- a/src/lerobot/configs/dataset.py
+++ b/src/lerobot/configs/dataset.py
@@ -18,7 +18,7 @@ from dataclasses import dataclass, field
 from datetime import datetime
 from pathlib import Path

-from .video import VideoEncoderConfig, camera_encoder_defaults
+from .video import DepthEncoderConfig, VideoEncoderConfig, camera_encoder_defaults, depth_encoder_defaults


@dataclass
@@ -60,6 +60,8 @@ class DatasetRecordConfig:
    # Video encoder settings for camera MP4s (codec, quality, GOP, etc.). Tuned via CLI nested keys,
    # e.g. ``--dataset.camera_encoder.vcodec=h264`` (see ``VideoEncoderConfig``).
    camera_encoder: VideoEncoderConfig = field(default_factory=camera_encoder_defaults)
+    # Video encoder settings for depth-map MP4s (codec, quality, GOP, etc.). Tuned via CLI nested keys.
+    depth_encoder: DepthEncoderConfig = field(default_factory=depth_encoder_defaults)
    # Enable streaming video encoding: encode frames in real-time during capture instead
    # of writing PNG images first. Makes save_episode() near-instant. More info in the documentation: https://huggingface.co/docs/lerobot/streaming_video_encoding
    streaming_encoding: bool = False
--- a/src/lerobot/configs/recipe.py
+++ b/src/lerobot/configs/recipe.py
@@ -1,206 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import annotations
-
-import re
-from dataclasses import dataclass
-from pathlib import Path
-from typing import Any, Literal, get_args
-
-MessageRole = Literal["user", "assistant", "system", "tool"]
-MessageStream = Literal["high_level", "low_level"]
-
-DEFAULT_BINDINGS = {
-    "subtask": "active_at(t, style=subtask)",
-    "memory": "active_at(t, style=memory)",
-    "plan": "active_at(t, style=plan)",
-    "speech": "emitted_at(t, role=assistant, tool_name=say)",
-    "interjection": "emitted_at(t, style=interjection)",
-    "vqa": "emitted_at(t, style=vqa, role=assistant)",
-    "vqa_query": "emitted_at(t, style=vqa, role=user)",
-}
-
-PLACEHOLDER_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")
-"""``${name}`` placeholder pattern used by both recipe binding-reference
-discovery (here) and rendered-message substitution (in ``language_render``)."""
-
-_VALID_ROLES = frozenset(get_args(MessageRole))
-_VALID_STREAMS = frozenset(get_args(MessageStream))
-
-
-@dataclass
-class MessageTurn:
-    """A single chat-style turn in a recipe template.
-
-    ``content`` may be a plain string, a list of HF-style multimodal blocks, or
-    ``None`` when ``tool_calls_from`` supplies tool-call payloads instead.
-    ``stream`` tags the turn for downstream filtering, ``target`` flags it as a
-    training target, and ``if_present`` skips the turn when the named binding
-    resolves to ``None``.
-    """
-
-    role: MessageRole
-    content: str | list[dict[str, Any]] | None = None
-    stream: MessageStream | None = None
-    target: bool = False
-    if_present: str | None = None
-    tool_calls_from: str | None = None
-
-    def __post_init__(self) -> None:
-        """Validate role, stream, and content after dataclass construction."""
-        if self.role not in _VALID_ROLES:
-            raise ValueError(f"Unsupported message role: {self.role!r}")
-        # ``stream`` is typed Optional only so the dataclass can keep its
-        # field ordering, but recipes must always tag every turn with a
-        # stream — the renderer's ``_validate_rendered`` would reject
-        # ``None`` later on. Fail at construction so the bad recipe is
-        # caught at YAML load time rather than at the first sample.
-        if self.stream is None:
-            raise ValueError(
-                f"MessageTurn(role={self.role!r}) is missing a stream — "
-                f"every turn must declare one of {sorted(_VALID_STREAMS)}."
-            )
-        if self.stream not in _VALID_STREAMS:
-            raise ValueError(f"Unsupported message stream: {self.stream!r}")
-        if self.content is None and self.tool_calls_from is None:
-            raise ValueError("MessageTurn.content is required unless tool_calls_from is set.")
-        if self.content is not None and not isinstance(self.content, (str, list)):
-            raise TypeError("MessageTurn.content must be a string, a list of HF-style blocks, or None.")
-        if isinstance(self.content, list):
-            for block in self.content:
-                if not isinstance(block, dict) or "type" not in block:
-                    raise ValueError(
-                        "Multimodal content blocks must be HF-style dictionaries with a type key."
-                    )
-
-    @classmethod
-    def from_dict(cls, data: dict[str, Any]) -> MessageTurn:
-        """Construct a :class:`MessageTurn` from a plain dictionary."""
-        return cls(**data)
-
-
-@dataclass
-class TrainingRecipe:
-    """A recipe describing how to render training samples from language rows.
-
-    A recipe is either a *message recipe* (``messages`` plus optional
-    ``bindings``) or a *blend recipe* (``blend`` mapping names to weighted
-    sub-recipes). ``weight`` is only meaningful inside a blend.
-    """
-
-    messages: list[MessageTurn] | None = None
-    bindings: dict[str, str] | None = None
-    blend: dict[str, TrainingRecipe] | None = None
-    weight: float | None = None
-
-    def __post_init__(self) -> None:
-        """Validate that exactly one of ``messages`` or ``blend`` is set."""
-        if self.messages is not None and self.blend is not None:
-            raise ValueError("TrainingRecipe must set only one of messages or blend.")
-        if self.messages is None and self.blend is None:
-            raise ValueError("TrainingRecipe must set one of messages or blend.")
-
-        if self.messages is not None:
-            self._validate_message_recipe()
-        if self.blend is not None:
-            self._validate_blend_recipe()
-
-    @classmethod
-    def from_dict(cls, data: dict[str, Any]) -> TrainingRecipe:
-        """Construct a :class:`TrainingRecipe` from a nested dictionary."""
-        data = dict(data)
-        if data.get("messages") is not None:
-            data["messages"] = [
-                turn if isinstance(turn, MessageTurn) else MessageTurn.from_dict(turn)
-                for turn in data["messages"]
-            ]
-        if data.get("blend") is not None:
-            data["blend"] = {
-                name: recipe if isinstance(recipe, TrainingRecipe) else cls.from_dict(recipe)
-                for name, recipe in data["blend"].items()
-            }
-        return cls(**data)
-
-    @classmethod
-    def from_yaml(cls, path: str | Path) -> TrainingRecipe:
-        """Load a :class:`TrainingRecipe` from a YAML file at ``path``."""
-        import yaml  # type: ignore[import-untyped]
-
-        with open(path) as f:
-            data = yaml.safe_load(f)
-        if not isinstance(data, dict):
-            raise ValueError(f"Recipe YAML must contain a mapping at the top level: {path}")
-        return cls.from_dict(data)
-
-    def _validate_message_recipe(self) -> None:
-        """Ensure every templated binding is known and at least one turn is a target."""
-        assert self.messages is not None
-        known_bindings = set(DEFAULT_BINDINGS) | set(self.bindings or {}) | {"task"}
-
-        for turn in self.messages:
-            missing = self._referenced_bindings(turn) - known_bindings
-            if missing:
-                raise ValueError(f"MessageTurn references unknown binding(s): {sorted(missing)}")
-
-        if not any(turn.target for turn in self.messages):
-            raise ValueError("Message recipes must contain at least one target turn.")
-
-    def _validate_blend_recipe(self) -> None:
-        """Ensure each blend component is a non-empty, weighted message recipe."""
-        assert self.blend is not None
-        if not self.blend:
-            raise ValueError("Blend recipes must contain at least one component.")
-
-        for name, recipe in self.blend.items():
-            if recipe.blend is not None:
-                raise ValueError(f"Blend component {name!r} cannot itself define a blend.")
-            if recipe.messages is None:
-                raise ValueError(f"Blend component {name!r} must define messages.")
-            if recipe.weight is None:
-                raise ValueError(f"Blend component {name!r} must define weight.")
-            if recipe.weight <= 0:
-                raise ValueError(f"Blend component {name!r} must have a positive weight.")
-
-    def _referenced_bindings(self, turn: MessageTurn) -> set[str]:
-        """Return the binding names that ``turn`` references via placeholders or attributes."""
-        names: set[str] = set()
-        if turn.if_present is not None:
-            names.add(turn.if_present)
-        if turn.tool_calls_from is not None:
-            names.add(turn.tool_calls_from)
-        names.update(_placeholders_in_content(turn.content))
-        return names
-
-
-def _placeholders_in_content(content: str | list[dict[str, Any]] | None) -> set[str]:
-    """Return the set of ``${name}`` placeholders found anywhere in ``content``."""
-    if content is None:
-        return set()
-    if isinstance(content, str):
-        return set(PLACEHOLDER_RE.findall(content))
-
-    names: set[str] = set()
-    for block in content:
-        for value in block.values():
-            if isinstance(value, str):
-                names.update(PLACEHOLDER_RE.findall(value))
-    return names
-
-
-def load_recipe(path: str | Path) -> TrainingRecipe:
-    """Load a :class:`TrainingRecipe` from a YAML file at ``path``."""
-    return TrainingRecipe.from_yaml(path)
--- a/src/lerobot/configs/video.py
+++ b/src/lerobot/configs/video.py
@@ -19,8 +19,8 @@
 from __future__ import annotations

 import logging
-from dataclasses import dataclass, field
-from typing import Any
+from dataclasses import dataclass, field, fields
+from typing import Any, ClassVar

 from lerobot.utils.import_utils import require_package

@@ -36,11 +36,12 @@ HW_VIDEO_CODECS = [
    "h264_vaapi",  # Linux Intel/AMD
    "h264_qsv",  # Intel Quick Sync
 ]
-VALID_VIDEO_CODECS: frozenset[str] = frozenset({"h264", "hevc", "libsvtav1", "auto", *HW_VIDEO_CODECS})
+VALID_VIDEO_CODECS: frozenset[str] = frozenset(
+    {"h264", "hevc", "libsvtav1", "ffv1", "auto", *HW_VIDEO_CODECS}
+)
 # Aliases for legacy video codec names.
 VIDEO_CODECS_ALIASES: dict[str, str] = {"av1": "libsvtav1"}

-
 LIBSVTAV1_DEFAULT_PRESET: int = 12

 # Keys persisted under ``features[*]["info"]`` as ``video.<name>`` (from :class:`VideoEncoderConfig`).
@@ -52,6 +53,19 @@ VIDEO_ENCODER_INFO_KEYS: frozenset[str] = frozenset(
    f"video.{name}" for name in VIDEO_ENCODER_INFO_FIELD_NAMES
 )

+# Default depth quantization and encoding parameters.
+DEPTH_QUANT_BITS: int = 12
+DEPTH_QMAX: int = (1 << DEPTH_QUANT_BITS) - 1  # 4095
+
+DEFAULT_DEPTH_MIN: float = 0.01
+DEFAULT_DEPTH_MAX: float = 10.0
+DEFAULT_DEPTH_SHIFT: float = 3.5
+DEFAULT_DEPTH_USE_LOG: bool = True
+DEFAULT_DEPTH_PIX_FMT: str = "gray12le"
+
+# Depth-specific tuning fields persisted under ``features[*]["info"]`` as ``video.<name>``.
+DEPTH_ENCODER_INFO_FIELD_NAMES: frozenset[str] = frozenset({"depth_min", "depth_max", "shift", "use_log"})
+

@dataclass
 class VideoEncoderConfig:
@@ -86,6 +100,10 @@ class VideoEncoderConfig:
    video_backend: str = "pyav"
    extra_options: dict[str, Any] = field(default_factory=dict)

+    # Source-data channel count this encoder is expected to handle (3 for RGB,
+    # 1 for depth, etc.)
+    _DEFAULT_CHANNELS: ClassVar[int] = 3
+
    def __post_init__(self) -> None:
        self.resolve_vcodec()
        # Empty-constructor ergonomics: ``VideoEncoderConfig()`` must "just work".
@@ -138,7 +156,9 @@ class VideoEncoderConfig:
            require_package("av", extra="dataset")
            from lerobot.datasets import check_video_encoder_parameters_pyav

-            check_video_encoder_parameters_pyav(self.vcodec, self.pix_fmt, self.get_codec_options())
+            check_video_encoder_parameters_pyav(
+                self.vcodec, self.pix_fmt, self.get_codec_options(), channels=self._DEFAULT_CHANNELS
+            )

    def resolve_vcodec(self) -> None:
        """Check ``vcodec`` and, when it is ``"auto"``, pick a concrete encoder.
@@ -218,6 +238,10 @@ class VideoEncoderConfig:
        elif self.vcodec == "h264_qsv":
            set_if("global_quality", self.crf)
            set_if("preset", self.preset)
+        elif self.vcodec == "ffv1":
+            # Lossless intra-frame codec. ``crf``/``preset``/``fast_decode``
+            # are not meaningful.
+            set_if("threads", encoder_threads)
        else:
            set_if("crf", self.crf)
            set_if("preset", self.preset)
@@ -233,3 +257,59 @@ class VideoEncoderConfig:
 def camera_encoder_defaults() -> VideoEncoderConfig:
    """Return a :class:`VideoEncoderConfig` with RGB-camera defaults."""
    return VideoEncoderConfig()
+
+
+@dataclass
+class DepthEncoderConfig(VideoEncoderConfig):
+    """Encoder configuration for depth-map streams.
+
+    Inherits the full :class:`VideoEncoderConfig` surface (codec, GOP, CRF,
+    preset, ``extra_options``…) and adds the four parameters of the depth
+    quantizer.
+
+    Defaults flip ``vcodec`` to ``"hevc"`` (Main 12 profile) and ``pix_fmt``
+    to ``"gray12le"``.
+
+
+    Attributes:
+        depth_min: Minimum depth in physical units (e.g. metres) represented
+            by quantum ``0``.
+        depth_max: Maximum depth represented by quantum :data:`DEPTH_QMAX`.
+        shift: Pre-log offset for numerical stability near zero.
+        use_log: ``True`` for logarithmic quantization (default; matches
+            sensor error profile), ``False`` for linear.
+    """
+
+    vcodec: str = "hevc"
+    pix_fmt: str = "gray12le"
+
+    depth_min: float = DEFAULT_DEPTH_MIN
+    depth_max: float = DEFAULT_DEPTH_MAX
+    shift: float = DEFAULT_DEPTH_SHIFT
+    use_log: bool = DEFAULT_DEPTH_USE_LOG
+
+    _DEFAULT_CHANNELS: ClassVar[int] = 1
+
+    @classmethod
+    def from_video_info(cls, video_info: dict | None) -> DepthEncoderConfig:
+        """Reconstruct a :class:`DepthEncoderConfig` from a depth feature's ``info`` block.
+
+        Reuses :meth:`VideoEncoderConfig.from_video_info` for the base
+        codec/tuning fields and then layers the depth-specific tuning
+        (``depth_min`` / ``depth_max`` / ``shift`` / ``use_log``) on top.
+        Missing keys fall back to the class defaults.
+        """
+        base = VideoEncoderConfig.from_video_info(video_info)
+        kwargs: dict[str, Any] = {f.name: getattr(base, f.name) for f in fields(base) if f.init}
+
+        video_info = video_info or {}
+        for name in DEPTH_ENCODER_INFO_FIELD_NAMES:
+            value = video_info.get(f"video.{name}")
+            if value is not None:
+                kwargs[name] = value
+        return cls(**kwargs)
+
+
+def depth_encoder_defaults() -> DepthEncoderConfig:
+    """Return a :class:`DepthEncoderConfig` with depth-camera defaults."""
+    return DepthEncoderConfig()
--- a/src/lerobot/datasets/__init__.py
+++ b/src/lerobot/datasets/__init__.py
@@ -31,21 +31,12 @@ from .dataset_tools import (
    modify_features,
    modify_tasks,
    recompute_stats,
-    reencode_dataset,
    remove_feature,
    split_dataset,
 )
 from .factory import make_dataset, resolve_delta_timestamps
 from .image_writer import safe_stop_image_writer
 from .io_utils import load_episodes, write_stats
-from .language import (
-    EVENT_ONLY_STYLES,
-    LANGUAGE_EVENTS,
-    LANGUAGE_PERSISTENT,
-    PERSISTENT_STYLES,
-    STYLE_REGISTRY,
-    column_for_style,
-)
 from .lerobot_dataset import LeRobotDataset
 from .multi_dataset import MultiLeRobotDataset
 from .pipeline_features import aggregate_pipeline_dataset_features, create_initial_features
@@ -63,15 +54,10 @@ __all__ = [
    "CODEBASE_VERSION",
    "DEFAULT_EPISODES_PATH",
    "DEFAULT_QUANTILES",
-    "EVENT_ONLY_STYLES",
    "EpisodeAwareSampler",
-    "LANGUAGE_EVENTS",
-    "LANGUAGE_PERSISTENT",
    "LeRobotDataset",
    "LeRobotDatasetMetadata",
    "MultiLeRobotDataset",
-    "PERSISTENT_STYLES",
-    "STYLE_REGISTRY",
    "StreamingLeRobotDataset",
    "VideoEncodingManager",
    "check_video_encoder_parameters_pyav",
@@ -83,7 +69,6 @@ __all__ = [
    "convert_image_to_video_dataset",
    "create_initial_features",
    "create_lerobot_dataset_card",
-    "column_for_style",
    "delete_episodes",
    "get_feature_stats",
    "load_episodes",
@@ -92,7 +77,6 @@ __all__ = [
    "modify_features",
    "modify_tasks",
    "recompute_stats",
-    "reencode_dataset",
    "remove_feature",
    "resolve_delta_timestamps",
    "safe_stop_image_writer",
--- a/src/lerobot/datasets/compute_stats.py
+++ b/src/lerobot/datasets/compute_stats.py
@@ -512,7 +512,7 @@ def compute_episode_stats(

    ep_stats = {}
    for key, data in episode_data.items():
-        if features[key]["dtype"] in {"string", "language"}:
+        if features[key]["dtype"] == "string":
            continue

        if features[key]["dtype"] in ["image", "video"]:
@@ -550,8 +550,10 @@ def _validate_stat_value(value: np.ndarray, key: str, feature_key: str) -> None:
    if key == "count" and value.shape != (1,):
        raise ValueError(f"Shape of 'count' must be (1), but is {value.shape} instead.")

-    if "image" in feature_key and key != "count" and value.shape != (3, 1, 1):
-        raise ValueError(f"Shape of quantile '{key}' must be (3,1,1), but is {value.shape} instead.")
+    if "image" in feature_key and key != "count" and value.shape not in ((3, 1, 1), (1, 1, 1)):
+        raise ValueError(
+            f"Shape of quantile '{key}' must be (3,1,1) or (1,1,1) but is {value.shape} instead."
+        )


 def _assert_type_and_shape(stats_list: list[dict[str, dict]]):
--- a/src/lerobot/datasets/dataset_metadata.py
+++ b/src/lerobot/datasets/dataset_metadata.py
@@ -36,12 +36,12 @@ from .io_utils import (
    load_episodes,
    load_info,
    load_stats,
+    load_subtasks,
    load_tasks,
    write_info,
    write_stats,
    write_tasks,
 )
-from .language import DEFAULT_TOOLS, LANGUAGE_COLUMNS
 from .utils import (
    DEFAULT_EPISODES_PATH,
    check_version_compatibility,
@@ -177,6 +177,7 @@ class LeRobotDatasetMetadata:
        self.info = load_info(self.root)
        check_version_compatibility(self.repo_id, self._version, CODEBASE_VERSION)
        self.tasks = load_tasks(self.root)
+        self.subtasks = load_subtasks(self.root)
        self.episodes = load_episodes(self.root)
        self.stats = load_stats(self.root)

@@ -337,54 +338,30 @@ class LeRobotDatasetMetadata:
        """Keys to access visual modalities stored as videos."""
        return [key for key, ft in self.features.items() if ft["dtype"] == "video"]

+    @property
+    def depth_keys(self) -> list[str]:
+        """Keys to access depth-map modalities stored as videos or images.
+
+        A depth key is a feature whose ``info`` dict carries ``"is_depth_map": True``
+        (or the legacy ``"video.is_depth_map"`` inside ``info`` or ``video_info``).
+        """
+
+        def _is_depth(ft: dict) -> bool:
+            info = ft.get("info") or {}
+            video_info = ft.get("video_info") or {}
+            return (
+                info.get("is_depth_map", False)
+                or info.get("video.is_depth_map", False)
+                or video_info.get("video.is_depth_map", False)
+            )
+
+        return [key for key, ft in self.features.items() if _is_depth(ft)]
+
    @property
    def camera_keys(self) -> list[str]:
        """Keys to access visual modalities (regardless of their storage method)."""
        return [key for key, ft in self.features.items() if ft["dtype"] in ["video", "image"]]

-    @property
-    def has_language_columns(self) -> bool:
-        """Return ``True`` if the dataset declares any language column.
-
-        Used to gate language-aware code paths (collate, render step) so
-        unannotated datasets keep PyTorch's default collate behavior.
-        """
-        return any(col in self.features for col in LANGUAGE_COLUMNS)
-
-    @property
-    def tools(self) -> list[dict]:
-        """OpenAI-style tool schemas declared by this dataset.
-
-        Read from ``meta/info.json["tools"]``. Returns a copy, so callers
-        can mutate the result safely. Falls back to
-        :data:`lerobot.datasets.language.DEFAULT_TOOLS` (the canonical
-        ``say`` schema) when the dataset doesn't declare any — that way
-        unannotated datasets and chat-template consumers
-        (``apply_chat_template(messages, tools=meta.tools)``) keep
-        working out of the box.
-
-        Implementations live under :mod:`lerobot.tools` (one file per
-        tool); see ``docs/source/tools.mdx`` for the authoring guide.
-        """
-        declared = self.info.tools
-        if declared:
-            return [dict(t) for t in declared]
-        return [dict(t) for t in DEFAULT_TOOLS]
-
-    @tools.setter
-    def tools(self, value: list[dict] | None) -> None:
-        """Persist a tool catalog to ``meta/info.json`` and reload metadata.
-
-        Writes ``value`` into the on-disk ``info.json`` (or clears the
-        ``tools`` key when ``value`` is ``None`` or empty), then reloads
-        ``self.info`` so the in-memory metadata matches what's on disk.
-        Saves callers from hand-editing ``info.json`` and re-instantiating
-        the metadata object.
-        """
-        self.info.tools = [dict(t) for t in value] if value else None
-        write_info(self.info, self.root)
-        self.info = load_info(self.root)
-
    @property
    def names(self) -> dict[str, list | dict]:
        """Names of the various dimensions of vector modalities."""
@@ -580,7 +557,7 @@ class LeRobotDatasetMetadata:
    def update_video_info(
        self,
        video_key: str | None = None,
-        camera_encoder: VideoEncoderConfig | None = None,
+        video_encoder: VideoEncoderConfig | None = None,
    ) -> None:
        """Populate per-feature video info in ``info.json``.

@@ -600,9 +577,13 @@ class LeRobotDatasetMetadata:

        video_keys = [video_key] if video_key is not None else self.video_keys
        for key in video_keys:
-            if not self.features[key].get("info", None):
-                video_path = self.root / self.video_path.format(video_key=key, chunk_index=0, file_index=0)
-                self.info.features[key]["info"] = get_video_info(video_path, camera_encoder=camera_encoder)
+            existing = self.features[key].get("info") or {}
+            # Skip only if real video info has already been written. The ``is_depth_map`` entry (created at feature creation) is not blocking.
+            if set(existing.keys()) - {"is_depth_map"}:
+                continue
+            video_path = self.root / self.video_path.format(video_key=key, chunk_index=0, file_index=0)
+            new_info = get_video_info(video_path, video_encoder=video_encoder)
+            self.info.features[key]["info"] = {**existing, **new_info}

    def update_chunk_settings(
        self,
@@ -713,6 +694,7 @@ class LeRobotDatasetMetadata:
        _validate_feature_names(features)

        obj.tasks = None
+        obj.subtasks = None
        obj.episodes = None
        obj.stats = None
        obj.info = create_empty_dataset_info(
--- a/src/lerobot/datasets/dataset_reader.py
+++ b/src/lerobot/datasets/dataset_reader.py
@@ -22,7 +22,10 @@ from pathlib import Path
 import datasets
 import torch

+from lerobot.configs.video import DepthEncoderConfig
+
 from .dataset_metadata import LeRobotDatasetMetadata
+from .depth_utils import dequantize_depth
 from .feature_utils import (
    check_delta_timestamps,
    get_delta_indices,
@@ -86,6 +89,12 @@ class DatasetReader:
            check_delta_timestamps(delta_timestamps, meta.fps, tolerance_s)
            self.delta_indices = get_delta_indices(delta_timestamps, meta.fps)

+        # TODO(CarolinePascal): Should we use a more lightweight structure instead?
+        self._depth_encoder_configs: dict[str, DepthEncoderConfig] = {
+            vid_key: DepthEncoderConfig.from_video_info(self._meta.features[vid_key].get("info"))
+            for vid_key in self._meta.depth_keys
+        }
+
    def try_load(self) -> bool:
        """Attempt to load from local cache. Returns True if data is sufficient."""
        try:
@@ -247,7 +256,18 @@ class DatasetReader:
                self._tolerance_s,
                self._video_backend,
                return_uint8=self._return_uint8,
+                is_depth=vid_key in self._meta.depth_keys,
            )
+            if vid_key in self._meta.depth_keys:
+                depth_encoder = self._depth_encoder_configs[vid_key]
+                frames = dequantize_depth(
+                    frames,
+                    depth_min=depth_encoder.depth_min,
+                    depth_max=depth_encoder.depth_max,
+                    shift=depth_encoder.shift,
+                    use_log=depth_encoder.use_log,
+                    output_tensor=True,
+                )
            return vid_key, frames.squeeze(0)

        items = list(query_timestamps.items())
@@ -295,4 +315,9 @@ class DatasetReader:
        task_idx = item["task_index"].item()
        item["task"] = self._meta.tasks.iloc[task_idx].name

+        # add subtask information if available
+        if "subtask_index" in self._meta.features and self._meta.subtasks is not None:
+            subtask_idx = item["subtask_index"].item()
+            item["subtask"] = self._meta.subtasks.iloc[subtask_idx].name
+
        return item
--- a/src/lerobot/datasets/dataset_tools.py
+++ b/src/lerobot/datasets/dataset_tools.py
@@ -26,7 +26,7 @@ This module provides utilities for:
 import logging
 import shutil
 from collections.abc import Callable
-from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed
+from concurrent.futures import ThreadPoolExecutor, as_completed
 from pathlib import Path

 import datasets
@@ -61,13 +61,11 @@ from .utils import (
    DEFAULT_DATA_FILE_SIZE_IN_MB,
    DEFAULT_DATA_PATH,
    DEFAULT_EPISODES_PATH,
-    VIDEO_DIR,
    update_chunk_file_indices,
 )
 from .video_utils import (
    encode_video_frames,
    get_video_info,
-    reencode_video,
 )


@@ -1331,7 +1329,7 @@ def _estimate_frame_size_via_calibration(
            imgs_dir=calibration_dir,
            video_path=calibration_video_path,
            fps=fps,
-            camera_encoder=camera_encoder,
+            video_encoder=camera_encoder,
            overwrite=True,
        )

@@ -1815,7 +1813,7 @@ def convert_image_to_video_dataset(
                    imgs_dir=imgs_dir,
                    video_path=video_path,
                    fps=fps,
-                    camera_encoder=camera_encoder,
+                    video_encoder=camera_encoder,
                    overwrite=True,
                )

@@ -1862,7 +1860,7 @@ def convert_image_to_video_dataset(
                    video_key=img_key, chunk_index=0, file_index=0
                )
                new_meta.info.features[img_key]["info"] = get_video_info(
-                    video_path, camera_encoder=camera_encoder
+                    video_path, video_encoder=camera_encoder
                )

        write_info(new_meta.info, new_meta.root)
@@ -1886,83 +1884,3 @@ def convert_image_to_video_dataset(

    # Return new dataset
    return LeRobotDataset(repo_id=repo_id, root=output_dir)
-
-
-def _reencode_video_worker(args: tuple) -> Path:
-    """Picklable worker for :func:`reencode_dataset`'s process pool."""
-    video_path, camera_encoder, encoder_threads = args
-    reencode_video(
-        input_video_path=video_path,
-        output_video_path=video_path,
-        camera_encoder=camera_encoder,
-        encoder_threads=encoder_threads,
-        overwrite=True,
-    )
-    return video_path
-
-
-def reencode_dataset(
-    dataset: LeRobotDataset,
-    camera_encoder: VideoEncoderConfig,
-    encoder_threads: int | None = None,
-    num_workers: int | None = None,
-) -> LeRobotDataset:
-    """Re-encode every video in a dataset with a new set of encoding parameters.
-
-    Videos are re-encoded in-place and the video information in ``info.json`` is refreshed.
-
-    Args:
-        dataset: An existing :class:`LeRobotDataset` whose videos will be
-            re-encoded.
-        camera_encoder: Target encoder configuration applied to every video
-            file.
-        encoder_threads: Per-encoder thread count forwarded to
-            :func:`reencode_video`. ``None`` lets the codec decide.
-        num_workers: Number of parallel processes. ``None`` or ``0`` means
-            sequential (no multiprocessing); ``1+`` spawns a
-            :class:`~concurrent.futures.ProcessPoolExecutor`.
-
-    Returns:
-        The same :class:`LeRobotDataset` instance with its metadata updated
-        on disk.
-    """
-    meta = dataset.meta
-    video_paths_list = []
-
-    # Only re-encode if the videos are not already encoded with the given video encoding parameters
-    for video_key in meta.video_keys:
-        current_info = meta.info.features[video_key].get("info", {})
-        current_encoder = VideoEncoderConfig.from_video_info(current_info)
-        if current_encoder != camera_encoder:
-            video_paths_list.extend((meta.root / VIDEO_DIR / video_key).rglob("*.mp4"))
-        else:
-            logging.info(f"{video_key} videos are already encoded with {camera_encoder}. Nothing to do.")
-
-    if len(video_paths_list) == 0:
-        logging.warning("Dataset has no videos to re-encode.")
-        return dataset
-    logging.info(f"Re-encoding {len(video_paths_list)} video file(s) with {camera_encoder}")
-
-    worker_args = [(vp, camera_encoder, encoder_threads) for vp in video_paths_list]
-    if num_workers and num_workers > 1:
-        with ProcessPoolExecutor(max_workers=num_workers) as pool:
-            futures = [pool.submit(_reencode_video_worker, args) for args in worker_args]
-            for future in tqdm(
-                as_completed(futures),
-                total=len(futures),
-                desc="Re-encoding videos",
-            ):
-                future.result()
-    else:
-        for args in tqdm(worker_args, desc="Re-encoding videos"):
-            _reencode_video_worker(args)
-
-    # Refresh video info in metadata for every video key.
-    for vid_key in meta.video_keys:
-        video_path = meta.root / meta.get_video_file_path(0, vid_key)
-        meta.info.features[vid_key]["info"] = get_video_info(video_path, camera_encoder=camera_encoder)
-
-    write_info(meta.info, meta.root)
-    logging.info("Dataset metadata updated.")
-
-    return dataset
--- a/src/lerobot/datasets/dataset_writer.py
+++ b/src/lerobot/datasets/dataset_writer.py
@@ -31,7 +31,12 @@ import PIL.Image
 import pyarrow.parquet as pq
 import torch

-from lerobot.configs import VideoEncoderConfig, camera_encoder_defaults
+from lerobot.configs import (
+    DepthEncoderConfig,
+    VideoEncoderConfig,
+    camera_encoder_defaults,
+    depth_encoder_defaults,
+)

 from .compute_stats import compute_episode_stats
 from .dataset_metadata import LeRobotDatasetMetadata
@@ -48,6 +53,7 @@ from .io_utils import (
    write_info,
 )
 from .utils import (
+    DEFAULT_DEPTH_PATH,
    DEFAULT_EPISODES_PATH,
    DEFAULT_IMAGE_PATH,
    update_chunk_file_indices,
@@ -67,17 +73,22 @@ def _encode_video_worker(
    episode_index: int,
    root: Path,
    fps: int,
-    camera_encoder: VideoEncoderConfig | None = None,
+    video_encoder: VideoEncoderConfig | None = None,
    encoder_threads: int | None = None,
 ) -> Path:
    temp_path = Path(tempfile.mkdtemp(dir=root)) / f"{video_key}_{episode_index:03d}.mp4"
-    fpath = DEFAULT_IMAGE_PATH.format(image_key=video_key, episode_index=episode_index, frame_index=0)
+    path_template = (
+        DEFAULT_DEPTH_PATH
+        if video_encoder is not None and isinstance(video_encoder, DepthEncoderConfig)
+        else DEFAULT_IMAGE_PATH
+    )
+    fpath = path_template.format(image_key=video_key, episode_index=episode_index, frame_index=0)
    img_dir = (root / fpath).parent
    encode_video_frames(
        img_dir,
        temp_path,
        fps,
-        camera_encoder=camera_encoder,
+        video_encoder=video_encoder,
        encoder_threads=encoder_threads,
        overwrite=True,
    )
@@ -97,6 +108,7 @@ class DatasetWriter:
        meta: LeRobotDatasetMetadata,
        root: Path,
        camera_encoder: VideoEncoderConfig | None,
+        depth_encoder: DepthEncoderConfig | None,
        encoder_threads: int | None,
        batch_encoding_size: int,
        streaming_encoder: StreamingVideoEncoder | None = None,
@@ -110,6 +122,8 @@ class DatasetWriter:
            root: Local dataset root directory.
            camera_encoder: Video encoder settings applied to all cameras.
                ``None`` uses :func:`~lerobot.configs.camera_encoder_defaults`.
+            depth_encoder: Video encoder settings applied to all **depth** cameras.
+                ``None`` uses :func:`~lerobot.configs.depth_encoder_defaults`.
            encoder_threads: Number of encoder threads (global). ``None``
                lets the codec decide.
            batch_encoding_size: Number of episodes to accumulate before
@@ -121,6 +135,7 @@ class DatasetWriter:
        self._meta = meta
        self._root = root
        self._camera_encoder = camera_encoder or camera_encoder_defaults()
+        self._depth_encoder = depth_encoder or depth_encoder_defaults()
        self._encoder_threads = encoder_threads
        self._batch_encoding_size = batch_encoding_size
        self._streaming_encoder = streaming_encoder
@@ -145,7 +160,8 @@ class DatasetWriter:
        return ep_buffer

    def _get_image_file_path(self, episode_index: int, image_key: str, frame_index: int) -> Path:
-        fpath = DEFAULT_IMAGE_PATH.format(
+        path_template = DEFAULT_DEPTH_PATH if image_key in self._meta.depth_keys else DEFAULT_IMAGE_PATH
+        fpath = path_template.format(
            image_key=image_key, episode_index=episode_index, frame_index=frame_index
        )
        return self._root / fpath
@@ -195,6 +211,7 @@ class DatasetWriter:
        if frame_index == 0 and self._streaming_encoder is not None:
            self._streaming_encoder.start_episode(
                video_keys=list(self._meta.video_keys),
+                depth_video_keys=set(self._meta.video_keys) & set(self._meta.depth_keys),
                temp_dir=self._root,
            )

@@ -293,7 +310,9 @@ class DatasetWriter:
                            episode_index,
                            self._root,
                            self._meta.fps,
-                            self._camera_encoder,
+                            self._depth_encoder
+                            if video_key in self._meta.depth_keys
+                            else self._camera_encoder,
                            self._encoder_threads,
                        ): video_key
                        for video_key in self._meta.video_keys
@@ -504,7 +523,12 @@ class DatasetWriter:

        # Update video info (only needed when first episode is encoded)
        if episode_index == 0:
-            self._meta.update_video_info(video_key, camera_encoder=self._camera_encoder)
+            self._meta.update_video_info(
+                video_key,
+                video_encoder=self._depth_encoder
+                if video_key in self._meta.depth_keys
+                else self._camera_encoder,
+            )
            write_info(self._meta.info, self._meta.root)

        metadata = {
@@ -571,13 +595,14 @@ class DatasetWriter:
            self.image_writer.wait_until_done()

    def _encode_temporary_episode_video(self, video_key: str, episode_index: int) -> Path:
-        """Use ffmpeg to convert frames stored as png into mp4 videos."""
+        """Use ffmpeg to convert frames stored as png/tiff into mp4 videos."""
+        is_depth = video_key in self._meta.depth_keys
        return _encode_video_worker(
            video_key,
            episode_index,
            self._root,
            self._meta.fps,
-            self._camera_encoder,
+            self._depth_encoder if is_depth else self._camera_encoder,
            self._encoder_threads,
        )

--- a/src/lerobot/datasets/depth_utils.py
+++ b/src/lerobot/datasets/depth_utils.py
@@ -0,0 +1,214 @@
+#!/usr/bin/env python
+
+# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+Depth encoding/decoding helpers for :class:`VideoEncoderConfig`.
+"""
+
+import math
+from typing import Literal
+
+import av
+import numpy as np
+import torch
+from numpy.typing import NDArray
+
+from lerobot.configs.video import (
+    DEFAULT_DEPTH_MAX,
+    DEFAULT_DEPTH_MIN,
+    DEFAULT_DEPTH_PIX_FMT,
+    DEFAULT_DEPTH_SHIFT,
+    DEFAULT_DEPTH_USE_LOG,
+    DEPTH_QMAX,
+)
+
+from .pyav_utils import write_u16_plane
+
+_MM_PER_METRE = 1000.0
+_UINT16_MAX = 65535
+
+
+def _validate_log_quant_params(depth_min: float, shift: float) -> None:
+    """Ensure ``log(depth_min + shift)`` is finite."""
+    if depth_min + shift <= 0:
+        raise ValueError(
+            f"depth_min + shift must be positive for logarithmic quantization, "
+            f"got depth_min={depth_min} + shift={shift} = {depth_min + shift}"
+        )
+
+
+def _depth_input_to_float32_and_unit(
+    depth: NDArray[np.integer] | NDArray[np.floating],
+    input_unit: Literal["auto", "m", "mm"],
+) -> tuple[NDArray[np.float32], Literal["m", "mm"]]:
+    """Convert depth to float32 in the chosen unit, and return the resolved unit."""
+    resolved_unit = (
+        ("m" if np.issubdtype(depth.dtype, np.floating) else "mm") if input_unit == "auto" else input_unit
+    )
+    return depth.astype(np.float32, order="K"), resolved_unit
+
+
+def quantize_depth(
+    depth: NDArray[np.uint16] | NDArray[np.float32] | torch.Tensor,
+    depth_min: float = DEFAULT_DEPTH_MIN,
+    depth_max: float = DEFAULT_DEPTH_MAX,
+    shift: float = DEFAULT_DEPTH_SHIFT,
+    use_log: bool = DEFAULT_DEPTH_USE_LOG,
+    pix_fmt: str = DEFAULT_DEPTH_PIX_FMT,
+    video_backend: str | None = "pyav",
+    input_unit: Literal["auto", "m", "mm"] = "auto",
+) -> NDArray[np.uint16] | av.VideoFrame:
+    """Quantize depth to 12-bit codes (``uint16``, values ``0…DEPTH_QMAX``).
+
+    Depth maps are packed into 12-bit integer frames so they fit in standard
+    high-bit-depth pixel formats (e.g. ``yuv420p12le`` / ``gray12le``)
+    and can be encoded by widely supported video codecs (HEVC Main 12, ffv1).
+    Logarithmic quantization is the default because it allocates more quanta
+    to near-range depth, which matches the (1/depth) error profile of typical
+    depth sensors. Math is ported from BEHAVIOR-1K's ``obs_utils.py``.
+
+    **Input units**:
+
+    - ``input_unit="auto"`` (default): infer from dtype (floating = m, non-floating = mm).
+    - ``input_unit="mm"``: interpret input values as millimetres.
+    - ``input_unit="m"``: interpret input values as metres.
+
+    Quantization math runs in the **resolved input unit**.
+
+    ``depth_min``, ``depth_max``, and ``shift`` are always in **metres**.
+
+    Args:
+        depth: Depth map; ``torch.Tensor`` is moved to CPU for conversion.
+        depth_min: Depth (metres) at quantum ``0``.
+        depth_max: Depth (metres) at quantum :data:`DEPTH_QMAX`.
+        shift: Depth shift (metres); used in log mode. Must satisfy ``depth_min + shift > 0``.
+        use_log: If ``True`` (default), quantize in log space.
+        video_backend: ``"pyav"`` (default) returns an ``av.VideoFrame`` in ``pix_fmt``; else the raw array.
+        input_unit: Input unit policy (``"auto"``, ``"mm"``, ``"m"``).
+
+    Returns:
+        An ``av.VideoFrame`` in ``pix_fmt`` when ``video_backend == "pyav"``; otherwise a
+        ``uint16`` ``numpy.ndarray`` with values in ``[0, DEPTH_QMAX]``.
+
+    Raises:
+        ValueError: If ``input_unit`` is not ``"auto"``, ``"mm"``, or ``"m"``.
+        ValueError: If ``use_log=True`` and ``depth_min + shift <= 0``.
+    """
+    if input_unit not in ("auto", "m", "mm"):
+        raise ValueError(f"input_unit must be 'auto', 'm', or 'mm', got {input_unit!r}")
+
+    if isinstance(depth, torch.Tensor):
+        depth = depth.detach().cpu().numpy()
+
+    # Squeeze single-channel dim: (H, W, 1) or (1, H, W) → (H, W)
+    if depth.ndim == 3 and (depth.shape[-1] == 1 or depth.shape[0] == 1):
+        depth = depth.squeeze()
+
+    depth_f, resolved_unit = _depth_input_to_float32_and_unit(depth, input_unit=input_unit)
+
+    # Convert depth_min, depth_max, and shift to the resolved input unit.
+    depth_min_u = np.float32(depth_min) if resolved_unit == "m" else np.float32(depth_min * _MM_PER_METRE)
+    depth_max_u = np.float32(depth_max) if resolved_unit == "m" else np.float32(depth_max * _MM_PER_METRE)
+    shift_u = np.float32(shift) if resolved_unit == "m" else np.float32(shift * _MM_PER_METRE)
+
+    # Normalization and quantization are performed in the resolved input unit.
+    if use_log:
+        _validate_log_quant_params(depth_min, shift)
+        log_min = math.log(float(depth_min_u + shift_u))
+        log_max = math.log(float(depth_max_u + shift_u))
+        norm = (np.log(depth_f + shift_u) - log_min) / (log_max - log_min)
+    else:
+        norm = (depth_f - depth_min_u) / (depth_max_u - depth_min_u)
+
+    quantized = np.rint(norm * DEPTH_QMAX).clip(0, DEPTH_QMAX).astype(np.uint16, copy=False)
+
+    if video_backend == "pyav":
+        frame = av.VideoFrame.from_ndarray(quantized, format=pix_fmt)
+        write_u16_plane(frame.planes[0], quantized)
+        return frame
+    else:
+        return quantized
+
+
+def dequantize_depth(
+    quantized: NDArray[np.uint16] | av.VideoFrame,
+    depth_min: float = DEFAULT_DEPTH_MIN,
+    depth_max: float = DEFAULT_DEPTH_MAX,
+    shift: float = DEFAULT_DEPTH_SHIFT,
+    use_log: bool = DEFAULT_DEPTH_USE_LOG,
+    pix_fmt: str = DEFAULT_DEPTH_PIX_FMT,
+    output_unit: Literal["m", "mm"] = "mm",
+    output_tensor: bool = False,
+) -> NDArray[np.uint16] | NDArray[np.float32] | torch.Tensor:
+    """Inverse of :func:`quantize_depth`.
+
+    Tuning arguments **must match** :func:`quantize_depth`.
+
+    Decoding inverts the same normalized code mapping as :func:`quantize_depth`
+    using ``depth_min`` / ``depth_max`` / ``shift`` (in metres), then returns
+    the requested output unit.
+
+    Args:
+        quantized: 12-bit codes ``[0, DEPTH_QMAX]``, ``dtype=uint16``.
+        depth_min, depth_max, shift, use_log: Same as :func:`quantize_depth` (metres).
+        output_unit: ``"mm"`` returns ``uint16`` millimetres (``rint``, clip
+            ``[0, 65535]``). ``"m"`` returns ``float32`` metres in
+            ``[depth_min, depth_max]``.
+        output_tensor: If True, return a torch.Tensor instead of a numpy array.
+
+    Returns:
+        Depth map in the requested unit and dtype.
+
+    Raises:
+        ValueError: If ``use_log=True`` and ``depth_min + shift <= 0``.
+        ValueError: If ``output_unit`` is not ``\"m\"`` or ``\"mm\"``.
+    """
+    if output_unit not in ("m", "mm"):
+        raise ValueError(f"output_unit must be 'm' or 'mm', got {output_unit!r}")
+
+    if isinstance(quantized, av.VideoFrame):
+        quantized = quantized.to_ndarray(format=pix_fmt)
+
+    norm = np.asarray(quantized, dtype=np.float32, order="K") / DEPTH_QMAX
+
+    depth_min_m = np.float32(depth_min)
+    depth_max_m = np.float32(depth_max)
+    shift_m = np.float32(shift)
+
+    # De-normalization and de-quantization are performed in meters (convenience choice).
+    if use_log:
+        _validate_log_quant_params(depth_min, shift)
+        log_min = math.log(float(depth_min_m + shift_m))
+        log_max = math.log(float(depth_max_m + shift_m))
+        depth_m = np.exp(norm * (log_max - log_min) + log_min) - shift_m
+    else:
+        depth_m = norm * (depth_max_m - depth_min_m) + depth_min_m
+    depth_m = np.clip(depth_m, depth_min_m, depth_max_m).astype(np.float32, copy=False)
+
+    # Add single-channel dim: (H, W) → (H, W, 1)
+    if depth_m.ndim == 2:
+        depth_m = depth_m[..., np.newaxis]
+
+    # Return depth as float32 meters.
+    if output_unit == "m":
+        return torch.from_numpy(depth_m) if output_tensor else depth_m
+
+    # Return depth as uint16 millimeters.
+    mm = np.rint(depth_m * _MM_PER_METRE).clip(0, _UINT16_MAX).astype(np.uint16, copy=False)
+    if output_tensor:
+        # torch.uint16 support is very limited, so we convert to float32 instead.
+        return torch.from_numpy(mm.astype(np.float32))
+    else:
+        return mm
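
A minimal round-trip sketch of the normalize/quantize/invert mapping implemented by `quantize_depth` and `dequantize_depth` above, using NumPy only. It is not the library code: the helper names `quantize` and `dequantize`, the value `QMAX = 4095` (taken from the "12-bit codes" wording in the docstring), and the `depth_min` / `depth_max` / `shift` values are assumptions chosen for illustration, and everything is kept in metres.

```python
import math

import numpy as np

QMAX = 4095  # assumption: 12-bit codes, i.e. 2**12 - 1


def quantize(depth_m, depth_min, depth_max, shift, use_log):
    """Map metric depth to integer codes in [0, QMAX] (illustrative helper)."""
    depth = np.asarray(depth_m, dtype=np.float32)
    if use_log:
        log_min = math.log(depth_min + shift)
        log_max = math.log(depth_max + shift)
        norm = (np.log(depth + shift) - log_min) / (log_max - log_min)
    else:
        norm = (depth - depth_min) / (depth_max - depth_min)
    return np.rint(norm * QMAX).clip(0, QMAX).astype(np.uint16)


def dequantize(codes, depth_min, depth_max, shift, use_log):
    """Invert quantize(); the same tuning arguments must be passed."""
    norm = np.asarray(codes, dtype=np.float32) / QMAX
    if use_log:
        log_min = math.log(depth_min + shift)
        log_max = math.log(depth_max + shift)
        depth = np.exp(norm * (log_max - log_min) + log_min) - shift
    else:
        depth = norm * (depth_max - depth_min) + depth_min
    return np.clip(depth, depth_min, depth_max).astype(np.float32)


# Hypothetical tuning values, not the library defaults.
params = dict(depth_min=0.1, depth_max=10.0, shift=1.0)
depth = np.array([[0.1, 0.5], [2.0, 10.0]], dtype=np.float32)

for use_log in (False, True):
    codes = quantize(depth, use_log=use_log, **params)
    restored = dequantize(codes, use_log=use_log, **params)
    print(use_log, codes.tolist(), float(np.abs(restored - depth).max()))
```

With the log mapping the code step is uniform in `log(depth + shift)`, so the absolute error grows with depth, whereas the linear mapping spreads the error evenly over the range. Passing different tuning arguments to the two helpers silently decodes to the wrong depths, which is why the docstring insists they must match.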
--- a/src/lerobot/datasets/feature_utils.py
+++ b/src/lerobot/datasets/feature_utils.py
@@ -13,7 +13,6 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-import logging
 from pprint import pformat

 import datasets
@@ -24,12 +23,6 @@ from lerobot.configs import VIDEO_ENCODER_INFO_KEYS
 from lerobot.utils.constants import DEFAULT_FEATURES
 from lerobot.utils.utils import is_valid_numpy_dtype_string

-from .language import (
-    LANGUAGE_PERSISTENT,
-    is_language_column,
-    language_events_column_feature,
-    language_persistent_column_feature,
-)
 from .utils import (
    DEFAULT_CHUNK_SIZE,
    DEFAULT_DATA_FILE_SIZE_IN_MB,
@@ -54,13 +47,7 @@ def get_hf_features_from_features(features: dict) -> datasets.Features:
    """
    hf_features = {}
    for key, ft in features.items():
-        if is_language_column(key):
-            hf_features[key] = (
-                language_persistent_column_feature()
-                if key == LANGUAGE_PERSISTENT
-                else language_events_column_feature()
-            )
-        elif ft["dtype"] == "video":
+        if ft["dtype"] == "video":
            continue
        elif ft["dtype"] == "image":
            hf_features[key] = datasets.Image()
@@ -291,8 +278,6 @@ def validate_feature_dtype_and_shape(
        return validate_feature_image_or_video(name, expected_shape, value)
    elif expected_dtype == "string":
        return validate_feature_string(name, value)
-    elif expected_dtype == "language":
-        return validate_feature_language(name, value)
    else:
        raise NotImplementedError(f"The feature dtype '{expected_dtype}' is not implemented yet.")

@@ -336,7 +321,7 @@ def validate_feature_image_or_video(

    Args:
        name (str): The name of the feature.
-        expected_shape (list[str]): The expected shape (C, H, W).
+        expected_shape (list[str]): The expected shape, e.g. (C, H, W) or (H, W, C).
        value: The image data to validate.

    Returns:
@@ -372,30 +357,6 @@ def validate_feature_string(name: str, value: str) -> str:
    return ""


-def validate_feature_language(name: str, value) -> str:
-    """Validate a feature that is expected to hold language annotations.
-
-    Language columns (``language_persistent`` / ``language_events``) are
-    populated after recording by the annotation pipeline, not at record time.
-    Any value supplied here is dropped before the frame is written, so a
-    non-empty value almost certainly signals a mistake. We warn rather than
-    fail to keep recording resilient.
-
-    Args:
-        name (str): The name of the feature.
-        value: The value to validate.
-
-    Returns:
-        str: Always an empty string — language values are non-fatal.
-    """
-    if value is not None:
-        logging.warning(
-            f"The feature '{name}' is a 'language' column populated by the annotation pipeline, "
-            f"not at record time. The provided value will be dropped."
-        )
-    return ""
-
-
 def validate_episode_buffer(episode_buffer: dict, total_episodes: int, features: dict) -> None:
    """Validate the episode buffer before it's written to disk.

--- a/src/lerobot/datasets/image_writer.py
+++ b/src/lerobot/datasets/image_writer.py
@@ -42,10 +42,41 @@ def safe_stop_image_writer(func):


 def image_array_to_pil_image(image_array: np.ndarray, range_check: bool = True) -> PIL.Image.Image:
-    # TODO(aliberts): handle 1 channel and 4 for depth images
-    if image_array.ndim != 3:
-        raise ValueError(f"The array has {image_array.ndim} dimensions, but 3 is expected for an image.")
+    """Convert a NumPy array to a PIL Image, preserving precision for grayscale.

+    Behaviour by shape:
+
+    - ``(H, W)`` or ``(1, H, W)`` / ``(H, W, 1)``: single-channel grayscale.
+      The native dtype is preserved using the matching PIL mode
+      (``I;16`` / ``F``). This is the path used for raw depth maps (no
+      rescaling, clamping, or downcasting).
+    - ``(3, H, W)`` / ``(H, W, 3)``: RGB. Channels-first inputs are transposed
+      to channels-last. Float inputs in ``[0, 1]`` are scaled to ``uint8``
+      (existing behaviour, gated by ``range_check``).
+
+    Other shapes / channel counts raise ``NotImplementedError`` or
+    ``ValueError``.
+    """
+    # TODO(CarolinePascal): support 4-channel RGB-D images
+    if image_array.ndim not in (2, 3):
+        raise ValueError(f"The array has {image_array.ndim} dimensions, but 2 or 3 is expected for an image.")
+
+    # Squeeze 3D single-channel inputs to 2D so depth maps work whether the
+    # caller emits (H, W), (1, H, W), or (H, W, 1).
+    if image_array.ndim == 3:
+        if image_array.shape[0] == 1:
+            image_array = image_array[0]
+        elif image_array.shape[-1] == 1:
+            image_array = image_array[..., 0]
+
+    if image_array.ndim == 2:
+        if image_array.dtype not in [np.uint16, np.float32]:
+            raise ValueError(
+                f"Unsupported single-channel image dtype: {image_array.dtype}. "
+                f"Supported dtypes: {sorted(np.dtype(d).name for d in (np.uint16, np.float32))}."
+            )
+        return PIL.Image.fromarray(np.ascontiguousarray(image_array))
+
+    # 3D path: must be RGB (3 channels), channels-first or channels-last.
    if image_array.shape[0] == 3:
        # Transpose from pytorch convention (C, H, W) to (H, W, C)
        image_array = image_array.transpose(1, 2, 0)
@@ -71,13 +102,28 @@ def image_array_to_pil_image(image_array: np.ndarray, range_check: bool = True)
    return PIL.Image.fromarray(image_array)


+def save_kwargs_for_path(fpath: Path, compress_level: int) -> dict:
+    """Pick the right format-specific kwargs for :meth:`PIL.Image.Image.save`.
+
+    PNG uses ``compress_level`` (0-9, zlib). TIFF uses ``compression="raw"``
+    (uncompressed) so that raw depth maps are stored losslessly.
+    """
+    suffix = Path(fpath).suffix.lower()
+    if suffix == ".png":
+        return {"compress_level": compress_level}
+    if suffix in (".tif", ".tiff"):
+        return {"compression": "raw"}
+    return {}
+
+
 def write_image(image: np.ndarray | PIL.Image.Image, fpath: Path, compress_level: int = 1):
    """
    Saves a NumPy array or PIL Image to a file.

    This function handles both NumPy arrays and PIL Image objects, converting
    the former to a PIL Image before saving. It includes error handling for
-    the save operation.
+    the save operation. The output format is inferred from the *fpath*
+    extension: ``.png`` → PNG with ``compress_level``, ``.tiff`` / ``.tif``
+    → uncompressed TIFF, used for lossless raw depth maps.

    Args:
        image (np.ndarray | PIL.Image.Image): The image data to save.
@@ -101,7 +147,7 @@ def write_image(image: np.ndarray | PIL.Image.Image, fpath: Path, compress_level
            img = image
        else:
            raise TypeError(f"Unsupported image type: {type(image)}")
-        img.save(fpath, compress_level=compress_level)
+        img.save(fpath, **save_kwargs_for_path(fpath, compress_level))
    except Exception as e:
        logger.error("Error writing image %s: %s", fpath, e)
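
The shape handling added to `image_array_to_pil_image` can be sketched in isolation as follows. This is an illustrative re-statement under assumptions, not the library function: the helper name `to_2d_if_single_channel` is hypothetical, and the sketch covers only the dimension check and the squeeze step, not the dtype check or the PIL conversion.

```python
import numpy as np


def to_2d_if_single_channel(image_array: np.ndarray) -> np.ndarray:
    """Squeeze (1, H, W) or (H, W, 1) to (H, W); leave other shapes unchanged."""
    if image_array.ndim not in (2, 3):
        raise ValueError(
            f"The array has {image_array.ndim} dimensions, but 2 or 3 is expected for an image."
        )
    if image_array.ndim == 3:
        if image_array.shape[0] == 1:
            # Channels-first single-channel input.
            return image_array[0]
        if image_array.shape[-1] == 1:
            # Channels-last single-channel input.
            return image_array[..., 0]
    return image_array


depth = np.zeros((4, 5), dtype=np.uint16)
for candidate in (depth, depth[np.newaxis], depth[..., np.newaxis]):
    # All three layouts of the same depth map end up as (4, 5).
    print(candidate.shape, "->", to_2d_if_single_channel(candidate).shape)

rgb = np.zeros((3, 4, 5), dtype=np.uint8)
print(rgb.shape, "->", to_2d_if_single_channel(rgb).shape)  # stays (3, 4, 5)
```

Because the channels-first test runs first, an array whose leading axis has length 1 is always treated as `(1, H, W)`, even when its trailing axis also has length 1.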

--- a/src/lerobot/datasets/io_utils.py
+++ b/src/lerobot/datasets/io_utils.py
@@ -31,10 +31,10 @@ from torchvision import transforms
 from lerobot.utils.io_utils import load_json, write_json
 from lerobot.utils.utils import SuppressProgressBars, flatten_dict, unflatten_dict

-from .language import LANGUAGE_COLUMNS
 from .utils import (
    DEFAULT_DATA_FILE_SIZE_IN_MB,
    DEFAULT_EPISODES_PATH,
+    DEFAULT_SUBTASKS_PATH,
    DEFAULT_TASKS_PATH,
    EPISODES_DIR,
    INFO_PATH,
@@ -186,6 +186,14 @@ def load_tasks(local_dir: Path) -> pandas.DataFrame:
    return tasks


+def load_subtasks(local_dir: Path) -> pandas.DataFrame | None:
+    """Load subtasks from subtasks.parquet if it exists."""
+    subtasks_path = local_dir / DEFAULT_SUBTASKS_PATH
+    if subtasks_path.exists():
+        return pd.read_parquet(subtasks_path)
+    return None
+
+
 def write_episodes(episodes: Dataset, local_dir: Path) -> None:
    """Write episode metadata to a parquet file in the LeRobot v3.0 format.
    This function writes episode-level metadata to a single parquet file.
@@ -257,13 +265,11 @@ def hf_transform_to_torch(items_dict: dict[str, list[Any]]) -> dict[str, list[to
        dict: The batch with items converted to torch tensors.
    """
    for key in items_dict:
-        if key in LANGUAGE_COLUMNS:
-            continue
        first_item = items_dict[key][0]
        if isinstance(first_item, PILImage.Image):
            to_tensor = transforms.ToTensor()
            items_dict[key] = [to_tensor(img) for img in items_dict[key]]
-        elif first_item is None or isinstance(first_item, dict):
+        elif first_item is None:
            pass
        else:
            items_dict[key] = [x if isinstance(x, str) else torch.tensor(x) for x in items_dict[key]]
@@ -298,9 +304,8 @@ def item_to_torch(item: dict) -> dict:
    Returns:
        dict: Dictionary with all tensor-like items converted to torch.Tensor.
    """
-    skip_keys = {"task", *LANGUAGE_COLUMNS}
    for key, val in item.items():
-        if isinstance(val, (np.ndarray | list)) and key not in skip_keys:
+        if isinstance(val, (np.ndarray | list)) and key not in ["task"]:
            # Convert numpy arrays and lists to torch tensors
            item[key] = torch.tensor(val)
    return item
--- a/src/lerobot/datasets/language.py
+++ b/src/lerobot/datasets/language.py
@@ -1,242 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import annotations
-
-from typing import Literal
-
-import datasets
-import pyarrow as pa
-
-LANGUAGE_PERSISTENT = "language_persistent"
-LANGUAGE_EVENTS = "language_events"
-LANGUAGE_COLUMNS = (LANGUAGE_PERSISTENT, LANGUAGE_EVENTS)
-PERSISTENT_ROW_FIELDS = ("role", "content", "style", "timestamp", "camera", "tool_calls")
-EVENT_ROW_FIELDS = ("role", "content", "style", "camera", "tool_calls")
-
-CORE_STYLES = {
-    "subtask",
-    "plan",
-    "memory",
-    "motion",
-    "interjection",
-    "vqa",
-    "trace",
-    "task_aug",
-}
-# Project-local styles can be registered at import time by appending to
-# ``EXTENDED_STYLES`` before ``column_for_style`` is called. Anything added
-# here is treated as a known style alongside ``CORE_STYLES`` for resolver
-# validation. Empty by default — populate from a downstream module that
-# also extends ``PERSISTENT_STYLES`` or ``EVENT_ONLY_STYLES`` to declare
-# the new style's column.
-EXTENDED_STYLES: set[str] = set()
-STYLE_REGISTRY = CORE_STYLES | EXTENDED_STYLES
-
-PERSISTENT_STYLES = {"subtask", "plan", "memory", "motion", "task_aug"}
-EVENT_ONLY_STYLES = {"interjection", "vqa", "trace"}
-
-# Styles whose ``content`` is grounded in a specific camera view. Rows of these
-# styles MUST carry a non-null ``camera`` referencing an ``observation.images.*``
-# feature key. Rows of every other style MUST have ``camera=None``. ``motion``
-# is intentionally NOT in this set: motion primitives are described in
-# robot-frame (joint / Cartesian) terms, not pixel space, so they are
-# camera-agnostic. ``trace`` is the pixel-trajectory event style and IS
-# view-dependent. The ``camera`` field nevertheless lives on
-# ``PERSISTENT_ROW_FIELDS`` too so the schema, validator, and resolver
-# behave symmetrically across the two columns; persistent rows simply
-# always have ``camera=None`` in practice today.
-VIEW_DEPENDENT_STYLES = {"vqa", "trace"}
-
-LanguageColumn = Literal["language_persistent", "language_events"]
-
-
-def _json_arrow_type() -> pa.DataType:
-    """Return the Arrow JSON type, falling back to ``string`` on older pyarrow."""
-    return pa.json_() if hasattr(pa, "json_") else pa.string()
-
-
-def _json_feature() -> object:
-    """Return the HF ``datasets`` JSON feature, falling back to a string value."""
-    return datasets.Json() if hasattr(datasets, "Json") else datasets.Value("string")
-
-
-def language_persistent_row_arrow_type() -> pa.StructType:
-    """Return the Arrow struct type for a single persistent language row.
-
-    Persistent rows carry their own ``timestamp`` because they represent a state
-    that became active at a specific moment and remains active until superseded.
-    ``timestamp`` is ``float32`` to match the timestamp dtype LeRobotDataset
-    uses for frame data.
-    """
-    return pa.struct(
-        [
-            pa.field("role", pa.string(), nullable=False),
-            pa.field("content", pa.string(), nullable=True),
-            pa.field("style", pa.string(), nullable=True),
-            pa.field("timestamp", pa.float32(), nullable=False),
-            pa.field("camera", pa.string(), nullable=True),
-            pa.field("tool_calls", pa.list_(_json_arrow_type()), nullable=True),
-        ]
-    )
-
-
-def language_event_row_arrow_type() -> pa.StructType:
-    """Return the Arrow struct type for a single event language row.
-
-    Event rows have no ``timestamp`` field: each event is stored on the dataset
-    row whose frame timestamp is the event's firing time.
-    """
-    return pa.struct(
-        [
-            pa.field("role", pa.string(), nullable=False),
-            pa.field("content", pa.string(), nullable=True),
-            pa.field("style", pa.string(), nullable=True),
-            pa.field("camera", pa.string(), nullable=True),
-            pa.field("tool_calls", pa.list_(_json_arrow_type()), nullable=True),
-        ]
-    )
-
-
-def language_persistent_arrow_type() -> pa.ListType:
-    """Return the Arrow list type for the ``language_persistent`` column."""
-    return pa.list_(language_persistent_row_arrow_type())
-
-
-def language_events_arrow_type() -> pa.ListType:
-    """Return the Arrow list type for the ``language_events`` column."""
-    return pa.list_(language_event_row_arrow_type())
-
-
-def language_persistent_row_feature() -> dict[str, object]:
-    """Return the HF ``datasets`` feature mapping for a persistent language row."""
-    return {
-        "role": datasets.Value("string"),
-        "content": datasets.Value("string"),
-        "style": datasets.Value("string"),
-        "timestamp": datasets.Value("float32"),
-        "camera": datasets.Value("string"),
-        "tool_calls": datasets.List(_json_feature()),
-    }
-
-
-def language_event_row_feature() -> dict[str, object]:
-    """Return the HF ``datasets`` feature mapping for an event language row."""
-    return {
-        "role": datasets.Value("string"),
-        "content": datasets.Value("string"),
-        "style": datasets.Value("string"),
-        "camera": datasets.Value("string"),
-        "tool_calls": datasets.List(_json_feature()),
-    }
-
-
-def language_persistent_column_feature() -> datasets.List:
-    """Return the HF ``datasets`` feature for the ``language_persistent`` column."""
-    return datasets.List(language_persistent_row_feature())
-
-
-def language_events_column_feature() -> datasets.List:
-    """Return the HF ``datasets`` feature for the ``language_events`` column."""
-    return datasets.List(language_event_row_feature())
-
-
-def language_feature_info() -> dict[str, dict]:
-    """Return the ``info["features"]`` entries for both language columns."""
-    return {
-        LANGUAGE_PERSISTENT: {"dtype": "language", "shape": (1,), "names": None},
-        LANGUAGE_EVENTS: {"dtype": "language", "shape": (1,), "names": None},
-    }
-
-
-def is_language_column(key: str) -> bool:
-    """Return ``True`` if ``key`` is one of the dataset's language column names."""
-    return key in LANGUAGE_COLUMNS
-
-
-def is_view_dependent_style(style: str | None) -> bool:
-    """Return ``True`` if rows of ``style`` must be tagged with a ``camera`` key."""
-    return style in VIEW_DEPENDENT_STYLES
-
-
-def validate_camera_field(style: str | None, camera: str | None) -> None:
-    """Enforce the ``camera`` invariant: required iff ``style`` is view-dependent.
-
-    Raises ``ValueError`` if a view-dependent style is missing ``camera`` or if
-    a non-view-dependent style carries one. Pipeline writers and the validator
-    should call this on every emitted row.
-    """
-    if is_view_dependent_style(style):
-        if not camera:
-            raise ValueError(
-                f"Rows of view-dependent style {style!r} require a non-empty 'camera' "
-                f"field referencing an 'observation.images.*' feature key."
-            )
-    elif camera is not None:
-        raise ValueError(f"Rows of style {style!r} must have camera=None; got camera={camera!r}.")
-
-
-# --- Tool registry --------------------------------------------------------
-# Tools declared on a dataset live in ``meta/info.json["tools"]`` as a list
-# of OpenAI-style function schemas. The runtime / training stack reads them
-# through :class:`LeRobotDatasetMetadata.tools` (with these constants as
-# fallback when the dataset doesn't declare any). Implementations live
-# under :mod:`lerobot.tools` (one file per tool); see
-# ``docs/source/tools.mdx`` for the authoring guide.
-
-SAY_TOOL_SCHEMA: dict = {
-    "type": "function",
-    "function": {
-        "name": "say",
-        "description": "Speak a short utterance to the user via the TTS executor.",
-        "parameters": {
-            "type": "object",
-            "properties": {
-                "text": {
-                    "type": "string",
-                    "description": "The verbatim text to speak.",
-                }
-            },
-            "required": ["text"],
-        },
-    },
-}
-"""Canonical schema for the ``say`` tool emitted by the steerable
-annotation pipeline (PR 2 Module 2). Single source of truth — PR 2's
-writer, PR 3's runtime tool registry, and the dataset visualizer all
-import this constant rather than duplicating the dict."""
-
-DEFAULT_TOOLS: list[dict] = [SAY_TOOL_SCHEMA]
-"""Fallback tools list. Returned by ``LeRobotDatasetMetadata.tools``
-when ``meta/info.json["tools"]`` is unset, so unannotated datasets and
-chat-template consumers (``apply_chat_template(messages, tools=...)``)
-keep working out of the box."""
-
-
-def column_for_style(style: str | None) -> LanguageColumn:
-    """Map a language style to the column where rows of that style are stored.
-
-    Styles in :data:`PERSISTENT_STYLES` route to :data:`LANGUAGE_PERSISTENT`.
-    Styles in :data:`EVENT_ONLY_STYLES` and the implicit ``None`` style route
-    to :data:`LANGUAGE_EVENTS`.
-    """
-    if style is None:
-        return LANGUAGE_EVENTS
-    if style in PERSISTENT_STYLES:
-        return LANGUAGE_PERSISTENT
-    if style in EVENT_ONLY_STYLES:
-        return LANGUAGE_EVENTS
-    raise ValueError(f"Unknown language style: {style!r}")
--- a/src/lerobot/datasets/language_render.py
+++ b/src/lerobot/datasets/language_render.py
@@ -1,545 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import annotations
-
-import copy
-import hashlib
-import re
-from collections.abc import Sequence
-from typing import Any
-
-from lerobot.configs.recipe import DEFAULT_BINDINGS, PLACEHOLDER_RE, TrainingRecipe
-from lerobot.utils.utils import unwrap_scalar
-
-from .language import LANGUAGE_PERSISTENT, column_for_style
-
-LanguageRow = dict[str, Any]
-RenderedMessages = dict[str, list[Any]]
-
-_RESOLVER_RE = re.compile(r"^(?P<name>[A-Za-z_][A-Za-z0-9_]*)\((?P<args>.*)\)$")
-
-
-def active_at(
-    t: float,
-    *,
-    persistent: Sequence[LanguageRow],
-    style: str | None = None,
-    role: str | None = None,
-    tool_name: str | None = None,
-    camera: str | None = None,
-) -> LanguageRow | None:
-    """Return the persistent row of ``style`` that is active at time ``t``.
-
-    A persistent row is "active" at ``t`` when its own ``timestamp`` is the
-    most recent one ``<= t`` for the given ``style``/``role``/``tool_name``/
-    ``camera`` selector. Only valid for persistent styles.
-    """
-    _validate_persistent_resolver("active_at", style)
-    matches = [
-        row
-        for row in _matching_rows(persistent, style=style, role=role, tool_name=tool_name, camera=camera)
-        if _timestamp(row) <= t
-    ]
-    if not matches:
-        return None
-    latest_ts = max(_timestamp(row) for row in matches)
-    return _select_one(
-        [row for row in matches if _timestamp(row) == latest_ts],
-        style=style,
-        role=role,
-        tool_name=tool_name,
-        camera=camera,
-    )
-
-
-EMITTED_AT_TOLERANCE_S = 0.1
-"""Half-window for matching persistent rows to a frame timestamp in
-``emitted_at``. Persistent timestamps come from parquet (float32) and ``t``
-is also a float32 from parquet, so in the ideal hot path an exact match
-would suffice — but any caller that derives ``t`` arithmetically (e.g.
-``frame_idx / fps``) breaks bit-equality. A 0.1 s tolerance covers
-common arithmetic drift without admitting frames that are visibly far
-apart at typical control rates (30–100 Hz). This does mean two persistent
-rows of the same selector emitted within 0.1 s of each other cannot be
-told apart by ``emitted_at`` — acceptable because persistent annotations
-(subtask / plan / memory transitions) change on a human-action timescale,
-not at the camera frame rate."""
-
-
-def emitted_at(
-    t: float,
-    *,
-    persistent: Sequence[LanguageRow],
-    events: Sequence[LanguageRow],
-    style: str | None = None,
-    role: str | None = None,
-    tool_name: str | None = None,
-    camera: str | None = None,
-) -> LanguageRow | None:
-    """Return the row of ``style`` emitted at exactly time ``t``.
-
-    For persistent styles, this matches persistent rows whose own ``timestamp``
-    is within ``EMITTED_AT_TOLERANCE_S`` of ``t`` (see that constant for why
-    we use a tolerance instead of bit-equality). For event styles, the
-    ``events`` list is assumed to come from the dataset row at frame ``t``
-    (event rows carry no timestamp of their own), so all matching event rows
-    are considered emitted at ``t``. ``camera`` filters by the row's
-    ``camera`` field — required to disambiguate when multiple view-dependent
-    rows share ``(t, role)`` across cameras.
-    """
-    if column_for_style(style) == LANGUAGE_PERSISTENT:
-        matches = [
-            row
-            for row in _matching_rows(persistent, style=style, role=role, tool_name=tool_name, camera=camera)
-            if abs(_timestamp(row) - t) <= EMITTED_AT_TOLERANCE_S
-        ]
-    else:
-        matches = _matching_rows(events, style=style, role=role, tool_name=tool_name, camera=camera)
-    return _select_one(matches, style=style, role=role, tool_name=tool_name, camera=camera)
-
-
-def nth_prev(
-    t: float,
-    *,
-    persistent: Sequence[LanguageRow],
-    style: str | None = None,
-    offset: int = 1,
-    role: str | None = None,
-    tool_name: str | None = None,
-    camera: str | None = None,
-) -> LanguageRow | None:
-    """Return the persistent row that was active ``offset`` steps before ``t``.
-
-    Walks back through chronologically sorted persistent rows of ``style``
-    (filtered by optional ``role``/``tool_name``/``camera``) and returns the
-    one ``offset`` positions before the row active at ``t``. Only valid for
-    persistent styles.
-    """
-    return _nth_relative("nth_prev", t, persistent, style, -offset, role, tool_name, camera)
-
-
-def nth_next(
-    t: float,
-    *,
-    persistent: Sequence[LanguageRow],
-    style: str | None = None,
-    offset: int = 1,
-    role: str | None = None,
-    tool_name: str | None = None,
-    camera: str | None = None,
-) -> LanguageRow | None:
-    """Return the persistent row that becomes active ``offset`` steps after ``t``.
-
-    Walks forward through chronologically sorted persistent rows of ``style``
-    (filtered by optional ``role``/``tool_name``/``camera``) and returns the
-    one ``offset`` positions after the row active at ``t``. Only valid for
-    persistent styles.
-    """
-    return _nth_relative("nth_next", t, persistent, style, offset, role, tool_name, camera)
-
-
-def render_sample(
-    *,
-    recipe: TrainingRecipe,
-    persistent: Sequence[LanguageRow] | None,
-    events: Sequence[LanguageRow] | None,
-    t: float,
-    sample_idx: int,
-    task: str | None = None,
-    dataset_ctx: Any | None = None,
-) -> RenderedMessages | None:
-    """Render the chat-style messages for a single dataset sample.
-
-    Resolves the recipe's bindings against ``persistent`` and ``events`` rows
-    at frame timestamp ``t``, then expands the recipe's message templates.
-    Returns ``None`` if the resolved sample contains no target message.
-    """
-    persistent_rows = _normalize_rows(persistent or [])
-    event_rows = _normalize_rows(events or [])
-    selected_recipe = _select_recipe(recipe, sample_idx)
-    bindings = _resolve_bindings(
-        selected_recipe,
-        persistent=persistent_rows,
-        events=event_rows,
-        t=t,
-        sample_idx=sample_idx,
-        task=task,
-        dataset_ctx=dataset_ctx,
-    )
-    return _render_message_recipe(selected_recipe, bindings)
-
-
-def _select_recipe(recipe: TrainingRecipe, sample_idx: int) -> TrainingRecipe:
-    """Pick a deterministic blend component for ``sample_idx`` (or return ``recipe``)."""
-    if recipe.blend is None:
-        return recipe
-
-    total_weight = sum(component.weight or 0.0 for component in recipe.blend.values())
-    if total_weight <= 0:
-        raise ValueError("Blend weights must sum to a positive value.")
-
-    digest = hashlib.blake2b(str(sample_idx).encode(), digest_size=8).digest()
-    draw = int.from_bytes(digest, "big") / 2**64 * total_weight
-    cumulative = 0.0
-    last_component: TrainingRecipe | None = None
-    for component in recipe.blend.values():
-        last_component = component
-        cumulative += component.weight or 0.0
-        if draw < cumulative:
-            return component
-    assert last_component is not None
-    return last_component
-
-
-def _resolve_bindings(
-    recipe: TrainingRecipe,
-    *,
-    persistent: Sequence[LanguageRow],
-    events: Sequence[LanguageRow],
-    t: float,
-    sample_idx: int,
-    task: str | None,
-    dataset_ctx: Any | None,
-) -> dict[str, LanguageRow | str | None]:
-    """Resolve every binding in ``recipe`` (plus ``task``) at time ``t``."""
-    bindings: dict[str, LanguageRow | str | None] = {
-        "task": _resolve_task(task, dataset_ctx, persistent=persistent, sample_idx=sample_idx),
-    }
-    specs = {**DEFAULT_BINDINGS, **(recipe.bindings or {})}
-    for name, spec in specs.items():
-        bindings[name] = _resolve_spec(spec, persistent=persistent, events=events, t=t)
-    return bindings
-
-
-def _resolve_task(
-    task: str | None,
-    dataset_ctx: Any | None,
-    *,
-    persistent: Sequence[LanguageRow] = (),
-    sample_idx: int = 0,
-) -> str | None:
-    """Return the task string for ``sample_idx``.
-
-    Resolution order:
-
-    1. Explicit ``task`` override (caller-supplied) wins.
-    2. If ``persistent`` contains rows of style ``task_aug`` (role=user),
-       deterministically pick one by ``sample_idx`` so each frame of an
-       episode rotates through the available rephrasings across an epoch.
-       This realizes Xiao 2022 / CAST-style task-prompt diversity without
-       changing ``meta/tasks.parquet`` and without forcing recipes to opt
-       in: ``${task}`` automatically picks a rephrasing when one exists,
-       and falls back to the canonical task otherwise. Recipes that want
-       the literal canonical task can override the binding.
-    3. Otherwise read the canonical task from ``dataset_ctx`` (which is
-       backed by ``meta/tasks.parquet``).
-    """
-    if task is not None:
-        return task
-
-    aug_rows = [r for r in persistent if r.get("style") == "task_aug" and r.get("role") == "user"]
-    if aug_rows:
-        # Deterministic, blake2b-based pick keyed on sample_idx so the
-        # rotation is reproducible across runs (Python's built-in ``hash``
-        # is process-randomized).
-        digest = hashlib.blake2b(f"task_aug:{sample_idx}".encode(), digest_size=8).digest()
-        idx = int.from_bytes(digest, "big") % len(aug_rows)
-        chosen = aug_rows[idx].get("content")
-        if chosen:
-            return str(chosen)
-
-    if dataset_ctx is None:
-        return None
-    if isinstance(dataset_ctx, dict):
-        return dataset_ctx.get("task")
-    return getattr(dataset_ctx, "task", None)
-
-
-def _resolve_spec(
-    spec: str,
-    *,
-    persistent: Sequence[LanguageRow],
-    events: Sequence[LanguageRow],
-    t: float,
-) -> LanguageRow | None:
-    """Parse a single binding's resolver expression and dispatch to its function."""
-    match = _RESOLVER_RE.match(spec.strip())
-    if match is None:
-        raise ValueError(f"Invalid resolver expression: {spec!r}")
-    name = match.group("name")
-    kwargs = _parse_resolver_args(match.group("args"))
-    kwargs.pop("t_arg", None)
-
-    if name == "emitted_at":
-        return emitted_at(t, persistent=persistent, events=events, **kwargs)
-    if name == "active_at":
-        return active_at(t, persistent=persistent, **kwargs)
-    if name == "nth_prev":
-        return nth_prev(t, persistent=persistent, **kwargs)
-    if name == "nth_next":
-        return nth_next(t, persistent=persistent, **kwargs)
-    raise ValueError(f"Unknown language resolver: {name!r}")
-
-
-def _parse_resolver_args(args: str) -> dict[str, Any]:
-    """Parse a comma-separated resolver argument list into a kwargs dict."""
-    kwargs: dict[str, Any] = {}
-    if not args.strip():
-        return kwargs
-
-    parts = [part.strip() for part in args.split(",") if part.strip()]
-    for part in parts:
-        if part == "t":
-            kwargs["t_arg"] = True
-            continue
-        if "=" not in part:
-            raise ValueError(f"Invalid resolver argument: {part!r}")
-        key, value = (item.strip() for item in part.split("=", 1))
-        if key == "offset":
-            kwargs[key] = int(value)
-        else:
-            kwargs[key] = value.strip("\"'")
-    return kwargs
-
-
-def _render_message_recipe(
-    recipe: TrainingRecipe,
-    bindings: dict[str, LanguageRow | str | None],
-) -> RenderedMessages | None:
-    """Expand ``recipe.messages`` into rendered chat messages using ``bindings``."""
-    assert recipe.messages is not None
-    messages: list[dict[str, Any]] = []
-    streams: list[str | None] = []
-    target_indices: list[int] = []
-
-    for turn in recipe.messages:
-        if turn.if_present is not None and bindings.get(turn.if_present) is None:
-            continue
-
-        message = {"role": turn.role}
-        if turn.content is not None:
-            message["content"] = _render_content(turn.content, bindings)
-
-        if turn.tool_calls_from is not None:
-            row = bindings.get(turn.tool_calls_from)
-            tool_calls = row.get("tool_calls") if isinstance(row, dict) else None
-            if tool_calls:
-                message["tool_calls"] = copy.deepcopy(tool_calls)
-
-        message_idx = len(messages)
-        messages.append(message)
-        streams.append(turn.stream)
-        if turn.target:
-            target_indices.append(message_idx)
-
-    if not target_indices:
-        return None
-
-    rendered = {
-        "messages": messages,
-        "message_streams": streams,
-        "target_message_indices": target_indices,
-    }
-    _validate_rendered(rendered)
-    return rendered
-
-
-def _render_content(
-    content: str | list[dict[str, Any]],
-    bindings: dict[str, LanguageRow | str | None],
-) -> str | list[dict[str, Any]]:
-    """Substitute bindings into a string or each string field of multimodal blocks."""
-    if isinstance(content, str):
-        return _substitute(content, bindings)
-
-    rendered_blocks = []
-    for block in content:
-        rendered_block = copy.deepcopy(block)
-        for key, value in rendered_block.items():
-            if isinstance(value, str):
-                rendered_block[key] = _substitute(value, bindings)
-        rendered_blocks.append(rendered_block)
-    return rendered_blocks
-
-
-def _substitute(template: str, bindings: dict[str, LanguageRow | str | None]) -> str:
-    """Replace ``${name}`` placeholders in ``template`` with their bound values."""
-
-    def replace(match: re.Match[str]) -> str:
-        """Resolve a single ``${name}`` match to its bound string value."""
-        name = match.group(1)
-        if name not in bindings:
-            raise ValueError(f"Unknown template binding: {name!r}")
-        value = bindings[name]
-        if value is None:
-            return ""
-        if isinstance(value, dict):
-            content = value.get("content")
-            return "" if content is None else str(content)
-        return str(value)
-
-    return PLACEHOLDER_RE.sub(replace, template)
-
-
-def _validate_rendered(rendered: RenderedMessages) -> None:
-    """Sanity-check the rendered output for stream/target alignment."""
-    messages = rendered["messages"]
-    streams = rendered["message_streams"]
-    target_indices = rendered["target_message_indices"]
-
-    if len(streams) != len(messages):
-        raise ValueError("message_streams must be aligned with messages.")
-    if not target_indices:
-        raise ValueError("Rendered samples must contain at least one target message.")
-    for idx in target_indices:
-        if idx < 0 or idx >= len(messages):
-            raise ValueError(f"Target message index {idx} is out of bounds.")
-    # ``stream`` is enforced non-None at MessageTurn construction time
-    # (see ``MessageTurn.__post_init__``), so a missing stream here would
-    # mean the dataclass invariant was bypassed; no need to re-check.
-
-
-def _nth_relative(
-    name: str,
-    t: float,
-    persistent: Sequence[LanguageRow],
-    style: str | None,
-    offset: int,
-    role: str | None,
-    tool_name: str | None,
-    camera: str | None,
-) -> LanguageRow | None:
-    """Shared body for ``nth_prev`` / ``nth_next`` with signed ``offset``."""
-    _validate_persistent_resolver(name, style)
-    if abs(offset) < 1:
-        raise ValueError(f"{name} offset must be non-zero.")
-
-    rows = sorted(
-        _matching_rows(persistent, style=style, role=role, tool_name=tool_name, camera=camera),
-        key=_row_sort_key,
-    )
-    if not rows:
-        return None
-
-    anchor_idx = None
-    for idx, row in enumerate(rows):
-        if _timestamp(row) <= t:
-            anchor_idx = idx
-        else:
-            break
-
-    target_idx = (offset - 1 if offset > 0 else None) if anchor_idx is None else anchor_idx + offset
-
-    if target_idx is None or target_idx < 0 or target_idx >= len(rows):
-        return None
-    return rows[target_idx]
-
-
-def _validate_persistent_resolver(name: str, style: str | None) -> None:
-    """Reject calls with missing or event-only ``style`` for persistent resolvers."""
-    if style is None:
-        raise ValueError(f"{name} requires a persistent style.")
-    if column_for_style(style) != LANGUAGE_PERSISTENT:
-        raise ValueError(f"{name} cannot be used with event-only style {style!r}.")
-
-
-def _matching_rows(
-    rows: Sequence[LanguageRow],
-    *,
-    style: str | None,
-    role: str | None,
-    tool_name: str | None,
-    camera: str | None,
-) -> list[LanguageRow]:
-    """Return ``rows`` filtered by optional ``style``/``role``/``tool_name``/``camera`` selectors."""
-    return [
-        row
-        for row in rows
-        if (style is None or row.get("style") == style)
-        and (role is None or row.get("role") == role)
-        and (tool_name is None or _row_has_tool_name(row, tool_name))
-        and (camera is None or row.get("camera") == camera)
-    ]
-
-
-def _select_one(
-    rows: Sequence[LanguageRow],
-    *,
-    style: str | None,
-    role: str | None,
-    tool_name: str | None,
-    camera: str | None,
-) -> LanguageRow | None:
-    """Return the single matching row, or raise if the resolver is ambiguous.
-
-    Multiple matches always raise — even when the caller already passed
-    some selectors — because remaining ambiguity means the data has
-    several rows that look identical to the resolver and the caller
-    needs to pin down a specific one (e.g. add ``camera=...`` for VQA
-    rows shared across cameras).
-    """
-    if not rows:
-        return None
-    if len(rows) > 1:
-        raise ValueError(
-            f"Ambiguous resolver for style={style!r} role={role!r} "
-            f"tool_name={tool_name!r} camera={camera!r}: {len(rows)} matching rows. "
-            f"Add a selector that distinguishes them."
-        )
-    return rows[0]
-
-
-def _row_sort_key(row: LanguageRow) -> tuple[float, str, str]:
-    """Stable sort key for both persistent and event rows.
-
-    Event rows lack ``timestamp`` (it is implicit in the frame), so default
-    to ``0.0`` — within a single frame all event rows share the same sort
-    bucket and are tiebroken by ``(style, role)``.
-    """
-    timestamp = row.get("timestamp")
-    ts = float(unwrap_scalar(timestamp)) if timestamp is not None else 0.0
-    return (ts, row.get("style") or "", row.get("role") or "")
-
-
-def _timestamp(row: LanguageRow) -> float:
-    """Extract a row's ``timestamp`` as a Python float (unwrapping numpy scalars)."""
-    return float(unwrap_scalar(row["timestamp"]))
-
-
-def _row_has_tool_name(row: LanguageRow, tool_name: str) -> bool:
-    """Return ``True`` if any of the row's tool calls invokes ``tool_name``."""
-    for tool_call in row.get("tool_calls") or []:
-        if isinstance(tool_call, str):
-            continue
-        function = tool_call.get("function") if isinstance(tool_call, dict) else None
-        if isinstance(function, dict) and function.get("name") == tool_name:
-            return True
-    return False
-
-
-def _normalize_rows(rows: Sequence[Any]) -> list[LanguageRow]:
-    """Convert pyarrow scalars / mappings into a fresh list of plain dict rows."""
-    normalized = []
-    for row in rows:
-        if row is None:
-            continue
-        if hasattr(row, "as_py"):
-            row = row.as_py()
-        if not isinstance(row, dict):
-            raise TypeError(f"Language rows must be dictionaries, got {type(row).__name__}.")
-        normalized.append(dict(row))
-    return normalized
--- a/src/lerobot/datasets/lerobot_dataset.py
+++ b/src/lerobot/datasets/lerobot_dataset.py
@@ -24,7 +24,7 @@ import torch.utils
 from huggingface_hub import HfApi, snapshot_download
 from huggingface_hub.errors import RevisionNotFoundError

-from lerobot.configs import VideoEncoderConfig
+from lerobot.configs import DepthEncoderConfig, VideoEncoderConfig
 from lerobot.utils.constants import HF_LEROBOT_HUB_CACHE

 from .dataset_metadata import CODEBASE_VERSION, LeRobotDatasetMetadata
@@ -60,6 +60,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
        return_uint8: bool = False,
        batch_encoding_size: int = 1,
        camera_encoder: VideoEncoderConfig | None = None,
+        depth_encoder: DepthEncoderConfig | None = None,
        encoder_threads: int | None = None,
        streaming_encoding: bool = False,
        encoder_queue_maxsize: int = 30,
@@ -186,6 +187,9 @@ class LeRobotDataset(torch.utils.data.Dataset):
            camera_encoder (VideoEncoderConfig | None, optional): Video encoder settings for cameras
                (codec, quality, etc.). When ``None``, :func:`~lerobot.configs.video.camera_encoder_defaults`
                is used by the writer.
+            depth_encoder (DepthEncoderConfig | None, optional): Video encoder settings for depth cameras
+                (codec, quality, etc.). When ``None``, :func:`~lerobot.configs.depth.depth_encoder_defaults`
+                is used by the writer.
            encoder_threads (int | None, optional): Number of encoder threads (global). ``None`` lets the
                codec decide.
            streaming_encoding (bool, optional): If True, encode video frames in real-time during capture
@@ -273,6 +277,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
                streaming_enc = self._build_streaming_encoder(
                    self.meta.fps,
                    camera_encoder,
+                    depth_encoder,
                    encoder_queue_maxsize,
                    encoder_threads,
                )
@@ -280,6 +285,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
                meta=self.meta,
                root=self.root,
                camera_encoder=camera_encoder,
+                depth_encoder=depth_encoder,
                encoder_threads=encoder_threads,
                batch_encoding_size=batch_encoding_size,
                streaming_encoder=streaming_enc,
@@ -322,12 +328,14 @@ class LeRobotDataset(torch.utils.data.Dataset):
    def _build_streaming_encoder(
        fps: int,
        camera_encoder: VideoEncoderConfig | None,
+        depth_encoder: DepthEncoderConfig | None,
        encoder_queue_maxsize: int,
        encoder_threads: int | None,
    ) -> StreamingVideoEncoder:
        return StreamingVideoEncoder(
            fps=fps,
            camera_encoder=camera_encoder,
+            depth_encoder=depth_encoder,
            queue_maxsize=encoder_queue_maxsize,
            encoder_threads=encoder_threads,
        )
@@ -645,6 +653,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
        video_backend: str | None = None,
        batch_encoding_size: int = 1,
        camera_encoder: VideoEncoderConfig | None = None,
+        depth_encoder: DepthEncoderConfig | None = None,
        metadata_buffer_size: int = 10,
        streaming_encoding: bool = False,
        encoder_queue_maxsize: int = 30,
@@ -677,6 +686,8 @@ class LeRobotDataset(torch.utils.data.Dataset):
                batch-encoding videos. ``1`` means encode immediately.
            camera_encoder: Video encoder settings for cameras (codec, quality, etc.).
                When ``None``, :func:`~lerobot.configs.video.camera_encoder_defaults` is used.
+            depth_encoder: Video encoder settings for depth cameras (codec, quality, etc.).
+                When ``None``, :func:`~lerobot.configs.depth.depth_encoder_defaults` is used.
            encoder_threads: Number of encoder threads (global). ``None``
                lets the codec decide.
            metadata_buffer_size: Number of episode metadata records to buffer
@@ -720,12 +731,13 @@ class LeRobotDataset(torch.utils.data.Dataset):
        streaming_enc = None
        if streaming_encoding and len(obj.meta.video_keys) > 0:
            streaming_enc = cls._build_streaming_encoder(
-                fps, camera_encoder, encoder_queue_maxsize, encoder_threads
+                fps, camera_encoder, depth_encoder, encoder_queue_maxsize, encoder_threads
            )
        obj.writer = DatasetWriter(
            meta=obj.meta,
            root=obj.root,
            camera_encoder=camera_encoder,
+            depth_encoder=depth_encoder,
            encoder_threads=encoder_threads,
            batch_encoding_size=batch_encoding_size,
            streaming_encoder=streaming_enc,
@@ -749,6 +761,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
        video_backend: str | None = None,
        batch_encoding_size: int = 1,
        camera_encoder: VideoEncoderConfig | None = None,
+        depth_encoder: DepthEncoderConfig | None = None,
        encoder_threads: int | None = None,
        image_writer_processes: int = 0,
        image_writer_threads: int = 0,
@@ -778,6 +791,8 @@ class LeRobotDataset(torch.utils.data.Dataset):
                batch-encoding videos.
            camera_encoder: Video encoder settings for cameras (codec, quality, etc.).
                When ``None``, :func:`~lerobot.configs.video.camera_encoder_defaults` is used.
+            depth_encoder: Video encoder settings for depth cameras (codec, quality, etc.).
+                When ``None``, :func:`~lerobot.configs.depth.depth_encoder_defaults` is used.
            encoder_threads: Number of encoder threads (global). ``None``
                lets the codec decide.
            image_writer_processes: Subprocesses for async image writing.
@@ -824,12 +839,13 @@ class LeRobotDataset(torch.utils.data.Dataset):
        streaming_enc = None
        if streaming_encoding and len(obj.meta.video_keys) > 0:
            streaming_enc = cls._build_streaming_encoder(
-                obj.meta.fps, camera_encoder, encoder_queue_maxsize, encoder_threads
+                obj.meta.fps, camera_encoder, depth_encoder, encoder_queue_maxsize, encoder_threads
            )
        obj.writer = DatasetWriter(
            meta=obj.meta,
            root=obj.root,
            camera_encoder=camera_encoder,
+            depth_encoder=depth_encoder,
            encoder_threads=encoder_threads,
            batch_encoding_size=batch_encoding_size,
            streaming_encoder=streaming_enc,
--- a/src/lerobot/datasets/pyav_utils.py
+++ b/src/lerobot/datasets/pyav_utils.py
@@ -24,6 +24,7 @@ import logging
 from typing import Any

 import av
+import numpy as np

 logger = logging.getLogger(__name__)

@@ -31,6 +32,22 @@ FFMPEG_NUMERIC_OPTION_TYPES = ("INT", "INT64", "UINT64", "FLOAT", "DOUBLE")
 FFMPEG_INTEGER_OPTION_TYPES = ("INT", "INT64", "UINT64")


+def write_u16_plane(plane: av.video.plane.VideoPlane, src: np.ndarray, fill_value: int | None = None) -> None:
+    """Copy ``src`` into a uint16 plane respecting FFmpeg line padding."""
+    height, width = src.shape
+    stride_u16 = plane.line_size // np.dtype(np.uint16).itemsize
+    dst = np.frombuffer(plane, dtype=np.uint16).reshape(height, stride_u16)
+    if fill_value is not None:
+        dst.fill(fill_value)
+    dst[:, :width] = src
+
+
+@functools.cache
+def get_pix_fmt_channels(pix_fmt: str) -> int:
+    """Return the number of components (channels) for *pix_fmt*."""
+    return len(av.VideoFormat(pix_fmt).components)
+
+
 @functools.cache
 def get_codec(vcodec: str) -> av.codec.Codec | None:
    """PyAV write-mode ``Codec`` for *vcodec*, or ``None`` if unavailable."""
@@ -142,6 +159,16 @@ def _check_pixel_format(vcodec: str, pix_fmt: str) -> None:
        )


+def _check_pix_fmt_channels(pix_fmt: str, channels: int) -> None:
+    """Ensure *pix_fmt* can carry at least *channels* components."""
+    pix_fmt_channels = get_pix_fmt_channels(pix_fmt)
+    if pix_fmt_channels < channels:
+        raise ValueError(
+            f"pix_fmt={pix_fmt!r} carries only {pix_fmt_channels} component(s) "
+            f"but the source data has {channels} channel(s)."
+        )
+
+
 def _check_codec_options(vcodec: str, codec_options: dict[str, Any]) -> None:
    """Validate merged encoder options (typed) against the codec's published AVOptions."""
    supported_options = _get_codec_options_by_name(vcodec)
@@ -156,12 +183,18 @@ def _check_codec_options(vcodec: str, codec_options: dict[str, Any]) -> None:
        _check_option_value(vcodec, key, value, supported_options[key])


-def check_video_encoder_parameters_pyav(vcodec: str, pix_fmt: str, codec_options: dict[str, Any]) -> None:
+def check_video_encoder_parameters_pyav(
+    vcodec: str,
+    pix_fmt: str,
+    codec_options: dict[str, Any],
+    channels: int | None = None,
+) -> None:
    """Verify *config* is compatible with the bundled FFmpeg build.

    Checks pixel format, abstract tuning-field compatibility, and each merged
    encoder option from :meth:`~lerobot.configs.video.VideoEncoderConfig.get_codec_options`
    against PyAV (including numeric ``extra_options`` present in that dict).
+    When *channels* is given, additionally verify that *pix_fmt* carries at least as many components as the source data has channels.
    No-op when ``config.vcodec`` isn't in the local FFmpeg build.

    Raises:
@@ -171,4 +204,6 @@ def check_video_encoder_parameters_pyav(vcodec: str, pix_fmt: str, codec_options
    if not options:
        raise ValueError(f"Codec {vcodec!r} is not available in the bundled FFmpeg build")
    _check_pixel_format(vcodec, pix_fmt)
+    if channels is not None:
+        _check_pix_fmt_channels(pix_fmt, channels)
    _check_codec_options(vcodec, codec_options)
--- a/src/lerobot/datasets/utils.py
+++ b/src/lerobot/datasets/utils.py
@@ -88,10 +88,12 @@ VIDEO_DIR = "videos"

 CHUNK_FILE_PATTERN = "chunk-{chunk_index:03d}/file-{file_index:03d}"
 DEFAULT_TASKS_PATH = "meta/tasks.parquet"
+DEFAULT_SUBTASKS_PATH = "meta/subtasks.parquet"
 DEFAULT_EPISODES_PATH = EPISODES_DIR + "/" + CHUNK_FILE_PATTERN + ".parquet"
 DEFAULT_DATA_PATH = DATA_DIR + "/" + CHUNK_FILE_PATTERN + ".parquet"
 DEFAULT_VIDEO_PATH = VIDEO_DIR + "/{video_key}/" + CHUNK_FILE_PATTERN + ".mp4"
 DEFAULT_IMAGE_PATH = "images/{image_key}/episode-{episode_index:06d}/frame-{frame_index:06d}.png"
+DEFAULT_DEPTH_PATH = "images/{image_key}/episode-{episode_index:06d}/frame-{frame_index:06d}.tiff"

 LEGACY_EPISODES_PATH = "meta/episodes.jsonl"
 LEGACY_EPISODES_STATS_PATH = "meta/episodes_stats.jsonl"
@@ -129,9 +131,6 @@ class DatasetInfo:
    # Optional metadata
    robot_type: str | None = None
    splits: dict[str, str] = field(default_factory=dict)
-    # OpenAI-style tool schemas declared by the dataset. ``None`` means the
-    # dataset doesn't declare any — readers fall back to ``DEFAULT_TOOLS``.
-    tools: list[dict] | None = None

    def __post_init__(self) -> None:
        # Coerce feature shapes from list to tuple — JSON deserialisation
@@ -153,15 +152,11 @@ class DatasetInfo:
        """Return a JSON-serialisable dict.

        Converts tuple shapes back to lists so ``json.dump`` can handle them.
-        Drops ``tools`` when unset so existing datasets keep a clean
-        ``info.json``.
        """
        d = dataclasses.asdict(self)
        for ft in d["features"].values():
            if isinstance(ft.get("shape"), tuple):
                ft["shape"] = list(ft["shape"])
-        if d.get("tools") is None:
-            d.pop("tools", None)
        return d

    @classmethod
--- a/src/lerobot/datasets/video_utils.py
+++ b/src/lerobot/datasets/video_utils.py
@@ -37,11 +37,16 @@ from datasets.features.features import register_feature
 from PIL import Image

 from lerobot.configs import (
+    DepthEncoderConfig,
    VideoEncoderConfig,
    camera_encoder_defaults,
+    depth_encoder_defaults,
 )
 from lerobot.utils.import_utils import get_safe_default_video_backend

+from .depth_utils import quantize_depth
+from .pyav_utils import get_pix_fmt_channels
+
 logger = logging.getLogger(__name__)


@@ -51,6 +56,7 @@ def decode_video_frames(
    tolerance_s: float,
    backend: str | None = None,
    return_uint8: bool = False,
+    is_depth: bool = False,
 ) -> torch.Tensor:
    """
    Decodes video frames using the specified backend.
@@ -70,6 +76,11 @@ def decode_video_frames(

    Currently supports torchcodec on cpu and pyav.
    """
+    if backend != "pyav" and is_depth:
+        logger.warning("Decoding depth maps is only supported with the 'pyav' backend; falling back to 'pyav'.")
+        # return_uint8=True does not yield uint8 for depth maps; it only skips the division by 255.
+        return decode_video_frames_pyav(video_path, timestamps, tolerance_s, return_uint8=True, is_depth=True)
+
    if backend is None:
        backend = get_safe_default_video_backend()
    if backend == "torchcodec":
@@ -89,6 +100,7 @@ def decode_video_frames_pyav(
    tolerance_s: float,
    log_loaded_timestamps: bool = False,
    return_uint8: bool = False,
+    is_depth: bool = False,
 ) -> torch.Tensor:
    """Loads frames associated to the requested timestamps of a video using PyAV.

@@ -138,9 +150,13 @@ def decode_video_frames_pyav(
            current_ts = float(frame.pts * stream.time_base)
            if log_loaded_timestamps:
                logger.info(f"frame loaded at timestamp={current_ts:.4f}")
-            # Convert to CHW uint8 to match torchcodec's output layout.
-            arr = frame.to_ndarray(format="rgb24")  # H, W, 3
-            loaded_frames.append(torch.from_numpy(arr).permute(2, 0, 1).contiguous())
+            if is_depth:
+                arr = frame.to_ndarray(format="gray12le")  # (H, W) uint16 array holding 12-bit values
+                loaded_frames.append(torch.from_numpy(arr).unsqueeze(0).contiguous())
+            else:
+                arr = frame.to_ndarray(format="rgb24")  # (H, W, 3)
+                # Convert to CHW uint8 to match torchcodec's output layout.
+                loaded_frames.append(torch.from_numpy(arr).permute(2, 0, 1).contiguous())
            loaded_ts.append(current_ts)
            if current_ts >= last_ts:
                break
@@ -335,17 +351,17 @@ def encode_video_frames(
    imgs_dir: Path | str,
    video_path: Path | str,
    fps: int,
-    camera_encoder: VideoEncoderConfig | None = None,
+    video_encoder: VideoEncoderConfig | None = None,
    encoder_threads: int | None = None,
    *,
    log_level: int | None = av.logging.WARNING,
    overwrite: bool = False,
 ) -> None:
    """More info on ffmpeg arguments tuning on `benchmark/video/README.md`"""
-    if camera_encoder is None:
-        camera_encoder = camera_encoder_defaults()
-    vcodec = camera_encoder.vcodec
-    pix_fmt = camera_encoder.pix_fmt
+    if video_encoder is None:
+        video_encoder = camera_encoder_defaults()
+    vcodec = video_encoder.vcodec
+    pix_fmt = video_encoder.pix_fmt

    video_path = Path(video_path)
    imgs_dir = Path(imgs_dir)
@@ -357,7 +373,8 @@ def encode_video_frames(
    video_path.parent.mkdir(parents=True, exist_ok=True)

    # Get input frames
-    template = "frame-" + ("[0-9]" * 6) + ".png"
+    suffix = ".tiff" if isinstance(video_encoder, DepthEncoderConfig) else ".png"
+    template = "frame-" + ("[0-9]" * 6) + suffix
    input_list = sorted(
        glob.glob(str(imgs_dir / template)), key=lambda x: int(x.split("-")[-1].split(".")[0])
    )
@@ -367,7 +384,7 @@ def encode_video_frames(
    with Image.open(input_list[0]) as dummy_image:
        width, height = dummy_image.size

-    video_options = camera_encoder.get_codec_options(encoder_threads, as_strings=True)
+    video_options = video_encoder.get_codec_options(encoder_threads, as_strings=True)

    # Set logging level
    if log_level is not None:
@@ -403,92 +420,6 @@ def encode_video_frames(
        raise OSError(f"Video encoding did not work. File not found: {video_path}.")


-def reencode_video(
-    input_video_path: Path | str,
-    output_video_path: Path | str,
-    camera_encoder: VideoEncoderConfig | None = None,
-    encoder_threads: int | None = None,
-    log_level: int | None = av.logging.WARNING,
-    overwrite: bool = False,
-) -> None:
-    """Re-encode a video file using the given encoder configuration.
-
-    Args:
-        input_video_path: Existing video file to read.
-        output_video_path: Path for the re-encoded file.
-        camera_encoder: Encoder configuration. Defaults to :func:`camera_encoder_defaults`.
-        encoder_threads: Optional thread count forwarded to :meth:`VideoEncoderConfig.get_codec_options`.
-        log_level: libav log level while encoding, or ``None`` to leave logging unchanged. Defaults to WARNING.
-        overwrite: When ``False`` and ``output_video_path`` already exists, skip and log a warning.
-    """
-
-    camera_encoder = camera_encoder or camera_encoder_defaults()
-
-    output_video_path = Path(output_video_path)
-
-    if output_video_path.exists() and not overwrite:
-        logger.warning(f"Video file already exists: {output_video_path}. Skipping re-encode.")
-        return
-
-    output_video_path.parent.mkdir(parents=True, exist_ok=True)
-
-    video_options = camera_encoder.get_codec_options(encoder_threads, as_strings=True)
-    vcodec = camera_encoder.vcodec
-    pix_fmt = camera_encoder.pix_fmt
-
-    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp_named_file:
-        tmp_output_video_path = tmp_named_file.name
-
-    if log_level is not None:
-        logging.getLogger("libav").setLevel(log_level)
-
-    try:
-        with av.open(input_video_path, mode="r") as src:
-            try:
-                in_stream = src.streams.video[0]
-            except IndexError as e:
-                raise ValueError(f"No video stream in {input_video_path}") from e
-
-            fps = (
-                in_stream.base_rate
-            )  # We allow fractional fps though LeRobotDataset only supports integer fps
-            width = int(in_stream.width)
-            height = int(in_stream.height)
-
-            with av.open(
-                tmp_output_video_path,
-                mode="w",
-                options={
-                    "movflags": "faststart"
-                },  # faststart is to move the metadata to the beginning of the file to speed up loading
-            ) as dst:
-                out_stream = dst.add_stream(vcodec, fps, options=video_options)
-                out_stream.pix_fmt = pix_fmt
-                out_stream.width = width
-                out_stream.height = height
-
-                for frame in src.decode(in_stream):
-                    frame = frame.reformat(width=width, height=height, format=pix_fmt)
-                    packet = out_stream.encode(frame)
-                    if packet:
-                        dst.mux(packet)
-
-                packet = out_stream.encode()
-                if packet:
-                    dst.mux(packet)
-
-        shutil.move(tmp_output_video_path, output_video_path)
-    except Exception:
-        Path(tmp_output_video_path).unlink(missing_ok=True)
-        raise
-    finally:
-        if log_level is not None:
-            av.logging.restore_default_callback()
-
-    if not output_video_path.exists():
-        raise OSError(f"Video re-encoding did not work. File not found: {output_video_path}.")
-
-
 def concatenate_video_files(
    input_video_paths: list[Path | str],
    output_video_path: Path,
@@ -605,22 +536,21 @@ class _CameraEncoderThread(threading.Thread):
        self,
        video_path: Path,
        fps: int,
-        vcodec: str,
-        pix_fmt: str,
-        codec_options: dict[str, str],
+        video_encoder: VideoEncoderConfig,
        frame_queue: queue.Queue,
        result_queue: queue.Queue,
        stop_event: threading.Event,
+        encoder_threads: int | None = None,
    ):
        super().__init__(daemon=True)
        self.video_path = video_path
        self.fps = fps
-        self.vcodec = vcodec
-        self.pix_fmt = pix_fmt
-        self.codec_options = codec_options
+        self.video_encoder = video_encoder
+        self.is_depth = isinstance(video_encoder, DepthEncoderConfig)
        self.frame_queue = frame_queue
        self.result_queue = result_queue
        self.stop_event = stop_event
+        self.encoder_threads = encoder_threads

    def run(self) -> None:
        from .compute_stats import RunningQuantileStats, auto_downsample_height_width
@@ -645,12 +575,12 @@ class _CameraEncoderThread(threading.Thread):
                    # Sentinel: flush and close
                    break

-                # Ensure HWC uint8 numpy array
+                # Ensure an HWC numpy array (RGB or depth); only RGB frames are converted to uint8
                if isinstance(frame_data, np.ndarray):
-                    if frame_data.ndim == 3 and frame_data.shape[0] == 3:
+                    if frame_data.ndim == 3 and frame_data.shape[0] in (1, 3):
                        # CHW -> HWC
                        frame_data = frame_data.transpose(1, 2, 0)
-                    if frame_data.dtype != np.uint8:
+                    if not self.is_depth and frame_data.dtype != np.uint8:
                        frame_data = (frame_data * 255).astype(np.uint8)

                # Open container on first frame (to get width/height)
@@ -658,15 +588,29 @@ class _CameraEncoderThread(threading.Thread):
                    height, width = frame_data.shape[:2]
                    Path(self.video_path).parent.mkdir(parents=True, exist_ok=True)
                    container = av.open(str(self.video_path), "w")
-                    output_stream = container.add_stream(self.vcodec, self.fps, options=self.codec_options)
-                    output_stream.pix_fmt = self.pix_fmt
+                    output_stream = container.add_stream(
+                        self.video_encoder.vcodec,
+                        self.fps,
+                        options=self.video_encoder.get_codec_options(self.encoder_threads, as_strings=True),
+                    )
+                    output_stream.pix_fmt = self.video_encoder.pix_fmt
                    output_stream.width = width
                    output_stream.height = height
                    output_stream.time_base = Fraction(1, self.fps)

                # Encode frame with explicit timestamps
-                pil_img = Image.fromarray(frame_data)
-                video_frame = av.VideoFrame.from_image(pil_img)
+                if not self.is_depth:
+                    pil_img = Image.fromarray(frame_data)
+                    video_frame = av.VideoFrame.from_image(pil_img)
+                else:
+                    video_frame = quantize_depth(
+                        frame_data,
+                        depth_min=self.video_encoder.depth_min,
+                        depth_max=self.video_encoder.depth_max,
+                        shift=self.video_encoder.shift,
+                        use_log=self.video_encoder.use_log,
+                        video_backend=self.video_encoder.video_backend,
+                    )
                video_frame.pts = frame_count
                video_frame.time_base = Fraction(1, self.fps)
                packet = output_stream.encode(video_frame)
@@ -725,6 +669,7 @@ class StreamingVideoEncoder:
        self,
        fps: int,
        camera_encoder: VideoEncoderConfig | None = None,
+        depth_encoder: DepthEncoderConfig | None = None,
        queue_maxsize: int = 30,
        encoder_threads: int | None = None,
    ):
@@ -740,6 +685,7 @@ class StreamingVideoEncoder:
        """
        self.fps = fps
        self._camera_encoder = camera_encoder or camera_encoder_defaults()
+        self._depth_encoder = depth_encoder or depth_encoder_defaults()
        self._encoder_threads = encoder_threads
        self.queue_maxsize = queue_maxsize

@@ -752,18 +698,25 @@ class StreamingVideoEncoder:
        self._episode_active = False
        self._closed = False

-    def start_episode(self, video_keys: list[str], temp_dir: Path) -> None:
+    def start_episode(
+        self, video_keys: list[str], temp_dir: Path, depth_video_keys: list[str] | None = None
+    ) -> None:
        """Start encoder threads for a new episode.

        Args:
            video_keys: List of video feature keys (e.g. ["observation.images.laptop"])
            temp_dir: Base directory for temporary MP4 files
+            depth_video_keys: List of video feature keys that carry depth maps (e.g.
+                ["observation.images.laptop_depth"]). Defaults to ``None``, treated as ``[]`` (no depth keys).
        """
        if self._episode_active:
            self.cancel_episode()

        self._dropped_frames.clear()

+        if depth_video_keys is None:
+            depth_video_keys = []
+
        for video_key in video_keys:
            frame_queue: queue.Queue = queue.Queue(maxsize=self.queue_maxsize)
            result_queue: queue.Queue = queue.Queue(maxsize=1)
@@ -772,17 +725,15 @@ class StreamingVideoEncoder:
            temp_video_dir = Path(tempfile.mkdtemp(dir=temp_dir))
            video_path = temp_video_dir / f"{video_key.replace('/', '_')}_streaming.mp4"

-            vcodec = self._camera_encoder.vcodec
-            codec_options = self._camera_encoder.get_codec_options(self._encoder_threads, as_strings=True)
+            encoder = self._depth_encoder if video_key in depth_video_keys else self._camera_encoder
            encoder_thread = _CameraEncoderThread(
                video_path=video_path,
                fps=self.fps,
-                vcodec=vcodec,
-                pix_fmt=self._camera_encoder.pix_fmt,
-                codec_options=codec_options,
+                video_encoder=encoder,
                frame_queue=frame_queue,
                result_queue=result_queue,
                stop_event=stop_event,
+                encoder_threads=self._encoder_threads,
            )
            encoder_thread.start()

@@ -989,13 +940,13 @@ def get_audio_info(video_path: Path | str) -> dict:

 def get_video_info(
    video_path: Path | str,
-    camera_encoder: VideoEncoderConfig | None = None,
+    video_encoder: VideoEncoderConfig | None = None,
 ) -> dict:
    """Build the ``video.*`` / ``audio.*`` info dict persisted in ``info.json``.

    Args:
        video_path: Path to the encoded video file to probe.
-        camera_encoder: If provided, record the exact encoder settings used to encode this
+        video_encoder: If provided, record the exact encoder settings used to encode this
            video. Stream-derived values take precedence — encoder fields are only written for keys
            not already populated from the video file itself.
    """
@@ -1015,13 +966,10 @@ def get_video_info(
        video_info["video.width"] = video_stream.width
        video_info["video.codec"] = video_stream.codec.canonical_name
        video_info["video.pix_fmt"] = video_stream.pix_fmt
-        video_info["video.is_depth_map"] = False

        # Calculate fps from r_frame_rate
        video_info["video.fps"] = int(video_stream.base_rate)
-
-        pixel_channels = get_video_pixel_channels(video_stream.pix_fmt)
-        video_info["video.channels"] = pixel_channels
+        video_info["video.channels"] = get_pix_fmt_channels(video_stream.pix_fmt)

    # Reset logging level
    av.logging.restore_default_callback()
@@ -1030,27 +978,18 @@ def get_video_info(
    video_info.update(**get_audio_info(video_path))

    # Add additional encoder configuration if provided
-    if camera_encoder is not None:
-        for field_name, field_value in asdict(camera_encoder).items():
+    if video_encoder is not None:
+        for field_name, field_value in asdict(video_encoder).items():
            # vcodec is already populated from the video stream
            if field_name == "vcodec":
                continue
            video_info.setdefault(f"video.{field_name}", field_value)

+    video_info["is_depth_map"] = isinstance(video_encoder, DepthEncoderConfig)
+
    return video_info


-def get_video_pixel_channels(pix_fmt: str) -> int:
-    if "gray" in pix_fmt or "depth" in pix_fmt or "monochrome" in pix_fmt:
-        return 1
-    elif "rgba" in pix_fmt or "yuva" in pix_fmt:
-        return 4
-    elif "rgb" in pix_fmt or "yuv" in pix_fmt:
-        return 3
-    else:
-        raise ValueError("Unknown format")
-
-
 def get_video_duration_in_s(video_path: Path | str) -> float:
    """
    Get the duration of a video file in seconds using PyAV.
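Reviewer note on the `get_video_info` change above: the sketch below illustrates the merge order the docstring describes, where stream-derived values win and encoder fields only fill missing keys, and how `is_depth_map` follows from the encoder type. It is a minimal stand-alone approximation, not the patched function. The two dataclasses are hypothetical stand-ins for `VideoEncoderConfig` and `DepthEncoderConfig`, their field defaults are illustrative values only, and `merge_encoder_info` is an assumed helper name.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class VideoEncoderConfig:  # hypothetical stand-in, not the real lerobot class
    vcodec: str = "libsvtav1"  # illustrative default
    pix_fmt: str = "yuv420p"  # illustrative default


@dataclass
class DepthEncoderConfig(VideoEncoderConfig):  # hypothetical stand-in
    depth_min: float = 0.0  # illustrative default
    depth_max: float = 10.0  # illustrative default


def merge_encoder_info(stream_info: dict, video_encoder: VideoEncoderConfig | None) -> dict:
    """Merge encoder settings into info probed from the video stream."""
    info = dict(stream_info)
    if video_encoder is not None:
        for name, value in asdict(video_encoder).items():
            if name == "vcodec":
                # The codec is already recorded from the stream itself.
                continue
            # setdefault keeps any value that was read from the file.
            info.setdefault(f"video.{name}", value)
    # The flag depends only on the encoder type; None yields False.
    info["is_depth_map"] = isinstance(video_encoder, DepthEncoderConfig)
    return info


probed = {"video.codec": "hevc", "video.pix_fmt": "gray12le"}
depth_info = merge_encoder_info(probed, DepthEncoderConfig())
rgb_info = merge_encoder_info(probed, VideoEncoderConfig())
```

In this sketch `depth_info["video.pix_fmt"]` stays at the probed value because `setdefault` never overwrites an existing key, while `video.depth_min` and `video.depth_max` are added from the encoder config.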
--- a/src/lerobot/processor/__init__.py
+++ b/src/lerobot/processor/__init__.py
@@ -95,13 +95,6 @@ from .relative_action_processor import (
 from .rename_processor import RenameObservationsProcessorStep, rename_stats
 from .tokenizer_processor import ActionTokenizerProcessorStep, TokenizerProcessorStep

-# RenderMessagesStep is intentionally NOT re-exported here: it pulls in
-# `lerobot.datasets.language`, which requires the `[dataset]` extra
-# (`datasets`, `pyarrow`). Importing it from the processor package would
-# break every base-install consumer of `lerobot.processor`. Users that
-# need it import directly:
-#   from lerobot.processor.render_messages_processor import RenderMessagesStep
-
 __all__ = [
    "ActionProcessorStep",
    "AddTeleopActionAsComplimentaryDataStep",
--- a/src/lerobot/processor/batch_processor.py
+++ b/src/lerobot/processor/batch_processor.py
@@ -174,24 +174,6 @@ class AddBatchDimensionComplementaryDataStep(ComplementaryDataProcessorStep):
            task_index_value = complementary_data["task_index"]
            if isinstance(task_index_value, Tensor) and task_index_value.dim() == 0:
                complementary_data["task_index"] = task_index_value.unsqueeze(0)
-
-        complementary_data.pop("language_persistent", None)
-        complementary_data.pop("language_events", None)
-
-        if "messages" in complementary_data:
-            messages = complementary_data["messages"]
-            if isinstance(messages, list) and (not messages or isinstance(messages[0], dict)):
-                complementary_data["messages"] = [messages]
-
-        if "message_streams" in complementary_data:
-            streams = complementary_data["message_streams"]
-            if isinstance(streams, list) and (not streams or isinstance(streams[0], str)):
-                complementary_data["message_streams"] = [streams]
-
-        if "target_message_indices" in complementary_data:
-            indices = complementary_data["target_message_indices"]
-            if isinstance(indices, list) and (not indices or isinstance(indices[0], int)):
-                complementary_data["target_message_indices"] = [indices]
        return complementary_data

    def transform_features(
--- a/src/lerobot/processor/converters.py
+++ b/src/lerobot/processor/converters.py
@@ -153,30 +153,26 @@ def from_tensor_to_numpy(x: torch.Tensor | Any) -> np.ndarray | float | int | An
    return x


-_COMPLEMENTARY_KEYS = (
-    "task",
-    "index",
-    "task_index",
-    "episode_index",
-    "timestamp",
-    "language_persistent",
-    "language_events",
-    "messages",
-    "message_streams",
-    "target_message_indices",
-)
-
-
 def _extract_complementary_data(batch: dict[str, Any]) -> dict[str, Any]:
-    """Extract complementary data from a batch dictionary.
+    """
+    Extract complementary data from a batch dictionary.

-    Includes padding flags (any key containing ``_is_pad``) plus the fixed
-    set of metadata / language keys defined in ``_COMPLEMENTARY_KEYS`` —
-    each only when present in ``batch``.
+    This includes padding flags, task description, and indices.
+
+    Args:
+        batch: The batch dictionary.
+
+    Returns:
+        A dictionary with the extracted complementary data.
    """
    pad_keys = {k: v for k, v in batch.items() if "_is_pad" in k}
-    extras = {k: batch[k] for k in _COMPLEMENTARY_KEYS if k in batch}
-    return {**pad_keys, **extras}
+    task_key = {"task": batch["task"]} if "task" in batch else {}
+    subtask_key = {"subtask": batch["subtask"]} if "subtask" in batch else {}
+    index_key = {"index": batch["index"]} if "index" in batch else {}
+    task_index_key = {"task_index": batch["task_index"]} if "task_index" in batch else {}
+    episode_index_key = {"episode_index": batch["episode_index"]} if "episode_index" in batch else {}
+
+    return {**pad_keys, **task_key, **subtask_key, **index_key, **task_index_key, **episode_index_key}


 def create_transition(
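Reviewer note on the `_extract_complementary_data` change above: the rewritten function forwards the padding flags plus `task`, `subtask`, `index`, `task_index`, and `episode_index`, each only when present. Compared with the removed `_COMPLEMENTARY_KEYS` tuple, `timestamp` and the language/message keys are no longer forwarded. The sketch below is a compact equivalent written for illustration, assuming plain dict batches; the function name without the leading underscore and the `FORWARDED_KEYS` tuple are hypothetical and do not appear in the patch.

```python
from typing import Any

# Keys forwarded by the new version of the function, in the same order.
FORWARDED_KEYS = ("task", "subtask", "index", "task_index", "episode_index")


def extract_complementary_data(batch: dict[str, Any]) -> dict[str, Any]:
    """Collect padding flags and the fixed metadata keys that are present."""
    # Any key containing "_is_pad" is treated as a padding flag.
    pad_keys = {k: v for k, v in batch.items() if "_is_pad" in k}
    # Fixed metadata keys are copied only when the batch contains them.
    named = {k: batch[k] for k in FORWARDED_KEYS if k in batch}
    return {**pad_keys, **named}


batch = {
    "action_is_pad": [False],
    "task": "pick",
    "subtask": "grasp",
    "timestamp": 0.5,  # not forwarded after this change
    "observation.state": [0.0],
    "index": 3,
}
extracted = extract_complementary_data(batch)
```

Any step that reads `timestamp` from complementary data would need another source for it once this change lands.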
--- a/src/lerobot/processor/render_messages_processor.py
+++ b/src/lerobot/processor/render_messages_processor.py
@@ -1,84 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import annotations
-
-from dataclasses import dataclass
-from typing import Any
-
-from lerobot.configs import PipelineFeatureType, PolicyFeature
-from lerobot.configs.recipe import TrainingRecipe
-from lerobot.datasets.language import LANGUAGE_EVENTS, LANGUAGE_PERSISTENT
-from lerobot.datasets.language_render import render_sample
-from lerobot.types import EnvTransition, TransitionKey
-from lerobot.utils.utils import unwrap_scalar
-
-from .pipeline import ProcessorStep, ProcessorStepRegistry
-
-
-@dataclass
-@ProcessorStepRegistry.register(name="render_messages_processor")
-class RenderMessagesStep(ProcessorStep):
-    """Processor step that turns raw language columns into rendered chat messages.
-
-    Reads ``language_persistent`` and ``language_events`` from the transition's
-    complementary data, renders them through ``recipe`` at the sample timestamp,
-    and replaces the raw columns with the resulting ``messages`` /
-    ``message_streams`` / ``target_message_indices`` keys.
-    """
-
-    recipe: TrainingRecipe
-    dataset_ctx: Any | None = None
-
-    def __call__(self, transition: EnvTransition) -> EnvTransition | None:
-        """Render messages for a single transition; return ``None`` to drop it."""
-        complementary_data = transition.get(TransitionKey.COMPLEMENTARY_DATA) or {}
-        persistent = complementary_data.get(LANGUAGE_PERSISTENT) or []
-        events = complementary_data.get(LANGUAGE_EVENTS) or []
-
-        if not persistent and not events:
-            return transition
-
-        timestamp = complementary_data.get("timestamp")
-        if timestamp is None:
-            raise KeyError("RenderMessagesStep requires sample timestamp in complementary data.")
-
-        sample_idx = complementary_data.get("index", 0)
-        rendered = render_sample(
-            recipe=self.recipe,
-            persistent=persistent,
-            events=events,
-            t=unwrap_scalar(timestamp),
-            sample_idx=int(unwrap_scalar(sample_idx)),
-            task=complementary_data.get("task"),
-            dataset_ctx=self.dataset_ctx,
-        )
-        if rendered is None:
-            return None
-
-        new_transition = transition.copy()
-        new_complementary_data = dict(complementary_data)
-        new_complementary_data.pop(LANGUAGE_PERSISTENT, None)
-        new_complementary_data.pop(LANGUAGE_EVENTS, None)
-        new_complementary_data.update(rendered)
-        new_transition[TransitionKey.COMPLEMENTARY_DATA] = new_complementary_data
-        return new_transition
-
-    def transform_features(
-        self, features: dict[PipelineFeatureType, dict[str, PolicyFeature]]
-    ) -> dict[PipelineFeatureType, dict[str, PolicyFeature]]:
-        """Pass features through unchanged; rendering only touches complementary data."""
-        return features
--- a/src/lerobot/rewards/__init__.py
+++ b/src/lerobot/rewards/__init__.py
@@ -21,13 +21,11 @@ from .factory import (
 )
 from .pretrained import PreTrainedRewardModel as PreTrainedRewardModel
 from .sarm.configuration_sarm import SARMConfig as SARMConfig
-from .topreward.configuration_topreward import TOPRewardConfig as TOPRewardConfig

 __all__ = [
    # Configuration classes
    "RewardClassifierConfig",
    "SARMConfig",
-    "TOPRewardConfig",
    # Base class
    "PreTrainedRewardModel",
    # Factory functions
--- a/src/lerobot/rewards/factory.py
+++ b/src/lerobot/rewards/factory.py
@@ -26,7 +26,6 @@ from lerobot.processor import PolicyAction, PolicyProcessorPipeline
 from .classifier.configuration_classifier import RewardClassifierConfig
 from .pretrained import PreTrainedRewardModel
 from .sarm.configuration_sarm import SARMConfig
-from .topreward.configuration_topreward import TOPRewardConfig


 def get_reward_model_class(name: str) -> type[PreTrainedRewardModel]:
@@ -38,7 +37,7 @@ def get_reward_model_class(name: str) -> type[PreTrainedRewardModel]:

    Args:
        name: The name of the reward model. Supported names are "reward_classifier",
-              "sarm", "topreward".
+              "sarm".

    Returns:
        The reward model class corresponding to the given name.
@@ -54,10 +53,6 @@ def get_reward_model_class(name: str) -> type[PreTrainedRewardModel]:
        from lerobot.rewards.sarm.modeling_sarm import SARMRewardModel

        return SARMRewardModel
-    elif name == "topreward":
-        from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
-
-        return TOPRewardModel
    else:
        try:
            return _get_reward_model_cls_from_name(name=name)
@@ -74,7 +69,7 @@ def make_reward_model_config(reward_type: str, **kwargs) -> RewardModelConfig:

    Args:
        reward_type: The type of the reward model. Supported types include
-                     "reward_classifier", "sarm", "topreward".
+                     "reward_classifier", "sarm".
        **kwargs: Keyword arguments to be passed to the configuration class constructor.

    Returns:
@@ -87,8 +82,6 @@ def make_reward_model_config(reward_type: str, **kwargs) -> RewardModelConfig:
        return RewardClassifierConfig(**kwargs)
    elif reward_type == "sarm":
        return SARMConfig(**kwargs)
-    elif reward_type == "topreward":
-        return TOPRewardConfig(**kwargs)
    else:
        try:
            config_cls = RewardModelConfig.get_choice_class(reward_type)
@@ -169,14 +162,6 @@ def make_reward_pre_post_processors(
            dataset_meta=kwargs.get("dataset_meta"),
        )

-    elif isinstance(reward_cfg, TOPRewardConfig):
-        from lerobot.rewards.topreward.processor_topreward import make_topreward_pre_post_processors
-
-        return make_topreward_pre_post_processors(
-            config=reward_cfg,
-            dataset_stats=kwargs.get("dataset_stats"),
-        )
-
    else:
        try:
            processors = _make_processors_from_reward_model_config(
--- a/src/lerobot/rewards/topreward/__init__.py
+++ b/src/lerobot/rewards/topreward/__init__.py
@@ -1,19 +0,0 @@
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from .configuration_topreward import TOPRewardConfig
-from .modeling_topreward import TOPRewardModel
-from .processor_topreward import make_topreward_pre_post_processors
-
-__all__ = ["TOPRewardConfig", "TOPRewardModel", "make_topreward_pre_post_processors"]
--- a/src/lerobot/rewards/topreward/compute_rabc_weights.py
+++ b/src/lerobot/rewards/topreward/compute_rabc_weights.py
@@ -1,353 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Compute per-frame TOPReward progress curves for a LeRobot dataset.
-
-For each episode, scores trajectory prefixes of increasing length using
-the TOPReward reward model, min-max normalises the raw log-prob rewards per episode,
-and writes a parquet file with one row per frame.
-
-The parquet uses the same schema as SARM's :mod:`lerobot.rewards.sarm.compute_rabc_weights`.
-
-Usage:
-    # Sparse-dense mode (15 anchors per episode, matches upstream)
-    python -m lerobot.rewards.topreward.compute_rabc_weights \\
-        --dataset-repo-id lerobot/libero_10_image \\
-        --num-samples 15
-
-    # Use a different VLM backbone
-    python -m lerobot.rewards.topreward.compute_rabc_weights \\
-        --dataset-repo-id lerobot/libero_10_image \\
-        --vlm-name Qwen/Qwen3-VL-4B-Instruct
-"""
-
-from __future__ import annotations
-
-import argparse
-import logging
-from pathlib import Path
-from typing import Any
-
-import numpy as np
-import pyarrow as pa
-import pyarrow.parquet as pq
-import torch
-from tqdm import tqdm
-
-from lerobot.datasets import LeRobotDataset
-from lerobot.rewards.topreward.configuration_topreward import TOPRewardConfig
-from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
-from lerobot.rewards.topreward.processor_topreward import TOPRewardEncoderProcessorStep
-from lerobot.types import TransitionKey
-
-DEFAULT_OUTPUT_FILENAME = "topreward_progress.parquet"
-
-
-def get_reward_model_path_from_parquet(parquet_path: Path) -> str | None:
-    """Read ``reward_model_path`` from parquet metadata if available."""
-    if not parquet_path.exists():
-        return None
-    try:
-        metadata = pq.read_metadata(parquet_path).schema.to_arrow_schema().metadata
-        if metadata and b"reward_model_path" in metadata:
-            return metadata[b"reward_model_path"].decode()
-    except Exception:  # nosec B110
-        return None
-    return None
-
-
-def _resolve_task(sample: dict[str, Any], default: str) -> str:
-    """Best-effort task extraction from a dataset sample."""
-    task = sample.get("task")
-    if isinstance(task, str) and task:
-        return task
-    return default
-
-
-def normalize_rewards(rewards: list[float] | np.ndarray) -> np.ndarray:
-    """Min-max normalise raw log-prob rewards into ``[0, 1]``."""
-    rewards_arr = np.asarray(rewards, dtype=np.float64)
-    if rewards_arr.size == 0:
-        return rewards_arr.astype(np.float32)
-    if rewards_arr.size == 1:
-        return np.array([1.0], dtype=np.float32)
-    r_min, r_max = rewards_arr.min(), rewards_arr.max()
-    if r_max == r_min:
-        return np.ones_like(rewards_arr, dtype=np.float32)
-    return ((rewards_arr - r_min) / (r_max - r_min)).astype(np.float32)
-
-
-def compute_instruction_rewards_for_prefixes(
-    model: TOPRewardModel,
-    encoder: TOPRewardEncoderProcessorStep,
-    dataset: LeRobotDataset,
-    ep_start: int,
-    num_frames: int,
-    task: str,
-    image_key: str,
-    num_samples: int | None,
-    device: str,
-) -> np.ndarray:
-    """Score an episode via prefix sweep and return a per-frame normalised curve."""
-    if num_samples is None or num_samples >= num_frames:
-        prefix_lengths = np.arange(1, num_frames + 1, dtype=np.int64)
-    else:
-        prefix_lengths = np.unique(np.linspace(1, num_frames, num_samples).round().astype(np.int64))
-
-    episode_frames = torch.stack([dataset[ep_start + i][image_key] for i in range(num_frames)])
-    rewards: list[float] = []
-    for length in prefix_lengths:
-        frames = episode_frames[: int(length)].unsqueeze(0)  # (1, T, C, H, W)
-
-        transition = {
-            TransitionKey.OBSERVATION: {image_key: frames},
-            TransitionKey.COMPLEMENTARY_DATA: {"task": task},
-        }
-        encoded = encoder(transition)
-        obs = encoded[TransitionKey.OBSERVATION]
-        batch = {
-            key: value.to(device) if isinstance(value, torch.Tensor) else value for key, value in obs.items()
-        }
-
-        with torch.no_grad():
-            reward = model.compute_reward(batch)
-        rewards.append(float(reward.item()))
-
-    normalized_rewards = normalize_rewards(rewards)
-
-    if prefix_lengths.shape[0] == num_frames:
-        return normalized_rewards
-
-    return np.interp(
-        np.arange(1, num_frames + 1, dtype=np.float64),
-        prefix_lengths.astype(np.float64),
-        normalized_rewards.astype(np.float64),
-    ).astype(np.float32)
-
-
-def compute_topreward_progress(
-    dataset_repo_id: str,
-    reward_model_path: str | None = None,
-    vlm_name: str | None = None,
-    output_path: str | None = None,
-    device: str = "cuda",
-    num_samples: int | None = None,
-    fps: float | None = None,
-    episodes: list[int] | None = None,
-) -> Path:
-    """Run TOPReward over a dataset and write per-frame progress."""
-    if reward_model_path is not None:
-        logging.info(f"Loading TOPReward config from: {reward_model_path}")
-        model = TOPRewardModel.from_pretrained(reward_model_path)
-        config = model.config
-        config.device = device
-        if vlm_name is not None and vlm_name != config.vlm_name:
-            logging.info(f"Overriding vlm_name from config: {config.vlm_name} -> {vlm_name}")
-            config.vlm_name = vlm_name
-            model = TOPRewardModel(config)
-    else:
-        config_kwargs: dict[str, Any] = {"device": device}
-        if vlm_name is not None:
-            config_kwargs["vlm_name"] = vlm_name
-        if fps is not None:
-            config_kwargs["fps"] = fps
-        config = TOPRewardConfig(**config_kwargs)
-        logging.info(f"Constructing TOPReward with VLM: {config.vlm_name}")
-        model = TOPRewardModel(config)
-
-    model.to(device).eval()
-
-    encoder = TOPRewardEncoderProcessorStep(
-        vlm_name=config.vlm_name,
-        image_key=config.image_key,
-        task_key=config.task_key,
-        default_task=config.default_task,
-        max_frames=None,  # no tail-crop: we control prefix length explicitly
-        fps=config.fps,
-        prompt_prefix=config.prompt_prefix,
-        prompt_suffix_template=config.prompt_suffix_template,
-        add_chat_template=config.add_chat_template,
-        max_length=config.max_input_length,
-    )
-
-    image_key = config.image_key
-
-    logging.info(f"Loading dataset: {dataset_repo_id}")
-    dataset = LeRobotDataset(dataset_repo_id, download_videos=True)
-    logging.info(f"Dataset: {dataset.num_episodes} episodes, {dataset.num_frames} frames")
-
-    episode_indices = list(range(dataset.num_episodes)) if episodes is None else episodes
-    logging.info(f"Processing {len(episode_indices)} episode(s)")
-
-    all_index: list[int] = []
-    all_episode: list[int] = []
-    all_frame: list[int] = []
-    all_progress: list[float] = []
-
-    for episode_idx in tqdm(episode_indices, desc="Episodes"):
-        ep = dataset.meta.episodes[episode_idx]
-        ep_start = int(ep["dataset_from_index"])
-        ep_end = int(ep["dataset_to_index"])
-        num_frames = ep_end - ep_start
-        if num_frames <= 0:
-            continue
-
-        first_sample = dataset[ep_start]
-        task = _resolve_task(first_sample, default=config.default_task or "perform the task")
-
-        per_frame = compute_instruction_rewards_for_prefixes(
-            model=model,
-            encoder=encoder,
-            dataset=dataset,
-            ep_start=ep_start,
-            num_frames=num_frames,
-            task=task,
-            image_key=image_key,
-            num_samples=num_samples,
-            device=device,
-        )
-
-        for local in range(num_frames):
-            all_index.append(ep_start + local)
-            all_episode.append(episode_idx)
-            all_frame.append(local)
-            all_progress.append(float(per_frame[local]))
-
-        if device.startswith("cuda"):
-            torch.cuda.empty_cache()
-
-    table = pa.table(
-        {
-            "index": np.asarray(all_index, dtype=np.int64),
-            "episode_index": np.asarray(all_episode, dtype=np.int64),
-            "frame_index": np.asarray(all_frame, dtype=np.int64),
-            "progress_sparse": np.asarray(all_progress, dtype=np.float32),
-        }
-    )
-
-    schema_metadata: dict[bytes, bytes] = {b"vlm_name": config.vlm_name.encode()}
-    if reward_model_path is not None:
-        schema_metadata[b"reward_model_path"] = reward_model_path.encode()
-    table = table.replace_schema_metadata(schema_metadata)
-
-    out = Path(dataset.root) / DEFAULT_OUTPUT_FILENAME if output_path is None else Path(output_path)
-    out.parent.mkdir(parents=True, exist_ok=True)
-    pq.write_table(table, out)
-    logging.info(f"Saved {len(table)} frame values to {out}")
-
-    progress_arr = np.asarray(all_progress, dtype=np.float32)
-    if progress_arr.size:
-        logging.info(
-            f"Progress: mean={float(progress_arr.mean()):.4f}, "
-            f"std={float(progress_arr.std()):.4f}, "
-            f"min={float(progress_arr.min()):.4f}, "
-            f"max={float(progress_arr.max()):.4f}"
-        )
-    return out
-
-
-def main():
-    parser = argparse.ArgumentParser(
-        description="Compute per-frame TOPReward progress curves for RA-BC weighting.",
-        formatter_class=argparse.RawDescriptionHelpFormatter,
-        epilog="""
-Examples:
-    # Sparse-dense mode (matches upstream TOPReward num_samples=15)
-    python -m lerobot.rewards.topreward.compute_rabc_weights \\
-        --dataset-repo-id lerobot/libero_10_image \\
-        --num-samples 15
-
-    # Use a smaller VLM
-    python -m lerobot.rewards.topreward.compute_rabc_weights \\
-        --dataset-repo-id lerobot/libero_10_image \\
-        --vlm-name Qwen/Qwen3-VL-4B-Instruct
-        """,
-    )
-    parser.add_argument(
-        "--dataset-repo-id", type=str, required=True, help="HuggingFace dataset repo id or local path."
-    )
-    parser.add_argument(
-        "--reward-model-path", type=str, default=None, help="Optional TOPReward LeRobot config."
-    )
-    parser.add_argument("--vlm-name", type=str, default=None, help="Override the VLM backbone (HF Hub id).")
-    parser.add_argument("--output-path", type=str, default=None, help="Output parquet path.")
-    parser.add_argument("--device", type=str, default="cuda", help="Device to use (default: cuda).")
-    parser.add_argument(
-        "--num-samples",
-        type=int,
-        default=None,
-        help="Anchor prefix samples per episode. None = dense. 15 matches upstream.",
-    )
-    parser.add_argument(
-        "--episodes",
-        type=int,
-        nargs="+",
-        default=None,
-        help="Process only these episode indices (e.g. --episodes 0 or --episodes 0 5 10).",
-    )
-    parser.add_argument("--fps", type=float, default=None, help="Override TOPRewardConfig.fps.")
-    parser.add_argument(
-        "--push-to-hub", action="store_true", help="Upload to the dataset repo on HuggingFace Hub."
-    )
-
-    args = parser.parse_args()
-
-    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
-
-    output_path = compute_topreward_progress(
-        dataset_repo_id=args.dataset_repo_id,
-        reward_model_path=args.reward_model_path,
-        vlm_name=args.vlm_name,
-        output_path=args.output_path,
-        device=args.device,
-        num_samples=args.num_samples,
-        fps=args.fps,
-        episodes=args.episodes,
-    )
-
-    print(f"\nTOPReward progress saved to: {output_path}")
-
-    if args.push_to_hub:
-        from huggingface_hub import HfApi
-
-        api = HfApi()
-        hub_path = DEFAULT_OUTPUT_FILENAME
-
-        print(f"\nUploading to Hub: {args.dataset_repo_id}/{hub_path}")
-        api.upload_file(
-            path_or_fileobj=str(output_path),
-            path_in_repo=hub_path,
-            repo_id=args.dataset_repo_id,
-            repo_type="dataset",
-        )
-        print(
-            "Successfully uploaded to: "
-            f"https://huggingface.co/datasets/{args.dataset_repo_id}/blob/main/{hub_path}"
-        )
-
-        print("\nTo use in training, add to your config:")
-        print("  use_rabc: true")
-        print(f"  rabc_progress_path: hf://datasets/{args.dataset_repo_id}/{hub_path}")
-        print("  rabc_head_mode: sparse")
-    else:
-        print("\nTo use in training, add to your config:")
-        print("  use_rabc: true")
-        print(f"  rabc_progress_path: {output_path}")
-        print("  rabc_head_mode: sparse")
-
-
-if __name__ == "__main__":
-    main()
--- a/src/lerobot/rewards/topreward/configuration_topreward.py
+++ b/src/lerobot/rewards/topreward/configuration_topreward.py
@@ -1,146 +0,0 @@
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import annotations
-
-from dataclasses import dataclass, field
-
-from lerobot.configs import FeatureType, NormalizationMode, PolicyFeature
-from lerobot.configs.rewards import RewardModelConfig
-from lerobot.utils.constants import OBS_IMAGES
-
-# Default prompt scaffolding from the upstream TOPReward paper / reference
-# implementation (``QwenClient.compute_instruction_reward``). The prompt
-# scores the terminal ``True`` token in ``f"{instruction} ... True"``
-# given the video.
-DEFAULT_PROMPT_PREFIX = (
-    "The above video shows a robot manipulation trajectory that completes the following task: "
-)
-DEFAULT_PROMPT_SUFFIX_TEMPLATE = (
-    "{instruction} Decide whether the above statement is True or not. The answer is: True"
-)
-
-
-@RewardModelConfig.register_subclass("topreward")
-@dataclass
-class TOPRewardConfig(RewardModelConfig):
-    """Configuration for the TOPReward zero-shot reward model.
-
-    TOPReward is **zero-shot**: it has no learnable parameters of its own.
-    The "model" is a generic vision-language model (default
-    ``Qwen/Qwen3-VL-8B-Instruct``) used with a fixed prompt to extract
-    token log-probabilities as a reward signal. There is therefore no
-    fine-tuned checkpoint to host: ``pretrained_path`` is unused at
-    runtime — the model identity is :attr:`vlm_name` (an HF Hub id).
-
-    Args:
-        vlm_name: Hugging Face Hub id of the underlying VLM. Must be a
-            Qwen3-VL family model (the only client implemented in this
-            LeRobot port).
-        torch_dtype: Torch dtype name passed to the VLM loader
-            (``"auto"``, ``"bfloat16"``, ``"float16"``, ...).
-        attn_implementation: ``transformers`` attention implementation
-            (e.g. ``"flash_attention_2"``, ``"sdpa"``). Defaults to
-            ``None`` so the upstream picks the best available.
-        image_key: Observation key that holds the trajectory frames.
-        task_key: Complementary-data key that holds the task instruction.
-        default_task: Fallback instruction when ``task_key`` is absent.
-        max_frames: Cap on the number of frames fed to the VLM per
-            sample. ``None`` = use all frames.
-        fps: Frames-per-second metadata for the Qwen video processor.
-        prompt_prefix: Text shown to the VLM right after the video and
-            before the suffix template.
-        prompt_suffix_template: Suffix appended after ``prompt_prefix``.
-            Must contain ``{instruction}``; the VLM scores the
-            log-likelihood of the tokens that follow the prefix.
-        add_chat_template: If ``True``, wrap the full prompt with the
-            tokenizer's chat template before tokenisation (matches
-            upstream ``add_chat_template=True``).
-        success_threshold: Optional log-prob threshold. If finite,
-            :meth:`TOPRewardModel.compute_reward` returns
-            ``(reward > success_threshold).float()`` instead of the raw
-            log-prob.
-        max_input_length: Hard limit on the total tokenized input length;
-            samples that exceed it raise a ``ValueError``.
-    """
-
-    # Path to a local LeRobot dir or HF repo that holds a ``config.json``
-    # snapshot of this TOPRewardConfig. The VLM weights themselves are
-    # always identified by ``vlm_name``.
-    pretrained_path: str | None = None
-
-    vlm_name: str = "Qwen/Qwen3-VL-8B-Instruct"
-    torch_dtype: str = "auto"
-    attn_implementation: str | None = None
-
-    image_key: str = OBS_IMAGES + ".top"
-    task_key: str = "task"
-    default_task: str | None = None
-    max_frames: int | None = 16
-    fps: float = 2.0
-
-    prompt_prefix: str = DEFAULT_PROMPT_PREFIX
-    prompt_suffix_template: str = DEFAULT_PROMPT_SUFFIX_TEMPLATE
-    add_chat_template: bool = False
-
-    success_threshold: float = float("-inf")
-    max_input_length: int = 32768
-
-    license: str | None = "mit"  # matches upstream TOPReward
-    tags: list[str] | None = field(
-        default_factory=lambda: ["reward-model", "vision-language", "qwen3-vl", "zero-shot"]
-    )
-
-    input_features: dict[str, PolicyFeature] = field(default_factory=dict)
-    output_features: dict[str, PolicyFeature] = field(default_factory=dict)
-    normalization_mapping: dict[str, NormalizationMode] = field(
-        default_factory=lambda: {
-            "VISUAL": NormalizationMode.IDENTITY,
-            "REWARD": NormalizationMode.IDENTITY,
-        }
-    )
-
-    def __post_init__(self) -> None:
-        super().__post_init__()
-        if self.max_frames is not None and self.max_frames < 1:
-            raise ValueError(f"max_frames must be >= 1, got {self.max_frames}")
-        if self.fps <= 0:
-            raise ValueError(f"fps must be > 0, got {self.fps}")
-        if "{instruction}" not in self.prompt_suffix_template:
-            raise ValueError(
-                "prompt_suffix_template must contain `{instruction}` so the model "
-                "scores the log-likelihood of the task suffix."
-            )
-        if self.max_input_length <= 0:
-            raise ValueError(f"max_input_length must be > 0, got {self.max_input_length}")
-
-        if self.image_key not in self.input_features:
-            self.input_features[self.image_key] = PolicyFeature(shape=(3, 224, 224), type=FeatureType.VISUAL)
-        self.output_features.setdefault("reward", PolicyFeature(shape=(1,), type=FeatureType.REWARD))
-
-    @property
-    def observation_delta_indices(self) -> list[int] | None:
-        return None
-
-    @property
-    def action_delta_indices(self) -> None:
-        return None
-
-    @property
-    def reward_delta_indices(self) -> None:
-        return None
-
-    def validate_features(self) -> None:
-        if self.image_key not in self.input_features:
-            raise ValueError(f"TOPReward requires image input feature {self.image_key!r}")
--- a/src/lerobot/rewards/topreward/modeling_topreward.py
+++ b/src/lerobot/rewards/topreward/modeling_topreward.py
@@ -1,238 +0,0 @@
-# Copyright 2026 Shirui Chen, Cole Harrison, Ying-Chun Lee, Angela Jin Yang,
-# Zhongzheng Ren, Lillian J. Ratliff, Jiafei Duan, Dieter Fox, Ranjay Krishna
-# and The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics.
-
-Paper:         https://arxiv.org/abs/2602.19313
-Project:       https://topreward.github.io/webpage/
-Original code: https://github.com/TOPReward/TOPReward
-Backbone:      https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct  (default)
-
-TOPReward is a **zero-shot** reward model: it has no fine-tuned weights of
-its own. Given a video trajectory and a task instruction, it asks an
-off-the-shelf VLM how likely the instruction is, conditioned on the video,
-and returns that log-likelihood as the reward signal.
-
-Inference recipe:
-
-1. The processor builds a chat-style prompt, tokenises it, and emits
-   ``input_ids``, ``attention_mask``, vision tensors, and ``labels``.
-   The processor label-masks everything except the terminal answer token with
-   ``-100``.
-2. Forward the full token sequence through the VLM.
-3. Read the terminal answer token log-probability from the logits as the
-   scalar reward.
-
-With the default ``prompt_suffix_template``, the only unmasked token is the
-literal ``"True"`` at the end — the reward is
-``log P("True" | video + prompt + instruction)``.
-
-This LeRobot port is **inference-only and not trainable** — :meth:`forward`
-is intentionally inherited from :class:`PreTrainedRewardModel` and raises
-``NotImplementedError``, making :attr:`PreTrainedRewardModel.is_trainable`
-return ``False``.
-
-Because the VLM weights live on the Hugging Face Hub under their canonical
-id (``Qwen/Qwen3-VL-8B-Instruct`` etc.) and TOPReward never modifies them,
-:meth:`_save_pretrained` and :meth:`from_pretrained` are overridden so a
-TOPReward LeRobot "checkpoint" is a single ``config.json`` (the VLM is
-re-fetched from the Hub at load time).
-"""
-
-from __future__ import annotations
-
-import builtins
-import logging
-import os
-from pathlib import Path
-from tempfile import TemporaryDirectory
-from typing import TYPE_CHECKING, Any, TypeVar
-
-import numpy as np
-import torch
-from huggingface_hub import HfApi, hf_hub_download
-from huggingface_hub.constants import CONFIG_NAME
-from huggingface_hub.errors import HfHubHTTPError
-from torch import Tensor
-from torch.nn.functional import cross_entropy
-
-from lerobot.configs.rewards import RewardModelConfig
-from lerobot.rewards.pretrained import PreTrainedRewardModel
-from lerobot.rewards.topreward.configuration_topreward import TOPRewardConfig
-from lerobot.rewards.topreward.processor_topreward import TOPREWARD_FEATURE_PREFIX, TOPREWARD_INPUT_KEYS
-from lerobot.utils.import_utils import _transformers_available, require_package
-
-if TYPE_CHECKING:
-    from lerobot.configs.train import TrainPipelineConfig
-
-if TYPE_CHECKING or _transformers_available:
-    from transformers import Qwen3VLForConditionalGeneration
-else:
-    Qwen3VLForConditionalGeneration = None  # type: ignore[assignment]
-
-logger = logging.getLogger(__name__)
-
-T = TypeVar("T", bound="TOPRewardModel")
-
-
-def _torch_dtype(name: str) -> torch.dtype | str:
-    """Resolve a torch dtype name; ``"auto"`` is passed through verbatim."""
-    if name == "auto":
-        return "auto"
-    dtype = getattr(torch, name, None)
-    if isinstance(dtype, torch.dtype):
-        return dtype
-    raise ValueError(f"Unknown torch dtype: {name!r}")
-
-
-class TOPRewardModel(PreTrainedRewardModel):
-    """TOPReward zero-shot reward model."""
-
-    name = "topreward"
-    config_class = TOPRewardConfig
-
-    def __init__(self, config: TOPRewardConfig) -> None:
-        require_package("transformers", extra="topreward")
-        super().__init__(config)
-        self.config = config
-
-        torch_dtype = _torch_dtype(config.torch_dtype)
-        model_kwargs: dict[str, Any] = {"dtype": torch_dtype, "trust_remote_code": True}
-        if config.attn_implementation is not None:
-            model_kwargs["attn_implementation"] = config.attn_implementation
-
-        self.model = Qwen3VLForConditionalGeneration.from_pretrained(config.vlm_name, **model_kwargs)
-
-    def compute_reward(self, batch: dict[str, Any]) -> Tensor:
-        """Return one log-prob reward per sample in the batch."""
-        inputs: dict[str, Any] = {}
-        for key in TOPREWARD_INPUT_KEYS:
-            batch_key = f"{TOPREWARD_FEATURE_PREFIX}{key}"
-            if batch_key not in batch:
-                raise KeyError(
-                    f"TOPReward batch missing `{batch_key}`. Make sure the "
-                    "TOPRewardEncoderProcessorStep ran before `compute_reward`."
-                )
-            inputs[key] = batch[batch_key]
-
-        device = next(self.model.parameters()).device
-        inputs = {key: value.to(device) if hasattr(value, "to") else value for key, value in inputs.items()}
-        labels = inputs.pop("labels")
-        inputs["logits_to_keep"] = 2
-
-        self.eval()
-        with torch.no_grad():
-            outputs = self.model(**inputs)
-        logits = outputs.logits
-        rewards = -cross_entropy(logits[:, -2, :].float(), labels[:, -1], reduction="none")
-        if np.isfinite(self.config.success_threshold):
-            rewards = (rewards > self.config.success_threshold).float()
-        return rewards.to(self.config.device or "cpu")
-
-    def _save_pretrained(self, save_directory: Path) -> None:
-        """Save ``config.json`` only."""
-        self.config._save_pretrained(save_directory)
-
-    @classmethod
-    def from_pretrained(
-        cls: builtins.type[T],
-        pretrained_name_or_path: str | Path,
-        *,
-        config: RewardModelConfig | None = None,
-        force_download: bool = False,
-        resume_download: bool | None = None,
-        proxies: dict | None = None,
-        token: str | bool | None = None,
-        cache_dir: str | Path | None = None,
-        local_files_only: bool = False,
-        revision: str | None = None,
-        strict: bool = False,  # noqa: ARG003 — accepted for API parity; unused (no safetensors to load)
-        **kwargs: Any,
-    ) -> T:
-        """Load a TOPReward configuration and instantiate the wrapped VLM."""
-        if config is None:
-            config = RewardModelConfig.from_pretrained(
-                pretrained_name_or_path=pretrained_name_or_path,
-                force_download=force_download,
-                resume_download=resume_download,
-                proxies=proxies,
-                token=token,
-                cache_dir=cache_dir,
-                local_files_only=local_files_only,
-                revision=revision,
-                **kwargs,
-            )
-        if not isinstance(config, TOPRewardConfig):
-            raise TypeError(
-                f"Expected a TOPRewardConfig, got {type(config).__name__}. Make sure "
-                f"`pretrained_name_or_path={pretrained_name_or_path!r}` points at a "
-                "TOPReward checkpoint."
-            )
-
-        model_id = str(pretrained_name_or_path)
-        if not os.path.isdir(model_id):
-            try:
-                hf_hub_download(
-                    repo_id=model_id,
-                    filename=CONFIG_NAME,
-                    revision=revision,
-                    cache_dir=cache_dir,
-                    force_download=force_download,
-                    proxies=proxies,
-                    resume_download=resume_download,
-                    token=token,
-                    local_files_only=local_files_only,
-                )
-            except HfHubHTTPError as e:
-                raise FileNotFoundError(
-                    f"{CONFIG_NAME} not found on the HuggingFace Hub in {model_id}"
-                ) from e
-
-        instance = cls(config, **kwargs)
-        instance.to(config.device)
-        instance.eval()
-        return instance
-
-    def push_model_to_hub(self, cfg: TrainPipelineConfig):
-        """Push the TOPReward ``config.json`` + model card to the Hub."""
-        api = HfApi()
-        repo_id = api.create_repo(
-            repo_id=self.config.repo_id, private=self.config.private, exist_ok=True
-        ).repo_id
-
-        with TemporaryDirectory(ignore_cleanup_errors=True) as tmp:
-            saved_path = Path(tmp) / repo_id
-            saved_path.mkdir(parents=True, exist_ok=True)
-
-            self.config._save_pretrained(saved_path)
-
-            card = self.generate_model_card(
-                cfg.dataset.repo_id, self.config.type, self.config.license, self.config.tags
-            )
-            card.save(str(saved_path / "README.md"))
-
-            cfg.save_pretrained(saved_path)
-
-            commit_info = api.upload_folder(
-                repo_id=repo_id,
-                repo_type="model",
-                folder_path=saved_path,
-                commit_message="Upload TOPReward config and readme",
-                allow_patterns=["*.json", "*.yaml", "*.md"],
-                ignore_patterns=["*.tmp", "*.log", "*.safetensors"],
-            )
-
-            logger.info(f"Model pushed to {commit_info.repo_url.url}")
--- a/src/lerobot/rewards/topreward/processor_topreward.py
+++ b/src/lerobot/rewards/topreward/processor_topreward.py
@@ -1,305 +0,0 @@
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""TOPReward pre/post processing pipeline."""
-
-from __future__ import annotations
-
-from dataclasses import dataclass, field
-from typing import TYPE_CHECKING, Any
-
-import torch
-from torch import Tensor
-
-from lerobot.configs import PipelineFeatureType, PolicyFeature
-from lerobot.processor import (
-    AddBatchDimensionProcessorStep,
-    DeviceProcessorStep,
-    PolicyAction,
-    PolicyProcessorPipeline,
-    ProcessorStep,
-    ProcessorStepRegistry,
-    policy_action_to_transition,
-)
-from lerobot.rewards.topreward.configuration_topreward import (
-    DEFAULT_PROMPT_PREFIX,
-    DEFAULT_PROMPT_SUFFIX_TEMPLATE,
-    TOPRewardConfig,
-)
-from lerobot.types import EnvTransition, TransitionKey
-from lerobot.utils.constants import (
-    OBS_IMAGES,
-    OBS_PREFIX,
-    POLICY_POSTPROCESSOR_DEFAULT_NAME,
-    POLICY_PREPROCESSOR_DEFAULT_NAME,
-)
-from lerobot.utils.import_utils import _transformers_available, require_package
-
-if TYPE_CHECKING or _transformers_available:
-    from transformers import AutoProcessor
-else:
-    AutoProcessor = None
-
-TOPREWARD_FEATURE_PREFIX = f"{OBS_PREFIX}topreward."
-
-_TRUE_ANSWER = "True"
-
-TOPREWARD_VLM_INPUT_KEYS = (
-    "input_ids",
-    "attention_mask",
-    "pixel_values_videos",
-    "video_grid_thw",
-    "mm_token_type_ids",
-)
-TOPREWARD_INPUT_KEYS = TOPREWARD_VLM_INPUT_KEYS + ("labels",)
-
-
-def _prepare_video_batch(video: Tensor, *, max_frames: int | None) -> Tensor:
-    """Return videos as ``(B, T, C, H, W)`` uint8 tensors for Qwen3-VL."""
-    if video.ndim == 4:
-        video = video.unsqueeze(1)
-    elif video.ndim != 5:
-        raise ValueError(
-            f"Expected TOPReward frames with shape (B,C,H,W) or (B,T,C,H,W); got {tuple(video.shape)}"
-        )
-
-    if max_frames is not None:
-        video = video[:, -max_frames:]
-    if video.shape[-1] in (1, 3):
-        video = video.permute(0, 1, 4, 2, 3)
-    elif video.shape[2] not in (1, 3):
-        raise ValueError(f"Expected channel dim of size 1 or 3, got shape {tuple(video.shape)}")
-
-    if video.is_floating_point():
-        video = video * 255.0
-
-    return video.clamp(0, 255).to(torch.uint8).contiguous()
-
-
-def _expand_tasks(task: Any, *, batch_size: int, default: str | None) -> list[str]:
-    if task is None:
-        task = default
-    if task is None:
-        raise KeyError("TOPReward expected a task description in complementary data")
-    if isinstance(task, str):
-        return [task] * batch_size
-    if isinstance(task, tuple):
-        task = list(task)
-    if not (isinstance(task, list) and all(isinstance(item, str) for item in task)):
-        raise TypeError(f"TOPReward task must be a string or list of strings, got {type(task)}")
-    if len(task) == 1 and batch_size > 1:
-        return task * batch_size
-    if len(task) != batch_size:
-        raise ValueError(f"Expected {batch_size} tasks, got {len(task)}")
-    return task
-
-
-@dataclass
-@ProcessorStepRegistry.register(name="topreward_encoder")
-class TOPRewardEncoderProcessorStep(ProcessorStep):
-    """Encode raw frames + task into Qwen-VL tensors for the TOPReward model.
-
-    Loads a :class:`~transformers.AutoProcessor` matching ``vlm_name`` and
-    builds the full chat prompt including the instruction suffix. The
-    resulting ``input_ids``, ``attention_mask``, vision tensors, and
-    ``labels`` are written under the ``observation.topreward.*`` namespace
-    so the model can score without re-tokenising.
-
-    At call time the step reads:
-
-    - ``observation[image_key]``: ``(B, T, C, H, W)`` or ``(B, C, H, W)`` frames.
-    - ``complementary_data[task_key]``: a string or list of strings.
-
-    and writes ``observation[f"{TOPREWARD_FEATURE_PREFIX}<name>"]`` for the
-    Qwen-VL tensors plus ``labels``.
-    """
-
-    vlm_name: str = "Qwen/Qwen3-VL-8B-Instruct"
-    image_key: str = OBS_IMAGES + ".top"
-    task_key: str = "task"
-    default_task: str | None = None
-    max_frames: int | None = 16
-    fps: float = 2.0
-    prompt_prefix: str = DEFAULT_PROMPT_PREFIX
-    prompt_suffix_template: str = DEFAULT_PROMPT_SUFFIX_TEMPLATE
-    add_chat_template: bool = False
-    max_length: int = 32768
-
-    _processor: Any = field(default=None, init=False, repr=False)
-
-    def __post_init__(self) -> None:
-        require_package("transformers", extra="topreward")
-        self._processor = AutoProcessor.from_pretrained(self.vlm_name, trust_remote_code=True)
-
-    def __call__(self, transition: EnvTransition) -> EnvTransition:
-        observation = transition.get(TransitionKey.OBSERVATION)
-        complementary = transition.get(TransitionKey.COMPLEMENTARY_DATA) or {}
-        if self.image_key not in observation:
-            raise KeyError(f"TOPReward expected image key {self.image_key!r} in observation")
-
-        frames = observation[self.image_key]
-        videos = frames.detach().cpu() if isinstance(frames, Tensor) else torch.as_tensor(frames)
-        videos = _prepare_video_batch(videos, max_frames=self.max_frames)
-
-        batch_size = videos.shape[0]
-        tasks = _expand_tasks(
-            complementary.get(self.task_key, self.default_task),
-            batch_size=batch_size,
-            default=self.default_task,
-        )
-
-        encoded = self._encode_batch(videos, tasks, batch_size)
-
-        new_observation = dict(observation)
-        for key, value in encoded.items():
-            new_observation[f"{TOPREWARD_FEATURE_PREFIX}{key}"] = value
-
-        new_transition = transition.copy()
-        new_transition[TransitionKey.OBSERVATION] = new_observation
-        return new_transition
-
-    def _encode_batch(self, videos: Tensor, tasks: list[str], batch_size) -> dict[str, Any]:
-        """Tokenise a batch of (frames, task) pairs into Qwen-VL tensors.
-
-        The loop only builds per-sample chat strings. Tokenisation, padding,
-        video preprocessing, and label construction are batched.
-        """
-
-        texts: list[str] = []
-        video_metadata = [
-            {
-                "total_num_frames": int(videos.shape[1]),
-                "fps": float(self.fps),
-                "frames_indices": list(range(int(videos.shape[1]))),
-            }
-            for _ in range(batch_size)
-        ]
-        eos_token = self._processor.tokenizer.eos_token
-
-        for i in range(batch_size):
-            instruction_suffix = self.prompt_suffix_template.format(instruction=tasks[i])
-            if self.add_chat_template:
-                suffix_for_template = instruction_suffix.removesuffix(_TRUE_ANSWER).rstrip()
-                templated_messages = [
-                    {
-                        "role": "user",
-                        "content": [
-                            {"type": "video", "video": videos[i], "fps": self.fps},
-                            {"type": "text", "text": f"{self.prompt_prefix}{suffix_for_template}"},
-                        ],
-                    }
-                ]
-                prompt_chat = self._processor.apply_chat_template(
-                    templated_messages, tokenize=False, add_generation_prompt=True
-                )
-                full_text = f"{prompt_chat}{_TRUE_ANSWER}"
-            else:
-                user_messages = [
-                    {
-                        "role": "user",
-                        "content": [
-                            {"type": "video", "video": videos[i], "fps": self.fps},
-                            {"type": "text", "text": self.prompt_prefix},
-                        ],
-                    }
-                ]
-                prompt_chat = self._processor.apply_chat_template(
-                    user_messages, tokenize=False, add_generation_prompt=False
-                )
-                if eos_token is not None:
-                    prompt_chat = prompt_chat.split(eos_token)[0]
-                full_text = f"{prompt_chat}{instruction_suffix}"
-
-            texts.append(full_text)
-
-        result = self._processor(
-            text=texts,
-            videos=videos,
-            video_metadata=video_metadata,
-            do_sample_frames=False,
-            padding=True,
-            padding_side="left",
-            return_tensors="pt",
-        )
-        input_ids = result["input_ids"]
-
-        if input_ids.shape[-1] > self.max_length:
-            raise ValueError(
-                f"TOPReward input length {input_ids.shape[-1]} exceeds max_length "
-                f"{self.max_length}; lower `max_frames` or raise `max_length`."
-            )
-
-        labels = torch.full_like(input_ids, -100)
-        labels[:, -1] = input_ids[:, -1]
-        result["labels"] = labels
-        return result
-
-    def transform_features(
-        self, features: dict[PipelineFeatureType, dict[str, PolicyFeature]]
-    ) -> dict[PipelineFeatureType, dict[str, PolicyFeature]]:
-        return features
-
-    def get_config(self) -> dict[str, Any]:
-        return {
-            "vlm_name": self.vlm_name,
-            "image_key": self.image_key,
-            "task_key": self.task_key,
-            "default_task": self.default_task,
-            "max_frames": self.max_frames,
-            "fps": self.fps,
-            "prompt_prefix": self.prompt_prefix,
-            "prompt_suffix_template": self.prompt_suffix_template,
-            "add_chat_template": self.add_chat_template,
-            "max_length": self.max_length,
-        }
-
-
-def make_topreward_pre_post_processors(
-    config: TOPRewardConfig,
-    dataset_stats: dict[str, dict[str, Any]] | None = None,
-) -> tuple[
-    PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
-    PolicyProcessorPipeline[PolicyAction, PolicyAction],
-]:
-    """Pipeline that pre-encodes frames + task into Qwen-VL tensors.
-
-    The preprocessor adds a batch dimension if needed, runs TOPReward's
-    encoder (which tokenises the full prompt and emits ``labels``), and
-    moves everything to the configured device. The postprocessor is
-    the identity since TOPReward outputs a single reward tensor.
-    """
-    preprocessor = PolicyProcessorPipeline[dict[str, Any], dict[str, Any]](
-        steps=[
-            AddBatchDimensionProcessorStep(),
-            TOPRewardEncoderProcessorStep(
-                vlm_name=config.vlm_name,
-                image_key=config.image_key,
-                task_key=config.task_key,
-                default_task=config.default_task,
-                max_frames=config.max_frames,
-                fps=config.fps,
-                prompt_prefix=config.prompt_prefix,
-                prompt_suffix_template=config.prompt_suffix_template,
-                add_chat_template=config.add_chat_template,
-                max_length=config.max_input_length,
-            ),
-            DeviceProcessorStep(device=config.device or "cpu"),
-        ],
-        name=POLICY_PREPROCESSOR_DEFAULT_NAME,
-    )
-    postprocessor = PolicyProcessorPipeline(
-        name=POLICY_POSTPROCESSOR_DEFAULT_NAME,
-        to_transition=policy_action_to_transition,
-    )
-    return preprocessor, postprocessor
--- a/src/lerobot/robots/bi_rebot_b601_follower/__init__.py
+++ b/src/lerobot/robots/bi_rebot_b601_follower/__init__.py
@@ -1,20 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from .bi_rebot_b601_follower import BiRebotB601Follower
-from .config_bi_rebot_b601_follower import BiRebotB601FollowerConfig
-
-__all__ = ["BiRebotB601Follower", "BiRebotB601FollowerConfig"]
--- a/src/lerobot/robots/bi_rebot_b601_follower/bi_rebot_b601_follower.py
+++ b/src/lerobot/robots/bi_rebot_b601_follower/bi_rebot_b601_follower.py
@@ -1,150 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import logging
-from functools import cached_property
-
-from lerobot.types import RobotAction, RobotObservation
-from lerobot.utils.decorators import check_if_already_connected, check_if_not_connected
-
-from ..rebot_b601_follower import RebotB601Follower, RebotB601FollowerRobotConfig
-from ..robot import Robot
-from .config_bi_rebot_b601_follower import BiRebotB601FollowerConfig
-
-logger = logging.getLogger(__name__)
-
-
-class BiRebotB601Follower(Robot):
-    """Bimanual Seeed Studio reBot B601-DM follower.
-
-    Composes two single-arm :class:`RebotB601Follower` instances. Observation and
-    action keys of each arm are namespaced with a ``left_`` / ``right_`` prefix.
-    """
-
-    config_class = BiRebotB601FollowerConfig
-    name = "bi_rebot_b601_follower"
-
-    def __init__(self, config: BiRebotB601FollowerConfig):
-        super().__init__(config)
-        self.config = config
-
-        left_arm_config = RebotB601FollowerRobotConfig(
-            id=f"{config.id}_left" if config.id else None,
-            calibration_dir=config.calibration_dir,
-            port=config.left_arm_config.port,
-            can_adapter=config.left_arm_config.can_adapter,
-            dm_serial_baud=config.left_arm_config.dm_serial_baud,
-            disable_torque_on_disconnect=config.left_arm_config.disable_torque_on_disconnect,
-            max_relative_target=config.left_arm_config.max_relative_target,
-            cameras=config.left_arm_config.cameras,
-            motor_can_ids=config.left_arm_config.motor_can_ids,
-            pos_vel_velocity=config.left_arm_config.pos_vel_velocity,
-            gripper_torque_ratio=config.left_arm_config.gripper_torque_ratio,
-            joint_limits=config.left_arm_config.joint_limits,
-        )
-
-        right_arm_config = RebotB601FollowerRobotConfig(
-            id=f"{config.id}_right" if config.id else None,
-            calibration_dir=config.calibration_dir,
-            port=config.right_arm_config.port,
-            can_adapter=config.right_arm_config.can_adapter,
-            dm_serial_baud=config.right_arm_config.dm_serial_baud,
-            disable_torque_on_disconnect=config.right_arm_config.disable_torque_on_disconnect,
-            max_relative_target=config.right_arm_config.max_relative_target,
-            cameras=config.right_arm_config.cameras,
-            motor_can_ids=config.right_arm_config.motor_can_ids,
-            pos_vel_velocity=config.right_arm_config.pos_vel_velocity,
-            gripper_torque_ratio=config.right_arm_config.gripper_torque_ratio,
-            joint_limits=config.right_arm_config.joint_limits,
-        )
-
-        self.left_arm = RebotB601Follower(left_arm_config)
-        self.right_arm = RebotB601Follower(right_arm_config)
-
-        # Only for compatibility with parts of the codebase that expect `robot.cameras`.
-        self.cameras = {**self.left_arm.cameras, **self.right_arm.cameras}
-
-    @property
-    def _motors_ft(self) -> dict[str, type]:
-        return {
-            **{f"left_{k}": v for k, v in self.left_arm._motors_ft.items()},
-            **{f"right_{k}": v for k, v in self.right_arm._motors_ft.items()},
-        }
-
-    @property
-    def _cameras_ft(self) -> dict[str, tuple]:
-        return {
-            **{f"left_{k}": v for k, v in self.left_arm._cameras_ft.items()},
-            **{f"right_{k}": v for k, v in self.right_arm._cameras_ft.items()},
-        }
-
-    @cached_property
-    def observation_features(self) -> dict[str, type | tuple]:
-        return {**self._motors_ft, **self._cameras_ft}
-
-    @cached_property
-    def action_features(self) -> dict[str, type]:
-        return self._motors_ft
-
-    @property
-    def is_connected(self) -> bool:
-        return self.left_arm.is_connected and self.right_arm.is_connected
-
-    @check_if_already_connected
-    def connect(self, calibrate: bool = True) -> None:
-        self.left_arm.connect(calibrate)
-        self.right_arm.connect(calibrate)
-
-    @property
-    def is_calibrated(self) -> bool:
-        return self.left_arm.is_calibrated and self.right_arm.is_calibrated
-
-    def calibrate(self) -> None:
-        self.left_arm.calibrate()
-        self.right_arm.calibrate()
-
-    def configure(self) -> None:
-        self.left_arm.configure()
-        self.right_arm.configure()
-
-    @check_if_not_connected
-    def get_observation(self) -> RobotObservation:
-        obs_dict = {}
-        obs_dict.update({f"left_{k}": v for k, v in self.left_arm.get_observation().items()})
-        obs_dict.update({f"right_{k}": v for k, v in self.right_arm.get_observation().items()})
-        return obs_dict
-
-    @check_if_not_connected
-    def send_action(self, action: RobotAction) -> RobotAction:
-        left_action = {
-            key.removeprefix("left_"): value for key, value in action.items() if key.startswith("left_")
-        }
-        right_action = {
-            key.removeprefix("right_"): value for key, value in action.items() if key.startswith("right_")
-        }
-
-        sent_action_left = self.left_arm.send_action(left_action)
-        sent_action_right = self.right_arm.send_action(right_action)
-
-        return {
-            **{f"left_{k}": v for k, v in sent_action_left.items()},
-            **{f"right_{k}": v for k, v in sent_action_right.items()},
-        }
-
-    @check_if_not_connected
-    def disconnect(self) -> None:
-        self.left_arm.disconnect()
-        self.right_arm.disconnect()
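
Review note: the removed bimanual wrapper routes observations and actions purely by key prefix (`left_` / `right_`). A minimal standalone sketch of that pattern follows, assuming plain dicts for actions. The helper names `split_by_prefix` and `add_prefix` are hypothetical and not part of lerobot.

```python
def split_by_prefix(action: dict[str, float], prefix: str) -> dict[str, float]:
    """Keep only the keys that start with `prefix`, and strip the prefix from them."""
    return {k.removeprefix(prefix): v for k, v in action.items() if k.startswith(prefix)}


def add_prefix(values: dict[str, float], prefix: str) -> dict[str, float]:
    """Namespace every key with `prefix` (hypothetical helper, for illustration only)."""
    return {f"{prefix}{k}": v for k, v in values.items()}


# A combined action as the bimanual robot would receive it.
action = {"left_gripper.pos": 1.0, "right_gripper.pos": -2.0, "left_elbow_flex.pos": 3.0}

left = split_by_prefix(action, "left_")    # what the left arm would be sent
right = split_by_prefix(action, "right_")  # what the right arm would be sent

# Re-namespace the per-arm results into one dict, as the wrapper returns them.
merged = {**add_prefix(left, "left_"), **add_prefix(right, "right_")}
```

Keys that carry neither prefix are silently dropped by this scheme, which matches the dict comprehensions in the removed `send_action`.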
--- a/src/lerobot/robots/bi_rebot_b601_follower/config_bi_rebot_b601_follower.py
+++ b/src/lerobot/robots/bi_rebot_b601_follower/config_bi_rebot_b601_follower.py
@@ -1,29 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from dataclasses import dataclass
-
-from ..config import RobotConfig
-from ..rebot_b601_follower import RebotB601FollowerConfig
-
-
-@RobotConfig.register_subclass("bi_rebot_b601_follower")
-@dataclass
-class BiRebotB601FollowerConfig(RobotConfig):
-    """Configuration class for the bimanual reBot B601-DM follower robot."""
-
-    left_arm_config: RebotB601FollowerConfig
-    right_arm_config: RebotB601FollowerConfig
--- a/src/lerobot/robots/rebot_b601_follower/__init__.py
+++ b/src/lerobot/robots/rebot_b601_follower/__init__.py
@@ -1,20 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from .config_rebot_b601_follower import RebotB601FollowerConfig, RebotB601FollowerRobotConfig
-from .rebot_b601_follower import RebotB601Follower
-
-__all__ = ["RebotB601Follower", "RebotB601FollowerConfig", "RebotB601FollowerRobotConfig"]
--- a/src/lerobot/robots/rebot_b601_follower/config_rebot_b601_follower.py
+++ b/src/lerobot/robots/rebot_b601_follower/config_rebot_b601_follower.py
@@ -1,94 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from dataclasses import dataclass, field
-
-from lerobot.cameras import CameraConfig
-
-from ..config import RobotConfig
-
-
-@dataclass
-class RebotB601FollowerConfig:
-    """Base configuration class for the Seeed Studio reBot B601-DM follower arm.
-
-    The B601-DM is a 6-DOF arm plus gripper driven by Damiao CAN motors. Motor
-    communication goes through the ``motorbridge`` package.
-    """
-
-    # Communication port. For ``can_adapter="damiao"`` this is the Damiao serial
-    # bridge device (e.g. "/dev/ttyACM0"); for ``can_adapter="socketcan"`` it is
-    # the CAN channel name (e.g. "can0").
-    port: str
-
-    # CAN adapter type:
-    #   "damiao"    - Damiao dedicated serial bridge (default)
-    #   "socketcan" - SocketCAN based adapters (PCAN, slcan, embedded controllers, ...)
-    can_adapter: str = "damiao"
-
-    # Baud rate for the Damiao serial bridge (only used when can_adapter="damiao").
-    dm_serial_baud: int = 921600
-
-    disable_torque_on_disconnect: bool = True
-
-    # `max_relative_target` limits the magnitude of the relative positional target
-    # vector for safety purposes (in degrees). Set to a positive scalar to apply the
-    # same value to all motors, or to a dict mapping motor names to per-motor values.
-    max_relative_target: float | dict[str, float] | None = None
-
-    # cameras
-    cameras: dict[str, CameraConfig] = field(default_factory=dict)
-
-    # Maps motor names to their (send_can_id, recv_can_id) pair.
-    motor_can_ids: dict[str, tuple[int, int]] = field(
-        default_factory=lambda: {
-            "shoulder_pan": (0x01, 0x11),
-            "shoulder_lift": (0x02, 0x12),
-            "elbow_flex": (0x03, 0x13),
-            "wrist_flex": (0x04, 0x14),
-            "wrist_yaw": (0x05, 0x15),
-            "wrist_roll": (0x06, 0x16),
-            "gripper": (0x07, 0x17),
-        }
-    )
-
-    # Target velocity for joints running in POS_VEL mode, in degrees/s. A scalar is
-    # applied to every joint; a list provides one value per joint (in motor order).
-    pos_vel_velocity: float | list[float] = field(default_factory=lambda: [150.0] * 7)
-
-    # Torque/current ratio for the gripper's FORCE_POS mode, in range [0, 1].
-    gripper_torque_ratio: float = 0.1
-
-    # Soft joint limits (degrees). These are clipped against on every action.
-    joint_limits: dict[str, tuple[float, float]] = field(
-        default_factory=lambda: {
-            "shoulder_pan": (-145.0, 145.0),
-            "shoulder_lift": (-170.0, 1.0),
-            "elbow_flex": (-200.0, 1.0),
-            "wrist_flex": (-80.0, 90.0),
-            "wrist_yaw": (-90.0, 90.0),
-            "wrist_roll": (-90.0, 90.0),
-            "gripper": (-270.0, 0.0),
-        }
-    )
-
-
-@RobotConfig.register_subclass("rebot_b601_follower")
-@dataclass
-class RebotB601FollowerRobotConfig(RobotConfig, RebotB601FollowerConfig):
-    """Registered configuration for the reBot B601-DM follower robot."""
-
-    pass
--- a/src/lerobot/robots/rebot_b601_follower/rebot_b601_follower.py
+++ b/src/lerobot/robots/rebot_b601_follower/rebot_b601_follower.py
@@ -1,289 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import logging
-import math
-import time
-from functools import cached_property
-from typing import TYPE_CHECKING
-
-from lerobot.cameras import make_cameras_from_configs
-from lerobot.motors import MotorCalibration
-from lerobot.types import RobotAction, RobotObservation
-from lerobot.utils.decorators import check_if_already_connected, check_if_not_connected
-from lerobot.utils.import_utils import _motorbridge_available, require_package
-
-from ..robot import Robot
-from ..utils import ensure_safe_goal_position
-from .config_rebot_b601_follower import RebotB601FollowerRobotConfig
-
-if TYPE_CHECKING or _motorbridge_available:
-    from motorbridge import Controller as MotorBridgeController, Mode as MotorBridgeMode
-else:
-    MotorBridgeController = None
-    MotorBridgeMode = None
-
-logger = logging.getLogger(__name__)
-
-# Joint controlled in FORCE_POS mode; every other joint runs in POS_VEL mode.
-GRIPPER_MOTOR = "gripper"
-# Per-joint Damiao motor models for the B601-DM (passed to motorbridge).
-MOTOR_MODELS = {
-    "shoulder_pan": "4340P",
-    "shoulder_lift": "4340P",
-    "elbow_flex": "4340P",
-    "wrist_flex": "4310",
-    "wrist_yaw": "4310",
-    "wrist_roll": "4310",
-    "gripper": "4310",
-}
-_ENSURE_MODE_RETRIES = 9
-_SETTLE_SEC = 0.01
-_ZERO_SETTLE_SEC = 0.1
-
-
-class RebotB601Follower(Robot):
-    """Seeed Studio reBot B601-DM follower arm (6-DOF + gripper, Damiao CAN motors).
-
-    Motor communication is handled by the ``motorbridge`` package over a CAN bus,
-    reached either through a Damiao serial bridge or a SocketCAN adapter.
-    """
-
-    config_class = RebotB601FollowerRobotConfig
-    name = "rebot_b601_follower"
-
-    def __init__(self, config: RebotB601FollowerRobotConfig):
-        require_package("motorbridge", extra="rebot")
-        super().__init__(config)
-        self.config = config
-        self.bus: MotorBridgeController | None = None
-        self.motors: dict = {}
-        self.motor_names = list(config.motor_can_ids.keys())
-        self.cameras = make_cameras_from_configs(config.cameras)
-
-    @property
-    def _motors_ft(self) -> dict[str, type]:
-        return {f"{motor}.pos": float for motor in self.motor_names}
-
-    @property
-    def _cameras_ft(self) -> dict[str, tuple]:
-        return {
-            cam: (self.config.cameras[cam].height, self.config.cameras[cam].width, 3) for cam in self.cameras
-        }
-
-    @cached_property
-    def observation_features(self) -> dict[str, type | tuple]:
-        return {**self._motors_ft, **self._cameras_ft}
-
-    @cached_property
-    def action_features(self) -> dict[str, type]:
-        return self._motors_ft
-
-    @property
-    def is_connected(self) -> bool:
-        return self.bus is not None and all(cam.is_connected for cam in self.cameras.values())
-
-    @check_if_already_connected
-    def connect(self, calibrate: bool = True) -> None:
-        logger.info(f"Connecting {self} on {self.config.port} (adapter={self.config.can_adapter})...")
-        if self.config.can_adapter == "damiao":
-            self.bus = MotorBridgeController.from_dm_serial(
-                serial_port=self.config.port,
-                baud=self.config.dm_serial_baud,
-            )
-        elif self.config.can_adapter == "socketcan":
-            self.bus = MotorBridgeController(channel=self.config.port)
-        else:
-            raise ValueError(
-                f"Unsupported can_adapter '{self.config.can_adapter}'. Use 'damiao' or 'socketcan'."
-            )
-
-        for motor_name, (send_id, recv_id) in self.config.motor_can_ids.items():
-            self.motors[motor_name] = self.bus.add_damiao_motor(send_id, recv_id, MOTOR_MODELS[motor_name])
-
-        if not self.is_calibrated and calibrate:
-            logger.info(
-                "Mismatch between calibration values in the motor and the calibration file or no calibration file found"
-            )
-            self.calibrate()
-
-        for cam in self.cameras.values():
-            cam.connect()
-
-        self.configure()
-        logger.info(f"{self} connected.")
-
-    @property
-    def is_calibrated(self) -> bool:
-        return bool(self.calibration)
-
-    def calibrate(self) -> None:
-        if self.calibration:
-            user_input = input(
-                f"Press ENTER to use provided calibration file associated with the id {self.id}, "
-                "or type 'c' and press ENTER to run calibration: "
-            )
-            if user_input.strip().lower() != "c":
-                logger.info(f"Using calibration file associated with the id {self.id}")
-                return
-
-        logger.info(f"\nRunning calibration of {self}")
-        self.bus.disable_all()
-        print(
-            "\nCalibration: set zero position.\n"
-            "Manually move the reBot B601 to its ZERO POSITION and close the gripper.\n"
-            "See the B601 manual for the zero pose (the default sit-down position).\n"
-        )
-        input("Press ENTER when ready...")
-
-        for motor in self.motors.values():
-            motor.set_zero_position()
-            time.sleep(_ZERO_SETTLE_SEC)
-        logger.info("Arm zero position set.")
-
-        self.calibration = {}
-        for motor_name, (send_id, _recv_id) in self.config.motor_can_ids.items():
-            range_min, range_max = self.config.joint_limits[motor_name]
-            self.calibration[motor_name] = MotorCalibration(
-                id=send_id,
-                drive_mode=0,
-                homing_offset=0,
-                range_min=int(range_min),
-                range_max=int(range_max),
-            )
-
-        self._save_calibration()
-        print(f"Calibration saved to {self.calibration_fpath}")
-
-    def configure(self) -> None:
-        self.bus.enable_all()
-        for motor_name, motor in self.motors.items():
-            target_mode = (
-                MotorBridgeMode.FORCE_POS if motor_name == GRIPPER_MOTOR else MotorBridgeMode.POS_VEL
-            )
-            for attempt in range(_ENSURE_MODE_RETRIES + 1):
-                try:
-                    motor.ensure_mode(target_mode)
-                    break
-                except Exception:
-                    if attempt == _ENSURE_MODE_RETRIES:
-                        raise
-                    time.sleep(_SETTLE_SEC)
-            logger.debug(f"{motor_name} mode set to {target_mode}")
-
-    @check_if_not_connected
-    def disable_torque(self) -> None:
-        """Disable motor torque so the arm can be moved by hand (read-only debugging)."""
-        self.bus.disable_all()
-        logger.info(f"{self} torque disabled.")
-
-    def _present_pos(self) -> dict[str, float]:
-        """Read present joint positions in degrees."""
-        for motor in self.motors.values():
-            motor.request_feedback()
-        try:
-            self.bus.poll_feedback_once()
-        except Exception:
-            logger.warning("CAN bus poll feedback failed.")
-
-        present_pos = {}
-        for motor_name, motor in self.motors.items():
-            state = motor.get_state()
-            present_pos[motor_name] = math.degrees(state.pos) if state is not None else 0.0
-        return present_pos
-
-    @check_if_not_connected
-    def get_observation(self) -> RobotObservation:
-        start = time.perf_counter()
-        obs_dict = {f"{motor}.pos": pos for motor, pos in self._present_pos().items()}
-        dt_ms = (time.perf_counter() - start) * 1e3
-        logger.debug(f"{self} read state: {dt_ms:.1f}ms")
-
-        for cam_key, cam in self.cameras.items():
-            start = time.perf_counter()
-            obs_dict[cam_key] = cam.read_latest()
-            dt_ms = (time.perf_counter() - start) * 1e3
-            logger.debug(f"{self} read {cam_key}: {dt_ms:.1f}ms")
-
-        return obs_dict
-
-    @check_if_not_connected
-    def send_action(self, action: RobotAction) -> RobotAction:
-        """Command the arm to a target joint configuration.
-
-        Positions are expressed in degrees. The relative action magnitude may be
-        clipped depending on `max_relative_target`, so the action actually sent is
-        always returned.
-        """
-        goal_pos = {key.removesuffix(".pos"): val for key, val in action.items() if key.endswith(".pos")}
-
-        # Clip against soft joint limits.
-        for motor_name in list(goal_pos):
-            if motor_name in self.config.joint_limits:
-                min_limit, max_limit = self.config.joint_limits[motor_name]
-                clipped = max(min_limit, min(max_limit, goal_pos[motor_name]))
-                if clipped != goal_pos[motor_name]:
-                    logger.debug(f"Clipped {motor_name} from {goal_pos[motor_name]:.2f} to {clipped:.2f}")
-                goal_pos[motor_name] = clipped
-
-        # Tolerate 6-DOF leaders that have no wrist_yaw joint by holding it at zero.
-        # This is intentional: it lets a 6-DOF leader such as the SO-100 / SO-101
-        # (so100_leader / so101_leader) teleoperate this 7-DOF follower — the missing
-        # wrist_yaw command is simply treated as 0.0 instead of raising.
-        if "wrist_yaw" not in goal_pos:
-            goal_pos["wrist_yaw"] = 0.0
-
-        # Cap relative target when too far from the present position.
-        if self.config.max_relative_target is not None:
-            present_pos = self._present_pos()
-            goal_present_pos = {key: (g, present_pos.get(key, g)) for key, g in goal_pos.items()}
-            goal_pos = ensure_safe_goal_position(goal_present_pos, self.config.max_relative_target)
-
-        for motor_name, position_deg in goal_pos.items():
-            motor = self.motors.get(motor_name)
-            if motor is None:
-                continue
-            idx = self.motor_names.index(motor_name)
-            vel_deg_s = (
-                self.config.pos_vel_velocity[idx]
-                if isinstance(self.config.pos_vel_velocity, list)
-                else self.config.pos_vel_velocity
-            )
-            pos_rad = math.radians(position_deg)
-            vel_rad = math.radians(vel_deg_s)
-            if motor_name == GRIPPER_MOTOR:
-                motor.send_force_pos(pos_rad, vel_rad, self.config.gripper_torque_ratio)
-            else:
-                motor.send_pos_vel(pos_rad, vel_rad)
-
-        return {f"{motor}.pos": val for motor, val in goal_pos.items()}
-
-    @check_if_not_connected
-    def disconnect(self) -> None:
-        for motor in self.motors.values():
-            if self.config.disable_torque_on_disconnect:
-                motor.disable()
-            motor.clear_error()
-            motor.close()
-
-        self.bus.close()
-        self.bus = None
-        self.motors = {}
-
-        for cam in self.cameras.values():
-            cam.disconnect()
-
-        logger.info(f"{self} disconnected.")
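
Review note: the goal-preparation steps in the removed `send_action` (strip the `.pos` suffix, clip against soft joint limits, default a missing `wrist_yaw` to zero) can be sketched in isolation. This is a simplified illustration under stated assumptions, not the removed implementation: `prepare_goal` is a hypothetical name, only three joints are listed, and the `max_relative_target` capping and motor I/O are omitted.

```python
# Subset of the soft joint limits (degrees) from the removed config, for illustration.
JOINT_LIMITS = {
    "shoulder_pan": (-145.0, 145.0),
    "wrist_yaw": (-90.0, 90.0),
    "gripper": (-270.0, 0.0),
}


def prepare_goal(
    action: dict[str, float], joint_limits: dict[str, tuple[float, float]]
) -> dict[str, float]:
    """Hypothetical helper: turn an action dict into clipped per-joint goals."""
    # Keep only "<joint>.pos" entries and drop the suffix.
    goal = {k.removesuffix(".pos"): v for k, v in action.items() if k.endswith(".pos")}

    # Clip each joint that has a configured limit.
    for name, value in list(goal.items()):
        if name in joint_limits:
            low, high = joint_limits[name]
            goal[name] = max(low, min(high, value))

    # A 6-DOF leader sends no wrist_yaw command; hold that joint at zero.
    goal.setdefault("wrist_yaw", 0.0)
    return goal


goal = prepare_goal({"shoulder_pan.pos": 200.0, "gripper.pos": 10.0}, JOINT_LIMITS)
```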
--- a/src/lerobot/robots/so_follower/so_follower.py
+++ b/src/lerobot/robots/so_follower/so_follower.py
@@ -68,9 +68,12 @@ class SOFollower(Robot):

    @property
    def _cameras_ft(self) -> dict[str, tuple]:
-        return {
-            cam: (self.config.cameras[cam].height, self.config.cameras[cam].width, 3) for cam in self.cameras
-        }
+        features: dict[str, tuple] = {}
+        for cam in self.cameras:
+            features[cam] = (self.cameras[cam].height, self.cameras[cam].width, 3)
+            if getattr(self.cameras[cam], "use_depth", False):
+                features[f"{cam}_depth"] = (self.cameras[cam].height, self.cameras[cam].width, 1)
+        return features

    @cached_property
    def observation_features(self) -> dict[str, type | tuple]:
@@ -190,6 +193,12 @@ class SOFollower(Robot):
            dt_ms = (time.perf_counter() - start) * 1e3
            logger.debug(f"{self} read {cam_key}: {dt_ms:.1f}ms")

+            if getattr(cam, "use_depth", False):
+                start = time.perf_counter()
+                obs_dict[f"{cam_key}_depth"] = cam.read_latest_depth()
+                dt_ms = (time.perf_counter() - start) * 1e3
+                logger.debug(f"{self} read {cam_key} depth: {dt_ms:.1f}ms")
+
        return obs_dict

    @check_if_not_connected
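
Review note: the new `_cameras_ft` reads `height` and `width` from the camera objects rather than from `self.config.cameras`, and adds a single-channel `<cam>_depth` feature when a camera exposes `use_depth`. A minimal sketch of the resulting feature dict follows, assuming a stand-in camera type. `FakeCamera` and `camera_features` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeCamera:
    """Stand-in for a camera object; only the attributes the hunk reads."""

    height: int
    width: int
    use_depth: bool = False


def camera_features(cameras: dict[str, FakeCamera]) -> dict[str, tuple[int, int, int]]:
    """Mirror the logic of the added `_cameras_ft` body on plain objects."""
    features: dict[str, tuple[int, int, int]] = {}
    for name, cam in cameras.items():
        features[name] = (cam.height, cam.width, 3)  # color image
        # getattr with a default keeps cameras without the attribute working.
        if getattr(cam, "use_depth", False):
            features[f"{name}_depth"] = (cam.height, cam.width, 1)  # depth map
    return features


cams = {"front": FakeCamera(480, 640, use_depth=True), "wrist": FakeCamera(240, 320)}
features = camera_features(cams)
```

Using `getattr(..., False)` means camera classes that never define `use_depth` keep their previous feature set unchanged.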
--- a/src/lerobot/robots/utils.py
+++ b/src/lerobot/robots/utils.py
@@ -68,14 +68,6 @@ def make_robot_from_config(config: RobotConfig) -> Robot:
        from .bi_openarm_follower import BiOpenArmFollower

        return BiOpenArmFollower(config)
-    elif config.type == "rebot_b601_follower":
-        from .rebot_b601_follower import RebotB601Follower
-
-        return RebotB601Follower(config)
-    elif config.type == "bi_rebot_b601_follower":
-        from .bi_rebot_b601_follower import BiRebotB601Follower
-
-        return BiRebotB601Follower(config)
    elif config.type == "mock_robot":
        from tests.mocks.mock_robot import MockRobot

--- a/src/lerobot/rollout/context.py
+++ b/src/lerobot/rollout/context.py
@@ -333,6 +333,7 @@ def build_rollout_context(
                root=cfg.dataset.root,
                batch_encoding_size=cfg.dataset.video_encoding_batch_size,
                camera_encoder=cfg.dataset.camera_encoder,
+                depth_encoder=cfg.dataset.depth_encoder,
                streaming_encoding=cfg.dataset.streaming_encoding,
                encoder_queue_maxsize=cfg.dataset.encoder_queue_maxsize,
                encoder_threads=cfg.dataset.encoder_threads,
@@ -368,6 +369,7 @@ def build_rollout_context(
                * len(robot.cameras if hasattr(robot, "cameras") else []),
                batch_encoding_size=cfg.dataset.video_encoding_batch_size,
                camera_encoder=cfg.dataset.camera_encoder,
+                depth_encoder=cfg.dataset.depth_encoder,
                streaming_encoding=cfg.dataset.streaming_encoding,
                encoder_queue_maxsize=cfg.dataset.encoder_queue_maxsize,
                encoder_threads=cfg.dataset.encoder_threads,
--- a/src/lerobot/scripts/lerobot_calibrate.py
+++ b/src/lerobot/scripts/lerobot_calibrate.py
@@ -39,7 +39,6 @@ from lerobot.robots import (  # noqa: F401
    Robot,
    RobotConfig,
    bi_openarm_follower,
-    bi_rebot_b601_follower,
    bi_so_follower,
    hope_jr,
    koch_follower,
@@ -47,14 +46,12 @@ from lerobot.robots import (  # noqa: F401
    make_robot_from_config,
    omx_follower,
    openarm_follower,
-    rebot_b601_follower,
    so_follower,
 )
 from lerobot.teleoperators import (  # noqa: F401
    Teleoperator,
    TeleoperatorConfig,
    bi_openarm_leader,
-    bi_rebot_102_leader,
    bi_so_leader,
    homunculus,
    koch_leader,
@@ -62,7 +59,6 @@ from lerobot.teleoperators import (  # noqa: F401
    omx_leader,
    openarm_leader,
    openarm_mini,
-    rebot_102_leader,
    so_leader,
    unitree_g1,
 )
--- a/src/lerobot/scripts/lerobot_edit_dataset.py
+++ b/src/lerobot/scripts/lerobot_edit_dataset.py
@@ -178,31 +178,6 @@ Recompute stats for relative actions and push to hub:
        --operation.num_workers 4 \
        --push_to_hub true

-Re-encode all videos in a dataset (saves to lerobot/pusht_reencoded by default):
-    lerobot-edit-dataset \
-        --repo_id lerobot/pusht \
-        --operation.type reencode_videos \
-        --operation.camera_encoder.vcodec h264 \
-        --operation.camera_encoder.pix_fmt yuv420p \
-        --operation.camera_encoder.crf 23
-
-Re-encode videos into a new dataset using 4 parallel processes:
-    lerobot-edit-dataset \
-        --repo_id lerobot/pusht \
-        --new_repo_id lerobot/pusht_h264 \
-        --operation.type reencode_videos \
-        --operation.camera_encoder.vcodec h264 \
-        --operation.camera_encoder.crf 23 \
-        --operation.num_workers 4
-
-Re-encode videos in-place (overwrites original dataset):
-    lerobot-edit-dataset \
-        --repo_id lerobot/pusht \
-        --new_repo_id lerobot/pusht \
-        --operation.type reencode_videos \
-        --operation.camera_encoder.vcodec h264 \
-        --operation.overwrite true
-
 Using JSON config file:
    lerobot-edit-dataset \
        --config_path path/to/edit_config.json
@@ -225,7 +200,6 @@ from lerobot.datasets import (
    merge_datasets,
    modify_tasks,
    recompute_stats,
-    reencode_dataset,
    remove_feature,
    split_dataset,
 )
@@ -294,15 +268,6 @@ class RecomputeStatsConfig(OperationConfig):
    overwrite: bool = False


-@OperationConfig.register_subclass("reencode_videos")
-@dataclass
-class ReencodeVideosConfig(OperationConfig):
-    camera_encoder: VideoEncoderConfig = field(default_factory=camera_encoder_defaults)
-    num_workers: int = 0
-    encoder_threads: int | None = None
-    overwrite: bool = False
-
-
 @OperationConfig.register_subclass("info")
 @dataclass
 class InfoConfig(OperationConfig):
@@ -669,58 +634,6 @@ def handle_recompute_stats(cfg: EditDatasetConfig) -> None:
        dataset.push_to_hub()


-def handle_reencode_videos(cfg: EditDatasetConfig) -> None:
-    if not isinstance(cfg.operation, ReencodeVideosConfig):
-        raise ValueError("Operation config must be ReencodeVideosConfig")
-
-    output_repo_id, input_root, output_root = _resolve_io_paths(
-        cfg.repo_id,
-        cfg.new_repo_id,
-        cfg.root,
-        cfg.new_root,
-        default_new_repo_id=f"{cfg.repo_id}_reencoded",
-    )
-    in_place = output_root == input_root
-
-    if in_place and not cfg.operation.overwrite:
-        raise ValueError(
-            f"reencode_videos would overwrite the dataset in-place at {input_root}. "
-            "Pass --operation.overwrite true to allow in-place modification, "
-            "or use --new_repo_id / --new_root to write to a different location. "
-            f"Default output repo_id when neither is set: '{cfg.repo_id}_reencoded'."
-        )
-
-    if in_place:
-        logging.warning(
-            f"Overwriting dataset videos in-place at {input_root}. The original videos will be lost."
-        )
-        dataset = LeRobotDataset(cfg.repo_id, root=input_root)
-    else:
-        logging.info(f"Copying dataset from {input_root} to {output_root}")
-        if output_root.exists():
-            backup_path = output_root.with_name(output_root.name + "_old")
-            logging.warning(f"Output directory {output_root} already exists. Moving to {backup_path}")
-            if backup_path.exists():
-                shutil.rmtree(backup_path)
-            shutil.move(output_root, backup_path)
-        shutil.copytree(input_root, output_root)
-        dataset = LeRobotDataset(output_repo_id, root=output_root)
-
-    logging.info(f"Re-encoding videos in {output_repo_id} with {cfg.operation.camera_encoder}")
-    reencode_dataset(
-        dataset,
-        camera_encoder=cfg.operation.camera_encoder,
-        encoder_threads=cfg.operation.encoder_threads,
-        num_workers=cfg.operation.num_workers,
-    )
-
-    logging.info(f"All videos re-encoded at {dataset.root}")
-
-    if cfg.push_to_hub:
-        logging.info(f"Pushing to hub as {output_repo_id}...")
-        dataset.push_to_hub()
-
-
 def _get_dataset_size(repo_path):
    import os

@@ -794,8 +707,6 @@ def edit_dataset(cfg: EditDatasetConfig) -> None:
        handle_convert_image_to_video(cfg)
    elif operation_type == "recompute_stats":
        handle_recompute_stats(cfg)
-    elif operation_type == "reencode_videos":
-        handle_reencode_videos(cfg)
    elif operation_type == "info":
        handle_info(cfg)
    else:
--- a/src/lerobot/scripts/lerobot_find_joint_limits.py
+++ b/src/lerobot/scripts/lerobot_find_joint_limits.py
@@ -45,19 +45,16 @@ from lerobot.model import RobotKinematics
 from lerobot.robots import (  # noqa: F401
    RobotConfig,
    bi_openarm_follower,
-    bi_rebot_b601_follower,
    bi_so_follower,
    koch_follower,
    make_robot_from_config,
    omx_follower,
    openarm_follower,
-    rebot_b601_follower,
    so_follower,
 )
 from lerobot.teleoperators import (  # noqa: F401
    TeleoperatorConfig,
    bi_openarm_leader,
-    bi_rebot_102_leader,
    bi_so_leader,
    gamepad,
    koch_leader,
@@ -65,7 +62,6 @@ from lerobot.teleoperators import (  # noqa: F401
    omx_leader,
    openarm_leader,
    openarm_mini,
-    rebot_102_leader,
    so_leader,
 )
 from lerobot.utils.robot_utils import precise_sleep
--- a/src/lerobot/scripts/lerobot_record.py
+++ b/src/lerobot/scripts/lerobot_record.py
@@ -120,7 +120,6 @@ from lerobot.robots import (  # noqa: F401
    Robot,
    RobotConfig,
    bi_openarm_follower,
-    bi_rebot_b601_follower,
    bi_so_follower,
    earthrover_mini_plus,
    hope_jr,
@@ -129,7 +128,6 @@ from lerobot.robots import (  # noqa: F401
    omx_follower,
    openarm_follower,
    reachy2,
-    rebot_b601_follower,
    so_follower,
    unitree_g1 as unitree_g1_robot,
 )
@@ -137,7 +135,6 @@ from lerobot.teleoperators import (  # noqa: F401
    Teleoperator,
    TeleoperatorConfig,
    bi_openarm_leader,
-    bi_rebot_102_leader,
    bi_so_leader,
    homunculus,
    koch_leader,
@@ -146,7 +143,6 @@ from lerobot.teleoperators import (  # noqa: F401
    openarm_leader,
    openarm_mini,
    reachy2_teleoperator,
-    rebot_102_leader,
    so_leader,
    unitree_g1,
 )
@@ -403,6 +399,7 @@ def record(
                root=cfg.dataset.root,
                batch_encoding_size=cfg.dataset.video_encoding_batch_size,
                camera_encoder=cfg.dataset.camera_encoder,
+                depth_encoder=cfg.dataset.depth_encoder,
                encoder_threads=cfg.dataset.encoder_threads,
                streaming_encoding=cfg.dataset.streaming_encoding,
                encoder_queue_maxsize=cfg.dataset.encoder_queue_maxsize,
@@ -432,6 +429,7 @@ def record(
                image_writer_threads=cfg.dataset.num_image_writer_threads_per_camera * len(robot.cameras),
                batch_encoding_size=cfg.dataset.video_encoding_batch_size,
                camera_encoder=cfg.dataset.camera_encoder,
+                depth_encoder=cfg.dataset.depth_encoder,
                encoder_threads=cfg.dataset.encoder_threads,
                streaming_encoding=cfg.dataset.streaming_encoding,
                encoder_queue_maxsize=cfg.dataset.encoder_queue_maxsize,
--- a/src/lerobot/scripts/lerobot_replay.py
+++ b/src/lerobot/scripts/lerobot_replay.py
@@ -56,7 +56,6 @@ from lerobot.robots import (  # noqa: F401
    Robot,
    RobotConfig,
    bi_openarm_follower,
-    bi_rebot_b601_follower,
    bi_so_follower,
    earthrover_mini_plus,
    hope_jr,
@@ -65,7 +64,6 @@ from lerobot.robots import (  # noqa: F401
    omx_follower,
    openarm_follower,
    reachy2,
-    rebot_b601_follower,
    so_follower,
    unitree_g1,
 )
--- a/src/lerobot/scripts/lerobot_rollout.py
+++ b/src/lerobot/scripts/lerobot_rollout.py
@@ -144,7 +144,6 @@ from lerobot.robots import (  # noqa: F401
    Robot,
    RobotConfig,
    bi_openarm_follower,
-    bi_rebot_b601_follower,
    bi_so_follower,
    earthrover_mini_plus,
    hope_jr,
@@ -152,7 +151,6 @@ from lerobot.robots import (  # noqa: F401
    omx_follower,
    openarm_follower,
    reachy2,
-    rebot_b601_follower,
    so_follower,
    unitree_g1 as unitree_g1_robot,
 )
@@ -161,7 +159,6 @@ from lerobot.teleoperators import (  # noqa: F401
    Teleoperator,
    TeleoperatorConfig,
    bi_openarm_leader,
-    bi_rebot_102_leader,
    bi_so_leader,
    homunculus,
    koch_leader,
@@ -169,7 +166,6 @@ from lerobot.teleoperators import (  # noqa: F401
    openarm_leader,
    openarm_mini,
    reachy2_teleoperator,
-    rebot_102_leader,
    so_leader,
    unitree_g1,
 )
--- a/src/lerobot/scripts/lerobot_setup_motors.py
+++ b/src/lerobot/scripts/lerobot_setup_motors.py
@@ -30,24 +30,20 @@ import draccus

 from lerobot.robots import (  # noqa: F401
    RobotConfig,
-    bi_rebot_b601_follower,
    bi_so_follower,
    koch_follower,
    lekiwi,
    make_robot_from_config,
    omx_follower,
-    rebot_b601_follower,
    so_follower,
 )
 from lerobot.teleoperators import (  # noqa: F401
    TeleoperatorConfig,
-    bi_rebot_102_leader,
    bi_so_leader,
    koch_leader,
    make_teleoperator_from_config,
    omx_leader,
    openarm_mini,
-    rebot_102_leader,
    so_leader,
 )

--- a/src/lerobot/scripts/lerobot_teleoperate.py
+++ b/src/lerobot/scripts/lerobot_teleoperate.py
@@ -72,7 +72,6 @@ from lerobot.robots import (  # noqa: F401
    Robot,
    RobotConfig,
    bi_openarm_follower,
-    bi_rebot_b601_follower,
    bi_so_follower,
    earthrover_mini_plus,
    hope_jr,
@@ -81,7 +80,6 @@ from lerobot.robots import (  # noqa: F401
    omx_follower,
    openarm_follower,
    reachy2,
-    rebot_b601_follower,
    so_follower,
    unitree_g1 as unitree_g1_robot,
 )
@@ -89,7 +87,6 @@ from lerobot.teleoperators import (  # noqa: F401
    Teleoperator,
    TeleoperatorConfig,
    bi_openarm_leader,
-    bi_rebot_102_leader,
    bi_so_leader,
    gamepad,
    homunculus,
@@ -100,7 +97,6 @@ from lerobot.teleoperators import (  # noqa: F401
    openarm_leader,
    openarm_mini,
    reachy2_teleoperator,
-    rebot_102_leader,
    so_leader,
    unitree_g1,
 )
--- a/src/lerobot/scripts/lerobot_train.py
+++ b/src/lerobot/scripts/lerobot_train.py
@@ -48,7 +48,6 @@ from lerobot.envs import close_envs, make_env, make_env_pre_post_processors
 from lerobot.optim.factory import make_optimizer_and_scheduler
 from lerobot.policies import PreTrainedPolicy, make_policy, make_pre_post_processors
 from lerobot.rewards import make_reward_pre_post_processors
-from lerobot.utils.collate import lerobot_collate_fn
 from lerobot.utils.import_utils import register_third_party_plugins
 from lerobot.utils.logging_utils import AverageMeter, MetricsTracker
 from lerobot.utils.random_utils import set_seed
@@ -402,10 +401,6 @@ def train(cfg: TrainPipelineConfig, accelerator: "Accelerator | None" = None):
        shuffle = True
        sampler = None

-    # Only swap in the language-aware collate when the dataset actually
-    # declares language columns; otherwise stay on PyTorch's default
-    # collate so non-language training runs are unaffected.
-    collate_fn = lerobot_collate_fn if dataset.meta.has_language_columns else None
    dataloader = torch.utils.data.DataLoader(
        dataset,
        num_workers=cfg.num_workers,
@@ -414,7 +409,6 @@ def train(cfg: TrainPipelineConfig, accelerator: "Accelerator | None" = None):
        sampler=sampler,
        pin_memory=device.type == "cuda",
        drop_last=False,
-        collate_fn=collate_fn,
        prefetch_factor=cfg.prefetch_factor if cfg.num_workers > 0 else None,
        persistent_workers=cfg.persistent_workers and cfg.num_workers > 0,
    )
--- a/src/lerobot/teleoperators/bi_rebot_102_leader/__init__.py
+++ b/src/lerobot/teleoperators/bi_rebot_102_leader/__init__.py
@@ -1,20 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from .bi_rebot_102_leader import BiRebotArm102Leader
-from .config_bi_rebot_102_leader import BiRebotArm102LeaderConfig
-
-__all__ = ["BiRebotArm102Leader", "BiRebotArm102LeaderConfig"]
--- a/src/lerobot/teleoperators/bi_rebot_102_leader/bi_rebot_102_leader.py
+++ b/src/lerobot/teleoperators/bi_rebot_102_leader/bi_rebot_102_leader.py
@@ -1,113 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import logging
-from functools import cached_property
-
-from lerobot.types import RobotAction
-from lerobot.utils.decorators import check_if_already_connected, check_if_not_connected
-
-from ..rebot_102_leader import RebotArm102Leader, RebotArm102LeaderTeleopConfig
-from ..teleoperator import Teleoperator
-from .config_bi_rebot_102_leader import BiRebotArm102LeaderConfig
-
-logger = logging.getLogger(__name__)
-
-
-class BiRebotArm102Leader(Teleoperator):
-    """Bimanual Seeed Studio StarArm102 / reBot Arm 102 leader.
-
-    Composes two single-arm :class:`RebotArm102Leader` instances. Action keys of
-    each arm are namespaced with a ``left_`` / ``right_`` prefix, so a bimanual
-    leader can teleoperate a bimanual reBot B601 follower.
-    """
-
-    config_class = BiRebotArm102LeaderConfig
-    name = "bi_rebot_102_leader"
-
-    def __init__(self, config: BiRebotArm102LeaderConfig):
-        super().__init__(config)
-        self.config = config
-
-        left_arm_config = RebotArm102LeaderTeleopConfig(
-            id=f"{config.id}_left" if config.id else None,
-            calibration_dir=config.calibration_dir,
-            port=config.left_arm_config.port,
-            baudrate=config.left_arm_config.baudrate,
-            joint_ids=config.left_arm_config.joint_ids,
-            joint_directions=config.left_arm_config.joint_directions,
-            joint_ranges=config.left_arm_config.joint_ranges,
-        )
-
-        right_arm_config = RebotArm102LeaderTeleopConfig(
-            id=f"{config.id}_right" if config.id else None,
-            calibration_dir=config.calibration_dir,
-            port=config.right_arm_config.port,
-            baudrate=config.right_arm_config.baudrate,
-            joint_ids=config.right_arm_config.joint_ids,
-            joint_directions=config.right_arm_config.joint_directions,
-            joint_ranges=config.right_arm_config.joint_ranges,
-        )
-
-        self.left_arm = RebotArm102Leader(left_arm_config)
-        self.right_arm = RebotArm102Leader(right_arm_config)
-
-    @cached_property
-    def action_features(self) -> dict[str, type]:
-        return {
-            **{f"left_{k}": v for k, v in self.left_arm.action_features.items()},
-            **{f"right_{k}": v for k, v in self.right_arm.action_features.items()},
-        }
-
-    @cached_property
-    def feedback_features(self) -> dict[str, type]:
-        return {}
-
-    @property
-    def is_connected(self) -> bool:
-        return self.left_arm.is_connected and self.right_arm.is_connected
-
-    @check_if_already_connected
-    def connect(self, calibrate: bool = True) -> None:
-        self.left_arm.connect(calibrate)
-        self.right_arm.connect(calibrate)
-
-    @property
-    def is_calibrated(self) -> bool:
-        return self.left_arm.is_calibrated and self.right_arm.is_calibrated
-
-    def calibrate(self) -> None:
-        self.left_arm.calibrate()
-        self.right_arm.calibrate()
-
-    def configure(self) -> None:
-        self.left_arm.configure()
-        self.right_arm.configure()
-
-    @check_if_not_connected
-    def get_action(self) -> RobotAction:
-        action_dict = {}
-        action_dict.update({f"left_{k}": v for k, v in self.left_arm.get_action().items()})
-        action_dict.update({f"right_{k}": v for k, v in self.right_arm.get_action().items()})
-        return action_dict
-
-    def send_feedback(self, feedback: dict[str, float]) -> None:
-        raise NotImplementedError("Feedback is not implemented for the reBot Arm 102 leader.")
-
-    @check_if_not_connected
-    def disconnect(self) -> None:
-        self.left_arm.disconnect()
-        self.right_arm.disconnect()
--- a/src/lerobot/teleoperators/bi_rebot_102_leader/config_bi_rebot_102_leader.py
+++ b/src/lerobot/teleoperators/bi_rebot_102_leader/config_bi_rebot_102_leader.py
@@ -1,29 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from dataclasses import dataclass
-
-from ..config import TeleoperatorConfig
-from ..rebot_102_leader import RebotArm102LeaderConfig
-
-
-@TeleoperatorConfig.register_subclass("bi_rebot_102_leader")
-@dataclass
-class BiRebotArm102LeaderConfig(TeleoperatorConfig):
-    """Configuration class for the bimanual reBot Arm 102 leader teleoperator."""
-
-    left_arm_config: RebotArm102LeaderConfig
-    right_arm_config: RebotArm102LeaderConfig
--- a/src/lerobot/teleoperators/rebot_102_leader/__init__.py
+++ b/src/lerobot/teleoperators/rebot_102_leader/__init__.py
@@ -1,20 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from .config_rebot_102_leader import RebotArm102LeaderConfig, RebotArm102LeaderTeleopConfig
-from .rebot_102_leader import RebotArm102Leader
-
-__all__ = ["RebotArm102Leader", "RebotArm102LeaderConfig", "RebotArm102LeaderTeleopConfig"]
--- a/src/lerobot/teleoperators/rebot_102_leader/config_rebot_102_leader.py
+++ b/src/lerobot/teleoperators/rebot_102_leader/config_rebot_102_leader.py
@@ -1,83 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from dataclasses import dataclass, field
-
-from ..config import TeleoperatorConfig
-
-
-@dataclass
-class RebotArm102LeaderConfig:
-    """Base configuration class for the Seeed Studio StarArm102 / reBot Arm 102 leader.
-
-    The reBot Arm 102 is a 7-joint (incl. gripper) leader arm driven by FashionStar
-    UART smart servos. Servo communication goes through ``motorbridge-smart-servo``.
-    """
-
-    # USB-to-UART device the leader arm is connected to (e.g. "/dev/ttyUSB0").
-    port: str
-
-    baudrate: int = 1_000_000
-
-    # Servo id of each joint on the UART bus.
-    joint_ids: dict[str, int] = field(
-        default_factory=lambda: {
-            "shoulder_pan": 0,
-            "shoulder_lift": 1,
-            "elbow_flex": 2,
-            "wrist_flex": 3,
-            "wrist_yaw": 4,
-            "wrist_roll": 5,
-            "gripper": 6,
-        }
-    )
-
-    # Per-joint sign applied to raw servo angles so the leader matches the follower
-    # convention. The gripper additionally carries a scale (e.g. -6) to widen its
-    # range to the reBot B601 follower's gripper travel.
-    joint_directions: dict[str, int] = field(
-        default_factory=lambda: {
-            "shoulder_pan": -1,
-            "shoulder_lift": -1,
-            "elbow_flex": 1,
-            "wrist_flex": 1,
-            "wrist_yaw": 1,
-            "wrist_roll": -1,
-            "gripper": -6,
-        }
-    )
-
-    # Per-joint [min, max] output range in degrees. Matches the reBot B601 follower
-    # joint limits so leader actions can drive the follower key-for-key.
-    joint_ranges: dict[str, list[int]] = field(
-        default_factory=lambda: {
-            "shoulder_pan": [-150, 150],
-            "shoulder_lift": [-170, 1],
-            "elbow_flex": [-200, 1],
-            "wrist_flex": [-80, 90],
-            "wrist_yaw": [-90, 90],
-            "wrist_roll": [-90, 90],
-            "gripper": [-270, 0],
-        }
-    )
-
-
-@TeleoperatorConfig.register_subclass("rebot_102_leader")
-@dataclass
-class RebotArm102LeaderTeleopConfig(TeleoperatorConfig, RebotArm102LeaderConfig):
-    """Registered configuration for the reBot Arm 102 leader teleoperator."""
-
-    pass
--- a/src/lerobot/teleoperators/rebot_102_leader/rebot_102_leader.py
+++ b/src/lerobot/teleoperators/rebot_102_leader/rebot_102_leader.py
@@ -1,207 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import logging
-import time
-from typing import TYPE_CHECKING
-
-from lerobot.motors import MotorCalibration
-from lerobot.types import RobotAction
-from lerobot.utils.decorators import check_if_already_connected, check_if_not_connected
-from lerobot.utils.import_utils import _motorbridge_smart_servo_available, require_package
-
-from ..teleoperator import Teleoperator
-from .config_rebot_102_leader import RebotArm102LeaderTeleopConfig
-
-if TYPE_CHECKING or _motorbridge_smart_servo_available:
-    from motorbridge_smart_servo import FashionStarServo, ServoMonitor
-else:
-    FashionStarServo = None
-    ServoMonitor = None
-
-logger = logging.getLogger(__name__)
-
-_SETTLE_SEC = 0.01
-
-
-class RebotArm102Leader(Teleoperator):
-    """Seeed Studio StarArm102 / reBot Arm 102 leader arm.
-
-    A 7-joint (incl. gripper) leader built on FashionStar UART smart servos. Servo
-    communication is handled by the ``motorbridge-smart-servo`` package; this class
-    only reads joint angles, so it produces actions but accepts no feedback.
-    """
-
-    config_class = RebotArm102LeaderTeleopConfig
-    name = "rebot_102_leader"
-
-    def __init__(self, config: RebotArm102LeaderTeleopConfig):
-        require_package("motorbridge-smart-servo", extra="rebot", import_name="motorbridge_smart_servo")
-        super().__init__(config)
-        self.config = config
-        self.bus: FashionStarServo | None = None
-        self.motor_names = list(config.joint_ids.keys())
-        self._last_raw_positions: dict[str, float] = {}
-
-    @property
-    def action_features(self) -> dict[str, type]:
-        return {f"{motor}.pos": float for motor in self.motor_names}
-
-    @property
-    def feedback_features(self) -> dict[str, type]:
-        return {}
-
-    @property
-    def is_connected(self) -> bool:
-        return self.bus is not None
-
-    @check_if_already_connected
-    def connect(self, calibrate: bool = True) -> None:
-        logger.info(f"Connecting {self} on {self.config.port}...")
-        bus = FashionStarServo(self.config.port, baudrate=self.config.baudrate)
-        try:
-            for motor_name, motor_id in self.config.joint_ids.items():
-                if not bus.ping(motor_id):
-                    raise RuntimeError(f"Servo not found for {motor_name} (id={motor_id}).")
-                self._last_raw_positions[motor_name] = 0.0
-            self.bus = bus
-
-            if not self.is_calibrated and calibrate:
-                logger.info(
-                    "Mismatch between calibration values in the motor and the calibration file or no calibration file found"
-                )
-                self.calibrate()
-
-            self.configure()
-        except Exception:
-            bus.close()
-            self.bus = None
-            raise
-
-        logger.info(f"{self} connected.")
-
-    @property
-    def is_calibrated(self) -> bool:
-        return bool(self.calibration) and set(self.calibration) == set(self.motor_names)
-
-    def calibrate(self) -> None:
-        if self.calibration:
-            user_input = input(
-                f"Press ENTER to use provided calibration file associated with the id {self.id}, "
-                "or type 'c' and press ENTER to run calibration: "
-            )
-            if user_input.strip().lower() != "c":
-                logger.info(f"Using calibration file associated with the id {self.id}")
-                return
-
-        logger.info(f"\nRunning calibration of {self}")
-        input(
-            "\nCalibration: set zero position.\n"
-            "Manually move the reBot Arm 102 to its zero pose and close the gripper.\n"
-            "Press ENTER when ready..."
-        )
-
-        self.calibration = {}
-        for motor_name, motor_id in self.config.joint_ids.items():
-            self.bus.unlock(motor_id)
-            time.sleep(_SETTLE_SEC)
-            self.bus.set_origin_point(motor_id)
-            range_min, range_max = self.config.joint_ranges[motor_name]
-            self.calibration[motor_name] = MotorCalibration(
-                id=motor_id,
-                drive_mode=0,
-                homing_offset=0,
-                range_min=int(range_min),
-                range_max=int(range_max),
-            )
-
-        self._save_calibration()
-        logger.info(f"Calibration saved to {self.calibration_fpath}")
-
-    def configure(self) -> None:
-        for motor_id in self.config.joint_ids.values():
-            self.bus.unlock(motor_id)
-            time.sleep(_SETTLE_SEC)
-        # Reset the multi-turn counter of each servo individually.
-        for motor_id in self.config.joint_ids.values():
-            self.bus.reset_multi_turn(motor_id)
-
-    def _read_raw_positions(self) -> dict[str, float]:
-        result: dict[int, ServoMonitor | None] = self.bus.sync_monitor(list(self.config.joint_ids.values()))
-        id_to_name = {v: k for k, v in self.config.joint_ids.items()}
-        raw_positions: dict[str, float] = {}
-        for motor_id, monitor in result.items():
-            motor_name = id_to_name[motor_id]
-            if monitor is None:
-                raise RuntimeError(f"Servo {motor_name} (id={motor_id}) has never responded.")
-            raw_positions[motor_name] = monitor.angle_deg
-        return raw_positions
-
-    @staticmethod
-    def _round_to_valid_range(value: float, min_value: float, max_value: float) -> tuple[float, int]:
-        """Unwrap a multi-turn angle into the ±180° window centred on (min+max)/2.
-
-        The servo may report an angle that has accumulated extra full rotations
-        (value = true_angle + N*360). Subtract the nearest whole number of turns
-        to bring it back into [center-180, center+180]. Returns the unwrapped
-        angle and the number of turns removed.
-        """
-        center = (min_value + max_value) / 2.0
-        turns = round((value - center) / 360.0)
-        return value - turns * 360.0, abs(turns)
-
-    @check_if_not_connected
-    def get_action(self) -> RobotAction:
-        start = time.perf_counter()
-        try:
-            raw_positions = self._read_raw_positions()
-            self._last_raw_positions = raw_positions
-        except Exception as e:
-            logger.error(f"Failed to read raw positions: {e}")
-            logger.warning("[EMERGENCY STOP] Hold the follower arm and cut off the main power to the arms.")
-            logger.warning(
-                "[EMERGENCY STOP] Break the teleoperation session and check the leader USB connection or power."
-            )
-            raw_positions = self._last_raw_positions
-
-        action_dict: dict[str, float] = {}
-        for motor_name in self.motor_names:
-            range_min, range_max = self.config.joint_ranges[motor_name]
-            direction = self.config.joint_directions[motor_name]
-            sign = 1.0 if direction >= 0 else -1.0
-            unwrapped, k = self._round_to_valid_range(
-                raw_positions[motor_name], range_min * sign, range_max * sign
-            )
-            position = unwrapped * direction
-            if k > 0:
-                logger.debug(
-                    f"Servo {motor_name} (id={self.config.joint_ids[motor_name]}) wrapped {k} * 360°. "
-                    f"Unwrapped pos: {unwrapped:.1f}° (raw: {raw_positions[motor_name]:.1f}°)"
-                )
-            action_dict[f"{motor_name}.pos"] = max(float(range_min), min(float(range_max), position))
-
-        dt_ms = (time.perf_counter() - start) * 1e3
-        logger.debug(f"{self} read action: {dt_ms:.1f}ms")
-        return action_dict
-
-    def send_feedback(self, feedback: dict[str, float]) -> None:
-        raise NotImplementedError("Feedback is not implemented for the reBot Arm 102 leader.")
-
-    @check_if_not_connected
-    def disconnect(self) -> None:
-        self.bus.close()
-        self.bus = None
-        logger.info(f"{self} disconnected.")
--- a/src/lerobot/teleoperators/utils.py
+++ b/src/lerobot/teleoperators/utils.py
@@ -99,14 +99,6 @@ def make_teleoperator_from_config(config: TeleoperatorConfig) -> "Teleoperator":
        from .openarm_mini import OpenArmMini

        return OpenArmMini(config)
-    elif config.type == "rebot_102_leader":
-        from .rebot_102_leader import RebotArm102Leader
-
-        return RebotArm102Leader(config)
-    elif config.type == "bi_rebot_102_leader":
-        from .bi_rebot_102_leader import BiRebotArm102Leader
-
-        return BiRebotArm102Leader(config)
    else:
        try:
            return cast("Teleoperator", make_device_from_device_class(config))
--- a/src/lerobot/templates/lerobot_rewardmodel_modelcard_template.md
+++ b/src/lerobot/templates/lerobot_rewardmodel_modelcard_template.md
@@ -13,8 +13,6 @@
 A reward classifier is a lightweight neural network that scores observations or trajectories for task success, providing a learned reward signal or offline evaluation when explicit rewards are unavailable.
 {% elif model_name == "sarm" %}
 A Success-Aware Reward Model (SARM) predicts a dense reward signal from observations, typically used downstream for reinforcement learning or human-in-the-loop fine-tuning when task success is not directly observable.
-{% elif model_name == "topreward" %}
-TOPReward is a **zero-shot** reward model that extracts token log-probabilities from an off-the-shelf vision-language model (default Qwen3-VL) as a reward signal. Given a video trajectory and a task instruction, it returns the VLM's log-likelihood of the instruction being true, with no fine-tuning required.
 {% else %}
 _Reward model type not recognized — please update this template._
 {% endif %}
--- a/src/lerobot/utils/collate.py
+++ b/src/lerobot/utils/collate.py
@@ -1,65 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import annotations
-
-from typing import Any
-
-from torch.utils.data._utils.collate import default_collate
-
-from lerobot.datasets.language import LANGUAGE_COLUMNS
-
-_PYTHON_LIST_KEYS = {"messages", "message_streams", "target_message_indices"}
-
-
-def lerobot_collate_fn(batch: list[dict[str, Any] | None]) -> dict[str, Any] | None:
-    """Collate function that preserves Python-list and language fields as lists.
-
-    Drops ``None`` samples (e.g. recipes that yielded no target message), keeps
-    rendered-message and language fields as plain Python lists, and delegates
-    every other key to PyTorch's ``default_collate``.
-    """
-    batch = [sample for sample in batch if sample is not None]
-    if not batch:
-        return None
-
-    # All-or-nothing per key: a partial-presence batch (e.g. half the samples
-    # carry `messages` and half don't) is a real bug in the upstream
-    # rendering step — silently filtering would hand downstream consumers a
-    # preserved list shorter than the tensor batch. Raise instead so the
-    # mismatch surfaces at the boundary.
-    preserved: dict[str, list[Any]] = {}
-    for key in _PYTHON_LIST_KEYS:
-        presence = [key in sample for sample in batch]
-        if not any(presence):
-            continue
-        if not all(presence):
-            raise ValueError(
-                f"Inconsistent batch: {sum(presence)}/{len(batch)} samples carry {key!r}; "
-                f"every sample in a batch must agree."
-            )
-        preserved[key] = [sample[key] for sample in batch]
-    tensorizable = [
-        {
-            key: value
-            for key, value in sample.items()
-            if key not in _PYTHON_LIST_KEYS and key not in LANGUAGE_COLUMNS
-        }
-        for sample in batch
-    ]
-    collated = default_collate(tensorizable)
-    collated.update(preserved)
-    return collated
--- a/src/lerobot/utils/feature_utils.py
+++ b/src/lerobot/utils/feature_utils.py
@@ -69,6 +69,7 @@ def hw_to_dataset_features(
        for key, ftype in hw_features.items()
        if ftype is float or (isinstance(ftype, PolicyFeature) and ftype.type != FeatureType.VISUAL)
    }
+    # TODO(CarolinePascal): we should not rely on the shape to determine if a feature is a camera!
    cam_fts = {key: shape for key, shape in hw_features.items() if isinstance(shape, tuple)}

    if joint_fts and prefix == ACTION:
@@ -86,11 +87,19 @@ def hw_to_dataset_features(
        }

    for key, shape in cam_fts.items():
-        features[f"{prefix}.images.{key}"] = {
-            "dtype": "video" if use_video else "image",
-            "shape": shape,
-            "names": ["height", "width", "channels"],
-        }
+        dtype = "video" if use_video else "image"
+        if len(shape) == 3 and shape[2] in (1, 3):
+            features[f"{prefix}.images.{key}"] = {
+                "dtype": dtype,
+                "shape": shape,
+                "names": ["height", "width", "channels"],
+                "info": {"is_depth_map": shape[2] == 1},
+            }
+        else:
+            raise ValueError(
+                f"Camera feature '{key}' has shape {shape}. "
+                f"Expected a 3-tuple (H, W, C), e.g. (480, 640, 3) for RGB or (480, 640, 1) for depth."
+            )

    _validate_feature_names(features)
    return features
@@ -149,11 +158,11 @@ def dataset_to_policy_features(features: dict[str, dict]) -> dict[str, PolicyFea
            type = FeatureType.VISUAL
            if len(shape) != 3:
                raise ValueError(f"Number of dimensions of {key} != 3 (shape={shape})")
-
-            names = ft["names"]
-            # Backward compatibility for "channel" which is an error introduced in LeRobotDataset v2.0 for ported datasets.
-            if names[2] in ["channel", "channels"]:  # (h, w, c) -> (c, h, w)
-                shape = (shape[2], shape[0], shape[1])
+            else:
+                names = ft["names"]
+                # Backward compatibility: "channel" is a naming error introduced in LeRobotDataset v2.0 for ported datasets.
+                if names[2] in ["channel", "channels"]:  # (h, w, c) -> (c, h, w)
+                    shape = (shape[2], shape[0], shape[1])
        elif key == OBS_ENV_STATE:
            type = FeatureType.ENV
        elif key.startswith(OBS_STR):
--- a/src/lerobot/utils/import_utils.py
+++ b/src/lerobot/utils/import_utils.py
@@ -114,10 +114,6 @@ _dynamixel_sdk_available = is_package_available("dynamixel-sdk", import_name="dy
 _feetech_sdk_available = is_package_available("feetech-servo-sdk", import_name="scservo_sdk")
 _reachy2_sdk_available = is_package_available("reachy2_sdk")
 _can_available = is_package_available("python-can", "can")
-_motorbridge_available = is_package_available("motorbridge")
-_motorbridge_smart_servo_available = is_package_available(
-    "motorbridge-smart-servo", import_name="motorbridge_smart_servo"
-)
 _unitree_sdk_available = is_package_available("unitree-sdk2py", "unitree_sdk2py")
 _pyrealsense2_available = is_package_available("pyrealsense2") or is_package_available(
    "pyrealsense2-macosx", import_name="pyrealsense2"
--- a/src/lerobot/utils/utils.py
+++ b/src/lerobot/utils/utils.py
@@ -160,25 +160,6 @@ def has_method(cls: object, method_name: str) -> bool:
    return hasattr(cls, method_name) and callable(getattr(cls, method_name))


-def unwrap_scalar(value: Any) -> Any:
-    """Unwrap a tensor / numpy scalar / single-element list into a Python scalar.
-
-    Tensors and numpy scalars expose ``.item()``; single-element lists are
-    unwrapped recursively. Anything else is returned unchanged. Centralized
-    here so the language renderer and processor steps share one definition.
-
-    Raises:
-        ValueError: If ``value`` is a list with zero or multiple elements.
-    """
-    if hasattr(value, "item"):
-        return value.item()
-    if isinstance(value, list):
-        if len(value) != 1:
-            raise ValueError(f"Expected a scalar, got list of length {len(value)}: {value!r}")
-        return unwrap_scalar(value[0])
-    return value
-
-
 def is_valid_numpy_dtype_string(dtype_str: str) -> bool:
    """
    Return True if a given string can be converted to a numpy dtype.
--- a/src/lerobot/utils/visualization_utils.py
+++ b/src/lerobot/utils/visualization_utils.py
@@ -107,8 +107,15 @@ def log_rerun_data(
                    for i, vi in enumerate(arr):
                        rr.log(f"{key}_{i}", rr.Scalars(float(vi)))
                else:
-                    img_entity = rr.Image(arr).compress() if compress_images else rr.Image(arr)
-                    rr.log(key, entity=img_entity, static=True)
+                    if arr.shape[-1] == 1:
+                        img_entity = (
+                            rr.DepthImage(arr, colormap=rr.components.Colormap.Viridis).compress()
+                            if compress_images
+                            else rr.DepthImage(arr, colormap=rr.components.Colormap.Viridis)
+                        )
+                    else:
+                        img_entity = rr.Image(arr).compress() if compress_images else rr.Image(arr)
+                    rr.log(key, entity=img_entity)

    if action:
        for k, v in action.items():
--- a/tests/configs/test_recipe.py
+++ b/tests/configs/test_recipe.py
@@ -1,168 +0,0 @@
-#!/usr/bin/env python
-
-from pathlib import Path
-from textwrap import dedent
-
-import pytest
-
-from lerobot.configs.recipe import MessageTurn, TrainingRecipe, load_recipe
-
-
-def _minimal_message_turn(content: str = "${task}") -> MessageTurn:
-    return MessageTurn(role="user", content=content, stream="high_level")
-
-
-def _minimal_target_turn() -> MessageTurn:
-    return MessageTurn(role="assistant", content="ok", stream="high_level", target=True)
-
-
-# ── Message-recipe validation ────────────────────────────────────────
-
-
-def test_message_recipe_validates_unknown_binding():
-    with pytest.raises(ValueError, match="unknown binding"):
-        TrainingRecipe(
-            messages=[
-                MessageTurn(role="user", content="${missing}", stream="high_level"),
-                _minimal_target_turn(),
-            ]
-        )
-
-
-def test_message_turn_requires_a_stream():
-    """Every turn must declare a stream — None is rejected at construction.
-
-    Previously this only failed at render time (``_validate_rendered``);
-    catching it here means a malformed recipe YAML errors at load instead
-    of at the first training sample.
-    """
-    with pytest.raises(ValueError, match="missing a stream"):
-        MessageTurn(role="user", content="${task}")
-
-
-def test_message_recipe_requires_at_least_one_target():
-    with pytest.raises(ValueError, match="target"):
-        TrainingRecipe(
-            messages=[
-                _minimal_message_turn(),
-                MessageTurn(role="assistant", content="no target", stream="high_level"),
-            ]
-        )
-
-
-def test_recipe_rejects_both_messages_and_blend():
-    with pytest.raises(ValueError, match="only one"):
-        TrainingRecipe(
-            messages=[_minimal_message_turn(), _minimal_target_turn()],
-            blend={"a": TrainingRecipe(weight=1.0, messages=[_minimal_target_turn()])},
-        )
-
-
-def test_recipe_rejects_neither_messages_nor_blend():
-    with pytest.raises(ValueError, match="must set one"):
-        TrainingRecipe()
-
-
-# ── Blend validation ─────────────────────────────────────────────────
-
-
-def test_blend_must_be_non_empty():
-    with pytest.raises(ValueError, match="at least one component"):
-        TrainingRecipe(blend={})
-
-
-def test_blend_component_must_define_weight():
-    with pytest.raises(ValueError, match="weight"):
-        TrainingRecipe(blend={"a": TrainingRecipe(messages=[_minimal_target_turn()])})
-
-
-def test_blend_component_weight_must_be_positive():
-    with pytest.raises(ValueError, match="positive weight"):
-        TrainingRecipe(blend={"a": TrainingRecipe(weight=0.0, messages=[_minimal_target_turn()])})
-
-
-def test_blend_component_must_define_messages():
-    # A bare TrainingRecipe(weight=1.0) would itself raise; build it without
-    # going through __post_init__ to exercise the blend-level validator.
-    bad = TrainingRecipe.__new__(TrainingRecipe)
-    bad.messages = None
-    bad.bindings = None
-    bad.blend = None
-    bad.weight = 1.0
-    with pytest.raises(ValueError, match="must define messages"):
-        TrainingRecipe(blend={"a": bad})
-
-
-def test_blend_components_cannot_themselves_define_a_blend():
-    inner = TrainingRecipe(blend={"x": TrainingRecipe(weight=1.0, messages=[_minimal_target_turn()])})
-    # Force-bypass the inner component's normal validation so the test
-    # exercises the outer blend's "no nested blends" rule directly.
-    nested = TrainingRecipe.__new__(TrainingRecipe)
-    nested.messages = None
-    nested.bindings = None
-    nested.blend = inner.blend
-    nested.weight = 1.0
-    with pytest.raises(ValueError, match="cannot itself define a blend"):
-        TrainingRecipe(blend={"outer": nested})
-
-
-# ── from_dict / from_yaml round-trips ────────────────────────────────
-
-
-def test_from_dict_with_nested_blend():
-    recipe = TrainingRecipe.from_dict(
-        {
-            "blend": {
-                "a": {
-                    "weight": 1.0,
-                    "messages": [
-                        {"role": "user", "content": "${task}", "stream": "high_level"},
-                        {"role": "assistant", "content": "a", "stream": "high_level", "target": True},
-                    ],
-                },
-                "b": {
-                    "weight": 2.0,
-                    "messages": [
-                        {"role": "user", "content": "${task}", "stream": "high_level"},
-                        {"role": "assistant", "content": "b", "stream": "high_level", "target": True},
-                    ],
-                },
-            }
-        }
-    )
-    assert recipe.blend is not None
-    assert set(recipe.blend) == {"a", "b"}
-    assert recipe.blend["b"].weight == 2.0
-    # Inner messages were promoted to MessageTurn instances.
-    assert isinstance(recipe.blend["a"].messages[0], MessageTurn)
-
-
-def test_from_yaml_round_trips_through_load_recipe(tmp_path: Path):
-    yaml_text = dedent(
-        """
-        bindings:
-          custom: "active_at(t, style=subtask)"
-        messages:
-          - {role: user, content: "${task}: ${custom}", stream: high_level}
-          - {role: assistant, content: "ok", stream: high_level, target: true}
-        """
-    ).strip()
-    path = tmp_path / "recipe.yaml"
-    path.write_text(yaml_text)
-
-    via_classmethod = TrainingRecipe.from_yaml(path)
-    via_helper = load_recipe(path)
-
-    assert via_classmethod.bindings == {"custom": "active_at(t, style=subtask)"}
-    assert via_classmethod.messages[1].target is True
-    # ``load_recipe`` is just a wrapper, but assert the two paths agree
-    # on the structural result so a future divergence is caught here.
-    assert via_helper.bindings == via_classmethod.bindings
-    assert len(via_helper.messages) == len(via_classmethod.messages)
-
-
-def test_from_yaml_rejects_non_mapping(tmp_path: Path):
-    path = tmp_path / "bad.yaml"
-    path.write_text("- just\n- a\n- list\n")
-    with pytest.raises(ValueError, match="mapping at the top level"):
-        TrainingRecipe.from_yaml(path)
--- a/tests/datasets/test_dataset_metadata.py
+++ b/tests/datasets/test_dataset_metadata.py
@@ -59,11 +59,13 @@ def _make_dummy_stats(features: dict) -> dict:
    stats = {}
    for key, ft in features.items():
        if ft["dtype"] in ("image", "video"):
+            channels = ft["shape"][-1]
+            stat_shape = (channels, 1, 1)
            stats[key] = {
-                "max": np.ones((3, 1, 1), dtype=np.float32),
-                "mean": np.full((3, 1, 1), 0.5, dtype=np.float32),
-                "min": np.zeros((3, 1, 1), dtype=np.float32),
-                "std": np.full((3, 1, 1), 0.25, dtype=np.float32),
+                "max": np.ones(stat_shape, dtype=np.float32),
+                "mean": np.full(stat_shape, 0.5, dtype=np.float32),
+                "min": np.zeros(stat_shape, dtype=np.float32),
+                "std": np.full(stat_shape, 0.25, dtype=np.float32),
                "count": np.array([5]),
            }
        elif ft["dtype"] in ("float32", "float64", "int64"):
@@ -142,6 +144,45 @@ def test_create_without_videos_has_no_video_path(tmp_path):
    assert meta.video_keys == []


+@pytest.mark.parametrize(
+    ("marker_field", "marker_key"),
+    [
+        ("info", "is_depth_map"),
+        ("info", "video.is_depth_map"),
+        ("video_info", "video.is_depth_map"),
+    ],
+    ids=["info.is_depth_map", "info.video.is_depth_map_legacy", "video_info.video.is_depth_map_legacy"],
+)
+def test_depth_keys_property_filters_by_marker(tmp_path, marker_field, marker_key):
+    """``depth_keys`` recognises the canonical and the two legacy marker variants."""
+    depth_feature = {
+        "dtype": "video",
+        "shape": (64, 96, 1),
+        "names": ["height", "width", "channels"],
+        marker_field: {marker_key: True},
+    }
+    features = {
+        **VIDEO_FEATURES,
+        "observation.images.laptop_depth": depth_feature,
+    }
+    meta = LeRobotDatasetMetadata.create(
+        repo_id="test/depth_keys",
+        fps=DEFAULT_FPS,
+        features=features,
+        root=tmp_path / f"depth_keys_{marker_field}_{marker_key.replace('.', '_')}",
+    )
+
+    assert set(meta.video_keys) == {"observation.images.laptop", "observation.images.laptop_depth"}
+    assert meta.depth_keys == ["observation.images.laptop_depth"]
+
+
+def test_depth_keys_empty_when_no_marker(tmp_path):
+    meta = LeRobotDatasetMetadata.create(
+        repo_id="test/no_depth", fps=DEFAULT_FPS, features=VIDEO_FEATURES, root=tmp_path / "no_depth"
+    )
+    assert meta.depth_keys == []
+
+
 def test_create_raises_on_existing_directory(tmp_path):
    """create() raises if root directory already exists."""
    root = tmp_path / "existing"
@@ -385,140 +426,3 @@ def test_finalize_flushes_buffered_metadata(tmp_path):
    assert episodes_dir.exists()
    parquet_files = list(episodes_dir.rglob("*.parquet"))
    assert len(parquet_files) > 0
-
-
-# ── Tools accessor ───────────────────────────────────────────────────
-
-
-def test_tools_falls_back_to_default_when_info_has_no_tools_field(tmp_path):
-    """meta.tools returns DEFAULT_TOOLS when info.json doesn't declare any."""
-    from lerobot.datasets.language import DEFAULT_TOOLS
-
-    root = tmp_path / "no_tools"
-    meta = LeRobotDatasetMetadata.create(
-        repo_id="test/no_tools",
-        fps=DEFAULT_FPS,
-        features=SIMPLE_FEATURES,
-        root=root,
-        use_videos=False,
-    )
-
-    assert meta.tools == DEFAULT_TOOLS
-    # info.json on disk should NOT include a `tools` key for clean datasets
-    with open(root / INFO_PATH) as f:
-        info_on_disk = json.load(f)
-    assert "tools" not in info_on_disk
-
-
-def test_tools_reads_declared_tools_from_info_json(tmp_path):
-    """A `tools` list written into info.json survives load → meta.tools.
-
-    Regression test for the bug where ``DatasetInfo.from_dict`` silently
-    dropped the ``tools`` key (no matching dataclass field), so
-    ``meta.tools`` always returned ``DEFAULT_TOOLS`` regardless of
-    what was on disk.
-    """
-    from lerobot.datasets.io_utils import load_info
-
-    root = tmp_path / "with_tools"
-    meta = LeRobotDatasetMetadata.create(
-        repo_id="test/with_tools",
-        fps=DEFAULT_FPS,
-        features=SIMPLE_FEATURES,
-        root=root,
-        use_videos=False,
-    )
-
-    custom_tool = {
-        "type": "function",
-        "function": {
-            "name": "record_observation",
-            "description": "Capture a still image.",
-            "parameters": {
-                "type": "object",
-                "properties": {"label": {"type": "string"}},
-                "required": ["label"],
-            },
-        },
-    }
-    info_path = root / INFO_PATH
-    with open(info_path) as f:
-        raw = json.load(f)
-    raw["tools"] = [custom_tool]
-    with open(info_path, "w") as f:
-        json.dump(raw, f)
-
-    # Reload info from disk and rebind it on the metadata object
-    meta.info = load_info(root)
-    assert meta.tools == [custom_tool]
-
-
-def test_tools_round_trip_through_dataset_info(tmp_path):
-    """A `tools` list survives DatasetInfo.from_dict / to_dict."""
-    from lerobot.datasets.utils import DatasetInfo
-
-    raw = {
-        "codebase_version": "v3.1",
-        "fps": 30,
-        "features": SIMPLE_FEATURES,
-        "tools": [{"type": "function", "function": {"name": "say"}}],
-    }
-    info = DatasetInfo.from_dict(raw)
-    assert info.tools == raw["tools"]
-    assert info.to_dict()["tools"] == raw["tools"]
-
-
-def test_tools_setter_persists_to_info_json_and_reloads(tmp_path):
-    """Assigning meta.tools writes info.json and reloads meta.info."""
-    from lerobot.datasets.io_utils import load_info
-
-    root = tmp_path / "set_tools"
-    meta = LeRobotDatasetMetadata.create(
-        repo_id="test/set_tools",
-        fps=DEFAULT_FPS,
-        features=SIMPLE_FEATURES,
-        root=root,
-        use_videos=False,
-    )
-
-    custom_tool = {
-        "type": "function",
-        "function": {
-            "name": "record_observation",
-            "description": "Capture a still image.",
-            "parameters": {
-                "type": "object",
-                "properties": {"label": {"type": "string"}},
-                "required": ["label"],
-            },
-        },
-    }
-    meta.tools = [custom_tool]
-
-    # In-memory metadata reflects the new catalog ...
-    assert meta.tools == [custom_tool]
-    assert meta.info.tools == [custom_tool]
-    # ... and a fresh read from disk agrees.
-    assert load_info(root).tools == [custom_tool]
-
-
-def test_tools_setter_clears_key_when_set_to_none(tmp_path):
-    """Setting meta.tools back to None drops the key and restores the default."""
-    from lerobot.datasets.language import DEFAULT_TOOLS
-
-    root = tmp_path / "clear_tools"
-    meta = LeRobotDatasetMetadata.create(
-        repo_id="test/clear_tools",
-        fps=DEFAULT_FPS,
-        features=SIMPLE_FEATURES,
-        root=root,
-        use_videos=False,
-    )
-
-    meta.tools = [{"type": "function", "function": {"name": "say"}}]
-    meta.tools = None
-
-    assert meta.tools == DEFAULT_TOOLS
-    with open(root / INFO_PATH) as f:
-        info_on_disk = json.load(f)
-    assert "tools" not in info_on_disk
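
Reviewer note on the `depth_keys` tests added in this file: the parametrized cases treat three marker locations as equivalent (`info["is_depth_map"]`, `info["video.is_depth_map"]`, and `video_info["video.is_depth_map"]`). The sketch below shows one way such a filter could look. `find_depth_keys` is a hypothetical name, and this is not necessarily how `LeRobotDatasetMetadata.depth_keys` is implemented; it only reproduces the behaviour the tests assert.

```python
def find_depth_keys(features: dict) -> list:
    """Return the feature keys flagged as depth maps by any of the three markers.

    Hypothetical helper for illustration; the marker names come from the test
    parametrization in the diff.
    """
    depth_keys = []
    for key, ft in features.items():
        info = ft.get("info") or {}
        video_info = ft.get("video_info") or {}
        is_depth = (
            info.get("is_depth_map")  # canonical marker
            or info.get("video.is_depth_map")  # legacy marker under "info"
            or video_info.get("video.is_depth_map")  # legacy marker under "video_info"
        )
        if is_depth:
            depth_keys.append(key)
    return depth_keys


features = {
    "observation.images.laptop": {"dtype": "video", "shape": (64, 96, 3)},
    "observation.images.laptop_depth": {
        "dtype": "video",
        "shape": (64, 96, 1),
        "info": {"is_depth_map": True},
    },
}
print(find_depth_keys(features))
```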
--- a/tests/datasets/test_dataset_tools.py
+++ b/tests/datasets/test_dataset_tools.py
@@ -23,7 +23,6 @@ import torch

 pytest.importorskip("datasets", reason="datasets is required (install lerobot[dataset])")

-
 from lerobot.configs import VideoEncoderConfig
 from lerobot.datasets.dataset_tools import (
    add_features,
@@ -32,12 +31,9 @@ from lerobot.datasets.dataset_tools import (
    merge_datasets,
    modify_features,
    modify_tasks,
-    reencode_dataset,
    remove_feature,
    split_dataset,
 )
-from lerobot.datasets.io_utils import load_info
-from tests.datasets.test_video_encoding import _add_frames, require_h264, require_libsvtav1


@pytest.fixture
@@ -1330,41 +1326,3 @@ def test_convert_image_to_video_dataset_subset_episodes(tmp_path):

        if output_dir.exists():
            shutil.rmtree(output_dir)
-
-
-# ─── reencode_dataset ─────────────────────────────────────────────────
-
-
-@require_libsvtav1
-@require_h264
-def test_reencode_dataset_multi_key_multiprocessing(
-    tmp_path, empty_lerobot_dataset_factory, features_factory
-):
-    """Re-encode a two-camera dataset with num_workers=2 and verify metadata refresh."""
-    features = features_factory(use_videos=True)
-    initial_cfg = VideoEncoderConfig(vcodec="libsvtav1", g=2, crf=30, preset=12)
-    dataset = empty_lerobot_dataset_factory(
-        root=tmp_path / "ds",
-        features=features,
-        use_videos=True,
-        camera_encoder=initial_cfg,
-    )
-
-    _add_frames(dataset, num_frames=4)
-    dataset.save_episode()
-    _add_frames(dataset, num_frames=4)
-    dataset.save_episode()
-    dataset.finalize()
-
-    assert len(dataset.meta.video_keys) == 2
-
-    target_cfg = VideoEncoderConfig(vcodec="h264", g=6, crf=23, pix_fmt="yuv420p")
-
-    result = reencode_dataset(dataset, camera_encoder=target_cfg, num_workers=2)
-
-    assert result is dataset
-
-    persisted_info = load_info(dataset.root)
-    for vk in dataset.meta.video_keys:
-        persisted_encoder = VideoEncoderConfig.from_video_info(persisted_info.features[vk].get("info", {}))
-        assert persisted_encoder == target_cfg
--- a/tests/datasets/test_dataset_writer.py
+++ b/tests/datasets/test_dataset_writer.py
@@ -53,8 +53,8 @@ def _make_frame(features: dict, task: str = "Dummy task") -> dict:
 # ── Existing encode_video_worker tests ───────────────────────────────


-def test_encode_video_worker_forwards_camera_encoder(tmp_path):
-    """_encode_video_worker forwards camera_encoder to encode_video_frames."""
+def test_encode_video_worker_forwards_video_encoder(tmp_path):
+    """_encode_video_worker forwards video_encoder to encode_video_frames."""
    video_key = "observation.images.laptop"
    fpath = DEFAULT_IMAGE_PATH.format(image_key=video_key, episode_index=0, frame_index=0)
    img_dir = tmp_path / Path(fpath).parent
@@ -74,16 +74,16 @@ def test_encode_video_worker_forwards_camera_encoder(tmp_path):
            0,
            tmp_path,
            fps=30,
-            camera_encoder=VideoEncoderConfig(vcodec="h264", preset=None),
+            video_encoder=VideoEncoderConfig(vcodec="h264", preset=None),
            encoder_threads=4,
        )

-    assert captured_kwargs["camera_encoder"].vcodec == "h264"
+    assert captured_kwargs["video_encoder"].vcodec == "h264"
    assert captured_kwargs["encoder_threads"] == 4


-def test_encode_video_worker_default_camera_encoder(tmp_path):
-    """_encode_video_worker passes None camera_encoder which encode_video_frames defaults."""
+def test_encode_video_worker_default_video_encoder(tmp_path):
+    """_encode_video_worker passes video_encoder=None, leaving encode_video_frames to apply its default."""
    video_key = "observation.images.laptop"
    fpath = DEFAULT_IMAGE_PATH.format(image_key=video_key, episode_index=0, frame_index=0)
    img_dir = tmp_path / Path(fpath).parent
@@ -100,7 +100,7 @@ def test_encode_video_worker_default_camera_encoder(tmp_path):
    with patch("lerobot.datasets.dataset_writer.encode_video_frames", side_effect=mock_encode):
        _encode_video_worker(video_key, 0, tmp_path, fps=30)

-    assert captured_kwargs["camera_encoder"] is None
+    assert captured_kwargs["video_encoder"] is None
    assert captured_kwargs["encoder_threads"] is None


--- a/tests/datasets/test_datasets.py
+++ b/tests/datasets/test_datasets.py
@@ -1480,10 +1480,15 @@ def test_valid_video_codecs_constant():
    assert "h264" in VALID_VIDEO_CODECS
    assert "hevc" in VALID_VIDEO_CODECS
    assert "libsvtav1" in VALID_VIDEO_CODECS
+    assert "ffv1" in VALID_VIDEO_CODECS
    assert "auto" in VALID_VIDEO_CODECS
    assert "h264_videotoolbox" in VALID_VIDEO_CODECS
    assert "h264_nvenc" in VALID_VIDEO_CODECS
-    assert len(VALID_VIDEO_CODECS) == 10
+    assert "h264_vaapi" in VALID_VIDEO_CODECS
+    assert "h264_qsv" in VALID_VIDEO_CODECS
+    assert "hevc_videotoolbox" in VALID_VIDEO_CODECS
+    assert "hevc_nvenc" in VALID_VIDEO_CODECS
+    assert len(VALID_VIDEO_CODECS) == 11


 def test_delta_timestamps_with_episodes_filter(tmp_path, empty_lerobot_dataset_factory):
--- a/tests/datasets/test_depth.py
+++ b/tests/datasets/test_depth.py
@@ -0,0 +1,307 @@
+"""Tests for the depth-integration feature.
+
+Covers quantization/dequantization round-trips (depth_utils), image writer
+depth support (image_writer), hardware→dataset feature routing
+(feature_utils), pixel-format helpers (pyav_utils / configs.video), and
+feature-to-file-format routing through the dataset writer.
+
+Depth metadata detection on ``LeRobotDatasetMetadata.depth_keys`` (canonical
+and legacy marker variants) lives in ``test_dataset_metadata.py``.
+"""
+
+from pathlib import Path
+
+import numpy as np
+import PIL.Image
+import pytest
+import torch
+
+pytest.importorskip("av", reason="av is required (install lerobot[dataset])")
+
+import av
+
+from lerobot.configs import DepthEncoderConfig
+from lerobot.configs.video import DEPTH_QMAX, VALID_VIDEO_CODECS
+from lerobot.datasets.depth_utils import dequantize_depth, quantize_depth
+from lerobot.datasets.image_writer import (
+    image_array_to_pil_image,
+    save_kwargs_for_path,
+    write_image,
+)
+from lerobot.datasets.pyav_utils import get_pix_fmt_channels
+from tests.fixtures.constants import (
+    DEFAULT_FPS,
+    DUMMY_CAMERA_FEATURES,
+    DUMMY_DEPTH_CAMERA_FEATURES,
+    DUMMY_MOTOR_FEATURES,
+    DUMMY_REPO_ID,
+)
+
+H, W = 48, 64
+DEPTH_MIN = 0.01
+DEPTH_MAX = 10.0
+
+
+# ── 1. Quantize / Dequantize round-trips ────────────────────────────
+
+
+class TestQuantizeDequantize:
+    """Core numerical tests for depth_utils.quantize_depth / dequantize_depth."""
+
+    def _make_depth_metres(self) -> np.ndarray:
+        """Linearly-spaced float32 depth in metres covering the default range."""
+        return np.linspace(DEPTH_MIN, DEPTH_MAX, H * W, dtype=np.float32).reshape(H, W)
+
+    def test_roundtrip_linear_metres(self):
+        depth = self._make_depth_metres()
+        quantized = quantize_depth(depth, use_log=False, video_backend=None)
+        recovered = dequantize_depth(quantized, use_log=False, output_unit="m")
+
+        assert recovered.shape == (H, W, 1), f"Expected (H,W,1), got {recovered.shape}"
+        assert recovered.dtype == np.float32
+        tol = (DEPTH_MAX - DEPTH_MIN) / DEPTH_QMAX
+        np.testing.assert_allclose(recovered[..., 0], depth, atol=tol + 1e-6)
+
+    def test_roundtrip_log_metres(self):
+        depth = self._make_depth_metres()
+        quantized = quantize_depth(depth, use_log=True, video_backend=None)
+        recovered = dequantize_depth(quantized, use_log=True, output_unit="m")
+
+        assert recovered.shape == (H, W, 1)
+        near = depth < 1.0
+        far = depth > 8.0
+        err_near = np.abs(recovered[..., 0][near] - depth[near])
+        err_far = np.abs(recovered[..., 0][far] - depth[far])
+        assert err_near.mean() < err_far.mean(), "Log quant should be more precise at close range"
+
+    def test_roundtrip_mm_uint16_input(self):
+        depth_mm = np.linspace(10, 10000, H * W, dtype=np.float64).reshape(H, W).astype(np.uint16)
+        quantized = quantize_depth(depth_mm, use_log=False, video_backend=None, input_unit="mm")
+        recovered = dequantize_depth(quantized, use_log=False, output_unit="mm")
+
+        assert recovered.dtype == np.uint16
+        tol_mm = (DEPTH_MAX - DEPTH_MIN) * 1000.0 / DEPTH_QMAX
+        np.testing.assert_allclose(
+            recovered[..., 0].astype(np.float64), depth_mm.astype(np.float64), atol=tol_mm + 1.0
+        )
+
+    def test_quantize_clamps_out_of_range(self):
+        depth = np.array([[0.001, 99.0]], dtype=np.float32)
+        quantized = quantize_depth(depth, use_log=False, video_backend=None)
+        assert quantized[0, 0] == 0
+        assert quantized[0, 1] == DEPTH_QMAX
+
+    def test_quantize_accepts_torch_tensor(self):
+        t = torch.rand(H, W, dtype=torch.float32) * (DEPTH_MAX - DEPTH_MIN) + DEPTH_MIN
+        result = quantize_depth(t, video_backend=None)
+        assert isinstance(result, np.ndarray)
+        assert result.dtype == np.uint16
+
+    def test_quantize_squeezes_channel_dim(self):
+        depth = self._make_depth_metres()
+        for shape in [(H, W, 1), (1, H, W)]:
+            reshaped = depth.reshape(shape)
+            quantized = quantize_depth(reshaped, video_backend=None)
+            assert quantized.ndim == 2, f"Input shape {shape} should be squeezed to 2D"
+
+    def test_quantize_returns_pyav_frame(self):
+        depth = self._make_depth_metres()
+        result = quantize_depth(depth, video_backend="pyav")
+        assert isinstance(result, av.VideoFrame)
+
+    def test_dequantize_output_tensor(self):
+        quantized = np.full((H, W), DEPTH_QMAX // 2, dtype=np.uint16)
+        result = dequantize_depth(quantized, output_unit="m", output_tensor=True)
+        assert isinstance(result, torch.Tensor)
+        assert result.shape == (H, W, 1)
+
+    def test_invalid_log_params_raises(self):
+        depth = np.ones((4, 4), dtype=np.float32)
+        with pytest.raises(ValueError, match="depth_min \\+ shift must be positive"):
+            quantize_depth(depth, depth_min=1.0, shift=-2.0, use_log=True, video_backend=None)
+
+
+# ── 2. Image writer depth support ───────────────────────────────────
+
+
+class TestImageWriterDepth:
+    """image_array_to_pil_image and write_image for single-channel depth maps."""
+
+    def test_pil_uint16_grayscale(self):
+        arr = np.arange(H * W, dtype=np.uint16).reshape(H, W)
+        img = image_array_to_pil_image(arr)
+        assert isinstance(img, PIL.Image.Image)
+        assert img.mode == "I;16"
+        assert img.size == (W, H)
+
+    def test_pil_float32_grayscale(self):
+        arr = np.random.rand(H, W).astype(np.float32)
+        img = image_array_to_pil_image(arr)
+        assert img.mode == "F"
+
+    def test_pil_squeeze_hwc1_and_1hw(self):
+        arr_uint16 = np.zeros((H, W), dtype=np.uint16)
+        for input_arr in [arr_uint16.reshape(H, W, 1), arr_uint16.reshape(1, H, W)]:
+            img = image_array_to_pil_image(input_arr)
+            assert img.size == (W, H)
+
+    def test_save_kwargs_png_vs_tiff(self):
+        png_kw = save_kwargs_for_path(Path("frame.png"), compress_level=5)
+        assert png_kw == {"compress_level": 5}
+
+        tiff_kw = save_kwargs_for_path(Path("frame.tiff"), compress_level=5)
+        assert tiff_kw == {"compression": "raw"}
+
+        assert save_kwargs_for_path(Path("frame.jpg"), compress_level=5) == {}
+
+    def test_write_image_tiff_roundtrip(self, tmp_path):
+        arr = np.arange(H * W, dtype=np.uint16).reshape(H, W)
+        fpath = tmp_path / "depth.tiff"
+        write_image(arr, fpath)
+
+        assert fpath.exists()
+        with PIL.Image.open(fpath) as loaded:
+            recovered = np.array(loaded)
+        np.testing.assert_array_equal(recovered, arr)
+
+
+# ── 3. Feature routing ──────────────────────────────────────────────
+
+
+class TestHwToDatasetFeaturesDepth:
+    """hw_to_dataset_features marks single-channel cameras as depth."""
+
+    def test_single_channel_cam_marked_depth(self):
+        from lerobot.utils.feature_utils import hw_to_dataset_features
+
+        features = hw_to_dataset_features({"cam": (480, 640, 1)}, prefix="observation")
+        ft = features["observation.images.cam"]
+        assert ft["info"]["is_depth_map"] is True
+
+    def test_three_channel_cam_not_depth(self):
+        from lerobot.utils.feature_utils import hw_to_dataset_features
+
+        features = hw_to_dataset_features({"cam": (480, 640, 3)}, prefix="observation")
+        ft = features["observation.images.cam"]
+        assert ft["info"]["is_depth_map"] is False
+
+    def test_invalid_channel_count_raises(self):
+        from lerobot.utils.feature_utils import hw_to_dataset_features
+
+        with pytest.raises(ValueError, match="Expected a 3-tuple"):
+            hw_to_dataset_features({"cam": (480, 640, 2)}, prefix="observation")
+
+
+# ── 4. Video info depth flag ────────────────────────────────────────
+
+
+class TestVideoInfoDepthFlag:
+    """Misc depth-related constants and helpers in pyav_utils / configs.video."""
+
+    def test_get_pix_fmt_channels_gray(self):
+        assert get_pix_fmt_channels("gray12le") == 1
+        assert get_pix_fmt_channels("gray8") == 1
+
+    def test_ffv1_in_valid_codecs(self):
+        assert "ffv1" in VALID_VIDEO_CODECS
+
+
+# ── 5. Feature-to-file-format routing ───────────────────────────────
+
+
+def _build_mixed_features(dtype: str) -> dict:
+    """Build a feature dict with one RGB camera and one depth camera.
+
+    Uses shapes from ``DUMMY_CAMERA_FEATURES`` and ``DUMMY_DEPTH_CAMERA_FEATURES``
+    defined in ``tests.fixtures.constants``.
+    """
+    rgb_cam = next(iter(DUMMY_CAMERA_FEATURES.values()))
+    depth_cam = next(iter(DUMMY_DEPTH_CAMERA_FEATURES.values()))
+    return {
+        "observation.images.rgb": {"dtype": dtype, **rgb_cam},
+        "observation.images.depth": {"dtype": dtype, **depth_cam},
+        **{k: {"dtype": v["dtype"], **v} for k, v in DUMMY_MOTOR_FEATURES.items()},
+    }
+
+
+def _make_mixed_frame(features: dict) -> dict:
+    """Build a valid frame dict matching the given feature schema."""
+    frame: dict = {"task": "test task"}
+    for key, ft in features.items():
+        shape = ft["shape"]
+        if ft["dtype"] in ("image", "video"):
+            channels = shape[-1]
+            if channels == 1:
+                frame[key] = np.random.randint(0, 4095, shape, dtype=np.uint16)
+            else:
+                frame[key] = np.random.randint(0, 255, shape, dtype=np.uint8)
+        else:
+            frame[key] = np.random.randn(*shape).astype(ft["dtype"])
+    return frame
+
+
+class TestFeatureFileRouting:
+    """Verify that depth vs RGB features are routed to the correct file format."""
+
+    NUM_FRAMES = 5
+
+    def test_no_video_depth_tiff_rgb_png(self, tmp_path):
+        """Without video encoding: depth -> .tiff, RGB -> .png."""
+        from lerobot.datasets.lerobot_dataset import LeRobotDataset
+
+        features = _build_mixed_features(dtype="image")
+
+        dataset = LeRobotDataset.create(
+            repo_id=DUMMY_REPO_ID,
+            fps=DEFAULT_FPS,
+            features=features,
+            root=tmp_path / "ds",
+            use_videos=False,
+        )
+
+        for _ in range(self.NUM_FRAMES):
+            dataset.add_frame(_make_mixed_frame(features))
+
+        buf = dataset.writer.episode_buffer
+        depth_paths = [Path(p) for p in buf["observation.images.depth"]]
+        rgb_paths = [Path(p) for p in buf["observation.images.rgb"]]
+
+        assert all(p.suffix == ".tiff" for p in depth_paths), "Depth frames should be .tiff"
+        assert all(p.suffix == ".png" for p in rgb_paths), "RGB frames should be .png"
+        assert all(p.exists() for p in depth_paths), "Depth TIFF files should exist on disk"
+        assert all(p.exists() for p in rgb_paths), "RGB PNG files should exist on disk"
+
+        dataset.save_episode()
+        dataset.finalize()
+
+    def test_video_depth_uses_depth_encoder(self, tmp_path):
+        """With streaming video encoding: depth keys use DepthEncoderConfig, RGB keys do not."""
+        from lerobot.datasets.lerobot_dataset import LeRobotDataset
+
+        features = _build_mixed_features(dtype="video")
+
+        dataset = LeRobotDataset.create(
+            repo_id=DUMMY_REPO_ID,
+            fps=DEFAULT_FPS,
+            features=features,
+            root=tmp_path / "ds",
+            use_videos=True,
+            streaming_encoding=True,
+        )
+
+        assert dataset.writer._streaming_encoder is not None
+        encoder = dataset.writer._streaming_encoder
+
+        for _ in range(self.NUM_FRAMES):
+            dataset.add_frame(_make_mixed_frame(features))
+
+        rgb_thread = encoder._threads["observation.images.rgb"]
+        depth_thread = encoder._threads["observation.images.depth"]
+
+        assert not isinstance(rgb_thread.video_encoder, DepthEncoderConfig)
+        assert isinstance(depth_thread.video_encoder, DepthEncoderConfig)
+        assert depth_thread.is_depth is True
+        assert rgb_thread.is_depth is False
+
+        dataset.save_episode()
+        dataset.finalize()
--- a/tests/datasets/test_image_writer.py
+++ b/tests/datasets/test_image_writer.py
@@ -94,7 +94,7 @@ def test_image_array_to_pil_image_pytorch_format(img_array_factory):

 def test_image_array_to_pil_image_single_channel(img_array_factory):
    img_array = img_array_factory(channels=1)
-    with pytest.raises(NotImplementedError):
+    with pytest.raises(ValueError, match="Unsupported single-channel image dtype"):
        image_array_to_pil_image(img_array)


--- a/tests/datasets/test_language.py
+++ b/tests/datasets/test_language.py
@@ -1,173 +0,0 @@
-#!/usr/bin/env python
-
-import pytest
-
-pytest.importorskip("datasets", reason="datasets is required (install lerobot[dataset])")
-pytest.importorskip("pandas", reason="pandas is required (install lerobot[dataset])")
-
-import numpy as np  # noqa: E402
-import pandas as pd  # noqa: E402
-import pyarrow as pa  # noqa: E402
-
-from lerobot.datasets import LeRobotDataset  # noqa: E402
-from lerobot.datasets.io_utils import write_info  # noqa: E402
-from lerobot.datasets.language import (  # noqa: E402
-    EVENT_ONLY_STYLES,
-    LANGUAGE_EVENTS,
-    LANGUAGE_PERSISTENT,
-    PERSISTENT_STYLES,
-    STYLE_REGISTRY,
-    VIEW_DEPENDENT_STYLES,
-    column_for_style,
-    is_view_dependent_style,
-    language_events_arrow_type,
-    language_feature_info,
-    language_persistent_arrow_type,
-    validate_camera_field,
-)
-from lerobot.datasets.utils import DEFAULT_DATA_PATH  # noqa: E402
-
-
-def test_language_arrow_schema_has_expected_fields():
-    persistent_row_type = language_persistent_arrow_type().value_type
-    event_row_type = language_events_arrow_type().value_type
-
-    assert isinstance(persistent_row_type, pa.StructType)
-    assert persistent_row_type.names == [
-        "role",
-        "content",
-        "style",
-        "timestamp",
-        "camera",
-        "tool_calls",
-    ]
-
-    assert isinstance(event_row_type, pa.StructType)
-    assert event_row_type.names == ["role", "content", "style", "camera", "tool_calls"]
-
-    # Persistent-row timestamps use float32, matching LeRobotDataset frame timestamps.
-    assert persistent_row_type.field("timestamp").type == pa.float32()
-
-
-def test_validate_feature_language_warns_only_on_non_empty_value(caplog):
-    from lerobot.datasets.feature_utils import validate_feature_language
-
-    # None (the expected record-time value) is silent and non-fatal.
-    with caplog.at_level("WARNING"):
-        assert validate_feature_language("language_persistent", None) == ""
-    assert caplog.records == []
-
-    # A stray non-empty value is dropped later, so we warn rather than fail.
-    with caplog.at_level("WARNING"):
-        assert validate_feature_language("language_persistent", [{"role": "user"}]) == ""
-    assert any("language_persistent" in r.message for r in caplog.records)
-
-
-def test_style_registry_routes_columns():
-    assert {"subtask", "plan", "memory", "motion", "task_aug"} == PERSISTENT_STYLES
-    assert {"interjection", "vqa", "trace"} == EVENT_ONLY_STYLES
-    assert PERSISTENT_STYLES | EVENT_ONLY_STYLES <= STYLE_REGISTRY
-
-    assert column_for_style("subtask") == LANGUAGE_PERSISTENT
-    assert column_for_style("plan") == LANGUAGE_PERSISTENT
-    assert column_for_style("memory") == LANGUAGE_PERSISTENT
-    assert column_for_style("motion") == LANGUAGE_PERSISTENT
-    assert column_for_style("task_aug") == LANGUAGE_PERSISTENT
-    assert column_for_style("interjection") == LANGUAGE_EVENTS
-    assert column_for_style("vqa") == LANGUAGE_EVENTS
-    assert column_for_style("trace") == LANGUAGE_EVENTS
-    assert column_for_style(None) == LANGUAGE_EVENTS
-
-
-def test_view_dependent_styles():
-    # motion lives in PERSISTENT_STYLES and is described in robot-frame
-    # (joint / Cartesian) terms, so it is NOT view-dependent. Only vqa
-    # (event) and trace (event, pixel-trajectory) carry a camera tag.
-    assert {"vqa", "trace"} == VIEW_DEPENDENT_STYLES
-    assert is_view_dependent_style("vqa")
-    assert is_view_dependent_style("trace")
-    assert not is_view_dependent_style("motion")
-    assert not is_view_dependent_style("subtask")
-    assert not is_view_dependent_style("plan")
-    assert not is_view_dependent_style("interjection")
-    assert not is_view_dependent_style(None)
-
-
-def test_validate_camera_field_requires_camera_for_view_dependent_styles():
-    validate_camera_field("vqa", "observation.images.top")
-    validate_camera_field("trace", "observation.images.front")
-    with pytest.raises(ValueError, match="view-dependent"):
-        validate_camera_field("vqa", None)
-    with pytest.raises(ValueError, match="view-dependent"):
-        validate_camera_field("trace", "")
-
-
-def test_validate_camera_field_rejects_camera_on_non_view_dependent_styles():
-    validate_camera_field("subtask", None)
-    validate_camera_field("plan", None)
-    validate_camera_field("memory", None)
-    validate_camera_field("motion", None)
-    validate_camera_field("interjection", None)
-    validate_camera_field(None, None)
-    with pytest.raises(ValueError, match="must have camera=None"):
-        validate_camera_field("subtask", "observation.images.top")
-    with pytest.raises(ValueError, match="must have camera=None"):
-        validate_camera_field("motion", "observation.images.top")
-    with pytest.raises(ValueError, match="must have camera=None"):
-        validate_camera_field("interjection", "observation.images.top")
-    with pytest.raises(ValueError, match="must have camera=None"):
-        validate_camera_field(None, "observation.images.top")
-
-
-def test_unknown_style_rejected():
-    with pytest.raises(ValueError, match="Unknown language style"):
-        column_for_style("surprise")
-
-
-def test_lerobot_dataset_passes_language_columns_through(tmp_path, empty_lerobot_dataset_factory):
-    root = tmp_path / "language_dataset"
-    dataset = empty_lerobot_dataset_factory(
-        root=root,
-        features={"state": {"dtype": "float32", "shape": (2,), "names": None}},
-        use_videos=False,
-    )
-    dataset.add_frame({"state": np.array([0.0, 1.0], dtype=np.float32), "task": "tidy"})
-    dataset.add_frame({"state": np.array([1.0, 2.0], dtype=np.float32), "task": "tidy"})
-    dataset.save_episode()
-    dataset.finalize()
-
-    persistent = [
-        {
-            "role": "assistant",
-            "content": "reach for the cup",
-            "style": "subtask",
-            "timestamp": 0.0,
-            "camera": None,
-            "tool_calls": None,
-        }
-    ]
-    event = {
-        "role": "user",
-        "content": "what is visible?",
-        "style": "vqa",
-        "camera": "observation.images.top",
-        "tool_calls": None,
-    }
-    data_path = root / DEFAULT_DATA_PATH.format(chunk_index=0, file_index=0)
-    df = pd.read_parquet(data_path)
-    df[LANGUAGE_PERSISTENT] = [persistent, persistent]
-    df[LANGUAGE_EVENTS] = [[event], []]
-    df.to_parquet(data_path)
-
-    info = dataset.meta.info
-    info["features"].update(language_feature_info())
-    write_info(info, root)
-
-    reloaded = LeRobotDataset(repo_id=dataset.repo_id, root=root)
-
-    first = reloaded[0]
-    second = reloaded[1]
-    assert first[LANGUAGE_PERSISTENT] == persistent
-    assert first[LANGUAGE_EVENTS] == [event]
-    assert second[LANGUAGE_PERSISTENT] == persistent
-    assert second[LANGUAGE_EVENTS] == []
--- a/tests/datasets/test_language_render.py
+++ b/tests/datasets/test_language_render.py
@@ -1,417 +0,0 @@
-#!/usr/bin/env python
-
-import pytest
-
-pytest.importorskip("datasets", reason="datasets is required (install lerobot[dataset])")
-
-from lerobot.configs.recipe import MessageTurn, TrainingRecipe  # noqa: E402
-from lerobot.datasets.language_render import (  # noqa: E402
-    EMITTED_AT_TOLERANCE_S,
-    active_at,
-    emitted_at,
-    nth_next,
-    nth_prev,
-    render_sample,
-)
-
-
-def persistent_row(role, content, style, timestamp, tool_calls=None, camera=None):
-    return {
-        "role": role,
-        "content": content,
-        "style": style,
-        "timestamp": timestamp,
-        "camera": camera,
-        "tool_calls": tool_calls,
-    }
-
-
-def event_row(role, content, style, tool_calls=None, camera=None):
-    return {
-        "role": role,
-        "content": content,
-        "style": style,
-        "camera": camera,
-        "tool_calls": tool_calls,
-    }
-
-
-PERSISTENT = [
-    persistent_row("assistant", "plan 0", "plan", 0.0),
-    persistent_row("assistant", "memory 0", "memory", 0.0),
-    persistent_row("assistant", "subtask 0", "subtask", 0.0),
-    persistent_row("assistant", "memory 1", "memory", 1.0),
-    persistent_row("assistant", "subtask 1", "subtask", 1.0),
-]
-EVENTS_AT_1 = [
-    event_row("user", "what is visible?", "vqa", camera="observation.images.top"),
-    event_row("assistant", '{"count": 2}', "vqa", camera="observation.images.top"),
-]
-EVENTS_AT_2 = [
-    event_row("user", "skip wiping", "interjection"),
-    event_row(
-        "assistant",
-        None,
-        None,
-        [{"type": "function", "function": {"name": "say", "arguments": {"text": "Skipping wiping."}}}],
-    ),
-]
-# Same emission tick, two cameras: triggers per-camera disambiguation in
-# resolvers, mirroring how Module 3 of the annotation pipeline writes one
-# (vqa, user) + (vqa, assistant) pair per camera.
-EVENTS_AT_3_TWO_CAMERAS = [
-    event_row("user", "how many cups (top)?", "vqa", camera="observation.images.top"),
-    event_row("assistant", '{"count": 3}', "vqa", camera="observation.images.top"),
-    event_row("user", "how many cups (wrist)?", "vqa", camera="observation.images.wrist"),
-    event_row("assistant", '{"count": 1}', "vqa", camera="observation.images.wrist"),
-]
-
-
-def test_resolver_temporal_semantics():
-    assert active_at(0.5, persistent=PERSISTENT, style="subtask")["content"] == "subtask 0"
-    assert active_at(1.0, persistent=PERSISTENT, style="subtask")["content"] == "subtask 1"
-    assert emitted_at(0.5, persistent=PERSISTENT, events=[], style="vqa", role="assistant") is None
-    assert (
-        emitted_at(1.0, persistent=PERSISTENT, events=EVENTS_AT_1, style="vqa", role="assistant")["content"]
-        == '{"count": 2}'
-    )
-
-
-def test_persistent_relative_resolvers_reject_event_styles():
-    with pytest.raises(ValueError, match="event-only"):
-        active_at(1.0, persistent=PERSISTENT, style="vqa")
-    with pytest.raises(ValueError, match="event-only"):
-        nth_prev(1.0, persistent=PERSISTENT, style="interjection")
-
-
-def test_nth_prev_and_next():
-    assert nth_prev(1.0, persistent=PERSISTENT, style="subtask", offset=1)["content"] == "subtask 0"
-    assert nth_next(0.0, persistent=PERSISTENT, style="subtask", offset=1)["content"] == "subtask 1"
-
-
-def test_substitution_if_present_multimodal_and_tool_calls():
-    recipe = TrainingRecipe(
-        messages=[
-            MessageTurn(
-                role="user",
-                content=[
-                    {"type": "image", "feature": "observation.images.top"},
-                    {"type": "text", "text": "${task}: ${interjection}"},
-                ],
-                stream="high_level",
-                if_present="interjection",
-            ),
-            MessageTurn(
-                role="assistant",
-                content="${plan}",
-                stream="high_level",
-                target=True,
-                tool_calls_from="speech",
-            ),
-        ],
-        bindings={"plan": "active_at(t, style=plan)"},
-    )
-
-    rendered = render_sample(
-        recipe=recipe,
-        persistent=PERSISTENT,
-        events=EVENTS_AT_2,
-        t=2.0,
-        sample_idx=0,
-        task="clean kitchen",
-    )
-
-    assert rendered["messages"][0]["content"][1]["text"] == "clean kitchen: skip wiping"
-    assert rendered["messages"][1]["content"] == "plan 0"
-    assert rendered["messages"][1]["tool_calls"][0]["function"]["name"] == "say"
-    assert rendered["message_streams"] == ["high_level", "high_level"]
-    assert rendered["target_message_indices"] == [1]
-
-
-def test_exact_event_miss_returns_none_when_target_skips():
-    recipe = TrainingRecipe(
-        messages=[
-            MessageTurn(role="user", content="${vqa_query}", stream="high_level", if_present="vqa_query"),
-            MessageTurn(
-                role="assistant",
-                content="${vqa}",
-                stream="high_level",
-                target=True,
-                if_present="vqa",
-            ),
-        ]
-    )
-
-    assert (
-        render_sample(recipe=recipe, persistent=PERSISTENT, events=EVENTS_AT_2, t=0.0, sample_idx=0) is None
-    )
-
-
-def test_deterministic_blend_sampling():
-    recipe = TrainingRecipe(
-        blend={
-            "a": TrainingRecipe(
-                weight=1.0,
-                messages=[
-                    MessageTurn(role="user", content="${task}", stream="high_level"),
-                    MessageTurn(role="assistant", content="a", stream="high_level", target=True),
-                ],
-            ),
-            "b": TrainingRecipe(
-                weight=1.0,
-                messages=[
-                    MessageTurn(role="user", content="${task}", stream="high_level"),
-                    MessageTurn(role="assistant", content="b", stream="high_level", target=True),
-                ],
-            ),
-        }
-    )
-
-    first = render_sample(
-        recipe=recipe, persistent=PERSISTENT, events=EVENTS_AT_2, t=0.0, sample_idx=123, task="x"
-    )
-    second = render_sample(
-        recipe=recipe, persistent=PERSISTENT, events=EVENTS_AT_2, t=0.0, sample_idx=123, task="x"
-    )
-    assert first == second
-
-
-def test_emitted_at_filters_vqa_by_camera():
-    top = emitted_at(
-        3.0,
-        persistent=PERSISTENT,
-        events=EVENTS_AT_3_TWO_CAMERAS,
-        style="vqa",
-        role="assistant",
-        camera="observation.images.top",
-    )
-    wrist = emitted_at(
-        3.0,
-        persistent=PERSISTENT,
-        events=EVENTS_AT_3_TWO_CAMERAS,
-        style="vqa",
-        role="assistant",
-        camera="observation.images.wrist",
-    )
-    assert top["content"] == '{"count": 3}'
-    assert wrist["content"] == '{"count": 1}'
-
-
-def test_emitted_at_raises_on_ambiguous_per_camera_vqa():
-    with pytest.raises(ValueError, match="Ambiguous resolver"):
-        emitted_at(
-            3.0,
-            persistent=PERSISTENT,
-            events=EVENTS_AT_3_TWO_CAMERAS,
-            style="vqa",
-            role="assistant",
-        )
-
-
-def _vqa_subrecipe(camera: str) -> TrainingRecipe:
-    return TrainingRecipe(
-        weight=1.0,
-        bindings={
-            "vqa_query": f"emitted_at(t, style=vqa, role=user, camera={camera})",
-            "vqa": f"emitted_at(t, style=vqa, role=assistant, camera={camera})",
-        },
-        messages=[
-            MessageTurn(
-                role="user",
-                content=[{"type": "image", "feature": camera}, {"type": "text", "text": "${vqa_query}"}],
-                stream="high_level",
-                if_present="vqa_query",
-            ),
-            MessageTurn(
-                role="assistant",
-                content="${vqa}",
-                stream="high_level",
-                target=True,
-                if_present="vqa",
-            ),
-        ],
-    )
-
-
-@pytest.mark.parametrize(
-    ("camera", "expected_query", "expected_answer"),
-    [
-        ("observation.images.top", "how many cups (top)?", '{"count": 3}'),
-        ("observation.images.wrist", "how many cups (wrist)?", '{"count": 1}'),
-    ],
-)
-def test_per_camera_blend_renders_both_views(camera, expected_query, expected_answer):
-    rendered = render_sample(
-        recipe=_vqa_subrecipe(camera),
-        persistent=PERSISTENT,
-        events=EVENTS_AT_3_TWO_CAMERAS,
-        t=3.0,
-        sample_idx=0,
-    )
-
-    assert rendered["messages"][0]["content"][0]["feature"] == camera
-    assert rendered["messages"][0]["content"][1]["text"] == expected_query
-    assert rendered["messages"][1]["content"] == expected_answer
-
-
-def test_resolve_task_picks_rephrasing_deterministically_per_sample():
-    rephrasings = [
-        persistent_row("user", "tidy the kitchen", "task_aug", 0.0),
-        persistent_row("user", "please clean up the kitchen", "task_aug", 0.0),
-        persistent_row("user", "kitchen needs tidying", "task_aug", 0.0),
-        persistent_row("user", "make the kitchen clean", "task_aug", 0.0),
-    ]
-    recipe = TrainingRecipe(
-        messages=[
-            MessageTurn(role="user", content="${task}", stream="high_level"),
-            MessageTurn(role="assistant", content="ok", stream="high_level", target=True),
-        ]
-    )
-
-    # No explicit task override → resolver consults persistent rows.
-    seen: set[str] = set()
-    for sample_idx in range(64):
-        rendered = render_sample(
-            recipe=recipe,
-            persistent=rephrasings,
-            events=[],
-            t=0.0,
-            sample_idx=sample_idx,
-            dataset_ctx={"task": "canonical kitchen task"},
-        )
-        seen.add(rendered["messages"][0]["content"])
-    # Every rephrasing should be reachable across enough samples.
-    assert seen == {r["content"] for r in rephrasings}
-    # Same sample_idx → same pick (determinism).
-    a = render_sample(
-        recipe=recipe,
-        persistent=rephrasings,
-        events=[],
-        t=0.0,
-        sample_idx=42,
-        dataset_ctx={"task": "canonical"},
-    )
-    b = render_sample(
-        recipe=recipe,
-        persistent=rephrasings,
-        events=[],
-        t=0.0,
-        sample_idx=42,
-        dataset_ctx={"task": "canonical"},
-    )
-    assert a["messages"][0]["content"] == b["messages"][0]["content"]
-
-
-def test_resolve_task_falls_back_to_canonical_without_rephrasings():
-    recipe = TrainingRecipe(
-        messages=[
-            MessageTurn(role="user", content="${task}", stream="high_level"),
-            MessageTurn(role="assistant", content="ok", stream="high_level", target=True),
-        ]
-    )
-    rendered = render_sample(
-        recipe=recipe,
-        persistent=PERSISTENT,  # no task_aug rows
-        events=[],
-        t=0.0,
-        sample_idx=0,
-        dataset_ctx={"task": "clean the kitchen"},
-    )
-    assert rendered["messages"][0]["content"] == "clean the kitchen"
-
-
-def test_resolve_task_explicit_override_beats_rephrasings():
-    rephrasings = [
-        persistent_row("user", "rephrased one", "task_aug", 0.0),
-        persistent_row("user", "rephrased two", "task_aug", 0.0),
-    ]
-    recipe = TrainingRecipe(
-        messages=[
-            MessageTurn(role="user", content="${task}", stream="high_level"),
-            MessageTurn(role="assistant", content="ok", stream="high_level", target=True),
-        ]
-    )
-    rendered = render_sample(
-        recipe=recipe,
-        persistent=rephrasings,
-        events=[],
-        t=0.0,
-        sample_idx=0,
-        task="explicit override wins",
-        dataset_ctx={"task": "canonical"},
-    )
-    assert rendered["messages"][0]["content"] == "explicit override wins"
-
-
-def test_emitted_at_persistent_tolerates_small_timestamp_drift():
-    """Persistent ``emitted_at`` should match within EMITTED_AT_TOLERANCE_S
-    so callers that derive ``t`` arithmetically (``frame_idx / fps``) still
-    line up with the parquet-stored timestamp.
-    """
-    rows = [persistent_row("assistant", "memo", "memory", 1.0)]
-    # Half a tolerance window — bit-different float, comfortably inside
-    inside = emitted_at(1.0 + EMITTED_AT_TOLERANCE_S / 2, persistent=rows, events=[], style="memory")
-    assert inside is not None and inside["content"] == "memo"
-
-    # Just past the window — no match
-    outside = emitted_at(1.0 + EMITTED_AT_TOLERANCE_S * 2, persistent=rows, events=[], style="memory")
-    assert outside is None
-
-
-def test_render_sample_rejects_non_dict_language_rows():
-    """``_normalize_rows`` must surface malformed inputs as TypeError.
-
-    A pipeline that hands the renderer a non-dict (e.g. a stray string)
-    is a real upstream bug — silent skipping would let it propagate.
-    """
-    recipe = TrainingRecipe(
-        messages=[
-            MessageTurn(role="user", content="${task}", stream="high_level"),
-            MessageTurn(role="assistant", content="ok", stream="high_level", target=True),
-        ]
-    )
-    with pytest.raises(TypeError, match="must be dictionaries"):
-        render_sample(
-            recipe=recipe,
-            persistent=["not a dict"],
-            events=[],
-            t=0.0,
-            sample_idx=0,
-            task="x",
-        )
-
-
-def test_low_level_branch_renders_active_subtask():
-    low_level = TrainingRecipe(
-        blend={
-            "low": TrainingRecipe(
-                weight=1.0,
-                messages=[
-                    MessageTurn(
-                        role="user",
-                        content="${task}\nPlan: ${plan}\nMemory: ${memory}",
-                        stream="high_level",
-                    ),
-                    MessageTurn(
-                        role="assistant",
-                        content="${subtask}",
-                        stream="low_level",
-                        target=True,
-                    ),
-                ],
-            )
-        }
-    )
-
-    rendered = render_sample(
-        recipe=low_level,
-        persistent=PERSISTENT,
-        events=[],
-        t=0.5,
-        sample_idx=0,
-        task="clean kitchen",
-    )
-
-    assert rendered["messages"][-1] == {"role": "assistant", "content": "subtask 0"}
-    assert rendered["message_streams"][-1] == "low_level"
-    assert rendered["target_message_indices"] == [1]
--- a/tests/datasets/test_streaming_video_encoder.py
+++ b/tests/datasets/test_streaming_video_encoder.py
@@ -61,9 +61,7 @@ class TestCameraEncoderThread:
        encoder_thread = _CameraEncoderThread(
            video_path=video_path,
            fps=fps,
-            vcodec=enc_cfg.vcodec,
-            pix_fmt=enc_cfg.pix_fmt,
-            codec_options=enc_cfg.get_codec_options(as_strings=True),
+            video_encoder=enc_cfg,
            frame_queue=frame_queue,
            result_queue=result_queue,
            stop_event=stop_event,
@@ -112,9 +110,7 @@ class TestCameraEncoderThread:
        encoder_thread = _CameraEncoderThread(
            video_path=video_path,
            fps=fps,
-            vcodec=enc_cfg.vcodec,
-            pix_fmt=enc_cfg.pix_fmt,
-            codec_options=enc_cfg.get_codec_options(as_strings=True),
+            video_encoder=enc_cfg,
            frame_queue=frame_queue,
            result_queue=result_queue,
            stop_event=stop_event,
@@ -146,9 +142,7 @@ class TestCameraEncoderThread:
        encoder_thread = _CameraEncoderThread(
            video_path=video_path,
            fps=fps,
-            vcodec=enc_cfg.vcodec,
-            pix_fmt=enc_cfg.pix_fmt,
-            codec_options=enc_cfg.get_codec_options(as_strings=True),
+            video_encoder=enc_cfg,
            frame_queue=frame_queue,
            result_queue=result_queue,
            stop_event=stop_event,
@@ -391,7 +385,8 @@ class TestStreamingVideoEncoder:

        # Verify codec options include thread tuning for libsvtav1 (lp=…)
        thread = encoder._threads[f"{OBS_IMAGES}.cam"]
-        assert "svtav1-params" in thread.codec_options or "threads" in thread.codec_options
+        codec_opts = thread.video_encoder.get_codec_options(encoder_threads=thread.encoder_threads)
+        assert "svtav1-params" in codec_opts or "threads" in codec_opts

        # Feed some frames and finish to ensure it works end-to-end
        num_frames = 10
--- a/tests/datasets/test_subtask_dataset.py
+++ b/tests/datasets/test_subtask_dataset.py
@@ -0,0 +1,193 @@
+#!/usr/bin/env python
+
+# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+Tests for subtask functionality in LeRobotDataset.
+
+These tests verify that:
+- Subtask information is correctly loaded from datasets that have subtask data
+- The __getitem__ method correctly adds subtask strings to returned items
+- Missing subtask data is handled gracefully
+"""
+
+import pytest
+
+pytest.importorskip("pandas", reason="pandas is required (install lerobot[dataset])")
+
+import pandas as pd  # noqa: E402
+import torch  # noqa: E402
+
+from lerobot.datasets.lerobot_dataset import LeRobotDataset  # noqa: E402
+
+
+class TestSubtaskDataset:
+    """Tests for subtask handling in LeRobotDataset."""
+
+    @pytest.fixture
+    def subtask_dataset(self):
+        """Load the test subtask dataset from the hub."""
+        # Use the lerobot/pusht-subtask dataset, restricted to episodes 1 through 11
+        return LeRobotDataset(
+            repo_id="lerobot/pusht-subtask",
+            episodes=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
+        )
+
+    def test_subtask_dataset_loads(self, subtask_dataset):
+        """Test that the subtask dataset loads successfully."""
+        assert subtask_dataset is not None
+        assert len(subtask_dataset) > 0
+
+    def test_subtask_metadata_loaded(self, subtask_dataset):
+        """Test that subtask metadata is loaded when present in dataset."""
+        # The dataset should have subtasks metadata loaded
+        assert subtask_dataset.meta.subtasks is not None
+        assert isinstance(subtask_dataset.meta.subtasks, pd.DataFrame)
+
+    def test_subtask_index_in_features(self, subtask_dataset):
+        """Test that subtask_index is a feature when dataset has subtasks."""
+        assert "subtask_index" in subtask_dataset.features
+
+    def test_getitem_returns_subtask_string(self, subtask_dataset):
+        """Test that __getitem__ correctly adds subtask string to returned item."""
+        item = subtask_dataset[0]
+
+        # Subtask should be present in the returned item
+        assert "subtask" in item
+        assert isinstance(item["subtask"], str)
+        assert len(item["subtask"]) > 0  # Should not be empty
+
+    def test_getitem_has_subtask_index(self, subtask_dataset):
+        """Test that __getitem__ includes subtask_index."""
+        item = subtask_dataset[0]
+
+        assert "subtask_index" in item
+        assert isinstance(item["subtask_index"], torch.Tensor)
+
+    def test_subtask_index_maps_to_valid_subtask(self, subtask_dataset):
+        """Test that subtask_index correctly maps to a subtask in metadata."""
+        item = subtask_dataset[0]
+
+        subtask_idx = item["subtask_index"].item()
+        subtask_from_metadata = subtask_dataset.meta.subtasks.iloc[subtask_idx].name
+
+        assert item["subtask"] == subtask_from_metadata
+
+    def test_all_items_have_subtask(self, subtask_dataset):
+        """Test that the first few items in the dataset have subtask information."""
+        for i in range(min(len(subtask_dataset), 5)):  # Check first 5 items
+            item = subtask_dataset[i]
+            assert "subtask" in item
+            assert isinstance(item["subtask"], str)
+
+    def test_task_and_subtask_coexist(self, subtask_dataset):
+        """Test that both task and subtask are present in returned items."""
+        item = subtask_dataset[0]
+
+        # Both task and subtask should be present
+        assert "task" in item
+        assert "subtask" in item
+        assert isinstance(item["task"], str)
+        assert isinstance(item["subtask"], str)
+
+
+class TestSubtaskDatasetMissing:
+    """Tests for graceful handling when subtask data is missing."""
+
+    @pytest.fixture
+    def dataset_without_subtasks(self, tmp_path, empty_lerobot_dataset_factory):
+        """Create a dataset without subtask information."""
+        features = {"state": {"dtype": "float32", "shape": (2,), "names": None}}
+        dataset = empty_lerobot_dataset_factory(root=tmp_path / "no_subtask", features=features)
+
+        # Add some frames and save
+        for _ in range(5):
+            dataset.add_frame({"state": torch.randn(2), "task": "Test task"})
+        dataset.save_episode()
+        dataset.finalize()
+
+        # Reload the dataset
+        return LeRobotDataset(dataset.repo_id, root=dataset.root)
+
+    def test_no_subtask_in_features(self, dataset_without_subtasks):
+        """Test that subtask_index is not in features when not provided."""
+        assert "subtask_index" not in dataset_without_subtasks.features
+
+    def test_getitem_without_subtask(self, dataset_without_subtasks):
+        """Test that __getitem__ works when subtask is not present."""
+        item = dataset_without_subtasks[0]
+
+        # Item should still be retrievable
+        assert item is not None
+        assert "state" in item
+        assert "task" in item
+
+        # Subtask should NOT be present
+        assert "subtask" not in item
+
+    def test_subtasks_metadata_is_none(self, dataset_without_subtasks):
+        """Test that subtasks metadata is None when not present."""
+        assert dataset_without_subtasks.meta.subtasks is None
+
+
+class TestSubtaskEdgeCases:
+    """Edge case tests for subtask handling."""
+
+    def test_subtask_with_multiple_episodes(self):
+        """Test subtask handling with multiple episodes if available."""
+        try:
+            dataset = LeRobotDataset(
+                repo_id="lerobot/pusht-subtask",
+                episodes=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
+            )
+        except Exception:
+            pytest.skip("Could not load the lerobot/pusht-subtask dataset")
+
+        # Check first and last items have valid subtasks
+        first_item = dataset[0]
+        last_item = dataset[len(dataset) - 1]
+
+        assert "subtask" in first_item
+        assert "subtask" in last_item
+        assert isinstance(first_item["subtask"], str)
+        assert isinstance(last_item["subtask"], str)
+
+    def test_subtask_index_consistency(self):
+        """Test that the same subtask_index always maps to the same subtask string."""
+        try:
+            dataset = LeRobotDataset(
+                repo_id="lerobot/pusht-subtask",
+                episodes=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
+            )
+        except Exception:
+            pytest.skip("Could not load the lerobot/pusht-subtask dataset")
+
+        if len(dataset) < 2:
+            pytest.skip("Dataset too small for this test")
+
+        # Collect subtask_index to subtask mappings
+        subtask_map = {}
+        for i in range(min(len(dataset), 10)):
+            item = dataset[i]
+            idx = item["subtask_index"].item()
+            subtask = item["subtask"]
+
+            if idx in subtask_map:
+                # Same index should always return same subtask
+                assert subtask_map[idx] == subtask, (
+                    f"Inconsistent subtask for index {idx}: '{subtask_map[idx]}' vs '{subtask}'"
+                )
+            else:
+                subtask_map[idx] = subtask
--- a/tests/datasets/test_video_encoding.py
+++ b/tests/datasets/test_video_encoding.py
@@ -26,7 +26,7 @@ pytest.importorskip("av", reason="av is required (install lerobot[dataset])")

 import av  # noqa: E402

-from lerobot.configs import VALID_VIDEO_CODECS, VideoEncoderConfig
+from lerobot.configs import VALID_VIDEO_CODECS, DepthEncoderConfig, VideoEncoderConfig
 from lerobot.datasets.image_writer import write_image
 from lerobot.datasets.lerobot_dataset import LeRobotDataset
 from lerobot.datasets.pyav_utils import get_codec
@@ -35,7 +35,6 @@ from lerobot.datasets.video_utils import (
    concatenate_video_files,
    encode_video_frames,
    get_video_info,
-    reencode_video,
 )
 from tests.fixtures.constants import DUMMY_VIDEO_INFO

@@ -339,7 +338,7 @@ def _encode_video(
 ) -> Path:
    imgs_dir = path.parent / f"imgs_{path.stem}"
    _write_frames(imgs_dir, num_frames=num_frames)
-    encode_video_frames(imgs_dir, path, fps=fps, camera_encoder=cfg, overwrite=True)
+    encode_video_frames(imgs_dir, path, fps=fps, video_encoder=cfg, overwrite=True)
    return path


@@ -348,22 +347,16 @@ def _read_feature_info(dataset: LeRobotDataset) -> dict:
    return info["features"][VIDEO_KEY]["info"]


-def _add_frames(dataset: LeRobotDataset, num_frames: int, video_keys: list[str] | None = None) -> None:
-    from lerobot.utils.constants import DEFAULT_FEATURES
-
-    if video_keys is None:
-        video_keys = dataset.meta.video_keys
+def _add_frames(dataset: LeRobotDataset, num_frames: int) -> None:
+    shape = dataset.meta.features[VIDEO_KEY]["shape"]
    for _ in range(num_frames):
-        frame: dict = {"task": "test"}
-        for key, ft in dataset.meta.features.items():
-            if key in DEFAULT_FEATURES:
-                continue
-            shape = ft["shape"]
-            if key in video_keys:
-                frame[key] = np.random.randint(0, 256, shape, dtype=np.uint8)
-            else:
-                frame[key] = np.zeros(shape, dtype=np.float32)
-        dataset.add_frame(frame)
+        dataset.add_frame(
+            {
+                VIDEO_KEY: np.random.randint(0, 256, shape, dtype=np.uint8),
+                "action": np.zeros(2, dtype=np.float32),
+                "task": "test",
+            }
+        )


 class TestGetVideoInfo:
@@ -375,7 +368,7 @@ class TestGetVideoInfo:
        assert info["video.pix_fmt"] == "yuv420p"
        assert info["video.fps"] == 30
        assert info["video.channels"] == 3
-        assert info["video.is_depth_map"] is False
+        assert info["is_depth_map"] is False
        assert info["has_audio"] is False
        assert "video.g" not in info
        assert "video.crf" not in info
@@ -385,7 +378,7 @@ class TestGetVideoInfo:
    def test_merges_encoder_config_as_video_prefixed_entries(self):
        cfg = VideoEncoderConfig(vcodec="libsvtav1", g=2, crf=30, preset=12)

-        info = get_video_info(TEST_ARTIFACTS_DIR / "clip_4frames.mp4", camera_encoder=cfg)
+        info = get_video_info(TEST_ARTIFACTS_DIR / "clip_4frames.mp4", video_encoder=cfg)

        assert info["video.g"] == 2
        assert info["video.crf"] == 30
@@ -398,11 +391,16 @@ class TestGetVideoInfo:
    def test_stream_derived_keys_take_precedence_over_config(self):
        cfg = VideoEncoderConfig(vcodec="libsvtav1", pix_fmt="yuv420p")

-        info = get_video_info(TEST_ARTIFACTS_DIR / "clip_4frames.mp4", camera_encoder=cfg)
+        info = get_video_info(TEST_ARTIFACTS_DIR / "clip_4frames.mp4", video_encoder=cfg)

        assert info["video.codec"]  # populated from stream, not from config's vcodec
        assert info["video.pix_fmt"] == "yuv420p"

+    def test_depth_encoder_config_sets_is_depth_map_true(self):
+        """A ``DepthEncoderConfig`` causes ``get_video_info`` to mark the stream as depth."""
+        info = get_video_info(TEST_ARTIFACTS_DIR / "clip_4frames.mp4", video_encoder=DepthEncoderConfig())
+        assert info["is_depth_map"] is True
+

 class TestEncodeVideoFrames:
    @require_libsvtav1
@@ -461,7 +459,7 @@ class TestEncodeVideoFrames:
        cfg = VideoEncoderConfig(vcodec="libsvtav1", g=4, crf=25, preset=10)
        video_path = _encode_video(tmp_path / "out.mp4", num_frames=4, fps=30, cfg=cfg)

-        info = get_video_info(video_path, camera_encoder=cfg)
+        info = get_video_info(video_path, video_encoder=cfg)

        # Stream-derived
        assert info["video.height"] == 64
@@ -470,7 +468,7 @@ class TestEncodeVideoFrames:
        assert info["video.codec"] == "av1"
        assert info["video.pix_fmt"] == "yuv420p"
        assert info["video.fps"] == 30
-        assert info["video.is_depth_map"] is False
+        assert info["is_depth_map"] is False
        assert info["has_audio"] is False
        # Encoder config
        assert info["video.g"] == 4
@@ -481,30 +479,6 @@ class TestEncodeVideoFrames:
        assert info["video.extra_options"] == {}


-class TestReencodeVideo:
-    @require_libsvtav1
-    @require_h264
-    def test_reencode_video(self, tmp_path):
-        src = TEST_ARTIFACTS_DIR / "clip_4frames.mp4"
-        out = tmp_path / "reencoded.mp4"
-        cfg = VideoEncoderConfig(vcodec="h264", g=6, crf=23, pix_fmt="yuv444p")
-        reencode_video(src, out, camera_encoder=cfg, overwrite=True)
-
-        assert out.exists()
-        with av.open(str(out)) as container:
-            n_frames = sum(1 for _ in container.decode(video=0))
-        assert n_frames == 4
-
-        info = get_video_info(out, camera_encoder=cfg)
-        assert info["video.codec"] == "h264"
-        assert info["video.pix_fmt"] == "yuv444p"
-        assert info["video.height"] == 64
-        assert info["video.width"] == 96
-        assert info["video.fps"] == 30
-        assert info["video.g"] == 6
-        assert info["video.crf"] == 23
-
-
 class TestConcatenateVideoFiles:
    def test_two_clips_frame_count(self, tmp_path):
        """Output frame count equals the sum of the two input frame counts."""
--- a/tests/fixtures/constants.py
+++ b/tests/fixtures/constants.py
@@ -39,12 +39,23 @@ DUMMY_VIDEO_INFO = {
    "video.crf": 30,
    "video.preset": 12,
    "video.fast_decode": 0,
-    "video.is_depth_map": False,
+    "is_depth_map": False,
    "has_audio": False,
 }
 DUMMY_CAMERA_FEATURES = {
    "laptop": {"shape": (64, 96, 3), "names": ["height", "width", "channels"], "info": DUMMY_VIDEO_INFO},
    "phone": {"shape": (64, 96, 3), "names": ["height", "width", "channels"], "info": DUMMY_VIDEO_INFO},
 }
+DUMMY_DEPTH_VIDEO_INFO = {
+    **DUMMY_VIDEO_INFO,
+    "is_depth_map": True,
+}
+DUMMY_DEPTH_CAMERA_FEATURES = {
+    "laptop_depth": {
+        "shape": (64, 96, 1),
+        "names": ["height", "width", "channels"],
+        "info": DUMMY_DEPTH_VIDEO_INFO,
+    },
+}
 DUMMY_CHW = (3, 96, 128)
 DUMMY_HWC = (96, 128, 3)
--- a/tests/processor/test_render_messages_processor.py
+++ b/tests/processor/test_render_messages_processor.py
@@ -1,60 +0,0 @@
-#!/usr/bin/env python
-
-import pytest
-
-pytest.importorskip("datasets", reason="datasets is required (install lerobot[dataset])")
-
-import torch  # noqa: E402
-
-from lerobot.configs.recipe import MessageTurn, TrainingRecipe  # noqa: E402
-from lerobot.processor.converters import create_transition  # noqa: E402
-from lerobot.processor.render_messages_processor import RenderMessagesStep  # noqa: E402
-from lerobot.types import TransitionKey  # noqa: E402
-
-
-def test_render_messages_step_noops_without_language_columns():
-    recipe = TrainingRecipe(
-        messages=[
-            MessageTurn(role="user", content="${task}", stream="high_level"),
-            MessageTurn(role="assistant", content="${subtask}", stream="low_level", target=True),
-        ]
-    )
-    transition = create_transition(complementary_data={"task": "do it"})
-
-    assert RenderMessagesStep(recipe)(transition) == transition
-
-
-def test_render_messages_step_renders_and_drops_raw_language():
-    recipe = TrainingRecipe(
-        messages=[
-            MessageTurn(role="user", content="${task}", stream="high_level"),
-            MessageTurn(role="assistant", content="${subtask}", stream="low_level", target=True),
-        ]
-    )
-    transition = create_transition(
-        complementary_data={
-            "task": "do it",
-            "timestamp": torch.tensor(0.0),
-            "index": torch.tensor(7),
-            "language_persistent": [
-                {
-                    "role": "assistant",
-                    "content": "reach carefully",
-                    "style": "subtask",
-                    "timestamp": 0.0,
-                    "camera": None,
-                    "tool_calls": None,
-                }
-            ],
-            "language_events": [],
-        }
-    )
-
-    out = RenderMessagesStep(recipe)(transition)
-    data = out[TransitionKey.COMPLEMENTARY_DATA]
-
-    assert "language_persistent" not in data
-    assert "language_events" not in data
-    assert data["messages"][-1]["content"] == "reach carefully"
-    assert data["message_streams"] == ["high_level", "low_level"]
-    assert data["target_message_indices"] == [1]
--- a/tests/rewards/test_modeling_topreward.py
+++ b/tests/rewards/test_modeling_topreward.py
@@ -1,296 +0,0 @@
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Tests for the TOPReward reward model."""
-
-from __future__ import annotations
-
-from types import SimpleNamespace
-
-import pytest
-import torch
-
-from lerobot.configs.rewards import RewardModelConfig
-from lerobot.rewards.factory import get_reward_model_class, make_reward_model_config
-from lerobot.rewards.topreward import TOPRewardConfig
-from lerobot.rewards.topreward.processor_topreward import TOPREWARD_FEATURE_PREFIX, TOPREWARD_INPUT_KEYS
-from tests.utils import skip_if_package_missing
-
-
-class _FakeQwenModel(torch.nn.Module):
-    """Stand-in for ``Qwen3VLForConditionalGeneration``.
-
-    Returns a ``SimpleNamespace`` with ``logits`` of a controlled shape so
-    the log-prob extraction path in ``compute_reward`` can be exercised
-    without downloading real VLM weights.
-    """
-
-    def __init__(self) -> None:
-        super().__init__()
-        self._param = torch.nn.Parameter(torch.zeros(1))
-        self._reward_value: float = -1.5
-
-    @classmethod
-    def from_pretrained(cls, *args, **kwargs):  # noqa: ARG003
-        return cls()
-
-    def forward(  # noqa: ARG002
-        self, input_ids, attention_mask=None, labels=None, logits_to_keep=0, **kwargs
-    ):
-        batch_size, seq_len = input_ids.shape
-        vocab_size = 1000
-        logits = torch.zeros(batch_size, seq_len, vocab_size)
-        # Place a controlled log-prob at the target token position so the
-        # model returns a predictable reward value.
-        # The label-masked suffix is the last token.
-        # After the causal-LM shift (logits[:, :-1], labels[:, 1:]) the scored
-        # position is logits[:, -2, :] predicting labels[:, -1].
-        # We set logits so that log_softmax at the target token ≈ _reward_value.
-        for i in range(batch_size):
-            target_idx = int(input_ids[i, -1].item())
-            logits[i, -2, target_idx] = self._reward_value * -10  # high logit -> high log-prob
-        if logits_to_keep:
-            logits = logits[:, -logits_to_keep:, :]
-        return SimpleNamespace(logits=logits)
-
-
-def _patch_build(monkeypatch) -> None:
-    """Stub out HF AutoX so TOPReward construction is cheap and offline."""
-    from lerobot.rewards.topreward import modeling_topreward
-
-    monkeypatch.setattr(modeling_topreward, "Qwen3VLForConditionalGeneration", _FakeQwenModel)
-
-
-def _make_batch(
-    input_ids: torch.Tensor,
-    attention_mask: torch.Tensor | None = None,
-    labels: torch.Tensor | None = None,
-    *,
-    omit: str | None = None,
-) -> dict[str, torch.Tensor]:
-    """Build a ``compute_reward``-ready batch using TOPReward's namespaced keys."""
-    batch_size, seq_len = input_ids.shape
-    if attention_mask is None:
-        attention_mask = torch.ones(batch_size, seq_len, dtype=torch.long)
-    batch: dict[str, torch.Tensor] = {}
-    if labels is not None:
-        batch[f"{TOPREWARD_FEATURE_PREFIX}labels"] = labels
-    batch.update(
-        {
-            f"{TOPREWARD_FEATURE_PREFIX}input_ids": input_ids,
-            f"{TOPREWARD_FEATURE_PREFIX}attention_mask": attention_mask,
-            f"{TOPREWARD_FEATURE_PREFIX}pixel_values_videos": torch.zeros(
-                batch_size, 1536, dtype=torch.float32
-            ),
-            f"{TOPREWARD_FEATURE_PREFIX}video_grid_thw": torch.ones(batch_size, 3, dtype=torch.long),
-            f"{TOPREWARD_FEATURE_PREFIX}mm_token_type_ids": torch.zeros_like(input_ids),
-        }
-    )
-    if omit is not None:
-        batch.pop(f"{TOPREWARD_FEATURE_PREFIX}{omit}", None)
-    return batch
-
-
-def _terminal_labels(input_ids: torch.Tensor) -> torch.Tensor:
-    labels = torch.full_like(input_ids, -100)
-    labels[:, -1] = input_ids[:, -1]
-    return labels
-
-
-# ---------------------------------------------------------------------------
-# Registry + factory
-# ---------------------------------------------------------------------------
-
-
-def test_topreward_config_registered():
-    assert "topreward" in RewardModelConfig.get_known_choices()
-    assert RewardModelConfig.get_choice_class("topreward") is TOPRewardConfig
-    assert isinstance(make_reward_model_config("topreward", device="cpu"), TOPRewardConfig)
-
-
-def test_topreward_factory_returns_in_tree_class():
-    from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
-
-    assert get_reward_model_class("topreward") is TOPRewardModel
-
-
-# ---------------------------------------------------------------------------
-# Config validation
-# ---------------------------------------------------------------------------
-
-
-def test_topreward_config_rejects_zero_max_frames():
-    with pytest.raises(ValueError, match="max_frames must be >= 1"):
-        TOPRewardConfig(device="cpu", max_frames=0)
-
-
-def test_topreward_config_rejects_non_positive_fps():
-    with pytest.raises(ValueError, match="fps must be > 0"):
-        TOPRewardConfig(device="cpu", fps=0.0)
-
-
-def test_topreward_config_rejects_suffix_without_instruction_placeholder():
-    with pytest.raises(ValueError, match=r"\{instruction\}"):
-        TOPRewardConfig(device="cpu", prompt_suffix_template="no placeholder here")
-
-
-# ---------------------------------------------------------------------------
-# compute_reward
-# ---------------------------------------------------------------------------
-
-
-@skip_if_package_missing("transformers")
-def test_topreward_compute_reward_returns_one_scalar_per_sample(monkeypatch):
-    """``compute_reward`` must return a ``(B,)`` float32 tensor with one
-    log-prob reward per sample, consuming pre-encoded Qwen-VL tensors."""
-    from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
-
-    _patch_build(monkeypatch)
-    cfg = TOPRewardConfig(device="cpu")
-    model = TOPRewardModel(cfg)
-
-    input_ids = torch.randint(0, 100, (2, 10))
-    attention_mask = torch.ones(2, 10, dtype=torch.long)
-    labels = _terminal_labels(input_ids)
-
-    batch = _make_batch(input_ids, attention_mask, labels)
-    rewards = model.compute_reward(batch)
-
-    assert rewards.shape == (2,)
-    assert rewards.dtype == torch.float32
-
-
-@skip_if_package_missing("transformers")
-def test_topreward_compute_reward_applies_success_threshold(monkeypatch):
-    """When ``success_threshold`` is finite, the model returns binary success."""
-    from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
-
-    _patch_build(monkeypatch)
-    cfg = TOPRewardConfig(device="cpu", success_threshold=0.0)
-    model = TOPRewardModel(cfg)
-
-    input_ids = torch.randint(0, 100, (2, 10))
-    attention_mask = torch.ones(2, 10, dtype=torch.long)
-    labels = _terminal_labels(input_ids)
-
-    batch = _make_batch(input_ids, attention_mask, labels)
-    rewards = model.compute_reward(batch)
-
-    assert rewards.shape == (2,)
-    assert set(rewards.tolist()).issubset({0.0, 1.0})
-
-
-@skip_if_package_missing("transformers")
-def test_topreward_compute_reward_errors_when_inputs_missing(monkeypatch):
-    from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
-
-    _patch_build(monkeypatch)
-    cfg = TOPRewardConfig(device="cpu")
-    model = TOPRewardModel(cfg)
-
-    with pytest.raises(KeyError, match=r"observation\.topreward\.input_ids"):
-        model.compute_reward(_make_batch(torch.randint(0, 100, (1, 10)), omit="input_ids"))
-
-
-@skip_if_package_missing("transformers")
-def test_topreward_compute_reward_errors_when_labels_missing(monkeypatch):
-    from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
-
-    _patch_build(monkeypatch)
-    cfg = TOPRewardConfig(device="cpu")
-    model = TOPRewardModel(cfg)
-
-    input_ids = torch.randint(0, 100, (1, 10))
-    with pytest.raises(KeyError, match=r"observation\.topreward\.labels"):
-        model.compute_reward(_make_batch(input_ids, labels=None))
-
-
-@skip_if_package_missing("transformers")
-def test_topreward_compute_reward_requires_all_encoder_keys(monkeypatch):
-    from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
-
-    _patch_build(monkeypatch)
-    cfg = TOPRewardConfig(device="cpu")
-    model = TOPRewardModel(cfg)
-
-    input_ids = torch.randint(0, 100, (1, 10))
-    labels = _terminal_labels(input_ids)
-    required_encoder_keys = set(TOPREWARD_INPUT_KEYS) - {"input_ids", "labels"}
-
-    for key in required_encoder_keys:
-        with pytest.raises(KeyError, match=rf"observation\.topreward\.{key}"):
-            model.compute_reward(_make_batch(input_ids, labels=labels, omit=key))
-
-
-# ---------------------------------------------------------------------------
-# Save / load — config-only checkpoint
-# ---------------------------------------------------------------------------
-
-
-@skip_if_package_missing("transformers")
-def test_topreward_save_pretrained_writes_only_config_json(monkeypatch, tmp_path):
-    from huggingface_hub.constants import CONFIG_NAME, SAFETENSORS_SINGLE_FILE
-
-    from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
-
-    _patch_build(monkeypatch)
-    cfg = TOPRewardConfig(
-        device="cpu",
-        vlm_name="Qwen/Qwen3-VL-8B-Instruct",
-        fps=4.0,
-        image_key="observation.images.front",
-    )
-    model = TOPRewardModel(cfg)
-    model.save_pretrained(str(tmp_path))
-
-    assert (tmp_path / CONFIG_NAME).exists()
-    assert not (tmp_path / SAFETENSORS_SINGLE_FILE).exists()
-
-
-@skip_if_package_missing("transformers")
-def test_topreward_from_pretrained_local_dir_roundtrips_config(monkeypatch, tmp_path):
-    from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
-
-    _patch_build(monkeypatch)
-    cfg = TOPRewardConfig(
-        device="cpu",
-        vlm_name="Qwen/Qwen3-VL-8B-Instruct",
-        fps=4.0,
-        image_key="observation.images.front",
-        add_chat_template=True,
-        success_threshold=-1.5,
-    )
-    TOPRewardModel(cfg).save_pretrained(str(tmp_path))
-
-    reloaded = TOPRewardModel.from_pretrained(str(tmp_path))
-
-    assert isinstance(reloaded.config, TOPRewardConfig)
-    assert reloaded.config.vlm_name == "Qwen/Qwen3-VL-8B-Instruct"
-    assert reloaded.config.fps == 4.0
-    assert reloaded.config.image_key == "observation.images.front"
-    assert reloaded.config.add_chat_template is True
-    assert reloaded.config.success_threshold == -1.5
-
-
-@skip_if_package_missing("transformers")
-def test_topreward_is_not_trainable(monkeypatch):
-    from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel
-
-    _patch_build(monkeypatch)
-    cfg = TOPRewardConfig(device="cpu")
-    model = TOPRewardModel(cfg)
-
-    assert model.is_trainable is False
-    with pytest.raises(NotImplementedError, match="not trainable"):
-        model.forward({"x": torch.zeros(1)})
--- a/tests/rewards/test_topreward.py
+++ b/tests/rewards/test_topreward.py
@@ -1,80 +0,0 @@
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""End-to-end TOPReward smoke test with the real Qwen3-VL model."""
-
-import os
-
-import pytest
-import torch
-
-pytest.importorskip("transformers")
-
-from lerobot.rewards.topreward.configuration_topreward import TOPRewardConfig  # noqa: E402
-from lerobot.rewards.topreward.modeling_topreward import TOPRewardModel  # noqa: E402
-from lerobot.rewards.topreward.processor_topreward import (  # noqa: E402
-    TOPREWARD_FEATURE_PREFIX,
-    TOPREWARD_INPUT_KEYS,
-    make_topreward_pre_post_processors,
-)
-from tests.utils import require_cuda  # noqa: E402
-
-pytestmark = pytest.mark.skipif(
-    os.environ.get("CI") == "true" or os.environ.get("GITHUB_ACTIONS") == "true",
-    reason="This test requires downloading and loading Qwen3-VL and is not meant for CI",
-)
-
-
-def _make_dummy_topreward_batch(image_key: str, task_key: str) -> dict[str, object]:
-    num_frames = 4
-    image_size = 64
-    frames = torch.zeros(1, num_frames, 3, image_size, image_size, dtype=torch.uint8)
-    for frame_idx in range(num_frames):
-        frames[0, frame_idx, 0].fill_(min(frame_idx * 48, 255))
-        frames[0, frame_idx, 1].fill_(96)
-        frames[0, frame_idx, 2].fill_(192)
-
-    return {
-        image_key: frames,
-        task_key: ["pick up the red cube"],
-    }
-
-
-@require_cuda
-def test_topreward_full_qwen3vl_preprocessor_to_compute_reward():
-    cfg = TOPRewardConfig(
-        vlm_name="Qwen/Qwen3-VL-8B-Instruct",
-        device="cuda",
-        max_frames=4,
-        fps=2.0,
-        max_input_length=4096,
-    )
-
-    preprocessor, _ = make_topreward_pre_post_processors(cfg)
-    encoded_batch = preprocessor(_make_dummy_topreward_batch(cfg.image_key, cfg.task_key))
-    for key in TOPREWARD_INPUT_KEYS:
-        assert f"{TOPREWARD_FEATURE_PREFIX}{key}" in encoded_batch
-
-    model = TOPRewardModel(cfg)
-    try:
-        model.to(cfg.device)
-        model.eval()
-        rewards = model.compute_reward(encoded_batch)
-    finally:
-        del model
-        torch.cuda.empty_cache()
-
-    assert rewards.shape == (1,)
-    assert rewards.dtype == torch.float32
-    assert torch.isfinite(rewards).all()
--- a/tests/rewards/test_topreward_processor.py
+++ b/tests/rewards/test_topreward_processor.py
@@ -1,246 +0,0 @@
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Tests for TOPReward's pre-processing helpers and encoder step."""
-
-from __future__ import annotations
-
-import pytest
-import torch
-
-from lerobot.configs import FeatureType, PipelineFeatureType, PolicyFeature
-from lerobot.rewards.topreward.processor_topreward import (
-    TOPREWARD_FEATURE_PREFIX,
-    TOPREWARD_INPUT_KEYS,
-    _expand_tasks,
-    _prepare_video_batch,
-)
-from lerobot.types import TransitionKey
-from tests.utils import skip_if_package_missing
-
-# ---------------------------------------------------------------------------
-# _prepare_video_batch — raw image/video batch -> (B, T, C, H, W) uint8
-# ---------------------------------------------------------------------------
-
-
-def test_prepare_video_batch_batched_chw_float_is_converted_to_uint8():
-    video = torch.rand(2, 4, 3, 8, 8)
-    tensor = _prepare_video_batch(video, max_frames=None)
-
-    assert tensor.shape == (2, 4, 3, 8, 8)
-    assert tensor.dtype == torch.uint8
-    assert tensor.min() >= 0 and tensor.max() <= 255
-
-
-def test_prepare_video_batch_batched_thwc_uint8_is_permuted_to_channel_first():
-    video = torch.randint(0, 256, (2, 3, 8, 8, 3), dtype=torch.uint8)
-    tensor = _prepare_video_batch(video, max_frames=None)
-
-    assert tensor.shape == (2, 3, 3, 8, 8)
-    assert tensor.dtype == torch.uint8
-
-
-def test_prepare_video_batch_max_frames_tail_crops_recent_frames():
-    video = torch.zeros(1, 10, 3, 4, 4)
-    for t in range(10):
-        video[:, t] = t / 9.0
-
-    tensor = _prepare_video_batch(video, max_frames=3)
-
-    assert tensor.shape == (1, 3, 3, 4, 4)
-    assert int(tensor[0, 0, 0, 0, 0]) == int(7 / 9 * 255)
-    assert int(tensor[0, -1, 0, 0, 0]) == 255
-
-
-def test_prepare_video_batch_rejects_3d_input():
-    with pytest.raises(ValueError, match="Expected TOPReward frames"):
-        _prepare_video_batch(torch.zeros(4, 8, 8), max_frames=None)
-
-
-def test_prepare_video_batch_floats_above_one_are_rescaled_and_clipped():
-    video = torch.full((1, 1, 3, 2, 2), 5.0)
-    tensor = _prepare_video_batch(video, max_frames=None)
-
-    assert tensor.shape == (1, 1, 3, 2, 2)
-    assert int(tensor.max()) == 255
-
-
-def test_prepare_video_batch_clips_very_large_floats_to_uint8_max():
-    video = torch.full((1, 1, 3, 2, 2), 300.0)
-    tensor = _prepare_video_batch(video, max_frames=None)
-
-    assert int(tensor.max()) == 255
-
-
-# ---------------------------------------------------------------------------
-# _expand_tasks — string / list / tuple broadcasting to batch size
-# ---------------------------------------------------------------------------
-
-
-def test_expand_tasks_string_is_broadcast_to_batch_size():
-    assert _expand_tasks("pick up", batch_size=3, default=None) == ["pick up", "pick up", "pick up"]
-
-
-def test_expand_tasks_list_of_matching_size_passes_through():
-    assert _expand_tasks(["a", "b", "c"], batch_size=3, default=None) == ["a", "b", "c"]
-
-
-def test_expand_tasks_tuple_is_normalised_to_list():
-    assert _expand_tasks(("a", "b"), batch_size=2, default=None) == ["a", "b"]
-
-
-def test_expand_tasks_single_element_list_is_broadcast():
-    assert _expand_tasks(["only one"], batch_size=3, default=None) == ["only one"] * 3
-
-
-def test_expand_tasks_size_mismatch_raises():
-    with pytest.raises(ValueError, match="Expected 3 tasks"):
-        _expand_tasks(["a", "b"], batch_size=3, default=None)
-
-
-def test_expand_tasks_missing_uses_default():
-    assert _expand_tasks(None, batch_size=2, default="fallback") == ["fallback", "fallback"]
-
-
-def test_expand_tasks_missing_without_default_raises():
-    with pytest.raises(KeyError, match="task description"):
-        _expand_tasks(None, batch_size=1, default=None)
-
-
-def test_expand_tasks_wrong_type_raises():
-    with pytest.raises(TypeError, match="must be a string or list"):
-        _expand_tasks(42, batch_size=1, default=None)
-
-
-# ---------------------------------------------------------------------------
-# Encoder step — stubbed AutoProcessor
-# ---------------------------------------------------------------------------
-
-
-def _skip_if_topreward_extras_missing(func):
-    func = skip_if_package_missing("transformers")(func)
-    return func
-
-
-class _FakeTokenizer:
-    eos_token = "<|endoftext|>"
-    pad_token = "<|endoftext|>"
-
-    def __call__(self, *args, **kwargs):
-        return {"input_ids": torch.zeros(1, 10, dtype=torch.long)}
-
-
-class _FakeAutoProcessor:
-    def __init__(self) -> None:
-        self.tokenizer = _FakeTokenizer()
-
-    @classmethod
-    def from_pretrained(cls, *args, **kwargs):  # noqa: ARG003
-        return cls()
-
-    def apply_chat_template(self, messages, **kwargs):  # noqa: ARG002
-        return "fake_prompt_text"
-
-    def __call__(self, text=None, images=None, videos=None, **kwargs):  # noqa: ARG002
-        seq_len = 10
-        batch_size = len(text) if isinstance(text, list) else 1
-        return {
-            "input_ids": torch.randint(0, 100, (batch_size, seq_len)),
-            "attention_mask": torch.ones(batch_size, seq_len, dtype=torch.long),
-            "pixel_values_videos": torch.zeros(batch_size, 1536, dtype=torch.float32),
-            "video_grid_thw": torch.ones(batch_size, 3, dtype=torch.long),
-            "mm_token_type_ids": torch.zeros(batch_size, seq_len, dtype=torch.long),
-        }
-
-
-def _build_step(monkeypatch, **overrides):
-    from lerobot.rewards.topreward import processor_topreward
-
-    monkeypatch.setattr(processor_topreward, "AutoProcessor", _FakeAutoProcessor)
-    return processor_topreward.TOPRewardEncoderProcessorStep(**overrides)
-
-
-def _make_transition(observation: dict, complementary: dict | None = None) -> dict:
-    transition: dict = {TransitionKey.OBSERVATION: observation}
-    if complementary is not None:
-        transition[TransitionKey.COMPLEMENTARY_DATA] = complementary
-    return transition
-
-
-@_skip_if_topreward_extras_missing
-def test_encoder_step_emits_input_ids_and_labels(monkeypatch):
-    """The processor must emit Qwen-VL tensors including ``input_ids`` and
-    ``labels`` under the ``observation.topreward.*`` namespace."""
-    step = _build_step(monkeypatch)
-
-    frames_batch = torch.zeros(2, 4, 3, 8, 8)
-    out = step(
-        _make_transition(
-            observation={"observation.images.top": frames_batch},
-            complementary={"task": ["pick", "place"]},
-        )
-    )
-
-    obs_out = out[TransitionKey.OBSERVATION]
-    for key in TOPREWARD_INPUT_KEYS:
-        assert f"{TOPREWARD_FEATURE_PREFIX}{key}" in obs_out
-
-    input_ids = obs_out[f"{TOPREWARD_FEATURE_PREFIX}input_ids"]
-    labels = obs_out[f"{TOPREWARD_FEATURE_PREFIX}labels"]
-    assert labels.dtype == torch.long
-    assert labels.shape == (2, 10)
-    assert labels[:, :-1].eq(-100).all()
-    assert labels[:, -1].equal(input_ids[:, -1])
-
-
-@_skip_if_topreward_extras_missing
-def test_encoder_step_get_config_roundtrips_user_fields(monkeypatch):
-    step = _build_step(
-        monkeypatch,
-        vlm_name="Qwen/Qwen3-VL-8B-Instruct",
-        image_key="observation.images.cam_top",
-        task_key="task",
-        default_task="do the thing",
-        max_frames=8,
-        fps=4.0,
-        add_chat_template=True,
-        max_length=2048,
-    )
-
-    cfg = step.get_config()
-    assert cfg["vlm_name"] == "Qwen/Qwen3-VL-8B-Instruct"
-    assert cfg["image_key"] == "observation.images.cam_top"
-    assert cfg["default_task"] == "do the thing"
-    assert cfg["max_frames"] == 8
-    assert cfg["fps"] == 4.0
-    assert cfg["add_chat_template"] is True
-    assert cfg["max_length"] == 2048
-
-
-@_skip_if_topreward_extras_missing
-def test_encoder_step_transform_features_is_identity(monkeypatch):
-    step = _build_step(monkeypatch)
-    features = {
-        PipelineFeatureType.OBSERVATION: {
-            "observation.images.top": PolicyFeature(shape=(3, 224, 224), type=FeatureType.VISUAL),
-        }
-    }
-    assert step.transform_features(features) == features
-
-
-@_skip_if_topreward_extras_missing
-def test_encoder_step_rejects_missing_image_key(monkeypatch):
-    step = _build_step(monkeypatch, image_key="observation.images.top")
-    with pytest.raises(KeyError, match="image key"):
-        step(_make_transition(observation={}, complementary={"task": "pick"}))
--- a/tests/robots/test_rebot_b601_follower.py
+++ b/tests/robots/test_rebot_b601_follower.py
@@ -1,116 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import math
-from unittest.mock import MagicMock, patch
-
-import pytest
-
-from lerobot.robots.bi_rebot_b601_follower import BiRebotB601Follower, BiRebotB601FollowerConfig
-from lerobot.robots.rebot_b601_follower import (
-    RebotB601Follower,
-    RebotB601FollowerConfig,
-    RebotB601FollowerRobotConfig,
-)
-
-_MODULE = "lerobot.robots.rebot_b601_follower.rebot_b601_follower"
-
-
-def _make_motor_mock(position_rad: float = 0.0) -> MagicMock:
-    motor = MagicMock(name="MotorMock")
-    state = MagicMock()
-    state.pos = position_rad
-    motor.get_state.return_value = state
-    return motor
-
-
-def _make_bus_mock() -> MagicMock:
-    bus = MagicMock(name="MotorBridgeControllerMock")
-    # add_damiao_motor returns a fresh motor mock; position encodes the call order.
-    bus._motor_count = 0
-
-    def _add_motor(_send_id, _recv_id, _model):
-        bus._motor_count += 1
-        return _make_motor_mock(position_rad=math.radians(bus._motor_count))
-
-    bus.add_damiao_motor.side_effect = _add_motor
-    return bus
-
-
-@pytest.fixture
-def follower():
-    bus_mock = _make_bus_mock()
-    with (
-        patch(f"{_MODULE}.require_package", lambda *a, **kw: None),
-        patch(f"{_MODULE}.MotorBridgeController") as controller_cls,
-        patch(f"{_MODULE}.MotorBridgeMode", MagicMock()),
-    ):
-        controller_cls.from_dm_serial.return_value = bus_mock
-        cfg = RebotB601FollowerRobotConfig(port="/dev/null")
-        robot = RebotB601Follower(cfg)
-        robot.connect(calibrate=False)
-        yield robot
-        if robot.is_connected:
-            robot.disconnect()
-
-
-def test_features_match_joints():
-    with patch(f"{_MODULE}.require_package", lambda *a, **kw: None):
-        robot = RebotB601Follower(RebotB601FollowerRobotConfig(port="/dev/null"))
-    expected = {f"{m}.pos" for m in robot.motor_names}
-    assert set(robot.action_features) == expected
-    assert set(robot.observation_features) == expected
-    assert "gripper.pos" in expected
-
-
-def test_connect_disconnect(follower):
-    assert follower.is_connected
-    follower.disconnect()
-    assert not follower.is_connected
-
-
-def test_get_observation_converts_to_degrees(follower):
-    obs = follower.get_observation()
-    assert set(obs) == {f"{m}.pos" for m in follower.motor_names}
-    # The bus mock seeds each motor's position with its 1-indexed creation order (radians).
-    for idx, motor in enumerate(follower.motor_names, 1):
-        assert obs[f"{motor}.pos"] == pytest.approx(math.degrees(math.radians(idx)))
-
-
-def test_send_action_clips_to_joint_limits(follower):
-    # shoulder_pan limit is (-145, 145); request beyond the upper bound.
-    returned = follower.send_action({"shoulder_pan.pos": 999.0})
-    assert returned["shoulder_pan.pos"] == 145.0
-    follower.motors["shoulder_pan"].send_pos_vel.assert_called_once()
-
-
-def test_send_action_routes_gripper_to_force_pos(follower):
-    follower.send_action({"gripper.pos": -10.0})
-    follower.motors["gripper"].send_force_pos.assert_called_once()
-    follower.motors["gripper"].send_pos_vel.assert_not_called()
-
-
-def test_bimanual_prefixes_features():
-    with patch(f"{_MODULE}.require_package", lambda *a, **kw: None):
-        cfg = BiRebotB601FollowerConfig(
-            left_arm_config=RebotB601FollowerConfig(port="/dev/null0"),
-            right_arm_config=RebotB601FollowerConfig(port="/dev/null1"),
-        )
-        robot = BiRebotB601Follower(cfg)
-    assert any(k.startswith("left_") for k in robot.action_features)
-    assert any(k.startswith("right_") for k in robot.action_features)
-    assert "left_gripper.pos" in robot.action_features
-    assert "right_gripper.pos" in robot.action_features
--- a/tests/teleoperators/test_rebot_102_leader.py
+++ b/tests/teleoperators/test_rebot_102_leader.py
@@ -1,102 +0,0 @@
-#!/usr/bin/env python
-
-# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from unittest.mock import MagicMock, patch
-
-import pytest
-
-from lerobot.teleoperators.bi_rebot_102_leader import BiRebotArm102Leader, BiRebotArm102LeaderConfig
-from lerobot.teleoperators.rebot_102_leader import (
-    RebotArm102Leader,
-    RebotArm102LeaderConfig,
-    RebotArm102LeaderTeleopConfig,
-)
-
-_MODULE = "lerobot.teleoperators.rebot_102_leader.rebot_102_leader"
-
-
-def _make_bus_mock(joint_ids: dict[str, int]) -> MagicMock:
-    bus = MagicMock(name="FashionStarServoMock")
-    bus.ping.return_value = True
-
-    def _sync_monitor(ids):
-        # Report each servo at 5 degrees raw.
-        monitors = {}
-        for servo_id in ids:
-            monitor = MagicMock()
-            monitor.angle_deg = 5.0
-            monitors[servo_id] = monitor
-        return monitors
-
-    bus.sync_monitor.side_effect = _sync_monitor
-    return bus
-
-
-@pytest.fixture
-def leader():
-    cfg = RebotArm102LeaderTeleopConfig(port="/dev/null")
-    bus_mock = _make_bus_mock(cfg.joint_ids)
-    with (
-        patch(f"{_MODULE}.require_package", lambda *a, **kw: None),
-        patch(f"{_MODULE}.FashionStarServo", return_value=bus_mock),
-    ):
-        teleop = RebotArm102Leader(cfg)
-        teleop.connect(calibrate=False)
-        yield teleop
-        if teleop.is_connected:
-            teleop.disconnect()
-
-
-def test_action_features_match_joints():
-    with patch(f"{_MODULE}.require_package", lambda *a, **kw: None):
-        teleop = RebotArm102Leader(RebotArm102LeaderTeleopConfig(port="/dev/null"))
-    assert set(teleop.action_features) == {f"{m}.pos" for m in teleop.motor_names}
-    assert teleop.feedback_features == {}
-
-
-def test_connect_disconnect(leader):
-    assert leader.is_connected
-    leader.disconnect()
-    assert not leader.is_connected
-
-
-def test_get_action_applies_direction_and_clamp(leader):
-    action = leader.get_action()
-    assert set(action) == {f"{m}.pos" for m in leader.motor_names}
-    # shoulder_pan has direction -1, so a +5deg raw reading flips to -5deg.
-    assert action["shoulder_pan.pos"] == pytest.approx(-5.0)
-    # Every joint stays within its configured range.
-    for motor, value in action.items():
-        lo, hi = leader.config.joint_ranges[motor.removesuffix(".pos")]
-        assert lo <= value <= hi
-
-
-def test_send_feedback_not_implemented(leader):
-    with pytest.raises(NotImplementedError):
-        leader.send_feedback({})
-
-
-def test_bimanual_prefixes_features():
-    with patch(f"{_MODULE}.require_package", lambda *a, **kw: None):
-        cfg = BiRebotArm102LeaderConfig(
-            left_arm_config=RebotArm102LeaderConfig(port="/dev/null0"),
-            right_arm_config=RebotArm102LeaderConfig(port="/dev/null1"),
-        )
-        teleop = BiRebotArm102Leader(cfg)
-    assert any(k.startswith("left_") for k in teleop.action_features)
-    assert any(k.startswith("right_") for k in teleop.action_features)
-    assert "left_gripper.pos" in teleop.action_features
-    assert "right_gripper.pos" in teleop.action_features
--- a/tests/utils/test_collate.py
+++ b/tests/utils/test_collate.py
@@ -1,84 +0,0 @@
-#!/usr/bin/env python
-
-import pytest
-
-pytest.importorskip("datasets", reason="datasets is required (install lerobot[dataset])")
-
-import torch  # noqa: E402
-
-from lerobot.utils.collate import lerobot_collate_fn  # noqa: E402
-
-
-def test_lerobot_collate_preserves_messages_and_drops_raw_language():
-    batch = [
-        {
-            "index": torch.tensor(0),
-            "messages": [{"role": "assistant", "content": "a"}],
-            "message_streams": ["low_level"],
-            "target_message_indices": [0],
-            "language_persistent": [{"content": "raw"}],
-            "language_events": [],
-        },
-        {
-            "index": torch.tensor(1),
-            "messages": [{"role": "assistant", "content": "b"}],
-            "message_streams": ["low_level"],
-            "target_message_indices": [0],
-            "language_persistent": [{"content": "raw"}],
-            "language_events": [],
-        },
-    ]
-
-    out = lerobot_collate_fn(batch)
-
-    assert out["index"].tolist() == [0, 1]
-    assert out["messages"][0][0]["content"] == "a"
-    assert out["messages"][1][0]["content"] == "b"
-    assert out["message_streams"] == [["low_level"], ["low_level"]]
-    assert out["target_message_indices"] == [[0], [0]]
-    assert "language_persistent" not in out
-    assert "language_events" not in out
-
-
-def test_lerobot_collate_passes_through_standard_batch():
-    """On a non-language batch, the collate must match ``default_collate``.
-
-    Guards against silent regressions: ``lerobot_train.py`` only opts into
-    ``lerobot_collate_fn`` when the dataset declares language columns, but
-    if a future change ever wires it in unconditionally we want the
-    behavior to remain a transparent pass-through for ordinary tensor
-    batches.
-    """
-    from torch.utils.data._utils.collate import default_collate
-
-    batch = [
-        {
-            "observation.image": torch.zeros(3, 4, 4),
-            "action": torch.tensor([0.0, 1.0]),
-            "index": torch.tensor(0),
-        },
-        {
-            "observation.image": torch.ones(3, 4, 4),
-            "action": torch.tensor([2.0, 3.0]),
-            "index": torch.tensor(1),
-        },
-    ]
-
-    custom = lerobot_collate_fn(batch)
-    expected = default_collate(batch)
-
-    assert custom.keys() == expected.keys()
-    for key in expected:
-        assert torch.equal(custom[key], expected[key]), f"key={key} diverged"
-
-
-def test_lerobot_collate_drops_none_samples():
-    """Recipes that yielded no target message return ``None`` — those samples
-    must be filtered out, and an entirely-``None`` batch must collapse to ``None``.
-    """
-    batch = [None, {"index": torch.tensor(0)}, None]
-    out = lerobot_collate_fn(batch)
-    assert out is not None
-    assert out["index"].tolist() == [0]
-
-    assert lerobot_collate_fn([None, None]) is None
--- a/uv.lock
+++ b/uv.lock
@@ -1,5 +1,5 @@
 version = 1
-revision = 3
+revision = 2
 requires-python = ">=3.12"
 resolution-markers = [
    "(python_full_version >= '3.15' and platform_machine == 'AMD64' and sys_platform == 'linux') or (python_full_version >= '3.15' and platform_machine == 'x86_64' and sys_platform == 'linux')",
@@ -1142,7 +1142,7 @@ name = "decord"
 version = "0.6.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
-    { name = "numpy", marker = "(platform_machine != 'arm64' and platform_machine != 's390x' and sys_platform == 'darwin') or (platform_machine == 'AMD64' and sys_platform == 'linux') or (platform_machine == 'x86_64' and sys_platform == 'linux') or (platform_machine != 's390x' and sys_platform != 'darwin' and sys_platform != 'linux')" },
+    { name = "numpy", marker = "(platform_machine != 'arm64' and sys_platform == 'darwin') or (platform_machine == 'AMD64' and sys_platform == 'linux') or (platform_machine == 'x86_64' and sys_platform == 'linux') or (sys_platform != 'darwin' and sys_platform != 'linux')" },
 ]
 wheels = [
    { url = "https://files.pythonhosted.org/packages/11/79/936af42edf90a7bd4e41a6cac89c913d4b47fa48a26b042d5129a9242ee3/decord-0.6.0-py3-none-manylinux2010_x86_64.whl", hash = "sha256:51997f20be8958e23b7c4061ba45d0efcd86bffd5fe81c695d0befee0d442976", size = 13602299, upload-time = "2021-06-14T21:30:55.486Z" },
@@ -2710,8 +2710,6 @@ all = [
    { name = "matplotlib" },
    { name = "metaworld" },
    { name = "mock-serial", marker = "sys_platform != 'win32'" },
-    { name = "motorbridge" },
-    { name = "motorbridge-smart-servo" },
    { name = "mypy" },
    { name = "num2words" },
    { name = "pandas" },
@@ -2915,12 +2913,6 @@ metaworld = [
    { name = "scipy" },
    { name = "torchcodec", marker = "(platform_machine == 'arm64' and sys_platform == 'darwin') or (platform_machine == 'AMD64' and sys_platform == 'linux') or (platform_machine == 'aarch64' and sys_platform == 'linux') or (platform_machine == 'arm64' and sys_platform == 'linux') or (platform_machine == 'x86_64' and sys_platform == 'linux') or sys_platform == 'win32'" },
 ]
-motorbridge-dep = [
-    { name = "motorbridge" },
-]
-motorbridge-smart-servo-dep = [
-    { name = "motorbridge-smart-servo" },
-]
 multi-task-dit = [
    { name = "diffusers" },
    { name = "transformers" },
@@ -2980,10 +2972,6 @@ qwen-vl-utils-dep = [
 reachy2 = [
    { name = "reachy2-sdk" },
 ]
-rebot = [
-    { name = "motorbridge" },
-    { name = "motorbridge-smart-servo" },
-]
 robstride = [
    { name = "python-can" },
 ]
@@ -3009,9 +2997,6 @@ test = [
    { name = "pytest-cov" },
    { name = "pytest-timeout" },
 ]
-topreward = [
-    { name = "transformers" },
-]
 training = [
    { name = "accelerate" },
    { name = "av" },
@@ -3060,7 +3045,7 @@ requires-dist = [
    { name = "av", marker = "extra == 'av-dep'", specifier = ">=15.0.0,<16.0.0" },
    { name = "cmake", specifier = ">=3.29.0.1,<4.2.0" },
    { name = "contourpy", marker = "extra == 'matplotlib-dep'", specifier = ">=1.3.0,<2.0.0" },
-    { name = "datasets", marker = "extra == 'dataset'", specifier = ">=4.7.0,<5.0.0" },
+    { name = "datasets", marker = "extra == 'dataset'", specifier = ">=4.0.0,<5.0.0" },
    { name = "debugpy", marker = "extra == 'dev'", specifier = ">=1.8.1,<1.9.0" },
    { name = "decord", marker = "(platform_machine == 'AMD64' and extra == 'groot') or (platform_machine == 'x86_64' and extra == 'groot')", specifier = ">=0.6.0,<1.0.0" },
    { name = "deepdiff", marker = "extra == 'deepdiff-dep'", specifier = ">=7.0.1,<9.0.0" },
@@ -3131,8 +3116,6 @@ requires-dist = [
    { name = "lerobot", extras = ["matplotlib-dep"], marker = "extra == 'sarm'" },
    { name = "lerobot", extras = ["matplotlib-dep"], marker = "extra == 'unitree-g1'" },
    { name = "lerobot", extras = ["metaworld"], marker = "extra == 'all'" },
-    { name = "lerobot", extras = ["motorbridge-dep"], marker = "extra == 'rebot'" },
-    { name = "lerobot", extras = ["motorbridge-smart-servo-dep"], marker = "extra == 'rebot'" },
    { name = "lerobot", extras = ["multi-task-dit"], marker = "extra == 'all'" },
    { name = "lerobot", extras = ["notebook"], marker = "extra == 'dev'" },
    { name = "lerobot", extras = ["openarms"], marker = "extra == 'all'" },
@@ -3159,7 +3142,6 @@ requires-dist = [
    { name = "lerobot", extras = ["qwen-vl-utils-dep"], marker = "extra == 'sarm'" },
    { name = "lerobot", extras = ["qwen-vl-utils-dep"], marker = "extra == 'wallx'" },
    { name = "lerobot", extras = ["reachy2"], marker = "extra == 'all'" },
-    { name = "lerobot", extras = ["rebot"], marker = "extra == 'all'" },
    { name = "lerobot", extras = ["robstride"], marker = "extra == 'all'" },
    { name = "lerobot", extras = ["sarm"], marker = "extra == 'all'" },
    { name = "lerobot", extras = ["scipy-dep"], marker = "extra == 'aloha'" },
@@ -3170,7 +3152,6 @@ requires-dist = [
    { name = "lerobot", extras = ["scipy-dep"], marker = "extra == 'wallx'" },
    { name = "lerobot", extras = ["smolvla"], marker = "extra == 'all'" },
    { name = "lerobot", extras = ["test"], marker = "extra == 'all'" },
-    { name = "lerobot", extras = ["topreward"], marker = "extra == 'all'" },
    { name = "lerobot", extras = ["training"], marker = "extra == 'all'" },
    { name = "lerobot", extras = ["transformers-dep"], marker = "extra == 'eo1'" },
    { name = "lerobot", extras = ["transformers-dep"], marker = "extra == 'groot'" },
@@ -3181,7 +3162,6 @@ requires-dist = [
    { name = "lerobot", extras = ["transformers-dep"], marker = "extra == 'pi'" },
    { name = "lerobot", extras = ["transformers-dep"], marker = "extra == 'sarm'" },
    { name = "lerobot", extras = ["transformers-dep"], marker = "extra == 'smolvla'" },
-    { name = "lerobot", extras = ["transformers-dep"], marker = "extra == 'topreward'" },
    { name = "lerobot", extras = ["transformers-dep"], marker = "extra == 'wallx'" },
    { name = "lerobot", extras = ["transformers-dep"], marker = "extra == 'xvla'" },
    { name = "lerobot", extras = ["video-benchmark"], marker = "extra == 'all'" },
@@ -3194,8 +3174,6 @@ requires-dist = [
    { name = "meshcat", marker = "extra == 'unitree-g1'", specifier = ">=0.3.0,<0.4.0" },
    { name = "metaworld", marker = "extra == 'metaworld'", specifier = "==3.0.0" },
    { name = "mock-serial", marker = "sys_platform != 'win32' and extra == 'test'", specifier = ">=0.0.1,<0.1.0" },
-    { name = "motorbridge", marker = "extra == 'motorbridge-dep'", specifier = ">=0.3.2,<0.4.0" },
-    { name = "motorbridge-smart-servo", marker = "extra == 'motorbridge-smart-servo-dep'", specifier = ">=0.0.4,<0.1.0" },
    { name = "mypy", marker = "extra == 'dev'", specifier = ">=1.19.1" },
    { name = "ninja", marker = "extra == 'groot'", specifier = ">=1.11.1,<2.0.0" },
    { name = "num2words", marker = "extra == 'smolvla'", specifier = ">=0.5.14,<0.6.0" },
@@ -3249,7 +3227,7 @@ requires-dist = [
    { name = "transformers", marker = "extra == 'transformers-dep'", specifier = ">=5.4.0,<5.6.0" },
    { name = "wandb", marker = "extra == 'training'", specifier = ">=0.24.0,<0.25.0" },
 ]
-provides-extras = ["dataset", "training", "hardware", "viz", "core-scripts", "evaluation", "dataset-viz", "av-dep", "pygame-dep", "placo-dep", "transformers-dep", "grpcio-dep", "can-dep", "peft-dep", "scipy-dep", "diffusers-dep", "qwen-vl-utils-dep", "matplotlib-dep", "pyserial-dep", "deepdiff-dep", "pynput-dep", "pyzmq-dep", "motorbridge-dep", "motorbridge-smart-servo-dep", "feetech", "dynamixel", "damiao", "robstride", "openarms", "gamepad", "hopejr", "lekiwi", "unitree-g1", "reachy2", "rebot", "kinematics", "intelrealsense", "phone", "diffusion", "wallx", "pi", "smolvla", "multi-task-dit", "groot", "sarm", "topreward", "xvla", "eo1", "hilserl", "async", "peft", "dev", "notebook", "test", "video-benchmark", "aloha", "pusht", "libero", "metaworld", "all"]
+provides-extras = ["dataset", "training", "hardware", "viz", "core-scripts", "evaluation", "dataset-viz", "av-dep", "pygame-dep", "placo-dep", "transformers-dep", "grpcio-dep", "can-dep", "peft-dep", "scipy-dep", "diffusers-dep", "qwen-vl-utils-dep", "matplotlib-dep", "pyserial-dep", "deepdiff-dep", "pynput-dep", "pyzmq-dep", "feetech", "dynamixel", "damiao", "robstride", "openarms", "gamepad", "hopejr", "lekiwi", "unitree-g1", "reachy2", "kinematics", "intelrealsense", "phone", "diffusion", "wallx", "pi", "smolvla", "multi-task-dit", "groot", "sarm", "xvla", "eo1", "hilserl", "async", "peft", "dev", "notebook", "test", "video-benchmark", "aloha", "pusht", "libero", "metaworld", "all"]

 [[package]]
 name = "librt"
@@ -3675,35 +3653,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/98/c2/8c1e6bf77cf62a10203a107179e34e0965fc5369386e0b7034a247ed054d/mock_serial-0.0.1-py3-none-any.whl", hash = "sha256:b6b8cc10c302354bf3ca270a3d4d6bf199c4bbe41478c65046db8f30ea967675", size = 6080, upload-time = "2021-11-23T09:34:51.108Z" },
 ]

-[[package]]
-name = "motorbridge"
-version = "0.3.2"
-source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/58/f2/b824ac4d611c71020dccdb72fc50606e543c77c68455ea824b26d9a6de03/motorbridge-0.3.2.tar.gz", hash = "sha256:5cf85dd22c46c7f3c5e6981e90b1034af2deb1bc4e7d74c13074d1d4a7b75ceb", size = 30158, upload-time = "2026-05-18T07:13:17.239Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/2c/1a/7d367039a8325c0e2796c14a1503dfc563e7b244c815b26e079114244b4b/motorbridge-0.3.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:8ad158928e93fafd2a7814eaffe8e6ecbec4686f64c2df85f80d7979dfc82532", size = 1108065, upload-time = "2026-05-18T07:13:04.669Z" },
-    { url = "https://files.pythonhosted.org/packages/fe/d6/fafa2b8a3635a6fe7f6e8129e140a68d30f4d6438350a86e51b8198b7834/motorbridge-0.3.2-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2adde5f26ea4e37d05da6b41b03b637efa6c80db4676bc6dbdb91ac6e811e54a", size = 1184657, upload-time = "2026-05-18T07:13:06.081Z" },
-    { url = "https://files.pythonhosted.org/packages/d8/30/aca01e81ec523d37b98a1ce6e41688d31827625eb15ecf0cf0485d91d62c/motorbridge-0.3.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a03b6dc0be80db7b47d3f190f8c6f4fc43b0b4089235283f53763153a6d4e58c", size = 1201394, upload-time = "2026-05-18T07:13:07.476Z" },
-    { url = "https://files.pythonhosted.org/packages/70/eb/97b2f93682a1ce67bad50e9b598af889be4a3156ebcec129ebb41fa44e5b/motorbridge-0.3.2-cp312-cp312-win_amd64.whl", hash = "sha256:b0657d47aa94f8535d0663538be4a86c46e314303fba513122d17612b584c6e6", size = 839087, upload-time = "2026-05-18T07:13:08.664Z" },
-    { url = "https://files.pythonhosted.org/packages/6e/b0/03246c25ae67c2b33bd19b5d11bae668bb8baa7d9cbd75b035a8bef61d62/motorbridge-0.3.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:f305a69c7c3c91dca19c43084beb4cd30a93fd85ff35c712cc3fb0ae33a5c7d3", size = 1108065, upload-time = "2026-05-18T07:13:10.032Z" },
-    { url = "https://files.pythonhosted.org/packages/a9/40/b82d86fbfcc6b18946567f15a7d76d1c673d43bc0c8d268b668506811981/motorbridge-0.3.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:778fdde2b12df20184fb8c8f4c7665919d969bd582589a267c7956d4c57336ad", size = 1184657, upload-time = "2026-05-18T07:13:11.812Z" },
-    { url = "https://files.pythonhosted.org/packages/f2/3e/90e41d798814db89605d9a021e0c182608aec3d40eef2be211427e2bb863/motorbridge-0.3.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:eac3a2d27ca387e8d537ec148bea0c28b9517ff4fb9ea0b12f6e78c1e9a7faa4", size = 1201393, upload-time = "2026-05-18T07:13:13.396Z" },
-    { url = "https://files.pythonhosted.org/packages/34/75/3c9ba7514fd0ec330c1fe0b4d76dedfd221abc1b750fe063b6e3f9a88075/motorbridge-0.3.2-cp313-cp313-win_amd64.whl", hash = "sha256:d7d1eb76ae29e8673a320fd1a86b944fb0869129fd4114f0983e43cd48f67372", size = 839087, upload-time = "2026-05-18T07:13:14.555Z" },
-    { url = "https://files.pythonhosted.org/packages/87/33/6787dd22914291a640c2821f175abc7cbb9a1e0fe6c1143f92d7ac362903/motorbridge-0.3.2-cp314-cp314-win_amd64.whl", hash = "sha256:c5f05e36c6607d2145f38fb6f1f11090bb01dbd1012e8251b0d2ae4d60fa4f50", size = 870167, upload-time = "2026-05-18T07:13:15.898Z" },
-]
-
-[[package]]
-name = "motorbridge-smart-servo"
-version = "0.0.4"
-source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/e6/56/45af87189dc49abbe46157b792b7c71f502a5f819f04e7485de0cfa52d9b/motorbridge_smart_servo-0.0.4.tar.gz", hash = "sha256:fb65f3f6e765e6b1915071c255caaf112fad3796fa1761aeee0132d15b8a0989", size = 20415, upload-time = "2026-05-08T09:24:57.563Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/e9/ee/bec4b3acf55cd18e7db83a6d951caccf699533dbd038c1f0b5f2d16d5208/motorbridge_smart_servo-0.0.4-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:8bc1f034fa9f96e23229a834db6e7cfe1368dba7b9a2a6f6dbd316448c4390dc", size = 304384, upload-time = "2026-05-08T09:24:52.619Z" },
-    { url = "https://files.pythonhosted.org/packages/3f/d2/71c87063b826433553ce8869b99df3e4f191b107710dd5c905e637512b10/motorbridge_smart_servo-0.0.4-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:348cef6a647e5c7f9cc8e8ce1f3c806af4522e1087172bac2f8a1a0daa3592b6", size = 345668, upload-time = "2026-05-08T09:24:53.735Z" },
-    { url = "https://files.pythonhosted.org/packages/9b/6b/e65e7227a510236c6334cf054c501d3de2cbd463f4c594e42c6e965d5143/motorbridge_smart_servo-0.0.4-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8c1982643c496c9f425fa9238f9a92ba601d77f4f2279df68c6868e7b997cbe1", size = 348123, upload-time = "2026-05-08T09:24:55.191Z" },
-    { url = "https://files.pythonhosted.org/packages/2d/fa/539ea123a5660c22c5e5cdad62d7bc5e931c816a0ffd402ae6e4623ab45b/motorbridge_smart_servo-0.0.4-cp39-abi3-win_amd64.whl", hash = "sha256:ea3baa9ba25bcec5541f3d86d73a3406ba2fcffe5dbf900c22e058638fc31ab0", size = 194130, upload-time = "2026-05-08T09:24:56.369Z" },
-]
-
 [[package]]
 name = "mpmath"
 version = "1.3.0"
Author	SHA1	Message	Date
Caroline Pascal	3ab08a5318	fix(imports): fixing av import in test_depth.py	2026-05-22 15:13:15 +02:00
CarolinePascal	5e53e6bd2f	tests(typos): fixing typos in tests	2026-05-22 13:09:56 +02:00
CarolinePascal	a94d9f119c	fix(info): fixing info metadata update when is_depth_map was set	2026-05-22 02:48:30 +02:00
CarolinePascal	8a615070e7	fix(pre-commit): fixing mutable default value	2026-05-22 02:07:33 +02:00
CarolinePascal	8e56797287	feat(refactor): refactor DepthEncoderConfig quantization pipeline, so that the methods do not live in the config class. Add pixel format - channels validation. Move the default pixel format for depth in the config file.	2026-05-22 02:06:37 +02:00
CarolinePascal	7498f1cf61	feat(pix_fmt channels): use PyAv to get pixel formats number of channels	2026-05-22 02:03:23 +02:00
CarolinePascal	72a429764a	tests(depth): adding new tests for depth integration validation	2026-05-21 20:20:40 +02:00
CarolinePascal	4ea8653ca3	test(fix): fixing existing tests to still work with latest features	2026-05-21 19:56:00 +02:00
CarolinePascal	eeabb4d258	chore(typos): fixing typos	2026-05-21 19:55:33 +02:00
CarolinePascal	2b8d7b3c06	fix(plumbing): fixing missing parts in the depth maps pipeline	2026-05-21 16:11:01 +02:00
CarolinePascal	4a49f4a391	fix(stop_event): fixing stop_event race condition in camera classes	2026-05-21 15:51:12 +02:00
CarolinePascal	15647f50a2	feat(is_depth): simplifying is_depth nested name + legacy support	2026-05-21 14:26:16 +02:00
CarolinePascal	e87933302d	feat(depth shape): ensuring depth maps shape is always including the channel	2026-05-21 14:25:42 +02:00
CarolinePascal	3cf5e3c8cb	chore(format): format code	2026-05-20 16:47:22 +02:00
CarolinePascal	33a3b5a982	feat(depth maps writer): adding support for raw depth maps recording with image writer	2026-05-20 16:42:16 +02:00
CarolinePascal	1dafb4acf6	feat(viz): render depth observations as rr.DepthImage in Viridis	2026-05-20 16:22:34 +02:00
CarolinePascal	14df709201	feat(record): plumb DepthEncoderConfig through lerobot-record	2026-05-20 16:14:14 +02:00
CarolinePascal	d6f97ae17f	feat(robots/so_follower): emit + populate depth keys when use_depth	2026-05-20 16:09:53 +02:00
CarolinePascal	085f574301	feat(features): route 2D camera shapes to observation.depth.<key>	2026-05-20 15:50:46 +02:00
CarolinePascal	f15348e769	feat(cameras/realsense): expose async depth in metric meters	2026-05-20 15:24:47 +02:00
CarolinePascal	e51d45dd2c	feat(depth): wire DatasetReader to decode_depth_frames	2026-05-19 23:46:28 +02:00
CarolinePascal	d39698da0f	feat(depth): wire StreamingVideoEncoder + writer to depth encoder	2026-05-19 23:23:27 +02:00
CarolinePascal	b4c31f0f67	feat(depth): plumb DepthEncoderConfig through LeRobotDataset and DatasetWriter	2026-05-19 22:50:19 +02:00
CarolinePascal	0cc5162078	feat(depth): extend quantization tools to better fit the encoding/decoding pipeline	2026-05-19 17:10:47 +02:00
CarolinePascal	b960524d93	feat(depth): persist depth metadata	2026-05-19 16:13:14 +02:00
CarolinePascal	088352383d	feat(video): add ffv1 to supported codecs	2026-05-19 16:13:01 +02:00
CarolinePascal	42214d1c7a	feat(depth): add depth quantization helpers and tests	2026-05-18 18:09:37 +02:00