docs/source/tools.mdx

# Tools

LeRobot v3.1 supports **tool calls** in policies — assistant messages can
emit structured invocations like `say(text="OK, starting now")` that the
runtime dispatches to a real implementation (TTS, controller, logger, …).

This page covers:

1. Where the tool catalog lives (PR 1).
2. How the annotation pipeline produces tool-call atoms (PR 2).
3. How to add your own tool (PR 3).

## Where tools are declared

Tools are declared in two layers: a catalog of schemas and a set of implementations.

**The catalog** — a list of OpenAI-style function schemas — lives at
`meta/info.json["tools"]` on each dataset. Example:

```json
{
  "features": { "...": "..." },
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "say",
        "description": "Speak a short utterance to the user via the TTS executor.",
        "parameters": {
          "type": "object",
          "properties": {
            "text": { "type": "string", "description": "The verbatim text to speak." }
          },
          "required": ["text"]
        }
      }
    }
  ]
}
```

Read it via the dataset metadata accessor:

```python
from lerobot.datasets.dataset_metadata import LeRobotDatasetMetadata

meta = LeRobotDatasetMetadata(repo_id="pepijn/super_poulain_final_annotations")
tools = meta.tools     # list[dict] — OpenAI tool schemas
```

If the dataset's `info.json` doesn't declare any tools, `meta.tools`
returns `DEFAULT_TOOLS` from `lerobot.datasets.language` — currently a
single-entry list with the canonical `say` schema. So unannotated
datasets and chat-template consumers keep working without any
configuration:

```python
prompt_str = tokenizer.apply_chat_template(
    sample["messages"],
    tools=meta.tools,                 # works either way
    add_generation_prompt=False,
    tokenize=False,
)
```

**The implementations** — runnable Python — live under
`src/lerobot/tools/`, one file per tool. The `say` implementation
arrives in PR 3 and wraps Kyutai's pocket-tts model.

## Per-row tool *invocations*

The catalog above describes *what can be called*. The actual *call* — the
function name plus the argument values — is stored per-row, on the
assistant atoms in `language_events`:

```json
{
  "role": "assistant",
  "content": null,
  "style": null,
  "timestamp": 12.4,
  "camera": null,
  "tool_calls": [
    { "type": "function",
      "function": { "name": "say", "arguments": { "text": "On it." } } }
  ]
}
```
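Because the catalog and the per-row calls are stored separately, it can be useful to check that each call names a catalog entry and supplies its required arguments. The sketch below is an illustration only: `check_tool_call` is a hypothetical helper, not part of LeRobot, and it checks only tool names and argument keys rather than doing full JSON Schema validation.

```python
from typing import Any


def check_tool_call(call: dict[str, Any], catalog: list[dict[str, Any]]) -> list[str]:
    """Hypothetical helper: return a list of problems (empty means the call looks valid)."""
    schemas = {entry["function"]["name"]: entry["function"] for entry in catalog}
    fn = call["function"]
    schema = schemas.get(fn["name"])
    if schema is None:
        return [f"unknown tool: {fn['name']}"]
    params = schema.get("parameters", {})
    missing = [k for k in params.get("required", []) if k not in fn["arguments"]]
    unknown = [k for k in fn["arguments"] if k not in params.get("properties", {})]
    return [f"missing argument: {k}" for k in missing] + [
        f"unexpected argument: {k}" for k in unknown
    ]


catalog = [
    {
        "type": "function",
        "function": {
            "name": "say",
            "parameters": {
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        },
    }
]
atom = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {"type": "function", "function": {"name": "say", "arguments": {"text": "On it."}}}
    ],
}
for call in atom["tool_calls"]:
    print(check_tool_call(call, catalog))
```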

Recipes splice these into rendered messages via `tool_calls_from`:

```yaml
user_interjection_response:
  bindings:
    speech: "emitted_at(t, role=assistant, tool_name=say)"
  messages:
    - { role: user,      content: "${task}",         stream: high_level }
    - { role: assistant, content: "${current_plan}", stream: high_level,
        target: true, tool_calls_from: speech }
```

The model's training target is one assistant turn that carries both the
plan text *and* the `say` tool call. At inference, the runtime parses
the generated text back into structured `tool_calls` and dispatches to
the matching implementation.
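As a rough picture of what `tool_calls_from` does at render time, the sketch below copies the `tool_calls` of the bound atom onto the assistant message. `render_assistant_turn` is a hypothetical function written for this page, the plan text is placeholder data, and the real recipe renderer may behave differently (for example, in how it handles a binding that matches no atom).

```python
from typing import Any, Optional


def render_assistant_turn(plan_text: str, speech_atom: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical renderer: build one assistant message from plan text and a bound atom."""
    message: dict[str, Any] = {"role": "assistant", "content": plan_text}
    # Assumption: if the binding matched no atom, the turn simply carries no tool calls.
    if speech_atom and speech_atom.get("tool_calls"):
        message["tool_calls"] = list(speech_atom["tool_calls"])
    return message


speech = {
    "role": "assistant",
    "content": None,
    "timestamp": 12.4,
    "tool_calls": [
        {"type": "function", "function": {"name": "say", "arguments": {"text": "On it."}}}
    ],
}
turn = render_assistant_turn("Pick up the cup, then place it on the tray.", speech)
print(turn["content"])
print(turn["tool_calls"][0]["function"]["name"])
```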

## How to add your own tool

Three steps. Concrete example: a `record_observation` tool the policy
can call to capture an extra observation outside the regular control
loop.

### Step 1 — declare the schema

Add an entry under `meta/info.json["tools"]`. Either edit the file
on disk *before* running the annotation pipeline, which preserves
existing entries, or pass the schema to `lerobot-annotate` via a config
flag (PR 2; the exact CLI lands with the pipeline change).

```json
{
  "tools": [
    { "type": "function", "function": { "name": "say", "...": "..." } },
    {
      "type": "function",
      "function": {
        "name": "record_observation",
        "description": "Capture a high-resolution still image for the user.",
        "parameters": {
          "type": "object",
          "properties": {
            "label": { "type": "string", "description": "Short label for the saved image." }
          },
          "required": ["label"]
        }
      }
    }
  ]
}
```

The schema follows OpenAI's function-calling convention exactly, so the
chat template can render it natively.
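If you prefer to script the on-disk edit rather than do it by hand, something like the following would work. `declare_tool` is a hypothetical helper, not a LeRobot API, and replacing an existing entry with the same name (rather than raising) is an assumption made for this sketch. The demo runs against a throwaway file.

```python
import json
import tempfile
from pathlib import Path
from typing import Any

RECORD_OBSERVATION_SCHEMA: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "record_observation",
        "description": "Capture a high-resolution still image for the user.",
        "parameters": {
            "type": "object",
            "properties": {
                "label": {"type": "string", "description": "Short label for the saved image."}
            },
            "required": ["label"],
        },
    },
}


def declare_tool(info_path: Path, schema: dict[str, Any]) -> None:
    """Hypothetical helper: add or replace one schema in info.json["tools"], keyed by name."""
    info = json.loads(info_path.read_text())
    name = schema["function"]["name"]
    # Drop any older entry with the same name so repeated runs do not duplicate it.
    tools = [t for t in info.get("tools", []) if t["function"]["name"] != name]
    tools.append(schema)
    info["tools"] = tools
    info_path.write_text(json.dumps(info, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "info.json"
    say_stub = {"type": "function", "function": {"name": "say"}}
    path.write_text(json.dumps({"features": {}, "tools": [say_stub]}))
    declare_tool(path, RECORD_OBSERVATION_SCHEMA)
    declare_tool(path, RECORD_OBSERVATION_SCHEMA)  # second call is a no-op in effect
    names = [t["function"]["name"] for t in json.loads(path.read_text())["tools"]]
    print(names)
```

If the file had no `tools` key before, the fallback to `DEFAULT_TOOLS` no longer applies once one is written, so keep `say` in the list if you rely on it.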

### Step 2 — implement the call

Create `src/lerobot/tools/record_observation.py`:

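```python
from typing import Any

from .base import Tool  # the `Tool` protocol (PR 3); this class satisfies it structurally

RECORD_OBSERVATION_SCHEMA: dict[str, Any] = { "...": "..." }   # mirrors the JSON above


class RecordObservationTool:
    name = "record_observation"
    schema = RECORD_OBSERVATION_SCHEMA

    def __init__(self, schema: dict[str, Any] | None = None, output_dir: str = "."):
        # Keep a caller-supplied schema (e.g. the dataset's own entry) instead of dropping it.
        if schema is not None:
            self.schema = schema
        self.output_dir = output_dir

    def call(self, arguments: dict[str, Any]) -> str:
        # `label` is declared as required in the schema, so a missing key is a caller error.
        label = arguments["label"]
        # ... save the latest camera frame to <output_dir>/<label>.png ...
        return f"saved {label}.png"
```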

One file per tool keeps dependencies isolated — `record_observation`
might pull `pillow`, while `say` (PR 3) pulls `pocket-tts`. Users
installing only the tools they need avoid heavy transitive deps.
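The `Tool` protocol itself lands in PR 3 and is not shown on this page. Purely as an assumption about its shape, inferred from the members the example class defines (`name`, `schema`, `call`), it could look like the sketch below. `EchoTool` is a toy class used only to show that a tool can satisfy the protocol without inheriting from it.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Tool(Protocol):
    """Assumed shape of the PR 3 protocol; the real definition may differ."""

    name: str
    schema: dict[str, Any]

    def call(self, arguments: dict[str, Any]) -> str: ...


class EchoTool:
    """Toy tool for illustration; not part of LeRobot."""

    name = "echo"
    schema: dict[str, Any] = {"type": "function", "function": {"name": "echo"}}

    def call(self, arguments: dict[str, Any]) -> str:
        return str(arguments.get("text", ""))


tool: Tool = EchoTool()        # accepted by a type checker without subclassing Tool
print(isinstance(tool, Tool))  # the runtime check only verifies that the members exist
print(tool.call({"text": "hi"}))
```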

### Step 3 — register it

Add to `src/lerobot/tools/registry.py` (PR 3):

```python
from .record_observation import RecordObservationTool

TOOL_REGISTRY["record_observation"] = RecordObservationTool
```

That's it. At runtime, `get_tools(meta)` looks up each schema in
`meta.tools`, instantiates the matching registered class, and returns
a dict mapping each tool name to its instance, which the dispatcher
uses to route calls.
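The lookup and dispatch flow can be sketched as below. Everything here is an assumption about code that lands in PR 3: this `get_tools` takes the catalog list directly instead of a `meta` object, `dispatch` is a hypothetical function, `SayTool` is a toy stand-in that records text instead of speaking, and skipping schemas with no registered class is a guess.

```python
from typing import Any


class SayTool:
    """Toy stand-in for the PR 3 `say` implementation; it records text instead of speaking."""

    name = "say"

    def __init__(self, schema: Any = None):
        self.schema = schema
        self.spoken: list[str] = []

    def call(self, arguments: dict[str, Any]) -> str:
        self.spoken.append(arguments["text"])
        return f"said {arguments['text']!r}"


TOOL_REGISTRY: dict[str, type] = {"say": SayTool}


def get_tools(catalog: list[dict[str, Any]]) -> dict[str, Any]:
    """Assumed behavior: instantiate the registered class for each schema in the catalog."""
    tools: dict[str, Any] = {}
    for entry in catalog:
        name = entry["function"]["name"]
        cls = TOOL_REGISTRY.get(name)
        if cls is None:
            continue  # assumption: schema-only tools are skipped rather than raising
        tools[name] = cls(schema=entry)
    return tools


def dispatch(tool_calls: list[dict[str, Any]], tools: dict[str, Any]) -> list[str]:
    """Hypothetical dispatcher: route each structured call to the matching instance."""
    results = []
    for call in tool_calls:
        fn = call["function"]
        results.append(tools[fn["name"]].call(fn["arguments"]))
    return results


catalog = [
    {"type": "function", "function": {"name": "say"}},
    {"type": "function", "function": {"name": "record_observation"}},  # not registered here
]
tools = get_tools(catalog)
calls = [{"type": "function", "function": {"name": "say", "arguments": {"text": "On it."}}}]
results = dispatch(calls, tools)
print(sorted(tools))
print(results)
```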

## Where this fits in the three-PR stack

| Layer | PR | What lands |
|---|---|---|
| Catalog storage in `meta/info.json` + `meta.tools` accessor | PR 1 | This page; `SAY_TOOL_SCHEMA`, `DEFAULT_TOOLS` constants in `lerobot.datasets.language`; `LeRobotDatasetMetadata.tools` property |
| Annotation pipeline writes `tools` to meta after a run; honors anything users pre-populated | PR 2 | `lerobot-annotate` ensures `meta/info.json["tools"]` includes the canonical `say` and merges any user-declared tools |
| Runnable implementations under `src/lerobot/tools/`; runtime dispatcher; `say.py` wired to Kyutai's pocket-tts | PR 3 | One file per tool; `Tool` protocol; `TOOL_REGISTRY`; optional `[tools]` extra in `pyproject.toml` |

If you want to use a tool *without* writing an implementation (e.g. for
training-time chat-template formatting only), step 1 alone is enough —
the model still learns to *generate* the call. Steps 2 and 3 are only
needed to actually *execute* it at inference.
feat(language): tool catalog in meta/info.json + LeRobotDatasetMetadata.tools

Stores OpenAI-style function schemas at ``meta/info.json["tools"]`` so datasets can declare which tools are available (today: just ``say``; tomorrow: per-dataset extensions). The ``DEFAULT_TOOLS`` constant fills in for unannotated datasets so chat-template consumers don't have to special-case anything.

Three pieces:

- ``language.py``: ``SAY_TOOL_SCHEMA`` and ``DEFAULT_TOOLS`` constants. Single source of truth: PR 2's writer and PR 3's runtime tool registry will both import from here instead of duplicating the dict.
- ``dataset_metadata.py``: ``LeRobotDatasetMetadata.tools`` property reads ``info.json["tools"]`` and falls back to ``DEFAULT_TOOLS``. Returns deep-copied dicts so callers can mutate the result safely.
- ``docs/source/tools.mdx``: spec page covering the catalog, per-row invocations, and the three-step "how to add a new tool" workflow (declare schema, implement, register). Linked from the docs toctree under the Datasets section.

This lays the groundwork for PR 2's pipeline writing the catalog out during annotation, and PR 3's ``src/lerobot/tools/`` package shipping runnable implementations (one file per tool; first up: ``say.py`` wrapping Kyutai's pocket-tts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-30 18:44:58 +02:00
