Files
lerobot-clone/src/lerobot/tools/base.py
Pepijn 52e1fd35cb feat(tools): src/lerobot/tools/ — runnable tool registry + SayTool
Ships the runtime side of the OpenAI-style function-calling stack
introduced in PR 1 (catalog in ``meta/info.json["tools"]``) and PR 2
(annotation pipeline writes the catalog after a run). One file per
tool — heavy deps stay isolated.

Layout:

- ``base.py`` — :class:`Tool` Protocol: ``name``, ``schema``,
  ``call(arguments)``. Runtime-checkable so tests can use
  ``isinstance(...)``.
- ``registry.py`` — :data:`TOOL_REGISTRY` (name → class) plus
  ``get_tools(meta, **kwargs)`` that instantiates every entry whose
  ``function.name`` is registered. Tools whose name is unknown are
  silently skipped — the schema still rides through the chat
  template, the model just can't actually invoke that tool at
  inference.
- ``say.py`` — :class:`SayTool` wrapping Kyutai's pocket-tts
  (CPU-only, ~100M params, ~6× real-time on a MacBook Air M4).
  Lazy model load: pocket-tts is imported and the voice state
  computed on first ``call(...)`` (or eagerly via ``preload()``).
  Returns the PCM tensor; optionally writes a ``.wav`` to
  ``output_dir`` for offline inspection.
- ``__init__.py`` — re-exports the public surface.

Optional install:

    pip install lerobot[tools]

The ``[tools]`` extra in ``pyproject.toml`` pulls in ``pocket-tts`` +
``scipy`` (for the wav writer). Adding more tools later means a new
file + a registry entry — no new extras unless the tool brings new
deps.

To add your own tool, follow the three-step guide in
``docs/source/tools.mdx`` (PR 1):

  1. Drop ``src/lerobot/tools/<my_tool>.py`` with a ``Tool``-conforming
     class.
  2. Register the class in ``TOOL_REGISTRY`` (this file).
  3. Pre-populate ``meta/info.json["tools"]`` with the schema (or let
     ``lerobot-annotate`` add it on the next run).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 18:58:04 +02:00

59 lines
2.4 KiB
Python

# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tool protocol — the contract every runnable tool implementation honors.
Tools are the executable side of the OpenAI-style function-calling
abstraction the v3.1 language schema (PR 1) carries on assistant
messages: the schema describes *what can be called*, the tool
implementation describes *how to call it*.
Implementations live one-per-file under :mod:`lerobot.tools` (e.g.
``say.py`` for ``SayTool``) and are registered in
:mod:`lerobot.tools.registry`. The runtime instantiates them lazily so
heavy dependencies (torch models, audio backends, network clients,
hardware drivers) only load when the dataset actually declares the tool.
"""
from __future__ import annotations
from typing import Any, Protocol, runtime_checkable
@runtime_checkable
class Tool(Protocol):
"""Minimum surface every tool must expose."""
#: Name matching ``schema["function"]["name"]``. The runtime dispatcher
#: routes incoming ``tool_calls`` to the implementation by this key.
name: str
#: OpenAI-style function-call schema. Same dict the dataset stores in
#: ``meta/info.json["tools"]`` and the chat template renders into the
#: prompt.
schema: dict[str, Any]
def call(self, arguments: dict[str, Any]) -> Any:
"""Execute the tool with the model-provided arguments.
``arguments`` is the parsed dict from
``tool_calls[i]["function"]["arguments"]`` (already JSON-decoded
when the model emits a JSON-string by the chat-template
convention). Implementations validate the dict against their own
schema; the runtime only routes by name.
Return value is implementation-defined — typically a tensor
(TTS audio), a Path (saved file), a dict (structured result), or
``None`` (side-effect-only call).
"""