mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-31 19:01:28 +00:00
Ships the runtime side of the OpenAI-style function-calling stack
introduced in PR 1 (catalog in ``meta/info.json["tools"]``) and PR 2
(annotation pipeline writes the catalog after a run). One file per
tool — heavy deps stay isolated.
Layout:
- ``base.py`` — :class:`Tool` Protocol: ``name``, ``schema``,
``call(arguments)``. Runtime-checkable so tests can use
``isinstance(...)``.
- ``registry.py`` — :data:`TOOL_REGISTRY` (name → class) plus
``get_tools(meta, **kwargs)`` that instantiates every entry whose
``function.name`` is registered. Tools whose name is unknown are
silently skipped: the schema still passes through the chat
template, but the model cannot invoke that tool at
inference.
- ``say.py`` — :class:`SayTool` wrapping Kyutai's pocket-tts
(CPU-only, ~100M params, ~6× real-time on a MacBook Air M4).
Lazy model load: pocket-tts is imported and the voice state
computed on first ``call(...)`` (or eagerly via ``preload()``).
Returns the PCM tensor; optionally writes a ``.wav`` to
``output_dir`` for offline inspection.
- ``__init__.py`` — re-exports the public surface.
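
The registry lookup described above can be sketched roughly as follows. This is an illustration only, not the code in ``registry.py``: ``EchoTool`` is a hypothetical stand-in for a real tool, the metadata is modeled as a plain dict with a ``"tools"`` list, and the dict return type and constructor signature are assumptions.

```python
from typing import Any


class EchoTool:
    """Hypothetical stand-in for a real tool such as SayTool."""

    name = "echo"

    def __init__(self, schema: dict[str, Any], **kwargs: Any) -> None:
        # Assumption: tools receive their schema plus free-form kwargs.
        self.schema = schema

    def call(self, arguments: dict[str, Any]) -> Any:
        return arguments.get("text")


# name -> class, mirroring the shape the commit message describes.
TOOL_REGISTRY: dict[str, type] = {"echo": EchoTool}


def get_tools(info: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
    """Instantiate every declared tool whose function name is registered."""
    tools: dict[str, Any] = {}
    for schema in info.get("tools", []):
        name = schema.get("function", {}).get("name")
        cls = TOOL_REGISTRY.get(name)
        if cls is None:
            # Unknown tool: skipped silently; its schema stays in the catalog.
            continue
        tools[name] = cls(schema, **kwargs)
    return tools


info = {
    "tools": [
        {"type": "function", "function": {"name": "echo"}},
        {"type": "function", "function": {"name": "unknown_tool"}},
    ]
}
tools = get_tools(info)
```

Here only ``echo`` is instantiated; ``unknown_tool`` remains in the catalog but has no runnable implementation.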
Optional install:

    pip install lerobot[tools]

The ``[tools]`` extra in ``pyproject.toml`` pulls in ``pocket-tts`` +
``scipy`` (for the wav writer). Adding more tools later means a new
file + a registry entry — no new extras unless the tool brings new
deps.
To add your own tool, follow the three-step guide in
``docs/source/tools.mdx`` (PR 1):
1. Drop ``src/lerobot/tools/<my_tool>.py`` with a ``Tool``-conforming
class.
2. Register the class in ``TOOL_REGISTRY`` (in ``registry.py``).
3. Pre-populate ``meta/info.json["tools"]`` with the schema (or let
``lerobot-annotate`` add it on the next run).
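
As a rough illustration of the three steps, the sketch below defines a hypothetical ``BeepTool``, registers it, and adds its schema to a catalog. The tool name, its schema, the local ``TOOL_REGISTRY`` dict, and the in-memory ``info`` dict are all assumptions made for the example; the real registry lives in ``registry.py`` and the real catalog in ``meta/info.json``.

```python
import json
from typing import Any

# Step 3 material: an OpenAI-style function schema (hypothetical tool).
BEEP_SCHEMA: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "beep",
        "description": "Emit a short beep a given number of times.",
        "parameters": {
            "type": "object",
            "properties": {"count": {"type": "integer", "minimum": 1}},
            "required": ["count"],
        },
    },
}


# Step 1: a class exposing name, schema and call(arguments).
class BeepTool:
    name = "beep"
    schema = BEEP_SCHEMA

    def call(self, arguments: dict[str, Any]) -> dict[str, int]:
        count = arguments.get("count")
        # The tool validates its own arguments; the runtime only routes by name.
        if isinstance(count, bool) or not isinstance(count, int) or count < 1:
            raise ValueError("'count' must be a positive integer")
        return {"beeps": count}


# Step 2: register the class (local stand-in for the real registry).
TOOL_REGISTRY: dict[str, type] = {}
TOOL_REGISTRY[BeepTool.name] = BeepTool

# Step 3: add the schema to the catalog unless it is already declared.
info: dict[str, Any] = {"tools": []}
declared = {entry["function"]["name"] for entry in info["tools"]}
if BeepTool.name not in declared:
    info["tools"].append(BeepTool.schema)

info_json = json.dumps(info, indent=2)
```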
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
59 lines
2.4 KiB
Python
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tool protocol — the contract every runnable tool implementation honors.

Tools are the executable side of the OpenAI-style function-calling
abstraction the v3.1 language schema (PR 1) carries on assistant
messages: the schema describes *what can be called*, the tool
implementation describes *how to call it*.

Implementations live one-per-file under :mod:`lerobot.tools` (e.g.
``say.py`` for ``SayTool``) and are registered in
:mod:`lerobot.tools.registry`. The runtime instantiates them lazily so
heavy dependencies (torch models, audio backends, network clients,
hardware drivers) only load when the dataset actually declares the tool.
"""

from __future__ import annotations

from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Tool(Protocol):
    """Minimum surface every tool must expose."""

    #: Name matching ``schema["function"]["name"]``. The runtime dispatcher
    #: routes incoming ``tool_calls`` to the implementation by this key.
    name: str

    #: OpenAI-style function-call schema. Same dict the dataset stores in
    #: ``meta/info.json["tools"]`` and the chat template renders into the
    #: prompt.
    schema: dict[str, Any]

    def call(self, arguments: dict[str, Any]) -> Any:
        """Execute the tool with the model-provided arguments.

        ``arguments`` is the parsed dict from
        ``tool_calls[i]["function"]["arguments"]`` (already JSON-decoded
        when the model emits a JSON-string by the chat-template
        convention). Implementations validate the dict against their own
        schema; the runtime only routes by name.

        Return value is implementation-defined — typically a tensor
        (TTS audio), a Path (saved file), a dict (structured result), or
        ``None`` (side-effect-only call).
        """
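
A minimal sketch of how a caller might use this protocol: a structural ``isinstance`` check followed by name-based routing of ``tool_calls``. The protocol is restated so the block runs standalone; ``UpperTool`` and ``dispatch`` are hypothetical names for illustration, not part of lerobot, and the real runtime dispatcher may behave differently.

```python
import json
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Tool(Protocol):
    name: str
    schema: dict[str, Any]

    def call(self, arguments: dict[str, Any]) -> Any: ...


class UpperTool:
    """Hypothetical tool: no inheritance needed, only the right members."""

    name = "upper"
    schema = {"type": "function", "function": {"name": "upper"}}

    def call(self, arguments: dict[str, Any]) -> str:
        return str(arguments["text"]).upper()


def dispatch(tool_calls: list[dict[str, Any]], tools: dict[str, Tool]) -> list[Any]:
    """Route each tool call to the implementation registered under its name."""
    results = []
    for tool_call in tool_calls:
        function = tool_call["function"]
        tool = tools.get(function["name"])
        if tool is None:
            # No implementation for this name: skip the call.
            continue
        arguments = function["arguments"]
        if isinstance(arguments, str):
            # Some models emit arguments as a JSON string; decode before calling.
            arguments = json.loads(arguments)
        results.append(tool.call(arguments))
    return results


tools = {"upper": UpperTool()}
calls = [
    {"function": {"name": "upper", "arguments": {"text": "hi"}}},
    {"function": {"name": "upper", "arguments": '{"text": "ok"}'}},
    {"function": {"name": "missing", "arguments": {}}},
]
results = dispatch(calls, tools)
```

Note that ``isinstance(obj, Tool)`` only checks that the three members exist; it does not verify their types or the signature of ``call``.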