# TTS/ASR macOS Adaptation Fix Notes

## Fix Overview

This fix thoroughly refactors `backend/tts_asr.py`, with comprehensive optimizations for macOS and Apple Silicon (M1/M2/M3).

## Main Improvements

### 1. Enhanced Device Detection (`_detect_device_capabilities`)

**Problems before the fix**:
- A simple tensor-multiplication test was not enough to verify that the MPS device is actually usable
- No memory-limit detection
- No special handling for Apple Silicon

**After the fix**:
- Device information is stored in a structured `DeviceCapabilities` object
- A more thorough MPS test (1000x1000 matrix operation)
- Apple Silicon is detected automatically and the memory limit is adjusted accordingly
- The MPS memory threshold is set dynamically from system memory (60% by default)
- Device capability fallback is supported (MPS→CPU, CUDA→CPU)

**How to verify**:
```python
# Run in a Python environment (from the backend/ directory so that tts_asr is importable)
from tts_asr import _detect_device_capabilities, _is_apple_silicon

caps = _detect_device_capabilities()
print(f"Device: {caps.device}")
print(f"MPS Available: {caps.mps_available}")
print(f"Apple Silicon: {_is_apple_silicon()}")
```
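
The detection rules listed above can be sketched without PyTorch. The following is a minimal illustration, not the code in `backend/tts_asr.py`: the field names of `DeviceCapabilities`, the helper names, and the MPS-then-CUDA-then-CPU preference order are assumptions. In the real module, MPS availability presumably comes from the matrix test and total memory from `psutil`; here both are passed in as arguments.

```python
import platform
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceCapabilities:
    """Hypothetical stand-in for the structure of the same name in tts_asr.py."""
    device: str
    mps_available: bool
    cuda_available: bool
    mps_memory_limit_mb: Optional[int] = None


def is_apple_silicon(system: Optional[str] = None, machine: Optional[str] = None) -> bool:
    """Apple Silicon Macs report system 'Darwin' and machine 'arm64'."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


def mps_memory_limit_mb(total_memory_bytes: int, fraction: float = 0.6) -> int:
    """Use a fraction of system memory (60% by default) as the MPS budget, in MB."""
    return int(total_memory_bytes * fraction) // (1024 * 1024)


def choose_device(mps_ok: bool, cuda_ok: bool, total_memory_bytes: int) -> DeviceCapabilities:
    """Prefer MPS, then CUDA, then CPU (this order is an assumption)."""
    if mps_ok:
        limit = mps_memory_limit_mb(total_memory_bytes)
        return DeviceCapabilities("mps", True, cuda_ok, limit)
    if cuda_ok:
        return DeviceCapabilities("cuda", False, True)
    return DeviceCapabilities("cpu", False, False)


# Example: a 16 GiB machine on which the MPS test passed
caps = choose_device(mps_ok=True, cuda_ok=False, total_memory_bytes=16 * 1024**3)
print(caps)
```

Passing the probe results in as arguments keeps the selection logic testable on machines without MPS or CUDA.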

### 2. Model Size Selection and Quantization Support

**New environment variables**:
- `TTS_ASR_MODEL_SIZE`: selects the Whisper model size
  - `tiny`: smallest model; fastest, but less accurate
  - `base`: base model; balances speed and accuracy
  - `small`: recommended for Apple Silicon
  - `medium`: medium model
  - `large`: large model; highest accuracy
  - `turbo`: large-v3-turbo (the previous default model)
  - `auto`: automatic selection (defaults to `small` on Apple Silicon)

- `TTS_ASR_QUANTIZE`: enables INT8 quantization (reduces memory usage)

**Apple Silicon optimizations**:
- Apple Silicon is detected automatically and the `small` model is recommended
- The MPS memory limit is taken into account when choosing a model

**How to verify**:
```python
# Show the recommended model size
from tts_asr import _get_recommended_model_size
print(_get_recommended_model_size())  # Apple Silicon: "small"
```
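
As a rough sketch of how `TTS_ASR_MODEL_SIZE` could be resolved: the function name `resolve_model_size` is hypothetical, and falling back to `turbo` on non-Apple hardware is an assumption based on `turbo` being described as the previous default. The real selection also considers the MPS memory limit, which this sketch omits.

```python
import os
from typing import Mapping, Optional

VALID_SIZES = ("tiny", "base", "small", "medium", "large", "turbo")


def resolve_model_size(env: Optional[Mapping[str, str]] = None,
                       apple_silicon: bool = False) -> str:
    """Return the Whisper size to load for the given environment."""
    env = os.environ if env is None else env
    requested = env.get("TTS_ASR_MODEL_SIZE", "auto").strip().lower()
    if requested in VALID_SIZES:
        # An explicit, valid choice always wins
        return requested
    # "auto" (or an unrecognized value): small on Apple Silicon,
    # otherwise the previous default (assumed to be turbo)
    return "small" if apple_silicon else "turbo"


print(resolve_model_size({"TTS_ASR_MODEL_SIZE": "auto"}, apple_silicon=True))
```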

### 3. Offline Mode Support

**New environment variable**:
- `TTS_ASR_OFFLINE_MODE`: enables offline mode
  - Checks before startup whether the model is already cached
  - Fails gracefully instead of crashing when the cache is missing

**How to verify**:
```bash
# Enable offline mode
export TTS_ASR_OFFLINE_MODE=true
python backend/main.py

# Check the model cache (run from the backend/ directory so that tts_asr is importable)
python -c "from tts_asr import _check_model_cached; print(_check_model_cached('openai/whisper-small'))"
```
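
One way such a cache check might work, assuming the models are fetched through the Hugging Face hub and stored in its standard cache layout (`models--<org>--<name>/snapshots/<revision>/`). The helper names below are hypothetical and this is not necessarily how `_check_model_cached` is implemented.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def default_hub_cache() -> Path:
    """Default Hugging Face hub cache, honoring HF_HUB_CACHE and HF_HOME."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def is_model_cached(model_id: str, cache_dir: Optional[Path] = None) -> bool:
    """True if at least one snapshot of model_id exists in the cache directory."""
    cache_dir = default_hub_cache() if cache_dir is None else Path(cache_dir)
    snapshots = cache_dir / ("models--" + model_id.replace("/", "--")) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


# Demonstrate against a throwaway directory instead of the real cache
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    missing = is_model_cached("openai/whisper-small", root)
    (root / "models--openai--whisper-small" / "snapshots" / "abc123").mkdir(parents=True)
    present = is_model_cached("openai/whisper-small", root)

print(missing, present)
```

A directory check like this only shows that a snapshot folder exists; it does not verify that every model file inside it is complete.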

### 4. Robust Audio Processing

**Problems before the fix**:
- No fallback when `librosa.resample` failed
- No validation of audio data

**After the fix**:
- `_validate_audio_data()`: validates the audio data
- `_resample_audio_robust()`: resampling with multiple fallbacks
  1. Try `librosa.resample` first
  2. Fall back to `torchaudio.transforms.Resample`
  3. Use NumPy linear interpolation as a last resort

**How to verify**:
```python
import numpy as np
from tts_asr import _resample_audio_robust

# Test resampling
audio = np.random.randn(16000).astype(np.float32)
resampled = _resample_audio_robust(audio, 16000, 48000)
print(f"Original: {len(audio)}, Resampled: {len(resampled)}")
```
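
The fallback chain can be illustrated in pure Python. This sketch is an assumption-laden stand-in: the real function reportedly tries librosa, then torchaudio, then NumPy, whereas here the backends are arbitrary callables and the last resort is a hand-written linear interpolation so the example runs without any third-party package. All function names are hypothetical.

```python
from typing import Callable, List, Sequence

Resampler = Callable[[Sequence[float], int, int], List[float]]


def linear_resample(audio: Sequence[float], orig_sr: int, target_sr: int) -> List[float]:
    """Last-resort resampler: linear interpolation between neighbouring samples."""
    if orig_sr == target_sr or len(audio) < 2:
        return list(audio)
    n_out = max(1, round(len(audio) * target_sr / orig_sr))
    step = (len(audio) - 1) / max(1, n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(audio) - 1)
        frac = pos - lo
        out.append(audio[lo] * (1.0 - frac) + audio[hi] * frac)
    return out


def resample_with_fallback(audio: Sequence[float], orig_sr: int, target_sr: int,
                           backends: Sequence[Resampler]) -> List[float]:
    """Validate the input, try each backend in order, then interpolate linearly."""
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sample rates must be positive")
    if len(audio) == 0:
        raise ValueError("audio is empty")
    for backend in backends:
        try:
            return backend(audio, orig_sr, target_sr)
        except Exception:
            continue  # this backend failed; move on to the next one
    return linear_resample(audio, orig_sr, target_sr)


def broken_backend(audio, orig_sr, target_sr):
    """Simulates a preferred backend that is unavailable or crashes."""
    raise RuntimeError("simulated backend failure")


resampled = resample_with_fallback([0.0, 1.0, 2.0, 3.0], 16000, 32000, [broken_backend])
print(len(resampled))
```

Linear interpolation applies no anti-aliasing filter, so it is only reasonable as a last resort, which matches its position at the end of the chain.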

### 5. Improved Error Handling and Fallback

**Fallback paths**:
```
MPS inference fails → mark MPS unavailable → clear MPS cache → fall back to CPU
CUDA inference fails → mark CUDA unavailable → clear CUDA cache → fall back to CPU
```

**Logging improvements**:
- The device detection process is logged in detail
- The reason for each fallback is stated explicitly
- Configuration such as model size, quantization status, and offline mode is shown
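
The fallback paths and the logging of the fallback reason could look roughly like the sketch below. The names `run_with_cpu_fallback`, `device_available`, and `clear_cache` are hypothetical; in a PyTorch setting the cache-clearing hook would presumably call something like `torch.mps.empty_cache()` or `torch.cuda.empty_cache()`, but it is left as a plain callable here so the example runs without PyTorch.

```python
import logging
from typing import Callable, Dict, TypeVar

logger = logging.getLogger("tts_asr")
T = TypeVar("T")

# Hypothetical availability flags, updated when a device fails at inference time
device_available: Dict[str, bool] = {"mps": True, "cuda": False, "cpu": True}


def run_with_cpu_fallback(infer: Callable[[str], T], device: str,
                          clear_cache: Callable[[str], None] = lambda d: None) -> T:
    """Run infer on device; on failure mark it unavailable, clear its cache, use CPU."""
    if device != "cpu" and device_available.get(device, False):
        try:
            return infer(device)
        except RuntimeError as exc:
            # State the reason for the fallback explicitly in the log
            logger.warning("Inference on %s failed (%s); falling back to CPU", device, exc)
            device_available[device] = False
            clear_cache(device)
    return infer("cpu")


def fake_infer(device: str) -> str:
    """Simulated inference that always fails on MPS."""
    if device == "mps":
        raise RuntimeError("simulated MPS failure")
    return f"ok on {device}"


result = run_with_cpu_fallback(fake_infer, "mps")
print(result, device_available["mps"])
```

Because the device is marked unavailable after the first failure, later calls go straight to the CPU instead of retrying the failing device each time.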

### 6. New API Endpoint

**GET /v1/tts-asr/config**:
```json
{
  "environment": {
    "TTS_ASR_DEVICE": "auto",
    "TTS_ASR_MODEL_SIZE": "auto",
    "TTS_ASR_QUANTIZE": false,
    "TTS_ASR_OFFLINE_MODE": false,
    ...
  },
  "device": {
    "current": "mps",
    "mps_available": true,
    "cuda_available": false,
    "is_apple_silicon": true,
    "mps_memory_limit_mb": 8192
  },
  "model": {
    "tts": "hexgrad/Kokoro-82M",
    "asr_current_size": "small",
    "asr_recommended_size": "small",
    "available_sizes": ["tiny", "base", "small", "medium", "large", "turbo"]
  }
}
```

## Complete List of Environment Variables

| Variable | Description | Default | Examples |
|----------|-------------|---------|----------|
| `TTS_ASR_DEVICE` | Device selection | `auto` | `mps`, `cuda`, `cpu` |
| `TTS_ASR_MODEL_SIZE` | ASR model size | `auto` | `tiny`, `base`, `small`, `medium`, `large`, `turbo` |
| `TTS_ASR_QUANTIZE` | INT8 quantization | `false` | `true`, `false` |
| `TTS_ASR_OFFLINE_MODE` | Offline mode | `false` | `true`, `false` |
| `TTS_ASR_WARMUP` | Warm-up at startup | `true` | `true`, `false` |
| `TTS_ASR_WARMUP_TIMEOUT` | Warm-up timeout (seconds) | `120` | `60`, `180` |
| `TTS_ASR_IDLE_TIMEOUT` | Idle unload timeout (seconds) | `0` | `300`, `600` |
| `TTS_ASR_MPS_MEMORY_LIMIT_MB` | MPS memory limit (MB) | `8192` | `4096`, `16384` |
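
For illustration, the table could be read into a typed settings object as follows. This is a sketch under assumptions: the `TtsAsrSettings` class, the helper names, and the choice to also accept `1`, `yes`, and `on` as true values are not taken from `backend/tts_asr.py`; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUE_VALUES = {"1", "true", "yes", "on"}  # accepted spellings are an assumption


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in _TRUE_VALUES


def _env_int(env: Mapping[str, str], name: str, default: int) -> int:
    raw = env.get(name)
    try:
        return default if raw is None else int(raw)
    except ValueError:
        return default  # ignore malformed numbers rather than crash at startup


@dataclass(frozen=True)
class TtsAsrSettings:
    device: str
    model_size: str
    quantize: bool
    offline_mode: bool
    warmup: bool
    warmup_timeout: int
    idle_timeout: int
    mps_memory_limit_mb: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> TtsAsrSettings:
    """Build settings from the environment, using the defaults from the table."""
    env = os.environ if env is None else env
    return TtsAsrSettings(
        device=env.get("TTS_ASR_DEVICE", "auto"),
        model_size=env.get("TTS_ASR_MODEL_SIZE", "auto"),
        quantize=_env_bool(env, "TTS_ASR_QUANTIZE", False),
        offline_mode=_env_bool(env, "TTS_ASR_OFFLINE_MODE", False),
        warmup=_env_bool(env, "TTS_ASR_WARMUP", True),
        warmup_timeout=_env_int(env, "TTS_ASR_WARMUP_TIMEOUT", 120),
        idle_timeout=_env_int(env, "TTS_ASR_IDLE_TIMEOUT", 0),
        mps_memory_limit_mb=_env_int(env, "TTS_ASR_MPS_MEMORY_LIMIT_MB", 8192),
    )


print(load_settings({"TTS_ASR_QUANTIZE": "true", "TTS_ASR_IDLE_TIMEOUT": "600"}))
```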

## macOS Usage Recommendations

### Recommended Configuration

**Apple Silicon (M1/M2/M3), 8 GB RAM**:
```bash
export TTS_ASR_MODEL_SIZE=small
export TTS_ASR_MPS_MEMORY_LIMIT_MB=4096
```

**Apple Silicon (M1/M2/M3), 16 GB RAM or more**:
```bash
export TTS_ASR_MODEL_SIZE=medium
export TTS_ASR_MPS_MEMORY_LIMIT_MB=8192
```

**When memory is tight**:
```bash
export TTS_ASR_MODEL_SIZE=tiny
export TTS_ASR_QUANTIZE=true
```
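
The three presets above can also be expressed as a small lookup, which may be convenient in a launcher script. The function name `recommended_env` is hypothetical, and treating every machine below 16 GB like the 8 GB preset is an assumption; the values themselves are the ones listed above.

```python
from typing import Dict


def recommended_env(total_memory_gb: float, memory_constrained: bool = False) -> Dict[str, str]:
    """Return the environment variables suggested for a given amount of RAM."""
    if memory_constrained:
        return {"TTS_ASR_MODEL_SIZE": "tiny", "TTS_ASR_QUANTIZE": "true"}
    if total_memory_gb >= 16:
        return {"TTS_ASR_MODEL_SIZE": "medium", "TTS_ASR_MPS_MEMORY_LIMIT_MB": "8192"}
    # Below 16 GB, fall back to the 8 GB preset (assumption)
    return {"TTS_ASR_MODEL_SIZE": "small", "TTS_ASR_MPS_MEMORY_LIMIT_MB": "4096"}


# Print shell export lines for an 8 GB machine
for name, value in recommended_env(8).items():
    print(f"export {name}={value}")
```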

### Performance Tuning Tips

1. **First run**: leave offline mode off so that the model is downloaded automatically
2. **Subsequent runs**: enable offline mode to avoid network latency
   ```bash
   export TTS_ASR_OFFLINE_MODE=true
   ```

3. **Long-running services**: set an idle timeout so that the model is unloaded automatically
   ```bash
   export TTS_ASR_IDLE_TIMEOUT=600  # unload after 10 minutes of idle time
   ```

4. **Debug mode**: view detailed device detection logs
   ```python
   import logging
   logging.getLogger("tts_asr").setLevel(logging.DEBUG)
   ```

## Verification Steps (Non-Mac Environment)

If you are not on a Mac, you can verify the code logic as follows:

### 1. Static Code Checks
```bash
# Check Python syntax
python -m py_compile backend/tts_asr.py

# Check that the module imports
python -c "import backend.tts_asr"
```

### 2. Simulated Unit Test
```python
# Simulate an Apple Silicon environment
import platform

# Keep the original functions so they can be restored afterwards
original_system = platform.system
original_machine = platform.machine

def mock_system():
    return "Darwin"

def mock_machine():
    return "arm64"

# Simulate Darwin/arm64
platform.system = mock_system
platform.machine = mock_machine

try:
    # Test Apple Silicon detection
    from tts_asr import _is_apple_silicon
    assert _is_apple_silicon()
finally:
    # Restore the original functions, even if the assertion fails
    platform.system = original_system
    platform.machine = original_machine
```
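
The same simulation can be written with `unittest.mock.patch`, which restores the patched functions automatically when the `with` block exits. In this sketch `is_apple_silicon` is a hypothetical stand-in for `_is_apple_silicon`, assumed to compare `platform.system()` and `platform.machine()` against `Darwin` and `arm64`. If the real function caches its result at import time, patching after import would have no effect.

```python
import platform
from unittest import mock


def is_apple_silicon() -> bool:
    """Hypothetical stand-in for tts_asr._is_apple_silicon."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


# Simulate an Apple Silicon Mac
with mock.patch("platform.system", return_value="Darwin"), \
     mock.patch("platform.machine", return_value="arm64"):
    detected_when_mocked = is_apple_silicon()

# Simulate an x86-64 Linux machine
with mock.patch("platform.system", return_value="Linux"), \
     mock.patch("platform.machine", return_value="x86_64"):
    detected_on_linux = is_apple_silicon()

print(detected_when_mocked, detected_on_linux)
```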

### 3. Environment Variable Test
```python
import os
os.environ['TTS_ASR_MODEL_SIZE'] = 'small'
os.environ['TTS_ASR_QUANTIZE'] = 'true'

# Reload the module so that it re-reads the environment variables
import importlib
import backend.tts_asr
importlib.reload(backend.tts_asr)

from backend.tts_asr import TTS_ASR_MODEL_SIZE, TTS_ASR_QUANTIZE
assert TTS_ASR_MODEL_SIZE == 'small'
assert TTS_ASR_QUANTIZE is True
```

### 4. API Endpoint Tests (Requires a Running Service)
```bash
# Start the service
python backend/main.py

# Test the config endpoint (requires an API key)
curl -X GET "http://localhost:8001/v1/tts-asr/config" \
  -H "X-API-Key: your-secret-key-here"

# Test the status endpoint
curl -X GET "http://localhost:8001/v1/tts-asr/status" \
  -H "X-API-Key: your-secret-key-here"
```
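
For scripted checks, the same requests can be issued from Python with only the standard library. This is a sketch: the base URL, header name, and placeholder key are copied from the `curl` commands above, while the helper names `build_request` and `fetch_json` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001"   # taken from the curl examples
API_KEY = "your-secret-key-here"     # placeholder; replace with the real key


def build_request(path: str, api_key: str = API_KEY,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request that carries the API key header."""
    return urllib.request.Request(base_url + path,
                                  headers={"X-API-Key": api_key},
                                  method="GET")


def fetch_json(path: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (needs a running service)."""
    with urllib.request.urlopen(build_request(path), timeout=timeout) as resp:
        return json.load(resp)


request = build_request("/v1/tts-asr/config")
print(request.get_method(), request.full_url)
```

With the service running, `fetch_json("/v1/tts-asr/config")["device"]["current"]` would return the active device, for example `"mps"` in the sample response shown earlier.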

## Dependency Updates

Added to `backend/requirements.txt`:
- `psutil`: system memory detection
- `torchaudio`: fallback for audio resampling

Install the new dependencies:
```bash
pip install -r backend/requirements.txt
```
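
## Backward Compatibility

All changes remain backward compatible:
- Existing API endpoints are unchanged
- Default behavior matches the original version
- New features are enabled through environment variables

## Known Limitations

1. **MPS float16**: may be unstable for some operations; the code uses float32 by default
2. **8-bit quantization**: supported only on CPU and CUDA; not supported on MPS
3. **Core ML**: an extension point is reserved but not implemented (requires additional dependencies)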

## Future Improvements

1. Integrate Core ML as an alternative inference backend
2. Support `torch.compile` (PyTorch 2.0+)
3. Show progress during automatic model download
4. Add support for more audio formats

## Troubleshooting

### Model Loading Fails
1. Check the network connection
2. Try turning offline mode off: `export TTS_ASR_OFFLINE_MODE=false`
3. View detailed logs: set `logging.getLogger("tts_asr").setLevel(logging.DEBUG)`

### MPS Out of Memory
1. Use a smaller model: `export TTS_ASR_MODEL_SIZE=tiny`
2. Enable quantization: `export TTS_ASR_QUANTIZE=true`
3. Lower the memory limit: `export TTS_ASR_MPS_MEMORY_LIMIT_MB=4096`

### Audio Processing Fails
1. Check the audio format (WAV is supported)
2. Make sure the audio sample rate is ≥ 8000 Hz
3. Check the logs for detailed error messages
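
The first two audio checks can be done locally before sending a file to the service. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV files only; the function names are hypothetical, and the 8000 Hz threshold is the one stated above.

```python
import os
import tempfile
import wave

MIN_SAMPLE_RATE = 8000  # minimum sample rate stated in the troubleshooting list


def check_wav(path: str) -> int:
    """Return the sample rate, or raise ValueError if the file is unusable."""
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            frames = wav.getnframes()
    except (wave.Error, EOFError) as exc:
        raise ValueError(f"not a readable PCM WAV file: {exc}") from exc
    if frames == 0:
        raise ValueError("WAV file contains no audio frames")
    if rate < MIN_SAMPLE_RATE:
        raise ValueError(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    return rate


def write_silence(path: str, rate: int, n_frames: int = 100) -> None:
    """Write a short mono 16-bit silent WAV file for testing."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * n_frames)


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.wav")
    write_silence(good, 16000)
    good_rate = check_wav(good)

    low = os.path.join(tmp, "low.wav")
    write_silence(low, 4000)
    try:
        check_wav(low)
        low_rejected = False
    except ValueError:
        low_rejected = True

print(good_rate, low_rejected)
```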

---

**Fix completed on**: 2026-04-06

**Files modified**:
- `backend/tts_asr.py` (main refactor)
- `backend/requirements.txt` (added dependencies)
- `README.md` (updated documentation)