Introduce a comprehensive TTS/ASR module that:

- Adds /v1/tts-asr/config, /status, /warmup, /tts, /asr endpoints with detailed JSON responses
- Implements Apple‑Silicon detection, device selection (MPS/CUDA/CPU), and memory limiting logic
- Supports selectable model size, quantization, and offline mode via environment variables
- Adds robust audio validation and multi‑path resampling fallback
- Provides new README sections for API usage, device detection, and performance benchmarking
- Includes a full testing suite: unit tests, integration tests, macOS simulation and performance reports
- Updates backend dependencies and CI scripts
- Adds new front‑end views and components for Univer editor integration

All changes are backward compatible; new features are exposed through environment variables and new API routes.
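
The device selection mentioned above could look roughly like the following. This is a minimal sketch, not the module's actual code: the function names `is_apple_silicon` and `pick_device` are hypothetical, and the MPS-then-CUDA-then-CPU preference order is an assumption taken from the order in which the devices are listed.

```python
import platform


def is_apple_silicon(system=None, machine=None):
    """Return True for macOS running natively on an arm64 CPU.

    The arguments default to the values reported by the platform module;
    they are parameters only so the check can be exercised on any host.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


def pick_device(mps_available, cuda_available):
    """Choose a device string from two availability flags.

    Preference order (assumed, not confirmed by the source): MPS, CUDA, CPU.
    """
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"
```

With PyTorch installed, the two flags would typically come from `torch.backends.mps.is_available()` and `torch.cuda.is_available()`. Keeping the decision in a pure function makes it testable on machines that have neither backend.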
# API Benchmarking Report (2026-04-05 23:55:38)

**Base URL:** `https://api.imageteach.tech:8002`

## Executive Summary

| Task | Success Rate | Avg TTFB | Avg Latency | P95 Latency | TPS | RPS |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Completion-Short | 100.0% | 7519.5ms | 7520.1ms | 14075.8ms | 63.9 | 0.58 |
| Completion-Normal | 70.0% | 9184.3ms | 9184.8ms | 14619.5ms | 100.5 | 0.14 |
| Completion-Long | 100.0% | 22419.4ms | 22419.8ms | 39618.0ms | 852.5 | 0.21 |
| OCR-Concurrent | 0.0% | 0.0ms | 0.0ms | 0.0ms | 0.0 | 5.49 |
| TTS-Concurrent | 0.0% | 0.0ms | 0.0ms | 0.0ms | 0.0 | 11.27 |
| ASR-Concurrent | 0.0% | 0.0ms | 0.0ms | 0.0ms | 0.0 | 7.54 |
| Convert-Concurrent | 100.0% | 377.9ms | 378.6ms | 1017.7ms | 26.4 | 5.28 |
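
The report does not state how the summary columns are derived. One reading that matches the numbers is that latency figures are averaged over successful requests only (hence 0.0ms for rows with a 0.0% success rate), while RPS counts every request over the wall-clock duration: 10 samples in 1.82s gives roughly 5.49 for OCR-Concurrent. The sketch below works under that assumption. The `Sample` type, the `summarize` function, and the nearest-rank percentile method are illustrative choices, not taken from the benchmark tool, and TPS is left out because it needs per-request token counts.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    ok: bool            # True if the request succeeded
    ttfb_ms: float      # time to first byte, milliseconds
    latency_ms: float   # total request latency, milliseconds


def p95(values):
    """Nearest-rank 95th percentile; 0.0 when there are no values."""
    if not values:
        return 0.0
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def summarize(samples, duration_s):
    """Aggregate one task's samples into summary-table style metrics."""
    ok = [s for s in samples if s.ok]
    n_ok = len(ok)
    return {
        "success_rate": 100.0 * n_ok / len(samples) if samples else 0.0,
        # Latency metrics use successful requests only (assumption).
        "avg_ttfb_ms": sum(s.ttfb_ms for s in ok) / n_ok if ok else 0.0,
        "avg_latency_ms": sum(s.latency_ms for s in ok) / n_ok if ok else 0.0,
        "p95_latency_ms": p95([s.latency_ms for s in ok]),
        # RPS counts all requests, including failed ones (assumption).
        "rps": len(samples) / duration_s if duration_s > 0 else 0.0,
    }
```

Under this reading, a high RPS next to a 0.0% success rate only shows that requests failed quickly; it says nothing about usable throughput.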

## Stability & Context Analysis

Detailed analysis of how context length affects TTFB and overall performance.

### Completion-Short Details

- **Total Samples:** 10
- **Duration:** 17.36s

### Completion-Normal Details

- **Total Samples:** 10
- **Duration:** 70.66s
- **Top Errors:**
  - `[504]` `<html><head><title>504 Gateway Time-out</title></head><body><center><h1>504 Gateway Time-out</h1></center><hr><center>openresty</center></body></html>`
  - `[504]` `<html><head><title>504 Gateway Time-out</title></head><body><center><h1>504 Gateway Time-out</h1></center><hr><center>openresty</center></body></html>`
  - `[504]` `<html><head><title>504 Gateway Time-out</title></head><body><center><h1>504 Gateway Time-out</h1></center><hr><center>openresty</center></body></html>`
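
Raw HTML and JSON error bodies make the "Top Errors" lists hard to scan. As a possible improvement, a report generator could reduce each body to a one-line label and count repeats. The sketch below assumes errors are available as `(status, body)` pairs; `error_label` and `top_errors` are hypothetical helpers, not part of the benchmark tool.

```python
import json
import re
from collections import Counter

TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def error_label(status, body, max_len=80):
    """Reduce an error response body to a short one-line label."""
    text = body.strip()
    match = TITLE_RE.search(text)
    if match:
        # HTML error page: the <title> is usually enough.
        text = match.group(1)
    else:
        try:
            payload = json.loads(text)
        except ValueError:
            payload = None
        if isinstance(payload, dict):
            # JSON error: prefer an "error" or "detail" field if present.
            text = str(payload.get("error") or payload.get("detail") or text)
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    if len(text) > max_len:
        text = text[: max_len - 3] + "..."
    return f"[{status}] {text}"


def top_errors(errors, limit=3):
    """Return the most frequent error labels with their counts."""
    counts = Counter(error_label(status, body) for status, body in errors)
    return [f"{label} (x{n})" for label, n in counts.most_common(limit)]
```

Applied to the three identical gateway responses above, this would yield a single entry, `[504] 504 Gateway Time-out (x3)`.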

### Completion-Long Details

- **Total Samples:** 10
- **Duration:** 47.09s

### OCR-Concurrent Details

- **Total Samples:** 10
- **Duration:** 1.82s
- **Top Errors:**
  - `[500]` `{"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details (status code: 500)"}`
  - `[500]` `{"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details (status code: 500)"}`
  - `[500]` `{"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details (status code: 500)"}`
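
The error text names resource limitations as one possible cause, but the report does not establish which cause applies. If concurrency is suspected, one way to test that is to re-run the task with a cap on in-flight requests and compare success rates. The following is a minimal sketch of such a cap using `asyncio.Semaphore`; `run_bounded` is a hypothetical helper, and each job is assumed to be a zero-argument callable returning a coroutine.

```python
import asyncio


async def run_bounded(jobs, limit):
    """Run all jobs concurrently, with at most `limit` in flight at once.

    Exceptions are returned in the result list instead of aborting the run,
    so one failed request does not hide the outcome of the others.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:
            return await job()

    return await asyncio.gather(
        *(guarded(job) for job in jobs),
        return_exceptions=True,
    )
```

Running the same ten requests at limits of 1, 2, and 10 would show whether failures appear only above a certain concurrency level.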

### TTS-Concurrent Details

- **Total Samples:** 10
- **Duration:** 0.89s
- **Top Errors:**
  - `[404]` `{"detail":"Not Found"}`
  - `[404]` `{"detail":"Not Found"}`
  - `[404]` `{"detail":"Not Found"}`

### ASR-Concurrent Details

- **Total Samples:** 10
- **Duration:** 1.33s
- **Top Errors:**
  - `[404]` `{"detail":"Not Found"}`
  - `[404]` `{"detail":"Not Found"}`
  - `[404]` `{"detail":"Not Found"}`

### Convert-Concurrent Details

- **Total Samples:** 10
- **Duration:** 1.90s