llm-in-text/backend/api_performance_report.md
ydy0615 7985fe9641 feat(tts): add api endpoints and optimization for apple silicon
Introduce a comprehensive TTS/ASR module that:
- Adds /v1/tts-asr/config, /status, /warmup, /tts, /asr endpoints with detailed JSON responses
- Implements Apple‑Silicon detection, device selection (MPS/CUDA/CPU), and memory limiting logic
- Supports selectable model size, quantization, and offline mode via environment variables
- Adds robust audio validation and multi‑path resampling fallback
- Provides new README sections for API usage, device detection, and performance benchmarking
- Includes a full testing suite: unit tests, integration tests, macOS simulation and performance reports
- Updates backend dependencies and CI scripts
- Adds new front‑end views and components for Univer editor integration

All changes are backward compatible; new features are exposed through environment variables and new API routes.
2026-04-06 11:14:09 +08:00


# API Benchmarking Report (2026-04-05 23:55:38)
**Base URL:** `https://api.imageteach.tech:8002`
## Executive Summary
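| Task | Success Rate | Avg TTFB (ms) | Avg Latency (ms) | P95 Latency (ms) | TPS | RPS |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: |
| Completion-Short | 100.0% | 7519.5 | 7520.1 | 14075.8 | 63.9 | 0.58 |
| Completion-Normal | 70.0% | 9184.3 | 9184.8 | 14619.5 | 100.5 | 0.14 |
| Completion-Long | 100.0% | 22419.4 | 22419.8 | 39618.0 | 852.5 | 0.21 |
| OCR-Concurrent | 0.0% | n/a | n/a | n/a | 0.0 | 5.49 |
| TTS-Concurrent | 0.0% | n/a | n/a | n/a | 0.0 | 11.27 |
| ASR-Concurrent | 0.0% | n/a | n/a | n/a | 0.0 | 7.54 |
| Convert-Concurrent | 100.0% | 377.9 | 378.6 | 1017.7 | 26.4 | 5.28 |

TTFB is time to first byte. The latency averages appear to cover successful requests only. Tasks with a 0.0% success rate therefore have no latency measurement and are shown as n/a. RPS is total samples divided by task duration and includes failed requests. That is why the failing OCR, TTS, and ASR tasks report the highest RPS: their requests errored out within two seconds.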
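
The RPS column can be reproduced as total samples divided by task duration. For example, 10 / 1.82 s ≈ 5.49 for OCR and 10 / 17.36 s ≈ 0.58 for Completion-Short. The sketch below shows one way to turn per-request samples into a summary row under that reading.

The following parts are assumptions for illustration:
- the `Sample` record
- the `summarize` and `p95` helpers
- the nearest-rank P95 method

The report does not say how its benchmark tool computes percentiles.

```python
import math
import statistics
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One request in a benchmark task (hypothetical record layout)."""
    ok: bool
    ttfb_ms: float
    latency_ms: float


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list (assumed method)."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def _mean_or_none(values: List[float]) -> Optional[float]:
    """Mean of the values, or None when there is nothing to average."""
    return statistics.fmean(values) if values else None


def summarize(samples: List[Sample], duration_s: float) -> dict:
    """Build one summary row; averages use successful requests only."""
    ok = [s for s in samples if s.ok]
    latencies = [s.latency_ms for s in ok]
    return {
        "success_rate": 100.0 * len(ok) / len(samples),
        "avg_ttfb_ms": _mean_or_none([s.ttfb_ms for s in ok]),
        "avg_latency_ms": _mean_or_none(latencies),
        "p95_latency_ms": p95(latencies) if latencies else None,
        # RPS counts every request, failed or not, over the task's duration.
        "rps": len(samples) / duration_s,
    }
```

Returning `None` rather than `0.0` for a task with no successful requests keeps an empty average from being read as a measured one.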
## Stability & Context Analysis
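Per-task sample counts, durations, and the most frequent errors. Across the completion tasks, average TTFB grows with context length: 7519.5 ms (Short), 9184.3 ms (Normal), and 22419.4 ms (Long).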
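
Avg TTFB differs from Avg Latency by less than 1 ms in every successful row of the summary. This suggests the benchmark received each response in one piece. Measuring time to first byte separately from total latency needs two things: a streamed response and a timestamp at its first non-empty chunk.

Below is a minimal sketch using a hypothetical `time_stream` helper. It assumes the caller records `start` from the same clock just before sending the request.

```python
import time
from typing import Callable, Iterable, Tuple


def time_stream(
    chunks: Iterable[bytes],
    start: float,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Return (ttfb_ms, latency_ms) measured from `start`.

    `start` must come from the same clock, read just before the request
    is sent. TTFB is taken at the first non-empty chunk.
    """
    first = None
    for chunk in chunks:
        if first is None and chunk:
            first = clock()
    end = clock()
    if first is None:  # empty body: no first byte, fall back to total time
        first = end
    return (first - start) * 1000.0, (end - start) * 1000.0
```

Here is how it could be used with the `requests` library:
1. Record `start = time.perf_counter()`.
2. Call `requests.post(url, json=payload, stream=True)`.
3. Pass `resp.iter_content(chunk_size=None)` as `chunks`.

This only separates TTFB from latency if the server actually streams its output.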
### Completion-Short Details
- **Total Samples:** 10
- **Duration:** 17.36s
### Completion-Normal Details
- **Total Samples:** 10
- **Duration:** 70.66s
- **Top Errors:**
  - `[504]` `504 Gateway Time-out` (openresty HTML error page; 3 identical entries)
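
The raw 504 bodies are openresty's HTML error pages, which break a Markdown list when pasted verbatim. A hypothetical `summarize_error_body` helper could reduce such a body to one line before it is written to the report. It keeps the `<title>` text, or the whitespace-collapsed body when there is no title. A minimal sketch:

```python
import re

# Matches the <title> element of an HTML error page such as openresty's.
_TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def summarize_error_body(body: str, limit: int = 200) -> str:
    """Reduce an error body to one line (hypothetical report helper)."""
    match = _TITLE_RE.search(body)
    text = match.group(1) if match else body
    # Collapse newlines and runs of spaces so the entry fits in one list item.
    return " ".join(text.split())[:limit]
```

Applied to the openresty 504 page, it yields `504 Gateway Time-out`. JSON bodies such as `{"detail":"Not Found"}` pass through unchanged. A regular expression is enough here because only the title is needed. A real HTML parser would be safer for arbitrary markup.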
### Completion-Long Details
- **Total Samples:** 10
- **Duration:** 47.09s
### OCR-Concurrent Details
- **Total Samples:** 10
- **Duration:** 1.82s
- **Top Errors:**
  - `[500]` `{"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details (status code: 500)"}` (3 identical entries)
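
Each Top Errors list repeats the same message once per entry. It therefore does not show how many failed requests hit each error: OCR failed all 10 of its requests but lists only three entries.

Below is a minimal sketch that groups failures by status code and message with `collections.Counter`. It assumes failures are recorded as `(status, message)` pairs, and the `top_errors` helper is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def top_errors(
    failures: List[Tuple[int, str]], n: int = 3
) -> List[Tuple[int, str, int]]:
    """Group (status, message) pairs and return the n most frequent with counts."""
    counts = Counter(failures)
    return [
        (status, message, count)
        for (status, message), count in counts.most_common(n)
    ]
```

`most_common` orders entries by count and keeps first-seen order among ties. Each distinct error therefore appears once, with the number of requests it affected.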
### TTS-Concurrent Details
- **Total Samples:** 10
- **Duration:** 0.89s
- **Top Errors:**
  - `[404]` `{"detail":"Not Found"}` (3 identical entries)
### ASR-Concurrent Details
- **Total Samples:** 10
- **Duration:** 1.33s
- **Top Errors:**
  - `[404]` `{"detail":"Not Found"}` (3 identical entries)
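
`{"detail":"Not Found"}` is the shape of FastAPI's default 404 response when no route matches the request path. This suggests the TTS and ASR requests reached paths the server did not expose, rather than failing inside a model. The commit that added this report lists `/v1/tts-asr/config`, `/status`, `/warmup`, `/tts`, and `/asr` endpoints.

A preflight check could separate a missing route from a failing model before load is sent. Below is a minimal standard-library sketch with hypothetical `probe` and `is_missing_route` helpers.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple


def is_missing_route(status: int, body: str) -> bool:
    """True if the response matches FastAPI's default 404 body."""
    if status != 404:
        return False
    try:
        return json.loads(body) == {"detail": "Not Found"}
    except ValueError:  # includes json.JSONDecodeError (non-JSON bodies)
        return False


def probe(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """GET `url` and return (status, body) instead of raising on HTTP errors.

    Network failures (urllib.error.URLError) still propagate.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")
```

For example, `is_missing_route(*probe("https://api.imageteach.tech:8002/v1/tts-asr/status"))` would flag a missing route before any concurrent TTS or ASR requests are sent. This assumes that route exists at that path and accepts GET.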
### Convert-Concurrent Details
- **Total Samples:** 10
- **Duration:** 1.90s