llm-in-text/backend/api_performance_report.md
ydy0615 7985fe9641 feat(tts): add api endpoints and optimization for apple silicon
Introduce a comprehensive TTS/ASR module that:
- Adds /v1/tts-asr/config, /status, /warmup, /tts, /asr endpoints with detailed JSON responses
- Implements Apple‑Silicon detection, device selection (MPS/CUDA/CPU), and memory limiting logic
- Supports selectable model size, quantization, and offline mode via environment variables
- Adds robust audio validation and multi‑path resampling fallback
- Provides new README sections for API usage, device detection, and performance benchmarking
- Includes a full testing suite: unit tests, integration tests, macOS simulation and performance reports
- Updates backend dependencies and CI scripts
- Adds new front‑end views and components for Univer editor integration

All changes are backward compatible; new features are exposed through environment variables and new API routes.
2026-04-06 11:14:09 +08:00


API Benchmarking Report (2026-04-05 23:55:38)

Base URL: https://api.imageteach.tech:8002

Executive Summary

Task                Success Rate  Avg TTFB   Avg Latency  P95 Latency  TPS    RPS
Completion-Short    100.0%        7519.5ms   7520.1ms     14075.8ms    63.9   0.58
Completion-Normal   70.0%         9184.3ms   9184.8ms     14619.5ms    100.5  0.14
Completion-Long     100.0%        22419.4ms  22419.8ms    39618.0ms    852.5  0.21
OCR-Concurrent      0.0%          0.0ms      0.0ms        0.0ms        0.0    5.49
TTS-Concurrent      0.0%          0.0ms      0.0ms        0.0ms        0.0    11.27
ASR-Concurrent      0.0%          0.0ms      0.0ms        0.0ms        0.0    7.54
Convert-Concurrent  100.0%        377.9ms    378.6ms      1017.7ms     26.4   5.28
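
The columns above can be derived from per-request samples roughly as follows. This is a minimal sketch, not the benchmark tool's actual code: the `Sample` record, the `summarize` and `percentile` helpers, the nearest-rank P95 method, and the choice to average timings over successful requests only are all assumptions. Under that last assumption, a task with no successful requests reports 0.0 in every timing column, which would match the OCR, TTS, and ASR rows. TPS is left out because the report does not define it.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One request's outcome (hypothetical record shape)."""
    ok: bool
    ttfb_ms: float = 0.0
    latency_ms: float = 0.0


def percentile(values, pct):
    """Nearest-rank percentile; returns 0.0 for an empty list."""
    if not values:
        return 0.0
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(samples, duration_s):
    """Aggregate samples into summary columns (timings over successes only)."""
    ok = [s for s in samples if s.ok]
    n_ok = len(ok)
    return {
        "success_rate": 100.0 * n_ok / len(samples) if samples else 0.0,
        "avg_ttfb_ms": sum(s.ttfb_ms for s in ok) / n_ok if ok else 0.0,
        "avg_latency_ms": sum(s.latency_ms for s in ok) / n_ok if ok else 0.0,
        "p95_latency_ms": percentile([s.latency_ms for s in ok], 95),
        # RPS counts every request, failed or not.
        "rps": len(samples) / duration_s if duration_s > 0 else 0.0,
    }
```

With only 10 samples per task, a nearest-rank P95 is simply the slowest successful request, so the P95 column here is best read as a worst-case figure.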

Stability & Context Analysis

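Per-task sample counts, run durations, and top errors. In the summary above, average TTFB rises from 7519.5ms for Completion-Short to 22419.4ms for Completion-Long as context length grows.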

Completion-Short Details

  • Total Samples: 10
  • Duration: 17.36s

Completion-Normal Details

  • Total Samples: 10
  • Duration: 70.66s
  • Top Errors:
    • [504] 504 Gateway Time-out (openresty HTML error page)
    • [504] 504 Gateway Time-out (openresty HTML error page)
    • [504] 504 Gateway Time-out (openresty HTML error page)

Completion-Long Details

  • Total Samples: 10
  • Duration: 47.09s

OCR-Concurrent Details

  • Total Samples: 10
  • Duration: 1.82s
  • Top Errors:
    • [500] {"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details (status code: 500)"}
    • [500] {"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details (status code: 500)"}
    • [500] {"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details (status code: 500)"}
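
Each Top Errors list in this report repeats identical entries. One way to make such lists easier to scan is to collapse duplicates into counts. The sketch below assumes the errors are available as `(status, body)` pairs; `top_errors` is a hypothetical helper, not part of the benchmark tool.

```python
from collections import Counter


def top_errors(errors, limit=3):
    """Collapse duplicate (status, body) pairs into counted lines, most frequent first."""
    # Normalize whitespace so multi-line HTML bodies compare equal.
    normalized = [(status, " ".join(body.split())) for status, body in errors]
    counts = Counter(normalized)
    return [
        f"[{status}] x{count} {body}"
        for (status, body), count in counts.most_common(limit)
    ]
```

Normalizing whitespace before counting also keeps a multi-line HTML error page, such as a gateway timeout response, on a single line.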

TTS-Concurrent Details

  • Total Samples: 10
  • Duration: 0.89s
  • Top Errors:
    • [404] {"detail":"Not Found"}
    • [404] {"detail":"Not Found"}
    • [404] {"detail":"Not Found"}

ASR-Concurrent Details

  • Total Samples: 10
  • Duration: 1.33s
  • Top Errors:
    • [404] {"detail":"Not Found"}
    • [404] {"detail":"Not Found"}
    • [404] {"detail":"Not Found"}
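
The three failure modes in this report point at different layers. The 404 body `{"detail":"Not Found"}` matches the default response FastAPI returns for an unregistered route, which suggests the TTS and ASR requests reached an application that does not serve the requested path. The sketch below is one hedged way to encode that triage. The `triage` function and its hint strings are illustrative assumptions, and the real causes should be confirmed from the server logs.

```python
def triage(status, body=""):
    """Map an HTTP error to the layer most likely responsible (heuristic hints only)."""
    if status == 404:
        return "routing: the requested path is not registered on the server"
    if status == 504:
        return "gateway: the upstream did not answer within the proxy timeout"
    if status == 500 and "model runner" in body:
        return "model backend: the runner process stopped; check the ollama logs"
    if 500 <= status < 600:
        return "server: unclassified server-side failure"
    return "client: unclassified request error"
```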

Convert-Concurrent Details

  • Total Samples: 10
  • Duration: 1.90s
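
As a consistency check, the RPS column in the Executive Summary appears to equal total samples divided by run duration, counting failed requests as well. The sketch below recomputes it from the figures in this report. The formula is inferred from the numbers, not documented by the tool, and `recomputed_rps` is a hypothetical helper.

```python
# (task, total samples, duration in seconds, reported RPS), copied from this report.
RUNS = [
    ("Completion-Short", 10, 17.36, 0.58),
    ("Completion-Normal", 10, 70.66, 0.14),
    ("Completion-Long", 10, 47.09, 0.21),
    ("OCR-Concurrent", 10, 1.82, 5.49),
    ("TTS-Concurrent", 10, 0.89, 11.27),
    ("ASR-Concurrent", 10, 1.33, 7.54),
    ("Convert-Concurrent", 10, 1.90, 5.28),
]


def recomputed_rps(samples, duration_s):
    """Requests per second over the whole run, failures included."""
    return samples / duration_s


for task, samples, duration_s, reported in RUNS:
    rps = recomputed_rps(samples, duration_s)
    print(f"{task:<20} reported={reported:>6.2f} recomputed={rps:>6.2f}")
```

Differences of a few hundredths on the sub-two-second runs are expected, because the durations shown here are rounded to two decimals. The high RPS of the OCR, TTS, and ASR runs therefore reflects requests that failed quickly, not useful throughput.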