llm-in-text/backend/prompt.py
“ydy0615” 64cfa58376 feat(editor): add image insertion with OCR support and size limit handling
Add image button with dropdown menu for uploading local images or inserting from URL.
Integrate VLM-based OCR to extract text context from images and include in AI suggestions.
Implement document size limits to disable AI when exceeding threshold.
Refactor copilot plugin with per-view runtime state and OCR context injection.
Add OCR cache utility for managing image metadata.
Add code splitting configuration for optimized bundle size.
2026-02-14 18:28:37 +08:00

94 lines
3.0 KiB
Python

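from typing import Tuple

MAX_PREFIX_CHARS = 12000
MAX_SUFFIX_CHARS = 4000


def _sanitize_language_id(language_id: str) -> str:
    """
    Return a language id that is safe to interpolate into the prompt.
    Keeps only alphanumerics and "-_+.", caps the result at 32 characters,
    and falls back to "markdown" when nothing usable remains.
    """
    if not language_id:
        return "markdown"

    allowed = []
    for ch in language_id.strip():
        if ch.isalnum() or ch in "-_+.":
            allowed.append(ch)
    value = "".join(allowed)[:32]
    return value or "markdown"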


def _prepare_context(prefix: str, suffix: str) -> Tuple[str, str]:
    """
    Prepare prefix/suffix for model completion context.
    Keep the historical one-char lookahead behavior to reduce boundary drift.
    """
    if suffix:
        prefix = prefix + suffix[0]
        suffix = suffix[1:]

    return prefix[-MAX_PREFIX_CHARS:], suffix[:MAX_SUFFIX_CHARS]


def build_prompt(prefix: str, suffix: str, language_id: str = "markdown") -> str:
    safe_language_id = _sanitize_language_id(language_id)
    recent_prefix, recent_suffix = _prepare_context(prefix, suffix)

    prompt = f"""You are an inline completion engine for a {safe_language_id} editor with ghost-text suggestions.
Your job:
- Return ONLY the text that should be inserted at the cursor between PREFIX and SUFFIX.
- Prefer a meaningful, non-empty insertion with moderate length.
- Avoid overly short outputs with little information value.
Important context:
- PREFIX may contain hidden OCR metadata in HTML comments such as <!--OCR:...-->.
- These comments are non-visible context only.
- Never copy, rewrite, or emit HTML comments in output.
- Never output <!-- or -->.
Hard rules:
1. Seamless join:
PREFIX + OUTPUT + SUFFIX must read naturally as one continuous document.
2. No suffix repetition:
Do NOT repeat text that already appears at the start of SUFFIX.
3. Balanced length:
Prefer concise but meaningful continuation, not ultra-short fragments.
Default target is 20-120 characters and 1-3 lines.
You may go shorter only when syntax requires it.
4. Avoid trivial output:
Do not output only punctuation or filler such as ".", ",", ";", ":".
Do not output just one token unless it is structurally necessary.
5. Preserve local style:
Match nearby language, tone, punctuation, spacing, and indentation.
6. Markdown awareness:
Continue active list/checkbox/ordered-list patterns when applicable.
Preserve indentation in nested list/code contexts.
Close obvious unclosed inline markdown markers only when needed to bridge.
7. Strict output format:
Output insertion text only.
No explanations, labels, quotes, or code fences.
Decision policy:
- If PREFIX already connects naturally to SUFFIX, add a brief but useful continuation when possible.
- If uncertain, prefer a complete short phrase or sentence with clear meaning.
Examples:
<PREFIX>The quick brown fox </PREFIX>
<SUFFIX>jumps over the lazy dog.</SUFFIX>
Output: "moved quietly and then "
<PREFIX>## TODO\\n- [ ] Buy milk\\n- [ ] </PREFIX>
<SUFFIX></SUFFIX>
Output: "Write release notes and share draft with team"
Now produce the insertion.
<PREFIX>
{recent_prefix}
</PREFIX>
<SUFFIX>
{recent_suffix}
</SUFFIX>
Output:"""
    return prompt.strip()
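
The context preparation in _prepare_context is easier to follow with small numbers. The sketch below repeats the same two steps (one-char lookahead, then truncation) under demo-sized limits. The names prepare_context_demo, DEMO_MAX_PREFIX_CHARS, and DEMO_MAX_SUFFIX_CHARS are hypothetical and exist only for this illustration; the file itself uses 12000 and 4000.

```python
from typing import Tuple

# Hypothetical demo limits, chosen small so the truncation is visible.
# The file above uses MAX_PREFIX_CHARS = 12000 and MAX_SUFFIX_CHARS = 4000.
DEMO_MAX_PREFIX_CHARS = 8
DEMO_MAX_SUFFIX_CHARS = 4


def prepare_context_demo(prefix: str, suffix: str) -> Tuple[str, str]:
    """Same steps as _prepare_context, with demo-sized limits."""
    if suffix:
        # One-char lookahead: the first suffix character moves into the prefix.
        prefix = prefix + suffix[0]
        suffix = suffix[1:]
    # Keep the tail of the prefix (text nearest the cursor)
    # and the head of the suffix (text right after the cursor).
    return prefix[-DEMO_MAX_PREFIX_CHARS:], suffix[:DEMO_MAX_SUFFIX_CHARS]


if __name__ == "__main__":
    # "l" is moved from the suffix to the prefix before truncation.
    print(prepare_context_demo("Hello, wor", "ld and more"))  # ('lo, worl', 'd an')
    # An empty suffix leaves the prefix unchanged.
    print(prepare_context_demo("abc", ""))  # ('abc', '')
```

Truncating the prefix from the left and the suffix from the right keeps the text closest to the cursor, which is the part most likely to shape the completion.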