- Implement SHA-256 image hashing to cache OCR results and avoid re-processing identical images
- Add 100MB file size limit for image uploads with user-friendly error messages
- Clear ghost suggestions when uploading new images to prevent interference
- Optimize size limit calculation in copilot plugin to include OCR context
- Remove debug logging from production code
- Add image processing optimization plan document

BREAKING CHANGE: Image upload size limit is now enforced at 100MB (previously unlimited)
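The first two changes in the commit message can be illustrated with a minimal Python sketch. It is not the commit's actual implementation: `OcrCache`, `check_image_size`, `ImageTooLargeError`, `MAX_IMAGE_BYTES`, and the injected `run_ocr` callable are hypothetical names, results live in an unbounded in-memory dict, and "100MB" is assumed to mean 100 MiB (the commit does not say whether decimal or binary megabytes are meant).

```python
import hashlib
from typing import Callable, Dict

# Assumption: "100MB" is read as 100 MiB; the commit does not specify.
MAX_IMAGE_BYTES = 100 * 1024 * 1024


class ImageTooLargeError(ValueError):
    """Hypothetical error raised when an upload exceeds the size limit."""


def check_image_size(data: bytes, limit: int = MAX_IMAGE_BYTES) -> None:
    """Reject oversized uploads with a message a user can act on."""
    if len(data) > limit:
        size_mb = len(data) / (1024 * 1024)
        limit_mb = limit / (1024 * 1024)
        raise ImageTooLargeError(
            f"Image is {size_mb:.1f} MB; the maximum allowed size is {limit_mb:.0f} MB."
        )


class OcrCache:
    """Cache OCR text keyed by the SHA-256 digest of the raw image bytes."""

    def __init__(self, run_ocr: Callable[[bytes], str]) -> None:
        self._run_ocr = run_ocr  # hypothetical OCR backend, injected by the caller
        self._results: Dict[str, str] = {}

    def get_text(self, data: bytes) -> str:
        check_image_size(data)  # fail fast before hashing a rejected upload
        key = hashlib.sha256(data).hexdigest()
        if key not in self._results:
            # Only images whose bytes have not been seen before reach OCR.
            self._results[key] = self._run_ocr(data)
        return self._results[key]


if __name__ == "__main__":
    calls = []

    def fake_ocr(data: bytes) -> str:
        calls.append(data)
        return "recognized text"

    cache = OcrCache(fake_ocr)
    cache.get_text(b"same image bytes")
    cache.get_text(b"same image bytes")  # cache hit: fake_ocr is not called again
    print(len(calls))  # prints 1
```

Keying the cache on a digest of the bytes, rather than on a filename or upload path, means a renamed or re-uploaded copy of the same image is still recognized as identical; checking the size first avoids spending time hashing an upload that will be rejected anyway.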
92 lines
3.2 KiB
Python
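from typing import Tuple


def _sanitize_language_id(language_id: str) -> str:
    """Return a language ID that is safe to interpolate into the prompt.

    Keeps only characters accepted by ``str.isalnum`` (so non-ASCII letters
    and digits pass) plus ``-``, ``_``, ``+`` and ``.``, truncates the result
    to 32 characters, and falls back to ``"markdown"`` when the input is
    empty or nothing survives filtering.
    """
    if not language_id:
        return "markdown"
    allowed = []
    for ch in language_id.strip():
        if ch.isalnum() or ch in "-_+.":
            allowed.append(ch)
    value = "".join(allowed)[:32]
    return value or "markdown"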


def _prepare_context(prefix: str, suffix: str) -> Tuple[str, str]:
    """
    Prepare prefix/suffix for model completion context.

    Currently a pass-through: both strings are returned unchanged, with no
    trimming or truncation applied.
    """
    return prefix, suffix


def prepare_prompt_context(prefix: str, suffix: str) -> Tuple[str, str]:
    """Public wrapper around ``_prepare_context``."""
    return _prepare_context(prefix, suffix)
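

def build_prompt(prefix: str, suffix: str, language_id: str = "markdown") -> str:
    """Build the ghost-text completion prompt for the gap between prefix and suffix.

    ``language_id`` is sanitized before it is interpolated into the prompt.
    """
    safe_language_id = _sanitize_language_id(language_id)
    recent_prefix, recent_suffix = _prepare_context(prefix, suffix)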

    prompt = f"""You are an inline completion engine for a {safe_language_id} editor with ghost-text suggestions.

Your job:
- Return ONLY the text that should be inserted at the cursor between PREFIX and SUFFIX.
- Prefer a meaningful, non-empty insertion with moderate length.
- Avoid overly short outputs with little information value.

Important context:
- PREFIX may contain OCR metadata inline after images, e.g.  <OCR:description>.
- The <OCR:...> is hidden context describing image content.
- Never copy, rewrite, or emit OCR tags in output.
- Never output <OCR: or the > that closes an OCR tag.

Hard rules:
1. Seamless join:
PREFIX + OUTPUT + SUFFIX must read naturally as one continuous document.
2. No suffix repetition:
Do NOT repeat text that already appears at the start of SUFFIX.
3. Balanced length:
Prefer concise but meaningful continuation, not ultra-short fragments.
Default target is 10-500 characters and 1-20 lines for plain prose.
You may be longer when structure requires it (lists, tables, code blocks, math blocks).
4. Avoid trivial output:
Do not output only punctuation or filler such as ".", ",", ";", ":".
Do not output just one token unless it is structurally necessary.
5. Preserve local style:
Match nearby language, tone, punctuation, spacing, and indentation.
6. Markdown awareness:
Continue active list/checkbox/ordered-list patterns when applicable.
Preserve indentation in nested list/code contexts.
You may output full markdown structures when context needs them: headings, lists, tables, fenced code blocks, blockquotes, and LaTeX ($...$ / $$...$$).
Close obvious unclosed inline markdown markers only when needed to bridge.
7. Strict output format:
Output insertion text only.
No explanations, labels, or wrapper quotes around the whole output.
Markdown syntax is allowed when it is the intended insertion (including fenced code blocks and LaTeX).

Decision policy:
- If PREFIX already connects naturally to SUFFIX, add a brief but useful continuation when possible.
- If uncertain, prefer a complete short phrase or sentence with clear meaning.

Examples:
<PREFIX>The quick brown fox </PREFIX>
<SUFFIX>jumps over the lazy dog.</SUFFIX>
Output: "moved quietly and then "

<PREFIX>## TODO\\n- [ ] Buy milk\\n- [ ] </PREFIX>
<SUFFIX></SUFFIX>
Output: "Write release notes and share draft with team"

Now produce the insertion.

<PREFIX>
{recent_prefix}
</PREFIX>

<SUFFIX>
{recent_suffix}
</SUFFIX>

Output:"""

    return prompt.strip()
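The prompt asks the model never to emit `<OCR:...>` tags and never to repeat text at the start of SUFFIX, but nothing in this file enforces either rule on the text the model returns. A caller could apply a post-processing guard such as the sketch below. `clean_completion` and its `min_overlap` parameter are hypothetical, not part of this module, and the regular expression assumes an OCR tag runs from `<OCR:` to the first `>`.

```python
import re

# Assumption: an OCR tag is "<OCR:" followed by any text up to the first ">".
_OCR_TAG = re.compile(r"<OCR:[^>]*>")


def clean_completion(completion: str, suffix: str, min_overlap: int = 4) -> str:
    """Hypothetical guard: strip leaked OCR tags, then trim suffix overlap."""
    text = _OCR_TAG.sub("", completion)
    # Find the longest tail of `text` that equals a head of `suffix` and drop
    # it, so PREFIX + OUTPUT + SUFFIX does not repeat the suffix's opening.
    # Overlaps shorter than `min_overlap` characters are ignored.
    for overlap in range(min(len(text), len(suffix)), min_overlap - 1, -1):
        if text.endswith(suffix[:overlap]):
            return text[:-overlap]
    return text
```

The `min_overlap` threshold (4 characters here, an arbitrary choice) keeps the guard from clipping a completion that merely ends with the same letter the suffix starts with.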