feat: Update image handling and refine AI prompt instructions

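Refactor image data passing in `pdf_convertor.py`. Each image is now sent as an `image` part with the base64 payload and a `mime_type` as separate fields. Previously it was sent as an `image_url` part wrapping a `data:image/png;base64,...` URL. The new format aligns with updated API requirements for vision models.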

Additionally, `pdf_convertor_prompt.md` has been substantially revised to make its instructions to the AI model clearer and more specific, particularly concerning:
- **Image Content Explanation:** Added detailed rules to ensure the AI only processes existing image references, preserves paths, and focuses on descriptive text.
- **Mathematical Formulas:** Clarified conversion to LaTeX notation.
- **Heading Structure:** Enhanced rules and examples for adjusting heading levels and merging adjacent or duplicate headings to ensure logical document flow.
2025-11-12 18:05:24 +11:00
parent 3b62c0f478
commit e8fa2617ba
2 changed files with 40 additions and 36 deletions

@@ -170,10 +170,9 @@ def refine_content(md: str, images: dict[str, bytes], pdf: bytes) -> str:
     )
     human_message_parts.append(
         {
-            "type": "image_url",
-            "image_url": {
-                "url": f"data:image/png;base64,{base64.b64encode(images[image_name]).decode('utf-8')}"
-            },
+            "type": "image",
+            "base64": base64.b64encode(images[image_name]).decode("utf-8"),
+            "mime_type": "image/png",
         }
     )
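For comparison, here is a minimal sketch of the old and new part shapes written as standalone helpers. The names `legacy_image_block` and `image_block` are hypothetical, and so is the `mime_type` parameter. The commit builds both dicts inline in `refine_content()` and hardcodes `image/png`. The sketch is only meant to show that the new part carries the raw base64 string and the MIME type as separate fields instead of packing both into a `data:` URL.

```python
import base64


# Hypothetical helpers: the commit builds these dicts inline and always uses
# "image/png"; the mime_type parameter here is an illustrative assumption.

def legacy_image_block(image: bytes, mime_type: str = "image/png") -> dict:
    """Old shape: an `image_url` part wrapping a base64 data URL."""
    encoded = base64.b64encode(image).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def image_block(image: bytes, mime_type: str = "image/png") -> dict:
    """New shape: raw base64 payload and MIME type as separate fields."""
    return {
        "type": "image",
        "base64": base64.b64encode(image).decode("utf-8"),
        "mime_type": mime_type,
    }


if __name__ == "__main__":
    # Stand-in bytes: the 8-byte PNG signature only, not a complete image.
    png_bytes = b"\x89PNG\r\n\x1a\n"
    print(legacy_image_block(png_bytes))
    print(image_block(png_bytes))
```

Both helpers encode the same bytes. As a result, the text after the comma in the legacy data URL is identical to the new `base64` field; only the packaging differs.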