feat: Update image handling and refine AI prompt instructions

Refactor image data passing in `pdf_convertor.py` to use a direct base64 and mime_type format, aligning with updated API requirements for vision models.

Additionally, the `pdf_convertor_prompt.md` has been significantly refined to improve the clarity and specificity of instructions for the AI model, particularly concerning:

- **Image Content Explanation:** Added detailed rules to ensure the AI only processes existing image references, preserves paths, and focuses on descriptive text.
- **Mathematical Formulas:** Clarified conversion to LaTeX notation.
- **Heading Structure:** Enhanced rules and examples for adjusting heading levels and merging adjacent or duplicate headings to ensure logical document flow.
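
As a rough sketch of the format change in `refine_content`, the old and new image message parts can be built side by side as below. The helper names `image_block` and `legacy_image_block` are hypothetical and do not exist in `pdf_convertor.py`; only the dictionary shapes are taken from the diff.

```python
import base64


def image_block(data: bytes, mime_type: str = "image/png") -> dict:
    """Build an image part in the new format (hypothetical helper)."""
    return {
        "type": "image",
        # Raw base64 payload, no "data:" prefix.
        "base64": base64.b64encode(data).decode("utf-8"),
        "mime_type": mime_type,
    }


def legacy_image_block(data: bytes, mime_type: str = "image/png") -> dict:
    """Build the previous data-URL form, shown only for comparison."""
    encoded = base64.b64encode(data).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }
```

Both forms carry the same base64 payload. The new one keeps the MIME type in its own field instead of embedding it in a data URL.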
2025-11-12 18:05:24 +11:00
parent 3b62c0f478
commit e8fa2617ba
2 changed files with 40 additions and 36 deletions
--- a/pdf_convertor.py
+++ b/pdf_convertor.py
@@ -170,10 +170,9 @@ def refine_content(md: str, images: dict[str, bytes], pdf: bytes) -> str:
        )
        human_message_parts.append(
            {
-                "type": "image_url",
-                "image_url": {
-                    "url": f"data:image/png;base64,{base64.b64encode(images[image_name]).decode('utf-8')}"
-                },
+                "type": "image",
+                "base64": base64.b64encode(images[image_name]).decode("utf-8"),
+                "mime_type": "image/png",
            }
        )