feat: Update image handling and refine AI prompt instructions
Refactor image data passing in `pdf_convertor.py` to use a direct base64 and mime_type format, aligning with updated API requirements for vision models.

Additionally, the `pdf_convertor_prompt.md` has been significantly refined to improve the clarity and specificity of instructions for the AI model, particularly concerning:

- **Image Content Explanation:** Added detailed rules to ensure the AI only processes existing image references, preserves paths, and focuses on descriptive text.
- **Mathematical Formulas:** Clarified conversion to LaTeX notation.
- **Heading Structure:** Enhanced rules and examples for adjusting heading levels and merging adjacent or duplicate headings to ensure logical document flow.
@@ -170,10 +170,9 @@ def refine_content(md: str, images: dict[str, bytes], pdf: bytes) -> str:
     )
     human_message_parts.append(
         {
-            "type": "image_url",
-            "image_url": {
-                "url": f"data:image/png;base64,{base64.b64encode(images[image_name]).decode('utf-8')}"
-            },
+            "type": "image",
+            "base64": base64.b64encode(images[image_name]).decode("utf-8"),
+            "mime_type": "image/png",
         }
     )
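For reference, the two payload shapes touched by this hunk can be sketched side by side. This is a minimal illustration, not code from the repository: the helper names `image_part` and `legacy_image_part`, the `page_1.png` key, and the placeholder bytes are all hypothetical, and whether a given client library accepts either shape depends on its version.

```python
import base64


def image_part(data: bytes, mime_type: str = "image/png") -> dict:
    """Build an image part in the flat shape this commit switches to (hypothetical helper)."""
    return {
        "type": "image",
        "base64": base64.b64encode(data).decode("utf-8"),
        "mime_type": mime_type,
    }


def legacy_image_part(data: bytes, mime_type: str = "image/png") -> dict:
    """Build the data-URL shape that this commit removes, kept only for comparison."""
    encoded = base64.b64encode(data).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Placeholder bytes (the PNG file signature only), not a real image.
images = {"page_1.png": b"\x89PNG\r\n\x1a\n"}

# Mirrors the loop body in the diff: one part per extracted image.
human_message_parts = [image_part(images[name]) for name in images]
```

Both shapes carry the same base64 payload. The new one keeps the MIME type in its own field instead of embedding it in a `data:` URL, so the receiver does not need to parse the URL to recover it.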