Refine the image processing instructions within the PDF conversion prompt to emphasize the critical importance of matching image descriptions to their exact filenames.
The previous instructions were ambiguous and could lead to incorrect image descriptions. This update adds:
- A "Critical" warning to match image names correctly.
- Detailed rules outlining how to process image references based on provided filenames.
- An example workflow to illustrate the correct matching process.
- A new "Critical" verification step in the final instructions to ensure image explanations correspond to their filenames.
This change aims to prevent errors where image descriptions might be mismatched or generated from the wrong image content, ensuring higher accuracy in the conversion process.
Refactor image data passing in `pdf_convertor.py` to use a direct base64 and mime_type format, aligning with updated API requirements for vision models.
Additionally, the `pdf_convertor_prompt.md` has been significantly refined to improve the clarity and specificity of instructions for the AI model, particularly concerning:
- **Image Content Explanation:** Added detailed rules to ensure the AI only processes existing image references, preserves paths, and focuses on descriptive text.
- **Mathematical Formulas:** Clarified conversion to LaTeX notation.
- **Heading Structure:** Enhanced rules and examples for adjusting heading levels and merging adjacent or duplicate headings to ensure logical document flow.
This commit refactors the content refinement process to leverage `SystemMessage` for the primary prompt, enhancing clarity and adherence to LLM best practices.
The `pdf_convertor.py` file was updated to:
- Import `SystemMessage` from `langchain_core.messages`.
- Modify the `refine_content` function to use `SystemMessage` for the main prompt, moving the prompt content from `human_message_parts`.
- Adjust `human_message_parts` to only contain the Markdown and image data for the `HumanMessage`.
The `pdf_convertor_prompt.md` file was updated to:
- Reformat the prompt with clearer headings and instructions for each task.
- Improve the clarity and conciseness of the instructions for cleaning up characters, explaining image content, and correcting list formatting.
Additionally, `.gitignore` was updated to include `.vscode/` to prevent IDE-specific files from being committed.
These changes improve the structure of the LLM interaction and make the prompt more readable and maintainable.