feat: Improve content refinement with SystemMessage and prompt updates

This commit refactors content refinement so that the primary prompt is sent as a `SystemMessage`, which keeps the instructions separate from the document data passed in the `HumanMessage`.

The `pdf_convertor.py` file was updated to:
- Import `SystemMessage` from `langchain_core.messages`.
- Modify the `refine_content` function to use `SystemMessage` for the main prompt, moving the prompt content from `human_message_parts`.
- Adjust `human_message_parts` to only contain the Markdown and image data for the `HumanMessage`.
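
The message split described above might look roughly like the sketch below. It is not the actual body of `refine_content`, which this commit message does not show. The helper name `build_refine_messages`, the PNG data-URL encoding of images, and the sample inputs are all assumptions for illustration. The fallback classes exist only so the sketch runs when `langchain-core` is not installed.

```python
import base64
from dataclasses import dataclass
from typing import Any

try:
    from langchain_core.messages import HumanMessage, SystemMessage
except ImportError:
    # Minimal stand-ins so the sketch still runs without langchain-core.
    @dataclass
    class SystemMessage:
        content: Any

    @dataclass
    class HumanMessage:
        content: Any


def build_refine_messages(prompt: str, markdown: str, images: list[bytes]) -> list:
    """Hypothetical helper: the prompt goes in a SystemMessage, the data in a HumanMessage."""
    # The human message carries only the Markdown and the image data.
    human_message_parts: list[dict[str, Any]] = [{"type": "text", "text": markdown}]
    for image_bytes in images:
        encoded = base64.b64encode(image_bytes).decode("ascii")
        # Assumption: images are PNGs passed inline as base64 data URLs.
        human_message_parts.append(
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{encoded}"},
            }
        )
    return [
        SystemMessage(content=prompt),
        HumanMessage(content=human_message_parts),
    ]


# Example call with placeholder inputs.
messages = build_refine_messages("Clean up the Markdown.", "# Title", [b"abc"])
```

The returned list could then be passed to a chat model's `invoke` method. Keeping the prompt out of `human_message_parts` means the prompt file can change without touching how the Markdown and images are packaged.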

The `pdf_convertor_prompt.md` file was updated to:
- Reformat the prompt with clearer headings and instructions for each task.
- Improve the clarity and conciseness of the instructions for cleaning up characters, explaining image content, and correcting list formatting.

Additionally, `.gitignore` was updated to include `.vscode/` to prevent IDE-specific files from being committed.

These changes improve the structure of the LLM interaction and make the prompt more readable and maintainable.
2025-11-11 23:39:47 +11:00
parent 26951b8bc0
commit ae7c579580
4 changed files with 105 additions and 65 deletions

.gitignore

@@ -11,4 +11,5 @@ wheels/
 input/
 output/
 config.ini
-test.py
+test.py
+.vscode/