Commit Graph

7 Commits

Author SHA1 Message Date
e8fa2617ba feat: Update image handling and refine AI prompt instructions
Refactor image data passing in `pdf_convertor.py` to use a direct base64 and mime_type format, aligning with updated API requirements for vision models.

Additionally, the `pdf_convertor_prompt.md` has been significantly refined to improve the clarity and specificity of instructions for the AI model, particularly concerning:
- **Image Content Explanation:** Added detailed rules to ensure the AI only processes existing image references, preserves paths, and focuses on descriptive text.
- **Mathematical Formulas:** Clarified conversion to LaTeX notation.
- **Heading Structure:** Enhanced rules and examples for adjusting heading levels and merging adjacent or duplicate headings to ensure logical document flow.
2025-11-12 18:05:24 +11:00
1a867844ce feat: Introduce OpenAI LLM provider and update API key handling
This commit integrates OpenAI as a new Large Language Model (LLM) provider,
expanding the available options for content refinement.

Key changes include:
- Added `set_openai_api_key` to handle OpenAI API key retrieval from
  `config.ini` or environment variables.
- Modified `set_api_key` to dynamically read the LLM provider from `config.ini`
2025-11-12 02:51:18 +11:00
ae7c579580 feat: Improve content refinement with SystemMessage and prompt updates
This commit refactors the content refinement process to leverage `SystemMessage` for the primary prompt, enhancing clarity and adherence to LLM best practices.

The `pdf_convertor.py` file was updated to:
- Import `SystemMessage` from `langchain_core.messages`.
- Modify the `refine_content` function to use `SystemMessage` for the main prompt, moving the prompt content from `human_message_parts`.
- Adjust `human_message_parts` to only contain the Markdown and image data for the `HumanMessage`.

The `pdf_convertor_prompt.md` file was updated to:
- Reformat the prompt with clearer headings and instructions for each task.
- Improve the clarity and conciseness of the instructions for cleaning up characters, explaining image content, and correcting list formatting.

Additionally, `.gitignore` was updated to include `.vscode/` to prevent IDE-specific files from being committed.

These changes improve the structure of the LLM interaction and make the prompt more readable and maintainable.
2025-11-11 23:39:47 +11:00
26951b8bc0 feat(llm): Add Ollama provider and PyMuPDF image extraction
This commit introduces support for Ollama as an alternative Large Language Model (LLM) provider and enhances PDF image extraction capabilities.

- **Ollama Integration:**
    - Implemented `set_ollama_config` to configure Ollama's base URL from `config.ini`.
    - Modified `llm.py` to dynamically select and configure the LLM (Gemini or Ollama) based on the `PROVIDER` setting.
    - Updated `get_model_name` to return provider-specific default model names.
    - `pdf_convertor.py` now conditionally initializes `ChatGoogleGenerativeAI` or `ChatOllama` based on the configured provider.
- **PyMuPDF Image Extraction:**
    - Added a new `extract_images_from_pdf` function using PyMuPDF (`fitz`) for direct image extraction from PDF files.
    - Introduced `get_extract_images_from_pdf_flag` to control this feature via `config.ini`.
    - `convert_pdf_to_markdown` and `refine_content` functions were updated to utilize this new image extraction method when enabled.
- **Refinement Flow:**
    - Adjusted the order of `save_md_images` in `main.py` and added an option to save the refined markdown with a specific filename (`index_refined.md`).
- **Dependencies:**
    - Updated `pyproject.lock` to include new dependencies for Ollama integration (`langchain-ollama`) and PyMuPDF (`PyMuPDF`), along with platform-specific markers for NVIDIA dependencies.
2025-11-11 22:35:23 +11:00
e05c15db16 u 2025-11-07 04:03:57 +11:00
40ff3756a5 update: README 2025-10-30 05:14:51 +11:00
3eef042111 refactor(app): Extract PDF conversion logic into a separate module
The main.py script was becoming monolithic, containing all the logic for PDF conversion, image path simplification, and content refinement. This change extracts these core functionalities into a new `pdf_convertor` module.

This refactoring improves the project structure by:
- Enhancing modularity and separation of concerns.
- Making the main.py script a cleaner, high-level orchestrator.
- Improving code readability and maintainability.

The functions `convert_pdf_to_markdown`, `save_md_images`, and `refine_content` are now imported from the `pdf_convertor` module and called from the main execution block.
2025-10-27 20:02:02 +11:00