Commit Graph

11 Commits

Author SHA1 Message Date
3b62c0f478 mod README 2025-11-12 03:22:50 +11:00
1a867844ce feat: Introduce OpenAI LLM provider and update API key handling
This commit integrates OpenAI as a new Large Language Model (LLM) provider,
expanding the available options for content refinement.

Key changes include:
- Added `set_openai_api_key` to handle OpenAI API key retrieval from
  `config.ini` or environment variables.
- Modified `set_api_key` to dynamically read the LLM provider from `config.ini`
2025-11-12 02:51:18 +11:00
ae7c579580 feat: Improve content refinement with SystemMessage and prompt updates
This commit refactors the content refinement process to leverage `SystemMessage` for the primary prompt, enhancing clarity and adherence to LLM best practices.

The `pdf_convertor.py` file was updated to:
- Import `SystemMessage` from `langchain_core.messages`.
- Modify the `refine_content` function to use `SystemMessage` for the main prompt, moving the prompt content from `human_message_parts`.
- Adjust `human_message_parts` to only contain the Markdown and image data for the `HumanMessage`.

The `pdf_convertor_prompt.md` file was updated to:
- Reformat the prompt with clearer headings and instructions for each task.
- Improve the clarity and conciseness of the instructions for cleaning up characters, explaining image content, and correcting list formatting.

Additionally, `.gitignore` was updated to include `.vscode/` to prevent IDE-specific files from being committed.

These changes improve the structure of the LLM interaction and make the prompt more readable and maintainable.
2025-11-11 23:39:47 +11:00
26951b8bc0 feat(llm): Add Ollama provider and PyMuPDF image extraction
This commit introduces support for Ollama as an alternative Large Language Model (LLM) provider and enhances PDF image extraction capabilities.

- **Ollama Integration:**
    - Implemented `set_ollama_config` to configure Ollama's base URL from `config.ini`.
    - Modified `llm.py` to dynamically select and configure the LLM (Gemini or Ollama) based on the `PROVIDER` setting.
    - Updated `get_model_name` to return provider-specific default model names.
    - `pdf_convertor.py` now conditionally initializes `ChatGoogleGenerativeAI` or `ChatOllama` based on the configured provider.
- **PyMuPDF Image Extraction:**
    - Added a new `extract_images_from_pdf` function using PyMuPDF (`fitz`) for direct image extraction from PDF files.
    - Introduced `get_extract_images_from_pdf_flag` to control this feature via `config.ini`.
    - `convert_pdf_to_markdown` and `refine_content` functions were updated to utilize this new image extraction method when enabled.
- **Refinement Flow:**
    - Adjusted the order of `save_md_images` in `main.py` and added an option to save the refined markdown with a specific filename (`index_refined.md`).
- **Dependencies:**
    - Updated `pyproject.lock` to include new dependencies for Ollama integration (`langchain-ollama`) and PyMuPDF (`PyMuPDF`), along with platform-specific markers for NVIDIA dependencies.
2025-11-11 22:35:23 +11:00
2c6c2c1078 improve prompt 2025-11-10 00:21:18 +11:00
e05c15db16 u 2025-11-07 04:03:57 +11:00
40ff3756a5 update: README 2025-10-30 05:14:51 +11:00
3eef042111 refactor(app): Extract PDF conversion logic into a separate module
The main.py script was becoming monolithic, containing all the logic for PDF conversion, image path simplification, and content refinement. This change extracts these core functionalities into a new `pdf_convertor` module.

This refactoring improves the project structure by:
- Enhancing modularity and separation of concerns.
- Making the main.py script a cleaner, high-level orchestrator.
- Improving code readability and maintainability.

The functions `convert_pdf_to_markdown`, `save_md_images`, and `refine_content` are now imported from the `pdf_convertor` module and called from the main execution block.
2025-10-27 20:02:02 +11:00
4f29d5c814 feat(llm): Send images to model and enhance processing prompt 2025-10-25 22:51:54 +11:00
37d4facee3 feat: Enable batch processing of PDF files and update README 2025-10-22 20:56:17 +11:00
ad212a35af init 2025-10-22 17:10:29 +11:00