slide-translate

Author	SHA1	Message	Date
nite	e8fa2617ba	feat: Update image handling and refine AI prompt instructions Refactor image data passing in `pdf_convertor.py` to use a direct base64 and mime_type format, aligning with updated API requirements for vision models. Additionally, the `pdf_convertor_prompt.md` has been significantly refined to improve the clarity and specificity of instructions for the AI model, particularly concerning: - Image Content Explanation: Added detailed rules to ensure the AI only processes existing image references, preserves paths, and focuses on descriptive text. - Mathematical Formulas: Clarified conversion to LaTeX notation. - Heading Structure: Enhanced rules and examples for adjusting heading levels and merging adjacent or duplicate headings to ensure logical document flow.	2025-11-12 18:05:24 +11:00
nite	1a867844ce	feat: Introduce OpenAI LLM provider and update API key handling This commit integrates OpenAI as a new Large Language Model (LLM) provider, expanding the available options for content refinement. Key changes include: - Added `set_openai_api_key` to handle OpenAI API key retrieval from `config.ini` or environment variables. - Modified `set_api_key` to dynamically read the LLM provider from `config.ini`	2025-11-12 02:51:18 +11:00
nite	ae7c579580	feat: Improve content refinement with SystemMessage and prompt updates This commit refactors the content refinement process to leverage `SystemMessage` for the primary prompt, enhancing clarity and adherence to LLM best practices. The `pdf_convertor.py` file was updated to: - Import `SystemMessage` from `langchain_core.messages`. - Modify the `refine_content` function to use `SystemMessage` for the main prompt, moving the prompt content from `human_message_parts`. - Adjust `human_message_parts` to only contain the Markdown and image data for the `HumanMessage`. The `pdf_convertor_prompt.md` file was updated to: - Reformat the prompt with clearer headings and instructions for each task. - Improve the clarity and conciseness of the instructions for cleaning up characters, explaining image content, and correcting list formatting. Additionally, `.gitignore` was updated to include `.vscode/` to prevent IDE-specific files from being committed. These changes improve the structure of the LLM interaction and make the prompt more readable and maintainable.	2025-11-11 23:39:47 +11:00
nite	26951b8bc0	feat(llm): Add Ollama provider and PyMuPDF image extraction This commit introduces support for Ollama as an alternative Large Language Model (LLM) provider and enhances PDF image extraction capabilities. - Ollama Integration: - Implemented `set_ollama_config` to configure Ollama's base URL from `config.ini`. - Modified `llm.py` to dynamically select and configure the LLM (Gemini or Ollama) based on the `PROVIDER` setting. - Updated `get_model_name` to return provider-specific default model names. - `pdf_convertor.py` now conditionally initializes `ChatGoogleGenerativeAI` or `ChatOllama` based on the configured provider. - PyMuPDF Image Extraction: - Added a new `extract_images_from_pdf` function using PyMuPDF (`fitz`) for direct image extraction from PDF files. - Introduced `get_extract_images_from_pdf_flag` to control this feature via `config.ini`. - `convert_pdf_to_markdown` and `refine_content` functions were updated to utilize this new image extraction method when enabled. - Refinement Flow: - Adjusted the order of `save_md_images` in `main.py` and added an option to save the refined markdown with a specific filename (`index_refined.md`). - Dependencies: - Updated `pyproject.lock` to include new dependencies for Ollama integration (`langchain-ollama`) and PyMuPDF (`PyMuPDF`), along with platform-specific markers for NVIDIA dependencies.	2025-11-11 22:35:23 +11:00
nite	e05c15db16	u	2025-11-07 04:03:57 +11:00
nite	40ff3756a5	update: README	2025-10-30 05:14:51 +11:00
nite	3eef042111	refactor(app): Extract PDF conversion logic into a separate module The main.py script was becoming monolithic, containing all the logic for PDF conversion, image path simplification, and content refinement. This change extracts these core functionalities into a new `pdf_convertor` module. This refactoring improves the project structure by: - Enhancing modularity and separation of concerns. - Making the main.py script a cleaner, high-level orchestrator. - Improving code readability and maintainability. The functions `convert_pdf_to_markdown`, `save_md_images`, and `refine_content` are now imported from the `pdf_convertor` module and called from the main execution block.	2025-10-27 20:02:02 +11:00

7 Commits
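
Commits 26951b8bc0 and 1a867844ce describe reading the LLM provider from `config.ini` and retrieving the OpenAI API key from `config.ini` or environment variables. The following is a minimal sketch of that kind of lookup, not the repository's code. The `[llm]` section, the `PROVIDER` and `OPENAI_API_KEY` key names, the `gemini` default, and the config-before-environment precedence are all assumptions made for illustration; the log does not show the actual layout of `config.ini`.

```python
import configparser
import os


def _load(config_text):
    """Parse INI text. Interpolation is disabled so '%' in a key is kept literally."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser


def read_provider(config_text, default="gemini"):
    """Return the lowercase provider name.

    The [llm] section and the "gemini" default are hypothetical.
    """
    parser = _load(config_text)
    return parser.get("llm", "PROVIDER", fallback=default).strip().lower()


def resolve_api_key(config_text, env=None, key_name="OPENAI_API_KEY"):
    """Look for the key in the config first, then in the environment.

    The precedence (config before environment) is an assumption.
    """
    env = os.environ if env is None else env
    parser = _load(config_text)
    value = parser.get("llm", key_name, fallback="").strip()
    if value:
        return value
    value = env.get(key_name, "").strip()
    if not value:
        raise RuntimeError(f"{key_name} not found in config or environment")
    return value
```

Passing the config as text and the environment as a plain mapping keeps both functions easy to exercise without touching real files or real credentials.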