slide-translate

Author	SHA1	Message	Date
nite	f1214be148	feat(llm-integration): Enhance prompt clarity and unify PDF attachment This commit improves the structure and clarity of the prompt sent to the LLM (Gemini/OpenAI) in the `refine_content` function. Changes include: * Adding explicit introductory text for the Markdown, individual images, and PDF sections to guide the LLM on the purpose of each input. * Introducing clear "START OF IMAGE" and "END OF IMAGE" delimiters for each image to better define their boundaries. * Unifying the PDF attachment mechanism for both Gemini and OpenAI providers, simplifying the code and ensuring consistent handling of PDF input. These changes aim to improve the LLM's understanding of the provided content, leading to more accurate and relevant refinements.	2025-11-12 19:14:19 +11:00
nite	0e4a609c93	docs: Clarify image processing rules in PDF conversion prompt Refine the image processing instructions within the PDF conversion prompt to emphasize the critical importance of matching image descriptions to their exact filenames. The previous instructions were ambiguous and could lead to incorrect image descriptions. This update adds: - A "Critical" warning to match image names correctly. - Detailed rules outlining how to process image references based on provided filenames. - An example workflow to illustrate the correct matching process. - A new "Critical" verification step in the final instructions to ensure image explanations correspond to their filenames. This change aims to prevent errors where image descriptions might be mismatched or generated from the wrong image content, ensuring higher accuracy in the conversion process.	2025-11-12 18:42:59 +11:00
nite	e8fa2617ba	feat: Update image handling and refine AI prompt instructions Refactor image data passing in `pdf_convertor.py` to use a direct base64 and mime_type format, aligning with updated API requirements for vision models. Additionally, the `pdf_convertor_prompt.md` has been significantly refined to improve the clarity and specificity of instructions for the AI model, particularly concerning: - Image Content Explanation: Added detailed rules to ensure the AI only processes existing image references, preserves paths, and focuses on descriptive text. - Mathematical Formulas: Clarified conversion to LaTeX notation. - Heading Structure: Enhanced rules and examples for adjusting heading levels and merging adjacent or duplicate headings to ensure logical document flow.	2025-11-12 18:05:24 +11:00
nite	3b62c0f478	mod README	2025-11-12 03:22:50 +11:00
nite	1a867844ce	feat: Introduce OpenAI LLM provider and update API key handling This commit integrates OpenAI as a new Large Language Model (LLM) provider, expanding the available options for content refinement. Key changes include: - Added `set_openai_api_key` to handle OpenAI API key retrieval from `config.ini` or environment variables. - Modified `set_api_key` to dynamically read the LLM provider from `config.ini`	2025-11-12 02:51:18 +11:00
nite	ae7c579580	feat: Improve content refinement with SystemMessage and prompt updates This commit refactors the content refinement process to leverage `SystemMessage` for the primary prompt, enhancing clarity and adherence to LLM best practices. The `pdf_convertor.py` file was updated to: - Import `SystemMessage` from `langchain_core.messages`. - Modify the `refine_content` function to use `SystemMessage` for the main prompt, moving the prompt content from `human_message_parts`. - Adjust `human_message_parts` to only contain the Markdown and image data for the `HumanMessage`. The `pdf_convertor_prompt.md` file was updated to: - Reformat the prompt with clearer headings and instructions for each task. - Improve the clarity and conciseness of the instructions for cleaning up characters, explaining image content, and correcting list formatting. Additionally, `.gitignore` was updated to include `.vscode/` to prevent IDE-specific files from being committed. These changes improve the structure of the LLM interaction and make the prompt more readable and maintainable.	2025-11-11 23:39:47 +11:00
nite	26951b8bc0	feat(llm): Add Ollama provider and PyMuPDF image extraction This commit introduces support for Ollama as an alternative Large Language Model (LLM) provider and enhances PDF image extraction capabilities. - Ollama Integration: - Implemented `set_ollama_config` to configure Ollama's base URL from `config.ini`. - Modified `llm.py` to dynamically select and configure the LLM (Gemini or Ollama) based on the `PROVIDER` setting. - Updated `get_model_name` to return provider-specific default model names. - `pdf_convertor.py` now conditionally initializes `ChatGoogleGenerativeAI` or `ChatOllama` based on the configured provider. - PyMuPDF Image Extraction: - Added a new `extract_images_from_pdf` function using PyMuPDF (`fitz`) for direct image extraction from PDF files. - Introduced `get_extract_images_from_pdf_flag` to control this feature via `config.ini`. - `convert_pdf_to_markdown` and `refine_content` functions were updated to utilize this new image extraction method when enabled. - Refinement Flow: - Adjusted the order of `save_md_images` in `main.py` and added an option to save the refined markdown with a specific filename (`index_refined.md`). - Dependencies: - Updated `pyproject.lock` to include new dependencies for Ollama integration (`langchain-ollama`) and PyMuPDF (`PyMuPDF`), along with platform-specific markers for NVIDIA dependencies.	2025-11-11 22:35:23 +11:00
nite	2c6c2c1078	improve prompt	2025-11-10 00:21:18 +11:00
nite	e05c15db16	u	2025-11-07 04:03:57 +11:00
nite	40ff3756a5	update: README	2025-10-30 05:14:51 +11:00
nite	3eef042111	refactor(app): Extract PDF conversion logic into a separate module The main.py script was becoming monolithic, containing all the logic for PDF conversion, image path simplification, and content refinement. This change extracts these core functionalities into a new `pdf_convertor` module. This refactoring improves the project structure by: - Enhancing modularity and separation of concerns. - Making the main.py script a cleaner, high-level orchestrator. - Improving code readability and maintainability. The functions `convert_pdf_to_markdown`, `save_md_images`, and `refine_content` are now imported from the `pdf_convertor` module and called from the main execution block.	2025-10-27 20:02:02 +11:00
nite	4f29d5c814	feat(llm): Send images to model and enhance processing prompt	2025-10-25 22:51:54 +11:00
nite	37d4facee3	feat: Enable batch processing of PDF files and update README	2025-10-22 20:56:17 +11:00
nite	ad212a35af	init	2025-10-22 17:10:29 +11:00

14 Commits
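
Commits f1214be148 and e8fa2617ba describe wrapping each image in "START OF IMAGE" / "END OF IMAGE" delimiters and passing image data as base64 plus a MIME type. The log does not show the code itself, so the following is only a minimal sketch of that idea. The function name `build_image_parts`, the exact delimiter wording, and the dictionary keys are assumptions for illustration, not the repository's implementation or any provider's required schema.

```python
import base64
import mimetypes


def build_image_parts(images):
    """Return a list of message parts for (filename, raw_bytes) pairs.

    Each image is wrapped in START/END text markers that repeat its filename,
    so a description can be matched to the exact file it belongs to.
    The part layout (type / mime_type / data) is hypothetical.
    """
    parts = []
    for filename, raw in images:
        # Guess the MIME type from the file extension; fall back to a generic type.
        mime_type, _ = mimetypes.guess_type(filename)
        if mime_type is None:
            mime_type = "application/octet-stream"

        parts.append({"type": "text", "text": f"--- START OF IMAGE: {filename} ---"})
        parts.append({
            "type": "image",
            "mime_type": mime_type,
            "data": base64.b64encode(raw).decode("ascii"),
        })
        parts.append({"type": "text", "text": f"--- END OF IMAGE: {filename} ---"})
    return parts
```

Repeating the filename in both markers is one way to support the rule from commit 0e4a609c93 that image descriptions must match their exact filenames; the real prompt may achieve this differently.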