fix(high_level): close pymupdf Document handles to prevent file lock … #597

Open
BlackSharkJ wants to merge 2 commits into funstory-ai:main from BlackSharkJ:fix/close-document-handles

BlackSharkJ commented Jun 13, 2026

PR Title

[PR] fix: close pymupdf Document handles to prevent file lock on Windows

Related Issue(s)

N/A

Motivation and Context

When BabelDOC's Python API is used to batch-translate multiple PDFs in a loop (e.g., translating a folder of papers and moving each source file to a done/ directory once it is finished), the original PDF stays locked on Windows after translation finishes.

Attempting to shutil.move() or os.unlink() the source PDF raises:

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process.

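Root cause: several places in high_level.py open pymupdf.Document objects but never call close(). Each file handle is then released only when Python eventually finalizes the Document, and on Windows that does not happen promptly; while the handle is open, Windows refuses to move or delete the file.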
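Why a handle can outlive its last use: if the object is kept alive by something the caller cannot see (a reference cycle, a cache, a traceback), reference counting does not free it, and the handle stays open until the cyclic garbage collector happens to run. The sketch below illustrates this with a hypothetical `DocLike` stand-in that holds a real file handle; it does not claim this is the exact mechanism inside PyMuPDF. On Windows, a file opened with Python's default `open()` typically cannot be deleted or renamed while such a handle is open.

```python
import gc
import os
import tempfile


class DocLike:
    """Hypothetical stand-in for a document object that owns an OS file handle.

    Not PyMuPDF: it only models "handle open until close()" plus a finalizer
    that closes the handle whenever the object is finally collected.
    """

    def __init__(self, path):
        self._fh = open(path, "rb")  # OS-level handle, held until close()
        self._self_ref = self  # reference cycle: refcounting alone cannot free it

    def close(self):
        self._fh.close()  # closing an already-closed file is harmless

    def __del__(self):
        self.close()  # release on collection, whenever that happens

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()  # runs on normal exit and when an exception propagates
        return False  # do not swallow exceptions


def demo(path):
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cyclic collector from running at an arbitrary moment
    try:
        doc = DocLike(path)
        fh = doc._fh  # observe the handle without keeping the document alive
        del doc  # no names left, but the cycle keeps object and handle alive
        open_after_del = not fh.closed
        gc.collect()  # only the cyclic collector finalizes it
        closed_after_gc = fh.closed

        with DocLike(path) as doc2:
            fh2 = doc2._fh
        closed_after_with = fh2.closed  # closed as soon as the block exits
    finally:
        if was_enabled:
            gc.enable()
    return open_after_del, closed_after_gc, closed_after_with


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)
    try:
        print(demo(path))  # (True, True, True)
    finally:
        os.remove(path)
```

The `with` block removes the dependency on collector timing: the handle is released at a known point, regardless of what else still references the object.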

This goes unnoticed with the CLI (babeldoc --files) because the process exits after translation and the operating system releases all of its handles. It does affect any programmatic usage that needs to manipulate the source file afterward. For example, a common workflow is to loop through a directory of PDFs, translate each one, and move each completed file to a done/ folder to avoid reprocessing it; on Windows this pattern breaks without the fix.

Summary of Changes

Wrap all Document / pymupdf.open() calls in babeldoc/format/pdf/high_level.py with context managers (with statements) to ensure deterministic file handle cleanup:

  • check_metadata(Document(...)) call in do_translate()
  • old_doc and new_doc in migrate_toc()
  • pymupdf.open() in add_metadata()
  • pymupdf.open() in fix_cmap()
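Each of the call sites above follows the same refactor shape. A minimal sketch, assuming a hypothetical `Document` stand-in that models only the `close()` / `is_closed` / context-manager protocol of `pymupdf.Document`, and a stubbed `check_metadata`; it is not the actual high_level.py diff.

```python
import os
import tempfile


class Document:
    """Hypothetical stand-in for pymupdf.Document: models only close() and `with`."""

    registry = []  # a lingering reference, like a cache the caller cannot see

    def __init__(self, path):
        self._fh = open(path, "rb")
        Document.registry.append(self)

    @property
    def is_closed(self):
        return self._fh.closed

    def close(self):
        self._fh.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


def check_metadata(doc):
    """Hypothetical stub for the real check; it only needs an open document."""
    return not doc.is_closed


def check_before(path):
    # Before: the temporary Document is never closed by the caller.
    return check_metadata(Document(path))


def check_after(path):
    # After: closed as soon as check_metadata returns (or raises).
    with Document(path) as doc:
        return check_metadata(doc)


def migrate_toc_after(old_path, new_path):
    # migrate_toc()-style: one with statement manages both documents and
    # closes them in reverse order of opening when the block exits.
    with Document(old_path) as old_doc, Document(new_path) as new_doc:
        return not (old_doc.is_closed or new_doc.is_closed)
```

Managing both documents from one `with` statement also means that if opening `new_doc` fails, `old_doc` is still closed.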

PR Type

  • ✨ Enhancement
  • 🐛 Bug Fix
  • 📚 Documentation
  • 🏗️ Refactor
  • 🧪 Test
  • 🧹 Chore

Breaking Changes

No, this PR does not introduce breaking changes.

Contributor Checklist

  • I have fully read and understood the CONTRIBUTING.md guide.
  • I have performed a self-review of my own code.
  • My changes follow the project's code style and guidelines
  • I have linked the related issue(s) in the description above (if applicable)
  • I have updated relevant documentation (if applicable)
  • I have added necessary tests that prove my fix is effective or that my feature works (if applicable)
  • All new and existing tests passed locally with my changes
  • My changes generate no new warnings or errors
  • I understand that due to limited maintainer resources, only small PRs are accepted. Suggestions with proof-of-concept patches are appreciated, and my patch may be rewritten if necessary.

Testing Instructions

  1. Prepare a Python script that uses BabelDOC's API to translate 2+ PDFs in a loop, and after each translation, attempts to shutil.move() the source PDF to another directory.
  2. Run the script on Windows.
  3. Before this fix: the shutil.move() call after the first translation raises PermissionError: [WinError 32] because the source PDF's handle is still held, so the loop never reaches the second file.
  4. After this fix: All files translate successfully and are moved without error.
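The move-based test only fails on Windows. A complementary check that also runs on Linux and macOS is to record every document the code under test opens and assert that all of them are closed once it returns. A minimal sketch, assuming the opener can be injected or patched (e.g. with `unittest.mock.patch` on whatever name the module under test calls); `FakeDoc`, `OpenRecorder`, and the `process_*` functions are hypothetical.

```python
import os
import tempfile


class FakeDoc:
    """Hypothetical Document-like object; only is_closed/close()/with matter."""

    def __init__(self, path):
        self._fh = open(path, "rb")

    @property
    def is_closed(self):
        return self._fh.closed

    def close(self):
        self._fh.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


class OpenRecorder:
    """Wraps an opener and remembers every object it returns."""

    def __init__(self, opener):
        self._opener = opener
        self.created = []

    def __call__(self, *args, **kwargs):
        doc = self._opener(*args, **kwargs)
        self.created.append(doc)
        return doc

    def leaked(self):
        # Anything still open after the code under test returned is a leak.
        return [d for d in self.created if not d.is_closed]


def process_leaky(path, open_doc):
    doc = open_doc(path)  # opened but never closed
    return doc is not None


def process_fixed(path, open_doc):
    with open_doc(path) as doc:
        return doc is not None
```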

Example reproduction script:

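import asyncio
import shutil
from pathlib import Path

import babeldoc.format.pdf.high_level as hl
from babeldoc.format.pdf.translation_config import TranslationConfig
from babeldoc.translator.translator import OpenAITranslator, set_translate_rate_limiter

hl.init()
translator = OpenAITranslator(lang_in="en", lang_out="zh", model="gpt-4o-mini",
                              base_url="...", api_key="...")
set_translate_rate_limiter(4)


async def main():
    done = Path("done")
    done.mkdir(exist_ok=True)  # shutil.move() needs the target directory to exist
    # list() snapshots the directory so moving files cannot disturb the iteration
    for pdf in list(Path("input").glob("*.pdf")):
        config = TranslationConfig(input_file=str(pdf), translator=translator)  # ... other options
        async for event in hl.async_translate(config):
            pass
        shutil.move(str(pdf), str(done / pdf.name))  # fails on Windows without this fix


# `async for` is only valid inside a coroutine, so drive the loop with asyncio.run()
asyncio.run(main())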

Additional Notes

  • The CLI path is unaffected since the process exits after translation. This fix only matters for programmatic/API usage.
  • The with statement ensures Document.close() is called deterministically, which is the recommended pattern in PyMuPDF's documentation.

Summary by cubic

Closes pymupdf Document handles in babeldoc/format/pdf/high_level.py to prevent locked PDFs on Windows after translation, fixing PermissionError: [WinError 32] when moving or deleting source files in API workflows. Also corrects TOC page selection in migrate_toc() when only_include_translated_page is enabled.

  • Bug Fixes
    • Wrapped all pymupdf.open()/Document(...) calls in with blocks in do_translate(), migrate_toc(), add_metadata(), and fix_cmap() for deterministic cleanup.
    • Fixed migrate_toc() page filtering by using range(len(old_doc)) when building pages_to_translate.
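The summary does not include the diff, so the following is only a hypothetical illustration of why the page range matters. Assuming that with only_include_translated_page the output keeps only the selected pages, iterating over the output's (shorter) page count misses selected pages near the end, while iterating over `range(len(old_doc))` finds them all. `pages_to_translate`, `remap_toc`, and the sample data are made up for this sketch; the TOC entries use the `[level, title, page]` shape (1-based pages) returned by PyMuPDF's `Document.get_toc(simple=True)`.

```python
def pages_to_translate(page_count, selected):
    """0-based indices in range(page_count) that were chosen for translation."""
    return [i for i in range(page_count) if i in selected]


def remap_toc(toc, kept_pages):
    """Keep TOC entries that point at kept pages and renumber them (1-based)."""
    new_index = {old: new for new, old in enumerate(kept_pages)}
    return [
        [level, title, new_index[page - 1] + 1]
        for level, title, page in toc
        if page - 1 in new_index
    ]


if __name__ == "__main__":
    old_page_count = 10  # pages in the source document, i.e. len(old_doc)
    selected = {0, 7, 9}  # hypothetical pages chosen for translation
    new_page_count = len(selected)  # assumption: output keeps only these pages

    print(pages_to_translate(new_page_count, selected))  # [0]: pages 7 and 9 missed
    kept = pages_to_translate(old_page_count, selected)
    print(kept)  # [0, 7, 9]
    toc = [[1, "Intro", 1], [1, "Method", 8], [2, "Details", 10], [1, "Appendix", 5]]
    print(remap_toc(toc, kept))  # [[1, 'Intro', 1], [1, 'Method', 2], [2, 'Details', 3]]
```

A real implementation would also have to keep the level sequence valid when a parent entry is dropped, since PyMuPDF's `set_toc()` validates the hierarchy levels it is given.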

Written for commit daba33f.

cubic-dev-ai (bot) left a comment

1 issue found across 1 file

Comment thread on babeldoc/format/pdf/high_level.py (outdated)
fix(high_level): This bug was introduced by awwaawwa in commit 25205ab (2025-06-20).

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>