Skip to content

fix: stop scraper from finishing ZIM when bad exception arises#55

Merged
benoit74 merged 1 commit intoopenzim:mainfrom
lumpy72006:stop-scraper-on-bad-exceptions
Feb 26, 2026
Merged

fix: stop scraper from finishing ZIM when bad exception arises#55
benoit74 merged 1 commit intoopenzim:mainfrom
lumpy72006:stop-scraper-on-bad-exceptions

Conversation

@lumpy72006
Copy link
Contributor

@lumpy72006 lumpy72006 commented Feb 21, 2026

fixes #54

Changes

  • Wrapped the ZIM creation context manager in a try...except Exception: block inside generator.py.

  • If an exception occurs, delete the incomplete ZIM file and
    bubble the exception to entrypoint and exit with code 1

  • Added test_generate_zim_cleans_up_on_failure to tests/test_generator.py to ensure the file is deleted and the exception is propagated when get_db fails.

@benoit74 benoit74 self-requested a review February 23, 2026 08:35
Copy link
Contributor

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the correct way to fix the issue.

Creator object already has required primitive to indicate ZIM must not be created (finish).

Please have a look at logic around can_finish at https://github.com/openzim/mindtouch/blob/d1909c9b225afc92b44140a5bf9ff5087f6558ec/scraper/src/mindtouch2zim/processor.py#L287-L305 for instance.

@lumpy72006
Copy link
Contributor Author

I didn't know the creator object had that, thank you. Let me know if this works

@lumpy72006 lumpy72006 requested a review from benoit74 February 23, 2026 10:24
@lumpy72006 lumpy72006 force-pushed the stop-scraper-on-bad-exceptions branch from d34ac1e to 35f6118 Compare February 23, 2026 10:28
@codecov
Copy link

codecov bot commented Feb 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.36%. Comparing base (3ddcae7) to head (0eb08cd).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #55      +/-   ##
==========================================
+ Coverage   92.28%   92.36%   +0.08%     
==========================================
  Files           4        4              
  Lines         376      380       +4     
  Branches       37       37              
==========================================
+ Hits          347      351       +4     
  Misses         25       25              
  Partials        4        4              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, please add an entry to the CHANGELOG.md and squash all your commits into a single one on top of main branch.

- Set `Creator` object `can_finish` to `False`.
- Add test coverage for failure cleanup.
@lumpy72006 lumpy72006 force-pushed the stop-scraper-on-bad-exceptions branch from 35f6118 to 0eb08cd Compare February 24, 2026 14:21
@lumpy72006
Copy link
Contributor Author

Done

@lumpy72006 lumpy72006 requested a review from benoit74 February 25, 2026 16:15
Copy link
Contributor

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@benoit74 benoit74 merged commit 159e681 into openzim:main Feb 26, 2026
8 checks passed
@lumpy72006 lumpy72006 deleted the stop-scraper-on-bad-exceptions branch February 26, 2026 08:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Scraper should not finish ZIM when bad exception arises

2 participants