Conversation

@chrisonntag

Description

The current page-break handling on divs recognizes only the deprecated page-break-after: always declaration. However, modern browsers and CSS generators use break-after: page.

This PR extends the current logic with a more detailed regex that matches both the deprecated page-break-after: always and the newer break-after: page, tolerating whitespace variations and an optional !important flag.
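As a rough illustration of what the new matching accepts and rejects, here is a minimal standalone sketch. The two patterns are copied from this PR's diff; the helper name has_page_break and the sample style strings are hypothetical and exist only for this example.

```python
import re

# Patterns as they appear in this PR's diff.
CSS2_PATTERN = r'page-break-after\s*:\s*always\s*(!important)?\s*(;|$)'
CSS3_PATTERN = r'break-after\s*:\s*page\s*(!important)?\s*(;|$)'


def has_page_break(style):
    """Hypothetical helper: True if the inline style requests a page break."""
    return bool(re.search(CSS2_PATTERN, style) or re.search(CSS3_PATTERN, style))


# Illustrative inline styles (not taken from the project's test suite).
samples = [
    "page-break-after: always",                  # deprecated CSS2 form
    "break-after:page",                          # CSS3 form, no whitespace
    "break-after : page !important;",            # extra whitespace and !important
    "color: red; break-after: page; margin: 0",  # declaration in the middle
    "break-after: avoid",                        # different value, should not match
    "page-break-after: auto",                    # different value, should not match
]

for sample in samples:
    print(f"{sample!r}: {has_page_break(sample)}")
```

Note that the patterns are case-sensitive as written, so an upper-case declaration would not be recognized by this sketch.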

Checklist Before Requesting a Review

  • I have performed a self-review of my code.
  • My code follows the project's coding style and guidelines.
  • I have run tests and verified that all existing and new tests pass.
  • I have added new tests to cover my changes.

Test issues

I've run all existing and new tests, but the following test fails on both this branch and the main branch:

======================================================================
FAIL: test_local_img (test_h4d.OutputTest.test_local_img)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/christoph/projects/html4docx/tests/test_h4d.py", line 388, in test_local_img
    assert image_found, "No image was found in the document"
           ^^^^^^^^^^^
AssertionError: No image was found in the document

----------------------------------------------------------------------
Ran 84 tests in 6.770s

FAILED (failures=1)

Copilot AI and others added 4 commits February 4, 2026 15:08
Co-authored-by: chrisonntag <5236298+chrisonntag@users.noreply.github.com>
Co-authored-by: chrisonntag <5236298+chrisonntag@users.noreply.github.com>
…itions

Support CSS3 break-after:page alongside deprecated CSS2 page-break-after:always
@dfop02 dfop02 self-requested a review February 10, 2026 17:57
@dfop02 dfop02 self-assigned this Feb 10, 2026
@dfop02 dfop02 added the bug Something isn't working label Feb 10, 2026
Comment on lines +1344 to +1351
if 'style' in current_attrs:
    style = current_attrs['style']
    # Match CSS2 page-break-after: always or CSS3 break-after: page
    # Using regex to ensure we match the exact property-value pairs
    # Also handles optional !important flag
    if re.search(r'page-break-after\s*:\s*always\s*(!important)?\s*(;|$)', style) or \
       re.search(r'break-after\s*:\s*page\s*(!important)?\s*(;|$)', style):
        self.doc.add_page_break()
Owner

First of all, thanks for your support and contribution!
I think it's a good idea to move the regex patterns to constants.py, compile them once, and just call them from h4d.py. This way, we avoid recompiling the patterns on every execution, and the logic becomes cleaner and easier to read. What do you think, do you agree?

constants.py
import re

PAGE_BREAK_REGEXES = (
    re.compile(r'page-break-after\s*:\s*always\s*(?:!important)?\s*(?:;|$)'),
    re.compile(r'break-after\s*:\s*page\s*(?:!important)?\s*(?:;|$)'),
)

h4d.py
if 'style' in current_attrs:
    style = current_attrs['style']
    # Match CSS2 page-break-after: always or CSS3 break-after: page
    # Using regex to ensure we match the exact property-value pairs
    # Also handles optional !important flag
    if style and any(regex.search(style) for regex in constants.PAGE_BREAK_REGEXES):
        self.doc.add_page_break()


3 participants