find_tables() with layout enabled can return a zero-cell Table, and Table.bbox then raises "ValueError: min() iterable argument is empty" #5030

@armujahid

Description of the bug

With the layout feature enabled (import pymupdf.layout + page.get_layout()), Page.find_tables() can return a Table whose .cells is an empty list. Reading the public Table.bbox property then computes min(map(itemgetter(0), c)) over the empty cell list and raises ValueError: min() iterable argument is empty.

Because Table.bbox is part of the public API — and pymupdf4llm's to_markdown dereferences t.bbox for every detected table — a single zero-cell "phantom" table aborts the whole run on otherwise-valid PDFs (typically image-heavy / scanned slides that go through the layout path).

How to reproduce the bug

Minimal reproducible example (in-memory, no files). Requires the layout package: pip install pymupdf pymupdf-layout.

import pymupdf
import pymupdf.layout  # enables layout-aware table detection

# Eight short text fragments scattered like an OCR'd slide. The layout model
# reads the region as a table, but the grid finder extracts no cells from it.
PLACEMENTS = [
    (84, 620, "Cost", 10),  (214, 280, "Net", 12),
    (88, 505, "12%", 9),    (213, 378, "Margin", 11),
    (130, 245, "Margin", 10), (373, 156, "South", 8),
    (67, 222, "North", 11), (140, 475, "3.4", 11),
]

doc = pymupdf.open()
page = doc.new_page()  # default A4
for x, y, text, size in PLACEMENTS:
    page.insert_text((x, y), text, fontsize=size)

page.get_layout()
tables = page.find_tables()
print("tables found:", len(tables.tables))
for t in tables.tables:
    print("cells:", len(t.cells))
    print("bbox:", t.bbox)   # <-- raises ValueError for the zero-cell table

Actual output / traceback (pymupdf 1.27.2.3, pymupdf-layout 1.27.2.3):

tables found: 1
cells: 0
Traceback (most recent call last):
  ...
  File ".../pymupdf/table.py", line 1534, in bbox
    min(map(itemgetter(0), c)),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: min() iterable argument is empty

Cause. Table.bbox in pymupdf/table.py reduces over the cell list without guarding for an empty one:

@property
def bbox(self):
    c = self.cells
    return (
        min(map(itemgetter(0), c)),  # ValueError when c == []
        min(map(itemgetter(1), c)),
        max(map(itemgetter(2), c)),
        max(map(itemgetter(3), c)),
    )

find_tables() can append Table(page, cells=[]) when the layout model tags a region as a table but the grid finder extracts no cells from it. bbox does not handle that case.
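
The failure does not depend on PyMuPDF itself: the same reduction over an empty list fails the same way. A minimal sketch using plain (x0, y0, x1, y1) tuples in place of cells; `bbox_of` is a hypothetical stand-in that mirrors the property body above, not library code. The exact wording of the error message can differ between Python versions.

```python
from operator import itemgetter


def bbox_of(cells):
    """Hypothetical stand-in mirroring the reduction in Table.bbox."""
    return (
        min(map(itemgetter(0), cells)),
        min(map(itemgetter(1), cells)),
        max(map(itemgetter(2), cells)),
        max(map(itemgetter(3), cells)),
    )


# Two adjacent cells: the reduction yields the rectangle enclosing both.
print(bbox_of([(10, 20, 50, 40), (50, 20, 90, 40)]))  # (10, 20, 90, 40)

# An empty cell list: min() has nothing to reduce over and raises.
try:
    bbox_of([])
except ValueError as exc:
    print("ValueError:", exc)
```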

Expected behaviour. Either find_tables() should not emit zero-cell tables, or Table.bbox should return a degenerate/empty rect (e.g. (0, 0, 0, 0)) for a cell-less table instead of raising — so that iterating detected tables and reading .bbox (as pymupdf4llm does) is safe.
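
In the meantime, calling code can avoid the crash by checking .cells before reading .bbox. The sketch below assumes only that each table object exposes .cells and .bbox as described in this report; `tables_with_cells` is a hypothetical helper name, and `FakeTable` exists purely so the example runs without PyMuPDF installed.

```python
from operator import itemgetter


class FakeTable:
    """Stand-in for a detected table; only .cells and .bbox are modelled."""

    def __init__(self, cells):
        self.cells = cells

    @property
    def bbox(self):
        # Same unguarded reduction as the property shown in this report.
        c = self.cells
        return (
            min(map(itemgetter(0), c)),
            min(map(itemgetter(1), c)),
            max(map(itemgetter(2), c)),
            max(map(itemgetter(3), c)),
        )


def tables_with_cells(tables):
    """Yield only tables that have at least one cell (hypothetical helper)."""
    for t in tables:
        if t.cells:
            yield t


detected = [FakeTable([]), FakeTable([(0, 0, 10, 10), (10, 0, 20, 10)])]
for t in tables_with_cells(detected):
    print(t.bbox)  # only reached for the table that has cells
```

With real results, the same helper would presumably be applied to page.find_tables().tables before any .bbox access.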

Potential fixes, in increasing order of "correctness":

  1. Guard Table.bbox — return a degenerate rect when there are no cells, so the property never reduces over an empty sequence:

    @property
    def bbox(self):
        c = self.cells
        if not c:
            return (0.0, 0.0, 0.0, 0.0)
        return (
            min(map(itemgetter(0), c)),
            min(map(itemgetter(1), c)),
            max(map(itemgetter(2), c)),
            max(map(itemgetter(3), c)),
        )

    Smallest change; the zero-cell table still exists but its bbox is safe.

  2. Don't emit zero-cell tables from find_tables() — skip appending Table(page, cells=[]) when the grid finder extracts no cells from a layout-tagged region. A table with no cells carries no information, and this avoids having to special-case every cell-reducing accessor downstream. This looks like the cleaner fix.
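
Option 2 amounts to a filter at the point where tables are constructed. The sketch below is hypothetical: `build_tables`, `candidate_cell_lists` and `make_table` are names invented for illustration, and the real find_tables() internals may be organized differently.

```python
def build_tables(candidate_cell_lists, make_table):
    """Build one table per candidate region, skipping regions with no cells.

    Hypothetical helper: `make_table` stands in for the Table constructor.
    """
    tables = []
    for cells in candidate_cell_lists:
        if not cells:
            # Region was tagged as a table, but no cells were extracted.
            continue
        tables.append(make_table(cells))
    return tables


# Stand-in constructor so the sketch runs without PyMuPDF.
candidates = [[], [(0, 0, 10, 10)], []]
tables = build_tables(candidates, make_table=lambda cells: {"cells": cells})
print(len(tables))  # 1
```

Filtering here keeps the invariant "every emitted table has at least one cell" in one place, so accessors that reduce over cells need no special case.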

Happy to open a PR for whichever approach you prefer.

Also reproduces on PyMuPDF 1.27.2 (with pymupdf-layout 1.27.2), in addition to the latest 1.27.2.3 selected below.

Related issues — prior reports of the same crash, all closed without a fix for this code path:

PyMuPDF version

1.27.2.3

Operating system

Linux

Python version

3.12

    Labels

    postpone (postpone to a future version)
