This repository was archived by the owner on Apr 11, 2025. It is now read-only.

Commit db01b68

[MRG] Utils: optimise get_page_layout

The existing code overwrites `layout` and `dim` on every iteration, so it is much more efficient to simply return the `layout` and `dim` of the first page. I tested the difference with a 455-page PDF: the optimisation reduces the time spent from 50 to 5 seconds.

Signed-off-by: Karl Bonde Torp <k.torp@samsung.com>
1 parent 644bbe7 commit db01b68
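
The idea in the commit message can be illustrated without pdfminer. The following is a minimal sketch, not the camelot code: `create_pages` and `process_page` are hypothetical stand-ins for `PDFPage.create_pages` and the interpreter step, and the page dimensions are made up. It shows that looping over a lazy page generator does work for every page, while `next(iterator, None)` pulls only the first page and stops.

```python
from typing import Iterator, List, Tuple


def create_pages(n_pages: int) -> Iterator[int]:
    """Hypothetical stand-in for PDFPage.create_pages: lazily yields page numbers."""
    for number in range(1, n_pages + 1):
        yield number


def process_page(page: int, calls: List[int]) -> Tuple[float, float]:
    """Hypothetical stand-in for the expensive per-page interpretation step."""
    calls.append(page)  # record that this page was processed
    return (612.0, 792.0)  # made-up dimensions, identical for every page


def dim_via_loop(n_pages: int, calls: List[int]) -> Tuple[float, float]:
    """Old pattern: process every page, keeping only the last result."""
    dim = (0.0, 0.0)
    for page in create_pages(n_pages):
        dim = process_page(page, calls)  # overwritten on each iteration
    return dim


def dim_via_next(n_pages: int, calls: List[int]) -> Tuple[float, float]:
    """New pattern: take the first page from the generator and stop."""
    page = next(create_pages(n_pages), None)  # None if there are no pages
    if page is None:
        raise ValueError("document has no pages")
    return process_page(page, calls)


loop_calls: List[int] = []
next_calls: List[int] = []
loop_dim = dim_via_loop(5, loop_calls)
next_dim = dim_via_next(5, next_calls)
print(len(loop_calls), len(next_calls))
```

In this sketch both variants return the same dimensions only because every stand-in page has the same size. In general the loop keeps the result of the last page and `next` keeps the result of the first, so the two agree when the document has a single page or uniform pages.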

camelot/utils.py

Lines changed: 8 additions & 6 deletions
@@ -889,12 +889,14 @@ def get_page_layout(
         rsrcmgr = PDFResourceManager()
         device = PDFPageAggregator(rsrcmgr, laparams=laparams)
         interpreter = PDFPageInterpreter(rsrcmgr, device)
-        for page in PDFPage.create_pages(document):
-            interpreter.process_page(page)
-            layout = device.get_result()
-            width = layout.bbox[2]
-            height = layout.bbox[3]
-            dim = (width, height)
+        page = next(PDFPage.create_pages(document), None)
+        if page is None:
+            raise PDFTextExtractionNotAllowed
+        interpreter.process_page(page)
+        layout = device.get_result()
+        width = layout.bbox[2]
+        height = layout.bbox[3]
+        dim = (width, height)
         return layout, dim
 
 