Skip to content

[Security] Arbitrary Infinite Loop Denial of Service (DoS) via Crafted PDF Table of Contents #174

@YLChen-007

Description

@YLChen-007

Advisory Details

Title: Arbitrary Infinite Loop Denial of Service (DoS) via Crafted PDF Table of Contents

Description:

Summary

An unbounded while loop vulnerability in the toc_transformer function allows an unauthenticated attacker to cause a perpetual Denial of Service (DoS) and rapidly exhaust LLM API credits. By providing a PDF with an intentionally long Table of Contents, the system triggers length-truncated API responses that permanently trap the application into continuously querying the backend LLM API.

Details

The root cause resides in pageindex/page_index.py at line 303 within the toc_transformer() function. The application uses an LLM to structure a raw Table of Contents string into a hierarchical JSON format.
If the LLM's response hits the maximum output token limit (finish_reason == "length"), the application automatically attempts to instruct the model to "continue". Crucially, the while loop lacks any retry counter or iteration limits (unlike the correctly-patched extract_toc_content function which explicitly caps attempts to 5).

Consequently, if the model repeatedly truncates the JSON or rejects the completeness check, the execution falls into an inescapable infinite loop:

while not (if_complete == "yes" and finish_reason == "finished"):
    # ... rebuilds prompt and calls ChatGPT_API_with_finish_reason
    new_complete, finish_reason = ChatGPT_API_with_finish_reason(model=model, prompt=prompt)
    # ...
    if_complete = check_if_toc_transformation_is_complete(toc_content, last_complete, model)
    # NO ITERATION LIMIT OR BAILOUT CONDITION

PoC

  1. Generate an adversarial PDF with thousands of sections in the TOC (sufficiently large to cause the LLM to truncate output), or set up a Mock OpenAI proxy that forcibly returns finish_reason: "length".
  2. Run the application via the CLI against the malicious PDF:
    python run_pageindex.py --pdf_path evil_toc.pdf --model gpt-3.5-turbo
  3. Observe the process forever attempting to complete the TOC, utilizing 100% of a CPU thread and rapidly emitting requests. (In a real production environment, this drastically drains OpenAI API credits).

Log of Evidence

[*] Setting up Mock API environment variables on port 18080
[*] Triggering PageIndex parsing on the malicious PDF...
[*] Executing: python3 run_pageindex.py --pdf_path evil_toc.pdf --model gpt-3.5-turbo
[Target] Parsing PDF...
[MockAPI] Returning finish_reason: 'length' (max_output_reached)
[MockAPI] Returning completed: 'no'
[MockAPI] Returning finish_reason: 'length' (max_output_reached)
[MockAPI] Returning completed: 'no'
[MockAPI] Returning finish_reason: 'length' (max_output_reached)
[MockAPI] Returning completed: 'no'
...
[!] The process has been running for over 15 seconds, stuck in the infinite loop.

Impact

This vulnerability allows a complete and unauthenticated Denial of Service (DoS) by causing process hanging and unbounded API usage, resulting in service unavailability and the immediate financial exhaustion of the backend LLM service billing account.

Affected products

  • Ecosystem: python
  • Package name: PageIndex
  • Affected versions: All versions currently in repository (main branch)
  • Patched versions:

Severity

  • Severity: High
  • Vector string: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

Weaknesses

  • CWE: CWE-835: Loop with Unreachable Exit Condition ('Infinite Loop')

Occurrences

Permalink Description
pageindex/page_index.py#L303 The vulnerable unbounded while loop within toc_transformer failing to cap API retry attempts.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions