Skip to content

OPENNLP-1826 : Prevent OOM during Array Allocation#1035

Open
subbudvk wants to merge 6 commits intoapache:mainfrom
subbudvk:subbudvk-patch-6
Open

OPENNLP-1826 : Prevent OOM during Array Allocation#1035
subbudvk wants to merge 6 commits intoapache:mainfrom
subbudvk:subbudvk-patch-6

Conversation

@subbudvk
Copy link
Copy Markdown
Contributor

@subbudvk subbudvk commented May 5, 2026

Summary

  • HeadRules (English) and AncoraSpanishHeadRules (Spanish) parsed the
    tag count field from head rules files with Integer.parseInt() and used
    the result directly as an array size with no bounds check. A crafted model
    file with a count of Integer.MAX_VALUE would trigger an immediate
    OutOfMemoryError during parser model loading.

  • Added a bounds check in readHeadRules() in both classes: values outside
    [0, 1000] throw IOException before any allocation.

    Since this is constrained by the size of the POS tagset being used this is already a safe margin and a configurable override may not have benefit.

@subbudvk subbudvk changed the title Prevent OOM during Array Allocation OPENNLP-1826 : Prevent OOM during Array Allocation May 5, 2026
Copy link
Copy Markdown
Contributor

@rzo1 rzo1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Extract 1_000 into a named constant (e.g. MAX_TAGS) with a short comment justifying the bound. Magic numbers are badly (most of the time).
  • The check numTags < 0 only catches parseInt(num) - 2 < 0. If num itself is negative or non-numeric, you get NumberFormatException instead of IOException: wrap the parse or validate num first so callers see a single, predictable failure mode.
  • Add an explicit upper-bound test using a value just above 1_000 (e.g. 1_001) in addition to Integer.MAX_VALUE as the boundary is the interesting test case.

@subbudvk subbudvk requested a review from rzo1 May 6, 2026 14:17
@rzo1
Copy link
Copy Markdown
Contributor

rzo1 commented May 7, 2026

@subbudvk LGTM from my side. Could you create a cherry-pick version targeting 2.x too ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants