OPENNLP-1826: Prevent OOM during Array Allocation #1035
Open
subbudvk wants to merge 6 commits into apache:main from
rzo1 (Contributor) reviewed on May 5, 2026 and left a comment:
- Extract 1_000 into a named constant (e.g. MAX_TAGS) with a short comment justifying the bound. Magic numbers are bad (most of the time).
- The check numTags < 0 only catches parseInt(num) - 2 < 0. If num itself is negative or non-numeric, you get a NumberFormatException instead of an IOException: wrap the parse or validate num first, so that callers see a single, predictable failure mode.
- Add an explicit upper-bound test using a value just above 1_000 (e.g. 1_001) in addition to Integer.MAX_VALUE, since the boundary is the interesting test case.
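The three review points can be combined into one small sketch. It is not the code merged in this PR: the class HeadRulesTagCount, the method parseTagCount and the helper rejects are hypothetical. It assumes, as the review describes, that the tag count is parseInt(num) - 2 and that the [0, 1000] bound applies to that derived count.

```java
import java.io.IOException;

class HeadRulesTagCount {

    // Hypothetical constant: the number of tags in a head rule is bounded by the
    // size of the POS tagset, so 1_000 leaves a wide safety margin.
    static final int MAX_TAGS = 1_000;

    // Parses the count field of a head rule line and returns the number of tags
    // (assumed to be num - 2). Every malformed input surfaces as IOException.
    static int parseTagCount(String num) throws IOException {
        int declared;
        try {
            declared = Integer.parseInt(num);
        } catch (NumberFormatException e) {
            // Non-numeric or out-of-int-range input: same failure mode as a bad bound.
            throw new IOException("Invalid tag count field: " + num, e);
        }
        // Subtract in long arithmetic so Integer.MIN_VALUE - 2 cannot wrap to a positive int.
        long numTags = (long) declared - 2;
        if (numTags < 0 || numTags > MAX_TAGS) {
            throw new IOException("Tag count " + numTags + " outside [0, " + MAX_TAGS + "]");
        }
        return (int) numTags;
    }

    // Returns true if parseTagCount rejects the input with an IOException.
    static boolean rejects(String num) {
        try {
            parseTagCount(num);
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parseTagCount("1002"));  // 1000: largest accepted tag count
        System.out.println(rejects("1003"));        // true: 1001 tags, just above the bound
        System.out.println(rejects(String.valueOf(Integer.MAX_VALUE))); // true
        System.out.println(rejects("abc"));         // true: non-numeric, still an IOException
    }
}
```

The boundary pair (1000 accepted, 1001 rejected) is what the third review point asks to pin down; Integer.MAX_VALUE alone would also pass against a much looser bound.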
rzo1 (Contributor) approved these changes on May 7, 2026
@subbudvk LGTM from my side. Could you create a cherry-pick version targeting 2.x too?
Summary
HeadRules (English) and AncoraSpanishHeadRules (Spanish) parsed the tag count field from head rules files with Integer.parseInt() and used the result directly as an array size, with no bounds check. A crafted model file with a count of Integer.MAX_VALUE would trigger an immediate OutOfMemoryError during parser model loading.

Added a bounds check in readHeadRules() in both classes: values outside [0, 1000] throw an IOException before any allocation. Because the tag count is constrained by the size of the POS tagset in use, this bound already leaves a safe margin, so a configurable override would likely offer little benefit.
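To show where the check sits relative to the allocation, here is a sketch of a readHeadRules-style loop. It is an illustration only, not OpenNLP's actual code: the line format "num type dir tag1 ... tagN" (with N = num - 2) and all names (HeadRuleLineParser, readTagLists, MAX_TAGS, rejects) are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class HeadRuleLineParser {

    // Hypothetical constant for the [0, 1000] bound on the tag count.
    static final int MAX_TAGS = 1_000;

    // Reads lines assumed to look like "num type dir tag1 ... tagN" with N = num - 2,
    // returning one tag array per line.
    static List<String[]> readTagLists(BufferedReader in) throws IOException {
        List<String[]> result = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            StringTokenizer st = new StringTokenizer(line, " ");
            if (st.countTokens() < 3) {
                throw new IOException("Malformed head rule line: " + line);
            }
            String num = st.nextToken();
            st.nextToken(); // type (ignored in this sketch)
            st.nextToken(); // direction (ignored in this sketch)
            long numTags;
            try {
                numTags = (long) Integer.parseInt(num) - 2;
            } catch (NumberFormatException e) {
                throw new IOException("Invalid tag count field: " + num, e);
            }
            // Validate before allocating, so a crafted count never reaches new String[...].
            if (numTags < 0 || numTags > MAX_TAGS) {
                throw new IOException("Tag count " + numTags + " outside [0, " + MAX_TAGS + "]");
            }
            if (st.countTokens() < numTags) {
                throw new IOException("Fewer tags than declared: " + line);
            }
            String[] tags = new String[(int) numTags];
            for (int i = 0; i < tags.length; i++) {
                tags[i] = st.nextToken();
            }
            result.add(tags);
        }
        return result;
    }

    // Returns true if the given rules text is rejected with an IOException.
    static boolean rejects(String rules) {
        try {
            readTagLists(new BufferedReader(new StringReader(rules)));
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        List<String[]> ok = readTagLists(new BufferedReader(new StringReader("4 PP 0 IN TO\n")));
        System.out.println(ok.get(0).length); // 2
        // A crafted count fails fast with IOException instead of attempting a huge allocation.
        System.out.println(rejects(Integer.MAX_VALUE + " NP 1 NN\n")); // true
    }
}
```

The extra "fewer tags than declared" check is a choice of this sketch: it turns a truncated line into the same IOException rather than a NoSuchElementException from the tokenizer.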