Update PDFStreamEngine.java#27

Open
royguo wants to merge 5617 commits into apache:trunk from royguo:trunk

@royguo royguo commented Oct 3, 2016

There is no need to allocate a new ArrayList here. Reusing the list reduces text extraction time from 16 seconds to 14 seconds on a 4.2M PDF.

THausherr and others added 30 commits June 21, 2016 16:35
…er from twelvemonkeys; add check for orientation

git-svn-id: https://svn.apache.org/repos/asf/pdfbox/trunk@1749936 13f79535-47bb-0310-9956-ffa450edef68
THausherr and others added 22 commits September 11, 2016 12:23
…d, as suggested by Lorenz Pahl

git-svn-id: https://svn.apache.org/repos/asf/pdfbox/trunk@1760963 13f79535-47bb-0310-9956-ffa450edef68
There's no need to allocate a new ArrayList in `processStreamOperators`. In my test case of a `4.2M` PDF, text extraction time dropped from 16 seconds to 14 seconds.
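For readers who have not opened the diff, the idea can be sketched roughly as follows. This is a minimal illustration and not the PDFBox source: the class `OperatorLoopSketch`, the string tokens, and the `isOperator` rule are all hypothetical stand-ins for the real parser and `processOperator` call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a content-stream loop; this is NOT PDFBox code.
final class OperatorLoopSketch {

    // Assumption for this sketch only: a token that starts with a letter is an operator.
    static boolean isOperator(String token) {
        return !token.isEmpty() && Character.isLetter(token.charAt(0));
    }

    // Variant A: hand the list to the handler, then allocate a fresh one.
    // Returns the number of lists allocated.
    static int runAllocating(List<String> tokens, List<String> log) {
        int allocations = 1;
        List<String> arguments = new ArrayList<>();
        for (String token : tokens) {
            if (isOperator(token)) {
                log.add(token + arguments);     // stands in for processOperator(...)
                arguments = new ArrayList<>();  // new list for the next operator
                allocations++;
            } else {
                arguments.add(token);
            }
        }
        return allocations;
    }

    // Variant B: keep one list for the whole stream and clear it after each operator.
    static int runReusing(List<String> tokens, List<String> log) {
        List<String> arguments = new ArrayList<>();
        for (String token : tokens) {
            if (isOperator(token)) {
                log.add(token + arguments);     // stands in for processOperator(...)
                arguments.clear();              // reuse the same list
            } else {
                arguments.add(token);
            }
        }
        return 1;
    }
}
```

For the five tokens `/F1 12 Tf (Hi) Tj`, both variants log the same two operator calls; the first allocates three lists and the second allocates one. Whether that saving matters in practice depends on the document being parsed.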
@THausherr
Contributor

THausherr commented Oct 3, 2016

This is a read-only mirror. Please close this and open an issue in JIRA.
https://issues.apache.org/jira/browse/PDFBOX

@THausherr
Contributor

Of course every speed increase is welcome, but this change is one to be discussed with "the rest of the gang": what if one of the processOperator methods keeps the argument list? If not now, maybe at a later time? Your change would pull the rug out from under it.

@royguo
Author

royguo commented Oct 4, 2016

@THausherr What do you mean by "keep the argument list"? I assume you mean that someone may want to keep the elements of arguments inside processOperator. In that case, the clear method only removes the elements from the list; it does not destroy them, so if someone keeps a reference to the elements, it will still work.
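The two cases in this exchange can be told apart with a small sketch. It is only an illustration under stated assumptions: `RetainingHandler`, its fields, and `AliasingSketch` are hypothetical and do not correspond to any PDFBox class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handler that retains what it is given; NOT a PDFBox class.
final class RetainingHandler {
    List<String> keptList;      // keeps a reference to the list itself
    String keptFirstElement;    // keeps a reference to one element only

    void processOperator(String operator, List<String> arguments) {
        keptList = arguments;
        keptFirstElement = arguments.get(0);
    }
}

final class AliasingSketch {
    // Mimics a caller that reuses its argument list after the handler returns.
    static RetainingHandler runWithClear() {
        RetainingHandler handler = new RetainingHandler();
        List<String> arguments = new ArrayList<>();
        arguments.add("/F1");
        arguments.add("12");
        handler.processOperator("Tf", arguments);
        arguments.clear(); // empties the shared list; the element objects are untouched
        return handler;
    }
}
```

After `runWithClear()`, `keptFirstElement` still refers to `"/F1"`, which matches the point that `clear` does not destroy elements. `keptList`, however, is the same object as the caller's list and is now empty, which is the case the earlier comment worries about.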

@skjolber

Any progress on this? The users of the passed array must make a copy of the arguments array.

@THausherr
Contributor

No progress; this is a read-only mirror. I asked for an issue to be created in JIRA. I won't create it myself because I'm not persuaded by this. If "The users of the passed array must make a copy of the arguments array." then where would the speed gain be?

@skjolber

skjolber commented Jul 1, 2017

I should have written: The users of the passed array that have to keep a list of the arguments must make a copy of the arguments array. However, I agree that this kind of optimization must be investigated further, so that there are no unexpected side effects.

I've created #38, which investigates whether the ArrayList is still in use after the call to the processor. My first impression is that it is not, and that the optimization is possible.
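The copy described above would only be needed in handlers that actually retain the arguments. A minimal sketch of that pattern, assuming a hypothetical `CopyingHandler` (not a PDFBox class) and a caller that clears its list after each operator:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handler that needs the arguments after it returns; NOT a PDFBox class.
final class CopyingHandler {
    List<String> kept = new ArrayList<>();

    void processOperator(String operator, List<String> arguments) {
        // Shallow copy: a new list holding the same element references.
        kept = new ArrayList<>(arguments);
    }
}

final class DefensiveCopySketch {
    // Mimics a caller that reuses its argument list after the handler returns.
    static CopyingHandler runWithClear() {
        CopyingHandler handler = new CopyingHandler();
        List<String> arguments = new ArrayList<>();
        arguments.add("/F1");
        arguments.add("12");
        handler.processOperator("Tf", arguments);
        arguments.clear(); // does not affect the handler's private copy
        return handler;
    }
}
```

Under this assumption the allocation cost moves from every operator to only those handlers that keep the list, which is where any speed gain would have to come from.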


6 participants