Skip to content

Conversation

@valerybokov
Copy link

I did a little research and realized that these streams are already buffered. It's possible that in some very old SDK versions they won't be buffered. In my opinion, it's best to only instantiate a BufferedInputStream if the current stream is unbuffered. I mean, if you'd like then I can add the check is this BufferedInputStream and wrap current stream if not. In the current implementation in the library, we're experiencing double synchronization due to the double wrapping of the BufferedInputStream.

@THausherr
Copy link
Contributor

What research did you do? But yes I'm of course for reducing code but I don't want a check.
Btw I've been quiet here because I'm currently investigating issues in xmpbox.

@valerybokov
Copy link
Author

What research did you do? But yes I'm of course for reducing code but I don't want a check. Btw I've been quiet here because I'm currently investigating issues in xmpbox.

I mean, for all the code sections that were changed here, I verified that the SomePdfBoxClass.class.getResourceAsStream method returns a buffered stream.
Yes, I saw that you were working on xmpbox.

@THausherr
Copy link
Contributor

ChatGPT disagrees, and I usually treat it as a mainstream opinion on a topic (and I'm aware that it is hallucinating sometimes)
https://chatgpt.com/share/694d1725-2728-8006-bfd3-e66c0bba60d0

@valerybokov
Copy link
Author

valerybokov commented Dec 25, 2025

ChatGPT disagrees, and I usually treat it as a mainstream opinion on a topic (and I'm aware that it is hallucinating sometimes)

I don't care what the СhatGPT responds to. As I stated earlier, I checked all cases, and this method always returned a BufferedInputStream. This method returns an InputStream, which is the parent class of BufferedInputStream. Java doesn't really guarantee that it will always return a BufferedInputStream. Therefore, I suggested adding a check there. In my opinion, this is better than double synchronization.

InputStream is = SomePdfBoxClass.class.getResourceAsStream(somePath);
if (!(is instanceof BufferedInputStream))
{
is = new BufferedInputStream(is);
}

//another part of the code

@valerybokov
Copy link
Author

image

@lehmi
Copy link
Contributor

lehmi commented Dec 25, 2025

How could you be so sure that you checked all cases? I had a quick look and there are different branches which get confusing there one dig deeper into it. Many of them ended in java.net.URLConnection.getInputStream() which is implemented by several classes and some are JDK specific, so that there might be other implementations in other JDK you never saw.
I've found one example in the corretto JDK which uses an ByteArrayInput stream: sun.net.www.protocol.file.FileURLConnection.getInputStream()

I'm afraid the assumption that all implementations are already returning a buffered stream is not correct.

I agree with @THausherr and won't support a check. Did you have some cases where such a change introduces a real improvement?

@THausherr
Copy link
Contributor

I think that most usages are in tests and in initialization. If there is a usage that happens regularly, e.g. with each page, only then it would make sense to look for optimization.

@valerybokov
Copy link
Author

How could you be so sure that you checked all cases? I had a quick look and there are different branches which get confusing there one dig deeper into it. Many of them ended in java.net.URLConnection.getInputStream() which is implemented by several classes and some are JDK specific, so that there might be other implementations in other JDK you never saw. I've found one example in the corretto JDK which uses an ByteArrayInput stream: sun.net.www.protocol.file.FileURLConnection.getInputStream()

I'm afraid the assumption that all implementations are already returning a buffered stream is not correct.

I agree with @THausherr and won't support a check. Did you have some cases where such a change introduces a real improvement?

Can I be responsible for all cases in the JDK? I only tested this on the trunk branch. I mean, I tested all the cases I'm requesting to change.
I understand this is a weak argument, but PDFBox tests run noticeably faster after my changes. As I said earlier, you're wrapping a buffered stream inside a buffered stream. The methods of these classes are synchronized. Obviously, execution will be slower due to double synchronization.

My suggestion applies only to library resources. These are relatively small files. You don't want to add a check. Even if the getResourceAsStream method returns an unbuffered stream (I think it will always be buffered for new JDK versions), you don't lose anything significant. In my opinion, having a check will be a safer and faster option than your double synchronization. Although, I repeat, I am not insisting on type checking, I am only saying (as does chat gpt) that there is no guarantee that a buffered stream will always be returned.

@valerybokov
Copy link
Author

I think that most usages are in tests and in initialization. If there is a usage that happens regularly, e.g. with each page, only then it would make sense to look for optimization.

In my opinion, loading speed is also important. Especially for server computing, where there may be hundreds or thousands of library instances.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants