-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Fix: Make concurrent CSV Read function really thread safe #6694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
FSchumacher
wants to merge
1
commit into
apache:master
Choose a base branch
from
FSchumacher:fix-csv-read
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+15
−9
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT of going with
AtomicLong.getAndIncrementandjava.lang.Math#floorMod(long, int)?Then it could work without a loop.
E.g.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretical we could still run into an overflow and in that case, we would get wrong values ;)
And probably more to the point: the name nextRow is misleading. It should probably be nextRowToReturn or such, as it contains the number, the function should return and the function sets it to the value the next user should read.
Therefore I think the loop is not that badly written.
If we want to take the AtomicLong-road, we should probably start with a value of
-1instead of0, so that the first row is still read. But in that case we would have to check all uses of the value of nextRow outside of the changed function.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Math.floorModresolves the overflow, so I don't understand what you mean by "wrong values"There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
floorMod should work well within the range of long. But when our long wraps around, the calculation will be wrong.
It might be clearer, when we use a short instead of a long.
There we have
(x % 255) % FILE_SIZEas our current value. Further assume that FILE_SIZE is 200.Then for x == 254 we get 54 and for x == 255 we get 0 (wrap around of short).
The same will happen (really unlikely, as a long can get quite big) with an int or a long. (That might will happen with a long after 286.967.718 years, when we read about 1000 rows a second (for that complete period) :) ) (and that will be one wrong range, that we skip, in the above short example 55..255)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clarify what exactly would go wrong.
0 is a valid value, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing wrong with the zero per se, but it should have been the 55 in the example I gave.
And we still would have the other problem, that we would have to either decrement the value before returning or start with -1 and look at other uses of nextRow, which doesn't feel right to me, either.