-
Notifications
You must be signed in to change notification settings - Fork 4k
spanset: fix and simplify mergeSpans #157105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Epic: none Release note: none
41578e4 to
3ded0a0
Compare
| if cur.Timestamp != prev.Timestamp { | ||
| r = append(r, cur) | ||
| } else if len(prev.EndKey) == 0 { // prev is a point key | ||
| if !cur.Key.Equal(prev.Key) { // cur.Key > prev.Key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this func goes an extra mile to merge [a,b) + [b] = [a,b] (see below). However, it doesn't handle cases like two consecutive point keys:
[a] + [a.Next()] = [a,a.Next().Next())
I wonder if we should do it for completeness. Today:
[a,b) + [b] + [b.Next()] = [a,b.Next().Next())
[b] + [b.Next()] = [b] + [b.Next()]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a commit that fixes it, but I'll send it separately after this PR merges. As is, the PR does not change the behaviour of mergeSpans.
b73d78c to
38d2e14
Compare
Epic: none Release note: none
38d2e14 to
28ea5af
Compare
It was passed by pointer to avoid allocation when converted to sort.Interface, but it's no longer needed with slices.SortFunc. Epic: none Release note: none
28ea5af to
8dc1395
Compare
Epic: none Release note: none
Epic: none Release note: none
Epic: none Release note: none
8f9764e to
cb4c5d0
Compare
stevendanna
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I like this cleanup.
For the commits with "simplify" I've mostly tried to just verify that I was able to follow the logic without thinking too hard and I think this accomplishes that.
One thing I've seen us do elsewhere is list out the cases in a block comment above where it is a bit easier to spell them out with labels:
// CASE 1: [a] + [b, any) = [a], [b, any)
// CASE 2: [a] + [a, any) = [a, any)
// CASE 3: [a, c) + [d, any) = [a,c], [d,any)
// CASE 4: [a, c) + [c] = [a, c.Next())
// CASE 5: [a, c) + [c, d) = [a, d)
// CASE 6: [a, c) + [b, d) = [a, d)
// CASE 7: [a, c) + [b-bb) = [a, c)
And then inline refer to the case labels.
Not recommending this as I think your inline comments made it straightforward to follow. Just noting it for your consideration in case you prefer it.
| // TODO(pav-kv): does this make CheckAllowed falsely fail in some cases? Maybe | ||
| // it's fine: importantly, it should not falsely succeed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have an example of what you are considering here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something similar to the concern under TODO(irfansharif) below. Say there are 2 spans [a-b)@10 [b-c)@10 that haven't merged due to mergeSpans best-effortness. If there is a read of span [a-c), CheckAllowed can fail due to not finding any whole span that includes [a-c), even though it's technically allowed here.
I'll address this as part of TODO(pav-kv) next to mergeSpans saying that we can sort by (timestamp, key) and achieve "canonicalization" of this set. Seemingly, the callers don't rely on this being sorted by key (and I also confirmed that by running all kvserver tests locally with this change).
Epic: none Release note: none
Epic: none Release note: none
All branches in this loop check timestamp at the end. This was unnecessarily verbose, error-prone, and led to a bug fixed in the previous commit. Check the timestamp at the beginning once instead. Epic: none Release note: none
Epic: none Release note: none
Epic: none Release note: none
Epic: none Release note: none
cb4c5d0 to
d45b333
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
@stevendanna the CASE comments look nice, I've adopted the "notation" from it in the inlined comments.
| // TODO(pav-kv): does this make CheckAllowed falsely fail in some cases? Maybe | ||
| // it's fine: importantly, it should not falsely succeed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something similar to the concern under TODO(irfansharif) below. Say there are 2 spans [a-b)@10 [b-c)@10 that haven't merged due to mergeSpans best-effortness. If there is a read of span [a-c), CheckAllowed can fail due to not finding any whole span that includes [a-c), even though it's technically allowed here.
I'll address this as part of TODO(pav-kv) next to mergeSpans saying that we can sort by (timestamp, key) and achieve "canonicalization" of this set. Seemingly, the callers don't rely on this being sorted by key (and I also confirmed that by running all kvserver tests locally with this change).
|
bors r=stevendanna |
157105: spanset: fix and simplify mergeSpans r=stevendanna a=pav-kv The `mergeSpans` function used for `SortAndDedup` calls on `SpanSet` is not tested and not clear about the semantics when the input slice of `Span@Timestamp` has keys that are present at multiple timestamps. In the latter case, `mergeSpans` does not output a minimal / "canonical" set of spans. This does not look good because the result of this call is fed to the latch manager, i.e. this function is on the critical path for CRDB correctness / performance. Adding a test revealed a bug in this function that could result in placing a wider latch than was requested. This PR fixes this bug, adds tests, clarifies the semantics of and refactors `mergeSpans` (and the `roachpb.MergeSpans` that it is mostly a copy of) into a readable state. Epic: none Release note (bug fix): a bug in key span merging could result in placing wider latches than needed for a request, which could impose unnecessary contention. The effect of this bug hasn't been observed in the wild, and possibly never happened. Co-authored-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
|
Build failed: |
|
bors retry |
The
mergeSpansfunction used forSortAndDedupcalls onSpanSetis not tested and not clear about the semantics when the input slice ofSpan@Timestamphas keys that are present at multiple timestamps. In the latter case,mergeSpansdoes not output a minimal / "canonical" set of spans.This does not look good because the result of this call is fed to the latch manager, i.e. this function is on the critical path for CRDB correctness / performance. Adding a test revealed a bug in this function that could result in placing a wider latch than was requested. This PR fixes this bug, adds tests, clarifies the semantics of and refactors
mergeSpans(and theroachpb.MergeSpansthat it is mostly a copy of) into a readable state.Epic: none
Release note (bug fix): a bug in key span merging could result in placing wider latches than needed for a request, which could impose unnecessary contention. The effect of this bug hasn't been observed in the wild, and possibly never happened.