perf: improve wildcard query perf with predicate and contains-check pushdown #397
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,7 @@ | ||
| package token | ||
|
|
||
| import ( | ||
| "bytes" | ||
| "encoding/binary" | ||
| "fmt" | ||
| "math" | ||
|
|
@@ -10,6 +11,7 @@ import ( | |
|
|
||
| "github.com/ozontech/seq-db/cache" | ||
| "github.com/ozontech/seq-db/logger" | ||
| "github.com/ozontech/seq-db/pattern" | ||
| "github.com/ozontech/seq-db/storage" | ||
| ) | ||
|
|
||
|
|
@@ -60,6 +62,30 @@ func (b *Block) GetToken(index int) []byte { | |
| return b.Payload[offset : offset+l] | ||
| } | ||
|
|
||
| func (b *Block) FindContains(from, to int, needle []byte) ([]int, error) { | ||
| indices := make([]int, 0) | ||
|
Member: I guess you could pass a slice of needles here as well, to handle queries like
Collaborator (Author): No, I think it's doable. Maybe will do |
||
| for i := from; i <= to; i++ { | ||
| if bytes.Contains(b.GetToken(i), needle) { | ||
| indices = append(indices, i) | ||
| } | ||
| } | ||
| return indices, nil | ||
| } | ||
|
|
||
| func (b *Block) FindToken(from, to int, searcher pattern.Searcher) ([]int, error) { | ||
| indices := make([]int, 0) | ||
| for i := from; i <= to; i++ { | ||
| ok, err := searcher.Check(b.GetToken(i)) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| if ok { | ||
| indices = append(indices, i) | ||
| } | ||
| } | ||
| return indices, nil | ||
| } | ||
|
|
||
| // BlockLoader is responsible for Reading from disk, unpacking and caching tokens blocks. | ||
| // NOT THREAD SAFE. Do not use concurrently. | ||
| // Use your own BlockLoader instance for each search query | ||
|
|
||
We've discussed that you can perform `bytes.Contains` on the block payload before checking each token individually. Have you measured the performance of such an optimization?

Yes, I tried calling `bytes.Index` on the entire payload. It speeds things up even further compared to this PR:

message:foobar
35 ms => 9 ms

However, this means that when `bytes.Index` returns a valid position, we need to do a binary search on `Offsets` to find the token index and then check for a false positive. It also comes with the neat property that we can call `Unpack` (build offsets) lazily, which boosts cold query performance (by roughly an extra 20%).

I put a task in the backlog; I decided that it's too much for a single PR.
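
For reference, the payload-wide search described in the reply could look roughly like the sketch below. It is not the code from this PR or from the backlog task. It assumes a simplified, hypothetical layout in which tokens are concatenated back to back and `offsets[i]` is the start of token `i`, with one extra trailing entry equal to `len(payload)`. The names `pack` and `findContainsInPayload` are made up for illustration. In this layout a false positive is a match that starts in one token and runs across the boundary into the next.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// pack concatenates tokens into one payload and records where each token
// starts. offsets has len(tokens)+1 entries; the last one is len(payload).
// Hypothetical layout for illustration only, not the seq-db block format.
func pack(tokens ...string) ([]byte, []int) {
	var payload []byte
	offsets := []int{0}
	for _, t := range tokens {
		payload = append(payload, t...)
		offsets = append(offsets, len(payload))
	}
	return payload, offsets
}

// findContainsInPayload returns the indices of tokens that contain needle.
// It scans the whole payload with bytes.Index instead of checking tokens
// one by one, then maps each hit back to a token with a binary search.
func findContainsInPayload(payload []byte, offsets []int, needle []byte) []int {
	indices := make([]int, 0)
	n := len(offsets) - 1 // number of tokens
	if n <= 0 {
		return indices
	}
	if len(needle) == 0 {
		// Every token contains the empty needle.
		for i := 0; i < n; i++ {
			indices = append(indices, i)
		}
		return indices
	}
	pos := 0
	for pos < len(payload) {
		rel := bytes.Index(payload[pos:], needle)
		if rel < 0 {
			break
		}
		hit := pos + rel
		// First token whose end lies beyond the hit: the token holding it.
		i := sort.Search(n, func(k int) bool { return offsets[k+1] > hit })
		// False-positive check: the match must end inside the same token.
		if hit+len(needle) <= offsets[i+1] {
			indices = append(indices, i)
		}
		// Either way, later matches starting in token i add nothing new,
		// so resume at the start of the next token.
		pos = offsets[i+1]
	}
	return indices
}

func main() {
	payload, offsets := pack("foo", "foobar", "bar", "xfoobarx")
	fmt.Println(findContainsInPayload(payload, offsets, []byte("foobar"))) // [1 3]
}
```

When the needle is rare, most of the work is one or two `bytes.Index` calls over the payload, which is where the speedup would come from. Restricting the search to a `from`/`to` range, as `FindContains` does, would mean slicing the payload between the corresponding offsets first.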