Describe the bug
An ExistQuery such as field:* can return wrong results when it is answered from fast-field storage instead of index_field_presence.
When the fast field normalizer is set to raw, the fast-field path treats the whole string as one token and drops values longer than 255 bytes, so nothing is written to the fast column for those docs.
Queries like NOT field:* can then misclassify documents that do have field in the source JSON, because a document whose value was dropped looks as if the field were absent. Term queries on the same field (e.g. field:"github") still work because they use the inverted index.
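To make the mismatch concrete, here is a toy Python model of the behaviour described above. It is not Quickwit or tantivy code: `index_docs`, `presence`, and `fast_column` are hypothetical names, and the only detail taken from this report is the 255-byte limit.

```python
# Toy model only: hypothetical names, not Quickwit or tantivy internals.
MAX_FAST_VALUE_BYTES = 255  # limit taken from the description above


def index_docs(docs, field):
    """Return (presence, fast_column) as sets of doc ids for `field`."""
    presence = set()     # stand-in for a field-presence structure
    fast_column = set()  # stand-in for docs that got a fast-field value
    for doc_id, doc in enumerate(docs):
        value = doc.get(field)
        if value is None:
            continue
        presence.add(doc_id)
        # With the raw normalizer the whole string is a single token.
        # Tokens over the limit are dropped, so nothing is written.
        if len(value.encode("utf-8")) <= MAX_FAST_VALUE_BYTES:
            fast_column.add(doc_id)
    return presence, fast_column


docs = [
    {"field": "github"},         # short value, kept everywhere
    {"field": "x" * 300},        # longer than 255 bytes, dropped from fast column
    {"other": "no field here"},  # field genuinely absent
]
presence, fast_column = index_docs(docs, "field")
all_ids = set(range(len(docs)))

# "NOT field:*" evaluated against each structure.
not_exists_from_fast = all_ids - fast_column
not_exists_from_presence = all_ids - presence
```

In this model doc 1 has the field, yet it lands in `not_exists_from_fast` alongside doc 2, while `not_exists_from_presence` contains only doc 2.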
Suggestions:
Instead of dropping these values, we could truncate them. Truncation would break ordering on that fast column, but ordering is already broken by the current dropping logic.
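A rough Python sketch of what truncation could look like, only to illustrate the trade-off. `truncate_utf8` is a hypothetical helper, not an existing Quickwit or tantivy function, and cutting on a UTF-8 character boundary is an assumption about how one might want it to behave.

```python
# Hypothetical helper for illustration, not an existing Quickwit/tantivy API.
MAX_FAST_VALUE_BYTES = 255  # limit taken from the description above


def truncate_utf8(value: str, limit: int = MAX_FAST_VALUE_BYTES) -> str:
    """Cut `value` to at most `limit` UTF-8 bytes without splitting a character."""
    raw = value.encode("utf-8")
    if len(raw) <= limit:
        return value
    # errors="ignore" discards a trailing, incomplete multi-byte sequence.
    return raw[:limit].decode("utf-8", errors="ignore")


a = "x" * 255 + "a"
b = "x" * 255 + "b"

# Both values stay present in the column after truncation...
kept_a = truncate_utf8(a)
kept_b = truncate_utf8(b)

# ...but they now compare equal, so their relative order is lost.
order_preserved = (kept_a < kept_b) == (a < b)
```

Values that share their first 255 bytes become indistinguishable, which is the ordering breakage mentioned above; the existence check would no longer miss them.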
Quickwit version
- Quickwit 0.8.2
- Latest dockerhub image (quickwit/quickwit:edge-slim-bookworm)