
cachedb_redis: add Redis cluster hash tag support to redisHash#3815

Open
NormB wants to merge 1 commit into OpenSIPS:master from NormB:feature/redis-hashtag-support

Conversation

@NormB
Member

@NormB NormB commented Feb 10, 2026

Fix redisHash() to extract hash tags per the Redis cluster spec and use the spec-mandated CRC16(key) mod 16384 bitmask.

Summary

Fix redisHash() in cachedb_redis to support Redis cluster hash tags ({...}) and use the spec-mandated slot modulus.

Details

The redisHash() function in modules/cachedb_redis/cachedb_redis_utils.c has two correctness issues when compared to the Redis cluster specification (https://redis.io/docs/latest/operate/oss_and_stack/reference/cluster-spec/):

  1. Missing hash tag {...} extraction — Redis cluster keys containing {substring} should hash only the substring inside the first {...} pair, enabling co-location of related keys on the same slot. The current code always hashes the full key. This means keys like {user:1000}.session and {user:1000}.profile, which are designed to land on the same slot, get scattered across different slots.

  2. Uses & con->slots_assigned instead of & 16383 — slots_assigned is set to the max end_slot found across nodes (which happens to be 16383 in a fully-allocated cluster), but this is fragile and semantically wrong. The Redis spec mandates CRC16(key) mod 16384 (i.e., & 16383).
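As a reference point, the spec's slot computation can be sketched in C. This is an illustrative bitwise CRC16/XMODEM (poly 0x1021, init 0) rather than the table-driven implementation used in the module; `crc16_xmodem` and `key_slot` are hypothetical names for this sketch, not the in-tree symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC16/XMODEM (poly 0x1021, init 0x0000), the CRC variant
 * the Redis cluster spec mandates. The real code is table-driven;
 * this bitwise form is an equivalent for illustration. */
static uint16_t crc16_xmodem(const char *buf, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i]) << 8;
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Spec-mandated slot: CRC16(key) mod 16384, i.e. a constant & 16383
 * mask, independent of the cluster's slots_assigned bookkeeping. */
static unsigned int key_slot(const char *key, size_t len)
{
    return crc16_xmodem(key, len) & 16383;
}
```

The `& 16383` mask is what distinguishes the fix from the old `& con->slots_assigned` behavior, which only coincidentally produced the same result on fully allocated clusters.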

Standalone test results confirming the bug and fix:
CRC16 check: crc16("123456789") = 0x31C3 (expected 0x31C3) PASS

KEY                     OLD    NEW    STATUS
foo                     12182  12182  PASS (regression safe)
bar                      5061   5061  PASS (regression safe)
hello                     866    866  PASS (regression safe)
{user:1000}.session     10291   1649  PASS (old=BROKEN, new=FIXED)
{user:1000}.profile     11999   1649  INFO (slot changed)
{user:1000}.followers    4151   1649  PASS (old=BROKEN, new=FIXED)
foo{}                    5542   5542  PASS (regression safe)
{}key                   14961  14961  PASS (regression safe)
foo{{bar}}               4285   4015  INFO (slot changed)
foo{bar}{zap}            4770   5061  INFO (slot changed)

Results: 10 passed, 0 failed

Solution

The redisHash() function now implements hash tag extraction matching the Redis cluster spec (https://redis.io/docs/latest/operate/oss_and_stack/reference/cluster-spec/):

  • Find the first { in the key
  • Find the first } after it
  • If both exist and the substring between them is non-empty, hash only that substring
  • Otherwise, hash the full key
  • Always mask with & 16383 per spec
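The extraction rules above can be sketched as a small standalone helper. `hashtag_span` is a hypothetical name used for illustration here and does not match the in-tree function signature.

```c
#include <stddef.h>
#include <string.h>

/* Return the span of `key` that should be fed to CRC16, per the Redis
 * cluster hash tag rules: if the key contains a non-empty {...} pair,
 * hash only the substring between the first '{' and the first '}'
 * after it; otherwise hash the whole key. Illustrative helper only. */
static const char *hashtag_span(const char *key, size_t len, size_t *out_len)
{
    const char *open = memchr(key, '{', len);
    if (open) {
        const char *close =
            memchr(open + 1, '}', len - (size_t)(open - key) - 1);
        if (close && close > open + 1) {
            /* non-empty tag: hash only what sits between the braces */
            *out_len = (size_t)(close - open - 1);
            return open + 1;
        }
    }
    /* no tag, or an empty {}: hash the full key */
    *out_len = len;
    return key;
}
```

Note how this reproduces the spec's edge cases from the test table: `foo{}` has an empty tag and falls through to the full key, while `foo{{bar}}` hashes the substring `{bar` because only the first brace pair counts.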

Additionally adds a NULL/empty key guard with LM_ERR and debug logging (LM_DBG) when a hash tag is detected. The function signature is unchanged — no caller modifications needed.

Compatibility

  • Plain keys (no {...}) produce identical slot values — no behavioral change for existing deployments not using hash tags.

  • Keys containing {...} hash tags will now route to different (correct) slots. Any deployment relying on the old (broken) slot assignment for hash-tagged keys would see keys move to new slots on restart. This is the correct behavior per the Redis spec and matches what redis-cli CLUSTER KEYSLOT returns.

  • The slots_assigned field in redis_con is retained — it is still populated during cluster discovery but no longer used for the bitmask. Removing it would be a separate cleanup.

Closing issues

Fix redisHash() to extract hash tags per the Redis cluster spec
and use the spec-mandated CRC16(key) mod 16384 bitmask.
@dondetir

Thanks for flagging this, @NormB. You're right — the hash slot calculation fix overlaps with #3815. Looking at your PR, it's actually more thorough on that front since it also adds hash tag {…} extraction, which mine doesn't cover.

The main contribution of this PR is the ASK redirect handling — when a Redis cluster is mid-resharding, source nodes return ASK instead of MOVED, and this requires sending an ASKING command to the target node before retrying. This completes the cluster redirect support started in #3639. That part has no overlap with #3815.
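For illustration, the redirect parsing this relies on could look like the sketch below. `parse_redirect` and `struct redirect` are hypothetical names for this example, not the PR's actual parse_redirect_reply() refactor; it only shows the shape of the MOVED/ASK error strings a cluster node returns.

```c
#include <stdio.h>
#include <string.h>

/* Parse a Redis cluster redirect error, either
 *   "MOVED <slot> <host>:<port>"  (slot permanently moved), or
 *   "ASK <slot> <host>:<port>"    (mid-resharding; the client must
 *                                  send ASKING before retrying).
 * Returns 0 on success, -1 if the reply is not a redirect. */
struct redirect { int is_ask; unsigned int slot; char host[64]; int port; };

static int parse_redirect(const char *err, struct redirect *r)
{
    const char *rest;
    if (!strncmp(err, "ASK ", 4))        { r->is_ask = 1; rest = err + 4; }
    else if (!strncmp(err, "MOVED ", 6)) { r->is_ask = 0; rest = err + 6; }
    else return -1;
    if (sscanf(rest, "%u %63[^:]:%d", &r->slot, r->host, &r->port) != 3)
        return -1;
    return 0;
}
```

The `is_ask` flag is what drives the extra step: on ASK the retry must be preceded by an ASKING command to the target node, whereas on MOVED the client simply updates its slot map and retries.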

Happy to rebase this PR to remove the hash slot fix and keep only the ASK redirect handling, so it can sit cleanly on top of #3815. Would that work, or would you prefer I fold both into one PR?

@NormB
Member Author

NormB commented Mar 30, 2026

Thanks @dondetir — rebasing to keep only the ASK redirect handling sounds like the right approach. Your parse_redirect_reply() refactor is clean and I've verified it passes a set of integration tests I have for this area, including live slot migrations across a 3-node cluster.

I have two follow-up PRs that build on top of both #3815 and your ASK work; both depend on #3815 and #3852.

@dondetir

dondetir commented Mar 31, 2026

Thanks @NormB — really appreciate you taking the time to validate the ASK handling against your integration suite, especially with live slot migrations on a 3-node cluster. That gives me a lot of confidence in how our changes interact.

I saw #3855 and #3856 — impressive work. The dynamic topology refresh and MI observability commands will be a big improvement for production cluster deployments.

I'd be happy to do a thorough review of both PRs if that would be useful — I've already been reading through the diffs and have some notes. Just let me know.

Happy to coordinate on merge ordering to keep the dependency chain clean.

@NormB
Member Author

NormB commented Mar 31, 2026

I'd be happy to do a thorough review of both PRs if that would be useful — I've already been reading through the diffs and have some notes. Just let me know.

@dondetir Thanks for the offer to review the two cachedb_redis PRs; it would be greatly appreciated. @liviuchircu knows the module very well, but I think that he and the rest of the team are heads down working on the upcoming 4+ release.
