Add bucket recall heat map for compaction-aware speculative decoding #28
sharpninja wants to merge 2 commits into main
- Generate default BitNet model weights (.gguf) during CI build and include them in the BitNetSharp.Core NuGet package as content files
- Configure both Azure Pipelines and GitHub Actions to publish NuGet packages to https://nuget.pkg.github.com/sharpninja/index.json
- Reference McpServer variable library for GH_TOKEN authentication in Azure Pipelines
- Add nuget.config with GitHub Packages source and package source mapping to prevent dependency confusion
- Bump version to 0.6.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduce BucketRecallHeatMap that tracks per-token and per-chain attempt/accept counts during speculative decoding, with a 256x256 chain transition adjacency matrix for hot-path detection. The heat map identifies frequently-traversed chain sequences and uses them to rank buckets for compaction: chains on hot-paths receive a preservation bonus while isolated low-recall chains are pruned first.

Key components:
- BucketRecallHeatMap: O(1) recording via long[] arrays, hot-path detection via greedy walk on transition matrix, compaction ranking
- BucketRecallHeatMapSerializer: binary sidecar (BRHM v1, CRC32)
- BucketRecallVisualizer: Mermaid xychart-beta and flowchart output
- Configurable via BitNetOptions.EnableRecallHeatMap (default: true)
- Persisted alongside model checkpoints and GGUF files as sidecar

41 new tests covering all public methods, serialization round-trips, and benchmark integration (heat map enabled vs disabled).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
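For readers who want a concrete picture of the design in this commit message, here is a minimal sketch of attempt/accept counters, a flattened 256x256 transition matrix, a greedy hot-path walk, and a prune ordering. It is not the PR's code: the class name `RecallHeatMapSketch`, every method name, the `minCount` threshold, and the additive `hotPathBonus` scoring are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; the real BucketRecallHeatMap may differ in every detail.
public sealed class RecallHeatMapSketch
{
    private const int ChainCount = 256;
    private readonly long[] _attempts = new long[ChainCount];
    private readonly long[] _accepts = new long[ChainCount];
    // Flattened 256x256 matrix: row = previous chain, column = next chain.
    private readonly long[] _transitions = new long[ChainCount * ChainCount];

    // Each recording call is a single array increment, hence O(1).
    public void RecordAttempt(byte chainId) => _attempts[chainId]++;
    public void RecordAccept(byte chainId) => _accepts[chainId]++;
    public void RecordTransition(byte fromChain, byte toChain) =>
        _transitions[fromChain * ChainCount + toChain]++;

    public double Recall(byte chainId) =>
        _attempts[chainId] == 0 ? 0.0 : (double)_accepts[chainId] / _attempts[chainId];

    // Greedy walk: from `start`, keep following the most frequent outgoing
    // transition to an unvisited chain until none reaches `minCount`.
    public List<byte> HotPath(byte start, int maxLength, long minCount)
    {
        var path = new List<byte> { start };
        var visited = new bool[ChainCount];
        visited[start] = true;
        int current = start;
        while (path.Count < maxLength)
        {
            int best = -1;
            long bestCount = minCount - 1;
            for (int next = 0; next < ChainCount; next++)
            {
                long count = _transitions[current * ChainCount + next];
                if (!visited[next] && count > bestCount)
                {
                    best = next;
                    bestCount = count;
                }
            }
            if (best < 0) break;
            visited[best] = true;
            path.Add((byte)best);
            current = best;
        }
        return path;
    }

    // Lowest score first: these are the chains a compactor would prune first.
    // Chains on a hot path get `hotPathBonus` added to their recall (assumed scoring).
    public byte[] PruneOrder(ISet<byte> hotChains, double hotPathBonus)
    {
        var ids = new byte[ChainCount];
        var scores = new double[ChainCount];
        for (int i = 0; i < ChainCount; i++)
        {
            ids[i] = (byte)i;
            scores[i] = Recall((byte)i) + (hotChains.Contains((byte)i) ? hotPathBonus : 0.0);
        }
        Array.Sort(scores, ids); // reorders ids by ascending score
        return ids;
    }
}
```

A caller might record transitions while decoding, call `HotPath` for a few seed chains, collect the results into a `HashSet<byte>`, and pass that set to `PruneOrder`. The visited array keeps the greedy walk from looping on cyclic transitions.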
Moving PR to Azure DevOps.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2c8bd66461
    _recallHeatMap?.RecordChainAttempt(chain.ChainId, chain.TokenIds, matchedPrefixLen);
    var acceptedTokensForChain = 0;
    for (var ci = matchedPrefixLen; ci < chain.TokenIds.Length && step < maxGeneratedTokens - 1; ci++)
Count only verified speculative tokens as attempts
RecordChainAttempt is called before speculative verification starts, and it records the entire remaining chain tail as attempted even when the verification loop runs fewer iterations (or none) due to step < maxGeneratedTokens - 1 and EOS/UNK early breaks. This inflates attempt counters with non-attempted tokens/chains, which biases recall downward and can mis-rank compaction candidates. Record attempts only for tokens that are actually verified (e.g., inside the loop or with a bounded attempted length).
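One way to picture the suggested fix is to move the attempt count inside the verification loop, so a token is counted only when it is actually checked. The sketch below is self-contained and hypothetical: `AttemptCounterSketch`, `VerifyTail`, and the `verify` delegate (standing in for the model check and the EOS/UNK breaks) are illustrative names, and incrementing `step` once per accepted token is an assumption about the surrounding code.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for the heat map's per-token counters.
public sealed class AttemptCounterSketch
{
    public long Attempts { get; private set; }
    public long Accepts { get; private set; }
    public void RecordAttempt() => Attempts++;
    public void RecordAccept() => Accepts++;
}

public static class VerificationLoopSketch
{
    // Verifies the chain tail token by token and returns how many tokens were accepted.
    // A null `counter` models the heat map being disabled.
    public static int VerifyTail(
        int[] tokenIds, int matchedPrefixLen, int step, int maxGeneratedTokens,
        Func<int, bool> verify, AttemptCounterSketch? counter)
    {
        var accepted = 0;
        for (var ci = matchedPrefixLen; ci < tokenIds.Length && step < maxGeneratedTokens - 1; ci++)
        {
            counter?.RecordAttempt();          // counted only when this token is really verified
            if (!verify(tokenIds[ci])) break;  // a rejection ends speculation for this chain
            counter?.RecordAccept();
            accepted++;
            step++;                            // assumed: one generated token per accepted token
        }
        return accepted;
    }
}
```

With a five-token chain, `matchedPrefixLen` of 1, and a budget that allows only two more steps, this records two attempts, whereas counting the whole tail up front would record four. The other option the review mentions, passing a bounded attempted length to a single call after the loop, would give the same totals with one call per chain.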
Summary
- `BucketRecallHeatMap` that tracks per-token and per-chain attempt/accept counts during speculative decoding with a 256x256 chain transition adjacency matrix for hot-path detection
- Binary sidecar (`BRHM` v1 format with CRC32) persisted alongside model checkpoints and GGUF files
- Configurable via `BitNetOptions.EnableRecallHeatMap` (default: `true`), zero overhead when disabled via null-conditional

Test plan
- `BucketRecallHeatMap` (recording, queries, transitions, hot-paths, compaction, merge, round-trip)
- `BucketRecallHeatMapSerializer` (round-trip, CRC32 corruption, invalid magic/version)
- `BucketRecallVisualizer` (Mermaid output format, token/chain labels, color styling)

🤖 Generated with Claude Code
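The serializer cases in the test plan (round-trip, CRC32 corruption, invalid magic/version) can be illustrated with a small sidecar sketch. Only the `BRHM` magic, version 1, and the use of CRC32 come from the PR description. The byte layout (magic, version byte, count, 64-bit counters, trailing little-endian CRC), the class name `SidecarSketch`, and the hand-written CRC routine are assumptions, not the actual `BucketRecallHeatMapSerializer` format.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;

// Hypothetical layout: "BRHM" | version (1 byte) | count (int32) | count x int64 | CRC32 (uint32, LE)
public static class SidecarSketch
{
    private static readonly byte[] Magic = Encoding.ASCII.GetBytes("BRHM");
    private const byte Version = 1;

    // Bitwise CRC-32 (reflected polynomial 0xEDB88320), written out to avoid extra packages.
    public static uint Crc32(ReadOnlySpan<byte> data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
        {
            crc ^= b;
            for (int bit = 0; bit < 8; bit++)
                crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return ~crc;
    }

    public static byte[] Write(long[] counts)
    {
        using var ms = new MemoryStream();
        using (var w = new BinaryWriter(ms, Encoding.ASCII, leaveOpen: true))
        {
            w.Write(Magic);
            w.Write(Version);
            w.Write(counts.Length);
            foreach (long c in counts) w.Write(c);
        }
        byte[] body = ms.ToArray();
        byte[] file = new byte[body.Length + 4];
        body.CopyTo(file, 0);
        // The checksum covers everything before it and is stored last.
        BinaryPrimitives.WriteUInt32LittleEndian(file.AsSpan(body.Length), Crc32(body));
        return file;
    }

    public static long[] Read(byte[] file)
    {
        if (file.Length < 13) throw new InvalidDataException("File too short.");
        for (int i = 0; i < Magic.Length; i++)
            if (file[i] != Magic[i]) throw new InvalidDataException("Invalid magic.");
        if (file[4] != Version) throw new InvalidDataException("Unsupported version.");

        uint stored = BinaryPrimitives.ReadUInt32LittleEndian(file.AsSpan(file.Length - 4));
        if (Crc32(file.AsSpan(0, file.Length - 4)) != stored)
            throw new InvalidDataException("CRC32 mismatch.");

        // Payload sits between the 5-byte header and the 4-byte checksum.
        using var r = new BinaryReader(new MemoryStream(file, 5, file.Length - 9));
        int n = r.ReadInt32();
        if (n < 0 || (long)n * 8 != file.Length - 13)
            throw new InvalidDataException("Length mismatch.");
        var counts = new long[n];
        for (int i = 0; i < n; i++) counts[i] = r.ReadInt64();
        return counts;
    }
}
```

Checking magic and version before the checksum lets a reader reject an unrelated file with a specific error, while any flipped payload byte surfaces as a CRC32 mismatch. The order of those checks is a design choice here, not something the PR states.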