Add bucket recall heat map for compaction-aware speculative decoding by sharpninja · Pull Request #28 · sharpninja/BitNet-b1.58-Sharp

sharpninja · 2026-04-08T15:42:15Z

Summary

Add BucketRecallHeatMap, which tracks per-token and per-chain attempt/accept counts during speculative decoding, with a 256x256 chain transition adjacency matrix for hot-path detection
Hot-path-aware compaction ranking: chains on frequently-traversed sequences get a preservation bonus; isolated low-recall chains are identified for pruning
Binary sidecar serializer (BRHM v1 format with CRC32) persisted alongside model checkpoints and GGUF files
Mermaid diagram visualizations (xychart-beta bar charts and flowchart hot-path/compaction reports)
Configurable via BitNetOptions.EnableRecallHeatMap (default: true), zero overhead when disabled via null-conditional
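
As a rough illustration of the counters described above, here is a minimal sketch. It is not the PR's BucketRecallHeatMap: the class name, the method names, and the assumption that chain IDs fit in a byte (suggested by the 256x256 matrix) are all hypothetical.

```csharp
using System;

// Hypothetical sketch only; names and layout are assumptions, not the PR's API.
public sealed class RecallHeatMapSketch
{
    // Assumed from the 256x256 transition matrix mentioned in the summary.
    private const int ChainCount = 256;

    private readonly long[] _attempts = new long[ChainCount];
    private readonly long[] _accepts = new long[ChainCount];

    // Row-major adjacency matrix: row = previous chain, column = next chain.
    private readonly long[] _transitions = new long[ChainCount * ChainCount];

    // Each recording call is a single array increment, so it runs in O(1).
    public void RecordAttempt(byte chainId) => _attempts[chainId]++;

    public void RecordAccept(byte chainId) => _accepts[chainId]++;

    public void RecordTransition(byte fromChain, byte toChain) =>
        _transitions[fromChain * ChainCount + toChain]++;

    // Recall is accepted / attempted; a chain with no attempts reports 0.
    public double Recall(byte chainId) =>
        _attempts[chainId] == 0 ? 0.0 : (double)_accepts[chainId] / _attempts[chainId];

    public long TransitionCount(byte fromChain, byte toChain) =>
        _transitions[fromChain * ChainCount + toChain];
}
```

With a nullable field such as `RecallHeatMapSketch? heatMap`, a call site written as `heatMap?.RecordAttempt(3)` skips the call entirely when the feature is disabled, which is the null-conditional pattern the summary refers to.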

Test plan

24 unit tests for BucketRecallHeatMap (recording, queries, transitions, hot-paths, compaction, merge, round-trip)
6 tests for BucketRecallHeatMapSerializer (round-trip, CRC32 corruption, invalid magic/version)
9 tests for BucketRecallVisualizer (Mermaid output format, token/chain labels, color styling)
2 benchmark integration tests (heat map enabled tracks counters, disabled yields null)
Full regression: 116 existing tests pass on both net9.0 and net10.0
Build: 0 warnings, 0 errors

🤖 Generated with Claude Code

- Generate default BitNet model weights (.gguf) during CI build and include them in the BitNetSharp.Core NuGet package as content files
- Configure both Azure Pipelines and GitHub Actions to publish NuGet packages to https://nuget.pkg.github.com/sharpninja/index.json
- Reference McpServer variable library for GH_TOKEN authentication in Azure Pipelines
- Add nuget.config with GitHub Packages source and package source mapping to prevent dependency confusion
- Bump version to 0.6.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Introduce BucketRecallHeatMap that tracks per-token and per-chain attempt/accept counts during speculative decoding, with a 256x256 chain transition adjacency matrix for hot-path detection. The heat map identifies frequently-traversed chain sequences and uses them to rank buckets for compaction: chains on hot-paths receive a preservation bonus while isolated low-recall chains are pruned first.

Key components:
- BucketRecallHeatMap: O(1) recording via long[] arrays, hot-path detection via greedy walk on transition matrix, compaction ranking
- BucketRecallHeatMapSerializer: binary sidecar (BRHM v1, CRC32)
- BucketRecallVisualizer: Mermaid xychart-beta and flowchart output
- Configurable via BitNetOptions.EnableRecallHeatMap (default: true)
- Persisted alongside model checkpoints and GGUF files as sidecar

41 new tests covering all public methods, serialization round-trips, and benchmark integration (heat map enabled vs disabled).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
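
The commit message mentions hot-path detection via a greedy walk on the transition matrix without spelling out the details. One plausible reading is sketched below; the visited set, the `minCount` threshold, the `maxLength` cap, and all names are assumptions made for illustration, not the PR's actual stopping rules.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a greedy hot-path walk; not taken from the PR.
public static class HotPathSketch
{
    // transitions is a row-major size*size matrix of transition counts.
    public static List<int> GreedyWalk(
        long[] transitions, int size, int start, long minCount, int maxLength)
    {
        var path = new List<int> { start };
        var visited = new bool[size];
        visited[start] = true;
        var current = start;

        while (path.Count < maxLength)
        {
            var best = -1;
            var bestCount = minCount - 1; // a candidate must reach minCount

            // Pick the heaviest outgoing edge to a chain not yet on the path.
            // Ties go to the lowest chain index because the comparison is strict.
            for (var next = 0; next < size; next++)
            {
                var count = transitions[current * size + next];
                if (!visited[next] && count > bestCount)
                {
                    best = next;
                    bestCount = count;
                }
            }

            if (best < 0) break; // no sufficiently hot unvisited successor

            visited[best] = true;
            path.Add(best);
            current = best;
        }

        return path;
    }
}
```

Under this reading, a chain that appears on a returned path would be a candidate for the preservation bonus, while a chain that never appears on any path and also has low recall would be a candidate for pruning.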

sharpninja · 2026-04-08T15:46:36Z

Moving PR to Azure DevOps.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2c8bd66461

chatgpt-codex-connector · 2026-04-08T15:46:54Z

src/BitNetSharp.Core/BitNetPaperModel.cs

+                            _recallHeatMap?.RecordChainAttempt(chain.ChainId, chain.TokenIds, matchedPrefixLen);
                            var acceptedTokensForChain = 0;
                            for (var ci = matchedPrefixLen; ci < chain.TokenIds.Length && step < maxGeneratedTokens - 1; ci++)


Count only verified speculative tokens as attempts

RecordChainAttempt is called before speculative verification starts, and it records the entire remaining chain tail as attempted even when the verification loop runs fewer iterations (or none) due to step < maxGeneratedTokens - 1 and EOS/UNK early breaks. This inflates attempt counters with non-attempted tokens/chains, which biases recall downward and can mis-rank compaction candidates. Record attempts only for tokens that are actually verified (e.g., inside the loop or with a bounded attempted length).
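
One way to act on this suggestion is to count attempts inside the verification loop, so the counter can never exceed the number of tokens that were actually checked. The sketch below mirrors the loop condition from the diff, but the helper name, the `verify` delegate, and the assumption that `step` advances once per accepted token are hypothetical; the EOS/UNK early breaks are omitted.

```csharp
using System;

// Hypothetical sketch of bounded attempt counting; not the PR's code.
public static class BoundedAttemptSketch
{
    // Returns how many tail tokens were actually verified and how many were accepted.
    public static (int Attempted, int Accepted) VerifyChainTail(
        int[] tokenIds,
        int matchedPrefixLen,
        int step,
        int maxGeneratedTokens,
        Func<int, bool> verify)
    {
        var attempted = 0;
        var accepted = 0;

        for (var ci = matchedPrefixLen;
             ci < tokenIds.Length && step < maxGeneratedTokens - 1;
             ci++)
        {
            // Count the attempt only once the token is really being verified.
            attempted++;

            if (!verify(tokenIds[ci])) break; // rejection ends the chain

            accepted++;
            step++; // assumed: each accepted token consumes one generation step
        }

        return (attempted, accepted);
    }
}
```

The caller could then pass `attempted` to the heat map in place of the full remaining tail length, so chains cut short by the token budget are not penalized for tokens that were never tried.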

sharpninja and others added 2 commits April 7, 2026 12:20

sharpninja closed this Apr 8, 2026

chatgpt-codex-connector bot reviewed Apr 8, 2026

sharpninja wants to merge 2 commits into main from claude/hardcore-moore
