Add bucket recall heat map for compaction-aware speculative decoding #28

Closed
sharpninja wants to merge 2 commits into main from claude/hardcore-moore

@sharpninja
Owner

Summary

  • Add BucketRecallHeatMap that tracks per-token and per-chain attempt/accept counts during speculative decoding with a 256x256 chain transition adjacency matrix for hot-path detection
  • Hot-path-aware compaction ranking: chains on frequently-traversed sequences get a preservation bonus; isolated low-recall chains are identified for pruning
  • Binary sidecar serializer (BRHM v1 format with CRC32) persisted alongside model checkpoints and GGUF files
  • Mermaid diagram visualizations (xychart-beta bar charts and flowchart hot-path/compaction reports)
  • Configurable via BitNetOptions.EnableRecallHeatMap (default: true); when disabled, the heat map is null and every call site uses the null-conditional operator, so the only cost is a null check
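
To make the bookkeeping described above concrete, here is a minimal sketch of per-chain attempt/accept counters with a 256x256 transition matrix. It is not the PR's `BucketRecallHeatMap`: the names `RecallHeatMapSketch`, `DecoderSketch`, `RecordAttempt`, `RecordAccept`, `Recall`, and `TransitionCount` are hypothetical, and per-token counters are left out for brevity.

```csharp
#nullable enable
using System;

// Hypothetical, simplified stand-in for the heat map described in the summary.
public sealed class RecallHeatMapSketch
{
    // Assumption: chain ids fit in one byte, matching the 256x256 matrix.
    public const int MaxChains = 256;

    private readonly long[] _chainAttempts = new long[MaxChains];
    private readonly long[] _chainAccepts = new long[MaxChains];

    // Flattened adjacency matrix: row = previous chain, column = next chain.
    private readonly long[] _transitions = new long[MaxChains * MaxChains];
    private int _previousChain = -1;

    // O(1): one array increment, plus one transition increment after the first call.
    public void RecordAttempt(byte chainId)
    {
        _chainAttempts[chainId]++;
        if (_previousChain >= 0)
        {
            _transitions[_previousChain * MaxChains + chainId]++;
        }
        _previousChain = chainId;
    }

    public void RecordAccept(byte chainId) => _chainAccepts[chainId]++;

    // Recall = accepts / attempts; defined as 0 for chains never attempted.
    public double Recall(byte chainId) =>
        _chainAttempts[chainId] == 0
            ? 0.0
            : (double)_chainAccepts[chainId] / _chainAttempts[chainId];

    public long TransitionCount(byte from, byte to) =>
        _transitions[from * MaxChains + to];
}

public static class DecoderSketch
{
    // When the feature is disabled the caller passes null, and the
    // null-conditional operator turns each call into a single null check.
    public static void Step(RecallHeatMapSketch? heatMap, byte chainId, bool accepted)
    {
        heatMap?.RecordAttempt(chainId);
        if (accepted)
        {
            heatMap?.RecordAccept(chainId);
        }
    }
}
```

Flattening the matrix into one `long[]` keeps recording allocation-free. In this sketch the disabled path never constructs the arrays at all, which is one plausible reading of the "disabled via null-conditional" claim.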

Test plan

  • 24 unit tests for BucketRecallHeatMap (recording, queries, transitions, hot-paths, compaction, merge, round-trip)
  • 6 tests for BucketRecallHeatMapSerializer (round-trip, CRC32 corruption, invalid magic/version)
  • 9 tests for BucketRecallVisualizer (Mermaid output format, token/chain labels, color styling)
  • 2 benchmark integration tests (heat map enabled tracks counters, disabled yields null)
  • Full regression: 116 existing tests pass on both net9.0 and net10.0
  • Build: 0 warnings, 0 errors

🤖 Generated with Claude Code

sharpninja and others added 2 commits April 7, 2026 12:20
- Generate default BitNet model weights (.gguf) during CI build and
  include them in the BitNetSharp.Core NuGet package as content files
- Configure both Azure Pipelines and GitHub Actions to publish NuGet
  packages to https://nuget.pkg.github.com/sharpninja/index.json
- Reference McpServer variable library for GH_TOKEN authentication
  in Azure Pipelines
- Add nuget.config with GitHub Packages source and package source
  mapping to prevent dependency confusion
- Bump version to 0.6.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Introduce BucketRecallHeatMap that tracks per-token and per-chain
attempt/accept counts during speculative decoding, with a 256x256
chain transition adjacency matrix for hot-path detection. The heat
map identifies frequently-traversed chain sequences and uses them
to rank buckets for compaction — chains on hot-paths receive a
preservation bonus while isolated low-recall chains are pruned first.
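
As an illustration only, the greedy walk and the hot-path preservation bonus might look like the sketch below. The commit does not show the actual algorithm, so the stopping rules (`minCount`, `maxLength`, no revisits), the additive `bonus`, and the names `HotPathSketch`, `GreedyWalk`, and `RankForPruning` are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HotPathSketch
{
    // Greedy walk: from `start`, repeatedly follow the most frequent outgoing
    // transition to an unvisited chain. Stops when no candidate remains, the
    // best count is below `minCount`, or the path reaches `maxLength`.
    public static List<int> GreedyWalk(long[,] transitions, int start, long minCount, int maxLength)
    {
        var n = transitions.GetLength(0);
        var path = new List<int> { start };
        var visited = new HashSet<int> { start };
        var current = start;

        while (path.Count < maxLength)
        {
            var best = -1;
            long bestCount = 0;
            for (var next = 0; next < n; next++)
            {
                if (!visited.Contains(next) && transitions[current, next] > bestCount)
                {
                    best = next;
                    bestCount = transitions[current, next];
                }
            }

            if (best < 0 || bestCount < minCount)
            {
                break;
            }

            path.Add(best);
            visited.Add(best);
            current = best;
        }

        return path;
    }

    // Returns chain ids ordered from "prune first" to "keep longest".
    // Score = recall, plus a fixed bonus for chains that lie on a hot path.
    public static int[] RankForPruning(double[] recall, ISet<int> hotPathChains, double bonus)
    {
        return Enumerable.Range(0, recall.Length)
            .OrderBy(id => recall[id] + (hotPathChains.Contains(id) ? bonus : 0.0))
            .ToArray();
    }
}
```

With this scoring, a low-recall chain that sits on a frequently traversed sequence can outrank an isolated chain with slightly higher recall, which is the behavior the commit message describes.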

Key components:
- BucketRecallHeatMap: O(1) recording via long[] arrays, hot-path
  detection via greedy walk on transition matrix, compaction ranking
- BucketRecallHeatMapSerializer: binary sidecar (BRHM v1, CRC32)
- BucketRecallVisualizer: Mermaid xychart-beta and flowchart output
- Configurable via BitNetOptions.EnableRecallHeatMap (default: true)
- Persisted alongside model checkpoints and GGUF files as sidecar

41 new tests covering all public methods, serialization round-trips,
and benchmark integration (heat map enabled vs disabled).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sharpninja
Owner Author

Moving PR to Azure DevOps.

@sharpninja closed this Apr 8, 2026

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2c8bd66461

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +313 to 315
_recallHeatMap?.RecordChainAttempt(chain.ChainId, chain.TokenIds, matchedPrefixLen);
var acceptedTokensForChain = 0;
for (var ci = matchedPrefixLen; ci < chain.TokenIds.Length && step < maxGeneratedTokens - 1; ci++)

P1: Count only verified speculative tokens as attempts

RecordChainAttempt is called before speculative verification starts, and it records the entire remaining chain tail as attempted even when the verification loop runs fewer iterations (or none) due to step < maxGeneratedTokens - 1 and EOS/UNK early breaks. This inflates attempt counters with non-attempted tokens/chains, which biases recall downward and can mis-rank compaction candidates. Record attempts only for tokens that are actually verified (e.g., inside the loop or with a bounded attempted length).
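
One way to apply this suggestion is to count attempts inside the verification loop and record them once the loop exits. The sketch below is self-contained and hypothetical: `BoundedAttemptSketch`, `VerifyChain`, the `verifyToken` callback, and the `recordAttempt(chainId, attemptedCount)` callback stand in for the real decoder state and for a bounded variant of `RecordChainAttempt`, neither of which appears in the PR excerpt. It also assumes that a token which fails verification (including an EOS/UNK break) still counts as attempted.

```csharp
#nullable enable
using System;

public static class BoundedAttemptSketch
{
    // Verifies the chain tail starting at matchedPrefixLen and reports only
    // the tokens that were actually checked. Returns the accepted count.
    public static int VerifyChain(
        int chainId,
        int[] tokenIds,
        int matchedPrefixLen,
        ref int step,
        int maxGeneratedTokens,
        Func<int, bool> verifyToken,          // hypothetical: true if the model agrees
        Action<int, int>? recordAttempt)      // hypothetical: (chainId, attemptedCount)
    {
        var attempted = 0;
        var accepted = 0;

        for (var ci = matchedPrefixLen;
             ci < tokenIds.Length && step < maxGeneratedTokens - 1;
             ci++)
        {
            attempted++;                      // counted only once the loop body runs
            if (!verifyToken(tokenIds[ci]))
            {
                break;                        // rejection or EOS/UNK ends the chain
            }

            accepted++;
            step++;
        }

        // Record nothing when the loop never ran, so a chain that was never
        // verified does not dilute its recall.
        if (attempted > 0)
        {
            recordAttempt?.Invoke(chainId, attempted);
        }

        return accepted;
    }
}
```

Recording after the loop keeps the hot path to a single callback per chain, and the attempted count can never exceed the number of tokens actually verified.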
