68 commits
32b1b77
Add LazyArrayClientMetaDataProvider primitive (Phase 2.0) (#2)
jayakasadev Jun 10, 2026
94a6007
Add StackWalker primitive with frame-pointer walker (Phase 2.1) (#3)
jayakasadev Jun 10, 2026
dc2fdef
Add Sampler + SampledAlloc + SampledList + ReentrancyGuard (Phase 2.2…
jayakasadev Jun 10, 2026
eba7fcf
Add SNMALLOC_PROFILE CMake flag and CI matrix entries (Phase 3.0) (#5)
jayakasadev Jun 10, 2026
fd8c3b2
Add H1 dealloc hook and record_dealloc machinery (Phase 3.1) (#6)
jayakasadev Jun 11, 2026
202e77f
Add H2 remote dealloc hook (Phase 3.2) (#7)
jayakasadev Jun 11, 2026
a639f4a
Add alloc hook and record_alloc wiring (Phase 3.3) (#8)
jayakasadev Jun 11, 2026
d2a8f93
Add H3 + H4 dealloc edge-case hooks and integration test (Phase 3.4) …
jayakasadev Jun 11, 2026
258bbde
Add SNMALLOC_PROFILE C exports and Rust FFI foundation (Phase 4.0) (#10)
jayakasadev Jun 11, 2026
6f62e4a
Add safe Rust HeapProfile + BtSample + SnMalloc snapshot API (Phase 4…
jayakasadev Jun 11, 2026
c08c2df
Wire profile-enabled Config in rust.cc and enable live Rust sampling …
jayakasadev Jun 11, 2026
53af0fb
Add HeapProfile::write_flamegraph and accuracy integration test (Phas…
jayakasadev Jun 11, 2026
ca6145d
Add Rust symbolicator + write_flamegraph_symbolized (Phase 4.4) (#14)
jayakasadev Jun 11, 2026
945b583
Add Rust profile runtime config + env-var init (Phase 4.5) (#15)
jayakasadev Jun 11, 2026
a4ded83
Add Rust flamegraph viewer round-trip test (inferno + speedscope) (Ph…
jayakasadev Jun 11, 2026
1ac5be4
Add streaming-mode sample broadcast (Phase 5.1) (#17)
jayakasadev Jun 11, 2026
3a9fa3f
Add HeapProfile::write_pprof + pprof proto encoder (Phase 6.1) (#18)
jayakasadev Jun 11, 2026
a1583f0
Phase 7.1+7.3: cache-line align bytes_until_sample, validate zero ove…
jayakasadev Jun 11, 2026
b5c6018
Add Rust ProfilingSession + safe streaming API (Phase 5.2) (#20)
jayakasadev Jun 11, 2026
917763e
Phase 6.2: add `go tool pprof` round-trip integration test (#21)
jayakasadev Jun 11, 2026
b59c906
Phase 7.2: criterion bench suite for profiling overhead (#22)
jayakasadev Jun 11, 2026
40528fc
Phase 7.5: add Rust profiling-feature CI matrix (Linux + macOS) (#23)
jayakasadev Jun 11, 2026
4ff64df
Phase 7.4: add snapshot-under-churn stress test (#24)
jayakasadev Jun 11, 2026
b6d9ad6
Phase 8.1: document heap profiling in top-level + snmalloc-rs READMEs…
jayakasadev Jun 11, 2026
3c8914c
Add rustdoc examples to snmalloc-rs profiling API (Phase 8.2) (#26)
jayakasadev Jun 11, 2026
d506191
Fix flake in profile_accuracy::accuracy_single_threaded (#27)
jayakasadev Jun 11, 2026
58f1f55
CI: add Profile + TSan / ASan matrix entries (86aj0h864) (#28)
jayakasadev Jun 11, 2026
3a9413d
Add HeapProfile::write_pprof_gz for gzipped .pb.gz pprof output (#30)
jayakasadev Jun 11, 2026
3ae1a9e
Follow-up C: Publish profile_bench results — README <1% claim partial…
jayakasadev Jun 11, 2026
9a30e5d
Profiling perf: trim TLS work off the alloc and dealloc fast paths (8…
jayakasadev Jun 11, 2026
d65a832
Add realloc event hook for in-place resize sampling (#32)
jayakasadev Jun 11, 2026
4aed8c8
Two-stage PGO build for snmalloc (86aj0jg18) (#35)
jayakasadev Jun 11, 2026
c961038
Bundle 1+3+2 perf tweaks for the profile fast paths (86aj0jfwh) (#34)
jayakasadev Jun 11, 2026
a9e3c5f
Enable fat LTO across snmalloc-rs <-> snmalloc-sys FFI (86aj0jfz1) (#33)
jayakasadev Jun 11, 2026
f2e439f
Add Bazel build for snmalloc-rs (no bindgen); upgrade Bazel + deps (8…
jayakasadev Jun 11, 2026
cd7ccbd
Add missing upstream/cmake symlink for snmalloc-rs build (#37)
jayakasadev Jun 11, 2026
55dcd10
Verify LTO inlining of FFI thunks (86aj0kdve) (#38)
jayakasadev Jun 11, 2026
b0000f8
Wire PGO into CI matrix (86aj0kdw9) (#39)
jayakasadev Jun 11, 2026
d4c6af0
Perf bundle: bootstrap off hot path + branch hints + medium-active di…
jayakasadev Jun 11, 2026
6ceed2e
Fix -Werror=comment in profile_stress.cc (#45)
jayakasadev Jun 12, 2026
ac14f2e
Document external PMU profiling workflow (#41)
jayakasadev Jun 12, 2026
97ed046
Phase 9.0: rename USE_SNMALLOC_STATS to SNMALLOC_STATS and remove dea…
jayakasadev Jun 12, 2026
c83ddce
Phase 10.2: emit SNMALLOC_LIKELY/UNLIKELY inventory at build time (#43)
jayakasadev Jun 12, 2026
8c57af7
Phase 10.1: hot-spot table API + lookup_alloc_site reverse lookup (#44)
jayakasadev Jun 12, 2026
49655bc
Phase 9.1: FullAllocStats typed struct + C ABI + Rust binding (scaffo…
jayakasadev Jun 12, 2026
46bd61f
Phase 10.4: snmalloc-tools crate — perf record/c2c/script joiner CLI …
jayakasadev Jun 12, 2026
bb89c56
Phase 9.4: backend/chunk-level fragmentation stats (#48)
jayakasadev Jun 12, 2026
d9a13b8
Phase 9.7: runtime tunables (sample rate, decay rate, max local cache…
jayakasadev Jun 12, 2026
3c9d5a8
Phase 9.5: sample lifetime histogram (#50)
jayakasadev Jun 12, 2026
0ad5cd6
Phase 9.2: per-thread frontend cache stats (fast/slow path counters) …
jayakasadev Jun 12, 2026
2687912
Phase 9.3: per-size-class histogram (#52)
jayakasadev Jun 12, 2026
1ad21fc
Phase 9.6: text dump API (snmalloc::dump_stats + SnMalloc::dump_stats…
jayakasadev Jun 12, 2026
ff5dd41
Phase 11.3: symbolicate-aware HotSpotKey::CallSite filter (#54)
jayakasadev Jun 12, 2026
3d891eb
Phase 11.2: vendor dump_branch_hints.py into snmalloc-sys/upstream/ (…
jayakasadev Jun 12, 2026
cdbcfa4
Phase 11.4: largebuddy free-chunk histogram into FullAllocStats.reser…
jayakasadev Jun 12, 2026
262cd30
Phase 11.1: SNMALLOC_STATS=ON bench acceptance verification (#56)
jayakasadev Jun 12, 2026
5760d4d
Phase 11.5 (partial): SNMALLOC_STATS hot-path reduction via cache-lin…
jayakasadev Jun 12, 2026
b8476b7
Phase 11.6: tiered SNMALLOC_STATS (BASIC/FULL split) (#60)
jayakasadev Jun 12, 2026
b5b2180
Phase 11.7: install snmalloc as global allocator in stats tests (#59)
jayakasadev Jun 12, 2026
8cde584
Phase 11.8: batch fast_path_allocs credit at small_refill (#61)
jayakasadev Jun 12, 2026
6a25222
Phase 11.9: batch fast_path_deallocs credit at small_refill (#62)
jayakasadev Jun 12, 2026
f3ee3a1
Phase 11.10: pad backend atomics to eliminate false-sharing (#63)
jayakasadev Jun 12, 2026
337bd4d
Phase 11.11: bench-validate Phase 11.10 alignas fix (#64)
jayakasadev Jun 12, 2026
de7baa7
Phase 11.12: pack slow_path_allocs into combined counter (#65)
jayakasadev Jun 12, 2026
5b54dcd
fast_path_counters: measure dealloc delta from `before` snapshot (#66)
jayakasadev Jun 12, 2026
c8dcffa
bazel: mark fuzztest/googletest/rust toolchain as dev_dependency
jayakasadev Jun 12, 2026
6f49b4e
MODULE.bazel: mark test/toolchain deps as dev_dependency (#67)
jayakasadev Jun 12, 2026
abae9a8
need these files in cmake/:
jayakasadev Jun 12, 2026
4 changes: 4 additions & 0 deletions .bazelrc
@@ -8,4 +8,8 @@ test --test_output=streamed
build:macos --macos_minimum_os=10.15
build:macos --no@fuzztest//fuzztest:use_riegeli

# Rust integration tests (rust_test) print to stderr; keep the output
# from being suppressed so failures are diagnosable in CI.
test --test_output=errors

try-import %workspace%/fuzztest.bazelrc
2 changes: 1 addition & 1 deletion .bazelversion
@@ -1 +1 @@
-8.2.1
+8.7.0
57 changes: 57 additions & 0 deletions .github/workflows/bazel.yml
@@ -0,0 +1,57 @@
name: Bazel build

# Smoke-test that the Bazel target graph keeps working alongside the
# Cargo build. We exercise the rust_library variants and at least
# one rust_test -- enough to catch the common regressions in the
# dual-build layer.

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  bazel:
    name: bazel build + test
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      # The Bazel team officially recommends bazelisk on CI so a
      # `.bazelversion` (or the MODULE.bazel) pins the toolchain
      # rather than the system bazel.
      - name: Install bazelisk
        run: |
          sudo curl -L -o /usr/local/bin/bazel \
            https://github.com/bazelbuild/bazelisk/releases/latest/download/bazelisk-linux-amd64
          sudo chmod +x /usr/local/bin/bazel
          bazel --version

      # Cache the Bazel disk cache so subsequent runs skip the
      # rules_rust toolchain download (~150 MB) and the cmake
      # action's output. The cache key folds in MODULE.bazel.lock so
      # any dependency bump invalidates the cache rather than
      # silently reusing a stale repo set.
      - name: Cache Bazel
        uses: actions/cache@v4
        with:
          path: |
            ~/.cache/bazel
          key: bazel-${{ runner.os }}-${{ hashFiles('MODULE.bazel.lock', 'MODULE.bazel') }}-${{ github.sha }}
          restore-keys: |
            bazel-${{ runner.os }}-${{ hashFiles('MODULE.bazel.lock', 'MODULE.bazel') }}-
            bazel-${{ runner.os }}-

      - name: Bazel build :: snmalloc-rs Rust library (default)
        run: bazel build //snmalloc-rs:snmalloc_rs

      - name: Bazel build :: snmalloc-sys Rust library (default + profiling)
        run: |
          bazel build \
            //snmalloc-rs/snmalloc-sys:snmalloc_sys \
            //snmalloc-rs/snmalloc-sys:snmalloc_sys_profiling

      - name: Bazel test :: snmalloc-rs integration tests
        run: bazel test //snmalloc-rs:all
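Note on the "Cache Bazel" step in bazel.yml: the sketch below is a simplified Python model of how `actions/cache` resolves `key` and `restore-keys` (exact match on `key`, then prefix match on `key`, then each restore key in order as a prefix, newest entry winning). It ignores branch scoping and cache versions, and `CacheEntry`, `restore`, and the sample keys are hypothetical names used only for illustration. Under this model the final `bazel-<os>-` restore key can still restore an entry created under a different lockfile hash, which is worth weighing against the comment's claim that a dependency bump invalidates the cache.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CacheEntry:
    key: str
    created: int  # hypothetical creation stamp; larger means newer


def _newest(entries: Sequence[CacheEntry], prefix: str) -> Optional[CacheEntry]:
    """Return the newest entry whose key starts with `prefix`, if any."""
    matches = [e for e in entries if e.key.startswith(prefix)]
    return max(matches, key=lambda e: e.created) if matches else None


def restore(
    key: str, restore_keys: Sequence[str], entries: Sequence[CacheEntry]
) -> Optional[CacheEntry]:
    """Simplified lookup: exact key, then key as prefix, then restore keys in order."""
    for entry in entries:
        if entry.key == key:
            return entry
    hit = _newest(entries, key)
    if hit is not None:
        return hit
    for prefix in restore_keys:
        hit = _newest(entries, prefix)
        if hit is not None:
            return hit
    return None


if __name__ == "__main__":
    # Sample keys shaped like bazel-<os>-<lock hash>-<sha>; values are made up.
    entries = [
        CacheEntry("bazel-Linux-aaa-sha1", 1),
        CacheEntry("bazel-Linux-aaa-sha2", 2),
        CacheEntry("bazel-Linux-bbb-sha3", 3),
    ]
    # Lockfile hash changed to "ccc": only the broad fallback key matches.
    hit = restore(
        "bazel-Linux-ccc-sha9", ["bazel-Linux-ccc-", "bazel-Linux-"], entries
    )
    print(hit.key if hit else "cache miss")
```

If reusing a cache across lockfile changes is not wanted, dropping the last restore key would be one way to get the behavior the comment describes, at the cost of a cold cache after every dependency bump.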
107 changes: 107 additions & 0 deletions .github/workflows/main.yml
@@ -83,6 +83,18 @@ jobs:
build-type: Release
extra-cmake-flags: "-DSNMALLOC_TRACING=On"
build-only: true
- os: "ubuntu-24.04"
variant: "Profile Build (gcc)"
build-type: Release
extra-cmake-flags: "-DSNMALLOC_PROFILE=ON"
build-only: true
- os: "ubuntu-24.04"
variant: "Profile Build (clang)"
build-type: Release
extra-cmake-flags: >-
-DCMAKE_CXX_COMPILER=clang++
-DSNMALLOC_PROFILE=ON
build-only: true
- os: "ubuntu-22.04"
variant: "clang libstdc++ (Build only)"
build-type: Release
@@ -125,6 +137,33 @@ jobs:
dependencies: "sudo apt install -y ninja-build libc++-dev"
test-exclude-pattern: "memcpy|external_pointer"
test-extra-args: "--repeat-until-fail 2"
# Profile + TSan: exercise the heap-profiling code paths
# (perf-profile_stress + func-profile_*) under ThreadSanitizer.
# Uses libc++ because TSan requires a TSan-instrumented C++
# runtime; libstdc++ is not instrumented on Ubuntu. The
# `-R profile_` ctest filter restricts the run to profile
# tests so the sanitizer overhead stays within the CI budget.
- os: "ubuntu-24.04"
variant: "Profile + TSan (clang)"
build-type: "Debug"
extra-cmake-flags: >-
-DSNMALLOC_PROFILE=ON
-DSNMALLOC_SANITIZER=thread
-DCMAKE_CXX_COMPILER=clang++
-DCMAKE_CXX_FLAGS=-stdlib="libc++ -g"
dependencies: "sudo apt install -y ninja-build libc++-dev"
test-extra-args: "-R profile_"
# Profile + ASan: exercise the heap-profiling code paths
# under AddressSanitizer. ASan is compatible with libstdc++,
# so no extra runtime dependency is needed beyond ninja.
- os: "ubuntu-24.04"
variant: "Profile + ASan (clang)"
build-type: "Debug"
extra-cmake-flags: >-
-DSNMALLOC_PROFILE=ON
-DSNMALLOC_SANITIZER=address
-DCMAKE_CXX_COMPILER=clang++
test-extra-args: "-R profile_"
uses: ./.github/workflows/reusable-cmake-build.yml
with:
os: ${{matrix.os}}
@@ -190,6 +229,11 @@ jobs:
build-type: Release
extra-cmake-flags: "-DSNMALLOC_ENABLE_PAC=ON"
variant: "PAC"
# Profile build with heap profiling support enabled
- os: "macos-15"
build-type: Release
extra-cmake-flags: "-DSNMALLOC_PROFILE=ON"
variant: "Profile Build (clang)"
uses: ./.github/workflows/reusable-cmake-build.yml
with:
os: ${{matrix.os}}
@@ -472,6 +516,68 @@ jobs:
cd ${{github.workspace}}/build
ctest --parallel --output-on-failure

# ============================================================================
# Profile + PGO (clang) — two-stage profile-guided optimization build
#
# Runs scripts/run-pgo-build.sh end-to-end: stage 1 builds an
# instrumented snmalloc + func-profile_overhead-fast, executes it to
# populate .profraw data, merges via llvm-profdata, and stage 2
# rebuilds with -fprofile-use=<merged.profdata>. The use-stage
# libsnmallocshim-rust.a is uploaded as a release artifact so
# downstream consumers (snmalloc-rs and friends) can pick up the
# PGO-optimized static archive on every push to main.
#
# macOS is intentionally skipped — the matrix has limited macOS
# minutes and the AppleClang/Xcode profraw format is pinned per OS
# image, which would force re-merge across runner upgrades. Run
# scripts/run-pgo-build.sh locally on macOS.
#
# LLVM 19 matches the COMPILER_RT_LLVM_VERSION env at the top of
# this file and the coverage.yml job, so llvm-profdata's raw-profile
# format is consistent across CI legs.
# ============================================================================
  pgo:
    name: Profile + PGO (clang)
    runs-on: ubuntu-24.04
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
      - name: Install clang-19 + llvm-19 + ninja
        run: |
          sudo apt-get update
          sudo apt-get install -y ninja-build clang-19 llvm-19
      - name: Run two-stage PGO build
        env:
          # Route stage artifacts to absolute paths under the runner
          # workspace so the upload-artifact step below can find them
          # regardless of where the script's repo_root resolves to.
          CC: clang-19
          CXX: clang++-19
          PGO_STAGE1_DIR: ${{ github.workspace }}/build-pgo-gen
          PGO_STAGE2_DIR: ${{ github.workspace }}/build-pgo-use
          PGO_PROFILE_DATA_DIR: ${{ github.workspace }}/build-pgo-gen/pgo-data
          PGO_PROFILE_FILE: ${{ github.workspace }}/build-pgo-gen/pgo.profdata
          # SNMALLOC_RUST_SUPPORT=ON materializes libsnmallocshim-rust.a
          # under the use-stage build directory; that file is the
          # uploaded artifact below. Use CMake-provided clang names so
          # the configure step does not fall back to system gcc.
          PGO_EXTRA_CMAKE_FLAGS: >-
            -G Ninja
            -DSNMALLOC_RUST_SUPPORT=ON
            -DCMAKE_C_COMPILER=clang-19
            -DCMAKE_CXX_COMPILER=clang++-19
        run: scripts/run-pgo-build.sh
      - name: Verify PGO artifact
        run: |
          ls -l "${{ github.workspace }}/build-pgo-use/libsnmallocshim-rust.a"
      - name: Upload PGO artifact (libsnmallocshim-rust.a)
        uses: actions/upload-artifact@v4
        with:
          name: pgo-libsnmallocshim-rust-linux-x64
          path: ${{ github.workspace }}/build-pgo-use/libsnmallocshim-rust.a
          if-no-files-found: error
          retention-days: 14

# ============================================================================
# vcpkg integration
# ============================================================================
@@ -557,6 +663,7 @@ jobs:
qemu-crossbuild,
windows,
format,
pgo,
vcpkg-integration
]
runs-on: ubuntu-24.04
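Note on the `pgo` job in main.yml: the comment block there describes a two-stage flow (instrumented build, workload run, `llvm-profdata` merge, rebuild with `-fprofile-use`). The sketch below lists what a generic clang PGO sequence of that shape might look like as command lines. It is not the contents of `scripts/run-pgo-build.sh`, which this diff does not show; `pgo_plan` and its parameters are hypothetical, the workload path inside the build directory is assumed, and passing a directory to `llvm-profdata merge` is assumed to work with the installed LLVM. Nothing is executed; the function only returns the commands.

```python
from pathlib import PurePosixPath
from typing import List


def pgo_plan(
    src: PurePosixPath,
    gen_dir: PurePosixPath,
    use_dir: PurePosixPath,
    raw_dir: PurePosixPath,
    merged: PurePosixPath,
    workload: str,
) -> List[List[str]]:
    """Return command lines for a generic two-stage clang PGO build."""
    return [
        # Stage 1: configure with instrumentation; .profraw files go to raw_dir.
        ["cmake", "-S", str(src), "-B", str(gen_dir), "-G", "Ninja",
         f"-DCMAKE_CXX_FLAGS=-fprofile-generate={raw_dir}"],
        ["cmake", "--build", str(gen_dir)],
        # Run a representative workload built in stage 1 (path is assumed).
        [str(gen_dir / workload)],
        # Merge the raw profiles into a single indexed profile.
        ["llvm-profdata", "merge", "-o", str(merged), str(raw_dir)],
        # Stage 2: reconfigure and rebuild using the merged profile.
        ["cmake", "-S", str(src), "-B", str(use_dir), "-G", "Ninja",
         f"-DCMAKE_CXX_FLAGS=-fprofile-use={merged}"],
        ["cmake", "--build", str(use_dir)],
    ]


if __name__ == "__main__":
    plan = pgo_plan(
        PurePosixPath("."),
        PurePosixPath("build-pgo-gen"),
        PurePosixPath("build-pgo-use"),
        PurePosixPath("build-pgo-gen/pgo-data"),
        PurePosixPath("build-pgo-gen/pgo.profdata"),
        "func-profile_overhead-fast",
    )
    for cmd in plan:
        print(" ".join(cmd))
```

The directory and file names in the usage mirror the `PGO_STAGE1_DIR`, `PGO_STAGE2_DIR`, `PGO_PROFILE_DATA_DIR`, and `PGO_PROFILE_FILE` values set in the job's `env` block, relative to the workspace.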
44 changes: 44 additions & 0 deletions .github/workflows/rust.yml
@@ -70,6 +70,50 @@ jobs:
- name: Run tests
run: cargo test ${{ matrix.release.flag }} --all ${{ matrix.features.args }}

# ============================================================================
# Heap-profiling feature build (Phase 7.5)
#
# Exercises the `profiling` cargo feature (which propagates
# SNMALLOC_PROFILE=ON to the C++ build via snmalloc-sys) on every push.
# Restricted to Linux + macOS because the profile code paths are validated
# there in the C++ matrix; Windows profile coverage can be added later if
# needed.
# ============================================================================
  profiling:
    runs-on: ${{ matrix.os }}
    name: "profiling-${{ matrix.os }}-${{ matrix.release.name }}"
    defaults:
      run:
        shell: bash
        working-directory:
          ./snmalloc-rs
    strategy:
      matrix:
        os: [ubuntu-latest, macos-14, macos-15]
        rust: [stable]
        release:
          - name: release
            flag: "--release"
          - name: debug
            flag: ""
      fail-fast: false
    steps:
      - uses: actions-rs/toolchain@v1
        with:
          toolchain: ${{ matrix.rust }}
      - name: Checkout
        uses: actions/checkout@v4
      - name: update dependency
        run: |
          if bash -c 'uname -s | grep 'Linux' >/dev/null'; then
            sudo apt-get update -y && sudo apt-get --reinstall install -y libc6-dev
          fi
        shell: bash
      - name: Build (profiling)
        run: cargo build ${{ matrix.release.flag }} --verbose --features profiling
      - name: Run tests (profiling)
        run: cargo test ${{ matrix.release.flag }} --all --features profiling

publish-scan:
runs-on: ubuntu-latest
name: publish-scan
5 changes: 5 additions & 0 deletions .gitignore
@@ -27,3 +27,8 @@

# rust target
/target

# bazel convenience symlinks (created in the workspace root by `bazel
# build` / `bazel test`). The actual outputs live under the user's
# bazel cache so the symlinks are pure noise on commit.
/bazel-*
33 changes: 32 additions & 1 deletion BUILD.bazel
@@ -8,6 +8,7 @@ filegroup(
"src/test/*.h",
"src/test/*.cc",
"CMakeLists.txt",
"cmake/**/*.cmake",
],
),
visibility = ["//visibility:private"],
@@ -39,7 +40,7 @@ CMAKE_FLAGS = {
"SNMALLOC_OPTIMISE_FOR_CURRENT_MACHINE": "ON",
"SNMALLOC_USE_SELF_VENDORED_STL": "OFF",
"SNMALLOC_IPO": "ON",
-"USE_SNMALLOC_STATS": "ON",
+"SNMALLOC_STATS": "ON",
"SNMALLOC_BUILD_TESTING": "OFF",
} | select({
":release_with_debug": {"CMAKE_BUILD_TYPE": "RelWithDebInfo"},
@@ -87,6 +88,36 @@ cmake(
out_static_libs = [
"libsnmallocshim-static.a",
"libsnmalloc-new-override.a",
"libsnmallocshim-rust.a",
],
postfix_script = "ninja",
visibility = ["//visibility:public"],
)

# Profile-enabled variant of the Rust shim archive. Same source set as
# `:snmalloc-rs` but with SNMALLOC_PROFILE=ON so the `sn_rust_profile_*`
# exports in `rust.cc` switch from the no-op stubs to real bodies. Used
# by the `snmalloc_sys_profiling` Rust target.
cmake(
name = "snmalloc-rs-profile",
cache_entries = CMAKE_FLAGS | {
"SNMALLOC_RUST_SUPPORT": "ON",
"SNMALLOC_PROFILE": "ON",
},
generate_args = ["-G Ninja"],
lib_source = ":srcs",
out_shared_libs = select({
"@bazel_tools//src/conditions:darwin": [
"libsnmallocshim-checks-memcpy-only.dylib",
"libsnmallocshim-checks.dylib",
"libsnmallocshim.dylib",
],
"//conditions:default": [],
}),
out_static_libs = [
"libsnmallocshim-static.a",
"libsnmalloc-new-override.a",
"libsnmallocshim-rust.a",
],
postfix_script = "ninja",
visibility = ["//visibility:public"],