Skip to content

MOD-15578 Track shared SVS thread pool memory & expose it through public API#972

Open
dor-forer wants to merge 16 commits into
mainfrom
dor-forer-MOD-15578-track-svs-thpool-memory
Open

MOD-15578 Track shared SVS thread pool memory & expose it through public API#972
dor-forer wants to merge 16 commits into
mainfrom
dor-forer-MOD-15578-track-svs-thpool-memory

Conversation

@dor-forer
Copy link
Copy Markdown
Collaborator

@dor-forer dor-forer commented May 26, 2026

The shared VecSimSVSThreadPool singleton was previously created via raw new with the default allocator, so its slot vector and per-slot ThreadSlot objects bypassed VecSimAllocator and were invisible to any memory accounting downstream (FT.INFO, INFO MODULES, etc.).

This PR:

  1. Routes the shared SVS thread pool's allocations through a dedicated VecSimAllocator (using VecsimSTLAllocator for the slot vector and std::allocate_shared for thread objects).
  2. Adds a new public API size_t VecSim_GetSharedMemory(void) returning the total bytes of process-wide VecSim allocations not tied to any single index — today equal to the shared SVS thread pool's tracked allocation size.
  3. Surfaces these bytes in VECSIM_INFO via one new field:
    • SHARED_MEMORY — appended at the top level of every algorithm's debug response by the C API wrapper VecSimIndex_DebugInfoIterator. Always present (value may be 0).

Note on a single field. An earlier revision also emitted a per-contributor SHARED_SVS_THREADPOOL_MEMORY breakdown inside SVS sections. Since the SVS thread pool is the only process-wide contributor today, that field always reported the exact same bytes as SHARED_MEMORY and carried no extra information, so it was dropped (YAGNI). If a second global contributor is ever added, an itemized breakdown can be reintroduced then with real differentiation.


Public API change

Before

// (no API to query process-wide VecSim memory)

After

// Returns total bytes of process-wide VecSim allocations not tied to any single
// index (today: the shared SVS thread pool singleton). Returns 0 when no such
// shared allocations exist.
size_t VecSim_GetSharedMemory(void);

Callers (e.g. RediSearch) can fold this into per-spec or process-wide memory metrics without depending on which algorithm contributes.


VECSIM_INFO (FT.DEBUG) output change

Common header (every algorithm, unchanged)

ALGORITHM, TYPE, DIMENSION, METRIC, IS_MULTI_VALUE, IS_DISK,
INDEX_SIZE, INDEX_LABEL_COUNT, MEMORY, LAST_SEARCH_MODE

FLAT — 11 → 12 fields

  <common header × 10>
  BLOCK_SIZE
+ SHARED_MEMORY

HNSW — 18 → 19 fields

  <common header × 10>
  BLOCK_SIZE, M, EF_CONSTRUCTION, EF_RUNTIME, MAX_LEVEL, ENTRYPOINT,
  EPSILON, NUMBER_OF_MARKED_DELETED
+ SHARED_MEMORY

SVS (non-tiered) — 25 → 26 fields

  <common header × 10>
  BLOCK_SIZE, QUANT_BITS, ALPHA, GRAPH_MAX_DEGREE, CONSTRUCTION_WINDOW_SIZE,
  MAX_CANDIDATE_POOL_SIZE, PRUNE_TO, USE_SEARCH_HISTORY, NUM_THREADS,
  LAST_RESERVED_NUM_THREADS, NUMBER_OF_MARKED_DELETED, SEARCH_WINDOW_SIZE,
  SEARCH_BUFFER_CAPACITY, LEANVEC_DIMENSION, EPSILON
+ SHARED_MEMORY

Tiered HNSW — 16 → 17 fields

  <common header × 10>           (ALGORITHM = "TIERED")
  MANAGEMENT_LAYER_MEMORY, BACKGROUND_INDEXING, TIERED_BUFFER_LIMIT
  FRONTEND_INDEX = nested [<FLAT fields, 11>]      (no SHARED_MEMORY in nested)
  BACKEND_INDEX  = nested [<HNSW fields, 18>]      (no SHARED_MEMORY in nested)
  TIERED_HNSW_SWAP_JOBS_THRESHOLD
+ SHARED_MEMORY

Tiered SVS — 18 → 19 fields

  <common header × 10>           (ALGORITHM = "TIERED")
  MANAGEMENT_LAYER_MEMORY, BACKGROUND_INDEXING, TIERED_BUFFER_LIMIT
  FRONTEND_INDEX = nested [<FLAT fields, 11>]      (no SHARED_MEMORY in nested)
  BACKEND_INDEX  = nested [<SVS fields, 25>]       (no SHARED_MEMORY in nested)
  TIERED_SVS_TRAINING_THRESHOLD, TIERED_SVS_UPDATE_THRESHOLD,
  TIERED_SVS_THREADS_RESERVE_TIMEOUT
+ SHARED_MEMORY

Emission rule

SHARED_MEMORY is appended exactly once, at the outermost iterator level, by the C API wrapper VecSimIndex_DebugInfoIterator. It never appears inside a nested FRONTEND_INDEX/BACKEND_INDEX.


Stats / API output change

VecSim_GetSharedMemory()

Before: API did not exist. The pool's slot vector and per-slot ThreadSlot objects went through the default allocator and were not tracked anywhere.

After:

size_t bytes = VecSim_GetSharedMemory();
// == VecSimSVSThreadPool::getSharedAllocationSize()
//    (slot vector + per-slot ThreadSlot objects, allocated through the
//     dedicated VecSimAllocator).
// == 0 before any SVS index has been constructed and the worker pool size set.

Per-index getAllocationSize() (SVS only)

Before: Did not include any per-index portion of the parallelism slot, since the pool was untracked entirely.

After: Each SVS index's per-index allocator now tracks its own parallelism slot (a small fixed-size structure inside the index, allocated through the index's VecSimAllocator). The shared pool itself remains process-wide and reported via VecSim_GetSharedMemory().


Tests

  • Added SVSTest.debugInfoSharedMemoryMatchesApi (typed test, runs per SVS data type) — asserts the SHARED_MEMORY field appears exactly once in a non-tiered SVS response and reports the same bytes as VecSim_GetSharedMemory().
  • Added SVSTest.sharedMemoryTracksThreadPoolResize (typed test) — asserts the reported shared memory actually grows when the pool is resized up and shrinks when resized down, so the value tracks real allocation rather than being a constant that happens to agree with the API.
  • Updated compareFlatIndexInfoToIterator, compareHNSWIndexInfoToIterator, compareSVSIndexInfoToIterator to take an expect_shared_memory parameter (default true) — needed because these comparators are called both with the C API iterator (top level, has SHARED_MEMORY) and as nested-backend comparators inside compareTieredIndexInfoToIterator (no SHARED_MEMORY).
  • Expected field counts (C++ method, no SHARED_MEMORY): FLAT 11, HNSW 18, SVS 25, TIERED_HNSW 16, TIERED_SVS 18. The top-level comparators add +1 for the SHARED_MEMORY field that the C API wrapper appends.
  • getFlatFields(), getHNSWFields(), getTieredHNSWFields(), getSVSFields(), getTieredSVSFields() updated to append SHARED_MEMORY at the top level.

Compatibility

  • Wire-compatible — adds one new field at a well-defined position in VECSIM_INFO. Existing consumers parsing by field name are unaffected; consumers indexing by position must shift their expectations accordingly (covered above).
  • Source-compatibleVecSim_GetSharedMemory() is purely additive.
  • Behavioral parity — for callers that don't read the new field/API, total tracked memory is now strictly larger by exactly the bytes the shared SVS thread pool was already consuming but not reporting.

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

Supersedes #967 — ownership transferred to @dor-forer. Original branch authored by @meiravgri; identical commits cherry-pushed to dor-forer-MOD-15578-track-svs-thpool-memory so a non-self reviewer can be assigned.

@jit-ci
Copy link
Copy Markdown

jit-ci Bot commented May 26, 2026

🛡️ Jit Security Scan Results

CRITICAL HIGH MEDIUM

✅ No security findings were detected in this PR


Security scan by Jit

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves VecSim memory accounting by routing the shared SVS thread pool singleton allocations through a tracked allocator and exposing that process-wide (non-index) memory via a new public C API and additional VECSIM_INFO fields.

Changes:

  • Added VecSim_GetGlobalMemory() and appended GLOBAL_MEMORY to the top-level debug info iterator returned by VecSimIndex_DebugInfoIterator.
  • Tracked shared SVS thread pool allocations and exposed them via SHARED_SVS_THREADPOOL_MEMORY in SVS debug info.
  • Updated unit-test comparators/field-order expectations and added a test to enforce the invariant between GLOBAL_MEMORY, SHARED_SVS_THREADPOOL_MEMORY, and VecSim_GetGlobalMemory().

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/unit/unit_test_utils.h Extends debug-info comparator signatures to optionally expect GLOBAL_MEMORY.
tests/unit/unit_test_utils.cpp Updates debug-info comparators and expected field orders to include GLOBAL_MEMORY / SHARED_SVS_THREADPOOL_MEMORY.
tests/unit/test_svs.cpp Adds a typed test validating global-memory vs shared-threadpool-memory invariants.
tests/unit/test_svs_tiered.cpp Updates SVS threadpool wrapper construction to pass a VecSimAllocator.
tests/unit/test_svs_threadpool.cpp Updates threadpool wrapper tests to pass an allocator and resets shared pool state in setup.
src/VecSim/vec_sim.h Adds the new public API declaration VecSim_GetGlobalMemory().
src/VecSim/vec_sim.cpp Implements VecSim_GetGlobalMemory() and appends GLOBAL_MEMORY in the C API debug iterator wrapper.
src/VecSim/vec_sim_common.h Clarifies that per-index memory excludes process-wide/global allocations.
src/VecSim/utils/vec_utils.h Adds new debug field name constants for GLOBAL_MEMORY / SHARED_SVS_THREADPOOL_MEMORY.
src/VecSim/utils/vec_utils.cpp Defines the new debug field name strings.
src/VecSim/index_factories/svs_factory.cpp Accounts for per-index threadpool wrapper allocation in SVS initial size estimation.
src/VecSim/algorithms/svs/svs.h Constructs the per-index SVS threadpool wrapper with the index allocator and adds SHARED_SVS_THREADPOOL_MEMORY to debug info.
src/VecSim/algorithms/svs/svs_utils.h Introduces a tracked allocator for the shared pool, tracks allocations, and changes wrapper ctor to allocate parallelism via index allocator.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/VecSim/index_factories/svs_factory.cpp Outdated
Comment thread src/VecSim/utils/vec_utils.h Outdated
Comment thread tests/unit/unit_test_utils.cpp Outdated
Comment thread tests/unit/unit_test_utils.cpp Outdated
@dor-forer dor-forer force-pushed the dor-forer-MOD-15578-track-svs-thpool-memory branch from 31c0e0a to 0adca87 Compare May 26, 2026 14:51
Comment thread src/VecSim/algorithms/svs/svs_utils.h Outdated
The doc string claimed the field is exposed only in SVS tiered indexes,
but `SVSIndex::debugInfoIterator()` emits it for flat SVS indexes too
(and inside the BACKEND_INDEX section of tiered SVS).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 26, 2026

Codecov Report

❌ Patch coverage is 95.45455% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.08%. Comparing base (bbe9dfd) to head (a923fa7).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/VecSim/algorithms/svs/svs_utils.h 92.30% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #972      +/-   ##
==========================================
+ Coverage   96.99%   97.08%   +0.09%     
==========================================
  Files         130      141      +11     
  Lines        7793     8144     +351     
==========================================
+ Hits         7559     7907     +348     
- Misses        234      237       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

* Drop the unconditional `#include "VecSim/algorithms/svs/svs_utils.h"`
  at the top of `unit_test_utils.cpp` (added in 3c2023a). The header
  pulls in `<svs/core/distance.h>`, `<svs/index/vamana/dynamic_index.h>`,
  etc. — none of which are available when `USE_SVS=OFF`. The
  HAVE_SVS-guarded include lower in the file already provides the symbols
  `validateSVSIndexAttributesInfo` needs.
* Wrap `compareSVSIndexInfoToIterator` (definition + declaration in the
  header) and its call site inside `chooseCompareIndexInfoToIterator`
  with `#if HAVE_SVS`. The function references `VecSimSVSThreadPool`,
  which is undefined when SVS is disabled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dor-forer dor-forer force-pushed the dor-forer-MOD-15578-track-svs-thpool-memory branch from 0adca87 to 9d082f0 Compare May 26, 2026 15:23
- svs_factory.cpp: mark unused 'p' variable with [[maybe_unused]] to
  silence -Wunused-variable under -Werror (Copilot review).
- vec_utils.h: rewrite SHARED_SVS_THREADPOOL_MEMORY_STRING doc comment
  to match actual emission behavior — emitted by SVSIndex::debugInfoIterator()
  for both non-tiered (top-level) and tiered (BACKEND_INDEX) SVS responses
  (Copilot review).
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated no new comments.

Comment thread src/VecSim/vec_sim.h Outdated
* (e.g. the shared SVS thread pool singleton). 0 if no such allocations
* have been made.
*/
size_t VecSim_GetGlobalMemory(void);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problematic name, implies that it also includes the per-index memory

// Capacity hint for the iterator. Must equal the number of addInfoField()
// calls below (1 for ALGORITHM + 9 from addCommonInfoToIterator + 16 SVS-specific).
// Update this number when fields are added or removed.
size_t numberOfInfoFields = 26;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one pair was added no? How come it increased by 3?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually +1 for the new field and +2 because the old 23 was already wrong.
Before the change the real count was 25 (1 ALGORITHM + 9 from addCommonInfoToIterator + 15 SVS-specific), but the comment just said "update this when needed" and it looks like nobody had.
So adding the shared-memory field bumped the correct count to 26, not 24.
I added a little breakdown in the comment so it doesn't drift again. Worth noting it's only a reserve() hint, not a hard size, so the stale value wasn't actually causing any bug — just an extra realloc — which is probably why it slipped by.

Comment thread tests/unit/test_svs.cpp
Comment on lines +3416 to +3418
VecSimSVSThreadPool::resize(1);
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a validation that the tracked memory actually increased/decreased as expected (rather than just that both APIs return the same value) ?

Copy link
Copy Markdown
Collaborator Author

@dor-forer dor-forer Jun 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a new test sharedMemoryTracksThreadPoolResize that specifically validates directional tracking: it asserts that VecSim_GetSharedMemory() increases when the pool grows and decreases when it shrinks, ensuring the accounting is live and not just a static value that happens to match both APIs

Comment thread src/VecSim/algorithms/svs/svs_utils.h Outdated
std::shared_ptr<VecSimAllocator> allocator_; // pool's own allocator for memory tracking
mutable std::mutex pool_mutex_;
std::vector<std::shared_ptr<ThreadSlot>> slots_;
std::vector<SlotPtr, SlotVecAllocator> slots_;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't using vecsim_stl::vector simplify the whole thing?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — slots_ is now vecsim_stl::vector<SlotPtr> (constructed with the pool's allocator), which removed the explicit SlotVecAllocator alias entirely.

dor-forer added a commit that referenced this pull request Jun 1, 2026
…size

- Rename VecSim_GetGlobalMemory -> VecSim_GetSharedMemory and the
  GLOBAL_MEMORY field -> SHARED_MEMORY; "global" wrongly implied it
  included per-index memory. Doc now states it excludes per-index memory.
- Simplify VecSimSVSThreadPoolImpl::slots_ to vecsim_stl::vector<SlotPtr>,
  dropping the SlotVecAllocator alias and the explicit allocator spelling
  in the ctor; add the now-needed vecsim_stl.h include.
- Fix the SVS debugInfoIterator field count (was a stale 23; true count
  was 25, now 26 with the new shared-memory field) and pin a breakdown
  comment so it can't drift again.
- Add SVSTest.sharedMemoryTracksThreadPoolResize: assert shared memory
  grows when the pool grows and shrinks when it shrinks, not just that
  the SHARED_MEMORY / SHARED_SVS_THREADPOOL_MEMORY / VecSim_GetSharedMemory
  readouts agree.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread tests/unit/test_svs.cpp Outdated
…size

- Rename VecSim_GetGlobalMemory -> VecSim_GetSharedMemory and the
  GLOBAL_MEMORY field -> SHARED_MEMORY; "global" wrongly implied it
  included per-index memory. Doc now states it excludes per-index memory.
- Simplify VecSimSVSThreadPoolImpl::slots_ to vecsim_stl::vector<SlotPtr>,
  dropping the SlotVecAllocator alias and the explicit allocator spelling
  in the ctor; add the now-needed vecsim_stl.h include.
- Fix the SVS debugInfoIterator field count (was a stale 23; true count
  was 25, now 26 with the new shared-memory field) and pin a breakdown
  comment so it can't drift again.
- Add SVSTest.sharedMemoryTracksThreadPoolResize: assert shared memory
  grows when the pool grows and shrinks when it shrinks, not just that
  the SHARED_MEMORY / SHARED_SVS_THREADPOOL_MEMORY / VecSim_GetSharedMemory
  readouts agree.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dor-forer dor-forer force-pushed the dor-forer-MOD-15578-track-svs-thpool-memory branch from 2f308d1 to 7c8b871 Compare June 1, 2026 11:57
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 7c8b871. Configure here.

Comment thread src/VecSim/algorithms/svs/svs.h
dor-forer and others added 4 commits June 1, 2026 15:04
…field

SVSIndex::debugInfoIterator() reserved 26 fields, but the C API wrapper
VecSimIndex_DebugInfoIterator appends one more (SHARED_MEMORY) after the
method returns, forcing a reallocation on the top-level path. Bump the
capacity hint to 27 and document the +1. Addresses Cursor Bugbot finding.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
compareTieredIndexInfoToIterator expects the top-level SHARED_MEMORY field
that VecSimIndex_DebugInfoIterator (C API) appends, but the tiered tests fed
it a C++-method iterator (tiered_index->debugInfoIterator()) which omits that
field, causing an off-by-one field-count failure (HNSW tiered 16 vs 17, SVS
tiered 18 vs 19). Obtain the iterator via the C API like the non-tiered tests
and the real RediSearch consumer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lead with 27 = 26 + 1 so the comment no longer reads as a 26-vs-27 mismatch
against numberOfInfoFields. No functional change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dor-forer dor-forer requested a review from alonre24 June 1, 2026 13:36
const char *VecSimCommonStrings::SHARED_SVS_THREADPOOL_MEMORY_STRING =
"SHARED_SVS_THREADPOOL_MEMORY";

const char *VecSimCommonStrings::SHARED_MEMORY_STRING = "SHARED_MEMORY";
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need both SHARED_MEMORY and SHARED_SVS_THREADPOOL_MEMORY_STRING ? is there any difference between them?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — there's no real difference today, so I collapsed them into the single generic SHARED_MEMORY field (a923fa7).

The original two-field design came from the ticket spec (umbrella GLOBAL_MEMORY/SHARED_MEMORY + per-contributor SHARED_SVS_THREADPOOL_MEMORY breakdown). But the SVS thread pool is the only process-wide contributor today, so both always reported the exact same bytes — the breakdown carried no extra information. YAGNI: dropped SHARED_SVS_THREADPOOL_MEMORY and kept only SHARED_MEMORY + VecSim_GetSharedMemory().

If a second global contributor is ever added, an itemized breakdown can be reintroduced then with real differentiation.

Address PR #972 review (alonre24): the SVS-specific
SHARED_SVS_THREADPOOL_MEMORY field always reported the same bytes as the
generic top-level SHARED_MEMORY field, since the SVS thread pool is the
only process-wide contributor today. The duplicate breakdown carried no
extra information, so drop it (YAGNI) and keep only the generic
SHARED_MEMORY field + VecSim_GetSharedMemory() API.

If a second global contributor is added later, an itemized breakdown can
be reintroduced then with real differentiation.

* svs.h: remove the SHARED_SVS_THREADPOOL_MEMORY field emission in
  SVSIndex::debugInfoIterator(); capacity hint 27 -> 26 (15 SVS-specific
  fields instead of 16).
* vec_utils.{h,cpp}: drop SHARED_SVS_THREADPOOL_MEMORY_STRING.
* test_svs.cpp: rename debugInfoSharedMemoryEqualsSharedSVSThreadPoolMemory
  -> debugInfoSharedMemoryMatchesApi; assert SHARED_MEMORY appears once
  and equals VecSim_GetSharedMemory().
* unit_test_utils.cpp: SVS field count 26 -> 25; drop the comparator
  branch and getSVSFields() entry for the removed field.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dor-forer dor-forer requested a review from alonre24 June 3, 2026 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants