
feat: ONNX Runtime embedding integration for conductor-knowledge (ADR-023)#783

Merged

amiable-dev merged 7 commits into main from feat/onnx-embeddings on Mar 26, 2026

Conversation

amiable-dev (Owner) commented Mar 26, 2026

Summary

Wire real ONNX Runtime embeddings into the conductor-knowledge crate, replacing the stub_embed placeholder. When the onnx feature is enabled and the all-MiniLM-L6-v2 model file is present, embed() produces 384-dim semantic embeddings; otherwise it gracefully falls back to tiled hash-based stubs (384-dim when onnx is compiled in, 64-dim without).

Changes

  • New embedder.rs module: Feature-gated ONNX backend with OnceLock<Mutex<Session>> for lazy, thread-safe model loading. Mean pooling + L2 normalization with output shape validation.
  • Dependencies: ort 2.0.0-rc.12 (with ndarray feature), tokenizers 0.21, ndarray 0.17 — all optional behind onnx feature. dirs also optional.
  • Bundled tokenizer.json (466KB): Compile-time fallback from HuggingFace all-MiniLM-L6-v2
  • Replaced all stub_embed() calls: retrieval.rs, lib.rs, index.rs, device_profiles.rs now use embedder::embed()
  • Public API: embed(), embed_dim(), is_onnx_loaded() re-exported from lib.rs
  • embedding_dim() helper on KnowledgeIndex: validates dimension consistency, rejects empty embeddings
  • Dimension-stable fallback: when onnx compiled but model absent, returns tiled 384-dim stubs matching embed_dim()
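The "mean pooling + L2 normalization" step described above can be sketched as follows. This is an illustrative stand-in, not the PR's code: it pools plain `Vec<f32>` token vectors, whereas the real module operates on ndarray views of ONNX Runtime output.

```rust
/// Mean-pool per-token embeddings ([seq_len] x [dim]) into one vector,
/// then L2-normalize it. Illustrative sketch of the pipeline described
/// in the PR; the real code works on ONNX Runtime output tensors.
fn mean_pool_normalize(token_embeddings: &[Vec<f32>]) -> Vec<f32> {
    let dim = token_embeddings.first().map_or(0, |t| t.len());
    let mut pooled = vec![0.0f32; dim];
    // Sum each dimension across tokens.
    for token in token_embeddings {
        for (p, x) in pooled.iter_mut().zip(token) {
            *p += *x;
        }
    }
    // Divide by token count (mean pooling).
    let n = token_embeddings.len().max(1) as f32;
    for p in &mut pooled {
        *p /= n;
    }
    // L2-normalize so cosine similarity reduces to a dot product.
    let norm: f32 = pooled.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for p in &mut pooled {
            *p /= norm;
        }
    }
    pooled
}
```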

Key Design

  • embed() never returns errors — always produces a normalized vector, falls back silently
  • Model loaded from $CONDUCTOR_DATA_DIR/knowledge/ or ~/.conductor/knowledge/
  • ONNX output validated against expected shape [1, seq_len, 384]
  • Daemon has zero transitive dependency on ort/tokenizers (verified via cargo tree)
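The never-error contract with lazy, cached loading can be sketched with `std::sync::OnceLock`. `Backend` and `load_backend` here are hypothetical stand-ins for the real ort session wrapper; the sketch simulates the model-absent case so the stub path is exercised.

```rust
use std::sync::{Mutex, OnceLock};

const EMBED_DIM: usize = 384;
const STUB_DIM: usize = 64;

// Hypothetical stand-in for the real ONNX session wrapper.
struct Backend;
impl Backend {
    fn embed(&self, _text: &str) -> Option<Vec<f32>> {
        Some(vec![0.0; EMBED_DIM]) // real code runs ONNX inference here
    }
}

// Lazy, thread-safe, load-once storage; Err is cached too (see review below).
static EMBEDDER: OnceLock<Result<Mutex<Backend>, String>> = OnceLock::new();

fn load_backend() -> Result<Mutex<Backend>, String> {
    // Real code probes the model directory; simulate "model absent" here.
    Err("no model file".to_string())
}

/// Never returns an error: ONNX result when available, stub otherwise.
fn embed(text: &str) -> Vec<f32> {
    if let Ok(backend) = EMBEDDER.get_or_init(load_backend) {
        if let Some(v) = backend.lock().unwrap().embed(text) {
            return v;
        }
    }
    // Deterministic hash-style stub fallback, L2-normalized.
    let mut v = vec![0.0f32; STUB_DIM];
    for (i, b) in text.bytes().enumerate() {
        v[i % STUB_DIM] += b as f32;
    }
    let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}
```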

Test plan

  • 23 existing tests pass (stub fallback)
  • 6 new embedder tests (dimension, normalization, consistency)
  • cargo check --package conductor-knowledge (no feature) passes
  • cargo check --package conductor-knowledge --features onnx passes
  • cargo tree -p conductor-daemon | grep ort — empty (daemon isolation)
  • #[ignore] ONNX tests: require model file on disk

🤖 Generated with Claude Code

Add embedder.rs module with feature-gated ONNX backend:
- ort 2.0.0-rc.12 + tokenizers 0.21 + ndarray 0.17 behind `onnx` feature
- OnceLock<Mutex<Session>> for lazy, thread-safe model loading
- Mean pooling + L2 normalization for all-MiniLM-L6-v2 (384-dim)
- Graceful fallback to 64-dim hash-based stub when model unavailable
- Bundle tokenizer.json (466KB) from HuggingFace for compile-time fallback

Replace all stub_embed() calls with embedder::embed():
- retrieval.rs, lib.rs, index.rs, device_profiles.rs
- Add embedding_dim() helper to KnowledgeIndex
- Re-export embed, embed_dim, is_onnx_active from lib.rs

23 existing tests pass + 6 new embedder tests.
Daemon has zero transitive dependency on ort/tokenizers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 26, 2026 11:26
Copilot AI (Contributor) left a comment

Pull request overview

This PR wires a real embedding backend into conductor-knowledge, adding a feature-gated ONNX Runtime path (all-MiniLM-L6-v2, 384-dim) while keeping a stub embedding fallback for development/CI. It updates retrieval/indexing call sites to use the new unified embedder API and adds an index helper for embedding dimension introspection.

Changes:

  • Added embedder.rs with ONNX Runtime + tokenizer-based embeddings behind the onnx feature, with normalized stub fallback.
  • Replaced prior stub_embed() usage across the knowledge layer with embedder::embed() and re-exported embedder APIs from lib.rs.
  • Added KnowledgeIndex::embedding_dim() and updated dependencies/features to support optional ONNX embedding.

Reviewed changes

Copilot reviewed 7 out of 9 changed files in this pull request and generated 8 comments.

Summary per file:

  • conductor-knowledge/src/embedder.rs: New embedder module with ONNX backend + stub fallback and tests.
  • conductor-knowledge/src/retrieval.rs: Uses embed() for query and test chunk embeddings.
  • conductor-knowledge/src/lib.rs: Exposes embedder module + re-exports embedder API.
  • conductor-knowledge/src/index.rs: Removes old stub_embed() and adds embedding_dim() helper.
  • conductor-knowledge/src/device_profiles.rs: Updates tests to use embed() for chunk embeddings.
  • conductor-knowledge/Cargo.toml: Adds onnx feature and optional deps for ONNX backend (plus dirs).
  • Cargo.toml: Adds workspace dependency versions for ort/tokenizers/ndarray.
  • Cargo.lock: Locks new transitive dependencies introduced by the ONNX stack.

Comment on lines +91 to +93
/// Get the embedding dimension of chunks in this index (None if empty).
pub fn embedding_dim(&self) -> Option<usize> {
self.chunks.first().map(|c| c.embedding.len())
Copilot AI Mar 26, 2026

KnowledgeIndex::embedding_dim() returns the first chunk's embedding length, but it doesn't detect mixed-dimension indices. Since cosine similarity returns 0.0 when lengths differ, a partially-mismatched index can fail silently. Consider either validating on add_chunk (reject/mutate mismatched embeddings) or having embedding_dim() return None when not all chunks share the same dimension.

Suggested change
/// Get the embedding dimension of chunks in this index.
///
/// Returns:
/// - `None` if the index is empty, or if not all chunks share the same
///   embedding dimension.
/// - `Some(dim)` when all chunks use the same embedding dimension.
pub fn embedding_dim(&self) -> Option<usize> {
    let first_dim = self.chunks.first()?.embedding.len();
    if self.chunks.iter().all(|c| c.embedding.len() == first_dim) {
        Some(first_dim)
    } else {
        None
    }
}
Comment on lines +25 to 33
pub mod embedder;
pub mod index;
pub mod provider_tuning;
pub mod retrieval;

// Re-export key types for convenience
pub use community::{CommunityProfile, L3Status};
pub use embedder::{embed, embed_dim, is_onnx_active};
pub use index::{ChunkMetadata, KnowledgeChunk, KnowledgeIndex};
Copilot AI Mar 26, 2026

The crate-level module docs at the top of this file still say ONNX integration is "not yet wired" and that the in-process implementation uses stub embeddings. Now that embedder is a public module and embed/embed_dim/is_onnx_active are re-exported, please update those docs to reflect the actual ONNX+fallback behavior so they don't mislead users of the public API.

min_similarity: f32,
) -> RetrievedContext {
let query_embedding = stub_embed(query);
let query_embedding = embed(query);
Copilot AI Mar 26, 2026

The retrieve function’s doc comment still describes embedding the query via the stub embedder, but the implementation now calls embed() (which may use ONNX when available). Please update the comment to describe the feature-gated ONNX embedding with stub fallback to avoid misleading readers.

Comment on lines +227 to +234
/// Returns the current embedding dimension.
pub fn embed_dim() -> usize {
if is_onnx_active() {
EMBED_DIM
} else {
STUB_DIM
}
}
Copilot AI Mar 26, 2026

embed_dim() is based on is_onnx_active(), which only becomes true after the embedder has been initialized (i.e., after an embed() call). With the model present, embed_dim() can therefore report STUB_DIM even though the next embed() call will produce 384-dim vectors. Consider initializing/probing the backend inside is_onnx_active()/embed_dim() (or documenting that callers must call embed() once before relying on these).

Comment on lines +35 to +38
// Safety: Mutex<Session> provides thread-safe mutable access
unsafe impl Send for OnnxEmbedder {}
unsafe impl Sync for OnnxEmbedder {}

Copilot AI Mar 26, 2026

unsafe impl Send/Sync for OnnxEmbedder is risky because it can mask ort::Session thread-affinity / thread-safety requirements. Prefer to avoid unsafe here: either rely on Session's actual auto-traits, or redesign to keep a per-thread session (thread_local) / create sessions per call if required. If unsafe is truly required, add a detailed safety justification referencing ort's guarantees.

Suggested change: delete the safety comment and both `unsafe impl Send for OnnxEmbedder {}` / `unsafe impl Sync for OnnxEmbedder {}` lines entirely.
Comment on lines +209 to +211
if let Some(v) = onnx_backend::try_embed(text) {
return v;
}
Copilot AI Mar 26, 2026

embed() falls back to the 64-dim stub whenever try_embed() returns None, including transient inference failures after the ONNX model has successfully loaded. That can create dimension mismatches (e.g., index built with 384-dim vectors, query embedded to 64-dim) which makes cosine similarity return 0 for all chunks. Consider making the fallback dimension-stable (e.g., once ONNX is active, return a 384-dim zero/placeholder vector on inference error, or treat ONNX errors as disabling ONNX for the process) so callers never see mixed dimensions.

Suggested change
if let Some(v) = onnx_backend::try_embed(text) {
    return v;
}
// ONNX is compiled in but this particular inference failed.
// If the model has previously loaded successfully, return a
// deterministic EMBED_DIM-sized placeholder vector so callers
// never see mixed embedding dimensions in a single process.
if onnx_backend::is_active() {
    let mut v = vec![0.0f32; EMBED_DIM];
    // Unit vector in the first dimension to keep the vector normalized.
    if !v.is_empty() {
        v[0] = 1.0;
    }
    return v;
}
// ONNX is not active (feature disabled or model unavailable),
// so fall back to the 64-dim hash-based stub.
Comment on lines +159 to +172
/// Resolve model directory from environment or platform config.
fn resolve_model_dir() -> Option<PathBuf> {
if let Ok(dir) = std::env::var("CONDUCTOR_DATA_DIR") {
let p = PathBuf::from(dir).join("knowledge");
if p.exists() {
return Some(p);
}
}
if let Some(config) = dirs::config_dir() {
let p = config.join("midimon").join("knowledge");
if p.exists() {
return Some(p);
}
}
Copilot AI Mar 26, 2026

Model files are large runtime assets; elsewhere in the repo runtime assets are generally stored under a data directory (e.g., dirs::data_dir().join("midimon")...). resolve_model_dir() currently probes dirs::config_dir()/midimon/knowledge, which is an unusual location for model binaries. Consider switching this probe to dirs::data_dir() (or aligning with an established app data path) to match existing conventions and avoid placing large files under config.
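A minimal sketch of the probe order the reviewer discusses (explicit env override first, then a per-user path). Simplified relative to the crate: it falls back to `$HOME/.conductor/knowledge` directly (matching the PR description) instead of going through the `dirs` crate.

```rust
use std::path::PathBuf;

/// Resolve the model directory: $CONDUCTOR_DATA_DIR/knowledge first,
/// then ~/.conductor/knowledge. Simplified sketch; the real code uses
/// the `dirs` crate for the platform data directory.
fn resolve_model_dir() -> Option<PathBuf> {
    if let Ok(dir) = std::env::var("CONDUCTOR_DATA_DIR") {
        let p = PathBuf::from(dir).join("knowledge");
        if p.exists() {
            return Some(p);
        }
    }
    if let Ok(home) = std::env::var("HOME") {
        let p = PathBuf::from(home).join(".conductor").join("knowledge");
        if p.exists() {
            return Some(p);
        }
    }
    None
}
```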

Comment on lines 9 to +16
[dependencies]
serde = { version = "1", features = ["derive"] }
dirs = { workspace = true }

# ONNX Runtime — added when real embedding work begins.
# Currently stub_embed() is used. See ADR-023 D2.
# ort = { version = "2.0.0-rc.12", optional = true }
# ONNX Runtime embedding (ADR-023 D2) — behind `onnx` feature flag
ort = { workspace = true, optional = true }
tokenizers = { workspace = true, optional = true }
ndarray = { workspace = true, optional = true }
Copilot AI Mar 26, 2026

dirs is added as a non-optional dependency, but it is only used by the ONNX backend (which is feature-gated). If the goal is to keep the default (non-onnx) build lightweight, consider making dirs optional and including it in the onnx feature so it isn't built/pulled in for stub-only builds.
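The reviewer's suggestion would correspond to feature wiring roughly like this (a sketch; the actual workspace entries and feature list may differ from what landed):

```toml
[dependencies]
serde = { version = "1", features = ["derive"] }
dirs = { workspace = true, optional = true }
ort = { workspace = true, optional = true }
tokenizers = { workspace = true, optional = true }
ndarray = { workspace = true, optional = true }

[features]
# Pull in dirs only when the ONNX backend actually needs it.
onnx = ["dep:ort", "dep:tokenizers", "dep:ndarray", "dep:dirs"]
```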

- Remove unsafe Send/Sync impls (ort::Session already Send+Sync)
- Dimension-stable fallback: if ONNX active but inference fails, return
  384-dim unit vector instead of 64-dim stub (prevents index mismatch)
- embedding_dim(): validate all chunks share same dimension
- resolve_model_dir: use data_dir (not config_dir) for model files
- Make dirs dependency optional (only pulled with onnx feature)
- Update doc comments in lib.rs, retrieval.rs to reflect ONNX+fallback

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 7 out of 9 changed files in this pull request and generated 6 comments.

Comment on lines +297 to +300
#[test]
fn test_is_onnx_active_without_model() {
// Without a model file present, should be false
assert!(!is_onnx_active() || cfg!(feature = "onnx"));
Copilot AI Mar 26, 2026

This test doesn’t actually validate the behavior it describes when the onnx feature is enabled: !is_onnx_active() || cfg!(feature = "onnx") always passes under onnx. Consider either gating the test to #[cfg(not(feature = "onnx"))] or setting CONDUCTOR_DATA_DIR to a temp directory containing a knowledge/ folder without a model, then triggering initialization and asserting is_onnx_active() is false.

Suggested change
#[cfg(not(feature = "onnx"))]
#[test]
fn test_is_onnx_active_without_model() {
    // Without a model file present and without the `onnx` feature, should be false
    assert!(!is_onnx_active());
}
Comment on lines +9 to +11
//! and L3 community profile stubs. When the `onnx` feature is enabled and the
//! all-MiniLM-L6-v2 model is present, produces 384-dim semantic embeddings.
//! Otherwise falls back to 64-dim hash-based stubs.
Copilot AI Mar 26, 2026

The crate-level docs still state that InProcessKnowledge uses “stub embeddings” (later in this module doc), but retrieval now routes through embed() which can use ONNX when enabled. Please update the remaining doc sentence(s) to reflect the new behavior so the documentation is consistent.

Comment on lines +233 to +237
/// Returns the current embedding dimension.
pub fn embed_dim() -> usize {
if is_onnx_active() {
EMBED_DIM
} else {
Copilot AI Mar 26, 2026

embed_dim() relies on is_onnx_active(), which only becomes true after the first successful ONNX initialization (triggered by embed()). If a caller queries embed_dim() before calling embed(), it will report STUB_DIM even when the model is present, which can lead to dimension mismatches in callers that preallocate/check dimensions. Consider making embed_dim() reflect the configured/available backend (e.g., attempt lazy init or a cheap availability check) or documenting that it only reflects the currently-initialized backend.

Suggested change
/// Returns the embedding dimension for the configured backend.
///
/// When built with the `onnx` feature enabled, this returns [`EMBED_DIM`]
/// (the ONNX model dimension) regardless of whether the ONNX backend has
/// already been initialized. When built without `onnx`, this returns
/// [`STUB_DIM`] (the hash-based stub dimension).
pub fn embed_dim() -> usize {
    #[cfg(feature = "onnx")]
    {
        EMBED_DIM
    }
    #[cfg(not(feature = "onnx"))]
    {
        STUB_DIM
    }
}
Comment on lines +251 to +255
if norm > 0.0 {
for v in &mut embedding {
*v /= norm;
}
}
Copilot AI Mar 26, 2026

embed() claims to always return a normalized vector, but the stub path returns an all-zero vector for empty input (no bytes => norm=0 => no normalization). Consider ensuring the stub always returns a unit-length vector (e.g., special-case empty text / zero-norm output) so the API guarantee holds.

Suggested change
if norm <= 0.0 {
    // For empty input (or any zero-norm case), return a deterministic
    // unit vector to uphold the "always normalized" API contract.
    if let Some(first) = embedding.get_mut(0) {
        *first = 1.0;
    }
    return embedding;
}
for v in &mut embedding {
    *v /= norm;
}
Comment on lines +210 to +214
// If ONNX loaded but this inference failed, return a dimension-stable
// placeholder to avoid mixing 384-dim and 64-dim vectors in the same index.
if onnx_backend::is_active() {
let mut v = vec![0.0f32; EMBED_DIM];
v[0] = 1.0; // unit vector to stay normalized
Copilot AI Mar 26, 2026

When ONNX is active but a single inference fails, embed() returns the same constant unit vector (v[0]=1.0) for all failures. This makes all such embeddings identical, which can severely degrade retrieval quality and make failures hard to detect. Consider using a deterministic fallback that preserves dimensionality but still varies by input (e.g., a hash-based embedding expanded to 384 dims), or at least logging/telemetry in the failure path.

Suggested change
// If ONNX loaded but this inference failed, derive a deterministic,
// input-varying 384-dim fallback from the hash-based stub to avoid
// mixing 384-dim and 64-dim vectors in the same index.
if onnx_backend::is_active() {
    let stub = stub_embed_internal(text);
    let mut v = vec![0.0f32; EMBED_DIM];
    // Tile the stub embedding to fill EMBED_DIM.
    for i in 0..EMBED_DIM {
        v[i] = stub[i % stub.len()];
    }
    // Re-normalize to unit length to preserve API guarantees.
    let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    return v;
}
Comment on lines +181 to +185
let result = EMBEDDER.get_or_init(|| match resolve_model_dir() {
Some(dir) => OnnxEmbedder::load(&dir).map(Arc::new),
None => Err("No model directory found".to_string()),
});
match result {
Copilot AI Mar 26, 2026

The OnceLock caches initialization errors permanently (get_or_init stores an Err), so if the model directory/file appears later (e.g., user downloads the model while the app is running), ONNX will never become active until restart. If hot-install is a desired UX, consider not caching failures (retry until success), or caching failures with a timestamp/backoff so initialization can be retried.
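A retry-capable alternative can be sketched by caching only successes in a `Mutex<Option<...>>` behind the `OnceLock`. `Embedder` and `try_load` are hypothetical stand-ins, not the PR's code:

```rust
use std::sync::{Arc, Mutex, OnceLock};

// Hypothetical stand-in for the loaded model session.
struct Embedder;

// The OnceLock holds a mutable slot rather than a Result, so a load
// failure is never cached permanently.
static EMBEDDER: OnceLock<Mutex<Option<Arc<Embedder>>>> = OnceLock::new();

fn try_load() -> Option<Arc<Embedder>> {
    // Real code would probe the model directory here; simulate "absent".
    None
}

/// Returns the embedder, retrying the load on every call until it succeeds.
/// Only a successful load is cached, so a model installed after startup
/// becomes visible without a process restart.
fn get_embedder() -> Option<Arc<Embedder>> {
    let slot = EMBEDDER.get_or_init(|| Mutex::new(None));
    let mut guard = slot.lock().unwrap();
    if guard.is_none() {
        *guard = try_load(); // failure is not cached; retried next call
    }
    guard.as_ref().map(Arc::clone)
}
```

A timestamped backoff could be added to the slot if probing the filesystem on every call is too costly.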

amiable-dev and others added 2 commits March 26, 2026 11:51
- ONNX failure fallback: tile stub to 384-dim (input-varying, not constant)
- embed_dim(): returns EMBED_DIM at compile time with onnx feature
  (no dependency on runtime init state)
- Stub: return unit vector on empty input (zero-norm guarantee)
- Test: gate is_onnx_active test to #[cfg(not(feature = "onnx"))]
- OnceLock: document that init failures are cached (restart required)
- Docs: update remaining "stub embeddings" references

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When onnx feature is enabled but model is absent, embed() now returns
tiled 384-dim stub vectors (not 64-dim) to stay consistent with
embed_dim(). Fixes CI test_embed_dim_consistent failure on Linux.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 7 out of 9 changed files in this pull request and generated 5 comments.

if norm > 0.0 {
for x in v.iter_mut() {
*x /= norm;
}
Copilot AI Mar 26, 2026

The public API/docs state embeddings are always normalized, but the ONNX path’s normalize() leaves the vector unchanged when norm == 0.0, which can return a zero vector (not unit length). To uphold the contract, handle the zero-norm case similarly to the stub path (e.g., return a deterministic unit vector).

Suggested change
if norm > 0.0 {
    for x in v.iter_mut() {
        *x /= norm;
    }
} else if !v.is_empty() {
    // Deterministic fallback for zero-norm vectors: use a fixed unit basis vector.
    // This upholds the public contract that embeddings are always normalized.
    for x in v.iter_mut() {
        *x = 0.0;
    }
    v[0] = 1.0;
}
//! the `onnx` feature flag (not yet wired — uses stub embeddings).
//! and L3 community profile stubs. When the `onnx` feature is enabled and the
//! all-MiniLM-L6-v2 model is present, produces 384-dim semantic embeddings.
//! Otherwise falls back to 64-dim hash-based stubs.
Copilot AI Mar 26, 2026

Crate-level docs say it “falls back to 64-dim hash-based stubs”, but when the crate is compiled with the onnx feature and the model is missing, embed() currently returns a 384-dim tiled stub to keep embed_dim() stable. Please align this doc comment with the actual behavior (or adjust the implementation if runtime-dynamic dimensions are intended).

Suggested change
//! When `onnx` is enabled but the model is missing, returns 384-dim tiled
//! hash-based stubs to keep [`embed_dim`] stable. With `onnx` disabled, falls
//! back to 64-dim hash-based stubs.
# Change to `onnx = ["dep:ort"]` when wiring real ONNX integration.
onnx = []
# Enable ONNX Runtime for real semantic embeddings (all-MiniLM-L6-v2, 384-dim).
# When disabled (default), stub_embed() hash-based vectors are used.
Copilot AI Mar 26, 2026

The feature comment still refers to stub_embed() even though the stub embedder is now internal to embedder.rs and the public API is embed(). Updating the comment will avoid confusion for feature users.

Suggested change
# When disabled (default), embed() uses an internal stub hash-based embedder.
Comment on lines +4 to +9
//! Embedding backend for the knowledge layer (ADR-023).
//!
//! When the `onnx` feature is enabled and the all-MiniLM-L6-v2 model file is
//! present on disk, produces 384-dim semantic embeddings via ONNX Runtime.
//! Otherwise falls back to a 64-dim hash-based stub (development/CI).
//!
Copilot AI Mar 26, 2026

The module-level docs say the fallback is a “64-dim hash-based stub”, but when built with the onnx feature and the model is missing/inference fails, embed() returns a tiled stub of length EMBED_DIM (384) to keep embed_dim() consistent. Please update the docs to match the actual runtime behavior (or change the fallback behavior if 64-dim is the intended contract).

Comment on lines +109 to +112
if shape.len() < 3 {
return Err("Unexpected output shape".to_string());
}
let hidden_dim = shape[2];
Copilot AI Mar 26, 2026

OnnxEmbedder::embed() does not validate that the model output hidden size matches the expected EMBED_DIM (384). If the wrong model is present (or an updated ONNX export changes output dims), embed() can return a vector whose length disagrees with embed_dim(), breaking callers’ assumptions. Consider checking the output shape (including shape[1] == seq_len and shape[2] == EMBED_DIM) and returning an error so the public embed() can fall back to the stub.

Suggested change
if shape.len() != 3 {
    return Err(format!(
        "Unexpected output rank: {:?} (expected 3D [1, seq_len, {}])",
        shape,
        super::EMBED_DIM
    ));
}
let batch = shape[0];
let out_seq_len = shape[1];
let hidden_dim = shape[2];
if batch != 1 || out_seq_len != seq_len || hidden_dim != super::EMBED_DIM {
    return Err(format!(
        "Unexpected output shape: {:?} (expected [1, {}, {}])",
        shape, seq_len, super::EMBED_DIM
    ));
}
- Validate ONNX output shape matches [1, seq_len, EMBED_DIM] — reject
  wrong model early so fallback kicks in
- normalize(): return unit basis vector on zero-norm (upholds contract)
- Align all docs: lib.rs, Cargo.toml, embedder.rs describe tiled stub
  fallback when onnx compiled but model missing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 7 out of 9 changed files in this pull request and generated 1 comment.

///
/// When the `onnx` feature is enabled and the model is available,
/// produces a 384-dim vector using all-MiniLM-L6-v2.
/// Otherwise falls back to a 64-dim hash-based stub.
Copilot AI Mar 26, 2026

The rustdoc for embed() says the fallback is a 64-dim stub, but when the crate is built with the onnx feature and the model is missing/inference fails, the function returns a 384-dim tiled stub (tiled_stub(text, EMBED_DIM)). Please update this doc comment to reflect the actual fallback behavior so API consumers understand the returned vector length under each case.

Suggested change
/// When `onnx` is enabled but the model is missing or inference fails,
/// returns a 384-dim tiled hash-based stub to keep [`embed_dim`] stable.
/// With `onnx` disabled, falls back to a 64-dim hash-based stub.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 7 out of 9 changed files in this pull request and generated 2 comments.

Comment on lines +93 to +99
pub fn embedding_dim(&self) -> Option<usize> {
let first_dim = self.chunks.first()?.embedding.len();
if self.chunks.iter().all(|c| c.embedding.len() == first_dim) {
Some(first_dim)
} else {
None
}
Copilot AI Mar 26, 2026

KnowledgeIndex::embedding_dim() can return Some(0) when chunks have empty embeddings (e.g., chunker.rs and device_profiles.rs build chunks with embedding: Vec::new() placeholders). That makes the helper misleading for “dimension consistency checks”; consider returning None if first_dim == 0 or if any chunk has an empty embedding, so callers don’t treat an unbuilt index as having a valid dimension.
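The extra guard the reviewer asks for is a small extension of the helper, sketched here on a minimal stand-in type (`Index` holding bare `Vec<Vec<f32>>` chunks, not the real `KnowledgeIndex`):

```rust
// Minimal stand-in for KnowledgeIndex (illustrative only).
struct Index {
    chunks: Vec<Vec<f32>>, // each inner Vec is one chunk's embedding
}

impl Index {
    /// Some(dim) only when every chunk has the same non-zero dimension;
    /// None for an empty index, unbuilt (empty-embedding) chunks, or
    /// mixed dimensions.
    fn embedding_dim(&self) -> Option<usize> {
        let first_dim = self.chunks.first()?.len();
        if first_dim == 0 {
            return None; // placeholder chunks with empty embeddings
        }
        if self.chunks.iter().all(|c| c.len() == first_dim) {
            Some(first_dim)
        } else {
            None
        }
    }
}
```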

pub fn is_onnx_active() -> bool {
#[cfg(feature = "onnx")]
{
return onnx_backend::is_active();
Copilot AI Mar 26, 2026

is_onnx_active() currently only checks whether the OnceLock has already been initialized (EMBEDDER.get()), so it will return false until the first embed() call happens—even if the model is present and would load successfully. If this is meant as a capability/availability probe, consider triggering initialization (e.g., get_or_init/try_embed internally) or rename it to something like is_onnx_loaded() to match the current semantics.

Suggested change
// Trigger lazy initialization by attempting a minimal embedding.
// `try_embed` returns `Some(_)` only when the ONNX backend is usable.
onnx_backend::try_embed("").is_some()
- embedding_dim(): return None when first_dim == 0 (unbuilt chunks)
- Rename is_onnx_active → is_onnx_loaded to reflect lazy-init semantics
- Update doc: only reflects state after first embed() call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 7 out of 9 changed files in this pull request and generated 1 comment.


// Re-export key types for convenience
pub use community::{CommunityProfile, L3Status};
pub use embedder::{embed, embed_dim, is_onnx_loaded};
Copilot AI Mar 26, 2026

PR description mentions a public is_onnx_active() API, but the crate actually re-exports is_onnx_loaded. Please align the PR description with the implemented API name, or rename the function/re-export so the public surface matches what’s documented for consumers.

@amiable-dev amiable-dev merged commit 5f99b30 into main Mar 26, 2026
15 checks passed