Feature/generic ensemble #366
Open

skywardfire1 wants to merge 3 commits into smartcorelib:development

Conversation
…Tree, RandomForest and kNN only. 7 new methods, a lot of tests
Codecov Report ❌ Patch coverage is

Additional details and impacted files

@@           Coverage Diff            @@
##        development    #366    +/-  ##
===============================================
- Coverage     45.59%   44.24%    -1.35%
===============================================
  Files            93       96        +3
  Lines          8034     8190      +156
===============================================
- Hits           3663     3624       -39
- Misses         4371     4566      +195

☔ View full report in Codecov by Sentry.
Contributor (Author):

Around 10 days of work. I'll fix 2 failing builds soon.

Collaborator:

wow this is great! thanks. it would be nice to have also #365 fixed with this so we can bump to v0.5.0
This PR implements what I call the Generic Ensemble Subsystem.

"What and why, Mr. Anderson?"

It lets a user build their own custom ensemble models. More than that, since members are stored as boxed `dyn` predictors, a user can even combine models of different kinds!

In my own project I only use 18 kNNs, but I was interested in building a universal ensemble, so here is my attempt.

And here we come to two limitations.

Alright, with that being said...
🔑 Key Features

🔄 Heterogeneous predictor ensembles: mix KNN, Random Forest, and Decision Tree (currently the only models with first-class support). (Almost) any type implementing `Predictor<X, Y>` can be a member.

⚖️ Two voting strategies, uniform or weighted: simple majority or confidence-based aggregation. Switch strategies at runtime with `set_voting_strategy()`; weights are validated on insertion.

🎛️ Dynamic enable/disable of members at runtime: toggle models without retraining. Useful for A/B testing, fallback logic, or excluding underperforming models on-the-fly. My own idea!

🏷️ Metadata (descriptions, tags, ...): document and organize your ensemble. Attach human-readable notes and group models by tags. I have no idea if anyone will use it, but implementing this was too fun and easy.

⚖️ Set weights at any time: adjust voting influence with `set_weight()`.

✂️ Feature slicing via `predict_using_names()`: different inputs per model. Train models on disjoint feature subsets and combine predictions — ideal for multi-view learning. Again, this was crucial in my project, which is why I added it to Smartcore, though I'm unsure how broadly useful it is.

📊 Built-in scoring: quick accuracy evaluation with `score()`. Equivalent to `accuracy(y, predict(x))` — just for being more sklearn-ish.

Documentation
📦 Model Management

🔄 Heterogeneous ensembles: mix KNN, Random Forest, Decision Tree, SVM, or any custom model implementing `Predictor<X, Y>`. No common base class required — trait-based composition.

🎯 Three ways to add models (3 public methods total for model management): `add(model)`, `add_named(name, model)`, `add_with_params(name?, model, weight?, desc?, tags?)`.

🏷️ Rich metadata: attach descriptions, tags, and voting weights to each member. Query voting weight via `weight(name)`.

⚙️ Dynamic runtime control: enable/disable individual models without retraining via `enable()`, `disable()`, `enabled()`. Perfect for A/B testing, fallback logic, or excluding underperformers on-the-fly.

🗳️ Voting Strategies
⚖️ Uniform or Weighted voting: simple majority or confidence-based aggregation. Switch at runtime with `set_voting_strategy()`.

🛡️ Rust-style strictness in Weighted mode.

🔧 Weight management: set or update weights anytime via `set_weight()`. Weights are validated on insertion and on strategy switch.

🔮 Prediction & Evaluation
| Method | Input |
| --- | --- |
| `predict(&x)` | one `X` for all models |
| `predict_using_names(&HashMap<String, X>)` | per-model `X` via name |
| `score(&x, &y) -> f64` | `X` + labels `Y` |

📊 Built-in scoring: `score()` returns accuracy in `[0.0, 1.0]` — equivalent to `accuracy(y, predict(x))`, but convenient for cross-validation loops and hyperparameter tuning.

✅ Type-safe predictions: all models in an ensemble must share the same `X: Array2<f64>` and `Y: Array1<i32> + Clone`, enforced at compile time via generics + `PhantomData`.

🧰 Introspection & Utilities
🔍 Ensemble state: `names()`, `len()`, `is_empty()`, `strategy()`, `get_ensemble_info()` — query structure and configuration anytime.

🏷️ Metadata queries: `weight(name)` — get voting weight for a member.

🔄 Strategy switching: `set_voting_strategy()` validates all weights when switching to `Weighted`, ensuring consistency.

📚 Usage Guide: From Simple to Advanced
🎯 Scenario 1: The "Just Works" Way (3 lines)
Want to keep it easy? No problem! Just do:
✅ That's it. No weights, no names, no config.
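The PR's own snippet isn't visible in this scrape, so here is a minimal self-contained sketch of the idea under the hood: members stored as `Box<dyn Predictor>`, uniform majority voting. The `Predictor` trait and `Ensemble` struct below are toy stand-ins defined locally, and `AlwaysZero`/`Threshold` are made-up dummy models, not anything from smartcore.

```rust
// Toy version of the "just works" flow: add boxed models, then predict.
trait Predictor {
    fn predict(&self, x: &[f64]) -> i32;
}

// Two dummy models of *different* types, to show heterogeneity.
struct AlwaysZero;
struct Threshold(f64);

impl Predictor for AlwaysZero {
    fn predict(&self, _x: &[f64]) -> i32 { 0 }
}

impl Predictor for Threshold {
    fn predict(&self, x: &[f64]) -> i32 { if x[0] > self.0 { 1 } else { 0 } }
}

struct Ensemble {
    members: Vec<Box<dyn Predictor>>,
}

impl Ensemble {
    fn new() -> Self { Ensemble { members: Vec::new() } }
    fn add(&mut self, m: Box<dyn Predictor>) { self.members.push(m); }

    // Uniform voting: simple majority (ties break toward class 1 in this toy).
    fn predict(&self, x: &[f64]) -> i32 {
        let ones = self.members.iter().filter(|m| m.predict(x) == 1).count();
        if 2 * ones >= self.members.len() { 1 } else { 0 }
    }
}

fn demo() -> i32 {
    let mut e = Ensemble::new();
    e.add(Box::new(AlwaysZero));      // votes 0
    e.add(Box::new(Threshold(0.5)));  // 0.6 > 0.5, votes 1
    e.add(Box::new(Threshold(0.2)));  // 0.6 > 0.2, votes 1
    e.predict(&[0.6])
}

fn main() {
    println!("ensemble vote: {}", demo());
}
```

Because the members are trait objects, the ensemble never needs to know the concrete model types, which is exactly what makes mixing KNN with a Random Forest possible.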
🎯 Scenario 2: Name Your Models
Use `add_named()` when you want explicit control and better observability in your ensemble. Meaningful names make it easier to:

🎯 Scenario 3: Control Voting — Full Lifecycle

Step-by-step: Uniform → assign weights → switch to Weighted.
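Since the original code block didn't survive the scrape, here is a runnable toy walking through that lifecycle. The method names (`add_named`, `set_weight`, `set_voting_strategy`) come from this PR's API summary; everything else, including the local `Ensemble` struct where each member carries just a fixed vote, is an illustrative stand-in.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum Voting { Uniform, Weighted }

// Toy member: a fixed class vote plus an optional voting weight.
struct Member { vote: i32, weight: Option<f64> }

struct Ensemble { members: HashMap<String, Member>, strategy: Voting }

impl Ensemble {
    fn new() -> Self { Ensemble { members: HashMap::new(), strategy: Voting::Uniform } }

    fn add_named(&mut self, name: &str, vote: i32) {
        self.members.insert(name.to_string(), Member { vote, weight: None });
    }

    fn set_weight(&mut self, name: &str, w: f64) -> Result<(), String> {
        if w <= 0.0 { return Err("weight must be positive".into()); } // validated on insertion
        self.members.get_mut(name).ok_or("unknown member")?.weight = Some(w);
        Ok(())
    }

    // Rust-style strictness: refuse Weighted unless every member has a weight.
    fn set_voting_strategy(&mut self, s: Voting) -> Result<(), String> {
        if s == Voting::Weighted && self.members.values().any(|m| m.weight.is_none()) {
            return Err("all members need weights before switching to Weighted".into());
        }
        self.strategy = s;
        Ok(())
    }

    fn predict(&self) -> i32 {
        // Positive score -> class 1; Uniform counts every member as weight 1.0.
        let score: f64 = self.members.values().map(|m| {
            let w = match self.strategy {
                Voting::Uniform => 1.0,
                Voting::Weighted => m.weight.unwrap(),
            };
            if m.vote == 1 { w } else { -w }
        }).sum();
        if score > 0.0 { 1 } else { 0 }
    }
}

fn lifecycle_demo() -> (i32, i32) {
    let mut e = Ensemble::new();
    e.add_named("knn", 1);
    e.add_named("tree", 0);
    e.add_named("forest", 0);
    let uniform = e.predict(); // majority (2 of 3) votes 0
    assert!(e.set_voting_strategy(Voting::Weighted).is_err()); // no weights yet: rejected
    e.set_weight("knn", 5.0).unwrap();
    e.set_weight("tree", 1.0).unwrap();
    e.set_weight("forest", 1.0).unwrap();
    e.set_voting_strategy(Voting::Weighted).unwrap();
    let weighted = e.predict(); // knn's weight now dominates
    (uniform, weighted)
}

fn main() { println!("{:?}", lifecycle_demo()); }
```

Note how the same ensemble flips its prediction once weights enter the vote, and how the strict strategy switch fails fast while any weight is still missing.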
🎯 Scenario 4: Feature Slicing — Different Inputs per Model
For advanced use-cases like training on different feature subsets (multi-view learning).
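A self-contained sketch of the slicing idea, assuming (per the table above) that `predict_using_names()` takes a `HashMap` keyed by member name and fails fast on a missing key. The closure-based "models" and the `slice_columns` helper are illustrative only.

```rust
use std::collections::HashMap;

// Pick a subset of columns out of one feature row.
fn slice_columns(row: &[f64], cols: &[usize]) -> Vec<f64> {
    cols.iter().map(|&c| row[c]).collect()
}

// Toy stand-in for the PR's method: each named model gets its own input.
fn predict_using_names(
    models: &HashMap<String, Box<dyn Fn(&[f64]) -> i32>>,
    inputs: &HashMap<String, Vec<f64>>,
) -> Result<i32, String> {
    let mut ones = 0usize;
    for (name, model) in models {
        // Fail fast if a member has no matching input, mirroring the key check.
        let x = inputs.get(name).ok_or(format!("missing input for '{}'", name))?;
        if model(x) == 1 { ones += 1; }
    }
    // Majority vote; ties break toward class 1 in this toy.
    Ok(if 2 * ones >= models.len() { 1 } else { 0 })
}

fn demo() -> i32 {
    let row = [0.9, 0.1, 0.8];
    let mut models: HashMap<String, Box<dyn Fn(&[f64]) -> i32>> = HashMap::new();
    models.insert("view_a".into(), Box::new(|x: &[f64]| if x[0] > 0.5 { 1 } else { 0 }));
    models.insert("view_b".into(), Box::new(|x: &[f64]| if x[0] > 0.5 { 1 } else { 0 }));

    let mut inputs = HashMap::new();
    inputs.insert("view_a".to_string(), slice_columns(&row, &[0]));    // first feature only
    inputs.insert("view_b".to_string(), slice_columns(&row, &[1, 2])); // the other two

    predict_using_names(&models, &inputs).unwrap()
}

fn main() { println!("{}", demo()); }
```

Each "view" sees a disjoint slice of the same row, which is the multi-view setup the scenario describes.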
🎯 Scenario 5: Full Control — Metadata, Tags, Dynamic Management
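To make the full-control path concrete without the original snippet, here is a toy `add_with_params` following the optional-parameter shape listed in the API summary (`name?`, `weight?`, `desc?`, `tags?`). The model argument is omitted and the metadata store is simplified; parameter order and the auto-name format are assumptions.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Meta {
    weight: Option<f64>,
    description: Option<String>,
    tags: Vec<String>,
}

struct Ensemble { meta: HashMap<String, Meta>, auto_id: usize }

impl Ensemble {
    fn new() -> Self { Ensemble { meta: HashMap::new(), auto_id: 0 } }

    // Every field except the model itself is optional; returns the final name.
    fn add_with_params(
        &mut self,
        name: Option<&str>,
        weight: Option<f64>,
        description: Option<&str>,
        tags: &[&str],
    ) -> String {
        // Auto-generate a name when none is given, as plain add() does.
        let name = name.map(str::to_string).unwrap_or_else(|| {
            self.auto_id += 1;
            format!("model_{}", self.auto_id)
        });
        self.meta.insert(name.clone(), Meta {
            weight,
            description: description.map(str::to_string),
            tags: tags.iter().map(|t| t.to_string()).collect(),
        });
        name
    }

    fn weight(&self, name: &str) -> Option<f64> {
        self.meta.get(name).and_then(|m| m.weight)
    }
}

fn demo() -> Option<f64> {
    let mut e = Ensemble::new();
    let auto = e.add_with_params(None, None, None, &[]); // everything defaulted
    assert_eq!(auto, "model_1");
    e.add_with_params(Some("forest"), Some(2.5), Some("main classifier"), &["prod", "v2"]);
    e.weight("forest") // metadata query from the introspection section
}

fn main() { println!("{:?}", demo()); }
```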
🏭 Real-World Usage Patterns
These patterns come from my SAAN project.
Pattern 1: Auto-disable underperforming models
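A sketch of the auto-disable loop, assuming you score each member individually and flip it off below a threshold. The `Member` struct and the 0.60 cutoff are illustrative; in the real ensemble the same effect would come from calling `disable(name)`.

```rust
// Toy member record: name, measured accuracy, and the enabled flag.
struct Member { name: String, acc: f64, enabled: bool }

// Disable every enabled member scoring below min_acc; return who was dropped.
fn auto_disable(members: &mut [Member], min_acc: f64) -> Vec<String> {
    let mut dropped = Vec::new();
    for m in members.iter_mut() {
        if m.enabled && m.acc < min_acc {
            m.enabled = false; // same effect as ensemble.disable(&m.name)
            dropped.push(m.name.clone());
        }
    }
    dropped
}

fn demo() -> usize {
    let mut members = vec![
        Member { name: "knn_1".into(), acc: 0.91, enabled: true },
        Member { name: "knn_2".into(), acc: 0.55, enabled: true },
        Member { name: "tree".into(),  acc: 0.48, enabled: true },
    ];
    auto_disable(&mut members, 0.60).len()
}

fn main() {
    println!("disabled {} members", demo());
}
```

Because disabled members are filtered out of `predict()` automatically, no retraining or rebuilding is needed after the sweep.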
Pattern 2: Compare voting strategies on the same ensemble
Pattern 3: Dynamically add a strong model and boost its influence
📊 Interpreting Ensemble Logs
When running ensembles in production, you'll see structured output like this:
🔍 How to read this:

`Active=13/19` means 6 models were disabled due to low precision.

💡 Pro tips:
- `get_ensemble_info()` before/after major changes
- `enabled()` to verify which models actually contributed to a prediction

🧪 Testing Philosophy
Our test suite covers:
- `add()`, `add_named()`, auto-names
- `predict_using_names()` with per-model inputs
- `enable()`/`disable()` affecting predictions
- `score()` validity across model additions/removals

All tests use minimal, reproducible dummy data and verify both success and failure paths.
📋 Public API Summary
- Construction: `new()`, `with_strategy()`
- Adding models: `add()`, `add_named()`, `add_with_params()`
- Weights & metadata: `set_weight()`, `set_description()`, `weight()`
- Runtime control: `enable()`, `disable()`, `enabled()`
- Prediction: `predict()`, `predict_using_names()`, `score()`
- Introspection: `names()`, `len()`, `is_empty()`, `strategy()`, `get_ensemble_info()`
- Strategy: `set_voting_strategy()`

Safety guarantees:

- `add_with_params()` checks `HashMap` keys; fails fast
- `predict_using_names()`: the `Array2` trait enforces shape; `Failed` error on mismatch
- `enabled()` filter applied automatically in `predict()`
- `Ensemble<X, Y>` generics enforce same input/output types at compile time

🚀 What's Next? (Roadmap)
- `predict_proba()` support
- `description()` and `tags()`
- `None` weights when switching Weighted → Uniform

Ready for review. 🦀✨