-
Notifications
You must be signed in to change notification settings - Fork 51
Experimental/two stage #296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
`CandidateRankingModel`
We make changes to the `get_train_with_targets_for_reranker` method to separate the retrieval of sampled candidates and unsampled candidates from first-stage candidate generators for the reranker.
…eSystems/RecTools into experimental/two_stage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a two-stage recommendation pipeline through the CandidateRankingModel class, which combines first-stage candidate generation with second-stage reranking using gradient boosting models.
Key Changes:
- Implements a flexible two-stage ranking architecture with support for multiple candidate generators and various reranking models
- Adds specialized support for CatBoost models through the
CatBoostRerankerclass - Introduces helper classes for feature collection, negative sampling, and candidate generation
- Includes comprehensive tests and a detailed tutorial notebook
Reviewed changes
Copilot reviewed 13 out of 15 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
rectools/models/ranking/candidate_ranking.py |
Core implementation of the two-stage ranking model with candidate generation, feature collection, and reranking logic |
rectools/models/ranking/catboost_reranker.py |
Specialized reranker for CatBoost classifiers and rankers with pool preparation |
rectools/models/ranking/__init__.py |
Module exports with fallback imports for optional dependencies |
rectools/exceptions.py |
New NotFittedForStageError exception for stage-specific fitting requirements |
rectools/columns.py |
Added Target column constant for train/test target values |
rectools/compat.py |
Compatibility class for CatBoost when dependency is unavailable |
tests/models/ranking/test_candidate_ranking.py |
Comprehensive tests for all ranking components |
tests/models/ranking/test_catboost_reranker.py |
Tests for CatBoost-specific functionality |
tests/models/test_serialization.py |
Model serialization tests including CandidateRankingModel |
tests/test_compat.py |
Compatibility layer tests for CatBoostReranker |
pyproject.toml |
Added catboost dependency and updated black version |
README.md |
Documentation of new catboost extension |
examples/tutorials/candidate_ranking_model_tutorial.ipynb |
Detailed tutorial with multiple reranker examples |
.github/workflows/test.yml |
Removed trailing whitespace |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #296 +/- ##
===========================================
Coverage 100.00% 100.00%
===========================================
Files 45 85 +40
Lines 2242 5870 +3628
===========================================
+ Hits 2242 5870 +3628 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| A series containing the predicted scores for each candidate. If the model is a classifier, the scores | ||
| represent probabilities for the positive class. | ||
| """ | ||
| x_full = candidates.drop(columns=Columns.UserItem) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure we always want to remove user and item ids?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@blondered My old question to you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 15 out of 18 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| splitter: Splitter, | ||
| reranker: Reranker, | ||
| sampler: NegativeSamplerBase = PerUserNegativeSampler(), | ||
| feature_collector: CandidateFeatureCollector = CandidateFeatureCollector(), |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These defaults instantiate objects at function definition time (PerUserNegativeSampler() / CandidateFeatureCollector()), which can unintentionally share state across CandidateRankingModel instances and is flagged by many linters. Prefer sampler: Optional[...] = None / feature_collector: Optional[...] = None and create instances inside __init__.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@blondered I actually agree. WDYT?
| train_targets[Columns.Target] = 1 | ||
|
|
||
| # Remember that this way we exclude positives that weren't present in candidates | ||
| train = pd.merge( | ||
| candidates, | ||
| train_targets[[Columns.User, Columns.Item, Columns.Target]], |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
train_targets[Columns.Target] = 1 mutates the train_targets DataFrame passed into this method and can also trigger SettingWithCopyWarning depending on how the slice was created. Prefer working on a copy (e.g., train_targets = train_targets.copy()), or create a separate [user,item,target] frame to merge without mutating the input.
| train_targets[Columns.Target] = 1 | |
| # Remember that this way we exclude positives that weren't present in candidates | |
| train = pd.merge( | |
| candidates, | |
| train_targets[[Columns.User, Columns.Item, Columns.Target]], | |
| # Work on a separate DataFrame to avoid mutating the input and to prevent SettingWithCopyWarning | |
| targets = train_targets[[Columns.User, Columns.Item]].copy() | |
| targets[Columns.Target] = 1 | |
| # Remember that this way we exclude positives that weren't present in candidates | |
| train = pd.merge( | |
| candidates, | |
| targets[[Columns.User, Columns.Item, Columns.Target]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed using indicator
Description
Type of change
How Has This Been Tested?
Before submitting a PR, please check yourself against the following list. It would save us quite a lot of time.