Skip to content

Conversation

@feldlime
Copy link
Collaborator

@feldlime feldlime commented Aug 30, 2025

Description

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Optimization

How Has This Been Tested?

Before submitting a PR, please check yourself against the following list. It would save us quite a lot of time.

  • Have you read the contribution guide?
  • Have you updated the relevant docstrings? We're using Numpy format, please double-check yourself
  • Does your change require any new tests?
  • Have you updated the changelog file?

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a two-stage recommendation pipeline through the CandidateRankingModel class, which combines first-stage candidate generation with second-stage reranking using gradient boosting models.

Key Changes:

  • Implements a flexible two-stage ranking architecture with support for multiple candidate generators and various reranking models
  • Adds specialized support for CatBoost models through the CatBoostReranker class
  • Introduces helper classes for feature collection, negative sampling, and candidate generation
  • Includes comprehensive tests and a detailed tutorial notebook

Reviewed changes

Copilot reviewed 13 out of 15 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
rectools/models/ranking/candidate_ranking.py Core implementation of the two-stage ranking model with candidate generation, feature collection, and reranking logic
rectools/models/ranking/catboost_reranker.py Specialized reranker for CatBoost classifiers and rankers with pool preparation
rectools/models/ranking/__init__.py Module exports with fallback imports for optional dependencies
rectools/exceptions.py New NotFittedForStageError exception for stage-specific fitting requirements
rectools/columns.py Added Target column constant for train/test target values
rectools/compat.py Compatibility class for CatBoost when dependency is unavailable
tests/models/ranking/test_candidate_ranking.py Comprehensive tests for all ranking components
tests/models/ranking/test_catboost_reranker.py Tests for CatBoost-specific functionality
tests/models/test_serialization.py Model serialization tests including CandidateRankingModel
tests/test_compat.py Compatibility layer tests for CatBoostReranker
pyproject.toml Added catboost dependency and updated black version
README.md Documentation of new catboost extension
examples/tutorials/candidate_ranking_model_tutorial.ipynb Detailed tutorial with multiple reranker examples
.github/workflows/test.yml Removed trailing whitespace

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov
Copy link

codecov bot commented Jan 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (9b3992e) to head (8d8aeed).
⚠️ Report is 122 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##              main      #296     +/-   ##
===========================================
  Coverage   100.00%   100.00%             
===========================================
  Files           45        85     +40     
  Lines         2242      5870   +3628     
===========================================
+ Hits          2242      5870   +3628     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

A series containing the predicted scores for each candidate. If the model is a classifier, the scores
represent probabilities for the positive class.
"""
x_full = candidates.drop(columns=Columns.UserItem)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we always want to remove user and item ids?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blondered My old question to you

@feldlime feldlime marked this pull request as ready for review February 1, 2026 17:09
@feldlime feldlime requested review from blondered and Copilot February 1, 2026 17:09
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 18 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +503 to +506
splitter: Splitter,
reranker: Reranker,
sampler: NegativeSamplerBase = PerUserNegativeSampler(),
feature_collector: CandidateFeatureCollector = CandidateFeatureCollector(),
Copy link

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These defaults instantiate objects at function definition time (PerUserNegativeSampler() / CandidateFeatureCollector()), which can unintentionally share state across CandidateRankingModel instances and is flagged by many linters. Prefer sampler: Optional[...] = None / feature_collector: Optional[...] = None and create instances inside __init__.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blondered I actually agree. WDYT?

Comment on lines 677 to 682
train_targets[Columns.Target] = 1

# Remember that this way we exclude positives that weren't present in candidates
train = pd.merge(
candidates,
train_targets[[Columns.User, Columns.Item, Columns.Target]],
Copy link

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

train_targets[Columns.Target] = 1 mutates the train_targets DataFrame passed into this method and can also trigger SettingWithCopyWarning depending on how the slice was created. Prefer working on a copy (e.g., train_targets = train_targets.copy()), or create a separate [user,item,target] frame to merge without mutating the input.

Suggested change
train_targets[Columns.Target] = 1
# Remember that this way we exclude positives that weren't present in candidates
train = pd.merge(
candidates,
train_targets[[Columns.User, Columns.Item, Columns.Target]],
# Work on a separate DataFrame to avoid mutating the input and to prevent SettingWithCopyWarning
targets = train_targets[[Columns.User, Columns.Item]].copy()
targets[Columns.Target] = 1
# Remember that this way we exclude positives that weren't present in candidates
train = pd.merge(
candidates,
targets[[Columns.User, Columns.Item, Columns.Target]],

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed using indicator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants