
@AngelReyero
Collaborator

No description provided.

@AngelReyero AngelReyero linked an issue Dec 8, 2025 that may be closed by this pull request
@AngelReyero AngelReyero requested a review from jpaillard December 8, 2025 14:06
@codecov

codecov bot commented Dec 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.37%. Comparing base (eda767d) to head (6e7ef93).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #553   +/-   ##
=======================================
  Coverage   98.37%   98.37%           
=======================================
  Files          23       23           
  Lines        1602     1602           
=======================================
  Hits         1576     1576           
  Misses         26       26           


Comment on lines 29 to 37
Note that this method was initially introduced as the mean decrease accuracy (MDA)
by :footcite:t:`breimanRandomForests2001` for Random Forests. It was originally proposed
as a heuristic Variable Importance Measure and not as a formal estimator of an
interesting theoretical quantity. Moreover, :footcite:t:`benard2022SobolMDA` showed
that PFI estimates a quantity that can be decomposed as the sum of the Total Sobol
Index (TSI) :ref:`total_sobol_index` and two extra, non-meaningful terms that arise
from correlations between the features. Thus, unlike :ref:`leave_one_covariate_out` or
:ref:`conditional_feature_importance`, the theoretical quantity estimated by PFI is
not a relevant quantity.
Collaborator


  • Here, I would be a bit more direct, saying that it does not estimate any meaningful / previously studied quantity.
  • Maybe we can make it a Note, as for "extrapolation issues" below?
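For reference, the permute-and-predict procedure under discussion (Breiman's MDA, i.e. PFI) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the hidimstat `PFI` implementation: the helpers `fit_linear_model` and `permutation_importance_sketch` are hypothetical, the model is ordinary least squares, and importance is measured as the mean increase in squared error on held-out data.

```python
import numpy as np


def fit_linear_model(X, y):
    """Hypothetical helper: OLS with an intercept, standing in for any fitted model."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def permutation_importance_sketch(predict, X, y, n_permutations=20, seed=0):
    """Hypothetical helper: mean increase in squared error when each column is permuted.

    `predict` is the already-fitted model; it is never refitted.
    """
    rng = np.random.default_rng(seed)
    base_loss = np.mean((y - predict(X)) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        losses = []
        for _ in range(n_permutations):
            X_perm = X.copy()
            # Shuffle column j across individuals, keeping the other columns fixed.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            losses.append(np.mean((y - predict(X_perm)) ** 2))
        importances[j] = np.mean(losses) - base_loss
    return importances


# Synthetic data: y depends strongly on X0, weakly on X1, not at all on X2.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Fit on one half, compute importances on the held-out half.
predict = fit_linear_model(X[:1000], y[:1000])
imp = permutation_importance_sketch(predict, X[1000:], y[1000:])
print(imp)
```

With independent features, permuting the first column should increase the error much more than permuting the second, and the noise feature should get an importance close to zero. The same fitted model is reused for every permutation, which is what makes PFI cheap compared to refitting-based methods such as LOCO.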

Comment on lines 45 to 48
estimating a conditional sampler as in :ref:`conditional_feature_importance`. Since
the distribution from which we are sampling is the marginal distribution of the feature
breaking the relationship with the others, a simple permutation of the feature values
across the individuals is sufficient. Also, note that the same estimated model is
Collaborator


Suggested change
estimating a conditional sampler as in :ref:`conditional_feature_importance`. A simple permutation
of the feature values across the individuals is sufficient since the distribution from which we are sampling is the
marginal distribution of the feature, thus breaking the relationship with the others. Also, note that the
same estimated model is
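The statement in this passage, that permuting a feature samples from its marginal distribution and thus breaks its relationship with the other features, can be checked numerically. A minimal sketch on synthetic data, assuming two Gaussian features with correlation 0.9; the threshold of three conditional standard deviations used to flag low-density rows is an arbitrary illustrative choice, and `frac_off_manifold` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
rho = 0.9  # assumed correlation between the two synthetic features

# Two standard Gaussian features with correlation rho.
x1 = rng.normal(size=n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)

# PFI-style perturbation: shuffle x2 across individuals.
x2_perm = rng.permutation(x2)

# 1) The marginal distribution of x2 is untouched: same multiset of values.
same_marginal = np.array_equal(np.sort(x2), np.sort(x2_perm))

# 2) The dependence with x1 is destroyed.
corr_before = np.corrcoef(x1, x2)[0, 1]
corr_after = np.corrcoef(x1, x2_perm)[0, 1]

# 3) Fraction of rows where x2 is more than 3 conditional standard deviations
#    away from E[x2 | x1] = rho * x1, i.e. where the joint density is very low.
cond_sd = np.sqrt(1 - rho**2)


def frac_off_manifold(a, b):
    return np.mean(np.abs(b - rho * a) > 3 * cond_sd)


print(same_marginal, corr_before, corr_after)
print(frac_off_manifold(x1, x2), frac_off_manifold(x1, x2_perm))
```

The first two checks hold for any permutation; the third shows why, with correlated features, PFI queries the model in regions where it has seen almost no training data, which is the extrapolation issue discussed in the documentation.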

Comment on lines 234 to 235
url = {https://doi.org/10.1007/s11222-021-10057-z},
abstract = {This paper reviews and advocates against the use of permute-and-predict (PaP) methods for interpreting black box functions. Methods such as the variable importance measures proposed for random forests, partial dependence plots, and individual conditional expectation plots remain popular because they are both model-agnostic and depend only on the pre-trained model output, making them computationally efficient and widely available in software. However, numerous studies have found that these tools can produce diagnostics that are highly misleading, particularly when there is strong dependence among features. The purpose of this work is to review the growing body of literature, demonstrate these drawbacks, explain why they occur, and advocate for alternative measures involving additional modeling. In particular, breaking dependencies between features forces extrapolation into sparse regions of the feature space, over-emphasizing correlated features in both variable importance measures and partial dependence plots.}
Collaborator


Suggested change: delete both lines above (the url and abstract fields).

Collaborator


Oops, what are all these files?

@jpaillard
Collaborator

It looks almost good to me. I still have comments regarding the .bib file, from which I suggest removing the links and abstracts.
I also suggest adding the following at the end of the PFI page, which will add links to the examples that use PFI:

Examples
--------

.. minigallery:: hidimstat.PFI

I see that the tests are not passing, but merging #558 should improve this.


Development

Successfully merging this pull request may close these issues.

[DOC] Permutation Feature Importance

4 participants