Permutation Feature Importance DOC #553
base: main
…50-doc-permutation-feature-importance
Codecov Report
All modified and coverable lines are covered by tests.
Coverage diff, main vs. #553:
Coverage: 98.37% vs. 98.37%
Files: 23 vs. 23
Lines: 1602 vs. 1602
Hits: 1576 vs. 1576
Misses: 26 vs. 26
Note that this method was initially introduced as the mean decrease accuracy (MDA)
by :footcite:t:`breimanRandomForests2001` for Random Forests. It was proposed
as a heuristic Variable Importance Measure and not as a formal estimator of an
interesting theoretical quantity. Moreover, it was shown in
:footcite:t:`benard2022SobolMDA` that PFI estimates a quantity that can be decomposed
as the sum of the Total Sobol Index (TSI) :ref:`total_sobol_index` and two extra terms
that are not meaningful when features are correlated. Thus, the theoretical quantity
estimated by PFI is not a relevant quantity, contrary to :ref:`leave_one_covariate_out` or
:ref:`conditional_feature_importance`.
- Here, I would be a bit more direct, saying that it does not estimate any meaningful / previously studied quantity.
- Maybe we can make it a Note like for "extrapolation issues" below?
estimating a conditional sampler as in :ref:`conditional_feature_importance`. Since
the distribution from which we are sampling is the marginal distribution of the feature
breaking the relationship with the others, a simple permutation of the feature values
across the individuals is sufficient. Also, note that the same estimated model is
Suggested change:
estimating a conditional sampler as in :ref:`conditional_feature_importance`. A simple permutation
of the feature values across the individuals is sufficient since the distribution from which we are sampling is the
marginal distribution of the feature, thus breaking the relationship with the others. Also, note that the
same estimated model is
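For readers following the thread, the permutation step discussed in this hunk can be sketched as below. This is a minimal illustration and not the hidimstat implementation: the function names `mse` and `permutation_importance`, the toy `predict` model, and the synthetic data are all assumptions made for the example. It shuffles one column across individuals, which samples from that feature's marginal distribution, and scores the permuted data with the same already-fitted model.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_permutations=20, seed=0):
    """Mean increase in MSE when one column of X is shuffled across rows.

    `predict` is an already-fitted model (hypothetical interface: row -> float).
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_permutations):
            column = [row[j] for row in X]
            # Shuffling across individuals draws from the marginal of feature j
            # and breaks its relationship with the other features.
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            # The same fitted model is reused; nothing is refit.
            permuted_loss = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted_loss - baseline)
        importances.append(sum(increases) / n_permutations)
    return importances


# Hypothetical toy setup: the "fitted model" depends on feature 0 only.
def predict(row):
    return 2.0 * row[0]


data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = [2.0 * row[0] for row in X]
importances = permutation_importance(predict, X, y)
```

In this toy case the second feature is ignored by the model, so permuting it leaves the loss unchanged, while permuting the first feature increases the loss. With correlated features the permuted rows can fall outside the region the model was trained on, which is the extrapolation issue raised elsewhere in this PR.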
docs/tools/references.bib
Outdated
url = {https://doi.org/10.1007/s11222-021-10057-z},
abstract = {This paper reviews and advocates against the use of permute-and-predict (PaP) methods for interpreting black box functions. Methods such as the variable importance measures proposed for random forests, partial dependence plots, and individual conditional expectation plots remain popular because they are both model-agnostic and depend only on the pre-trained model output, making them computationally efficient and widely available in software. However, numerous studies have found that these tools can produce diagnostics that are highly misleading, particularly when there is strong dependence among features. The purpose of this work is to review the growing body of literature, demonstrate these drawbacks, explain why they occur, and advocate for alternative measures involving additional modeling. In particular, breaking dependencies between features forces extrapolation into sparse regions of the feature space, over-emphasizing correlated features in both variable importance measures and partial dependence plots.}
…ub.com/mind-inria/hidimstat into 550-doc-permutation-feature-importance
docs/src/model_agnostic_methods/permutation_feature_importance.rst
Outdated
hidimpy/bin/Activate.ps1
Outdated
Oops, what are all these files?
It looks almost good to me. I still have comments regarding the … I see that the tests are not passing, but merging #558 should improve …
No description provided.