[BUG] Fix Sklearn Models detection by safely registering SklearnExtension #1542

@geetu040

Running a scikit-learn Pipeline with openml.runs.run_model_on_task fails even when openml-sklearn is installed.

Reproduction:

import openml
from sklearn import impute, tree, pipeline

clf = pipeline.Pipeline(
    steps=[
        ('imputer', impute.SimpleImputer()),
        ('estimator', tree.DecisionTreeClassifier(max_depth=2))
    ]
)

task = openml.tasks.get_task(32)
run = openml.runs.run_model_on_task(clf, task, avoid_duplicate_runs=False)

print(run)

This raises:

ValueError: No extension registered which can handle model: Pipeline(...).
But it looks related to scikit-learn. Please install the OpenML scikit-learn extension (openml-sklearn) and try again.

Expected behavior: if openml-sklearn is installed, sklearn models (including Pipeline) are detected automatically, with no manual imports required.

Cause: SklearnExtension is registered only if it is imported somewhere during execution. Nothing in normal usage triggers that import, so the extension is never added to openml.extensions.extensions.

Possible fixes:

  • Safely import and register SklearnExtension when openml-sklearn is available, either by adding a guarded import statement or by moving the register_extension call out of openml_sklearn/__init__.py.
  • Improve the error message shown when openml-sklearn is not installed. The current link does not lead to an installation guide.
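
The first option could look roughly like the sketch below. It is only an illustration of a guarded import: the helper name try_import_extension is hypothetical, and where openml would call it (at import time, or lazily just before the extension lookup) is an assumption, not something decided in this issue. It relies on the extension package registering itself when its __init__.py runs.

```python
import importlib


def try_import_extension(module_name: str) -> bool:
    """Import an optional extension package if it is installed.

    Hypothetical helper, not part of openml. Returns True when the module
    was imported (so registration code in its __init__.py has run) and
    False when the package itself is not installed.
    """
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Only swallow "this package is missing". If the package exists but
        # one of its own dependencies is missing, re-raise so the real
        # problem is not hidden.
        if exc.name != module_name:
            raise
        return False
    return True


# Assumed usage inside openml, e.g. before looking up an extension:
# try_import_extension("openml_sklearn")
```

Until something like this lands, explicitly importing openml_sklearn before calling openml.runs.run_model_on_task should trigger the same registration, since the extension is registered on import.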
