Use khiops-core and khiops-driver-* pip packages containing binaries only#582
Use khiops-core and khiops-driver-* pip packages containing binaries only#582tramora wants to merge 2 commits into
Conversation
caa2f18 to
fd7c550
Compare
394d474 to
5312426
Compare
|
False sentiment of success : we simulate multiple Python environments using conda (#573). This is very unfortunate as we are not able yet to detect automatically an issue with the new PyPI packages regarding the remote drivers khiops-core:1041. In my opinion this one is a blocking issue. |
5312426 to
b0223fa
Compare
Hence, to properly test this, issue #573 must be addressed first. Right, @tramora ? |
95a1d21 to
cfd1802
Compare
| python -m pip install --user . | ||
| # khiops-core and khiops-drivers-* must always be installed from TestPyPI in order to avoid distorting usage statistics | ||
| # Use of both 'index-url' and '--extra-index-url' options allows indexes prioritization (PEP 766) | ||
| python -m pip install --index-url https://test.pypi.org/simple --extra-index-url https://pypi.org/simple --user . |
There was a problem hiding this comment.
I am not sure I see why just python -m pip install --extra-index-url https://test.pypi.org/simple does not suffice:
- the latest version of
khiops-coreis taken fromtest.pypi.org; - all other dependencies are taken from
pypi.org.
Why is this not enough (no need for drivers to build the API docs)?
There was a problem hiding this comment.
It was must first change but ... when the package will be on both PyPI and Test PyPI (under the same version or not), as PyPI is the index in the list, it will be used first. That was the rationale of the reordering
There was a problem hiding this comment.
I see. In this case, in order to make sure all other dependencies than khiops-*(core and drivers) are taken from pypi.org, IMHO we need to:
- First install all dependencies except
khiops-*-ones from pypi.org - Install only the
khiops-*packages from test.pypi.org only (as the only --index-url, no --extra-index-url).
There was a problem hiding this comment.
I will split into 2 distinct lines then.
There is a difficulty however in a few cases
- when a
pip install .is performed in thekhiops-pythonfolder. (inapi-docs.yml) - when the
khiops-python"tar.gz" artifact is installed (inpip.yml) - on Windows Powershell where the
perlandgreptools are not available
It seems impossible to perform a on-the-fly filtering (unless doing complex tricks)
There was a problem hiding this comment.
Refactored in splitting in two lines except for Windows (PowerShell) where I do not know yet the tools that would help me
| echo "KHIOPSDEV_OS_CODENAME=$(echo '${{ matrix.khiopsdev-os }}' | tr -d '0-9.')" >> "$GITHUB_ENV" | ||
| KHIOPSDEV_OS_CODENAME=$(echo '${{ matrix.khiopsdev-os }}' | tr -d '0-9.') | ||
| echo "KHIOPSDEV_OS_CODENAME=${KHIOPSDEV_OS_CODENAME}" >> "$GITHUB_ENV" | ||
| if [[ "${KHIOPSDEV_OS_CODENAME}" == "ubuntu" || "${KHIOPSDEV_OS_CODENAME}" == "debian" ]]; then |
There was a problem hiding this comment.
I would use a Bash case here with an error message in the default case.
There was a problem hiding this comment.
I changed the syntax, unless I'm wrong, there is no error message to show in the default case as then a specific Dockerfile is used ("Dockerfile.rocky")
There was a problem hiding this comment.
I changed the syntax, unless I'm wrong, there is no error message to show in the default case as then a specific Dockerfile is used ("Dockerfile.rocky")
I thought about explicitly supporting debian, ubuntu (one case), and rocky (second case), with an error for any other codename.
There was a problem hiding this comment.
Changed but in my opinion it adds a burden to the maintainer (compared to the situation before this PR).
The matrix of all the supported OSes is defined in the same file and the Dockerfile extension was evaluated using each value of the matrix (after suppressing the digits or .). It seems unnecessary to verify that every value is in the same matrix
| pip install $(ls khiops*.tar.gz) | ||
| # khiops-core and khiops-drivers-* must always be installed from TestPyPI in order to avoid distorting usage statistics | ||
| # Use of both 'index-url' and '--extra-index-url' options allows indexes prioritization (PEP 766) | ||
| pip install --index-url https://test.pypi.org/simple --extra-index-url https://pypi.org/simple $(ls khiops*.tar.gz) |
There was a problem hiding this comment.
See comment above about the --extra-index-url.
There was a problem hiding this comment.
Refactored in splitting in two lines except for Windows (PowerShell) where I do not know yet the tools that would help me
|
|
||
| # khiops-core and khiops-drivers-* must always be installed from TestPyPI in order to avoid distorting usage statistics | ||
| # Use of both 'index-url' and '--extra-index-url' options allows indexes prioritization (PEP 766) | ||
| pip-compile --index-url https://test.pypi.org/simple --extra-index-url https://pypi.org/simple -o requirements.txt |
There was a problem hiding this comment.
Refactored in splitting in two lines except for Windows (PowerShell) where I do not know yet the tools that would help me
| # Install dev dependencies | ||
| pip install -r requirements.txt | ||
| # Same indexes prioritization as above | ||
| pip install --index-url https://test.pypi.org/simple --extra-index-url https://pypi.org/simple -r requirements.txt |
There was a problem hiding this comment.
Refactored in splitting in two lines except for Windows (PowerShell) where I do not know yet the tools that would help me
| # Use of both 'index-url' and '--extra-index-url' options allows indexes prioritization (PEP 766) | ||
| Get-Content .\requires.txt ` | ||
| | ForEach-Object {python -m pip install $_.toString()} | ||
| | ForEach-Object {python -m pip install --index-url https://test.pypi.org/simple --extra-index-url https://pypi.org/simple $_.toString()} |
There was a problem hiding this comment.
Same as above regarding --extra-index-url.
There was a problem hiding this comment.
for Windows (PowerShell), I do not know yet the tools that would help me split into 2 lines
| # Same indexes prioritization as above | ||
| Get-Content .\requires.txt ` | ||
| | ForEach-Object {khiops-windows-venv\Scripts\python -m pip install $_.toString()} | ||
| | ForEach-Object {khiops-windows-venv\Scripts\python -m pip install --index-url https://test.pypi.org/simple --extra-index-url https://pypi.org/simple $_.toString()} |
There was a problem hiding this comment.
Cf. the --extra-index-url comments above.
There was a problem hiding this comment.
for Windows (PowerShell), I do not know yet the tools that would help me split into 2 lines
| pip install `cat requires.txt` | ||
| # khiops-core and khiops-drivers-* must always be installed from TestPyPI in order to avoid distorting usage statistics | ||
| # Use of both 'index-url' and '--extra-index-url' options allows indexes prioritization (PEP 766) | ||
| pip install --index-url https://test.pypi.org/simple --extra-index-url https://pypi.org/simple `cat requires.txt` |
There was a problem hiding this comment.
Refactored in splitting in two lines except for Windows (PowerShell) where I do not know yet the tools that would help me
| return uri_info._replace(path=_child_path(uri_info.path, child_name)) | ||
|
|
||
|
|
||
| def _check_khiops_driver_library(khiops_drivers_path, remote_storage_type): |
There was a problem hiding this comment.
Have khiops_drivers_path as a kwarg set to None by default.
| # We cannot guarantee where the dynamic library is located. | ||
| # It seems wiser to avoid raising a false error | ||
| return | ||
| extension = ".dll" if platform.system() == "Windows" else ".so" |
There was a problem hiding this comment.
I think the DLL extension on MacOS is .dylib. Cf. e.g. https://github.com/KhiopsML/khiopsdriver-gcs/blob/943fef618b689d8e1abb05e6feb70e1f8efd1db7/test/plugin_test.cpp#L32.
There was a problem hiding this comment.
You are right, even if I read in different pages that the ".so" was supported first to avoid differences between the "*BSD/Darwin" world and the Linux... then that ".so" was intented only for "Shared Modules" not "Shared Libraries" that shoud be ".dylib" or ".framework". Nevertheless it is implemented as a ".dylib" in our case.
| return uri_info._replace(path=_child_path(uri_info.path, child_name)) | ||
|
|
||
|
|
||
| def _check_khiops_driver_library(khiops_drivers_path, remote_storage_type): |
There was a problem hiding this comment.
Add type and value checks for remote_storage_type (as asserts, because this is a "private" function by convention) - also see the comment below in this respect.
I changed the |
| if khiops_drivers_path is None or len(khiops_drivers_path) == 0: | ||
| # We cannot guarantee where the dynamic library is located. | ||
| # It seems wiser to avoid raising a false error | ||
| return |
There was a problem hiding this comment.
I would ban None or the empty string (or bytes) as a khiops_drivers_path value: if this function is called, as it is "private" by convention, we can just assert that khiops_drivers_path is not None and it is of type str or bytes and non-empty at its very beginning: it is the programmer's responsibility to make sure khiops_drivers_path has proper type and value before calling the function IMHO.
| return | ||
| extension = ".dll" if platform.system() == "Windows" else ".so" | ||
| absolute_path = os.path.join( | ||
| khiops_drivers_path, |
There was a problem hiding this comment.
I would write os.path.abspath(khiops_drivers_path) instead, to make sure the path is absolute.
| extension = ".dll" if platform.system() == "Windows" else ".so" | ||
| absolute_path = os.path.join( | ||
| khiops_drivers_path, | ||
| "libkhiopsdriver_file_" f"{remote_storage_type}{extension}", |
There was a problem hiding this comment.
Put all this line inside a single f-string: f"libkhiopsdriver_file_{remote_storage_type}{extension}"
|
|
||
| def __init__(self, uri): | ||
|
|
||
| _check_khiops_driver_library(os.environ.get("KHIOPS_DRIVERS_PATH"), "gcs") |
There was a problem hiding this comment.
Only call this function if KHIOPS_DRIVERS_PATH is properly set (see comments above regarding this function). Also, note that KHIOPS_DRIVERS_PATH is not supposed to be changed by the end-user. A wrong value of it is an upstream bug.
|
|
||
| def __init__(self, uri): | ||
|
|
||
| _check_khiops_driver_library(os.environ.get("KHIOPS_DRIVERS_PATH"), "s3") |
| Azure Storage Resource initializer common to Files and Blobs | ||
| """ | ||
|
|
||
| _check_khiops_driver_library(os.environ.get("KHIOPS_DRIVERS_PATH"), "azure") |
| # KHIOPS_HOME variable by default | ||
| if "KHIOPS_HOME" in os.environ: | ||
| if platform.system() == "Windows" and installation_method == "pip": | ||
| # On Windows native installations, |
There was a problem hiding this comment.
On Windows, Pip should also install khiops-core. Hence, I am not sure I understand why all this comment is still relevant. Windows native installation should be out-of-scope now, shouldn't it?
There was a problem hiding this comment.
As I commented in the initial issue, on Linux at least, given the 3 levels "venv pip", "system-wide pip", "system-wide native", if the user removes the khiops-core package, the khiops in upper level is used instead (khiops_env is searched in the available PATH).
On Windows, the system-wide native is the equivalent of the "system-wide native" cited above. In my opinion forbidding its use (in ignoring the KHIOPS_HOME set by the native installer) will make a behavior difference between the 2 platforms
There was a problem hiding this comment.
IMHO, we should not support fallback to native installation: if the user removes the khiops-core Pip package, then we should detect this and fail right away (I would say, directly in khiops/__init__.py): indeed, now the link between the Python library and the binaries is no longer (only) a matter of local runner, but a matter of packaging. Thus, in khiops/__init__.py, we could do something like this:
from importlib.metadata import distribution, PackageNotFoundError
try:
distribution("khiops-core")
except PackageNotFoundError as exc:
raise KhiopsEnvironmentError(f"The Khiops binaries are not installed properly: ${exc}") from excI think this should cover the case evoked here. In any case, IMHO the fallback to OS-native packages should not be authorized, because we would run into version compatibility checks etc, which would defeat the purpose of having a "full-pip" or "full-conda" Python experience with Khiops.
There was a problem hiding this comment.
Agreed.
I then copied your snippet into khiops/__init.py but I had to comment it out immediately until we find a compatible fix.
Indeed in 'conda' and 'conda-based' installation methods where a Conda package is installed (not a PyPI package)
distribution("khiops-core") always raises a PackageNotFoundError
On Windows 'pip', I also modified the error message if khiops_env cannot be found : the user must check the installation of the 'Khiops Python library' not 'Khiops' any longer
| # Miss : KHIOPS_HOME variable evaluation is the fallback | ||
| elif "KHIOPS_HOME" in os.environ: | ||
| khiops_env_path = os.path.join( | ||
| os.environ["KHIOPS_HOME"], "bin", "khiops_env.cmd" | ||
| ) | ||
| # Raise error if KHIOPS_HOME is not set |
There was a problem hiding this comment.
Why do we still need this? Either khiops_env.cmd can be found as expected according to the current Python environment (virtualenv, user-site or system-wide - discouraged), or it cannot. Why rely on KHIOPS_HOME?
| raise KhiopsEnvironmentError( | ||
| "No environment variable named 'KHIOPS_HOME' found. " | ||
| "No 'khiops_env.cmd' found in current environment nor " | ||
| "in 'KHIOPS_HOME' location. " |
There was a problem hiding this comment.
We should drop KHIOPS_HOME fallback IMHO (see comments above).
| # must be appended to "PATH" otherwise Khiops wouldn't find it | ||
| # and fail immediately | ||
| elif var_name == "KHIOPS_MPI_DLL_PATH": | ||
| os.environ["PATH"] = os.environ.get("PATH") + os.pathsep + var_value |
There was a problem hiding this comment.
I would write either os.pathsep.join([var_value, os.environ.get("PATH")]), or f"{var_value}{os.pathsep}{os.environ.get('PATH')}" and prefer the former (more readable IMHO).
But anyway, I would prepend var_value to the path, to make sure the correct MPI path takes precedence over whatever (potentially system-wide) MPI installation might preexist.
| - (`sklearn`) `keep_selected_variables_only` parameter to the predictors (`KhiopsClassifier` and `KhiopsRegressor`) | ||
| - (General) Support for Azure storage | ||
| ### Changed | ||
| - *Internals*: |
There was a problem hiding this comment.
I would place this change under (General) and rephrase it to something like:
"- (General) Full-Pip installation support: khiops now depends on the khiops-core and remote storage driver Pip packages"
| - (General) Support for Azure storage | ||
| ### Changed | ||
| - *Internals*: | ||
| - Declaration of khiops-core and khiops drivers as dependencies (mandatory or optional) |
There was a problem hiding this comment.
I would also add an entry to the ### Removed section, along the lines of:
"- Support for OS-native Khiops Core and Pip Khiops Python library installations"
There was a problem hiding this comment.
Line added even if I'm not 100% sure because of #582 (comment)
daeaf02 to
19e7ca7
Compare
…ries only
- these packages become mandatory ("khiops-core") or optional ("khiops-driver-*") dependencies dragged during the installation process
- these packages must be installed in the Conda environments used to simulate multiple environs (the Conda packages MUST not be used)
- in order to avoid distorting usage statistics the test workflows will always use packages from TestPypi
19e7ca7 to
e45f7dc
Compare
Fixes #572 and #578
TODO Before Asking for a Review
main(ormain-v10)Unreleasedsection ofCHANGELOG.md(no date)index.html