Skip to content

Conversation

@nicklamiller
Copy link
Contributor

@nicklamiller nicklamiller commented Oct 24, 2025

Contributes to: #7031

Adds test for _cint64_array_to_numpy:

def _cint64_array_to_numpy(*, cptr: "ctypes._Pointer", length: int) -> np.ndarray:
"""Convert a ctypes int pointer array to a numpy array."""
if isinstance(cptr, ctypes.POINTER(ctypes.c_int64)):
return np.ctypeslib.as_array(cptr, shape=(length,)).copy()
else:
raise RuntimeError("Expected int64 pointer")

@nicklamiller nicklamiller changed the title Add test for converting a ctypes int64 pointer array to a NumPy array [python-package] Add test for converting a ctypes int64 pointer array to a NumPy array Oct 24, 2025
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @nicklamiller

Before I review this... did you try to do this through LightGBM's public API?

From #7031:

Tests should only use lightgbm's public API, unless that is very difficult or expensive. Any function whose name begins with a _ is considered private.

This particular internal function is so small and simple, I think it'd be preferable to have tests which cover it via the public API. That'd give us more coverage of lightgbm under whatever conditions lead to this function being invoked. I'm not exactly sure what code paths do that, probably passing int64 arrays for label or something,... you'd have to do a bit of investigation.

@nicklamiller
Copy link
Contributor Author

did you try to do this through LightGBM's public API?

@jameslamb thank you very much for the detailed instructions in #7031 and sorry for completely missing that point! Testing through the public API to mimic how functionality is called in the wild by users makes sense, and _cint64_array_to_numpy is now tested through Booster(...).predict(...).

Newly covered lines

Running:

pytest \
    --cov=lightgbm \
    --cov-report="term" \
    --cov-report="html:htmlcov" \
    tests/python_package_test/test_engine.py

and viewing coverage report for python-package/lightgbm/basic.py (where _cint64_array_to_numpy is defined)

on master:
coverage_master

on this PR's branch:
coverage_pr

Some additional notes/thoughts:

  • _cint64_array_to_numpy only ever gets called if pred_contrib=True in Booster(...).predict(...), please see call chain/graph below
  • Even though _cint64_array_to_numpy is defined in basic.py, I noticed some tests in test_engine.py were directly testing Booster(...).predict(...) specifically with pred_contrib=True so I placed my test in test_engine.py instead of test_basic.py, one notable example of such a test is test_contribs_sparse:
    def test_contribs_sparse():
    n_features = 20
    n_samples = 100
    # generate CSR sparse dataset
    X, y = make_multilabel_classification(
    n_samples=n_samples, sparse=True, n_features=n_features, n_classes=1, n_labels=2
    )
    y = y.flatten()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
    params = {
    "objective": "binary",
    "verbose": -1,
    }
    lgb_train = lgb.Dataset(X_train, y_train)
    gbm = lgb.train(params, lgb_train, num_boost_round=20)
    contribs_csr = gbm.predict(X_test, pred_contrib=True)
    assert isspmatrix_csr(contribs_csr)
    # convert data to dense and get back same contribs
    contribs_dense = gbm.predict(X_test.toarray(), pred_contrib=True)
    # validate the values are the same
    if platform.machine() == "aarch64":
    np.testing.assert_allclose(contribs_csr.toarray(), contribs_dense, rtol=1, atol=1e-12)
    else:
    np.testing.assert_allclose(contribs_csr.toarray(), contribs_dense)
    assert np.linalg.norm(gbm.predict(X_test, raw_score=True) - np.sum(contribs_dense, axis=1)) < 1e-4
    # validate using CSC matrix
    X_test_csc = X_test.tocsc()
    contribs_csc = gbm.predict(X_test_csc, pred_contrib=True)
    assert isspmatrix_csc(contribs_csc)
    # validate the values are the same
    if platform.machine() == "aarch64":
    np.testing.assert_allclose(contribs_csc.toarray(), contribs_dense, rtol=1, atol=1e-12)
    else:
    np.testing.assert_allclose(contribs_csc.toarray(), contribs_dense)
    • I debated appending to this test but given it was already testing quite a few things, I decided to add test_predict_contrib_int64 as a separate one
  • There are some failing CI jobs for older CUDA versions, it's not immediately obvious to me if these are related to my changes, I'll dig a bit deeper.
`_cint64_array_to_numpy` call chain from Booster(...).predict(...)

Booster.predict(pred_contrib=True)
    ↓
_InnerPredictor.predict(pred_contrib=True)
    ↓
__pred_for_csr(csr, predict_type=_C_API_PREDICT_CONTRIB)
    ↓
__inner_predict_csr_sparse(csr, predict_type=_C_API_PREDICT_CONTRIB)
    ↓
_LIB.LGBM_BoosterPredictSparseOutput()  # C API call
    ↓
__create_sparse_native(csr, out_ptr_indptr, ...)
    ↓
_cint64_array_to_numpy(cptr=out_ptr_indptr, length=indptr_len)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants