Skip to content

feat: solve bug reported in the alpha feedback and added some features like the cumulative arg. #10

Merged
ElouenGinat merged 10 commits intomainfrom
alpha-feedback
Apr 13, 2026
Merged

feat: solve bug reported in the alpha feedback and added some features like the cumulative arg. #10
ElouenGinat merged 10 commits intomainfrom
alpha-feedback

Conversation

@ElouenGinat
Copy link
Copy Markdown
Collaborator

  • Implemented functions to generate Gaussian and Pareto distributed data.
  • Created visualizations for both distributions using adaptive histograms.
  • Saved generated figures to the 'docs/images' directory.
  • Included sample statistics in the console output for verification.

…n examples

- Implemented functions to generate Gaussian and Pareto distributed data.
- Created visualizations for both distributions using adaptive histograms.
- Saved generated figures to the 'docs/images' directory.
- Included sample statistics in the console output for verification.
@ElouenGinat ElouenGinat self-assigned this Apr 9, 2026
@ElouenGinat ElouenGinat requested a review from Copilot April 9, 2026 12:54
@ElouenGinat ElouenGinat linked an issue Apr 9, 2026 that may be closed by this pull request
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands Khisto’s histogram APIs and documentation, adding a cumulative histogram API (cumfreq) plus Matplotlib cumulative support, alongside improved CLI error reporting and updated example assets for the README/docs.

Changes:

  • Added khisto.cumfreq (array-level cumulative histogram) and exposed it via package exports.
  • Added cumulative support (incl. reverse cumulative) to khisto.matplotlib.hist, plus new tests.
  • Improved CLI adapter behavior: richer error messages on CLI failures and better “best histogram” selection logic.

Reviewed changes

Copilot reviewed 13 out of 16 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/plot/test_matplotlib_histogram.py Updates default density expectations; adds cumulative + sequence-input tests for Matplotlib wrapper.
tests/core/test_cli_adapter.py Adds tests for CLI failure reporting and best-histogram file authority.
tests/array/test_histogram.py Adds cumfreq tests; strengthens density vs count assertion.
src/khisto/matplotlib/hist.py Adds cumulative parameter, unsupported-kwargs checks, and cumulative application logic.
src/khisto/core/cli_adapter.py Improves CSV parsing types, best-histogram selection, and wraps subprocess failures as RuntimeError.
src/khisto/array/histogram/api.py Adds _apply_cumulative, changes density default, and introduces cumfreq.
src/khisto/array/histogram/init.py Re-exports cumfreq.
src/khisto/array/init.py Re-exports cumfreq.
src/khisto/init.py Exposes cumfreq at top-level.
scripts/generate_distribution_examples.py New script to generate Gaussian/Pareto examples and save plots to docs.
README.md Adds example images and updates quick start / cumulative docs and dev commands.
docs/images/pareto-quick-start.png Adds generated Pareto example image for README.
docs/images/gaussian-quick-start.png Adds generated Gaussian example image for README.
docs/API.md Documents cumfreq, cumulative plotting changes, and “How it works” section.
docs/API_COMPARISON.md Adds SciPy cumfreq comparison; updates Matplotlib comparison to include cumulative + sequence concatenation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/khisto/array/histogram/api.py Outdated
Comment thread src/khisto/array/histogram/api.py Outdated
Comment thread src/khisto/core/cli_adapter.py Outdated
Comment thread scripts/generate_distribution_examples.py Outdated
Comment thread README.md Outdated
Comment thread docs/API.md Outdated
ElouenGinat and others added 2 commits April 9, 2026 15:02
remove the dimension check on array for histogram api

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown

@marcboulle marcboulle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quelques petits détails ou réponse à tes questions dans les notes de review de copilot.

Sinon, j'ai vérifié un à un les points de l'issue: l'essentiel me semble OK.

Cumulative

Pour la cumulative, je pense que c'est très bien de prendre en compte cette option de matplotlib.hist, pur être le plus possible compatible avec cette méthode.

Par contre, je pense qu'il ne faut pas ajouter la nouvelle méthode cumfreq:

  • cela augmente de 50% le nombre des méthode de l'API ;)
  • autant améliorer à peu de frais la compatibilité avec matplotlib.hist est pertinent, autant cette nouvelle méthode est de faible intérêt, puisque l'on peut exploiter les résultats d'un appel à khisto.histogram en une ligne de code pour avoir l'équivalent
  • cela pose de nouveau problème de compatibilité avec scipy.stats. cumfreq, qui a des paramètre spécifique comme defaultreallimits ou weights

Bugs

Je n'ai testé si les bugs sont corrigés: as-tu pu reproduire ces bugs et les corriger?

Format binaire

Enfin, je pense que c'est le moment de prendre en compte le format binaire, maintenant géré par khisto dans khiops (disponible dans la release https://github.com/KhiopsML/khiops/releases/tag/11.0.1-a.3).
Cf nouvelle version du readme, avec la nouvelle option -b

Usage: `khisto [OPTIONS] INPUT_FILE HISTOGRAM`
The output HISTOGRAM contains bin details: lower bound, upper bound, bin width, frequency, probability, and density.

Options:
- -b: Read input as binary (8-byte doubles) instead of text
- -e: Generate a series of histograms with increasing accuracy for exploratory analysis
- -j: Save all outputs in a single JSON file
- -h: Display help message and exit
- -v: Show version information and exit

Pour cela, la méthode tofilede numpy devrait faire l'affaire:

        data = np.random.normal(0, 1, size)
        data.tofile(f"Gaussian{size}.bin")

Test finaux

Dès que tout cela sera prêt, je retesterais le tout en mode béta.

Comment thread README.md Outdated
Copy link
Copy Markdown

@marcboulle marcboulle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Il reste des référence à cumfreq.

Sinon, pourrais-tu répondre aux remarques de review que tu as prises en compte (par exemple, par "Done"): cela me permettrait de les mettre en "Resolved" après vérification.

Tu n'as pas non plus répondu à ma question sur les bugs (reproduction et résolution).

Comment thread README.md Outdated
Comment thread README.md Outdated
@ElouenGinat
Copy link
Copy Markdown
Collaborator Author

Bonjour,

Je n’ai pas réussi à reproduire les deux bugs que tu as décrits. On peut en discuter ensemble si tu veux.
Dans tous les cas, le paramètre density fonctionne.

J’ai supprimé cumfreq. J’ai également implémenté le format binaire et mis à jour la version de khisto.

@marcboulle marcboulle self-requested a review April 13, 2026 09:33
Copy link
Copy Markdown

@marcboulle marcboulle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Pour les bugs, je verrai cela lors du béta-test.

@ElouenGinat ElouenGinat merged commit 1b4cdcf into main Apr 13, 2026
15 checks passed
@ElouenGinat ElouenGinat deleted the alpha-feedback branch April 13, 2026 10:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feedback from alpha-test

3 participants