Feature/issue 1274 notebook format check#1299
Open
rosspeili wants to merge 5 commits into
Open
Conversation
Add check/nbformat, a new bash script that checks the format of all tracked Jupyter notebook (.ipynb) files using the TensorFlow Docs notebook format checker (tensorflow_docs.tools.nbformat). The script: - Discovers all .ipynb files tracked by git via git ls-files - Runs python -m tensorflow_docs.tools.nbformat on each notebook - Defaults to check-only mode (non-destructive) - Accepts --fix to reformat notebooks in place - Accepts --help to print usage information - Exits non-zero if any notebook fails the format check Wire check/nbformat into: - check/all: called unconditionally alongside check/mypy and check/shellcheck, since notebook format is structural and not related to which Python files changed - check/shellcheck: added to the required_shell_scripts list (in sorted order) so shellcheck verifies the new script is present Add tensorflow-docs to dev_tools/requirements/deps/pytest.txt alongside the existing nbformat dep, since both are needed for notebook-related checks and belong in the same env grouping. Add a notebook-checks CI job to .github/workflows/ci.yaml that triggers when .ipynb files change and runs check/nbformat using the pytest.env.txt environment (which includes tensorflow-docs). Also add the new job to report-results so it is a required check.
Run pip-compile to update pytest.env.txt and dev.env.txt with the pinned version of tensorflow-docs (==2025.2.19.33219), which was added to deps/pytest.txt for use by check/nbformat. Also picks up its transitive deps protobuf and pyyaml.
…tion Regenerated pytest.env.txt and dev.env.txt with pip-compile after accepting upstream version bumps during merge conflict resolution. tensorflow-docs==2025.2.19.33219 is now correctly pinned alongside the updated package versions from main.
Contributor
There was a problem hiding this comment.
Code Review
This pull request introduces a new script, check/nbformat, to validate and fix Jupyter notebook formatting using tensorflow_docs.tools.nbformat, and updates the project's dependency requirements and lockfiles. The review highlights two critical issues: the inefficiency of invoking the format checker in a loop rather than passing all files at once, and the breaking of environment compatibility for older Python versions caused by regenerating lockfiles with Python 3.13 instead of the project's minimum supported version.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #1274
What was solved based on the issue
OpenFermion has 8 Jupyter notebook tutorials in
docs/tutorials/, but none of the scripts incheck/ever looked at them. This PR adds a newcheck/nbformatscript that validates alltracked
.ipynbfiles using the TensorFlow Docs notebookformat checker, and wires it into the rest of the CI pipeline.
Changes
check/nbformat(new file) which is a bash script that finds all.ipynbfiles tracked by git andruns
python -m tensorflow_docs.tools.nbformaton each one. Defaults to check-only mode, pass--fixto reformat notebooks in place, or--helpfor usage info. Follows the same structureand conventions as the other scripts in
check/.check/all, addedrun check/nbformatso it runs as part of master check.check/shellcheck, also addedcheck/nbformatto the required scripts list (in sorted order)so shellcheck verifies itself the new script is sound.
dev_tools/requirements/deps/pytest.txt, addedtensorflow-docsalongside the existingnbformatdep (explaining decision below).dev_tools/requirements/envs/pytest.env.txt/dev.env.txt, regenerated pinned lockfilesto include
tensorflow-docs==2025.2.19.33219and its transitive deps (protobuf,pyyaml)..github/workflows/ci.yaml, added anotebook-checksCI job that triggers when.ipynbfiles change and runs
check/nbformat. Also added toreport-resultsso it's required.Why this way
1. Check-only vs. auto-fix?
The script supports both. By default it's read-only and exits non-zero if any notebook fails,
which is what we want in CI. The
--fixflag is there for devs who want to fix theirnotebooks locally without manually figuring out what's wrong. This matches the pattern found in
check/format-incremental(which also has an--applyflag).2. Should
check/nbformatrun unconditionally incheck/all?I think yes. Unlike Python linting or formatting, notebook format issues aren't tied to which files changed
in a given commit, but they're property of the repo as a whole. So it runs unconditionally alongside
check/mypyandcheck/shellcheck, rather than only in the changed files branch.3. Where to put the
tensorflow-docsdependency?tensorflow-docsalready had its own orphaneddeps/tensorflow-docs.txtfile that wasn'treferenced by any compiled environment. Rather than creating yet another new file, I added it
to
deps/pytest.txt, which containednbformatalready (the package for reading notebooks). Bothare notebook-related checking tools used outside of normal test runs, so I thought they belong together.
This keeps the number of dep files as low as possible and the grouping easy to grasp.
Let me know how this looks, or if there's something I missed. <3