Fix #413: chore: regenerate only changed colab notebooks in CI and ma...#486
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Thank you for your submission! We ask that you sign our Developer Certificate of Origin before we can accept your contribution. You can sign the DCO by adding a comment below using this text: I have read the DCO document and I hereby sign the DCO. You can retrigger this bot by commenting recheck in this Pull Request. Posted by the DCO Assistant Lite bot. |
Greptile SummaryThis PR optimises the Colab notebook CI by detecting which
|
| Filename | Overview |
|---|---|
| .github/workflows/check-colab-notebooks.yml | Adds selective notebook regeneration via git diff; the multiline GITHUB_OUTPUT bug causes only the first changed file to be regenerated when multiple notebooks change simultaneously. |
| Makefile | Adds ifdef FILES block to forward the FILES variable to the Python script's --files argument; logic is correct and idiomatic Make. |
Sequence Diagram
sequenceDiagram
participant GH as GitHub Actions
participant Step1 as Get changed sources
participant Step2 as Generate notebooks
participant Make as Makefile
participant Py as generate_colab_notebooks.py
participant Step3 as Check for differences
GH->>Step1: git diff --name-only base..head docs/notebook_source/*.py
Step1->>Step1: xargs basename → filenames (newline-separated)
Step1-->>GH: GITHUB_OUTPUT files= (⚠️ truncated at first newline)
GH->>Step2: evaluate steps.changed.outputs.files
alt files non-empty
Step2->>Make: make generate-colab-notebooks FILES="a.py [b.py missing]"
Make->>Py: --files a.py
Py-->>Make: writes docs/colab_notebooks/a.ipynb
else files empty
Step2->>Make: make generate-colab-notebooks
Make->>Py: (no --files, process all)
Py-->>Make: writes all notebooks
end
GH->>Step3: git diff docs/colab_notebooks/
Step3-->>GH: empty → ✅ / non-empty → ❌ exit 1
Prompt To Fix All With AI
This is a comment left during a code review.
Path: .github/workflows/check-colab-notebooks.yml
Line: 36-37
Comment:
**Multiline `FILES` value truncated in `GITHUB_OUTPUT`**
When more than one `.py` file changes, `xargs -I{} basename {}` emits one filename per line. A bare `echo "files=$FILES"` writes those newlines verbatim into `$GITHUB_OUTPUT`, so GitHub Actions reads only the first line as the `files` value — all subsequent filenames are silently dropped. The downstream step therefore regenerates only one notebook even though several changed.
Fix by collapsing to a space-separated string before writing the output:
```suggestion
FILES=$(git diff --name-only ${{ github.event.pull_request.base.sha || 'HEAD~1' }} -- docs/notebook_source/*.py | xargs -I{} basename {} | tr '\n' ' ' | sed 's/ $//' || true)
echo "files=$FILES" >> "$GITHUB_OUTPUT"
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: .github/workflows/check-colab-notebooks.yml
Line: 41-43
Comment:
**Script injection via direct expression interpolation**
`${{ steps.changed.outputs.files }}` is interpolated directly into the shell script body. If a filename were to contain shell metacharacters the expression would be executed literally. The recommended GitHub Actions pattern is to surface the value through an environment variable so the shell never evaluates the expression inline:
```suggestion
env:
CHANGED_FILES: ${{ steps.changed.outputs.files }}
run: |
if [ -n "$CHANGED_FILES" ]; then
make generate-colab-notebooks FILES="$CHANGED_FILES"
else
make generate-colab-notebooks
fi
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "Regenerate only changed colab notebooks ..." | Re-trigger Greptile
| FILES=$(git diff --name-only ${{ github.event.pull_request.base.sha || 'HEAD~1' }} -- docs/notebook_source/*.py | xargs -I{} basename {} || true) | ||
| echo "files=$FILES" >> "$GITHUB_OUTPUT" |
There was a problem hiding this comment.
Multiline
FILES value truncated in GITHUB_OUTPUT
When more than one .py file changes, xargs -I{} basename {} emits one filename per line. A bare echo "files=$FILES" writes those newlines verbatim into $GITHUB_OUTPUT, so GitHub Actions reads only the first line as the files value — all subsequent filenames are silently dropped. The downstream step therefore regenerates only one notebook even though several changed.
Fix by collapsing to a space-separated string before writing the output:
| FILES=$(git diff --name-only ${{ github.event.pull_request.base.sha || 'HEAD~1' }} -- docs/notebook_source/*.py | xargs -I{} basename {} || true) | |
| echo "files=$FILES" >> "$GITHUB_OUTPUT" | |
| FILES=$(git diff --name-only ${{ github.event.pull_request.base.sha || 'HEAD~1' }} -- docs/notebook_source/*.py | xargs -I{} basename {} | tr '\n' ' ' | sed 's/ $//' || true) | |
| echo "files=$FILES" >> "$GITHUB_OUTPUT" |
Prompt To Fix With AI
This is a comment left during a code review.
Path: .github/workflows/check-colab-notebooks.yml
Line: 36-37
Comment:
**Multiline `FILES` value truncated in `GITHUB_OUTPUT`**
When more than one `.py` file changes, `xargs -I{} basename {}` emits one filename per line. A bare `echo "files=$FILES"` writes those newlines verbatim into `$GITHUB_OUTPUT`, so GitHub Actions reads only the first line as the `files` value — all subsequent filenames are silently dropped. The downstream step therefore regenerates only one notebook even though several changed.
Fix by collapsing to a space-separated string before writing the output:
```suggestion
FILES=$(git diff --name-only ${{ github.event.pull_request.base.sha || 'HEAD~1' }} -- docs/notebook_source/*.py | xargs -I{} basename {} | tr '\n' ' ' | sed 's/ $//' || true)
echo "files=$FILES" >> "$GITHUB_OUTPUT"
```
How can I resolve this? If you propose a fix, please make it concise.
Closes #413
CI now detects which
docs/notebook_source/*.pyfiles changed viagit diff --name-onlyand passes them tomake generate-colab-notebooks FILES="...", regenerating only affected notebooks instead of all six on every run. The simplified diff check in theCheck for differencesstep drops thegrep-based cell-ID filter (previously needed to suppress noise from unrelated notebooks) and does a plaingit diff docs/colab_notebooks/..github/workflows/check-colab-notebooks.yml: addsGet changed notebook sourcesstep (id:changed) that populatessteps.changed.outputs.files; updatesGenerate Colab notebooksto conditionally passFILES=; replaces the multi-grepMEANINGFUL_DIFFpipeline with a baregit diff docs/colab_notebooks/Makefile(generate-colab-notebookstarget, line ~481): addsifdef FILES/else/endifblock to forward theFILESvariable togenerate_colab_notebooks.py --filesVerified by inspecting that the
FILESvariable threads correctly from the workflow output through the Makefile conditional to the Python script's existing--filesargument, and confirmed the previously noisy 188-line cell-ID diff (as in PR #403) would produce an emptyMEANINGFUL_DIFFunder the new plain-diff check since only the source-matched notebook is regenerated.This PR was created with AI assistance (Claude). The changes were reviewed by quality gates and a critic model before submission.