Fix #413: chore: regenerate only changed colab notebooks in CI and ma...#486

Open
JiwaniZakir wants to merge 1 commit into NVIDIA-NeMo:main from JiwaniZakir:fix/413-chore-regenerate-only-changed-colab-note

Conversation

@JiwaniZakir

Closes #413

CI now detects which docs/notebook_source/*.py files changed via git diff --name-only and passes them to make generate-colab-notebooks FILES="...", regenerating only affected notebooks instead of all six on every run. The simplified diff check in the Check for differences step drops the grep-based cell-ID filter (previously needed to suppress noise from unrelated notebooks) and does a plain git diff docs/colab_notebooks/.

  • .github/workflows/check-colab-notebooks.yml: adds Get changed notebook sources step (id: changed) that populates steps.changed.outputs.files; updates Generate Colab notebooks to conditionally pass FILES=; replaces the multi-grep MEANINGFUL_DIFF pipeline with a bare git diff docs/colab_notebooks/
  • Makefile (generate-colab-notebooks target, line ~481): adds ifdef FILES / else / endif block to forward the FILES variable to generate_colab_notebooks.py --files

Verified by inspecting that the FILES variable threads correctly from the workflow output through the Makefile conditional to the Python script's existing --files argument, and confirmed the previously noisy 188-line cell-ID diff (as in PR #403) would produce an empty MEANINGFUL_DIFF under the new plain-diff check since only the source-matched notebook is regenerated.
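
The flow described above can be sketched in plain shell. This is only an illustration, not the workflow's actual code: `build_make_command` is a hypothetical helper, the paths are made up, and it assumes the Makefile accepts a space-separated `FILES` value. It reads paths in the shape `git diff --name-only` prints and echoes the `make` invocation that would follow.

```shell
# Hypothetical helper (not part of the PR): read newline-separated paths on
# stdin and print the make command the workflow would run for them.
build_make_command() {
  # Keep only docs/notebook_source/*.py, strip the directory part, and join
  # the names with single spaces. `|| true` tolerates "no match" from grep.
  files=$( { grep -E '^docs/notebook_source/[^/]+\.py$' || true; } \
    | sed 's#.*/##' \
    | paste -sd' ' - )
  if [ -n "$files" ]; then
    # Some sources changed: regenerate only those notebooks.
    printf 'make generate-colab-notebooks FILES="%s"\n' "$files"
  else
    # Nothing matched: fall back to regenerating every notebook.
    printf 'make generate-colab-notebooks\n'
  fi
}

# Example with made-up paths:
printf '%s\n' docs/notebook_source/a.py README.md | build_make_command
# prints: make generate-colab-notebooks FILES="a.py"
```

The empty-list branch mirrors the fallback in the description: with no changed sources, `FILES` is left unset and every notebook is regenerated.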


This PR was created with AI assistance (Claude). The changes were reviewed by quality gates and a critic model before submission.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@JiwaniZakir JiwaniZakir requested a review from a team as a code owner April 1, 2026 23:35
@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Thank you for your submission! We ask that you sign our Developer Certificate of Origin before we can accept your contribution. You can sign the DCO by adding a comment below using this text:


I have read the DCO document and I hereby sign the DCO.


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the DCO Assistant Lite bot.

@greptile-apps
Contributor

greptile-apps bot commented Apr 1, 2026

Greptile Summary

This PR optimises the Colab notebook CI by detecting which docs/notebook_source/*.py files changed and regenerating only those notebooks, and simplifies the diff check to a plain git diff docs/colab_notebooks/.

  • P1 – multiline truncation in GITHUB_OUTPUT: when more than one notebook source changes in a single PR, xargs -I{} basename {} emits one filename per line; echo "files=$FILES" writes those newlines verbatim and GitHub Actions captures only the first line, silently dropping the rest. Only one notebook is regenerated, leaving the others unchecked.
  • P2 – script injection: ${{ steps.changed.outputs.files }} is interpolated inline into the shell body; surfacing it through an env: variable is the recommended safe pattern.

Confidence Score: 4/5

  • Safe to merge after fixing the multiline GITHUB_OUTPUT truncation bug, which silently skips notebooks when multiple sources change at once.
  • One P1 defect: multi-notebook PRs will only regenerate the first changed notebook due to newline truncation in GITHUB_OUTPUT. The Makefile change is correct. The P2 injection hygiene note is advisory.
  • .github/workflows/check-colab-notebooks.yml — specifically the Get changed notebook sources step output and the inline expression interpolation in Generate Colab notebooks.

Important Files Changed

Filename Overview
.github/workflows/check-colab-notebooks.yml Adds selective notebook regeneration via git diff; the multiline GITHUB_OUTPUT bug causes only the first changed file to be regenerated when multiple notebooks change simultaneously.
Makefile Adds ifdef FILES block to forward the FILES variable to the Python script's --files argument; logic is correct and idiomatic Make.

Sequence Diagram

sequenceDiagram
    participant GH as GitHub Actions
    participant Step1 as Get changed sources
    participant Step2 as Generate notebooks
    participant Make as Makefile
    participant Py as generate_colab_notebooks.py
    participant Step3 as Check for differences

    GH->>Step1: git diff --name-only base..head docs/notebook_source/*.py
    Step1->>Step1: xargs basename → filenames (newline-separated)
    Step1-->>GH: GITHUB_OUTPUT files= (⚠️ truncated at first newline)

    GH->>Step2: evaluate steps.changed.outputs.files
    alt files non-empty
        Step2->>Make: make generate-colab-notebooks FILES="a.py [b.py missing]"
        Make->>Py: --files a.py
        Py-->>Make: writes docs/colab_notebooks/a.ipynb
    else files empty
        Step2->>Make: make generate-colab-notebooks
        Make->>Py: (no --files, process all)
        Py-->>Make: writes all notebooks
    end

    GH->>Step3: git diff docs/colab_notebooks/
    Step3-->>GH: empty → ✅ / non-empty → ❌ exit 1
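
The last step in the diagram (empty diff passes, non-empty diff fails) might look roughly like the sketch below. `check_notebook_diff` is a hypothetical name and the messages are invented. In the workflow the argument would be the output of `git diff docs/colab_notebooks/`.

```shell
# Hypothetical gate: takes diff text as $1 and fails when it is non-empty.
check_notebook_diff() {
  diff_output=$1
  if [ -n "$diff_output" ]; then
    echo "Colab notebooks are out of date:"
    printf '%s\n' "$diff_output"
    return 1
  fi
  echo "Colab notebooks are up to date."
}

# Assumed usage inside the CI step:
#   check_notebook_diff "$(git diff docs/colab_notebooks/)"
```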
Prompt To Fix All With AI
This is a comment left during a code review.
Path: .github/workflows/check-colab-notebooks.yml
Line: 36-37

Comment:
**Multiline `FILES` value truncated in `GITHUB_OUTPUT`**

When more than one `.py` file changes, `xargs -I{} basename {}` emits one filename per line. A bare `echo "files=$FILES"` writes those newlines verbatim into `$GITHUB_OUTPUT`, so GitHub Actions reads only the first line as the `files` value — all subsequent filenames are silently dropped. The downstream step therefore regenerates only one notebook even though several changed.

Fix by collapsing to a space-separated string before writing the output:

```suggestion
          FILES=$(git diff --name-only ${{ github.event.pull_request.base.sha || 'HEAD~1' }} -- docs/notebook_source/*.py | xargs -I{} basename {} | tr '\n' ' ' | sed 's/ $//' || true)
          echo "files=$FILES" >> "$GITHUB_OUTPUT"
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: .github/workflows/check-colab-notebooks.yml
Line: 41-43

Comment:
**Script injection via direct expression interpolation**

`${{ steps.changed.outputs.files }}` is interpolated directly into the shell script body. If a filename contained shell metacharacters, the shell would interpret them as part of the script rather than as data. The recommended GitHub Actions pattern is to surface the value through an environment variable so the untrusted value is never spliced into the script text:

```suggestion
        env:
          CHANGED_FILES: ${{ steps.changed.outputs.files }}
        run: |
          if [ -n "$CHANGED_FILES" ]; then
            make generate-colab-notebooks FILES="$CHANGED_FILES"
          else
            make generate-colab-notebooks
          fi
```

How can I resolve this? If you propose a fix, please make it concise.
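
A small shell experiment can illustrate the difference. It is a simulation only: `eval` stands in for the way an inline `${{ ... }}` expression is pasted into the script before the shell parses it, and the hostile value is invented for the example.

```shell
# Invented value containing a shell metacharacter (;).
untrusted='a.py; echo INJECTED'

# Inline interpolation, simulated with eval: the value becomes script text,
# so the part after ';' runs as a second command.
unsafe=$(eval "printf '%s\n' $untrusted")

# Environment-variable style: the value is expanded inside quotes and stays
# a single piece of data.
safe=$(printf '%s\n' "$untrusted")

printf 'unsafe -> %s\n' "$unsafe"
printf 'safe   -> %s\n' "$safe"
```

In the first case the captured output contains an extra `INJECTED` line produced by the smuggled command. In the second the whole string is printed unchanged.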

Reviews (1): Last reviewed commit: "Regenerate only changed colab notebooks ..." | Re-trigger Greptile

Comment on lines +36 to +37
FILES=$(git diff --name-only ${{ github.event.pull_request.base.sha || 'HEAD~1' }} -- docs/notebook_source/*.py | xargs -I{} basename {} || true)
echo "files=$FILES" >> "$GITHUB_OUTPUT"
P1 Multiline FILES value truncated in GITHUB_OUTPUT

When more than one .py file changes, xargs -I{} basename {} emits one filename per line. A bare echo "files=$FILES" writes those newlines verbatim into $GITHUB_OUTPUT, so GitHub Actions reads only the first line as the files value — all subsequent filenames are silently dropped. The downstream step therefore regenerates only one notebook even though several changed.

Fix by collapsing to a space-separated string before writing the output:

Suggested change
- FILES=$(git diff --name-only ${{ github.event.pull_request.base.sha || 'HEAD~1' }} -- docs/notebook_source/*.py | xargs -I{} basename {} || true)
- echo "files=$FILES" >> "$GITHUB_OUTPUT"
+ FILES=$(git diff --name-only ${{ github.event.pull_request.base.sha || 'HEAD~1' }} -- docs/notebook_source/*.py | xargs -I{} basename {} | tr '\n' ' ' | sed 's/ $//' || true)
+ echo "files=$FILES" >> "$GITHUB_OUTPUT"
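
The effect of the suggested pipeline can be checked locally. The sketch below uses a temporary file as a stand-in for the runner's `$GITHUB_OUTPUT` and two made-up paths in place of real `git diff --name-only` output. Both are assumptions for illustration.

```shell
# Stand-in for the runner-provided output file (assumption: a temp file).
GITHUB_OUTPUT=$(mktemp)

# Made-up paths, one per line, in the shape `git diff --name-only` prints.
changed_paths='docs/notebook_source/a.py
docs/notebook_source/b.py'

# Strip directories, then fold the newline-separated names onto one line
# and drop the trailing space left by tr.
FILES=$(printf '%s\n' "$changed_paths" \
  | xargs -I{} basename {} \
  | tr '\n' ' ' \
  | sed 's/ $//')

# The output file receives a single name=value line.
echo "files=$FILES" >> "$GITHUB_OUTPUT"
cat "$GITHUB_OUTPUT"
```

With the names collapsed onto one line, the value fits the single-line `name=value` form. GitHub's documented delimiter syntax (`name<<DELIMITER`) is the other way to carry a multi-line value, but the consumer would then need to split it again.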
Development

Successfully merging this pull request may close these issues.

chore: regenerate only changed colab notebooks in CI and make target

1 participant