
fix(core-tools): resolve Jupyter Notebook and JSON corruption in write_file #28000

Open
amelidev wants to merge 4 commits into google-gemini:main from amelidev:b_446760218

Conversation

@amelidev

@amelidev amelidev commented Jun 18, 2026

Contributor

Summary

This PR resolves a critical bug where the write_file tool silently corrupts Jupyter Notebook (.ipynb) and JSON files, rendering them unparseable or causing the environment (like Colab or JupyterLab) to discard changes and revert to checkpoints.

Following code review feedback, we have expanded this protection to cover other highly structured data and config formats (YAML, TOML, map files, and common JSON-based configuration dotfiles) that are equally vulnerable to aggressive unescaping corruption.

Details

The root cause was isolated to the unescapeStringForGeminiBug utility in packages/core/src/utils/editCorrector.ts. The utility aggressively unescapes sequences, for example replacing the escape sequence \\n with a raw newline character, or \\" with a bare double quote ". While useful for fixing over-escaped code block lines in standard source files, this corrupts JSON, YAML, TOML, and structured configuration files: JSON forbids raw control characters and unescaped double quotes inside string values, and the same substitutions break or silently change quoted strings in YAML and TOML.
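
The failure mode described above can be reproduced with a minimal sketch. The `naiveUnescape` function below is a hypothetical, simplified stand-in written for illustration; it is not the actual unescapeStringForGeminiBug implementation, and it only handles the two sequences named in this description.

```typescript
// Hypothetical, simplified stand-in for an aggressive unescape pass.
// It is NOT the real unescapeStringForGeminiBug implementation.
function naiveUnescape(input: string): string {
  return input
    .replace(/\\n/g, "\n") // backslash + "n" becomes a raw newline
    .replace(/\\"/g, '"'); // backslash + '"' becomes a bare double quote
}

// Returns true when the text is valid JSON.
function parses(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// A valid JSON document whose string value contains escaped characters,
// similar to the "source" field of a notebook cell.
const original = JSON.stringify({ source: 'print("hi")\nprint("bye")' });
const corrupted = naiveUnescape(original);

console.log(parses(original)); // true
console.log(parses(corrupted)); // false
```

The second result is false because the string value now contains a raw newline and unescaped quotes, both of which the JSON grammar rejects.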

Changes:

  • JSON/Structured format bypass: Added an optional isJsonLike parameter to ensureCorrectFileContent(). When true, it immediately returns the proposed content untouched, safely bypassing LLM-based correction and aggressive unescaping.
  • Robust format-aware checking: Configured getCorrectedFileContent() in packages/core/src/tools/write-file.ts to detect the target file type:
    • Standard extensions: .json, .ipynb, .map, .yaml, .yml, .toml.
    • Common JSON-based dotfiles (without typical extensions): .eslintrc, .babelrc, .prettierrc, .stylelintrc, .watchmanconfig by checking their basenames.
    • Sets isJsonLike = true (and aggressiveUnescape = false) for these files.
  • Exhaustive test coverage: Added a comprehensive parameter-matrix test inside write-file.test.ts to ensure that all of the mentioned extensions/files correctly trigger JSON-like protection.
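
The detection rules listed above could be sketched as follows. The helper names (`baseName`, `extName`, `isStructuredFile`) are assumptions made for this illustration and do not appear in the PR; the two string helpers are dependency-free approximations of Node's `path.basename` and `path.extname`, which likewise report an empty extension for a dotfile such as `.eslintrc`.

```typescript
// Extensions this PR description lists as protected.
const STRUCTURED_EXTENSIONS = new Set([
  ".json", ".ipynb", ".map", ".yaml", ".yml", ".toml",
]);

// Extensionless JSON-based dotfiles, matched by basename.
const STRUCTURED_DOTFILES = new Set([
  ".eslintrc", ".babelrc", ".prettierrc", ".stylelintrc", ".watchmanconfig",
]);

// Last path segment, accepting both "/" and "\" separators.
function baseName(filePath: string): string {
  const parts = filePath.split(/[\\/]/);
  return parts[parts.length - 1] ?? "";
}

// Extension including the dot; a leading dot alone does not count.
function extName(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot > 0 ? fileName.slice(dot) : "";
}

// Hypothetical predicate combining the extension and basename checks.
function isStructuredFile(filePath: string): boolean {
  const name = baseName(filePath);
  return STRUCTURED_EXTENSIONS.has(extName(name)) || STRUCTURED_DOTFILES.has(name);
}

console.log(isStructuredFile("notebooks/demo.ipynb")); // true
console.log(isStructuredFile("project/.eslintrc")); // true
console.log(isStructuredFile("src/index.ts")); // false
```

Matching dotfiles by basename matters because an extension-only check sees `.eslintrc` as a file with no extension at all.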

Related Issues

Resolves Jupyter Notebook (.ipynb) modification failure in Google3 and cloud development environments (Colab).

How to Validate

1. Execute Unit Tests:

Run the modified test suites in the core package:

npm test -w @google/gemini-cli-core -- src/utils/editCorrector.test.ts
npm test -w @google/gemini-cli-core -- src/tools/write-file.test.ts

Expected: All 66 tests pass successfully.

2. Manual Verification:

Verify that ensureCorrectFileContent with isJsonLike: true does not corrupt JSON/YAML/TOML strings containing escaped values:

const result = await ensureCorrectFileContent(
  validJsonString, 
  mockBaseLlmClient, 
  abortSignal, 
  true, 
  true, 
  true // isJsonLike
);
JSON.parse(result); // Does not throw SyntaxError
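
For readers without the repository checked out, the same check can be approximated in isolation. The sketch below uses a hypothetical `correctContentSketch` function that is synchronous and takes only two arguments; the function in this PR is async and also receives an LLM client and an abort signal, which are omitted here. The unescape branch is a simplified stand-in, not the real utility.

```typescript
// Hypothetical, synchronous stand-in for the correction step.
function correctContentSketch(content: string, isJsonLike: boolean): string {
  if (isJsonLike) {
    // Bypass: structured content is returned exactly as proposed.
    return content;
  }
  // Simplified stand-in for the unescaping applied to ordinary source files.
  return content.replace(/\\n/g, "\n").replace(/\\"/g, '"');
}

// A small notebook-shaped document with escaped quotes and a newline.
const validJsonString = JSON.stringify({
  cells: [{ cell_type: "code", source: 'print("a")\n' }],
});

const protectedResult = correctContentSketch(validJsonString, true);
const unprotectedResult = correctContentSketch(validJsonString, false);

console.log(protectedResult === validJsonString); // true
JSON.parse(protectedResult); // does not throw

let threw = false;
try {
  JSON.parse(unprotectedResult);
} catch {
  threw = true;
}
console.log(threw); // true
```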

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • MacOS
      • npm run
      • npx
      • Docker
      • Podman
      • Seatbelt
    • Windows
      • npm run
      • npx
      • Docker
    • Linux
      • npm run
      • npx
      • Docker

@amelidev amelidev requested review from a team as code owners June 18, 2026 01:43
@github-actions github-actions Bot added the size/m A medium sized PR label Jun 18, 2026
@github-actions

github-actions Bot commented Jun 18, 2026


📊 PR Size: size/M

  • Lines changed: 155
  • Additions: +123
  • Deletions: -32
  • Files changed: 5

@github-actions

github-actions Bot commented Jun 18, 2026


🛑 Action Required: Evaluation Approval

Steering changes have been detected in this PR. To prevent regressions, a maintainer must approve the evaluation run before this PR can be merged.

Maintainers:

  1. Go to the Workflow Run Summary.
  2. Click the yellow 'Review deployments' button.
  3. Select the 'eval-gate' environment and click 'Approve'.

Once approved, the evaluation results will be posted here automatically.

@gemini-code-assist

Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where the write_file tool was corrupting JSON and Jupyter Notebook files by applying aggressive unescaping logic intended for standard source code. By introducing a bypass mechanism for JSON-like files, the tool now preserves the integrity of these structured data formats, ensuring they remain valid and parseable in their respective environments.

Highlights

  • JSON and Jupyter Notebook Protection: Introduced an isJsonLike flag to prevent aggressive unescaping of JSON and Jupyter Notebook files, which previously caused syntax corruption.
  • Automated File Detection: Updated getCorrectedFileContent to automatically detect .json and .ipynb file extensions and bypass potentially destructive correction logic.
  • Test Suite Enhancements: Added comprehensive unit tests to verify that JSON content remains intact and that the unescape utility correctly avoids problematic sequences.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
--- | --- | ---
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request prevents corruption of escaped sequences in JSON-like files (such as .json and .ipynb) by bypassing aggressive unescaping and LLM correction. This is achieved by introducing an isJsonLike flag based on the file extension, which is passed to ensureCorrectFileContent to return the content untouched. The review feedback suggests expanding this check to support other JSON-based formats like .jsonc, .json5, .geojson, and .map to prevent similar issues.

Comment thread packages/core/src/tools/write-file.ts Outdated
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@TrueAlpha-spiral

TrueAlpha-spiral commented Jun 18, 2026 via email


@gemini-code-assist

Contributor

Thank you for the detailed architectural perspective on integrating structural integrity checks into the Universal Verification Kernel (UVK). Your proposal for a formal _validate_json_integrity gate is a compelling approach to ensuring long-term data consistency and preventing semantic drift.

While the concept of a kernel-level structural audit is an excellent direction for the system's evolution, my current focus is on addressing the immediate, critical bug in the write-file tool within the TypeScript codebase. The current PR aims to provide a reliable, localized fix to prevent the silent corruption of JSON and Jupyter Notebook files during standard operations.

I will continue to prioritize the robustness of the isJsonLike bypass in the current implementation. If you have any specific feedback or suggestions regarding the current TypeScript logic in packages/core/src/tools/write-file.ts or packages/core/src/utils/editCorrector.ts, I would be very interested in hearing them to ensure the immediate fix is as effective as possible.

@gemini-cli gemini-cli Bot added the status/need-issue Pull requests that need to have an associated issue. label Jun 18, 2026
@luisfelipe-alt

Contributor

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a mechanism to bypass unescaping and LLM correction for JSON-like files (such as .json, .ipynb, and .map files) to prevent syntax corruption. It updates getCorrectedFileContent and ensureCorrectFileContent to detect these file types and return their content untouched, accompanied by comprehensive unit tests. The reviewer suggests improving the JSON detection logic by checking if the file extension contains 'json' rather than starting with '.json', which would robustly support other JSON-derived formats like .geojson and .topojson.
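
The difference between the two checks the reviewer contrasts can be seen in a short sketch. The extension list below is illustrative and chosen for this example; only the `startsWith` and `includes` string methods are taken from the discussion.

```typescript
// Illustrative sample of extensions, including JSON-derived formats.
const extensions = [".json", ".jsonc", ".json5", ".geojson", ".topojson", ".ts"];

// Prefix check: only extensions that begin with ".json".
const byPrefix = extensions.filter((ext) => ext.startsWith(".json"));

// Substring check: any extension that contains "json" anywhere.
const bySubstring = extensions.filter((ext) => ext.includes("json"));

console.log(byPrefix.join(", ")); // .json, .jsonc, .json5
console.log(bySubstring.join(", ")); // .json, .jsonc, .json5, .geojson, .topojson
```

The substring check is deliberately broader: it picks up `.geojson` and `.topojson`, at the cost of also matching any other extension that happens to contain "json".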

Comment thread packages/core/src/tools/write-file.ts Outdated
@TrueAlpha-spiral

TrueAlpha-spiral commented Jun 18, 2026 via email


@TrueAlpha-spiral

TrueAlpha-spiral commented Jun 18, 2026 via email


Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@amelidev

Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a mechanism to bypass aggressive unescaping and LLM correction for JSON-like files (such as .json, .ipynb, and .map) to prevent content corruption during file writes. The feedback suggests expanding this detection to include other structured configuration files, such as YAML, TOML, and common JSON-based dotfiles (e.g., .eslintrc, .prettierrc), which are also vulnerable to corruption.

Comment thread packages/core/src/tools/write-file.ts Outdated
@amelidev

Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a mechanism to identify JSON-like files (such as .json, .ipynb, .yaml, .toml, and common configuration dotfiles) and bypasses aggressive unescaping and LLM correction for them to prevent corruption of structured data. The review feedback suggests expanding this list of JSON-like files to include lockfiles (e.g., .lock) and additional common dotfiles (e.g., .nycrc, .releaserc), along with updating the corresponding unit tests to cover these new cases.

Comment on lines +137 to +150
const isJsonLike =
  fileExtension.includes('json') ||
  fileExtension === '.ipynb' ||
  fileExtension === '.map' ||
  fileExtension === '.yaml' ||
  fileExtension === '.yml' ||
  fileExtension === '.toml' ||
  [
    '.eslintrc',
    '.babelrc',
    '.prettierrc',
    '.stylelintrc',
    '.watchmanconfig',
  ].includes(fileName);

Contributor

high

While the current list of extensions and dotfiles covers many common structured formats, it misses several critical structured files that are highly vulnerable to aggressive unescaping corruption:

  1. Lockfiles (.lock): Files like Cargo.lock (TOML), poetry.lock (TOML), Pipfile.lock (JSON), and yarn.lock (YAML-like) are highly structured. Corrupting these files can break the entire project's dependency resolution and build process.
  2. Common extensionless dotfiles: .nycrc (Istanbul/nyc coverage config) and .releaserc (semantic-release config) are very common JSON/YAML configuration files that do not have standard extensions.

Adding .lock to the extension checks and .nycrc/.releaserc to the dotfile list will prevent silent corruption of these critical files.

Suggested change
  const isJsonLike =
    fileExtension.includes('json') ||
    fileExtension === '.ipynb' ||
    fileExtension === '.map' ||
    fileExtension === '.yaml' ||
    fileExtension === '.yml' ||
    fileExtension === '.toml' ||
+   fileExtension === '.lock' ||
    [
      '.eslintrc',
      '.babelrc',
      '.prettierrc',
      '.stylelintrc',
      '.watchmanconfig',
+     '.nycrc',
+     '.releaserc',
    ].includes(fileName);

Comment on lines +399 to +408
const filesToTest = [
  'notebook.ipynb',
  'data.json',
  'bundle.js.map',
  'workflow.yaml',
  'config.yml',
  'Cargo.toml',
  '.eslintrc',
  '.prettierrc',
];

Contributor

high

Update the test cases to verify that .lock files (like Cargo.lock, poetry.lock, yarn.lock) and common extensionless dotfiles (like .nycrc, .releaserc) are correctly identified as JSON-like and bypass aggressive unescaping.

      const filesToTest = [
        'notebook.ipynb',
        'data.json',
        'bundle.js.map',
        'workflow.yaml',
        'config.yml',
        'Cargo.toml',
        'Cargo.lock',
        'poetry.lock',
        'yarn.lock',
        '.eslintrc',
        '.prettierrc',
        '.nycrc',
        '.releaserc',
      ];
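
As a rough illustration of what such a parameter matrix checks, the sketch below runs the expanded file list against a predicate modeled on the reviewer's suggested change. It assumes the suggestion is adopted as written; the standalone `isJsonLike` function and the way it derives `fileExtension` are assumptions of this sketch, and it uses plain filtering rather than the project's test framework.

```typescript
// Modeled on the suggested predicate. Deriving the extension from the last
// dot (ignoring a leading dot) is an assumption of this sketch.
function isJsonLike(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  const fileExtension = dot > 0 ? fileName.slice(dot) : "";
  return (
    fileExtension.includes("json") ||
    [".ipynb", ".map", ".yaml", ".yml", ".toml", ".lock"].includes(fileExtension) ||
    [
      ".eslintrc",
      ".babelrc",
      ".prettierrc",
      ".stylelintrc",
      ".watchmanconfig",
      ".nycrc",
      ".releaserc",
    ].includes(fileName)
  );
}

const filesToTest = [
  "notebook.ipynb", "data.json", "bundle.js.map", "workflow.yaml",
  "config.yml", "Cargo.toml", "Cargo.lock", "poetry.lock", "yarn.lock",
  ".eslintrc", ".prettierrc", ".nycrc", ".releaserc",
];

// Any file listed here would NOT receive JSON-like protection.
const misses = filesToTest.filter((file) => !isJsonLike(file));

console.log(misses.length); // 0
console.log(isJsonLike("index.ts")); // false
```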


Labels

size/m: A medium sized PR
status/need-issue: Pull requests that need to have an associated issue.


3 participants