Skip to content

Comments

fix/json unicode offsets#2659

Open
jurgenvinju wants to merge 23 commits intomainfrom
fix/json-unicode-offsets
Open

fix/json unicode offsets#2659
jurgenvinju wants to merge 23 commits intomainfrom
fix/json-unicode-offsets

Conversation

@jurgenvinju
Copy link
Member

@jurgenvinju jurgenvinju commented Feb 18, 2026

Fixes unicode offsets for the JSON parser/validator:

  • parse error locations
  • origin src keyword fields
  • fully satisfies the loc semantics of vallang and Rascal (offset, length, line, column)

This makes JSON parsers ready for use in an editor/UI context. From that perspective, this was a bug. From the "we had a reasonable JSON parser" perspective, this was an enhancement.

Instruments the OriginTrackingReader embedded in JSONValueReader to accurately deal with the presence of unicode surrogate pairs in the char buffer of the reader.

Note that unicode characters in comments are equally responsible for shifts in the offsets as unicode characters in string constants and field names.

  • start of buffer offset compensated for presence of surrogate pairs
  • pos index into buffer compensated for surrogate pairs
  • column position (same)
  • line position (same)
  • asserts added to reflect the many many assumptions the links between GsonReader, OriginTrackingReader and JSONValueReader depend on.
  • comments to explain the above as well; it's rather finicky
  • added specific file with unicode for testing purposes
  • added JSON generators with different shapes of unicode to the JSONIOTests, specifically for triggering exceptional conditions around the buffer limits:
    • high surrogate and low surrogate before the limit
    • high surrogate before the limit and low surrogate after the limit
    • several surrogate pairs in a row around the limit
    • alternated surrogate pairs and normal chars around the limit

The current solution still streams quickly and scales freely to very long JSON content, very long lines in JSON content, and very long comments or strings in JSON content.

@codecov
Copy link

codecov bot commented Feb 18, 2026

Codecov Report

❌ Patch coverage is 83.92857% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 46%. Comparing base (57d8bcd) to head (df0a5c4).
⚠️ Report is 19 commits behind head on main.

Files with missing lines Patch % Lines
...pl/library/lang/json/internal/JsonValueReader.java 83% 2 Missing and 7 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##              main   #2659   +/-   ##
=======================================
  Coverage       46%     46%           
+ Complexity    6677    6668    -9     
=======================================
  Files          795     795           
  Lines        65899   65945   +46     
  Branches      9878    9895   +17     
=======================================
+ Hits         30709   30733   +24     
- Misses       32806   32825   +19     
- Partials      2384    2387    +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@jurgenvinju jurgenvinju self-assigned this Feb 19, 2026
@sonarqubecloud
Copy link

@jurgenvinju jurgenvinju marked this pull request as ready for review February 24, 2026 13:54
@DavyLandman
Copy link
Member

Error: RROR] /home/runner/actions-runner/_work/rascal/rascal/src/org/rascalmpl/library/lang/json/internal/JsonValueReader.java:[1032,17] cannot find symbol
  symbol: variable lineHandler
Error: RROR] /home/runner/actions-runner/_work/rascal/rascal/src/org/rascalmpl/library/lang/json/internal/JsonValueReader.java:[1033,17] cannot find symbol

I think there's some uncommitted file?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants