fix/json unicode offsets by jurgenvinju · Pull Request #2659 · usethesource/rascal

jurgenvinju · 2026-02-18T11:04:40Z

Fixes unicode offsets for the JSON parser/validator:

parse error locations
origin src keyword fields
fully satisfies the loc semantics of vallang and Rascal (offset, length, line, column)

This makes JSON parsers ready for use in an editor/UI context. From that perspective, this was a bug. From the "we had a reasonable JSON parser" perspective, this was an enhancement.

Instruments the OriginTrackingReader embedded in JsonValueReader so that it accurately accounts for Unicode surrogate pairs in the reader's char buffer.

Note that unicode characters in comments are equally responsible for shifts in the offsets as unicode characters in string constants and field names.
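
To illustrate the shift, here is a minimal sketch that is not taken from the PR. It assumes that loc offsets are counted in Unicode code points while the reader's buffer holds UTF-16 chars; the class and method names are hypothetical. Every character outside the Basic Multilingual Plane occupies two chars, so each one before a position moves its char index one further away from its code point offset.

```java
public class CodePointOffsets {
    /**
     * Hypothetical helper: converts a UTF-16 char index into the number of
     * Unicode code points that precede it in the text.
     */
    public static int codePointOffset(CharSequence text, int charIndex) {
        return Character.codePointCount(text, 0, charIndex);
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so Java stores it as a surrogate pair (two chars).
        String json = "{\"k\": \"\uD83D\uDE00\", \"n\": 1}";
        int charIndex = json.indexOf("\"n\"");

        System.out.println(charIndex);                        // prints 12 (UTF-16 chars)
        System.out.println(codePointOffset(json, charIndex)); // prints 11 (code points)
    }
}
```

The same shift occurs wherever the surrogate pair sits: in a comment, a field name, or a string constant.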

The current solution still streams quickly and scales freely to very long JSON content, very long lines in JSON content, and very long comments or strings in JSON content.
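
One way to keep such a count while streaming is sketched below. This is an assumption-laden illustration, not the PR's OriginTrackingReader: the class name and methods are hypothetical, and the input is assumed to be well-formed UTF-16. It counts every char except low surrogates, so a pair that is split across two buffer fills is still counted once and no state beyond a single counter is needed, regardless of line, comment, or string length.

```java
import java.io.IOException;
import java.io.Reader;

public class StreamingCodePointCounter {
    private long codePoints = 0;

    /** Feeds the next chunk of chars, as delivered by one buffer fill. */
    public void accept(char[] buf, int off, int len) {
        for (int i = off; i < off + len; i++) {
            // A low surrogate only completes a pair whose high surrogate was
            // already counted, possibly in the previous chunk.
            if (!Character.isLowSurrogate(buf[i])) {
                codePoints++;
            }
        }
    }

    /** Code points seen so far (assuming well-formed UTF-16 input). */
    public long count() {
        return codePoints;
    }

    /** Drains a reader using a fixed-size buffer; bufferSize must be positive. */
    public static long countAll(Reader in, int bufferSize) throws IOException {
        StreamingCodePointCounter counter = new StreamingCodePointCounter();
        char[] buf = new char[bufferSize];
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            counter.accept(buf, 0, n);
        }
        return counter.count();
    }
}
```

Memory use is bounded by the buffer size, and the work is a single pass over the chars. Mapping arbitrary positions back to offsets, as an origin tracker has to, needs more bookkeeping than this counter shows.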

codecov · 2026-02-18T11:10:33Z

Codecov Report

❌ Patch coverage is 83.92857% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 46%. Comparing base (57d8bcd) to head (df0a5c4).
⚠️ Report is 19 commits behind head on main.

Files with missing lines	Patch %	Lines
...pl/library/lang/json/internal/JsonValueReader.java	83%	2 Missing and 7 partials ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##              main   #2659   +/-   ##
=======================================
  Coverage       46%     46%           
+ Complexity    6677    6668    -9     
=======================================
  Files          795     795           
  Lines        65899   65945   +46     
  Branches      9878    9895   +17     
=======================================
+ Hits         30709   30733   +24     
- Misses       32806   32825   +19     
- Partials      2384    2387    +3

sonarqubecloud · 2026-02-24T12:36:02Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

DavyLandman · 2026-02-24T14:05:39Z

Error: [ERROR] /home/runner/actions-runner/_work/rascal/rascal/src/org/rascalmpl/library/lang/json/internal/JsonValueReader.java:[1032,17] cannot find symbol
  symbol: variable lineHandler
Error: [ERROR] /home/runner/actions-runner/_work/rascal/rascal/src/org/rascalmpl/library/lang/json/internal/JsonValueReader.java:[1033,17] cannot find symbol

I think there's some uncommitted file?

buffer offset is now compensated for surrogate pairs

3aa527d

jurgenvinju added 4 commits February 18, 2026 12:58

Merge branch 'main' into fix/json-unicode-offsets

505e233

added test with unicode surrogate pairs

977759a

initial throw at unicode resilient positions during JSON parsing

5432aea

minor improvements

fd57b97

jurgenvinju self-assigned this Feb 19, 2026

jurgenvinju added 18 commits February 19, 2026 12:47

minor comment

6c64095

working to get unicode columns right

f1b25c8

minor fix

39a0ef1

fixed line markup in unicode example for testing

d21bc29

getting the off-by-ones under control

e1be9a1

cleanup, refactoring and documentation, plus corrections

d842105

cleanup

31f1a8a

working on another bug

09bbbb3

fixed boundary condition for getOffset

eb402e3

fixed all tests

df0a5c4

added new failing tests for boundary conditions with unicode origins

38e625a

fixed specific unicode offset issues

b0ce362

cleanup debug code

0c29e5a

cleanup unused handler code

9d146dd

added rationale in comment to explain use of offset buffers

0f7162e

better field names, removed need for comments

c88eb12

comments

7c2170c

comments

cebf2f9

jurgenvinju added the enhancement and bug labels Feb 24, 2026

jurgenvinju marked this pull request as ready for review February 24, 2026 13:54

jurgenvinju requested a review from DavyLandman February 24, 2026 13:54

jurgenvinju wants to merge 23 commits into main from fix/json-unicode-offsets

2 participants
