
perf(expression): fast-path identifier characters and short-circuit keyword scans #223

Merged
DylanPiercey merged 1 commit into main from claude/expression-fast-path
Jun 20, 2026

@DylanPiercey

Contributor

Summary

Speeds up EXPRESSION parsing — the hottest state in the parser (~25% of parse time) — by skipping work that provably cannot match. No behavior change.

Changes (all in src/states/EXPRESSION.ts)

  1. Identifier/number-character fast path. Inside the per-character loop, a word character (A-Z a-z 0-9 $ _) is never whitespace, is never a terminator (no shouldTerminate implementation matches a word character), and is not one of the switch's cases — it always falls through to default: pos++. A single guard clause now short-circuits the termination checks and the switch dispatch for the bulk of expression content:

    if (isWordCode(code)) {
      this.pos++;
      continue;
    }

  2. Operator keyword-scan short-circuit. lookBehindForOperator / lookAheadForOperator used to loop over the unary/binary keyword lists even when the surrounding character could not possibly start or end a keyword. Since every keyword consists only of lowercase ASCII letters, they now bail out immediately when the relevant character is not a-z.
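To make the first change concrete, here is a minimal, self-contained sketch of a per-character loop with a word-character guard. It is not the code from src/states/EXPRESSION.ts: this `isWordCode` is a hypothetical stand-in written from the character set listed above (A-Z a-z 0-9 $ _), and `scanExpression` is a toy loop that only counts which path each character takes.

```typescript
// Hypothetical stand-in for the parser's isWordCode helper. The real
// implementation is not shown in this PR; this version simply encodes
// the character set named in the description: A-Z a-z 0-9 $ _
function isWordCode(code: number): boolean {
  return (
    (code >= 65 && code <= 90) || // A-Z
    (code >= 97 && code <= 122) || // a-z
    (code >= 48 && code <= 57) || // 0-9
    code === 36 || // $
    code === 95 // _
  );
}

// Toy loop (an assumption, not the real parser): counts how many
// characters are skipped by the guard and how many would reach the
// termination checks and the switch dispatch.
function scanExpression(source: string): { fast: number; slow: number } {
  let pos = 0;
  let fast = 0;
  let slow = 0;

  while (pos < source.length) {
    const code = source.charCodeAt(pos);

    if (isWordCode(code)) {
      // Fast path: a word character can only ever advance the position.
      pos++;
      fast++;
      continue;
    }

    // Slow path: in a real parser the whitespace check, shouldTerminate,
    // and the switch over punctuation would run here.
    slow++;
    pos++;
  }

  return { fast, slow };
}
```

For an input such as `foo + bar_1`, eight of the eleven characters take the guarded path in this sketch, which illustrates why the guard covers the bulk of typical expression content.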

Correctness

  • No behavior change — the full test suite passes, and parser output (a checksum over every emitted range across a 1027-file corpus) is byte-for-byte identical.
  • The fast path is safe because every shouldTerminate implementation only terminates on punctuation (verified across all implementations), and word characters are not handled by the expression switch.
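One way to picture the output comparison described above is a checksum folded over every emitted range. The sketch below is an assumption about how such a check could look, not the harness used for this PR: the `EmittedRange` shape and the FNV-1a-style mixing over whole integers are both hypothetical choices.

```typescript
// Hypothetical shape of an emitted range; the parser's actual range
// type is not shown in this PR.
interface EmittedRange {
  start: number;
  end: number;
}

// Folds every start/end offset into a 32-bit value using FNV-1a-style
// mixing (XOR, then multiply by the FNV prime). Order-sensitive, so a
// reordered or shifted range changes the result.
function checksumRanges(ranges: readonly EmittedRange[]): number {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis

  for (const range of ranges) {
    for (const value of [range.start, range.end]) {
      hash ^= value;
      hash = Math.imul(hash, 0x01000193) >>> 0; // FNV 32-bit prime
    }
  }

  return hash >>> 0;
}

// Two parser builds agree on an input when their checksums match.
function sameOutput(
  before: readonly EmittedRange[],
  after: readonly EmittedRange[],
): boolean {
  return checksumRanges(before) === checksumRanges(after);
}
```

A matching checksum is strong evidence rather than proof of identical output, so a harness built this way would typically fall back to a full range-by-range diff when a mismatch is reported.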

Performance

Measured with a process-isolated A/B harness (each variant in its own process to avoid JIT cross-talk, alternating order, sign test over many rounds) on a corpus of real Marko fixtures:

  • Steady-state throughput: ~6% faster vs main, with the identifier fast path alone contributing +3.8% (faster in 20/20 rounds) on top of the keyword-scan change.
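For readers unfamiliar with the sign test mentioned above, the following sketch computes a two-sided p-value from a win count. It assumes each round yields a clear winner (ties are not handled) and that, under the null hypothesis, either variant wins a round with probability 0.5. The function name is hypothetical, and the PR does not state which p-value, if any, its harness reported.

```typescript
// Two-sided sign test: probability of a result at least as lopsided as
// `wins` out of `rounds` if both variants were equally likely to win.
// Assumes no tied rounds.
function signTestPValue(wins: number, rounds: number): number {
  const extreme = Math.max(wins, rounds - wins);
  let coefficient = 1; // C(rounds, 0)
  let tail = 0;

  for (let k = 0; k <= rounds; k++) {
    if (k >= extreme) {
      tail += coefficient; // accumulate C(rounds, k) for the upper tail
    }
    // Step to the next binomial coefficient: C(n, k + 1) = C(n, k) * (n - k) / (k + 1)
    coefficient = (coefficient * (rounds - k)) / (k + 1);
  }

  // Double the one-sided tail for a two-sided test, capped at 1.
  return Math.min(1, (2 * tail) / Math.pow(2, rounds));
}
```

Under these assumptions, winning 20 of 20 rounds corresponds to a p-value of 2 / 2^20, roughly 1.9e-6, while an even 10 of 20 split gives 1.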

🤖 Generated with Claude Code



…eyword scans

Speed up expression parsing by skipping work that provably cannot match:

- Add an identifier/number-character fast path to the expression loop. Such a
  character is never whitespace, never a terminator (no `shouldTerminate`
  implementation matches a word character), and is not one of the switch's
  cases, so it can short-circuit the termination checks and the switch dispatch
  entirely and just advance the position.
- Bail out of the unary/binary operator keyword scans immediately when the
  surrounding character cannot start or end a keyword (every keyword is
  lowercase ASCII letters).

No behavior change: the full test suite passes and parser output is identical.
@changeset-bot

changeset-bot Bot commented Jun 20, 2026


🦋 Changeset detected

Latest commit: 543d209

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
htmljs-parser Patch


@codecov

codecov Bot commented Jun 20, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.95%. Comparing base (221d3b7) to head (543d209).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #223      +/-   ##
==========================================
- Coverage   99.97%   99.95%   -0.03%     
==========================================
  Files          34       34              
  Lines        4204     4223      +19     
  Branches      776      780       +4     
==========================================
+ Hits         4203     4221      +18     
- Misses          1        2       +1     


@coderabbitai

coderabbitai Bot commented Jun 20, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e77a1323-a1b5-4d9a-89fe-09d106903e79

📥 Commits

Reviewing files that changed from the base of the PR and between 221d3b7 and 543d209.

📒 Files selected for processing (2)
  • .changeset/fast-expressions-skip-scans.md
  • src/states/EXPRESSION.ts

Walkthrough

Three fast-path optimizations are added to src/states/EXPRESSION.ts. In the main expression parse loop, when the current character satisfies isWordCode, this.pos is incremented and the loop continues immediately, bypassing termination checks and the switch dispatch. In lookBehindForOperator, a check on the character preceding pos returns -1 early if it is not a lowercase letter (a-z), skipping the unary keyword scan. In lookAheadForOperator, the same check is applied to the character at pos before the binary keyword scan. A changeset file documents these changes.
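The look-ahead bail-out described in this walkthrough can be sketched as follows. This is an illustration under stated assumptions, not the code in src/states/EXPRESSION.ts: the function name `lookAheadForKeyword`, the keyword list, and the word-boundary rule are all hypothetical, and only the early return of -1 for a non-lowercase character comes from the walkthrough.

```typescript
// Illustrative keyword subset (an assumption); the actual binary keyword
// list in EXPRESSION.ts is not shown in this PR. Longest keyword first
// so that "in" cannot shadow "instanceof".
const EXAMPLE_BINARY_KEYWORDS: readonly string[] = ["instanceof", "in"];

function isLowerAsciiLetter(code: number): boolean {
  return code >= 97 && code <= 122; // a-z
}

function isWordChar(code: number): boolean {
  return (
    (code >= 65 && code <= 90) ||
    (code >= 97 && code <= 122) ||
    (code >= 48 && code <= 57) ||
    code === 36 ||
    code === 95
  );
}

// Returns the offset just past a keyword starting at `pos`, or -1.
function lookAheadForKeyword(
  data: string,
  pos: number,
  keywords: readonly string[],
): number {
  // Short-circuit: every keyword starts with a lowercase ASCII letter, so
  // any other character (or end of input, where charCodeAt yields NaN)
  // cannot start one and the list scan is skipped entirely.
  if (!isLowerAsciiLetter(data.charCodeAt(pos))) {
    return -1;
  }

  for (const keyword of keywords) {
    if (data.startsWith(keyword, pos)) {
      const end = pos + keyword.length;
      // Hypothetical boundary rule: reject matches that are merely the
      // prefix of a longer identifier, such as "in" inside "index".
      if (end >= data.length || !isWordChar(data.charCodeAt(end))) {
        return end;
      }
    }
  }

  return -1;
}
```

In this sketch, `lookAheadForKeyword("x + y", 2, EXAMPLE_BINARY_KEYWORDS)` returns -1 without touching the keyword list, because `+` is not a lowercase letter.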

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Title check | ✅ Passed | The title accurately summarizes the main changes: performance improvements through fast-path identifier character handling and short-circuit keyword scans in expression parsing.
Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, providing clear context about the performance optimization, specific implementation details, correctness guarantees, and measured performance improvements.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


@DylanPiercey DylanPiercey merged commit 3c95d7f into main Jun 20, 2026
10 of 11 checks passed
@DylanPiercey DylanPiercey deleted the claude/expression-fast-path branch June 20, 2026 21:14
@github-actions github-actions Bot mentioned this pull request Jun 20, 2026
DylanPiercey added a commit to marko-js/tree-sitter that referenced this pull request Jun 26, 2026
…ord scans

Port htmljs-parser's EXPRESSION fast paths to the external scanner:

- Add an identifier/number-character fast path to the expression loop. Such a
  character is never whitespace, never a terminator (no should_terminate case
  matches a word character), and is not one of the switch's cases, so it
  short-circuits the termination checks (including should_terminate's eager
  lookahead) and the switch dispatch and just advances.
- Bail out of the unary/binary operator keyword scans immediately when the
  surrounding character cannot start or end a keyword (every keyword is
  lowercase ASCII letters).

No behavior change: the full fixture-comparison suite passes. ~6-9% faster on
expression-heavy input in an A/B build.

Mirrors marko-js/htmljs-parser#223 (commit 3c95d7f).
@DylanPiercey DylanPiercey moved this to Done in Roadmap Jun 29, 2026
@DylanPiercey DylanPiercey self-assigned this Jun 29, 2026