LSP textDocument/definition support for HEEX ~H sigil via tree-sitter-heex #75

Open
superhawk610 wants to merge 25 commits into remoteoss:main from superhawk610:feat/treesitter-heex
superhawk610 commented Jun 6, 2026

Contributor

Resolves #74.

This PR uses https://github.com/phoenixframework/tree-sitter-heex to parse the contents of ~H sigils and resolve any function components and expressions contained within so that textDocument/definition works. This allows go-to definition within ~H sigils, commonly found in Phoenix LiveView render functions and function components.

Performance Considerations

This approach builds on the existing token parser. ExpressionAtCursor will now parse the contents of ~H sigils, pass them to tree-sitter-heex, find the component / self_closing_component node nearest to the cursor, and return its component_name. This allows the corresponding function's definition to be provided, which makes both go-to definition and docs-on-hover work. HEEX expression interpolation (within {}), both on tag attributes and within the template body, resolves the same way.
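The node lookup described above could be sketched roughly as follows. This is not the PR's code: `Node` is a hypothetical, simplified stand-in for a tree-sitter node, `componentAt` and `sampleTree` are illustrative names, and "nearest" is assumed here to mean the innermost component whose span contains the cursor.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a tree-sitter-heex node.
type Node struct {
	Kind       string // e.g. "component", "self_closing_component", "fragment"
	Name       string // stands in for the node's component_name child
	Start, End int    // byte offsets into the sigil contents, End exclusive
	Children   []*Node
}

// componentAt returns the name of the innermost component node whose span
// contains the cursor offset. Children are checked first so that a nested
// component wins over the component that encloses it.
func componentAt(n *Node, cursor int) (string, bool) {
	if n == nil || cursor < n.Start || cursor >= n.End {
		return "", false
	}
	for _, c := range n.Children {
		if name, ok := componentAt(c, cursor); ok {
			return name, true
		}
	}
	if n.Kind == "component" || n.Kind == "self_closing_component" {
		return n.Name, true
	}
	return "", false
}

// sampleTree hand-builds the tree for `<.card><.button /></.card>`.
func sampleTree() *Node {
	src := "<.card><.button /></.card>"
	return &Node{Kind: "fragment", Start: 0, End: len(src), Children: []*Node{
		{Kind: "component", Name: ".card", Start: 0, End: len(src), Children: []*Node{
			{Kind: "self_closing_component", Name: ".button", Start: 7, End: 18},
		}},
	}}
}

func main() {
	name, ok := componentAt(sampleTree(), 9) // cursor inside <.button />
	fmt.Println(name, ok)
}
```

A cursor on the outer `<.card>` tag falls outside the child's span, so the enclosing component's name is returned instead.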

Expressions are recursively parsed with ExpressionAtCursor on demand, bypassing the tokenized file cache. I think this is acceptable, since expressions tend to be very small (typically only a single line). tree-sitter-heex is invoked on each textDocument/definition call with no caching; adding a cache may be worth considering. In my local testing it hasn't noticeably lagged, but that may not hold up on larger files.
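If caching does turn out to be worthwhile, one possible shape is a small memo keyed by a hash of the sigil contents. The sketch below is only an illustration: `ParseCache`, `ParseFunc`, and the string result type are hypothetical stand-ins, and the real parse would return a tree-sitter tree rather than a string.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// ParseFunc is a hypothetical stand-in for the expensive HEEX parse.
type ParseFunc func(src string) string

// ParseCache memoizes parse results by content hash, so repeated
// textDocument/definition calls on unchanged sigil contents skip the parse.
type ParseCache struct {
	mu    sync.Mutex
	parse ParseFunc
	trees map[[sha256.Size]byte]string
}

func NewParseCache(parse ParseFunc) *ParseCache {
	return &ParseCache{parse: parse, trees: make(map[[sha256.Size]byte]string)}
}

// Get returns the cached result for src, parsing only on a miss.
// The lock is held during the parse to keep the sketch simple.
func (c *ParseCache) Get(src string) string {
	key := sha256.Sum256([]byte(src))
	c.mu.Lock()
	defer c.mu.Unlock()
	if tree, ok := c.trees[key]; ok {
		return tree
	}
	tree := c.parse(src)
	c.trees[key] = tree
	return tree
}

func main() {
	calls := 0
	cache := NewParseCache(func(src string) string {
		calls++
		return "tree:" + src
	})
	cache.Get("<.button />")
	cache.Get("<.button />") // served from the cache
	fmt.Println("parses:", calls)
}
```

This version never evicts entries, so a real implementation would likely need a size bound or invalidation when the document changes.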

Why tree-sitter-heex?

I noticed that there's an open issue to remove tree-sitter (#63). I think tree-sitter-heex fits well here because we only care about a small subset of the HEEX contents. It allowed for implementing this feature quickly with little overhead, and with pretty good performance. I think it's totally fine to replace this with something else if desired!


Note

Medium Risk
Touches core parsing, document caching, and many LSP code paths; nested Elixir/HEEX trees add complexity and parse cost on template-heavy files.

Overview
Adds go-to-definition, references, hover, and related LSP behavior inside Phoenix ~H HEEX templates by teaching the lexer and tree-sitter layer about HEEX, not only plain Elixir.

For ~H sigils, the tokenizer now emits HEEX-specific tokens (TokHEEXOpenTag / TokHEEXCloseTag, components like <.foo />, Foo.bar, and Elixir inside {...}) via TokenizeHeex and updated scanSigil. ExpressionAtCursor and the tokenized reference indexer treat those patterns as module/function calls so definition and references work from LiveView templates. tree-sitter-heex is wired in through a new treesitter.Tree that parses Elixir with nested HEEX branches for ~H quoted_content and nested Elixir for HEEX expression_value; DocumentStore keeps per-language parsers and caches these composite trees. LSP handlers call tree.FindVariableOccurrences (and related methods) on the wrapper instead of raw Elixir roots.

Tests cover HEEX cursor context, HEEX tokenization, definition/references in ~H, and document-store tree access via tree.Trunk.

Reviewed by Cursor Bugbot for commit 036033f. Bugbot is set up for automated code reviews on this repo.

@JesseHerrick

Member

Hey @superhawk610, nice work on this! However, I would suggest that we take a slightly different approach. I think we should also index HEEX (both inside sigils and in .heex files) using the tokenizer and parser. My reasoning:

  • We plan on getting rid of tree-sitter at some point. It adds quite a bit of complexity.
  • In this current approach, go to definition works but go to references wouldn't. We also gain other features by indexing these files.

I realize that this is more complex to build up front, but I think the end result would be well worth it.

- support multicharacter sigils
- newline after sigil delimiter is optional
- ExprStart / ExprEnd are line-relative
- sigil delim offset relative to tok.Start
- ignore when cursor is on sigil char/delim
- fix possible nil access
@superhawk610

Contributor Author

I'm working on indexing HEEX with the tokenizer/parser so it's incorporated with the existing process. I'm not confident I can write a full tokenizer for HEEX's grammar, so I'm going with a hybrid approach for now that still shells out to tree-sitter-heex. I'm hoping that once this first step is done, it will provide most of the plumbing and we'll just need a tokenizer for HEEX.

@superhawk610 superhawk610 left a comment

Contributor Author

At this point go-to definition works, but find all references doesn't, since the tree-sitter parsing in variables.go treats HEEX sigil contents as opaque. I've taken a pass at a possible approach to this using nested HEEX tree-sitter trees, but it has grown quite a bit in scope. I think a good next step would be to think this through a bit more thoroughly at a high level and plan the implementation more explicitly. What do you think?

- handle parseElixir / parseHeex failure
- don't double-count line offset
@JesseHerrick

Member

> I'm working on indexing HEEX with the tokenizer/parser so it's incorporated with the existing process. I'm not confident I can write a full tokenizer for HEEX's grammar, so I'm going with a hybrid approach for now that still shells out to tree-sitter-heex. I'm hoping that once this first step is done, it will provide most of the plumbing and we'll just need a tokenizer for HEEX.

I can take a stab at this part if you'd like - I can't promise a specific timeline though. Unfortunately, we can't ship something that shells out to tree-sitter during indexing. It's extremely important that we don't have performance regressions during indexing. There's too much overhead in calling out to tree-sitter. Remote's codebase has >57k files to parse, so we need indexing to be as fast as possible.

superhawk610 commented Jun 7, 2026

Contributor Author

No expectations at all on timeline! I've gotten this to handle what I need (go-to definition within HEEX sigils), and I don't mind just building from my fork if this ends up being too much work or too niche. Totally agree on avoiding performance regressions, Dexter's speed is its best selling point amongst my peers! 😎

Please feel free to contribute whatever you'd like to this branch, take over or use some/any of this PR, or close it out if it's not in the cards.

@JesseHerrick

Member

I'll see if I can take a stab at it on this PR sometime this week and we can ship it together. In the meantime, feel free to keep shipping things to this branch and I'll add stuff when I can.

- use NewTreeWithParsers in DocumentStore
- prevent server startup if parsers unavailable
- emit TokSigil for empty ~H sigils
- simplify break to loop condition
superhawk610 commented Jun 9, 2026

Contributor Author

I started on a barebones HEEX tokenizer. The line between tokenizer and parser is a bit blurry and this may lean a bit far into parsing. My aim is to get a simple replacement for tree-sitter-heex that can tokenize at minimum: TokModule, TokIdent, TokDot, and recursively tokenize interpolated expressions. If this seems to be moving in the right direction, I may have some more time later in the week to continue.

superhawk610 commented Jun 10, 2026

Contributor Author

OK, HEEX tokenizer is now on par with the tree-sitter-heex approach, though lacking some edge-case coverage.

  • scanInterpolation / scanUntil don't handle early occurrences of the terminator, e.g. <div class={"{}"}> will terminate at the first } rather than the second
  • HEEX special forms e.g. <%= for, <%= case, etc. aren't parsed correctly
  • malformed HTML can probably get the tokenizer stuck in an infinite loop
  • find all references still doesn't work, as the treesitter module needs to be updated to traverse the new heex nested sub-tree
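For the first limitation, one possible direction is to track brace depth and skip over double-quoted strings while scanning. The sketch below is a hypothetical helper (`scanBalanced` is not a function on this branch) and deliberately ignores charlists, sigils, heredocs, and `#{}` interpolation inside strings.

```go
package main

import "fmt"

// scanBalanced returns the index of the '}' that closes the '{' at src[open],
// or -1 if open does not point at '{' or no matching brace is found.
// Braces inside double-quoted strings are ignored.
func scanBalanced(src string, open int) int {
	if open < 0 || open >= len(src) || src[open] != '{' {
		return -1
	}
	depth := 0
	inString := false
	for i := open; i < len(src); i++ {
		c := src[i]
		if inString {
			switch c {
			case '\\':
				i++ // skip the escaped character
			case '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			inString = true
		case '{':
			depth++
		case '}':
			depth--
			if depth == 0 {
				return i
			}
		}
	}
	return -1
}

func main() {
	src := `<div class={"{}"}>`
	// The interpolation opens at index 11; the matching brace is the second '}'.
	fmt.Println(scanBalanced(src, 11)) // prints 16
}
```

Nested expressions such as a map literal inside the interpolation are handled by the depth counter, so only the outermost closing brace ends the scan.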

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 0cdc226.


Development

Successfully merging this pull request may close these issues.

LSP textDocument/definition support for HEEX ~H sigils

2 participants