LSP textDocument/definition support for HEEX ~H sigil via tree-sitter-heex #75
superhawk610 wants to merge 25 commits into
Hey @superhawk610, nice work on this! However, I would suggest a slightly different approach: I think we should also index HEEX (both inside sigils and in .heex files) using the tokenizer and parser. My reasoning:
I realize that this is more complex to build up front, but I think the end result would be well worth it.
- support multicharacter sigils
- newline after sigil delimiter is optional
- ExprStart / ExprEnd are line-relative
- sigil delim offset relative to tok.Start
- ignore when cursor is on sigil char/delim
- fix possible nil access
I'm working on indexing HEEX with the tokenizer/parser so it's incorporated with the existing process. I'm not confident I can write a full tokenizer for HEEX's grammar, so I'm going with a hybrid approach for now that still shells out to |
superhawk610
left a comment
At this point go-to definition works, but find all references doesn't since the tree-sitter parsing in variables.go treats HEEX sigil contents as opaque. I've taken a pass at a possible approach through this using nested HEEX tree-sitter trees, but this has grown quite a bit in scope. I think a good next step would be to think this through a bit more thoroughly at a high level and plan the implementation more explicitly. What do you think?
- handle parseElixir / parseHeex failure
- don't double-count line offset
I can take a stab at this part if you'd like - I can't promise a specific timeline though. Unfortunately, we can't ship something that shells out to tree-sitter during indexing. It's extremely important that we don't have performance regressions during indexing. There's too much overhead in calling out to tree-sitter. Remote's codebase has >57k files to parse, so we need indexing to be as fast as possible.
No expectations at all on timeline! I've gotten this to handle what I need (go-to definition within HEEX sigils), and I don't mind just building from my fork if this ends up being too much work or too niche. Totally agree on avoiding performance regressions, Dexter's speed is its best selling point amongst my peers! 😎 Please feel free to contribute whatever you'd like to this branch, take over or use some/any of this PR, or close it out if it's not in the cards.
I'll see if I can take a stab at it on this PR sometime this week and we can ship it together. In the meantime, feel free to keep shipping things to this branch and I'll add stuff when I can.
- use NewTreeWithParsers in DocumentStore
- prevent server startup if parsers unavailable
- emit TokSigil for empty ~H sigils
- simplify break to loop condition
I started on a barebones HEEX tokenizer. The line between tokenizer and parser is a bit blurry and this may lean a bit far into parsing. My aim is to get a simple replacement for |
OK, HEEX tokenizer is now on par with the
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 0cdc226.

Resolves #74.
This PR uses https://github.com/phoenixframework/tree-sitter-heex to parse the contents of ~H sigils and resolve any function components and expressions contained within so that textDocument/definition works. This allows go-to definition within ~H sigils, commonly found in Phoenix LiveView render functions and function components.

Performance Considerations

This approach builds on the existing token parser. ExpressionAtCursor will now parse the contents of ~H sigils, pass them to tree-sitter-heex, find the component/self_closing_component node nearest to the cursor, and return its component_name. This allows the corresponding function's definition to be provided, which makes both go-to definition and docs-on-hover work. HEEX expression interpolation (within {}), both on tag attributes and within the template itself, resolves the same way.

Expressions are recursively parsed with ExpressionAtCursor on demand, bypassing the tokenized file cache. I think this is acceptable, since expressions tend to be very small (typically only a single line). tree-sitter-heex is invoked on each textDocument/definition call with no caching; adding a cache may be worth considering. In my local testing it hasn't noticeably lagged, but that may not hold up on larger files.

Why tree-sitter-heex?

I noticed that there's an open issue to remove tree-sitter (#63). I think it fits well here because we only care about a small subset of the HEEX contents. It allowed for implementing this feature quickly with little overhead, and with pretty good performance. I think it's totally fine to replace this with something else if desired!

Note
Medium Risk
Touches core parsing, document caching, and many LSP code paths; nested Elixir/HEEX trees add complexity and parse cost on template-heavy files.

Overview

Adds go-to-definition, references, hover, and related LSP behavior inside Phoenix ~H HEEX templates by teaching the lexer and tree-sitter layer about HEEX, not only plain Elixir.

For ~H sigils, the tokenizer now emits HEEX-specific tokens (TokHEEXOpenTag/TokHEEXCloseTag, components like <.foo />, Foo.bar, and Elixir inside {...}) via TokenizeHeex and updated scanSigil. ExpressionAtCursor and the tokenized reference indexer treat those patterns as module/function calls so definition and references work from LiveView templates.

tree-sitter-heex is wired in through a new treesitter.Tree that parses Elixir with nested HEEX branches for ~H quoted_content and nested Elixir for HEEX expression_value; DocumentStore keeps per-language parsers and caches these composite trees. LSP handlers call tree.FindVariableOccurrences (and related methods) on the wrapper instead of raw Elixir roots.

Tests cover HEEX cursor context, HEEX tokenization, definition/references in ~H, and document-store tree access via tree.Trunk.

Reviewed by Cursor Bugbot for commit 036033f. Bugbot is set up for automated code reviews on this repo.