Refactor post-processing to use a visitor pattern + AST edits instead of regex #220

@NathanLovato

Right now the formatter has two phases. First, Topiary does a formatting pass driven by the tree-sitter query file. Then a postprocess function runs a chain of smaller fixes to handle things Topiary cannot express or that are a bit too expensive to do through queries.

Some of those postprocessing steps use regex matched against the raw text of the file. To avoid accidentally changing content inside strings, there is a helper function called regex_replace_all_outside_strings() that checks each match against the tree before applying it. After every regex pass, the tree has to be incrementally re-parsed so the next step still has accurate info about the AST and the code's content.

This works, but:

  • Regexes match text patterns rather than syntax, so they are unreliable by nature.
  • Because each pass modifies the text, we have to re-parse the AST multiple times.

This program is really designed as a stopgap until we get an official formatter. But if we're going to keep maintaining it, we might as well steer it in a direction that makes it more correct and more efficient, with code that sets a good example for contributors.

This is what this task is about. I'd like to get rid of the quick and dirty regular expressions and replace them with a visitor pattern that applies formatting rules based on the AST. The program now works overall, many people use it in their day-to-day work, and it has a good test corpus, so it's a good time to consolidate it.

The idea is to replace each regex replacement with a function that walks the abstract syntax tree directly, maybe collects structs representing edits (start_byte, old_end_byte, new_bytes), and applies them all together in reverse byte order before doing a single incremental re-parse. This would also allow removing the re-parse before the last step, the one that ensures two blank lines between things.
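To make the idea more concrete, here is a rough sketch of the collect-then-apply approach using only the standard library. The `Edit` fields come from the description above. Everything else is an assumption for illustration: `Node` is a hypothetical stand-in for a tree-sitter node, and the "semicolon" rule is a made-up example, not one of the formatter's actual rules.

```rust
/// A pending text replacement, using the field names proposed above.
#[derive(Debug, Clone, PartialEq)]
pub struct Edit {
    pub start_byte: usize,
    pub old_end_byte: usize,
    pub new_bytes: Vec<u8>,
}

/// Hypothetical stand-in for a syntax tree node. The real code would
/// walk tree-sitter nodes instead of this simplified struct.
pub struct Node {
    pub kind: &'static str,
    pub start_byte: usize,
    pub end_byte: usize,
    pub children: Vec<Node>,
}

/// Example visitor (rule made up for illustration): collect an edit that
/// deletes every node of kind "semicolon". Because it matches on node kind,
/// a semicolon inside a string node is never touched.
pub fn collect_edits(node: &Node, edits: &mut Vec<Edit>) {
    if node.kind == "semicolon" {
        edits.push(Edit {
            start_byte: node.start_byte,
            old_end_byte: node.end_byte,
            new_bytes: Vec::new(),
        });
    }
    for child in &node.children {
        collect_edits(child, edits);
    }
}

/// Applies all edits in reverse byte order, so the offsets of edits closer
/// to the start of the file stay valid while the tail changes length.
/// All edits are validated first; on error the source is left untouched.
pub fn apply_edits(source: &mut Vec<u8>, mut edits: Vec<Edit>) -> Result<(), String> {
    // Sort by descending start so the last edit in the file comes first.
    edits.sort_by(|a, b| b.start_byte.cmp(&a.start_byte));

    // Each edit must end at or before the start of the edit after it.
    let mut upper_bound = source.len();
    for edit in &edits {
        if edit.start_byte > edit.old_end_byte || edit.old_end_byte > upper_bound {
            return Err(format!("overlapping or out-of-range edit: {:?}", edit));
        }
        upper_bound = edit.start_byte;
    }

    for edit in edits {
        drop(source.splice(edit.start_byte..edit.old_end_byte, edit.new_bytes));
    }
    Ok(())
}
```

Each former regex step would become its own `collect_edits`-style function pushing into a shared list, and `apply_edits` would run once at the end, followed by the single re-parse. Rejecting overlapping edits up front seems like the safer default, since two rules touching the same bytes is probably a bug worth surfacing.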

And once we've done that we can also remove the regex lib dependency.
