Skip to content

Conversation

@gsusanto-fastly
Copy link
Collaborator

@gsusanto-fastly gsusanto-fastly commented Nov 25, 2025

This PR introduces a new documentation file (doc/advice.md) that provides practical advice on using libfsm for generating reasonably performant pattern matching code.

doc/GUIDE.md Outdated

### 2. Anchor When You Require Full Matches

FSMs only do what’s specified. Explicitly anchor when matching entire strings:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't about doing only what's specified, it's about writing what we mean. without the anchors, you'd allow a prefix and a suffix. if you mean that, great. if you don't mean that, we're advising to not write them

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated by adding "When the intention is to match an entire string, use anchors."
Let me know if you still think it is better to not write them.

doc/GUIDE.md Outdated

### Compilation Takes Too Long

Likely caused by unrestricted wildcards (`.*`, `.+`). Fix with:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's worth explaining why this happens

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some explanation and link back to the Writing Effective Regex guide. Let me know if you think it is too repetitive or too long.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great, thank you

doc/GUIDE.md Outdated
@@ -0,0 +1,186 @@
# Using libfsm for High-Performance Pattern Matching

libfsm compiles regular expressions to deterministic finite state machines (FSMs) and generates executable code. FSM-based matching runs in **linear time O(n)** with **no backtracking**.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is such a significant feature that it may be worth saying a little more here?

Do you think the target audience will know what backtracking is, or the implications of that? Maybe saying that it doesn't need to make any complex decisions at runtime? In contrast to PCRE, where the same regex can have wildly different runtime costs depending on the input.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some information, what do you think?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we have anything else to say about that, it should go in the top-level readme. we're underselling there

doc/GUIDE.md Outdated

* Word boundaries (`\b`)
* Non-greedy quantifiers (`*?`, `+?`, `??`)
* Group capture and backreferences
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Group capture is coming soon :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added "coming soon" in the places I mention "group capture".
Let me know if I should retract those.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's fine!

doc/GUIDE.md Outdated
Likely caused by unrestricted wildcards (`.*`, `.+`). Fix with:

* Negated classes (`[^)]*`)
* Bounded repeats (`{0,50}`)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on what's being repeated, bounded repeats can be substantially more expensive, because instead of representing and then adding a loop around it, the FSM needs to explicitly represent 50 instances of the pattern (which, in this case, are all optional) in order to have the fixed upper limit.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely worth explaining

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a more informative example: If you have the pattern ^x{3,5}$, libfsm's resulting DFA will be structured like "match an x, then match an x, then match an x, then match an x or skip it, then match an x or skip it, then report an overall match if at the end of input". It has to repeat the pattern, noting each time whether it's required or optional (beyond the lower count in {min,max}), because DFA execution doesn't have a counter, just the current state within the overall DFA.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation!
Now that I know that the suggestion is not correct, I have removed it from the docs.

Do you think this information is good to be included in the document?
If so, in which section should it be added into? Is it recommended to not use this? What is the alternative instead?

Or is it just a "good to know" section? What other "good to know" do you think is worth mentioning in the document?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to add back in a section about bounded repetition

@gsusanto-fastly
Copy link
Collaborator Author

Thank you so much for the reviews!
I have updated the docs based on the reviews, further comments are more than welcomed.

doc/GUIDE.md Outdated
# Risky: $ may allow an extra newline
/foo$
```
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not "risky", it's either incorrect, or it's unnecessary complexity

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to:

# Incorrect: allows a trailing newline,
# which is usually unintended and adds unnecessary complexity

Let me know if this is a bit wordy?

@katef katef changed the title Add documentation guide on how to use Libfsm effectively Document advice on how to use libfsm for generating performant pattern matchers Nov 27, 2025
@katef katef changed the title Document advice on how to use libfsm for generating performant pattern matchers Advice on how to use libfsm for generating performant pattern matchers Nov 27, 2025
@katef katef merged commit 27802dc into main Nov 27, 2025
346 checks passed
@katef katef deleted the gsusanto-docs branch November 27, 2025 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants