Commit 4e17509

doc: add debug tips
fix #664
1 parent 25998ee commit 4e17509

2 files changed

+308
-0
lines changed

website/blog/how-to-debug.md

Lines changed: 308 additions & 0 deletions
@@ -0,0 +1,308 @@
1+
---
author:
  - name: Herrington Darkholme
date: 2025-11-26
head:
  - - meta
    - property: og:type
      content: website
  - - meta
    - property: og:title
      content: How to Debug ast-grep Rule Effectively
  - - meta
    - property: og:url
      content: https://ast-grep.github.io/blog/how-to-debug.html
  - - meta
    - property: og:description
      content: Learn how to debug ast-grep rules effectively by simplifying code and rules step by step.
---
19+
20+
21+
# How to Debug ast-grep Rule Effectively
22+
23+
Debugging ast-grep rules can be frustrating. You write what looks like a perfectly reasonable rule, test it against your code, and... nothing matches. Or worse, it matches things you didn't expect.
24+
25+
The key to effective debugging is one word: **SIMPLIFY**.
26+
27+
When your rule doesn't work, resist the urge to add more conditions or make the pattern more complex. Instead, strip everything down to the basics and build back up systematically. This post will teach you a reliable debugging workflow that works for any ast-grep rule.
28+
29+
30+
## The Debugging Workflow
31+
32+
Here's a step-by-step process to debug any ast-grep rule:
33+
34+
1. **Set up a reproducible test case.** Use `ast-grep scan -r test.yml test.file` or the [online playground](https://ast-grep.github.io/playground.html) to quickly iterate on your code and rule.
35+
36+
2. **Reduce the code to a minimal example.** Delete everything unrelated to the rule. If your rule should match a function call, remove all the surrounding code until you have just the essential lines.
37+
38+
3. **Inspect the AST structure.** Use `ast-grep run -p '{code}' -l {lang} --debug-query=cst` or the playground's AST view to understand the actual tree structure. The AST often looks different from what you'd expect.
39+
40+
4. **Simplify the rule.** Remove rule conditions one by one. Test each simpler version to see what actually matches. Pay attention to how _**meta-variables**_ are being captured.
41+
42+
5. **Repeat steps 2-4.** Continue simplifying both code and rule until you isolate the issue.
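
The commands named in the steps above can be wrapped in a small helper so they are easy to re-run while iterating. This is only a sketch: the function names are made up for illustration, and it assumes the `ast-grep` binary is on your `PATH`. The flags are the ones quoted in the steps.

```python
import shutil
import subprocess
from typing import List, Optional


def scan_command(rule_file: str, target: str) -> List[str]:
    """Step 1: run a single rule file against a single source file."""
    return ["ast-grep", "scan", "-r", rule_file, target]


def dump_cst_command(code: str, lang: str) -> List[str]:
    """Step 3: ask ast-grep to print the syntax tree of a code snippet."""
    return ["ast-grep", "run", "-p", code, "-l", lang, "--debug-query=cst"]


def run_if_available(command: List[str]) -> Optional[str]:
    """Run the command and return stdout plus stderr.

    Returns None when the executable is not installed, so the helper
    can be imported and tested on machines without ast-grep.
    """
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True)
    return result.stdout + result.stderr
```

For example, `run_if_available(dump_cst_command("x.execute(y)", "python"))` would print the tree for a one-line snippet, and `run_if_available(scan_command("test.yml", "test.file"))` would re-run the rule after each simplification.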
43+
44+
Let's see this workflow in action with real examples.
45+
46+
47+
## Example 1: The SQL Injection Detector
48+
49+
Consider this rule designed to [detect potential SQL injection](https://ast-grep.github.io/playground.html#eyJtb2RlIjoiQ29uZmlnIiwibGFuZyI6InB5dGhvbiIsInF1ZXJ5IjoiPERpYWxvZyAkJCQ+IiwicmV3cml0ZSI6IiIsInN0cmljdG5lc3MiOiJzbWFydCIsInNlbGVjdG9yIjoiIiwiY29uZmlnIjoiaWQ6IHNvbWVfc3FsaV9ydWxlXG5sYW5ndWFnZTogcHl0aG9uXG5ydWxlOlxuICBwYXR0ZXJuOiAkWC5leGVjdXRlKCQkJClcbiAgaGFzOlxuICAgIGtpbmQ6IGFyZ3VtZW50X2xpc3RcbiAgICBoYXM6XG4gICAgICBudGhDaGlsZDogMVxuICAgICAgYW55OlxuICAgICAgICAtIGtpbmQ6IGlkZW50aWZpZXJcbiAgICAgICAgICBwYXR0ZXJuOiAkVkFSXG4gICAgICAgIC0gaGFzOlxuICAgICAgICAgICAgc3RvcEJ5OiBlbmRcbiAgICAgICAgICAgIGtpbmQ6IGlkZW50aWZpZXJcbiAgICAgICAgICAgIHBhdHRlcm46ICRWQVJcbiAgaW5zaWRlOlxuICAgIHN0b3BCeTogZW5kXG4gICAga2luZDogbW9kdWxlXG4gICAgaGFzOlxuICAgICAgc3RvcEJ5OiBlbmRcbiAgICAgIGtpbmQ6IGFzc2lnbm1lbnRcbiAgICAgIHBhdHRlcm46ICRWQVIgPSAkJCQiLCJzb3VyY2UiOiJkZWYgdGVzdF9zcWxfaW5qZWN0aW9uX2RldGVjdGlvbigpOlxuICAgIFwiXCJcIlRlc3QgY2FzZSBmb3Igc3RhdGljIGFuYWx5c2lzIHRvb2xzIGRldGVjdGluZyBTUUwgaW5qZWN0aW9uIHZ1bG5lcmFiaWxpdGllc1wiXCJcIlxuICAgICMgU2V0dXAgdGVzdCBkYXRhYmFzZVxuICAgIGRiID0gRGF0YWJhc2VNYW5hZ2VyKCc6bWVtb3J5OicpXG4gICAgdXNlcl9pbnB1dCA9IHJlcS5xdWVyeS5wYXJhbVxuICAgIHZ1bG5fcGFyYW0gPSBjb21wdXRlX2Jhc2VkX29uX2lucHV0KHVzZXJfaW5wdXQpXG4gICAgZGIuZXhlY3V0ZShmXCJEUk9QIFRBQkxFIElGIEVYSVNUUyB7dnVsbl9wYXJhbX1cIikgICMgVnVsbmVyYWJsZSwgYnV0IG5vdCBkZXRlY3RlZCEifQ==) vulnerabilities in Python:
50+
51+
```yaml
id: some_sqli_rule
language: python
rule:
  pattern: $X.execute($$$)
  has:
    kind: argument_list
    has:
      nthChild: 1
      any:
        - kind: identifier
          pattern: $VAR
        - has:
            stopBy: end
            kind: identifier
            pattern: $VAR
  inside:
    stopBy: end
    kind: module
    has:
      stopBy: end
      kind: assignment
      pattern: $VAR = $$$
```
75+
76+
The rule should flag cases where a variable assigned from user input is passed to `execute()`. Let's test it against this code:
77+
78+
```python
def test_sql_injection_detection():
    """Test case for static analysis tools detecting SQL injection vulnerabilities"""
    # Setup test database
    db = DatabaseManager(':memory:')
    user_input = req.query.param
    vuln_param = compute_based_on_input(user_input)
    db.execute(f"DROP TABLE IF EXISTS {vuln_param}")  # Vulnerable, but not detected!
```
87+
88+
No match. Why?
89+
90+
### Step 1: Reduce to Minimal Code
91+
92+
Let's strip the code down to the essentials:
93+
94+
```python
something = "value"
vuln_param = other
x.execute(f"DROP TABLE IF EXISTS {vuln_param}")  # Still no match
```
99+
100+
Interestingly, if we remove the first assignment:
101+
102+
```python
vuln_param = other
x.execute(f"DROP TABLE IF EXISTS {vuln_param}")  # This matches!
```
106+
107+
Now it matches! The presence of `something = "value"` somehow breaks the rule. This is our first clue.
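
The manual reduction above can also be done mechanically. Below is a minimal sketch of a greedy line reducer. It is not part of ast-grep, and both `reduce_lines` and the `still_fails` predicate are hypothetical names. In real use the predicate would run your rule and report whether the bug still reproduces; here a stand-in predicate plays that role.

```python
from typing import Callable, List


def reduce_lines(lines: List[str], still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop lines while the failure still reproduces."""
    current = list(lines)
    index = 0
    while index < len(current):
        candidate = current[:index] + current[index + 1:]
        if candidate and still_fails(candidate):
            current = candidate  # this line was irrelevant, drop it
        else:
            index += 1  # this line is needed to reproduce, keep it
    return current


def toy_still_fails(lines: List[str]) -> bool:
    """Stand-in predicate: pretend the rule fails only when all three lines are present."""
    text = "\n".join(lines)
    return (
        'something = "value"' in text
        and "vuln_param = other" in text
        and "x.execute(" in text
    )


source = [
    "import os",
    'something = "value"',
    "print(something)",
    "vuln_param = other",
    'x.execute(f"DROP TABLE IF EXISTS {vuln_param}")',
]

minimal = reduce_lines(source, toy_still_fails)
print("\n".join(minimal))
```

With the stand-in predicate, the reducer keeps exactly the three lines of the failing example: the extra assignment, the `vuln_param` assignment, and the `execute` call.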
108+
109+
### Step 2: Simplify the Rule
110+
111+
Let's test individual parts of the rule. First, just the `has` portion:
112+
113+
```yaml
rule:
  pattern: $X.execute($$$)
  has:
    kind: argument_list
    has:
      nthChild: 1
      any:
        - kind: identifier
          pattern: $VAR
        - has:
            stopBy: end
            kind: identifier
            pattern: $VAR
```
128+
129+
This matches! And it captures `$VAR` as `vuln_param`.
130+
131+
Now let's test just the `inside` portion:
132+
133+
```yaml
rule:
  pattern: $X.execute($$$)
  inside:
    stopBy: end
    kind: module
    has:
      stopBy: end
      kind: assignment
      pattern: $VAR = $$$
```
144+
145+
This also matches! But wait—what does it capture `$VAR` as?
146+
147+
### Step 3: Identify the Conflict
148+
149+
Here's the problem: when we have both assignments in the code, the `inside` rule matches `$VAR` to `something` (the first assignment it encounters), while the `has` rule expects `$VAR` to be `vuln_param`.
150+
151+
Since `something ≠ vuln_param`, the combined rule fails.
152+
153+
This is due to [rule matching order sensitivity](https://ast-grep.github.io/advanced/faq.html#why-is-rule-matching-order-sensitive). ast-grep treats a rule object as an unordered dictionary, so sibling keys such as `has` and `inside` are evaluated in an implementation-defined order rather than in the order they appear in the YAML. Here the `inside` rule runs first and binds `$VAR` to `something`, so the subsequent `has` rule cannot match.
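
To see why the order matters, here is a toy model of meta-variable binding. It is a deliberately simplified sketch, not ast-grep's actual implementation: each sub-rule is reduced to a list of candidate names, and an unbound variable commits to the first candidate without backtracking. The names `bind_first` and `match_all` are invented for this illustration.

```python
from typing import Callable, Dict, List, Optional

Env = Dict[str, str]
Rule = Callable[[Env], bool]


def bind_first(name: str, candidates: List[str]) -> Rule:
    """Build a toy rule over `candidates`, listed in source order."""
    def rule(env: Env) -> bool:
        if name not in env:
            if not candidates:
                return False
            env[name] = candidates[0]  # unbound: commit to the first candidate
            return True
        return env[name] in candidates  # bound: some candidate must agree
    return rule


def match_all(rules: List[Rule]) -> Optional[Env]:
    """Apply the rules in order, sharing one environment; None means no match."""
    env: Env = {}
    for rule in rules:
        if not rule(env):
            return None
    return env


# Identifiers found inside the execute() argument.
has_rule = bind_first("VAR", ["vuln_param"])
# Assignment targets found in the module, in source order.
inside_rule = bind_first("VAR", ["something", "vuln_param"])

print(match_all([inside_rule, has_rule]))  # None
print(match_all([has_rule, inside_rule]))  # {'VAR': 'vuln_param'}
```

Running `inside` first commits `VAR` to `something`, which the `has` rule then rejects. Running `has` first commits `VAR` to `vuln_param`, which the `inside` rule can confirm.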
154+
155+
### Step 4: The Fix
156+
157+
Use `all` to explicitly control the matching order:
158+
159+
```yaml
id: some_sqli_rule
language: python
rule:
  pattern: $X.execute($$$)
  all:
    - has:
        kind: argument_list
        has:
          nthChild: 1
          any:
            - kind: identifier
              pattern: $VAR
            - has:
                stopBy: end
                kind: identifier
                pattern: $VAR
    - inside:
        stopBy: end
        kind: module
        has:
          stopBy: end
          kind: assignment
          pattern: $VAR = $$$
```
184+
185+
By putting `has` before `inside` in the `all` array, we ensure `$VAR` is first bound to the identifier in the execute call, and then we verify that this same variable was assigned earlier.
186+
187+
Inspecting the [playground](https://ast-grep.github.io/playground.html#eyJtb2RlIjoiQ29uZmlnIiwibGFuZyI6InB5dGhvbiIsInF1ZXJ5IjoiPERpYWxvZyAkJCQ+IiwicmV3cml0ZSI6IiIsInN0cmljdG5lc3MiOiJzbWFydCIsInNlbGVjdG9yIjoiIiwiY29uZmlnIjoiaWQ6IHNvbWVfc3FsaV9ydWxlXG5sYW5ndWFnZTogcHl0aG9uXG5ydWxlOlxuICBwYXR0ZXJuOiAkWC5leGVjdXRlKCQkJClcbiAgYWxsOlxuICAgIC0gaGFzOlxuICAgICAgICBraW5kOiBhcmd1bWVudF9saXN0XG4gICAgICAgIGhhczpcbiAgICAgICAgICBudGhDaGlsZDogMVxuICAgICAgICAgIGFueTpcbiAgICAgICAgICAgIC0ga2luZDogaWRlbnRpZmllclxuICAgICAgICAgICAgICBwYXR0ZXJuOiAkVkFSXG4gICAgICAgICAgICAtIGhhczpcbiAgICAgICAgICAgICAgICBzdG9wQnk6IGVuZFxuICAgICAgICAgICAgICAgIGtpbmQ6IGlkZW50aWZpZXJcbiAgICAgICAgICAgICAgICBwYXR0ZXJuOiAkVkFSXG4gICAgLSBpbnNpZGU6XG4gICAgICAgIHN0b3BCeTogZW5kXG4gICAgICAgIGtpbmQ6IG1vZHVsZVxuICAgICAgICBoYXM6XG4gICAgICAgICAgc3RvcEJ5OiBlbmRcbiAgICAgICAgICBraW5kOiBhc3NpZ25tZW50XG4gICAgICAgICAgcGF0dGVybjogJFZBUiA9ICQkJCIsInNvdXJjZSI6ImRlZiB0ZXN0X3NxbF9pbmplY3Rpb25fZGV0ZWN0aW9uKCk6XG4gICAgXCJcIlwiVGVzdCBjYXNlIGZvciBzdGF0aWMgYW5hbHlzaXMgdG9vbHMgZGV0ZWN0aW5nIFNRTCBpbmplY3Rpb24gdnVsbmVyYWJpbGl0aWVzXCJcIlwiXG4gICAgIyBTZXR1cCB0ZXN0IGRhdGFiYXNlXG4gICAgZGIgPSBEYXRhYmFzZU1hbmFnZXIoJzptZW1vcnk6JylcbiAgICB1c2VyX2lucHV0ID0gcmVxLnF1ZXJ5LnBhcmFtXG4gICAgdnVsbl9wYXJhbSA9IGNvbXB1dGVfYmFzZWRfb25faW5wdXQodXNlcl9pbnB1dClcbiAgICBkYi5leGVjdXRlKGZcIkRST1AgVEFCTEUgSUYgRVhJU1RTIHt2dWxuX3BhcmFtfVwiKSAgIyBWdWxuZXJhYmxlLCBidXQgbm90IGRldGVjdGVkISJ9) now shows the correct match!
188+
189+
190+
## Example 2: The Missing Case Statement
191+
192+
Here's another [puzzling case](/playground.html#eyJtb2RlIjoiQ29uZmlnIiwibGFuZyI6ImNwcCIsInF1ZXJ5IjoiPERpYWxvZyAkJCQ+IiwicmV3cml0ZSI6IiIsInN0cmljdG5lc3MiOiJzbWFydCIsInNlbGVjdG9yIjoiIiwiY29uZmlnIjoicnVsZTpcbiAga2luZDogY2FzZV9zdGF0ZW1lbnRcbiAgbm90OlxuICAgIGhhczpcbiAgICAgIHBhdHRlcm46IGFzc2VydCgkQSlcbiAgICAgIGhhczpcbiAgICAgICAga2luZDogJ2ZhbHNlJ1xuICAgICAgICBzdG9wQnk6IGVuZFxuICAgICAgc3RvcEJ5OiBlbmQiLCJzb3VyY2UiOiJzd2l0Y2ggKE15Y2hhcikge1xuICBbW2xpa2VseV1dICAgY2FzZSAnMSc6IHsgYXNzZXJ0KE90aGVyVmFyID4gMSk7IH1cbiAgW1t1bmxpa2VseV1dIGNhc2UgJzInOiB7IGFzc2VydChcIjJcIiAmJiBmYWxzZSk7IH1cbiAgW1t1bmxpa2VseV1dIGNhc2UgJzMnOiB7IGFzc2VydChcIjNcIiAmJiB0cnVlKTsgfVxuICBbW3VubGlrZWx5XV0gY2FzZSAnNCc6IHsgYXNzZXJ0KFwiXCIgJiYgdHJ1ZSk7IH1cbn1cbiJ9). This rule should find `case` statements that don't contain `assert(false)`:
193+
194+
```yaml
rule:
  kind: case_statement
  not:
    has:
      pattern: assert($A)
      has:
        kind: 'false'
        stopBy: end
      stopBy: end
```
205+
206+
Test code:
207+
208+
```cpp
switch (Mychar) {
  [[likely]]   case '1': { assert(OtherVar > 1); }
  [[unlikely]] case '2': { assert("2" && false); }
  [[unlikely]] case '3': { assert("3" && true); }
  [[unlikely]] case '4': { assert("" && true); }
}
```
216+
217+
Expected: Match cases '1', '3', and '4' (they don't have `assert(false)`).
218+
219+
Actual: Only cases '3' and '4' are matched. Case '1' is missing. Why?
220+
221+
### Step 1: Find All Case Statements
222+
223+
First, let's see what `case_statement` nodes exist:
224+
225+
```bash
ast-grep scan --inline-rules "id: all-cases
language: cpp
rule:
  kind: case_statement" test.cpp
```
231+
232+
The output reveals something surprising. Each match shows the **range** of the node:
233+
234+
- Case '1': spans lines 2-5 (includes cases 2, 3, and 4!)
235+
- Case '2': spans lines 3-5 (includes cases 3 and 4)
236+
- Case '3': spans lines 4-5 (includes case 4)
237+
- Case '4': spans line 5 only
238+
239+
### Step 2: Understand the AST Structure
240+
241+
In the tree that the tree-sitter C/C++ grammar produces for this code, `case_statement` nodes are **nested**. A case node owns the statements that follow its label, and here those statements include every subsequent case, so each case statement contains all later case statements as descendants.
242+
243+
```
case '1' node
├── { assert(OtherVar > 1); }
└── case '2' node  ← nested inside case '1'!
    ├── { assert("2" && false); }
    └── case '3' node  ← nested inside case '2'!
        ├── { assert("3" && true); }
        └── case '4' node  ← nested inside case '3'!
            └── { assert("" && true); }
```
253+
254+
You can also use the playground to visualize the AST structure.
255+
256+
![cpp syntax tree](/image/blog/cpp-case-tree.png)
257+
258+
### Step 3: Identify the Problem
259+
260+
Now the issue is clear. When our rule checks case '1':
261+
262+
1. It looks for `kind: case_statement` ✓
263+
2. It checks `not: has: ... kind: 'false'` — does case '1' have a descendant with kind `false`?
264+
265+
Since case '2' is **nested inside** case '1', and case '2' contains `assert("2" && false)`, the `false` keyword IS a descendant of case '1'!
266+
267+
The `not: has:` condition fails because `false` exists somewhere in the subtree. Case '1' is incorrectly excluded.
268+
269+
### Step 4: The Fix
270+
271+
To fix this, we need to restrict the search to only the **immediate body** of each case, not its nested case statements. We can use `stopBy` to stop at the next case:
272+
273+
```yaml
rule:
  kind: case_statement
  not:
    has:
      pattern: assert($A)
      has:
        kind: 'false'
        stopBy: end
      stopBy:
        kind: case_statement
```
285+
286+
By setting `stopBy: { kind: case_statement }`, the `has` search stops when it encounters another case statement, preventing it from looking into nested cases.
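
The effect of the two `stopBy` settings can be reproduced on a toy tree. The sketch below is a simplified model rather than ast-grep's matcher: the `Node` class and helper names are hypothetical, the cases are nested directly inside each other, and the `assert($A)` pattern plus its inner `has` are collapsed into a single search for a `false` node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str
    label: str = ""
    children: List["Node"] = field(default_factory=list)


def has_descendant(node: Node, kind: str, stop_kind: Optional[str] = None) -> bool:
    """Search descendants of `node` for `kind`.

    stop_kind=None searches the whole subtree, like `stopBy: end`.
    Otherwise the search does not go below nodes of `stop_kind`.
    """
    for child in node.children:
        if child.kind == kind:
            return True
        if stop_kind is not None and child.kind == stop_kind:
            continue  # stop here: do not look inside this node
        if has_descendant(child, kind, stop_kind):
            return True
    return False


def case(label: str, has_false: bool, nested: Optional[Node] = None) -> Node:
    """Build a case node with a body and, optionally, the next case nested inside."""
    body = Node("compound_statement", children=[Node("false")] if has_false else [])
    children = [body] + ([nested] if nested is not None else [])
    return Node("case_statement", label, children)


# Mirror the nesting from the test code: only case '2' contains `false`.
case4 = case("4", False)
case3 = case("3", False, case4)
case2 = case("2", True, case3)
case1 = case("1", False, case2)
cases = [case1, case2, case3, case4]


def cases_without_false(stop_kind: Optional[str]) -> List[str]:
    return [c.label for c in cases if not has_descendant(c, "false", stop_kind)]


print(cases_without_false(None))              # ['3', '4']
print(cases_without_false("case_statement"))  # ['1', '3', '4']
```

Searching the whole subtree reproduces the bug, because case '1' sees the `false` that belongs to case '2'. Stopping at nested `case_statement` nodes restores the expected result.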
287+
288+
### The Lesson
289+
290+
When a rule unexpectedly fails to match:
291+
292+
1. Don't assume the AST matches your mental model — C/C++ case statements nest!
293+
2. Always inspect the actual node ranges, not just the source text
294+
3. Use `stopBy` to control how deep relational rules search
295+
296+
## Key Takeaways
297+
298+
1. **Simplify, don't complicate.** When debugging, remove code and rule conditions until you find the minimal failing case.
299+
300+
2. **Trust the AST, not the source.** The AST structure can surprise you. Always verify with `--debug-query=cst` or the playground.
301+
302+
3. **Watch meta-variable bindings.** When using the same meta-variable in multiple places, order matters. Use `all` to control matching order.
303+
304+
4. **Iterate systematically.** Don't guess. Remove one thing at a time, test, observe, repeat.
305+
306+
5. **Use the right tools.** The [online playground](https://ast-grep.github.io/playground.html) provides instant feedback and AST visualization. Use it liberally.
307+
308+
Happy debugging!