| 1 | +--- |
| 2 | +author: |
| 3 | + - name: Herrington Darkholme |
| 4 | +date: 2025-11-26 |
| 5 | +head: |
| 6 | + - - meta |
| 7 | + - property: og:type |
| 8 | + content: website |
| 9 | + - - meta |
| 10 | + - property: og:title |
| 11 | + content: How to Debug ast-grep Rules Effectively
| 12 | + - - meta |
| 13 | + - property: og:url |
| 14 | + content: https://ast-grep.github.io/blog/how-to-debug.html |
| 15 | + - - meta |
| 16 | + - property: og:description |
| 17 | + content: Learn how to debug ast-grep rules effectively by simplifying code and rules step by step. |
| 18 | +--- |
| 19 | + |
| 20 | + |
| 21 | +# How to Debug ast-grep Rules Effectively
| 22 | + |
| 23 | +Debugging ast-grep rules can be frustrating. You write what looks like a perfectly reasonable rule, test it against your code, and... nothing matches. Or worse, it matches things you didn't expect. |
| 24 | + |
| 25 | +The key to effective debugging is one word: **SIMPLIFY**. |
| 26 | + |
| 27 | +When your rule doesn't work, resist the urge to add more conditions or make the pattern more complex. Instead, strip everything down to the basics and build back up systematically. This post will teach you a reliable debugging workflow that works for any ast-grep rule. |
| 28 | + |
| 29 | + |
| 30 | +## The Debugging Workflow |
| 31 | + |
| 32 | +Here's a step-by-step process to debug any ast-grep rule: |
| 33 | + |
| 34 | +1. **Set up a reproducible test case.** Use `ast-grep scan -r test.yml test.file` or the [online playground](https://ast-grep.github.io/playground.html) to quickly iterate on your code and rule. |
| 35 | + |
| 36 | +2. **Reduce the code to a minimal example.** Delete everything unrelated to the rule. If your rule should match a function call, remove all the surrounding code until you have just the essential lines. |
| 37 | + |
| 38 | +3. **Inspect the AST structure.** Use `ast-grep run -p '{code}' -l {lang} --debug-query=cst` or the playground's AST view to understand the actual tree structure. The AST often looks different from what you'd expect. |
| 39 | + |
| 40 | +4. **Simplify the rule.** Remove rule conditions one by one. Test each simpler version to see what actually matches. Pay attention to how _**meta-variables**_ are being captured. |
| 41 | + |
| 42 | +5. **Repeat steps 2-4.** Continue simplifying both code and rule until you isolate the issue. |
| 43 | + |
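Step 3 is the one you will repeat most often, so it can help to wrap it in a small script. The following is a minimal sketch, assuming `ast-grep` is installed and on your `PATH`; the helper names `debug_query_command` and `dump_tree` are hypothetical, and only the flags already shown in step 3 are used.

```python
import shutil
import subprocess


def debug_query_command(code, lang, fmt="cst"):
    """Build the argv for the AST dump in step 3 (flags taken from the text above)."""
    return ["ast-grep", "run", "-p", code, "-l", lang, f"--debug-query={fmt}"]


def dump_tree(code, lang):
    """Run the command if ast-grep is on PATH; return None otherwise."""
    argv = debug_query_command(code, lang)
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True)
    # Assumption: the debug tree may land on either stream, so return both.
    return result.stdout + result.stderr
```

Calling `dump_tree('x.execute(y)', 'python')` from a scratch directory prints the tree for that snippet, which you can then compare against the `kind` names used in your rule.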
| 44 | +Let's see this workflow in action with real examples. |
| 45 | + |
| 46 | + |
| 47 | +## Example 1: The SQL Injection Detector |
| 48 | + |
| 49 | +Consider this rule designed to [detect potential SQL injection](https://ast-grep.github.io/playground.html#eyJtb2RlIjoiQ29uZmlnIiwibGFuZyI6InB5dGhvbiIsInF1ZXJ5IjoiPERpYWxvZyAkJCQ+IiwicmV3cml0ZSI6IiIsInN0cmljdG5lc3MiOiJzbWFydCIsInNlbGVjdG9yIjoiIiwiY29uZmlnIjoiaWQ6IHNvbWVfc3FsaV9ydWxlXG5sYW5ndWFnZTogcHl0aG9uXG5ydWxlOlxuICBwYXR0ZXJuOiAkWC5leGVjdXRlKCQkJClcbiAgaGFzOlxuICAgIGtpbmQ6IGFyZ3VtZW50X2xpc3RcbiAgICBoYXM6XG4gICAgICBudGhDaGlsZDogMVxuICAgICAgYW55OlxuICAgICAgICAtIGtpbmQ6IGlkZW50aWZpZXJcbiAgICAgICAgICBwYXR0ZXJuOiAkVkFSXG4gICAgICAgIC0gaGFzOlxuICAgICAgICAgICAgc3RvcEJ5OiBlbmRcbiAgICAgICAgICAgIGtpbmQ6IGlkZW50aWZpZXJcbiAgICAgICAgICAgIHBhdHRlcm46ICRWQVJcbiAgaW5zaWRlOlxuICAgIHN0b3BCeTogZW5kXG4gICAga2luZDogbW9kdWxlXG4gICAgaGFzOlxuICAgICAgc3RvcEJ5OiBlbmRcbiAgICAgIGtpbmQ6IGFzc2lnbm1lbnRcbiAgICAgIHBhdHRlcm46ICRWQVIgPSAkJCQiLCJzb3VyY2UiOiJkZWYgdGVzdF9zcWxfaW5qZWN0aW9uX2RldGVjdGlvbigpOlxuICAgIFwiXCJcIlRlc3QgY2FzZSBmb3Igc3RhdGljIGFuYWx5c2lzIHRvb2xzIGRldGVjdGluZyBTUUwgaW5qZWN0aW9uIHZ1bG5lcmFiaWxpdGllc1wiXCJcIlxuICAgICMgU2V0dXAgdGVzdCBkYXRhYmFzZVxuICAgIGRiID0gRGF0YWJhc2VNYW5hZ2VyKCc6bWVtb3J5OicpXG4gICAgdXNlcl9pbnB1dCA9IHJlcS5xdWVyeS5wYXJhbVxuICAgIHZ1bG5fcGFyYW0gPSBjb21wdXRlX2Jhc2VkX29uX2lucHV0KHVzZXJfaW5wdXQpXG4gICAgZGIuZXhlY3V0ZShmXCJEUk9QIFRBQkxFIElGIEVYSVNUUyB7dnVsbl9wYXJhbX1cIikgICMgVnVsbmVyYWJsZSwgYnV0IG5vdCBkZXRlY3RlZCEifQ==) vulnerabilities in Python: |
| 50 | + |
| 51 | +```yaml |
| 52 | +id: some_sqli_rule |
| 53 | +language: python |
| 54 | +rule: |
| 55 | + pattern: $X.execute($$$) |
| 56 | + has: |
| 57 | + kind: argument_list |
| 58 | + has: |
| 59 | + nthChild: 1 |
| 60 | + any: |
| 61 | + - kind: identifier |
| 62 | + pattern: $VAR |
| 63 | + - has: |
| 64 | + stopBy: end |
| 65 | + kind: identifier |
| 66 | + pattern: $VAR |
| 67 | + inside: |
| 68 | + stopBy: end |
| 69 | + kind: module |
| 70 | + has: |
| 71 | + stopBy: end |
| 72 | + kind: assignment |
| 73 | + pattern: $VAR = $$$ |
| 74 | +``` |
| 75 | +
| 76 | +The rule should flag cases where a variable assigned from user input is passed to `execute()`. Let's test it against this code: |
| 77 | + |
| 78 | +```python |
| 79 | +def test_sql_injection_detection(): |
| 80 | + """Test case for static analysis tools detecting SQL injection vulnerabilities""" |
| 81 | + # Setup test database |
| 82 | + db = DatabaseManager(':memory:') |
| 83 | + user_input = req.query.param |
| 84 | + vuln_param = compute_based_on_input(user_input) |
| 85 | + db.execute(f"DROP TABLE IF EXISTS {vuln_param}") # Vulnerable, but not detected! |
| 86 | +``` |
| 87 | + |
| 88 | +No match. Why? |
| 89 | + |
| 90 | +### Step 1: Reduce to Minimal Code |
| 91 | + |
| 92 | +Let's strip the code down to the essentials: |
| 93 | + |
| 94 | +```python |
| 95 | +something = "value" |
| 96 | +vuln_param = other |
| 97 | +x.execute(f"DROP TABLE IF EXISTS {vuln_param}") # Still no match |
| 98 | +``` |
| 99 | + |
| 100 | +Interestingly, if we remove the first assignment: |
| 101 | + |
| 102 | +```python |
| 103 | +vuln_param = other |
| 104 | +x.execute(f"DROP TABLE IF EXISTS {vuln_param}") # This matches! |
| 105 | +``` |
| 106 | + |
| 107 | +The presence of `something = "value"` somehow breaks the rule. This is our first clue.
| 108 | + |
| 109 | +### Step 2: Simplify the Rule |
| 110 | + |
| 111 | +Let's test individual parts of the rule. First, just the `has` portion: |
| 112 | + |
| 113 | +```yaml |
| 114 | +rule: |
| 115 | + pattern: $X.execute($$$) |
| 116 | + has: |
| 117 | + kind: argument_list |
| 118 | + has: |
| 119 | + nthChild: 1 |
| 120 | + any: |
| 121 | + - kind: identifier |
| 122 | + pattern: $VAR |
| 123 | + - has: |
| 124 | + stopBy: end |
| 125 | + kind: identifier |
| 126 | + pattern: $VAR |
| 127 | +``` |
| 128 | + |
| 129 | +This matches! And it captures `$VAR` as `vuln_param`. |
| 130 | + |
| 131 | +Now let's test just the `inside` portion: |
| 132 | + |
| 133 | +```yaml |
| 134 | +rule: |
| 135 | + pattern: $X.execute($$$) |
| 136 | + inside: |
| 137 | + stopBy: end |
| 138 | + kind: module |
| 139 | + has: |
| 140 | + stopBy: end |
| 141 | + kind: assignment |
| 142 | + pattern: $VAR = $$$ |
| 143 | +``` |
| 144 | + |
| 145 | +This also matches! But wait—what does it capture `$VAR` as? |
| 146 | + |
| 147 | +### Step 3: Identify the Conflict |
| 148 | + |
| 149 | +Here's the problem: when we have both assignments in the code, the `inside` rule matches `$VAR` to `something` (the first assignment it encounters), while the `has` rule expects `$VAR` to be `vuln_param`. |
| 150 | + |
| 151 | +Since `something ≠ vuln_param`, the combined rule fails. |
| 152 | + |
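| 153 | +This is due to [rule matching order sensitivity](https://ast-grep.github.io/advanced/faq.html#why-is-rule-matching-order-sensitive). The keys of a rule object form an unordered dictionary, so sibling rules such as `has` and `inside` are applied in an implementation-defined order, not the order in which they appear in the YAML file. Here the `inside` rule runs first and binds `$VAR` to `something`, so the subsequent `has` rule, which needs `$VAR` to be `vuln_param`, cannot match.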
| 154 | + |
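The conflict can be reproduced with a toy model. This is only a sketch of the idea of a shared meta-variable environment, not ast-grep's actual implementation; the functions `bind`, `has_rule`, `inside_rule`, and `run` are hypothetical names made up for illustration.

```python
# Toy model of shared meta-variable bindings (NOT ast-grep's real implementation).
# Each sub-rule either extends the shared environment or fails when none of its
# candidates agrees with an existing binding.

def bind(env, name, candidates):
    """Bind `name` to the first candidate, or check it against an existing binding."""
    if name in env:
        # Already bound by an earlier sub-rule: the same value must appear again.
        return env[name] in candidates
    env[name] = candidates[0]  # first match wins, as in the walkthrough above
    return True


def has_rule(env):
    # Identifiers found inside the first argument of execute(...)
    return bind(env, "VAR", ["vuln_param"])


def inside_rule(env):
    # Left-hand sides of the assignments in the module, in source order
    return bind(env, "VAR", ["something", "vuln_param"])


def run(rules):
    """Apply sub-rules in the given order against one shared environment."""
    env = {}
    matched = all(rule(env) for rule in rules)
    return matched, env


print(run([inside_rule, has_rule]))  # inside first: VAR is bound to "something", has fails
print(run([has_rule, inside_rule]))  # has first: VAR is bound to "vuln_param", inside agrees
```

In this model the only thing that changes between the failing and the passing run is the order in which the two sub-rules touch `VAR`.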
| 155 | +### Step 4: The Fix |
| 156 | + |
| 157 | +Use `all` to explicitly control the matching order: |
| 158 | + |
| 159 | +```yaml |
| 160 | +id: some_sqli_rule |
| 161 | +language: python |
| 162 | +rule: |
| 163 | + pattern: $X.execute($$$) |
| 164 | + all: |
| 165 | + - has: |
| 166 | + kind: argument_list |
| 167 | + has: |
| 168 | + nthChild: 1 |
| 169 | + any: |
| 170 | + - kind: identifier |
| 171 | + pattern: $VAR |
| 172 | + - has: |
| 173 | + stopBy: end |
| 174 | + kind: identifier |
| 175 | + pattern: $VAR |
| 176 | + - inside: |
| 177 | + stopBy: end |
| 178 | + kind: module |
| 179 | + has: |
| 180 | + stopBy: end |
| 181 | + kind: assignment |
| 182 | + pattern: $VAR = $$$ |
| 183 | +``` |
| 184 | + |
| 185 | +By putting `has` before `inside` in the `all` array, we ensure `$VAR` is first bound to the identifier in the execute call, and then we verify that this same variable was assigned earlier. |
| 186 | + |
| 187 | +The [playground](https://ast-grep.github.io/playground.html#eyJtb2RlIjoiQ29uZmlnIiwibGFuZyI6InB5dGhvbiIsInF1ZXJ5IjoiPERpYWxvZyAkJCQ+IiwicmV3cml0ZSI6IiIsInN0cmljdG5lc3MiOiJzbWFydCIsInNlbGVjdG9yIjoiIiwiY29uZmlnIjoiaWQ6IHNvbWVfc3FsaV9ydWxlXG5sYW5ndWFnZTogcHl0aG9uXG5ydWxlOlxuICBwYXR0ZXJuOiAkWC5leGVjdXRlKCQkJClcbiAgYWxsOlxuICAgIC0gaGFzOlxuICAgICAgICBraW5kOiBhcmd1bWVudF9saXN0XG4gICAgICAgIGhhczpcbiAgICAgICAgICBudGhDaGlsZDogMVxuICAgICAgICAgIGFueTpcbiAgICAgICAgICAgIC0ga2luZDogaWRlbnRpZmllclxuICAgICAgICAgICAgICBwYXR0ZXJuOiAkVkFSXG4gICAgICAgICAgICAtIGhhczpcbiAgICAgICAgICAgICAgICBzdG9wQnk6IGVuZFxuICAgICAgICAgICAgICAgIGtpbmQ6IGlkZW50aWZpZXJcbiAgICAgICAgICAgICAgICBwYXR0ZXJuOiAkVkFSXG4gICAgLSBpbnNpZGU6XG4gICAgICAgIHN0b3BCeTogZW5kXG4gICAgICAgIGtpbmQ6IG1vZHVsZVxuICAgICAgICBoYXM6XG4gICAgICAgICAgc3RvcEJ5OiBlbmRcbiAgICAgICAgICBraW5kOiBhc3NpZ25tZW50XG4gICAgICAgICAgcGF0dGVybjogJFZBUiA9ICQkJCIsInNvdXJjZSI6ImRlZiB0ZXN0X3NxbF9pbmplY3Rpb25fZGV0ZWN0aW9uKCk6XG4gICAgXCJcIlwiVGVzdCBjYXNlIGZvciBzdGF0aWMgYW5hbHlzaXMgdG9vbHMgZGV0ZWN0aW5nIFNRTCBpbmplY3Rpb24gdnVsbmVyYWJpbGl0aWVzXCJcIlwiXG4gICAgIyBTZXR1cCB0ZXN0IGRhdGFiYXNlXG4gICAgZGIgPSBEYXRhYmFzZU1hbmFnZXIoJzptZW1vcnk6JylcbiAgICB1c2VyX2lucHV0ID0gcmVxLnF1ZXJ5LnBhcmFtXG4gICAgdnVsbl9wYXJhbSA9IGNvbXB1dGVfYmFzZWRfb25faW5wdXQodXNlcl9pbnB1dClcbiAgICBkYi5leGVjdXRlKGZcIkRST1AgVEFCTEUgSUYgRVhJU1RTIHt2dWxuX3BhcmFtfVwiKSAgIyBWdWxuZXJhYmxlLCBidXQgbm90IGRldGVjdGVkISJ9) now shows the correct match!
| 188 | + |
| 189 | + |
| 190 | +## Example 2: The Missing Case Statement |
| 191 | + |
| 192 | +Here's another [puzzling case](/playground.html#eyJtb2RlIjoiQ29uZmlnIiwibGFuZyI6ImNwcCIsInF1ZXJ5IjoiPERpYWxvZyAkJCQ+IiwicmV3cml0ZSI6IiIsInN0cmljdG5lc3MiOiJzbWFydCIsInNlbGVjdG9yIjoiIiwiY29uZmlnIjoicnVsZTpcbiAga2luZDogY2FzZV9zdGF0ZW1lbnRcbiAgbm90OlxuICAgIGhhczpcbiAgICAgIHBhdHRlcm46IGFzc2VydCgkQSlcbiAgICAgIGhhczpcbiAgICAgICAga2luZDogJ2ZhbHNlJ1xuICAgICAgICBzdG9wQnk6IGVuZFxuICAgICAgc3RvcEJ5OiBlbmQiLCJzb3VyY2UiOiJzd2l0Y2ggKE15Y2hhcikge1xuICBbW2xpa2VseV1dICAgY2FzZSAnMSc6IHsgYXNzZXJ0KE90aGVyVmFyID4gMSk7IH1cbiAgW1t1bmxpa2VseV1dIGNhc2UgJzInOiB7IGFzc2VydChcIjJcIiAmJiBmYWxzZSk7IH1cbiAgW1t1bmxpa2VseV1dIGNhc2UgJzMnOiB7IGFzc2VydChcIjNcIiAmJiB0cnVlKTsgfVxuICBbW3VubGlrZWx5XV0gY2FzZSAnNCc6IHsgYXNzZXJ0KFwiXCIgJiYgdHJ1ZSk7IH1cbn1cbiJ9). This rule should find `case` statements that don't contain `assert(false)`: |
| 193 | + |
| 194 | +```yaml |
| 195 | +rule: |
| 196 | + kind: case_statement |
| 197 | + not: |
| 198 | + has: |
| 199 | + pattern: assert($A) |
| 200 | + has: |
| 201 | + kind: 'false' |
| 202 | + stopBy: end |
| 203 | + stopBy: end |
| 204 | +``` |
| 205 | + |
| 206 | +Test code: |
| 207 | + |
| 208 | +```cpp |
| 209 | +switch (Mychar) { |
| 210 | + [[likely]] case '1': { assert(OtherVar > 1); } |
| 211 | + [[unlikely]] case '2': { assert("2" && false); } |
| 212 | + [[unlikely]] case '3': { assert("3" && true); } |
| 213 | + [[unlikely]] case '4': { assert("" && true); } |
| 214 | +} |
| 215 | +``` |
| 216 | + |
| 217 | +Expected: Match cases '1', '3', and '4' (they don't have `assert(false)`). |
| 218 | + |
| 219 | +Actual: Only cases '3' and '4' are matched. Case '1' is missing. Why? |
| 220 | + |
| 221 | +### Step 1: Find All Case Statements |
| 222 | + |
| 223 | +First, let's see what `case_statement` nodes exist: |
| 224 | + |
| 225 | +```bash |
| 226 | +ast-grep scan --inline-rules "id: all-cases |
| 227 | +language: cpp |
| 228 | +rule: |
| 229 | + kind: case_statement" test.cpp |
| 230 | +``` |
| 231 | + |
| 232 | +The output reveals something surprising. Each match shows the **range** of the node: |
| 233 | + |
| 234 | +- Case '1': spans lines 2-5 (includes cases 2, 3, and 4!) |
| 235 | +- Case '2': spans lines 3-5 (includes cases 3 and 4) |
| 236 | +- Case '3': spans lines 4-5 (includes case 4) |
| 237 | +- Case '4': spans line 5 only |
| 238 | + |
| 239 | +### Step 2: Understand the AST Structure |
| 240 | + |
| 241 | +In the tree-sitter grammar for C/C++, `case_statement` nodes are **nested**. Each case statement contains all subsequent case statements as descendants. This is how tree-sitter represents the fall-through semantics of C switch statements.
| 242 | + |
| 243 | +``` |
| 244 | +case '1' node |
| 245 | +├── { assert(OtherVar > 1); } |
| 246 | +└── case '2' node ← nested inside case '1'! |
| 247 | + ├── { assert("2" && false); } |
| 248 | + └── case '3' node ← nested inside case '2'! |
| 249 | + ├── { assert("3" && true); } |
| 250 | + └── case '4' node ← nested inside case '3'! |
| 251 | + └── { assert("" && true); } |
| 252 | +``` |
| 253 | +
| 254 | +You can also use the playground to visualize the AST structure.
| 255 | +
| 256 | + |
| 257 | +
| 258 | +### Step 3: Identify the Problem |
| 259 | +
| 260 | +Now the issue is clear. When our rule checks case '1': |
| 261 | +
| 262 | +1. It looks for `kind: case_statement` ✓ |
| 263 | +2. It checks `not: has: ... kind: 'false'` — does case '1' have a descendant with kind `false`? |
|
| 265 | +Since case '2' is **nested inside** case '1', and case '2' contains `assert("2" && false)`, the `false` keyword IS a descendant of case '1'! |
|
| 267 | +The `not: has:` condition fails because `false` exists somewhere in the subtree. Case '1' is incorrectly excluded. |
|
| 269 | +### Step 4: The Fix |
|
| 271 | +To fix this, we need to restrict the search to only the **immediate body** of each case, not its nested case statements. We can use `stopBy` to stop at the next case: |
|
| 273 | +```yaml |
| 274 | +rule: |
| 275 | + kind: case_statement |
| 276 | + not: |
| 277 | + has: |
| 278 | + pattern: assert($A) |
| 279 | + has: |
| 280 | + kind: 'false' |
| 281 | + stopBy: end |
| 282 | + stopBy: |
| 283 | + kind: case_statement |
| 284 | +``` |
| 285 | + |
| 286 | +By setting `stopBy: { kind: case_statement }`, the `has` search stops when it encounters another case statement, preventing it from looking into nested cases. |
| 287 | + |
| 288 | +### The Lesson |
| 289 | + |
| 290 | +When a rule unexpectedly fails to match: |
| 291 | + |
| 292 | +1. Don't assume the AST matches your mental model — C/C++ case statements nest! |
| 293 | +2. Always inspect the actual node ranges, not just the source text |
| 294 | +3. Use `stopBy` to control how deep relational rules search |
| 295 | + |
| 296 | +## Key Takeaways |
| 297 | + |
| 298 | +1. **Simplify, don't complicate.** When debugging, remove code and rule conditions until you find the minimal failing case. |
| 299 | + |
| 300 | +2. **Trust the AST, not the source.** The AST structure can surprise you. Always verify with `--debug-query=cst` or the playground. |
| 301 | + |
| 302 | +3. **Watch meta-variable bindings.** When using the same meta-variable in multiple places, order matters. Use `all` to control matching order. |
| 303 | + |
| 304 | +4. **Iterate systematically.** Don't guess. Remove one thing at a time, test, observe, repeat. |
| 305 | + |
| 306 | +5. **Use the right tools.** The [online playground](https://ast-grep.github.io/playground.html) provides instant feedback and AST visualization. Use it liberally. |
| 307 | + |
| 308 | +Happy debugging! |