Skip to content

fix(parsers/javascript): extract Express anonymous route handler callbacks (#21)#48

Open
ChrisJr404 wants to merge 1 commit intoknostic:masterfrom
ChrisJr404:fix/express-anonymous-handlers-21
Open

fix(parsers/javascript): extract Express anonymous route handler callbacks (#21)#48
ChrisJr404 wants to merge 1 commit intoknostic:masterfrom
ChrisJr404:fix/express-anonymous-handlers-21

Conversation

@ChrisJr404
Copy link
Copy Markdown

Closes #21.

The analyzer never walked into call-expression argument lists, so when a route was defined idiomatically as

```js
router.post('/orders', authenticateToken, async (req, res) => { ... });
```

the only thing it picked up was the named middleware reference. The actual handler body — where most of the application's business logic and vulnerabilities live — was invisible to every downstream stage, which is why `reachability_filter` was dropping ~75% of the units in the example codebase.

Approach

New `_extractRouteHandlerCallbacks` pass on every source file that walks every `CallExpression`, detects the Express verb shape (`.{get,post,put,patch,delete,options,head,all,use}(...)`), and extracts each `ArrowFunction` / `FunctionExpression` argument as its own unit with:

  • synthetic name ` ` (e.g. `POST /orders`) when the path is a string literal — matches the "method and path as metadata" bullet under "Expected behavior" in the issue.
  • `unitType: "route_handler"` so the existing classifier downstream doesn't have to re-derive it from the body.
  • `isEntryPoint: true` so `reachability_filter` treats it as a request-data entry the way it already treats named `(req, res)` middleware.
  • `httpMethod` / `httpPath` carried through for any follow-up step that wants to render route info.

Multi-callback registrations (chained middleware + final handler) get suffixed with their post-path arg index — `POST /orders` for the first callback, `POST /orders [1]` for the second, etc — so they don't collide.

`SyntaxKind.CallExpression` is resolved off the typescript dep at call time (its numeric value drifts between releases — 213 in older versions, 214 in 5.x) rather than hard-coded.

Smoke test

Ran the analyzer against the issue's example file plus three additional shapes:

```
$ node -e "const { TypeScriptAnalyzer } = require('./typescript_analyzer.js');
const a = new TypeScriptAnalyzer('/tmp/express_test_repo');
const result = a.analyzeFiles(['routes.js']);
for (const [id, f] of Object.entries(result.functions)) {
console.log(id, '| type=' + f.unitType, '| entry=' + (f.isEntryPoint||false),
'| method=' + (f.httpMethod||''), '| path=' + (f.httpPath||''));
}
"
routes.js:POST /orders [1] | type=route_handler | entry=true | method=POST | path=/orders
routes.js:GET /users/:id | type=route_handler | entry=true | method=GET | path=/users/:id
routes.js:USE /api | type=route_handler | entry=true | method=USE | path=/api
routes.js:DELETE /items/:id [2] | type=route_handler | entry=true | method=DELETE | path=/items/:id
```

Named middleware references (`authenticateToken`, `authorizeAdmin`) correctly stay out of the unit list since they parse as `Identifier`, not `ArrowFunction` / `FunctionExpression`. The existing variable-decl and module.exports passes still pick those up where they're defined.

What's not in scope

  • Hapi / Koa / Fastify use a different shape — `server.route({ method, path, handler: () => {...} })` — and would need separate detection. Left out so this PR is focused on the Express case the issue described.
  • Method names spelled different from `router` / `app` — the detector looks at the verb suffix only, not the receiver. So `myCustomRouter.post('/x', handler)` works the same way `app.post` does. If you'd prefer a tighter receiver allowlist, easy to swap in.
  • Route handlers passed indirectly (`const h = (req, res) => {...}; router.post('/x', h)`) — the variable-declaration pass already extracts `h`, so reachability_filter sees it via that name. Worth flagging for review in case you'd rather de-dup or tag it as a route handler too.

Test plan

  • Manual smoke test against the issue's example file (4 routes → 4 units, all flagged `isEntryPoint`)
  • Verified named-middleware references stay out of the unit list (no double-extraction)
  • No JS-side test harness exists in the repo for the parser, so I'm relying on the smoke test above plus your CI for downstream regressions. Happy to bolt on a small node-based test suite for the analyzer in a follow-up if you want one.

…backs (knostic#21)

The analyzer never walked into call-expression argument lists, so when
a route was defined idiomatically as

  router.post('/orders', authenticateToken, async (req, res) => { ... });

the only thing it picked up was the named middleware reference. The
actual handler body — where most of the application's business logic
and vulnerabilities live — was invisible to every downstream stage,
causing reachability_filter to drop almost everything as
"unreachable from input."

Adds an _extractRouteHandlerCallbacks pass that walks every
CallExpression in the file, detects the Express verb shape
(`<receiver>.{get,post,put,patch,delete,options,head,all,use}(...)`),
and extracts each ArrowFunction / FunctionExpression argument as a
unit with:

  - synthetic name `<METHOD> <path>` (e.g. `POST /orders`) when the
    path is a string literal — matches the "method and path as
    metadata" expectation in the issue.
  - `unitType: "route_handler"` so the existing classifier downstream
    doesn't have to re-derive it from the body.
  - `isEntryPoint: true` so reachability_filter treats it as a
    request-data entry the way it already treats named (req, res)
    middleware.
  - `httpMethod` / `httpPath` properties carried through for any
    follow-up steps that want to render route info.

Multi-callback registrations (chained middleware + final handler) get
suffixed with their post-path arg index so they don't collide.

`SyntaxKind.CallExpression` is resolved dynamically off the typescript
dep at call time — its numeric value drifts between typescript
releases (213 in older versions, 214 in 5.x).

Smoke-tested against the issue's example file plus three additional
shapes (`router.get(path, handler)`, `app.use(path, mw)`,
`router.delete(path, mw1, mw2, handler)`); all four units extracted
with correct method / path / index. The named middleware references
(`authenticateToken`) correctly stay out of the unit list since they
parse as Identifiers, not ArrowFunctions.

Hapi / Koa / Fastify use a different shape (object literal with a
`handler` property) and are out of scope for this fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug] JavaScript parser misses Express.js anonymous route handler callbacks

1 participant