fix(parsers/javascript): extract Express anonymous route handler callbacks (#21)#48
Open
ChrisJr404 wants to merge 1 commit intoknostic:masterfrom
Open
fix(parsers/javascript): extract Express anonymous route handler callbacks (#21)#48ChrisJr404 wants to merge 1 commit intoknostic:masterfrom
ChrisJr404 wants to merge 1 commit intoknostic:masterfrom
Conversation
…backs (knostic#21) The analyzer never walked into call-expression argument lists, so when a route was defined idiomatically as router.post('/orders', authenticateToken, async (req, res) => { ... }); the only thing it picked up was the named middleware reference. The actual handler body — where most of the application's business logic and vulnerabilities live — was invisible to every downstream stage, causing reachability_filter to drop almost everything as "unreachable from input." Adds an _extractRouteHandlerCallbacks pass that walks every CallExpression in the file, detects the Express verb shape (`<receiver>.{get,post,put,patch,delete,options,head,all,use}(...)`), and extracts each ArrowFunction / FunctionExpression argument as a unit with: - synthetic name `<METHOD> <path>` (e.g. `POST /orders`) when the path is a string literal — matches the "method and path as metadata" expectation in the issue. - `unitType: "route_handler"` so the existing classifier downstream doesn't have to re-derive it from the body. - `isEntryPoint: true` so reachability_filter treats it as a request-data entry the way it already treats named (req, res) middleware. - `httpMethod` / `httpPath` properties carried through for any follow-up steps that want to render route info. Multi-callback registrations (chained middleware + final handler) get suffixed with their post-path arg index so they don't collide. `SyntaxKind.CallExpression` is resolved dynamically off the typescript dep at call time — its numeric value drifts between typescript releases (213 in older versions, 214 in 5.x). Smoke-tested against the issue's example file plus three additional shapes (`router.get(path, handler)`, `app.use(path, mw)`, `router.delete(path, mw1, mw2, handler)`); all four units extracted with correct method / path / index. The named middleware references (`authenticateToken`) correctly stay out of the unit list since they parse as Identifiers, not ArrowFunctions. Hapi / Koa / Fastify use a different shape (object literal with a `handler` property) and are out of scope for this fix.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #21.
The analyzer never walked into call-expression argument lists, so when a route was defined idiomatically as
```js
router.post('/orders', authenticateToken, async (req, res) => { ... });
```
the only thing it picked up was the named middleware reference. The actual handler body — where most of the application's business logic and vulnerabilities live — was invisible to every downstream stage, which is why `reachability_filter` was dropping ~75% of the units in the example codebase.
Approach
New `_extractRouteHandlerCallbacks` pass on every source file that walks every `CallExpression`, detects the Express verb shape (`.{get,post,put,patch,delete,options,head,all,use}(...)`), and extracts each `ArrowFunction` / `FunctionExpression` argument as its own unit with:
Multi-callback registrations (chained middleware + final handler) get suffixed with their post-path arg index — `POST /orders` for the first callback, `POST /orders [1]` for the second, etc — so they don't collide.
`SyntaxKind.CallExpression` is resolved off the typescript dep at call time (its numeric value drifts between releases — 213 in older versions, 214 in 5.x) rather than hard-coded.
Smoke test
Ran the analyzer against the issue's example file plus three additional shapes:
```
$ node -e "const { TypeScriptAnalyzer } = require('./typescript_analyzer.js');
const a = new TypeScriptAnalyzer('/tmp/express_test_repo');
const result = a.analyzeFiles(['routes.js']);
for (const [id, f] of Object.entries(result.functions)) {
console.log(id, '| type=' + f.unitType, '| entry=' + (f.isEntryPoint||false),
'| method=' + (f.httpMethod||''), '| path=' + (f.httpPath||''));
}
"
routes.js:POST /orders [1] | type=route_handler | entry=true | method=POST | path=/orders
routes.js:GET /users/:id | type=route_handler | entry=true | method=GET | path=/users/:id
routes.js:USE /api | type=route_handler | entry=true | method=USE | path=/api
routes.js:DELETE /items/:id [2] | type=route_handler | entry=true | method=DELETE | path=/items/:id
```
Named middleware references (`authenticateToken`, `authorizeAdmin`) correctly stay out of the unit list since they parse as `Identifier`, not `ArrowFunction` / `FunctionExpression`. The existing variable-decl and module.exports passes still pick those up where they're defined.
What's not in scope
Test plan