
fix: extract Express.js anonymous route handler callbacks #49

Draft

joshbouncesecurity wants to merge 3 commits into knostic:master from joshbouncesecurity:fix/issue21-express-handlers


@joshbouncesecurity
Contributor

Summary

The JS/TS parser pipeline did not extract anonymous arrow function callbacks used as Express route handlers, so most application code in a typical Express app was invisible to analysis.

This PR adds detection in typescript_analyzer.js:

  • Recognises Express-style call sites (<obj>.<verb>(<path>, ...callbacks) with verb in get|post|put|patch|delete|options|head|all|use).
  • Filters the receiver to plausibly-Express identifiers (app, router, routes, server, plus chained .route(...)/.Router() calls) so generic .get(...) calls on caches/clients/query-builders aren't misread as routes.
  • Extracts anonymous arrow / function expressions in callback positions as units, marking the last as route_handler (with is_entry_point: true) and earlier callbacks as route_middleware.
  • Adds metadata: http_method, http_path, callback_index, named_middleware.
  • Emits call-graph edges from each anonymous callback to named middleware identifiers in the same call (e.g. authenticateToken); dependency_resolver.js merges these "explicit" edges with the body-text regex edges so reachability/upstream-deps see the relationship.
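The call-site filter described above (verb allowlist plus receiver allowlist) can be sketched roughly as follows. This is a minimal illustration, not the code in typescript_analyzer.js: it assumes a hypothetical, simplified AST node shape (kind, name, callee, object, args) instead of the TypeScript compiler API, and the helper names (isExpressReceiver, isExpressRouteCall, id, member, callExpr) are made up for this example.

```javascript
// Hypothetical, simplified AST node shapes (NOT the TypeScript compiler API):
//   { kind: "Identifier", name }
//   { kind: "PropertyAccess", object, name }
//   { kind: "Call", callee, args }
const HTTP_VERBS = new Set([
  "get", "post", "put", "patch", "delete", "options", "head", "all", "use",
]);
const EXPRESS_RECEIVERS = new Set(["app", "router", "routes", "server"]);

// True when the receiver plausibly refers to an Express app or router.
function isExpressReceiver(node) {
  if (node.kind === "Identifier") {
    return EXPRESS_RECEIVERS.has(node.name);
  }
  // Chained receivers such as router.route('/x').get(...) or express.Router().use(...).
  // This accepts any *.route(...) / *.Router() call, which is deliberately loose.
  if (node.kind === "Call" && node.callee.kind === "PropertyAccess") {
    return node.callee.name === "route" || node.callee.name === "Router";
  }
  return false;
}

// True for <receiver>.<verb>(...) calls that should be treated as route registrations.
function isExpressRouteCall(node) {
  if (node.kind !== "Call" || node.callee.kind !== "PropertyAccess") return false;
  return HTTP_VERBS.has(node.callee.name) && isExpressReceiver(node.callee.object);
}

// Tiny node builders for the examples.
const id = (name) => ({ kind: "Identifier", name });
const member = (object, name) => ({ kind: "PropertyAccess", object, name });
const callExpr = (callee, args = []) => ({ kind: "Call", callee, args });

console.log(isExpressRouteCall(callExpr(member(id("router"), "post")))); // true
console.log(isExpressRouteCall(callExpr(member(id("myCache"), "get"))));  // false
console.log(isExpressRouteCall(callExpr(member(id("app"), "listen"))));   // false
```

An allowlist of receiver names trades recall for precision: under this scheme an app bound to some other name would be missed, which is the price of not misreading cache.get(...) or client.post(...) calls as routes.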

unit_generator.js surfaces the new fields on each unit (route, is_entry_point, http_method, http_path, callback_index). Existing route_handlers detected by the per-function classifier still flow through unchanged.
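A rough sketch of that surfacing step, assuming the analyzer keeps the Express metadata on a per-function object (here a hypothetical fn.express with method, path, callbackIndex, namedMiddleware). The real property names in typescript_analyzer.js and unit_generator.js are not shown in this PR and may differ; the route shape (method, path, middleware) mirrors what the local test run reports in dataset.json.

```javascript
// Hypothetical analyzer record -> dataset unit. Field names on `fn` are assumptions.
function surfaceRouteFields(fn) {
  const unit = { id: fn.id, unit_type: fn.unitType };
  if (!fn.express) return unit; // ordinary function: nothing route-specific to surface

  const { method, path, callbackIndex, namedMiddleware } = fn.express;
  // Only the final (handler) callback is treated as an entry point.
  unit.is_entry_point = fn.unitType === "route_handler";
  unit.route = { method, path, middleware: namedMiddleware };
  // Top-level copies listed in the PR description, so consumers need not dig into `route`.
  unit.http_method = method;
  unit.http_path = path;
  unit.callback_index = callbackIndex;
  return unit;
}

const unit = surfaceRouteFields({
  id: "orders.js:express(POST:/orders:8:1)",
  unitType: "route_handler",
  express: { method: "POST", path: "/orders", callbackIndex: 1, namedMiddleware: ["authenticateToken"] },
});
console.log(unit.is_entry_point, unit.http_method, unit.route.path); // true POST /orders
```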

Addresses #21 (does not close — verify against the user's reported repo first).

Test plan

  • tests/parsers/javascript/test_express_route_handlers.py covers:
    • anonymous handler with named middleware: extraction, is_entry_point=true, http_method/http_path, call-graph edge to authenticateToken
    • handler with no middleware: clean unit with no extra edges
    • router.use('/api', anonMw1, anonMw2, anonHandler): one route_handler + two route_middleware
    • non-Express myCache.get('foo', () => {}) and queryBuilder.post(...): nothing extracted
    • named-only router.get('/x', namedHandler): no anonymous unit synthesised; named function still extracted by the regular path
  • Existing JS parser tests (tests/test_js_parser.py) still pass.
  • Full suite: pytest tests/ -> 96 passed, 12 skipped, 3 xfailed (pre-existing Windows-only xfails).
  • Manual: parse a small Express app — anonymous route handler bodies appear as units in dataset.json, is_entry_point: true, route populated.

joshbouncesecurity and others added 3 commits May 4, 2026 21:42
The JS/TS parser pipeline did not extract anonymous arrow function
callbacks used as Express route handlers, so most application code in a
typical Express app was invisible to analysis.

This change adds detection in typescript_analyzer.js:

- Recognises Express-style call sites (`<obj>.<verb>(<path>, ...callbacks)`
  with verb in `get|post|put|patch|delete|options|head|all|use`).
- Filters the receiver to plausibly-Express identifiers (`app`, `router`,
  `routes`, `server` and chained `.route(...)`/`.Router()`) so generic
  `.get(...)` calls on caches/clients aren't misread as routes.
- Extracts anonymous arrow / function expressions in callback positions
  as units, marking the last as `route_handler` (with
  `is_entry_point: true`) and earlier callbacks as `route_middleware`.
- Adds metadata: `http_method`, `http_path`, `callback_index`,
  `named_middleware`.
- Records explicit call-graph edges from each anonymous callback to
  named middleware identifiers in the same call (e.g.
  `authenticateToken`); dependency_resolver.js merges these with the
  body-text regex edges so reachability/upstream-deps work.
- unit_generator.js surfaces the metadata on the unit (`route`,
  `is_entry_point`, `http_method`, `http_path`, `callback_index`).

Addresses #21.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Express anonymous route handler extraction added synthetic entries
to `data["functions"]` but not to `data["callGraph"]`, breaking the
existing invariant `len(callGraph) == len(functions)` exercised by
`test_js_parser.TestTypeScriptAnalyzer::test_builds_call_graph`.

Emit a callGraph entry for each synthesised route_handler /
route_middleware unit, capturing inline call expressions from the
callback body. Named middleware identifiers continue to flow through
`explicitCalls` and are merged downstream by `dependency_resolver.js`.
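The invariant and the downstream merge can be sketched like this. It is only an illustration under assumed data shapes: functions and callGraph are plain objects keyed by unit ID, explicitCalls holds the named middleware identifiers, and the regex that pulls call names out of the body text is a deliberately naive stand-in for whatever the analyzer and dependency_resolver.js actually do. The helper names (addSyntheticUnit, inlineCalls, resolvedCallees) are hypothetical.

```javascript
// Hypothetical data shape: functions and callGraph are objects keyed by unit ID.
// Register a synthesized callback so that functions and callGraph stay in lockstep.
function addSyntheticUnit(data, unitId, body, explicitCalls) {
  data.functions[unitId] = { body, explicitCalls };
  // Every function gets a callGraph entry, even when its body makes no calls,
  // preserving len(callGraph) == len(functions).
  data.callGraph[unitId] = inlineCalls(body);
}

// Naive body-text scan: any identifier immediately followed by "(".
// (Stand-in only; it would also pick up keywords such as `if (`.)
function inlineCalls(body) {
  const names = new Set();
  for (const match of body.matchAll(/\b([A-Za-z_$][\w$]*)\s*\(/g)) {
    names.add(match[1]);
  }
  return [...names];
}

// Downstream merge: body-text edges plus explicit (named middleware) edges, de-duplicated.
function resolvedCallees(data, unitId) {
  const { explicitCalls = [] } = data.functions[unitId];
  return [...new Set([...data.callGraph[unitId], ...explicitCalls])];
}

const data = { functions: {}, callGraph: {} };
const handlerId = "orders.js:express(POST:/orders:8:1)";
addSyntheticUnit(data, handlerId,
  "const { productId } = req.body; res.json({ ok: true, productId });",
  ["authenticateToken"]);
console.log(Object.keys(data.callGraph).length === Object.keys(data.functions).length); // true
console.log(resolvedCallees(data, handlerId)); // [ 'json', 'authenticateToken' ]
```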

Add a regression test asserting the callGraph/functions invariant for
the synthetic units.

Adds four regression tests for `_extractExpressRouteCallbacks`:

- TypeScript-annotated callbacks `(req: Request, res: Response) => {...}`
  parse correctly and produce the expected synthetic handler unit.
- `app.get('/' + prefix, handler)` (dynamic path) is skipped without
  throwing — confirms the StringLiteral check is a hard gate.
- `app.use((req, res, next) => {...})` with no path produces a single
  unit with http_path=null and http_method='USE'.
- `app.get(path, anonMw, namedHandler)` (anon middleware before named
  handler): the anon callback gets a route_middleware unit with
  named_middleware=['namedHandler'], and the named handler is left to
  the regular extractor.
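The behaviour these tests pin down (string-literal path gate, optional path for `use`, last anonymous callback as the handler, named callbacks left alone) can be summarised in a small sketch. It assumes a hypothetical simplified AST shape and a made-up `classifyCallbacks` helper; it is not the actual `_extractExpressRouteCallbacks` implementation, and `callback_index` is assumed here to count positions after the path argument.

```javascript
// Hypothetical simplified argument nodes (not the TypeScript compiler API):
//   { kind: "StringLiteral", text }, { kind: "Identifier", name },
//   { kind: "ArrowFunction" }, { kind: "FunctionExpression" }, { kind: "BinaryExpression" }
const isAnonymous = (n) => n.kind === "ArrowFunction" || n.kind === "FunctionExpression";

// Returns one descriptor per anonymous callback, or [] when the call is skipped.
function classifyCallbacks(verb, args) {
  let path = null;
  let callbacks = args;
  if (args.length > 0 && args[0].kind === "StringLiteral") {
    path = args[0].text; // app.get('/users', ...)
    callbacks = args.slice(1);
  } else if (args.length > 0 && !isAnonymous(args[0])) {
    // Non-literal first argument (e.g. '/' + prefix): skip quietly instead of throwing.
    // This sketch also skips a leading identifier, since it cannot tell a path
    // variable from a middleware reference.
    return [];
  }
  // Otherwise there is no path at all, e.g. app.use((req, res, next) => {...}).

  const named = callbacks.filter((n) => n.kind === "Identifier").map((n) => n.name);
  const units = [];
  callbacks.forEach((node, index) => {
    if (!isAnonymous(node)) return; // named callbacks are left to the regular extractor
    const isLast = index === callbacks.length - 1;
    units.push({
      unit_type: isLast ? "route_handler" : "route_middleware",
      is_entry_point: isLast,
      http_method: verb.toUpperCase(),
      http_path: path,
      callback_index: index,
      named_middleware: [...named],
    });
  });
  return units;
}

const anonFn = { kind: "ArrowFunction" };
console.log(classifyCallbacks("get", [{ kind: "BinaryExpression" }, anonFn]).length); // 0
console.log(classifyCallbacks("use", [anonFn])[0].http_path); // null
```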

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joshbouncesecurity
Contributor Author

Manual verification

Sample Express file:

const router = express.Router();
router.post('/orders', authenticateToken, async (req, res) => {
  const { productId } = req.body;
  // ...
});
  • openant parse <repo>: dataset.json contains a unit for the anonymous POST /orders handler with is_entry_point: true, http_method: "POST", http_path: "/orders".
  • Call graph: edge from the synthesized handler to authenticateToken.
  • app.get('/users', (req, res) => res.json([])): extracted with no extra middleware edges.
  • router.use('/api', limiter, auth, async (req, res, next) => {...}), where limiter and auth stand for inline anonymous middleware functions: one route_handler unit + two route_middleware units.
  • False-positive guard: myCache.get('foo', () => {}) and queryBuilder.post('users', () => {}): NOT extracted (correctly skipped — non-Express receivers).
  • Named handler: router.get('/x', namedHandler): no anonymous unit synthesized; namedHandler extracted by the existing path.
  • TS-typed callback: (req: Request, res: Response): Promise<void> => {...}: extracted correctly with metadata.
  • Dynamic path: app.get('/' + prefix, ...) (path is not a string literal): correctly skipped, no crash.
  • On the issue's reported repo (Express app with 7 vulnerabilities): unit count grows; reachability_filter retains the route handlers.

@joshbouncesecurity
Contributor Author

Local test results

Built a tiny inline Express fixture (1 file, 13 lines) and ran the JS pipeline (typescript_analyzer + unit_generator) from this branch.

Fixture (orders.js):

const express = require('express');
const router = express.Router();

function authenticateToken(req, res, next) { return next(); }

router.post('/orders', authenticateToken, async (req, res) => {
  const { productId } = req.body;
  res.json({ ok: true, productId });
});

module.exports = router;

Commands run:

node typescript_analyzer.js <repo> --files-from list.txt --output analyzer_output.json
node unit_generator.js analyzer_output.json --output dataset.json

Outcome (per checklist item):

  • Anonymous POST /orders handler appears as a unit (orders.js:express(POST:/orders:8:1)) ✅
  • is_entry_point: true on the synthesized handler ✅
  • HTTP method and path are present, but only on the unit's route field (route.method = 'POST', route.path = '/orders'). The top-level http_method / http_path / callback_index fields listed in the checklist are not populated on the unit; either they are intentionally consumed only at intermediate stages, or unit_generator.js has a small surfacing gap ⚠️
  • Call-graph edge from the anonymous handler to authenticateToken is present in metadata.direct_calls = ['orders.js:authenticateToken']; the named middleware also lists the handler as a caller in its direct_callers
  • route.middleware = ['authenticateToken'] correctly captures the named middleware ✅
  • Did not separately exercise the multi-middleware, false-positive guard (myCache.get, queryBuilder.post), TS-typed callback, dynamic-path skip, or the originally-reported repo on this run — those are covered by tests/parsers/javascript/test_express_route_handlers.py per the diff.
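The synthesized unit ID in the first outcome bullet, orders.js:express(POST:/orders:8:1), appears to follow a file:express(METHOD:path:line:column) pattern. That pattern is inferred from this single example only, so the sketch below is a hypothetical helper pair (buildExpressUnitId, parseExpressUnitId) for building and recognising such IDs, e.g. when filtering dataset.json for route units; the real format may differ, particularly for middleware or path-less units.

```javascript
// Hypothetical helpers for the ID pattern inferred from "orders.js:express(POST:/orders:8:1)".
function buildExpressUnitId(file, method, path, line, column) {
  return `${file}:express(${method}:${path}:${line}:${column})`;
}

// Returns null when the string does not look like a synthesized Express unit ID.
// The greedy path group keeps colons inside paths such as "/users/:id".
function parseExpressUnitId(unitId) {
  const match = /^(.*):express\(([A-Z]+):(.*):(\d+):(\d+)\)$/.exec(unitId);
  if (!match) return null;
  const [, file, method, path, line, column] = match;
  return { file, method, path, line: Number(line), column: Number(column) };
}

console.log(parseExpressUnitId("orders.js:express(POST:/orders:8:1)"));
console.log(parseExpressUnitId("orders.js:authenticateToken")); // null
```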

One small thing worth a look: the manual-verification checklist mentions top-level http_method / http_path / callback_index fields, but in dataset.json only route.method / route.path are populated. Either the checklist text or unit_generator.js's field-surfacing might want a small alignment.
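Until the two are aligned, a consumer of dataset.json could read the route metadata defensively. A minimal sketch, assuming units are plain objects shaped like the synthesized handler observed above (top-level http_method / http_path when present, otherwise route.method / route.path); routeInfo and missingTopLevelFields are hypothetical helper names.

```javascript
// Prefer the top-level fields named in the PR description; fall back to unit.route.
function routeInfo(unit) {
  const route = unit.route || {};
  return {
    method: unit.http_method ?? route.method ?? null,
    path: unit.http_path ?? route.path ?? null,
    callbackIndex: unit.callback_index ?? null,
  };
}

// Lists which documented top-level fields a route unit is missing.
function missingTopLevelFields(unit) {
  return ["http_method", "http_path", "callback_index"].filter((field) => !(field in unit));
}

// Shaped like the synthesized handler observed in the local run (other fields omitted).
const observed = {
  is_entry_point: true,
  route: { method: "POST", path: "/orders", middleware: ["authenticateToken"] },
};
console.log(routeInfo(observed)); // { method: 'POST', path: '/orders', callbackIndex: null }
console.log(missingTopLevelFields(observed)); // [ 'http_method', 'http_path', 'callback_index' ]
```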

