fix: extract Express.js anonymous route handler callbacks #49
Draft
joshbouncesecurity wants to merge 3 commits into knostic:master
The JS/TS parser pipeline did not extract anonymous arrow function callbacks used as Express route handlers, so most application code in a typical Express app was invisible to analysis.

This change adds detection in typescript_analyzer.js:

- Recognises Express-style call sites (`<obj>.<verb>(<path>, ...callbacks)` with verb in `get|post|put|patch|delete|options|head|all|use`).
- Filters the receiver to plausibly-Express identifiers (`app`, `router`, `routes`, `server` and chained `.route(...)`/`.Router()`) so generic `.get(...)` calls on caches/clients aren't misread as routes.
- Extracts anonymous arrow / function expressions in callback positions as units, marking the last as `route_handler` (with `is_entry_point: true`) and earlier callbacks as `route_middleware`.
- Adds metadata: `http_method`, `http_path`, `callback_index`, `named_middleware`.
- Records explicit call-graph edges from each anonymous callback to named middleware identifiers in the same call (e.g. `authenticateToken`); dependency_resolver.js merges these with the body-text regex edges so reachability/upstream-deps work.
- unit_generator.js surfaces the metadata on the unit (`route`, `is_entry_point`, `http_method`, `http_path`, `callback_index`).

Addresses #21.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
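For reviewers, the call-site test described above could be sketched roughly as follows. This is a simplified illustration that takes the receiver as source text rather than an AST node; the function name `looksLikeExpressCall` and its input shape are hypothetical and are not taken from typescript_analyzer.js.

```javascript
// Hypothetical sketch: decide whether `<receiver>.<verb>(...)` looks like an
// Express route registration. The real analyzer works on AST nodes; here the
// receiver is passed as source text to keep the example self-contained.
const EXPRESS_VERBS = new Set([
  'get', 'post', 'put', 'patch', 'delete', 'options', 'head', 'all', 'use',
]);
const EXPRESS_RECEIVERS = new Set(['app', 'router', 'routes', 'server']);

function looksLikeExpressCall(receiverText, verb) {
  if (!EXPRESS_VERBS.has(verb)) return false;
  // Plain identifiers such as `app` or `router`.
  if (EXPRESS_RECEIVERS.has(receiverText)) return true;
  // Chained forms such as `app.route('/users')` or `express.Router()`.
  return /\.route\([^)]*\)$/.test(receiverText) || /\.Router\(\)$/.test(receiverText);
}

console.log(looksLikeExpressCall('router', 'post'));             // true
console.log(looksLikeExpressCall("app.route('/users')", 'get')); // true
console.log(looksLikeExpressCall('myCache', 'get'));             // false
console.log(looksLikeExpressCall('router', 'fetch'));            // false
```

An allow-list of receiver names trades recall for precision: an Express app bound to an unusual variable name would be missed, but `.get(...)` on a cache or HTTP client is not misclassified.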
The Express anonymous route handler extraction added synthetic entries to `data["functions"]` but not to `data["callGraph"]`, breaking the existing invariant `len(callGraph) == len(functions)` exercised by `test_js_parser.TestTypeScriptAnalyzer::test_builds_call_graph`.

Emit a callGraph entry for each synthesised route_handler / route_middleware unit, capturing inline call expressions from the callback body. Named middleware identifiers continue to flow through `explicitCalls` and are merged downstream by `dependency_resolver.js`.

Add a regression test asserting the callGraph/functions invariant for the synthetic units.
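One way to keep that invariant from regressing is to register a synthetic unit and its call-graph entry in a single step. The sketch below assumes `functions` and `callGraph` are plain objects keyed by unit id; that shape, the helper name `addSyntheticUnit`, the `role` field, and the example id are all hypothetical.

```javascript
// Hypothetical sketch: `functions` and `callGraph` are assumed to be objects
// keyed by unit id. Writing both in one helper keeps their sizes equal.
function addSyntheticUnit(data, id, unit, inlineCalls) {
  data.functions[id] = unit;
  // De-duplicate the inline calls captured from the callback body.
  data.callGraph[id] = [...new Set(inlineCalls)];
}

const data = { functions: {}, callGraph: {} };
addSyntheticUnit(
  data,
  'routes.js:POST /orders', // made-up id format, for illustration only
  { role: 'route_handler', is_entry_point: true },
  ['res.json', 'res.json'],
);

const sameSize =
  Object.keys(data.functions).length === Object.keys(data.callGraph).length;
console.log(sameSize); // true
```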
Adds four regression tests for `_extractExpressRouteCallbacks`:
- TypeScript-annotated callbacks `(req: Request, res: Response) => {...}`
parse correctly and produce the expected synthetic handler unit.
- `app.get('/' + prefix, handler)` (dynamic path) is skipped without
throwing — confirms the StringLiteral check is a hard gate.
- `app.use((req, res, next) => {...})` with no path produces a single
unit with http_path=null and http_method='USE'.
- `app.get(path, anonMw, namedHandler)` (anon middleware before named
handler): the anon callback gets a route_middleware unit with
named_middleware=['namedHandler'], and the named handler is left to
the regular extractor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
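The path handling these tests pin down could look roughly like the sketch below. It uses small descriptor objects in place of AST nodes, and it only allows a missing path for `use`; the descriptor shape, that restriction, and the name `readRoute` are assumptions made for illustration, not the behaviour of `_extractExpressRouteCallbacks`.

```javascript
// Hypothetical argument descriptors standing in for AST nodes:
//   { kind: 'string', value }  a string literal
//   { kind: 'expr' }           any other expression, e.g. '/' + prefix
//   { kind: 'fn' }             an anonymous callback
//   { kind: 'ident', name }    a named function reference
function readRoute(verb, args) {
  const [first, ...rest] = args;
  if (first && first.kind === 'string') {
    return { http_method: verb.toUpperCase(), http_path: first.value, callbacks: rest };
  }
  // `app.use(fn)` style: no path, so every argument is a callback.
  if (verb === 'use' && first && (first.kind === 'fn' || first.kind === 'ident')) {
    return { http_method: 'USE', http_path: null, callbacks: args };
  }
  // Dynamic paths are skipped rather than guessed at.
  return null;
}

console.log(readRoute('get', [{ kind: 'expr' }, { kind: 'ident', name: 'handler' }])); // null
console.log(readRoute('use', [{ kind: 'fn' }]).http_path);                              // null
console.log(readRoute('post', [{ kind: 'string', value: '/orders' }, { kind: 'fn' }]).http_method); // POST
```

Returning `null` for anything that is not a string literal keeps the gate strict: a skipped route is a missed unit, whereas a guessed path would put wrong metadata into the dataset.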
Manual verification

Sample Express file:

    const router = express.Router();
    router.post('/orders', authenticateToken, async (req, res) => {
      const { productId } = req.body;
      // ...
    });
Local test results

Built a tiny inline Express fixture (1 file, 13 lines) and ran the JS pipeline (typescript_analyzer + unit_generator) from this branch.

Fixture:

    const express = require('express');
    const router = express.Router();
    function authenticateToken(req, res, next) { return next(); }
    router.post('/orders', authenticateToken, async (req, res) => {
      const { productId } = req.body;
      res.json({ ok: true, productId });
    });
    module.exports = router;

Commands run:

Outcome (per checklist item):
One small thing worth a look: the manual-verification checklist mentions top-level
Summary
The JS/TS parser pipeline did not extract anonymous arrow function callbacks used as Express route handlers, so most application code in a typical Express app was invisible to analysis.
This PR adds detection in `typescript_analyzer.js`:

- Recognises Express-style call sites (`<obj>.<verb>(<path>, ...callbacks)` with verb in `get|post|put|patch|delete|options|head|all|use`).
- Filters the receiver to plausibly-Express identifiers (`app`, `router`, `routes`, `server`, plus chained `.route(...)`/`.Router()` calls) so generic `.get(...)` calls on caches/clients/query-builders aren't misread as routes.
- Extracts anonymous arrow / function expressions in callback positions as units, marking the last as `route_handler` (with `is_entry_point: true`) and earlier callbacks as `route_middleware`.
- Adds metadata: `http_method`, `http_path`, `callback_index`, `named_middleware`.
- Records explicit call-graph edges from each anonymous callback to named middleware identifiers in the same call (e.g. `authenticateToken`); `dependency_resolver.js` merges these "explicit" edges with the body-text regex edges so reachability/upstream-deps see the relationship.

`unit_generator.js` surfaces the new fields on each unit (`route`, `is_entry_point`, `http_method`, `http_path`, `callback_index`). Existing route_handlers detected by the per-function classifier still flow through unchanged.

Addresses #21 (does not close it; verify against the user's reported repo first).
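The merge of "explicit" edges with body-text regex edges can be pictured as a de-duplicating union per unit. The sketch below is only an illustration under that assumption; `mergeEdges` is a hypothetical name, and the real `dependency_resolver.js` may represent edges differently.

```javascript
// Hypothetical sketch: union two edge lists for one unit.
// A Set drops duplicates while preserving first-seen order.
function mergeEdges(regexEdges, explicitCalls) {
  return [...new Set([...regexEdges, ...explicitCalls])];
}

// Named middleware never appears in the callback body, so only the
// explicit edge makes it reachable from the anonymous handler.
console.log(JSON.stringify(mergeEdges(['res.json'], ['authenticateToken'])));
// ["res.json","authenticateToken"]

// An identifier found by both sources is kept once.
console.log(JSON.stringify(mergeEdges(['validate', 'authenticateToken'], ['authenticateToken'])));
// ["validate","authenticateToken"]
```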
Test plan
- `tests/parsers/javascript/test_express_route_handlers.py` covers:
  - `is_entry_point=true`, `http_method`/`http_path`, call-graph edge to `authenticateToken`
  - `router.use('/api', anonMw1, anonMw2, anonHandler)`: one `route_handler` + two `route_middleware`
  - `myCache.get('foo', () => {})` and `queryBuilder.post(...)`: nothing extracted
  - `router.get('/x', namedHandler)`: no anonymous unit synthesised; named function still extracted by the regular path
- Existing tests (`tests/test_js_parser.py`) still pass.
- `pytest tests/` -> 96 passed, 12 skipped, 3 xfailed (preexisting Windows-only xfails).
- `dataset.json`, `is_entry_point: true`, `route` populated.
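The role assignment these cases exercise can be sketched as below. Callbacks are modelled as small descriptor objects instead of AST nodes; the descriptor shape, the `role` field name, the 0-based `callback_index`, and `is_entry_point: false` for middleware are assumptions made for illustration.

```javascript
// Hypothetical descriptors: { kind: 'fn' } for an anonymous callback,
// { kind: 'ident', name } for a named function reference.
function classifyCallbacks(callbacks) {
  const named = callbacks.filter((c) => c.kind === 'ident').map((c) => c.name);
  const lastIndex = callbacks.length - 1;
  const units = [];
  callbacks.forEach((cb, index) => {
    // Named functions are left to the regular extractor.
    if (cb.kind !== 'fn') return;
    const isLast = index === lastIndex;
    units.push({
      role: isLast ? 'route_handler' : 'route_middleware',
      is_entry_point: isLast,
      callback_index: index,
      named_middleware: [...named],
    });
  });
  return units;
}

// router.use('/api', anonMw1, anonMw2, anonHandler)
const three = classifyCallbacks([{ kind: 'fn' }, { kind: 'fn' }, { kind: 'fn' }]);
console.log(three.map((u) => u.role).join(','));
// route_middleware,route_middleware,route_handler

// router.get('/x', namedHandler): nothing is synthesised
console.log(classifyCallbacks([{ kind: 'ident', name: 'namedHandler' }]).length); // 0
```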