fix(security): validate ATS URL hosts #19
- Replace substring domain checks with parsed hostname validation.
- Add regression coverage for lookalike ATS hosts.

Tests:
- cd sidecar && python3 -m pytest -q tests/test_ashby.py tests/test_greenhouse.py tests/test_indeed.py tests/test_linkedin.py -q (passed)
- python3 -m ruff check on the touched sidecar files (passed)
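To illustrate why the substring checks were replaced, here is a minimal sketch contrasting the two approaches on lookalike hosts. The function names `substring_check` and `parsed_host_check`, the `TRUSTED` constant, and the sample URLs are hypothetical and are not taken from this PR's diff.

```python
from urllib.parse import urlparse

# Assumed trusted root, for illustration only.
TRUSTED = "greenhouse.io"


def substring_check(url: str) -> bool:
    """Old-style check: passes whenever the domain text appears anywhere."""
    return TRUSTED in url


def parsed_host_check(url: str) -> bool:
    """Parsed check: the host must equal the domain or be a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    return host == TRUSTED or host.endswith("." + TRUSTED)


# Hypothetical lookalike URLs that a substring check would accept.
LOOKALIKES = [
    "https://greenhouse.io.evil.example/jobs/1",  # trusted name used as a prefix
    "https://evil.example/?next=greenhouse.io",   # trusted name in the query
    "https://notgreenhouse.io/jobs/1",            # trusted name as a suffix, no dot
]

for url in LOOKALIKES:
    print(url, substring_check(url), parsed_host_check(url))
```

The leading dot in the suffix comparison is what rejects hosts such as notgreenhouse.io while still allowing real subdomains.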
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5456db8beb
    def hostname_matches(url: str, trusted_domain: str) -> bool:
        """Return true when a URL host is the trusted domain or its subdomain."""
        parsed = urlparse(url)
Handle malformed URLs without raising
When a job has a malformed apply_url such as http://[::1, urlparse raises ValueError, so the exception now escapes adapter validate() calls instead of being reported in the validation-error list promised by BaseAdapter.validate. The previous substring checks simply reported the URL as invalid; catching parsing failures and returning False would preserve that behavior for bad input data.
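One way to apply this suggestion is sketched below. Only the signature and docstring come from the reviewed snippet; the body, including the lowercasing and trailing-dot handling, is an assumption about how the helper might be completed, not the PR's actual implementation.

```python
from urllib.parse import urlparse


def hostname_matches(url: str, trusted_domain: str) -> bool:
    """Return true when a URL host is the trusted domain or its subdomain."""
    try:
        # urlparse raises ValueError for inputs such as "http://[::1"
        # (an unclosed IPv6 bracket), so parsing stays inside the try.
        host = urlparse(url).hostname
    except ValueError:
        return False
    if not host:
        return False
    # Assumed normalization: case-insensitive, ignoring a trailing root dot.
    host = host.lower().rstrip(".")
    trusted = trusted_domain.lower().rstrip(".")
    return host == trusted or host.endswith("." + trusted)


print(hostname_matches("http://[::1", "greenhouse.io"))
print(hostname_matches("https://boards.greenhouse.io/acme", "greenhouse.io"))
```

Returning False rather than re-raising lets the adapter's validate() add an ordinary "invalid URL" entry to its error list, matching the behavior of the earlier substring checks.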
    hostname_matches(url, _GREENHOUSE_ROOT)
    and host not in _GREENHOUSE_BOARDS_HOSTS
Reject bare Greenhouse hosts before extracting board token
When apply_url is a bare-root Greenhouse URL like https://greenhouse.io/jobs/12345 and no explicit board_token is supplied, this condition passes because hostname_matches accepts the root domain; the return below then uses greenhouse.io as the board token and calls the boards API with a bogus board instead of rejecting an unsupported URL shape. The supported subdomain pattern requires a company host, so this branch should exclude host == _GREENHOUSE_ROOT before deriving the token.
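The guard described above could look roughly like the following sketch. The constant names `_GREENHOUSE_ROOT` and `_GREENHOUSE_BOARDS_HOSTS` appear in the reviewed snippet, but their values here are assumptions, and `company_subdomain_token` is a hypothetical helper name. How the real adapter derives the token from the host is not shown in the review, so taking the labels in front of the root domain is also an assumption.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed values; the PR's actual constants are not shown in the review.
_GREENHOUSE_ROOT = "greenhouse.io"
_GREENHOUSE_BOARDS_HOSTS = {"boards.greenhouse.io", "job-boards.greenhouse.io"}


def company_subdomain_token(url: str) -> Optional[str]:
    """Return the company subdomain as board token, or None if unsupported."""
    try:
        host = (urlparse(url).hostname or "").lower()
    except ValueError:
        return None  # malformed URL: treat as unsupported
    suffix = "." + _GREENHOUSE_ROOT
    if host == _GREENHOUSE_ROOT:
        return None  # bare root has no company label to use as a token
    if host in _GREENHOUSE_BOARDS_HOSTS:
        return None  # shared boards hosts are assumed to be handled elsewhere
    if not host.endswith(suffix):
        return None  # not a Greenhouse host at all
    return host[: -len(suffix)]


print(company_subdomain_token("https://greenhouse.io/jobs/12345"))
print(company_subdomain_token("https://acme.greenhouse.io/jobs/12345"))
```

With the explicit root check, https://greenhouse.io/jobs/12345 is rejected as an unsupported URL shape instead of producing a bogus board token.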