
CCM-16073 - Enhanced callbacks #145

Draft
rhyscoxnhs wants to merge 29 commits into main from feature/CCM-16073

Conversation

@rhyscoxnhs
Contributor

Description

Context

Type of changes

  • Refactoring (non-breaking change)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would change existing functionality)
  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • I am familiar with the contributing guidelines
  • I have followed the code style of the project
  • I have added tests to cover my changes
  • I have updated the documentation accordingly
  • This PR is a result of pair or mob programming

Sensitive Information Declaration

To ensure the utmost confidentiality and protect your and others' privacy, we kindly ask you NOT to include PII (Personally Identifiable Information) / PID (Personally Identifiable Data) or any other sensitive data in this PR (Pull Request) or the codebase changes. We will remove any PR that contains sensitive information. We really appreciate your cooperation in this matter.

  • I confirm that neither PII/PID nor sensitive data are included in this PR and the codebase changes.

@rhyscoxnhs rhyscoxnhs requested review from a team as code owners April 15, 2026 10:34
@rhyscoxnhs rhyscoxnhs changed the title CCM-16073 - Enhanced callbacks [skip ci] CCM-16073 - Enhanced callbacks Apr 15, 2026
@rhyscoxnhs rhyscoxnhs requested a review from Copilot April 15, 2026 10:35
Contributor

Copilot AI left a comment

Pull request overview

Introduces enhanced callback delivery security and operational controls by adding mTLS + certificate pinning configuration to callback targets, and shifting delivery from direct EventBridge API Destinations to an SQS + per-client HTTPS delivery Lambda model (with Redis-backed gating for rate limiting/circuit breaking).

Changes:

  • Add mtls, certPinning, and delivery fields to callback target model + schema validation, and update fixtures/tests accordingly.
  • Add CLI commands to manage mTLS, certificate pinning enable/disable, and SPKI hash extraction/storage for targets.
  • Add new https-client-lambda (delivery, signing, retries, DLQ handling, Redis/Lua gate) and new shared config-cache package; update Terraform to provision per-client delivery infra (SQS/Lambda/ElastiCache) and mock mTLS ALB.
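
For readers unfamiliar with SPKI pinning, the following is a minimal sketch of how a pin hash could be derived from a PEM with Node's built-in `crypto` module. The helper name `spkiSha256Base64` is hypothetical and this is not the code in `targets-set-certificate.ts`; it only illustrates the general technique of hashing the DER-encoded SubjectPublicKeyInfo with SHA-256.

```typescript
import { createHash, createPublicKey, generateKeyPairSync } from "node:crypto";

// Hypothetical helper: derive a base64 SHA-256 pin from a PEM.
// createPublicKey accepts a PEM public key and, per the Node.js docs,
// also a PEM X.509 certificate, from which it takes the public key.
export function spkiSha256Base64(pem: string): string {
  const spkiDer = createPublicKey(pem).export({ type: "spki", format: "der" });
  return createHash("sha256").update(spkiDer).digest("base64");
}

// Usage with a throwaway key pair generated purely for illustration.
const { publicKey } = generateKeyPairSync("ec", {
  namedCurve: "prime256v1",
  publicKeyEncoding: { type: "spki", format: "pem" },
  privateKeyEncoding: { type: "pkcs8", format: "pem" },
});
const pin = spkiSha256Base64(publicKey);
console.log(pin.length); // a SHA-256 digest is 32 bytes, so 44 base64 characters
```

Pinning the public key rather than the whole certificate means a renewed certificate that reuses the same key pair would still match the stored hash.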

Reviewed changes

Copilot reviewed 104 out of 107 changed files in this pull request and generated 7 comments.

File Description
tools/client-subscriptions-management/src/entrypoint/cli/targets-set-pinning.ts New CLI command to enable/disable certificate pinning for a target.
tools/client-subscriptions-management/src/entrypoint/cli/targets-set-mtls.ts New CLI command to enable/disable mTLS for a target.
tools/client-subscriptions-management/src/entrypoint/cli/targets-set-certificate.ts New CLI command to extract/store SPKI hash from PEM for a target.
tools/client-subscriptions-management/src/entrypoint/cli/clients-put.ts Minor cleanup in CLI file handling comment.
tools/client-subscriptions-management/src/domain/client-subscription-builder.ts Adds default mtls/certPinning and emits security warnings when building targets.
tools/client-subscriptions-management/src/tests/helpers/client-subscription-fixtures.ts Updates test target fixtures with required mtls/certPinning fields.
tools/client-subscriptions-management/src/tests/entrypoint/cli/targets-set-pinning.test.ts Unit tests for targets-set-pinning CLI behavior.
tools/client-subscriptions-management/src/tests/entrypoint/cli/targets-set-mtls.test.ts Unit tests for targets-set-mtls CLI behavior.
tools/client-subscriptions-management/src/tests/entrypoint/cli/targets-set-certificate.test.ts Unit tests for targets-set-certificate CLI behavior.
tools/client-subscriptions-management/src/tests/domain/client-subscription-builder.test.ts Adds tests around security warning emission in target builder.
tools/client-subscriptions-management/package.json Adds picocolors dependency for warning output.
tests/integration/helpers/mock-client-config.ts Adds a new integration fixture key for an mTLS-enabled mock client.
tests/integration/helpers/event-factories.ts Adds factory for delivery messages compatible with the new delivery flow.
tests/integration/fixtures/subscriptions/mock-client-rate-limit.json New integration fixture including required mtls/certPinning.
tests/integration/fixtures/subscriptions/mock-client-mtls.json New integration fixture for mTLS + pinning enabled target.
tests/integration/fixtures/subscriptions/mock-client-circuit-breaker.json New integration fixture including delivery circuit breaker configuration.
tests/integration/fixtures/subscriptions/mock-client-2.json Updates existing integration fixture targets to include mtls/certPinning.
tests/integration/fixtures/subscriptions/mock-client-1.json Updates existing integration fixture targets to include mtls/certPinning.
src/models/src/client-config.ts Extends callback target type with mtls, certPinning, and optional delivery.
src/models/src/client-config-schema.ts Adds Zod validation for mtls, certPinning (incl. SPKI hash constraints), and delivery.
src/models/src/tests/client-config-schema.test.ts Adds/updates schema tests for new target fields and constraints.
src/config-cache/tsconfig.json New package tsconfig for config-cache workspace.
src/config-cache/src/index.ts Exports ConfigCache from new workspace package.
src/config-cache/src/config-cache.ts Implements TTL-based in-memory config cache.
src/config-cache/src/tests/config-cache.test.ts Unit tests for ConfigCache TTL and behaviors.
src/config-cache/package.json New workspace package definition for config-cache.
src/config-cache/jest.config.ts Jest config for config-cache package.
scripts/config/pre-commit.yaml Adjusts pre-commit detect-private-key exclusions for a test file.
pnpm-workspace.yaml Adds workspace catalog entries for @redis/client, picocolors, and Secrets Manager SDK.
lambdas/mock-webhook-lambda/src/index.ts Adds ALB mTLS client-cert header validation for mock webhook endpoint.
lambdas/mock-webhook-lambda/src/tests/index.test.ts Adds unit tests for ALB mTLS certificate verification flow.
lambdas/https-client-lambda/tsconfig.json New lambda workspace tsconfig.
lambdas/https-client-lambda/src/services/ssm-applications-map.ts Loads and caches clientId→applicationId map from SSM.
lambdas/https-client-lambda/src/services/sqs-visibility.ts SQS ChangeMessageVisibility helper.
lambdas/https-client-lambda/src/services/record-result.lua Redis Lua script to update circuit breaker state.
lambdas/https-client-lambda/src/services/payload-signer.ts Adjusts payload signer function signature (parameter order).
lambdas/https-client-lambda/src/services/logger.ts Re-exports shared logger for lambda local imports.
lambdas/https-client-lambda/src/services/endpoint-gate.ts Redis admission + circuit-breaker integration (EVALSHA/EVAL Lua execution).
lambdas/https-client-lambda/src/services/dlq-sender.ts DLQ send helper.
lambdas/https-client-lambda/src/services/delivery/tls-agent-factory.ts Builds TLS agent with optional mTLS material and SPKI pinning.
lambdas/https-client-lambda/src/services/delivery/retry-policy.ts Retry/backoff and Retry-After parsing helpers.
lambdas/https-client-lambda/src/services/delivery/https-client.ts HTTPS delivery client + result classification.
lambdas/https-client-lambda/src/services/delivery-metrics.ts Embedded metrics emission for delivery and circuit-breaker events.
lambdas/https-client-lambda/src/services/config-loader.ts Loads client config/targets from S3 with TTL caching and schema validation.
lambdas/https-client-lambda/src/services/admit.lua Redis Lua script for token-bucket rate limiting + CB admission.
lambdas/https-client-lambda/src/lua.d.ts Type declaration for importing .lua as text.
lambdas/https-client-lambda/src/index.ts Lambda entrypoint delegating to record processor.
lambdas/https-client-lambda/src/handler.ts Main SQS batch handler: load config, sign, gate, deliver, retry/DLQ, metrics.
lambdas/https-client-lambda/src/__tests__/tls-agent-factory.test.ts Unit tests for TLS agent factory behavior (S3/SecretsManager/pinning).
lambdas/https-client-lambda/src/__tests__/ssm-applications-map.test.ts Unit tests for SSM applications map loading/caching/errors.
lambdas/https-client-lambda/src/__tests__/sqs-visibility.test.ts Unit tests for SQS visibility changes.
lambdas/https-client-lambda/src/__tests__/retry-policy.test.ts Unit tests for retry policy helpers.
lambdas/https-client-lambda/src/__tests__/payload-signer.test.ts Unit tests for payload signing.
lambdas/https-client-lambda/src/__tests__/index.test.ts Unit test for lambda entrypoint wiring.
lambdas/https-client-lambda/src/__tests__/https-client.test.ts Unit tests for delivery HTTP behavior classification.
lambdas/https-client-lambda/src/__tests__/handler.test.ts Unit tests for SQS record processing paths (DLQ/retry/gate/CB).
lambdas/https-client-lambda/src/__tests__/endpoint-gate.test.ts Unit tests for Redis Lua invocation paths and client creation.
lambdas/https-client-lambda/src/__tests__/dlq-sender.test.ts Unit tests for DLQ sender.
lambdas/https-client-lambda/src/__tests__/delivery-metrics.test.ts Unit tests for embedded metrics behavior.
lambdas/https-client-lambda/src/__tests__/config-loader.test.ts Unit tests for S3 config loading + TTL cache.
lambdas/https-client-lambda/package.json New lambda workspace package definition.
lambdas/https-client-lambda/lua-transform.js Jest transform to load .lua scripts as strings.
lambdas/https-client-lambda/jest.config.ts Jest config enabling .lua transform.
lambdas/client-transform-filter-lambda/src/services/observability.ts Removes callback signing observability (signing moved downstream).
lambdas/client-transform-filter-lambda/src/services/config-loader.ts Switches to shared config-cache package import.
lambdas/client-transform-filter-lambda/src/services/config-loader-service.ts Switches to shared config-cache package import.
lambdas/client-transform-filter-lambda/src/index.ts Removes ApplicationsMapService wiring from handler creation.
lambdas/client-transform-filter-lambda/src/handler.ts Removes per-target signatures from output; outputs deliverable payload + subscriptions only.
lambdas/client-transform-filter-lambda/src/tests/services/payload-signer.test.ts Removes tests for payload signing in transform-filter lambda.
lambdas/client-transform-filter-lambda/src/tests/services/config-update.component.test.ts Updates imports to new config-cache package.
lambdas/client-transform-filter-lambda/src/tests/services/config-loader.test.ts Updates imports to new config-cache package.
lambdas/client-transform-filter-lambda/src/tests/services/config-cache.test.ts Updates imports to new config-cache package.
lambdas/client-transform-filter-lambda/src/tests/index.test.ts Updates tests to reflect removal of signatures and apps map dependency.
lambdas/client-transform-filter-lambda/src/tests/index.component.test.ts Updates component tests to reflect new payload shape and removed SSM dependency.
lambdas/client-transform-filter-lambda/src/tests/helpers/client-subscription-fixtures.ts Updates target fixtures to include required mtls/certPinning.
lambdas/client-transform-filter-lambda/package.json Adds config-cache workspace dependency.
knip.ts Updates Knip workspace config (new workspaces and integration entrypoints).
infrastructure/terraform/modules/client-destination/variables.tf Removes legacy API Destination-based module.
infrastructure/terraform/modules/client-destination/module_target_dlq.tf Removes legacy per-target DLQ module.
infrastructure/terraform/modules/client-destination/locals.tf Removes legacy locals for old module.
infrastructure/terraform/modules/client-destination/iam_role_api_target_role.tf Removes legacy IAM role/policy for API destinations.
infrastructure/terraform/modules/client-destination/cloudwatch_event_rule_main.tf Removes legacy EventBridge rule/target to API destination setup.
infrastructure/terraform/modules/client-destination/cloudwatch_event_connection_main.tf Removes legacy EventBridge Connection resources.
infrastructure/terraform/modules/client-destination/cloudwatch_event_api_destination_this.tf Removes legacy API Destination resources.
infrastructure/terraform/modules/client-destination/README.md Removes docs for legacy module.
infrastructure/terraform/modules/client-delivery/variables.tf Adds new per-client delivery module variables.
infrastructure/terraform/modules/client-delivery/outputs.tf Adds outputs for per-client queues and lambda details.
infrastructure/terraform/modules/client-delivery/module_sqs_per_client.tf Provisions per-client delivery SQS queue.
infrastructure/terraform/modules/client-delivery/module_https_client_lambda.tf Provisions per-client https-client lambda + SQS event source mapping.
infrastructure/terraform/modules/client-delivery/module_dlq_per_client.tf Provisions per-client DLQ + DLQ depth alarm.
infrastructure/terraform/modules/client-delivery/locals.tf Defines naming/tagging locals for per-client module.
infrastructure/terraform/modules/client-delivery/iam_role_sqs_target.tf IAM policy for https-client lambda (SQS/S3/SSM/KMS and optional SecretsManager/S3 cert).
infrastructure/terraform/modules/client-delivery/cloudwatch_event_rule_per_subscription.tf Defines EventBridge rules/targets routing to per-client SQS delivery queue.
infrastructure/terraform/modules/client-delivery/README.md Adds docs for new per-client delivery module.
infrastructure/terraform/components/callbacks/variables.tf Enables X-Ray by default and adds mTLS-related component variables.
infrastructure/terraform/components/callbacks/pipes_pipe_main.tf Removes signatures from Pipe output template.
infrastructure/terraform/components/callbacks/module_mock_webhook_alb_mtls.tf Adds internal ALB + ACM import + passthrough mTLS wiring for mock webhook.
infrastructure/terraform/components/callbacks/module_client_destination.tf Removes legacy client_destination module usage.
infrastructure/terraform/components/callbacks/module_client_delivery.tf Adds per-client delivery module instantiation.
infrastructure/terraform/components/callbacks/locals.tf Reworks locals for per-client subscriptions/targets and mTLS mock endpoint selection.
infrastructure/terraform/components/callbacks/elasticache_delivery_state.tf Adds ElastiCache Serverless Redis for delivery state + security groups/alarms.
infrastructure/terraform/components/callbacks/cloudwatch_metric_alarm_dlq_depth.tf Removes legacy per-target DLQ alarm (replaced by per-client alarms).
infrastructure/terraform/components/callbacks/cloudwatch_eventbus_main.tf Adds EventBridge archive for 7-day retention.
infrastructure/terraform/components/callbacks/README.md Updates docs to reflect new modules, variables, and defaults.
eslint.config.mjs Ignores lua-transform.js and expands rule scope to include src workspaces.
.gitleaksignore Adds ignore for test private key fixture in tls-agent-factory tests.
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported
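
The summary describes `admit.lua` as a token-bucket rate limiter. As a rough illustration of that technique only, here is an in-process sketch. The class name, parameters and clock handling are assumptions made for the example; the real script runs inside Redis and its logic is not shown on this page.

```typescript
// In-memory token bucket; names and parameters are illustrative, not taken from admit.lua.
export class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = nowMs;
  }

  // Refills according to elapsed time, then consumes one token if available.
  tryAdmit(nowMs: number): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Usage: a burst of 2 is admitted, the third request must wait for a refill.
const bucket = new TokenBucket(2, 1, 0);
const decisions = [bucket.tryAdmit(0), bucket.tryAdmit(0), bucket.tryAdmit(0), bucket.tryAdmit(1000)];
console.log(decisions);
```

Passing the clock in as an argument keeps the sketch deterministic in tests; a Redis-side version would also need to update the bucket atomically, which is one reason to implement it as a Lua script.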


Comment thread lambdas/https-client-lambda/src/services/delivery/https-client.ts
Comment thread lambdas/https-client-lambda/src/services/delivery/https-client.ts
Comment thread tools/client-subscriptions-management/src/entrypoint/cli/targets-set-pinning.ts Outdated
Comment thread infrastructure/terraform/modules/client-delivery/module_https_client_lambda.tf Outdated
Comment thread lambdas/https-client-lambda/src/services/delivery/tls-agent-factory.ts Outdated
@mjewildnhs mjewildnhs marked this pull request as draft April 15, 2026 10:45
Contributor

@aidenvaines-cgi left a comment

The TF stuff all looks cool from a rough check.

Comment thread infrastructure/terraform/components/callbacks/elasticache_delivery_state.tf Outdated
Comment thread infrastructure/terraform/components/callbacks/module_client_destination.tf Outdated
Comment thread infrastructure/terraform/modules/client-delivery/module_sqs_per_client.tf Outdated
Comment thread infrastructure/terraform/components/callbacks/variables.tf Outdated
Comment thread lambdas/https-client-lambda/src/__tests__/handler.test.ts
Comment thread lambdas/https-client-lambda/src/services/delivery/retry-policy.ts
Comment thread lambdas/https-client-lambda/src/services/delivery/tls-agent-factory.ts Outdated
Comment thread lambdas/https-client-lambda/src/services/delivery/tls-agent-factory.ts Outdated
Comment thread lambdas/https-client-lambda/src/services/endpoint-gate.ts Outdated
Comment thread lambdas/https-client-lambda/src/__tests__/handler.test.ts Outdated
Comment thread lambdas/https-client-lambda/src/__tests__/payload-signer.test.ts
Contributor

@mjewildnhs left a comment

As far as reviewing the terraform

Comment thread infrastructure/terraform/components/callbacks/elasticache_delivery_state.tf Outdated
Comment thread .gitleaksignore
Comment thread infrastructure/terraform/components/callbacks/elasticache_delivery_state.tf Outdated
Comment thread infrastructure/terraform/modules/client-delivery/module_https_client_lambda.tf Outdated
Comment thread infrastructure/terraform/modules/client-delivery/module_https_client_lambda.tf Outdated
Comment thread infrastructure/terraform/modules/client-delivery/outputs.tf Outdated
Comment thread infrastructure/terraform/modules/client-delivery/variables.tf Outdated
Comment thread infrastructure/terraform/modules/client-delivery/variables.tf Outdated
Contributor

@mjewildnhs left a comment

Done transform filter lambda changes - in middle of http lambda

Comment thread lambdas/client-transform-filter-lambda/src/index.ts
Comment thread lambdas/client-transform-filter-lambda/package.json Outdated
Comment thread lambdas/https-client-lambda/src/services/admit.lua
Comment thread lambdas/https-client-lambda/src/handler.ts Outdated
async function processRecord(record: SQSRecord): Promise<void> {
  const { CLIENT_ID } = process.env;
  if (!CLIENT_ID) {
    throw new Error("CLIENT_ID is required");
Contributor

If this is misconfigured then the behaviour will be undesirable - SQS will just keep retrying it 100 times.
However it will never happen so I don't think it's worth explicitly handling it and treating the events as permanent failures (by sending to the DLQ).

Contributor Author

Agreed

Contributor

Re-opening as had a config issue in test and couldn't stop it!
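
One possible shape for the alternative discussed in this thread, sketched under assumptions: the error class `PermanentFailureError` and the helpers `requireEnv` and `dispositionFor` are hypothetical names invented for this example, not identifiers from the PR. The idea is to tag failures that retrying cannot fix so the handler can route them to the DLQ instead of letting SQS redeliver them.

```typescript
// Hypothetical error type marking failures that retrying cannot fix.
export class PermanentFailureError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "PermanentFailureError";
  }
}

export type Disposition = "dlq" | "retry";

// Reads a required variable from an env-like map, failing permanently when absent.
export function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (!value) throw new PermanentFailureError(`${name} is required`);
  return value;
}

// Decides what the handler should do with a record whose processing threw.
export function dispositionFor(error: unknown): Disposition {
  return error instanceof PermanentFailureError ? "dlq" : "retry";
}
```

Whether misconfiguration should count as permanent is exactly the judgment call the thread is debating; the sketch only shows that the distinction can be made in one place.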

Comment thread lambdas/https-client-lambda/src/handler.ts Outdated
Comment thread lambdas/https-client-lambda/src/services/delivery/https-client.ts Outdated
Comment thread lambdas/https-client-lambda/src/services/delivery/retry-policy.ts Outdated
Comment thread lambdas/https-client-lambda/src/services/delivery/tls-agent-factory.ts Outdated
Comment thread lambdas/https-client-lambda/src/services/config-loader.ts
Comment thread lambdas/https-client-lambda/src/services/logger.ts Outdated
Comment thread lambdas/https-client-lambda/src/handler.ts Outdated
Comment thread lambdas/https-client-lambda/src/handler.ts Outdated
Contributor

@mjewildnhs left a comment

Middle of http client lambda review - done with the handler, client, retry policy

Comment thread lambdas/https-client-lambda/src/handler.ts Outdated
Comment thread lambdas/https-client-lambda/src/handler.ts Outdated
Comment thread lambdas/https-client-lambda/src/handler.ts Outdated
Comment thread lambdas/https-client-lambda/src/handler.ts Outdated
Comment thread lambdas/https-client-lambda/src/services/delivery/tls-agent-factory.ts Outdated
    ELASTICACHE_IAM_USERNAME = var.elasticache_iam_username
  }

  vpc_config = var.lambda_security_group_id != "" ? {
Contributor

Where is the security group defined?
Am I missing something?
We need to consider what the lambda can talk to and on what ports - the lambda supports a port being in the webhook URL.

Contributor Author

The security group is defined in the component (components/callbacks/elasticache_delivery_state.tf), not in this module. It's created there as aws_security_group.https_client_lambda alongside the ElastiCache security group, and its ID is passed into this module via var.lambda_security_group_id (line 58), which is then attached through the vpc_config block.

Fixed port mapping to allow any port since this is configurable per-client.

Contributor Author

Resolved

Contributor

Ahh, I wasn't looking in elasticache_delivery_state for lambda -> webhook permissions, but it makes sense why it's the way it is.
I've unresolved this for a @cgitim opinion on which ports we open up.
I think it's safe to allow any port because it ultimately comes from tightly restricted config.
It would be better if we could limit to 443, but I don't suppose we have any kind of requirements on it.
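
If the team did decide to restrict ports, one option alongside the security group rule would be a check when the target config is written. The following is a sketch only; `effectivePort` and `isPortAllowed` are hypothetical helpers, not functions from this PR.

```typescript
// Returns the port a webhook URL would connect to, applying the https default.
export function effectivePort(webhookUrl: string): number {
  const url = new URL(webhookUrl);
  if (url.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  // URL.port is an empty string when the scheme's default port is used.
  return url.port === "" ? 443 : Number(url.port);
}

// Checks the effective port against an allow-list supplied by the caller.
export function isPortAllowed(webhookUrl: string, allowedPorts: ReadonlySet<number>): boolean {
  return allowedPorts.has(effectivePort(webhookUrl));
}
```

A config-time check of this kind would reject an unexpected port before it ever reached the lambda, while the security group would remain the enforcement point at the network level.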

Comment thread lambdas/https-client-lambda/src/services/delivery/https-client.ts
Comment thread lambdas/https-client-lambda/src/services/delivery/retry-policy.ts Outdated
Comment thread lambdas/https-client-lambda/src/services/delivery/tls-agent-factory.ts Outdated
Comment thread lambdas/https-client-lambda/src/services/delivery/tls-agent-factory.ts Outdated
}

async function loadCertMaterial(): Promise<CertMaterial> {
  const isProduction = Boolean(MTLS_CERT_SECRET_ARN);
Contributor

I think we should revisit the decision to have the difference in cert handling between environments.
It adds a lot of complexity and unless we always test for it in non-prod there's a big danger we won't realize it's broken until we've released it.
I seem to remember it was because we were concerned about costs for ephemeral environments.
It's priced per secret per month (at $0.40) - doesn't that mean we'll only be billed for a fraction of that based on how long the env was deployed?
https://aws.amazon.com/secrets-manager/pricing/
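
To put a rough number on the question, here is a small worked calculation. It assumes the charge is pro-rated by the hour and uses a 30-day month; both are assumptions for the sketch and worth confirming against the pricing page linked above. The function name is hypothetical.

```typescript
const PRICE_PER_SECRET_PER_MONTH_USD = 0.4; // figure quoted in the comment
const HOURS_PER_MONTH = 30 * 24; // assumption: 30-day month

// Estimated charge for secrets that only exist for part of a month.
export function proRatedSecretCostUsd(secretCount: number, hoursDeployed: number): number {
  return secretCount * PRICE_PER_SECRET_PER_MONTH_USD * (hoursDeployed / HOURS_PER_MONTH);
}

// One secret in an ephemeral environment that lives for two days.
console.log(proRatedSecretCostUsd(1, 48).toFixed(4));
```

Under those assumptions a two-day environment with one secret would cost under three cents, which supports the reviewer's point that cost alone may not justify a separate S3 code path.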

Contributor

At least loadFromSecretsManager is trivial, but the loadFromS3 function is currently awful.
There's a tonne of env vars as well backing the S3 solution.

Comment thread lambdas/https-client-lambda/src/services/delivery/tls-agent-factory.ts Outdated
Contributor

@mjewildnhs left a comment

Reviewed the http-lambda (minus unit tests)

Comment thread lambdas/https-client-lambda/src/services/config-loader.ts
Comment thread lambdas/https-client-lambda/src/services/admit.lua
Comment thread lambdas/https-client-lambda/src/services/record-result.lua
Comment thread lambdas/https-client-lambda/src/services/endpoint-gate.ts Outdated
Comment thread lambdas/https-client-lambda/src/services/ssm-applications-map.ts
Comment thread lambdas/mock-webhook-lambda/src/index.ts
Comment thread lambdas/mock-webhook-lambda/src/index.ts
Comment thread src/models/src/client-config.ts Outdated
Comment thread src/models/src/client-config-schema.ts Outdated
Comment thread tools/client-subscriptions-management/src/entrypoint/cli/targets-set-mtls.ts Outdated
Comment thread tools/client-subscriptions-management/src/entrypoint/cli/targets-set-mtls.ts Outdated
Comment thread tools/client-subscriptions-management/src/entrypoint/cli/targets-set-mtls.ts Outdated
Comment thread tools/client-subscriptions-management/src/entrypoint/cli/targets-set-pinning.ts Outdated
Comment thread tools/client-subscriptions-management/src/entrypoint/cli/clients-put.ts Outdated
Comment thread lambdas/https-client-lambda/src/services/delivery/https-client.ts Outdated
Contributor

All looks very promising - but where are the tests?!
Happy to leave them out till we get something reasonable looking deployed.

@rhyscoxnhs rhyscoxnhs force-pushed the feature/CCM-16073 branch 2 times, most recently from 71c3f08 to 5f9b348 Compare April 17, 2026 08:08
Comment thread lambdas/https-client-lambda/src/services/delivery-observability.ts
Comment thread knip.ts Outdated
Comment thread lambdas/https-client-lambda/src/services/admit.lua
Comment thread lambdas/https-client-lambda/src/services/endpoint-gate.ts Outdated
Comment thread lambdas/https-client-lambda/src/handler.ts Outdated
Comment thread lambdas/https-client-lambda/src/handler.ts
variable "sqs_max_receive_count" {
  type        = number
  description = "Maximum receive count before message moves to DLQ"
  default     = 100
Contributor

Did we have a max retry attempts in the spec?

Contributor Author

Doesn't look like it, should we omit this?

Contributor

We need something set for when things go badly wrong - any unexpected errors (e.g. redis connection failures) currently retry up to this limit.
I guess we just need the 100 to cover a reasonable amount of retry attempts for transient errors.
I guess the circuit should break and prevent us burning through the attempts too quickly.
copying @cgitim in if he has any opinion - given his research paper on rate limiting 😉
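
To reason about whether 100 receives is "a reasonable amount", it can help to see how much wall-clock time that many attempts would span under a capped exponential backoff. The sketch below uses illustrative parameters (1 second base, 300 second cap); the PR's actual backoff values are not shown in this thread, and the function names are hypothetical.

```typescript
// Delay before a given attempt (1-based), doubling each time up to a cap.
export function backoffSeconds(attempt: number, baseSeconds: number, capSeconds: number): number {
  return Math.min(capSeconds, baseSeconds * 2 ** (attempt - 1));
}

// Total delay accumulated across `maxReceiveCount` attempts.
export function totalRetryWindowSeconds(
  maxReceiveCount: number,
  baseSeconds: number,
  capSeconds: number,
): number {
  let total = 0;
  for (let attempt = 1; attempt <= maxReceiveCount; attempt += 1) {
    total += backoffSeconds(attempt, baseSeconds, capSeconds);
  }
  return total;
}

// With the illustrative values, attempts 1-9 sum to 511 s and the remaining 91 are capped at 300 s each.
console.log(totalRetryWindowSeconds(100, 1, 300)); // 27811
```

With these example values the 100 attempts would spread over roughly 7.7 hours, so the receive count acts as a backstop rather than the primary limit, provided failures are actually backed off rather than retried immediately.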

Contributor

@mjewildnhs left a comment

Reviewed

Comment thread lambdas/https-client-lambda/src/services/delivery/retry-policy.ts Outdated
Comment thread lambdas/https-client-lambda/src/services/delivery/https-client.ts Outdated
rhyscoxnhs and others added 4 commits April 22, 2026 08:15
* ALB/webhook uses https for mTLS and non mTLS - remove http endpoint

* Log thrown errors in http-client-lambda

* Update int test debug script

* Update SQS to webhook int tests to use correct queues

* Update debug int script README

* Fix redis script error and better logging for redis errors

* Log status code of perm failures

* fix: load test CA for server trust when mtls is disabled

In test environments, the mock webhook ALB uses a server certificate
signed by a locally-generated test CA. Previously, the CA was only
loaded into the TLS agent when mtls.enabled was true (needed for
client certificate auth). Targets with mtls.enabled: false used
Node's default trust store, which does not include the test CA,
causing every delivery attempt to fail with SELF_SIGNED_CERT_IN_CHAIN.

Fix by loading the CA whenever MTLS_TEST_CA_S3_KEY is set, regardless
of mtls.enabled. The client key and cert are still only applied when
mtls.enabled is true. MTLS_TEST_CA_S3_KEY is not set in production,
so non-mTLS targets in production are unaffected.

* fix: set ERROR_CODE and ERROR_MESSAGE on DLQ messages for permanent delivery failures

AWS API Destinations previously set these SQS message attributes automatically.
After the migration to the https-client lambda, they were no longer being set.

- Read the response body for 4xx responses (previously discarded with res.resume())
  so the error message can be included in the DLQ message attributes
- Set ERROR_CODE=HTTP_CLIENT_ERROR for 4xx webhook rejections, or the TLS error
  code (e.g. CERT_HAS_EXPIRED) for connection-level failures
- Set ERROR_MESSAGE to the message field from the JSON response body, falling
  back to the raw body if not valid JSON
- Extended sendToDlq to accept and forward MessageAttributes to SQS

* Fix redrive IT dlq names

* Fix metric IT dlq names

* Fix alarm test

* Bump test coverage

* Fix redis client IAM / connectivity issues + logging improvements
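
The "load test CA for server trust when mtls is disabled" commit above describes a rule that is easy to get subtly wrong, so here is a minimal sketch of it as a pure function. The types and the name `buildTlsOptions` are assumptions for illustration and are not taken from `tls-agent-factory.ts`; the `ca`, `key` and `cert` fields correspond to the standard Node.js TLS options of the same names.

```typescript
export interface TlsMaterial {
  testCaPem?: string; // only supplied in test environments, per the commit message
  clientKeyPem?: string;
  clientCertPem?: string;
}

export interface TlsOptions {
  ca?: string;
  key?: string;
  cert?: string;
}

export function buildTlsOptions(mtlsEnabled: boolean, material: TlsMaterial): TlsOptions {
  const options: TlsOptions = {};
  // Trust the test CA whenever it is supplied, regardless of the mTLS setting.
  if (material.testCaPem) options.ca = material.testCaPem;
  // Present a client certificate only when the target has mTLS enabled.
  if (mtlsEnabled) {
    if (!material.clientKeyPem || !material.clientCertPem) {
      throw new Error("mTLS is enabled but the client key or certificate is missing");
    }
    options.key = material.clientKeyPem;
    options.cert = material.clientCertPem;
  }
  return options;
}
```

Separating server trust (`ca`) from client identity (`key` and `cert`) is the substance of the fix: the first depends on the environment, the second on the target's `mtls.enabled` flag.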
Comment thread lambdas/https-client-lambda/src/handler.ts
Comment thread lambdas/https-client-lambda/src/handler.ts
* CCM-16002 - Revised performance test implementation

metricsInstance = createMetricsLogger();
metricsInstance.setNamespace(namespace);
metricsInstance.setDimensions({ Environment: environment });
Contributor

Do we need ClientId as an additional dimension so we can filter metrics by client? I don't suppose we have anything defined in the spec or anything from any observability conversations? @cgitim
If we do add it then it will have an impact on cost, as it's $0.30 per metric: we currently have 10 metrics, so with 100 clients that's $300 vs $3.
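
The arithmetic behind those figures, as a small sketch. It assumes each distinct dimension value produces a separately billed custom metric per metric name and uses the $0.30 monthly figure quoted in the comment; the function name is hypothetical.

```typescript
const PRICE_PER_METRIC_USD = 0.3; // monthly figure quoted in the comment

// Each distinct dimension value creates a separate custom metric per metric name.
export function monthlyMetricCostUsd(metricNames: number, dimensionValues: number): number {
  return metricNames * dimensionValues * PRICE_PER_METRIC_USD;
}

const withoutClientId = monthlyMetricCostUsd(10, 1); // Environment dimension only
const withClientId = monthlyMetricCostUsd(10, 100); // one series per client
console.log(withoutClientId.toFixed(2), withClientId.toFixed(2));
```

The cost scales linearly with the number of clients, so the trade-off is between per-client filtering and a bill that grows with every onboarded client.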

Comment on lines +262 to +267
it("accepts maxRetryDurationSeconds below 60", () => {
  const config = createValidConfig();
  config.targets[0].delivery = { maxRetryDurationSeconds: 10 };

  expect(parseClientSubscriptionConfiguration(config).success).toBe(true);
});
Contributor

I think we should remove this test as it's a bit redundant (arbitrarily testing 10 seconds).

Contributor Author

I think this was purely to not reduce coverage, since Sonar can be a bit strict about that

  metrics.putMetric("DeliveryPermanentFailure", 1, Unit.Count);
}

export function emitRateLimited(targetId: string): void {
Contributor

The metrics DeliveryRateLimited and AdmissionDenied are a bit confusing.
DeliveryRateLimited is when the webhook responds with a 429 and AdmissionDenied could be us rate limiting or circuit breaking.
I think we should have something like:

  • DeliveryRateLimited -> DeliveryServerRateLimited
  • AdmissionDenied (reason "rate_limited") -> DeliveryRateLimited
  • AdmissionDenied (reason "circuit_open") -> DeliveryCircuitBlocked
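
The proposed renaming could be captured in a single lookup so the reason-to-metric mapping lives in one place. This is a sketch of the suggestion only; the type and function names are hypothetical, and the metric names are the ones proposed in this comment, not necessarily what was merged.

```typescript
export type AdmissionDenialReason = "rate_limited" | "circuit_open";

// Proposed names from the comment above.
const DENIAL_METRIC_NAMES: Record<AdmissionDenialReason, string> = {
  rate_limited: "DeliveryRateLimited",
  circuit_open: "DeliveryCircuitBlocked",
};

// Emitted when the webhook itself responds with a 429.
export const SERVER_RATE_LIMITED_METRIC = "DeliveryServerRateLimited";

export function metricNameForDenial(reason: AdmissionDenialReason): string {
  return DENIAL_METRIC_NAMES[reason];
}
```

Typing the map as a `Record` over the reason union means the compiler would flag a new denial reason that has no metric name.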

Contributor

When the IT and final perf test stuff has merged in we should get AI to assess the commonality in helpers between the 2 and put common code in test-support.

    "@nhs-notify-client-callbacks/models": "workspace:*",
    "@nhs-notify-client-callbacks/test-support": "workspace:*",
    "async-wait-until": "catalog:app",
    "esbuild": "catalog:tools"
Contributor

Is this not a dev dependency?

const startSec = Math.floor(testStartMs / 1000);
const endSec = Math.floor(Date.now() / 1000);

const snap = await queryMetricsSnapshot(
Contributor

Can we refactor some of this to cut down on the duplication in query code within the loop and the final snapshot?

}),
);

if (!queryId) return null;
Contributor

Can we refactor away some of the duplication between lines 60-73 and 96-109 (the polling query result code)?
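
One way the duplicated polling could be factored out is a small generic helper that both call sites share. This is a sketch under assumptions: `pollUntil` and its options are hypothetical, and the actual query code being polled is not visible on this page.

```typescript
// Calls `attempt` until it yields a value or the attempt budget is spent.
export async function pollUntil<T>(
  attempt: () => Promise<T | undefined>,
  options: { intervalMs: number; maxAttempts: number },
): Promise<T | undefined> {
  for (let i = 0; i < options.maxAttempts; i += 1) {
    const result = await attempt();
    if (result !== undefined) return result;
    // Sleep between attempts, but not after the final one.
    if (i < options.maxAttempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
    }
  }
  return undefined;
}
```

Each call site would then pass its own closure that checks the query status and returns the parsed result once it is ready, or `undefined` while the query is still running.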

* Rename mock-client-1/2 and add mtls test

* Retry window and exhaustion test

* DLQ redrive delivery queue test

* Metric test coverage

* Rate limit tests

* Fix: reset metrics per invocation

* Fix new tests to use UUIDs as message ids rather than timestamps

* Add correlation id to logging

* Batch count logging

* Circuit breaker test

* Integration test tidy up refactor

* Log receive count and sqsMessageId

* Fix batch success count on DLQ

* Log receive count for all messages
"enabled": true
},
"maxRetryDurationSeconds": 7200,
"mtls": {
Contributor

I think we should turn on mtls and pinning for all the perf clients as this will be the slowest config.


6 participants