Conversation
There was a problem hiding this comment.
Summary
This PR adds a solid observability baseline with CloudWatch dashboards, alarms, and structured logging. However, 2 logic errors block merge that will prevent the observability features from functioning correctly.
Critical Issues (Must Fix)
- Lambda log group mismatch: Custom log group won't receive logs due to incorrect naming pattern
- Missing correlation ID in error paths: Breaks request tracing for failed requests
Overall Assessment
Once the logic errors are fixed, this PR delivers the intended observability-as-a-service baseline. The structured logging, alarms, and dashboard implementation are well-designed.
You can now have the agent implement changes and create commits directly on your pull request's source branch. Simply comment with /q followed by your request in natural language to ask the agent to make changes.
| return { | ||
| statusCode: 500, | ||
| headers: { | ||
| 'Content-Type': 'application/json', | ||
| 'Access-Control-Allow-Origin': '*' | ||
| 'Access-Control-Allow-Origin': '*', | ||
| }, |
There was a problem hiding this comment.
🛑 Logic Error: Missing correlation ID in error response breaks observability contract. Error responses should include the same headers as success responses to enable request tracing.
| return { | |
| statusCode: 500, | |
| headers: { | |
| 'Content-Type': 'application/json', | |
| 'Access-Control-Allow-Origin': '*' | |
| 'Access-Control-Allow-Origin': '*', | |
| }, | |
| const errorCorrelationId = | |
| event.headers?.['x-correlation-id'] || | |
| event.headers?.['X-Correlation-Id'] || | |
| context.awsRequestId; | |
| return { | |
| statusCode: 500, | |
| headers: { | |
| 'Content-Type': 'application/json', | |
| 'Access-Control-Allow-Origin': '*', | |
| 'x-correlation-id': errorCorrelationId, | |
| }, | |
| body: JSON.stringify({ error: 'Internal Server Error' }), | |
| }; |
| const lambdaAppLogs = new logs.LogGroup(this, 'LambdaApplicationLogs', { | ||
| logGroupName: `/aws/lambda/${lambda_backend.functionName}-application`, | ||
| encryptionKey, | ||
| retention: logs.RetentionDays.ONE_MONTH, | ||
| removalPolicy: RemovalPolicy.RETAIN, | ||
| }); |
There was a problem hiding this comment.
🛑 Logic Error: Custom log group is created but Lambda won't write to it. Lambda writes to /aws/lambda/${lambda_backend.functionName} by default, not -application suffix. To use this log group, either configure Lambda's logGroup property or use metric filters on the default log group.
| const lambdaAppLogs = new logs.LogGroup(this, 'LambdaApplicationLogs', { | |
| logGroupName: `/aws/lambda/${lambda_backend.functionName}-application`, | |
| encryptionKey, | |
| retention: logs.RetentionDays.ONE_MONTH, | |
| removalPolicy: RemovalPolicy.RETAIN, | |
| }); | |
| const lambdaAppLogs = new logs.LogGroup(this, 'LambdaApplicationLogs', { | |
| logGroupName: `/aws/lambda/${lambda_backend.functionName}`, | |
| encryptionKey, | |
| retention: logs.RetentionDays.ONE_MONTH, | |
| removalPolicy: RemovalPolicy.RETAIN, | |
| }); |
Motivation
Description
lib/cdk-app-stack.ts: importedaws-cloudwatch,aws-cloudwatch-actions, andaws-sns, and provisioned an encrypted SNS topicObservabilityAlarmTopic.LambdaErrorsAlarm), Lambda p95 duration (LambdaDurationP95Alarm), and API Gateway 5xx (Api5xxAlarm) and wired them to the SNS topic viaSnsAction.PlatformObservabilityDashboardwith widgets for Lambda invocations/errors, Lambda duration (p50/p95), API requests/5xx, and API latency (p50/p95).LambdaApplicationLogsand exported stack outputsObservabilityDashboardName,ObservabilityAlarmTopicArn, andLambdaApplicationLogGroupNamefor discoverability.APP_LOG_LEVELandSERVICE_NAME, and refactored the handler inlib/function.tsto emit structured JSON logs, aloghelper, and to propagate a correlation ID in responses (x-correlation-id).docs/observability-as-a-service.mddescribing assessment, implemented baseline, and recommended next steps, and updateddocs/platform-product-progress.mdto reflect the maturity increment.Testing
npm run buildwhich completed successfully (TypeScript compile succeeded).npm run synthwhich completed successfully and produced a CloudFormation template (synth succeeded with non-blocking tooling/runtime warnings).Codex Task