Skip to content

Automated Container Publishing to AWS#8

Merged
cabutlermit merged 5 commits intomainfrom
USE-428-ecr-deployment
Mar 6, 2026
Merged

Automated Container Publishing to AWS#8
cabutlermit merged 5 commits intomainfrom
USE-428-ecr-deployment

Conversation

@cabutlermit
Copy link
Contributor

@cabutlermit cabutlermit commented Mar 5, 2026

Purpose and background context

DiscoEng is building a new Lambda function that will be built from a containerized image. Infra has built the ECR Repositories in dev, stage, and prod AWS Accounts to store the image, so this application repository needs the standard GitHub Actions workflows and the additional commands for the Makefile to enable the developer to easily deploy test images to Dev1.

There are a number of warnings from qltysh[bot] regarding the security in the GitHub Actions workflows. This should be addressed at some point in the future for all of our repositories, but these are not super dangerous for now.

How can a reviewer manually see the effects of these changes?

Makefile

It's possible to test the new Makefile commands by checking out this branch and running the check-arch, dist-dev, publish-dev, and docker-clean commands. The update-lambda-dev command will not work until Infra has built out additional infrastructure for the Lambda function itself.

GitHub Actions

It is possible to verify that the dev-build GHA workflow triggered correctly by looking at the Actions tab for this repository (or just click here to see the actual run). It is also possible to verify the workflow_dispatch option for the dev-build workflow by manually triggering the workflow via the Actions UI in this repository.

NOTE: The automated dev-build Action run did "fail", but that was expected. The update-lambda-function step in the workflow will not be successful until InfraEng builds out the other infrastructure for the Lambda function itself (see note above about the same command in the Makefile).

Includes new or updated dependencies?

YES: Depends on shared container publishing workflows in our .github repository

Changes expectations for external applications?

NO

What are the relevant tickets?

Code review

  • Code review best practices are documented here and you are encouraged to have a constructive dialogue with your reviewers about their preferences and expectations.

JPrevost and others added 3 commits March 5, 2026 17:22
Why these changes are being introduced:
All containerized applications, whether intended for ECS or Lambda,
give the developer the option to push an updated container directly to
the ECR Repository in Dev1 from their local machine. These commands are
generated by the mitlib-tf-workloads-ecr repository that creates the
ECR Repository in AWS and are put in the application Makefile.

How this addresses that need:
* Update the Makefile with the commands generated by HCP Terraform run
of mitlib-tf-workloads-ecr

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-428
Why these changes are being introduced:
There are minor tweaks to the default ci.yml workflow. Additionally, now
that the ECR Repositories exist in dev, stage, and prod AWS Accounts,
for this application, we use the GHA workflow files from the outputs
of mitlib-tf-worklads-ecr to create the files here.

How this addresses that need:
* Switch the ci.yml trigger from on.push to on.pull-request
* Filter out changes inside the .github directory for the ci.yml
workflow
* Add a permissions statement that just gives "read" access for the
ci.yml workflow
* Create the dev-build workflow
* Create the stage-build workflow
* Create the prod-deploy workflow

Side effects of this change:
None.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-428
@qltysh
Copy link

qltysh bot commented Mar 5, 2026

❌ 10 blocking issues (10 total)

Tool Category Rule Count
zizmor Vulnerability overly broad permissions 4
zizmor Vulnerability credential persistence through GitHub Actions artifacts 3
zizmor Vulnerability secrets unconditionally inherited by called workflow 3

paths-ignore:
- '.github/**'

permissions: read-all
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overly broad permissions [zizmor:zizmor/excessive-permissions]

- '.github/**'

permissions:
id-token: write
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overly broad permissions [zizmor:zizmor/excessive-permissions]

outputs:
cpuarch: ${{ steps.setarch.outputs.cpuarch }}
steps:
- name: Checkout
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

credential persistence through GitHub Actions artifacts [zizmor:zizmor/artipacked]

deploy:
needs: prep
name: Dev Deploy
uses: mitlibraries/.github/.github/workflows/ecr-multi-arch-deploy-dev.yml@main
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secrets unconditionally inherited by called workflow [zizmor:zizmor/secrets-inherit]

types: [published]

permissions:
id-token: write
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overly broad permissions [zizmor:zizmor/excessive-permissions]

outputs:
cpuarch: ${{ steps.setarch.outputs.cpuarch }}
steps:
- name: Checkout
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

credential persistence through GitHub Actions artifacts [zizmor:zizmor/artipacked]

deploy:
needs: prep
name: Deploy
uses: mitlibraries/.github/.github/workflows/ecr-multi-arch-promote-prod.yml@main
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secrets unconditionally inherited by called workflow [zizmor:zizmor/secrets-inherit]

- '.github/**'

permissions:
id-token: write
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overly broad permissions [zizmor:zizmor/excessive-permissions]

outputs:
cpuarch: ${{ steps.setarch.outputs.cpuarch }}
steps:
- name: Checkout
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

credential persistence through GitHub Actions artifacts [zizmor:zizmor/artipacked]

deploy:
needs: prep
name: Stage Deploy
uses: mitlibraries/.github/.github/workflows/ecr-multi-arch-deploy-stage.yml@main
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secrets unconditionally inherited by called workflow [zizmor:zizmor/secrets-inherit]

@cabutlermit cabutlermit marked this pull request as ready for review March 6, 2026 13:54
@cabutlermit cabutlermit requested a review from JPrevost March 6, 2026 13:54
@JPrevost JPrevost self-assigned this Mar 6, 2026
Copy link
Member

@JPrevost JPrevost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly comments are for future discussions.

One possible change for now would be to add .arch_tag to .gitignore to prevent someone accidentally committing it if we they do a dist-dev without subsequent docker-clean and don't pay attention to what they commit later on.

--tag $(ECR_NAME_DEV):$$ARCH_TAG \
.

publish-dev: dist-dev ## Build, tag and push (intended for developer-based manual publish)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking:

I wonder if we could get a permissions check. I intentionally ran this without credentials and was surprised to see it try to do a lot of work before failing due to lack of credentials. I quick fail would be nice longterm, but clearly not needed now as it is very minor.

Works with expected role.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah... publish-dev starts by re-running dist-dev before attempting to connect to AWS. I'm 100% on board to rethinking these commands, but maybe not for this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not for this pr, and thanks for pointing out what was happening was the dist-dev. We probably don't need to rethink this at all with that in mind.

@@ -1,5 +1,12 @@
name: CI
on: push
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we changing CI to only run on PRs? This doesn't feel related to the ECR work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a result of conversations a month or two ago when Graham & I were working on fixing up another repository. We both agreed that running CI on every single push before a PR was excessive. Once a PR is opened, CI will run on every subsequent push in the PR, which seemed fine.

But your question is more about scope creep... I'm happy to pull this out since it's not directly related to the publishing workflows (and then fix it at a more appropriate time).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't like this change and would prefer to have it removed for now. I'm open to discussing further, but at this point I prefer to see CI turn green before opening a PR

echo "### :abacus: Architecture Selection" >> $GITHUB_STEP_SUMMARY
if [[ -f .aws-architecture ]]; then
ARCH=$(cat .aws-architecture)
echo "\`$ARCH\` was read from \`.aws-architecture\` and passed to the deploy job." >> $GITHUB_STEP_SUMMARY
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this may be why we generate a separate arch_tag file... we need different values in the docker tag than in this parameter here. (I have another comment elsewhere that I'm choosing to leave even though I think this explains it just so you can see what was confusing when reviewing the workflow)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is likely a better way to do this, but it does come down to the fact that we need two different formats to track the architecture. Annoying.

on:
workflow_dispatch:
release:
types: [published]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking:

Longterm I'd like to talk about how we might have an emergency rollback option to get back to a clean deploy if something unexpected goes wrong. This may add some level of risk and uncertainty as to what is in a given tier at any time, but it adds a huge safety net that allow us to get back to a working state rapidly.

One way might be to have a way to deploy the previous-latest tag from ECR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will definitely take some thinking and conversation. Infra has ways to do this via the Console/CLI, but I'm pretty sure that the permissions we grant (in this case, to Timdex_Managers) don't provide rights to do this.

name: Dev Container Build and Deploy
on:
workflow_dispatch:
pull_request:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly nervous having this automatically deploy on pull request could lead to race conditions as to what is in dev1 at any given time. I'm open to giving it a shot though and if we find it confusing we can switch to just workflow_dispatch: for dev1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear your concern on this and would be happy to revisit in the future. So far, we haven't seen a problem with DataEng applications and this automatic deploy on PR, but that doesn't mean that this might not become a problem in the future.

Copy link
Member

@JPrevost JPrevost Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for this lambda it's probably fine.

Do we actually run deploys for PRs opened from renovate and the like or are we somehow limiting to human PRs? If it's just humans, I have no real worry. If bots can overwrite my dev1, it's eventually gonna happen :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shared workflows (in the repository) have this conditional:

    # This line adds a check for the user which is requesting the PR. As long as its not dependabot, we go ahead and run it. 
    if: ${{ github.triggering_actor != 'dependabot[bot]' }}

So, it will not build on PRs from dependabot. I have not addressed Renovate skips yet!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Once I activate renovate on this I'll PR to that or open a ticket.

Add .arch_tag to .gitgnore
@JPrevost
Copy link
Member

JPrevost commented Mar 6, 2026

@cabutlermit I think we're good to go with this. All remaining comments are really to set up future discussions.

Rolls back the change to the CI workflow so that it runs on push again.
@cabutlermit cabutlermit merged commit deb0c4a into main Mar 6, 2026
2 of 3 checks passed
@cabutlermit cabutlermit deleted the USE-428-ecr-deployment branch March 6, 2026 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants