Conversation
Why these changes are being introduced: All containerized applications, whether intended for ECS or Lambda, give the developer the option to push an updated container directly to the ECR Repository in Dev1 from their local machine. These commands are generated by the mitlib-tf-workloads-ecr repository that creates the ECR Repository in AWS and are put in the application Makefile. How this addresses that need: * Update the Makefile with the commands generated by HCP Terraform run of mitlib-tf-workloads-ecr Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/USE-428
Why these changes are being introduced: There are minor tweaks to the default ci.yml workflow. Additionally, now that the ECR Repositories exist in dev, stage, and prod AWS Accounts, for this application, we use the GHA workflow files from the outputs of mitlib-tf-worklads-ecr to create the files here. How this addresses that need: * Switch the ci.yml trigger from on.push to on.pull-request * Filter out changes inside the .github directory for the ci.yml workflow * Add a permissions statement that just gives "read" access for the ci.yml workflow * Create the dev-build workflow * Create the stage-build workflow * Create the prod-deploy workflow Side effects of this change: None. Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/USE-428
❌ 10 blocking issues (10 total)
|
| paths-ignore: | ||
| - '.github/**' | ||
|
|
||
| permissions: read-all |
| - '.github/**' | ||
|
|
||
| permissions: | ||
| id-token: write |
| outputs: | ||
| cpuarch: ${{ steps.setarch.outputs.cpuarch }} | ||
| steps: | ||
| - name: Checkout |
| deploy: | ||
| needs: prep | ||
| name: Dev Deploy | ||
| uses: mitlibraries/.github/.github/workflows/ecr-multi-arch-deploy-dev.yml@main |
| types: [published] | ||
|
|
||
| permissions: | ||
| id-token: write |
| outputs: | ||
| cpuarch: ${{ steps.setarch.outputs.cpuarch }} | ||
| steps: | ||
| - name: Checkout |
| deploy: | ||
| needs: prep | ||
| name: Deploy | ||
| uses: mitlibraries/.github/.github/workflows/ecr-multi-arch-promote-prod.yml@main |
| - '.github/**' | ||
|
|
||
| permissions: | ||
| id-token: write |
| outputs: | ||
| cpuarch: ${{ steps.setarch.outputs.cpuarch }} | ||
| steps: | ||
| - name: Checkout |
| deploy: | ||
| needs: prep | ||
| name: Stage Deploy | ||
| uses: mitlibraries/.github/.github/workflows/ecr-multi-arch-deploy-stage.yml@main |
JPrevost
left a comment
There was a problem hiding this comment.
Mostly comments are for future discussions.
One possible change for now would be to add .arch_tag to .gitignore to prevent someone accidentally committing it if we they do a dist-dev without subsequent docker-clean and don't pay attention to what they commit later on.
| --tag $(ECR_NAME_DEV):$$ARCH_TAG \ | ||
| . | ||
|
|
||
| publish-dev: dist-dev ## Build, tag and push (intended for developer-based manual publish) |
There was a problem hiding this comment.
Non-blocking:
I wonder if we could get a permissions check. I intentionally ran this without credentials and was surprised to see it try to do a lot of work before failing due to lack of credentials. I quick fail would be nice longterm, but clearly not needed now as it is very minor.
Works with expected role.
There was a problem hiding this comment.
yeah... publish-dev starts by re-running dist-dev before attempting to connect to AWS. I'm 100% on board to rethinking these commands, but maybe not for this PR?
There was a problem hiding this comment.
Definitely not for this pr, and thanks for pointing out what was happening was the dist-dev. We probably don't need to rethink this at all with that in mind.
| @@ -1,5 +1,12 @@ | |||
| name: CI | |||
| on: push | |||
There was a problem hiding this comment.
Why are we changing CI to only run on PRs? This doesn't feel related to the ECR work.
There was a problem hiding this comment.
This is a result of conversations a month or two ago when Graham & I were working on fixing up another repository. We both agreed that running CI on every single push before a PR was excessive. Once a PR is opened, CI will run on every subsequent push in the PR, which seemed fine.
But your question is more about scope creep... I'm happy to pull this out since it's not directly related to the publishing workflows (and then fix it at a more appropriate time).
There was a problem hiding this comment.
Yeah, I don't like this change and would prefer to have it removed for now. I'm open to discussing further, but at this point I prefer to see CI turn green before opening a PR
| echo "### :abacus: Architecture Selection" >> $GITHUB_STEP_SUMMARY | ||
| if [[ -f .aws-architecture ]]; then | ||
| ARCH=$(cat .aws-architecture) | ||
| echo "\`$ARCH\` was read from \`.aws-architecture\` and passed to the deploy job." >> $GITHUB_STEP_SUMMARY |
There was a problem hiding this comment.
Ah, this may be why we generate a separate arch_tag file... we need different values in the docker tag than in this parameter here. (I have another comment elsewhere that I'm choosing to leave even though I think this explains it just so you can see what was confusing when reviewing the workflow)
There was a problem hiding this comment.
There is likely a better way to do this, but it does come down to the fact that we need two different formats to track the architecture. Annoying.
| on: | ||
| workflow_dispatch: | ||
| release: | ||
| types: [published] |
There was a problem hiding this comment.
Non-blocking:
Longterm I'd like to talk about how we might have an emergency rollback option to get back to a clean deploy if something unexpected goes wrong. This may add some level of risk and uncertainty as to what is in a given tier at any time, but it adds a huge safety net that allow us to get back to a working state rapidly.
One way might be to have a way to deploy the previous-latest tag from ECR
There was a problem hiding this comment.
This will definitely take some thinking and conversation. Infra has ways to do this via the Console/CLI, but I'm pretty sure that the permissions we grant (in this case, to Timdex_Managers) don't provide rights to do this.
| name: Dev Container Build and Deploy | ||
| on: | ||
| workflow_dispatch: | ||
| pull_request: |
There was a problem hiding this comment.
Slightly nervous having this automatically deploy on pull request could lead to race conditions as to what is in dev1 at any given time. I'm open to giving it a shot though and if we find it confusing we can switch to just workflow_dispatch: for dev1
There was a problem hiding this comment.
I hear your concern on this and would be happy to revisit in the future. So far, we haven't seen a problem with DataEng applications and this automatic deploy on PR, but that doesn't mean that this might not become a problem in the future.
There was a problem hiding this comment.
I think for this lambda it's probably fine.
Do we actually run deploys for PRs opened from renovate and the like or are we somehow limiting to human PRs? If it's just humans, I have no real worry. If bots can overwrite my dev1, it's eventually gonna happen :)
There was a problem hiding this comment.
The shared workflows (in the repository) have this conditional:
# This line adds a check for the user which is requesting the PR. As long as its not dependabot, we go ahead and run it.
if: ${{ github.triggering_actor != 'dependabot[bot]' }}So, it will not build on PRs from dependabot. I have not addressed Renovate skips yet!
There was a problem hiding this comment.
Gotcha. Once I activate renovate on this I'll PR to that or open a ticket.
Add .arch_tag to .gitgnore
|
@cabutlermit I think we're good to go with this. All remaining comments are really to set up future discussions. |
Rolls back the change to the CI workflow so that it runs on push again.
Purpose and background context
DiscoEng is building a new Lambda function that will be built from a containerized image. Infra has built the ECR Repositories in dev, stage, and prod AWS Accounts to store the image, so this application repository needs the standard GitHub Actions workflows and the additional commands for the
Makefileto enable the developer to easily deploy test images to Dev1.There are a number of warnings from
qltysh[bot]regarding the security in the GitHub Actions workflows. This should be addressed at some point in the future for all of our repositories, but these are not super dangerous for now.How can a reviewer manually see the effects of these changes?
Makefile
It's possible to test the new
Makefilecommands by checking out this branch and running thecheck-arch,dist-dev,publish-dev, anddocker-cleancommands. Theupdate-lambda-devcommand will not work until Infra has built out additional infrastructure for the Lambda function itself.GitHub Actions
It is possible to verify that the
dev-buildGHA workflow triggered correctly by looking at the Actions tab for this repository (or just click here to see the actual run). It is also possible to verify theworkflow_dispatchoption for thedev-buildworkflow by manually triggering the workflow via the Actions UI in this repository.NOTE: The automated
dev-buildAction run did "fail", but that was expected. Theupdate-lambda-functionstep in the workflow will not be successful until InfraEng builds out the other infrastructure for the Lambda function itself (see note above about the same command in theMakefile).Includes new or updated dependencies?
YES: Depends on shared container publishing workflows in our .github repository
Changes expectations for external applications?
NO
What are the relevant tickets?
Code review