
Commit 53eea69

Author: Dinesh Mane (committed)

Initial commit v1:
- Launching SM studio domain stack
- Creating SM pipeline stack
- Executing SM Pipeline stack
- Inference result stack
1 parent a6d9318 commit 53eea69


48 files changed (+18006, -6 lines)

README.md

Lines changed: 123 additions & 6 deletions
@@ -1,11 +1,129 @@
Removed (old placeholder README): "## My Project", "TODO: Fill this README out!", "Be sure to: Change the title in this README; Edit your repository description on GitHub".

Added:

# Deploy and Manage 100x models using Amazon SageMaker Pipelines

## Overview

This GitHub repository showcases the implementation of a comprehensive end-to-end MLOps pipeline using Amazon SageMaker Pipelines to deploy and manage 100x machine learning models. The pipeline covers data pre-processing, model training/re-training, hyperparameter tuning, data quality checks, model quality checks, the model registry, and model deployment. Automation of the MLOps pipeline is achieved through Continuous Integration and Continuous Deployment (CI/CD). The machine learning model used in this sample code is the SageMaker built-in XGBoost algorithm.
## CDK Stacks

The AWS Cloud Development Kit (CDK) is used to define four stacks:

1. **SM StudioSetup Stack**: Launches the SageMaker Studio domain notebook with SageMaker Projects enabled.
2. **SM Pipeline Stack**: Creates a CodeBuild-based pipeline responsible for creating and updating the SageMaker pipeline that orchestrates MLOps. This stack defines the workflow and dependencies for building and deploying your machine learning pipeline.
3. **Start SM Pipeline Stack**: Responds to new training data uploaded to the specified S3 bucket. It uses a Lambda function to trigger the SageMaker pipeline, ensuring that your machine learning models are updated with the latest data automatically.
4. **Inference Result Stack**: Creates the necessary resources, such as an SQS (Simple Queue Service) queue and Lambda functions, to handle inference results from the SageMaker endpoint.
## Dataset

We use a synthetic telecommunication customer churn dataset as our sample use case. The dataset contains customer phone plan usage, account information, and churn status, i.e., whether the customer stays on or leaves the plan. We use SageMaker's built-in XGBoost algorithm, which is well suited for this structured data. To enhance the churn dataset, a new dimension column has been introduced to accommodate values denoted DummyDim1 through DummyDimN. This approach allows 100x different models to be created and trained, with each model associated with a distinct DummyDimension. Each row within this dataset pertains to a specific DummyDimX and is used to train the corresponding model X. For instance, if there are X customer profiles, you can train X ML models, each associated with DummyDimension values ranging from DummyDim1 to DummyDimX. The inference dataset is the already pre-processed output of the data preprocessing step of the SageMaker pipeline.
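As a purely illustrative sketch (not part of the repository), the per-dimension split described above could be produced with a few lines of pandas; the file name `churn.csv` and the column name `DummyDimension` are assumptions about the dataset schema:

```python
import pandas as pd

# Hypothetical file and column names; adjust to the actual dataset schema.
df = pd.read_csv("churn.csv")

# Write one training file per dimension so that each DummyDimX
# can be used to train its own model X.
for dim, group in df.groupby("DummyDimension"):
    group.drop(columns=["DummyDimension"]).to_csv(f"train_{dim}.csv", index=False)
```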
## Architecture

The architecture relies primarily on Amazon SageMaker Pipelines to deliver an end-to-end MLOps pipeline for building, training, monitoring, and deploying machine learning models. The architecture can be divided into two main components:

![SageMaker Pipeline Architecture](images/architecture_sm_pipeline.jpg)
1. **SageMaker Pipeline**: Handles data pre-processing, model training/tuning, monitoring, and deployment. It leverages the SageMaker Studio domain as a unified interface for model build and inference. AWS CodePipeline, using an AWS CodeBuild project, automates the creation and update of the SageMaker pipeline. When new training data is uploaded to the input bucket, the SageMaker re-training pipeline is executed. The sequential steps include:

   - Pulling new training data from S3
   - Preprocessing data for training
   - Conducting data quality checks
   - Training/tuning the model
   - Performing model quality checks
   - Utilizing the Model Registry to store the model
   - Deploying the model
Model deployment is managed by a Lambda step, giving data scientists the flexibility to deploy specific models with customized logic. Because there are 100x model deployments, the deployment process is triggered promptly upon the addition of a new model to the model registry instead of requiring manual approval. The LambdaStep is invoked to retrieve the most recently added model from the registry, deploy it to the SageMaker endpoint, and decommission the previous version of the model, ensuring a seamless transition and continuous availability of the latest model version for inference purposes.
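The repository implements this logic in its own Lambda step; the sketch below is only an illustration of the general approach using boto3, with placeholder values for the model package group, endpoint name, IAM role, and instance type:

```python
import boto3

# Hypothetical names for illustration only; the actual pipeline passes these
# values into the LambdaStep.
sm = boto3.client("sagemaker")
MODEL_PACKAGE_GROUP = "churn-DummyDim1"
ENDPOINT_NAME = "churn-DummyDim1-endpoint"
ROLE_ARN = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"


def handler(event, context):
    # Fetch the most recently registered model package in the group.
    packages = sm.list_model_packages(
        ModelPackageGroupName=MODEL_PACKAGE_GROUP,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )["ModelPackageSummaryList"]
    latest_arn = packages[0]["ModelPackageArn"]

    # Create a SageMaker model that points at the registered package.
    model_name = f"churn-model-{context.aws_request_id[:8]}"
    sm.create_model(
        ModelName=model_name,
        ExecutionRoleArn=ROLE_ARN,
        Containers=[{"ModelPackageName": latest_arn}],
    )

    # Create a fresh endpoint config and roll the endpoint over to it;
    # SageMaker keeps serving the old model until the update completes.
    config_name = f"{model_name}-config"
    sm.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
        }],
    )
    existing = sm.list_endpoints(NameContains=ENDPOINT_NAME)["Endpoints"]
    if any(e["EndpointName"] == ENDPOINT_NAME for e in existing):
        sm.update_endpoint(EndpointName=ENDPOINT_NAME, EndpointConfigName=config_name)
    else:
        sm.create_endpoint(EndpointName=ENDPOINT_NAME, EndpointConfigName=config_name)
    return {"deployed_model_package": latest_arn}
```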
![Inference architecture](images/architecture_inference.jpg)
2. **Inference**: The real-time inference process is initiated by uploading a sample inference file to the Amazon S3 bucket. An AWS Lambda function is then triggered; it fetches all records from the CSV file and dispatches them to an Amazon Simple Queue Service (SQS) queue. The SQS queue, in turn, invokes a designated Lambda function, consume_messages_lambda, which calls the SageMaker endpoint. The endpoint runs the appropriate machine learning model on the provided data, and the resulting predictions are stored in an Amazon DynamoDB table for further analysis and retrieval. This end-to-end workflow provides efficient and scalable real-time inference by leveraging AWS services for seamless data processing and storage.
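The actual consume_messages_lambda ships with this repository; as an illustration only, a handler of this shape could process the SQS messages. The endpoint name, table name, and message payload format below are assumptions made for the sketch:

```python
import json
import os

import boto3

# Hypothetical resource names; in the repository these are wired in by the
# Inference Result Stack, typically via environment variables.
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "churn-DummyDim1-endpoint")
TABLE_NAME = os.environ.get("TABLE_NAME", "InferenceResults")

runtime = boto3.client("sagemaker-runtime")
table = boto3.resource("dynamodb").Table(TABLE_NAME)


def handler(event, context):
    # Each SQS record is assumed to carry one CSV row from the inference file.
    for record in event["Records"]:
        body = json.loads(record["body"])
        csv_row = body["payload"]        # e.g. "186,0.1,137.8,..."
        record_id = body["record_id"]    # assumed unique key per row

        # Invoke the real-time endpoint with the pre-processed CSV features.
        response = runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="text/csv",
            Body=csv_row,
        )
        prediction = response["Body"].read().decode("utf-8").strip()

        # Persist the prediction for later analysis.
        table.put_item(Item={"record_id": record_id, "prediction": prediction})
```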
## How to set up the CDK project
This repository includes a `project_config.json` file containing the following attributes:

- **MainStackName**: Name of the main stack.
- **SageMakerPipelineName**: Name of the SageMaker pipeline.
- **SageMakerUserProfiles**: User names for the SageMaker Studio domain (e.g., ["User1", "User2"]).
- **USE_AMT**: Automatic Model Tuning (AMT) flag. If set to "yes", AMT will be employed for each model deployment, selecting the best-performing model (see the sketch after the configuration example below).
```json
{
    "MainStackName": "SMPipelineRootStack",
    "SageMakerPipelineName": "model-train-deploy-pipeline",
    "SageMakerUserProfiles": ["User1", "User2"],
    "USE_AMT": "yes"
}
```
Please refer to this configuration file and update it as per your use case.
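Purely as an illustration of what the USE_AMT flag controls (this is not the repository's pipeline code), a training step could branch between a plain estimator and Automatic Model Tuning roughly as follows; the role ARN, S3 paths, objective metric, and hyperparameter range are placeholder assumptions:

```python
import json

import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

with open("project_config.json") as f:
    config = json.load(f)

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

# Built-in XGBoost estimator; paths and settings are illustrative only.
xgb = Estimator(
    image_uri=sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, "1.7-1"),
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/churn/output",
)
xgb.set_hyperparameters(objective="binary:logistic", num_round=100)

channels = {
    "train": "s3://my-bucket/churn/train",
    "validation": "s3://my-bucket/churn/validation",
}

if config["USE_AMT"] == "yes":
    # Let Automatic Model Tuning search for the best-performing model.
    tuner = HyperparameterTuner(
        estimator=xgb,
        objective_metric_name="validation:auc",
        hyperparameter_ranges={"eta": ContinuousParameter(0.01, 0.3)},
        max_jobs=4,
        max_parallel_jobs=2,
    )
    tuner.fit(channels)
else:
    xgb.fit(channels)
```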
This project is set up like a standard Python project. The initialization process also creates a virtualenv within this project, stored under the `.venv` directory. To create the virtualenv, it assumes that there is a `python3` (or `python` for Windows) executable in your path with access to the `venv` package. If for any reason the automatic creation of the virtualenv fails, you can create the virtualenv manually.

To manually create a virtualenv on macOS and Linux:

```
$ python3 -m venv .venv
```

After the init process completes and the virtualenv is created, you can use the following step to activate your virtualenv.

```
$ source .venv/bin/activate
```

If you are on a Windows platform, you would activate the virtualenv like this:

```
% .venv\Scripts\activate.bat
```

Once the virtualenv is activated, you can install the required dependencies.

```
$ pip install -r requirements.txt
```

At this point you can now synthesize the CloudFormation template for this code.

```
$ cdk synth --all
```

Deploy all stacks:

```
$ cdk deploy --all
```
After deploying all stacks, ensure the successful execution of the CodePipeline. In the AWS Console, navigate to `Developer Tools -> CodePipeline -> Pipelines -> model-train-deploy-pipeline-modelbuild` and verify the successful completion of the CodePipeline. If the Build phase fails, rerun the build phase.

![Code pipeline](images/code_pipeline.png)
Move the directories from the `dataset/training-dataset` folder to the `inputbucket` S3 bucket. This will kick off the SageMaker pipeline, initiating three separate executions and deploying three models on the SageMaker endpoint. The process is expected to take approximately 45 minutes, and you can monitor the progress through the SageMaker Studio pipeline UI.

![Pipeline Execution](images/pipeline_execution.png)
For each dimension in our dataset, a corresponding model registry will be created. In our current demonstration, where we have three dimensions, three model registries will be generated. Each model registry will encompass all the models associated with its respective dimension, ensuring a dedicated registry for each dimension.

![Model Registry](images/model_registry.png)
After successfully executing all pipelines and deploying models on the SageMaker endpoint, copy the files from `dataset/inference-dataset` to the `inferencebucket` S3 bucket. Subsequently, the records are read, and the inference results are stored in a DynamoDB table. It's important to note that the inference data has already undergone preprocessing for seamless integration with the endpoint. In a production setting, it is recommended to implement an inference pipeline to preprocess input data consistently with the training data, ensuring alignment between training and production data.

![SM Endpoints](images/sm_endpoint.png)
## Useful commands

* `cdk ls`          list all stacks in the app
* `cdk synth`       emits the synthesized CloudFormation template
* `cdk deploy`      deploy this stack to your default AWS account/region
* `cdk diff`        compare deployed stack with current state
* `cdk docs`        open CDK documentation
## Security

@@ -14,4 +132,3 @@ See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information

## License

This library is licensed under the MIT-0 License. See the LICENSE file.
app.py

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
1+
import json
2+
import cdk_nag
3+
from aws_cdk import Aspects
4+
import aws_cdk as cdk
5+
6+
from sagemaker_pipeline_deploy_manage_n_models_cdk.stacks.sagemaker_studio_setup_stack import SagemakerStudioSetupStack
7+
from sagemaker_pipeline_deploy_manage_n_models_cdk.stacks.sagemaker_pipeline_stack import SagemakerPipelineStack
8+
from sagemaker_pipeline_deploy_manage_n_models_cdk.stacks.start_sagemaker_pipeline_stack import StartSagemakerPipelineStack
9+
from sagemaker_pipeline_deploy_manage_n_models_cdk.stacks.inference_results_stack import InferenceResultsStack
10+
11+
file = open("project_config.json")
12+
variables = json.load(file)
13+
main_stack_name = variables["MainStackName"]
14+
app = cdk.App()
15+
16+
# This Stack will create resources to create SageMaker Studio notebook
17+
sm_studio_stack = SagemakerStudioSetupStack(app, "SMStudioSetupStack")
18+
19+
# This stack will create resources to create SageMaker Pipeline
20+
sm_pipeline_stack = SagemakerPipelineStack(app, "SMPipelineStack")
21+
sm_pipeline_stack.add_dependency(sm_studio_stack)
22+
23+
# This stack will create resources to execute SageMaker Pipeline
24+
start_sm_pipeline_stack = StartSagemakerPipelineStack(app, "SMPipelineExecutionStack")
25+
start_sm_pipeline_stack.add_dependency(sm_pipeline_stack)
26+
27+
# This stack will create resources to get inference results from SageMaker Endpoint
28+
inference_results_stack = InferenceResultsStack(app, "InferenceResultsStack")
29+
inference_results_stack.add_dependency(start_sm_pipeline_stack)
30+
31+
Aspects.of(app).add(cdk_nag.AwsSolutionsChecks(reports=True, verbose=True))
32+
33+
app.synth()

cdk.json

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
1+
{
2+
"app": "python3 app.py",
3+
"watch": {
4+
"include": [
5+
"**"
6+
],
7+
"exclude": [
8+
"README.md",
9+
"cdk*.json",
10+
"requirements*.txt",
11+
"source.bat",
12+
"**/__init__.py",
13+
"**/__pycache__",
14+
"tests"
15+
]
16+
},
17+
"context": {
18+
"@aws-cdk/aws-lambda:recognizeLayerVersion": true,
19+
"@aws-cdk/core:checkSecretUsage": true,
20+
"@aws-cdk/core:target-partitions": [
21+
"aws",
22+
"aws-cn"
23+
],
24+
"@aws-cdk-containers/ecs-service-extensions:enableDefaultLogDriver": true,
25+
"@aws-cdk/aws-ec2:uniqueImdsv2TemplateName": true,
26+
"@aws-cdk/aws-ecs:arnFormatIncludesClusterName": true,
27+
"@aws-cdk/aws-iam:minimizePolicies": true,
28+
"@aws-cdk/core:validateSnapshotRemovalPolicy": true,
29+
"@aws-cdk/aws-codepipeline:crossAccountKeyAliasStackSafeResourceName": true,
30+
"@aws-cdk/aws-s3:createDefaultLoggingPolicy": true,
31+
"@aws-cdk/aws-sns-subscriptions:restrictSqsDescryption": true,
32+
"@aws-cdk/aws-apigateway:disableCloudWatchRole": true,
33+
"@aws-cdk/core:enablePartitionLiterals": true,
34+
"@aws-cdk/aws-events:eventsTargetQueueSameAccount": true,
35+
"@aws-cdk/aws-iam:standardizedServicePrincipals": true,
36+
"@aws-cdk/aws-ecs:disableExplicitDeploymentControllerForCircuitBreaker": true,
37+
"@aws-cdk/aws-iam:importedRoleStackSafeDefaultPolicyName": true,
38+
"@aws-cdk/aws-s3:serverAccessLogsUseBucketPolicy": true,
39+
"@aws-cdk/aws-route53-patters:useCertificate": true,
40+
"@aws-cdk/customresources:installLatestAwsSdkDefault": false,
41+
"@aws-cdk/aws-rds:databaseProxyUniqueResourceName": true,
42+
"@aws-cdk/aws-codedeploy:removeAlarmsFromDeploymentGroup": true,
43+
"@aws-cdk/aws-apigateway:authorizerChangeDeploymentLogicalId": true,
44+
"@aws-cdk/aws-ec2:launchTemplateDefaultUserData": true,
45+
"@aws-cdk/aws-secretsmanager:useAttachedSecretResourcePolicyForSecretTargetAttachments": true,
46+
"@aws-cdk/aws-redshift:columnId": true,
47+
"@aws-cdk/aws-stepfunctions-tasks:enableEmrServicePolicyV2": true,
48+
"@aws-cdk/aws-ec2:restrictDefaultSecurityGroup": true,
49+
"@aws-cdk/aws-apigateway:requestValidatorUniqueId": true,
50+
"@aws-cdk/aws-kms:aliasNameRef": true,
51+
"@aws-cdk/aws-autoscaling:generateLaunchTemplateInsteadOfLaunchConfig": true,
52+
"@aws-cdk/core:includePrefixInUniqueNameGeneration": true,
53+
"@aws-cdk/aws-efs:denyAnonymousAccess": true,
54+
"@aws-cdk/aws-opensearchservice:enableOpensearchMultiAzWithStandby": true,
55+
"@aws-cdk/aws-lambda-nodejs:useLatestRuntimeVersion": true,
56+
"@aws-cdk/aws-efs:mountTargetOrderInsensitiveLogicalId": true,
57+
"@aws-cdk/aws-rds:auroraClusterChangeScopeOfInstanceParameterGroupWithEachParameters": true,
58+
"@aws-cdk/aws-appsync:useArnForSourceApiAssociationIdentifier": true,
59+
"@aws-cdk/aws-rds:preventRenderingDeprecatedCredentials": true
60+
}
61+
}

images/architecture_inference.jpg (46.1 KB)

(binary image, 165 KB)

images/code_pipeline.png (293 KB)

images/model_registry.png (251 KB)

images/pipeline_execution.png (276 KB)

images/sm_endpoint.png (173 KB)

project_config.json

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
{
2+
"MainStackName":"SMPipelineRootStack",
3+
"SageMakerPipelineName": "model-train-deploy-pipeline",
4+
"SageMakerUserProfiles": ["User1", "User2"],
5+
"USE_AMT": "yes"
6+
}
7+
