-
Notifications
You must be signed in to change notification settings - Fork 230
kcl github workflow #336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
kcl github workflow #336
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| #!/bin/bash | ||
|
|
||
| aws kinesis delete-stream --stream-name $STREAM_NAME || true | ||
|
|
||
| # Delete all tables | ||
| for i in {1..10}; do | ||
| echo "Deleting table $APP_NAME" | ||
| aws dynamodb delete-table --table-name $APP_NAME && break || | ||
| echo "Table deletion failed, attempt $i/10. Retrying DynamoDB Table deletion in $((i * 3)) seconds" && sleep $((i * 3)) | ||
| done | ||
| for SUFFIX in "-CoordinatorState" "-WorkerMetricStats"; do | ||
| if aws dynamodb describe-table --table-name $APP_NAME$SUFFIX &>/dev/null; then | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we need to describe on these tables but not the main lease table?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The describe is used to check if the tables exist, and the main lease table basically always exists. In some cases either the CoordinatorState table or the WorkerMetricStats table aren't created by the end of the allotted runtime, and would cause problems when trying to delete a non-existent table. I now realize I could just have a blank suffix for the main lease table and have them all be in the same clean-up loop.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Its possible for the main lease table to not exist if there is a runtime error right at the beginning of KCL app and nothing is able to start up. There is a LeaseCoordinator thread which runs which creates the lease table if it doesn't exist
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll account for the edge case of the main lease table not existing by also checking for its existence |
||
| echo "Deleting table $APP_NAME$SUFFIX" | ||
| for i in {1..10}; do | ||
| aws dynamodb delete-table --table-name $APP_NAME$SUFFIX && break || | ||
| echo "Table deletion failed, attempt $i/10. Retrying DynamoDB Table deletion in $((i * 3))s" && sleep $((i * 3)) | ||
| done | ||
| fi | ||
| done | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| #!/bin/bash | ||
| set -e | ||
|
|
||
| for i in {1..10}; do | ||
| if aws kinesis create-stream --stream-name $STREAM_NAME --shard-count 1; then | ||
| break | ||
| else | ||
| echo "Stream creation failed, attempt $i/10. Waiting $((i * 3)) seconds..." | ||
| sleep $((i * 3)) | ||
| fi | ||
| done | ||
| aws kinesis wait stream-exists --stream-name $STREAM_NAME |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| #!/bin/bash | ||
| set -e | ||
|
|
||
| # Manipulate sample.properties file that the KCL application pulls properties from (ex: streamName, applicationName) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we ever change the sample properties, some of this may not work anymore right? I see there are some assumptions of what the sample properties should be and some explicit find and replace. Feels finicky and can be easily broken in the future if for example we add a new property. Would it be easier to just create a folder which contains the properties files we are going to use for the github actions and have the workflow use that file?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is true. I had an aversion to adding any new files since I wanted the tests to use the same files that the customers would from the readme to avoid any additional overhead/work for future changes, but this is fragile implementation and would (imo) justify "maintaining" two different samples.properties files. |
||
| # Depending on the OS, different properties need to be changed | ||
| if [[ "$RUNNER_OS" == "macOS" ]]; then | ||
| sed -i "" "s/kclpysample/$STREAM_NAME/g" samples/sample.properties | ||
| sed -i "" "s/PythonKCLSample/$APP_NAME/g" samples/sample.properties | ||
| sed -i "" 's/us-east-5/us-east-1/g' samples/sample.properties | ||
| grep -v "idleTimeBetweenReadsInMillis" samples/sample.properties > samples/temp.properties | ||
| echo "idleTimeBetweenReadsInMillis = 250" >> samples/temp.properties | ||
| mv samples/temp.properties samples/sample.properties | ||
| elif [[ "$RUNNER_OS" == "Linux" || "$RUNNER_OS" == "Windows" ]]; then | ||
| sed -i "s/kclpysample/$STREAM_NAME/g" samples/sample.properties | ||
| sed -i "s/PythonKCLSample/$APP_NAME/g" samples/sample.properties | ||
| sed -i 's/us-east-5/us-east-1/g' samples/sample.properties | ||
| sed -i "/idleTimeBetweenReadsInMillis/c\idleTimeBetweenReadsInMillis = 250" samples/sample.properties | ||
|
|
||
| if [[ "$RUNNER_OS" == "Windows" ]]; then | ||
| echo '@echo off' > samples/run_script.bat | ||
| echo 'python %~dp0\sample_kclpy_app.py %*' >> samples/run_script.bat | ||
| sed -i 's/executableName = sample_kclpy_app.py/executableName = samples\/run_script.bat/' samples/sample.properties | ||
| fi | ||
| else | ||
| echo "Unknown OS: $RUNNER_OS" | ||
| exit 1 | ||
| fi | ||
|
|
||
| cat samples/sample.properties | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| #!/bin/bash | ||
| set -e | ||
|
|
||
| sample_kinesis_wordputter.py --stream $STREAM_NAME -w cat -w dog -w bird -w lobster -w octopus | ||
|
|
||
| # Get records from stream to verify they exist before continuing | ||
| SHARD_ITERATOR=$(aws kinesis get-shard-iterator --stream-name $STREAM_NAME --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON --query 'ShardIterator' --output text) | ||
| INITIAL_RECORDS=$(aws kinesis get-records --shard-iterator $SHARD_ITERATOR) | ||
| RECORD_COUNT_BEFORE=$(echo $INITIAL_RECORDS | jq '.Records | length') | ||
|
|
||
| if [ "$RECORD_COUNT_BEFORE" -eq 0 ]; then | ||
| echo "No records found in stream. Test cannot proceed." | ||
| exit 1 | ||
| fi | ||
| echo "Found $RECORD_COUNT_BEFORE records in stream before KCL start" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| #!/bin/bash | ||
| set -e | ||
| set -o pipefail | ||
|
|
||
| chmod +x samples/sample.properties | ||
| chmod +x samples/sample_kclpy_app.py | ||
|
|
||
| # Get records from stream to verify they exist before continuing | ||
| SHARD_ITERATOR=$(aws kinesis get-shard-iterator --stream-name $STREAM_NAME --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON --query 'ShardIterator' --output text) | ||
| INITIAL_RECORDS=$(aws kinesis get-records --shard-iterator $SHARD_ITERATOR) | ||
| RECORD_COUNT_BEFORE=$(echo $INITIAL_RECORDS | jq '.Records | length') | ||
|
|
||
| echo "Found $RECORD_COUNT_BEFORE records in stream before KCL start" | ||
|
Comment on lines
+8
to
+13
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we do this a second time? I see this is also done in
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is a redundant check to make sure the $STREAM_NAME variable has records in the stream. Its not necessary and could be removed. |
||
|
|
||
| if [[ "$RUNNER_OS" == "macOS" ]]; then | ||
| brew install coreutils | ||
| KCL_COMMAND=$(amazon_kclpy_helper.py --print_command --java $(which java) --properties samples/sample.properties) | ||
| gtimeout $RUN_TIME_SECONDS $KCL_COMMAND 2>&1 | tee kcl_output.log || [ $? -eq 124 ] | ||
| elif [[ "$RUNNER_OS" == "Linux" || "$RUNNER_OS" == "Windows" ]]; then | ||
| KCL_COMMAND=$(amazon_kclpy_helper.py --print_command --java $(which java) --properties samples/sample.properties) | ||
| timeout $RUN_TIME_SECONDS $KCL_COMMAND 2>&1 | tee kcl_output.log || [ $? -eq 124 ] | ||
| else | ||
| echo "Unknown OS: $RUNNER_OS" | ||
| exit 1 | ||
| fi | ||
|
|
||
| echo "==========ERROR LOGS==========" | ||
| grep -i error kcl_output.log || echo "No errors found in logs" | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wonder if there's anything else we would care about besides "error" here, like "fatal", "failed" or something even more general like "exception". Don't think its needed for now and we can always look at failed runs in the future and add stuff to this grep if we find ones that time out but don't show error logs.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I hadn't found many other interesting key terms to search for other than "error", usually when something fails or is of note we don't need to check this section anyway. Like you said in the future we could expand the search but for now I think it should be good. |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| #!/bin/bash | ||
| set -e | ||
|
|
||
| NUM_LEASES_FOUND=$(aws dynamodb scan --table-name $APP_NAME --select "COUNT" --query "Count" --output text || echo "0") | ||
| NUM_CHECKPOINTS_FOUND=$(aws dynamodb scan --table-name $APP_NAME --select "COUNT" --filter-expression "attribute_exists(checkpoint) AND checkpoint <> :trim_horizon" --expression-attribute-values '{":trim_horizon": {"S": "TRIM_HORIZON"}}' --query "Count" --output text || echo "0") | ||
|
|
||
| echo "Found $NUM_LEASES_FOUND leases and $NUM_CHECKPOINTS_FOUND non-TRIM-HORIZON checkpoint in DynamoDB" | ||
|
|
||
| echo "Printing checkpoint values" | ||
| aws dynamodb scan --table-name $APP_NAME --projection-expression "leaseKey,checkpoint" --output json | ||
|
|
||
| if [ "$NUM_LEASES_FOUND" -gt 0 ] && [ "$NUM_CHECKPOINTS_FOUND" -gt 0 ]; then | ||
| echo "Test passed: Found both leases and non-TRIM_HORIZON checkpoints in DDB (KCL is fully functional)" | ||
| exit 0 | ||
| else | ||
| echo "Test failed: KCL not fully functional" | ||
| echo "Lease(s) found: $NUM_LEASES_FOUND" | ||
| echo "non-TRIM_HORIZON checkpoint(s) found: $NUM_CHECKPOINTS_FOUND" | ||
| exit 1 | ||
| fi |
This file was deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommend making a bash function like
So the primary script can just make an array with the three KCL tables and call the function to reduce the code duplication