158 changes: 158 additions & 0 deletions documentdb-playground/keda-autoscaling/README.md
@@ -0,0 +1,158 @@
# KEDA autoscaling with DocumentDB

This playground demonstrates event-driven autoscaling using [KEDA](https://keda.sh/) with DocumentDB. KEDA's [MongoDB scaler](https://keda.sh/docs/2.19/scalers/mongodb/) polls a DocumentDB collection for pending jobs and automatically scales a worker Deployment from 0 to N pods based on the query result.

## Architecture

```mermaid
flowchart LR
    subgraph cluster["Kubernetes cluster"]
        KEDA[KEDA Operator] -->|"polls appdb.jobs<br/>every 5s"| DDB[(DocumentDB<br/>port 10260)]
        KEDA -->|scales 0 → N| Worker["job-worker<br/>Deployment"]
        Seed[seed-jobs Job] -->|inserts pending jobs| DDB
        Drain[drain-jobs Job] -->|marks jobs completed| DDB
    end
```

## Prerequisites

| Tool | Version | Purpose |
|------|---------|---------|
| [Kind](https://kind.sigs.k8s.io/) | 0.20+ | Local Kubernetes cluster |
| [kubectl](https://kubernetes.io/docs/tasks/tools/) | 1.30+ | Kubernetes CLI |
| [Helm](https://helm.sh/docs/intro/install/) | 3.x | Package manager for Kubernetes |
| DocumentDB operator | — | Must be running in the Kubernetes cluster |

> **Note:** If you don't have a Kubernetes cluster with the DocumentDB operator, use the
> [development deploy script](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/operator/src/scripts/development/deploy.sh)
> to set up Kind with the operator:
>
> ```bash
> cd operator/src
> DEPLOY=true DEPLOY_CLUSTER=true ./scripts/development/deploy.sh
> ```

## Quick start

```bash
# 1. Ensure the DocumentDB operator is running on your Kubernetes cluster

# 2. Deploy KEDA and the demo
cd documentdb-playground/keda-autoscaling
./scripts/setup.sh

# 3. Watch pods scale up (10 pending jobs trigger scaling)
kubectl get pods -n app -w

# 4. Drain jobs to scale back to 0
kubectl delete job drain-pending-jobs -n app --ignore-not-found
kubectl apply -f manifests/drain-jobs.yaml

# 5. Clean up
./scripts/teardown.sh
```

## How it works

1. **`setup.sh`** installs KEDA and deploys a DocumentDB instance with a worker Deployment.
2. A `ClusterTriggerAuthentication` references the Secret that holds the DocumentDB connection string (including the TLS and auth parameters).
3. A `ScaledObject` configures KEDA to poll the `appdb.jobs` collection for documents matching `{"status": "pending"}` every 5 seconds.
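4. KEDA creates and manages an HPA for the `job-worker` Deployment. When the pending-job count exceeds the activation threshold (`activationQueryValue: 1`), KEDA scales the Deployment from 0 to 1; the HPA then targets roughly 5 pending jobs per replica (`queryValue: 5`), up to a maximum of 10 pods.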
5. The **seed job** inserts 10 pending documents to trigger scaling.
6. The **drain job** updates all pending jobs to `"completed"`, causing KEDA to scale the Deployment back to 0 after the 30-second cooldown.
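
The scaling arithmetic in the steps above can be sketched as a small shell function. This is an approximation, not KEDA's implementation: it assumes the trigger is evaluated as an average-value metric (so the target is about `ceil(pending / queryValue)` replicas) and ignores HPA tolerance and stabilization windows. The function name `estimate_replicas` is hypothetical.

```bash
# Hypothetical helper: estimate how many worker replicas would be requested.
# Assumption: replicas ~= ceil(pending / queryValue) once the scaler is active.
estimate_replicas() {
  pending=$1        # documents matching {"status":"pending"}
  query_value=$2    # queryValue from the ScaledObject (5 in this demo)
  activation=$3     # activationQueryValue (1 in this demo)
  max_replicas=$4   # maxReplicaCount (10 in this demo)

  # At or below the activation threshold the Deployment stays at 0.
  if [ "$pending" -le "$activation" ]; then
    echo 0
    return
  fi

  # Integer ceiling division, clamped to the maximum replica count.
  replicas=$(( (pending + query_value - 1) / query_value ))
  if [ "$replicas" -gt "$max_replicas" ]; then
    replicas=$max_replicas
  fi
  echo "$replicas"
}

# The seed job's 10 pending documents with this demo's settings.
estimate_replicas 10 5 1 10
```

Under these assumptions the seed job's 10 documents call for about 2 workers, not the full 10; seeding more documents would be needed to reach `maxReplicaCount`.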

## Connection string configuration

The setup script builds a connection string for DocumentDB and stores it in a Kubernetes Secret:

```text
mongodb://<user>:<pass>@documentdb-service-keda-demo.documentdb-ns.svc.cluster.local:10260/?directConnection=true&authMechanism=SCRAM-SHA-256&tls=true&tlsInsecure=true
```

| Parameter | Value | Why it's needed |
|-----------|-------|-----------------|
| Port | `10260` | DocumentDB Gateway port (not MongoDB's default 27017) |
| `directConnection=true` | Required | DocumentDB doesn't run a real replica set topology |
| `authMechanism=SCRAM-SHA-256` | Required | DocumentDB's authentication mechanism |
| `tls=true` | Required | DocumentDB Gateway serves TLS by default |
| `tlsInsecure=true` | For self-signed | Skip certificate validation with the default self-signed certificates |
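
As a rough illustration of how the parameters in the table fit together, the sketch below assembles the same URI from its parts. The function name `build_connection_string` is hypothetical, and the real `setup.sh` may build the string differently; the sketch also assumes the user name and password are already percent-encoded.

```bash
# Hypothetical helper; not taken from setup.sh.
# Assumption: user and password are already percent-encoded for use in a URI.
build_connection_string() {
  user=$1
  pass=$2
  host=$3
  port=${4:-10260}   # DocumentDB Gateway port, not MongoDB's default 27017

  printf 'mongodb://%s:%s@%s:%s/?directConnection=true&authMechanism=SCRAM-SHA-256&tls=true&tlsInsecure=true\n' \
    "$user" "$pass" "$host" "$port"
}

# Placeholder credentials for illustration only.
build_connection_string demo_user demo_pass \
  documentdb-service-keda-demo.documentdb-ns.svc.cluster.local
```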

## Script reference

### `setup.sh`

Installs KEDA, deploys a DocumentDB instance, and configures the autoscaling demo.

| Environment variable | Description | Default |
|---------------------|-------------|---------|
| `DOCUMENTDB_NAMESPACE` | Namespace for the DocumentDB instance | `documentdb-ns` |
| `DOCUMENTDB_NAME` | DocumentDB instance name | `keda-demo` |
| `APP_NAMESPACE` | Namespace for KEDA resources and the worker | `app` |
| `KEDA_VERSION` | KEDA Helm chart version | `2.17.0` |

### `teardown.sh`

Removes demo resources. By default, KEDA and DocumentDB are preserved.

| Flag | Description |
|------|-------------|
| `--uninstall-keda` | Also uninstall the KEDA Helm release |
| `--delete-documentdb` | Also delete the DocumentDB instance |

### `demo.sh`

Interactive walkthrough that seeds jobs, watches scaling, drains jobs, and watches scale-down. Pauses between steps so you can observe the behavior.

## Manifest reference

| File | Description |
|------|-------------|
| `manifests/documentdb-instance.yaml` | DocumentDB CR (1 node, 2Gi storage) for Kind |
| `manifests/keda-trigger-auth.yaml` | `ClusterTriggerAuthentication` referencing the DocumentDB connection Secret |
| `manifests/keda-scaled-object.yaml` | `ScaledObject` with MongoDB trigger — polls `appdb.jobs` for pending documents |
| `manifests/job-worker.yaml` | Worker Deployment that KEDA scales (starts at 0 replicas) |
| `manifests/seed-jobs.yaml` | Job that inserts 10 pending documents into DocumentDB |
| `manifests/drain-jobs.yaml` | Job that marks all pending documents as completed |

## Known limitations and gotchas

> **Important:** KEDA's MongoDB scaler uses the Go MongoDB driver internally. You must use
> `tlsInsecure=true` (not `tlsAllowInvalidCertificates=true`) in the connection string. The
> Go driver only respects `tlsInsecure` for skipping both certificate and hostname verification.
> Using `tlsAllowInvalidCertificates=true` alone causes `x509: certificate is not valid for any
> names` errors because hostname verification still applies. For production, switch DocumentDB to
> `CertManager` TLS mode — see the
> [TLS configuration documentation](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/docs/operator-public-documentation/preview/configuration/tls.md).

> **Note:** The `directConnection=true` parameter is essential. Without it, the Go MongoDB driver
> attempts replica set discovery, which fails with DocumentDB because it doesn't expose a standard
> MongoDB replica set topology.

> **Note:** Remove the `replicaSet=rs0` parameter from the connection string. KEDA's Go driver
> fails topology negotiation with DocumentDB when `replicaSet` is specified alongside
> `directConnection=true`. The setup script strips this parameter automatically.
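
One way to strip the `replicaSet` parameter is a pair of `sed` substitutions, sketched below. How `setup.sh` actually does this is not shown here, so treat the function name `strip_replica_set` and the approach as assumptions.

```bash
# Hypothetical helper: remove a replicaSet=... query parameter from a URI.
strip_replica_set() {
  # First rule: parameter followed by another one (keep the leading ? or &).
  # Second rule: parameter at the end of the string (drop its separator too).
  printf '%s\n' "$1" | sed -E \
    -e 's/([?&])replicaSet=[^&]*&/\1/' \
    -e 's/[?&]replicaSet=[^&]*$//'
}

strip_replica_set 'mongodb://user:pass@host:10260/?replicaSet=rs0&directConnection=true&tls=true'
```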

> **Note:** A `ClusterTriggerAuthentication` is used because the DocumentDB credentials Secret
> (`documentdb-ns`) and the ScaledObject (`app`) are in different namespaces. If you deploy
> everything in the same namespace, you can use a namespace-scoped `TriggerAuthentication` instead.

> **Note:** The `mongo:8.0` image is used for the seed and
> drain jobs because it includes `mongosh`. Any image with `mongosh` installed works.
Comment on lines +137 to +140
Copilot AI Apr 2, 2026

README claims the seed/drain Jobs use mongodb/mongodb-community-server:8.0-ubuntu2404, but the actual manifests use mongo:8.0. Please update the README to match what gets deployed (or change the manifests), so users know which image to expect/pull.


## Cleanup

```bash
# Remove demo resources only (keep KEDA and DocumentDB)
./scripts/teardown.sh

# Remove everything including KEDA and DocumentDB
./scripts/teardown.sh --uninstall-keda --delete-documentdb
```

## Related resources

- [KEDA MongoDB scaler documentation](https://keda.sh/docs/2.19/scalers/mongodb/)
- [KEDA ScaledObject specification](https://keda.sh/docs/2.19/concepts/scaling-deployments/)
- [DocumentDB Kubernetes Operator documentation](https://documentdb.io/documentdb-kubernetes-operator/preview/)
- [Connecting to DocumentDB](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/docs/operator-public-documentation/preview/getting-started/connecting-to-documentdb.md)
- [DocumentDB TLS configuration](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/docs/operator-public-documentation/preview/configuration/tls.md)
17 changes: 17 additions & 0 deletions documentdb-playground/keda-autoscaling/manifests/documentdb-instance.yaml
@@ -0,0 +1,17 @@
apiVersion: documentdb.io/preview
kind: DocumentDB
metadata:
  name: keda-demo
  namespace: documentdb-ns
Comment on lines +4 to +5
Copilot AI Apr 2, 2026

This manifest hard-codes metadata.name/metadata.namespace (keda-demo / documentdb-ns). That conflicts with DOCUMENTDB_NAME/DOCUMENTDB_NAMESPACE being configurable in scripts/README and makes overrides ineffective. Consider removing the namespace/name from the manifest and letting the scripts supply them (or template this file).

Suggested change: remove the lines `name: keda-demo` and `namespace: documentdb-ns`.

  labels:
    app.kubernetes.io/part-of: keda-documentdb-demo
spec:
  nodeCount: 1
  instancesPerNode: 1
  documentDbCredentialSecret: documentdb-credentials
Copilot AI Apr 2, 2026

documentDbCredentialSecret is hard-coded to documentdb-credentials, but setup.sh allows overriding the secret name via DOCUMENTDB_SECRET. If a user changes DOCUMENTDB_SECRET, the DocumentDB CR will still reference documentdb-credentials and fail to authenticate. Align this with the script (template/patch the manifest or remove configurability).

Suggested change: replace `documentDbCredentialSecret: documentdb-credentials` with `documentDbCredentialSecret: ${DOCUMENTDB_SECRET:-documentdb-credentials}`.

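
Note that `kubectl apply` does not expand shell syntax such as `${DOCUMENTDB_SECRET:-documentdb-credentials}` inside a manifest, so the suggestion above needs a rendering step in the script. A minimal sketch, assuming the manifest carried a placeholder token: the token `__DOCUMENTDB_SECRET__` and the function `render_manifest` are hypothetical and do not appear in this PR.

```bash
# Hypothetical rendering step: resolve the default in the shell, then
# substitute a placeholder token in the manifest text read from stdin.
# Assumption: the secret name is a valid Kubernetes name (no "/" characters).
render_manifest() {
  secret_name=${DOCUMENTDB_SECRET:-documentdb-credentials}
  sed "s/__DOCUMENTDB_SECRET__/${secret_name}/g"
}

# Render a one-line fragment. In a script, the rendered manifest could be
# piped to `kubectl apply -f -` instead of printed.
printf 'documentDbCredentialSecret: __DOCUMENTDB_SECRET__\n' | render_manifest
```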
  resource:
    storage:
      pvcSize: 2Gi
  exposeViaService:
    serviceType: ClusterIP
  sidecarInjectorPluginName: cnpg-i-sidecar-injector.documentdb.io
36 changes: 36 additions & 0 deletions documentdb-playground/keda-autoscaling/manifests/drain-jobs.yaml
@@ -0,0 +1,36 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: drain-pending-jobs
  namespace: app
Copilot AI Apr 2, 2026

This Job hard-codes metadata.namespace: app, but APP_NAMESPACE is presented as configurable in scripts/README. If a user overrides APP_NAMESPACE, this Job will still run in app and may not find the Secret/ScaledObject in the expected namespace. Consider removing the namespace from the manifest and applying it with -n from the scripts (or template it).

Suggested change: remove the line `namespace: app`.

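
Until the manifests are changed, a script could drop the hard-coded namespace on the fly and supply it with `-n`. The sketch below assumes the top-level `metadata.namespace` line is indented by exactly two spaces, as in conventionally formatted manifests; the function name `strip_metadata_namespace` is hypothetical.

```bash
# Hypothetical helper: delete the top-level metadata.namespace line
# (two-space indent) from a manifest read on stdin. More deeply indented
# "namespace:" keys, if any, are left alone.
strip_metadata_namespace() {
  sed '/^  namespace:/d'
}

# Demonstration on an inline fragment. With a real file this could be:
#   strip_metadata_namespace < manifests/drain-jobs.yaml \
#     | kubectl apply -n "$APP_NAMESPACE" -f -
strip_metadata_namespace <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: drain-pending-jobs
  namespace: app
EOF
```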
  labels:
    app.kubernetes.io/part-of: keda-documentdb-demo
spec:
  backoffLimit: 3
  template:
    metadata:
      labels:
        app.kubernetes.io/part-of: keda-documentdb-demo
    spec:
      restartPolicy: Never
      containers:
        - name: drain
          image: mongo:8.0
          command:
            - /bin/sh
            - -c
            - |
              mongosh "$MONGODB_URI" --eval '
                const result = db.getSiblingDB("appdb").jobs.updateMany(
                  {status: "pending"},
                  {$set: {status: "completed", completedAt: new Date()}}
                );
                print("Marked " + result.modifiedCount + " jobs as completed");
                print("Remaining pending: " + db.getSiblingDB("appdb").jobs.countDocuments({status: "pending"}));
              '
          env:
            - name: MONGODB_URI
              valueFrom:
                secretKeyRef:
                  name: documentdb-keda-connection
                  key: connectionString
30 changes: 30 additions & 0 deletions documentdb-playground/keda-autoscaling/manifests/job-worker.yaml
@@ -0,0 +1,30 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: job-worker
  namespace: app
Copilot AI Apr 2, 2026

This Deployment hard-codes metadata.namespace: app, but setup.sh/README expose APP_NAMESPACE as configurable. If APP_NAMESPACE is overridden, the Deployment will still be created in app while the script looks in the overridden namespace. Consider removing the namespace from the manifest and applying with -n "$APP_NAMESPACE" (or templating).

Suggested change: remove the line `namespace: app`.

  labels:
    app: job-worker
    app.kubernetes.io/part-of: keda-documentdb-demo
spec:
  replicas: 0
  selector:
    matchLabels:
      app: job-worker
  template:
    metadata:
      labels:
        app: job-worker
        app.kubernetes.io/part-of: keda-documentdb-demo
    spec:
      containers:
        - name: worker
          image: busybox:1.36
          command: ["sh", "-c", "echo 'Processing job...' && sleep 30"]
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 100m
              memory: 128Mi
25 changes: 25 additions & 0 deletions documentdb-playground/keda-autoscaling/manifests/keda-scaled-object.yaml
@@ -0,0 +1,25 @@
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: documentdb-worker-scaler
  namespace: app
Copilot AI Apr 2, 2026

This ScaledObject hard-codes metadata.namespace: app, but the scripts/README expose APP_NAMESPACE as configurable. Overriding APP_NAMESPACE will not move this resource, and subsequent kubectl get -n "$APP_NAMESPACE" calls will miss it. Consider templating/removing the namespace field so the scripts can control it.

Suggested change: remove the line `namespace: app`.

  labels:
    app.kubernetes.io/part-of: keda-documentdb-demo
spec:
  scaleTargetRef:
    name: job-worker
  minReplicaCount: 0
  maxReplicaCount: 10
  pollingInterval: 5
  cooldownPeriod: 30
  triggers:
    - type: mongodb
      metadata:
        dbName: "appdb"
        collection: "jobs"
        query: '{"status":"pending"}'
        queryValue: "5"
        activationQueryValue: "1"
      authenticationRef:
        name: documentdb-trigger-auth
        kind: ClusterTriggerAuthentication
11 changes: 11 additions & 0 deletions documentdb-playground/keda-autoscaling/manifests/keda-trigger-auth.yaml
@@ -0,0 +1,11 @@
apiVersion: keda.sh/v1alpha1
kind: ClusterTriggerAuthentication
metadata:
  name: documentdb-trigger-auth
  labels:
    app.kubernetes.io/part-of: keda-documentdb-demo
spec:
  secretTargetRef:
    - parameter: connectionString
      name: documentdb-keda-connection
      key: connectionString
38 changes: 38 additions & 0 deletions documentdb-playground/keda-autoscaling/manifests/seed-jobs.yaml
@@ -0,0 +1,38 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: seed-pending-jobs
  namespace: app
Copilot AI Apr 2, 2026

This Job hard-codes metadata.namespace: app, but APP_NAMESPACE is presented as configurable in scripts/README. If a user overrides APP_NAMESPACE, this Job will still run in app and may not find the Secret/ScaledObject in the expected namespace. Consider removing the namespace from the manifest and applying it with -n from the scripts (or template it).

Suggested change: remove the line `namespace: app`.

  labels:
    app.kubernetes.io/part-of: keda-documentdb-demo
spec:
  backoffLimit: 3
  template:
    metadata:
      labels:
        app.kubernetes.io/part-of: keda-documentdb-demo
    spec:
      restartPolicy: Never
      containers:
        - name: seed
          image: mongo:8.0
          command:
            - /bin/sh
            - -c
            - |
              mongosh "$MONGODB_URI" --eval '
                db.getSiblingDB("appdb").jobs.insertMany(
                  Array.from({length: 10}, (_, i) => ({
                    status: "pending",
                    created: new Date(),
                    data: "job-" + (i + 1)
                  }))
                );
                print("Pending jobs: " + db.getSiblingDB("appdb").jobs.countDocuments({status: "pending"}));
              '
          env:
            - name: MONGODB_URI
              valueFrom:
                secretKeyRef:
                  name: documentdb-keda-connection
                  key: connectionString
55 changes: 55 additions & 0 deletions documentdb-playground/keda-autoscaling/scripts/demo.sh
@@ -0,0 +1,55 @@
#!/usr/bin/env bash
set -euo pipefail

APP_NAMESPACE="${APP_NAMESPACE:-app}"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
MANIFESTS_DIR="${SCRIPT_DIR}/../manifests"

GREEN='\033[0;32m'
YELLOW='\033[1;33m'
CYAN='\033[0;36m'
NC='\033[0m'

log() { echo -e "${GREEN}[INFO]${NC} $*"; }
step() { echo -e "\n${CYAN}=== $* ===${NC}\n"; }

pause() {
  echo -e "${YELLOW}Press Enter to continue...${NC}"
  read -r
}

step "Current state"
kubectl get scaledobject,hpa -n "$APP_NAMESPACE" 2>/dev/null || true
echo ""
kubectl get pods -n "$APP_NAMESPACE" 2>/dev/null || true

pause

step "Seeding 10 pending jobs into DocumentDB"
kubectl delete job seed-pending-jobs -n "$APP_NAMESPACE" --ignore-not-found=true 2>/dev/null
kubectl apply -f "${MANIFESTS_DIR}/seed-jobs.yaml"
kubectl wait --for=condition=complete job/seed-pending-jobs -n "$APP_NAMESPACE" --timeout=120s 2>/dev/null || true
kubectl logs job/seed-pending-jobs -n "$APP_NAMESPACE" 2>/dev/null || true

step "Watching pods scale up (Ctrl+C to continue)"
log "KEDA polls every 5 seconds. Worker pods should appear within 15-30 seconds."
timeout 60 kubectl get pods -n "$APP_NAMESPACE" -w 2>/dev/null || true

Comment on lines +34 to +37
Copilot AI Apr 2, 2026

timeout is not available by default on some common dev environments (notably macOS without coreutils). Since this is a local playground, consider replacing these timeout ... kubectl -w calls with a portable loop + sleep, or add a prerequisite check/fallback so demo.sh doesn't fail immediately.
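
One possible fallback, sketched under the assumption that a background watchdog is acceptable for this demo: use `timeout` when it exists, otherwise run the command in the background and kill it after the deadline. The function name `run_with_timeout` is hypothetical and not part of this PR.

```bash
# Hypothetical wrapper: prefer the `timeout` binary, fall back to a watchdog.
run_with_timeout() {
  seconds=$1
  shift

  if command -v timeout >/dev/null 2>&1; then
    timeout "$seconds" "$@"
    return $?
  fi

  # Fallback: start the command, then a watchdog that kills it at the deadline.
  "$@" &
  cmd_pid=$!
  ( sleep "$seconds"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
  watchdog_pid=$!

  # Wait for the command; a killed command yields a non-zero status.
  wait "$cmd_pid"
  status=$?

  # Stop the watchdog if the command finished before the deadline.
  kill "$watchdog_pid" 2>/dev/null
  return "$status"
}

# In demo.sh the watch calls could then become, for example:
#   run_with_timeout 60 kubectl get pods -n "$APP_NAMESPACE" -w 2>/dev/null || true
```

The existing `|| true` on the watch calls would still be needed, since both paths return a non-zero status when the deadline is hit.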

pause

step "Draining all pending jobs (marking as completed)"
kubectl delete job drain-pending-jobs -n "$APP_NAMESPACE" --ignore-not-found=true 2>/dev/null
kubectl apply -f "${MANIFESTS_DIR}/drain-jobs.yaml"
kubectl wait --for=condition=complete job/drain-pending-jobs -n "$APP_NAMESPACE" --timeout=120s 2>/dev/null || true
kubectl logs job/drain-pending-jobs -n "$APP_NAMESPACE" 2>/dev/null || true

step "Watching pods scale down (cooldown: 30 seconds)"
timeout 90 kubectl get pods -n "$APP_NAMESPACE" -w 2>/dev/null || true

step "Final state"
kubectl get scaledobject,hpa -n "$APP_NAMESPACE" 2>/dev/null || true
echo ""
kubectl get pods -n "$APP_NAMESPACE" 2>/dev/null || true

echo ""
log "Demo complete!"