diff --git a/developer_manual/exapp_development/faq/Scaling.rst b/developer_manual/exapp_development/faq/Scaling.rst
deleted file mode 100644
index 117bee5d2d6..00000000000
--- a/developer_manual/exapp_development/faq/Scaling.rst
+++ /dev/null
@@ -1,22 +0,0 @@
-Scaling
-=======
-
-AppAPI delegates the scaling task to the ExApp itself.
-This means that the ExApp must be designed in a way to be able to scale vertically.
-As for the horizontal scaling, it is currently not possible except by using,
-for example, a Server-Worker architecture, which is a good way to support basic scaling capabilities.
-In this case, the Server is your ExApp and the Workers are the external machines that can work with the ExApp
-using Nextcloud user authentication.
-Additional clients (or workers) can be (optionally) added (or attached) to the ExApp
-to increase the capacity and performance.
-
-
-GPUs scaling
-------------
-
-Currently, if a Deploy daemon configured with GPUs available,
-AppAPI by default will attach all available GPU devices to each ExApp container on this Deploy daemon.
-This means that these GPUs are shared between all ExApps on the same Deploy daemon.
-Therefore, for the ExApps that require heavy use of GPUs,
-it is recommended to have a separate Deploy daemon (host) for them.
-
diff --git a/developer_manual/exapp_development/faq/index.rst b/developer_manual/exapp_development/faq/index.rst
index 97dd9ddddb9..0105f6b8f05 100644
--- a/developer_manual/exapp_development/faq/index.rst
+++ b/developer_manual/exapp_development/faq/index.rst
@@ -19,6 +19,5 @@ or provide a brief answer.
DockerContainerRegistry
DockerSocketProxy
GpuSupport
- Scaling
BehindCompanyProxy
Troubleshooting
diff --git a/developer_manual/exapp_development/index.rst b/developer_manual/exapp_development/index.rst
index 2625bde34ae..e62766e5d55 100644
--- a/developer_manual/exapp_development/index.rst
+++ b/developer_manual/exapp_development/index.rst
@@ -7,6 +7,7 @@ ExApp development
Introduction
DevSetup
+ scaling/index.rst
development_overview/index.rst
tech_details/index.rst
faq/index.rst
diff --git a/developer_manual/exapp_development/scaling/AppAPIEmulation.rst b/developer_manual/exapp_development/scaling/AppAPIEmulation.rst
new file mode 100644
index 00000000000..ec9dde77944
--- /dev/null
+++ b/developer_manual/exapp_development/scaling/AppAPIEmulation.rst
@@ -0,0 +1,380 @@
+Emulating AppAPI
+================
+
+This section documents the ``curl`` commands used to emulate AppAPI when
+testing HaRP’s Kubernetes backend.
+
+Prerequisites
+-------------
+
+- HaRP is reachable at: ``http://nextcloud.local/exapps``
+- HaRP was started with the same shared key as used below
+ (``HP_SHARED_KEY``)
+- HaRP has Kubernetes backend enabled (``HP_K8S_ENABLED=true``) and can
+ access the k8s API
+- ``kubectl`` is configured to point to the same cluster HaRP uses
+- Optional: ``jq`` for parsing JSON responses
+
+Environment variables
+---------------------
+
+.. code:: bash
+
+ # .env
+ EXAPPS_URL="http://nextcloud.local/exapps"
+ APPAPI_URL="${EXAPPS_URL}/app_api"
+ HP_SHARED_KEY="some_very_secure_password"
+
+ # Optional: Nextcloud base (only used by ExApp container env in this guide)
+ NEXTCLOUD_URL="http://nextcloud.local"
+
+.. code:: bash
+
+ source .env
+
+..
+
+ Notes:
+
+ - All AppAPI-emulation calls go to ``$APPAPI_URL/...`` and require
+ the header ``harp-shared-key``.
+ - You can also hit the agent directly on
+ ``http://127.0.0.1:8200/...`` for debugging, but that bypasses the
+ HAProxy/AppAPI path and may skip shared-key enforcement depending
+ on your routing.
+
+--------------
+
+1) Check if ExApp is present (k8s deployment exists)
+----------------------------------------------------
+
+.. code:: bash
+
+ curl -sS \
+ -H "harp-shared-key: $HP_SHARED_KEY" \
+ -H "Content-Type: application/json" \
+ -X POST \
+ -d '{
+ "name": "test-deploy",
+ "instance_id": ""
+ }' \
+ "$APPAPI_URL/k8s/exapp/exists"
+
+Expected output:
+
+.. code:: json
+
+ {"exists": true}
+
+or
+
+.. code:: json
+
+ {"exists": false}
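When scripting against this endpoint, the boolean can be checked without ``jq``; a minimal sketch, using a hard-coded sample response in place of the captured ``curl`` output:

```shell
# Sample response from the exists endpoint; in a real script this would
# be the captured output of the curl call above
RESPONSE='{"exists": true}'

# Branch on the boolean with plain shell pattern matching (no jq needed)
case "$RESPONSE" in
  *'"exists": true'*) EXAPP_EXISTS=1 ;;
  *)                  EXAPP_EXISTS=0 ;;
esac
echo "exists: $EXAPP_EXISTS"
```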
+
+--------------
+
+2) Create ExApp (PVC + Deployment with replicas=0)
+--------------------------------------------------
+
+.. code:: bash
+
+ curl -sS \
+ -H "harp-shared-key: $HP_SHARED_KEY" \
+ -H "Content-Type: application/json" \
+ -X POST \
+ -d '{
+ "name": "test-deploy",
+ "instance_id": "",
+ "image": "ghcr.io/nextcloud/test-deploy:latest",
+ "environment_variables": [
+ "APP_ID=test-deploy",
+ "APP_DISPLAY_NAME=Test Deploy",
+ "APP_VERSION=1.2.1",
+ "APP_HOST=0.0.0.0",
+ "APP_PORT=23000",
+ "NEXTCLOUD_URL='"$NEXTCLOUD_URL"'",
+ "APP_SECRET=some-dev-secret",
+ "APP_PERSISTENT_STORAGE=/nc_app_test-deploy_data"
+ ],
+ "resource_limits": { "cpu": "500m", "memory": "512Mi" }
+ }' \
+ "$APPAPI_URL/k8s/exapp/create"
+
+Expected output (example):
+
+.. code:: json
+
+ {"name":"nc-app-test-deploy"}
+
+--------------
+
+3) Start ExApp (scale replicas to 1)
+------------------------------------
+
+.. code:: bash
+
+ curl -sS \
+ -H "harp-shared-key: $HP_SHARED_KEY" \
+ -H "Content-Type: application/json" \
+ -X POST \
+ -d '{
+ "name": "test-deploy",
+ "instance_id": ""
+ }' \
+ "$APPAPI_URL/k8s/exapp/start"
+
+Expected: HTTP 204.
+
+--------------
+
+4) Wait for ExApp to become Ready
+---------------------------------
+
+.. code:: bash
+
+ curl -sS \
+ -H "harp-shared-key: $HP_SHARED_KEY" \
+ -H "Content-Type: application/json" \
+ -X POST \
+ -d '{
+ "name": "test-deploy",
+ "instance_id": ""
+ }' \
+ "$APPAPI_URL/k8s/exapp/wait_for_start"
+
+Expected output (example):
+
+.. code:: json
+
+ {
+ "started": true,
+ "status": "running",
+ "health": "ready",
+ "reason": null,
+ "message": null
+ }
+
+--------------
+
+5) Expose + register in HaRP
+----------------------------
+
+5.1 NodePort (default behavior)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Minimal (uses defaults, may auto-pick a node address):**
+
+.. code:: bash
+
+ EXPOSE_JSON=$(
+ curl -sS \
+ -H "harp-shared-key: $HP_SHARED_KEY" \
+ -H "Content-Type: application/json" \
+ -X POST \
+ -d '{
+ "name": "test-deploy",
+ "instance_id": "",
+ "port": 23000,
+ "expose_type": "nodeport"
+ }' \
+ "$APPAPI_URL/k8s/exapp/expose"
+ )
+
+ echo "$EXPOSE_JSON"
+
+**Recommended (provide a stable host reachable by HaRP):**
+
+.. code:: bash
+
+ # Example: edge node IP / VIP / L4 LB that forwards NodePort range
+ UPSTREAM_HOST="172.18.0.2"
+
+ EXPOSE_JSON=$(
+ curl -sS \
+ -H "harp-shared-key: $HP_SHARED_KEY" \
+ -H "Content-Type: application/json" \
+ -X POST \
+ -d '{
+ "name": "test-deploy",
+ "instance_id": "",
+ "port": 23000,
+ "expose_type": "nodeport",
+ "upstream_host": "'"$UPSTREAM_HOST"'"
+ }' \
+ "$APPAPI_URL/k8s/exapp/expose"
+ )
+
+ echo "$EXPOSE_JSON"
+
+5.2 ClusterIP (only if HaRP can reach ClusterIP + resolve service DNS)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ EXPOSE_JSON=$(
+ curl -sS \
+ -H "harp-shared-key: $HP_SHARED_KEY" \
+ -H "Content-Type: application/json" \
+ -X POST \
+ -d '{
+ "name": "test-deploy",
+ "instance_id": "",
+ "port": 23000,
+ "expose_type": "clusterip"
+ }' \
+ "$APPAPI_URL/k8s/exapp/expose"
+ )
+
+ echo "$EXPOSE_JSON"
+
+5.3 Manual (HaRP does not create or inspect any Service)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ EXPOSE_JSON=$(
+ curl -sS \
+ -H "harp-shared-key: $HP_SHARED_KEY" \
+ -H "Content-Type: application/json" \
+ -X POST \
+ -d '{
+ "name": "test-deploy",
+ "instance_id": "",
+ "port": 23000,
+ "expose_type": "manual",
+ "upstream_host": "exapp-test-deploy.internal",
+ "upstream_port": 23000
+ }' \
+ "$APPAPI_URL/k8s/exapp/expose"
+ )
+
+ echo "$EXPOSE_JSON"
+
+--------------
+
+6) Extract exposed host/port for follow-up tests (requires ``jq``)
+------------------------------------------------------------------
+
+.. code:: bash
+
+ EXAPP_HOST=$(echo "$EXPOSE_JSON" | jq -r '.host')
+ EXAPP_PORT=$(echo "$EXPOSE_JSON" | jq -r '.port')
+
+ echo "ExApp upstream endpoint: ${EXAPP_HOST}:${EXAPP_PORT}"
+
+--------------
+
+7) Check ``/heartbeat`` via HaRP routing (AppAPI-style direct routing headers)
+------------------------------------------------------------------------------
+
+This checks HaRP’s ability to route to the ExApp given an explicit
+upstream host/port and AppAPI-style authorization header.
+
+7.1 Build ``authorization-app-api`` value
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+HaRP typically expects this value to be the **base64-encoded value of**
+``user_id:APP_SECRET`` (similar to HTTP Basic without the ``Basic``
+prefix). For an “anonymous” style request, use ``:APP_SECRET``.
+
+.. code:: bash
+
+ # Option A: anonymous-style
+ AUTH_APP_API=$(printf '%s' ':some-dev-secret' | base64 | tr -d '\n')
+
+ # Option B: user-scoped style (example user "admin")
+ # AUTH_APP_API=$(printf '%s' 'admin:some-dev-secret' | base64 | tr -d '\n')
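To sanity-check the header value before using it, decoding it should return the original ``user_id:APP_SECRET`` pair; a quick round-trip check with the anonymous-style secret from above:

```shell
# Encode the anonymous-style credential exactly as in Option A above
AUTH_APP_API=$(printf '%s' ':some-dev-secret' | base64 | tr -d '\n')

# Decoding must give back the original ":APP_SECRET" value
DECODED=$(printf '%s' "$AUTH_APP_API" | base64 -d)
echo "$DECODED"
```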
+
+7.2 Call heartbeat
+~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ curl -sS \
+ "http://nextcloud.local/exapps/test-deploy/heartbeat" \
+ -H "harp-shared-key: $HP_SHARED_KEY" \
+ -H "ex-app-version: 1.2.1" \
+ -H "ex-app-id: test-deploy" \
+ -H "ex-app-host: $EXAPP_HOST" \
+ -H "ex-app-port: $EXAPP_PORT" \
+ -H "authorization-app-api: $AUTH_APP_API"
+
+If this fails with auth-related errors, verify:
+
+- ``APP_SECRET`` in the ExApp matches what you used here,
+- your HAProxy configuration's expectations for ``authorization-app-api``
+  (raw vs base64).
+
+--------------
+
+8) Stop and remove (API-based cleanup)
+--------------------------------------
+
+Stop ExApp (scale replicas to 0)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ curl -sS \
+ -H "harp-shared-key: $HP_SHARED_KEY" \
+ -H "Content-Type: application/json" \
+ -X POST \
+ -d '{
+ "name": "test-deploy",
+ "instance_id": ""
+ }' \
+ "$APPAPI_URL/k8s/exapp/stop"
+
+Remove ExApp (Deployment + optional PVC; Service may be removed depending on HaRP version)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ curl -sS \
+ -H "harp-shared-key: $HP_SHARED_KEY" \
+ -H "Content-Type: application/json" \
+ -X POST \
+ -d '{
+ "name": "test-deploy",
+ "instance_id": "",
+ "remove_data": true
+ }' \
+ "$APPAPI_URL/k8s/exapp/remove"
+
+--------------
+
+Useful ``kubectl`` commands (debug / manual cleanup)
+----------------------------------------------------
+
+Check resources
+~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ kubectl get deploy,svc,pvc -n nextcloud-exapps -o wide | grep -E 'test-deploy|NAME' || true
+ kubectl get pods -n nextcloud-exapps -o wide
+
+Delete Service (if it was exposed and needs manual cleanup)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ kubectl delete svc nc-app-test-deploy -n nextcloud-exapps
+
+Delete Deployment
+~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ kubectl delete deployment nc-app-test-deploy -n nextcloud-exapps
+
+Delete PVC (data)
+~~~~~~~~~~~~~~~~~
+
+PVC name is derived from ``nc_app_test-deploy_data`` and sanitized for
+k8s, typically: ``nc-app-test-deploy-data``
+
+.. code:: bash
+
+ kubectl delete pvc nc-app-test-deploy-data -n nextcloud-exapps
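The derivation can be approximated in shell; this is a sketch that assumes the sanitization is simply underscore-to-hyphen replacement (verify the real name with ``kubectl get pvc`` if unsure):

```shell
# ExApp persistent storage name as passed in APP_PERSISTENT_STORAGE
STORAGE_NAME="nc_app_test-deploy_data"

# Assumed sanitization rule: underscores are not valid in k8s resource
# names, so replace them with hyphens
PVC_NAME=$(printf '%s' "$STORAGE_NAME" | tr '_' '-')
echo "$PVC_NAME"
```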
diff --git a/developer_manual/exapp_development/scaling/KEDASetup.rst b/developer_manual/exapp_development/scaling/KEDASetup.rst
new file mode 100644
index 00000000000..f7c015add69
--- /dev/null
+++ b/developer_manual/exapp_development/scaling/KEDASetup.rst
@@ -0,0 +1,541 @@
+Autoscaling with KEDA
+=====================
+
+This section explains how to set up `KEDA <https://keda.sh/>`__ to auto-scale ExApp pods
+(e.g. ``llm2``) based on the Nextcloud TaskProcessing queue depth.
+
+Prerequisites
+-------------
+
+- A working Nextcloud + HaRP + k8s setup (see
+ :ref:`scaling-kubernetes-setup`)
+- An ExApp deployed and running (e.g. ``llm2`` with deployment name
+ ``nc-app-llm2``)
+- ``kubectl`` configured and pointing to the cluster
+- ``helm`` installed (`install
+  guide <https://helm.sh/docs/intro/install/>`__)
+- For GPU ExApps: the daemon must be registered with
+ ``--compute_device=cuda``
+
+Architecture
+------------
+
+::
+
+ Users submit tasks
+ |
+ v
+ Nextcloud TaskProcessing Queue
+ (scheduled + running tasks)
+ |
+ | GET /ocs/v2.php/taskprocessing/queue_stats
+ | Auth: Basic (admin app_password)
+ |
+ v
+ KEDA (metrics-api-server in k8s)
+ |
+ | polls every pollingInterval (e.g. 15s)
+ | scales Deployment based on queue depth
+ |
+ v
+ nc-app-llm2 Deployment (1..N pods)
+ Each pod independently calls next_task()
+
+KEDA uses a ``metrics-api`` trigger (HTTP polling) to query the Nextcloud
+``queue_stats`` endpoint. When the queue grows, KEDA scales up the ExApp
+deployment; when the queue shrinks, KEDA scales back down.
+
+--------------
+
+Step 0: GPU Setup (kind cluster)
+--------------------------------
+
+If your ExApp needs GPU (e.g. llm2), you must set up GPU passthrough in
+the kind cluster.
+
+0.1 Configure Docker on the host
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ sudo nvidia-ctk runtime configure --runtime=docker --set-as-default --cdi.enabled
+ sudo nvidia-ctk config --set accept-nvidia-visible-devices-as-volume-mounts=true --in-place
+ sudo systemctl restart docker
+
+0.2 Create kind cluster with GPU support
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: yaml
+
+ # kind-gpu-config.yaml
+ kind: Cluster
+ apiVersion: kind.x-k8s.io/v1alpha4
+ nodes:
+ - role: control-plane
+ extraMounts:
+ - hostPath: /dev/null
+ containerPath: /var/run/nvidia-container-devices/all
+
+.. code:: bash
+
+ kind create cluster --name nc-exapps --config kind-gpu-config.yaml
+
+0.3 Install nvidia-container-toolkit inside the kind node
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ docker exec nc-exapps-control-plane bash -c '
+ apt-get update -y && apt-get install -y gnupg2 curl &&
+ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
+ gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg &&
+ curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
+ sed "s#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g" \
+ > /etc/apt/sources.list.d/nvidia-container-toolkit.list &&
+ apt-get update && apt-get install -y nvidia-container-toolkit
+ '
+
+0.4 Configure containerd and restart
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ docker exec nc-exapps-control-plane bash -c '
+ nvidia-ctk runtime configure --runtime=containerd --set-as-default &&
+ systemctl restart containerd
+ '
+
+0.5 Install NVIDIA device plugin
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For a single GPU shared across multiple pods, use **time-slicing**.
+First create a ConfigMap with the number of replicas (virtual GPUs):
+
+.. code:: bash
+
+ cat <<'EOF' | kubectl apply -f -
+ apiVersion: v1
+ kind: ConfigMap
+ metadata:
+ name: nvidia-device-plugin-config
+ namespace: kube-system
+ data:
+ config.yaml: |
+ version: v1
+ sharing:
+ timeSlicing:
+ renameByDefault: false
+ resources:
+ - name: nvidia.com/gpu
+ replicas: 4
+ EOF
+
+Then deploy the device plugin with the config:
+
+.. code:: bash
+
+ cat <<'EOF' | kubectl apply -f -
+ apiVersion: apps/v1
+ kind: DaemonSet
+ metadata:
+ name: nvidia-device-plugin-daemonset
+ namespace: kube-system
+ spec:
+ selector:
+ matchLabels:
+ name: nvidia-device-plugin-ds
+ template:
+ metadata:
+ labels:
+ name: nvidia-device-plugin-ds
+ spec:
+ tolerations:
+ - key: nvidia.com/gpu
+ operator: Exists
+ effect: NoSchedule
+ priorityClassName: system-node-critical
+ containers:
+ - image: nvcr.io/nvidia/k8s-device-plugin:v0.17.0
+ name: nvidia-device-plugin-ctr
+ args: ["--config-file=/config/config.yaml"]
+ env:
+ - name: FAIL_ON_INIT_ERROR
+ value: "false"
+ securityContext:
+ allowPrivilegeEscalation: false
+ capabilities:
+ drop: ["ALL"]
+ volumeMounts:
+ - name: device-plugin
+ mountPath: /var/lib/kubelet/device-plugins
+ - name: plugin-config
+ mountPath: /config
+ volumes:
+ - name: device-plugin
+ hostPath:
+ path: /var/lib/kubelet/device-plugins
+ - name: plugin-config
+ configMap:
+ name: nvidia-device-plugin-config
+ items:
+ - key: config.yaml
+ path: config.yaml
+ EOF
+
+0.6 Verify GPU is visible
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ kubectl get nodes -o json | python3 -c "
+ import json, sys
+ for n in json.load(sys.stdin)['items']:
+     gpu = n['status']['capacity'].get('nvidia.com/gpu', 'N/A')
+     print(f'{n[\"metadata\"][\"name\"]}: nvidia.com/gpu = {gpu}')
+ "
+
+Expected: ``nvidia.com/gpu = 4`` (or your configured replicas count).
+
+0.7 Test GPU from a pod
+~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ kubectl run gpu-test --image=nvidia/cuda:12.6.3-base-ubuntu24.04 --restart=Never \
+ --overrides='{"spec":{"containers":[{"name":"gpu-test","image":"nvidia/cuda:12.6.3-base-ubuntu24.04","command":["nvidia-smi"],"resources":{"limits":{"nvidia.com/gpu":"1"}}}]}}' \
+ -n nextcloud-exapps
+ sleep 30 && kubectl logs gpu-test -n nextcloud-exapps
+ kubectl delete pod gpu-test -n nextcloud-exapps
+
+--------------
+
+Step 1: Install KEDA
+--------------------
+
+.. code:: bash
+
+ helm repo add kedacore https://kedacore.github.io/charts
+ helm repo update
+ helm install keda kedacore/keda --namespace keda --create-namespace
+
+Verify:
+
+.. code:: bash
+
+ kubectl get pods -n keda
+ # All pods should be Running
+
+--------------
+
+Step 2: DNS setup (kind only)
+-----------------------------
+
+KEDA pods need to resolve ``nextcloud.local``. **HaRP does this
+automatically now** — when ``HP_K8S_HOST_ALIASES`` is set, HaRP patches
+the CoreDNS ``ConfigMap`` on startup and restarts CoreDNS so that every
+pod in the cluster (including KEDA) can resolve the configured
+hostnames.
+
+If you need to do it manually (or verify), the commands are:
+
+.. code:: bash
+
+ # Get the nginx proxy IP
+ PROXY_IP=$(docker inspect master-proxy-1 \
+ --format '{{(index .NetworkSettings.Networks "master_default").IPAddress}}')
+ echo "Proxy IP: $PROXY_IP"
+
+ # Write the Corefile with the correct IP
+ cat > /tmp/Corefile << EOF
+ .:53 {
+ errors
+ health {
+ lameduck 5s
+ }
+ ready
+ kubernetes cluster.local in-addr.arpa ip6.arpa {
+ pods insecure
+ fallthrough in-addr.arpa ip6.arpa
+ ttl 30
+ }
+ prometheus :9153
+ hosts {
+ ${PROXY_IP} nextcloud.local
+ fallthrough
+ }
+ forward . /etc/resolv.conf {
+ max_concurrent 1000
+ }
+ cache 30
+ loop
+ reload
+ loadbalance
+ }
+ EOF
+
+ kubectl create configmap coredns -n kube-system \
+ --from-file=Corefile=/tmp/Corefile \
+ --dry-run=client -o yaml | kubectl apply -f -
+
+ kubectl rollout restart deployment coredns -n kube-system
+
+Verify:
+
+.. code:: bash
+
+ kubectl run dns-test --rm -i --restart=Never --image=busybox -- nslookup nextcloud.local
+
+--------------
+
+Step 3: Create a Nextcloud App Password
+---------------------------------------
+
+KEDA needs credentials to poll the ``queue_stats`` endpoint. The
+endpoint is admin-only.
+
+1. Log in to Nextcloud as admin
+2. Go to **Settings > Security > Devices & sessions**
+3. Enter a name (e.g. ``keda-scaler``) and click **Create new app
+ password**
+4. Copy the password into a **.env** file
+
+.. code:: bash
+
+ # .env
+ NC_USER="admin"
+ NC_APP_PASSWORD=""
+ NC_URL="https://nextcloud.local"
+
+Verify:
+
+.. code:: bash
+
+ source .env
+ curl -s -k -u "${NC_USER}:${NC_APP_PASSWORD}" \
+ "${NC_URL}/ocs/v2.php/taskprocessing/queue_stats?format=json"
+
+Expected:
+
+.. code:: json
+
+ {"ocs":{"meta":{"status":"ok","statuscode":200,"message":"OK"},"data":{"scheduled":0,"running":0}}}
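The ``valueLocation`` KEDA uses later (``ocs.data.scheduled``) reads this same JSON; the equivalent extraction in shell, using a sample response with illustrative counts:

```shell
# Sample queue_stats response (illustrative counts); in practice this is
# the captured output of the curl verification call above
RESPONSE='{"ocs":{"meta":{"status":"ok","statuscode":200,"message":"OK"},"data":{"scheduled":7,"running":2}}}'

# Walk the same path KEDA's valueLocation uses: ocs.data.scheduled
SCHEDULED=$(printf '%s' "$RESPONSE" | python3 -c 'import json, sys; print(json.load(sys.stdin)["ocs"]["data"]["scheduled"])')
echo "scheduled: $SCHEDULED"
```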
+
+--------------
+
+Step 4: Create k8s secret
+-------------------------
+
+.. code:: bash
+
+ kubectl create secret generic nextcloud-keda-auth \
+ --namespace=nextcloud-exapps \
+ --from-literal=username="${NC_USER}" \
+ --from-literal=password="${NC_APP_PASSWORD}"
+
+--------------
+
+Step 5: Create KEDA TriggerAuthentication
+-----------------------------------------
+
+.. code:: bash
+
+ cat <<'EOF' | kubectl apply -f -
+ apiVersion: keda.sh/v1alpha1
+ kind: TriggerAuthentication
+ metadata:
+ name: nextcloud-auth
+ namespace: nextcloud-exapps
+ spec:
+ secretTargetRef:
+ - parameter: username
+ name: nextcloud-keda-auth
+ key: username
+ - parameter: password
+ name: nextcloud-keda-auth
+ key: password
+ EOF
+
+--------------
+
+Step 6: Create KEDA ScaledObject
+--------------------------------
+
+.. note::
+
+ Nextcloud OCS returns XML by default. Always include ``format=json`` in the URL.
+
+Task type filter
+~~~~~~~~~~~~~~~~
+
+llm2 registers many task types. Use a comma-separated list to scale on
+all of them:
+
+::
+
+ ?taskTypeId=core:text2text,core:text2text:chat,core:text2text:summary,core:text2text:headline,core:text2text:topics,core:text2text:simplification,core:text2text:reformulation,core:contextwrite,core:text2text:changetone,core:text2text:chatwithtools,core:text2text:proofread
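If you maintain the task type list in a script, the comma-separated filter can be assembled like this (a sketch using a shortened list):

```shell
# Join task type IDs with commas: printf repeats the format per argument,
# then the trailing comma is stripped
TASK_FILTER=$(printf '%s,' core:text2text core:text2text:chat core:text2text:summary)
TASK_FILTER=${TASK_FILTER%,}
echo "taskTypeId=$TASK_FILTER"
```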
+
+Apply
+~~~~~
+
+.. code:: yaml
+
+ # keda-llm2-scaler.yaml
+ apiVersion: keda.sh/v1alpha1
+ kind: ScaledObject
+ metadata:
+ name: llm2-scaler
+ namespace: nextcloud-exapps
+ spec:
+ scaleTargetRef:
+ name: nc-app-llm2
+ pollingInterval: 15
+ cooldownPeriod: 120
+ initialCooldownPeriod: 60
+ minReplicaCount: 1
+ maxReplicaCount: 4
+ triggers:
+ - type: metrics-api
+ metadata:
+ url: "https://nextcloud.local/ocs/v2.php/taskprocessing/queue_stats?format=json&taskTypeId=core:text2text,core:text2text:chat,core:text2text:summary"
+ valueLocation: "ocs.data.scheduled"
+ targetValue: "5"
+ authMode: "basic"
+ unsafeSsl: "true"
+ authenticationRef:
+ name: nextcloud-auth
+
+.. code:: bash
+
+ kubectl apply -f keda-llm2-scaler.yaml
+
+Scaling formula
+~~~~~~~~~~~~~~~
+
+::
+
+ desiredReplicas = ceil( metricValue / targetValue )
+
+=============== ============= ===================
+Scheduled tasks targetValue=5 Result
+=============== ============= ===================
+0 \- 1 (minReplicaCount)
+3 ceil(3/5)=1 1 pod
+12 ceil(12/5)=3 3 pods
+50 ceil(50/5)=10 4 (capped at max)
+=============== ============= ===================
+
+--------------
+
+Step 7: Verify and Monitor
+--------------------------
+
+Quick status
+~~~~~~~~~~~~
+
+.. code:: bash
+
+ kubectl get scaledobject -n nextcloud-exapps && echo && \
+ kubectl get deploy nc-app-llm2 -n nextcloud-exapps && echo && \
+ kubectl get pods -n nextcloud-exapps -l app=nc-app-llm2 -o wide
+
+- ``READY=True`` - KEDA can reach the metrics endpoint
+- ``ACTIVE=False`` - no tasks queued
+- ``AVAILABLE=1`` - one pod running (minReplicaCount)
+
+Watch scaling live
+~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ # Terminal 1: pods
+ kubectl get pods -n nextcloud-exapps -l app=nc-app-llm2 -w
+
+ # Terminal 2: deployment
+ kubectl get deploy nc-app-llm2 -n nextcloud-exapps -w
+
+ # Terminal 3: KEDA logs
+ kubectl logs -n keda -l app=keda-operator -f --tail=5
+
+Check HPA (KEDA creates this)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ kubectl get hpa -n nextcloud-exapps
+ kubectl describe hpa -n nextcloud-exapps
+
+Full dashboard
+~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ echo "=== ScaledObject ===" && \
+ kubectl get scaledobject -n nextcloud-exapps && echo && \
+ echo "=== HPA ===" && \
+ kubectl get hpa -n nextcloud-exapps && echo && \
+ echo "=== Deployment ===" && \
+ kubectl get deploy nc-app-llm2 -n nextcloud-exapps && echo && \
+ echo "=== Pods ===" && \
+ kubectl get pods -n nextcloud-exapps -l app=nc-app-llm2 -o wide && echo && \
+ echo "=== Queue ===" && \
+ curl -s -k -u "${NC_USER}:${NC_APP_PASSWORD}" \
+ "${NC_URL}/ocs/v2.php/taskprocessing/queue_stats?format=json"
+
+--------------
+
+Tuning Guide
+------------
+
++---------------------------+---------+---------+-------------------------------------+
+| Parameter | Example | Default | What it does |
++===========================+=========+=========+=====================================+
+| ``pollingInterval`` | 15 | 30 | Seconds between polls. |
+| | | | Lower = faster reaction |
++---------------------------+---------+---------+-------------------------------------+
+| ``cooldownPeriod`` | 120 | 300 | Seconds to wait before scaling down |
++---------------------------+---------+---------+-------------------------------------+
+| ``initialCooldownPeriod`` | 60 | 0 | Wait after new pod starts. Set to |
+| | | | 60 for LLM model loading time |
++---------------------------+---------+---------+-------------------------------------+
+| ``minReplicaCount`` | 1 | 0 | Min pods. Must be 1+ (AppAPI needs |
+| | | | at least one pod for heartbeat) |
++---------------------------+---------+---------+-------------------------------------+
+| ``maxReplicaCount`` | 4 | 100 | Max pods. Match your GPU count or |
+| | | | time-slicing replicas |
++---------------------------+---------+---------+-------------------------------------+
+| ``targetValue`` | 5 | \- | Tasks per pod. |
+| | | | Lower = more pods sooner |
++---------------------------+---------+---------+-------------------------------------+
+
+GPU time-slicing notes
+~~~~~~~~~~~~~~~~~~~~~~
+
+- One physical GPU can be shared by multiple pods using NVIDIA
+ time-slicing
+- Each llm2 pod uses about 8GB VRAM (model dependent)
+- RTX 5090 (32GB): can run 3-4 pods with time-slicing replicas=4
+- RTX 4090 (24GB): can run 2-3 pods with time-slicing replicas=3
+- Set ``maxReplicaCount`` to match your time-slicing replicas
+- Time-slicing gives each pod an equal share of GPU time
+
+LLM notes
+~~~~~~~~~
+
+- Model loading takes 30-60s. New pods are not ready right away
+- Use ``initialCooldownPeriod`` to avoid over-scaling during warmup
+- PVC access mode is ``ReadWriteOnce``. Works on single-node only
+- Multi-node clusters are not supported yet
+
+--------------
+
+Cleanup
+-------
+
+.. code:: bash
+
+ # Remove KEDA ScaledObject
+ kubectl delete scaledobject llm2-scaler -n nextcloud-exapps
+
+ # Remove auth resources
+ kubectl delete triggerauthentication nextcloud-auth -n nextcloud-exapps
+ kubectl delete secret nextcloud-keda-auth -n nextcloud-exapps
diff --git a/developer_manual/exapp_development/scaling/KubernetesSetup.rst b/developer_manual/exapp_development/scaling/KubernetesSetup.rst
new file mode 100644
index 00000000000..53e2a986d3c
--- /dev/null
+++ b/developer_manual/exapp_development/scaling/KubernetesSetup.rst
@@ -0,0 +1,399 @@
+.. _scaling-kubernetes-setup:
+
+Setting up Kubernetes
+=====================
+
+This guide will help you set up a local Kubernetes cluster
+(via `kind <https://kind.sigs.k8s.io/>`__)
+with HaRP and AppAPI for ExApp development.
+After completing these steps you will be able to register a k8s deploy daemon in Nextcloud and deploy a test app.
+
+Prerequisites
+-------------
+
+- Docker must be installed and running
+- A `nextcloud-docker-dev <https://github.com/juliusknorr/nextcloud-docker-dev>`__ environment running at ``https://nextcloud.local``
+
+ - The Nextcloud container is on the ``master_default`` Docker
+ network
+
+- ``kubectl`` installed (`install
+  guide <https://kubernetes.io/docs/tasks/tools/>`__)
+- ``kind`` installed (`install
+  guide <https://kind.sigs.k8s.io/docs/user/quick-start/#installation>`__)
+- HaRP repository cloned (e.g. ``~/nextcloud/HaRP``)
+
+Architecture Overview
+---------------------
+
+::
+
+ Browser / OCC
+ |
+ Nextcloud (PHP, in Docker container)
+ | OCC commands or API calls
+ v
+ nginx proxy ──/exapps/──> HaRP (host network, port 8780)
+ |
+ | k8s API calls (Deployments, Services, PVCs)
+ v
+ kind cluster (nc-exapps)
+ |
+ v
+ ExApp Pod (e.g. test-deploy)
+
+- **HaRP** runs on the host network (``--network=host``) and
+ communicates with:
+
+ - The kind k8s API server (via ``https://127.0.0.1:<api-port>``)
+ - ExApp pods via NodePort services (via the kind node IP)
+
+- **Nextcloud** reaches HaRP via the Docker network gateway IP
+- **nginx proxy** forwards ``/exapps/`` requests to HaRP
+
+Step 1: Create the kind Cluster
+-------------------------------
+
+.. code:: bash
+
+ kind create cluster --name nc-exapps
+
+Verify:
+
+.. code:: bash
+
+ kubectl config use-context kind-nc-exapps
+ kubectl cluster-info
+ kubectl get nodes -o wide
+
+Note the **API server URL** (e.g. ``https://127.0.0.1:37151``) and the
+**node InternalIP** (e.g. ``172.18.0.2``):
+
+.. code:: bash
+
+ # API server
+ kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
+
+ # Node internal IP
+ kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}'
+
+Step 2: Create Namespace and RBAC
+---------------------------------
+
+.. code:: bash
+
+ # Create the ExApps namespace
+ kubectl create namespace nextcloud-exapps
+
+ # Create a ServiceAccount for HaRP
+ kubectl -n nextcloud-exapps create serviceaccount harp-exapps
+
+ # Grant cluster-admin (for development; restrict in production)
+ kubectl create clusterrolebinding harp-exapps-admin \
+ --clusterrole=cluster-admin \
+ --serviceaccount=nextcloud-exapps:harp-exapps
+
+Generate a bearer token (valid for 1 year):
+
+.. code:: bash
+
+ kubectl -n nextcloud-exapps create token harp-exapps --duration=8760h
+
+..
+
+ The ``redeploy_host_k8s.sh`` script generates this token
+ automatically, so you don’t need to copy it manually.
+
+Step 3: Configure the nginx Proxy
+---------------------------------
+
+The nextcloud-docker-dev nginx proxy must forward ``/exapps/`` to HaRP.
+
+Find the gateway IP of the ``master_default`` Docker network (this is
+how containers reach the host):
+
+.. code:: bash
+
+ docker network inspect master_default \
+ --format '{{range .IPAM.Config}}Gateway: {{.Gateway}}{{end}}'
+
+Typically this is a bridge gateway address such as ``192.168.21.1``
+(the exact value varies per machine).
+
+Edit the nginx vhost file:
+
+.. code:: bash
+
+ # Path relative to your nextcloud-docker-dev checkout:
+ # data/nginx/vhost.d/nextcloud.local_location
+
+Set the content to:
+
+.. code:: nginx
+
+ location /exapps/ {
+ set $harp_addr <gateway-ip>:8780;
+ proxy_pass http://$harp_addr;
+
+ # Forward the true client identity
+ proxy_set_header Host $host;
+ proxy_set_header X-Real-IP $remote_addr;
+ proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+ proxy_set_header X-Forwarded-Proto $scheme;
+ }
+
+Replace ``<gateway-ip>`` with the gateway from above
+(e.g. ``192.168.21.1``).
+
+Reload nginx:
+
+.. code:: bash
+
+ docker exec master-proxy-1 nginx -s reload
+
+Step 4: Build and Deploy HaRP
+-----------------------------
+
+From the HaRP repository root:
+
+.. code:: bash
+
+ cd ~/nextcloud/HaRP
+ bash development/redeploy_host_k8s.sh
+
+The script will:
+
+ 1. Auto-detect the k8s API server URL
+ 2. Generate a fresh bearer token
+ 3. Build the HaRP Docker image
+ 4. Start HaRP with k8s backend enabled on host network
+
+Wait for HaRP to become healthy:
+
+.. code:: bash
+
+ docker ps | grep harp
+ # Should show "(healthy)" after ~15 seconds
+
+Check logs if needed:
+
+.. code:: bash
+
+ docker logs appapi-harp --tail=20
+
+Step 5: Register the k8s Deploy Daemon in Nextcloud
+---------------------------------------------------
+
+Run this inside the Nextcloud container (replace ``<nc-container>`` with
+your container ID or name, and ``<gateway-ip>`` with the gateway from
+Step 3):
+
+.. code:: bash
+
+ docker exec <nc-container> php occ app_api:daemon:register \
+ k8s_local "Kubernetes Local" "kubernetes-install" \
+ "http" "<gateway-ip>:8780" "http://nextcloud.local" \
+ --harp \
+ --harp_shared_key "some_very_secure_password" \
+ --harp_frp_address "<gateway-ip>:8782" \
+ --k8s \
+ --k8s_expose_type=nodeport \
+ --set-default
+
+Verify:
+
+.. code:: bash
+
+ docker exec <nc-container> php occ app_api:daemon:list
+
+Step 6: Run Test Deploy
+-----------------------
+
+Via OCC
+~~~~~~~
+
+.. code:: bash
+
+ docker exec <nc-container> php occ app_api:app:register test-deploy k8s_local \
+ --info-xml https://raw.githubusercontent.com/nextcloud/test-deploy/main/appinfo/info.xml \
+ --test-deploy-mode
+
+Expected output:
+
+::
+
+ ExApp test-deploy deployed successfully.
+ ExApp test-deploy successfully registered.
+
+Via API (same as what the Admin UI uses)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ # Start test deploy
+ curl -X POST -u admin:admin -H "OCS-APIREQUEST: true" -k \
+ "https://nextcloud.local/index.php/apps/app_api/daemons/k8s_local/test_deploy"
+
+ # Check status
+ curl -u admin:admin -H "OCS-APIREQUEST: true" -k \
+ "https://nextcloud.local/index.php/apps/app_api/daemons/k8s_local/test_deploy/status"
+
+ # Stop and clean up
+ curl -X DELETE -u admin:admin -H "OCS-APIREQUEST: true" -k \
+ "https://nextcloud.local/index.php/apps/app_api/daemons/k8s_local/test_deploy"
+
+Verify k8s Resources
+~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ kubectl get deploy,svc,pvc,pods -n nextcloud-exapps -o wide
+
+Unregister
+~~~~~~~~~~
+
+.. code:: bash
+
+ docker exec <nextcloud_container> php occ app_api:app:unregister test-deploy
+
+Cluster Overview
+----------------
+
+==================== ===========================
+Component Value
+==================== ===========================
+**Type** kind (Kubernetes in Docker)
+**Cluster Name** ``nc-exapps``
+**Node** ``nc-exapps-control-plane``
+**ExApps Namespace** ``nextcloud-exapps``
+**ServiceAccount** ``harp-exapps``
+==================== ===========================
+
+Monitoring Commands
+-------------------
+
+Cluster Status
+~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ kubectl cluster-info
+ kubectl get nodes -o wide
+ kubectl get pods -n nextcloud-exapps
+ kubectl get pods -n nextcloud-exapps -w # watch in real-time
+
+Pod Inspection
+~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ kubectl describe pod <pod-name> -n nextcloud-exapps
+ kubectl logs <pod-name> -n nextcloud-exapps
+ kubectl logs -f <pod-name> -n nextcloud-exapps # follow logs
+ kubectl logs --previous <pod-name> -n nextcloud-exapps # after restart
+
+Resources
+~~~~~~~~~
+
+.. code:: bash
+
+ kubectl get svc,deploy,pvc -n nextcloud-exapps
+ kubectl get all -n nextcloud-exapps
+
+HaRP Logs
+~~~~~~~~~
+
+.. code:: bash
+
+ docker logs appapi-harp --tail=50
+ docker logs -f appapi-harp # follow
+
+Troubleshooting
+---------------
+
+HaRP can’t reach k8s API
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ # Check if kind container is running
+ docker ps | grep kind
+
+ # Verify API server is reachable from host
+ curl -k https://127.0.0.1:37151/version
+
+Nextcloud can’t reach HaRP
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ # From inside the Nextcloud container, test connectivity to HaRP:
+ docker exec <nextcloud_container> curl -s http://<gateway_ip>:8780/
+
+ # Should return "404 Not Found" (HaRP is responding)
+ # If connection refused: check HaRP is running and gateway IP is correct
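The decision logic from the comments above can be captured in a small helper. The helper itself is only an illustration (not an AppAPI tool); the commented usage line follows the same ``<nextcloud_container>``/``<gateway_ip>`` placeholder convention used elsewhere in this guide:

```shell
# Illustration only: map a curl HTTP status code to a diagnosis.
# curl prints "000" when the connection is refused.
interpret_harp_probe() {
  case "$1" in
    404) echo "HaRP is responding" ;;
    000) echo "connection refused: check that HaRP is running and the gateway IP is correct" ;;
    *)   echo "unexpected HTTP status: $1" ;;
  esac
}

# Real probe (placeholders as in the rest of this guide):
# interpret_harp_probe "$(docker exec <nextcloud_container> \
#   curl -s -o /dev/null -w '%{http_code}' http://<gateway_ip>:8780/)"
```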
+
+Heartbeat fails after successful deploy
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Check HaRP logs for routing errors:
+
+.. code:: bash
+
+ docker logs appapi-harp --tail=20
+
+HaRP lazily resolves the k8s Service upstream on first request after a
+restart, so restarting HaRP does **not** require re-deploying ExApps. If
+heartbeat still fails, verify the k8s Service exists and is reachable:
+
+.. code:: bash
+
+ kubectl get svc -n nextcloud-exapps
+
+Pods stuck in Pending
+~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ kubectl describe pod <pod-name> -n nextcloud-exapps
+ # Check Events section for scheduling or image pull issues
+
+Image pull errors
+~~~~~~~~~~~~~~~~~
+
+The kind cluster needs to be able to pull images. For public images
+(like ``ghcr.io/nextcloud/test-deploy:release``) this should work out of
+the box.
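If you are instead testing a locally built ExApp image, kind cannot see images from your host's Docker daemon by default, but it can side-load one into the cluster. The image tag below is a hypothetical example; the cluster name is the one used throughout this guide:

```shell
# Side-load a locally built image into the kind cluster so pods can use it
# without a registry. "my-exapp:dev" is a hypothetical local tag.
kind load docker-image my-exapp:dev --name nc-exapps
```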
+
+Token expired
+~~~~~~~~~~~~~
+
+Regenerate by rerunning the redeploy script:
+
+.. code:: bash
+
+ cd ~/nextcloud/HaRP
+ bash development/redeploy_host_k8s.sh
+
+Clean up all ExApp resources
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ kubectl delete deploy,svc,pvc -n nextcloud-exapps --all
+
+Reset everything
+~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+ # Remove daemon config
+ docker exec <nextcloud_container> php occ app_api:daemon:unregister k8s_local
+
+ # Delete kind cluster
+ kind delete cluster --name nc-exapps
+
+ # Remove HaRP container
+ docker rm -f appapi-harp
+
+Then start again from Step 1.
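If you reset often, the three teardown commands above can be wrapped in a dry-runnable function. This is a convenience sketch, not an official script; it only composes the commands already listed, and ``<nextcloud_container>`` is your container name as elsewhere in this guide:

```shell
# Wrapper around the reset steps above (a convenience sketch, not an
# official script). Pass "echo" as $1 to print the commands instead of
# running them; $2 is your Nextcloud container name.
reset_k8s_exapps() {
  run="${1:-}"   # empty = execute, "echo" = dry run
  $run docker exec "${2:-<nextcloud_container>}" php occ app_api:daemon:unregister k8s_local
  $run kind delete cluster --name nc-exapps
  $run docker rm -f appapi-harp
}

# Preview without executing anything:
# reset_k8s_exapps echo my-nextcloud
```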
diff --git a/developer_manual/exapp_development/scaling/index.rst b/developer_manual/exapp_development/scaling/index.rst
new file mode 100644
index 00000000000..f1e8aeeebb8
--- /dev/null
+++ b/developer_manual/exapp_development/scaling/index.rst
@@ -0,0 +1,32 @@
+Scaling ExApps
+==============
+
+AppAPI delegates scaling to the ExApp itself.
+This means that the ExApp must be designed so that it can scale vertically.
+For horizontal scaling, we recommend using Kubernetes.
+
+You could also implement, for example, a Server-Worker architecture for basic scaling.
+In this case, the Server is your ExApp and the Workers are external machines that work with the ExApp
+using Nextcloud user authentication.
+Additional workers can optionally be attached to the ExApp
+to increase capacity and performance.
+
+The rest of this section explains how to set up and use Kubernetes for automated scaling,
+and provides additional instructions for GPU scaling if you have a GPU device.
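As a toy illustration of the Server side of such a setup, the ExApp could fan tasks out to its attached workers round-robin. Everything here is hypothetical (the function name and worker endpoints are not part of any AppAPI interface); it only sketches the dispatch idea:

```shell
# Toy round-robin dispatch: pick which attached worker handles task N.
# Purely illustrative; the worker endpoints are hypothetical.
pick_worker() {
  idx="$1"; shift
  n=$#
  # Drop (idx mod n) workers, then the first remaining one is the pick.
  shift $(( idx % n ))
  echo "$1"
}

# pick_worker 0 http://worker-a:9000 http://worker-b:9000
# pick_worker 1 http://worker-a:9000 http://worker-b:9000
```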
+
+
+.. note::
+
+ Currently, if a Deploy daemon is configured with GPUs available,
+ AppAPI will by default attach all available GPU devices to each ExApp container on that Deploy daemon.
+ This means these GPUs are shared between all ExApps on the same Deploy daemon.
+ Therefore, it is recommended to run ExApps that make heavy use of GPUs
+ on a separate, dedicated Deploy daemon (host).
+
+
+.. toctree::
+ :maxdepth: 2
+
+ KubernetesSetup
+ KEDASetup
+ AppAPIEmulation