diff --git a/developer_manual/exapp_development/faq/Scaling.rst b/developer_manual/exapp_development/faq/Scaling.rst deleted file mode 100644 index 117bee5d2d6..00000000000 --- a/developer_manual/exapp_development/faq/Scaling.rst +++ /dev/null @@ -1,22 +0,0 @@ -Scaling -======= - -AppAPI delegates the scaling task to the ExApp itself. -This means that the ExApp must be designed in a way to be able to scale vertically. -As for the horizontal scaling, it is currently not possible except by using, -for example, a Server-Worker architecture, which is a good way to support basic scaling capabilities. -In this case, the Server is your ExApp and the Workers are the external machines that can work with the ExApp -using Nextcloud user authentication. -Additional clients (or workers) can be (optionally) added (or attached) to the ExApp -to increase the capacity and performance. - - -GPUs scaling ------------- - -Currently, if a Deploy daemon configured with GPUs available, -AppAPI by default will attach all available GPU devices to each ExApp container on this Deploy daemon. -This means that these GPUs are shared between all ExApps on the same Deploy daemon. -Therefore, for the ExApps that require heavy use of GPUs, -it is recommended to have a separate Deploy daemon (host) for them. - diff --git a/developer_manual/exapp_development/faq/index.rst b/developer_manual/exapp_development/faq/index.rst index 97dd9ddddb9..0105f6b8f05 100644 --- a/developer_manual/exapp_development/faq/index.rst +++ b/developer_manual/exapp_development/faq/index.rst @@ -19,6 +19,5 @@ or provide a brief answer. 
DockerContainerRegistry DockerSocketProxy GpuSupport - Scaling BehindCompanyProxy Troubleshooting diff --git a/developer_manual/exapp_development/index.rst b/developer_manual/exapp_development/index.rst index 2625bde34ae..e62766e5d55 100644 --- a/developer_manual/exapp_development/index.rst +++ b/developer_manual/exapp_development/index.rst @@ -7,6 +7,7 @@ ExApp development Introduction DevSetup + scaling/index.rst development_overview/index.rst tech_details/index.rst faq/index.rst diff --git a/developer_manual/exapp_development/scaling/AppAPIEmulation.rst b/developer_manual/exapp_development/scaling/AppAPIEmulation.rst new file mode 100644 index 00000000000..ec9dde77944 --- /dev/null +++ b/developer_manual/exapp_development/scaling/AppAPIEmulation.rst @@ -0,0 +1,380 @@ +Emulating AppAPI +================ + +This section documents the ``curl`` commands used to emulate AppAPI when +testing HaRP’s Kubernetes backend. + +Prerequisites +------------- + +- HaRP is reachable at: ``http://nextcloud.local/exapps`` +- HaRP was started with the same shared key as used below + (``HP_SHARED_KEY``) +- HaRP has Kubernetes backend enabled (``HP_K8S_ENABLED=true``) and can + access the k8s API +- ``kubectl`` is configured to point to the same cluster HaRP uses +- Optional: ``jq`` for parsing JSON responses + +Environment variables +--------------------- + +.. code:: bash + + # .env + EXAPPS_URL="http://nextcloud.local/exapps" + APPAPI_URL="${EXAPPS_URL}/app_api" + HP_SHARED_KEY="some_very_secure_password" + + # Optional: Nextcloud base (only used by ExApp container env in this guide) + NEXTCLOUD_URL="http://nextcloud.local" + +.. code:: bash + + source .env + +.. + + Notes: + + - All AppAPI-emulation calls go to ``$APPAPI_URL/...`` and require + the header ``harp-shared-key``. + - You can also hit the agent directly on + ``http://127.0.0.1:8200/...`` for debugging, but that bypasses the + HAProxy/AppAPI path and may skip shared-key enforcement depending + on your routing. 
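The API calls in the numbered steps of this guide all repeat the same ``curl`` options. A minimal sketch of two shell helpers that factor this out, assuming the ``.env`` file above has been sourced. ``exapp_payload`` and ``harp_post`` are hypothetical names introduced here for illustration; they are not part of HaRP or AppAPI.

```shell
# Hypothetical helper: build the minimal JSON body used by the
# exists/start/stop/wait_for_start calls. Values are inserted verbatim,
# so this assumes names without quotes or backslashes.
exapp_payload() {
  local name="$1" instance_id="${2:-}"
  printf '{"name": "%s", "instance_id": "%s"}' "$name" "$instance_id"
}

# Hypothetical helper: POST a JSON body to an endpoint below
# $APPAPI_URL/k8s/exapp/, sending the shared-key header that the
# notes above describe as required.
harp_post() {
  local endpoint="$1" body="$2"
  curl -sS \
    -H "harp-shared-key: $HP_SHARED_KEY" \
    -H "Content-Type: application/json" \
    -X POST \
    -d "$body" \
    "$APPAPI_URL/k8s/exapp/$endpoint"
}
```

For example, ``harp_post exists "$(exapp_payload test-deploy)"`` should send the same request as the ``exists`` check in step 1. The explicit ``curl`` commands remain the reference; the helpers only reduce repetition.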
+ +-------------- + +1) Check if ExApp is present (k8s deployment exists) +---------------------------------------------------- + +.. code:: bash + + curl -sS \ + -H "harp-shared-key: $HP_SHARED_KEY" \ + -H "Content-Type: application/json" \ + -X POST \ + -d '{ + "name": "test-deploy", + "instance_id": "" + }' \ + "$APPAPI_URL/k8s/exapp/exists" + +Expected output: + +.. code:: json + + {"exists": true} + +or + +.. code:: json + + {"exists": false} + +-------------- + +2) Create ExApp (PVC + Deployment with replicas=0) +-------------------------------------------------- + +.. code:: bash + + curl -sS \ + -H "harp-shared-key: $HP_SHARED_KEY" \ + -H "Content-Type: application/json" \ + -X POST \ + -d '{ + "name": "test-deploy", + "instance_id": "", + "image": "ghcr.io/nextcloud/test-deploy:latest", + "environment_variables": [ + "APP_ID=test-deploy", + "APP_DISPLAY_NAME=Test Deploy", + "APP_VERSION=1.2.1", + "APP_HOST=0.0.0.0", + "APP_PORT=23000", + "NEXTCLOUD_URL='"$NEXTCLOUD_URL"'", + "APP_SECRET=some-dev-secret", + "APP_PERSISTENT_STORAGE=/nc_app_test-deploy_data" + ], + "resource_limits": { "cpu": "500m", "memory": "512Mi" } + }' \ + "$APPAPI_URL/k8s/exapp/create" + +Expected output (example): + +.. code:: json + + {"name":"nc-app-test-deploy"} + +-------------- + +3) Start ExApp (scale replicas to 1) +------------------------------------ + +.. code:: bash + + curl -sS \ + -H "harp-shared-key: $HP_SHARED_KEY" \ + -H "Content-Type: application/json" \ + -X POST \ + -d '{ + "name": "test-deploy", + "instance_id": "" + }' \ + "$APPAPI_URL/k8s/exapp/start" + +Expected: HTTP 204. + +-------------- + +4) Wait for ExApp to become Ready +--------------------------------- + +.. code:: bash + + curl -sS \ + -H "harp-shared-key: $HP_SHARED_KEY" \ + -H "Content-Type: application/json" \ + -X POST \ + -d '{ + "name": "test-deploy", + "instance_id": "" + }' \ + "$APPAPI_URL/k8s/exapp/wait_for_start" + +Expected output (example): + +.. 
code:: json + + { + "started": true, + "status": "running", + "health": "ready", + "reason": null, + "message": null + } + +-------------- + +5) Expose + register in HaRP +---------------------------- + +5.1 NodePort (default behavior) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**Minimal (uses defaults, may auto-pick a node address):** + +.. code:: bash + + EXPOSE_JSON=$( + curl -sS \ + -H "harp-shared-key: $HP_SHARED_KEY" \ + -H "Content-Type: application/json" \ + -X POST \ + -d '{ + "name": "test-deploy", + "instance_id": "", + "port": 23000, + "expose_type": "nodeport" + }' \ + "$APPAPI_URL/k8s/exapp/expose" + ) + + echo "$EXPOSE_JSON" + +**Recommended (provide a stable host reachable by HaRP):** + +.. code:: bash + + # Example: edge node IP / VIP / L4 LB that forwards NodePort range + UPSTREAM_HOST="172.18.0.2" + + EXPOSE_JSON=$( + curl -sS \ + -H "harp-shared-key: $HP_SHARED_KEY" \ + -H "Content-Type: application/json" \ + -X POST \ + -d '{ + "name": "test-deploy", + "instance_id": "", + "port": 23000, + "expose_type": "nodeport", + "upstream_host": "'"$UPSTREAM_HOST"'" + }' \ + "$APPAPI_URL/k8s/exapp/expose" + ) + + echo "$EXPOSE_JSON" + +5.2 ClusterIP (only if HaRP can reach ClusterIP + resolve service DNS) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code:: bash + + EXPOSE_JSON=$( + curl -sS \ + -H "harp-shared-key: $HP_SHARED_KEY" \ + -H "Content-Type: application/json" \ + -X POST \ + -d '{ + "name": "test-deploy", + "instance_id": "", + "port": 23000, + "expose_type": "clusterip" + }' \ + "$APPAPI_URL/k8s/exapp/expose" + ) + + echo "$EXPOSE_JSON" + +5.3 Manual (HaRP does not create or inspect any Service) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. 
code:: bash + + EXPOSE_JSON=$( + curl -sS \ + -H "harp-shared-key: $HP_SHARED_KEY" \ + -H "Content-Type: application/json" \ + -X POST \ + -d '{ + "name": "test-deploy", + "instance_id": "", + "port": 23000, + "expose_type": "manual", + "upstream_host": "exapp-test-deploy.internal", + "upstream_port": 23000 + }' \ + "$APPAPI_URL/k8s/exapp/expose" + ) + + echo "$EXPOSE_JSON" + +-------------- + +6) Extract exposed host/port for follow-up tests (requires ``jq``) +------------------------------------------------------------------ + +.. code:: bash + + EXAPP_HOST=$(echo "$EXPOSE_JSON" | jq -r '.host') + EXAPP_PORT=$(echo "$EXPOSE_JSON" | jq -r '.port') + + echo "ExApp upstream endpoint: ${EXAPP_HOST}:${EXAPP_PORT}" + +-------------- + +7) Check ``/heartbeat`` via HaRP routing (AppAPI-style direct routing headers) +------------------------------------------------------------------------------ + +This checks HaRP’s ability to route to the ExApp given an explicit +upstream host/port and AppAPI-style authorization header. + +7.1 Build ``authorization-app-api`` value +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +HaRP typically expects this value to be the **base64-encoded value of** +``user_id:APP_SECRET`` (similar to HTTP Basic without the ``Basic`` +prefix). For an “anonymous” style request, use ``:APP_SECRET``. + +.. code:: bash + + # Option A: anonymous-style + AUTH_APP_API=$(printf '%s' ':some-dev-secret' | base64 | tr -d '\n') + + # Option B: user-scoped style (example user "admin") + # AUTH_APP_API=$(printf '%s' 'admin:some-dev-secret' | base64 | tr -d '\n') + +7.2 Call heartbeat +~~~~~~~~~~~~~~~~~~ + +.. 
code:: bash + + curl -sS \ + "http://nextcloud.local/exapps/test-deploy/heartbeat" \ + -H "harp-shared-key: $HP_SHARED_KEY" \ + -H "ex-app-version: 1.2.1" \ + -H "ex-app-id: test-deploy" \ + -H "ex-app-host: $EXAPP_HOST" \ + -H "ex-app-port: $EXAPP_PORT" \ + -H "authorization-app-api: $AUTH_APP_API" + +If this fails with auth-related errors, verify: + +- ``APP_SECRET`` in the ExApp matches what you used here, +- the format your HAProxy config expects for ``authorization-app-api`` (raw + vs base64). + +-------------- + +8) Stop and remove (API-based cleanup) +-------------------------------------- + +Stop ExApp (scale replicas to 0) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code:: bash + + curl -sS \ + -H "harp-shared-key: $HP_SHARED_KEY" \ + -H "Content-Type: application/json" \ + -X POST \ + -d '{ + "name": "test-deploy", + "instance_id": "" + }' \ + "$APPAPI_URL/k8s/exapp/stop" + +Remove ExApp (Deployment + optional PVC; Service may be removed depending on HaRP version) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code:: bash + + curl -sS \ + -H "harp-shared-key: $HP_SHARED_KEY" \ + -H "Content-Type: application/json" \ + -X POST \ + -d '{ + "name": "test-deploy", + "instance_id": "", + "remove_data": true + }' \ + "$APPAPI_URL/k8s/exapp/remove" + +-------------- + +Useful ``kubectl`` commands (debug / manual cleanup) +---------------------------------------------------- + +Check resources +~~~~~~~~~~~~~~~ + +.. code:: bash + + kubectl get deploy,svc,pvc -n nextcloud-exapps -o wide | grep -E 'test-deploy|NAME' || true + kubectl get pods -n nextcloud-exapps -o wide + +Delete Service (if it was exposed and needs manual cleanup) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code:: bash + + kubectl delete svc nc-app-test-deploy -n nextcloud-exapps + +Delete Deployment +~~~~~~~~~~~~~~~~~ + +.. 
code:: bash + + kubectl delete deployment nc-app-test-deploy -n nextcloud-exapps + +Delete PVC (data) +~~~~~~~~~~~~~~~~~ + +The PVC name is derived from ``nc_app_test-deploy_data`` and sanitized for +k8s, typically: ``nc-app-test-deploy-data`` + +.. code:: bash + + kubectl delete pvc nc-app-test-deploy-data -n nextcloud-exapps diff --git a/developer_manual/exapp_development/scaling/KEDASetup.rst b/developer_manual/exapp_development/scaling/KEDASetup.rst new file mode 100644 index 00000000000..f7c015add69 --- /dev/null +++ b/developer_manual/exapp_development/scaling/KEDASetup.rst @@ -0,0 +1,541 @@ +Autoscaling with KEDA +===================== + +This section explains how to set up `KEDA `__ to auto-scale ExApp pods +(e.g. ``llm2``) based on the Nextcloud TaskProcessing queue depth. + +Prerequisites +------------- + +- A working Nextcloud + HaRP + k8s setup (see + :ref:`scaling-kubernetes-setup`) +- An ExApp deployed and running (e.g. ``llm2`` with deployment name + ``nc-app-llm2``) +- ``kubectl`` configured and pointing to the cluster +- ``helm`` installed (`install + guide `__) +- For GPU ExApps: the daemon must be registered with + ``--compute_device=cuda`` + +Architecture +------------ + +:: + + Users submit tasks + | + v + Nextcloud TaskProcessing Queue + (scheduled + running tasks) + | + | GET /ocs/v2.php/taskprocessing/queue_stats + | Auth: Basic (admin app_password) + | + v + KEDA (metrics-api-server in k8s) + | + | polls every pollingInterval (e.g. 15s) + | scales Deployment based on queue depth + | + v + nc-app-llm2 Deployment (1..N pods) + Each pod independently calls next_task() + +KEDA uses a ``metrics-api`` trigger (HTTP polling) to query the Nextcloud +``queue_stats`` endpoint. When the queue grows, KEDA scales up the ExApp +deployment. When the queue shrinks, KEDA scales it back down. + +-------------- + +Step 0: GPU Setup (kind cluster) +-------------------------------- + +If your ExApp needs a GPU (e.g. 
llm2), you must set up GPU passthrough in +the kind cluster. + +0.1 Configure Docker on the host +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code:: bash + + sudo nvidia-ctk runtime configure --runtime=docker --set-as-default --cdi.enabled + sudo nvidia-ctk config --set accept-nvidia-visible-devices-as-volume-mounts=true --in-place + sudo systemctl restart docker + +0.2 Create kind cluster with GPU support +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code:: yaml + + # kind-gpu-config.yaml + kind: Cluster + apiVersion: kind.x-k8s.io/v1alpha4 + nodes: + - role: control-plane + extraMounts: + - hostPath: /dev/null + containerPath: /var/run/nvidia-container-devices/all + +.. code:: bash + + kind create cluster --name nc-exapps --config kind-gpu-config.yaml + +0.3 Install nvidia-container-toolkit inside the kind node +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code:: bash + + docker exec nc-exapps-control-plane bash -c ' + apt-get update -y && apt-get install -y gnupg2 curl && + curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \ + gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && + curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \ + sed "s#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g" \ + > /etc/apt/sources.list.d/nvidia-container-toolkit.list && + apt-get update && apt-get install -y nvidia-container-toolkit + ' + +0.4 Configure containerd and restart +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code:: bash + + docker exec nc-exapps-control-plane bash -c ' + nvidia-ctk runtime configure --runtime=containerd --set-as-default && + systemctl restart containerd + ' + +0.5 Install NVIDIA device plugin +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For a single GPU shared across multiple pods, use **time-slicing**. +First create a ConfigMap with the number of replicas (virtual GPUs): + +.. 
code:: bash + + cat <<'EOF' | kubectl apply -f - + apiVersion: v1 + kind: ConfigMap + metadata: + name: nvidia-device-plugin-config + namespace: kube-system + data: + config.yaml: | + version: v1 + sharing: + timeSlicing: + renameByDefault: false + resources: + - name: nvidia.com/gpu + replicas: 4 + EOF + +Then deploy the device plugin with the config: + +.. code:: bash + + cat <<'EOF' | kubectl apply -f - + apiVersion: apps/v1 + kind: DaemonSet + metadata: + name: nvidia-device-plugin-daemonset + namespace: kube-system + spec: + selector: + matchLabels: + name: nvidia-device-plugin-ds + template: + metadata: + labels: + name: nvidia-device-plugin-ds + spec: + tolerations: + - key: nvidia.com/gpu + operator: Exists + effect: NoSchedule + priorityClassName: system-node-critical + containers: + - image: nvcr.io/nvidia/k8s-device-plugin:v0.17.0 + name: nvidia-device-plugin-ctr + args: ["--config-file=/config/config.yaml"] + env: + - name: FAIL_ON_INIT_ERROR + value: "false" + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: ["ALL"] + volumeMounts: + - name: device-plugin + mountPath: /var/lib/kubelet/device-plugins + - name: plugin-config + mountPath: /config + volumes: + - name: device-plugin + hostPath: + path: /var/lib/kubelet/device-plugins + - name: plugin-config + configMap: + name: nvidia-device-plugin-config + items: + - key: config.yaml + path: config.yaml + EOF + +0.6 Verify GPU is visible +~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code:: bash + + kubectl get nodes -o json | python3 -c " + import json,sys + for n in json.load(sys.stdin)['items']: + gpu = n['status']['capacity'].get('nvidia.com/gpu','N/A') + print(f'{n[\"metadata\"][\"name\"]}: nvidia.com/gpu = {gpu}') + " + +Expected: ``nvidia.com/gpu = 4`` (or your configured replicas count). + +0.7 Test GPU from a pod +~~~~~~~~~~~~~~~~~~~~~~~ + +.. 
code:: bash + + kubectl run gpu-test --image=nvidia/cuda:12.6.3-base-ubuntu24.04 --restart=Never \ + --overrides='{"spec":{"containers":[{"name":"gpu-test","image":"nvidia/cuda:12.6.3-base-ubuntu24.04","command":["nvidia-smi"],"resources":{"limits":{"nvidia.com/gpu":"1"}}}]}}' \ + -n nextcloud-exapps + sleep 30 && kubectl logs gpu-test -n nextcloud-exapps + kubectl delete pod gpu-test -n nextcloud-exapps + +-------------- + +Step 1: Install KEDA +-------------------- + +.. code:: bash + + helm repo add kedacore https://kedacore.github.io/charts + helm repo update + helm install keda kedacore/keda --namespace keda --create-namespace + +Verify: + +.. code:: bash + + kubectl get pods -n keda + # All pods should be Running + +-------------- + +Step 2: DNS setup (kind only) +----------------------------- + +KEDA pods need to resolve ``nextcloud.local``. **HaRP does this +automatically now** — when ``HP_K8S_HOST_ALIASES`` is set, HaRP patches +the CoreDNS ``ConfigMap`` on startup and restarts CoreDNS so that every +pod in the cluster (including KEDA) can resolve the configured +hostnames. + +If you need to do it manually (or verify), the commands are: + +.. code:: bash + + # Get the nginx proxy IP + PROXY_IP=$(docker inspect master-proxy-1 \ + --format '{{(index .NetworkSettings.Networks "master_default").IPAddress}}') + echo "Proxy IP: $PROXY_IP" + + # Write the Corefile with the correct IP + cat > /tmp/Corefile << EOF + .:53 { + errors + health { + lameduck 5s + } + ready + kubernetes cluster.local in-addr.arpa ip6.arpa { + pods insecure + fallthrough in-addr.arpa ip6.arpa + ttl 30 + } + prometheus :9153 + hosts { + ${PROXY_IP} nextcloud.local + fallthrough + } + forward . 
/etc/resolv.conf { + max_concurrent 1000 + } + cache 30 + loop + reload + loadbalance + } + EOF + + kubectl create configmap coredns -n kube-system \ + --from-file=Corefile=/tmp/Corefile \ + --dry-run=client -o yaml | kubectl apply -f - + + kubectl rollout restart deployment coredns -n kube-system + +Verify: + +.. code:: bash + + kubectl run dns-test --rm -i --restart=Never --image=busybox -- nslookup nextcloud.local + +-------------- + +Step 3: Create a Nextcloud App Password +--------------------------------------- + +KEDA needs credentials to poll the ``queue_stats`` endpoint. The +endpoint is admin-only. + +1. Log in to Nextcloud as admin +2. Go to **Settings > Security > Devices & sessions** +3. Enter a name (e.g. ``keda-scaler``) and click **Create new app + password** +4. Copy the password into a **.env** file + +.. code:: bash + + # .env + NC_USER="admin" + NC_APP_PASSWORD="" + NC_URL="https://nextcloud.local" + +Verify: + +.. code:: bash + + source .env + curl -s -k -u "${NC_USER}:${NC_APP_PASSWORD}" \ + "${NC_URL}/ocs/v2.php/taskprocessing/queue_stats?format=json" + +Expected: + +.. code:: json + + {"ocs":{"meta":{"status":"ok","statuscode":200,"message":"OK"},"data":{"scheduled":0,"running":0}}} + +-------------- + +Step 4: Create k8s secret +------------------------- + +.. code:: bash + + kubectl create secret generic nextcloud-keda-auth \ + --namespace=nextcloud-exapps \ + --from-literal=username="${NC_USER}" \ + --from-literal=password="${NC_APP_PASSWORD}" + +-------------- + +Step 5: Create KEDA TriggerAuthentication +----------------------------------------- + +.. 
code:: bash + + cat <<'EOF' | kubectl apply -f - + apiVersion: keda.sh/v1alpha1 + kind: TriggerAuthentication + metadata: + name: nextcloud-auth + namespace: nextcloud-exapps + spec: + secretTargetRef: + - parameter: username + name: nextcloud-keda-auth + key: username + - parameter: password + name: nextcloud-keda-auth + key: password + EOF + +-------------- + +Step 6: Create KEDA ScaledObject +-------------------------------- + +.. note:: + + Nextcloud OCS returns XML by default. Always include ``format=json`` in the URL. + +Task type filter +~~~~~~~~~~~~~~~~ + +llm2 registers many task types. Use a comma-separated list to scale on +all of them: + +:: + + ?taskTypeId=core:text2text,core:text2text:chat,core:text2text:summary,core:text2text:headline,core:text2text:topics,core:text2text:simplification,core:text2text:reformulation,core:contextwrite,core:text2text:changetone,core:text2text:chatwithtools,core:text2text:proofread + +Apply +~~~~~ + +.. code:: yaml + + # keda-llm2-scaler.yaml + apiVersion: keda.sh/v1alpha1 + kind: ScaledObject + metadata: + name: llm2-scaler + namespace: nextcloud-exapps + spec: + scaleTargetRef: + name: nc-app-llm2 + pollingInterval: 15 + cooldownPeriod: 120 + initialCooldownPeriod: 60 + minReplicaCount: 1 + maxReplicaCount: 4 + triggers: + - type: metrics-api + metadata: + url: "https://nextcloud.local/ocs/v2.php/taskprocessing/queue_stats?format=json&taskTypeId=core:text2text,core:text2text:chat,core:text2text:summary" + valueLocation: "ocs.data.scheduled" + targetValue: "5" + authMode: "basic" + unsafeSsl: "true" + authenticationRef: + name: nextcloud-auth + +.. 
code:: bash + + kubectl apply -f keda-llm2-scaler.yaml + +Scaling formula +~~~~~~~~~~~~~~~ + +:: + + desiredReplicas = ceil( metricValue / targetValue ) + +=============== ============= =================== +Scheduled tasks targetValue=5 Result +=============== ============= =================== +0 \- 1 (minReplicaCount) +3 ceil(3/5)=1 1 pod +12 ceil(12/5)=3 3 pods +50 ceil(50/5)=10 4 (capped at max) +=============== ============= =================== + +-------------- + +Step 7: Verify and Monitor +-------------------------- + +Quick status +~~~~~~~~~~~~ + +.. code:: bash + + kubectl get scaledobject -n nextcloud-exapps && echo && \ + kubectl get deploy nc-app-llm2 -n nextcloud-exapps && echo && \ + kubectl get pods -n nextcloud-exapps -l app=nc-app-llm2 -o wide + +- ``READY=True`` - KEDA can reach the metrics endpoint +- ``ACTIVE=False`` - no tasks queued +- ``AVAILABLE=1`` - one pod running (minReplicaCount) + +Watch scaling live +~~~~~~~~~~~~~~~~~~ + +.. code:: bash + + # Terminal 1: pods + kubectl get pods -n nextcloud-exapps -l app=nc-app-llm2 -w + + # Terminal 2: deployment + kubectl get deploy nc-app-llm2 -n nextcloud-exapps -w + + # Terminal 3: KEDA logs + kubectl logs -n keda -l app=keda-operator -f --tail=5 + +Check HPA (KEDA creates this) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code:: bash + + kubectl get hpa -n nextcloud-exapps + kubectl describe hpa -n nextcloud-exapps + +Full dashboard +~~~~~~~~~~~~~~ + +.. 
code:: bash + + echo "=== ScaledObject ===" && \ + kubectl get scaledobject -n nextcloud-exapps && echo && \ + echo "=== HPA ===" && \ + kubectl get hpa -n nextcloud-exapps && echo && \ + echo "=== Deployment ===" && \ + kubectl get deploy nc-app-llm2 -n nextcloud-exapps && echo && \ + echo "=== Pods ===" && \ + kubectl get pods -n nextcloud-exapps -l app=nc-app-llm2 -o wide && echo && \ + echo "=== Queue ===" && \ + curl -s -k -u "${NC_USER}:${NC_APP_PASSWORD}" \ + "${NC_URL}/ocs/v2.php/taskprocessing/queue_stats?format=json" + +-------------- + +Tuning Guide +------------ + ++---------------------------+---------+---------+-------------------------------------+ +| Parameter | Example | Default | What it does | ++===========================+=========+=========+=====================================+ +| ``pollingInterval`` | 15 | 30 | Seconds between polls. | +| | | | Lower = faster reaction | ++---------------------------+---------+---------+-------------------------------------+ +| ``cooldownPeriod`` | 120 | 300 | Seconds to wait before scaling down | ++---------------------------+---------+---------+-------------------------------------+ +| ``initialCooldownPeriod`` | 60 | 0 | Wait after new pod starts. Set to | +| | | | 60 for LLM model loading time | ++---------------------------+---------+---------+-------------------------------------+ +| ``minReplicaCount`` | 1 | 0 | Min pods. Must be 1+ (AppAPI needs | +| | | | at least one pod for heartbeat) | ++---------------------------+---------+---------+-------------------------------------+ +| ``maxReplicaCount`` | 4 | 100 | Max pods. Match your GPU count or | +| | | | time-slicing replicas | ++---------------------------+---------+---------+-------------------------------------+ +| ``targetValue`` | 5 | \- | Tasks per pod. 
| +| | | | Lower = more pods sooner | ++---------------------------+---------+---------+-------------------------------------+ + +GPU time-slicing notes +~~~~~~~~~~~~~~~~~~~~~~ + +- One physical GPU can be shared by multiple pods using NVIDIA + time-slicing +- Each llm2 pod uses about 8GB VRAM (model dependent) +- RTX 5090 (32GB): can run 3-4 pods with time-slicing replicas=4 +- RTX 4090 (24GB): can run 2-3 pods with time-slicing replicas=3 +- Set ``maxReplicaCount`` to match your time-slicing replicas +- CUDA gives each pod equal GPU time + +LLM notes +~~~~~~~~~ + +- Model loading takes 30-60s. New pods are not ready right away +- Use ``initialCooldownPeriod`` to avoid over-scaling during warmup +- PVC access mode is ``ReadWriteOnce``. Works on single-node only +- Multi-node clusters are not supported yet + +-------------- + +Cleanup +------- + +.. code:: bash + + # Remove KEDA ScaledObject + kubectl delete scaledobject llm2-scaler -n nextcloud-exapps + + # Remove auth resources + kubectl delete triggerauthentication nextcloud-auth -n nextcloud-exapps + kubectl delete secret nextcloud-keda-auth -n nextcloud-exapps diff --git a/developer_manual/exapp_development/scaling/KubernetesSetup.rst b/developer_manual/exapp_development/scaling/KubernetesSetup.rst new file mode 100644 index 00000000000..53e2a986d3c --- /dev/null +++ b/developer_manual/exapp_development/scaling/KubernetesSetup.rst @@ -0,0 +1,399 @@ +.. _scaling-kubernetes-setup: + +Setting up Kubernetes +===================== + +This guide will help you set up a local Kubernetes cluster +(via `kind `__) +with HaRP and AppAPI for ExApp development. +After completing these steps you will be able to register a k8s deploy daemon in Nextcloud and deploy a test app. 
+ +Prerequisites +------------- + +- Docker must be installed and running +- A `nextcloud-docker-dev `__ environment running at ``https://nextcloud.local`` + + - The Nextcloud container is on the ``master_default`` Docker + network + +- ``kubectl`` installed (`install + guide `__) +- ``kind`` installed (`install + guide `__) +- HaRP repository cloned (e.g. ``~/nextcloud/HaRP``) + +Architecture Overview +--------------------- + +:: + + Browser / OCC + | + Nextcloud (PHP, in Docker container) + | OCC commands or API calls + v + nginx proxy ──/exapps/──> HaRP (host network, port 8780) + | + | k8s API calls (Deployments, Services, PVCs) + v + kind cluster (nc-exapps) + | + v + ExApp Pod (e.g. test-deploy) + +- **HaRP** runs on the host network (``--network=host``) and + communicates with: + + - The kind k8s API server (via ``https://127.0.0.1:``) + - ExApp pods via NodePort services (via the kind node IP) + +- **Nextcloud** reaches HaRP via the Docker network gateway IP +- **nginx proxy** forwards ``/exapps/`` requests to HaRP + +Step 1: Create the kind Cluster +------------------------------- + +.. code:: bash + + kind create cluster --name nc-exapps + +Verify: + +.. code:: bash + + kubectl config use-context kind-nc-exapps + kubectl cluster-info + kubectl get nodes -o wide + +Note the **API server URL** (e.g. ``https://127.0.0.1:37151``) and the +**node InternalIP** (e.g. ``172.18.0.2``): + +.. code:: bash + + # API server + kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}' + + # Node internal IP + kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}' + +Step 2: Create Namespace and RBAC +--------------------------------- + +.. 
code:: bash + + # Create the ExApps namespace + kubectl create namespace nextcloud-exapps + + # Create a ServiceAccount for HaRP + kubectl -n nextcloud-exapps create serviceaccount harp-exapps + + # Grant cluster-admin (for development; restrict in production) + kubectl create clusterrolebinding harp-exapps-admin \ + --clusterrole=cluster-admin \ + --serviceaccount=nextcloud-exapps:harp-exapps + +Generate a bearer token (valid for 1 year): + +.. code:: bash + + kubectl -n nextcloud-exapps create token harp-exapps --duration=8760h + +.. + + The ``redeploy_host_k8s.sh`` script generates this token + automatically, so you don’t need to copy it manually. + +Step 3: Configure the nginx Proxy +--------------------------------- + +The nextcloud-docker-dev nginx proxy must forward ``/exapps/`` to HaRP. + +Find the gateway IP of the ``master_default`` Docker network (this is +how containers reach the host): + +.. code:: bash + + docker network inspect master_default \ + --format '{{range .IPAM.Config}}Gateway: {{.Gateway}}{{end}}' + +Typically this is your host IP like ``192.168.21.1`` (may vary on your +machine). + +Edit the nginx vhost file: + +.. code:: bash + + # Path relative to your nextcloud-docker-dev checkout: + # data/nginx/vhost.d/nextcloud.local_location + +Set the content to: + +.. code:: nginx + + location /exapps/ { + set $harp_addr :8780; + proxy_pass http://$harp_addr; + + # Forward the true client identity + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + +Replace ```` with the gateway from above +(e.g. ``192.168.21.1``). + +Reload nginx: + +.. code:: bash + + docker exec master-proxy-1 nginx -s reload + +Step 4: Build and Deploy HaRP +----------------------------- + +From the HaRP repository root: + +.. code:: bash + + cd ~/nextcloud/HaRP + bash development/redeploy_host_k8s.sh + +The script will: + + 1. 
Auto-detect the k8s API server URL
+   2. Generate a fresh bearer token
+   3. Build the HaRP Docker image
+   4. Start HaRP with k8s backend enabled on host network
+
+Wait for HaRP to become healthy:
+
+.. code:: bash
+
+   docker ps | grep harp
+   # Should show "(healthy)" after ~15 seconds
+
+Check logs if needed:
+
+.. code:: bash
+
+   docker logs appapi-harp --tail=20
+
+Step 5: Register the k8s Deploy Daemon in Nextcloud
+---------------------------------------------------
+
+Run this inside the Nextcloud container (replace ```` with
+your container ID or name, and ```` with the gateway from
+Step 3):
+
+.. code:: bash
+
+   docker exec php occ app_api:daemon:register \
+     k8s_local "Kubernetes Local" "kubernetes-install" \
+     "http" ":8780" "http://nextcloud.local" \
+     --harp \
+     --harp_shared_key "some_very_secure_password" \
+     --harp_frp_address ":8782" \
+     --k8s \
+     --k8s_expose_type=nodeport \
+     --set-default
+
+Verify:
+
+.. code:: bash
+
+   docker exec php occ app_api:daemon:list
+
+Step 6: Run Test Deploy
+-----------------------
+
+Via OCC
+~~~~~~~
+
+.. code:: bash
+
+   docker exec php occ app_api:app:register test-deploy k8s_local \
+     --info-xml https://raw.githubusercontent.com/nextcloud/test-deploy/main/appinfo/info.xml \
+     --test-deploy-mode
+
+Expected output:
+
+::
+
+   ExApp test-deploy deployed successfully.
+   ExApp test-deploy successfully registered.
+
+Via API (same as what the Admin UI uses)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+   # Start test deploy
+   curl -X POST -u admin:admin -H "OCS-APIREQUEST: true" -k \
+     "https://nextcloud.local/index.php/apps/app_api/daemons/k8s_local/test_deploy"
+
+   # Check status
+   curl -u admin:admin -H "OCS-APIREQUEST: true" -k \
+     "https://nextcloud.local/index.php/apps/app_api/daemons/k8s_local/test_deploy/status"
+
+   # Stop and clean up
+   curl -X DELETE -u admin:admin -H "OCS-APIREQUEST: true" -k \
+     "https://nextcloud.local/index.php/apps/app_api/daemons/k8s_local/test_deploy"
+
+Verify k8s Resources
+~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+   kubectl get deploy,svc,pvc,pods -n nextcloud-exapps -o wide
+
+Unregister
+~~~~~~~~~~
+
+.. code:: bash
+
+   docker exec php occ app_api:app:unregister test-deploy
+
+Cluster Overview
+----------------
+
+==================== ===========================
+Component            Value
+==================== ===========================
+**Type**             kind (Kubernetes in Docker)
+**Cluster Name**     ``nc-exapps``
+**Node**             ``nc-exapps-control-plane``
+**ExApps Namespace** ``nextcloud-exapps``
+**ServiceAccount**   ``harp-exapps``
+==================== ===========================
+
+Monitoring Commands
+-------------------
+
+Cluster Status
+~~~~~~~~~~~~~~
+
+.. code:: bash
+
+   kubectl cluster-info
+   kubectl get nodes -o wide
+   kubectl get pods -n nextcloud-exapps
+   kubectl get pods -n nextcloud-exapps -w # watch in real-time
+
+Pod Inspection
+~~~~~~~~~~~~~~
+
+.. code:: bash
+
+   kubectl describe pod -n nextcloud-exapps
+   kubectl logs -n nextcloud-exapps
+   kubectl logs -f -n nextcloud-exapps # follow logs
+   kubectl logs --previous -n nextcloud-exapps # after restart
+
+Resources
+~~~~~~~~~
+
+.. code:: bash
+
+   kubectl get svc,deploy,pvc -n nextcloud-exapps
+   kubectl get all -n nextcloud-exapps
+
+HaRP Logs
+~~~~~~~~~
+
+.. code:: bash
+
+   docker logs appapi-harp --tail=50
+   docker logs -f appapi-harp # follow
+
+Troubleshooting
+---------------
+
+HaRP can’t reach k8s API
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+   # Check if kind container is running
+   docker ps | grep kind
+
+   # Verify API server is reachable from host
+   curl -k https://127.0.0.1:37151/version
+
+Nextcloud can’t reach HaRP
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+   # From inside the Nextcloud container, test connectivity to HaRP:
+   docker exec curl -s http://:8780/
+
+   # Should return "404 Not Found" (HaRP is responding)
+   # If connection refused: check HaRP is running and gateway IP is correct
+
+Heartbeat fails after successful deploy
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Check HaRP logs for routing errors:
+
+.. code:: bash
+
+   docker logs appapi-harp --tail=20
+
+HaRP lazily resolves the k8s Service upstream on first request after a
+restart, so restarting HaRP does **not** require re-deploying ExApps. If
+heartbeat still fails, verify the k8s Service exists and is reachable:
+
+.. code:: bash
+
+   kubectl get svc -n nextcloud-exapps
+
+Pods stuck in Pending
+~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+   kubectl describe pod -n nextcloud-exapps
+   # Check Events section for scheduling or image pull issues
+
+Image pull errors
+~~~~~~~~~~~~~~~~~
+
+The kind cluster needs to be able to pull images. For public images
+(like ``ghcr.io/nextcloud/test-deploy:release``) this should work out of
+the box.
+
+Token expired
+~~~~~~~~~~~~~
+
+Regenerate by rerunning the redeploy script:
+
+.. code:: bash
+
+   cd ~/nextcloud/HaRP
+   bash development/redeploy_host_k8s.sh
+
+Clean up all ExApp resources
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+   kubectl delete deploy,svc,pvc -n nextcloud-exapps --all
+
+Reset everything
+~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+   # Remove daemon config
+   docker exec php occ app_api:daemon:unregister k8s_local
+
+   # Delete kind cluster
+   kind delete cluster --name nc-exapps
+
+   # Remove HaRP container
+   docker rm -f appapi-harp
+
+Then start again from Step 1.
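Reviewer note (not part of the patch): the "Wait for HaRP to become healthy" step relies on watching ``docker ps`` by hand. When the redeploy or reset paths are run repeatedly, that wait could be scripted. The sketch below is only an illustration: the function name ``wait_for_healthy``, the 2-second polling interval, and the default 60-second timeout are assumptions rather than anything HaRP ships. It assumes the container defines a Docker health check, which the "(healthy)" marker in ``docker ps`` suggests.

```shell
# Hypothetical helper: poll a container's Docker health status until it
# reports "healthy" or the timeout (in seconds) runs out.
# Usage: wait_for_healthy appapi-harp 60
wait_for_healthy() {
    # Plain variables (no "local") keep this portable across POSIX shells.
    name="$1"
    timeout="${2:-60}"
    elapsed=0
    status=""
    while [ "$elapsed" -lt "$timeout" ]; do
        # Prints starting, healthy, or unhealthy for containers with a health check.
        status="$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null || true)"
        if [ "$status" = "healthy" ]; then
            echo "$name is healthy"
            return 0
        fi
        sleep 2
        elapsed=$((elapsed + 2))
    done
    echo "$name did not become healthy within ${timeout}s (last status: ${status:-unknown})" >&2
    return 1
}
```

Calling ``wait_for_healthy appapi-harp 60`` before Step 5 would stop a script early if HaRP never becomes healthy, at which point ``docker logs appapi-harp --tail=20`` is the next thing to check.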
diff --git a/developer_manual/exapp_development/scaling/index.rst b/developer_manual/exapp_development/scaling/index.rst
new file mode 100644
index 00000000000..f1e8aeeebb8
--- /dev/null
+++ b/developer_manual/exapp_development/scaling/index.rst
@@ -0,0 +1,32 @@
+Scaling ExApps
+==============
+
+AppAPI delegates the scaling task to the ExApp itself.
+This means that the ExApp must be designed so that it can scale vertically.
+For horizontal scaling, we recommend using Kubernetes.
+
+You could also implement, for example, a Server-Worker architecture for basic scaling.
+In this case, the Server is your ExApp and the Workers are external machines that work with the ExApp
+using Nextcloud user authentication.
+Additional workers can optionally be attached to the ExApp
+to increase capacity and performance.
+
+The rest of this section explains how to set up and use Kubernetes for automated scaling.
+Additional instructions are provided for GPU scaling if you have a GPU device.
+
+
+.. note::
+
+   Currently, if a Deploy daemon is configured with GPUs available,
+   AppAPI will by default attach all available GPU devices to each ExApp container on this Deploy daemon.
+   This means that these GPUs are shared between all ExApps on the same Deploy daemon.
+   Therefore, for ExApps that make heavy use of GPUs,
+   it is recommended to use a separate Deploy daemon (host).
+
+
+.. toctree::
+   :maxdepth: 2
+
+   KubernetesSetup
+   KEDASetup
+   AppAPIEmulation