generative-computing · psschwei · May 29, 2026
@@ -18,7 +18,7 @@
     "\n",
     "## Prerequisites\n",
     "\n",
-    "- **GPU runtime: A100 (Colab Pro) recommended.** L4 works. T4 will OOM — both Granite models won't fit.\n",
+    "- **GPU runtime: A100 (Colab Pro) required.** Smaller GPUs won't have enough VRAM to hold both Granite models simultaneously.\n",
     "- **HuggingFace read token.** Free; create one at https://huggingface.co/settings/tokens. Add it as a Colab Secret named `HF_TOKEN` (sidebar → 🔑 → New secret). Used for two things: downloading the Granite model weights, *and* minting per-session WebRTC TURN credentials so audio reaches your browser.\n",
     "- **Browser:** Chrome, Edge, or Firefox. Safari may behave oddly with WebRTC.\n",
     "\n",
@@ -30,7 +30,7 @@
     "## What to do\n",
     "\n",
     "1. Set the `HF_TOKEN` Colab Secret.\n",
-    "2. Switch the runtime to a GPU (Runtime → Change runtime type → A100/L4).\n",
+    "2. Switch the runtime to an A100 GPU (Runtime → Change runtime type → A100).\n",
     "3. **Runtime → Run all.**\n",
     "4. When the last cell prints a `*.trycloudflare.com` URL, open it, allow mic access, and start talking.\n",
     "\n",
@@ -225,7 +225,7 @@
     "View one with `!tail -100 logs/vllm-speech.log` (or open the file from the Colab file browser).\n",
     "\n",
     "**Common failures:**\n",
-    "- *T4 OOM:* switch the runtime to A100 or L4. Both Granite models won't fit on a T4.\n",
+    "- *GPU OOM:* switch the runtime to an A100. The two Granite models won't fit together on smaller GPUs (T4, L4).\n",
     "- *`HF_TOKEN` missing:* re-run Cell 3 after adding the secret. Without it, the backend falls back to STUN-only and audio likely won't connect through the cloudflared tunnel.\n",
     "- *Stuck \"waiting for vLLM\":* model weights are downloading. The cell waits up to 20 min — let it run.\n",
     "- *Re-running cells without cleaning up:* old processes still hold the ports. Run the kill-switch cell below, then re-run from the top.\n",
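
A preflight check along these lines could let the notebook fail fast on an undersized GPU instead of running out of memory partway through startup. This is a minimal sketch and not part of the change: the helper names `gpu_is_supported` and `detect_gpu_name` are hypothetical, and it assumes `nvidia-smi` is on the runtime's PATH and that an A100 reports a name containing "A100".

```python
import shutil
import subprocess


def gpu_is_supported(gpu_name: str) -> bool:
    """Return True if the reported GPU name looks like an A100 (hypothetical helper)."""
    return "A100" in gpu_name.upper()


def detect_gpu_name() -> str:
    """Return the first GPU name reported by nvidia-smi, or "" if none is available."""
    if shutil.which("nvidia-smi") is None:
        return ""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return ""
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else ""


if __name__ == "__main__":
    name = detect_gpu_name()
    if gpu_is_supported(name):
        print(f"OK: {name}")
    else:
        print(f"Unsupported GPU ({name or 'none detected'}); switch the runtime to an A100.")
```

Matching on the name string rather than on free VRAM keeps the check simple, at the cost of rejecting any other large-memory GPU that might also work.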