6 changes: 3 additions & 3 deletions tutorials/notebooks/granite_speech_demo.ipynb
@@ -18,7 +18,7 @@
"\n",
"## Prerequisites\n",
"\n",
- "- **GPU runtime: A100 (Colab Pro) recommended.** L4 works. T4 will OOM — both Granite models won't fit.\n",
+ "- **GPU runtime: A100 (Colab Pro) required.** Smaller GPUs won't have enough VRAM to hold both Granite models simultaneously.\n",
"- **HuggingFace read token.** Free; create one at https://huggingface.co/settings/tokens. Add it as a Colab Secret named `HF_TOKEN` (sidebar → 🔑 → New secret). Used for two things: downloading the Granite model weights, *and* minting per-session WebRTC TURN credentials so audio reaches your browser.\n",
"- **Browser:** Chrome, Edge, or Firefox. Safari may behave oddly with WebRTC.\n",
"\n",
@@ -30,7 +30,7 @@
"## What to do\n",
"\n",
"1. Set the `HF_TOKEN` Colab Secret.\n",
- "2. Switch the runtime to a GPU (Runtime → Change runtime type → A100/L4).\n",
+ "2. Switch the runtime to an A100 GPU (Runtime → Change runtime type → A100).\n",
"3. **Runtime → Run all.**\n",
"4. When the last cell prints a `*.trycloudflare.com` URL, open it, allow mic access, and start talking.\n",
"\n",
@@ -225,7 +225,7 @@
"View one with `!tail -100 logs/vllm-speech.log` (or open the file from the Colab file browser).\n",
"\n",
"**Common failures:**\n",
- "- *T4 OOM:* switch the runtime to A100 or L4. Both Granite models won't fit on a T4.\n",
+ "- *GPU OOM:* switch the runtime to an A100. Both Granite models won't fit on smaller GPUs (T4, L4).\n",
"- *`HF_TOKEN` missing:* re-run Cell 3 after adding the secret. Without it, the backend falls back to STUN-only and audio likely won't connect through the cloudflared tunnel.\n",
"- *Stuck \"waiting for vLLM\":* model weights are downloading. The cell waits up to 20 min — let it run.\n",
"- *Re-running cells without cleaning up:* old processes still hold the ports. Run the kill-switch cell below, then re-run from the top.\n",
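Review note: two of the failure modes listed above (stale processes holding ports, and the long wait for vLLM) can be diagnosed with the standard library alone. This is a sketch under assumptions, not the notebook's own code: `port_in_use` and `wait_until` are hypothetical helper names, and the port number in the usage comment is a placeholder because the diff does not show which ports the notebook uses.

```python
import socket
import time


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def wait_until(probe, timeout_s, interval_s=5.0):
    """Call probe() repeatedly until it returns True or timeout_s elapses.

    Returns True if the probe succeeded, False on timeout. The probe is
    always tried at least once, even when timeout_s is 0.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Usage sketch (8000 is a placeholder port, not taken from the notebook):
#   if port_in_use(8000):
#       print("Port still held by an old process: run the kill-switch cell.")
#   ready = wait_until(lambda: port_in_use(8000), timeout_s=20 * 60)
```

An open port only shows that a server process is listening, not that the model has finished loading, so a real readiness check would likely probe an HTTP endpoint instead.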