StackChan HA Add-ons turns your StackChan desktop robot into an AI voice assistant integrated with your smart home — no Xiaozhi account needed.
StackChan is a palm-sized robot built on the M5Stack CoreS3 (ESP32-S3). It ships with the open-source xiaozhi-esp32 firmware, which normally relies on the Xiaozhi cloud for speech recognition, language model, and text-to-speech. This project replaces that cloud dependency with a Home Assistant add-on: your voice data goes to OpenAI or Google Gemini (your pick) instead of Xiaozhi, and Home Assistant stays entirely on your local network.
The device firmware is never modified — the add-on speaks the same Xiaozhi WebSocket protocol v3 the device already expects. Voice commands are processed by either OpenAI Realtime API or Google Gemini Live API (streaming speech-to-speech, ~0.5–1.5 s latency, switchable from the add-on UI), and the robot can control any Home Assistant device by voice.
Key features:
- Choice of provider: OpenAI Realtime API or Google Gemini Live API, switchable in the add-on UI
- ~0.5–1.5 s end-to-end latency via streaming speech-to-speech
- Controls lights, climate, covers, media players, and scripts by voice
- Area-based control ("turn off all lights in the living room")
- No Xiaozhi account — audio is processed by OpenAI/Gemini, HA stays on your LAN
- Easy installation as a standard Home Assistant add-on
This add-on replaces the Xiaozhi cloud entirely. The StackChan device thinks it's talking to Xiaozhi's servers, but it's actually talking to this local server running on your Home Assistant.
StackChan ESP32-S3 (unmodified xiaozhi-esp32 firmware)
  │ Xiaozhi WebSocket protocol v3 (OPUS audio + JSON)
  ▼
StackChan AI Server (this add-on, on your HA at port 12800)
  ├─ /xiaozhi/ota/ → returns local WebSocket address
  └─ /xiaozhi/ws → WebSocket session
       ├─ OpenAI Realtime API ─┐
       │                       ├─ STT + LLM + TTS, streaming (pick one)
       ├─ Gemini Live API ─────┘
       └─ Home Assistant WebSocket API (device control)
Audio pipeline (streaming, ~0.5–1.5s latency):
Device OPUS (16kHz) → PCM → OpenAI Realtime / Gemini Live
↓ server VAD detects speech end
Streaming PCM response (24kHz)
↓
OPUS encode → Device speaker
No Xiaozhi account needed. The only cloud dependency is your chosen provider (OpenAI or Google).
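The two sample rates in the pipeline above determine how much raw PCM each audio frame carries before OPUS encoding. The following is a rough sizing sketch only: the 16-bit mono format and the 60 ms frame duration are assumptions for illustration, not values stated in this README, and `pcm_frame_bytes` is a hypothetical helper.

```python
UPLINK_RATE_HZ = 16_000    # device microphone audio, per the pipeline above
DOWNLINK_RATE_HZ = 24_000  # provider response audio, per the pipeline above


def pcm_frame_bytes(sample_rate_hz: int, frame_ms: int = 60,
                    bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Return the size in bytes of one PCM frame.

    Assumes 16-bit mono samples and a 60 ms frame unless told otherwise;
    both defaults are illustrative assumptions.
    """
    samples = sample_rate_hz * frame_ms // 1000
    return samples * bytes_per_sample * channels
```

Under those assumptions, `pcm_frame_bytes(UPLINK_RATE_HZ)` gives 1920 bytes per uplink frame and `pcm_frame_bytes(DOWNLINK_RATE_HZ)` gives 2880 bytes per downlink frame.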
Low-latency speech-to-speech conversation powered by OpenAI Realtime API (gpt-realtime-1.5) or Google Gemini Live API (gemini-2.5-flash-preview-native-audio-dialog) — pick your provider in the add-on UI. Both options give natural-language control of Home Assistant devices.
Features:
- Switchable AI provider: OpenAI Realtime / Gemini Live (dropdown)
- ~0.5–1.5s response latency (server-side VAD, streaming audio)
- Controls HA devices by voice: lights, climate, covers, media players, scripts
- Area-based control ("turn off all lights in the living room")
- Conversation history maintained across utterances within a session
- Configurable voice and model via dropdown in the add-on UI
- In Home Assistant, go to Settings → Add-ons → Add-on Store
- Click the three-dot menu (top right) → Repositories
- Add this URL: https://github.com/rudyll/stackchan_ha_addons
- Find StackChan AI Server in the store and click Install
- Go to the add-on Configuration tab and fill in the required fields (see below)
- Start the add-on
Pick one AI provider via ai_provider and fill in only its API key. The other provider's fields can stay blank.
| Option | Required | Description |
|---|---|---|
| local_host | ✅ | LAN IP of your Home Assistant instance (e.g. 192.168.1.100). The device uses this to connect. |
| ha_mcp_token | ✅ | HA Long-Lived Access Token. Create one in Profile → Security → Long-Lived Access Tokens. |
| ai_provider | ✅ | openai (default) or gemini. Selects which backend handles speech + LLM + TTS. |
| system_prompt | | Custom personality/instructions for the assistant. |
| OpenAI (when ai_provider=openai) | | |
| openai_api_key | ✅ | Your OpenAI API key from platform.openai.com. |
| openai_realtime_model | | Realtime model. Default: gpt-realtime-1.5. Mini (cheaper): gpt-realtime-mini, gpt-4o-mini-realtime-preview. |
| openai_tts_voice | | TTS voice. Default: alloy. Female voices: nova, shimmer, coral, sage, cedar, marin, cove. |
| Gemini (when ai_provider=gemini) | | |
| gemini_api_key | ✅ | Your Google AI Studio API key from aistudio.google.com. |
| gemini_model | | Gemini Live model. Default: gemini-2.5-flash-preview-native-audio-dialog. |
| gemini_voice | | TTS voice. Default: Aoede. Options: Aoede, Charon, Fenrir, Kore, Puck. |
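The "fill in only the chosen provider's key" rule can be expressed as a small check. This is a sketch of the rule as described in the table, not the add-on's actual validation code; `missing_options` is a hypothetical helper, and treating a blank ai_provider as openai follows the documented default.

```python
# Options the table marks as required regardless of provider.
REQUIRED_ALWAYS = ("local_host", "ha_mcp_token")

# The one API key each provider needs, per the table.
PROVIDER_KEY = {"openai": "openai_api_key", "gemini": "gemini_api_key"}


def missing_options(options: dict) -> list[str]:
    """Return the names of required options that are absent or blank."""
    # A blank ai_provider falls back to the documented default.
    provider = options.get("ai_provider") or "openai"
    if provider not in PROVIDER_KEY:
        raise ValueError(
            f"ai_provider must be one of {sorted(PROVIDER_KEY)}, got {provider!r}"
        )
    required = REQUIRED_ALWAYS + (PROVIDER_KEY[provider],)
    return [name for name in required
            if not str(options.get(name) or "").strip()]
```

For example, a Gemini configuration with an empty gemini_api_key reports `["gemini_api_key"]`, while the unused OpenAI fields can stay blank without being reported.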
The device firmware needs to know your local server address instead of the Xiaozhi cloud. There are two ways to do this.
Which method should I use? Use Method A (NVS) for most cases — re-injection is just two commands and doesn't require recompiling. Use Method B (compile) only if you want to make other firmware customisations at the same time.
⚠️ Important: The official xiaozhi-esp32 OTA upgrade writes a full flash image and overwrites the NVS partition. After any firmware upgrade you will need to re-inject the NVS key (Steps 3–4 of Method A). This is still much faster than recompiling.

💡 Shortcut: Run python3 flash_nvs.py for an interactive guided injector that handles all four steps automatically (English / 中文).
Method A (NVS): The firmware checks NVS (non-volatile storage) for an OTA URL override before using its hardcoded default. The setting survives reboots, but an official firmware OTA upgrade overwrites the NVS partition, so you need to re-inject it after each upgrade.
You need ESP-IDF installed. Follow the official installation guide if you haven't done this yet.
Every time you open a new terminal, activate the ESP-IDF environment first — otherwise the parttool.py scripts will not be found.
- macOS / Linux:
. $HOME/esp/esp-idf/export.sh
- Windows (PowerShell):
C:\esp\v6.0.1\esp-idf\export.ps1
- Windows (Command Prompt):
C:\esp\v6.0.1\esp-idf\export.bat
Verify activation: idf.py --version should print the IDF version without errors.
Step 1 — Find your NVS partition size:
python3 $IDF_PATH/components/partition_table/parttool.py \
--port /dev/tty.usbserial-XXXX \
get_partition_info --partition-name nvs

Note the size value (commonly 0x4000 or 0x6000). Replace /dev/tty.usbserial-XXXX with your device's serial port (see Finding your serial port below).
Step 2 — Create the NVS data file:
cat > nvs.csv << 'EOF'
key,type,encoding,value
wifi,namespace,,
ota_url,data,string,http://<YOUR_HA_IP>:12800/xiaozhi/ota/
EOF

Replace <YOUR_HA_IP> with your Home Assistant's LAN IP (same as local_host in the add-on config).
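If you would rather generate the Step 2 file from a script than type the IP by hand, the following sketch produces the same three CSV rows. `build_nvs_csv` is a hypothetical helper (it is not part of flash_nvs.py), and the IPv4-only check is an added assumption to catch typos early.

```python
import ipaddress


def build_nvs_csv(ha_ip: str, port: int = 12800) -> str:
    """Return the nvs.csv contents from Step 2 for the given HA address."""
    # Raises ValueError if ha_ip is not a well-formed IPv4 address.
    ipaddress.IPv4Address(ha_ip)
    ota_url = f"http://{ha_ip}:{port}/xiaozhi/ota/"
    rows = [
        "key,type,encoding,value",
        "wifi,namespace,,",
        f"ota_url,data,string,{ota_url}",
    ]
    return "\n".join(rows) + "\n"
```

Writing the result to disk is then one line, for example `pathlib.Path("nvs.csv").write_text(build_nvs_csv("192.168.1.100"))`, using the example address from the configuration table.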
Step 3 — Generate the NVS binary (replace 0x4000 with the actual size from Step 1):
python3 $IDF_PATH/components/nvs_flash/nvs_partition_generator/nvs_partition_gen.py \
generate nvs.csv nvs.bin 0x4000

Step 4 — Write to device:
python3 $IDF_PATH/components/partition_table/parttool.py \
--port /dev/tty.usbserial-XXXX \
write_partition --partition-name nvs --input nvs.bin

Method B (compile from source): Use this only if you need to make other firmware customisations. Note that the OTA URL baked in via menuconfig will be overwritten if the device performs a firmware OTA upgrade; in that case you will need to redo Steps 3–4 of Method A anyway.
Same ESP-IDF installation and environment activation as Method A above.
- Clone and set up the StackChan firmware:
  git clone https://github.com/m5stack/StackChan.git
  cd StackChan/firmware
  python3 fetch_repos.py
- Install third-party component dependencies:
  idf.py add-dependency "bblanchon/arduinojson"
  idf.py update-dependencies
  Do not skip this step: it installs ArduinoJson and other components declared in idf_component.yml. Skipping it causes a Failed to resolve component 'ArduinoJson' error during build.
- Set the target, then open menuconfig and set the OTA URL. Set the target first, because idf.py set-target regenerates sdkconfig and would discard a value entered earlier in menuconfig:
  idf.py set-target esp32s3
  idf.py menuconfig
  - Press / and search for OTA_URL
  - Set it to http://<YOUR_HA_IP>:12800/xiaozhi/ota/
  - Save and exit
- Build and flash:
  idf.py build
  idf.py -p /dev/tty.usbserial-XXXX -b 921600 flash
Finding your serial port:
- macOS: ls /dev/tty.usb*
- Linux: ls /dev/ttyUSB* /dev/ttyACM*
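The same lookup can be done from Python, which is handy if you script the flashing steps. This sketch simply globs the macOS and Linux device patterns listed above; `find_serial_ports` is a hypothetical helper, and it does not cover Windows COM ports.

```python
import glob

# The macOS and Linux patterns from the list above.
DEFAULT_PATTERNS = ("/dev/tty.usb*", "/dev/ttyUSB*", "/dev/ttyACM*")


def find_serial_ports(patterns=DEFAULT_PATTERNS) -> list[str]:
    """Return a sorted list of device paths matching any pattern."""
    found = set()
    for pattern in patterns:
        found.update(glob.glob(pattern))
    return sorted(found)


if __name__ == "__main__":
    for port in find_serial_ports():
        print(port)
```

If more than one path is printed, unplug the StackChan and run it again to see which entry disappears.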
If the device has no Wi-Fi credentials (factory reset or first flash):
- Download the StackChan World app (iOS / Android)
- Open the app and follow the "Add device" flow
- The app uses Bluetooth to push your Wi-Fi credentials to the device
- Once connected, the device will use the OTA URL you configured (via NVS or menuconfig) to reach your local add-on instead of the Xiaozhi cloud
The official xiaozhi-esp32 OTA upgrade writes a full flash image, which overwrites the NVS partition. After any firmware upgrade you will need to re-inject the NVS key by redoing Steps 3–4 of Method A. This is still much faster than recompiling the firmware from source.
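Because re-injection is the same two commands every time, it can be wrapped in a small script. The sketch below only builds the argument lists for Steps 3 and 4 of Method A exactly as written earlier; `reinject_commands` is a hypothetical helper, and it assumes nvs.csv from Step 2 is still in the current directory.

```python
import os


def reinject_commands(idf_path: str, port: str, nvs_size: str = "0x4000",
                      csv_path: str = "nvs.csv",
                      bin_path: str = "nvs.bin") -> list[list[str]]:
    """Build the Step 3 (generate) and Step 4 (write) command lines."""
    generator = os.path.join(idf_path, "components", "nvs_flash",
                             "nvs_partition_generator",
                             "nvs_partition_gen.py")
    parttool = os.path.join(idf_path, "components", "partition_table",
                            "parttool.py")
    return [
        # Step 3: turn nvs.csv into a binary of the partition's size.
        ["python3", generator, "generate", csv_path, bin_path, nvs_size],
        # Step 4: write the binary to the nvs partition over serial.
        ["python3", parttool, "--port", port, "write_partition",
         "--partition-name", "nvs", "--input", bin_path],
    ]
```

To run them, pass each list to `subprocess.run(cmd, check=True)` in order, with `idf_path` taken from `os.environ["IDF_PATH"]` after activating the ESP-IDF environment. Use the partition size you noted in Step 1 rather than relying on the 0x4000 default.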
| Port | Purpose |
|---|---|
| 12800/tcp | Main StackChan WebSocket server (OTA discovery + WebSocket AI session) |
| 443/tcp | Legacy HTTPS intercept (unused) |
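If the device never connects, a quick first check is whether port 12800 is reachable from another machine on your LAN. The following is a minimal sketch using only the standard library; `is_port_open` is a hypothetical helper, and a successful TCP connection only shows that something is listening, not that the add-on is configured correctly.

```python
import socket


def is_port_open(host: str, port: int = 12800, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `is_port_open("192.168.1.100")` checks the add-on port on the example address used for local_host.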
MIT — see LICENSE