This project manages Vapi voice agent configurations as code. All resources (assistants, tools, squads, etc.) are declarative files that sync to the Vapi platform via a gitops engine.
You do NOT need to know how Vapi works internally. This guide tells you everything you need to author and modify resources.
Prompt quality: Whenever you create a new assistant or change an existing assistant’s system prompt, read docs/Vapi Prompt Optimization Guide.md first. It goes deeper on structure, voice constraints, tool usage, and evaluation than the summary in this file.
Org-scoped resources: Resources live in resources/<org>/ (e.g. resources/my-org/, resources/my-org-prod/). Each org directory is isolated — npm run push -- my-org only touches resources/my-org/. Run npm run setup to create a new org.
Template-safe first run: In a fresh clone, prefer npm run pull -- <org> --bootstrap to refresh .vapi-state.<org>.json and credential mappings without materializing the target org's resources into resources/<org>/. npm run push -- <org> will auto-run the same bootstrap sync when it detects empty or stale state for the resources being applied.
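For a fresh clone, the sequence looks like this (the repo URL is elided and `my-org` is a placeholder slug; the `npm install` step is assumed standard for an npm project):

```sh
git clone <template-repo> && cd <repo>  # obtain the repo
npm install                             # install engine dependencies (assumed)
npm run pull -- my-org --bootstrap      # refresh .vapi-state.my-org.json without materializing resources
npm run push -- my-org                  # first push; auto-bootstraps again if state is stale
```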
Excluding resources from sync (.vapi-ignore): To prevent specific resources from being touched in either direction (e.g. assistants owned by another team or legacy resources you don't want to manage), create resources/<org>/.vapi-ignore with gitignore-style patterns. See resources/.vapi-ignore.example for syntax and examples. The list is bidirectional: matched ids are skipped on pull (never written), on push and apply (never sent), and orphan-protected (a --force push will not DELETE a dashboard resource whose id matches the ignore). --force on push bypasses the load-filter so a deliberate override can flow through, but orphan-protect still applies. A resource that references an ignored resource (e.g. a squad pointing at assistants/foo while assistants/foo is ignored) is a validation ERROR — --strict push aborts before any API call.
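A hypothetical `resources/<org>/.vapi-ignore` might look like this; the patterns and ids are illustrative only, and `resources/.vapi-ignore.example` remains the authoritative syntax reference:

```
# Assistants owned by the support team — never pull, push, or delete
assistants/support-*

# One legacy tool, pinned by its full filename
tools/legacy-crm-lookup-a1b2c3d4.yml
```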
Learnings & recipes: Before configuring resources or debugging issues, read the relevant file in docs/learnings/. Load only what you need:
| Working on | Read |
|---|---|
| Assistants (model, voice, transcriber, hooks) | docs/learnings/assistants.md |
| Tools (apiRequest, function, transferCall, handoff, code) | docs/learnings/tools.md |
| Squads / multi-agent handoffs | docs/learnings/squads.md |
| Transfers not working | docs/learnings/transfers.md |
| Structured outputs / post-call analysis | docs/learnings/structured-outputs.md |
| Simulations / test suites | docs/learnings/simulations.md |
| Webhooks / server config | docs/learnings/webhooks.md |
| Latency optimization | docs/learnings/latency.md |
| Fallback providers / error hooks | docs/learnings/fallbacks.md |
| Azure OpenAI BYOK with regional failover | docs/learnings/azure-openai-fallback.md |
| Multilingual agents (English/Spanish) | docs/learnings/multilingual.md |
| WebSocket audio streaming | docs/learnings/websocket.md |
| Building outbound calling agents | docs/learnings/outbound-agents.md |
| Bulk-dialing from a CSV (Outbound Call Campaigns) | docs/learnings/outbound-campaigns.md |
| Voicemail detection / VM vs human classification | docs/learnings/voicemail-detection.md |
| Enforcing call time limits / graceful call ending | docs/learnings/call-duration.md |
| Voice provider field cheat-sheet (Cartesia vs 11labs vs OpenAI etc.) | docs/learnings/voice-providers.md |
| YAML authoring conventions, .vapi-ignore lifecycle | docs/learnings/yaml-conventions.md |
Where new knowledge goes:
| Kind of knowledge | Home | Convention |
|---|---|---|
| Per-resource gotchas, recipes, troubleshooting | docs/learnings/<topic>.md | One file per resource type or topic. Add a row to this table AND to docs/learnings/README.md when you add a new file. CLAUDE.md mirrors this list — keep both in sync. |
| Engine-friction log (push/pull/state/cleanup pain points + fixes) | improvements.md | Format: Problem → Current behavior → Risk → Current mitigation → Possible fix → Status. Mark [RESOLVED YYYY-MM-DD] (#<PR>) when fixed; never delete. |
| Code-level rationale (why a function works the way it does) | Code comments | Only when the WHY is non-obvious — not what the code does. Don't reference PR/issue numbers; they rot. |
| Setup, install, repo orientation | README.md | One-time onboarding only. Don't put runtime gotchas here. |
If you're unsure where something goes, default to docs/learnings/. The README and engine-friction log are deliberately narrow.
| I want to... | What to do |
|---|---|
| Edit an assistant's system prompt | Edit the markdown body in resources/<org>/assistants/<name>.md |
| Change assistant settings | Edit the YAML frontmatter in the same .md file |
| Add a new tool | Create resources/<org>/tools/<name>.yml |
| Add a new assistant | Create resources/<org>/assistants/<name>.md |
| Create a multi-agent squad | Create resources/<org>/squads/<name>.yml |
| Add post-call analysis | Create resources/<org>/structuredOutputs/<name>.yml |
| Write test simulations | Create files under resources/<org>/simulations/ |
| Promote resources across orgs | Copy files between resources/<org-a>/ and resources/<org-b>/ |
| Deploy local changes (default) | npm run apply -- <org> — pull → merge → push, safe against dashboard drift |
| Pre-flight schema check (no network) | npm run validate -- <org> — run before every apply |
| Audit state/dashboard drift (read-only) | npm run audit -- <org> — orphans, ghosts, content-identical clusters, inline-tools. Exit 1 on findings. |
| Pull latest from Vapi | npm run pull -- <org>, --force, or --bootstrap |
| Pull one known remote resource | npm run pull -- <org> --type assistants --id <uuid> |
| Deploy a single file | npm run apply -- <org> resources/<org>/assistants/my-agent.md |
| Recover from a bad deploy | npm run rollback -- <org> --list then --to <ISO-timestamp> |
| Raw push (no pre-pull) | npm run push -- <org> — see safety hierarchy below; rarely the right call |
| Push with new resources | npm run push -- <org> --allow-new-files — bypass orphan-YAML gate. AI agents: do NOT auto-pass this flag; confirm with the human first (see push section below) |
| Test a call | npm run call -- <org> -a <assistant-name> or -s <squad-name> |
| Run a simulation suite | npm run sim -- <org> --suite <name> --target <assistant-name> |
Three commands deploy changes to the Vapi platform. Pick the safest one that fits the task. apply is the default.
`apply` pulls the platform's current state, merges it with your local files, then pushes the merged result. This protects you from racing dashboard edits made between your last pull and your push. Use it for ~99% of deployments.

```sh
npm run validate -- <org>              # schema check first, no network call
npm run apply -- <org>                 # full-org apply
npm run apply -- <org> <path-to-file>  # single-file apply (same safety, scoped diff)
```

`validate` runs the engine's local validators against every YAML/MD file in the org without any network call. It catches shape errors (missing required fields, wrong types, stale tool references) before they burn a deploy. Run it before every apply.

`push` skips the merge pass. Only use it when (a) you literally just ran pull and (b) you're certain no one has touched the dashboard since. In a multi-developer environment, or when dashboard editors are in play, default to apply instead: stale local state can clobber recent dashboard edits or PATCH against UUIDs that no longer exist.

If you do use push, dry-run first: `npm run push -- <org> --dry-run`.
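If a raw push is genuinely warranted, a cautious sequence (all commands from this guide) is:

```sh
npm run validate -- <org>        # schema check, no network
npm run pull -- <org>            # refresh state immediately before pushing
npm run push -- <org> --dry-run  # preview the exact mutations
npm run push -- <org>            # only now mutate the platform
```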
push refuses by default when any local YAML file lacks a corresponding entry in .vapi-state.<org>.json. The engine can't disambiguate "intentionally new resource" vs "rename of an existing resource" vs "stale cruft" from the file alone. Silently treating every orphan as a create has been the spawn-source for duplicate-resource cascades on customer dashboards.
When the gate fires, push exits 1 with a verbose message listing every orphan and pairing them with possible "rename source" candidates (state entries with no matching local file that share a base slug). Read the message — it tells you exactly what to do for each case.
Override: --allow-new-files. Pass this flag ONLY after confirming with the human operator that each orphan is intentionally a new resource (case a above). For renames (case b), rename the file back and run npm run pull -- <org> to re-key state. For stale files (case c), delete them locally.
FOR AI AGENTS: when you encounter this gate, do NOT auto-pass --allow-new-files. Surface the error message to the human and ask them to classify each orphan. Silent bypass defeats the entire purpose of the gate.
The gate is suppressed automatically when:

- `--bootstrap` is passed (every file is legitimately "new" in a from-scratch population).
- The file is matched by `.vapi-ignore` (the engine wasn't going to upload it anyway).
- A selective push (`-- <path>`) is requested and the orphan is outside the selection.
The same gate fires inside apply (which runs pull → merge → push). If pull's rename-detection orphans a local YAML, apply's push stage hits the gate and halts the whole apply with one explicit message. The --allow-new-files flag propagates through apply -- <org> --allow-new-files to the push stage.
Rollback and cleanup:

- `npm run rollback -- <org> --list` — every push/apply writes a state snapshot to `.vapi-state.<org>.snapshots/` before mutating. `--to <ISO-timestamp>` re-applies a specific snapshot, effectively undoing the deploy.
- `npm run cleanup -- <org>` (no `--force`) — enumerate orphaned dashboard resources without deleting. Destructive run is double-gated: requires `--force --confirm <org>`.
- Surgical alternative to `--force` cleanup: when the orphan set includes Vapi-default fixtures (see docs/learnings/simulations.md — the seven immortal stock personalities), delete individual resources via `curl -X DELETE` against the API, then `npm run pull -- <org> --bootstrap` to refresh state. `--force` halts on the first immortal-default 404 and exits non-zero.

Pre-deploy checklist:

1. `git status` — uncommitted changes are intentional?
2. `npm run validate -- <org>` — schema clean?
3. `npm run apply -- <org>` (or `apply -- <org> <path>` for single-file)
4. Verify with `npm run call -- <org> -a <name>` and/or `npm run sim -- <org> --suite <name> --target <name>`
5. If something looks wrong: `npm run rollback -- <org> --list` → `--to <timestamp>`
Why this matters: the gitops engine tracks resource UUIDs in .vapi-state.<org>.json. Bare push trusts that file is current. If the dashboard has changed since your last pull — someone edited an assistant in the UI, a teammate ran a different push, a structured output got linked via another path — your local state can be stale by minutes. apply refreshes state before mutating, eliminating that entire class of race.
docs/
├── Vapi Prompt Optimization Guide.md # In-depth prompt authoring
├── changelog.md # Template for tracking per-customer config changes
└── learnings/ # Gotchas, recipes, and troubleshooting
├── README.md # Task-routed index — start here
├── tools.md # Tool configuration gotchas (incl. dedup behavior)
├── assistants.md # Assistant configuration gotchas
├── squads.md # Squad and multi-agent gotchas
├── structured-outputs.md # Structured output gotchas + KPI patterns
├── simulations.md # Simulation and testing gotchas
├── webhooks.md # Server and webhook gotchas
├── transfers.md # Transfer troubleshooting runbook
├── latency.md # Latency optimization guide
├── fallbacks.md # Fallback and error handling recipes
├── azure-openai-fallback.md # Azure OpenAI BYOK multi-region setup
├── multilingual.md # Multilingual agent architecture guide
├── websocket.md # WebSocket transport rules
├── outbound-agents.md # Outbound agent design & IVR navigation
├── outbound-campaigns.md # Bulk-dial CSV campaigns + dynamic variables
├── voicemail-detection.md # Voicemail vs human classification
├── call-duration.md # Call time limits and graceful end-of-call
├── voice-providers.md # Per-provider voice block field cheat-sheet
└── yaml-conventions.md # YAML authoring conventions, .vapi-ignore lifecycle
resources/
├── <org>/ # Org-scoped resources (npm run push -- <org> reads here)
│ ├── assistants/
│ ├── tools/
│ ├── squads/
│ ├── structuredOutputs/
│ ├── evals/
│ └── simulations/
└── <another-org>/ # Another org (each is isolated)
└── (same structure)
Assistants are voice agents that handle phone calls. They are defined as Markdown files with YAML frontmatter.
File: resources/<org>/assistants/<name>.md
```md
---
name: My Assistant
firstMessage: Hi, thanks for calling! How can I help you today?
voice:
  provider: 11labs
  voiceId: your-voice-id-here
  model: eleven_turbo_v2
  stability: 0.7
  similarityBoost: 0.75
  speed: 1.1
  enableSsmlParsing: true
model:
  provider: openai
  model: gpt-4.1
  temperature: 0
  toolIds:
    - end-call-tool
    - transfer-call
transcriber:
  provider: deepgram
  model: nova-3
  language: en
  numerals: true
  confidenceThreshold: 0.5
endCallFunctionEnabled: true
endCallMessage: Thank you for calling. Have a great day!
silenceTimeoutSeconds: 30
maxDurationSeconds: 600
backgroundDenoisingEnabled: true
backgroundSound: off
---

# Identity & Purpose

You are a virtual assistant for the business you represent...

# Workflow

## STEP 1: Greeting

...
```

How it works:

- Everything between `---` markers = YAML configuration (voice, model, tools, etc.)
- Everything below the second `---` = system prompt (markdown, sent as the LLM system message)
- The system prompt IS the core behavior definition — write it like detailed instructions for an AI
| Setting | Purpose | Common Values |
|---|---|---|
| `name` | Display name in Vapi dashboard | Any string |
| `firstMessage` | What the assistant says first when a call connects | Greeting text (supports SSML like `<break time='0.3s'/>`) |
| `firstMessageMode` | How the first message is generated | `assistant-speaks-first` (default, uses `firstMessage`), `assistant-speaks-first-with-model-generated-message` (LLM generates it) |
| `voice` | Text-to-speech configuration | See Voice section below |
| `model` | LLM configuration | See Model section below |
| `transcriber` | Speech-to-text configuration | See Transcriber section below |
| `endCallFunctionEnabled` | Allow the assistant to hang up | `true` / `false` |
| `endCallMessage` | What to say when ending the call | Text string |
| `silenceTimeoutSeconds` | Hang up after N seconds of silence | `30` typical |
| `maxDurationSeconds` | Maximum call duration | `600` (10 min) typical |
| `backgroundDenoisingEnabled` | Reduce background noise | `true` / `false` |
| `backgroundSound` | Ambient sound during pauses | `off`, `office` |
| `voicemailMessage` | Message to leave if voicemail detected | Text string |
| `hooks` | Event-driven actions (see Hooks section) | Array of hook objects |
| `messagePlan` | Idle message behavior | See below |
| `startSpeakingPlan` | Endpointing configuration | See below |
| `stopSpeakingPlan` | Interruption sensitivity | See below |
| `server` | Webhook server for tool calls | `{ url, timeoutSeconds, credentialId }` |
| `serverMessages` | Which events to send to webhook | `["end-of-call-report", "status-update"]` |
| `analysisPlan` | Post-call analysis configuration | See below |
| `artifactPlan` | What to save after calls | See below |
| `observabilityPlan` | Logging/monitoring | `{ provider: "langfuse", tags: [...] }` |
| `compliancePlan` | HIPAA/PCI compliance | `{ hipaaEnabled: false, pciEnabled: false }` |
```yaml
voice:
  provider: 11labs             # 11labs, playht, cartesia, azure, deepgram, openai, rime, lmnt
  voiceId: your-voice-id-here  # Provider-specific voice ID
  model: eleven_turbo_v2       # Provider-specific model
  stability: 0.7               # 0.0-1.0, higher = more consistent
  similarityBoost: 0.75        # 0.0-1.0, higher = closer to original voice
  speed: 1.1                   # Speech rate multiplier
  enableSsmlParsing: true      # Allow SSML tags in responses
  inputPunctuationBoundaries:  # When to start TTS (chunk boundaries)
    - "."
    - "!"
    - "?"
    - ";"
    - ","
```

```yaml
model:
  provider: openai  # openai, anthropic, google, azure-openai, groq, cerebras
  model: gpt-4.1    # Provider-specific model name
  temperature: 0    # 0.0-2.0, lower = more deterministic
  toolIds:          # Tools this assistant can use (reference by filename)
    - my-tool-name
    - another-tool
```

```yaml
transcriber:
  provider: deepgram        # deepgram, assemblyai, azure, google, openai, gladia
  model: nova-3             # Provider-specific model
  language: en              # Language code
  numerals: true            # Convert spoken numbers to digits
  confidenceThreshold: 0.5  # Minimum confidence to accept transcription
```

Hooks trigger actions based on call events:

```yaml
hooks:
  # Say something when transcription confidence is low
  - on: assistant.transcriber.endpointedSpeechLowConfidence
    options:
      confidenceMin: 0.2
      confidenceMax: 0.49
    do:
      - type: say
        exact: "I'm sorry, I didn't quite catch that. Could you please repeat?"
  # End call on long customer silence
  - on: customer.speech.timeout
    options:
      timeoutSeconds: 90
    do:
      - type: say
        exact: "I'll be ending the call now. Please feel free to call back anytime."
      - type: tool
        tool:
          type: endCall
```

```yaml
messagePlan:
  idleTimeoutSeconds: 15        # Seconds before idle message
  idleMessages:                 # Messages to say when idle
    - "I'm still here if you need assistance."
    - "Are you still there?"
  idleMessageMaxSpokenCount: 3  # Max idle messages before giving up
  idleMessageResetCountOnUserSpeechEnabled: true  # Reset counter when user speaks
```

Controls when the assistant starts responding after the user stops speaking:

```yaml
startSpeakingPlan:
  smartEndpointingPlan:
    provider: livekit
    waitFunction: "20 + 500 * sqrt(x) + 2500 * x^3"  # Custom wait curve
```

```yaml
stopSpeakingPlan:
  numWords: 1  # How many user words before assistant stops speaking (lower = more interruptible)
```

```yaml
analysisPlan:
  summaryPlan:
    enabled: true
    messages:
      - role: system
        content: "Summarize this call concisely. Include: ..."
      - role: user
        content: |
          Here is the transcript:
          {{transcript}}
          Here is the ended reason:
          {{endedReason}}
```

```yaml
artifactPlan:
  fullMessageHistoryEnabled: true  # Save full message history
  structuredOutputIds:             # Run these structured outputs after call
    - customer-data
    - call-summary
```

Tools are functions the assistant can call during a conversation.
File: resources/<org>/tools/<name>.yml
```yaml
type: function
async: false
function:
  name: get_weather
  description: Get the current weather for a location
  strict: true
  parameters:
    type: object
    properties:
      location:
        type: string
        description: The city name
      unit:
        type: string
        enum: [celsius, fahrenheit]
        description: Temperature unit
    required:
      - location
messages:
  - type: request-start
    blocking: true
    content: "Let me check the weather for you."
  - type: request-response-delayed
    timingMilliseconds: 5000
    content: "Still looking that up."
server:
  url: https://my-api.com/weather
  timeoutSeconds: 20
  credentialId: optional-credential-uuid  # Optional: server auth credential
  headers:                                # Optional: custom request headers
    Content-Type: application/json
```

```yaml
type: transferCall
async: false
function:
  name: transfer_call
  description: Transfer the caller to a human agent
destinations:
  - type: number
    number: "+15551234567"
    numberE164CheckEnabled: true
    message: "Please hold while I transfer you."
    transferPlan:
      mode: blind-transfer
      sipVerb: refer
messages:
  - type: request-start
    blocking: false
```

```yaml
type: endCall
async: false
function:
  name: end_call
  description: Allows the agent to terminate the call
  parameters:
    type: object
    properties: {}
    required: []
messages:
  - type: request-start
    blocking: false
```

```yaml
type: handoff
function:
  name: handoff_tool
```

| Type | Purpose | Key Properties |
|---|---|---|
| `request-start` | Said when tool is called | `content`, `blocking` (pause speech until tool returns) |
| `request-response-delayed` | Said if tool takes too long | `content`, `timingMilliseconds` |
| `request-complete` | Said when tool returns | `content` |
| `request-failed` | Said when tool errors | `content` |
Structured outputs extract data from call transcripts after the call ends. They run LLM analysis on the conversation.
File: resources/<org>/structuredOutputs/<name>.yml
```yaml
name: success_evaluation
type: ai
target: messages
description: "Determines if the call met its objectives"
assistant_ids:
  - a1b2c3d4-e5f6-7890-abcd-ef1234567890
model:
  provider: openai
  model: gpt-4.1-mini
  temperature: 0
schema:
  type: boolean
  description: "Return true if the call successfully met its objectives."
```

```yaml
name: customer_data
type: ai
target: messages
description: "Extracts customer contact info and call details"
assistant_ids:
  - a1b2c3d4-e5f6-7890-abcd-ef1234567890
model:
  provider: openai
  model: gpt-4.1-mini
  temperature: 0
schema:
  type: object
  properties:
    customerName:
      type: string
      description: "The customer's full name"
    customerPhone:
      type: string
      description: "The customer's phone number"
    callReason:
      type: string
      description: "Why the customer called"
      enum: [new_inquiry, existing_project, complaint, spam]
    appointmentBooked:
      type: boolean
      description: "True if an appointment was booked"
```

```yaml
name: call_summary
type: ai
target: messages
description: "Generates a concise summary of the conversation"
model:
  provider: openai
  model: gpt-4.1-mini
  temperature: 0
schema:
  type: string
  description: "Summarize the call in 2-3 sentences."
  minLength: 10
  maxLength: 500
```

Notes:

- `assistant_ids` uses Vapi UUIDs (not local filenames) — these are the IDs of assistants this output applies to
- `target: messages` means the LLM analyzes the full message history
- `type: ai` means an LLM generates the output (vs. `type: code` for programmatic)
- `schema.type` must be a simple string (e.g. `type: string`, `type: boolean`, `type: object`). Do NOT use a YAML array like `type: [string, "null"]` — the Vapi dashboard calls `.toLowerCase()` on this field and will crash with `TypeError: .toLowerCase is not a function` if it receives an array. For nullable values, express nullability in the `description` instead (e.g. "Return null if no follow-up is needed")
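To make the nullable-schema rule concrete, here is a right/wrong contrast; the field itself is hypothetical:

```yaml
# WRONG — array type; the dashboard calls .toLowerCase() on schema.type and crashes:
schema:
  type: [string, "null"]

# RIGHT — simple type, nullability expressed in the description instead:
schema:
  type: string
  description: "The follow-up date as an ISO string. Return null if no follow-up is needed."
```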
Squads define multi-agent systems where assistants can hand off to each other.
File: resources/<org>/squads/<name>.yml
```yaml
name: My Squad
members:
  - assistantId: intake-agent-a1b2c3d4  # References resources/<org>/assistants/<id>.md
    assistantOverrides:                 # Override assistant settings within this squad
      metadata:
        position:                       # Visual position in dashboard editor
          x: 250
          y: 100
    tools:append:                       # Add tools to this member (in addition to their own)
      - type: handoff
        async: false
        messages: []
        function:
          name: handoff_to_Booking_Agent
          description: "Hand off to booking agent when customer wants to schedule"
          parameters:
            type: object
            properties:
              reason:
                type: string
                description: "Why the handoff is happening"
            required:
              - reason
        destinations:
          - type: assistant
            assistantName: Booking Assistant  # Must match the `name` field in target assistant
            description: "Handles appointment booking"
  - assistantId: booking-agent-e5f67890
    assistantOverrides:
      metadata:
        position:
          x: 650
          y: 100
    tools:append:
      - type: handoff
        async: false
        messages: []
        function:
          name: handoff_back_to_Intake
          description: "Hand back to intake agent for wrap-up"
        destinations:
          - type: assistant
            assistantName: Intake Assistant
            description: "Intake agent for call wrap-up"
membersOverrides:                       # Settings applied to ALL members
  transcriber:
    provider: deepgram
    model: nova-3
    language: en
  hooks:
    - on: customer.speech.timeout
      options:
        timeoutSeconds: 90
      do:
        - type: say
          exact: "Ending the call now. Feel free to call back."
        - type: tool
          tool:
            type: endCall
  observabilityPlan:
    provider: langfuse
    tags:
      - my-tag
```

Key Concepts:

- `assistantId` references an assistant file by filename (without extension)
- `tools:append` adds handoff tools without replacing the assistant's existing tools
- Handoff `destinations` link to other squad members by `assistantName` (the `name` field in their YAML frontmatter)
- `membersOverrides` applies settings to all members (useful for shared transcriber, hooks, etc.)
- Handoff functions can have parameters that pass context between agents
Simulations let you test assistants with automated "caller" personas.
Define simulated caller behaviors:
```yaml
name: Skeptical Sam
assistant:
  model:
    provider: openai
    model: gpt-4.1
    messages:
      - role: system
        content: >
          You are skeptical and need convincing before trusting information.
          You question everything and ask for specifics.
  tools:
    - type: endCall
```

Define test case scripts with evaluation criteria:

```yaml
name: "Happy Path: New customer books appointment"
instructions: >
  You are a new customer calling to schedule an appointment.
  Provide your name as "John Smith", phone as "206-555-1234".
  Be cooperative and confirm all information.
  End the call when the assistant confirms the booking.
evaluations:
  - structuredOutputId: a1b2c3d4-e5f6-7890-abcd-ef1234567890
    comparator: "="
    value: true
    required: true
```

Combine a personality with a scenario:

```yaml
name: Happy Path Test 1
personalityId: skeptical-sam-a0000001    # References personalities/<id>.yml
scenarioId: happy-path-booking-a0000002  # References scenarios/<id>.yml
```

Group simulations into test batches:

```yaml
name: Booking Flow Tests
simulationIds:
  - booking-test-1-a0000001
  - booking-test-2-a0000002
  - booking-test-3-a0000003
```

Resources reference each other by filename without extension:
| From | Field | References | Example |
|---|---|---|---|
| Assistant | `model.toolIds[]` | Tool files | `- end-call-tool` |
| Assistant | `artifactPlan.structuredOutputIds[]` | Structured Output files | `- customer-data` |
| Squad | `members[].assistantId` | Assistant files | `assistantId: intake-agent-a1b2c3d4` |
| Squad handoff | `destinations[].assistantName` | Assistant `name` field | `assistantName: Booking Assistant` |
| Simulation | `personalityId` | Personality files | `personalityId: skeptical-sam-a0000001` |
| Simulation | `scenarioId` | Scenario files | `scenarioId: happy-path-booking-a0000002` |
| Suite | `simulationIds[]` | Simulation test files | `- booking-test-1-a0000001` |
The gitops engine resolves these local filenames to Vapi UUIDs automatically during push.
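A sketch of that resolution, with hypothetical file names and a made-up UUID; the engine's exact API payload shape is not documented here:

```yaml
# Local frontmatter in resources/my-org/assistants/intake.md:
model:
  toolIds:
    - end-call-tool  # filename of resources/my-org/tools/end-call-tool.yml

# Conceptually, what reaches the Vapi API at push time:
# model:
#   toolIds:
#     - "b7e9c1d2-3f4a-5b6c-7d8e-9f0a1b2c3d4e"  # that tool's platform UUID
```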
The markdown body of an assistant .md file is the system prompt — the core instructions that define how the AI behaves on a call. This is the most important part to get right.
Before drafting or changing prompts: work through docs/Vapi Prompt Optimization Guide.md so structure, guardrails, and voice-specific habits stay consistent across agents.
```md
# Identity & Purpose
Who the assistant is and what it does.

# Guardrails
Hard rules that override everything else:
- Scope limits (what topics to handle)
- Data protection (what NOT to collect)
- Abuse handling
- Off-topic deflection
- Fabrication prohibition

# Primary Objectives
Numbered list of what the assistant should accomplish.

# Personality
Tone, style, language constraints.

# Response Guidelines
How to speak, confirm information, format numbers/prices, etc.

# Context
## Business Knowledge Base
Static facts: hours, services, contact info, service areas.
## Customer Context
Dynamic variables: {{ customer.number }}, current date/time.

# Workflow
## STEP 1: ...
## STEP 2: ...
## STEP 3: ...
Detailed step-by-step conversation flow.

# Error Handling
What to do when things go wrong (tool failures, repeated misunderstandings, etc.).

# Example Flows
Concrete example conversations showing expected behavior.
```

Prompt best practices:

- One question at a time — Voice agents should never ask multiple questions
- Confirm critical fields — Always repeat back names, phone numbers, addresses
- Use SSML — `<break time='0.5s'/>`, `<flush/>`, `<spell>text</spell>` for voice control
- E.164 phone format — Always store as `+1XXXXXXXXXX`
- Guard against jailbreaks — Include identity lock and prompt protection sections
- Template variables — Use `{{ customer.number }}` for caller phone, `{{"now" | date: "%A, %B %d, %Y"}}` for date/time
- Tool call announcements — Tell the user before calling tools: "Let me check that for you"
- Transfer pattern — Always speak first, then call transfer tool (two-step: say message, then tool call)
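A short workflow excerpt applying several of these rules at once; the wording, names, and numbers are invented for illustration, not a required script:

```md
## STEP 2: Collect callback details
Ask ONE question, then wait for the answer:
1. "May I have your name, please?"
2. "And the best number to reach you?"
Confirm critical fields by repeating them back:
"Just to confirm, that's John Smith at 2 0 6, 5 5 5, 1 2 3 4 — correct? <break time='0.3s'/>"
Store the number in E.164 format: +12065551234.
Before any tool call, announce it: "Let me check that for you."
```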
```sh
# Setup
npm run setup                                    # Interactive wizard: API key, org slug, resource selection

# Sync
npm run pull -- <org>                            # Pull from Vapi (preserve local changes)
npm run pull -- <org> --force                    # Pull from Vapi (overwrite everything)
npm run pull -- <org> --bootstrap                # Refresh state without writing remote resources locally
npm run pull -- <org> --type squads --id <uuid>  # Pull one known remote resource by UUID
npm run push -- <org>                            # Push all local changes to Vapi
npm run push -- <org> assistants                 # Push only assistants
npm run push -- <org> resources/<org>/assistants/my-agent.md  # Push single file
npm run push -- <org> <path1> <path2>            # Push multiple specific files (one state write)
npm run push -- <org> --dry-run                  # Preview without applying any platform changes
npm run push -- <org> --strict                   # Abort push if any validator returns an error
npm run push -- <org> --allow-new-files          # Bypass orphan-YAML gate (use only after confirming each orphan is intentionally new — see "Orphan-YAML gate" section above)
npm run apply -- <org>                           # Pull then push (full sync)
npm run apply -- <org> --allow-new-files         # Same, propagating the bypass through to the push stage
npm run validate -- <org>                        # Lint resources locally (fails fast on schema drift)
npm run audit -- <org>                           # Read-only drift detector: orphan YAML, state ghosts, content-identical clusters, sibling base-slugs, dashboard orphans, inline model.tools. Exit 1 on findings.
npm run audit -- <org> --type assistants         # Scope audit to a single resource type
npm run sim -- <org> --suite <name> --target <name>  # Run a simulation suite against an assistant/squad
npm run rollback -- <org> --to <ISO-timestamp>   # Re-apply a snapshot taken before a push
npm run rollback -- <org> --list                 # List available snapshots

# Testing
npm run call -- <org> -a <assistant-name>        # Call an assistant via WebSocket
npm run call -- <org> -s <squad-name>            # Call a squad via WebSocket

# Maintenance
npm run cleanup -- <org>                         # Dry-run: show orphaned remote resources
npm run cleanup -- <org> --force                 # Delete orphaned remote resources

# Build
npm run build                                    # Type-check
```

All commands accept an org slug (e.g. `my-org`). Running without arguments launches interactive mode.
The test-call CLI cleans its terminal output for the developer loop:
- Coalesced transcripts. Chunked TTS providers (Cartesia Sonic, etc.) stream each utterance as 2–4 separate `final` transcript events. The CLI buffers consecutive finals from the same role and flushes them as one merged `🤖 Assistant:` / `🎤 You:` line after a 600 ms quiet window, on role change, on `speech-update` from the opposite role, on `call-ended`, and on Ctrl+C.
- Suppressed `mpg123` warnings. macOS speaker output emits `Didn't have any audio data in callback (buffer underflow)` lines from native code on every chunk-boundary gap. The `npm run call` script wraps invocation in `bash -c` + a stderr filter that drops these lines so they no longer dominate the log. Requires `bash` on `PATH` (universal on macOS, Linux, WSL).
- Tool / handoff / status visibility. The CLI surfaces previously-dropped WebSocket control messages:
  - `🔧 Tool call: <name>(<args>)` — regular tool invocations
  - `🔀 Handoff → <Target Name>` — squad handoffs (detected from `handoff_to_<Target_Name>` function names)
  - `✅ Tool result: <name> → <preview>` / `❌ Tool failed: <name> → <preview>` — tool responses, truncated to 200 chars
  - `📞 Status: <state> [+reason]` — `in-progress`, `forwarding`, `ended`
  - `⚠️ Hang warning` — impending termination
  - `🔀 Transfer → <destination>` — number / SIP / cross-assistant transfers
- Discovery mode. Set `VAPI_CALL_DEBUG=1` in the environment to log unknown control message types (high-frequency events like `conversation-update`, `model-output`, `function-call`, `user-interrupted` are silently dropped by default to keep the log readable): `VAPI_CALL_DEBUG=1 npm run call -- <org> -s <squad>`
These are CLI-only changes — no runtime behavior change for the agent, no per-customer config required. Every downstream customer clone of this template inherits them automatically.
For the complete schema of all available properties on each resource type, consult the Vapi API documentation:
| Resource | API Docs |
|---|---|
| Assistants | https://docs.vapi.ai/api-reference/assistants/create |
| Tools | https://docs.vapi.ai/api-reference/tools/create |
| Squads | https://docs.vapi.ai/api-reference/squads/create |
| Structured Outputs | https://docs.vapi.ai/api-reference/structured-outputs/structured-output-controller-create |
| Simulations | https://docs.vapi.ai/api-reference/simulations |
For voice/model/transcriber provider options:
- Voice providers: https://docs.vapi.ai/providers/voice
- Model providers: https://docs.vapi.ai/providers/model
- Transcriber providers: https://docs.vapi.ai/providers/transcriber
For feature-specific documentation:
- Hooks: https://docs.vapi.ai/assistants/hooks
- Tools: https://docs.vapi.ai/tools
- Squads: https://docs.vapi.ai/squads
- Workflows: https://docs.vapi.ai/workflows
Tip: The Vapi MCP server and API reference pages provide full JSON schemas with all available fields, enums, and defaults. Use them to discover settings not covered in this guide.
- Filenames include a UUID suffix for uniqueness: `my-agent-a1b2c3d4.md`
- The UUID suffix comes from the Vapi platform ID (first 8 chars of the UUID)
- New resources created locally don't need the UUID suffix — it gets added after first push
- Tool function names use `snake_case`: `book_appointment`, `check_availability`
- Assistant names use natural language: `Intake Assistant`, `Booking Assistant`
- Structured output names use `snake_case`: `customer_data`, `call_summary`
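An illustrative layout showing these conventions together; every name and suffix below is invented:

```
resources/my-org/
├── assistants/
│   ├── intake-assistant-a1b2c3d4.md   # synced before: UUID suffix present
│   └── booking-assistant.md           # created locally: suffix added after first push
├── tools/
│   └── book-appointment-9f8e7d6c.yml  # function name inside is snake_case: book_appointment
└── structuredOutputs/
    └── customer-data-11223344.yml     # output name inside is snake_case: customer_data
```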
The engine has a name_mismatch guard that auto-bootstraps state from the dashboard before applying changes. Editing .vapi-state.<org>.json by hand to repoint a renamed file at the existing dashboard UUID does not work — the bootstrap runs first, overwrites your manual edit, and the rename gets treated as "delete the old resource + create a new one."
What this means in practice for renames:
| Approach | What happens |
|---|---|
| Rename the file locally + `npm run push -- <org>` | New UUID is minted for the renamed file; the old UUID becomes orphaned in the dashboard. Run `npm run cleanup -- <org> --force` (or `npm run push -- <org> --force <file>`) to delete the orphan. |
| Rename in the dashboard first, then `npm run pull -- <org>` | UUID is preserved. The pulled file lands with the new name and the existing UUID suffix; no orphan. |
If preserving the UUID matters (e.g. it's referenced from a phone number, outbound campaign, or external integration), rename via the dashboard first and pull. Otherwise, accept the new UUID and clean up the orphan.
Two-step pattern (speak first, then call tool). In the system prompt:

```
When transferring to human:
1. First: Speak transfer message ending with <break time='0.5s'/><flush/>
2. Second: Call transfer_call with no spoken text
```
- Create each agent as a separate assistant `.md` file
- Create a squad `.yml` that lists them as members
- Define handoff tools in `tools:append` on each member
- Handoff functions can pass parameters (context) between agents
- Create structured outputs for the data you want
- Reference them in the assistant's `artifactPlan.structuredOutputIds`
- After each call, Vapi runs the LLM analysis and stores results
- Create personalities (how the simulated caller behaves)
- Create scenarios (what the simulated caller says + evaluation criteria)
- Create simulations (pair personality + scenario)
- Create suites (batch simulations together)
- Run via Vapi dashboard or API