This project manages Vapi voice agent configurations as code. All resources (assistants, tools, squads, etc.) are declarative files that sync to the Vapi platform via a gitops engine.
You do NOT need to know how Vapi works internally. This guide tells you everything you need to author and modify resources.
Prompt quality: Whenever you create a new assistant or change an existing assistant’s system prompt, read docs/Vapi Prompt Optimization Guide.md first. It goes deeper on structure, voice constraints, tool usage, and evaluation than the summary in this file.
Org-scoped resources: Resources live in resources/<org>/ (e.g. resources/my-org/, resources/my-org-prod/). Each org directory is isolated — npm run push -- my-org only touches resources/my-org/. Run npm run setup to create a new org.
Template-safe first run: In a fresh clone, prefer npm run pull -- <org> --bootstrap to refresh .vapi-state.<org>.json and credential mappings without materializing the target org's resources into resources/<org>/. npm run push -- <org> will auto-run the same bootstrap sync when it detects empty or stale state for the resources being applied.
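For a fresh clone, the sequence looks like this (the repo URL is elided and `my-org` is a placeholder slug; the `npm install` step is assumed standard for an npm project):

```sh
git clone <template-repo> && cd <repo>  # obtain the repo
npm install                             # install engine dependencies (assumed)
npm run pull -- my-org --bootstrap      # refresh .vapi-state.my-org.json without materializing resources
npm run push -- my-org                  # first push; auto-bootstraps again if state is stale
```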
Excluding resources from sync (.vapi-ignore): To prevent specific resources from being touched in either direction (e.g. assistants owned by another team or legacy resources you don't want to manage), create resources/<org>/.vapi-ignore with gitignore-style patterns. See resources/.vapi-ignore.example for syntax and examples. The list is bidirectional: matched ids are skipped on pull (never written), on push and apply (never sent), and orphan-protected (a --force push will not DELETE a dashboard resource whose id matches the ignore). --force on push bypasses the load-filter so a deliberate override can flow through, but orphan-protect still applies. A resource that references an ignored resource (e.g. a squad pointing at assistants/foo while assistants/foo is ignored) is a validation ERROR — --strict push aborts before any API call.
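A hypothetical `resources/<org>/.vapi-ignore` might look like this; the patterns and ids are illustrative only, and `resources/.vapi-ignore.example` remains the authoritative syntax reference:

```
# Assistants owned by the support team — never pull, push, or delete
assistants/support-*

# One legacy tool, pinned by its full filename
tools/legacy-crm-lookup-a1b2c3d4.yml
```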
Learnings & recipes: Before configuring resources or debugging issues, read the relevant file in docs/learnings/. Load only what you need:
| Working on | Read |
|---|---|
| Assistants (model, voice, transcriber, hooks) | docs/learnings/assistants.md |
| Tools (apiRequest, function, transferCall, handoff, code) | docs/learnings/tools.md |
| Squads / multi-agent handoffs | docs/learnings/squads.md |
| Transfers not working | docs/learnings/transfers.md |
| Structured outputs / post-call analysis | docs/learnings/structured-outputs.md |
| Simulations / test suites | docs/learnings/simulations.md |
| Webhooks / server config | docs/learnings/webhooks.md |
| Latency optimization | docs/learnings/latency.md |
| Fallback providers / error hooks | docs/learnings/fallbacks.md |
| Azure OpenAI BYOK with regional failover | docs/learnings/azure-openai-fallback.md |
| Multilingual agents (English/Spanish) | docs/learnings/multilingual.md |
| WebSocket audio streaming | docs/learnings/websocket.md |
| Building outbound calling agents | docs/learnings/outbound-agents.md |
| Bulk-dialing from a CSV (Outbound Call Campaigns) | docs/learnings/outbound-campaigns.md |
| Voicemail detection / VM vs human classification | docs/learnings/voicemail-detection.md |
| Enforcing call time limits / graceful call ending | docs/learnings/call-duration.md |
| Voice provider field cheat-sheet (Cartesia vs 11labs vs OpenAI etc.) | docs/learnings/voice-providers.md |
| YAML authoring conventions, .vapi-ignore lifecycle | docs/learnings/yaml-conventions.md |
Where new knowledge goes:
| Kind of knowledge | Home | Convention |
|---|---|---|
| Per-resource gotchas, recipes, troubleshooting | docs/learnings/<topic>.md | One file per resource type or topic. Add a row to this table AND to docs/learnings/README.md when you add a new file. CLAUDE.md mirrors this list — keep both in sync. |
| Engine-friction log (push/pull/state/cleanup pain points + fixes) | improvements.md | Format: Problem → Current behavior → Risk → Current mitigation → Possible fix → Status. Mark [RESOLVED YYYY-MM-DD] (#<PR>) when fixed; never delete. |
| Code-level rationale (why a function works the way it does) | Code comments | Only when the WHY is non-obvious — not what the code does. Don't reference PR/issue numbers; they rot. |
| Setup, install, repo orientation | README.md | One-time onboarding only. Don't put runtime gotchas here. |
If you're unsure where something goes, default to docs/learnings/. The README and engine-friction log are deliberately narrow.
| I want to... | What to do |
|---|---|
| Edit an assistant's system prompt | Edit the markdown body in resources/<org>/assistants/<name>.md |
| Change assistant settings | Edit the YAML frontmatter in the same .md file |
| Add a new tool | Create resources/<org>/tools/<name>.yml |
| Add a new assistant | Create resources/<org>/assistants/<name>.md |
| Create a multi-agent squad | Create resources/<org>/squads/<name>.yml |
| Add post-call analysis | Create resources/<org>/structuredOutputs/<name>.yml |
| Write test simulations | Create files under resources/<org>/simulations/ |
| Promote resources across orgs | Copy files between resources/<org-a>/ and resources/<org-b>/ |
| Deploy local changes (default) | npm run apply -- <org> — pull → merge → push, safe against dashboard drift |
| Pre-flight schema check (no network) | npm run validate -- <org> — run before every apply |
| Audit state/dashboard drift (read-only) | npm run audit -- <org> — orphans, ghosts, content-identical clusters, inline-tools. Exit 1 on findings. |
| Pull latest from Vapi | npm run pull -- <org>, --force, or --bootstrap |
| Pull one known remote resource | npm run pull -- <org> --type assistants --id <uuid> |
| Deploy a single file | npm run apply -- <org> resources/<org>/assistants/my-agent.md |
| Recover from a bad deploy | npm run rollback -- <org> --list then --to <ISO-timestamp> |
| Raw push (no pre-pull) | npm run push -- <org> — see safety hierarchy below; rarely the right call |
| Push with new resources | npm run push -- <org> --allow-new-files — bypass orphan-YAML gate. AI agents: do NOT auto-pass this flag; confirm with the human first (see push section below) |
| Test a call | npm run call -- <org> -a <assistant-name> or -s <squad-name> |
| Run a simulation suite | npm run sim -- <org> --suite <name> --target <assistant-name> |
Three commands deploy changes to the Vapi platform. Pick the safest one that fits the task. apply is the default.
`apply` pulls the platform's current state, merges it with your local files, then pushes the merged result. This protects you from racing dashboard edits made between your last pull and your push. Use it for ~99% of deployments.

```sh
npm run validate -- <org>              # schema check first, no network call
npm run apply -- <org>                 # full-org apply
npm run apply -- <org> <path-to-file>  # single-file apply (same safety, scoped diff)
```

`validate` runs the engine's local validators against every YAML/MD file in the org without any network call. It catches shape errors (missing required fields, wrong types, stale tool references) before they burn a deploy. Run it before every apply.

`push` skips the merge pass. Only use it when (a) you literally just ran pull and (b) you're certain no one has touched the dashboard since. In a multi-developer environment, or when dashboard editors are in play, default to apply instead: stale local state can clobber recent dashboard edits or PATCH against UUIDs that no longer exist.

If you do use push, dry-run first: `npm run push -- <org> --dry-run`.
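If a raw push is genuinely warranted, a cautious sequence (all commands from this guide) is:

```sh
npm run validate -- <org>        # schema check, no network
npm run pull -- <org>            # refresh state immediately before pushing
npm run push -- <org> --dry-run  # preview the exact mutations
npm run push -- <org>            # only now mutate the platform
```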
push refuses by default when any local YAML file lacks a corresponding entry in .vapi-state.<org>.json. The engine can't disambiguate "intentionally new resource" vs "rename of an existing resource" vs "stale cruft" from the file alone. Silently treating every orphan as a create has been the spawn-source for duplicate-resource cascades on customer dashboards.
When the gate fires, push exits 1 with a verbose message listing every orphan and pairing them with possible "rename source" candidates (state entries with no matching local file that share a base slug). Read the message — it tells you exactly what to do for each case.
Override: --allow-new-files. Pass this flag ONLY after confirming with the human operator that each orphan is intentionally a new resource (case a above). For renames (case b), rename the file back and run npm run pull -- <org> to re-key state. For stale files (case c), delete them locally.
FOR AI AGENTS: when you encounter this gate, do NOT auto-pass --allow-new-files. Surface the error message to the human and ask them to classify each orphan. Silent bypass defeats the entire purpose of the gate.
The gate is suppressed automatically when:

- `--bootstrap` is passed (every file is legitimately "new" in a from-scratch population).
- The file is matched by `.vapi-ignore` (the engine wasn't going to upload it anyway).
- A selective push (`-- <path>`) is requested and the orphan is outside the selection.
The same gate fires inside apply (which runs pull → merge → push). If pull's rename-detection orphans a local YAML, apply's push stage hits the gate and halts the whole apply with one explicit message. The --allow-new-files flag propagates through apply -- <org> --allow-new-files to the push stage.
Rollback and cleanup:

- `npm run rollback -- <org> --list` — every push/apply writes a state snapshot to `.vapi-state.<org>.snapshots/` before mutating. `--to <ISO-timestamp>` re-applies a specific snapshot, effectively undoing the deploy.
- `npm run cleanup -- <org>` (no `--force`) — enumerate orphaned dashboard resources without deleting. Destructive run is double-gated: requires `--force --confirm <org>`.
- Surgical alternative to `--force` cleanup: when the orphan set includes Vapi-default fixtures (see docs/learnings/simulations.md — the seven immortal stock personalities), delete individual resources via `curl -X DELETE` against the API, then `npm run pull -- <org> --bootstrap` to refresh state. `--force` halts on the first immortal-default 404 and exits non-zero.

Pre-deploy checklist:

1. `git status` — uncommitted changes are intentional?
2. `npm run validate -- <org>` — schema clean?
3. `npm run apply -- <org>` (or `apply -- <org> <path>` for single-file)
4. Verify with `npm run call -- <org> -a <name>` and/or `npm run sim -- <org> --suite <name> --target <name>`
5. If something looks wrong: `npm run rollback -- <org> --list` → `--to <timestamp>`
Why this matters: the gitops engine tracks resource UUIDs in .vapi-state.<org>.json. Bare push trusts that file is current. If the dashboard has changed since your last pull — someone edited an assistant in the UI, a teammate ran a different push, a structured output got linked via another path — your local state can be stale by minutes. apply refreshes state before mutating, eliminating that entire class of race.
docs/
├── Vapi Prompt Optimization Guide.md # In-depth prompt authoring
├── changelog.md # Template for tracking per-customer config changes
└── learnings/ # Gotchas, recipes, and troubleshooting
├── README.md # Task-routed index — start here
├── tools.md # Tool configuration gotchas (incl. dedup behavior)
├── assistants.md # Assistant configuration gotchas
├── squads.md # Squad and multi-agent gotchas
├── structured-outputs.md # Structured output gotchas + KPI patterns
├── simulations.md # Simulation and testing gotchas
├── webhooks.md # Server and webhook gotchas
├── transfers.md # Transfer troubleshooting runbook
├── latency.md # Latency optimization guide
├── fallbacks.md # Fallback and error handling recipes
├── azure-openai-fallback.md # Azure OpenAI BYOK multi-region setup
├── multilingual.md # Multilingual agent architecture guide
├── websocket.md # WebSocket transport rules
├── outbound-agents.md # Outbound agent design & IVR navigation
├── outbound-campaigns.md # Bulk-dial CSV campaigns + dynamic variables
├── voicemail-detection.md # Voicemail vs human classification
├── call-duration.md # Call time limits and graceful end-of-call
├── voice-providers.md # Per-provider voice block field cheat-sheet
└── yaml-conventions.md # YAML authoring conventions, .vapi-ignore lifecycle
resources/
├── <org>/ # Org-scoped resources (npm run push -- <org> reads here)
│ ├── assistants/
│ ├── tools/
│ ├── squads/
│ ├── structuredOutputs/
│ ├── evals/
│ └── simulations/
└── <another-org>/ # Another org (each is isolated)
└── (same structure)
Assistants are voice agents that handle phone calls. They are defined as Markdown files with YAML frontmatter.
File: resources/<org>/assistants/<name>.md
```md
---
name: My Assistant
firstMessage: Hi, thanks for calling! How can I help you today?
voice:
  provider: 11labs
  voiceId: your-voice-id-here
  model: eleven_turbo_v2
  stability: 0.7
  similarityBoost: 0.75
  speed: 1.1
  enableSsmlParsing: true
model:
  provider: openai
  model: gpt-4.1
  temperature: 0
  toolIds:
    - end-call-tool
    - transfer-call
transcriber:
  provider: deepgram
  model: nova-3
  language: en
  numerals: true
  confidenceThreshold: 0.5
endCallFunctionEnabled: true
endCallMessage: Thank you for calling. Have a great day!
silenceTimeoutSeconds: 30
maxDurationSeconds: 600
backgroundDenoisingEnabled: true
backgroundSound: off
---

# Identity & Purpose

You are a virtual assistant for the business you represent...

# Workflow

## STEP 1: Greeting

...
```

How it works:

- Everything between `---` markers = YAML configuration (voice, model, tools, etc.)
- Everything below the second `---` = system prompt (markdown, sent as the LLM system message)
- The system prompt IS the core behavior definition — write it like detailed instructions for an AI
| Setting | Purpose | Common Values |
|---|---|---|
| `name` | Display name in Vapi dashboard | Any string |
| `firstMessage` | What the assistant says first when a call connects | Greeting text (supports SSML like `<break time='0.3s'/>`) |
| `firstMessageMode` | How the first message is generated | `assistant-speaks-first` (default, uses `firstMessage`), `assistant-speaks-first-with-model-generated-message` (LLM generates it) |
| `voice` | Text-to-speech configuration | See Voice section below |
| `model` | LLM configuration | See Model section below |
| `transcriber` | Speech-to-text configuration | See Transcriber section below |
| `endCallFunctionEnabled` | Allow the assistant to hang up | `true` / `false` |
| `endCallMessage` | What to say when ending the call | Text string |
| `silenceTimeoutSeconds` | Hang up after N seconds of silence | `30` typical |
| `maxDurationSeconds` | Maximum call duration | `600` (10 min) typical |
| `backgroundDenoisingEnabled` | Reduce background noise | `true` / `false` |
| `backgroundSound` | Ambient sound during pauses | `off`, `office` |
| `voicemailMessage` | Message to leave if voicemail detected | Text string |
| `hooks` | Event-driven actions (see Hooks section) | Array of hook objects |
| `messagePlan` | Idle message behavior | See below |
| `startSpeakingPlan` | Endpointing configuration | See below |
| `stopSpeakingPlan` | Interruption sensitivity | See below |
| `server` | Webhook server for tool calls | `{ url, timeoutSeconds, credentialId }` |
| `serverMessages` | Which events to send to webhook | `["end-of-call-report", "status-update"]` |
| `analysisPlan` | Post-call analysis configuration | See below |
| `artifactPlan` | What to save after calls | See below |
| `observabilityPlan` | Logging/monitoring | `{ provider: "langfuse", tags: [...] }` |
| `compliancePlan` | HIPAA/PCI compliance | `{ hipaaEnabled: false, pciEnabled: false }` |
```yaml
voice:
  provider: 11labs             # 11labs, playht, cartesia, azure, deepgram, openai, rime, lmnt
  voiceId: your-voice-id-here  # Provider-specific voice ID
  model: eleven_turbo_v2       # Provider-specific model
  stability: 0.7               # 0.0-1.0, higher = more consistent
  similarityBoost: 0.75        # 0.0-1.0, higher = closer to original voice
  speed: 1.1                   # Speech rate multiplier
  enableSsmlParsing: true      # Allow SSML tags in responses
  inputPunctuationBoundaries:  # When to start TTS (chunk boundaries)
    - "."
    - "!"
    - "?"
    - ";"
    - ","
```

```yaml
model:
  provider: openai  # openai, anthropic, google, azure-openai, groq, cerebras
  model: gpt-4.1    # Provider-specific model name
  temperature: 0    # 0.0-2.0, lower = more deterministic
  toolIds:          # Tools this assistant can use (reference by filename)
    - my-tool-name
    - another-tool
```

```yaml
transcriber:
  provider: deepgram        # deepgram, assemblyai, azure, google, openai, gladia
  model: nova-3             # Provider-specific model
  language: en              # Language code
  numerals: true            # Convert spoken numbers to digits
  confidenceThreshold: 0.5  # Minimum confidence to accept transcription
```

Hooks trigger actions based on call events:

```yaml
hooks:
  # Say something when transcription confidence is low
  - on: assistant.transcriber.endpointedSpeechLowConfidence
    options:
      confidenceMin: 0.2
      confidenceMax: 0.49
    do:
      - type: say
        exact: "I'm sorry, I didn't quite catch that. Could you please repeat?"
  # End call on long customer silence
  - on: customer.speech.timeout
    options:
      timeoutSeconds: 90
    do:
      - type: say
        exact: "I'll be ending the call now. Please feel free to call back anytime."
      - type: tool
        tool:
          type: endCall
```

```yaml
messagePlan:
  idleTimeoutSeconds: 15        # Seconds before idle message
  idleMessages:                 # Messages to say when idle
    - "I'm still here if you need assistance."
    - "Are you still there?"
  idleMessageMaxSpokenCount: 3  # Max idle messages before giving up
  idleMessageResetCountOnUserSpeechEnabled: true  # Reset counter when user speaks
```

Controls when the assistant starts responding after the user stops speaking:

```yaml
startSpeakingPlan:
  smartEndpointingPlan:
    provider: livekit
    waitFunction: "20 + 500 * sqrt(x) + 2500 * x^3"  # Custom wait curve
```

```yaml
stopSpeakingPlan:
  numWords: 1  # How many user words before assistant stops speaking (lower = more interruptible)
```

```yaml
analysisPlan:
  summaryPlan:
    enabled: true
    messages:
      - role: system
        content: "Summarize this call concisely. Include: ..."
      - role: user
        content: |
          Here is the transcript:
          {{transcript}}
          Here is the ended reason:
          {{endedReason}}
```

```yaml
artifactPlan:
  fullMessageHistoryEnabled: true  # Save full message history
  structuredOutputIds:             # Run these structured outputs after call
    - customer-data
    - call-summary
```

Tools are functions the assistant can call during a conversation.
File: resources/<org>/tools/<name>.yml
```yaml
type: function
async: false
function:
  name: get_weather
  description: Get the current weather for a location
  strict: true
  parameters:
    type: object
    properties:
      location:
        type: string
        description: The city name
      unit:
        type: string
        enum: [celsius, fahrenheit]
        description: Temperature unit
    required:
      - location
messages:
  - type: request-start
    blocking: true
    content: "Let me check the weather for you."
  - type: request-response-delayed
    timingMilliseconds: 5000
    content: "Still looking that up."
server:
  url: https://my-api.com/weather
  timeoutSeconds: 20
  credentialId: optional-credential-uuid  # Optional: server auth credential
  headers:                                # Optional: custom request headers
    Content-Type: application/json
```

```yaml
type: transferCall
async: false
function:
  name: transfer_call
  description: Transfer the caller to a human agent
destinations:
  - type: number
    number: "+15551234567"
    numberE164CheckEnabled: true
    message: "Please hold while I transfer you."
    transferPlan:
      mode: blind-transfer
      sipVerb: refer
messages:
  - type: request-start
    blocking: false
```

```yaml
type: endCall
async: false
function:
  name: end_call
  description: Allows the agent to terminate the call
  parameters:
    type: object
    properties: {}
    required: []
messages:
  - type: request-start
    blocking: false
```

```yaml
type: handoff
function:
  name: handoff_tool
```

| Type | Purpose | Key Properties |
|---|---|---|
| `request-start` | Said when tool is called | `content`, `blocking` (pause speech until tool returns) |
| `request-response-delayed` | Said if tool takes too long | `content`, `timingMilliseconds` |
| `request-complete` | Said when tool returns | `content` |
| `request-failed` | Said when tool errors | `content` |
Structured outputs extract data from call transcripts after the call ends. They run LLM analysis on the conversation.
File: resources/<org>/structuredOutputs/<name>.yml
```yaml
name: success_evaluation
type: ai
target: messages
description: "Determines if the call met its objectives"
assistant_ids:
  - a1b2c3d4-e5f6-7890-abcd-ef1234567890
model:
  provider: openai
  model: gpt-4.1-mini
  temperature: 0
schema:
  type: boolean
  description: "Return true if the call successfully met its objectives."
```

```yaml
name: customer_data
type: ai
target: messages
description: "Extracts customer contact info and call details"
assistant_ids:
  - a1b2c3d4-e5f6-7890-abcd-ef1234567890
model:
  provider: openai
  model: gpt-4.1-mini
  temperature: 0
schema:
  type: object
  properties:
    customerName:
      type: string
      description: "The customer's full name"
    customerPhone:
      type: string
      description: "The customer's phone number"
    callReason:
      type: string
      description: "Why the customer called"
      enum: [new_inquiry, existing_project, complaint, spam]
    appointmentBooked:
      type: boolean
      description: "True if an appointment was booked"
```

```yaml
name: call_summary
type: ai
target: messages
description: "Generates a concise summary of the conversation"
model:
  provider: openai
  model: gpt-4.1-mini
  temperature: 0
schema:
  type: string
  description: "Summarize the call in 2-3 sentences."
  minLength: 10
  maxLength: 500
```

Notes:

- `assistant_ids` uses Vapi UUIDs (not local filenames) — these are the IDs of assistants this output applies to
- `target: messages` means the LLM analyzes the full message history
- `type: ai` means an LLM generates the output (vs. `type: code` for programmatic)
- `schema.type` must be a simple string (e.g. `type: string`, `type: boolean`, `type: object`). Do NOT use a YAML array like `type: [string, "null"]` — the Vapi dashboard calls `.toLowerCase()` on this field and will crash with `TypeError: .toLowerCase is not a function` if it receives an array. For nullable values, express nullability in the `description` instead (e.g. "Return null if no follow-up is needed")
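To make the nullable-schema rule concrete, here is a right/wrong contrast; the field itself is hypothetical:

```yaml
# WRONG — array type; the dashboard calls .toLowerCase() on schema.type and crashes:
schema:
  type: [string, "null"]

# RIGHT — simple type, nullability expressed in the description instead:
schema:
  type: string
  description: "The follow-up date as an ISO string. Return null if no follow-up is needed."
```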
Squads define multi-agent systems where assistants can hand off to each other.
File: resources/<org>/squads/<name>.yml
```yaml
name: My Squad
members:
  - assistantId: intake-agent-a1b2c3d4  # References resources/<org>/assistants/<id>.md
    assistantOverrides:                 # Override assistant settings within this squad
      metadata:
        position:                       # Visual position in dashboard editor
          x: 250
          y: 100
    tools:append:                       # Add tools to this member (in addition to their own)
      - type: handoff
        async: false
        messages: []
        function:
          name: handoff_to_Booking_Agent
          description: "Hand off to booking agent when customer wants to schedule"
          parameters:
            type: object
            properties:
              reason:
                type: string
                description: "Why the handoff is happening"
            required:
              - reason
        destinations:
          - type: assistant
            assistantName: Booking Assistant  # Must match the `name` field in target assistant
            description: "Handles appointment booking"
  - assistantId: booking-agent-e5f67890
    assistantOverrides:
      metadata:
        position:
          x: 650
          y: 100
    tools:append:
      - type: handoff
        async: false
        messages: []
        function:
          name: handoff_back_to_Intake
          description: "Hand back to intake agent for wrap-up"
        destinations:
          - type: assistant
            assistantName: Intake Assistant
            description: "Intake agent for call wrap-up"
membersOverrides:                       # Settings applied to ALL members
  transcriber:
    provider: deepgram
    model: nova-3
    language: en
  hooks:
    - on: customer.speech.timeout
      options:
        timeoutSeconds: 90
      do:
        - type: say
          exact: "Ending the call now. Feel free to call back."
        - type: tool
          tool:
            type: endCall
  observabilityPlan:
    provider: langfuse
    tags:
      - my-tag
```

Key Concepts:

- `assistantId` references an assistant file by filename (without extension)
- `tools:append` adds handoff tools without replacing the assistant's existing tools
- Handoff `destinations` link to other squad members by `assistantName` (the `name` field in their YAML frontmatter)
- `membersOverrides` applies settings to all members (useful for shared transcriber, hooks, etc.)
- Handoff functions can have parameters that pass context between agents
Simulations let you test assistants with automated "caller" personas.
Define simulated caller behaviors:
```yaml
name: Skeptical Sam
assistant:
  model:
    provider: openai
    model: gpt-4.1
    messages:
      - role: system
        content: >
          You are skeptical and need convincing before trusting information.
          You question everything and ask for specifics.
  tools:
    - type: endCall
```

Define test case scripts with evaluation criteria:

```yaml
name: "Happy Path: New customer books appointment"
instructions: >
  You are a new customer calling to schedule an appointment.
  Provide your name as "John Smith", phone as "206-555-1234".
  Be cooperative and confirm all information.
  End the call when the assistant confirms the booking.
evaluations:
  - structuredOutputId: a1b2c3d4-e5f6-7890-abcd-ef1234567890
    comparator: "="
    value: true
    required: true
```

Combine a personality with a scenario:

```yaml
name: Happy Path Test 1
personalityId: skeptical-sam-a0000001    # References personalities/<id>.yml
scenarioId: happy-path-booking-a0000002  # References scenarios/<id>.yml
```

Group simulations into test batches:

```yaml
name: Booking Flow Tests
simulationIds:
  - booking-test-1-a0000001
  - booking-test-2-a0000002
  - booking-test-3-a0000003
```

Resources reference each other by filename without extension:
| From | Field | References | Example |
|---|---|---|---|
| Assistant | `model.toolIds[]` | Tool files | `- end-call-tool` |
| Assistant | `artifactPlan.structuredOutputIds[]` | Structured Output files | `- customer-data` |
| Squad | `members[].assistantId` | Assistant files | `assistantId: intake-agent-a1b2c3d4` |
| Squad handoff | `destinations[].assistantName` | Assistant `name` field | `assistantName: Booking Assistant` |
| Simulation | `personalityId` | Personality files | `personalityId: skeptical-sam-a0000001` |
| Simulation | `scenarioId` | Scenario files | `scenarioId: happy-path-booking-a0000002` |
| Suite | `simulationIds[]` | Simulation test files | `- booking-test-1-a0000001` |
The gitops engine resolves these local filenames to Vapi UUIDs automatically during push.
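A sketch of that resolution, with hypothetical file names and a made-up UUID; the engine's exact API payload shape is not documented here:

```yaml
# Local frontmatter in resources/my-org/assistants/intake.md:
model:
  toolIds:
    - end-call-tool  # filename of resources/my-org/tools/end-call-tool.yml

# Conceptually, what reaches the Vapi API at push time:
# model:
#   toolIds:
#     - "b7e9c1d2-3f4a-5b6c-7d8e-9f0a1b2c3d4e"  # that tool's platform UUID
```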
The markdown body of an assistant .md file is the system prompt — the core instructions that define how the AI behaves on a call. This is the most important part to get right.
Before drafting or changing prompts: work through docs/Vapi Prompt Optimization Guide.md so structure, guardrails, and voice-specific habits stay consistent across agents.
```md
# Identity & Purpose
Who the assistant is and what it does.

# Guardrails
Hard rules that override everything else:
- Scope limits (what topics to handle)
- Data protection (what NOT to collect)
- Abuse handling
- Off-topic deflection
- Fabrication prohibition

# Primary Objectives
Numbered list of what the assistant should accomplish.

# Personality
Tone, style, language constraints.

# Response Guidelines
How to speak, confirm information, format numbers/prices, etc.

# Context
## Business Knowledge Base
Static facts: hours, services, contact info, service areas.
## Customer Context
Dynamic variables: {{ customer.number }}, current date/time.

# Workflow
## STEP 1: ...
## STEP 2: ...
## STEP 3: ...
Detailed step-by-step conversation flow.

# Error Handling
What to do when things go wrong (tool failures, repeated misunderstandings, etc.).

# Example Flows
Concrete example conversations showing expected behavior.
```

Prompt best practices:

- One question at a time — Voice agents should never ask multiple questions
- Confirm critical fields — Always repeat back names, phone numbers, addresses
- Use SSML — `<break time='0.5s'/>`, `<flush/>`, `<spell>text</spell>` for voice control
- E.164 phone format — Always store as `+1XXXXXXXXXX`
- Guard against jailbreaks — Include identity lock and prompt protection sections
- Template variables — Use `{{ customer.number }}` for caller phone, `{{"now" | date: "%A, %B %d, %Y"}}` for date/time
- Tool call announcements — Tell the user before calling tools: "Let me check that for you"
- Transfer pattern — Always speak first, then call transfer tool (two-step: say message, then tool call)
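A short workflow excerpt applying several of these rules at once; the wording, names, and numbers are invented for illustration, not a required script:

```md
## STEP 2: Collect callback details
Ask ONE question, then wait for the answer:
1. "May I have your name, please?"
2. "And the best number to reach you?"
Confirm critical fields by repeating them back:
"Just to confirm, that's John Smith at 2 0 6, 5 5 5, 1 2 3 4 — correct? <break time='0.3s'/>"
Store the number in E.164 format: +12065551234.
Before any tool call, announce it: "Let me check that for you."
```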
```sh
# Setup
npm run setup                                    # Interactive wizard: API key, org slug, resource selection

# Sync
npm run pull -- <org>                            # Pull from Vapi (preserve local changes)
npm run pull -- <org> --force                    # Pull from Vapi (overwrite everything)
npm run pull -- <org> --bootstrap                # Refresh state without writing remote resources locally
npm run pull -- <org> --type squads --id <uuid>  # Pull one known remote resource by UUID
npm run push -- <org>                            # Push all local changes to Vapi
npm run push -- <org> assistants                 # Push only assistants
npm run push -- <org> resources/<org>/assistants/my-agent.md  # Push single file
npm run push -- <org> <path1> <path2>            # Push multiple specific files (one state write)
npm run push -- <org> --dry-run                  # Preview without applying any platform changes
npm run push -- <org> --strict                   # Abort push if any validator returns an error
npm run push -- <org> --allow-new-files          # Bypass orphan-YAML gate (use only after confirming each orphan is intentionally new — see "Orphan-YAML gate" section above)
npm run apply -- <org>                           # Pull then push (full sync)
npm run apply -- <org> --allow-new-files         # Same, propagating the bypass through to the push stage
npm run validate -- <org>                        # Lint resources locally (fails fast on schema drift)
npm run audit -- <org>                           # Read-only drift detector: orphan YAML, state ghosts, content-identical clusters, sibling base-slugs, dashboard orphans, inline model.tools. Exit 1 on findings.
npm run audit -- <org> --type assistants         # Scope audit to a single resource type
npm run sim -- <org> --suite <name> --target <name>  # Run a simulation suite against an assistant/squad
npm run rollback -- <org> --to <ISO-timestamp>   # Re-apply a snapshot taken before a push
npm run rollback -- <org> --list                 # List available snapshots

# Testing
npm run call -- <org> -a <assistant-name>        # Call an assistant via WebSocket
npm run call -- <org> -s <squad-name>            # Call a squad via WebSocket

# Maintenance
npm run cleanup -- <org>                         # Dry-run: show orphaned remote resources
npm run cleanup -- <org> --force                 # Delete orphaned remote resources

# Build
npm run build                                    # Type-check
```

All commands accept an org slug (e.g. `my-org`). Running without arguments launches interactive mode.
The test-call CLI cleans its terminal output for the developer loop:
- Coalesced transcripts. Chunked TTS providers (Cartesia Sonic, etc.) stream each utterance as 2–4 separate `final` transcript events. The CLI buffers consecutive finals from the same role and flushes them as one merged `🤖 Assistant:` / `🎤 You:` line after a 600 ms quiet window, on role change, on `speech-update` from the opposite role, on `call-ended`, and on Ctrl+C.
- Suppressed `mpg123` warnings. macOS speaker output emits `Didn't have any audio data in callback (buffer underflow)` lines from native code on every chunk-boundary gap. The `npm run call` script wraps invocation in `bash -c` + a stderr filter that drops these lines so they no longer dominate the log. Requires `bash` on `PATH` (universal on macOS, Linux, WSL).
- Tool / handoff / status visibility. The CLI surfaces previously-dropped WebSocket control messages:
  - `🔧 Tool call: <name>(<args>)` — regular tool invocations
  - `🔀 Handoff → <Target Name>` — squad handoffs (detected from `handoff_to_<Target_Name>` function names)
  - `✅ Tool result: <name> → <preview>` / `❌ Tool failed: <name> → <preview>` — tool responses, truncated to 200 chars
  - `📞 Status: <state> [+reason]` — `in-progress`, `forwarding`, `ended`
  - `⚠️ Hang warning` — impending termination
  - `🔀 Transfer → <destination>` — number / SIP / cross-assistant transfers
- Discovery mode. Set `VAPI_CALL_DEBUG=1` in the environment to log unknown control message types (high-frequency events like `conversation-update`, `model-output`, `function-call`, `user-interrupted` are silently dropped by default to keep the log readable): `VAPI_CALL_DEBUG=1 npm run call -- <org> -s <squad>`
These are CLI-only changes — no runtime behavior change for the agent, no per-customer config required. Every downstream customer clone of this template inherits them automatically.
For the complete schema of all available properties on each resource type, consult the Vapi API documentation:
| Resource | API Docs |
|---|---|
| Assistants | https://docs.vapi.ai/api-reference/assistants/create |
| Tools | https://docs.vapi.ai/api-reference/tools/create |
| Squads | https://docs.vapi.ai/api-reference/squads/create |
| Structured Outputs | https://docs.vapi.ai/api-reference/structured-outputs/structured-output-controller-create |
| Simulations | https://docs.vapi.ai/api-reference/simulations |
For voice/model/transcriber provider options:
- Voice providers: https://docs.vapi.ai/providers/voice
- Model providers: https://docs.vapi.ai/providers/model
- Transcriber providers: https://docs.vapi.ai/providers/transcriber
For feature-specific documentation:
- Hooks: https://docs.vapi.ai/assistants/hooks
- Tools: https://docs.vapi.ai/tools
- Squads: https://docs.vapi.ai/squads
- Workflows: https://docs.vapi.ai/workflows
Tip: The Vapi MCP server and API reference pages provide full JSON schemas with all available fields, enums, and defaults. Use them to discover settings not covered in this guide.
- Filenames include a UUID suffix for uniqueness: `my-agent-a1b2c3d4.md`
- The UUID suffix comes from the Vapi platform ID (first 8 chars of the UUID)
- New resources created locally don't need the UUID suffix — it gets added after first push
- Tool function names use `snake_case`: `book_appointment`, `check_availability`
- Assistant names use natural language: `Intake Assistant`, `Booking Assistant`
- Structured output names use `snake_case`: `customer_data`, `call_summary`
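An illustrative layout showing these conventions together; every name and suffix below is invented:

```
resources/my-org/
├── assistants/
│   ├── intake-assistant-a1b2c3d4.md   # synced before: UUID suffix present
│   └── booking-assistant.md           # created locally: suffix added after first push
├── tools/
│   └── book-appointment-9f8e7d6c.yml  # function name inside is snake_case: book_appointment
└── structuredOutputs/
    └── customer-data-11223344.yml     # output name inside is snake_case: customer_data
```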
The engine has a name_mismatch guard that auto-bootstraps state from the dashboard before applying changes. Editing .vapi-state.<org>.json by hand to repoint a renamed file at the existing dashboard UUID does not work — the bootstrap runs first, overwrites your manual edit, and the rename gets treated as "delete the old resource + create a new one."
What this means in practice for renames:
| Approach | What happens |
|---|---|
| Rename the file locally + `npm run push -- <org>` | New UUID is minted for the renamed file; the old UUID becomes orphaned in the dashboard. Run `npm run cleanup -- <org> --force` (or `npm run push -- <org> --force <file>`) to delete the orphan. |
| Rename in the dashboard first, then `npm run pull -- <org>` | UUID is preserved. The pulled file lands with the new name and the existing UUID suffix; no orphan. |
If preserving the UUID matters (e.g. it's referenced from a phone number, outbound campaign, or external integration), rename via the dashboard first and pull. Otherwise, accept the new UUID and clean up the orphan.
Two-step pattern (speak first, then call tool). In the system prompt:

```
When transferring to human:
1. First: Speak transfer message ending with <break time='0.5s'/><flush/>
2. Second: Call transfer_call with no spoken text
```
- Create each agent as a separate assistant `.md` file
- Create a squad `.yml` that lists them as members
- Define handoff tools in `tools:append` on each member
- Handoff functions can pass parameters (context) between agents
- Create structured outputs for the data you want
- Reference them in the assistant's `artifactPlan.structuredOutputIds`
- After each call, Vapi runs the LLM analysis and stores results
- Create personalities (how the simulated caller behaves)
- Create scenarios (what the simulated caller says + evaluation criteria)
- Create simulations (pair personality + scenario)
- Create suites (batch simulations together)
- Run via Vapi dashboard or API