
feat: add Pulse STT support for smallest.ai pulse (streaming + pre-recorded) #4858

Draft
mahimairaja wants to merge 15 commits into livekit:main from mahimairaja:feat/smallest-ai-stt

Conversation

@mahimairaja

What does this PR do?

Adds Speech-to-Text (STT) support to the livekit-plugins-smallestai plugin using Smallest AI's Pulse STT API. The existing plugin only supported TTS; this PR brings it to parity with plugins like Deepgram, ElevenLabs, and Soniox that offer both TTS and STT.

Closes #4856

Summary of Changes

New: STT class (stt.py)

  • Pre-recorded transcription via HTTP POST (/api/v1/pulse/get_text)
  • Real-time streaming via WebSocket (wss://waves-api.smallest.ai/api/v1/pulse/get_text)
  • ~64ms TTFB streaming, word-level timestamps, speaker diarization
  • 32+ languages with auto-detection (language="multi")
  • Capabilities: streaming=True, interim_results=True
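
As a rough illustration of the pre-recorded path, the request reduces to a single authenticated POST to the endpoint above. This is a sketch only: the Bearer auth scheme and the `language` query parameter are assumptions for illustration, not confirmed fields of the Pulse API.

```python
# Sketch only: builds the pre-recorded Pulse request. The auth header and
# parameter names are illustrative assumptions, not confirmed API fields.
PULSE_HTTP_URL = "https://waves-api.smallest.ai/api/v1/pulse/get_text"


def build_pulse_request(
    api_key: str, language: str = "en"
) -> tuple[str, dict[str, str], dict[str, str]]:
    headers = {"Authorization": f"Bearer {api_key}"}  # assumed auth scheme
    params = {"language": language}  # assumed query parameter
    return PULSE_HTTP_URL, headers, params
```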

New: SpeechStream class (stt.py)

  • WebSocket-based streaming with concurrent send/recv/keepalive tasks
  • Audio chunking via AudioByteStream (~4096 byte chunks per Smallest AI docs)
  • Full speech event lifecycle: START_OF_SPEECH → INTERIM_TRANSCRIPT / FINAL_TRANSCRIPT → END_OF_SPEECH
  • Graceful shutdown with {"type": "end"} signaling
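
The chunking and shutdown behavior above can be sketched independently of the plugin; the ~4096-byte chunk size comes from this PR's description, and the final message mirrors the {"type": "end"} graceful-shutdown signal:

```python
import json


def chunk_pcm(pcm: bytes, chunk_size: int = 4096):
    """Yield ~4096-byte audio chunks; the final partial chunk is flushed as-is."""
    for i in range(0, len(pcm), chunk_size):
        yield pcm[i : i + chunk_size]


# Sent over the WebSocket after the last audio chunk to end the stream.
END_SIGNAL = json.dumps({"type": "end"})
```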

Usage

from livekit.plugins import smallestai

# Pre-recorded
stt = smallestai.STT(language="en")

# Streaming (used in AgentSession)
session = AgentSession(
    stt=smallestai.STT(language="en"),
    llm=...,
    tts=smallestai.TTS(),
)

Configuration via SMALLEST_API_KEY environment variable (same key used for TTS).

Testing

  • Verified pre-recorded transcription with WAV audio files
  • Verified real-time streaming with live microphone input via LiveKit Agents Playground
  • Tested interim + final transcript emission and speech event lifecycle
  • Tested with language="en" and language="multi" (auto-detection)
  • Ran ruff format and check
❯ uv run ruff check .
All checks passed!

❯ uv run ruff format .
629 files left unchanged
  • Ran type checking
❯ uv pip install pip && uv run mypy --install-types --non-interactive \
    -p livekit.agents \
    -p livekit.plugins.smallestai
Audited 1 package in 5ms
Success: no issues found in 169 source files

API Reference

@CLAassistant

CLAassistant commented Feb 16, 2026

CLA assistant check
All committers have signed the CLA.


@mahimairaja
Author

mahimairaja commented Feb 16, 2026

Tested prerecorded:

import asyncio
from pathlib import Path

import aiohttp
from dotenv import load_dotenv

from livekit.agents import utils
from livekit.plugins import smallestai

load_dotenv()


async def main():
    wav = Path(__file__).resolve().parent / "sample.wav"

    async with aiohttp.ClientSession() as session:
        stt = smallestai.STT(language="en", http_session=session)
        frames = [
            f
            async for f in utils.audio.audio_frames_from_file(
                str(wav), sample_rate=16000, num_channels=1
            )
        ]
        event = await stt.recognize(frames)

    print(event.alternatives[0].text if event.alternatives else "")


if __name__ == "__main__":
    asyncio.run(main())

@mahimairaja
Author

mahimairaja commented Feb 16, 2026

Testing streaming:

from dotenv import load_dotenv

from livekit import agents
from livekit.agents import Agent, AgentServer, AgentSession, room_io
from livekit.plugins import silero
from livekit.plugins.openai.llm import LLM
from livekit.plugins.smallestai.stt import STT
from livekit.plugins.smallestai.tts import TTS
from livekit.plugins.turn_detector.english import EnglishModel

load_dotenv()


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice AI assistant.""",
        )


server = AgentServer()


@server.rtc_session(agent_name="my-agent")
async def my_agent(ctx: agents.JobContext):
    session = AgentSession(
        stt=STT(),
        llm=LLM(model="gpt-4.1-mini"),
        tts=TTS(),
        vad=silero.VAD.load(),
        turn_detection=EnglishModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(),
    )

    await session.generate_reply(instructions="Greet the user and offer your assistance.")


if __name__ == "__main__":
    agents.cli.run_app(server)


@mahimairaja
Author

After conversations with @harshitajain165 from smallest.ai, I learned that a few more steps are needed on the smallest.ai server side for streaming support. For now, I am moving this PR to draft.

@mahimairaja mahimairaja marked this pull request as draft February 16, 2026 19:35

Development

Successfully merging this pull request may close these issues.

feat: Add STT (Speech-to-Text) support to livekit-plugins-smallestai

2 participants