feat: add Pulse STT support for smallest.ai pulse (streaming + pre-recorded) #4858
Draft
mahimairaja wants to merge 15 commits into livekit:main
Conversation
Author
Tested pre-recorded:

```python
import asyncio
from pathlib import Path

import aiohttp
from dotenv import load_dotenv

from livekit.agents import utils
from livekit.plugins import smallestai

load_dotenv()


async def main():
    wav = Path(__file__).resolve().parent / "sample.wav"
    async with aiohttp.ClientSession() as session:
        stt = smallestai.STT(language="en", http_session=session)
        frames = [
            f
            async for f in utils.audio.audio_frames_from_file(
                str(wav), sample_rate=16000, num_channels=1
            )
        ]
        event = await stt.recognize(frames)
        print(event.alternatives[0].text if event.alternatives else "")


if __name__ == "__main__":
    asyncio.run(main())
```
Author
Testing streaming:

```python
from dotenv import load_dotenv

from livekit import agents
from livekit.agents import Agent, AgentServer, AgentSession, room_io
from livekit.plugins import silero
from livekit.plugins.openai.llm import LLM
from livekit.plugins.smallestai.stt import STT
from livekit.plugins.smallestai.tts import TTS
from livekit.plugins.turn_detector.english import EnglishModel

load_dotenv()


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice AI assistant.""",
        )


server = AgentServer()


@server.rtc_session(agent_name="my-agent")
async def my_agent(ctx: agents.JobContext):
    session = AgentSession(
        stt=STT(),
        llm=LLM(model="gpt-4.1-mini"),
        tts=TTS(),
        vad=silero.VAD.load(),
        turn_detection=EnglishModel(),
    )
    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(),
    )
    await session.generate_reply(instructions="Greet the user and offer your assistance.")


if __name__ == "__main__":
    agents.cli.run_app(server)
```
Author
After conversations with @harshitajain165 from smallest.ai, I learned that a few more steps are needed on the smallest.ai server side to enable streaming support. For now, I am moving this PR to draft.
What does this PR do?
Adds Speech-to-Text (STT) support to the `livekit-plugins-smallestai` plugin using Smallest AI's Pulse STT API. The existing plugin only supported TTS; this PR brings it to parity with plugins like Deepgram, ElevenLabs, and Soniox that offer both TTS and STT.

Closes #4856
Summary of Changes
New: `STT` class (`stt.py`)
- Pre-recorded recognition via the Pulse REST endpoint (`/api/v1/pulse/get_text`)
- Streaming recognition over WebSocket (`wss://waves-api.smallest.ai/api/v1/pulse/get_text`)
- Multilingual auto-detection (`language="multi"`)
- `streaming=True`, `interim_results=True` options

New: `SpeechStream` class (`stt.py`)
- Buffers audio with `AudioByteStream` (~4096 byte chunks per Smallest AI docs)
- Emits `START_OF_SPEECH` → `INTERIM_TRANSCRIPT`/`FINAL_TRANSCRIPT` → `END_OF_SPEECH`
- Signals end of input with `{"type": "end"}`
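The buffering and end-of-stream signaling described above can be sketched in isolation (a hypothetical helper mirroring the described behavior, not the plugin's actual code):

```python
import json


def chunk_pcm(pcm: bytes, chunk_size: int = 4096) -> list[bytes]:
    """Split raw PCM bytes into fixed-size chunks (~4096 bytes per the
    Smallest AI docs); the final chunk may be shorter."""
    return [pcm[i : i + chunk_size] for i in range(0, len(pcm), chunk_size)]


# 10_000 bytes of silence -> chunks of 4096, 4096, and 1808 bytes
chunks = chunk_pcm(b"\x00" * 10_000)

# once all audio has been sent, the stream is closed with a control message
end_message = json.dumps({"type": "end"})
```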
Usage

Configuration via the `SMALLEST_API_KEY` environment variable (same key used for TTS).
Testing

Tested with `language="en"` and `language="multi"` (auto-detection).

API Reference