feat(clevrlabs): add Clevr Labs TTS plugin #6005
Adds livekit-plugins-clevrlabs, a streaming TTS plugin backed by the Clevr Labs conversational speech model. Supports per-conversation voice consistency via add_user_turn(), and ships an is_whisper_hallucination() helper for filtering Whisper-family STT before it pollutes voice context. Registered in the workspace [tool.uv.sources]. Passes ruff (lint + format) and mypy --strict under the repo's root config.
for chunk in buf.push(data):
    await self._synthesize_segment(chunk, output_emitter, audio_bstream)
🟡 Missing _mark_started() call prevents TTS metrics from being emitted
All other streaming TTS plugins (e.g. Cartesia at cartesia/tts.py:424, ElevenLabs at elevenlabs/tts.py:480, Deepgram at deepgram/tts.py:316) call self._mark_started() when they begin synthesizing. The Clevr Labs plugin never calls it, so _started_time remains 0. In the base class's _metrics_monitor_task (livekit-agents/livekit/agents/tts/tts.py:576), the guard if not self._started_time returns early, meaning TTS metrics (TTFB, duration, etc.) are silently never emitted.
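The early return described in this comment can be illustrated with a toy model. This is a minimal sketch, not the LiveKit base class: `ToyStream`, `mark_started`, and `collect_metrics` are hypothetical names that only mimic the guard pattern quoted above.

```python
import time
from typing import Dict, Optional


class ToyStream:
    """Hypothetical stand-in for a TTS stream; not the LiveKit API."""

    def __init__(self) -> None:
        # 0 means "synthesis was never marked as started".
        self.started_time = 0.0

    def mark_started(self) -> None:
        # Record the start time on the first call only.
        if not self.started_time:
            self.started_time = time.perf_counter()

    def collect_metrics(self) -> Optional[Dict[str, float]]:
        # Same shape of guard as in the review comment: with no start
        # time recorded, the function returns before emitting anything.
        if not self.started_time:
            return None
        return {"elapsed": time.perf_counter() - self.started_time}


silent = ToyStream()   # never marked: metrics are skipped
marked = ToyStream()
marked.mark_started()  # marked: metrics are produced
```

In this toy version, `silent.collect_metrics()` returns `None` while `marked.collect_metrics()` returns a dictionary, which mirrors why a missing start marker would drop metrics without raising any error.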
…ies, httpx dep, int dtype
…nt import table

Avoids the ~100ms import-time scan of the full Unicode range by inspecting only the characters present in each (short) input string. Behaviour is byte-identical across all 1,114,112 codepoints.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
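As a rough illustration of the trade-off this commit describes, the sketch below contrasts an eager table built over every codepoint with a per-string check. The plugin's actual table is not shown in this thread, so the predicate here (Unicode category `Sc`, currency symbols) and the names `build_eager_table` and `has_currency_symbol` are assumptions chosen only for illustration.

```python
import sys
import unicodedata


def _is_currency_symbol(ch: str) -> bool:
    # Assumed predicate for illustration: Unicode general category "Sc".
    return unicodedata.category(ch) == "Sc"


def build_eager_table() -> frozenset:
    # Import-time style: scans all sys.maxunicode + 1 codepoints up front.
    return frozenset(
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if _is_currency_symbol(chr(cp))
    )


def has_currency_symbol(text: str) -> bool:
    # Lazy style: inspects only the characters present in the input.
    return any(_is_currency_symbol(ch) for ch in text)
```

Because both paths apply the same predicate, they agree on every input; the lazy version simply pays the cost per character seen instead of once for the whole range.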
…afe context, self-healing session, drop sample_rate

- _synthesize_segment: open the session inside the wrapped try so session-start HTTP errors become APIError (retryable/visible) instead of a raw httpx error
- clear _pending_user_turn only after a successful request so the user audio context survives a base-class retry
- simplify session handling to a lock + started flag; a failed start no longer caches the error permanently, it is retried on the next call
- remove the sample_rate constructor knob (server output is fixed at 24 kHz); output rate is now the _OUTPUT_SAMPLE_RATE constant

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
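The "lock + started flag" and "clear only after success" points in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the plugin's code: `SessionState`, `ensure_started`, `synthesize`, and the `start`/`send` callables are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class SessionState:
    """Hypothetical lock + started-flag session holder."""

    def __init__(self, start: Callable[[], Awaitable[None]]) -> None:
        self._start = start
        self._lock = asyncio.Lock()
        self._started = False
        self.pending_user_turn: Optional[bytes] = None

    async def ensure_started(self) -> None:
        async with self._lock:
            if self._started:
                return
            # If start() raises, _started stays False, so the next call
            # retries instead of replaying a cached error.
            await self._start()
            self._started = True

    async def synthesize(
        self, send: Callable[[Optional[bytes]], Awaitable[None]]
    ) -> None:
        await self.ensure_started()
        await send(self.pending_user_turn)
        # Cleared only after a successful request, so a retry of a
        # failed request still sees the user audio context.
        self.pending_user_turn = None
```

The design choice is that no failure is remembered: an exception from either `start` or `send` leaves the state exactly as it was, so a caller's retry loop can simply call `synthesize` again.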
_CURRENCY_MAP = {
    "$": ("dollar", "cent"),
    "£": ("pound", "penny"),
🟡 Incorrect pluralization of "penny" produces "pennys" instead of "pence"
The _expand_currency function naively appends 's' to pluralize all fractional currency names. For $ and € this works ("cent" → "cents"), but for £ it produces "pennys" instead of the correct "pence". For example, £2.50 becomes "two pounds and fifty pennys".
-    "£": ("pound", "penny"),
+    "£": ("pound", "pence"),
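One caveat on this suggestion: if the append-'s' logic described in the comment is left unchanged, storing "pence" would yield "pences", and a single unit would read "one pence". A possible alternative is to store explicit singular and plural forms. The sketch below assumes that approach; `_CURRENCY_FORMS` and `currency_words` are hypothetical names, and amounts are left as digits to keep the example short.

```python
# Hypothetical table: (singular, plural) for the major and minor unit.
_CURRENCY_FORMS = {
    "$": (("dollar", "dollars"), ("cent", "cents")),
    "£": (("pound", "pounds"), ("penny", "pence")),
}


def currency_words(symbol: str, major: int, minor: int) -> str:
    (major_one, major_many), (minor_one, minor_many) = _CURRENCY_FORMS[symbol]
    # Pick the singular form only when the amount is exactly 1.
    parts = [f"{major} {major_one if major == 1 else major_many}"]
    if minor:
        parts.append(f"{minor} {minor_one if minor == 1 else minor_many}")
    return " and ".join(parts)
```

With this table, `currency_words("£", 2, 50)` gives "2 pounds and 50 pence" and `currency_words("£", 1, 1)` gives "1 pound and 1 penny", so irregular plurals need no special-case branch.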
livekit-plugins-clevrlabs
Adds a streaming TTS plugin for the Clevr Labs conversational speech model, following the existing provider-plugin structure (modeled on cartesia).
tts.TTS(streaming=True)add_user_turn()off the public surface)
Already published & in production use:
https://pypi.org/project/livekit-plugins-clevrlabs/
Passes make check (ruff lint + format, mypy --strict) under the root config.
Testing
Talks to the hosted Clevr API; needs a key (free at theclevr.com). The model is live right now; happy to provision a credited test account for any maintainer who wants to run it live, or you can talk to the hosted model directly. Reach me on the LiveKit Slack (@cyrus), or at cyrus@theclevr.com.
Demo link, uploaded to YouTube for ease of access.
YouTube: https://youtu.be/pN7K82K9SzE
Cheers, and thank you for all the work you do. =)