ElevenLabs Audio Provider

The ElevenLabs audio provider contributes speech-to-text and text-to-speech clients under the elevenlabs provider key.

Use it when an agent should transcribe user audio with Scribe or synthesize assistant text through ElevenLabs voices.

Package And Families

Reference the ElevenLabs audio provider package or project. The provider key is:

text
elevenlabs

Source-confirmed client families:

| Family | Config slot |
| --- | --- |
| Speech to text | Clients.SpeechToText |
| Text to speech | Clients.TextToSpeech |

This provider package does not include chat, realtime, image, embedding, or hosted-file clients.

Secrets

Supply the ElevenLabs API key under this secret name:

text
ELEVENLABS_API_KEY

Provider-scoped configuration can also use the elevenlabs:ApiKey shape.
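
As a rough illustration of the elevenlabs:ApiKey shape: in standard .NET configuration, a colon separates nesting levels, so the key would correspond to a nested JSON section like the one below. This is a sketch that assumes the provider reads the key through ordinary .NET configuration sources such as appsettings.json or user secrets; the placeholder value is hypothetical.

```json
{
  "elevenlabs": {
    "ApiKey": "<your-elevenlabs-api-key>"
  }
}
```

Keep the real key out of source control, for example by loading it from a secret store rather than a committed settings file.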

Defaults

Source defaults include:

| Option | Default |
| --- | --- |
| STT model | scribe_v1 |
| Realtime STT model | scribe_v2_realtime |
| TTS model | eleven_turbo_v2_5 |
| TTS voice | 21m00Tcm4TlvDq8ikWAM |
| TTS output format | mp3_44100_128 |
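
To show how these defaults might be relied on, here is a minimal sketch of a Clients configuration that sets only the provider key for each family. It assumes ModelName and ProviderOptionsJson can be omitted so that the defaults listed above apply; whether a family entry with only ProviderKey is accepted has not been confirmed against the source.

```json
{
  "Clients": {
    "SpeechToText": {
      "ProviderKey": "elevenlabs"
    },
    "TextToSpeech": {
      "ProviderKey": "elevenlabs"
    }
  }
}
```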

Speech To Text

ElevenLabs speech-to-text uses the Scribe endpoint through the MEAI ISpeechToTextClient abstraction.

In C# apps, the fluent helper configures Clients.SpeechToText:

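csharp
using HPD.Agent;
using HPD.Agent.Providers.Audio.ElevenLabs;

// Select ElevenLabs for the Clients.SpeechToText family slot.
var agent = await new AgentBuilder()
    .WithElevenLabsSpeechToText(
        model: "scribe_v2", // overrides the default STT model (scribe_v1)
        configure: options =>
        {
            options.LanguageCode = "en"; // default language code
            options.Diarize = true;      // enable speaker diarization
        })
    .BuildAsync();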

The helper selects the speech-to-text client family. Audio runtime attachment still decides when finite audio input is transcribed and how committed transcripts are projected into the turn.

ElevenLabs also supports realtime speech-to-text through the same ISpeechToTextClient streaming method. This is not the Clients.Realtime model family and should not be configured as the agent realtime transport. Use it when the app wants live transcript updates from ElevenLabs, then passes committed text into the agent flow.

ElevenLabs STT options use ElevenLabsSttConfig through ProviderOptionsJson:

json
{
  "Clients": {
    "SpeechToText": {
      "ProviderKey": "elevenlabs",
      "ModelName": "scribe_v2",
      "ProviderOptionsJson": "{\"languageCode\":\"en\",\"diarize\":true,\"timestampsGranularity\":\"word\"}"
    }
  }
}

Source-confirmed STT provider options:

| Option | Use it for |
| --- | --- |
| apiKey | Provider-specific API key override. |
| baseUrl | HTTP API base URL override. |
| webSocketBaseUrl | WebSocket API base URL override for realtime STT. |
| defaultModelId | Default Scribe model when ModelName is not set. |
| realtimeModelId | Default realtime Scribe model, usually scribe_v2_realtime. |
| languageCode | Default language code. |
| diarize | Speaker diarization toggle. |
| tagAudioEvents | Audio-event tagging toggle. |
| timestampsGranularity | Timestamp granularity, such as word. |
| audioFormat | Realtime STT audio format, such as pcm_16000. |
| commitStrategy | Realtime STT commit strategy, such as manual. |
| includeTimestamps | Include timestamps in realtime transcript responses. |
| includeLanguageDetection | Include language detection in realtime transcript responses. |
| keyterms | Realtime STT keyterms/hints. |
| noVerbatim | Disable verbatim transcript behavior when supported. |
| vadSilenceThresholdSeconds | Provider VAD silence threshold. |
| vadThreshold | Provider VAD threshold. |
| minSpeechDurationMilliseconds | Minimum speech duration for VAD. |
| minSilenceDurationMilliseconds | Minimum silence duration for VAD. |
| enableLogging | Provider-side logging toggle. |
| streamingChunkSizeBytes | Client-side chunk size for streaming audio. |
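
The realtime-related options above could be combined in the same ProviderOptionsJson string. The following is a sketch only: the option names and the example values pcm_16000, manual, and scribe_v2_realtime come from the table and defaults above, while the boolean type for includeTimestamps is an assumption.

```json
{
  "Clients": {
    "SpeechToText": {
      "ProviderKey": "elevenlabs",
      "ProviderOptionsJson": "{\"realtimeModelId\":\"scribe_v2_realtime\",\"audioFormat\":\"pcm_16000\",\"commitStrategy\":\"manual\",\"includeTimestamps\":true}"
    }
  }
}
```

This still configures the Clients.SpeechToText family, not Clients.Realtime; the realtime options only affect the streaming transcription path.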

Text To Speech

In C# apps, the fluent helper configures the Clients.TextToSpeech family slot:

csharp
using HPD.Agent;
using HPD.Agent.Audio;
using HPD.Agent.Providers.Audio.ElevenLabs;

// Select ElevenLabs for the Clients.TextToSpeech family slot.
var agent = await new AgentBuilder()
    .WithElevenLabsTextToSpeech(
        model: "eleven_turbo_v2_5",
        voice: "21m00Tcm4TlvDq8ikWAM",
        outputFormat: "mp3_44100_128",
        configure: options =>
        {
            options.Stability = 0.5;
            options.SimilarityBoost = 0.75;
        })
    // The audio runtime attachment, not the provider, decides when text is synthesized.
    .WithAudioRuntimeAttachment(audio =>
    {
        audio.AssistantOutputSynthesisMode = AssistantOutputSynthesisMode.FinalText;
    })
    .BuildAsync();

ElevenLabs TTS options use ElevenLabsTtsConfig through ProviderOptionsJson:

json
{
  "Clients": {
    "TextToSpeech": {
      "ProviderKey": "elevenlabs",
      "ModelName": "eleven_turbo_v2_5",
      "ProviderOptionsJson": "{\"defaultVoiceId\":\"21m00Tcm4TlvDq8ikWAM\",\"outputFormat\":\"mp3_44100_128\",\"stability\":0.5,\"similarityBoost\":0.75}"
    }
  }
}

Source-confirmed TTS provider options:

| Option | Use it for |
| --- | --- |
| apiKey | Provider-specific API key override. |
| baseUrl | HTTP API base URL override. |
| webSocketBaseUrl | WebSocket API base URL override. |
| defaultModelId | Default TTS model when ModelName is not set. |
| defaultVoiceId | Default ElevenLabs voice id. |
| outputFormat | Output format, such as mp3_44100_128. |
| stability | Voice stability. |
| similarityBoost | Voice similarity boost. |
| style | Voice style. |
| useSpeakerBoost | Speaker boost toggle. |
| speed | Speech speed. |
| applyTextNormalization | ElevenLabs text-normalization behavior. |
| enablePushTextStreaming | Enable push-text streaming support. |
| pushTextAggregationMode | Push-text aggregation mode. |
| autoMode | Provider auto mode. |
| syncAlignment | Alignment synchronization. |
| inactivityTimeout | Streaming inactivity timeout. |
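
As a further illustration, the voice-shaping options from the table could be set alongside the voice id. This is a sketch under assumptions: the option names come from the table above, but the numeric and boolean value types and the specific values shown are illustrative and not taken from the source.

```json
{
  "Clients": {
    "TextToSpeech": {
      "ProviderKey": "elevenlabs",
      "ModelName": "eleven_turbo_v2_5",
      "ProviderOptionsJson": "{\"defaultVoiceId\":\"21m00Tcm4TlvDq8ikWAM\",\"style\":0.2,\"useSpeakerBoost\":true,\"speed\":1.0}"
    }
  }
}
```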

Manual typed config is still useful when you are dynamically composing family configs or loading part of the setup outside normal builder code:

csharp
using HPD.Agent;
using HPD.Agent.Providers;
using HPD.Agent.Providers.Audio.ElevenLabs;

// Family config that selects ElevenLabs and the TTS model.
var tts = new ClientProviderConfig
{
    ProviderKey = "elevenlabs",
    ModelName = "eleven_turbo_v2_5"
};

// Attach typed ElevenLabs options for the text-to-speech family.
tts.SetProviderConfig(
    new ElevenLabsTtsConfig
    {
        DefaultVoiceId = "21m00Tcm4TlvDq8ikWAM",
        OutputFormat = "mp3_44100_128",
        Stability = 0.5,
        SimilarityBoost = 0.75
    },
    ProviderClientFamily.TextToSpeech);

Use With Audio Output

ElevenLabs supplies the TTS client. HPD audio runtime attachment decides when assistant text is synthesized, where audio artifacts are stored, and whether playback hooks are used.

See Text To Speech Output for the runtime side of the flow.
