Audio Events And Traces
Audio workflows are event-heavy because they cross model output, provider calls, artifacts, playback, transcript projection, and branch history.
Use typed events for application behavior. Use traces and struct samples for local diagnostics.
Event Families
| Family | Examples | Use it for |
|---|---|---|
| User transcript events | transcript delta, completed, failed | Show or persist realtime input transcripts. |
| Assistant audio output events | output started, text pushed, segment, chunk, artifact, playback, completed, failed | Track final-text TTS and playback flow. |
| Tool/function events | tool start, args, result, end, custom tool progress | Understand realtime tool execution and audio-adjacent tool work. |
| Struct audio samples | playout, queue depth, underrun, diagnostic samples | Local low-overhead audio pipeline diagnostics. |
Assistant audio event type names are listed in the Events Reference. Prefer typed handling over hand-written JSON field assumptions.
Live Events Vs Durable History
Not every audio event should become branch history.
| Data | Usual destination |
|---|---|
| Transcript text | Branch history when committed. |
| Assistant text | Branch history as the primary assistant result. |
| Assistant audio artifact | IContentStore when artifact capture is enabled. |
| Playback progress | Live event stream or trace. |
| Queue depth and underrun samples | Local struct-event observers or diagnostics. |
| Raw input audio | Explicit app storage only. |
This split keeps replay useful without quietly turning every run into a raw-media archive.
Mixed Input Correlation
When a user message contains both text and audio, audio input metadata records the original content index for each detected media item. Use that index to correlate UI attachments, uploaded content, transcripts, and runtime metadata without guessing from display order.
Committed transcript metadata records the transcript text, provider key, route decision, topology, and response ownership when available. Assistant audio output events are separate: they describe synthesized or provider-owned assistant audio, not the user's input transcript.
Assistant Audio Output Flow
Final-text TTS emits events around the assistant audio output flow:
- the assistant text is available,
- synthesis starts,
- segments or chunks may be produced,
- an artifact may be stored,
- playback may queue, progress, complete, or fail,
- the output flow completes or falls back to text-only.
Playback truth is intentionally conservative. A queued artifact is not proof that the user heard the audio.
Struct Samples
Audio struct samples are process-local diagnostics. They are useful when code in the same process needs low-overhead observations such as playback queue depth or underruns.
They are not the same thing as normal AgentEvent streaming. If you need a client, host, or persisted report to see something, emit or bridge a regular agent event or artifact.
Testing
For event-driven audio tests:
- subscribe before
RunAsync, - capture typed events into a list,
- assert event type, session id, branch id, response id, output flow id, and ordering,
- test text-only fallback paths as well as successful synthesis,
- test playback truth separately from synthesis truth.