Audio Events And Traces

Audio workflows are event-heavy because they cross model output, provider calls, artifacts, playback, transcript projection, and branch history.

Use typed events for application behavior. Use traces and struct samples for local diagnostics.

Event Families

| Family | Examples | Use it for |
|---|---|---|
| User transcript events | transcript delta, completed, failed | Show or persist realtime input transcripts. |
| Assistant audio output events | output started, text pushed, segment, chunk, artifact, playback, completed, failed | Track final-text TTS and playback flow. |
| Tool/function events | tool start, args, result, end, custom tool progress | Understand realtime tool execution and audio-adjacent tool work. |
| Struct audio samples | playout, queue depth, underrun, diagnostic samples | Local low-overhead audio pipeline diagnostics. |

Assistant audio event type names are listed in the Events Reference. Handle events by their type rather than by parsing JSON with assumed field names.
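As a rough illustration of typed handling, the sketch below dispatches on event type with a C# switch expression. The record types (`AudioEvent`, `TranscriptDelta`, `TranscriptCompleted`, `OutputFailed`) and the `AudioEventDescriber` helper are hypothetical stand-ins invented for this example; the real type names are the ones in the Events Reference.

```csharp
using System;

// Hypothetical event shapes, for illustration only.
public abstract record AudioEvent(string SessionId);
public sealed record TranscriptDelta(string SessionId, string Text) : AudioEvent(SessionId);
public sealed record TranscriptCompleted(string SessionId, string Text) : AudioEvent(SessionId);
public sealed record OutputFailed(string SessionId, string Reason) : AudioEvent(SessionId);

public static class AudioEventDescriber
{
    // Each branch binds a concrete type, so a renamed or removed property
    // becomes a compile error instead of a missing JSON field at runtime.
    public static string Describe(AudioEvent evt) => evt switch
    {
        TranscriptDelta d => $"partial transcript: {d.Text}",
        TranscriptCompleted c => $"final transcript: {c.Text}",
        OutputFailed f => $"audio output failed: {f.Reason}",
        _ => $"unhandled event: {evt.GetType().Name}",
    };
}
```

The fallback branch keeps the handler tolerant of event types it does not know about, which matters when new event families are added.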

Live Events Vs Durable History

Not every audio event should become branch history.

| Data | Usual destination |
|---|---|
| Transcript text | Branch history when committed. |
| Assistant text | Branch history as the primary assistant result. |
| Assistant audio artifact | IContentStore when artifact capture is enabled. |
| Playback progress | Live event stream or trace. |
| Queue depth and underrun samples | Local struct-event observers or diagnostics. |
| Raw input audio | Explicit app storage only. |

This split keeps replay useful without quietly turning every run into a raw-media archive.
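The destination table can be read as a small routing function. The sketch below is one way to encode it; the enums and the `AudioRouting.Route` helper are hypothetical names, and returning `Destination.None` for an uncommitted transcript or for an artifact when capture is disabled is an assumption, since the table only states the positive cases.

```csharp
// Hypothetical names; this mirrors the table above, not a library API.
public enum AudioDataKind
{
    TranscriptText, AssistantText, AssistantAudioArtifact,
    PlaybackProgress, QueueOrUnderrunSample, RawInputAudio
}

public enum Destination
{
    None, BranchHistory, ContentStore, LiveStreamOrTrace,
    LocalDiagnostics, ExplicitAppStorage
}

public static class AudioRouting
{
    public static Destination Route(
        AudioDataKind kind, bool committed = true, bool artifactCapture = false) => kind switch
    {
        // Assumption: uncommitted transcripts get no durable destination.
        AudioDataKind.TranscriptText => committed ? Destination.BranchHistory : Destination.None,
        AudioDataKind.AssistantText => Destination.BranchHistory,
        // Assumption: artifacts are dropped when capture is disabled.
        AudioDataKind.AssistantAudioArtifact => artifactCapture ? Destination.ContentStore : Destination.None,
        AudioDataKind.PlaybackProgress => Destination.LiveStreamOrTrace,
        AudioDataKind.QueueOrUnderrunSample => Destination.LocalDiagnostics,
        // Raw input audio never lands in history implicitly.
        AudioDataKind.RawInputAudio => Destination.ExplicitAppStorage,
        _ => Destination.None,
    };
}
```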

Mixed Input Correlation

When a user message contains both text and audio, audio input metadata records the original content index for each detected media item. Use that index to correlate UI attachments, uploaded content, transcripts, and runtime metadata without guessing from display order.
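A minimal sketch of index-based correlation follows. `ContentItem`, `AudioInputMetadata`, and `MixedInputCorrelation` are hypothetical types made up for this example; the only idea taken from the text is that each audio metadata entry carries the original content index of its media item.

```csharp
using System.Collections.Generic;

// Hypothetical shapes, for illustration only.
public sealed record ContentItem(string Kind, string Label);
public sealed record AudioInputMetadata(int ContentIndex, string Transcript);

public static class MixedInputCorrelation
{
    // Joins metadata to message content by the recorded index,
    // never by the order in which attachments happen to be displayed.
    public static Dictionary<int, (ContentItem Item, AudioInputMetadata Meta)> Correlate(
        IReadOnlyList<ContentItem> contents,
        IEnumerable<AudioInputMetadata> metadata)
    {
        var result = new Dictionary<int, (ContentItem Item, AudioInputMetadata Meta)>();
        foreach (var meta in metadata)
        {
            // Skip entries whose index does not point into this message.
            if (meta.ContentIndex < 0 || meta.ContentIndex >= contents.Count) continue;
            result[meta.ContentIndex] = (contents[meta.ContentIndex], meta);
        }
        return result;
    }
}
```

For a message laid out as text, audio, text, audio, the metadata entries would carry indexes 1 and 3, and the lookup stays correct even if the UI lists audio attachments first.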

Committed transcript metadata records the transcript text, provider key, route decision, topology, and response ownership when available. Assistant audio output events are separate: they describe synthesized or provider-owned assistant audio, not the user's input transcript.

Assistant Audio Output Flow

Final-text TTS emits events around the assistant audio output flow:

  1. the assistant text is available,
  2. synthesis starts,
  3. segments or chunks may be produced,
  4. an artifact may be stored,
  5. playback may queue, progress, complete, or fail,
  6. the output flow completes or falls back to text-only.

Playback truth is intentionally conservative. A queued artifact is not proof that the user heard the audio.
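One way to keep synthesis, storage, and playback facts apart is a small tracker that records each separately. The sketch below uses hypothetical names (`PlaybackState`, `OutputFlowTracker`) and assumes the application receives some form of playback-completed report; whether the runtime exposes that signal in this shape is an assumption.

```csharp
// Hypothetical tracker; the states loosely mirror the steps listed above.
public enum PlaybackState { None, Queued, Playing, Completed, Failed }

public sealed class OutputFlowTracker
{
    public bool SynthesisCompleted { get; private set; }
    public bool ArtifactStored { get; private set; }
    public PlaybackState Playback { get; private set; } = PlaybackState.None;

    public void OnSynthesisCompleted() => SynthesisCompleted = true;
    public void OnArtifactStored() => ArtifactStored = true;
    public void OnPlayback(PlaybackState state) => Playback = state;

    // Conservative by design: synthesized, stored, or queued audio
    // does not count. Only a completed playback report does.
    public bool PlaybackConfirmed => Playback == PlaybackState.Completed;
}
```

A UI built on this would show "audio ready" once synthesis completes but would only mark a response as played once `PlaybackConfirmed` is true.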

Struct Samples

Audio struct samples are process-local diagnostics. They are useful when code in the same process needs low-overhead observations such as playback queue depth or underruns.

They are not the same thing as normal AgentEvent streaming. If you need a client, host, or persisted report to see something, emit or bridge a regular agent event or artifact.
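The bridging idea can be sketched as follows: observe samples locally, then emit a single summarized event when something outside the process needs to know. `PlaybackSample` and `UnderrunBridge` are hypothetical, and the `Action<string>` callback stands in for whatever mechanism the application uses to emit a regular agent event.

```csharp
using System;

// Hypothetical sample shape; a readonly struct avoids a heap allocation per sample.
public readonly struct PlaybackSample
{
    public PlaybackSample(int queueDepth, bool underrun)
    {
        QueueDepth = queueDepth;
        Underrun = underrun;
    }

    public int QueueDepth { get; }
    public bool Underrun { get; }
}

public sealed class UnderrunBridge
{
    private readonly Action<string> _emitAgentEvent; // stand-in for real event emission
    private int _underruns;

    public UnderrunBridge(Action<string> emitAgentEvent) => _emitAgentEvent = emitAgentEvent;

    // Hot path: count locally, emit nothing.
    public void Observe(in PlaybackSample sample)
    {
        if (sample.Underrun) _underruns++;
    }

    // Cold path: publish one summary that clients or reports can see.
    public void Flush()
    {
        if (_underruns == 0) return;
        _emitAgentEvent($"playback underruns: {_underruns}");
        _underruns = 0;
    }
}
```

Summarizing before bridging keeps the regular event stream from inheriting the volume of the sample stream.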

Testing

For event-driven audio tests:

  • subscribe before RunAsync,
  • capture typed events into a list,
  • assert event type, session id, branch id, response id, output flow id, and ordering,
  • test text-only fallback paths as well as successful synthesis,
  • test playback truth separately from synthesis truth.
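The checklist above can be sketched against a stand-in agent. Everything here is hypothetical: `FakeAgent`, its `EventRaised` event, the `RunAsync(string)` signature, and the event records are invented so the example is self-contained, and they do not reflect the library's actual API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical event shapes and agent, for illustration only.
public abstract record OutputEvent(string SessionId, string OutputFlowId);
public sealed record OutputStarted(string SessionId, string OutputFlowId) : OutputEvent(SessionId, OutputFlowId);
public sealed record OutputCompleted(string SessionId, string OutputFlowId) : OutputEvent(SessionId, OutputFlowId);
public sealed record FellBackToText(string SessionId, string OutputFlowId) : OutputEvent(SessionId, OutputFlowId);

public sealed class FakeAgent
{
    private readonly bool _synthesisAvailable;
    public FakeAgent(bool synthesisAvailable) => _synthesisAvailable = synthesisAvailable;

    public event Action<OutputEvent> EventRaised = delegate { };

    public Task RunAsync(string sessionId)
    {
        const string flowId = "flow-1";
        EventRaised(new OutputStarted(sessionId, flowId));
        if (_synthesisAvailable)
            EventRaised(new OutputCompleted(sessionId, flowId));
        else
            EventRaised(new FellBackToText(sessionId, flowId));
        return Task.CompletedTask;
    }
}

public static class OutputFlowCapture
{
    public static async Task<List<OutputEvent>> CaptureAsync(FakeAgent agent, string sessionId)
    {
        var captured = new List<OutputEvent>();
        agent.EventRaised += captured.Add; // subscribe before RunAsync
        await agent.RunAsync(sessionId);
        return captured;
    }
}
```

A test would then assert on the captured list: that the first event is `OutputStarted`, that the last is `OutputCompleted` or `FellBackToText` depending on the path, and that every event carries the same session id and output flow id.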
