Skip to content

Python: Skip get_final_response in OTel _finalize_stream when stream errored#5232

Open
droideronline wants to merge 3 commits intomicrosoft:mainfrom
droideronline:dinesh/fix-otel-finalize-stream-result-hooks-on-error
Open

Python: Skip get_final_response in OTel _finalize_stream when stream errored#5232
droideronline wants to merge 3 commits intomicrosoft:mainfrom
droideronline:dinesh/fix-otel-finalize-stream-result-hooks-on-error

Conversation

@droideronline
Copy link
Copy Markdown
Contributor

Closes #5231

Problem

AgentTelemetryLayer registers _finalize_stream as a cleanup hook on the ResponseStream. Cleanup hooks run on both normal completion and streaming errors. Inside _finalize_stream, get_final_response() was called unconditionally — and get_final_response() always runs all registered result hooks, including _post_hook in _agents.py which calls _run_after_providers.

This caused after_run context providers to fire on error paths, which is incorrect.

The call chain on streaming error:

stream raises Exception
  → ResponseStream.__anext__ runs _run_cleanup_hooks()
    → _finalize_stream() called
      → get_final_response() called unconditionally   ← bug
        → result hooks fire (e.g. _post_hook)
          → _run_after_providers called               ← wrong on error

Fix

In both _finalize_stream closures (chat client layer and agent layer), check result_stream._consumed before calling get_final_response(). _consumed is set to True only in the StopAsyncIteration branch of __anext__ (normal stream completion). On the error path it remains False, so we can use it to skip get_final_response() safely.

async def _finalize_stream() -> None:
    try:
        if not result_stream._consumed:
            # Stream did not complete normally (e.g., it errored). Skip
            # get_final_response() to avoid firing result hooks such as
            # after_run context providers on error paths. The span is
            # still closed in the finally block below.
            return
        response = await result_stream.get_final_response()
        ...
    except Exception as exception:
        capture_exception(span=span, exception=exception, timestamp=time_ns())
    finally:
        _close_span()

The OTel span is still closed via the finally block in both the error and normal paths.

Notes

Affects both _trace_chat_client_invocation (chat layer) and _trace_agent_invocation (agent layer) — both had the same pattern and are fixed here.

Copilot AI review requested due to automatic review settings April 13, 2026 17:37
…errored

When a streaming error occurs, _finalize_stream (a cleanup hook registered by
AgentTelemetryLayer) was unconditionally calling get_final_response(), which
triggers all registered result hooks including after_run context providers.
This caused providers to fire incorrectly on error paths.

Guard against this by checking result_stream._consumed: True only after
StopAsyncIteration (normal completion), False when an exception was raised.
The fix applies to both the chat client and agent telemetry layers.

Closes microsoft#5231
@droideronline droideronline force-pushed the dinesh/fix-otel-finalize-stream-result-hooks-on-error branch from ec7ad25 to b345761 Compare April 13, 2026 17:38
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes OTel streaming instrumentation so result hooks (e.g., after_run context providers) don’t fire when a stream errors, by skipping get_final_response() unless the stream completed normally.

Changes:

  • Add a _consumed guard in the chat stream _finalize_stream to avoid calling get_final_response() on error paths.
  • Add the same guard in the agent stream _finalize_stream.

…ror in OTel span

Address Copilot review feedback on microsoft#5232:

- Add `_stream_error: Exception | None` to ResponseStream, set in __anext__'s
  except branch so cleanup hooks can inspect the failure.
- Expose public `consumed` and `stream_error` properties to avoid coupling
  observability.py to private stream internals.
- Update both _finalize_stream closures (chat and agent layers) to use the
  public properties and call capture_exception() with the stream error before
  returning early, ensuring the OTel span records the failure rather than
  closing silently.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

- Use stream_error is not None as the guard in _finalize_stream instead of
  not consumed, so the early-return path is keyed precisely to actual errors
  rather than any non-normal completion state.
- Clear _stream_error after _run_cleanup_hooks() completes to avoid retaining
  the exception traceback (and any large object graphs it references) on the
  stream instance beyond the cleanup phase.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: OTel _finalize_stream triggers result hooks on streaming error, causing after_run providers to fire incorrectly

3 participants