Python: Skip get_final_response in OTel _finalize_stream when stream errored#5232
Open
droideronline wants to merge 3 commits intomicrosoft:mainfrom
Open
Conversation
…errored When a streaming error occurs, _finalize_stream (a cleanup hook registered by AgentTelemetryLayer) was unconditionally calling get_final_response(), which triggers all registered result hooks including after_run context providers. This caused providers to fire incorrectly on error paths. Guard against this by checking result_stream._consumed: True only after StopAsyncIteration (normal completion), False when an exception was raised. The fix applies to both the chat client and agent telemetry layers. Closes microsoft#5231
ec7ad25 to
b345761
Compare
5 tasks
Contributor
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Fixes OTel streaming instrumentation so result hooks (e.g., after_run context providers) don’t fire when a stream errors, by skipping get_final_response() unless the stream completed normally.
Changes:
- Add a
_consumedguard in the chat stream_finalize_streamto avoid callingget_final_response()on error paths. - Add the same guard in the agent stream
_finalize_stream.
…ror in OTel span Address Copilot review feedback on microsoft#5232: - Add `_stream_error: Exception | None` to ResponseStream, set in __anext__'s except branch so cleanup hooks can inspect the failure. - Expose public `consumed` and `stream_error` properties to avoid coupling observability.py to private stream internals. - Update both _finalize_stream closures (chat and agent layers) to use the public properties and call capture_exception() with the stream error before returning early, ensuring the OTel span records the failure rather than closing silently.
- Use stream_error is not None as the guard in _finalize_stream instead of not consumed, so the early-return path is keyed precisely to actual errors rather than any non-normal completion state. - Clear _stream_error after _run_cleanup_hooks() completes to avoid retaining the exception traceback (and any large object graphs it references) on the stream instance beyond the cleanup phase.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #5231
Problem
AgentTelemetryLayerregisters_finalize_streamas a cleanup hook on theResponseStream. Cleanup hooks run on both normal completion and streaming errors. Inside_finalize_stream,get_final_response()was called unconditionally — andget_final_response()always runs all registered result hooks, including_post_hookin_agents.pywhich calls_run_after_providers.This caused after_run context providers to fire on error paths, which is incorrect.
The call chain on streaming error:
Fix
In both
_finalize_streamclosures (chat client layer and agent layer), checkresult_stream._consumedbefore callingget_final_response()._consumedis set toTrueonly in theStopAsyncIterationbranch of__anext__(normal stream completion). On the error path it remainsFalse, so we can use it to skipget_final_response()safely.The OTel span is still closed via the
finallyblock in both the error and normal paths.Notes
Affects both
_trace_chat_client_invocation(chat layer) and_trace_agent_invocation(agent layer) — both had the same pattern and are fixed here.