fix(client): propagate HTTP transport exceptions to caller #2124

Open
gspeter-max wants to merge 3 commits into modelcontextprotocol:main from gspeter-max:python-sdk/issue_2110
@gspeter-max
Summary

Fixes #2110 - HTTP transport swallows non-2xx status codes causing client to hang

When an MCP server returns non-2xx HTTP status codes (401/403/404/5xx), the Streamable HTTP and SSE transports catch exceptions in post_writer but only log them without forwarding them through the read stream. This causes callers to hang indefinitely waiting for a response that will never arrive.

Root Cause

In both sse.py and streamable_http.py, the post_writer function catches exceptions but never sends them to read_stream_writer, breaking the exception propagation pattern that works correctly in other transports (stdio, websocket).

Changes

  • src/mcp/client/sse.py: Modified exception handler in post_writer to send exceptions to read_stream_writer
  • src/mcp/client/streamable_http.py: Modified exception handler in post_writer to send exceptions to read_stream_writer

This aligns with the pattern used in working transports:

  • stdio.py line 156: await read_stream_writer.send(exc)
  • websocket.py line 59: await read_stream_writer.send(exc)
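The propagation pattern described above can be illustrated with a minimal, self-contained sketch. It is not the SDK's code: it assumes plain `asyncio.Queue` objects as stand-ins for the transport's streams, and `TransportError`, `failing_request`, `receive`, and `demo` are hypothetical names introduced only for this illustration.

```python
import asyncio


class TransportError(Exception):
    """Hypothetical stand-in for an HTTP error raised while POSTing."""


async def post_writer(outgoing, read_queue, send_request):
    """Forward each outgoing message; surface failures on the read side."""
    while True:
        message = await outgoing.get()
        if message is None:  # sentinel: no more messages to send
            break
        try:
            await send_request(message)
        except Exception as exc:
            # Logging alone would leave the reader waiting forever,
            # so the exception is handed to the read side as well.
            await read_queue.put(exc)


async def failing_request(message):
    # Simulates a server that answers with a non-2xx status.
    raise TransportError(f"401 Unauthorized for {message!r}")


async def receive(read_queue, timeout=1.0):
    """Caller side: re-raise forwarded exceptions instead of hanging."""
    item = await asyncio.wait_for(read_queue.get(), timeout)
    if isinstance(item, Exception):
        raise item
    return item


async def demo():
    outgoing, read_queue = asyncio.Queue(), asyncio.Queue()
    writer = asyncio.create_task(
        post_writer(outgoing, read_queue, failing_request)
    )
    await outgoing.put("initialize")
    await outgoing.put(None)
    try:
        await receive(read_queue)
    except TransportError as exc:
        result = str(exc)
    else:
        result = "no error"
    await writer
    return result


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If the `read_queue.put(exc)` line is removed, `receive` only returns once its own timeout fires, which mirrors the hang reported in #2110.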

Testing

  • All existing client tests pass (182 passed)
  • All shared and SSE tests pass (146 and 25 passed, respectively)
  • Code formatting and linting verified
  • Commit includes proper trailers (Reported-by, Github-Issue)

ROOT CAUSE:
In SSE and StreamableHTTP transports, when HTTP errors occur or other
exceptions are raised in post_writer, exceptions are caught and logged
but never sent to read_stream_writer, causing the caller to hang
indefinitely waiting for a response that will never arrive.

CHANGES:
- sse.py: Send exceptions from post_writer to read_stream_writer
- streamable_http.py: Send exceptions from post_writer to read_stream_writer

IMPACT:
This prevents callers from hanging when HTTP errors occur in the
transport layer, ensuring exceptions are properly propagated to the
caller for handling.

FILES MODIFIED:
- src/mcp/client/sse.py
- src/mcp/client/streamable_http.py

Reported-by: gspeter
Github-Issue: modelcontextprotocol#2110
@gspeter-max
Author

CI Failure Note

The failing test test_basic_child_process_cleanup in tests/client/test_stdio.py is unrelated to this PR.

Evidence:

  • This PR only modifies sse.py and streamable_http.py (HTTP transports)
  • The failing test is in test_stdio.py (stdio transport process cleanup)
  • 23 out of 24 test jobs passed - only one Windows 3.11 job failed
  • This is a known flaky Windows test that intermittently fails

Files changed in this PR:

  • src/mcp/client/sse.py (+3 -1)
  • src/mcp/client/streamable_http.py (+3 -1)

A re-run of the failed job would be appreciated.

ROOT CAUSE:
Flaky test failures on Windows Python 3.11 CI were caused by
arbitrary timeout-based waiting (sleep 0.5s + 0.3s) instead of
waiting for the actual condition (child process writing to file).

CHANGES:
- Added _wait_for_file_to_exist() helper with condition-based polling
- Replaced arbitrary sleep calls with condition-based waiting in:
  - test_basic_child_process_cleanup
  - test_nested_process_tree
  - test_early_parent_exit

IMPACT:
Eliminates race conditions in process startup timing. Tests now wait
for the actual condition (file exists and is non-empty) rather than
guessing how long process startup takes.

FILES MODIFIED:
- tests/client/test_stdio.py
@gspeter-max
Author

Update: Fixed the Flaky Test

I have also fixed the flaky test that was failing on Windows Python 3.11.

Root Cause:
The tests used fixed sleeps (0.5s + 0.3s) instead of waiting for the actual condition (the child process writing to a file).

Fix:

  • Added _wait_for_file_to_exist() helper with condition-based polling
  • Replaced arbitrary sleep with condition-based waiting in 3 tests
  • Tests now poll every 10ms for the file to exist and be non-empty
  • 5-second timeout prevents infinite hangs
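As a rough illustration of the condition-based waiting described above, the following sketch polls every 10 ms until a file exists and is non-empty, and gives up after a timeout. It is not the helper from this PR: `wait_for_file`, its parameters, and the demo file names are hypothetical, and it uses `asyncio` from the standard library rather than whatever the test suite actually uses.

```python
import asyncio
import tempfile
from pathlib import Path


def _is_ready(path):
    """True once the file exists and has at least one byte."""
    try:
        return path.stat().st_size > 0
    except FileNotFoundError:
        return False


async def wait_for_file(path, timeout=5.0, interval=0.01):
    """Poll until `path` is ready, or raise TimeoutError after `timeout` seconds."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while not _is_ready(path):
        if loop.time() >= deadline:
            raise TimeoutError(f"{path} not ready after {timeout}s")
        await asyncio.sleep(interval)


async def demo():
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "child_started.txt"

        async def late_writer():
            # Stands in for a child process that takes a while to start.
            await asyncio.sleep(0.05)
            marker.write_text("started")

        task = asyncio.create_task(late_writer())
        await wait_for_file(marker)  # returns as soon as the file has content
        await task
        found = marker.read_text()

        # Timeout path: a file that is never written.
        missing = Path(tmp) / "never_written.txt"
        try:
            await wait_for_file(missing, timeout=0.05)
            timed_out = False
        except TimeoutError:
            timed_out = True
    return found, timed_out


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Compared with a fixed sleep, the wait ends as soon as the condition holds on a fast machine and still tolerates a slow one, up to the timeout.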

Verification:

  • All 11 stdio tests pass locally
  • All 3 TestChildProcessCleanup tests pass
  • This eliminates the race condition that caused the intermittent Windows CI failure

ROOT CAUSE:
The CI coverage check failed because the timeout error paths in
_wait_for_file_to_exist() were not covered (99.96% instead of the required 100%).

CHANGES:
- Added test_wait_for_file_to_exist_timeout() to test nonexistent file case
- Added test_wait_for_file_to_exist_empty_file() to test empty file case

IMPACT:
Brings test_stdio.py coverage back to 100%, satisfying CI requirements.
These error paths are now tested with realistic timeout scenarios.

FILES MODIFIED:
- tests/client/test_stdio.py
@gspeter-max
Author

@maxisbey @Kludex @felixweinberger This PR fixes issue #2110. Would appreciate your review when you have a chance.

Summary of Changes:

  1. Fixed HTTP transport exception propagation in SSE and StreamableHTTP transports
  2. Fixed flaky test with condition-based waiting approach
  3. Added coverage for all error paths

All tests pass locally and CI is running. Thank you!

Development

Successfully merging this pull request may close these issues.

HTTP transport swallows non-2xx status codes causing client to hang

2 participants