Skip to content

fix(sandbox): relay WebSocket frames after HTTP 101 Switching Protocols#718

Open
johntmyers wants to merge 1 commit intomainfrom
fix/websocket-101-upgrade-relay-v2
Open

fix(sandbox): relay WebSocket frames after HTTP 101 Switching Protocols#718
johntmyers wants to merge 1 commit intomainfrom
fix/websocket-101-upgrade-relay-v2

Conversation

@johntmyers
Copy link
Copy Markdown
Collaborator

Summary

  • Detect HTTP 101 Switching Protocols in the L7 relay and switch to raw bidirectional TCP instead of re-entering the HTTP parsing loop
  • Validate client sent Upgrade + Connection: Upgrade headers before accepting 101 (rejects unsolicited upgrades from non-compliant upstream servers)
  • Replace bool return type with RelayOutcome enum across the L7Provider trait and all relay functions for cleaner upgrade signaling

Related Issue

Changes

  • crates/openshell-sandbox/src/l7/provider.rs: Add RelayOutcome enum (Reusable/Consumed/Upgraded { overflow }). Update L7Provider::relay trait return type from Result<bool> to Result<RelayOutcome>.
  • crates/openshell-sandbox/src/l7/rest.rs: Detect 101 in relay_response() before the generic 1xx/bodiless handler. Return RelayOutcome directly (no intermediate tuple). Add client_requested_upgrade() validation in relay_http_request_with_resolver(). Update relay_response_to_client wrapper to return RelayOutcome.
  • crates/openshell-sandbox/src/l7/relay.rs: Add shared handle_upgrade() helper. Update relay_rest() and relay_passthrough_with_credentials() to match on RelayOutcome. Add l7_decision = "allow_upgrade" audit log annotation.
  • crates/openshell-sandbox/tests/websocket_upgrade.rs: Integration test spinning up a real WebSocket echo server, upgrading through L7Provider::relay, and exchanging a text frame bidirectionally.
  • crates/openshell-sandbox/Cargo.toml: Add tokio-tungstenite and futures dev-dependencies for integration tests.

Improvements over #683

Area #683 This PR
Return type (bool, u16, Vec<u8>) tuple RelayOutcome enum directly from relay_response()
Trait consistency L7Provider::relay still returned bool Trait updated to Result<RelayOutcome>
Unsolicited 101 No validation Validates client sent Upgrade + Connection: Upgrade headers
Code duplication Duplicate upgrade handling in two relay paths Shared handle_upgrade() helper
Audit logging No upgrade-specific annotation l7_decision = "allow_upgrade" for observability
Dead code relay_response_to_client silently discarded overflow Updated to return RelayOutcome
Integration test Unit test only (relay_response in isolation) Full WebSocket echo test through L7Provider::relay

Security Notes

  • Post-upgrade L7 bypass is inherent and documented: After copy_bidirectional, no further L7 inspection occurs. The initial HTTP request is policy-checked before the upstream ever sees it.
  • Unsolicited 101 rejection: If the upstream sends 101 but the client never requested an upgrade, the connection is killed (Consumed). Prevents a non-compliant upstream from triggering a policy bypass.
  • Overflow buffer is bounded: ~17 KiB max (header read buffer + one read call), immediately forwarded, then dropped.

Testing

  • mise run pre-commit passes (format fixed)
  • All 405 existing unit tests pass
  • 8 new unit tests: 101 overflow capture, 101 no overflow, unsolicited 101 rejection, accepted 101 with upgrade headers, 4 client_requested_upgrade header parsing tests
  • 2 new integration tests: WebSocket upgrade through L7 relay with echo exchange, normal HTTP regression test
  • cargo clippy clean (no new warnings)

Checklist

  • Follows Conventional Commits
  • Architecture docs updated (not applicable — no new components)
  • Tests added for new behavior

Detect 101 Switching Protocols in relay_response() and switch to raw
bidirectional TCP relay instead of re-entering the HTTP parsing loop.

Previously, is_bodiless_response() treated 101 as a generic 1xx
informational response, forwarding only the headers and returning to
the HTTP parsing loop. After a 101, subsequent bytes are upgraded
protocol frames (e.g. WebSocket), not HTTP — causing the relay to
block or silently drop all post-upgrade traffic.

Changes:
- Add RelayOutcome enum (Reusable/Consumed/Upgraded) replacing bool
  return type across L7Provider::relay trait and all relay functions
- Detect 101 before generic 1xx handler in relay_response(), capture
  overflow bytes, return RelayOutcome::Upgraded
- Validate client sent Upgrade + Connection: Upgrade headers before
  accepting 101 (rejects unsolicited upgrades from non-compliant
  upstream servers)
- Extract shared handle_upgrade() helper used by both relay_rest()
  and relay_passthrough_with_credentials()
- Add l7_decision=allow_upgrade audit log annotation for upgrades
- Add unit tests for 101 overflow capture, unsolicited 101 rejection,
  and client_requested_upgrade header validation
- Add integration test: WebSocket echo through L7Provider::relay

Fixes: #652
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

test:e2e Requires end-to-end coverage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Egress proxy fails to relay WebSocket frames after successful HTTP CONNECT + 101 Switching Protocols

1 participant