PR5 ready. CI green. #481

Merged
navicore merged 2 commits from net-pr5 into main 2026-05-13 16:47:48 +00:00
Owner

Changes:

  • crates/runtime/src/udp.rs — patch_seq_udp_send_to no longer builds a "host:port" string and hands it to may's
    ToSocketAddrs. Instead it calls crate::dns::resolve(host) (cooperative via the worker pool), parses returned
    strings to IpAddr, and walks the resulting addresses building typed SocketAddr::new(ip, port) values for
    send_to. Multi-address fallback semantics match the TCP-connect path: first send_to that doesn't error wins,
    transparently handling the v4-only-socket + v6-first-DNS case for localhost-style hostnames. Public signature
    unchanged.
  • tests/integration/src/test-udp.seq — new test-udp-send-to-resolves-hostname test sends "pong" to "localhost"
    (a DNS name, not IP literal) and asserts the round-trip. On Linux's localhost → ::1, 127.0.0.1 resolution this
    exercises both the new DNS path and the multi-address walk.
  • docs/STDLIB_REFERENCE.md — added a paragraph under the UDP table describing the resolution semantics, the
    multi-address walk, and the carrier-non-blocking guarantee.

Design-doc checkpoint #9 (the send-to host-resolution fix) is now satisfied.

The full 5-PR slicing of docs/design/NONBLOCKING_NETWORKING.md is complete: DNS (PR1) → TCP connect (PR2) → TLS
client (PR3) → HTTP rewrite (PR4) → UDP send-to via DNS (PR5). Since docs/design/done/ is the established
convention for completed designs, you may want to mv docs/design/NONBLOCKING_NETWORKING.md docs/design/done/ and
stage that as part of the PR5 commit.

Outstanding follow-ups carried across all five PRs (in case you want a consolidated list for backlog tickets):

  • Deadline pass across the networking stack (connect timeout from PR2, handshake timeout from PR3, per-request
    timeout from PR4, EOF-framed-body hang from PR4).
  • net.tcp.local-port op (closes the hardcoded-port test fragility from PR2).
  • STREAMS / LISTENERS id-aliasing cleanup (flagged in PR2).
  • Per-attempt connect-error introspection (PR2 nit).
  • Happy-eyeballs / RFC 8305 outbound connection ordering (PR2).
  • Encapsulation refactor: move take_tcp and install-into-registry into tcp.rs so STREAMS/SocketRegistry go back
    to fully private (PR3).
  • Happy-path TLS test using a same-process rustls server with a fixture CA (PR3); PR4's HTTPS coverage closes
    part of this gap.
  • Single-flight DNS resolution (PR1) — collapse concurrent resolves of the same uncached host.
  • HTTP client: header-bag API, redirect following, automatic gzip decompression, mTLS, ALPN selection, peer-cert
    inspection, User-Agent override (PR3/PR4).
  • pool::is_alive rename to something less property-test-shaped, plus a Connection: close pool-eviction test
    (PR4).
  • SSRF DNS-rebinding-closure regression test (PR4, would lock the architectural property).
  • Investigate linker-plugin-lto on Linux or feature-gate http_client into its own codegen unit to strip the
    17-byte rustls drop-trampoline residue (PR4).
Changes: - crates/runtime/src/udp.rs — patch_seq_udp_send_to no longer builds a "host:port" string and hands it to may's ToSocketAddrs. Instead it calls crate::dns::resolve(host) (cooperative via the worker pool), parses returned strings to IpAddr, and walks the resulting addresses building typed SocketAddr::new(ip, port) values for send_to. Multi-address fallback semantics match the TCP-connect path: first send_to that doesn't error wins, transparently handling the v4-only-socket + v6-first-DNS case for localhost-style hostnames. Public signature unchanged. - tests/integration/src/test-udp.seq — new test-udp-send-to-resolves-hostname test sends "pong" to "localhost" (a DNS name, not IP literal) and asserts the round-trip. On Linux's localhost → ::1, 127.0.0.1 resolution this exercises both the new DNS path and the multi-address walk. - docs/STDLIB_REFERENCE.md — added a paragraph under the UDP table describing the resolution semantics, the multi-address walk, and the carrier-non-blocking guarantee. Design-doc checkpoint #9 (the send-to host-resolution fix) is now satisfied. The full 5-PR slicing of docs/design/NONBLOCKING_NETWORKING.md is complete: DNS (PR1) → TCP connect (PR2) → TLS client (PR3) → HTTP rewrite (PR4) → UDP send-to via DNS (PR5). Since docs/design/done/ is the established convention for completed designs, you may want to mv docs/design/NONBLOCKING_NETWORKING.md docs/design/done/ and stage that as part of the PR5 commit. Outstanding follow-ups carried across all five PRs (in case you want a consolidated list for backlog tickets): - Deadline pass across the networking stack (connect timeout from PR2, handshake timeout from PR3, per-request timeout from PR4, EOF-framed-body hang from PR4). - net.tcp.local-port op (closes the hardcoded-port test fragility from PR2). - STREAMS / LISTENERS id-aliasing cleanup (flagged in PR2). - Per-attempt connect-error introspection (PR2 nit). - Happy-eyeballs / RFC 8305 outbound connection ordering (PR2). - Encapsulation refactor: move take_tcp and install-into-registry into tcp.rs so STREAMS/SocketRegistry go back to fully private (PR3). - Happy-path TLS test using a same-process rustls server with a fixture CA (PR3); PR4's HTTPS coverage closes part of this gap. - Single-flight DNS resolution (PR1) — collapse concurrent resolves of the same uncached host. - HTTP client: header-bag API, redirect following, automatic gzip decompression, mTLS, ALPN selection, peer-cert inspection, User-Agent override (PR3/PR4). - pool::is_alive rename to something less property-test-shaped, plus a Connection: close pool-eviction test (PR4). - SSRF DNS-rebinding-closure regression test (PR4, would lock the architectural property). - Investigate linker-plugin-lto on Linux or feature-gate http_client into its own codegen unit to strip the 17-byte rustls drop-trampoline residue (PR4).
PR5 ready. CI green.
All checks were successful
CI - Linux / CI - Linux x86_64 (pull_request) Successful in 10m46s
ea56f52625
Changes:

  - crates/runtime/src/udp.rs — patch_seq_udp_send_to no longer builds a "host:port" string and hands it to may's
  ToSocketAddrs. Instead it calls crate::dns::resolve(host) (cooperative via the worker pool), parses returned
  strings to IpAddr, and walks the resulting addresses building typed SocketAddr::new(ip, port) values for
  send_to. Multi-address fallback semantics match the TCP-connect path: first send_to that doesn't error wins,
  transparently handling the v4-only-socket + v6-first-DNS case for localhost-style hostnames. Public signature
  unchanged.
  - tests/integration/src/test-udp.seq — new test-udp-send-to-resolves-hostname test sends "pong" to "localhost"
  (a DNS name, not IP literal) and asserts the round-trip. On Linux's localhost → ::1, 127.0.0.1 resolution this
  exercises both the new DNS path and the multi-address walk.
  - docs/STDLIB_REFERENCE.md — added a paragraph under the UDP table describing the resolution semantics, the
  multi-address walk, and the carrier-non-blocking guarantee.

  Design-doc checkpoint #9 (the send-to host-resolution fix) is now satisfied.

  The full 5-PR slicing of docs/design/NONBLOCKING_NETWORKING.md is complete: DNS (PR1) → TCP connect (PR2) → TLS
  client (PR3) → HTTP rewrite (PR4) → UDP send-to via DNS (PR5). Since docs/design/done/ is the established
  convention for completed designs, you may want to mv docs/design/NONBLOCKING_NETWORKING.md docs/design/done/ and
   stage that as part of the PR5 commit.

  Outstanding follow-ups carried across all five PRs (in case you want a consolidated list for backlog tickets):

  - Deadline pass across the networking stack (connect timeout from PR2, handshake timeout from PR3, per-request
  timeout from PR4, EOF-framed-body hang from PR4).
  - net.tcp.local-port op (closes the hardcoded-port test fragility from PR2).
  - STREAMS / LISTENERS id-aliasing cleanup (flagged in PR2).
  - Per-attempt connect-error introspection (PR2 nit).
  - Happy-eyeballs / RFC 8305 outbound connection ordering (PR2).
  - Encapsulation refactor: move take_tcp and install-into-registry into tcp.rs so STREAMS/SocketRegistry go back
  to fully private (PR3).
  - Happy-path TLS test using a same-process rustls server with a fixture CA (PR3); PR4's HTTPS coverage closes
  part of this gap.
  - Single-flight DNS resolution (PR1) — collapse concurrent resolves of the same uncached host.
  - HTTP client: header-bag API, redirect following, automatic gzip decompression, mTLS, ALPN selection, peer-cert
   inspection, User-Agent override (PR3/PR4).
  - pool::is_alive rename to something less property-test-shaped, plus a Connection: close pool-eviction test
  (PR4).
  - SSRF DNS-rebinding-closure regression test (PR4, would lock the architectural property).
  - Investigate linker-plugin-lto on Linux or feature-gate http_client into its own codegen unit to strip the
  17-byte rustls drop-trampoline residue (PR4).
Author
Owner

Review — PR5 (UDP send-to via DNS pool)

Reviewed against origin/net-pr5. Small, well-scoped PR that closes the last remaining blocking-getaddrinfo hazard in the networking stack. The fix mirrors the PR2 TCP-connect pattern: pre-resolve via the may-aware DNS pool, walk typed SocketAddr::new(ip, port) values rather than building a host:port string (so the IPv6-bracket trap can't reappear here either), and fall back across the resolved address list. No public-signature change, doc explains the new semantics, and the integration test exercises the new path against localhost. Looks good to merge with a couple of small clarifications.

What works well

  • Arc<UdpSocket> clone before the dns::resolve call. The socket is checked out of the registry before the (possibly yielding) DNS lookup, so a concurrent udp.close that drops the registry slot can't yank the socket out from under the send. The Arc keeps it alive for the duration; the existing comment at udp.rs:223-226 captures this. Same invariant the PR1/PR2 cooperative patterns rely on, preserved correctly here.
  • Typed SocketAddr::new(ip, port) end-to-end. The IPv6-bracket bug we hit in PR2 is structurally impossible on this path because nothing ever sees a string-form address.
  • Empty-hostname fast-fail before DNS — defensive shortcut matches the rest of the surface.
  • addrs.iter().any(...) short-circuits on the first Ok, so a successful v6 send doesn't get followed by a needless v4 send. Standard.

Worth tightening (small)

Per-syscall "doesn't error" vs per-peer reachability — doc could be sharper. The new paragraph in STDLIB_REFERENCE.md says "the first send_to that doesn't error wins." For TCP connect that phrasing reads naturally as "first peer that completes the handshake." For UDP it means "first address whose family the local socket actually supports" — send_to returning Ok means the kernel queued the datagram, not that the peer received it. The code comment at udp.rs:247-250 already nails this ("typically address-family mismatches"); the doc should pick up the same framing so users don't read this as a delivery-validation guarantee. One sentence: "Note that for UDP, send_to returning OK means the kernel accepted the datagram for the chosen address family — it doesn't confirm peer delivery." Otherwise people will (reasonably) expect this to keep walking if the first peer is unreachable, which it can't.

Multi-address walk test coverage depends on /etc/hosts configuration. The test comment claims "On Linux's localhost → ::1, 127.0.0.1 resolution" — true on stock distros, but minimal containers, Alpine variants, and some BSD-derived /etc/hosts setups list only 127.0.0.1. On those, the test still passes (single-address walk hits the first entry), but the v6-fails-then-v4-succeeds path is silently not exercised. Test isn't wrong — it does prove DNS-through-the-pool works — just that "verifies the multi-address fallback" in the test comment is conditional on the runner's hosts file. Either weaken the comment ("on dual-stack /etc/hosts configurations this also exercises…") or add a parallel test using an IPv6 literal ("::1") bound on a v6 socket so the walk semantics get explicit coverage. Optional — the architectural property is the same as TCP-connect, which already has dual-stack test coverage.

filter_map silently drops parse failures. dns::resolve returns strings produced by SocketAddr::ip().to_string(), so a non-parseable result would be a runtime invariant violation. The TCP-connect path made the same call with a one-line comment explaining the skip-rather-than-panic rationale; PR5 doesn't carry that comment forward here. Worth a one-liner so a future reader doesn't wonder if they should be propagating an error:

// Resolver only emits IP-string forms produced by
// SocketAddr::ip().to_string(); a parse failure here would
// be a runtime invariant violation — skip rather than panic.
let addrs: Vec<IpAddr> = crate::dns::resolve(hostname)
    .iter()
    .filter_map(|s| s.parse::<IpAddr>().ok())
    .collect();

Optional follow-ups (not for this PR)

  • IP-literal fast path. dns::resolve("127.0.0.1") goes through the worker-pool channel, the cache, etc., even though the resolver's getaddrinfo numeric-host arm returns trivially. Same shape applies to TCP-connect. A hostname.parse::<IpAddr>().ok() short-circuit before dns::resolve would skip the pool round-trip for literal calls — measurable on tight loops, irrelevant otherwise. Worth doing once across all three call sites (TCP-connect, UDP send-to, the HTTP client) rather than per-PR.
  • Single-flight DNS (already on the consolidated backlog from PR1).

Verdict

LGTM. The doc clarification on UDP-OK-isn't-delivery is the only item I'd want before merge; the comment-forward on filter_map is a 30-second nice-to-have; the test-comment caveat is informational. None block.

The "all five PRs delivered" arc closes out a clean refactor of the networking surface — happy to see the consolidated follow-up list in the PR description; that's the right shape for a backlog pass.

## Review — PR5 (UDP send-to via DNS pool) Reviewed against `origin/net-pr5`. Small, well-scoped PR that closes the last remaining blocking-`getaddrinfo` hazard in the networking stack. The fix mirrors the PR2 TCP-connect pattern: pre-resolve via the may-aware DNS pool, walk typed `SocketAddr::new(ip, port)` values rather than building a `host:port` string (so the IPv6-bracket trap can't reappear here either), and fall back across the resolved address list. No public-signature change, doc explains the new semantics, and the integration test exercises the new path against `localhost`. Looks good to merge with a couple of small clarifications. ### What works well - **`Arc<UdpSocket>` clone before the `dns::resolve` call.** The socket is checked out of the registry before the (possibly yielding) DNS lookup, so a concurrent `udp.close` that drops the registry slot can't yank the socket out from under the send. The Arc keeps it alive for the duration; the existing comment at `udp.rs:223-226` captures this. Same invariant the PR1/PR2 cooperative patterns rely on, preserved correctly here. - **Typed `SocketAddr::new(ip, port)` end-to-end.** The IPv6-bracket bug we hit in PR2 is structurally impossible on this path because nothing ever sees a string-form address. - **Empty-hostname fast-fail** before DNS — defensive shortcut matches the rest of the surface. - **`addrs.iter().any(...)` short-circuits** on the first `Ok`, so a successful v6 send doesn't get followed by a needless v4 send. Standard. ### Worth tightening (small) **Per-syscall "doesn't error" vs per-peer reachability — doc could be sharper.** The new paragraph in `STDLIB_REFERENCE.md` says "the first `send_to` that doesn't error wins." For TCP `connect` that phrasing reads naturally as "first peer that completes the handshake." For UDP it means "first address whose family the local socket actually supports" — `send_to` returning `Ok` means the kernel queued the datagram, not that the peer received it. The code comment at `udp.rs:247-250` already nails this ("typically address-family mismatches"); the doc should pick up the same framing so users don't read this as a delivery-validation guarantee. One sentence: "Note that for UDP, `send_to` returning OK means the kernel accepted the datagram for the chosen address family — it doesn't confirm peer delivery." Otherwise people will (reasonably) expect this to keep walking if the first peer is unreachable, which it can't. **Multi-address walk test coverage depends on `/etc/hosts` configuration.** The test comment claims "On Linux's localhost → ::1, 127.0.0.1 resolution" — true on stock distros, but minimal containers, Alpine variants, and some BSD-derived `/etc/hosts` setups list only `127.0.0.1`. On those, the test still passes (single-address walk hits the first entry), but the v6-fails-then-v4-succeeds path is silently not exercised. Test isn't *wrong* — it does prove DNS-through-the-pool works — just that "verifies the multi-address fallback" in the test comment is conditional on the runner's hosts file. Either weaken the comment ("on dual-stack `/etc/hosts` configurations this also exercises…") or add a parallel test using an IPv6 literal (`"::1"`) bound on a v6 socket so the walk semantics get explicit coverage. Optional — the architectural property is the same as TCP-connect, which already has dual-stack test coverage. **`filter_map` silently drops parse failures.** `dns::resolve` returns strings produced by `SocketAddr::ip().to_string()`, so a non-parseable result would be a runtime invariant violation. The TCP-connect path made the same call with a one-line comment explaining the skip-rather-than-panic rationale; PR5 doesn't carry that comment forward here. Worth a one-liner so a future reader doesn't wonder if they should be propagating an error: ```rust // Resolver only emits IP-string forms produced by // SocketAddr::ip().to_string(); a parse failure here would // be a runtime invariant violation — skip rather than panic. let addrs: Vec<IpAddr> = crate::dns::resolve(hostname) .iter() .filter_map(|s| s.parse::<IpAddr>().ok()) .collect(); ``` ### Optional follow-ups (not for this PR) - **IP-literal fast path.** `dns::resolve("127.0.0.1")` goes through the worker-pool channel, the cache, etc., even though the resolver's `getaddrinfo` numeric-host arm returns trivially. Same shape applies to TCP-connect. A `hostname.parse::<IpAddr>().ok()` short-circuit before `dns::resolve` would skip the pool round-trip for literal calls — measurable on tight loops, irrelevant otherwise. Worth doing once across all three call sites (TCP-connect, UDP send-to, the HTTP client) rather than per-PR. - **Single-flight DNS** (already on the consolidated backlog from PR1). ### Verdict LGTM. The doc clarification on UDP-OK-isn't-delivery is the only item I'd want before merge; the comment-forward on `filter_map` is a 30-second nice-to-have; the test-comment caveat is informational. None block. The "all five PRs delivered" arc closes out a clean refactor of the networking surface — happy to see the consolidated follow-up list in the PR description; that's the right shape for a backlog pass.
● All three review-driven clarifications applied. CI green.
All checks were successful
CI - Linux / CI - Linux x86_64 (pull_request) Successful in 10m45s
13f2712963
Done:

  1. UDP-OK-isn't-delivery doc (STDLIB_REFERENCE.md): added a "Note" paragraph spelling out that send_to returning
   OK means the kernel queued the datagram for the chosen address family — the multi-address walk catches
  local-side errors (family mismatch, route-not-found) but cannot detect an unreachable peer past the first
  successful queue.
  2. filter_map skip-on-parse comment (udp.rs): carried forward the same one-line rationale that PR2's TCP-connect
   uses — resolver only emits IP-string forms, a parse failure would be an invariant violation, so we skip rather
  than panic.
  3. Test-comment caveat (test-udp.seq): rewrote the dual-stack assumption to be conditional — on stock Linux
  /etc/hosts the walk exercises the v6-fails-then-v4-succeeds fallback; on minimal containers / Alpine / some BSD
  /etc/hosts setups where localhost is v4-only, the walk has just one entry and the test still verifies the
  DNS-through-pool path.

  Optional follow-ups (IP-literal fast path across all three call sites, single-flight DNS) stay on the
  consolidated backlog per the reviewer's explicit guidance.
Author
Owner

Second pass

All three clarifications landed cleanly:

  • Doc note on UDP delivery semantics reads exactly right — it puts the "kernel queued ≠ peer received" distinction in front of the user and explicitly says the walk catches local-side errors (address-family mismatch, route-not-found) but can't detect an unreachable peer past the first successful queue. No more reading the multi-address fallback as a delivery-validation feature.
  • filter_map rationale comment is in and cross-references tcp::patch_seq_tcp_connect, which makes the invariant discoverable for someone landing in udp.rs first.
  • Test comment now distinguishes the dual-stack and single-stack cases honestly — it's clear which CI runners exercise the walk and which only exercise the DNS-through-pool path.

LGTM. Ship it.

## Second pass All three clarifications landed cleanly: - **Doc note on UDP delivery semantics** reads exactly right — it puts the "kernel queued ≠ peer received" distinction in front of the user and explicitly says the walk catches local-side errors (address-family mismatch, route-not-found) but can't detect an unreachable peer past the first successful queue. No more reading the multi-address fallback as a delivery-validation feature. - **`filter_map` rationale comment** is in and cross-references `tcp::patch_seq_tcp_connect`, which makes the invariant discoverable for someone landing in `udp.rs` first. - **Test comment** now distinguishes the dual-stack and single-stack cases honestly — it's clear which CI runners exercise the walk and which only exercise the DNS-through-pool path. LGTM. Ship it.
navicore deleted branch net-pr5 2026-05-13 16:47:48 +00:00
Sign in to join this conversation.
No description provided.