● PR2 (net.tcp.connect) ready. CI green. #478
No reviewers
Labels
No labels
bug
dependencies
documentation
duplicate
enhancement
good first issue
help wanted
invalid
question
refactor
rust
technical-debt
wontfix
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
navicore/patch-seq!478
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "net-pr2"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Changes:
cache and worker pool with the FFI surface instead of opening a parallel getaddrinfo path.
order, and uses may::net::TcpStream::connect with an IP-literal address so may's internal to_socket_addrs never
blocks the carrier on a name lookup. Yields the strand on every step.
may::net::TcpStream::read call (cooperative-blocking is the may default; the WouldBlock arm was unreachable).
MAX_READ_SIZE retained behind #[allow(dead_code)] as the architectural ceiling for future read-N variants.
STREAMS. The user-visible Socket type unifies both; close should as well. Exposed by the new round-trip test,
which needs to close both client and listener.
runtime/lib.rs.
plus three failure paths.
TCP block I was editing.
Review — PR2 (net.tcp.connect)
Reviewed the diff end-to-end against
origin/net-pr2. The structural moves are right:dns::resolvefactored out so both the FFI and the connect builtin share the cache + worker pool, IP literals only on theTcpStream::connectpath somaycan't reach forgetaddrinfoagain, and the read-loop dead-code excision. The PR1 fallout (worker-spawn degradation,SEQ_DNS_WORKERS=0doc) all landed cleanly.One real bug, a couple of behaviour shifts worth being explicit about, plus follow-ups.
Bug — IPv6 addresses unreachable
tcp.rs:180:iphere came fromSocketAddr::ip().to_string()insideresolve_blocking, so it's a bare IP without brackets. For IPv6 this produces strings like"::1:80"— which is not a validSocketAddr(the IPv6 literal needs[...]).TcpStream::connect("::1:80")will fail to parse and the loop moves on.On every modern Linux box
getaddrinforeturns AAAA before A by default, so for a dual-stack host the first attempt parse-fails, the second (IPv4) succeeds — invisible perf hit. For an IPv6-only host the connect fails entirely, even though DNS returned a valid address.Two clean fixes:
IpAddrand buildSocketAddrdirectly — avoids string round-tripping altogether:if ip.contains(':') { format!("[{ip}]:{port}") } else { format!("{ip}:{port}") }.The integration test as written can't catch this — the round-trip uses the IP literal
"127.0.0.1"(so DNS returns a single-element v4 list), and the failure-path tests use empty/invalid/.invalidhosts (so the loop is never entered). Worth adding a test that resolves a dual-stack name and asserts at least the v6 attempt parses, or one that connects to::1via a hostname that resolves IPv6-only.Behaviour shifts worth being explicit about
net.tcp.readis now strictly one-batch. Old loop kept reading untilWouldBlockafter the first non-empty chunk; new code does onestream.read(&mut [u8; 4096])and returns. The PR description is right that theWouldBlockarm was dead undermay(the wrapper suspends internally instead of surfacingWouldBlock), so the loop only ever made one progressing iteration in practice — but the buffer-extension loop could have stitched multiple ready chunks together if the kernel had >4 KB queued before the strand was scheduled. Now you get at most 4 KB per call, and the caller is responsible for re-reading. For TCP that's the correct framing model, but anything in the tree that was relying on "read drains the kernel buffer in one call" will see a behavioural change. Worth a quick grep of callers (http_clientusesureqso it's unaffected; just want to be sure no.seqexample assumes drain-on-read).net.tcp.closenow succeeds on listener ids. Old code only consultedSTREAMS; closing a listener silently returnedfalse. New code falls back toLISTENERS. Strictly a bug fix — the user-visibleSockettype was always supposed to unify both — but worth noting that any caller that assertednet.tcp.closeon a listener returnsfalse(none should, but) will now seetrue.Id-aliasing across
STREAMSandLISTENERS. The comment intcp.rs:438-445already calls this out as a "design wart": both registries start their ids at 0, so id0can legitimately mean either a stream or a listener.closechecks streams first, so if both exist with id 0, you'd close the stream. Fine for the v1 cases the comment lists (one listener, finite streams), but the integration test happens to exercise exactly this shape (one listener at id 0, one stream at id 0 in the same process) and only works because the test closes the client first, then the listener. A test that closes in the other order would expose the aliasing. Long-term fix is one allocator with tagged ids; near-term it's worth at least asserting in the close path that the registry it found the id in matches what the caller expected, or biasing the allocators to non-overlapping id ranges.Connect-specific items
No connect timeout.
TcpStream::connectwill park the strand for the OS SYN timeout (≈60–130 s on Linux) when the peer is silent. There's no caller-side way to bound this. Probably PR3 material, but the doc should at least say so — the worked example inSTDLIB_REFERENCE.mdwould deadlock a strand for a minute against an unreachable host.Serial fallback, no happy-eyeballs. Walking the resolved list in order is fine, but if AAAA points somewhere unreachable (common on misconfigured corp networks) you eat the full SYN timeout before falling back to A. RFC 8305 / happy-eyeballs is a known follow-up; mention it in the doc so people know.
No way to discover the OS-assigned port.
net.tcp.listenaccepts0(validated0..=65535since PR1) but there's nonet.tcp.local-port ( Socket -- Int Bool )to learn what the OS picked. The integration test hardcodes 54321 specifically because of this gap — and that port collides with anything else on a shared CI runner. Worth a small follow-up to addlocal-portso the test (and real callers binding ephemerals) doesn't need a magic number.Connect-loop error swallowing.
if let Ok(s) = TcpStream::connect(&target)discards the error per attempt. Fine for the loop logic, but for debugging an "every-address-failed" outcome the user gets back onlyfalse. A futurenet.tcp.connect-errorop (or a side-channel debug counter) would help; not blocking.Test concerns
tcp-test-portis hardcoded. As above — symptom of the missinglocal-portop. The test will be flaky on any runner where 54321 is taken. The error message at least tells the user what happened.test-net-tcp-connect-unresolvable-failsround-trips through the actual worker pool..invalidis RFC 2606, so getaddrinfo will reject it without external traffic, but each test invocation does start the DNS pool (if no prior test has) and submit a job. Acceptable; just noting that this is the first PR2 test that actually exercises the worker path, and it does so for the failure case only.No test for the multi-A-record fallback path. All three failure-path tests fail before the connect loop is entered (empty host, invalid port, unresolvable name). The happy-path test resolves to exactly one IP. So the
for ip in &addrsloop is never tested with >1 element. A test using a hostname that resolves to multiple addresses (or stubbing the resolver) would close that gap — useful both for the v6-bracket bug above and for verifying the order-of-attempt semantics.Nits
tcp.rs:104(pre-existing):format!("0.0.0.0:{}", port)could beformat!("0.0.0.0:{port}")foruninlined_format_argsconsistency. Drive-by, not yours to fix.tcp.rs:248—let mut connected: Option<TcpStream> = None;followed byfor+breakis fine but could belet connected = addrs.iter().find_map(|ip| TcpStream::connect(&format!("{ip}:{port}")).ok());. Stylistic only, and any fix for the v6 bug above will rewrite this block anyway.tcp.rs:122says "patch_seq_tcp_read currently performs a single 4 KB read per call" — accurate after this PR, slightly forward-referencing if read by someone who scrolls top-down. Not actionable.STDLIB_REFERENCE.mdworked example: minor —dup net.tcp.read [ io.write-line ] [ drop ] ifdrops the response on failure but never errors out, which is fine for a doc snippet but reads slightly differently from the patterns elsewhere in the file. Cosmetic.Verdict
I'd block on the IPv6-bracket bug (#1) and add at least one test that exercises the multi-IP loop — without it the bug-fix is unverified. Everything else can ride along as follow-ups (
local-portop, connect timeout, happy-eyeballs, id-aliasing cleanup). The DNS refactor and read-loop simplification are both wins independent of the connect work.Done in this round: 1. IPv6 bracket bug (tcp.rs:248) — replaced format!("{ip}:{port}") with SocketAddr::new(ip.parse::<IpAddr>()?, port as u16). Avoids string round-tripping entirely, so ::1 flows through as a proper IpAddr instead of getting smashed into the malformed "::1:port". Also folded the explicit Option/for/break into addrs.iter().find_map(...) per the reviewer's stylistic nit (it was natural here anyway). 2. Multi-IP fallback test — test-net-tcp-connect-multi-ip-fallback. Connects to "localhost" which getaddrinfo returns as ["::1", "127.0.0.1"] on this box; the listener binds 0.0.0.0 (v4 only), so ::1 connect-attempt fails and the loop walks to 127.0.0.1. Test would have caught the pre-fix bug (the malformed v6 string would have failed-parsed in old code, fallen through to v4 — same end result, but a v6-only host would have been wedged silently). 3. net.tcp.read caller audit — three .seq callers (http_simple, http_server, actor_counters); all do read once → parse → write → close. None loop on read. The old code's Ok(n) if n > 0 arm also had an unconditional break;, so the loop only kept going via the WouldBlock arm — and that only fired before the first byte arrived. Net practical behavior unchanged. 4. Doc additions — new "Known limitations — connect" section in STDLIB_REFERENCE.md: no connect timeout (and the strand-park duration users will see against silent peers), serial fallback / no happy-eyeballs. Worked-example inner read-failure branch now writes "read failed" instead of silently dropping. 5. Comment rewording — tcp.rs:121 no longer says "currently performs"; reads cleanly top-down. Carried as follow-ups (not addressed in PR2): net.tcp.local-port op (would replace the hardcoded test ports), id-aliasing cleanup across STREAMS/LISTENERS, per-attempt connect-error introspection, happy-eyeballs / connect timeout implementations, drive-by 0.0.0.0:{} format inlining.