● Issue #482 done. CI green. #486
No reviewers
Labels
No labels
bug
dependencies
documentation
duplicate
enhancement
good first issue
help wanted
invalid
question
refactor
rust
technical-debt
wontfix
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
navicore/patch-seq!486
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "issue-482"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
#482
All five items landed:
the underlying TCP local port) then LISTENERS. Wired through all four sites: compiler builtins, codegen runtime decls, ast/program.rs validation list,
runtime/lib.rs re-export, error-flag-lint entry.
socket is safe to reuse for the next request — the kernel reports no readable bytes, no FIN, no error."
back to dns::resolve only for actual DNS work). The three callers — tcp::patch_seq_tcp_connect, udp::patch_seq_udp_send_to,
http_client::ssrf::validate_url — all switched. Worker-pool round-trip is reserved for genuine DNS now.
IN_FLIGHT instead of through a per-job reply channel). First-arriver enqueues; later arrivers attach their reply senders to the existing entry. Worker
writes cache before the fanout pop, so late arrivers that race the pop see the cache hit and never need to attach. The "Known limitation: no
single-flight" section in both dns.rs and STDLIB_REFERENCE.md has been rewritten as "Fanout collapsing" describing the new behaviour.
(and the take_tcp / free_stream helpers) all back to private. The "free + allocate is dangerous across a strand yield" invariant lives entirely inside
tcp.rs; tls.rs is just crate::tcp::upgrade_tcp_in_place(socket_id, |tcp| build_tls(tcp, hostname)).
Tests cleanup (the bonus item) — test-net-tcp-connect.seq and test-net-tls-client.seq no longer hardcode ports. Every test now does 0 net.tcp.listen then
dup net.tcp.local-port to discover the OS-assigned port, then spawns the worker strand via [ drop server-body ] strand.spawn (drop discards the port from
the child's stack clone). The tcp-test-port / tcp-test-port-2 / tls-test-port[-bad,-dns] helpers are gone. 8 tests, all green.
Note on naming choice: I picked is_reusable over is_quiet per my earlier read on which conveyed purpose better. Easy to flip if you prefer is_quiet.
Review — PR #486 (networking polish pass, closes #482)
Reviewed against
origin/issue-482. Tight half-day's work hitting all five items on the issue, plus the test-port cleanup bonus. The encapsulation refactor (upgrade_tcp_in_place) and the single-flight DNS implementation are the meaty pieces; the rest are local cleanups. I have one real bug to flag and one design concern that lands on top of pre-existing surface area. The architectural shape is right.What works well
CACHE.putbeforeIN_FLIGHT.remove, which the inline comment inworker_loopcalls out explicitly. Late arrivers that race the fanout pop see the freshly-written cache and short-circuit without ever touching the in-flight map. I traced concurrent leader/follower, follower-after-pop, leader-panic-poison, and cache-TTL-expiry-mid-flight; nothing deadlocks and nothing returns a wrong answer.JOB_QUEUE.senderrs (worker pool failed to start), the leader removes its own entry fromIN_FLIGHTand fans out emptyVecto every attached sender, including any follower that raced in during the gap between insert and recovery-remove. That follower then getsVec::new()from itsreply_rx.recv()and returns empty. No infinite block.upgrade_tcp_in_placeencapsulates the take/reinstall/free invariant cleanly. The "slot reserved across yield" property now lives entirely insidetcp.rs.tls.rsis just a callback supplier — exactly the shape the PR3 reviewer-framing suggested.STREAMS,SocketRegistry, andStreamKindare all back to private. Good narrowing.resolve_to_ipscentralisation. The patternresolve(host).iter().filter_map(...IpAddr...).collect()was distributed acrosstcp::patch_seq_tcp_connect,udp::patch_seq_udp_send_to, andhttp_client::ssrf::validate_url. Now collapsed to one helper with the IP-literal short-circuit baked in. Less duplication, one place to fix bugs.0 net.tcp.listen+dup net.tcp.local-portis the standard "let the OS pick" pattern. Removing the hardcoded magic numbers eliminates a parallel-CI flake source. The[ drop server-body ] strand.spawn dropidiom for popping the port from the child's stack-clone reads cleanly once you see it twice.Bug — TLS-handshake-failure leaks a
STREAMSidupgrade_tcp_in_place's failure path is a silent no-op. Aftertake_tcp(id)succeeds, the slot isSome(None)— Vec entry exists, value isNone. Onf(tcp) → Err(()), the code callsfree_stream(id) → STREAMS.lock().unwrap().free(id). ButSocketRegistry::freeis guarded:The guard exists to prevent double-free (pushing the same id onto
free_idstwice would haveallocate()hand the same id to two callers). But it doesn't distinguish "in-use → freed" from "reserved (set None bytake_tcp) → freed." Aftertake_tcpsucceeds,*slot == None, soslot.is_some()is false andfree()exits without pushing ontofree_ids. The id is leaked. The Vec entry stays at indexidholdingNoneforever, and the nextallocate()skips this slot and pushes a new entry, growing the Vec monotonically.Per failed TLS handshake (invalid cert, server reset, ALPN mismatch, etc.), one id leaks. After
MAX_SOCKETS(10 000) failures the runtime refuses new sockets. Real but slow-burn.This bug predates this PR — the same
STREAMS.lock().unwrap().free(socket_id)call existed intls.rsbefore the encapsulation refactor and had the same no-op behaviour against a reserved-None slot. But this PR is the one that consolidates the take/reinstall/free dance intotcp.rs, which makes it the natural place to fix it. Three options:Add a
release_reserved(id)method that pushes ontofree_idsunconditionally when the slot isSome(None), refusing the call otherwise. Symmetric pair totake_tcp.In the failure branch of
upgrade_tcp_in_place, push directly:This would need
free_idsto bepub(crate)ofSocketRegistry, or an inherent helper onSocketRegistry. Less clean.Track reservation explicitly in a third state (e.g.
Reservedvariant of an enum slot) — biggest churn, biggest clarity win, probably overkill.I'd pick (1). It mirrors
take_tcpdirectly and keeps the invariant local. Worth a unit test that allocates an id, takes it, releases it via the new method, allocates again, and asserts you got the same id back — that pins the round-trip semantics that the current TLS-failure path silently breaks.Design concern —
net.tcp.local-portand the LISTENERS/STREAMS id aliasLISTENERSandSTREAMSare separate registries with separate free-lists, both starting at id 0. So a Seq-levelSocketinteger is only unique within one registry. The dispatch order inpatch_seq_tcp_local_port(and inclose, and elsewhere) is streams first, then listeners.For
close, this is fine: if younet.tcp.close 0on a stream id 0 that aliases a listener id 0, you close the stream, which is the more recent and more dangerous resource. The listener still exists and can be closed by another path or by process teardown.For
local-portthe failure mode is silent and wrong-answer rather than loud and conservative. Concretely:In the new tests this can't happen because
local-portis called immediately afterlisten, before any connect — soSTREAMSis empty and dispatch falls through toLISTENERS. But in real Seq code that mixes listeners and streams, asking for a listener's local-port after streams have been allocated returns whichever resource happens to live at that numeric id inSTREAMS. The user gets a plausible-looking integer (a real port number) with no indication anything is wrong.The id-aliasing problem is pre-existing — it predates this PR — and the dispatch choice for
local-portis consistent withclose, which is the right consistency call. Butlocal-portexposes a sharper version of the same flaw. Three mitigations, none required for merge:STDLIB_REFERENCE.md: "callnet.tcp.local-porton a listener before allocating connected streams, or store the result eagerly."local-port(LISTENERSfirst). Asymmetric withclose, which is itself an argument against doing it.Socketinteger with a registry-bit at the FFI boundary, encoding listener-vs-stream into the high bits of the i64. Out of scope here; tracked-elsewhere class of work.I'd take the documentation route as a one-line stdlib note for now and treat the broader id-aliasing fix as its own follow-up.
Smaller observations
dns::resolvecalls for the same uncached host and asserts the worker count goes up by 1 (not N) would lock in the architectural property the PR adds. TheIN_FLIGHTstatic is private, so a test would need either a#[cfg(test)]accessor or instrumentation on the worker (job counter). Both are small. The same goes forresolve_to_ips— a unit test assertingresolve_to_ips("127.0.0.1")returns[127.0.0.1]without incrementing a worker-job counter pins the fast-path skip. Optional, not for this PR.Repliestype alias is a bit generic.WaitingResolversorInFlightSenderswould read more directly. Nit.is_reusablevsis_quiet— agree with the choice.is_reusablematches the call-site intent (if is_reusable(fd) { return Some(stream) }) and the docstring captures the underlying "quiet" property precisely.is_quietwould have made the reader trace from name → semantics;is_reusablemakes the call-site read at a glance.upgrade_tcp_in_place'sErr(())ergonomics. The unit()error type is a fine convention for "the callback consumed the stream and there's nothing to report," but if the failure path ever wants a reason (logging, metrics, propagating a TLS handshake error category), this signature would force a wider refactor. Not a problem now — flagging only.STDLIB_REFERENCE.mdlocal-portparagraph not added to the new "Networking — net.tcp.*" table or the prose section. The compiler-sideadd_docsinsertion incrates/compiler/src/builtins/tcp.rscovers the integrated help, but a reader using the stdlib reference doc as their primary source won't see the new builtin. One row in the table is enough.Verdict
Land after fixing the id-leak in
upgrade_tcp_in_place's failure path. That's a real bug — small surface, slow burn, but the PR is the natural place. Thelocal-portid-aliasing issue is a pre-existing class of bug that this PR inherits rather than introduces; flag in docs and treat the deeper fix as its own follow-up. The encapsulation refactor and the single-flight DNS implementation are the right shape and correctly implemented; the test-port cleanup is a clean bonus that eliminates a flake source. Nice closeout of the PR1–PR5 arc.Second pass — review-driven fixes look right
Walked the delta from
cdb6c9b..dfca298. All four asks landed cleanly.release_reservedfix for the id-leak (tcp.rs)The new
SocketRegistry::release_reserved(id)is the symmetric pair totake_tcpand the right shape. I traced the precondition matrix:Some(None), id not on free list) —is_reserved == true,contains == false→ push. ✓is_reserved == true,contains == true→ skip. No double-push. ✓Some(None), id on free list) —is_reserved == truebutcontains == true→ skip. ✓ Doesn't corrupt.Some(value)) —is_reserved == false→ skip. ✓ Doesn't accidentally release a live id.sockets.get(id)returns None →.unwrap_or(false)→ skip. ✓The "free list is normally short, so the linear scan is cheap" rationale in the docstring is accurate — under steady-state load, free_ids is bounded by the count of recently-released ids and stays tiny. The contains check is the right defensive choice over a HashSet sidecar.
The inline comment on the
Err(())branch inupgrade_tcp_in_placeexplaining why plainfree()was insufficient (take_tcp already nulled the slot) makes the fix self-explanatory for the next reader who lands in this code.Unit test pins the regression (tcp/tests.rs)
test_socket_registry_release_reserved_round_tripis exactly the test I'd have written. The first half (allocate → simulate take → release_reserved → allocate again → assert same id) would have failed under the pre-fix no-op behaviour because the second allocate would have pushed a new Vec entry at id 1. The second half (double-release → two allocates must get different ids) pins thecontainsguard — without it, free_ids would hold[0, 0]and both allocates would return id 0, which is the use-after-free in disguise. Both halves lock-step the invariant.One nit, not for fixing: the test asserts
id == 0for the first allocation, which couples the test to internal allocation order. If a future refactor changed the free-list to a SmallVec or a queue, the test would still pass logically but with a different first id. Acceptable — the alternative (allocating-and-not-asserting-the-id) hides what's being asserted on the next line.Doc additions for
net.tcp.local-port(STDLIB_REFERENCE.md)The new table row is clean. The id-aliasing note is the right level of disclosure — honest about the surface ("the same
Socketinteger can refer to different resources depending on which registry holds it") without overstating the danger ("treatSocketids as resource-local, not globally unique"). The practical workaround ("callnet.tcp.local-porton it before allocating any connected streams") is the kind of guidance a user actually needs. References the broader follow-up without committing to a timeline. Good shape.Bonus catch on the
listen 0documentation — "Pass0to let the OS pick" was previously implicit in the runtime but never made it into the stdlib reference. Now any reader landing on the table sees the same idiom the tests use.Replies→InFlightSenders(dns.rs)Reads more directly. Trivial change, but worth doing.
Verdict
LGTM. Ship it. The fix is minimal, the test pins the regression, the docs are honest, and the rename is a clean read. The id-aliasing follow-up (a deeper LISTENERS/STREAMS unification, or a registry-bit tag on Socket) remains the right shape to leave deferred — the doc disclosure is sufficient for now.