● Issue #482 done. CI green. #486

Merged
navicore merged 2 commits from issue-482 into main 2026-05-14 00:44:41 +00:00
Owner

#482

All five items landed:

  1. net.tcp.local-port ( Socket -- Int Bool ) — new builtin in runtime/src/tcp.rs. Dispatches over STREAMS (works for both Tcp and Tls variants — returns
    the underlying TCP local port) then LISTENERS. Wired through all four sites: compiler builtins, codegen runtime decls, ast/program.rs validation list,
    runtime/lib.rs re-export, error-flag-lint entry.
  2. pool::is_alive → pool::is_reusable — accurate name. Module doc and call site updated; the doc on the function itself now says "Returns true iff the
    socket is safe to reuse for the next request — the kernel reports no readable bytes, no FIN, no error."
  3. IP-literal fast path — added dns::resolve_to_ips(&str) -> Vec that does the literal short-circuit (hostname.parse::().ok() first; falls
    back to dns::resolve only for actual DNS work). The three callers — tcp::patch_seq_tcp_connect, udp::patch_seq_udp_send_to,
    http_client::ssrf::validate_url — all switched. Worker-pool round-trip is reserved for genuine DNS now.
  4. Single-flight DNS — added static IN_FLIGHT: LazyLock<Mutex<HashMap<String, Replies>>>. Restructured Job to carry only the hostname (worker fans out via
    IN_FLIGHT instead of through a per-job reply channel). First-arriver enqueues; later arrivers attach their reply senders to the existing entry. Worker
    writes cache before the fanout pop, so late arrivers that race the pop see the cache hit and never need to attach. The "Known limitation: no
    single-flight" section in both dns.rs and STDLIB_REFERENCE.md has been rewritten as "Fanout collapsing" describing the new behaviour.
  5. Encapsulation refactor — tcp.rs now exposes only pub(crate) fn upgrade_tcp_in_place(id, f) for the TLS module. STREAMS, SocketRegistry, and StreamKind
    (and the take_tcp / free_stream helpers) all back to private. The "free + allocate is dangerous across a strand yield" invariant lives entirely inside
    tcp.rs; tls.rs is just crate::tcp::upgrade_tcp_in_place(socket_id, |tcp| build_tls(tcp, hostname)).

Tests cleanup (the bonus item) — test-net-tcp-connect.seq and test-net-tls-client.seq no longer hardcode ports. Every test now does 0 net.tcp.listen then
dup net.tcp.local-port to discover the OS-assigned port, then spawns the worker strand via [ drop server-body ] strand.spawn (drop discards the port from
the child's stack clone). The tcp-test-port / tcp-test-port-2 / tls-test-port[-bad,-dns] helpers are gone. 8 tests, all green.

Note on naming choice: I picked is_reusable over is_quiet per my earlier read on which conveyed purpose better. Easy to flip if you prefer is_quiet.

https://git.navicore.tech/navicore/patch-seq/issues/482 All five items landed: 1. net.tcp.local-port ( Socket -- Int Bool ) — new builtin in runtime/src/tcp.rs. Dispatches over STREAMS (works for both Tcp and Tls variants — returns the underlying TCP local port) then LISTENERS. Wired through all four sites: compiler builtins, codegen runtime decls, ast/program.rs validation list, runtime/lib.rs re-export, error-flag-lint entry. 2. pool::is_alive → pool::is_reusable — accurate name. Module doc and call site updated; the doc on the function itself now says "Returns true iff the socket is safe to reuse for the next request — the kernel reports no readable bytes, no FIN, no error." 3. IP-literal fast path — added dns::resolve_to_ips(&str) -> Vec<IpAddr> that does the literal short-circuit (hostname.parse::<IpAddr>().ok() first; falls back to dns::resolve only for actual DNS work). The three callers — tcp::patch_seq_tcp_connect, udp::patch_seq_udp_send_to, http_client::ssrf::validate_url — all switched. Worker-pool round-trip is reserved for genuine DNS now. 4. Single-flight DNS — added static IN_FLIGHT: LazyLock<Mutex<HashMap<String, Replies>>>. Restructured Job to carry only the hostname (worker fans out via IN_FLIGHT instead of through a per-job reply channel). First-arriver enqueues; later arrivers attach their reply senders to the existing entry. Worker writes cache before the fanout pop, so late arrivers that race the pop see the cache hit and never need to attach. The "Known limitation: no single-flight" section in both dns.rs and STDLIB_REFERENCE.md has been rewritten as "Fanout collapsing" describing the new behaviour. 5. Encapsulation refactor — tcp.rs now exposes only pub(crate) fn upgrade_tcp_in_place(id, f) for the TLS module. STREAMS, SocketRegistry, and StreamKind (and the take_tcp / free_stream helpers) all back to private. The "free + allocate is dangerous across a strand yield" invariant lives entirely inside tcp.rs; tls.rs is just crate::tcp::upgrade_tcp_in_place(socket_id, |tcp| build_tls(tcp, hostname)). Tests cleanup (the bonus item) — test-net-tcp-connect.seq and test-net-tls-client.seq no longer hardcode ports. Every test now does 0 net.tcp.listen then dup net.tcp.local-port to discover the OS-assigned port, then spawns the worker strand via [ drop server-body ] strand.spawn (drop discards the port from the child's stack clone). The tcp-test-port / tcp-test-port-2 / tls-test-port[-bad,-dns] helpers are gone. 8 tests, all green. Note on naming choice: I picked is_reusable over is_quiet per my earlier read on which conveyed purpose better. Easy to flip if you prefer is_quiet.
● Issue #482 done. CI green.
All checks were successful
CI - Linux / CI - Linux x86_64 (pull_request) Successful in 10m50s
cdb6c9bdd6
All five items landed:

  1. net.tcp.local-port ( Socket -- Int Bool ) — new builtin in runtime/src/tcp.rs. Dispatches over STREAMS (works for both Tcp and Tls variants — returns
  the underlying TCP local port) then LISTENERS. Wired through all four sites: compiler builtins, codegen runtime decls, ast/program.rs validation list,
  runtime/lib.rs re-export, error-flag-lint entry.
  2. pool::is_alive → pool::is_reusable — accurate name. Module doc and call site updated; the doc on the function itself now says "Returns true iff the
  socket is safe to reuse for the next request — the kernel reports no readable bytes, no FIN, no error."
  3. IP-literal fast path — added dns::resolve_to_ips(&str) -> Vec<IpAddr> that does the literal short-circuit (hostname.parse::<IpAddr>().ok() first; falls
   back to dns::resolve only for actual DNS work). The three callers — tcp::patch_seq_tcp_connect, udp::patch_seq_udp_send_to,
  http_client::ssrf::validate_url — all switched. Worker-pool round-trip is reserved for genuine DNS now.
  4. Single-flight DNS — added static IN_FLIGHT: LazyLock<Mutex<HashMap<String, Replies>>>. Restructured Job to carry only the hostname (worker fans out via
   IN_FLIGHT instead of through a per-job reply channel). First-arriver enqueues; later arrivers attach their reply senders to the existing entry. Worker
  writes cache before the fanout pop, so late arrivers that race the pop see the cache hit and never need to attach. The "Known limitation: no
  single-flight" section in both dns.rs and STDLIB_REFERENCE.md has been rewritten as "Fanout collapsing" describing the new behaviour.
  5. Encapsulation refactor — tcp.rs now exposes only pub(crate) fn upgrade_tcp_in_place(id, f) for the TLS module. STREAMS, SocketRegistry, and StreamKind
  (and the take_tcp / free_stream helpers) all back to private. The "free + allocate is dangerous across a strand yield" invariant lives entirely inside
  tcp.rs; tls.rs is just crate::tcp::upgrade_tcp_in_place(socket_id, |tcp| build_tls(tcp, hostname)).

  Tests cleanup (the bonus item) — test-net-tcp-connect.seq and test-net-tls-client.seq no longer hardcode ports. Every test now does 0 net.tcp.listen then
  dup net.tcp.local-port to discover the OS-assigned port, then spawns the worker strand via [ drop server-body ] strand.spawn (drop discards the port from
  the child's stack clone). The tcp-test-port / tcp-test-port-2 / tls-test-port[-bad,-dns] helpers are gone. 8 tests, all green.

  Note on naming choice: I picked is_reusable over is_quiet per my earlier read on which conveyed purpose better. Easy to flip if you prefer is_quiet.
Author
Owner

Review — PR #486 (networking polish pass, closes #482)

Reviewed against origin/issue-482. Tight half-day's work hitting all five items on the issue, plus the test-port cleanup bonus. The encapsulation refactor (upgrade_tcp_in_place) and the single-flight DNS implementation are the meaty pieces; the rest are local cleanups. I have one real bug to flag and one design concern that lands on top of pre-existing surface area. The architectural shape is right.

What works well

  • Single-flight DNS implementation is correct under the races I can construct. The critical ordering is CACHE.put before IN_FLIGHT.remove, which the inline comment in worker_loop calls out explicitly. Late arrivers that race the fanout pop see the freshly-written cache and short-circuit without ever touching the in-flight map. I traced concurrent leader/follower, follower-after-pop, leader-panic-poison, and cache-TTL-expiry-mid-flight; nothing deadlocks and nothing returns a wrong answer.
  • Enqueue-failure recovery path is right. If JOB_QUEUE.send errs (worker pool failed to start), the leader removes its own entry from IN_FLIGHT and fans out empty Vec to every attached sender, including any follower that raced in during the gap between insert and recovery-remove. That follower then gets Vec::new() from its reply_rx.recv() and returns empty. No infinite block.
  • upgrade_tcp_in_place encapsulates the take/reinstall/free invariant cleanly. The "slot reserved across yield" property now lives entirely inside tcp.rs. tls.rs is just a callback supplier — exactly the shape the PR3 reviewer-framing suggested. STREAMS, SocketRegistry, and StreamKind are all back to private. Good narrowing.
  • resolve_to_ips centralisation. The pattern resolve(host).iter().filter_map(...IpAddr...).collect() was distributed across tcp::patch_seq_tcp_connect, udp::patch_seq_udp_send_to, and http_client::ssrf::validate_url. Now collapsed to one helper with the IP-literal short-circuit baked in. Less duplication, one place to fix bugs.
  • Test-port cleanup is the right kind of bonus. 0 net.tcp.listen + dup net.tcp.local-port is the standard "let the OS pick" pattern. Removing the hardcoded magic numbers eliminates a parallel-CI flake source. The [ drop server-body ] strand.spawn drop idiom for popping the port from the child's stack-clone reads cleanly once you see it twice.
  • STDLIB_REFERENCE.md "Fanout collapsing" correctly rewrites the previous "Known limitation" section to describe the new behaviour. The "IP-literal fast path" paragraph below is the right level of detail for stdlib docs.

Bug — TLS-handshake-failure leaks a STREAMS id

upgrade_tcp_in_place's failure path is a silent no-op. After take_tcp(id) succeeds, the slot is Some(None) — Vec entry exists, value is None. On f(tcp) → Err(()), the code calls free_stream(id) → STREAMS.lock().unwrap().free(id). But SocketRegistry::free is guarded:

fn free(&mut self, id: usize) {
    if let Some(slot) = self.sockets.get_mut(id)
        && slot.is_some()                          // ← guard
    {
        *slot = None;
        self.free_ids.push(id);
    }
}

The guard exists to prevent double-free (pushing the same id onto free_ids twice would have allocate() hand the same id to two callers). But it doesn't distinguish "in-use → freed" from "reserved (set None by take_tcp) → freed." After take_tcp succeeds, *slot == None, so slot.is_some() is false and free() exits without pushing onto free_ids. The id is leaked. The Vec entry stays at index id holding None forever, and the next allocate() skips this slot and pushes a new entry, growing the Vec monotonically.

Per failed TLS handshake (invalid cert, server reset, ALPN mismatch, etc.), one id leaks. After MAX_SOCKETS (10 000) failures the runtime refuses new sockets. Real but slow-burn.

This bug predates this PR — the same STREAMS.lock().unwrap().free(socket_id) call existed in tls.rs before the encapsulation refactor and had the same no-op behaviour against a reserved-None slot. But this PR is the one that consolidates the take/reinstall/free dance into tcp.rs, which makes it the natural place to fix it. Three options:

  1. Add a release_reserved(id) method that pushes onto free_ids unconditionally when the slot is Some(None), refusing the call otherwise. Symmetric pair to take_tcp.

  2. In the failure branch of upgrade_tcp_in_place, push directly:

    Err(()) => {
        let mut streams = STREAMS.lock().unwrap();
        // take_tcp left the slot as Some(None); standard free() would
        // be a no-op (slot.is_some() == false), so we push the id back
        // onto the free list ourselves.
        streams.free_ids.push(id);
        return false;
    }
    

    This would need free_ids to be pub(crate) of SocketRegistry, or an inherent helper on SocketRegistry. Less clean.

  3. Track reservation explicitly in a third state (e.g. Reserved variant of an enum slot) — biggest churn, biggest clarity win, probably overkill.

I'd pick (1). It mirrors take_tcp directly and keeps the invariant local. Worth a unit test that allocates an id, takes it, releases it via the new method, allocates again, and asserts you got the same id back — that pins the round-trip semantics that the current TLS-failure path silently breaks.

Design concern — net.tcp.local-port and the LISTENERS/STREAMS id alias

LISTENERS and STREAMS are separate registries with separate free-lists, both starting at id 0. So a Seq-level Socket integer is only unique within one registry. The dispatch order in patch_seq_tcp_local_port (and in close, and elsewhere) is streams first, then listeners.

For close, this is fine: if you net.tcp.close 0 on a stream id 0 that aliases a listener id 0, you close the stream, which is the more recent and more dangerous resource. The listener still exists and can be closed by another path or by process teardown.

For local-port the failure mode is silent and wrong-answer rather than loud and conservative. Concretely:

0 net.tcp.listen                    \ → listener id 0 (LISTENERS[0] = Some)
"x" 80 net.tcp.connect              \ → stream id 0 (STREAMS[0] = Some) — alias!
\ Now ask for the listener's port:
0 net.tcp.local-port                \ returns the *stream's* ephemeral port, not the listener's bound port

In the new tests this can't happen because local-port is called immediately after listen, before any connect — so STREAMS is empty and dispatch falls through to LISTENERS. But in real Seq code that mixes listeners and streams, asking for a listener's local-port after streams have been allocated returns whichever resource happens to live at that numeric id in STREAMS. The user gets a plausible-looking integer (a real port number) with no indication anything is wrong.

The id-aliasing problem is pre-existing — it predates this PR — and the dispatch choice for local-port is consistent with close, which is the right consistency call. But local-port exposes a sharper version of the same flaw. Three mitigations, none required for merge:

  • Document the constraint in STDLIB_REFERENCE.md: "call net.tcp.local-port on a listener before allocating connected streams, or store the result eagerly."
  • Reverse the dispatch order for local-port (LISTENERS first). Asymmetric with close, which is itself an argument against doing it.
  • Tag the Socket integer with a registry-bit at the FFI boundary, encoding listener-vs-stream into the high bits of the i64. Out of scope here; tracked-elsewhere class of work.

I'd take the documentation route as a one-line stdlib note for now and treat the broader id-aliasing fix as its own follow-up.

Smaller observations

  • No direct test coverage for single-flight DNS. The integration tests exercise the path indirectly (TCP-connect-by-hostname, etc.), but a unit test that fires N concurrent dns::resolve calls for the same uncached host and asserts the worker count goes up by 1 (not N) would lock in the architectural property the PR adds. The IN_FLIGHT static is private, so a test would need either a #[cfg(test)] accessor or instrumentation on the worker (job counter). Both are small. The same goes for resolve_to_ips — a unit test asserting resolve_to_ips("127.0.0.1") returns [127.0.0.1] without incrementing a worker-job counter pins the fast-path skip. Optional, not for this PR.
  • Replies type alias is a bit generic. WaitingResolvers or InFlightSenders would read more directly. Nit.
  • is_reusable vs is_quiet — agree with the choice. is_reusable matches the call-site intent (if is_reusable(fd) { return Some(stream) }) and the docstring captures the underlying "quiet" property precisely. is_quiet would have made the reader trace from name → semantics; is_reusable makes the call-site read at a glance.
  • upgrade_tcp_in_place's Err(()) ergonomics. The unit () error type is a fine convention for "the callback consumed the stream and there's nothing to report," but if the failure path ever wants a reason (logging, metrics, propagating a TLS handshake error category), this signature would force a wider refactor. Not a problem now — flagging only.
  • STDLIB_REFERENCE.md local-port paragraph not added to the new "Networking — net.tcp.*" table or the prose section. The compiler-side add_docs insertion in crates/compiler/src/builtins/tcp.rs covers the integrated help, but a reader using the stdlib reference doc as their primary source won't see the new builtin. One row in the table is enough.

Verdict

Land after fixing the id-leak in upgrade_tcp_in_place's failure path. That's a real bug — small surface, slow burn, but the PR is the natural place. The local-port id-aliasing issue is a pre-existing class of bug that this PR inherits rather than introduces; flag in docs and treat the deeper fix as its own follow-up. The encapsulation refactor and the single-flight DNS implementation are the right shape and correctly implemented; the test-port cleanup is a clean bonus that eliminates a flake source. Nice closeout of the PR1–PR5 arc.

## Review — PR #486 (networking polish pass, closes #482) Reviewed against `origin/issue-482`. Tight half-day's work hitting all five items on the issue, plus the test-port cleanup bonus. The encapsulation refactor (`upgrade_tcp_in_place`) and the single-flight DNS implementation are the meaty pieces; the rest are local cleanups. I have **one real bug** to flag and **one design concern** that lands on top of pre-existing surface area. The architectural shape is right. ### What works well - **Single-flight DNS implementation is correct under the races I can construct.** The critical ordering is `CACHE.put` *before* `IN_FLIGHT.remove`, which the inline comment in `worker_loop` calls out explicitly. Late arrivers that race the fanout pop see the freshly-written cache and short-circuit without ever touching the in-flight map. I traced concurrent leader/follower, follower-after-pop, leader-panic-poison, and cache-TTL-expiry-mid-flight; nothing deadlocks and nothing returns a wrong answer. - **Enqueue-failure recovery path is right.** If `JOB_QUEUE.send` errs (worker pool failed to start), the leader removes its own entry from `IN_FLIGHT` and fans out empty `Vec` to every attached sender, including any follower that raced in during the gap between insert and recovery-remove. That follower then gets `Vec::new()` from its `reply_rx.recv()` and returns empty. No infinite block. - **`upgrade_tcp_in_place` encapsulates the take/reinstall/free invariant cleanly.** The "slot reserved across yield" property now lives entirely inside `tcp.rs`. `tls.rs` is just a callback supplier — exactly the shape the PR3 reviewer-framing suggested. `STREAMS`, `SocketRegistry`, and `StreamKind` are all back to private. Good narrowing. - **`resolve_to_ips` centralisation.** The pattern `resolve(host).iter().filter_map(...IpAddr...).collect()` was distributed across `tcp::patch_seq_tcp_connect`, `udp::patch_seq_udp_send_to`, and `http_client::ssrf::validate_url`. Now collapsed to one helper with the IP-literal short-circuit baked in. Less duplication, one place to fix bugs. - **Test-port cleanup is the right kind of bonus.** `0 net.tcp.listen` + `dup net.tcp.local-port` is the standard "let the OS pick" pattern. Removing the hardcoded magic numbers eliminates a parallel-CI flake source. The `[ drop server-body ] strand.spawn drop` idiom for popping the port from the child's stack-clone reads cleanly once you see it twice. - **STDLIB_REFERENCE.md "Fanout collapsing"** correctly rewrites the previous "Known limitation" section to describe the new behaviour. The "IP-literal fast path" paragraph below is the right level of detail for stdlib docs. ### Bug — TLS-handshake-failure leaks a `STREAMS` id `upgrade_tcp_in_place`'s failure path is a silent no-op. After `take_tcp(id)` succeeds, the slot is `Some(None)` — Vec entry exists, value is `None`. On `f(tcp) → Err(())`, the code calls `free_stream(id) → STREAMS.lock().unwrap().free(id)`. But `SocketRegistry::free` is guarded: ```rust fn free(&mut self, id: usize) { if let Some(slot) = self.sockets.get_mut(id) && slot.is_some() // ← guard { *slot = None; self.free_ids.push(id); } } ``` The guard exists to prevent double-free (pushing the same id onto `free_ids` twice would have `allocate()` hand the same id to two callers). But it doesn't distinguish "in-use → freed" from "reserved (set None by `take_tcp`) → freed." After `take_tcp` succeeds, `*slot == None`, so `slot.is_some()` is false and `free()` exits without pushing onto `free_ids`. **The id is leaked.** The Vec entry stays at index `id` holding `None` forever, and the next `allocate()` skips this slot and pushes a new entry, growing the Vec monotonically. Per failed TLS handshake (invalid cert, server reset, ALPN mismatch, etc.), one id leaks. After `MAX_SOCKETS` (10 000) failures the runtime refuses new sockets. Real but slow-burn. This bug **predates this PR** — the same `STREAMS.lock().unwrap().free(socket_id)` call existed in `tls.rs` before the encapsulation refactor and had the same no-op behaviour against a reserved-None slot. But this PR is the one that consolidates the take/reinstall/free dance into `tcp.rs`, which makes it the natural place to fix it. Three options: 1. **Add a `release_reserved(id)` method** that pushes onto `free_ids` unconditionally when the slot is `Some(None)`, refusing the call otherwise. Symmetric pair to `take_tcp`. 2. **In the failure branch of `upgrade_tcp_in_place`, push directly:** ```rust Err(()) => { let mut streams = STREAMS.lock().unwrap(); // take_tcp left the slot as Some(None); standard free() would // be a no-op (slot.is_some() == false), so we push the id back // onto the free list ourselves. streams.free_ids.push(id); return false; } ``` This would need `free_ids` to be `pub(crate)` of `SocketRegistry`, or an inherent helper on `SocketRegistry`. Less clean. 3. **Track reservation explicitly** in a third state (e.g. `Reserved` variant of an enum slot) — biggest churn, biggest clarity win, probably overkill. I'd pick (1). It mirrors `take_tcp` directly and keeps the invariant local. Worth a unit test that allocates an id, takes it, releases it via the new method, allocates again, and asserts you got the same id back — that pins the round-trip semantics that the current TLS-failure path silently breaks. ### Design concern — `net.tcp.local-port` and the LISTENERS/STREAMS id alias `LISTENERS` and `STREAMS` are separate registries with separate free-lists, both starting at id 0. So a Seq-level `Socket` integer is only unique *within* one registry. The dispatch order in `patch_seq_tcp_local_port` (and in `close`, and elsewhere) is **streams first, then listeners**. For `close`, this is fine: if you `net.tcp.close 0` on a stream id 0 that aliases a listener id 0, you close the stream, which is the more recent and more dangerous resource. The listener still exists and can be closed by another path or by process teardown. For `local-port` the failure mode is silent and wrong-answer rather than loud and conservative. Concretely: ```seq 0 net.tcp.listen \ → listener id 0 (LISTENERS[0] = Some) "x" 80 net.tcp.connect \ → stream id 0 (STREAMS[0] = Some) — alias! \ Now ask for the listener's port: 0 net.tcp.local-port \ returns the *stream's* ephemeral port, not the listener's bound port ``` In the new tests this can't happen because `local-port` is called immediately after `listen`, before any connect — so `STREAMS` is empty and dispatch falls through to `LISTENERS`. But in real Seq code that mixes listeners and streams, asking for a listener's local-port after streams have been allocated returns whichever resource happens to live at that numeric id in `STREAMS`. The user gets a plausible-looking integer (a real port number) with no indication anything is wrong. The id-aliasing problem is pre-existing — it predates this PR — and the dispatch choice for `local-port` is *consistent* with `close`, which is the right consistency call. But `local-port` exposes a sharper version of the same flaw. Three mitigations, none required for merge: - Document the constraint in `STDLIB_REFERENCE.md`: "call `net.tcp.local-port` on a listener before allocating connected streams, or store the result eagerly." - Reverse the dispatch order for `local-port` (`LISTENERS` first). Asymmetric with `close`, which is itself an argument against doing it. - Tag the `Socket` integer with a registry-bit at the FFI boundary, encoding listener-vs-stream into the high bits of the i64. Out of scope here; tracked-elsewhere class of work. I'd take the documentation route as a one-line stdlib note for now and treat the broader id-aliasing fix as its own follow-up. ### Smaller observations - **No direct test coverage for single-flight DNS.** The integration tests exercise the path indirectly (TCP-connect-by-hostname, etc.), but a unit test that fires N concurrent `dns::resolve` calls for the same uncached host and asserts the worker count goes up by 1 (not N) would lock in the architectural property the PR adds. The `IN_FLIGHT` static is private, so a test would need either a `#[cfg(test)]` accessor or instrumentation on the worker (job counter). Both are small. The same goes for `resolve_to_ips` — a unit test asserting `resolve_to_ips("127.0.0.1")` returns `[127.0.0.1]` *without* incrementing a worker-job counter pins the fast-path skip. Optional, not for this PR. - **`Replies` type alias is a bit generic.** `WaitingResolvers` or `InFlightSenders` would read more directly. Nit. - **`is_reusable` vs `is_quiet`** — agree with the choice. `is_reusable` matches the call-site intent (`if is_reusable(fd) { return Some(stream) }`) and the docstring captures the underlying "quiet" property precisely. `is_quiet` would have made the reader trace from name → semantics; `is_reusable` makes the call-site read at a glance. - **`upgrade_tcp_in_place`'s `Err(())` ergonomics.** The unit `()` error type is a fine convention for "the callback consumed the stream and there's nothing to report," but if the failure path ever wants a reason (logging, metrics, propagating a TLS handshake error category), this signature would force a wider refactor. Not a problem now — flagging only. - **`STDLIB_REFERENCE.md` `local-port` paragraph not added** to the new "Networking — net.tcp.*" table or the prose section. The compiler-side `add_docs` insertion in `crates/compiler/src/builtins/tcp.rs` covers the integrated help, but a reader using the stdlib reference doc as their primary source won't see the new builtin. One row in the table is enough. ### Verdict **Land after fixing the id-leak in `upgrade_tcp_in_place`'s failure path.** That's a real bug — small surface, slow burn, but the PR is the natural place. The `local-port` id-aliasing issue is a pre-existing class of bug that this PR inherits rather than introduces; flag in docs and treat the deeper fix as its own follow-up. The encapsulation refactor and the single-flight DNS implementation are the right shape and correctly implemented; the test-port cleanup is a clean bonus that eliminates a flake source. Nice closeout of the PR1–PR5 arc.
● Review fixes done. CI green.
All checks were successful
CI - Linux / CI - Linux x86_64 (pull_request) Successful in 10m46s
dfca298cad
Done:

  1. TLS-handshake-failure id leak fixed (tcp.rs). New SocketRegistry::release_reserved(id) is the symmetric pair to take_tcp — it pushes the id back onto
  free_ids when the slot is in the take-reserved shape (Vec entry present, inner Option is None), with a free_ids.contains(&id) guard against double-push.
  upgrade_tcp_in_place's Err(()) branch now calls release_reserved_stream(id) instead of the (silently-no-op) free_stream(id). Inline comment explains why
  plain free() was insufficient: take_tcp already nulled the slot, which made free()'s slot.is_some() guard skip the push.
  2. Unit test for the round-trip (tcp/tests.rs::test_socket_registry_release_reserved_round_trip). Allocates → simulates take_tcp → calls release_reserved
  → allocates again → asserts same id round-trips. Also exercises the double-release case: after take + release_reserved + release_reserved, the next two
  allocations must get different ids — proving the contains guard works. Lock-stops the regression: the test would fail with the pre-fix free() no-op
  behaviour.
  3. net.tcp.local-port table row added to STDLIB_REFERENCE.md TCP section. Also documented that listen accepts 0 for OS-assigned ports while I was there.
  4. Id-aliasing doc note added below the TCP table — explains the streams-first dispatch, gives the practical workaround (call local-port on listeners
  before allocating streams), and references the broader follow-up. Honest about the surface without overstating the danger.
  5. Replies → InFlightSenders in dns.rs. Reads more directly at the use sites.
Author
Owner

Second pass — review-driven fixes look right

Walked the delta from cdb6c9b..dfca298. All four asks landed cleanly.

release_reserved fix for the id-leak (tcp.rs)

The new SocketRegistry::release_reserved(id) is the symmetric pair to take_tcp and the right shape. I traced the precondition matrix:

  • Reserved slot (post-take, slot is Some(None), id not on free list)is_reserved == true, contains == false → push. ✓
  • Reserved slot, called twice in a row — second call: is_reserved == true, contains == true → skip. No double-push. ✓
  • Already-freed slot (Some(None), id on free list)is_reserved == true but contains == true → skip. ✓ Doesn't corrupt.
  • In-use slot (Some(value))is_reserved == false → skip. ✓ Doesn't accidentally release a live id.
  • Out-of-bounds idsockets.get(id) returns None → .unwrap_or(false) → skip. ✓

The "free list is normally short, so the linear scan is cheap" rationale in the docstring is accurate — under steady-state load, free_ids is bounded by the count of recently-released ids and stays tiny. The contains check is the right defensive choice over a HashSet sidecar.

The inline comment on the Err(()) branch in upgrade_tcp_in_place explaining why plain free() was insufficient (take_tcp already nulled the slot) makes the fix self-explanatory for the next reader who lands in this code.

Unit test pins the regression (tcp/tests.rs)

test_socket_registry_release_reserved_round_trip is exactly the test I'd have written. The first half (allocate → simulate take → release_reserved → allocate again → assert same id) would have failed under the pre-fix no-op behaviour because the second allocate would have pushed a new Vec entry at id 1. The second half (double-release → two allocates must get different ids) pins the contains guard — without it, free_ids would hold [0, 0] and both allocates would return id 0, which is the use-after-free in disguise. Both halves lock-step the invariant.

One nit, not for fixing: the test asserts id == 0 for the first allocation, which couples the test to internal allocation order. If a future refactor changed the free-list to a SmallVec or a queue, the test would still pass logically but with a different first id. Acceptable — the alternative (allocating-and-not-asserting-the-id) hides what's being asserted on the next line.

Doc additions for net.tcp.local-port (STDLIB_REFERENCE.md)

The new table row is clean. The id-aliasing note is the right level of disclosure — honest about the surface ("the same Socket integer can refer to different resources depending on which registry holds it") without overstating the danger ("treat Socket ids as resource-local, not globally unique"). The practical workaround ("call net.tcp.local-port on it before allocating any connected streams") is the kind of guidance a user actually needs. References the broader follow-up without committing to a timeline. Good shape.

Bonus catch on the listen 0 documentation — "Pass 0 to let the OS pick" was previously implicit in the runtime but never made it into the stdlib reference. Now any reader landing on the table sees the same idiom the tests use.

RepliesInFlightSenders (dns.rs)

Reads more directly. Trivial change, but worth doing.

Verdict

LGTM. Ship it. The fix is minimal, the test pins the regression, the docs are honest, and the rename is a clean read. The id-aliasing follow-up (a deeper LISTENERS/STREAMS unification, or a registry-bit tag on Socket) remains the right shape to leave deferred — the doc disclosure is sufficient for now.

## Second pass — review-driven fixes look right Walked the delta from `cdb6c9b..dfca298`. All four asks landed cleanly. ### `release_reserved` fix for the id-leak (tcp.rs) The new `SocketRegistry::release_reserved(id)` is the symmetric pair to `take_tcp` and the right shape. I traced the precondition matrix: - **Reserved slot (post-take, slot is `Some(None)`, id not on free list)** — `is_reserved == true`, `contains == false` → push. ✓ - **Reserved slot, called twice in a row** — second call: `is_reserved == true`, `contains == true` → skip. No double-push. ✓ - **Already-freed slot (`Some(None)`, id on free list)** — `is_reserved == true` but `contains == true` → skip. ✓ Doesn't corrupt. - **In-use slot (`Some(value)`)** — `is_reserved == false` → skip. ✓ Doesn't accidentally release a live id. - **Out-of-bounds id** — `sockets.get(id)` returns None → `.unwrap_or(false)` → skip. ✓ The "free list is normally short, so the linear scan is cheap" rationale in the docstring is accurate — under steady-state load, free_ids is bounded by the count of recently-released ids and stays tiny. The contains check is the right defensive choice over a HashSet sidecar. The inline comment on the `Err(())` branch in `upgrade_tcp_in_place` explaining *why* plain `free()` was insufficient (`take_tcp already nulled the slot`) makes the fix self-explanatory for the next reader who lands in this code. ### Unit test pins the regression (tcp/tests.rs) `test_socket_registry_release_reserved_round_trip` is exactly the test I'd have written. The first half (allocate → simulate take → release_reserved → allocate again → assert same id) would have failed under the pre-fix no-op behaviour because the second allocate would have pushed a new Vec entry at id 1. The second half (double-release → two allocates must get different ids) pins the `contains` guard — without it, free_ids would hold `[0, 0]` and both allocates would return id 0, which is the use-after-free in disguise. Both halves lock-step the invariant. One nit, not for fixing: the test asserts `id == 0` for the first allocation, which couples the test to internal allocation order. If a future refactor changed the free-list to a SmallVec or a queue, the test would still pass logically but with a different first id. Acceptable — the alternative (allocating-and-not-asserting-the-id) hides what's being asserted on the next line. ### Doc additions for `net.tcp.local-port` (STDLIB_REFERENCE.md) The new table row is clean. The id-aliasing note is the right level of disclosure — honest about the surface ("the same `Socket` integer can refer to different resources depending on which registry holds it") without overstating the danger ("treat `Socket` ids as resource-local, not globally unique"). The practical workaround ("call `net.tcp.local-port` on it *before* allocating any connected streams") is the kind of guidance a user actually needs. References the broader follow-up without committing to a timeline. Good shape. Bonus catch on the `listen 0` documentation — "Pass `0` to let the OS pick" was previously implicit in the runtime but never made it into the stdlib reference. Now any reader landing on the table sees the same idiom the tests use. ### `Replies` → `InFlightSenders` (dns.rs) Reads more directly. Trivial change, but worth doing. ### Verdict LGTM. Ship it. The fix is minimal, the test pins the regression, the docs are honest, and the rename is a clean read. The id-aliasing follow-up (a deeper LISTENERS/STREAMS unification, or a registry-bit tag on Socket) remains the right shape to leave deferred — the doc disclosure is sufficient for now.
navicore deleted branch issue-482 2026-05-14 00:44:41 +00:00
Sign in to join this conversation.
No description provided.