● All three test anchors landed. CI fully green. #487

Merged
navicore merged 2 commits from issue-483 into main 2026-05-14 13:03:18 +00:00
Owner

Done for issue #483:

  1. #[cfg(test)] hooks in dns.rs — RESOLVE_CALL_COUNT (atomic) + SCRIPTED_RESPONSES (queue), with
    push_scripted_response / clear_scripted_responses / reset_resolve_call_count / resolve_call_count helpers.
    resolve() increments the counter on every call and pops from the scripted queue (when non-empty) before
    consulting cache or worker pool. Zero footprint in release builds.
  2. SSRF DNS-rebinding closure test — pushes a scripted 224.0.0.1 (multicast, fail-fast ENETUNREACH in
    microseconds), issues perform_request("GET", "http://example.com/", None), asserts dns::resolve_call_count() ==
  3. A regression that passes host: String to connect (re-resolving downstream) would bump the count to 2+ and
    fail this test loudly. Cost: ~140µs. Serialised with test_ssrf_allows_public_urls via
    #[serial_test::serial(dns_global_state)] so concurrent counter mutations can't false-positive.
  4. ServerMode::CloseOnce + Connection: close pool-eviction test — new variant emits Connection: close + body
    then returns from the inner spawn (dropping stream, sending FIN). Test issues two requests, asserts accept_count
    == 2 (proving the pool dropped the close-marked entry instead of trying to reuse it). accept_count == 1 would
    mean pool::release ignored keep_alive=false.
  5. Happy-path HTTPS end-to-end test against a same-process rustls server:
    - rcgen added as dev-dep with default-features = false, features = ["pem", "ring"] so the cert toolchain
    shares the runtime's existing ring provider.
    - #[cfg(test)] install_test_tls_config / clear_test_tls_config hooks in tls.rs, with a new
    current_tls_config() helper that the test override transparently routes through.
    - build_tls_test_pair() generates a self-signed cert (SAN=localhost) at test time and produces matching
    ServerConfig + ClientConfig (where the client trusts the test cert as the only root).
    - spawn_tls_test_server uses std::thread + std::net::TcpListener instead of may. The TLS handshake is
    multi-round-trip; running the server as a may coroutine alongside hundreds of other parallel tests saturated the
    may worker pool and starved the handshake. A dedicated OS thread keeps the kernel-scheduled server runnable
    regardless of may load. The test passes in 0.13s standalone AND under the full just ci runtime suite.
Done for issue #483: 1. #[cfg(test)] hooks in dns.rs — RESOLVE_CALL_COUNT (atomic) + SCRIPTED_RESPONSES (queue), with push_scripted_response / clear_scripted_responses / reset_resolve_call_count / resolve_call_count helpers. resolve() increments the counter on every call and pops from the scripted queue (when non-empty) before consulting cache or worker pool. Zero footprint in release builds. 2. SSRF DNS-rebinding closure test — pushes a scripted 224.0.0.1 (multicast, fail-fast ENETUNREACH in microseconds), issues perform_request("GET", "http://example.com/", None), asserts dns::resolve_call_count() == 1. A regression that passes host: String to connect (re-resolving downstream) would bump the count to 2+ and fail this test loudly. Cost: ~140µs. Serialised with test_ssrf_allows_public_urls via #[serial_test::serial(dns_global_state)] so concurrent counter mutations can't false-positive. 3. ServerMode::CloseOnce + Connection: close pool-eviction test — new variant emits Connection: close + body then returns from the inner spawn (dropping stream, sending FIN). Test issues two requests, asserts accept_count == 2 (proving the pool dropped the close-marked entry instead of trying to reuse it). accept_count == 1 would mean pool::release ignored keep_alive=false. 4. Happy-path HTTPS end-to-end test against a same-process rustls server: - rcgen added as dev-dep with default-features = false, features = ["pem", "ring"] so the cert toolchain shares the runtime's existing ring provider. - #[cfg(test)] install_test_tls_config / clear_test_tls_config hooks in tls.rs, with a new current_tls_config() helper that the test override transparently routes through. - build_tls_test_pair() generates a self-signed cert (SAN=localhost) at test time and produces matching ServerConfig + ClientConfig (where the client trusts the test cert as the only root). - spawn_tls_test_server uses std::thread + std::net::TcpListener instead of may. The TLS handshake is multi-round-trip; running the server as a may coroutine alongside hundreds of other parallel tests saturated the may worker pool and starved the handshake. A dedicated OS thread keeps the kernel-scheduled server runnable regardless of may load. The test passes in 0.13s standalone AND under the full just ci runtime suite.
● All three test anchors landed. CI fully green.
All checks were successful
CI - Linux / CI - Linux x86_64 (pull_request) Successful in 10m48s
a88b53434d
Done for issue #483:

  1. #[cfg(test)] hooks in dns.rs — RESOLVE_CALL_COUNT (atomic) + SCRIPTED_RESPONSES (queue), with
  push_scripted_response / clear_scripted_responses / reset_resolve_call_count / resolve_call_count helpers.
  resolve() increments the counter on every call and pops from the scripted queue (when non-empty) before
  consulting cache or worker pool. Zero footprint in release builds.
  2. SSRF DNS-rebinding closure test — pushes a scripted 224.0.0.1 (multicast, fail-fast ENETUNREACH in
  microseconds), issues perform_request("GET", "http://example.com/", None), asserts dns::resolve_call_count() ==
  1. A regression that passes host: String to connect (re-resolving downstream) would bump the count to 2+ and
  fail this test loudly. Cost: ~140µs. Serialised with test_ssrf_allows_public_urls via
  #[serial_test::serial(dns_global_state)] so concurrent counter mutations can't false-positive.
  3. ServerMode::CloseOnce + Connection: close pool-eviction test — new variant emits Connection: close + body
  then returns from the inner spawn (dropping stream, sending FIN). Test issues two requests, asserts accept_count
   == 2 (proving the pool dropped the close-marked entry instead of trying to reuse it). accept_count == 1 would
  mean pool::release ignored keep_alive=false.
  4. Happy-path HTTPS end-to-end test against a same-process rustls server:
    - rcgen added as dev-dep with default-features = false, features = ["pem", "ring"] so the cert toolchain
  shares the runtime's existing ring provider.
    - #[cfg(test)] install_test_tls_config / clear_test_tls_config hooks in tls.rs, with a new
  current_tls_config() helper that the test override transparently routes through.
    - build_tls_test_pair() generates a self-signed cert (SAN=localhost) at test time and produces matching
  ServerConfig + ClientConfig (where the client trusts the test cert as the only root).
    - spawn_tls_test_server uses std::thread + std::net::TcpListener instead of may. The TLS handshake is
  multi-round-trip; running the server as a may coroutine alongside hundreds of other parallel tests saturated the
   may worker pool and starved the handshake. A dedicated OS thread keeps the kernel-scheduled server runnable
  regardless of may load. The test passes in 0.13s standalone AND under the full just ci runtime suite.
Author
Owner

Review — PR #487 (test anchors for un-pinned architectural properties, closes #483)

Reviewed against origin/issue-483. Three test anchors landed for the deferred items from PR1–PR5 reviews. One of them pins the property cleanly; one has clever fast-fail-via-multicast that I want to walk through in detail; one doesn't actually anchor what it claims to anchor because of a downstream code path that masks the regression on localhost. The test infrastructure (cfg(test) hooks, rcgen dev-dep, std::thread server) is well-designed.

Solid: DNS hooks design (dns.rs)

The RESOLVE_CALL_COUNT + SCRIPTED_RESPONSES pair is the right shape and the gating is correct:

  • Both statics are #[cfg(test)] — zero footprint in release builds.
  • The counter is AtomicUsize with SeqCst — fine for the assertion granularity needed; nothing here would benefit from Relaxed.
  • RESOLVE_CALL_COUNT.fetch_add lands at the top of resolve() (before the empty-hostname short-circuit), so an empty-hostname call still bumps the counter. Reasonable — it lets a future test catch "validator passes empty host through to resolve" regressions.
  • The scripted-response check is positioned correctly: after the empty-hostname check, before the cache and the in-flight map. So a scripted response bypasses both — a test gets deterministic results regardless of whether the cache happens to be hot.

One nit on visibility: pub(crate) for the four test helpers, which means they're crate-visible at all times in the test build. If someone in tcp.rs or elsewhere imports and uses dns::push_scripted_response outside #[cfg(test)] it won't compile (the function is gated), but the IDE will offer it as completion. A pub(crate) inside a #[cfg(test)] mod test_hooks { ... } would narrow the surface more, at the cost of an extra layer. Optional.

Solid: SSRF rebinding closure test (anchors the architectural win)

This is the test the PR4 second-pass review specifically asked for, and the anchoring is correct. I traced the property:

  • perform_request("GET", "http://example.com/", None)validate_urlresolve_to_ips("example.com")resolve("example.com") (counter += 1) → scripted-response pop returns ["224.0.0.1"].
  • validate_url runs the SSRF check on [224.0.0.1]. is_dangerous_ipv4(224.0.0.1) is false (not loopback, not 10.x, not 172.16-31, not 192.168, not 169.254, not broadcast — multicast isn't in is_dangerous_ipv4's list). So validate_url returns Ok with addrs=[224.0.0.1], host="example.com".
  • perform_request_with_target dials 224.0.0.1:80. Kernel returns ENETUNREACH fast (microseconds on Linux). Connect path returns error.
  • dns::resolve_call_count() returns 1. ✓

The regression target — passing host: String to connect and re-resolving — would bump the counter to ≥ 2 and fail loudly. Good lock-stop.

A few observations:

  • The multicast-vs-TEST-NET-1 trade-off note in the test is exactly right. TEST-NET-1 (192.0.2.0/24) routes through the default gateway on boxes that don't explicitly null-route it and burns the SYN timeout (~60s). 224.0.0.1 gets a fast-path kernel rejection on Linux. The comment captures the trade-off so the next reader doesn't "simplify" by switching ranges. Worth keeping.

  • Platform sensitivity is implicit. Linux's TCP-to-multicast fast-fail is the property the test relies on for cheap iteration. The CI is Linux per the project history; if this ever wants to run on macOS/BSD, the multicast behaviour may differ (could take longer). Not a problem now — flagging only.

  • Serial-tag coverage is correct. I walked through every test that calls into SSRF / DNS:

    • test_ssrf_blocks_localhost (etc.) — localhost rejected by the lowercase string check in validate_url before resolve. No counter bump.
    • All 127.0.0.1 / 10.x / 192.168.x / 169.254.x / multicast literal tests — resolve_to_ips short-circuits via the IP-literal fast path. No counter bump.
    • Scheme-rejection tests (gopher://, ftp://, file:///) — rejected at scheme parse. No counter bump.
    • test_ssrf_allows_public_urls — calls validate_url("https://example.com/") and "https://httpbin.org/get", both hostname (non-literal). Real resolve() calls. Bumps counter. Correctly tagged #[serial_test::serial(dns_global_state)].

    So only those two tests bump the counter, and they're both in the same serial group. Tagging is right.

Concerning: Connection: close pool-eviction test doesn't catch the targeted regression

The assertion message says: "accept_count=1 means the pool ignored keep_alive=false and reused a doomed entry."

But this isn't true on localhost. Walk both branches:

No regression (correct release):

  1. r1 completes. release(key, stream, keep_alive=false). Early return — entry dropped.
  2. r2 starts. checkout(key) — empty queue, returns None.
  3. r2 dials fresh → server accepts → accept_count == 2. ✓

Targeted regression (release ignores keep_alive=false):

  1. r1 completes. release(key, stream, keep_alive=false) — buggy: pools the entry anyway.
  2. r2 starts. checkout(key) — pops entry. Calls is_reusable(fd). The server thread has returned, dropped its TcpStream, and the kernel has sent FIN. On localhost FIN propagates in microseconds — by the time checkout runs, poll(POLLIN, 0) returns rc=1 with POLLIN set and 0 bytes pending. is_reusable correctly returns false. Entry discarded.
  3. r2 dials fresh → server accepts → accept_count == 2. ✓ (same!)

The test passes in both branches. The only way to get accept_count == 1 is a compound regression where BOTH release ignores keep_alive=false AND is_reusable misses the FIN (or the FIN is genuinely delayed past checkout). On localhost the second condition is essentially impossible — kernel FIN propagation is sub-microsecond.

The test does catch something (the compound regression, or a regression on a slow network), but it doesn't lock-step the property the test name and comment claim. A future "release: pool everything regardless of keep_alive" regression would slip through CI because is_reusable rescues it on the localhost test box.

Suggested fix — assert directly on pool state. Expose a pool::total_for_test() (or pool::idle_count_for_test(key)) accessor and assert it's 0 between r1 and r2. That captures the actual release-path behaviour without depending on a downstream check. Something like:

let r1 = perform_request_with_target("GET", target1, None);
assert_eq!(unwrap_response(&r1).0, 200);

// After r1's release, the close-marked entry must NOT be pooled.
// This is what the test exists to lock in — independent of
// whether is_reusable would later have caught a doomed entry.
assert_eq!(
    pool::idle_count_for_test(&pool_key_for(port)),
    0,
    "Connection: close response must not survive release()"
);

This is ~15 lines of accessor plus a PoolKey test-helper to construct the key. Worth the extra mile because the test as written reads like a regression anchor but functions like a compound-failure anchor.

(Caveat: the suggested accessor needs a pub(crate) fn pool_key_for_test(scheme, host, port) -> PoolKey or similar — PoolKey is private to the pool module. Pick whichever is least surface-area to expose.)

Good with one hardening: Happy-path HTTPS test

The architecture is well-thought-through. Specifically:

  • rcgen as dev-dep with default-features = false, features = ["pem", "ring"] is the right choice — it pins the cert toolchain to the same ring provider rustls uses, avoiding a dual-provider build (which would have surfaced as a "no provider installed" runtime panic from rustls).
  • std::thread + std::net::TcpListener for the server, with the multi-paragraph comment explaining the may-scheduler-congestion reason, is exactly right. The reasoning ("the TLS handshake is a multi-round-trip dance ... saturating the may worker pool starves the handshake") is real — I've debugged the same shape elsewhere. Future readers will want to "clean up" by switching to may; the comment prevents that.
  • The cfg(test) trust-root override via install_test_tls_config / current_tls_config() is minimal, vanishes from release, and the indirection through current_tls_config() keeps build_tls's call site idiomatic.
  • build_tls_test_pair correctly adds the self-signed cert to the client's RootCertStore as its only root. The SAN=localhost matches the SNI the client sends (host: "localhost" in the ValidatedTarget). The test asserts on response body, not just status — so a regression that returns a valid 200 but loses the body would still fail. Good.

Hardening — serialise the test and guard against panic-leaks. The override mutates process-wide TEST_TLS_CONFIG. Two improvements:

  1. Tag the test #[serial_test::serial(tls_global_state)]. No other test currently uses TLS configs, but the moment a second TLS test lands without coordination, this is a flake source.

  2. Use an RAII guard for clear_test_tls_config:

    struct TlsConfigGuard;
    impl Drop for TlsConfigGuard {
        fn drop(&mut self) { crate::tls::clear_test_tls_config(); }
    }
    let _guard = TlsConfigGuard;  // before install_test_tls_config
    

    Today the test calls clear_test_tls_config() before the assertions to survive a panic in perform_request_with_target. But a panic in build_tls_test_pair (e.g. rcgen::generate_simple_self_signed failing under a future rcgen API change) or in unwrap_response after the test config is set would leak the override. RAII guard makes the cleanup unconditional.

    Optional — neither has hit production. Flagging because the cost is small and the future-flake protection is real.

Smaller observations

  • **scripted-response check happens BEFORE the cache.** So a scripted response bypasses the cache entirely (it doesn't populate the cache, and a real lookup after a scripted one wouldn't be served from the cache as a side effect of the scripted one). The intent is "test deterministic" — correct — but worth noting that scripted responses are *not* equivalent to "pretend the cache returned this." If a future test wants to test cache-write behaviour, it'd need to populate the cache directly, not via push_scripted_response`.
  • ServerMode::CloseOnce's server-side flow uses return to drop the stream and send FIN. Reading the code, the return exits the inner per-connection move || (which is a may coroutine spawned for each accept). Dropping the stream sends FIN. The outer accept loop continues. This is correct. The comment captures the intent.
  • https_round_trip_against_same_process_rustls_server doesn't pool::clear_for_test() at end. The test pools the HTTPS connection (if at all — server emits Connection: close, so probably not). On the off chance a pooled entry survives, it would leak into the next test. Optional cleanup.
  • build_tls_test_pair returns Arc<rustls::ServerConfig> and Arc<rustls::ClientConfig> — both Arc'd, which lets the server config be cloned per-connection inside spawn_tls_test_server. Right shape.

Verdict

LGTM after one fix and one hardening:

  • Required: strengthen the Connection: close test to assert pool state directly (currently the test name and comment describe a property the test doesn't pin — is_reusable's FIN detection masks the regression on localhost).
  • Recommended: serial tag + RAII guard for the HTTPS test.

The DNS-rebinding closure anchor is exactly the test the PR4 review asked for, well-executed. The TLS happy-path test is a solid hermetic e2e against a same-process rustls server, with the right scheduler/provider choices and good in-code documentation. The DNS hooks themselves are clean and don't leak into release.

Closes a real coverage gap in the PR1–PR5 networking arc.

## Review — PR #487 (test anchors for un-pinned architectural properties, closes #483) Reviewed against `origin/issue-483`. Three test anchors landed for the deferred items from PR1–PR5 reviews. One of them pins the property cleanly; one has clever fast-fail-via-multicast that I want to walk through in detail; one **doesn't actually anchor what it claims to anchor** because of a downstream code path that masks the regression on localhost. The test infrastructure (cfg(test) hooks, rcgen dev-dep, std::thread server) is well-designed. ### Solid: DNS hooks design (dns.rs) The `RESOLVE_CALL_COUNT` + `SCRIPTED_RESPONSES` pair is the right shape and the gating is correct: - Both statics are `#[cfg(test)]` — zero footprint in release builds. - The counter is `AtomicUsize` with `SeqCst` — fine for the assertion granularity needed; nothing here would benefit from `Relaxed`. - `RESOLVE_CALL_COUNT.fetch_add` lands at the *top* of `resolve()` (before the empty-hostname short-circuit), so an empty-hostname call still bumps the counter. Reasonable — it lets a future test catch "validator passes empty host through to resolve" regressions. - The scripted-response check is positioned correctly: *after* the empty-hostname check, *before* the cache and the in-flight map. So a scripted response bypasses both — a test gets deterministic results regardless of whether the cache happens to be hot. One nit on visibility: `pub(crate)` for the four test helpers, which means they're crate-visible at all times in the test build. If someone in `tcp.rs` or elsewhere imports and uses `dns::push_scripted_response` outside `#[cfg(test)]` it won't compile (the function is gated), but the IDE will offer it as completion. A `pub(crate)` inside a `#[cfg(test)] mod test_hooks { ... }` would narrow the surface more, at the cost of an extra layer. Optional. ### Solid: SSRF rebinding closure test (anchors the architectural win) This is the test the PR4 second-pass review specifically asked for, and the anchoring is correct. I traced the property: - `perform_request("GET", "http://example.com/", None)` → `validate_url` → `resolve_to_ips("example.com")` → `resolve("example.com")` (counter += 1) → scripted-response pop returns `["224.0.0.1"]`. - `validate_url` runs the SSRF check on `[224.0.0.1]`. `is_dangerous_ipv4(224.0.0.1)` is false (not loopback, not 10.x, not 172.16-31, not 192.168, not 169.254, not broadcast — multicast isn't in `is_dangerous_ipv4`'s list). So `validate_url` returns `Ok` with `addrs=[224.0.0.1]`, `host="example.com"`. - `perform_request_with_target` dials 224.0.0.1:80. Kernel returns ENETUNREACH fast (microseconds on Linux). Connect path returns error. - `dns::resolve_call_count()` returns 1. ✓ The regression target — passing `host: String` to connect and re-resolving — would bump the counter to ≥ 2 and fail loudly. Good lock-stop. A few observations: - **The multicast-vs-TEST-NET-1 trade-off note in the test is exactly right.** TEST-NET-1 (192.0.2.0/24) routes through the default gateway on boxes that don't explicitly null-route it and burns the SYN timeout (~60s). 224.0.0.1 gets a fast-path kernel rejection on Linux. The comment captures the trade-off so the next reader doesn't "simplify" by switching ranges. Worth keeping. - **Platform sensitivity is implicit.** Linux's TCP-to-multicast fast-fail is the property the test relies on for cheap iteration. The CI is Linux per the project history; if this ever wants to run on macOS/BSD, the multicast behaviour may differ (could take longer). Not a problem now — flagging only. - **Serial-tag coverage is correct.** I walked through every test that calls into SSRF / DNS: - `test_ssrf_blocks_localhost` (etc.) — localhost rejected by the lowercase string check in `validate_url` **before** `resolve`. No counter bump. - All `127.0.0.1` / `10.x` / `192.168.x` / `169.254.x` / multicast literal tests — `resolve_to_ips` short-circuits via the IP-literal fast path. No counter bump. - Scheme-rejection tests (`gopher://`, `ftp://`, `file:///`) — rejected at scheme parse. No counter bump. - `test_ssrf_allows_public_urls` — calls `validate_url("https://example.com/")` and `"https://httpbin.org/get"`, both hostname (non-literal). Real `resolve()` calls. **Bumps counter.** Correctly tagged `#[serial_test::serial(dns_global_state)]`. So only those two tests bump the counter, and they're both in the same serial group. Tagging is right. ### Concerning: Connection: close pool-eviction test doesn't catch the targeted regression The assertion message says: `"accept_count=1 means the pool ignored keep_alive=false and reused a doomed entry."` But this isn't true on localhost. Walk both branches: **No regression (correct release):** 1. r1 completes. `release(key, stream, keep_alive=false)`. Early return — entry dropped. 2. r2 starts. `checkout(key)` — empty queue, returns None. 3. r2 dials fresh → server accepts → `accept_count == 2`. ✓ **Targeted regression (release ignores keep_alive=false):** 1. r1 completes. `release(key, stream, keep_alive=false)` — buggy: pools the entry anyway. 2. r2 starts. `checkout(key)` — pops entry. Calls `is_reusable(fd)`. The server thread has returned, dropped its TcpStream, and the kernel has sent FIN. On localhost FIN propagates in microseconds — by the time `checkout` runs, `poll(POLLIN, 0)` returns rc=1 with POLLIN set and 0 bytes pending. `is_reusable` correctly returns false. Entry discarded. 3. r2 dials fresh → server accepts → `accept_count == 2`. ✓ (same!) The test **passes in both branches**. The only way to get `accept_count == 1` is a *compound* regression where BOTH `release` ignores `keep_alive=false` AND `is_reusable` misses the FIN (or the FIN is genuinely delayed past `checkout`). On localhost the second condition is essentially impossible — kernel FIN propagation is sub-microsecond. The test does catch *something* (the compound regression, or a regression on a slow network), but it doesn't lock-step the property the test name and comment claim. A future "release: pool everything regardless of keep_alive" regression would slip through CI because `is_reusable` rescues it on the localhost test box. **Suggested fix — assert directly on pool state.** Expose a `pool::total_for_test()` (or `pool::idle_count_for_test(key)`) accessor and assert it's 0 between r1 and r2. That captures the actual release-path behaviour without depending on a downstream check. Something like: ```rust let r1 = perform_request_with_target("GET", target1, None); assert_eq!(unwrap_response(&r1).0, 200); // After r1's release, the close-marked entry must NOT be pooled. // This is what the test exists to lock in — independent of // whether is_reusable would later have caught a doomed entry. assert_eq!( pool::idle_count_for_test(&pool_key_for(port)), 0, "Connection: close response must not survive release()" ); ``` This is ~15 lines of accessor plus a `PoolKey` test-helper to construct the key. Worth the extra mile because the test as written reads like a regression anchor but functions like a compound-failure anchor. (Caveat: the suggested accessor needs a `pub(crate) fn pool_key_for_test(scheme, host, port) -> PoolKey` or similar — `PoolKey` is private to the pool module. Pick whichever is least surface-area to expose.) ### Good with one hardening: Happy-path HTTPS test The architecture is well-thought-through. Specifically: - **`rcgen` as dev-dep with `default-features = false, features = ["pem", "ring"]`** is the right choice — it pins the cert toolchain to the same `ring` provider rustls uses, avoiding a dual-provider build (which would have surfaced as a "no provider installed" runtime panic from rustls). - **`std::thread` + `std::net::TcpListener` for the server**, with the multi-paragraph comment explaining the may-scheduler-congestion reason, is exactly right. The reasoning ("the TLS handshake is a multi-round-trip dance ... saturating the may worker pool starves the handshake") is real — I've debugged the same shape elsewhere. Future readers will want to "clean up" by switching to may; the comment prevents that. - **The `cfg(test)` trust-root override via `install_test_tls_config` / `current_tls_config()`** is minimal, vanishes from release, and the indirection through `current_tls_config()` keeps `build_tls`'s call site idiomatic. - **`build_tls_test_pair` correctly adds the self-signed cert to the client's `RootCertStore` as its only root.** The SAN=localhost matches the SNI the client sends (`host: "localhost"` in the ValidatedTarget). The test asserts on response body, not just status — so a regression that returns a valid 200 but loses the body would still fail. Good. **Hardening — serialise the test and guard against panic-leaks.** The override mutates process-wide `TEST_TLS_CONFIG`. Two improvements: 1. Tag the test `#[serial_test::serial(tls_global_state)]`. No other test currently uses TLS configs, but the moment a second TLS test lands without coordination, this is a flake source. 2. Use an RAII guard for `clear_test_tls_config`: ```rust struct TlsConfigGuard; impl Drop for TlsConfigGuard { fn drop(&mut self) { crate::tls::clear_test_tls_config(); } } let _guard = TlsConfigGuard; // before install_test_tls_config ``` Today the test calls `clear_test_tls_config()` before the assertions to survive a panic in `perform_request_with_target`. But a panic in `build_tls_test_pair` (e.g. `rcgen::generate_simple_self_signed` failing under a future rcgen API change) or in `unwrap_response` after the test config is set would leak the override. RAII guard makes the cleanup unconditional. Optional — neither has hit production. Flagging because the cost is small and the future-flake protection is real. ### Smaller observations - **`scripted-response check happens BEFORE the cache.** So a scripted response bypasses the cache entirely (it doesn't populate the cache, and a real lookup after a scripted one wouldn't be served from the cache as a side effect of the scripted one). The intent is "test deterministic" — correct — but worth noting that scripted responses are *not* equivalent to "pretend the cache returned this." If a future test wants to test cache-write behaviour, it'd need to populate the cache directly, not via `push_scripted_response`. - **`ServerMode::CloseOnce`'s server-side flow uses `return` to drop the stream and send FIN.** Reading the code, the `return` exits the inner per-connection `move ||` (which is a may coroutine spawned for each accept). Dropping the stream sends FIN. The outer accept loop continues. This is correct. The comment captures the intent. - **`https_round_trip_against_same_process_rustls_server` doesn't `pool::clear_for_test()` at end.** The test pools the HTTPS connection (if at all — server emits `Connection: close`, so probably not). On the off chance a pooled entry survives, it would leak into the next test. Optional cleanup. - **`build_tls_test_pair` returns `Arc<rustls::ServerConfig>` and `Arc<rustls::ClientConfig>`** — both Arc'd, which lets the server config be cloned per-connection inside `spawn_tls_test_server`. Right shape. ### Verdict LGTM after one fix and one hardening: - **Required**: strengthen the Connection: close test to assert pool state directly (currently the test name and comment describe a property the test doesn't pin — `is_reusable`'s FIN detection masks the regression on localhost). - **Recommended**: serial tag + RAII guard for the HTTPS test. The DNS-rebinding closure anchor is exactly the test the PR4 review asked for, well-executed. The TLS happy-path test is a solid hermetic e2e against a same-process rustls server, with the right scheduler/provider choices and good in-code documentation. The DNS hooks themselves are clean and don't leak into release. Closes a real coverage gap in the PR1–PR5 networking arc.
● Review-driven fixes applied. CI green.
All checks were successful
CI - Linux / CI - Linux x86_64 (pull_request) Successful in 10m47s
1845ebcef3
Done:

  1. Required — Connection: close test now pins the property it claims. The reviewer's analysis was correct: on
  localhost, is_reusable's POLLIN check catches the FIN sub-microsecond, so a regression where pool::release
  ignored keep_alive=false would silently pass the old accept-count assertion. Fix:
    - New pub(crate) fn pool::idle_count_for_test(&PoolKey) -> usize test-only accessor.
    - Test now asserts idle_count_for_test(&key) == 0 immediately after r1 (before r2 dials). That's the targeted
  release-path property, independent of is_reusable's downstream rescue.
    - Kept the original accept_count == 2 assertion as a compound-regression catch with a comment explaining it's
  secondary to the new pool-state assertion.
  2. Recommended — HTTPS test hardening:
    - #[serial_test::serial(tls_global_state)] tag so a future second TLS test can't clobber TEST_TLS_CONFIG
  mid-flight.
    - TestTlsConfigGuard RAII type wrapping clear_test_tls_config() in Drop. Installed right after
  install_test_tls_config and dropped on every exit path — including panic. The earlier clear_test_tls_config()
  before assertions worked for normal failure modes but leaked the override on a panic in build_tls_test_pair,
  unwrap_response, etc.
    - Defensive pool::clear_for_test() at the end. The server sends Connection: close so nothing should pool, but
  cheap belt-and-suspenders against future change.

  Skipped (reviewer marked optional):
  - pub(crate) mod test_hooks { ... } narrowing for the dns test helpers (small gain, optional).
  - macOS multicast platform note (informational; CI is Linux-only).
Author
Owner

Second pass — review-driven fixes look right

Walked the delta from a88b534..1845ebc. Both items landed cleanly.

Pool-state anchor for the Connection: close test

pool::idle_count_for_test(&PoolKey) is the right test-only accessor — #[cfg(test)]-gated, takes the lock the same way checkout/release do, returns 0 for a missing key. The PoolKey construction in the test (scheme: Http, host: "127.0.0.1", port) matches what perform_validated builds at request.rs:53 (scheme: target.scheme, host: target.host.clone(), port: target.port), and loopback_target produces exactly those fields. So the test's key and the runtime's key collide as intended.

The assertion ordering is right: key is built before r1, the assertion runs immediately after r1 returns and before r2 dials. That window is when the targeted invariant ("release with keep_alive=false must drop the entry") is testable — after r2 dials, pool state would be muddied by the second cycle's release.

I traced both branches against the new assertion:

  • Regression (release pools despite keep_alive=false): after r1, pool.idle[key].len() == 1. idle_count_for_test returns 1. Assertion fails loudly with the targeted message. ✓ Now caught.
  • No regression: release(..., keep_alive=false) early-returns at pool.rs:100. Entry not pooled. idle_count_for_test returns 0. Assertion passes. ✓

The accept_count == 2 assertion is kept correctly as a secondary check for compound regressions (both release and is_reusable broken). The updated failure message is more honest — it no longer claims to catch what the new assertion catches. Good restructuring.

One small thing worth flagging (not for fixing): the test now reaches into pool::PoolKey's pub(crate) fields to build the key by hand. That couples this test to the pool's internal struct shape. If PoolKey ever gains a fourth field (e.g. sni: Option<String> for future HTTPS host-vs-SNI distinction), this test breaks at compile time — which is actually the right failure mode (test is forced to update), so this isn't a problem, just a coupling worth knowing about.

RAII guard + serial tag for the HTTPS test

The TestTlsConfigGuard placement is precise:

  • After install_test_tls_config(client_config) — because before install, there's no state to clean up.
  • Before any code that might panic on the way to the explicit clear — which used to be the only cleanup path.

The doc comment captures this ordering ("Must be created AFTER install ... and BEFORE any panicking code below.") so a future reader doesn't reorder out of "tidiness" and reopen the leak. Good defensive comment.

The #[serial_test::serial(tls_global_state)] tag is right. No other test currently mutates TEST_TLS_CONFIG, but the group is named so it's discoverable for the next TLS-touching test. Self-documenting.

The trailing pool::clear_for_test() is the belt-and-suspenders the comment claims — the server emits Connection: close so the pool shouldn't have an entry, but the cleanup is unconditional. The // _tls_guard drops here trailing comment is exactly the kind of "noise that earns its keep" — it stops a future reader from moving the guard or wondering why the explicit clear is gone.

Skipped items

Both correctly judged optional. The test_hooks mod narrowing was a "small gain" cosmetic suggestion; the macOS multicast note was informational and the CI is Linux-only. No issue.

Verdict

LGTM. Ship it.

The Connection: close test now pins the release-path property directly via pool state, independent of any downstream rescue. The HTTPS test survives panics on every reachable path. The PR1–PR5 test-coverage arc is properly closed.

## Second pass — review-driven fixes look right Walked the delta from `a88b534..1845ebc`. Both items landed cleanly. ### Pool-state anchor for the Connection: close test `pool::idle_count_for_test(&PoolKey)` is the right test-only accessor — `#[cfg(test)]`-gated, takes the lock the same way `checkout`/`release` do, returns 0 for a missing key. The PoolKey construction in the test (`scheme: Http`, `host: "127.0.0.1"`, `port`) matches what `perform_validated` builds at request.rs:53 (`scheme: target.scheme`, `host: target.host.clone()`, `port: target.port`), and `loopback_target` produces exactly those fields. So the test's key and the runtime's key collide as intended. The assertion ordering is right: `key` is built before `r1`, the assertion runs immediately after `r1` returns and before `r2` dials. That window is when the targeted invariant ("release with `keep_alive=false` must drop the entry") is testable — after r2 dials, pool state would be muddied by the second cycle's release. I traced both branches against the new assertion: - **Regression (release pools despite `keep_alive=false`)**: after r1, `pool.idle[key].len() == 1`. `idle_count_for_test` returns 1. Assertion fails loudly with the targeted message. ✓ Now caught. - **No regression**: `release(..., keep_alive=false)` early-returns at pool.rs:100. Entry not pooled. `idle_count_for_test` returns 0. Assertion passes. ✓ The `accept_count == 2` assertion is kept correctly as a secondary check for compound regressions (both `release` and `is_reusable` broken). The updated failure message is more honest — it no longer claims to catch what the new assertion catches. Good restructuring. One small thing worth flagging (not for fixing): the test now reaches into `pool::PoolKey`'s `pub(crate)` fields to build the key by hand. That couples this test to the pool's internal struct shape. If `PoolKey` ever gains a fourth field (e.g. `sni: Option<String>` for future HTTPS host-vs-SNI distinction), this test breaks at compile time — which is actually the right failure mode (test is forced to update), so this isn't a problem, just a coupling worth knowing about. ### RAII guard + serial tag for the HTTPS test The `TestTlsConfigGuard` placement is precise: - **After** `install_test_tls_config(client_config)` — because before install, there's no state to clean up. - **Before** any code that might panic on the way to the explicit `clear` — which used to be the only cleanup path. The doc comment captures this ordering ("Must be created AFTER install ... and BEFORE any panicking code below.") so a future reader doesn't reorder out of "tidiness" and reopen the leak. Good defensive comment. The `#[serial_test::serial(tls_global_state)]` tag is right. No other test currently mutates `TEST_TLS_CONFIG`, but the group is named so it's discoverable for the next TLS-touching test. Self-documenting. The trailing `pool::clear_for_test()` is the belt-and-suspenders the comment claims — the server emits `Connection: close` so the pool shouldn't have an entry, but the cleanup is unconditional. The `// _tls_guard drops here` trailing comment is exactly the kind of "noise that earns its keep" — it stops a future reader from moving the guard or wondering why the explicit clear is gone. ### Skipped items Both correctly judged optional. The `test_hooks` mod narrowing was a "small gain" cosmetic suggestion; the macOS multicast note was informational and the CI is Linux-only. No issue. ### Verdict LGTM. Ship it. The Connection: close test now pins the release-path property directly via pool state, independent of any downstream rescue. The HTTPS test survives panics on every reachable path. The PR1–PR5 test-coverage arc is properly closed.
navicore deleted branch issue-483 2026-05-14 13:03:19 +00:00
Sign in to join this conversation.
No description provided.