● All three test anchors landed. CI fully green. #487
No reviewers
Labels
No labels
bug
dependencies
documentation
duplicate
enhancement
good first issue
help wanted
invalid
question
refactor
rust
technical-debt
wontfix
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
navicore/patch-seq!487
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "issue-483"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Done for issue #483:
push_scripted_response / clear_scripted_responses / reset_resolve_call_count / resolve_call_count helpers.
resolve() increments the counter on every call and pops from the scripted queue (when non-empty) before
consulting cache or worker pool. Zero footprint in release builds.
microseconds), issues perform_request("GET", "http://example.com/", None), asserts dns::resolve_call_count() ==
fail this test loudly. Cost: ~140µs. Serialised with test_ssrf_allows_public_urls via
#[serial_test::serial(dns_global_state)] so concurrent counter mutations can't false-positive.
then returns from the inner spawn (dropping stream, sending FIN). Test issues two requests, asserts accept_count
== 2 (proving the pool dropped the close-marked entry instead of trying to reuse it). accept_count == 1 would
mean pool::release ignored keep_alive=false.
- rcgen added as dev-dep with default-features = false, features = ["pem", "ring"] so the cert toolchain
shares the runtime's existing ring provider.
- #[cfg(test)] install_test_tls_config / clear_test_tls_config hooks in tls.rs, with a new
current_tls_config() helper that the test override transparently routes through.
- build_tls_test_pair() generates a self-signed cert (SAN=localhost) at test time and produces matching
ServerConfig + ClientConfig (where the client trusts the test cert as the only root).
- spawn_tls_test_server uses std::thread + std::net::TcpListener instead of may. The TLS handshake is
multi-round-trip; running the server as a may coroutine alongside hundreds of other parallel tests saturated the
may worker pool and starved the handshake. A dedicated OS thread keeps the kernel-scheduled server runnable
regardless of may load. The test passes in 0.13s standalone AND under the full just ci runtime suite.
Review — PR #487 (test anchors for un-pinned architectural properties, closes #483)
Reviewed against
origin/issue-483. Three test anchors landed for the deferred items from PR1–PR5 reviews. One of them pins the property cleanly; one has clever fast-fail-via-multicast that I want to walk through in detail; one doesn't actually anchor what it claims to anchor because of a downstream code path that masks the regression on localhost. The test infrastructure (cfg(test) hooks, rcgen dev-dep, std::thread server) is well-designed.Solid: DNS hooks design (dns.rs)
The
RESOLVE_CALL_COUNT+SCRIPTED_RESPONSESpair is the right shape and the gating is correct:#[cfg(test)]— zero footprint in release builds.AtomicUsizewithSeqCst— fine for the assertion granularity needed; nothing here would benefit fromRelaxed.RESOLVE_CALL_COUNT.fetch_addlands at the top ofresolve()(before the empty-hostname short-circuit), so an empty-hostname call still bumps the counter. Reasonable — it lets a future test catch "validator passes empty host through to resolve" regressions.One nit on visibility:
pub(crate)for the four test helpers, which means they're crate-visible at all times in the test build. If someone intcp.rsor elsewhere imports and usesdns::push_scripted_responseoutside#[cfg(test)]it won't compile (the function is gated), but the IDE will offer it as completion. Apub(crate)inside a#[cfg(test)] mod test_hooks { ... }would narrow the surface more, at the cost of an extra layer. Optional.Solid: SSRF rebinding closure test (anchors the architectural win)
This is the test the PR4 second-pass review specifically asked for, and the anchoring is correct. I traced the property:
perform_request("GET", "http://example.com/", None)→validate_url→resolve_to_ips("example.com")→resolve("example.com")(counter += 1) → scripted-response pop returns["224.0.0.1"].validate_urlruns the SSRF check on[224.0.0.1].is_dangerous_ipv4(224.0.0.1)is false (not loopback, not 10.x, not 172.16-31, not 192.168, not 169.254, not broadcast — multicast isn't inis_dangerous_ipv4's list). Sovalidate_urlreturnsOkwithaddrs=[224.0.0.1],host="example.com".perform_request_with_targetdials 224.0.0.1:80. Kernel returns ENETUNREACH fast (microseconds on Linux). Connect path returns error.dns::resolve_call_count()returns 1. ✓The regression target — passing
host: Stringto connect and re-resolving — would bump the counter to ≥ 2 and fail loudly. Good lock-stop.A few observations:
The multicast-vs-TEST-NET-1 trade-off note in the test is exactly right. TEST-NET-1 (192.0.2.0/24) routes through the default gateway on boxes that don't explicitly null-route it and burns the SYN timeout (~60s). 224.0.0.1 gets a fast-path kernel rejection on Linux. The comment captures the trade-off so the next reader doesn't "simplify" by switching ranges. Worth keeping.
Platform sensitivity is implicit. Linux's TCP-to-multicast fast-fail is the property the test relies on for cheap iteration. The CI is Linux per the project history; if this ever wants to run on macOS/BSD, the multicast behaviour may differ (could take longer). Not a problem now — flagging only.
Serial-tag coverage is correct. I walked through every test that calls into SSRF / DNS:
test_ssrf_blocks_localhost(etc.) — localhost rejected by the lowercase string check invalidate_urlbeforeresolve. No counter bump.127.0.0.1/10.x/192.168.x/169.254.x/ multicast literal tests —resolve_to_ipsshort-circuits via the IP-literal fast path. No counter bump.gopher://,ftp://,file:///) — rejected at scheme parse. No counter bump.test_ssrf_allows_public_urls— callsvalidate_url("https://example.com/")and"https://httpbin.org/get", both hostname (non-literal). Realresolve()calls. Bumps counter. Correctly tagged#[serial_test::serial(dns_global_state)].So only those two tests bump the counter, and they're both in the same serial group. Tagging is right.
Concerning: Connection: close pool-eviction test doesn't catch the targeted regression
The assertion message says:
"accept_count=1 means the pool ignored keep_alive=false and reused a doomed entry."But this isn't true on localhost. Walk both branches:
No regression (correct release):
release(key, stream, keep_alive=false). Early return — entry dropped.checkout(key)— empty queue, returns None.accept_count == 2. ✓Targeted regression (release ignores keep_alive=false):
release(key, stream, keep_alive=false)— buggy: pools the entry anyway.checkout(key)— pops entry. Callsis_reusable(fd). The server thread has returned, dropped its TcpStream, and the kernel has sent FIN. On localhost FIN propagates in microseconds — by the timecheckoutruns,poll(POLLIN, 0)returns rc=1 with POLLIN set and 0 bytes pending.is_reusablecorrectly returns false. Entry discarded.accept_count == 2. ✓ (same!)The test passes in both branches. The only way to get
accept_count == 1is a compound regression where BOTHreleaseignoreskeep_alive=falseANDis_reusablemisses the FIN (or the FIN is genuinely delayed pastcheckout). On localhost the second condition is essentially impossible — kernel FIN propagation is sub-microsecond.The test does catch something (the compound regression, or a regression on a slow network), but it doesn't lock-step the property the test name and comment claim. A future "release: pool everything regardless of keep_alive" regression would slip through CI because
is_reusablerescues it on the localhost test box.Suggested fix — assert directly on pool state. Expose a
pool::total_for_test()(orpool::idle_count_for_test(key)) accessor and assert it's 0 between r1 and r2. That captures the actual release-path behaviour without depending on a downstream check. Something like:This is ~15 lines of accessor plus a
PoolKeytest-helper to construct the key. Worth the extra mile because the test as written reads like a regression anchor but functions like a compound-failure anchor.(Caveat: the suggested accessor needs a
pub(crate) fn pool_key_for_test(scheme, host, port) -> PoolKeyor similar —PoolKeyis private to the pool module. Pick whichever is least surface-area to expose.)Good with one hardening: Happy-path HTTPS test
The architecture is well-thought-through. Specifically:
rcgenas dev-dep withdefault-features = false, features = ["pem", "ring"]is the right choice — it pins the cert toolchain to the sameringprovider rustls uses, avoiding a dual-provider build (which would have surfaced as a "no provider installed" runtime panic from rustls).std::thread+std::net::TcpListenerfor the server, with the multi-paragraph comment explaining the may-scheduler-congestion reason, is exactly right. The reasoning ("the TLS handshake is a multi-round-trip dance ... saturating the may worker pool starves the handshake") is real — I've debugged the same shape elsewhere. Future readers will want to "clean up" by switching to may; the comment prevents that.cfg(test)trust-root override viainstall_test_tls_config/current_tls_config()is minimal, vanishes from release, and the indirection throughcurrent_tls_config()keepsbuild_tls's call site idiomatic.build_tls_test_paircorrectly adds the self-signed cert to the client'sRootCertStoreas its only root. The SAN=localhost matches the SNI the client sends (host: "localhost"in the ValidatedTarget). The test asserts on response body, not just status — so a regression that returns a valid 200 but loses the body would still fail. Good.Hardening — serialise the test and guard against panic-leaks. The override mutates process-wide
TEST_TLS_CONFIG. Two improvements:Tag the test
#[serial_test::serial(tls_global_state)]. No other test currently uses TLS configs, but the moment a second TLS test lands without coordination, this is a flake source.Use an RAII guard for
clear_test_tls_config:Today the test calls
clear_test_tls_config()before the assertions to survive a panic inperform_request_with_target. But a panic inbuild_tls_test_pair(e.g.rcgen::generate_simple_self_signedfailing under a future rcgen API change) or inunwrap_responseafter the test config is set would leak the override. RAII guard makes the cleanup unconditional.Optional — neither has hit production. Flagging because the cost is small and the future-flake protection is real.
Smaller observations
scripted-response check happens BEFORE the cache.** So a scripted response bypasses the cache entirely (it doesn't populate the cache, and a real lookup after a scripted one wouldn't be served from the cache as a side effect of the scripted one). The intent is "test deterministic" — correct — but worth noting that scripted responses are *not* equivalent to "pretend the cache returned this." If a future test wants to test cache-write behaviour, it'd need to populate the cache directly, not viapush_scripted_response`.ServerMode::CloseOnce's server-side flow usesreturnto drop the stream and send FIN. Reading the code, thereturnexits the inner per-connectionmove ||(which is a may coroutine spawned for each accept). Dropping the stream sends FIN. The outer accept loop continues. This is correct. The comment captures the intent.https_round_trip_against_same_process_rustls_serverdoesn'tpool::clear_for_test()at end. The test pools the HTTPS connection (if at all — server emitsConnection: close, so probably not). On the off chance a pooled entry survives, it would leak into the next test. Optional cleanup.build_tls_test_pairreturnsArc<rustls::ServerConfig>andArc<rustls::ClientConfig>— both Arc'd, which lets the server config be cloned per-connection insidespawn_tls_test_server. Right shape.Verdict
LGTM after one fix and one hardening:
is_reusable's FIN detection masks the regression on localhost).The DNS-rebinding closure anchor is exactly the test the PR4 review asked for, well-executed. The TLS happy-path test is a solid hermetic e2e against a same-process rustls server, with the right scheduler/provider choices and good in-code documentation. The DNS hooks themselves are clean and don't leak into release.
Closes a real coverage gap in the PR1–PR5 networking arc.
Done: 1. Required — Connection: close test now pins the property it claims. The reviewer's analysis was correct: on localhost, is_reusable's POLLIN check catches the FIN sub-microsecond, so a regression where pool::release ignored keep_alive=false would silently pass the old accept-count assertion. Fix: - New pub(crate) fn pool::idle_count_for_test(&PoolKey) -> usize test-only accessor. - Test now asserts idle_count_for_test(&key) == 0 immediately after r1 (before r2 dials). That's the targeted release-path property, independent of is_reusable's downstream rescue. - Kept the original accept_count == 2 assertion as a compound-regression catch with a comment explaining it's secondary to the new pool-state assertion. 2. Recommended — HTTPS test hardening: - #[serial_test::serial(tls_global_state)] tag so a future second TLS test can't clobber TEST_TLS_CONFIG mid-flight. - TestTlsConfigGuard RAII type wrapping clear_test_tls_config() in Drop. Installed right after install_test_tls_config and dropped on every exit path — including panic. The earlier clear_test_tls_config() before assertions worked for normal failure modes but leaked the override on a panic in build_tls_test_pair, unwrap_response, etc. - Defensive pool::clear_for_test() at the end. The server sends Connection: close so nothing should pool, but cheap belt-and-suspenders against future change. Skipped (reviewer marked optional): - pub(crate) mod test_hooks { ... } narrowing for the dns test helpers (small gain, optional). - macOS multicast platform note (informational; CI is Linux-only).Second pass — review-driven fixes look right
Walked the delta from
a88b534..1845ebc. Both items landed cleanly.Pool-state anchor for the Connection: close test
pool::idle_count_for_test(&PoolKey)is the right test-only accessor —#[cfg(test)]-gated, takes the lock the same waycheckout/releasedo, returns 0 for a missing key. The PoolKey construction in the test (scheme: Http,host: "127.0.0.1",port) matches whatperform_validatedbuilds at request.rs:53 (scheme: target.scheme,host: target.host.clone(),port: target.port), andloopback_targetproduces exactly those fields. So the test's key and the runtime's key collide as intended.The assertion ordering is right:
keyis built beforer1, the assertion runs immediately afterr1returns and beforer2dials. That window is when the targeted invariant ("release withkeep_alive=falsemust drop the entry") is testable — after r2 dials, pool state would be muddied by the second cycle's release.I traced both branches against the new assertion:
keep_alive=false): after r1,pool.idle[key].len() == 1.idle_count_for_testreturns 1. Assertion fails loudly with the targeted message. ✓ Now caught.release(..., keep_alive=false)early-returns at pool.rs:100. Entry not pooled.idle_count_for_testreturns 0. Assertion passes. ✓The
accept_count == 2assertion is kept correctly as a secondary check for compound regressions (bothreleaseandis_reusablebroken). The updated failure message is more honest — it no longer claims to catch what the new assertion catches. Good restructuring.One small thing worth flagging (not for fixing): the test now reaches into
pool::PoolKey'spub(crate)fields to build the key by hand. That couples this test to the pool's internal struct shape. IfPoolKeyever gains a fourth field (e.g.sni: Option<String>for future HTTPS host-vs-SNI distinction), this test breaks at compile time — which is actually the right failure mode (test is forced to update), so this isn't a problem, just a coupling worth knowing about.RAII guard + serial tag for the HTTPS test
The
TestTlsConfigGuardplacement is precise:install_test_tls_config(client_config)— because before install, there's no state to clean up.clear— which used to be the only cleanup path.The doc comment captures this ordering ("Must be created AFTER install ... and BEFORE any panicking code below.") so a future reader doesn't reorder out of "tidiness" and reopen the leak. Good defensive comment.
The
#[serial_test::serial(tls_global_state)]tag is right. No other test currently mutatesTEST_TLS_CONFIG, but the group is named so it's discoverable for the next TLS-touching test. Self-documenting.The trailing
pool::clear_for_test()is the belt-and-suspenders the comment claims — the server emitsConnection: closeso the pool shouldn't have an entry, but the cleanup is unconditional. The// _tls_guard drops heretrailing comment is exactly the kind of "noise that earns its keep" — it stops a future reader from moving the guard or wondering why the explicit clear is gone.Skipped items
Both correctly judged optional. The
test_hooksmod narrowing was a "small gain" cosmetic suggestion; the macOS multicast note was informational and the CI is Linux-only. No issue.Verdict
LGTM. Ship it.
The Connection: close test now pins the release-path property directly via pool state, independent of any downstream rescue. The HTTPS test survives panics on every reachable path. The PR1–PR5 test-coverage arc is properly closed.