Flip #367
No reviewers
Labels
No labels
bug
dependencies
documentation
duplicate
enhancement
good first issue
help wanted
invalid
question
refactor
rust
technical-debt
wontfix
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
navicore/patch-seq!367
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "flip"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
PR Review posted via Claude Code - see full review below
PR Review: Flip (tagged-ptr default + Arc + specialization re-enable). This is a significant correctness and performance PR. Full analysis follows. WHAT THIS PR DOES: (1) Promotes tagged-ptr from opt-in WIP to default in all three crates. (2) Replaces Box with Arc for O(1) clone. (3) Re-enables specialization, which was blanket-disabled under tagged-ptr. (4) Updates benchmark results (fanout only). --- CORRECTNESS - stack_new.rs Arc migration: The unsafe code is correct. clone_stack_value uses the canonical Arc::from_raw + Arc::clone + std::mem::forget + Arc::into_raw pattern. drop_stack_value correctly frees only when the last reference drops. Concern: stack_value_to_value changed semantics under sharing. Under Box, pop() was always O(1). Under Arc, if refcount > 1, try_unwrap silently falls back to a deep clone -- pop is no longer always O(1) for heap types. Document this in the Safety block. Also: tagged_stack_new.rs and stack_new.rs line 9 both still reference Box in encoding table comments; update to Arc. --- CORRECTNESS - Re-enabling specialization: Removing the early-return guard in can_specialize is well-reasoned. Specialization works in registers on Int/Float/Bool only. is_trivially_copyable_at_current_stmt in types.rs correctly excludes Float in tagged-ptr mode, so the fast-path dup only fires for Int and Bool. Open question: when a specialized word returns a Float to a non-specialized caller, does the ABI transition correctly box it into a tagged Arc pointer? A test for the specialized-to-interpreter Float handoff would confirm this. --- BENCHMARK RESULTS - incomplete run and format bug: Only fanout was benchmarked. CI check-bench-freshness validates only the timestamp, not which suites ran. Result files for fibonacci, primes, pingpong, skynet, and collections were deleted without replacements. The commit claims 122x speedup for fibonacci and 84x for primes -- please run 'just bench' and commit complete results before merging. Column-order bug in run.sh line 115 (cut -d: -f5): Go/Rust/Python output is BENCH:fanout:throughput-100k:COUNT:TIME_MS so field 5 is time_ms (correct). But Seq fanout_seq.txt shows BENCH:fanout:throughput-100k:23:100000 where field 4 is the time (23ms) and field 5 is the count (100000). So cut -f5 reads 100000 for Seq instead of 23ms, making the comparison table wrong. Fix seq.seq to emit count:time matching the other languages, or update run.sh. --- FEATURE FLAG PROPAGATION: The chain is now consistent across all three crates. Adding tagged-ptr to seq-runtime defaults (not only seq-compiler) is the key fix -- without it, seq-runtime built in isolation would use the 40-byte layout while the compiler expected 8-byte. --- TEST COVERAGE: Existing roundtrip tests are solid. Missing Arc-specific coverage: (1) Clone under sharing: dup a value, pop both copies, verify no double-free. (2) Specialization-to-interpreter Float handoff test. (3) Miri run to confirm no use-after-free in the clone_stack_value + drop_stack_value path. --- SUMMARY: Box -> Arc is the right design and the unsafe code is correct. Main concerns before merging: (1) Run full benchmark suite and commit all result files (five suites deleted). (2) Fix Seq benchmark column order -- fanout_seq.txt has time:count while others have count:time, so cut -f5 reads the wrong field for Seq. (3) Add a clone/drop test for the shared-Arc fallback in stack_value_to_value.
PR Review: Flip (tagged-ptr as default + Box→Arc)
This is a well-motivated change. Switching
Box<Value>toArc<Value>is the right call for enabling O(1) clones on the stack, and the benchmark numbers bear it out — collections speedup is significant (43s→18s on build-100k). Here are my observations:Code quality
crates/core/src/stack_new.rsThe
clone_stack_valueimplementation is correct:The
std::mem::forgetto avoid decrementing the original's refcount is the right pattern here.The
stack_value_to_valuefallback is also correct — whentry_unwrapfails (multiple Arc references exist), cloning the inner value is the only safe option. Worth a comment explaining when this can happen (e.g. a value duplicated on the stack that hasn't been dropped yet).crates/compiler/src/codegen/specialization.rsThe removal of the specialization guard is the crux of this PR. The old comment was:
With Arc that concern goes away — refcount bumps make aliasing safe. However, the guard was also labeled
WIPimplying specialization may not have been fully tested with tagged-ptr. Are there existing tests covering specialization paths with heap-allocated types (Float, String, etc.) in tagged-ptr mode? If not, a targeted test would strengthen confidence here.Potential concern:
Value: Send + SyncArc<T>requiresT: Send + Sync. IfValuecontains anyRc,Cell,RefCell, or raw pointers without explicitSend/Syncimpls, this will fail to compile — which is good (compile-time enforcement). Just worth confirming the build passes cleanly across all feature combinations.Benchmark files
Bug fix in
benchmarks/fanout/seq.seq: Theswapremoval is correct. The old code was outputtingBENCH:fanout:throughput-100k:{time}:{result}(swapped), but the format should beresult:time_ms. The old baselinefanout_seq.txthad100000:2flipped to2:100000— this fix aligns it with the correct format. The new Seq fanout result of 3ms (vs Go 33ms, Rust 8ms) looks legitimately good and now trustworthy.New baseline files: The Go and Rust baseline files appear to match the results files exactly (same run?). That's fine for initial cross-language baselines, but worth noting that
baseline/andresults/are now diverged in intent —baseline/seems to represent "reference implementation" numbers whileresults/tracks Seq progress. A brief note inbenchmarks/README.mdclarifying the distinction between the two directories would help future contributors.skynet_seq.txt: The Seq skynet time increased (3241ms → 3918ms) between baseline and results. The README correctly attributes this to mmap-per-coroutine spawn overhead, which is a known issue separate from this PR.Minor
README.mdnote "May coroutine spawn overhead" appears to be a typo — likely "May's coroutine spawn overhead" (May being the coroutine library/scheduler used).Summary
The core change (Box→Arc) is sound and the performance results are compelling. The main thing to verify before merging is that specialization correctness with heap types in tagged-ptr mode is covered by tests. Everything else looks good.
PR Review: Flip (
tagged-ptrdefault + Box→Arc migration)This is a significant, well-executed change. Flipping
tagged-ptrto default and switching heap values fromBox<Value>toArc<Value>delivers transformative performance gains while fixing a real correctness issue (the dup fast-path double-free). Overall the implementation looks sound. A few notes below:✅ What's well done
Arc ownership model is correct throughout:
clone_stack_value:Arc::from_raw+Arc::clone+std::mem::forgetcorrectly bumps the refcount without decrementing the original. No double-free.drop_stack_value:let _ = Arc::from_raw(...)correctly decrements and frees on last reference.stack_value_to_value:Arc::try_unwrapwith clone fallback is the right pattern — cheap unwrap when sole owner, graceful clone when dup'd refs exist.Feature propagation fix is important. Before this PR,
tagged-ptrwas default incompilerbut notcore/runtime, which could cause subtle mismatches between crates. Making it uniform is correct.Specialization re-enablement is justified. The guard that disabled specialization in tagged-ptr mode was added defensively after the dup fast-path crashes. Now that the dup path is fixed (Box→Arc), the original concern is resolved and specialization (which operates in registers) was always safe.
Fanout benchmark bug fix (swapped
result:time→time:resultfields) was a real pre-existing bug that was masking the actual numbers. Good catch.🔴 Issues
Stale module-level doc comment (
stack_new.rs:9):Should be
Arc<Value>now. Minor, but it's the first thing a reader sees.🟡 Observations worth noting
Arc::try_unwrapclone behavior on dup'd values:When a value is
dup'd (refcount = 2) and thenpop'd,try_unwrapfails and falls back to cloning the inner value. This is correct and the comment explains it well. The tradeoff is that every pop of a dup'd heap value pays a clone. For workloads that heavily dup/pop heap values in tight loops, this could be a hidden cost — worth monitoring if channel-heavy benchmarks regress in the future.Collections regression is acknowledged, but the number is notable:
build-100kwent from 15,664ms → 18,110ms (+15.7%), and this was already a known weak area. The commit message notes "COW optimization planned" — it would help to have a tracking issue linked so this doesn't slip.Value: Send + Syncassumption:The
Arc<Value>approach requiresValue: Send + Syncfor safe use across May coroutines. The commit message says this is "enforced by explicit unsafe impl in value.rs — build confirms." Worth verifying this also covers any newly-reachable code paths through specialization (since specialization was just re-enabled).🟢 Test coverage
Existing roundtrip tests in
stack_new.rscoverFloat,String,Symbol,Variant,Map,Quotation,Closure,Channel,WeaveCtx— all the heap types. There's no explicit test for theclone_stack_value+stack_value_to_valuerefcount path (i.e., push a float, dup it, pop both, verify no double-free or corruption), but the integration benchmark results serve as a functional stress test. A unit test for the dup→pop→pop flow on a heap type would be a good addition to guard the Arc invariants going forward.Summary
The core Arc migration is correct and the performance numbers are impressive. Fix the stale doc comment, and consider adding a link to a follow-up issue for the COW optimization. Otherwise this looks ready to merge.