Flip #367

Merged
navicore merged 5 commits from flip into main 2026-03-23 23:36:04 +00:00
navicore commented 2026-03-23 21:48:40 +00:00 (Migrated from github.com)
No description provided.
claude[bot] commented 2026-03-23 21:55:35 +00:00 (Migrated from github.com)

PR Review posted via Claude Code - see full review below

PR Review posted via Claude Code - see full review below
claude[bot] commented 2026-03-23 21:56:24 +00:00 (Migrated from github.com)

PR Review: Flip (tagged-ptr default + Arc + specialization re-enable). This is a significant correctness and performance PR. Full analysis follows. WHAT THIS PR DOES: (1) Promotes tagged-ptr from opt-in WIP to default in all three crates. (2) Replaces Box with Arc for O(1) clone. (3) Re-enables specialization, which was blanket-disabled under tagged-ptr. (4) Updates benchmark results (fanout only). --- CORRECTNESS - stack_new.rs Arc migration: The unsafe code is correct. clone_stack_value uses the canonical Arc::from_raw + Arc::clone + std::mem::forget + Arc::into_raw pattern. drop_stack_value correctly frees only when the last reference drops. Concern: stack_value_to_value changed semantics under sharing. Under Box, pop() was always O(1). Under Arc, if refcount > 1, try_unwrap silently falls back to a deep clone -- pop is no longer always O(1) for heap types. Document this in the Safety block. Also: tagged_stack_new.rs and stack_new.rs line 9 both still reference Box in encoding table comments; update to Arc. --- CORRECTNESS - Re-enabling specialization: Removing the early-return guard in can_specialize is well-reasoned. Specialization works in registers on Int/Float/Bool only. is_trivially_copyable_at_current_stmt in types.rs correctly excludes Float in tagged-ptr mode, so the fast-path dup only fires for Int and Bool. Open question: when a specialized word returns a Float to a non-specialized caller, does the ABI transition correctly box it into a tagged Arc pointer? A test for the specialized-to-interpreter Float handoff would confirm this. --- BENCHMARK RESULTS - incomplete run and format bug: Only fanout was benchmarked. CI check-bench-freshness validates only the timestamp, not which suites ran. Result files for fibonacci, primes, pingpong, skynet, and collections were deleted without replacements. The commit claims 122x speedup for fibonacci and 84x for primes -- please run 'just bench' and commit complete results before merging. Column-order bug in run.sh line 115 (cut -d: -f5): Go/Rust/Python output is BENCH:fanout:throughput-100k:COUNT:TIME_MS so field 5 is time_ms (correct). But Seq fanout_seq.txt shows BENCH:fanout:throughput-100k:23:100000 where field 4 is the time (23ms) and field 5 is the count (100000). So cut -f5 reads 100000 for Seq instead of 23ms, making the comparison table wrong. Fix seq.seq to emit count:time matching the other languages, or update run.sh. --- FEATURE FLAG PROPAGATION: The chain is now consistent across all three crates. Adding tagged-ptr to seq-runtime defaults (not only seq-compiler) is the key fix -- without it, seq-runtime built in isolation would use the 40-byte layout while the compiler expected 8-byte. --- TEST COVERAGE: Existing roundtrip tests are solid. Missing Arc-specific coverage: (1) Clone under sharing: dup a value, pop both copies, verify no double-free. (2) Specialization-to-interpreter Float handoff test. (3) Miri run to confirm no use-after-free in the clone_stack_value + drop_stack_value path. --- SUMMARY: Box -> Arc is the right design and the unsafe code is correct. Main concerns before merging: (1) Run full benchmark suite and commit all result files (five suites deleted). (2) Fix Seq benchmark column order -- fanout_seq.txt has time:count while others have count:time, so cut -f5 reads the wrong field for Seq. (3) Add a clone/drop test for the shared-Arc fallback in stack_value_to_value.

PR Review: Flip (tagged-ptr default + Arc + specialization re-enable). This is a significant correctness and performance PR. Full analysis follows. WHAT THIS PR DOES: (1) Promotes tagged-ptr from opt-in WIP to default in all three crates. (2) Replaces Box<Value> with Arc<Value> for O(1) clone. (3) Re-enables specialization, which was blanket-disabled under tagged-ptr. (4) Updates benchmark results (fanout only). --- CORRECTNESS - stack_new.rs Arc migration: The unsafe code is correct. clone_stack_value uses the canonical Arc::from_raw + Arc::clone + std::mem::forget + Arc::into_raw pattern. drop_stack_value correctly frees only when the last reference drops. Concern: stack_value_to_value changed semantics under sharing. Under Box, pop() was always O(1). Under Arc, if refcount > 1, try_unwrap silently falls back to a deep clone -- pop is no longer always O(1) for heap types. Document this in the Safety block. Also: tagged_stack_new.rs and stack_new.rs line 9 both still reference Box<Value> in encoding table comments; update to Arc<Value>. --- CORRECTNESS - Re-enabling specialization: Removing the early-return guard in can_specialize is well-reasoned. Specialization works in registers on Int/Float/Bool only. is_trivially_copyable_at_current_stmt in types.rs correctly excludes Float in tagged-ptr mode, so the fast-path dup only fires for Int and Bool. Open question: when a specialized word returns a Float to a non-specialized caller, does the ABI transition correctly box it into a tagged Arc<Value> pointer? A test for the specialized-to-interpreter Float handoff would confirm this. --- BENCHMARK RESULTS - incomplete run and format bug: Only fanout was benchmarked. CI check-bench-freshness validates only the timestamp, not which suites ran. Result files for fibonacci, primes, pingpong, skynet, and collections were deleted without replacements. The commit claims 122x speedup for fibonacci and 84x for primes -- please run 'just bench' and commit complete results before merging. Column-order bug in run.sh line 115 (cut -d: -f5): Go/Rust/Python output is BENCH:fanout:throughput-100k:COUNT:TIME_MS so field 5 is time_ms (correct). But Seq fanout_seq.txt shows BENCH:fanout:throughput-100k:23:100000 where field 4 is the time (23ms) and field 5 is the count (100000). So cut -f5 reads 100000 for Seq instead of 23ms, making the comparison table wrong. Fix seq.seq to emit count:time matching the other languages, or update run.sh. --- FEATURE FLAG PROPAGATION: The chain is now consistent across all three crates. Adding tagged-ptr to seq-runtime defaults (not only seq-compiler) is the key fix -- without it, seq-runtime built in isolation would use the 40-byte layout while the compiler expected 8-byte. --- TEST COVERAGE: Existing roundtrip tests are solid. Missing Arc-specific coverage: (1) Clone under sharing: dup a value, pop both copies, verify no double-free. (2) Specialization-to-interpreter Float handoff test. (3) Miri run to confirm no use-after-free in the clone_stack_value + drop_stack_value path. --- SUMMARY: Box -> Arc is the right design and the unsafe code is correct. Main concerns before merging: (1) Run full benchmark suite and commit all result files (five suites deleted). (2) Fix Seq benchmark column order -- fanout_seq.txt has time:count while others have count:time, so cut -f5 reads the wrong field for Seq. (3) Add a clone/drop test for the shared-Arc fallback in stack_value_to_value.
claude[bot] commented 2026-03-23 22:03:22 +00:00 (Migrated from github.com)

PR Review: Flip (tagged-ptr as default + Box→Arc)

This is a well-motivated change. Switching Box<Value> to Arc<Value> is the right call for enabling O(1) clones on the stack, and the benchmark numbers bear it out — collections speedup is significant (43s→18s on build-100k). Here are my observations:

Code quality

crates/core/src/stack_new.rs

The clone_stack_value implementation is correct:

let arc = Arc::from_raw(sv as *const Value);
let cloned = Arc::clone(&arc);
std::mem::forget(arc); // Don't decrement the original
Arc::into_raw(cloned) as u64

The std::mem::forget to avoid decrementing the original's refcount is the right pattern here.

The stack_value_to_value fallback is also correct — when try_unwrap fails (multiple Arc references exist), cloning the inner value is the only safe option. Worth a comment explaining when this can happen (e.g. a value duplicated on the stack that hasn't been dropped yet).

crates/compiler/src/codegen/specialization.rs

The removal of the specialization guard is the crux of this PR. The old comment was:

Raw copies create double-frees (Box can't be aliased)

With Arc that concern goes away — refcount bumps make aliasing safe. However, the guard was also labeled WIP implying specialization may not have been fully tested with tagged-ptr. Are there existing tests covering specialization paths with heap-allocated types (Float, String, etc.) in tagged-ptr mode? If not, a targeted test would strengthen confidence here.

Potential concern: Value: Send + Sync

Arc<T> requires T: Send + Sync. If Value contains any Rc, Cell, RefCell, or raw pointers without explicit Send/Sync impls, this will fail to compile — which is good (compile-time enforcement). Just worth confirming the build passes cleanly across all feature combinations.

Benchmark files

Bug fix in benchmarks/fanout/seq.seq: The swap removal is correct. The old code was outputting BENCH:fanout:throughput-100k:{time}:{result} (swapped), but the format should be result:time_ms. The old baseline fanout_seq.txt had 100000:2 flipped to 2:100000 — this fix aligns it with the correct format. The new Seq fanout result of 3ms (vs Go 33ms, Rust 8ms) looks legitimately good and now trustworthy.

New baseline files: The Go and Rust baseline files appear to match the results files exactly (same run?). That's fine for initial cross-language baselines, but worth noting that baseline/ and results/ are now diverged in intent — baseline/ seems to represent "reference implementation" numbers while results/ tracks Seq progress. A brief note in benchmarks/README.md clarifying the distinction between the two directories would help future contributors.

skynet_seq.txt: The Seq skynet time increased (3241ms → 3918ms) between baseline and results. The README correctly attributes this to mmap-per-coroutine spawn overhead, which is a known issue separate from this PR.

Minor

  • The README.md note "May coroutine spawn overhead" appears to be a typo — likely "May's coroutine spawn overhead" (May being the coroutine library/scheduler used).

Summary

The core change (Box→Arc) is sound and the performance results are compelling. The main thing to verify before merging is that specialization correctness with heap types in tagged-ptr mode is covered by tests. Everything else looks good.

## PR Review: Flip (tagged-ptr as default + Box→Arc) This is a well-motivated change. Switching `Box<Value>` to `Arc<Value>` is the right call for enabling O(1) clones on the stack, and the benchmark numbers bear it out — collections speedup is significant (43s→18s on build-100k). Here are my observations: ### Code quality **`crates/core/src/stack_new.rs`** The `clone_stack_value` implementation is correct: ```rust let arc = Arc::from_raw(sv as *const Value); let cloned = Arc::clone(&arc); std::mem::forget(arc); // Don't decrement the original Arc::into_raw(cloned) as u64 ``` The `std::mem::forget` to avoid decrementing the original's refcount is the right pattern here. The `stack_value_to_value` fallback is also correct — when `try_unwrap` fails (multiple Arc references exist), cloning the inner value is the only safe option. Worth a comment explaining *when* this can happen (e.g. a value duplicated on the stack that hasn't been dropped yet). **`crates/compiler/src/codegen/specialization.rs`** The removal of the specialization guard is the crux of this PR. The old comment was: > Raw copies create double-frees (Box can't be aliased) With Arc that concern goes away — refcount bumps make aliasing safe. However, the guard was also labeled `WIP` implying specialization may not have been fully tested with tagged-ptr. Are there existing tests covering specialization paths with heap-allocated types (Float, String, etc.) in tagged-ptr mode? If not, a targeted test would strengthen confidence here. ### Potential concern: `Value: Send + Sync` `Arc<T>` requires `T: Send + Sync`. If `Value` contains any `Rc`, `Cell`, `RefCell`, or raw pointers without explicit `Send`/`Sync` impls, this will fail to compile — which is good (compile-time enforcement). Just worth confirming the build passes cleanly across all feature combinations. ### Benchmark files **Bug fix in `benchmarks/fanout/seq.seq`**: The `swap` removal is correct. The old code was outputting `BENCH:fanout:throughput-100k:{time}:{result}` (swapped), but the format should be `result:time_ms`. The old baseline `fanout_seq.txt` had `100000:2` flipped to `2:100000` — this fix aligns it with the correct format. The new Seq fanout result of 3ms (vs Go 33ms, Rust 8ms) looks legitimately good and now trustworthy. **New baseline files**: The Go and Rust baseline files appear to match the results files exactly (same run?). That's fine for initial cross-language baselines, but worth noting that `baseline/` and `results/` are now diverged in intent — `baseline/` seems to represent "reference implementation" numbers while `results/` tracks Seq progress. A brief note in `benchmarks/README.md` clarifying the distinction between the two directories would help future contributors. **`skynet_seq.txt`**: The Seq skynet time increased (3241ms → 3918ms) between baseline and results. The README correctly attributes this to mmap-per-coroutine spawn overhead, which is a known issue separate from this PR. ### Minor - The `README.md` note "May coroutine spawn overhead" appears to be a typo — likely "**May**'s coroutine spawn overhead" (May being the coroutine library/scheduler used). ### Summary The core change (Box→Arc) is sound and the performance results are compelling. The main thing to verify before merging is that specialization correctness with heap types in tagged-ptr mode is covered by tests. Everything else looks good.
claude[bot] commented 2026-03-23 22:15:42 +00:00 (Migrated from github.com)

PR Review: Flip (tagged-ptr default + Box→Arc migration)

This is a significant, well-executed change. Flipping tagged-ptr to default and switching heap values from Box<Value> to Arc<Value> delivers transformative performance gains while fixing a real correctness issue (the dup fast-path double-free). Overall the implementation looks sound. A few notes below:


What's well done

Arc ownership model is correct throughout:

  • clone_stack_value: Arc::from_raw + Arc::clone + std::mem::forget correctly bumps the refcount without decrementing the original. No double-free.
  • drop_stack_value: let _ = Arc::from_raw(...) correctly decrements and frees on last reference.
  • stack_value_to_value: Arc::try_unwrap with clone fallback is the right pattern — cheap unwrap when sole owner, graceful clone when dup'd refs exist.

Feature propagation fix is important. Before this PR, tagged-ptr was default in compiler but not core/runtime, which could cause subtle mismatches between crates. Making it uniform is correct.

Specialization re-enablement is justified. The guard that disabled specialization in tagged-ptr mode was added defensively after the dup fast-path crashes. Now that the dup path is fixed (Box→Arc), the original concern is resolved and specialization (which operates in registers) was always safe.

Fanout benchmark bug fix (swapped result:timetime:result fields) was a real pre-existing bug that was masking the actual numbers. Good catch.


🔴 Issues

Stale module-level doc comment (stack_new.rs:9):

//! - Even > 2: Heap pointer to Box<Value>

Should be Arc<Value> now. Minor, but it's the first thing a reader sees.


🟡 Observations worth noting

Arc::try_unwrap clone behavior on dup'd values:
When a value is dup'd (refcount = 2) and then pop'd, try_unwrap fails and falls back to cloning the inner value. This is correct and the comment explains it well. The tradeoff is that every pop of a dup'd heap value pays a clone. For workloads that heavily dup/pop heap values in tight loops, this could be a hidden cost — worth monitoring if channel-heavy benchmarks regress in the future.

Collections regression is acknowledged, but the number is notable:
build-100k went from 15,664ms → 18,110ms (+15.7%), and this was already a known weak area. The commit message notes "COW optimization planned" — it would help to have a tracking issue linked so this doesn't slip.

Value: Send + Sync assumption:
The Arc<Value> approach requires Value: Send + Sync for safe use across May coroutines. The commit message says this is "enforced by explicit unsafe impl in value.rs — build confirms." Worth verifying this also covers any newly-reachable code paths through specialization (since specialization was just re-enabled).


🟢 Test coverage

Existing roundtrip tests in stack_new.rs cover Float, String, Symbol, Variant, Map, Quotation, Closure, Channel, WeaveCtx — all the heap types. There's no explicit test for the clone_stack_value + stack_value_to_value refcount path (i.e., push a float, dup it, pop both, verify no double-free or corruption), but the integration benchmark results serve as a functional stress test. A unit test for the dup→pop→pop flow on a heap type would be a good addition to guard the Arc invariants going forward.


Summary

The core Arc migration is correct and the performance numbers are impressive. Fix the stale doc comment, and consider adding a link to a follow-up issue for the COW optimization. Otherwise this looks ready to merge.

## PR Review: Flip (`tagged-ptr` default + Box→Arc migration) This is a significant, well-executed change. Flipping `tagged-ptr` to default and switching heap values from `Box<Value>` to `Arc<Value>` delivers transformative performance gains while fixing a real correctness issue (the dup fast-path double-free). Overall the implementation looks sound. A few notes below: --- ### ✅ What's well done **Arc ownership model is correct throughout:** - `clone_stack_value`: `Arc::from_raw` + `Arc::clone` + `std::mem::forget` correctly bumps the refcount without decrementing the original. No double-free. - `drop_stack_value`: `let _ = Arc::from_raw(...)` correctly decrements and frees on last reference. - `stack_value_to_value`: `Arc::try_unwrap` with clone fallback is the right pattern — cheap unwrap when sole owner, graceful clone when dup'd refs exist. **Feature propagation fix is important.** Before this PR, `tagged-ptr` was default in `compiler` but not `core`/`runtime`, which could cause subtle mismatches between crates. Making it uniform is correct. **Specialization re-enablement is justified.** The guard that disabled specialization in tagged-ptr mode was added defensively after the dup fast-path crashes. Now that the dup path is fixed (Box→Arc), the original concern is resolved and specialization (which operates in registers) was always safe. **Fanout benchmark bug fix** (swapped `result:time` → `time:result` fields) was a real pre-existing bug that was masking the actual numbers. Good catch. --- ### 🔴 Issues **Stale module-level doc comment (`stack_new.rs:9`):** ```rust //! - Even > 2: Heap pointer to Box<Value> ``` Should be `Arc<Value>` now. Minor, but it's the first thing a reader sees. --- ### 🟡 Observations worth noting **`Arc::try_unwrap` clone behavior on dup'd values:** When a value is `dup`'d (refcount = 2) and then `pop`'d, `try_unwrap` fails and falls back to cloning the inner value. This is correct and the comment explains it well. The tradeoff is that every pop of a dup'd heap value pays a clone. For workloads that heavily dup/pop heap values in tight loops, this could be a hidden cost — worth monitoring if channel-heavy benchmarks regress in the future. **Collections regression is acknowledged, but the number is notable:** `build-100k` went from 15,664ms → 18,110ms (+15.7%), and this was already a known weak area. The commit message notes "COW optimization planned" — it would help to have a tracking issue linked so this doesn't slip. **`Value: Send + Sync` assumption:** The `Arc<Value>` approach requires `Value: Send + Sync` for safe use across May coroutines. The commit message says this is "enforced by explicit unsafe impl in value.rs — build confirms." Worth verifying this also covers any newly-reachable code paths through specialization (since specialization was just re-enabled). --- ### 🟢 Test coverage Existing roundtrip tests in `stack_new.rs` cover `Float`, `String`, `Symbol`, `Variant`, `Map`, `Quotation`, `Closure`, `Channel`, `WeaveCtx` — all the heap types. There's no explicit test for the `clone_stack_value` + `stack_value_to_value` refcount path (i.e., push a float, dup it, pop both, verify no double-free or corruption), but the integration benchmark results serve as a functional stress test. A unit test for the dup→pop→pop flow on a heap type would be a good addition to guard the Arc invariants going forward. --- ### Summary The core Arc migration is correct and the performance numbers are impressive. Fix the stale doc comment, and consider adding a link to a follow-up issue for the COW optimization. Otherwise this looks ready to merge.
Sign in to join this conversation.
No description provided.