Optimization: Use NaN-boxing to reduce Value size from 40 to 8 bytes #188

Closed
opened 2026-01-05 02:25:02 +00:00 by navicore · 8 comments
navicore commented 2026-01-05 02:25:02 +00:00 (Migrated from github.com)

Summary

Implement NaN-boxing to pack values into 8 bytes instead of the current 40-byte tagged union, reducing memory bandwidth and improving cache utilization.

Current Behavior

Values use a 40-byte struct:

```rust
struct StackValue {
    slot0: u64,  // discriminant
    slot1: u64,  // payload
    slot2: u64,  // additional data
    slot3: u64,  // additional data
    slot4: u64,  // additional data
}
```

This means:

  • Every stack operation moves 40 bytes
  • Poor cache utilization (8 values span 320 bytes, i.e. 5 cache lines at 64 bytes per line)
  • High memory bandwidth for stack-heavy code
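The cache-line figure is simple arithmetic: 8 values × 40 bytes = 320 bytes. A quick check, assuming 64-byte cache lines (common on x86-64 and many ARM64 cores, though not universal) and a cache-line-aligned array; `cache_lines_for` is a hypothetical helper, not part of the runtime.

```rust
use std::mem::size_of;

/// Stand-in for the current five-slot StackValue (5 × u64).
#[allow(dead_code)]
#[repr(C)]
struct StackValue {
    slot0: u64, // discriminant
    slot1: u64, // payload
    slot2: u64,
    slot3: u64,
    slot4: u64,
}

// Compile-time check that the stand-in really is 40 bytes.
const _: () = assert!(size_of::<StackValue>() == 40);

/// Assumed cache-line size in bytes; real hardware varies.
const CACHE_LINE: usize = 64;

/// Cache lines touched by `count` contiguous values of `size` bytes,
/// assuming the first value starts on a line boundary.
fn cache_lines_for(count: usize, size: usize) -> usize {
    (count * size + CACHE_LINE - 1) / CACHE_LINE
}
```

Under these assumptions, `cache_lines_for(8, size_of::<StackValue>())` gives 5, while `cache_lines_for(8, size_of::<u64>())` gives 1.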

Proposed Optimization

Use NaN-boxing to encode values in 8 bytes:

  • Floats: Stored directly as IEEE 754 doubles
  • Integers: Use quiet NaN space (51 bits available, enough for most ints)
  • Pointers: Use quiet NaN space (48-bit pointers fit)
  • Small integers: Could use tagged pointer low bits

Encoding Scheme (example)

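```
Float:     [normal IEEE 754 double]
Int:       [0x7FF8] [51-bit signed integer]
Pointer:   [0x7FFC] [48-bit pointer]
Symbol:    [0x7FFD] [48-bit pointer to interned string]
```

As written, the tags overlap: a 51-bit Int payload occupies bits 50-0, which include the bits that distinguish 0x7FFC and 0x7FFD from 0x7FF8. Giving each type its own 16-bit tag leaves 48 payload bits per type, Int included (about ±140 trillion).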
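A minimal sketch of the Float and Int cases, assuming each type gets its own 16-bit tag (so Int has a 48-bit payload, -2^47 to 2^47 - 1) and assuming NaN floats are canonicalized to the sign-bit-set quiet NaN 0xFFF8_0000_0000_0000 so they can never land in the 0x7FF8-0x7FFF tag range. All names are illustrative, not the runtime's API.

```rust
/// Bits 63-51 of every boxed (non-float) value: sign 0, exponent 0x7FF, quiet bit 1.
const BOXED_PREFIX: u64 = 0x0FFF;
/// Low 48 bits carry the payload.
const PAYLOAD_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;
/// Top 16 bits for Int, matching the example scheme.
const INT_TAG: u64 = 0x7FF8;
/// Assumed canonical NaN: sign bit set, so it lies outside 0x7FF8..=0x7FFF.
const CANONICAL_NAN: u64 = 0xFFF8_0000_0000_0000;

/// 48-bit signed range: -2^47 ..= 2^47 - 1.
const INT_MIN: i64 = -(1 << 47);
const INT_MAX: i64 = (1 << 47) - 1;

/// True if `bits` is a tagged value rather than a plain double.
fn is_boxed(bits: u64) -> bool {
    bits >> 51 == BOXED_PREFIX
}

fn box_float(f: f64) -> u64 {
    // Any NaN (signaling, quiet, any payload) collapses to one bit pattern.
    if f.is_nan() { CANONICAL_NAN } else { f.to_bits() }
}

fn unbox_float(bits: u64) -> Option<f64> {
    if is_boxed(bits) { None } else { Some(f64::from_bits(bits)) }
}

/// Returns None when `i` does not fit in 48 bits.
fn box_int(i: i64) -> Option<u64> {
    if !(INT_MIN..=INT_MAX).contains(&i) {
        return None;
    }
    Some((INT_TAG << 48) | ((i as u64) & PAYLOAD_MASK))
}

fn unbox_int(bits: u64) -> Option<i64> {
    if bits >> 48 != INT_TAG {
        return None;
    }
    // Move the 48-bit payload to the top, then arithmetic-shift back down
    // so bit 47 is sign-extended into a full i64.
    Some(((bits << 16) as i64) >> 16)
}
```

The `shl`/`ashr` pair in `unbox_int` is the sign-extension step that inline codegen would have to reproduce for every Int operand.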

Trade-offs

Pros:

  • 5x reduction in value size
  • Better cache utilization
  • Faster stack operations
  • More values fit in registers

Cons:

  • Integer range limited to ~51 bits (still huge: ±1 quadrillion)
  • More complex encoding/decoding
  • Floats that happen to be NaN need special handling
  • Larger values (strings, lists) still need heap allocation

Implementation Complexity

This is a significant refactor affecting:

  • StackValue representation
  • All runtime stack operations
  • Codegen for value manipulation
  • Potentially the type system (if we want to support full i64 range)

Alternatives

  • Tagged pointers: Use low bits of pointers for small values
  • Hybrid: Keep current scheme but optimize hot paths
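For comparison, a sketch of the tagged-pointer alternative in its simplest form (the scheme OCaml uses): the low bit is 1 for a 63-bit integer and 0 for a pointer, which works because heap objects are at least 2-byte aligned. Function names are hypothetical. Unlike NaN-boxing, floats get no unboxed representation here and would need heap allocation or an extra tag.

```rust
const SMALL_INT_TAG: u64 = 1;
/// 63-bit signed range that survives the shift by one.
const SMALL_INT_MIN: i64 = -(1 << 62);
const SMALL_INT_MAX: i64 = (1 << 62) - 1;

fn tag_small_int(i: i64) -> Option<u64> {
    if !(SMALL_INT_MIN..=SMALL_INT_MAX).contains(&i) {
        return None;
    }
    Some(((i << 1) as u64) | SMALL_INT_TAG)
}

fn untag_small_int(word: u64) -> Option<i64> {
    if word & 1 != SMALL_INT_TAG {
        return None;
    }
    // Arithmetic shift drops the tag bit and restores the sign.
    Some((word as i64) >> 1)
}

/// Pointers are stored unchanged; an odd address cannot be represented.
fn tag_ptr<T>(p: *const T) -> Option<u64> {
    let addr = p as u64;
    if addr & 1 != 0 { None } else { Some(addr) }
}

fn untag_ptr<T>(word: u64) -> Option<*const T> {
    if word & 1 != 0 { None } else { Some(word as *const T) }
}
```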

Related

Part of performance optimization effort. See benchmarks in benchmarks/compute/.

navicore commented 2026-01-11 00:51:11 +00:00 (Migrated from github.com)

Analysis Complete

I've done a comprehensive codebase analysis for the NaN-boxing migration. Key findings:

Scope

  • ~25+ runtime files with FFI functions and discriminant checks
  • ~8 codegen files with LLVM IR generation
  • Multiple hardcoded size constants (40 bytes appears in ~6 locations)
  • Pointer arithmetic throughout inline codegen

Critical Files

| File | Impact |
|------|--------|
| `runtime/value.rs` | Value enum definition, size assertions |
| `runtime/tagged_stack.rs` | StackValue struct (40 → 8 bytes) |
| `runtime/stack.rs` | Discriminants, conversions |
| `codegen/program.rs` | `%Value` LLVM type declaration |
| `codegen/inline/dispatch.rs` | Hardcoded `mul i64 %n, 40` |

New Complexity

Types currently using multiple 64-bit slots will need heap allocation:

  • Quotation: 2 function pointers → needs QuotationData struct
  • Closure: fn_ptr + env pointer → needs ClosureData struct
  • WeaveCtx: 2 Arc pointers → needs WeaveCtxData struct

Estimated Effort

~2-3 weeks for careful implementation:

  1. Foundation & encoding design (2-3 days)
  2. Runtime dual-mode support (3-5 days)
  3. Codegen updates (3-5 days, highest risk)
  4. FFI boundaries (2-3 days)
  5. Cleanup & benchmarks (1-2 days)

Planning Document

Full analysis with migration strategy added to docs/NANBOX_PLAN.md

Open Questions

  1. Integer overflow policy for values > 48 bits?
  2. Feature flag for gradual rollout?
  3. Arena allocator for new heap-allocated types?
navicore commented 2026-01-11 00:56:50 +00:00 (Migrated from github.com)

Response to "Open Questions"

  1. We need to discuss overflow-policy options; ideally, overflow would be a compile-time failure
  2. I am happy to support a feature flag during this large piece of work
  3. We are committed to the arena allocator architecture already in place - use that
navicore commented 2026-01-11 00:57:53 +00:00 (Migrated from github.com)

@claude can you review the analysis above and comment? Do we have everything covered in our plan in docs/NANBOX_PLAN.md?

claude[bot] commented 2026-01-11 00:58:06 +00:00 (Migrated from github.com)

Claude finished @navicore's task in 4m 46s.


Review of NaN-boxing Analysis

I've completed a thorough review of your analysis and the codebase. Here's my assessment:

Critical Finding

The planning document docs/NANBOX_PLAN.md doesn't exist yet - you mentioned it was "added" in your analysis, but it's not in the repository. I'll review what such a plan should cover based on your analysis and the actual codebase.

Scope Validation

Your scope analysis is accurate and well-researched:

Current Architecture (confirmed):

  • Value enum: 40 bytes with 11 discriminants (crates/runtime/src/value.rs:136-198)
  • StackValue: 40 bytes, 5×u64 slots (crates/runtime/src/tagged_stack.rs:55-66)
  • LLVM IR: %Value = type { i64, i64, i64, i64, i64 } (crates/compiler/src/codegen/program.rs:81)
  • 32 runtime source files with extensive FFI boundaries
  • ~8 codegen files with 118 %Value references and 73 getelementptr %Value operations

Hardcoded Constants (confirmed):

  • mul i64 %{}, 40 at inline/dispatch.rs:1061 (roll operation)
  • Literal 40 appears ~30 times across codebase
  • Size assertions at value.rs:325, tagged_stack.rs:90

Types Requiring Heap Migration (confirmed from value.rs:167-197):

  1. Quotation: 2 function pointers (wrapper + impl_)
  2. Closure: fn_ptr + Arc<[Value]> env pointer
  3. WeaveCtx: 2 Arc pointers (yield_chan + resume_chan)

What the Plan Must Cover

Based on codebase analysis, here's what docs/NANBOX_PLAN.md should include:

1. Encoding Scheme Design ⚠️

Missing from analysis: Specific bit layout decisions needed:

```
NaN-boxing layout (example):
┌──────┬──────────┬───────┬───────┬────────────────────┐
│ 63   │ 62-52    │ 51    │ 50-48 │ 47-0               │
│ sign │ exponent │ quiet │ tag   │ payload            │
├──────┼──────────┼───────┼───────┼────────────────────┤
│ 0    │ 0x7FF    │ 1     │ 0-7   │ 48-bit value       │
└──────┴──────────┴───────┴───────┴────────────────────┘

Tag allocation (top 16 bits of the word):
- Floats: everything outside 0x7FF8-0x7FFF, including ±0 and
  ±Infinity (0x7FF0_0000_0000_0000 / 0xFFF0_0000_0000_0000)
- Canonical NaN: 0xFFF8_0000_0000_0000 (sign bit set, so it stays outside
  the tag range; every NaN produced by float ops is rewritten to it)
- 0x7FF8: Int (48-bit signed, not 51-bit)
- 0x7FF9: Bool (1 bit payload)
- 0x7FFA: Pointer to String
- 0x7FFB: Pointer to Symbol
- 0x7FFC: Pointer to Variant
- 0x7FFD: Pointer to Map
- 0x7FFE: Pointer to QuotationData
- 0x7FFF: Pointer to ClosureData/WeaveCtxData?
```

Critical decisions needed:

  • Integer range: 48-bit (±140 trillion) vs 51-bit (±1 quadrillion)?
  • Overflow policy for compile-time detection (you want compile-time failure)
  • Pointer validation (x86-64 uses 48-bit addresses, ARM64 may differ)
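One way to make the pointer-validation point concrete: check that an address really fits in the 48-bit payload before boxing it, and fail loudly otherwise. This is a sketch using the Pointer tag 0x7FFC from the issue's example; the check would reject addresses from x86-64 with 5-level paging (57-bit addresses) or ARM64 setups that use 52-bit addresses or keep tags in the pointer's top byte (e.g. memory tagging). Function names are hypothetical.

```rust
const PAYLOAD_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;
/// Example Pointer tag from the issue's encoding sketch.
const PTR_TAG: u64 = 0x7FFC;

/// Box a raw pointer, refusing addresses that need more than 48 bits.
fn box_ptr<T>(p: *const T) -> Result<u64, u64> {
    let addr = p as u64;
    if addr & !PAYLOAD_MASK != 0 {
        // Upper 16 bits in use: boxing would clobber the tag.
        return Err(addr);
    }
    Ok((PTR_TAG << 48) | addr)
}

fn unbox_ptr<T>(bits: u64) -> Option<*const T> {
    if bits >> 48 != PTR_TAG {
        return None;
    }
    Some((bits & PAYLOAD_MASK) as *const T)
}
```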

2. New Heap Data Structures

Found in analysis, needs detailed design:

```rust
use std::sync::Arc;

// `Value` and `WeaveChannelData` are the runtime's existing types.

// NEW: Heap-allocated quotation data
struct QuotationData {
    wrapper: usize,    // C-convention entry
    impl_: usize,      // tailcc entry
}

// NEW: Heap-allocated closure data
struct ClosureData {
    fn_ptr: usize,
    env: Arc<[Value]>,
}

// NEW: Heap-allocated weave context
struct WeaveCtxData {
    yield_chan: Arc<WeaveChannelData>,
    resume_chan: Arc<WeaveChannelData>,
}
```

Plan should specify:

  • Arena vs global allocation for each type
  • Refcounting strategy (Arc? Manual refcount?)
  • FFI boundary handling
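For the refcounting question, one option is to keep `Arc` and store the raw pointer from `Arc::into_raw` in the payload, using `Arc::increment_strong_count` for clone and `Arc::from_raw` for drop. The sketch below does that for a simplified ClosureData (env as `Arc<[u64]>` instead of `Arc<[Value]>`) under a hypothetical tag 0x7FFF. It is one possible strategy, not a decision the thread has made; an arena-backed variant would look different.

```rust
use std::sync::Arc;

const PAYLOAD_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;
const CLOSURE_TAG: u64 = 0x7FFF; // hypothetical tag for ClosureData

/// Simplified stand-in: the real env would be Arc<[Value]>.
struct ClosureData {
    fn_ptr: usize,
    env: Arc<[u64]>,
}

/// Move the closure to the heap and keep only its address in the payload.
fn box_closure(data: ClosureData) -> u64 {
    let addr = Arc::into_raw(Arc::new(data)) as u64;
    assert_eq!(addr & !PAYLOAD_MASK, 0, "address does not fit in 48 bits");
    (CLOSURE_TAG << 48) | addr
}

fn closure_ptr(bits: u64) -> *const ClosureData {
    debug_assert_eq!(bits >> 48, CLOSURE_TAG);
    (bits & PAYLOAD_MASK) as *const ClosureData
}

/// Cloning a boxed value copies the word and bumps the strong count.
///
/// # Safety
/// `bits` must come from `box_closure` and must not have been dropped yet.
unsafe fn clone_closure(bits: u64) -> u64 {
    unsafe { Arc::increment_strong_count(closure_ptr(bits)) };
    bits
}

/// Dropping releases one reference; the last one frees ClosureData and its env.
///
/// # Safety
/// Same as `clone_closure`; each word must be dropped exactly once.
unsafe fn drop_closure(bits: u64) {
    unsafe { drop(Arc::from_raw(closure_ptr(bits))) };
}
```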

3. Runtime Migration Strategy

32 runtime files with FFI exports need updates:

High-impact files (from analysis):

  • crates/runtime/src/stack.rs and the rest of the runtime: 656 Value pattern matches across 24 files
  • crates/runtime/src/value.rs: Core enum definition
  • crates/runtime/src/tagged_stack.rs: StackValue struct
  • Discriminant constants: 65 occurrences across 4 files

Suggested phased approach (plan should detail):

  1. Add NaN-box encoding/decoding functions
  2. Introduce #[cfg(feature = "nanbox")] dual-mode support
  3. Update discriminant checks: sv.slot0 == DISC_INT → is_nanbox_int(sv.slot0)
  4. Migrate heap allocators for Quotation/Closure/WeaveCtx
  5. Update cloning/dropping logic (see stack.rs:224-299)
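Step 3 in practice: a discriminant compare becomes a check on the top 16 bits, and a `match` on `slot0` becomes a match on the tag. A sketch assuming the tag allocation example above (Int 0x7FF8, Bool 0x7FF9, the rest pointers); `is_nanbox_int` follows the name used in step 3, while `Decoded` and `decode` are hypothetical.

```rust
const PAYLOAD_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;

#[derive(Debug, PartialEq)]
enum Decoded {
    Float(f64),
    Int(i64),
    Bool(bool),
    Ptr { tag: u16, addr: u64 },
}

/// Replaces `sv.slot0 == DISC_INT`.
fn is_nanbox_int(bits: u64) -> bool {
    bits >> 48 == 0x7FF8
}

/// Replaces `match sv.slot0 { DISC_INT => ..., DISC_BOOL => ..., ... }`.
fn decode(bits: u64) -> Decoded {
    // Top 13 bits 0x0FFF mark a boxed value; anything else is a plain double.
    if bits >> 51 != 0x0FFF {
        return Decoded::Float(f64::from_bits(bits));
    }
    match (bits >> 48) as u16 {
        // Sign-extend the 48-bit payload.
        0x7FF8 => Decoded::Int(((bits << 16) as i64) >> 16),
        0x7FF9 => Decoded::Bool(bits & 1 == 1),
        // 0x7FFA..=0x7FFF: heap pointers (String, Symbol, Variant, Map, ...).
        tag => Decoded::Ptr { tag, addr: bits & PAYLOAD_MASK },
    }
}
```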

4. Codegen Migration Strategy

8 codegen files need updates:

Critical changes:

  • program.rs:81: %Value = type { i64, i64, i64, i64, i64 } → %Value = type { i64 }
  • inline/dispatch.rs:1061: mul i64 %{}, 40 → mul i64 %{}, 8
  • All 73 getelementptr %Value operations: pointer stride changes 40→8
  • Inline int/bool operations: adapt to NaN-boxing bit manipulation

Biggest risk: LLVM IR generation correctness

  • Current inline ops directly manipulate slot1 at offset 8
  • NaN-boxing requires bit masking/shifting for encoding/decoding
  • Type-based optimizations may need rework
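A sketch of what an inline Int add could emit once values are NaN-boxed, written as a Rust helper that formats LLVM IR text the way the codegen does. It assumes the 48-bit Int tag 0x7FF8 and that both operands are already known to be Ints; the helper name and register naming are hypothetical. Overflow past 48 bits wraps silently, since the final `and` discards the high bits.

```rust
const INT_TAG_BITS: u64 = 0x7FF8 << 48;
const PAYLOAD_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;

/// Emit IR that unboxes `%a` and `%b`, adds them, and re-boxes the result as `%out`.
fn emit_nanbox_int_add(a: &str, b: &str, out: &str) -> String {
    let lines = [
        // Sign-extend each 48-bit payload into a full i64.
        format!("%{out}.ahi = shl i64 %{a}, 16"),
        format!("%{out}.a = ashr i64 %{out}.ahi, 16"),
        format!("%{out}.bhi = shl i64 %{b}, 16"),
        format!("%{out}.b = ashr i64 %{out}.bhi, 16"),
        // Plain 64-bit add; bits above 47 are discarded below.
        format!("%{out}.sum = add i64 %{out}.a, %{out}.b"),
        format!("%{out}.payload = and i64 %{out}.sum, {}", PAYLOAD_MASK),
        // Re-apply the Int tag.
        format!("%{out} = or i64 %{out}.payload, {}", INT_TAG_BITS),
    ];
    lines.join("\n")
}
```

Compared with the current layout, where the add reads and writes `slot1` directly, each operand costs two extra shifts and the result two extra bitwise ops.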

5. Testing & Validation

Plan should include:

  • Dual-mode testing: run full test suite with/without nanbox feature
  • Benchmark comparison points (from benchmarks/compute/)
  • Value round-trip tests (encode/decode invariants)
  • Cross-architecture validation (x86-64, ARM64)
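A sketch of the float side of the round-trip tests, assuming the positive quiet-NaN range 0x7FF8-0x7FFF is reserved for tags and NaNs are canonicalized to 0xFFF8_0000_0000_0000 (both assumptions, not settled decisions). The sample at 0x7FF8_0000_0000_0000 is ARM64's default NaN; stored raw, it would look exactly like a tagged value.

```rust
/// Positive quiet NaNs 0x7FF8..=0x7FFF are reserved for tags (top 13 bits = 0x0FFF).
fn looks_boxed(bits: u64) -> bool {
    bits >> 51 == 0x0FFF
}

/// Assumed canonical NaN: sign bit set, so it sits outside the tag range.
const CANONICAL_NAN: u64 = 0xFFF8_0000_0000_0000;

fn box_float(f: f64) -> u64 {
    if f.is_nan() { CANONICAL_NAN } else { f.to_bits() }
}

/// Doubles that stress the encoding.
fn edge_case_floats() -> Vec<f64> {
    vec![
        0.0,
        -0.0,
        f64::INFINITY,
        f64::NEG_INFINITY,
        f64::MAX,
        f64::MIN,
        f64::MIN_POSITIVE,
        f64::from_bits(1),                     // smallest positive subnormal
        f64::NAN,
        f64::from_bits(0x7FF8_0000_0000_0000), // ARM64 default NaN
        f64::from_bits(0x7FFF_FFFF_FFFF_FFFF), // NaN with every payload bit set
    ]
}

/// Boxed floats must never look tagged and must decode to the same float
/// (bit-for-bit, except that a NaN only has to come back as some NaN).
fn check_float_invariants(samples: &[f64]) -> Result<(), String> {
    for &f in samples {
        let bits = box_float(f);
        if looks_boxed(bits) {
            return Err(format!("{:#018x} collides with the tag range", f.to_bits()));
        }
        let back = f64::from_bits(bits);
        let same = if f.is_nan() { back.is_nan() } else { back.to_bits() == f.to_bits() };
        if !same {
            return Err(format!("{:#018x} did not round-trip", f.to_bits()));
        }
    }
    Ok(())
}
```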

6. Migration Phases

Your 2-3 week estimate aligns with scope, but plan should detail:

Phase 1: Foundation (2-3 days)

  • Design encoding scheme (int range, tag allocation)
  • Implement encode/decode primitives
  • Add heap data structures (QuotationData, etc.)
  • Write encoding round-trip tests

Phase 2: Runtime Dual-Mode (3-5 days)

  • Add #[cfg(feature = "nanbox")] scaffolding
  • Update StackValue → 1×u64 (behind feature flag)
  • Migrate discriminant checks to encoding functions
  • Update clone/drop logic for new heap types

Phase 3: Codegen (3-5 days, highest risk)

  • Update %Value LLVM type declaration
  • Fix hardcoded size constants (40→8)
  • Update getelementptr operations
  • Adapt inline operations for NaN-boxing

Phase 4: FFI & Integration (2-3 days)

  • Update FFI boundary conversions
  • Test arena allocator with new heap types
  • Cross-architecture testing

Phase 5: Cleanup (1-2 days)

  • Remove dual-mode scaffolding
  • Update documentation
  • Run full benchmark suite

Gaps & Recommendations

Missing from Your Analysis

  1. Float NaN handling: Current code has Float values - what about actual NaN floats? Need special case?

  2. Type system impact: Does the typechecker need updates? Overflow checks at compile-time?

  3. ABI compatibility: FFI boundaries with C code - does anyone call these functions directly?

  4. Performance assumptions: a 5x reduction in value size will not translate directly into a 5x speedup:

    • Cache benefit depends on working set size
    • Encoding/decoding overhead for int/bool operations
    • Should benchmark current bottlenecks first
  5. Rollback plan: If NaN-boxing causes issues, how to revert? Feature flag helps but needs testing.

Recommendations

  1. Create the planning document first before implementation
  2. Start with benchmarks of current 40-byte system to establish baseline
  3. Prototype encoding scheme in isolation with round-trip tests
  4. Use feature flag throughout migration (not just for rollback)
  5. Consider staged rollout: Int/Float/Bool first, then heap types
  6. Document bit layout prominently (future maintainers need this)

Answers to Your Questions

1. Integer overflow policy

You want compile-time failure. This requires:

  • Typechecker integration to track value ranges
  • Or: Use 48-bit ints, check literals at parse time
  • Or: Accept runtime overflow with wrapping (like current i64)

Recommendation: 48-bit range with compile-time literal checks is most practical.
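A parse-time literal check along those lines, as a sketch: the function name and `String` error type are hypothetical, the literal text is assumed to be plain decimal digits with an optional sign, and the bounds are the 48-bit two's-complement range -2^47 to 2^47 - 1.

```rust
const NANBOX_INT_MIN: i64 = -(1 << 47); // -140_737_488_355_328
const NANBOX_INT_MAX: i64 = (1 << 47) - 1; // 140_737_488_355_327

/// Parse an integer literal and reject values that cannot be NaN-boxed.
fn check_int_literal(text: &str) -> Result<i64, String> {
    // Values beyond i64 fail here already.
    let value: i64 = text
        .parse()
        .map_err(|e| format!("invalid integer literal {text}: {e}"))?;
    if !(NANBOX_INT_MIN..=NANBOX_INT_MAX).contains(&value) {
        return Err(format!(
            "Integer literal {value} exceeds 48-bit NaN-boxing range ({min} to {max})",
            min = NANBOX_INT_MIN,
            max = NANBOX_INT_MAX
        ));
    }
    Ok(value)
}
```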

2. Feature flag

Yes, absolutely use #[cfg(feature = "nanbox")] during migration. Essential for:

  • Testing both modes in parallel
  • Rollback capability
  • Gradual migration across files

3. Arena allocator

Confirmed: Arena exists at crates/runtime/src/arena.rs with bump allocation.

  • Use arena for temporary quotations/closures
  • Use global Arc for long-lived values
  • Plan should specify allocation strategy per type

Summary

Your analysis is thorough and accurate. The scope is well-understood. However:

Covered well: Scope, critical files, heap type migration, effort estimate
⚠️ Needs detail: Encoding scheme, overflow policy, testing strategy, rollback plan
Missing: The actual planning document, performance baseline, ABI compatibility analysis

Next step: Create docs/NANBOX_PLAN.md incorporating:

  1. Specific encoding bit layout (with diagram)
  2. Overflow policy decision
  3. Phased migration plan with feature flags
  4. Testing strategy (dual-mode validation)
  5. Rollback procedures

The 2-3 week estimate is reasonable given the scope, but success depends on careful encoding design and thorough testing at each phase.


navicore commented 2026-01-11 01:02:26 +00:00 (Migrated from github.com)

@claude docs/NANBOX_PLAN.md exists now in main

claude[bot] commented 2026-01-11 01:02:41 +00:00 (Migrated from github.com)

Claude finished @navicore's task in 2m 43s.


Review of docs/NANBOX_PLAN.md

  • Read the docs/NANBOX_PLAN.md document
  • Review against previous analysis and recommendations
  • Provide comprehensive feedback on completeness and accuracy

Summary

The planning document is excellent and comprehensive. It addresses all critical gaps from my previous review and provides a solid foundation for implementation. I found only minor areas for enhancement.

What's Covered Well

1. Encoding Scheme Design

Status: Complete

The document clearly specifies:

  • Bit layout with visual diagram (lines 26-31)
  • Two encoding options (Option A recommended for Phase 1)
  • Specific tag allocation for all 10 value types (lines 49-60)
  • 48-bit integer range decision (~±140 trillion) with rationale

This addresses my previous concern about "missing encoding scheme details."

2. Heap Data Structures

Status: Complete

All three multi-slot types are documented (lines 137-152):

  • QuotationData: 2×u64 (wrapper + impl_)
  • ClosureData: fn_ptr + env pointer
  • WeaveCtxData: 2×Arc pointers

The migration from inline storage to heap allocation is clear.

3. Impact Analysis

Status: Excellent

Thorough file-by-file breakdown:

  • High-impact files with line numbers (table at 78-88)
  • Medium-impact files categorized (lines 90-96)
  • Hardcoded size dependencies with specific locations (lines 98-103)

This matches and extends my previous analysis. The line number references are particularly helpful.

4. Migration Strategy

Status: Complete

Five-phase plan with clear boundaries (lines 105-135):

  • Phase 1: Foundation with edge case handling
  • Phase 2: Feature flag (--features nanbox) for dual-mode support
  • Phase 3: Codegen updates (correctly identified as highest risk)
  • Phase 4: FFI boundaries
  • Phase 5: Cleanup with benchmarks

This addresses my recommendation for "phased migration plan with feature flags."

5. Testing Strategy

Status: Good

Four-tier approach (lines 167-172):

  • Unit tests for encoding round-trips
  • Edge case coverage (MAX_INT, NaN, alignment)
  • Integration tests (all 272+ existing tests)
  • Benchmark comparison

6. Trade-offs Analysis

Status: Complete

Balanced presentation of pros/cons (lines 62-74):

  • Performance benefits quantified (5x reduction, cache utilization)
  • Limitations clearly stated (48-bit ints, NaN handling, heap indirection)
  • Potential regression areas identified (lines 162-165)

7. Open Questions

Status: Addressed

The plan identifies four critical decisions (lines 174-179):

  1. Integer overflow policy - answered by user: compile-time failure preferred
  2. NaN canonicalization - needs decision (store canonical or reserve range)
  3. Quotation allocation - answered by user: use existing arena allocator
  4. Phased rollout - answered: yes, feature flag

Areas for Enhancement

1. Integer Overflow Policy (Partially Resolved)

Current state (line 176): "Error at compile time? Runtime? Silent wrap?"

User's answer: Compile-time failure preferred

Recommendation: Document the implementation approach in the plan:

```markdown
### Integer Overflow Policy (DECISION)

**Chosen approach**: Compile-time error for literals outside the 48-bit range

Implementation:
- Add range check in parser for integer literals
- Error message: "Integer literal {value} exceeds 48-bit NaN-boxing range (-140,737,488,355,328 to 140,737,488,355,327)"
- Runtime arithmetic overflow: wrap silently at 48 bits (as current i64 arithmetic wraps at 64)

Rationale: Catches obvious errors without runtime overhead.
```

2. NaN Canonicalization Strategy

Current state (line 177): Question still open

Recommendation: Choose one approach and document it:

Option A: Reserve NaN Range (recommended)

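  • All float operations that produce NaN → canonicalize to a single quiet-NaN bit pattern outside the tag range (0x7FF8_0000_0000_0000 is bit-identical to Int 0 if Int uses tag 0x7FF8; in that case use a sign-bit-set pattern such as 0xFFF8_0000_0000_0000)
  • Simple check: if f.is_nan() { f = f64::from_bits(CANONICAL_NAN) }
  • Minimal overhead for float-heavy code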

Option B: Store Real NaNs

  • Use tag 0xF for "actual NaN values"
  • More complex, preserves NaN payloads (unlikely to matter)

Suggest adding a decision to the plan.

3. Allocation Strategy Details

Current state (line 178): "Arena allocator? Per-quotation heap alloc?"

User's answer: Use existing arena allocator

Recommendation: Clarify which types use arena vs. Arc:

```markdown
### Allocation Strategy (DECISION)

Use existing arena allocator architecture:

| Type | Allocation Strategy | Rationale |
|------|---------------------|-----------|
| QuotationData | Arena | Temporary, function-scoped |
| ClosureData | Arena + Arc<[Value]> | Closure body in arena, env is Arc |
| WeaveCtxData | Arc | Long-lived, crosses function boundaries |
| String/Symbol | Existing (SeqString) | No change |
| Variant/Map/Channel | Existing (Arc/Box) | No change |

Arena allocator location: `crates/runtime/src/arena.rs`
```

4. Testing Matrix

Enhancement: Add cross-architecture testing plan

The plan mentions ARM64 vs x86-64 pointer differences but doesn't specify testing:

```markdown
### Architecture Testing Matrix

| Architecture | Virtual address bits | Test Focus |
|--------------|----------------------|------------|
| x86-64 | 48-bit canonical (57-bit with 5-level paging) | Primary development platform |
| ARM64 | 48-bit (some use 52-bit) | Pointer encoding compatibility |
| 32-bit (if supported) | 32-bit | Pointers fit easily in the payload, but the boxed value is still a 64-bit word |

**CI requirement**: Test both architectures before merging.
```

5. Rollback Procedure

Missing: What if NaN-boxing causes production issues?

Recommendation: Add a rollback section:

```markdown
### Rollback Plan

If critical bugs are discovered post-merge:

1. **Short-term**: Disable nanbox feature flag in release build
2. **Investigation**: Use feature flag to reproduce issue
3. **Fix or revert**: Either fix the encoding logic or revert the merge

**Safety**: Dual-mode support (Phase 2) must remain until NaN-boxing is battle-tested (at least 2-3 releases).
```

🎯 Critical Success Factors

The plan correctly identifies the highest-risk phase (Phase 3: Codegen, lines 119-123). Key concerns:

  1. LLVM IR Correctness

    • %Value type change propagates to 118+ IR references
    • Pointer arithmetic: getelementptr offsets change from 40→8
    • Inline operations: need NaN-boxing encode/decode in IR
  2. Size Constant Updates

    • The plan lists 4 hardcoded locations (lines 100-103)
    • Verify every occurrence of the literal 40 in the codebase (a search found ~30)
  3. FFI Boundaries

    • 32 runtime files with extern "C" functions
    • Passing a Value by value shrinks from 40 to 8 bytes
    • ABI compatibility if any external C code exists

Recommendation: Add Phase 3 checkpoint - run full test suite after LLVM IR changes before proceeding to Phase 4.

🔍 Cross-Reference Check

I verified the plan against actual codebase:

| Plan Reference | Codebase Location | Status |
|----------------|-------------------|--------|
| Value enum (lines 80-82) | `runtime/value.rs:136-198` | Accurate |
| StackValue (40 bytes) | `tagged_stack.rs:55-66` | Accurate |
| Discriminants (0-10) | `stack.rs` DISC_* constants | Accurate |
| `mul i64 %n, 40` | `inline/dispatch.rs:1061` | Accurate |
| Size assertions | `tagged_stack.rs:90` | Accurate |
| 272+ tests | Repository test suite | Accurate |

All references are correct.
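The `mul i64 %n, 40` site and the size assertion in the table encode the same fact twice. Here is a sketch of keeping a single source of truth, assuming a five-slot `StackValue` like the one shown at the top of this issue and a hypothetical `emit_offset` codegen helper. The multiplier comes from `size_of`, and a compile-time assertion pins the expected size, so the switch to 8 bytes changes one assertion rather than ~30 literals.

```rust
use std::mem::size_of;

/// Stand-in for the current five-slot StackValue described in this issue.
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
struct StackValue {
    slot0: u64,
    slot1: u64,
    slot2: u64,
    slot3: u64,
    slot4: u64,
}

/// Single source of truth for the per-value stride used by codegen.
const VALUE_SIZE: usize = size_of::<StackValue>();

// Compile-time pin on the expected size; this is the one line to change
// (to 8) when the NaN-boxed representation lands.
const _: () = assert!(VALUE_SIZE == 40);

/// Hypothetical codegen helper: emit the byte-offset multiply from the
/// constant instead of a hardcoded literal.
fn emit_offset(dst: &str, index: &str) -> String {
    format!("{} = mul i64 {}, {}", dst, index, VALUE_SIZE)
}
```

Where the IR can use `getelementptr` over the `%Value` type instead of a manual `mul`, LLVM derives the stride from the type's layout, which removes the literal entirely.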

📊 Effort Estimate Review

Planned: 2-3 weeks (lines 183-190)

Assessment: Reasonable, with caveats:

  • Phases 1-2: Conservative (5-8 days) - good safety margin
  • Phase 3: Aggressive (3-5 days) for the highest-risk phase
    • Recommend: Add 1-2 buffer days if IR issues arise
  • Phases 4-5: Reasonable (3-4 days combined)

Total realistic: 2.5-4 weeks with contingency
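As a cross-check on that total, sum the per-phase ranges above, including the suggested Phase 3 buffer. Assuming five working days per week (the plan does not state this), the result is:

$$
\begin{aligned}
\text{low}  &= 5 + (3 + 1) + 3 = 12 \text{ days} \approx 2.4 \text{ weeks} \\
\text{high} &= 8 + (5 + 2) + 4 = 19 \text{ days} \approx 3.8 \text{ weeks}
\end{aligned}
$$

This is roughly the quoted 2.5-4 week range.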

The plan wisely notes "careful implementation with testing" - this is critical.

Overall Assessment

Grade: A

The plan is production-ready with only minor enhancements needed:

Strengths

  1. Comprehensive scope analysis with line numbers
  2. Clear encoding scheme with two options
  3. Phased migration with feature flag support
  4. Identifies highest-risk areas (codegen)
  5. Addresses all gaps from the previous review

Recommended Next Steps

  1. Add the 3 missing decisions (integer overflow, NaN handling, allocation strategy)
  2. Optional: Add rollback procedure and testing matrix
  3. Begin Phase 1: Encoding design + unit tests
  4. Validate assumptions: Run baseline benchmarks before starting

Final Recommendation

Proceed with implementation using this plan. The foundation is solid. The only critical addition is documenting the three open decisions (overflow policy, NaN handling, allocation details) before starting Phase 1.

The user's answers provide clear direction:

  • Integer overflow: compile-time error (needs implementation details)
  • Feature flag: yes (already in plan)
  • Arena allocator: use existing (needs type-specific strategy)

Once these are documented, the plan is complete and ready for execution.
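For the overflow decision, a compile-time check can be as small as validating each integer literal against the signed 48-bit range before codegen. Below is a minimal sketch, assuming the parser hands literals over as text. `check_int_literal` is a hypothetical name and the error wording is illustrative. Note that the range is asymmetric: the smallest accepted literal is -140,737,488,355,328.

```rust
/// Signed 48-bit range available to NaN-boxed integers.
const NANBOX_INT_MIN: i64 = -(1 << 47); // -140,737,488,355,328
const NANBOX_INT_MAX: i64 = (1 << 47) - 1; // 140,737,488,355,327

/// Hypothetical parser-side check: reject integer literals that cannot be
/// NaN-boxed, so the error surfaces at compile time rather than at run time.
fn check_int_literal(text: &str) -> Result<i64, String> {
    // Literals that do not even fit in an i64 are rejected here.
    let value: i64 = text
        .parse()
        .map_err(|e| format!("invalid integer literal {}: {}", text, e))?;
    if (NANBOX_INT_MIN..=NANBOX_INT_MAX).contains(&value) {
        Ok(value)
    } else {
        Err(format!(
            "Integer literal {} exceeds 48-bit NaN-boxing range (±140,737,488,355,327)",
            value
        ))
    }
}
```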


References checked: All file paths, line numbers, and code references verified against current main branch.

Estimated plan completeness: 95% (minor documentation additions recommended)


navicore commented 2026-01-12 01:01:18 +00:00 (Migrated from github.com)

We implemented it, but we no longer have full f64, and the performance benefits are unclear; they seem negligible.

navicore commented 2026-01-12 01:20:31 +00:00 (Migrated from github.com)

On the middle ground: The jump from 8 bytes (nanbox) to 40 bytes is indeed large. Some options that could be explored later:

  • 16-byte tagged union: Discriminant (8 bytes) + payload (8 bytes) covers Int, Float, Bool, and pointers to heap-allocated complex types
  • Pointer tagging: Use the 3 low bits of aligned pointers for small type tags, heap-allocate everything
  • Hybrid: Unboxed primitives (Int/Float/Bool at 16 bytes), indirect for Variant/Closure/Map
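The pointer-tagging option relies on alignment. An object aligned to 8 bytes has an address whose low 3 bits are zero, so those bits can hold one of eight type tags and be masked off before dereferencing. Here is a minimal sketch. `Tagged` is a hypothetical type, and the assertion restricts it to pointee types with at least 8-byte alignment.

```rust
use std::mem::align_of;

/// Low bits freed up by 8-byte alignment.
const TAG_BITS: u32 = 3;
const TAG_MASK: usize = (1 << TAG_BITS) - 1;

/// Hypothetical tagged pointer: address in the high bits, 3-bit type tag in the low bits.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Tagged(usize);

impl Tagged {
    fn new<T>(ptr: *const T, tag: usize) -> Tagged {
        assert!(align_of::<T>() >= 8, "pointee must be at least 8-byte aligned");
        assert!(tag <= TAG_MASK, "tag must fit in 3 bits");
        let addr = ptr as usize;
        debug_assert_eq!(addr & TAG_MASK, 0, "pointer is not 8-byte aligned");
        Tagged(addr | tag)
    }

    fn tag(self) -> usize {
        self.0 & TAG_MASK
    }

    /// Clear the tag bits to recover the original pointer.
    fn ptr<T>(self) -> *const T {
        (self.0 & !TAG_MASK) as *const T
    }
}
```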

The 40-byte size is driven by the largest inline variant (SeqString, WeaveCtx with two Arc pointers). There may be room to shrink by making more types heap-indirect.
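To make the 16-byte option concrete: if every variant's payload is at most one machine word, a Rust enum needs that word plus a discriminant, padded to 16 bytes on a 64-bit target. The sketch below uses a hypothetical, trimmed-down variant set (not the real Value enum in `runtime/value.rs`). It moves a context holding two Arc pointers behind a Box, which is roughly what making more types heap-indirect would look like.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical stand-in for a payload that currently forces large inline
/// values (for example, a context holding two Arc pointers).
#[allow(dead_code)]
struct WeaveCtx {
    left: Arc<String>,
    right: Arc<String>,
}

/// Middle-ground sketch: primitives stay unboxed, and anything wider than
/// one word sits behind a single thin pointer.
#[allow(dead_code)]
enum Value16 {
    Int(i64),
    Float(f64),
    Bool(bool),
    Str(Arc<String>),     // thin pointer: 8 bytes
    Weave(Box<WeaveCtx>), // 16-byte context moved behind a Box
}

// On a 64-bit target: one 8-byte payload word plus the discriminant, padded to 16.
const _: () = assert!(size_of::<Value16>() == 16);
```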

On profiling: This is the key insight. We optimized based on intuition ("smaller values = faster") without data showing:

  • Is stack manipulation actually the bottleneck?
  • Is it memory bandwidth? Cache misses? Function call overhead?
  • Where does fib spend its time vs leibniz_pi?

The 30% fib improvement and 20% leibniz_pi regression suggest different workloads stress different parts of the system.

Next step when revisiting performance: add profiling infrastructure first (flame graphs, cache-miss counters, allocation tracking), then optimize what the data says matters.
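Of the three, allocation tracking needs no external tooling in a Rust runtime. Wrapping the system allocator with `#[global_allocator]` counts every allocation that goes through Rust's global allocator. The sketch below uses hypothetical names (`CountingAlloc`, `measure`). Allocations that bypass the global allocator are not seen, and allocations made concurrently by other threads are included in the counts. Flame graphs and cache-miss counters would typically come from an external profiler instead.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts allocations and requested bytes, forwarding everything to the system allocator.
struct CountingAlloc;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);
static BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Run `f` and report how many allocations and bytes were requested while it ran.
fn measure<R>(f: impl FnOnce() -> R) -> (R, usize, usize) {
    let allocs_before = ALLOCS.load(Ordering::Relaxed);
    let bytes_before = BYTES.load(Ordering::Relaxed);
    let result = f();
    let allocs = ALLOCS.load(Ordering::Relaxed) - allocs_before;
    let bytes = BYTES.load(Ordering::Relaxed) - bytes_before;
    (result, allocs, bytes)
}
```

Wrapping a benchmark body such as fib or leibniz_pi in `measure` would show whether one of them is allocation-heavy before deeper profiling begins.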
