Optimization: Use NaN-boxing to reduce Value size from 40 to 8 bytes #188
Reference: navicore/patch-seq#188
Summary
Implement NaN-boxing to pack values into 8 bytes instead of the current 40-byte tagged union, reducing memory bandwidth and improving cache utilization.
Current Behavior
Values use a 40-byte struct:
This means:
Proposed Optimization
Use NaN-boxing to encode values in 8 bytes:
Encoding Scheme (example)
Trade-offs
Pros:
Cons:
Implementation Complexity
This is a significant refactor affecting:
- `StackValue` representation

Alternatives
References
Related
Part of performance optimization effort. See benchmarks in `benchmarks/compute/`.

Analysis Complete
I've done a comprehensive codebase analysis for the NaN-boxing migration. Key findings:
Scope
Critical Files
- `runtime/value.rs`
- `runtime/tagged_stack.rs`
- `runtime/stack.rs`
- `codegen/program.rs` (`%Value` LLVM type declaration)
- `codegen/inline/dispatch.rs` (`mul i64 %n, 40`)

New Complexity
Types currently using multiple 64-bit slots will need heap allocation:
- `QuotationData` struct
- `ClosureData` struct
- `WeaveCtxData` struct

Estimated Effort
~2-3 weeks for careful implementation:
Planning Document
Full analysis with migration strategy added to `docs/NANBOX_PLAN.md`.

Open Questions
Response to "Open Questions"
@claude can you review the analysis above and comment? do we have things covered in our plan in docs/NANBOX_PLAN.md?
Claude finished @navicore's task in 4m 46s
Review of NaN-boxing Analysis
I've completed a thorough review of your analysis and the codebase. Here's my assessment:
Critical Finding
The planning document `docs/NANBOX_PLAN.md` doesn't exist yet - you mentioned it was "added" in your analysis, but it's not in the repository. I'll review what such a plan should cover based on your analysis and the actual codebase.

Scope Validation ✅
Your scope analysis is accurate and well-researched:
Current Architecture (confirmed):
- `crates/runtime/src/value.rs:136-198`
- `crates/runtime/src/tagged_stack.rs:55-66`
- `%Value = type { i64, i64, i64, i64, i64 }` (`crates/compiler/src/codegen/program.rs:81`)
- `%Value` references and 73 `getelementptr %Value` operations

Hardcoded Constants (confirmed):
- `mul i64 %{}, 40` at `inline/dispatch.rs:1061` (roll operation)
- `40` appears ~30 times across codebase, e.g. `value.rs:325`, `tagged_stack.rs:90`

Types Requiring Heap Migration (confirmed from `value.rs:167-197`)

What the Plan Must Cover
Based on codebase analysis, here's what `docs/NANBOX_PLAN.md` should include:

1. Encoding Scheme Design ⚠️
Missing from analysis: Specific bit layout decisions needed:
Critical decisions needed:
2. New Heap Data Structures
Found in analysis, needs detailed design:
Plan should specify:
3. Runtime Migration Strategy
32 runtime files with FFI exports need updates:
High-impact files (from analysis):
- `crates/runtime/src/stack.rs`: 656 Value pattern matches across 24 files
- `crates/runtime/src/value.rs`: core enum definition
- `crates/runtime/src/tagged_stack.rs`: `StackValue` struct

Suggested phased approach (plan should detail):
- `#[cfg(feature = "nanbox")]` dual-mode support
- `sv.slot0 == DISC_INT` → `is_nanbox_int(sv.slot0)` (`stack.rs:224-299`)

4. Codegen Migration Strategy
8 codegen files need updates:
Critical changes:
- `program.rs:81`: `%Value = type { i64, i64, i64, i64, i64 }` → `%Value = type { i64 }`
- `inline/dispatch.rs:1061`: `mul i64 %{}, 40` → `mul i64 %{}, 8`
- `getelementptr %Value` operations: pointer stride changes 40→8

Biggest risk: LLVM IR generation correctness
5. Testing & Validation
Plan should include:
- Benchmarks in `benchmarks/compute/`

6. Migration Phases
Your 2-3 week estimate aligns with scope, but plan should detail:
Phase 1: Foundation (2-3 days)
Phase 2: Runtime Dual-Mode (3-5 days)
- `#[cfg(feature = "nanbox")]` scaffolding

Phase 3: Codegen (3-5 days, highest risk)
- `%Value` LLVM type declaration

Phase 4: FFI & Integration (2-3 days)
Phase 5: Cleanup (1-2 days)
Gaps & Recommendations
Missing from Your Analysis
Float NaN handling: Current code has Float values - what about actual NaN floats? Need special case?
Type system impact: Does the typechecker need updates? Overflow checks at compile-time?
ABI compatibility: FFI boundaries with C code - does anyone call these functions directly?
Performance assumptions: You assume 5x speedup, but:
Rollback plan: If NaN-boxing causes issues, how to revert? Feature flag helps but needs testing.
Recommendations
Answers to Your Questions
1. Integer overflow policy
You want compile-time failure. This requires:
Recommendation: a 48-bit range with compile-time literal checks is the most practical option.
2. Feature flag
Yes, absolutely use `#[cfg(feature = "nanbox")]` during migration. Essential for:

3. Arena allocator

Confirmed: Arena exists at `crates/runtime/src/arena.rs` with bump allocation.

Summary
Your analysis is thorough and accurate. The scope is well-understood. However:
✅ Covered well: Scope, critical files, heap type migration, effort estimate
⚠️ Needs detail: Encoding scheme, overflow policy, testing strategy, rollback plan
❌ Missing: The actual planning document, performance baseline, ABI compatibility analysis
Next step: Create `docs/NANBOX_PLAN.md` incorporating:

The 2-3 week estimate is reasonable given the scope, but success depends on careful encoding design and thorough testing at each phase.
@claude docs/NANBOX_PLAN.md exists now in main
Claude finished @navicore's task in 2m 43s
Review of `docs/NANBOX_PLAN.md`

Summary
The planning document is excellent and comprehensive. It addresses all critical gaps from my previous review and provides a solid foundation for implementation. I found only minor areas for enhancement.
✅ What's Covered Well
1. Encoding Scheme Design
Status: Complete
The document clearly specifies:
This addresses my previous concern about "missing encoding scheme details."
2. Heap Data Structures
Status: Complete
All three multi-slot types are documented (lines 137-152):
- `QuotationData`: 2×u64 (wrapper + impl_)
- `ClosureData`: fn_ptr + env pointer
- `WeaveCtxData`: 2×Arc pointers

The migration from inline storage to heap allocation is clear.
3. Impact Analysis
Status: Excellent
Thorough file-by-file breakdown:
This matches and extends my previous analysis. The line number references are particularly helpful.
4. Migration Strategy
Status: Complete
Five-phase plan with clear boundaries (lines 105-135):
- Feature flag (`--features nanbox`) for dual-mode support

This addresses my recommendation for "phased migration plan with feature flags."
5. Testing Strategy
Status: Good
Four-tier approach (lines 167-172):
6. Trade-offs Analysis
Status: Complete
Balanced presentation of pros/cons (lines 62-74):
7. Open Questions
Status: Addressed
The plan identifies four critical decisions (lines 174-179):
⚠️ Minor Enhancements Recommended
1. Integer Overflow Policy (Partially Resolved)
Current state (line 176): "Error at compile time? Runtime? Silent wrap?"
User's answer: Compile-time failure preferred
Recommendation: Document the implementation approach in the plan:
2. NaN Canonicalization Strategy
Current state (line 177): Question still open
Recommendation: Choose one approach and document it:
Option A: Reserve NaN Range (recommended)
- Canonicalize to `0x7FF8_0000_0000_0000`: `if is_nan(f) { f = CANONICAL_NAN }`

Option B: Store Real NaNs

- Reserve a tag (e.g. `0xF`) for "actual NaN values"

Suggest adding a decision to the plan.
3. Allocation Strategy Details
Current state (line 178): "Arena allocator? Per-quotation heap alloc?"
User's answer: Use existing arena allocator
Recommendation: Clarify which types use arena vs. Arc:
4. Testing Matrix
Enhancement: Add cross-architecture testing plan
The plan mentions ARM64 vs x86-64 pointer differences but doesn't specify testing:
5. Rollback Procedure
Missing: What if NaN-boxing causes production issues?
Recommendation: Add a rollback section:
🎯 Critical Success Factors
The plan correctly identifies the highest-risk phase (Phase 3: Codegen, lines 119-123). Key concerns:
LLVM IR Correctness
- `%Value` type change propagates to 118+ IR references
- `getelementptr` offsets change from 40→8

Size Constant Updates

- `40` literals in codebase (search revealed ~30 occurrences)

FFI Boundaries
- `extern "C"` functions

Recommendation: Add Phase 3 checkpoint - run full test suite after LLVM IR changes before proceeding to Phase 4.
🔍 Cross-Reference Check
I verified the plan against actual codebase:
- `runtime/value.rs:136-198`
- `tagged_stack.rs:55-66`
- `stack.rs` `DISC_*` constants
- `mul i64 %n, 40` at `inline/dispatch.rs:1061`
- `tagged_stack.rs:90`

All references are correct.
📊 Effort Estimate Review
Planned: 2-3 weeks (lines 183-190)
Assessment: Reasonable, with caveats:
Total realistic: 2.5-4 weeks with contingency
The plan wisely notes "careful implementation with testing" - this is critical.
✨ Overall Assessment
Grade: A
The plan is production-ready with only minor enhancements needed:
Strengths
Recommended Next Steps
Final Recommendation
Proceed with implementation using this plan. The foundation is solid. The only critical addition is documenting the three open decisions (overflow policy, NaN handling, allocation details) before starting Phase 1.
The user's answers provide clear direction:
Once these are documented, the plan is complete and ready for execution.
References checked: All file paths, line numbers, and code references verified against the current `main` branch.

Estimated plan completeness: 95% (minor documentation additions recommended)
We implemented it, but we don't have full f64 anymore, and the performance benefits are not totally clear - they seem negligible.
On the middle ground: The jump from 8 bytes (nanbox) to 40 bytes is indeed large. Some options that could be explored later:
The 40-byte size is driven by the largest inline variant (SeqString, WeaveCtx with two Arc pointers). There may be room to shrink by making more types heap-indirect.
On profiling: This is the key insight. We optimized based on intuition ("smaller values = faster") without data showing:
The 30% fib improvement and 20% leibniz_pi regression suggest different workloads stress different parts of the system.
Next step when revisiting performance: Add profiling infrastructure first. Flame graphs, cache miss counters, allocation tracking. Then optimize what the data says matters.