dynamic variant construction #168

Merged
navicore merged 2 commits from i-149 into main 2026-01-03 15:19:15 +00:00
navicore commented 2026-01-03 14:24:38 +00:00 (Migrated from github.com)

⏺ Issue #149 (Dynamic variant construction - wrap) is now complete. Here's what was done:

Summary

Changed variant tags from integer (u32) to symbol (SeqString) throughout the codebase:

Core Changes

  1. value.rs: Changed VariantData.tag from u32 to SeqString
  2. variant_ops.rs: Updated variant.tag to return Symbol, added wrap-0 through wrap-4, added symbol.= comparison
  3. codegen.rs: Pattern matching now uses cascading if-else with string comparison instead of integer switch
  4. ast.rs: Constructor generation pushes Symbol and calls variant.make-N

Stdlib Updates

  • json.seq: Updated json-serialize to use symbol.= with :JsonNull, :JsonBool, etc.
  • yaml.seq: Updated yaml-serialize to use symbol.= comparisons
  • result.seq: All predicates updated to use symbol comparison

Examples Updated

  • examples/lisp/ - All sexpr predicates now use symbol.=
  • examples/json/json_tree.seq - Works with symbol tags

New Features

  • wrap-0 through wrap-4: Dynamic variant construction: ( field1...fieldN Symbol -- Variant )
  • symbol.=: Compare two symbols for equality: ( Symbol Symbol -- Bool )

All 291 runtime tests and 231 compiler tests pass, plus all examples compile and run correctly.

⏺ Issue #149 (Dynamic variant construction - wrap) is now complete. Here's what was done: Summary Changed variant tags from integer (u32) to symbol (SeqString) throughout the codebase: Core Changes 1. value.rs: Changed VariantData.tag from u32 to SeqString 2. variant_ops.rs: Updated variant.tag to return Symbol, added wrap-0 through wrap-4, added symbol.= comparison 3. codegen.rs: Pattern matching now uses cascading if-else with string comparison instead of integer switch 4. ast.rs: Constructor generation pushes Symbol and calls variant.make-N Stdlib Updates - json.seq: Updated json-serialize to use symbol.= with :JsonNull, :JsonBool, etc. - yaml.seq: Updated yaml-serialize to use symbol.= comparisons - result.seq: All predicates updated to use symbol comparison Examples Updated - examples/lisp/ - All sexpr predicates now use symbol.= - examples/json/json_tree.seq - Works with symbol tags New Features - wrap-0 through wrap-4: Dynamic variant construction: ( field1...fieldN Symbol -- Variant ) - symbol.=: Compare two symbols for equality: ( Symbol Symbol -- Bool ) All 291 runtime tests and 231 compiler tests pass, plus all examples compile and run correctly.
claude[bot] commented 2026-01-03 14:26:36 +00:00 (Migrated from github.com)

Code Review: PR #168 - Dynamic Variant Construction

Summary

This PR successfully implements dynamic variant construction by changing variant tags from integer (u32) to symbol (SeqString). This is a foundational change for SON (Seq Object Notation) support and enables runtime variant construction via the new wrap-0 through wrap-4 functions.

Positive Aspects

1. Comprehensive Implementation

  • All core changes are coordinated across compiler, runtime, and stdlib
  • Test coverage confirms 291 runtime tests + 231 compiler tests pass
  • Examples updated consistently (lisp, json, yaml)
  • Well-documented in SON_DESIGN.md

2. Backward Compatibility

  • Changes are internal to the implementation
  • Generated constructors (Make-*) hide the complexity
  • Existing code continues working with new symbol-based tags

3. Good Code Quality

  • Clear comments explaining design decisions (e.g., crates/runtime/src/variant_ops.rs:51-62)
  • Proper safety documentation on unsafe functions
  • Consistent naming conventions (wrap-N, symbol.=)

Issues & Concerns

1. Performance Regression ⚠️ CRITICAL

Location: crates/compiler/src/codegen.rs:4588-4670

The pattern matching implementation changed from O(1) integer switch to O(n) cascading if-else with string comparisons:

// OLD: O(1) switch on integer tag
switch i64 %tag, label %unreachable [
  i64 0, label %arm0
  i64 1, label %arm1
]

// NEW: O(n) cascading if-else with string comparisons
call @patch_seq_symbol_eq_cstr(stack, "VariantName")
peek bool
br i1 result, label %arm, label %next_check

Impact:

  • Match expressions with N arms now require N-1 string comparisons in worst case
  • Each comparison involves: dup, symbol_eq_cstr, peek, pop, branch
  • Performance degrades linearly with number of variants

Recommendations:

  1. Immediate: Document this performance tradeoff in commit message/docs
  2. Future optimization: Consider interning symbols for O(1) pointer comparison
  3. Future optimization: Generate a perfect hash or trie for larger match expressions
  4. Benchmark: Add performance tests for match expressions with many arms

2. Documentation Out of Sync ⚠️ MEDIUM

Location: docs/ARCHITECTURE.md:80-82

The documentation still shows the old u32 tag:

pub struct VariantData {
    pub tag: u32,  // ❌ Should be SeqString
    pub fields: Box<[Value]>,
}

Action Required: Update ARCHITECTURE.md to reflect the new symbol-based tag system.

3. Missing Symbol Interning 💡 ENHANCEMENT

Location: crates/runtime/src/value.rs:150-154

The comment explicitly states symbols are NOT interned:

/// Note: Currently NOT interned (each symbol allocates). Interning may be
/// added in the future for O(1) equality comparison.

Impact:

  • Every symbol allocation creates a new SeqString
  • Symbol equality requires full string comparison
  • Memory overhead for duplicate symbol names

Recommendation: Consider implementing symbol interning in a follow-up PR to:

  • Enable O(1) pointer equality for symbols
  • Reduce memory usage
  • Improve pattern matching performance

4. Potential Optimization in Match Codegen 💡 ENHANCEMENT

Location: crates/compiler/src/codegen.rs:4608-4629

The current implementation duplicates the tag for all but the last comparison:

let compare_stack = if !is_last {
    let dup = self.fresh_temp();
    writeln!(dup);  // Dup for each comparison
    dup
} else {
    current_tag_stack.clone()
};

Recommendation: For matches with many arms, consider extracting the tag symbol into a local LLVM variable once, then comparing against string constants without stack operations.

5. Test Coverage Gaps ⚠️ MEDIUM

While existing tests pass, consider adding:

  • Performance tests: Benchmark match expressions with 10+ arms
  • Symbol interning tests: Verify symbol identity semantics
  • Edge cases: Empty variant names, unicode in symbols, very long symbol names
  • SON round-trip tests: Serialize/deserialize with wrap-N functions

6. Security Consideration 🔒 LOW

Location: crates/runtime/src/variant_ops.rs:73-75

The patch_seq_symbol_eq_cstr function uses expect() on UTF-8 validation:

.expect("Invalid UTF-8 in variant name");

Current Status: Safe because variant names come from compiler-generated code.

Future Risk: If SON parsing or user-defined variant names are added, this could become a crash vector.

Recommendation: Document that variant names must be valid UTF-8 identifiers, or add validation at the source.

Code Quality Issues

7. Duplicate Function Declarations 📋 MINOR

Location: crates/compiler/src/codegen.rs:819 and 1200

The runtime function declarations are duplicated between generate_module() and generate_standalone_function().

Recommendation: Extract to a shared function to avoid maintenance burden.

8. Magic Numbers 📋 MINOR

Location: Various places use field counts (0-4) as magic numbers

Recommendation: Consider defining constants like:

const MAX_VARIANT_FIELDS: usize = 4;

Best Practices

Excellent

  • FFI safety documentation
  • Type-safe variant constructors
  • Cascading if-else correctly handles the last arm without duplication
  • Tests verify end-to-end functionality

Good

  • Symbol type added to Value enum with clear documentation
  • Builtins properly registered with type signatures
  • Examples updated to demonstrate usage patterns

Security Assessment

Overall: LOW RISK

  • No obvious security vulnerabilities
  • String comparisons are safe (no buffer overflows)
  • UTF-8 validation uses safe Rust APIs
  • Panic conditions are documented and expected only on programmer error

Note: Future SON deserialization should validate symbol names before use.

Performance Summary

Aspect Before After Impact
Variant tag storage 4 bytes (u32) 24+ bytes (SeqString) Memory: +20 bytes/variant
Pattern match dispatch O(1) switch O(n) string comparisons Time: Linear degradation
Constructor generation Push int + make-N Push symbol + make-N Minimal
Symbol equality N/A O(m) string comparison Time: Length-dependent

Recommendations Summary

Must Fix Before Merge

  1. Update ARCHITECTURE.md to reflect SeqString tags

Should Address Soon

  1. ⚠️ Document performance tradeoffs in design docs
  2. ⚠️ Add performance benchmarks for match expressions
  3. ⚠️ Add test coverage for edge cases and SON scenarios

Future Enhancements

  1. 💡 Implement symbol interning for O(1) equality
  2. 💡 Optimize match codegen for large unions (hash-based dispatch)
  3. 💡 Extract duplicate runtime declarations
  4. 💡 Add constants for magic numbers

Conclusion

This is a well-executed foundational change that enables important future features (SON, dynamic variant construction). The code quality is high, tests pass comprehensively, and the design is sound.

Primary concern: The O(n) pattern matching performance regression should be documented and eventually optimized via symbol interning or improved codegen.

Recommendation: APPROVE with minor documentation fixes

The performance tradeoff is acceptable for the initial implementation, especially since:

  • Most real-world unions have few variants (< 10)
  • Symbol interning can restore O(1) performance later
  • The feature enables critical SON functionality

Great work on maintaining test coverage and updating all affected code paths! 🚀


Review performed by Claude Code - Automated PR review requested by user

# Code Review: PR #168 - Dynamic Variant Construction ## Summary This PR successfully implements dynamic variant construction by changing variant tags from integer (`u32`) to symbol (`SeqString`). This is a foundational change for SON (Seq Object Notation) support and enables runtime variant construction via the new `wrap-0` through `wrap-4` functions. ## Positive Aspects ### 1. **Comprehensive Implementation** ✅ - All core changes are coordinated across compiler, runtime, and stdlib - Test coverage confirms 291 runtime tests + 231 compiler tests pass - Examples updated consistently (lisp, json, yaml) - Well-documented in SON_DESIGN.md ### 2. **Backward Compatibility** ✅ - Changes are internal to the implementation - Generated constructors (Make-*) hide the complexity - Existing code continues working with new symbol-based tags ### 3. **Good Code Quality** ✅ - Clear comments explaining design decisions (e.g., crates/runtime/src/variant_ops.rs:51-62) - Proper safety documentation on unsafe functions - Consistent naming conventions (wrap-N, symbol.=) ## Issues & Concerns ### 1. **Performance Regression** ⚠️ CRITICAL **Location:** crates/compiler/src/codegen.rs:4588-4670 The pattern matching implementation changed from O(1) integer switch to O(n) cascading if-else with string comparisons: ```llvm // OLD: O(1) switch on integer tag switch i64 %tag, label %unreachable [ i64 0, label %arm0 i64 1, label %arm1 ] // NEW: O(n) cascading if-else with string comparisons call @patch_seq_symbol_eq_cstr(stack, "VariantName") peek bool br i1 result, label %arm, label %next_check ``` **Impact:** - Match expressions with N arms now require N-1 string comparisons in worst case - Each comparison involves: dup, symbol_eq_cstr, peek, pop, branch - Performance degrades linearly with number of variants **Recommendations:** 1. **Immediate:** Document this performance tradeoff in commit message/docs 2. **Future optimization:** Consider interning symbols for O(1) pointer comparison 3. **Future optimization:** Generate a perfect hash or trie for larger match expressions 4. **Benchmark:** Add performance tests for match expressions with many arms ### 2. **Documentation Out of Sync** ⚠️ MEDIUM **Location:** docs/ARCHITECTURE.md:80-82 The documentation still shows the old `u32` tag: ```rust pub struct VariantData { pub tag: u32, // ❌ Should be SeqString pub fields: Box<[Value]>, } ``` **Action Required:** Update ARCHITECTURE.md to reflect the new symbol-based tag system. ### 3. **Missing Symbol Interning** 💡 ENHANCEMENT **Location:** crates/runtime/src/value.rs:150-154 The comment explicitly states symbols are NOT interned: ```rust /// Note: Currently NOT interned (each symbol allocates). Interning may be /// added in the future for O(1) equality comparison. ``` **Impact:** - Every symbol allocation creates a new SeqString - Symbol equality requires full string comparison - Memory overhead for duplicate symbol names **Recommendation:** Consider implementing symbol interning in a follow-up PR to: - Enable O(1) pointer equality for symbols - Reduce memory usage - Improve pattern matching performance ### 4. **Potential Optimization in Match Codegen** 💡 ENHANCEMENT **Location:** crates/compiler/src/codegen.rs:4608-4629 The current implementation duplicates the tag for all but the last comparison: ```rust let compare_stack = if !is_last { let dup = self.fresh_temp(); writeln!(dup); // Dup for each comparison dup } else { current_tag_stack.clone() }; ``` **Recommendation:** For matches with many arms, consider extracting the tag symbol into a local LLVM variable once, then comparing against string constants without stack operations. ### 5. **Test Coverage Gaps** ⚠️ MEDIUM While existing tests pass, consider adding: - **Performance tests:** Benchmark match expressions with 10+ arms - **Symbol interning tests:** Verify symbol identity semantics - **Edge cases:** Empty variant names, unicode in symbols, very long symbol names - **SON round-trip tests:** Serialize/deserialize with wrap-N functions ### 6. **Security Consideration** 🔒 LOW **Location:** crates/runtime/src/variant_ops.rs:73-75 The `patch_seq_symbol_eq_cstr` function uses `expect()` on UTF-8 validation: ```rust .expect("Invalid UTF-8 in variant name"); ``` **Current Status:** Safe because variant names come from compiler-generated code. **Future Risk:** If SON parsing or user-defined variant names are added, this could become a crash vector. **Recommendation:** Document that variant names must be valid UTF-8 identifiers, or add validation at the source. ## Code Quality Issues ### 7. **Duplicate Function Declarations** 📋 MINOR **Location:** crates/compiler/src/codegen.rs:819 and 1200 The runtime function declarations are duplicated between `generate_module()` and `generate_standalone_function()`. **Recommendation:** Extract to a shared function to avoid maintenance burden. ### 8. **Magic Numbers** 📋 MINOR **Location:** Various places use field counts (0-4) as magic numbers **Recommendation:** Consider defining constants like: ```rust const MAX_VARIANT_FIELDS: usize = 4; ``` ## Best Practices ### ✅ Excellent - FFI safety documentation - Type-safe variant constructors - Cascading if-else correctly handles the last arm without duplication - Tests verify end-to-end functionality ### ✅ Good - Symbol type added to Value enum with clear documentation - Builtins properly registered with type signatures - Examples updated to demonstrate usage patterns ## Security Assessment **Overall: LOW RISK** - No obvious security vulnerabilities - String comparisons are safe (no buffer overflows) - UTF-8 validation uses safe Rust APIs - Panic conditions are documented and expected only on programmer error **Note:** Future SON deserialization should validate symbol names before use. ## Performance Summary | Aspect | Before | After | Impact | |--------|--------|-------|--------| | Variant tag storage | 4 bytes (u32) | 24+ bytes (SeqString) | Memory: +20 bytes/variant | | Pattern match dispatch | O(1) switch | O(n) string comparisons | Time: Linear degradation | | Constructor generation | Push int + make-N | Push symbol + make-N | Minimal | | Symbol equality | N/A | O(m) string comparison | Time: Length-dependent | ## Recommendations Summary ### Must Fix Before Merge 1. ❌ **Update ARCHITECTURE.md** to reflect SeqString tags ### Should Address Soon 2. ⚠️ **Document performance tradeoffs** in design docs 3. ⚠️ **Add performance benchmarks** for match expressions 4. ⚠️ **Add test coverage** for edge cases and SON scenarios ### Future Enhancements 5. 💡 Implement symbol interning for O(1) equality 6. 💡 Optimize match codegen for large unions (hash-based dispatch) 7. 💡 Extract duplicate runtime declarations 8. 💡 Add constants for magic numbers ## Conclusion This is a **well-executed foundational change** that enables important future features (SON, dynamic variant construction). The code quality is high, tests pass comprehensively, and the design is sound. **Primary concern:** The O(n) pattern matching performance regression should be documented and eventually optimized via symbol interning or improved codegen. **Recommendation:** ✅ **APPROVE with minor documentation fixes** The performance tradeoff is acceptable for the initial implementation, especially since: - Most real-world unions have few variants (< 10) - Symbol interning can restore O(1) performance later - The feature enables critical SON functionality Great work on maintaining test coverage and updating all affected code paths! 🚀 --- *Review performed by Claude Code - Automated PR review requested by user*
claude[bot] commented 2026-01-03 15:09:54 +00:00 (Migrated from github.com)

Pull Request Review: Dynamic Variant Construction (Issue #149)

Summary

This PR successfully implements dynamic variant construction by migrating from integer-based tags to symbol-based tags throughout the codebase. The implementation is well-executed and all 522 tests (291 runtime + 231 compiler) pass. This is a foundational change that enables SON (Seq Object Notation) support.

Strengths

1. Excellent Architectural Decision

The switch from u32 tags to SeqString (Symbol) tags is the right approach:

  • Enables self-describing data structures (SON)
  • More debuggable (:JsonNull vs 0)
  • Prepares for future serialization features

2. Comprehensive Migration

All layers properly updated: runtime, compiler, stdlib, examples, and docs.

3. New Features Well-Implemented

  • wrap-0 through wrap-4: Clean aliases for dynamic construction
  • symbol.=: Proper equality comparison for symbols
  • Pattern matching preserved with symbolic tags

4. Test Coverage

All existing tests pass (291 runtime + 231 compiler tests).

⚠️ Issues & Concerns

1. Performance Regression in Pattern Matching (CRITICAL)

Issue: The new codegen strategy in codegen_match_statement (crates/compiler/src/codegen.rs:4550) has a performance problem.

Old approach (integer tags): O(1) or O(log n) via jump table

New approach (symbol tags): Cascading if-else with string comparison

  • O(n) linear scan through variants
  • O(m) string comparison per variant
  • Overall: O(n*m) complexity

Impact: Large enums (10+ variants) will suffer, hot path pattern matching becomes bottleneck.

Recommendation: Implement symbol interning with integer IDs

  • Add tag_id: u32 for fast comparison alongside tag_name: SeqString
  • Use compile-time interning: :JsonNull → ID 0, :JsonBool → ID 1
  • Use integer switch for pattern matching (O(1))
  • Keep tag_name for SON serialization

2. Missing peek_bool_value Implementation Check

Location: crates/compiler/src/codegen.rs:869

The function patch_seq_peek_bool_value is declared but needs verification it exists in the runtime.

3. Memory Overhead

VariantData now includes heap-allocated SeqString (~24 bytes per variant). Acceptable for SON benefits, but consider string interning for optimization.

4. Test Gap: No Tests for wrap-N Functions

Add compiler tests for wrap-0 through wrap-4 to verify correct behavior.

5. Potential Edge Case: Duplicate Variant Names

What happens if two unions have the same variant name (e.g., both Result and HttpResponse have :Ok)? Add test case.

🔍 Code Quality Observations

Positive:

  • Consistent error messages
  • Clear documentation of stack effects
  • Proper safety annotations

Minor Issues:

  • Inconsistent tag naming in string_split ("List" vs :List convention)
  • Outdated comments in codegen from integer-based implementation

🛡️ Security Considerations

Good: No unsafe memory operations beyond FFI, stack assertions, bounds checking
⚠️ Watch: String comparison in hot path could be DoS vector with large enums

📊 Performance Benchmarks

Please run benchmarks comparing:

  1. Pattern matching performance (before/after)
  2. JSON parsing (heavy variant creation)
  3. Memory usage for large JSON structures

🎯 Recommendations Summary

MUST FIX (before merge):

  1. Performance: Implement symbol interning to restore O(1) pattern matching
  2. Correctness: Verify patch_seq_peek_bool_value exists

SHOULD FIX:

  1. Add test coverage for wrap-N functions
  2. Benchmark performance impact
  3. Clarify tag naming consistency

NICE TO HAVE:

  1. Update outdated comments
  2. Test duplicate variant names across unions

Final Verdict

Status: ⚠️ Approve with Required Changes

This PR is architecturally sound but the pattern matching performance regression is a blocker.

Suggested Action:

  1. Implement symbol interning (dual tag_id + tag_name)
  2. Restore O(1) pattern matching via integer switch
  3. Keep symbol names for SON serialization
  4. Verify all FFI functions exist
  5. Then merge

Great work on a complex migration! Once the performance issue is resolved, this will be a solid foundation for SON.


Reviewed by: Claude (Anthropic)
Review Date: 2026-01-03

# Pull Request Review: Dynamic Variant Construction (Issue #149) ## Summary This PR successfully implements dynamic variant construction by migrating from integer-based tags to symbol-based tags throughout the codebase. The implementation is well-executed and all 522 tests (291 runtime + 231 compiler) pass. This is a foundational change that enables SON (Seq Object Notation) support. ## ✅ Strengths ### 1. Excellent Architectural Decision The switch from u32 tags to SeqString (Symbol) tags is the right approach: - Enables self-describing data structures (SON) - More debuggable (:JsonNull vs 0) - Prepares for future serialization features ### 2. Comprehensive Migration All layers properly updated: runtime, compiler, stdlib, examples, and docs. ### 3. New Features Well-Implemented - wrap-0 through wrap-4: Clean aliases for dynamic construction - symbol.=: Proper equality comparison for symbols - Pattern matching preserved with symbolic tags ### 4. Test Coverage All existing tests pass (291 runtime + 231 compiler tests). ## ⚠️ Issues & Concerns ### 1. Performance Regression in Pattern Matching (CRITICAL) **Issue**: The new codegen strategy in codegen_match_statement (crates/compiler/src/codegen.rs:4550) has a performance problem. **Old approach** (integer tags): O(1) or O(log n) via jump table **New approach** (symbol tags): Cascading if-else with string comparison - O(n) linear scan through variants - O(m) string comparison per variant - Overall: O(n*m) complexity **Impact**: Large enums (10+ variants) will suffer, hot path pattern matching becomes bottleneck. **Recommendation**: Implement symbol interning with integer IDs - Add tag_id: u32 for fast comparison alongside tag_name: SeqString - Use compile-time interning: :JsonNull → ID 0, :JsonBool → ID 1 - Use integer switch for pattern matching (O(1)) - Keep tag_name for SON serialization ### 2. Missing peek_bool_value Implementation Check **Location**: crates/compiler/src/codegen.rs:869 The function patch_seq_peek_bool_value is declared but needs verification it exists in the runtime. ### 3. Memory Overhead VariantData now includes heap-allocated SeqString (~24 bytes per variant). Acceptable for SON benefits, but consider string interning for optimization. ### 4. Test Gap: No Tests for wrap-N Functions Add compiler tests for wrap-0 through wrap-4 to verify correct behavior. ### 5. Potential Edge Case: Duplicate Variant Names What happens if two unions have the same variant name (e.g., both Result and HttpResponse have :Ok)? Add test case. ## 🔍 Code Quality Observations **Positive**: - Consistent error messages - Clear documentation of stack effects - Proper safety annotations **Minor Issues**: - Inconsistent tag naming in string_split ("List" vs :List convention) - Outdated comments in codegen from integer-based implementation ## 🛡️ Security Considerations ✅ Good: No unsafe memory operations beyond FFI, stack assertions, bounds checking ⚠️ Watch: String comparison in hot path could be DoS vector with large enums ## 📊 Performance Benchmarks Please run benchmarks comparing: 1. Pattern matching performance (before/after) 2. JSON parsing (heavy variant creation) 3. Memory usage for large JSON structures ## 🎯 Recommendations Summary ### MUST FIX (before merge): 1. **Performance**: Implement symbol interning to restore O(1) pattern matching 2. **Correctness**: Verify patch_seq_peek_bool_value exists ### SHOULD FIX: 3. Add test coverage for wrap-N functions 4. Benchmark performance impact 5. Clarify tag naming consistency ### NICE TO HAVE: 6. Update outdated comments 7. Test duplicate variant names across unions ## Final Verdict **Status**: ⚠️ **Approve with Required Changes** This PR is architecturally sound but the **pattern matching performance regression is a blocker**. **Suggested Action**: 1. Implement symbol interning (dual tag_id + tag_name) 2. Restore O(1) pattern matching via integer switch 3. Keep symbol names for SON serialization 4. Verify all FFI functions exist 5. Then merge Great work on a complex migration! Once the performance issue is resolved, this will be a solid foundation for SON. --- **Reviewed by**: Claude (Anthropic) **Review Date**: 2026-01-03
Sign in to join this conversation.
No description provided.