Performance: Actor/strand spawn is 20-2500x slower than alternatives #307

Closed
opened 2026-01-25 23:13:12 +00:00 by navicore · 1 comment
navicore commented 2026-01-25 23:13:12 +00:00 (Migrated from github.com)

Current State

Skynet benchmark: spawn 100k actors in a 10-ary tree and collect the sum of their results.

Language          Time      vs Seq
Seq               5,165ms   1x
Python (asyncio)  266ms     20x faster
Go                19ms      270x faster
Rust              2ms       2,500x faster
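
For anyone re-running the numbers, the "vs Seq" column can be recomputed from the raw timings. This is a minimal sketch using only the figures in the table; the function name `speedup_vs_seq` is made up for illustration.

```python
# Timings in milliseconds, copied from the table in this issue.
timings_ms = {"Seq": 5165, "Python (asyncio)": 266, "Go": 19, "Rust": 2}


def speedup_vs_seq(timings):
    """Return how many times faster each entry is than the Seq timing."""
    seq = timings["Seq"]
    return {name: seq / ms for name, ms in timings.items()}


ratios = speedup_vs_seq(timings_ms)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

The exact ratios come out near 19.4x, 271.8x, and 2582.5x, which the table rounds to 20x, 270x, and 2,500x.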

Root Causes

  1. Strand allocation: Each strand has significant setup cost
  2. Channel per strand: Every actor creates a channel for results
  3. Thread pool scheduling: Work distribution overhead
  4. Deep recursion: 10-ary tree creates stack pressure
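
To put rough numbers on these causes, the sketch below counts nodes per level of the benchmark tree. It assumes 100k leaves with a fanout of 10, and it assumes (from reading the benchmark code in this issue) that only non-leaf nodes call `chan.make`. The helper `tree_shape` is hypothetical and not part of Seq.

```python
def tree_shape(leaves, fanout=10):
    """Return node counts per level, root first, for a complete fanout-ary tree.

    Assumes `leaves` is an exact power of `fanout`, as in the 100k / 10-ary case.
    """
    levels = [leaves]
    while levels[-1] > 1:
        levels.append(levels[-1] // fanout)
    return levels[::-1]


levels = tree_shape(100_000)
total_nodes = sum(levels)                  # one skynet invocation per node
internal_nodes = total_nodes - levels[-1]  # only non-leaf nodes create a channel
depth = len(levels) - 1                    # spawn nesting depth from root to leaf

print(levels, total_nodes, internal_nodes, depth)
```

Under those assumptions the run involves 111,111 skynet invocations, 11,111 channels, and a nesting depth of 5, so per-strand setup cost is multiplied far more often than per-channel cost, and the tree is wide rather than deep.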

Potential Approaches

Near-term

  • Lighter strand creation: Reduce per-strand allocation
  • Channel pooling: Reuse channels for short-lived communications
  • Work stealing: Better load balancing across threads

Long-term

  • Stackless coroutines: Lighter than full strand state machines
  • Strand pooling: Reuse strand slots instead of allocating new
  • Spawn fusion: Batch multiple spawns into single scheduling operation
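
Channel pooling and strand pooling both come down to a free list: hand back a released object before allocating a new one. The following is an illustrative sketch in Python, not Seq runtime code. `Pool`, `acquire`, and `release` are hypothetical names, and a `deque` stands in for a channel.

```python
from collections import deque


class Pool:
    """Free-list pool: reuses released objects before allocating new ones."""

    def __init__(self, factory, reset):
        self._factory = factory  # creates a fresh object when the free list is empty
        self._reset = reset      # clears leftover state before an object is reused
        self._free = []
        self.allocations = 0     # number of times the factory was actually called

    def acquire(self):
        if self._free:
            return self._free.pop()
        self.allocations += 1
        return self._factory()

    def release(self, obj):
        self._reset(obj)
        self._free.append(obj)


# Stand-in "channel": a deque used as a FIFO buffer.
pool = Pool(factory=deque, reset=lambda ch: ch.clear())

for i in range(1000):
    ch = pool.acquire()
    ch.append(i)              # send
    assert ch.popleft() == i  # receive
    pool.release(ch)

print(pool.allocations)
```

With strictly sequential short-lived use, 1000 acquire/release cycles trigger a single allocation. A real runtime would also need the pool to be safe across scheduler threads, which this sketch does not attempt.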

Benchmark Code

: skynet ( Channel Int Int -- )
  dup 1 i.= if
    drop swap chan.send drop
  else
    chan.make
    over 10 i.divide drop
    # Spawn 10 children
    3 pick 10 i.* 0 i.+ 2 pick swap 2 pick [ skynet ] strand.spawn drop drop drop drop
    3 pick 10 i.* 1 i.+ 2 pick swap 2 pick [ skynet ] strand.spawn drop drop drop drop
    # ... (10 spawns total)
    
    # Collect 10 results
    dup chan.receive drop
    over chan.receive drop i.+
    # ... (10 receives total)
  then
;
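
For comparison, here is a sketch of the same tree shape in Python asyncio. This is not the benchmark code behind the Python row in the table, only an assumed equivalent: each non-leaf node creates one queue, spawns 10 child tasks numbered `num * 10 + i` as the Seq code does, and sums 10 results. The names `skynet` and `run` and the small demo size are choices made for this example.

```python
import asyncio


async def skynet(results, num, size, fanout=10):
    """Put the sum of all leaf ids below this node onto `results`."""
    if size == 1:
        await results.put(num)  # leaf: report its own id
        return
    child_results = asyncio.Queue()  # one queue per non-leaf node
    child_size = size // fanout
    tasks = [
        asyncio.create_task(
            skynet(child_results, num * fanout + i, child_size, fanout)
        )
        for i in range(fanout)
    ]
    total = 0
    for _ in range(fanout):
        total += await child_results.get()
    await asyncio.gather(*tasks)  # make sure every child task has finished
    await results.put(total)


async def run(size):
    results = asyncio.Queue()
    await skynet(results, 0, size)
    return await results.get()


total = asyncio.run(run(1000))  # small size for a quick check
print(total)  # 499500, the sum of 0..999
```

Calling `run(100_000)` gives the tree size used in this issue.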

Success Criteria

  • Spawn 100k actors in < 500ms (10x improvement)
  • Within 50x of Go for tree-structured spawns
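
The two criteria can be checked mechanically against a measured time. This sketch assumes "within 50x of Go" means at most 50 times the 19ms Go figure from this issue, which is machine-specific. `meets_criteria` is a hypothetical helper.

```python
GO_BASELINE_MS = 19      # Go timing reported in this issue
ABSOLUTE_TARGET_MS = 500
GO_FACTOR = 50


def meets_criteria(seq_ms):
    """Evaluate both success criteria for a measured Seq time in milliseconds."""
    return {
        "under_500ms": seq_ms < ABSOLUTE_TARGET_MS,
        "within_50x_of_go": seq_ms <= GO_FACTOR * GO_BASELINE_MS,
    }


print(meets_criteria(5165))  # the current measurement fails both
print(meets_criteria(900))   # passes the Go-relative bound only
print(meets_criteria(499))   # passes both
```

Under that reading the Go-relative bound is 950ms, so the 500ms target is the stricter of the two.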
navicore commented 2026-03-23 21:50:15 +00:00 (Migrated from github.com)
https://github.com/navicore/patch-seq/pull/367