Performance: Channel throughput is 400-10,000x slower than alternatives #306

Closed
opened 2026-01-25 23:12:59 +00:00 by navicore · 1 comment
navicore commented 2026-01-25 23:12:59 +00:00 (Migrated from github.com)

## Current State

Fanout benchmark: 1 producer → 10 workers → 100k messages

| Language | Time | vs Seq |
|----------|------|--------|
| Seq | 100,000ms | 1x |
| Python (asyncio) | 230ms | 400x faster |
| Go | 30ms | 3,300x faster |
| Rust | 9ms | 11,000x faster |

## Root Causes

1. **Value boxing**: every message is wrapped in a 40-byte `Value` struct
2. **Channel synchronization**: lock contention on each send/receive
3. **No batching**: each message is an individual operation
4. **Yielding overhead**: `chan.yield` calls between operations

## Potential Approaches

### Near-term

- **Primitive channels**: an `IntChannel` type that passes `i64` directly without boxing
- **Buffered channels**: reduce lock contention with a ring buffer
- **Batch send/receive**: `chan.send-all` / `chan.receive-n` operations

### Long-term

- **Lock-free channels**: use atomic operations instead of a mutex
- **Zero-copy for large values**: pass pointers instead of copying
- **Channel fusion**: optimize known patterns (fan-out, pipeline)

## Benchmark Code

```seq
: worker-loop ( Channel Channel Int -- )
  2 pick chan.receive drop
  chan.yield
  dup 0 i.< if
    drop swap chan.send drop drop
  else
    drop 1 i.+ worker-loop
  then
;

: producer ( Channel Int -- )
  dup 0 i.> if
    dup 2 pick chan.send drop
    1 i.- producer
  else
    drop drop
  then
;
```

## Success Criteria

- Throughput within 100x of Go (target: < 3,000ms for 100k messages)
- Primitive channels within 10x of Go
navicore commented 2026-03-23 21:50:03 +00:00 (Migrated from github.com)
https://github.com/navicore/patch-seq/pull/367