Performance: Channel throughput is 400-10,000x slower than alternatives #306

Closed
opened 2026-01-25 23:12:59 +00:00 by navicore · 1 comment
navicore commented 2026-01-25 23:12:59 +00:00 (Migrated from github.com)

## Current State

Fanout benchmark: 1 producer → 10 workers → 100k messages

| Language | Time | vs Seq |
|----------|------|--------|
| Seq | 100,000ms | 1x |
| Python (asyncio) | 230ms | 400x faster |
| Go | 30ms | 3,300x faster |
| Rust | 9ms | 11,000x faster |

## Root Causes

1. **Value boxing**: every message is wrapped in a 40-byte `Value` struct
2. **Channel synchronization**: lock contention on each send/receive
3. **No batching**: each message is an individual operation
4. **Yielding overhead**: `chan.yield` calls between operations

## Potential Approaches

### Near-term

- **Primitive channels**: an `IntChannel` type that passes `i64` directly without boxing
- **Buffered channels**: reduce lock contention with a ring buffer
- **Batch send/receive**: `chan.send-all` / `chan.receive-n` operations

### Long-term

- **Lock-free channels**: use atomic operations instead of a mutex
- **Zero-copy for large values**: pass pointers instead of copying
- **Channel fusion**: optimize known patterns (fan-out, pipeline)

## Benchmark Code

```seq
: worker-loop ( Channel Channel Int -- )
  2 pick chan.receive drop
  chan.yield
  dup 0 i.< if
    drop swap chan.send drop drop
  else
    drop 1 i.+ worker-loop
  then
;

: producer ( Channel Int -- )
  dup 0 i.> if
    dup 2 pick chan.send drop
    1 i.- producer
  else
    drop drop
  then
;
```

## Success Criteria

- Throughput within 100x of Go (target: < 3,000ms for 100k messages)
- Primitive channels within 10x of Go
navicore commented 2026-03-23 21:50:03 +00:00 (Migrated from github.com)
https://github.com/navicore/patch-seq/pull/367