Performance: Collection operations are 50-100x slower than Python #305

New issue

Closed

opened 2026-01-25 23:12:44 +00:00 by navicore · 1 comment

navicore commented

2026-01-25 23:12:44 +00:00

(Migrated from github.com)

Current State

Collection operations on 100k elements:

Operation	Seq	Python	Gap
build-100k	15,888ms	~0ms	huge
map-double	83ms	2ms	40x
filter-evens	128ms	3ms	40x
fold-sum	195ms	2ms	100x
chain (map→filter→fold)	412ms	6ms	70x

Root Causes

40-byte Value struct: Every element is boxed in a heap-allocated Value
Immutable lists: Each operation creates a new list
Quotation call overhead: Function call per element for map/filter/fold
No vectorization: Operations are sequential, no SIMD

Potential Approaches

Near-term

Specialized numeric arrays: IntArray type with contiguous i64 storage
In-place operations: Mutation when refcount == 1
Inline small quotations: Avoid function call overhead for simple ops

Long-term

SIMD vectorization: Use LLVM vector intrinsics for bulk numeric ops
Lazy/streaming collections: Fuse operations before materializing
Unboxed generics: Monomorphize collection types for primitives

Benchmark Code

: build-list ( Int -- Variant )
  list.make 0 rot build-list-loop
;

[ 2 i.* ] list.map
[ 2 i.% drop 0 i.= ] list.filter  
0 [ i.+ ] list.fold

Success Criteria

build-100k < 100ms
map/filter/fold within 10x of Python

## Current State Collection operations on 100k elements: | Operation | Seq | Python | Gap | |-----------|-----|--------|-----| | build-100k | 15,888ms | ~0ms | huge | | map-double | 83ms | 2ms | 40x | | filter-evens | 128ms | 3ms | 40x | | fold-sum | 195ms | 2ms | 100x | | chain (map→filter→fold) | 412ms | 6ms | 70x | ## Root Causes 1. **40-byte Value struct**: Every element is boxed in a heap-allocated Value 2. **Immutable lists**: Each operation creates a new list 3. **Quotation call overhead**: Function call per element for map/filter/fold 4. **No vectorization**: Operations are sequential, no SIMD ## Potential Approaches ### Near-term - **Specialized numeric arrays**: `IntArray` type with contiguous i64 storage - **In-place operations**: Mutation when refcount == 1 - **Inline small quotations**: Avoid function call overhead for simple ops ### Long-term - **SIMD vectorization**: Use LLVM vector intrinsics for bulk numeric ops - **Lazy/streaming collections**: Fuse operations before materializing - **Unboxed generics**: Monomorphize collection types for primitives ## Benchmark Code ```seq : build-list ( Int -- Variant ) list.make 0 rot build-list-loop ; [ 2 i.* ] list.map [ 2 i.% drop 0 i.= ] list.filter 0 [ i.+ ] list.fold ``` ## Success Criteria - build-100k < 100ms - map/filter/fold within 10x of Python

navicore commented

2026-03-23 21:49:52 +00:00

(Migrated from github.com)

https://github.com/navicore/patch-seq/pull/367