Performance: Collection operations are 50-100x slower than Python #305

Closed
opened 2026-01-25 23:12:44 +00:00 by navicore · 1 comment
navicore commented 2026-01-25 23:12:44 +00:00 (Migrated from github.com)

Current State

Collection operations on 100k elements:

Operation Seq Python Gap
build-100k 15,888ms ~0ms huge
map-double 83ms 2ms 40x
filter-evens 128ms 3ms 40x
fold-sum 195ms 2ms 100x
chain (map→filter→fold) 412ms 6ms 70x

Root Causes

  1. 40-byte Value struct: Every element is boxed in a heap-allocated Value
  2. Immutable lists: Each operation creates a new list
  3. Quotation call overhead: Function call per element for map/filter/fold
  4. No vectorization: Operations are sequential, no SIMD

Potential Approaches

Near-term

  • Specialized numeric arrays: IntArray type with contiguous i64 storage
  • In-place operations: Mutation when refcount == 1
  • Inline small quotations: Avoid function call overhead for simple ops

Long-term

  • SIMD vectorization: Use LLVM vector intrinsics for bulk numeric ops
  • Lazy/streaming collections: Fuse operations before materializing
  • Unboxed generics: Monomorphize collection types for primitives

Benchmark Code

: build-list ( Int -- Variant )
  list.make 0 rot build-list-loop
;

[ 2 i.* ] list.map
[ 2 i.% drop 0 i.= ] list.filter  
0 [ i.+ ] list.fold

Success Criteria

  • build-100k < 100ms
  • map/filter/fold within 10x of Python
## Current State Collection operations on 100k elements: | Operation | Seq | Python | Gap | |-----------|-----|--------|-----| | build-100k | 15,888ms | ~0ms | huge | | map-double | 83ms | 2ms | 40x | | filter-evens | 128ms | 3ms | 40x | | fold-sum | 195ms | 2ms | 100x | | chain (map→filter→fold) | 412ms | 6ms | 70x | ## Root Causes 1. **40-byte Value struct**: Every element is boxed in a heap-allocated Value 2. **Immutable lists**: Each operation creates a new list 3. **Quotation call overhead**: Function call per element for map/filter/fold 4. **No vectorization**: Operations are sequential, no SIMD ## Potential Approaches ### Near-term - **Specialized numeric arrays**: `IntArray` type with contiguous i64 storage - **In-place operations**: Mutation when refcount == 1 - **Inline small quotations**: Avoid function call overhead for simple ops ### Long-term - **SIMD vectorization**: Use LLVM vector intrinsics for bulk numeric ops - **Lazy/streaming collections**: Fuse operations before materializing - **Unboxed generics**: Monomorphize collection types for primitives ## Benchmark Code ```seq : build-list ( Int -- Variant ) list.make 0 rot build-list-loop ; [ 2 i.* ] list.map [ 2 i.% drop 0 i.= ] list.filter 0 [ i.+ ] list.fold ``` ## Success Criteria - build-100k < 100ms - map/filter/fold within 10x of Python
navicore commented 2026-03-23 21:49:52 +00:00 (Migrated from github.com)
https://github.com/navicore/patch-seq/pull/367
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
navicore/patch-seq#305
No description provided.