Feature: Generator support with yield_value and resume #131

Closed
opened 2025-12-29 08:00:02 +00:00 by navicore · 4 comments
navicore commented 2025-12-29 08:00:02 +00:00 (Migrated from github.com)

Summary

Add generator/coroutine support to Seq with the ability to yield values to callers and resume execution. This enables lazy sequences, streaming, and backtracking patterns.

Motivation

Primary Use Case: seq-prolog

We're building a compiled Prolog interpreter in Seq (seq-prolog). Prolog's backtracking naturally maps to generators:

  • A Prolog query produces multiple solutions on demand
  • User requests next solution with ;
  • Execution resumes from where it left off

Without generators, we'd need workarounds:

  • Channels: Allocation + synchronization overhead for each solution
  • CPS transformation: Complex compilation, all code must be rewritten
  • Manual choice points: Essentially an interpreter in disguise

General Use Cases

Generators are a fundamental pattern used across languages (Python, JavaScript, Go, Rust iterators, C#, Lua):

  • Lazy sequences: Generate Fibonacci numbers on demand
  • Stream processing: Process large files without loading into memory
  • Tokenizers/Parsers: Yield tokens one at a time
  • Tree traversals: Yield nodes during walk
  • State machines: Yield on state transitions
  • Pipelines: Chain generators for data transformation

Proposed Semantics

Creating a Generator

# A generator is a quotation that can yield values
: count-up ( start -- Generator )
  [ count-up-loop ] make-generator  # 'start' is captured from the stack
;

: count-up-loop ( n -- )
  dup yield            # yield current value, suspend
  1 i.+ count-up-loop  # increment and continue (TCO)
;

# Or inline:
[ 1 yield 2 yield 3 yield ] make-generator
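For comparison, the same shape in Python, whose generators already implement this yield/suspend/resume protocol (illustrative analogy only; the Seq words above are the actual proposal):

```python
def count_up(start):
    """Python analog of the proposed count-up generator."""
    n = start
    while True:       # the Seq version loops by tail call (TCO) instead
        yield n       # suspend, handing n to the caller
        n += 1        # on resume, execution continues here

gen = count_up(10)
print(next(gen))  # 10
print(next(gen))  # 11
print(next(gen))  # 12
```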

Using a Generator

10 count-up       # create generator starting at 10
dup resume        # yields 10, generator suspended
dup resume        # yields 11
dup resume        # yields 12
drop              # discard generator

Generator State

union GeneratorState {
  Ready           # Can be resumed
  Yielded { value: T }  # Just yielded a value
  Done            # No more values
}

: resume ( Generator -- Generator Value GeneratorState )
  # Returns updated generator, the yielded value (if any), and state
;

For-each Style Iteration

: for-each ( Generator Quotation -- )
  # Apply quotation to each yielded value until Done
  swap resume
  match
    Yielded { >value } ->
      value rot call    # apply quotation to value
      for-each          # continue
    Done ->
      drop drop         # cleanup
  end
;

# Usage:
[ 1 yield 2 yield 3 yield ] make-generator
[ io.write-line ] for-each
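A hedged Python sketch of the same driver loop, making the Yielded/Done branches explicit (StopIteration plays the role of the Done state):

```python
def for_each(gen, fn):
    """Resume gen until exhaustion, applying fn to each yielded value
    (mirrors the proposed Seq for-each: resume, match, continue)."""
    while True:
        try:
            value = next(gen)   # resume; StopIteration signals Done
        except StopIteration:
            return              # Done -> cleanup and stop
        fn(value)               # Yielded -> apply the quotation

def one_two_three():
    yield 1
    yield 2
    yield 3

seen = []
for_each(one_two_three(), seen.append)
print(seen)  # [1, 2, 3]
```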

Implementation Notes

Leveraging May

Seq already uses may (https://github.com/Xudong-Huang/may) for green threads. May's coroutines support suspend/resume, which could back generators:

// Conceptually:
// - make-generator creates a may coroutine
// - yield suspends the coroutine, passing value to caller
// - resume continues the coroutine, receiving the yielded value

Key question: Does may's coroutine::yield_with() or similar provide value-passing on yield? If not, we may need a small wrapper using channels internally, but hidden from the Seq user.

Stack Considerations

When a generator yields:

  1. Its stack state must be preserved
  2. The yielded value transfers to the caller's stack
  3. On resume, execution continues after the yield

This is similar to how spawn clones the stack, but generators maintain a single suspended stack rather than running concurrently.

Interaction with TCO

Generators should work with Seq's tail-call optimization. A generator that tail-calls itself should not grow the stack:

: infinite-ones ( -- Generator )
  [ 1 yield infinite-ones-loop ] make-generator
;

: infinite-ones-loop ( -- )
  1 yield
  infinite-ones-loop  # TCO - no stack growth
;

Alternatives Considered

  1. Use channels: Works but adds overhead. Good for concurrent producers, overkill for sequential iteration.

  2. CPS transformation: Compile all code to continuation-passing style. Complex and invasive.

  3. Explicit choice points: Manual state management. Essentially an interpreter.

Generators provide the cleanest abstraction for "produce values on demand" patterns.

Related

  • seq-prolog: Compiled Prolog in Seq (depends on this feature)
  • Lua coroutines (inspiration)
  • Python generators
  • JavaScript generators
  • Rust iterators (similar pattern, different implementation)

Questions to Explore

  1. Can may's existing suspend/resume pass values, or do we need a wrapper?
  2. Should generators be a new type, or a variant of quotations?
  3. How do generators interact with the existing strand/channel model?
  4. Should there be a yield_all for delegating to sub-generators?
navicore commented 2025-12-29 08:28:07 +00:00 (Migrated from github.com)

Design Summary: Weaves (Generators for Seq)

Naming

Weave - Continuing the textile metaphor (strands are threads of execution), a weave is a strand you can interleave with, yielding control back and forth. Captures the bidirectional, interwoven nature.

Core Semantics

# Create a weave from a quotation
weave.make   ( Quotation -- Weave )

# Resume with a value, get result
weave.resume ( Weave T -- Weave WeaveResult )

# Yield a value, receive resume value (only valid inside weave)
weave.yield  ( T -- U )

# Check state without resuming
weave.state  ( Weave -- Weave WeaveState )

Result Type (follows std:result convention)

union WeaveResult {
  Yielded { value: T }   # tag 0 - success, has yielded value
  Done                   # tag 1 - weave completed, no more values
}

union WeaveState {
  Ready      # tag 0 - initial, hasn't started
  Suspended  # tag 1 - yielded, waiting for resume  
  Completed  # tag 2 - finished
}

Error Handling: No Panics

Resuming a completed weave returns Done, not a panic. Caller must handle it:

my-weave value weave.resume
weave-ok? if
  weave-unwrap process-value
else
  drop "Weave completed" io.write-line
then

Or using match:

my-weave value weave.resume
match
  Yielded { >value } -> value process-value
  Done -> "Weave completed" io.write-line
end

Implementation Approach

Build on existing strand/channel infrastructure:

  1. weave.make spawns a may coroutine, parks waiting on internal resume channel
  2. weave.yield sends to yield channel, blocks on resume channel
  3. weave.resume sends to resume channel, receives from yield channel
  4. When quotation returns, sends Done sentinel
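A minimal sketch of steps 1-4 using plain threads and queues in place of may coroutines and channels (the Weave class, DONE sentinel, and two-queue layout are illustrative assumptions, not the actual runtime):

```python
import queue
import threading

DONE = object()  # sentinel sent when the quotation returns (step 4)

class Weave:
    """Channel-backed yield/resume, approximated with a thread and two queues."""
    def __init__(self, body):
        self._yield_q = queue.Queue(maxsize=1)   # weave -> caller
        self._resume_q = queue.Queue(maxsize=1)  # caller -> weave

        def run():
            self._resume_q.get()         # step 1: park until first resume
            body(self._yield)
            self._yield_q.put(DONE)      # step 4: Done sentinel

        threading.Thread(target=run, daemon=True).start()

    def _yield(self, value):
        self._yield_q.put(value)         # step 2: send to yield channel...
        return self._resume_q.get()      # ...then block on resume channel

    def resume(self, value=None):
        self._resume_q.put(value)        # step 3: send to resume channel...
        return self._yield_q.get()       # ...then receive from yield channel

w = Weave(lambda y: [y(1), y(2), y(3)])
print(w.resume(), w.resume(), w.resume())  # 1 2 3
print(w.resume() is DONE)                  # True
```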

Example: Counter

: counter ( start -- Weave )
  [ 
    loop
      dup weave.yield   # yield current value
      i.+               # add received increment
    repeat
  ] weave.make
;

: main ( -- Int )
  10 counter           # start at 10
  1 weave.resume       # yields 10, send increment 1
  match Yielded { >value } -> value int->string io.write-line end
  1 weave.resume       # yields 11
  match Yielded { >value } -> value int->string io.write-line end
  5 weave.resume       # yields 16 (we sent 5 as increment)
  match Yielded { >value } -> value int->string io.write-line end
  drop
  0
;
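Python's generator .send() implements exactly this resume-with-a-value protocol and sanity-checks the counter's expected outputs (note the first resume only starts the body; its value is not received by the weave):

```python
def counter(start):
    """Bidirectional analog of the counter weave: yield the current
    value, then receive the increment sent by the caller on resume."""
    n = start
    while True:
        inc = yield n   # suspend with n; the resume value becomes inc
        n += inc

c = counter(10)
print(next(c))     # 10 (first resume starts the body)
print(c.send(1))   # 11
print(c.send(5))   # 16
```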

Example: Prolog-style backtracking

: solve-query ( Goal -- Weave )
  [
    # goal captured
    find-all-solutions  # internal solver
    # for each solution:
    #   bindings weave.yield
    #   receive signal to continue or stop
  ] weave.make
;

# Interactive solution enumeration
query-weave
loop
  unit weave.resume
  match
    Yielded { >bindings } ->
      bindings print-solution
      "More? (y/n): " prompt "y" string.equal?
    Done -> 
      "No more solutions." io.write-line
      false
  end
while-true

Related: Error Handling Audit
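A toy Python sketch of the same enumerate-on-demand shape, with a trivial list-membership "solver" standing in for find-all-solutions (all names here are illustrative):

```python
def solve(candidates):
    """Toy solver: lazily yield each binding; the caller resumes
    to backtrack into the next solution."""
    for binding in candidates:
        yield binding   # choice point: suspend until more solutions wanted

solutions = solve([{"X": 1}, {"X": 2}])
print(next(solutions))              # {'X': 1}
print(next(solutions))              # {'X': 2}
print(next(solutions, "no more"))   # no more (the Done case)
```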

This design revealed that some existing builtins use panics where they should return Results:

  • chan.receive panics on closed channel → should return Result
  • chan.send panics on closed channel → should return Result
  • Various -safe variants exist but the unsafe versions shouldn't crash

Proposed follow-up issue: Audit all builtins and convert panics to Result returns. The -safe suffix pattern is backwards - safe should be the default, with explicit -unsafe! or -unchecked! variants for performance-critical code that's willing to crash.

navicore commented 2025-12-29 13:50:01 +00:00 (Migrated from github.com)

Design Evolution: Unified Strands with Effect Tracking

After further discussion, we've evolved the design to avoid introducing a separate "Weave" type. Instead, generators are strands with the Yield effect - one concurrency primitive, two modes of use.

Core Insight

Strands and generators aren't fundamentally different:

  • Both use may coroutines under the hood
  • Both yield cooperatively (at I/O or explicit yield points)
  • Both transfer values via channels

The difference is the communication pattern:

  • spawn: fire-and-forget, communicate via explicit channels
  • weave: structured yield/resume protocol

Effect-Annotated Quotation Types

Seq already tracks stack effects: ( a b -- a c ). We extend this to track computational effects:

( a -- b )             # pure quotation
( a -- b | Yield T )   # quotation that may yield values of type T

The | separates stack effects from computational effects. This extends naturally from Seq's existing row polymorphism.

Strand Operations

: strand.spawn  ( ( -- ) -- SpawnedStrand )
: strand.weave  ( ( -- | Yield T ) -- WovenStrand T )

: strand.resume ( WovenStrand T -- T Bool WovenStrand T )  # value, has-more, strand
# Alternative with Option:
: strand.resume ( WovenStrand T -- Option T WovenStrand T )

: yield ( T -- | Yield T )  # introduces Yield effect

Compile-Time Safety

The effect system catches errors at compile time:

# ERROR: spawn expects pure quotation, got Yield effect
[ 1 yield 2 yield ] strand.spawn

# ERROR: weave expects Yield effect, got pure quotation  
[ 1 2 i.+ ] strand.weave

# ERROR: resume not valid on SpawnedStrand
[ do-work ] strand.spawn strand.resume

# OK: effect matches
[ 1 yield 2 yield 3 yield ] strand.weave

Effect Propagation

Effects bubble up through function calls:

: my-generator ( -- | Yield Int )
    1 yield
    2 yield
    3 yield
;

: uses-generator ( -- | Yield Int )
    my-generator  # Yield effect propagates
;

: handles-generator ( -- )
    [ my-generator ] strand.weave  # Yield effect handled here
    # ... consume values ...
;

A pure main with unhandled Yield effects is a compile error.
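As an analogy only: Python's type hints make the yield "effect" visible in a function's signature much like the proposed ( -- | Yield Int ) annotation, with Iterator[int] playing the role of Yield Int:

```python
from typing import Iterator

def my_generator() -> Iterator[int]:
    """'( -- | Yield Int )': the yield effect shows up in the type."""
    yield 1
    yield 2
    yield 3

def uses_generator() -> Iterator[int]:
    """Effect propagation: delegating keeps Iterator[int] in the type."""
    yield from my_generator()

def handles_generator() -> None:
    """Effect handled: consuming the iterator discharges it, so the
    enclosing function's type is 'pure' again (returns None)."""
    assert sum(uses_generator()) == 6
```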

Future Effect Extensions

The same mechanism could track other effects:

( a -- b | IO )              # performs I/O
( a -- b | Fail E )          # may fail with error type E
( a -- b | Yield T, IO )     # multiple effects

But for now, just Yield T is the focus.

Why This Approach

  1. No metaphor pollution: One concurrency primitive (strands), two modes
  2. Compile-time safety: Type errors, not runtime panics
  3. Natural extension: Fits Seq's existing row-polymorphic type system
  4. User already thinks differently: Handling yields requires different code anyway - the effect annotation makes this explicit
  5. Principled: Effect systems are well-studied in PL theory

Implementation Considerations

  1. Parser: Track effect annotations in quotation type syntax
  2. Type checker: Propagate effects, check at spawn/weave boundaries
  3. Codegen: SpawnedStrand vs WovenStrand can share runtime representation
  4. Runtime: Channel-backed yield/resume (same as previous design)

Dependency: Should complete error handling audit (#132) first, so the new strand operations return Results consistently.

navicore commented 2025-12-30 03:30:46 +00:00 (Migrated from github.com)
https://github.com/navicore/patch-seq/pull/138
navicore commented 2025-12-30 06:33:22 +00:00 (Migrated from github.com)
https://github.com/navicore/patch-seq/pull/138