Phase 2 Lint: Data flow analysis for resource leak detection #142

Closed
opened 2025-12-30 15:33:42 +00:00 by navicore · 2 comments
navicore commented 2025-12-30 15:33:42 +00:00 (Migrated from github.com)

Background

The current lint system (Phase 1) uses syntactic pattern matching on word sequences. This works well for detecting simple inefficiencies like swap swap or dup drop, but cannot detect resource leaks that depend on control flow.

PR #138 added simple weave lint rules:

  • weave-immediate-drop - catches strand.weave drop
  • unchecked-resume - catches strand.resume drop drop drop

These catch obvious mistakes but miss more subtle leaks.

Problem: Patterns Can't Track Data Flow

Example 1: Leak through control flow

: example ( -- )
  [ gen ] strand.weave
  some-condition if
    strand.weave-cancel  # Cancel in one branch ✓
  else
    drop                 # LEAK in other branch ✗
  then
;

Pattern matching can't see that drop in the else branch leaks the handle.

Example 2: Leak across word boundaries

: store-weave ( -- Handle )
  [ gen ] strand.weave ;  # Returns handle

: use-and-leak ( -- )
  store-weave drop ;      # LEAK - can't trace across words

Example 3: Conditional completion

: process-some ( -- )
  [ gen ] strand.weave
  0 strand.resume
  if
    drop strand.weave-cancel  # Handled ✓
  else
    drop drop                 # Completed naturally ✓
  then
;

This is correct but pattern matching can't verify both branches handle the resource.

Proposed Solution: Phase 2 Data Flow Analysis

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Phase 2 Lint Engine                      │
├─────────────────────────────────────────────────────────────┤
│  1. Resource Tagging                                         │
│     - Tag values returned by resource-creating words         │
│     - strand.weave → WeaveHandle                            │
│     - chan.make → Channel                                   │
│     - file.open → FileHandle (future)                       │
├─────────────────────────────────────────────────────────────┤
│  2. Flow Analysis                                            │
│     - Track tagged values through stack operations          │
│     - Follow branches (if/else/then)                        │
│     - Detect when tagged value is:                          │
│       • Consumed by cleanup word (cancel, close)            │
│       • Dropped without cleanup                             │
│       • Escapes scope (returned, stored)                    │
├─────────────────────────────────────────────────────────────┤
│  3. Escape Analysis                                          │
│     - If resource escapes (returned from word), no warning  │
│     - Caller becomes responsible                            │
│     - Could add @must_use annotation for stricter checking  │
├─────────────────────────────────────────────────────────────┤
│  4. Branch Unification                                       │
│     - All branches of if/else must handle resource          │
│     - Either all consume it, or all preserve it             │
│     - Mixed handling is an error                            │
└─────────────────────────────────────────────────────────────┘

Resource Definitions (Config)

# Resource tracking configuration
[[resource]]
id = "weave-handle"
created_by = ["strand.weave"]
consumed_by = ["strand.weave-cancel"]
completed_by_pattern = "strand.resume ... if ... else drop drop then"
message = "WeaveHandle must be cancelled or resumed to completion"

[[resource]]
id = "channel"
created_by = ["chan.make"]
consumed_by = ["chan.close"]
message = "Channel should be closed when no longer needed"

Implementation Phases

Phase 2a: Single-word analysis

  • Track resources within a single word definition
  • Detect leaks at word boundaries
  • No cross-word analysis

Phase 2b: Cross-word analysis

  • Track resources returned from words
  • Propagate "must handle" requirements
  • Detect leaks in callers

Phase 2c: Effect-based tracking

  • Integrate with type system effects
  • Words with Yield effect must be used with strand.weave
  • Automatic resource requirement propagation

Alternative: Linear Types (Long-term)

A more principled solution would be linear/affine types:

# Hypothetical syntax
: example ( -- )
  [ gen ] strand.weave  # Returns linear WeaveHandle!
  # ... handle MUST be used exactly once ...
  strand.weave-cancel   # Consumes the handle
;

This would be a type system change rather than a lint, providing compile-time guarantees.

Acceptance Criteria (Phase 2a MVP)

  • Define resource tracking data structure
  • Implement stack simulation through word body
  • Track resources through if/else/then
  • Emit warning when resource is dropped without cleanup
  • Handle strand.weave and strand.resume completion pattern
  • No false positives on valid code patterns
  • Integration with LSP for real-time feedback

Test Cases

# Should warn: immediate drop
: bad1 ( -- ) [ gen ] strand.weave drop ;

# Should warn: drop in one branch
: bad2 ( -- )
  [ gen ] strand.weave
  true if strand.weave-cancel else drop then ;

# Should NOT warn: both branches handle
: good1 ( -- )
  [ gen ] strand.weave
  true if strand.weave-cancel else strand.weave-cancel then ;

# Should NOT warn: completion pattern
: good2 ( -- )
  [ gen ] strand.weave
  0 strand.resume
  if drop strand.weave-cancel else drop drop then ;

# Should NOT warn: resource escapes (caller's responsibility)
: good3 ( -- Handle ) [ gen ] strand.weave ;
  • #139 - Remove panics from weave FFI
  • #140 - Use ADT for weave control flow
  • #141 - WeaveHandle Drop warning

References

  • Rust's #[must_use] attribute
  • Linear types in Clean, Idris, Linear Haskell
  • Escape analysis in JVM/Go compilers
## Background The current lint system (Phase 1) uses syntactic pattern matching on word sequences. This works well for detecting simple inefficiencies like `swap swap` or `dup drop`, but cannot detect resource leaks that depend on control flow. PR #138 added simple weave lint rules: - `weave-immediate-drop` - catches `strand.weave drop` - `unchecked-resume` - catches `strand.resume drop drop drop` These catch obvious mistakes but miss more subtle leaks. ## Problem: Patterns Can't Track Data Flow ### Example 1: Leak through control flow ```seq : example ( -- ) [ gen ] strand.weave some-condition if strand.weave-cancel # Cancel in one branch ✓ else drop # LEAK in other branch ✗ then ; ``` Pattern matching can't see that `drop` in the else branch leaks the handle. ### Example 2: Leak across word boundaries ```seq : store-weave ( -- Handle ) [ gen ] strand.weave ; # Returns handle : use-and-leak ( -- ) store-weave drop ; # LEAK - can't trace across words ``` ### Example 3: Conditional completion ```seq : process-some ( -- ) [ gen ] strand.weave 0 strand.resume if drop strand.weave-cancel # Handled ✓ else drop drop # Completed naturally ✓ then ; ``` This is correct but pattern matching can't verify both branches handle the resource. ## Proposed Solution: Phase 2 Data Flow Analysis ### Architecture ``` ┌─────────────────────────────────────────────────────────────┐ │ Phase 2 Lint Engine │ ├─────────────────────────────────────────────────────────────┤ │ 1. Resource Tagging │ │ - Tag values returned by resource-creating words │ │ - strand.weave → WeaveHandle │ │ - chan.make → Channel │ │ - file.open → FileHandle (future) │ ├─────────────────────────────────────────────────────────────┤ │ 2. Flow Analysis │ │ - Track tagged values through stack operations │ │ - Follow branches (if/else/then) │ │ - Detect when tagged value is: │ │ • Consumed by cleanup word (cancel, close) │ │ • Dropped without cleanup │ │ • Escapes scope (returned, stored) │ ├─────────────────────────────────────────────────────────────┤ │ 3. Escape Analysis │ │ - If resource escapes (returned from word), no warning │ │ - Caller becomes responsible │ │ - Could add @must_use annotation for stricter checking │ ├─────────────────────────────────────────────────────────────┤ │ 4. Branch Unification │ │ - All branches of if/else must handle resource │ │ - Either all consume it, or all preserve it │ │ - Mixed handling is an error │ └─────────────────────────────────────────────────────────────┘ ``` ### Resource Definitions (Config) ```toml # Resource tracking configuration [[resource]] id = "weave-handle" created_by = ["strand.weave"] consumed_by = ["strand.weave-cancel"] completed_by_pattern = "strand.resume ... if ... else drop drop then" message = "WeaveHandle must be cancelled or resumed to completion" [[resource]] id = "channel" created_by = ["chan.make"] consumed_by = ["chan.close"] message = "Channel should be closed when no longer needed" ``` ### Implementation Phases **Phase 2a: Single-word analysis** - Track resources within a single word definition - Detect leaks at word boundaries - No cross-word analysis **Phase 2b: Cross-word analysis** - Track resources returned from words - Propagate "must handle" requirements - Detect leaks in callers **Phase 2c: Effect-based tracking** - Integrate with type system effects - Words with `Yield` effect must be used with `strand.weave` - Automatic resource requirement propagation ## Alternative: Linear Types (Long-term) A more principled solution would be linear/affine types: ```seq # Hypothetical syntax : example ( -- ) [ gen ] strand.weave # Returns linear WeaveHandle! # ... handle MUST be used exactly once ... strand.weave-cancel # Consumes the handle ; ``` This would be a type system change rather than a lint, providing compile-time guarantees. ## Acceptance Criteria (Phase 2a MVP) - [ ] Define resource tracking data structure - [ ] Implement stack simulation through word body - [ ] Track resources through `if`/`else`/`then` - [ ] Emit warning when resource is dropped without cleanup - [ ] Handle `strand.weave` and `strand.resume` completion pattern - [ ] No false positives on valid code patterns - [ ] Integration with LSP for real-time feedback ## Test Cases ```seq # Should warn: immediate drop : bad1 ( -- ) [ gen ] strand.weave drop ; # Should warn: drop in one branch : bad2 ( -- ) [ gen ] strand.weave true if strand.weave-cancel else drop then ; # Should NOT warn: both branches handle : good1 ( -- ) [ gen ] strand.weave true if strand.weave-cancel else strand.weave-cancel then ; # Should NOT warn: completion pattern : good2 ( -- ) [ gen ] strand.weave 0 strand.resume if drop strand.weave-cancel else drop drop then ; # Should NOT warn: resource escapes (caller's responsibility) : good3 ( -- Handle ) [ gen ] strand.weave ; ``` ## Related Issues - #139 - Remove panics from weave FFI - #140 - Use ADT for weave control flow - #141 - WeaveHandle Drop warning ## References - Rust's `#[must_use]` attribute - Linear types in Clean, Idris, Linear Haskell - Escape analysis in JVM/Go compilers
navicore commented 2026-01-01 05:47:03 +00:00 (Migrated from github.com)
https://github.com/navicore/patch-seq/pull/159
navicore commented 2026-01-03 15:33:23 +00:00 (Migrated from github.com)
https://github.com/navicore/patch-seq/pull/162
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
navicore/patch-seq#142
No description provided.