⏺ Phase 4 is complete. Here's a summary of what was implemented: #5

Merged
navicore merged 6 commits from phase4 into main 2026-02-09 02:18:43 +00:00
navicore commented 2026-02-09 01:17:04 +00:00 (Migrated from github.com)

Summary

Files modified:

src/parser.rs — Added PARSE instruction parsing:

  • parse_value_depth field to stop expression parsing at WITH keyword
  • is_with_keyword() helper and WITH break in parse_concat()
  • parse_template() — full template grammar (variables, dots, literals, absolute/relative positions, variable patterns, commas)
  • parse_parse() — PARSE [UPPER] source template with all sources (ARG, PULL, SOURCE, VERSION, LINEIN, VAR, VALUE)
  • parse_pull() — PULL [template]
  • Upgraded parse_arg() from Phase 3's simple one-var-per-position to full template parsing
  • Wired PARSE and PULL into parse_clause()

src/eval.rs — Added PARSE template engine:

  • apply_template() — core template engine implementing REXX word-parsing, literal patterns, absolute/relative positions, variable patterns
  • assign_section() — 0 targets: skip; 1 target: verbatim; 2+: word-parse
  • exec_parse() — source dispatch with comma-split sub-templates
  • read_stdin_line() for PULL/LINEIN
  • set_main_args() for CLI arg injection
  • Upgraded exec_arg() to route through exec_parse(true, Arg, template)
  • Wired PARSE and PULL in exec_clause()

src/main.rs — Wired CLI args:

  • run_line() now accepts args: &[String] and pushes them as main program arguments

tests/phase4.rs — 31 integration tests covering:

  • Word parsing (1-3+ targets, excess vars)
  • Literal patterns (single/multi-char, not-found)
  • Positional patterns (absolute, relative, backward)
  • Dot placeholder
  • PARSE UPPER
  • PARSE VALUE ... WITH (simple and expression)
  • ARG with templates (word split, literal patterns, comma-separated)
  • PARSE ARG (case-preserving)
  • PARSE SOURCE / VERSION
  • PULL with piped stdin
  • Variable patterns
  • Backward compatibility with Phase 3
  • Edge cases (empty source, no template)
Summary Files modified: src/parser.rs — Added PARSE instruction parsing: - parse_value_depth field to stop expression parsing at WITH keyword - is_with_keyword() helper and WITH break in parse_concat() - parse_template() — full template grammar (variables, dots, literals, absolute/relative positions, variable patterns, commas) - parse_parse() — PARSE [UPPER] source template with all sources (ARG, PULL, SOURCE, VERSION, LINEIN, VAR, VALUE) - parse_pull() — PULL [template] - Upgraded parse_arg() from Phase 3's simple one-var-per-position to full template parsing - Wired PARSE and PULL into parse_clause() src/eval.rs — Added PARSE template engine: - apply_template() — core template engine implementing REXX word-parsing, literal patterns, absolute/relative positions, variable patterns - assign_section() — 0 targets: skip; 1 target: verbatim; 2+: word-parse - exec_parse() — source dispatch with comma-split sub-templates - read_stdin_line() for PULL/LINEIN - set_main_args() for CLI arg injection - Upgraded exec_arg() to route through exec_parse(true, Arg, template) - Wired PARSE and PULL in exec_clause() src/main.rs — Wired CLI args: - run_line() now accepts args: &[String] and pushes them as main program arguments tests/phase4.rs — 31 integration tests covering: - Word parsing (1-3+ targets, excess vars) - Literal patterns (single/multi-char, not-found) - Positional patterns (absolute, relative, backward) - Dot placeholder - PARSE UPPER - PARSE VALUE ... WITH (simple and expression) - ARG with templates (word split, literal patterns, comma-separated) - PARSE ARG (case-preserving) - PARSE SOURCE / VERSION - PULL with piped stdin - Variable patterns - Backward compatibility with Phase 3 - Edge cases (empty source, no template)
claude[bot] commented 2026-02-09 01:20:34 +00:00 (Migrated from github.com)

Phase 4 PARSE Implementation Review

I've reviewed PR #5 implementing PARSE instruction functionality for the REXX interpreter. Overall, this is solid work with good test coverage. I found several issues requiring attention:


Critical Issues

1. Integer Overflow in Template Parsing (src/parser.rs:839-860)

Issue: Parsing relative positions (+N, -N) uses i32, but the conversion from string can overflow without proper bounds checking.

Problem: If n is "2147483648" or larger, .parse::<i32>() will fail, but the error detail says "invalid relative position" rather than "overflow". More critically, malicious input like +999999999999999999999 could cause issues.

Fix: Add explicit range validation before parsing, or use checked arithmetic. Per REXX spec, positions should fit in implementation limits (typically ±999999999).


2. CLI Argument Handling Loses Fidelity (src/main.rs:74-77)

Issue: CLI arguments are joined with spaces, losing the distinction between multiple arguments.

Problem: In REXX, ARG a, b, c should parse comma-separated arguments. CLI invocation like patch-rexx script.rexx arg1 arg2 arg3 should make ARG a, b, c assign a="ARG1", b="ARG2", c="ARG3". But the current code joins them as "arg1 arg2 arg3", so comma parsing would fail (line 281-288 in eval.rs only sees one arg).

Fix: Pass arguments as separate RexxValue entries: args.iter().map(|s| value::RexxValue::new(s)).collect()


3. Potential Panic in Environment (src/env.rs:54, 75, 121, 171)

Not introduced by this PR, but exposed: Several .expect() calls assume scopes exist. If Environment::new() doesn't initialize with at least one scope, or if there's a bug in procedure stack management, this will panic on valid REXX programs using PARSE.

Recommendation: Since this is pre-existing, file a tracking issue. For this PR, just ensure PARSE operations don't expose this (currently safe).


Design/Correctness Issues

4. AbsolutePos Cast Truncation (src/eval.rs:365-366)

Issue: Conversion from i64 to usize uses as usize, which truncates on 32-bit platforms or with very large positions.

Problem: On a 32-bit system, position 5000000000 (valid i64) would truncate incorrectly. REXX should reject positions larger than usize::MAX.

Fix: Use usize::try_from(pos.saturating_sub(1)).unwrap_or(usize::MAX).min(source.len())


5. Word Parsing Doesn't Handle Tabs (src/eval.rs:441)

Issue: Word boundary detection uses .find(' '), which only matches ASCII space (0x20).

REXX Spec: Per ANSI X3.274-1996 §2.1.7, "blanks" include space and horizontal tab (0x09). The current implementation rejects tabs as word delimiters.

Fix: if let Some(space_pos) = trimmed.find(|c: char| c == ' ' || c == '\t') {


6. Missing Validation: PARSE ARG with UPPER (src/eval.rs:259-261)

Issue: exec_arg() hardcodes upper: true (correct per spec: "ARG is shorthand for PARSE UPPER ARG"), but doesn't validate that the user didn't write PARSE UPPER ARG explicitly, which would be redundant.

Fix (optional): Add a comment explaining that ARG always uppercases.


Performance/Efficiency

7. Unnecessary String Allocation in Comma Split (src/eval.rs:465-479)

Issue: split_template_at_commas() clones all TemplateElements even when no commas are present (the common case).

Impact: For PARSE VAR data a b c (no commas), this clones 3 elements unnecessarily.

Fix: Check for commas first: if !template.elements.iter().any(|e| matches!(e, TemplateElement::Comma)) { return vec![template.clone()]; }


8. Per-Call stdin Allocation (src/eval.rs:483-493)

Issue: read_stdin_line() allocates a new String for every PULL or PARSE LINEIN. In loops processing large files, this adds overhead.

Recommendation: Consider adding a reusable line buffer to Evaluator. Not critical for correctness, but worth profiling.


Testing Gaps

9. No Tests for Error Paths

The 31 tests in tests/phase4.rs cover happy paths comprehensively, but missing:

  • Parser errors: Invalid template syntax
  • Runtime errors: Non-numeric position in AbsolutePos
  • Edge cases: Empty templates, deeply nested expressions

Recommendation: Add 5-10 negative tests for robustness.


10. Missing Multi-Byte UTF-8 Test

Issue: REXX strings are byte-oriented, but Rust String is UTF-8. The slicing logic in apply_template() uses &source[cursor..], which panics if indices fall mid-UTF-8 character.

Current risk: LOW (REXX programs typically use ASCII), but could crash on malicious input.

Fix: Use source.char_indices() or validate UTF-8 boundaries.


Minor Issues

11. Inconsistent Error for Missing WITH (src/parser.rs:941-945)

Uses InvalidSubKeyword error for PARSE VALUE expr without WITH. Consider using a more specific error like ExpectedWith for clarity.

12. to_position_value() Duplication (src/eval.rs:496-504)

Nearly identical to to_integer() (lines 792-809). Consider consolidating.


Positive Notes

  • Excellent test coverage for word parsing, literals, positions, and edge cases
  • No arithmetic issues (BigDecimal not used in PARSE)
  • Safe string slicing for valid inputs
  • Correct REXX semantics for PULL, ARG uppercase, and VERSION/SOURCE strings
  • No unsafe code or unchecked indexing

Recommendation

Status: Approve with required changes

Required before merge:

  1. Fix CLI argument handling (#2)
  2. Fix relative position overflow validation (#1)
  3. Fix word parsing to handle tabs (#5)
  4. Add AbsolutePos range check (#4)

Suggested follow-up:

  • File issue for UTF-8 handling strategy (#10)
  • Add error path tests (#9)
  • Profile split_template_at_commas performance (#7)

Great work on this implementation! The template engine is well-structured and handles the complex REXX parsing rules correctly. Once the CLI args issue is fixed, this will be production-ready.

## Phase 4 PARSE Implementation Review I've reviewed PR #5 implementing PARSE instruction functionality for the REXX interpreter. Overall, this is **solid work** with good test coverage. I found several issues requiring attention: --- ## Critical Issues ### 1. Integer Overflow in Template Parsing (src/parser.rs:839-860) **Issue:** Parsing relative positions (+N, -N) uses `i32`, but the conversion from string can overflow without proper bounds checking. **Problem:** If `n` is "2147483648" or larger, `.parse::<i32>()` will fail, but the error detail says "invalid relative position" rather than "overflow". More critically, malicious input like `+999999999999999999999` could cause issues. **Fix:** Add explicit range validation before parsing, or use checked arithmetic. Per REXX spec, positions should fit in implementation limits (typically ±999999999). --- ### 2. CLI Argument Handling Loses Fidelity (src/main.rs:74-77) **Issue:** CLI arguments are joined with spaces, losing the distinction between multiple arguments. **Problem:** In REXX, `ARG a, b, c` should parse comma-separated arguments. CLI invocation like `patch-rexx script.rexx arg1 arg2 arg3` should make `ARG a, b, c` assign a="ARG1", b="ARG2", c="ARG3". But the current code joins them as "arg1 arg2 arg3", so comma parsing would fail (line 281-288 in eval.rs only sees one arg). **Fix:** Pass arguments as separate RexxValue entries: `args.iter().map(|s| value::RexxValue::new(s)).collect()` --- ### 3. Potential Panic in Environment (src/env.rs:54, 75, 121, 171) **Not introduced by this PR, but exposed:** Several `.expect()` calls assume scopes exist. If `Environment::new()` doesn't initialize with at least one scope, or if there's a bug in procedure stack management, this will panic on valid REXX programs using PARSE. **Recommendation:** Since this is pre-existing, file a tracking issue. For this PR, just ensure PARSE operations don't expose this (currently safe). --- ## Design/Correctness Issues ### 4. AbsolutePos Cast Truncation (src/eval.rs:365-366) **Issue:** Conversion from `i64` to `usize` uses `as usize`, which truncates on 32-bit platforms or with very large positions. **Problem:** On a 32-bit system, position `5000000000` (valid i64) would truncate incorrectly. REXX should reject positions larger than `usize::MAX`. **Fix:** Use `usize::try_from(pos.saturating_sub(1)).unwrap_or(usize::MAX).min(source.len())` --- ### 5. Word Parsing Doesn't Handle Tabs (src/eval.rs:441) **Issue:** Word boundary detection uses `.find(' ')`, which only matches ASCII space (0x20). **REXX Spec:** Per ANSI X3.274-1996 §2.1.7, "blanks" include space and horizontal tab (0x09). The current implementation rejects tabs as word delimiters. **Fix:** `if let Some(space_pos) = trimmed.find(|c: char| c == ' ' || c == '\t') {` --- ### 6. Missing Validation: PARSE ARG with UPPER (src/eval.rs:259-261) **Issue:** `exec_arg()` hardcodes `upper: true` (correct per spec: "ARG is shorthand for PARSE UPPER ARG"), but doesn't validate that the user didn't write `PARSE UPPER ARG` explicitly, which would be redundant. **Fix (optional):** Add a comment explaining that `ARG` always uppercases. --- ## Performance/Efficiency ### 7. Unnecessary String Allocation in Comma Split (src/eval.rs:465-479) **Issue:** `split_template_at_commas()` clones all `TemplateElement`s even when no commas are present (the common case). **Impact:** For `PARSE VAR data a b c` (no commas), this clones 3 elements unnecessarily. **Fix:** Check for commas first: `if !template.elements.iter().any(|e| matches!(e, TemplateElement::Comma)) { return vec![template.clone()]; }` --- ### 8. Per-Call stdin Allocation (src/eval.rs:483-493) **Issue:** `read_stdin_line()` allocates a new `String` for every `PULL` or `PARSE LINEIN`. In loops processing large files, this adds overhead. **Recommendation:** Consider adding a reusable line buffer to `Evaluator`. Not critical for correctness, but worth profiling. --- ## Testing Gaps ### 9. No Tests for Error Paths The 31 tests in `tests/phase4.rs` cover happy paths comprehensively, but missing: - Parser errors: Invalid template syntax - Runtime errors: Non-numeric position in AbsolutePos - Edge cases: Empty templates, deeply nested expressions **Recommendation:** Add 5-10 negative tests for robustness. --- ### 10. Missing Multi-Byte UTF-8 Test **Issue:** REXX strings are byte-oriented, but Rust `String` is UTF-8. The slicing logic in `apply_template()` uses `&source[cursor..]`, which panics if indices fall mid-UTF-8 character. **Current risk:** LOW (REXX programs typically use ASCII), but could crash on malicious input. **Fix:** Use `source.char_indices()` or validate UTF-8 boundaries. --- ## Minor Issues ### 11. Inconsistent Error for Missing WITH (src/parser.rs:941-945) Uses `InvalidSubKeyword` error for `PARSE VALUE expr` without `WITH`. Consider using a more specific error like `ExpectedWith` for clarity. ### 12. to_position_value() Duplication (src/eval.rs:496-504) Nearly identical to `to_integer()` (lines 792-809). Consider consolidating. --- ## Positive Notes - Excellent test coverage for word parsing, literals, positions, and edge cases - No arithmetic issues (BigDecimal not used in PARSE) - Safe string slicing for valid inputs - Correct REXX semantics for PULL, ARG uppercase, and VERSION/SOURCE strings - No unsafe code or unchecked indexing --- ## Recommendation **Status:** Approve with required changes **Required before merge:** 1. Fix CLI argument handling (#2) 2. Fix relative position overflow validation (#1) 3. Fix word parsing to handle tabs (#5) 4. Add AbsolutePos range check (#4) **Suggested follow-up:** - File issue for UTF-8 handling strategy (#10) - Add error path tests (#9) - Profile split_template_at_commas performance (#7) --- Great work on this implementation! The template engine is well-structured and handles the complex REXX parsing rules correctly. Once the CLI args issue is fixed, this will be production-ready.
claude[bot] commented 2026-02-09 01:36:22 +00:00 (Migrated from github.com)

PR Review: Phase 4 PARSE Implementation

I've reviewed the PARSE instruction implementation. The implementation is generally solid with good test coverage, but I've identified one critical safety issue.


Critical Issue

1. UTF-8 String Slicing Can Panic (Safety - High Priority)

Location: src/eval.rs:334-409 (throughout apply_template)

Issue: All string slicing operations use byte-based indexing (source[cursor..], source[cursor..abs], etc.) without UTF-8 boundary validation. This will panic at runtime if cursor or calculated positions land mid-character in multi-byte UTF-8 sequences.

Example that would panic:
data = '🎯test'
parse var data a 2 b

Why this matters: ANSI X3.274-1996 defines REXX string operations on characters, not bytes. Modern implementations must handle multi-byte encodings safely.

Recommended fix: Use .char_indices() to build a byte-to-character position map, or document ASCII-only and validate at parse time.

Lines affected: 334, 344-348, 368, 385, 397-399


Correctness Issues

2. Absolute Position Operates on Bytes Not Characters

Location: src/eval.rs:366-373

Position 5 should mean character 5, not byte 5. For multi-byte characters, this creates invalid slices.


Performance Observations

3. String Slicing Allocations in Word Parsing

Location: src/eval.rs:434-454

Assessment: Likely fine for typical REXX use cases.


Potential Edge Cases

4. Empty Pattern in Variable Pattern Matching

Location: src/eval.rs:393-409

If (var) evaluates to empty string, find() returns Some(0). Classic REXX treats empty pattern as not found.

Test coverage: No test for sep = ''; parse var data a (sep) b


Positive Observations

Error handling: Proper REXX error numbers (25, 38, 20)
Test coverage: 38 tests passing
REXX semantics: Correctly implements word parsing, patterns, uppercase, tabs
Performance: Fast path optimization in split_template_at_commas
No panics on malformed input


Recommendation

Must fix before merge: Issue 1 (UTF-8 slicing panic)

Optional: Add UTF-8 test or document ASCII-only limitation

Test results: All 38 tests passing

## PR Review: Phase 4 PARSE Implementation I've reviewed the PARSE instruction implementation. The implementation is generally solid with good test coverage, but I've identified one critical safety issue. --- ## Critical Issue ### 1. UTF-8 String Slicing Can Panic (Safety - High Priority) **Location:** src/eval.rs:334-409 (throughout apply_template) **Issue:** All string slicing operations use byte-based indexing (source[cursor..], source[cursor..abs], etc.) without UTF-8 boundary validation. This will panic at runtime if cursor or calculated positions land mid-character in multi-byte UTF-8 sequences. **Example that would panic:** data = '🎯test' parse var data a 2 b **Why this matters:** ANSI X3.274-1996 defines REXX string operations on characters, not bytes. Modern implementations must handle multi-byte encodings safely. **Recommended fix:** Use .char_indices() to build a byte-to-character position map, or document ASCII-only and validate at parse time. **Lines affected:** 334, 344-348, 368, 385, 397-399 --- ## Correctness Issues ### 2. Absolute Position Operates on Bytes Not Characters **Location:** src/eval.rs:366-373 Position 5 should mean character 5, not byte 5. For multi-byte characters, this creates invalid slices. --- ## Performance Observations ### 3. String Slicing Allocations in Word Parsing **Location:** src/eval.rs:434-454 Assessment: Likely fine for typical REXX use cases. --- ## Potential Edge Cases ### 4. Empty Pattern in Variable Pattern Matching **Location:** src/eval.rs:393-409 If (var) evaluates to empty string, find() returns Some(0). Classic REXX treats empty pattern as not found. Test coverage: No test for sep = ''; parse var data a (sep) b --- ## Positive Observations ✅ Error handling: Proper REXX error numbers (25, 38, 20) ✅ Test coverage: 38 tests passing ✅ REXX semantics: Correctly implements word parsing, patterns, uppercase, tabs ✅ Performance: Fast path optimization in split_template_at_commas ✅ No panics on malformed input --- ## Recommendation **Must fix before merge:** Issue 1 (UTF-8 slicing panic) **Optional:** Add UTF-8 test or document ASCII-only limitation Test results: All 38 tests passing ✅
claude[bot] commented 2026-02-09 01:47:50 +00:00 (Migrated from github.com)

Code Review: Phase 4 PARSE Implementation

Reviewed REXX PARSE instruction implementation for correctness, safety, and REXX semantics compliance per ANSI X3.274-1996.

Critical Issues

1. Panic Risk: RelativePos Integer Overflow (src/eval.rs:383)

Severity: HIGH (causes panic in debug, undefined behavior in release)

TemplateElement::RelativePos(offset) => {
    #[allow(clippy::cast_sign_loss)]
    let target = if *offset >= 0 {
        Self::advance_chars(source, cursor, *offset as usize)
    } else {
        Self::retreat_chars(source, cursor, (-*offset) as usize)  // ← PANIC if offset == i32::MIN
    }

Issue: Negating i32::MIN (-2147483648) causes arithmetic overflow. In debug mode this panics, in release mode it wraps to the same value.

Fix:

let n = offset.unsigned_abs() as usize;
let target = if *offset >= 0 {
    Self::advance_chars(source, cursor, n)
} else {
    Self::retreat_chars(source, cursor, n)
};

2. Panic Risk: Out-of-Bounds String Slicing (src/eval.rs:520, 529)

Severity: HIGH (panics on malformed input)

advance_chars() line 520:

fn advance_chars(source: &str, byte_cursor: usize, n: usize) -> usize {
    source[byte_cursor..]  // ← PANIC if byte_cursor > source.len()

retreat_chars() line 529:

fn retreat_chars(source: &str, byte_cursor: usize, n: usize) -> usize {
    let boundaries: Vec<usize> = source[..byte_cursor]  // ← PANIC if byte_cursor > source.len()

Root cause: In apply_template(), the cursor can advance beyond source.len() via patterns. While most code paths check cursor < source.len() before slicing (lines 333, 351, 405), the slices at lines 346, 369, 386, 401 assume cursor is in bounds. When cursor exceeds bounds, subsequent calls to advance_chars/retreat_chars panic.

Test case that could trigger (not currently in test suite):

data = 'ab'
parse var data a 10 b +5 c  /* pos 10 exceeds length 2, then +5 from there */

Fix: Add bounds checks:

fn advance_chars(source: &str, byte_cursor: usize, n: usize) -> usize {
    if byte_cursor >= source.len() { return source.len(); }
    source[byte_cursor..]
        .char_indices()
        .nth(n)
        .map_or(source.len(), |(offset, _)| byte_cursor + offset)
}

fn retreat_chars(source: &str, byte_cursor: usize, n: usize) -> usize {
    let byte_cursor = byte_cursor.min(source.len());
    let boundaries: Vec<usize> = source[..byte_cursor]
        .char_indices()
        .map(|(i, _)| i)
        .collect();
    if n >= boundaries.len() {
        0
    } else {
        boundaries[boundaries.len() - n]
    }
}

3. Semantic Issue: to_position_value Silently Rounds (src/eval.rs:543)

Severity: MEDIUM (incorrect REXX semantics)

fn to_position_value(&self, val: &RexxValue) -> RexxResult<i64> {
    let d = self.to_number(val)?;
    let rounded = d.round(0);  // ← Silently rounds 3.7 → 4
    let s = rounded.to_string();
    s.parse::<i64>().map_err(...)
}

Issue: REXX template positions must be exact integers. This silently rounds 3.7 to 4, inconsistent with to_integer() at lines 840-842 which explicitly checks if d != rounded and returns error.

Fix: Match to_integer() behavior:

fn to_position_value(&self, val: &RexxValue) -> RexxResult<i64> {
    let d = self.to_number(val)?;
    let rounded = d.round(0);
    if d != rounded {
        return Err(RexxDiagnostic::new(RexxError::InvalidWholeNumber)
            .with_detail(format!("'{}' is not a whole number", val)));
    }
    let s = rounded.to_string();
    s.parse::<i64>().map_err(|_| {
        RexxDiagnostic::new(RexxError::InvalidWholeNumber)
            .with_detail(format!("'{}' is not a valid position", val))
    })
}

Minor Issues

4. Parser: No Support for Nested Parentheses in Variable Patterns (src/parser.rs:867-883)

Severity: LOW (edge case, documented limitation)

Variable pattern parsing expects simple (varname) form. Input like parse var data a ((nested)) fails parsing with "expected symbol inside '(' ')'". This is acceptable as nested patterns are uncommon, but consider documenting.


5. Performance: Unnecessary Clone in split_template_at_commas (src/eval.rs:478)

Severity: LOW (minor allocation)

Fast path returns vec![template.clone()] when no commas found. Could use Cow or return &ParseTemplate to avoid clone, but negligible impact.


Correctness Verified ✓

REXX Semantics

  • Uppercase conversion: ARG uppercases, PARSE ARG preserves case
  • Missing arguments: Return empty string (not uppercase variable name)
  • Empty variable patterns: Treated as not found per REXX spec
  • Word parsing: Tabs treated as blanks (lines 254, 259)
  • 1-based positions: Correctly converted to 0-based (line 366)
  • Backward positioning: Returns empty string when target ≤ cursor

UTF-8 Safety

  • char_pos_to_byte_offset: Uses char_indices(), respects boundaries
  • Multi-byte characters: Tests pass for emoji and accented chars

Edge Cases

  • Empty source string: Handled (test: parse_var_empty_source)
  • Empty template: No crash (test: parse_var_no_template)
  • Comma handling: Empty sub-templates between commas work correctly
  • Pattern not found: Consumes rest of string
  • Overflow in parse: +99999999999 returns Error 38 (InvalidTemplate)

Test Coverage

Excellent: 42 tests covering all major paths. Suggestions for additional tests:

  1. RelativePos overflow: parse var data a -2147483648 b
  2. Cursor out of bounds: parse var "ab" x 10 y +5 z
  3. Non-integer position: parse var data a 3.7 b (should error after fix)
  4. Position overflow: parse value "test" with a 999999999999999999 b

Summary

Overall assessment: High-quality implementation with comprehensive test coverage. The implementation correctly handles REXX semantics, UTF-8, and edge cases. Three critical panic risks need fixing before merge:

  1. Fix RelativePos i32::MIN overflow (1 line change)
  2. Add bounds checks to advance_chars/retreat_chars (4 lines)
  3. Fix to_position_value to reject non-integers (3 lines)

All tests pass. No arithmetic errors, no division by zero issues, error numbers are correct. The PARSE template engine implementation is algorithmically sound.

Recommendation: Request fixes for the three critical issues, then approve.

# Code Review: Phase 4 PARSE Implementation Reviewed REXX PARSE instruction implementation for correctness, safety, and REXX semantics compliance per ANSI X3.274-1996. ## Critical Issues ### 1. **Panic Risk: RelativePos Integer Overflow** (src/eval.rs:383) **Severity:** HIGH (causes panic in debug, undefined behavior in release) ```rust TemplateElement::RelativePos(offset) => { #[allow(clippy::cast_sign_loss)] let target = if *offset >= 0 { Self::advance_chars(source, cursor, *offset as usize) } else { Self::retreat_chars(source, cursor, (-*offset) as usize) // ← PANIC if offset == i32::MIN } ``` **Issue:** Negating `i32::MIN` (-2147483648) causes arithmetic overflow. In debug mode this panics, in release mode it wraps to the same value. **Fix:** ```rust let n = offset.unsigned_abs() as usize; let target = if *offset >= 0 { Self::advance_chars(source, cursor, n) } else { Self::retreat_chars(source, cursor, n) }; ``` --- ### 2. **Panic Risk: Out-of-Bounds String Slicing** (src/eval.rs:520, 529) **Severity:** HIGH (panics on malformed input) **advance_chars()** line 520: ```rust fn advance_chars(source: &str, byte_cursor: usize, n: usize) -> usize { source[byte_cursor..] // ← PANIC if byte_cursor > source.len() ``` **retreat_chars()** line 529: ```rust fn retreat_chars(source: &str, byte_cursor: usize, n: usize) -> usize { let boundaries: Vec<usize> = source[..byte_cursor] // ← PANIC if byte_cursor > source.len() ``` **Root cause:** In `apply_template()`, the cursor can advance beyond `source.len()` via patterns. While most code paths check `cursor < source.len()` before slicing (lines 333, 351, 405), the slices at lines 346, 369, 386, 401 assume cursor is in bounds. When cursor exceeds bounds, subsequent calls to advance_chars/retreat_chars panic. **Test case that could trigger (not currently in test suite):** ```rexx data = 'ab' parse var data a 10 b +5 c /* pos 10 exceeds length 2, then +5 from there */ ``` **Fix:** Add bounds checks: ```rust fn advance_chars(source: &str, byte_cursor: usize, n: usize) -> usize { if byte_cursor >= source.len() { return source.len(); } source[byte_cursor..] .char_indices() .nth(n) .map_or(source.len(), |(offset, _)| byte_cursor + offset) } fn retreat_chars(source: &str, byte_cursor: usize, n: usize) -> usize { let byte_cursor = byte_cursor.min(source.len()); let boundaries: Vec<usize> = source[..byte_cursor] .char_indices() .map(|(i, _)| i) .collect(); if n >= boundaries.len() { 0 } else { boundaries[boundaries.len() - n] } } ``` --- ### 3. **Semantic Issue: to_position_value Silently Rounds** (src/eval.rs:543) **Severity:** MEDIUM (incorrect REXX semantics) ```rust fn to_position_value(&self, val: &RexxValue) -> RexxResult<i64> { let d = self.to_number(val)?; let rounded = d.round(0); // ← Silently rounds 3.7 → 4 let s = rounded.to_string(); s.parse::<i64>().map_err(...) } ``` **Issue:** REXX template positions must be exact integers. This silently rounds `3.7` to `4`, inconsistent with `to_integer()` at lines 840-842 which explicitly checks `if d != rounded` and returns error. **Fix:** Match `to_integer()` behavior: ```rust fn to_position_value(&self, val: &RexxValue) -> RexxResult<i64> { let d = self.to_number(val)?; let rounded = d.round(0); if d != rounded { return Err(RexxDiagnostic::new(RexxError::InvalidWholeNumber) .with_detail(format!("'{}' is not a whole number", val))); } let s = rounded.to_string(); s.parse::<i64>().map_err(|_| { RexxDiagnostic::new(RexxError::InvalidWholeNumber) .with_detail(format!("'{}' is not a valid position", val)) }) } ``` --- ## Minor Issues ### 4. **Parser: No Support for Nested Parentheses in Variable Patterns** (src/parser.rs:867-883) **Severity:** LOW (edge case, documented limitation) Variable pattern parsing expects simple `(varname)` form. Input like `parse var data a ((nested))` fails parsing with "expected symbol inside '(' ')'". This is acceptable as nested patterns are uncommon, but consider documenting. --- ### 5. **Performance: Unnecessary Clone in split_template_at_commas** (src/eval.rs:478) **Severity:** LOW (minor allocation) Fast path returns `vec![template.clone()]` when no commas found. Could use `Cow` or return `&ParseTemplate` to avoid clone, but negligible impact. --- ## Correctness Verified ✓ ### REXX Semantics - ✓ **Uppercase conversion**: ARG uppercases, PARSE ARG preserves case - ✓ **Missing arguments**: Return empty string (not uppercase variable name) - ✓ **Empty variable patterns**: Treated as not found per REXX spec - ✓ **Word parsing**: Tabs treated as blanks (lines 254, 259) - ✓ **1-based positions**: Correctly converted to 0-based (line 366) - ✓ **Backward positioning**: Returns empty string when target ≤ cursor ### UTF-8 Safety - ✓ **char_pos_to_byte_offset**: Uses `char_indices()`, respects boundaries - ✓ **Multi-byte characters**: Tests pass for emoji and accented chars ### Edge Cases - ✓ **Empty source string**: Handled (test: parse_var_empty_source) - ✓ **Empty template**: No crash (test: parse_var_no_template) - ✓ **Comma handling**: Empty sub-templates between commas work correctly - ✓ **Pattern not found**: Consumes rest of string - ✓ **Overflow in parse**: `+99999999999` returns Error 38 (InvalidTemplate) --- ## Test Coverage **Excellent:** 42 tests covering all major paths. Suggestions for additional tests: 1. **RelativePos overflow:** `parse var data a -2147483648 b` 2. **Cursor out of bounds:** `parse var "ab" x 10 y +5 z` 3. **Non-integer position:** `parse var data a 3.7 b` (should error after fix) 4. **Position overflow:** `parse value "test" with a 999999999999999999 b` --- ## Summary **Overall assessment:** High-quality implementation with comprehensive test coverage. The implementation correctly handles REXX semantics, UTF-8, and edge cases. Three critical panic risks need fixing before merge: 1. Fix RelativePos i32::MIN overflow (1 line change) 2. Add bounds checks to advance_chars/retreat_chars (4 lines) 3. Fix to_position_value to reject non-integers (3 lines) All tests pass. No arithmetic errors, no division by zero issues, error numbers are correct. The PARSE template engine implementation is algorithmically sound. **Recommendation:** Request fixes for the three critical issues, then approve.
claude[bot] commented 2026-02-09 01:54:49 +00:00 (Migrated from github.com)

Code Review — Phase 4 PARSE Implementation

I've completed a thorough review of this PR focusing on correctness, REXX semantics, error handling, and safety. All 42 tests pass successfully.

Critical Issue Found

src/main.rs:74-76 — Incorrect ARG handling for multi-argument programs

The current code joins all CLI arguments into a single string:

if !args.is_empty() {
    let joined = args.join(" ");
    evaluator.set_main_args(vec![value::RexxValue::new(joined)]);
}

Problem: Per REXX semantics, when a program is called with multiple arguments like patch-rexx script.rexx foo bar baz, the ARG instruction should parse them as separate arguments accessible via comma syntax. With the current implementation, comma-separated parsing gets empty strings because there's only one argument.

Fix: Each CLI argument should be a separate RexxValue:

if !args.is_empty() {
    let rexx_args: Vec<_> = args.iter()
        .map(|s| value::RexxValue::new(s))
        .collect();
    evaluator.set_main_args(rexx_args);
}

Positive Findings

Parser (src/parser.rs:813-1002)

  • Template parsing correctly handles all REXX pattern types
  • Error handling uses correct error numbers (Error 20, 25, 38)
  • parse_value_depth guard prevents WITH from being consumed by implicit concatenation
  • No panics on malformed input — all paths return proper RexxError

Evaluator (src/eval.rs:258-526)

  • UTF-8 handling is correct — char_pos_to_byte_offset, advance_chars, retreat_chars use char boundaries
  • Word parsing correctly implements REXX blank rules (space 0x20 + tab 0x09)
  • Empty pattern handling matches REXX spec (treated as not found)
  • Backward positioning handled safely (no underflow)
  • to_position_value correctly uses to_number → round → parse for REXX numeric conversion

REXX Semantics

  • ARG uppercases, PARSE ARG preserves case
  • PULL uppercases stdin
  • Single target gets verbatim, multiple targets word-parse
  • Last target gets remainder
  • Missing args leave variables unset (returning uppercase name)

Error Handling

  • All error paths use correct REXX error numbers
  • No unwrap() or panic!() in production code (only in tests)
  • read_stdin_line gracefully handles EOF
  • Position overflow/underflow clamped safely

Test Coverage

  • 42 comprehensive tests covering word parsing, literals, positions, dots, UPPER, VALUE, ARG, PULL, SOURCE, VERSION, variable patterns, UTF-8, edge cases, and error paths
  • Backward compatibility verified

Minor Observations (Not Blockers)

  1. Performance — src/eval.rs:504-514 retreat_chars allocation

    • Allocates Vec for every backward movement
    • Not a correctness issue, but could optimize with rev().nth() if this becomes hot
  2. Performance — src/eval.rs:449-455 template cloning

    • Fast path exists to avoid clone when no commas present
    • Optimization is appropriate
  3. stdin EOF behavior — src/eval.rs:474-476

    • Returns empty string on EOF (standard REXX behavior)
    • Silently ignores read errors (consistent with classic REXX)

Recommendation

Fix the main.rs argument handling issue, then this is ready to merge. The implementation is otherwise solid, well-tested, and correctly implements REXX PARSE semantics including proper error handling and UTF-8 safety.

## Code Review — Phase 4 PARSE Implementation I've completed a thorough review of this PR focusing on correctness, REXX semantics, error handling, and safety. All 42 tests pass successfully. ### Critical Issue Found **src/main.rs:74-76 — Incorrect ARG handling for multi-argument programs** The current code joins all CLI arguments into a single string: ```rust if !args.is_empty() { let joined = args.join(" "); evaluator.set_main_args(vec![value::RexxValue::new(joined)]); } ``` **Problem:** Per REXX semantics, when a program is called with multiple arguments like `patch-rexx script.rexx foo bar baz`, the ARG instruction should parse them as separate arguments accessible via comma syntax. With the current implementation, comma-separated parsing gets empty strings because there's only one argument. **Fix:** Each CLI argument should be a separate RexxValue: ```rust if !args.is_empty() { let rexx_args: Vec<_> = args.iter() .map(|s| value::RexxValue::new(s)) .collect(); evaluator.set_main_args(rexx_args); } ``` --- ### Positive Findings **Parser (src/parser.rs:813-1002)** - ✅ Template parsing correctly handles all REXX pattern types - ✅ Error handling uses correct error numbers (Error 20, 25, 38) - ✅ parse_value_depth guard prevents WITH from being consumed by implicit concatenation - ✅ No panics on malformed input — all paths return proper RexxError **Evaluator (src/eval.rs:258-526)** - ✅ UTF-8 handling is correct — char_pos_to_byte_offset, advance_chars, retreat_chars use char boundaries - ✅ Word parsing correctly implements REXX blank rules (space 0x20 + tab 0x09) - ✅ Empty pattern handling matches REXX spec (treated as not found) - ✅ Backward positioning handled safely (no underflow) - ✅ to_position_value correctly uses to_number → round → parse for REXX numeric conversion **REXX Semantics** - ✅ ARG uppercases, PARSE ARG preserves case - ✅ PULL uppercases stdin - ✅ Single target gets verbatim, multiple targets word-parse - ✅ Last target gets remainder - ✅ Missing args leave variables unset (returning uppercase name) **Error Handling** - ✅ All error paths use correct REXX error numbers - ✅ No unwrap() or panic!() in production code (only in tests) - ✅ read_stdin_line gracefully handles EOF - ✅ Position overflow/underflow clamped safely **Test Coverage** - ✅ 42 comprehensive tests covering word parsing, literals, positions, dots, UPPER, VALUE, ARG, PULL, SOURCE, VERSION, variable patterns, UTF-8, edge cases, and error paths - ✅ Backward compatibility verified --- ### Minor Observations (Not Blockers) 1. **Performance — src/eval.rs:504-514 retreat_chars allocation** - Allocates Vec for every backward movement - Not a correctness issue, but could optimize with rev().nth() if this becomes hot 2. **Performance — src/eval.rs:449-455 template cloning** - Fast path exists to avoid clone when no commas present - Optimization is appropriate 3. **stdin EOF behavior — src/eval.rs:474-476** - Returns empty string on EOF (standard REXX behavior) - Silently ignores read errors (consistent with classic REXX) --- ### Recommendation **Fix the main.rs argument handling issue**, then this is ready to merge. The implementation is otherwise solid, well-tested, and correctly implements REXX PARSE semantics including proper error handling and UTF-8 safety.
claude[bot] commented 2026-02-09 01:57:52 +00:00 (Migrated from github.com)

Code Review: Phase 4 PARSE Implementation

Overview

This PR implements REXX PARSE instruction with full template grammar support. All 42 tests pass.

Critical Issues

1. Character boundary panic risk in retreat_chars()

Location: src/eval.rs:524
Issue: Could panic if byte_cursor falls between UTF-8 character boundaries.
Recommendation: Add is_char_boundary validation before slicing.

2. Missing error context in position expressions

Location: src/eval.rs:348
Issue: Division by zero in template positions loses context.
Recommendation: Wrap eval_expr with better error messages.

3. i32 overflow in relative positions

Location: src/parser.rs:839, 855
Issue: Large relative positions (+999999999999) fail i32 parse.
Recommendation: Use i64 or proper Error 42 for overflow.

Important Issues

4. Unnecessary clone in split_template_at_commas()

Location: src/eval.rs:469
Impact: Every PARSE without commas clones entire template.
Recommendation: Use borrowed slices.

5. Quadratic behavior in retreat_chars()

Location: src/eval.rs:524
Impact: O(cursor_position) memory for large strings.
Recommendation: Use reverse iteration without collecting.

6. parse_value_depth not reset on error

Location: src/parser.rs:937-938
Impact: Parser state corrupted after PARSE VALUE error.
Recommendation: Decrement depth before propagating error.

7. Stdin errors ignored

Location: src/eval.rs:490
Impact: Returns empty string instead of Error 48.

What Was Done Well

  1. UTF-8 handling with char_indices() (eval.rs:502-516)
  2. Correct REXX error codes (20, 25, 26, 38)
  3. Word parsing with space+tab blanks (eval.rs:406, 411)
  4. 42 comprehensive tests including edge cases
  5. Division by zero raises Error 42 correctly

Summary

Critical: 3 | Important: 4 | Minor: 2

Recommendation: Address critical issues 1-3 before merge.

Reviewed by: Claude Sonnet 4.5
Standards: ANSI X3.274-1996

# Code Review: Phase 4 PARSE Implementation ## Overview This PR implements REXX PARSE instruction with full template grammar support. All 42 tests pass. ## Critical Issues ### 1. Character boundary panic risk in retreat_chars() **Location:** src/eval.rs:524 **Issue:** Could panic if byte_cursor falls between UTF-8 character boundaries. **Recommendation:** Add is_char_boundary validation before slicing. ### 2. Missing error context in position expressions **Location:** src/eval.rs:348 **Issue:** Division by zero in template positions loses context. **Recommendation:** Wrap eval_expr with better error messages. ### 3. i32 overflow in relative positions **Location:** src/parser.rs:839, 855 **Issue:** Large relative positions (+999999999999) fail i32 parse. **Recommendation:** Use i64 or proper Error 42 for overflow. ## Important Issues ### 4. Unnecessary clone in split_template_at_commas() **Location:** src/eval.rs:469 **Impact:** Every PARSE without commas clones entire template. **Recommendation:** Use borrowed slices. ### 5. Quadratic behavior in retreat_chars() **Location:** src/eval.rs:524 **Impact:** O(cursor_position) memory for large strings. **Recommendation:** Use reverse iteration without collecting. ### 6. parse_value_depth not reset on error **Location:** src/parser.rs:937-938 **Impact:** Parser state corrupted after PARSE VALUE error. **Recommendation:** Decrement depth before propagating error. ### 7. Stdin errors ignored **Location:** src/eval.rs:490 **Impact:** Returns empty string instead of Error 48. ## What Was Done Well 1. UTF-8 handling with char_indices() (eval.rs:502-516) 2. Correct REXX error codes (20, 25, 26, 38) 3. Word parsing with space+tab blanks (eval.rs:406, 411) 4. 42 comprehensive tests including edge cases 5. Division by zero raises Error 42 correctly ## Summary Critical: 3 | Important: 4 | Minor: 2 **Recommendation:** Address critical issues 1-3 before merge. **Reviewed by:** Claude Sonnet 4.5 **Standards:** ANSI X3.274-1996
claude[bot] commented 2026-02-09 02:11:24 +00:00 (Migrated from github.com)

PR Review: Phase 4 PARSE Implementation

Summary

Comprehensive REXX PARSE implementation with excellent test coverage. No critical issues found — code is safe, correct, and well-tested.


Strengths

  1. Correct REXX Semantics

    • Word parsing per ANSI spec (space + tab only)
    • Empty patterns handled correctly
    • Proper case handling: ARG=uppercase, PARSE ARG=preserves case
  2. UTF-8 Safety

    • All positions use char_indices() for proper boundaries
    • No panic paths from invalid slicing
    • Tests cover multibyte chars
  3. Error Handling

    • Correct REXX error numbers (20, 25, 26, 38)
    • No unwrap() in production code
    • All errors propagated via Result
  4. Test Coverage

    • 42 integration tests
    • Covers edge cases, UTF-8, error paths

🔍 Issues Found

1. CLI Argument Handling (src/main.rs:74-76) — Medium Severity

Issue: CLI args are joined into single string, losing boundaries.

Current: args.join(" ") creates ["foo bar"] not ["foo", "bar"]

Fix:

evaluator.set_main_args(
    args.iter().map(|s| value::RexxValue::new(s)).collect()
);

2. Division by Zero Error (src/eval.rs:977, 991) — Low Severity

Uses Error 42 (overflow) for div-by-zero. ANSI spec is ambiguous; Regina/ooRexx use 42, so acceptable. Add comment explaining choice.

3. Defensive Bounds Check (src/eval.rs:437) — Very Low

source[cursor..] safe due to invariants, but consider explicit clamp:

let cursor = cursor.min(source.len());

📋 Recommendations

  • Fix CLI arg handling (only breaking change)
  • Add test for CLI comma-separated args
  • Consider RAII guard for parse_value_depth

Verdict: Approve with Minor Fix

Excellent implementation. Merge after fixing CLI args.

Reviewed by: Claude Sonnet 4.5

# PR Review: Phase 4 PARSE Implementation ## Summary Comprehensive REXX PARSE implementation with excellent test coverage. **No critical issues found** — code is safe, correct, and well-tested. --- ## ✅ Strengths 1. **Correct REXX Semantics** - Word parsing per ANSI spec (space + tab only) - Empty patterns handled correctly - Proper case handling: ARG=uppercase, PARSE ARG=preserves case 2. **UTF-8 Safety** - All positions use char_indices() for proper boundaries - No panic paths from invalid slicing - Tests cover multibyte chars 3. **Error Handling** - Correct REXX error numbers (20, 25, 26, 38) - No unwrap() in production code - All errors propagated via Result 4. **Test Coverage** - 42 integration tests - Covers edge cases, UTF-8, error paths --- ## 🔍 Issues Found ### 1. CLI Argument Handling (src/main.rs:74-76) — **Medium Severity** **Issue:** CLI args are joined into single string, losing boundaries. Current: `args.join(" ")` creates ["foo bar"] not ["foo", "bar"] **Fix:** ```rust evaluator.set_main_args( args.iter().map(|s| value::RexxValue::new(s)).collect() ); ``` ### 2. Division by Zero Error (src/eval.rs:977, 991) — **Low Severity** Uses Error 42 (overflow) for div-by-zero. ANSI spec is ambiguous; Regina/ooRexx use 42, so acceptable. Add comment explaining choice. ### 3. Defensive Bounds Check (src/eval.rs:437) — **Very Low** `source[cursor..]` safe due to invariants, but consider explicit clamp: ```rust let cursor = cursor.min(source.len()); ``` --- ## 📋 Recommendations - Fix CLI arg handling (only breaking change) - Add test for CLI comma-separated args - Consider RAII guard for parse_value_depth --- ## ✅ Verdict: **Approve with Minor Fix** Excellent implementation. Merge after fixing CLI args. **Reviewed by:** Claude Sonnet 4.5
Sign in to join this conversation.
No description provided.