navicore/patch-rexx

Fork 0

⏺ Phase 4 is complete. Here's a summary of what was implemented: #5

Merged

navicore merged 6 commits from phase4 into main

2026-02-09 02:18:43 +00:00

navicore commented

2026-02-09 01:17:04 +00:00

(Migrated from github.com)

Summary

Files modified:

src/parser.rs — Added PARSE instruction parsing:

parse_value_depth field to stop expression parsing at WITH keyword
is_with_keyword() helper and WITH break in parse_concat()
parse_template() — full template grammar (variables, dots, literals, absolute/relative positions, variable patterns, commas)
parse_parse() — PARSE [UPPER] source template with all sources (ARG, PULL, SOURCE, VERSION, LINEIN, VAR, VALUE)
parse_pull() — PULL [template]
Upgraded parse_arg() from Phase 3's simple one-var-per-position to full template parsing
Wired PARSE and PULL into parse_clause()

src/eval.rs — Added PARSE template engine:

apply_template() — core template engine implementing REXX word-parsing, literal patterns, absolute/relative positions, variable patterns
assign_section() — 0 targets: skip; 1 target: verbatim; 2+: word-parse
exec_parse() — source dispatch with comma-split sub-templates
read_stdin_line() for PULL/LINEIN
set_main_args() for CLI arg injection
Upgraded exec_arg() to route through exec_parse(true, Arg, template)
Wired PARSE and PULL in exec_clause()

src/main.rs — Wired CLI args:

run_line() now accepts args: &[String] and pushes them as main program arguments

tests/phase4.rs — 31 integration tests covering:

Word parsing (1-3+ targets, excess vars)
Literal patterns (single/multi-char, not-found)
Positional patterns (absolute, relative, backward)
Dot placeholder
PARSE UPPER
PARSE VALUE ... WITH (simple and expression)
ARG with templates (word split, literal patterns, comma-separated)
PARSE ARG (case-preserving)
PARSE SOURCE / VERSION
PULL with piped stdin
Variable patterns
Backward compatibility with Phase 3
Edge cases (empty source, no template)

Summary Files modified: src/parser.rs — Added PARSE instruction parsing: - parse_value_depth field to stop expression parsing at WITH keyword - is_with_keyword() helper and WITH break in parse_concat() - parse_template() — full template grammar (variables, dots, literals, absolute/relative positions, variable patterns, commas) - parse_parse() — PARSE [UPPER] source template with all sources (ARG, PULL, SOURCE, VERSION, LINEIN, VAR, VALUE) - parse_pull() — PULL [template] - Upgraded parse_arg() from Phase 3's simple one-var-per-position to full template parsing - Wired PARSE and PULL into parse_clause() src/eval.rs — Added PARSE template engine: - apply_template() — core template engine implementing REXX word-parsing, literal patterns, absolute/relative positions, variable patterns - assign_section() — 0 targets: skip; 1 target: verbatim; 2+: word-parse - exec_parse() — source dispatch with comma-split sub-templates - read_stdin_line() for PULL/LINEIN - set_main_args() for CLI arg injection - Upgraded exec_arg() to route through exec_parse(true, Arg, template) - Wired PARSE and PULL in exec_clause() src/main.rs — Wired CLI args: - run_line() now accepts args: &[String] and pushes them as main program arguments tests/phase4.rs — 31 integration tests covering: - Word parsing (1-3+ targets, excess vars) - Literal patterns (single/multi-char, not-found) - Positional patterns (absolute, relative, backward) - Dot placeholder - PARSE UPPER - PARSE VALUE ... WITH (simple and expression) - ARG with templates (word split, literal patterns, comma-separated) - PARSE ARG (case-preserving) - PARSE SOURCE / VERSION - PULL with piped stdin - Variable patterns - Backward compatibility with Phase 3 - Edge cases (empty source, no template)

claude[bot] commented

2026-02-09 01:20:34 +00:00

(Migrated from github.com)

Phase 4 PARSE Implementation Review

I've reviewed PR #5 implementing PARSE instruction functionality for the REXX interpreter. Overall, this is solid work with good test coverage. I found several issues requiring attention:

Critical Issues

1. Integer Overflow in Template Parsing (src/parser.rs:839-860)

Issue: Parsing relative positions (+N, -N) uses i32, but the conversion from string can overflow without proper bounds checking.

Problem: If n is "2147483648" or larger, .parse::<i32>() will fail, but the error detail says "invalid relative position" rather than "overflow". More critically, malicious input like +999999999999999999999 could cause issues.

Fix: Add explicit range validation before parsing, or use checked arithmetic. Per REXX spec, positions should fit in implementation limits (typically ±999999999).

2. CLI Argument Handling Loses Fidelity (src/main.rs:74-77)

Issue: CLI arguments are joined with spaces, losing the distinction between multiple arguments.

Problem: In REXX, ARG a, b, c should parse comma-separated arguments. CLI invocation like patch-rexx script.rexx arg1 arg2 arg3 should make ARG a, b, c assign a="ARG1", b="ARG2", c="ARG3". But the current code joins them as "arg1 arg2 arg3", so comma parsing would fail (line 281-288 in eval.rs only sees one arg).

Fix: Pass arguments as separate RexxValue entries: args.iter().map(|s| value::RexxValue::new(s)).collect()

3. Potential Panic in Environment (src/env.rs:54, 75, 121, 171)

Not introduced by this PR, but exposed: Several .expect() calls assume scopes exist. If Environment::new() doesn't initialize with at least one scope, or if there's a bug in procedure stack management, this will panic on valid REXX programs using PARSE.

Recommendation: Since this is pre-existing, file a tracking issue. For this PR, just ensure PARSE operations don't expose this (currently safe).

Design/Correctness Issues

4. AbsolutePos Cast Truncation (src/eval.rs:365-366)

Issue: Conversion from i64 to usize uses as usize, which truncates on 32-bit platforms or with very large positions.

Problem: On a 32-bit system, position 5000000000 (valid i64) would truncate incorrectly. REXX should reject positions larger than usize::MAX.

Fix: Use usize::try_from(pos.saturating_sub(1)).unwrap_or(usize::MAX).min(source.len())

5. Word Parsing Doesn't Handle Tabs (src/eval.rs:441)

Issue: Word boundary detection uses .find(' '), which only matches ASCII space (0x20).

REXX Spec: Per ANSI X3.274-1996 §2.1.7, "blanks" include space and horizontal tab (0x09). The current implementation rejects tabs as word delimiters.

Fix: if let Some(space_pos) = trimmed.find(|c: char| c == ' ' || c == '\t') {

6. Missing Validation: PARSE ARG with UPPER (src/eval.rs:259-261)

Issue: exec_arg() hardcodes upper: true (correct per spec: "ARG is shorthand for PARSE UPPER ARG"), but doesn't validate that the user didn't write PARSE UPPER ARG explicitly, which would be redundant.

Fix (optional): Add a comment explaining that ARG always uppercases.

Performance/Efficiency

7. Unnecessary String Allocation in Comma Split (src/eval.rs:465-479)

Issue: split_template_at_commas() clones all TemplateElements even when no commas are present (the common case).

Impact: For PARSE VAR data a b c (no commas), this clones 3 elements unnecessarily.

Fix: Check for commas first: if !template.elements.iter().any(|e| matches!(e, TemplateElement::Comma)) { return vec![template.clone()]; }

8. Per-Call stdin Allocation (src/eval.rs:483-493)

Issue: read_stdin_line() allocates a new String for every PULL or PARSE LINEIN. In loops processing large files, this adds overhead.

Recommendation: Consider adding a reusable line buffer to Evaluator. Not critical for correctness, but worth profiling.

Testing Gaps

9. No Tests for Error Paths

The 31 tests in tests/phase4.rs cover happy paths comprehensively, but missing:

Parser errors: Invalid template syntax
Runtime errors: Non-numeric position in AbsolutePos
Edge cases: Empty templates, deeply nested expressions

Recommendation: Add 5-10 negative tests for robustness.

10. Missing Multi-Byte UTF-8 Test

Issue: REXX strings are byte-oriented, but Rust String is UTF-8. The slicing logic in apply_template() uses &source[cursor..], which panics if indices fall mid-UTF-8 character.

Current risk: LOW (REXX programs typically use ASCII), but could crash on malicious input.

Fix: Use source.char_indices() or validate UTF-8 boundaries.

Minor Issues

11. Inconsistent Error for Missing WITH (src/parser.rs:941-945)

Uses InvalidSubKeyword error for PARSE VALUE expr without WITH. Consider using a more specific error like ExpectedWith for clarity.

12. to_position_value() Duplication (src/eval.rs:496-504)

Nearly identical to to_integer() (lines 792-809). Consider consolidating.

Positive Notes

Excellent test coverage for word parsing, literals, positions, and edge cases
No arithmetic issues (BigDecimal not used in PARSE)
Safe string slicing for valid inputs
Correct REXX semantics for PULL, ARG uppercase, and VERSION/SOURCE strings
No unsafe code or unchecked indexing

Recommendation

Status: Approve with required changes

Required before merge:

Fix CLI argument handling (#2)
Fix relative position overflow validation (#1)
Fix word parsing to handle tabs (#5)
Add AbsolutePos range check (#4)

Suggested follow-up:

File issue for UTF-8 handling strategy (#10)
Add error path tests (#9)
Profile split_template_at_commas performance (#7)

Great work on this implementation! The template engine is well-structured and handles the complex REXX parsing rules correctly. Once the CLI args issue is fixed, this will be production-ready.

## Phase 4 PARSE Implementation Review I've reviewed PR #5 implementing PARSE instruction functionality for the REXX interpreter. Overall, this is **solid work** with good test coverage. I found several issues requiring attention: --- ## Critical Issues ### 1. Integer Overflow in Template Parsing (src/parser.rs:839-860) **Issue:** Parsing relative positions (+N, -N) uses `i32`, but the conversion from string can overflow without proper bounds checking. **Problem:** If `n` is "2147483648" or larger, `.parse::<i32>()` will fail, but the error detail says "invalid relative position" rather than "overflow". More critically, malicious input like `+999999999999999999999` could cause issues. **Fix:** Add explicit range validation before parsing, or use checked arithmetic. Per REXX spec, positions should fit in implementation limits (typically ±999999999). --- ### 2. CLI Argument Handling Loses Fidelity (src/main.rs:74-77) **Issue:** CLI arguments are joined with spaces, losing the distinction between multiple arguments. **Problem:** In REXX, `ARG a, b, c` should parse comma-separated arguments. CLI invocation like `patch-rexx script.rexx arg1 arg2 arg3` should make `ARG a, b, c` assign a="ARG1", b="ARG2", c="ARG3". But the current code joins them as "arg1 arg2 arg3", so comma parsing would fail (line 281-288 in eval.rs only sees one arg). **Fix:** Pass arguments as separate RexxValue entries: `args.iter().map(|s| value::RexxValue::new(s)).collect()` --- ### 3. Potential Panic in Environment (src/env.rs:54, 75, 121, 171) **Not introduced by this PR, but exposed:** Several `.expect()` calls assume scopes exist. If `Environment::new()` doesn't initialize with at least one scope, or if there's a bug in procedure stack management, this will panic on valid REXX programs using PARSE. **Recommendation:** Since this is pre-existing, file a tracking issue. For this PR, just ensure PARSE operations don't expose this (currently safe). --- ## Design/Correctness Issues ### 4. AbsolutePos Cast Truncation (src/eval.rs:365-366) **Issue:** Conversion from `i64` to `usize` uses `as usize`, which truncates on 32-bit platforms or with very large positions. **Problem:** On a 32-bit system, position `5000000000` (valid i64) would truncate incorrectly. REXX should reject positions larger than `usize::MAX`. **Fix:** Use `usize::try_from(pos.saturating_sub(1)).unwrap_or(usize::MAX).min(source.len())` --- ### 5. Word Parsing Doesn't Handle Tabs (src/eval.rs:441) **Issue:** Word boundary detection uses `.find(' ')`, which only matches ASCII space (0x20). **REXX Spec:** Per ANSI X3.274-1996 §2.1.7, "blanks" include space and horizontal tab (0x09). The current implementation rejects tabs as word delimiters. **Fix:** `if let Some(space_pos) = trimmed.find(|c: char| c == ' ' || c == '\t') {` --- ### 6. Missing Validation: PARSE ARG with UPPER (src/eval.rs:259-261) **Issue:** `exec_arg()` hardcodes `upper: true` (correct per spec: "ARG is shorthand for PARSE UPPER ARG"), but doesn't validate that the user didn't write `PARSE UPPER ARG` explicitly, which would be redundant. **Fix (optional):** Add a comment explaining that `ARG` always uppercases. --- ## Performance/Efficiency ### 7. Unnecessary String Allocation in Comma Split (src/eval.rs:465-479) **Issue:** `split_template_at_commas()` clones all `TemplateElement`s even when no commas are present (the common case). **Impact:** For `PARSE VAR data a b c` (no commas), this clones 3 elements unnecessarily. **Fix:** Check for commas first: `if !template.elements.iter().any(|e| matches!(e, TemplateElement::Comma)) { return vec![template.clone()]; }` --- ### 8. Per-Call stdin Allocation (src/eval.rs:483-493) **Issue:** `read_stdin_line()` allocates a new `String` for every `PULL` or `PARSE LINEIN`. In loops processing large files, this adds overhead. **Recommendation:** Consider adding a reusable line buffer to `Evaluator`. Not critical for correctness, but worth profiling. --- ## Testing Gaps ### 9. No Tests for Error Paths The 31 tests in `tests/phase4.rs` cover happy paths comprehensively, but missing: - Parser errors: Invalid template syntax - Runtime errors: Non-numeric position in AbsolutePos - Edge cases: Empty templates, deeply nested expressions **Recommendation:** Add 5-10 negative tests for robustness. --- ### 10. Missing Multi-Byte UTF-8 Test **Issue:** REXX strings are byte-oriented, but Rust `String` is UTF-8. The slicing logic in `apply_template()` uses `&source[cursor..]`, which panics if indices fall mid-UTF-8 character. **Current risk:** LOW (REXX programs typically use ASCII), but could crash on malicious input. **Fix:** Use `source.char_indices()` or validate UTF-8 boundaries. --- ## Minor Issues ### 11. Inconsistent Error for Missing WITH (src/parser.rs:941-945) Uses `InvalidSubKeyword` error for `PARSE VALUE expr` without `WITH`. Consider using a more specific error like `ExpectedWith` for clarity. ### 12. to_position_value() Duplication (src/eval.rs:496-504) Nearly identical to `to_integer()` (lines 792-809). Consider consolidating. --- ## Positive Notes - Excellent test coverage for word parsing, literals, positions, and edge cases - No arithmetic issues (BigDecimal not used in PARSE) - Safe string slicing for valid inputs - Correct REXX semantics for PULL, ARG uppercase, and VERSION/SOURCE strings - No unsafe code or unchecked indexing --- ## Recommendation **Status:** Approve with required changes **Required before merge:** 1. Fix CLI argument handling (#2) 2. Fix relative position overflow validation (#1) 3. Fix word parsing to handle tabs (#5) 4. Add AbsolutePos range check (#4) **Suggested follow-up:** - File issue for UTF-8 handling strategy (#10) - Add error path tests (#9) - Profile split_template_at_commas performance (#7) --- Great work on this implementation! The template engine is well-structured and handles the complex REXX parsing rules correctly. Once the CLI args issue is fixed, this will be production-ready.

claude[bot] commented

2026-02-09 01:36:22 +00:00

(Migrated from github.com)

PR Review: Phase 4 PARSE Implementation

I've reviewed the PARSE instruction implementation. The implementation is generally solid with good test coverage, but I've identified one critical safety issue.

Critical Issue

1. UTF-8 String Slicing Can Panic (Safety - High Priority)

Location: src/eval.rs:334-409 (throughout apply_template)

Issue: All string slicing operations use byte-based indexing (source[cursor..], source[cursor..abs], etc.) without UTF-8 boundary validation. This will panic at runtime if cursor or calculated positions land mid-character in multi-byte UTF-8 sequences.

Example that would panic:
data = '🎯test'
parse var data a 2 b

Why this matters: ANSI X3.274-1996 defines REXX string operations on characters, not bytes. Modern implementations must handle multi-byte encodings safely.

Recommended fix: Use .char_indices() to build a byte-to-character position map, or document ASCII-only and validate at parse time.

Lines affected: 334, 344-348, 368, 385, 397-399

Correctness Issues

2. Absolute Position Operates on Bytes Not Characters

Location: src/eval.rs:366-373

Position 5 should mean character 5, not byte 5. For multi-byte characters, this creates invalid slices.

Performance Observations

3. String Slicing Allocations in Word Parsing

Location: src/eval.rs:434-454

Assessment: Likely fine for typical REXX use cases.

Potential Edge Cases

4. Empty Pattern in Variable Pattern Matching

Location: src/eval.rs:393-409

If (var) evaluates to empty string, find() returns Some(0). Classic REXX treats empty pattern as not found.

Test coverage: No test for sep = ''; parse var data a (sep) b

Positive Observations

✅ Error handling: Proper REXX error numbers (25, 38, 20)
✅ Test coverage: 38 tests passing
✅ REXX semantics: Correctly implements word parsing, patterns, uppercase, tabs
✅ Performance: Fast path optimization in split_template_at_commas
✅ No panics on malformed input

Recommendation

Must fix before merge: Issue 1 (UTF-8 slicing panic)

Optional: Add UTF-8 test or document ASCII-only limitation

Test results: All 38 tests passing ✅

## PR Review: Phase 4 PARSE Implementation I've reviewed the PARSE instruction implementation. The implementation is generally solid with good test coverage, but I've identified one critical safety issue. --- ## Critical Issue ### 1. UTF-8 String Slicing Can Panic (Safety - High Priority) **Location:** src/eval.rs:334-409 (throughout apply_template) **Issue:** All string slicing operations use byte-based indexing (source[cursor..], source[cursor..abs], etc.) without UTF-8 boundary validation. This will panic at runtime if cursor or calculated positions land mid-character in multi-byte UTF-8 sequences. **Example that would panic:** data = '🎯test' parse var data a 2 b **Why this matters:** ANSI X3.274-1996 defines REXX string operations on characters, not bytes. Modern implementations must handle multi-byte encodings safely. **Recommended fix:** Use .char_indices() to build a byte-to-character position map, or document ASCII-only and validate at parse time. **Lines affected:** 334, 344-348, 368, 385, 397-399 --- ## Correctness Issues ### 2. Absolute Position Operates on Bytes Not Characters **Location:** src/eval.rs:366-373 Position 5 should mean character 5, not byte 5. For multi-byte characters, this creates invalid slices. --- ## Performance Observations ### 3. String Slicing Allocations in Word Parsing **Location:** src/eval.rs:434-454 Assessment: Likely fine for typical REXX use cases. --- ## Potential Edge Cases ### 4. Empty Pattern in Variable Pattern Matching **Location:** src/eval.rs:393-409 If (var) evaluates to empty string, find() returns Some(0). Classic REXX treats empty pattern as not found. Test coverage: No test for sep = ''; parse var data a (sep) b --- ## Positive Observations ✅ Error handling: Proper REXX error numbers (25, 38, 20) ✅ Test coverage: 38 tests passing ✅ REXX semantics: Correctly implements word parsing, patterns, uppercase, tabs ✅ Performance: Fast path optimization in split_template_at_commas ✅ No panics on malformed input --- ## Recommendation **Must fix before merge:** Issue 1 (UTF-8 slicing panic) **Optional:** Add UTF-8 test or document ASCII-only limitation Test results: All 38 tests passing ✅

claude[bot] commented

2026-02-09 01:47:50 +00:00

(Migrated from github.com)

Code Review: Phase 4 PARSE Implementation

Reviewed REXX PARSE instruction implementation for correctness, safety, and REXX semantics compliance per ANSI X3.274-1996.

Critical Issues

1. Panic Risk: RelativePos Integer Overflow (src/eval.rs:383)

Severity: HIGH (causes panic in debug, undefined behavior in release)

TemplateElement::RelativePos(offset) => {
    #[allow(clippy::cast_sign_loss)]
    let target = if *offset >= 0 {
        Self::advance_chars(source, cursor, *offset as usize)
    } else {
        Self::retreat_chars(source, cursor, (-*offset) as usize)  // ← PANIC if offset == i32::MIN
    }

Issue: Negating i32::MIN (-2147483648) causes arithmetic overflow. In debug mode this panics, in release mode it wraps to the same value.

Fix:

let n = offset.unsigned_abs() as usize;
let target = if *offset >= 0 {
    Self::advance_chars(source, cursor, n)
} else {
    Self::retreat_chars(source, cursor, n)
};

2. Panic Risk: Out-of-Bounds String Slicing (src/eval.rs:520, 529)

Severity: HIGH (panics on malformed input)

advance_chars() line 520:

fn advance_chars(source: &str, byte_cursor: usize, n: usize) -> usize {
    source[byte_cursor..]  // ← PANIC if byte_cursor > source.len()

retreat_chars() line 529:

fn retreat_chars(source: &str, byte_cursor: usize, n: usize) -> usize {
    let boundaries: Vec<usize> = source[..byte_cursor]  // ← PANIC if byte_cursor > source.len()

Root cause: In apply_template(), the cursor can advance beyond source.len() via patterns. While most code paths check cursor < source.len() before slicing (lines 333, 351, 405), the slices at lines 346, 369, 386, 401 assume cursor is in bounds. When cursor exceeds bounds, subsequent calls to advance_chars/retreat_chars panic.

Test case that could trigger (not currently in test suite):

data = 'ab'
parse var data a 10 b +5 c  /* pos 10 exceeds length 2, then +5 from there */

Fix: Add bounds checks:

fn advance_chars(source: &str, byte_cursor: usize, n: usize) -> usize {
    if byte_cursor >= source.len() { return source.len(); }
    source[byte_cursor..]
        .char_indices()
        .nth(n)
        .map_or(source.len(), |(offset, _)| byte_cursor + offset)
}

fn retreat_chars(source: &str, byte_cursor: usize, n: usize) -> usize {
    let byte_cursor = byte_cursor.min(source.len());
    let boundaries: Vec<usize> = source[..byte_cursor]
        .char_indices()
        .map(|(i, _)| i)
        .collect();
    if n >= boundaries.len() {
        0
    } else {
        boundaries[boundaries.len() - n]
    }
}

3. Semantic Issue: to_position_value Silently Rounds (src/eval.rs:543)

Severity: MEDIUM (incorrect REXX semantics)

fn to_position_value(&self, val: &RexxValue) -> RexxResult<i64> {
    let d = self.to_number(val)?;
    let rounded = d.round(0);  // ← Silently rounds 3.7 → 4
    let s = rounded.to_string();
    s.parse::<i64>().map_err(...)
}

Issue: REXX template positions must be exact integers. This silently rounds 3.7 to 4, inconsistent with to_integer() at lines 840-842 which explicitly checks if d != rounded and returns error.

Fix: Match to_integer() behavior:

fn to_position_value(&self, val: &RexxValue) -> RexxResult<i64> {
    let d = self.to_number(val)?;
    let rounded = d.round(0);
    if d != rounded {
        return Err(RexxDiagnostic::new(RexxError::InvalidWholeNumber)
            .with_detail(format!("'{}' is not a whole number", val)));
    }
    let s = rounded.to_string();
    s.parse::<i64>().map_err(|_| {
        RexxDiagnostic::new(RexxError::InvalidWholeNumber)
            .with_detail(format!("'{}' is not a valid position", val))
    })
}

Minor Issues

4. Parser: No Support for Nested Parentheses in Variable Patterns (src/parser.rs:867-883)

Severity: LOW (edge case, documented limitation)

Variable pattern parsing expects simple (varname) form. Input like parse var data a ((nested)) fails parsing with "expected symbol inside '(' ')'". This is acceptable as nested patterns are uncommon, but consider documenting.

5. Performance: Unnecessary Clone in split_template_at_commas (src/eval.rs:478)

Severity: LOW (minor allocation)

Fast path returns vec![template.clone()] when no commas found. Could use Cow or return &ParseTemplate to avoid clone, but negligible impact.

Correctness Verified ✓

REXX Semantics

✓ Uppercase conversion: ARG uppercases, PARSE ARG preserves case
✓ Missing arguments: Return empty string (not uppercase variable name)
✓ Empty variable patterns: Treated as not found per REXX spec
✓ Word parsing: Tabs treated as blanks (lines 254, 259)
✓ 1-based positions: Correctly converted to 0-based (line 366)
✓ Backward positioning: Returns empty string when target ≤ cursor

UTF-8 Safety

✓ char_pos_to_byte_offset: Uses char_indices(), respects boundaries
✓ Multi-byte characters: Tests pass for emoji and accented chars

Edge Cases

✓ Empty source string: Handled (test: parse_var_empty_source)
✓ Empty template: No crash (test: parse_var_no_template)
✓ Comma handling: Empty sub-templates between commas work correctly
✓ Pattern not found: Consumes rest of string
✓ Overflow in parse: +99999999999 returns Error 38 (InvalidTemplate)

Test Coverage

Excellent: 42 tests covering all major paths. Suggestions for additional tests:

RelativePos overflow: parse var data a -2147483648 b
Cursor out of bounds: parse var "ab" x 10 y +5 z
Non-integer position: parse var data a 3.7 b (should error after fix)
Position overflow: parse value "test" with a 999999999999999999 b

Summary

Overall assessment: High-quality implementation with comprehensive test coverage. The implementation correctly handles REXX semantics, UTF-8, and edge cases. Three critical panic risks need fixing before merge:

Fix RelativePos i32::MIN overflow (1 line change)
Add bounds checks to advance_chars/retreat_chars (4 lines)
Fix to_position_value to reject non-integers (3 lines)

All tests pass. No arithmetic errors, no division by zero issues, error numbers are correct. The PARSE template engine implementation is algorithmically sound.

Recommendation: Request fixes for the three critical issues, then approve.

# Code Review: Phase 4 PARSE Implementation Reviewed REXX PARSE instruction implementation for correctness, safety, and REXX semantics compliance per ANSI X3.274-1996. ## Critical Issues ### 1. **Panic Risk: RelativePos Integer Overflow** (src/eval.rs:383) **Severity:** HIGH (causes panic in debug, undefined behavior in release) ```rust TemplateElement::RelativePos(offset) => { #[allow(clippy::cast_sign_loss)] let target = if *offset >= 0 { Self::advance_chars(source, cursor, *offset as usize) } else { Self::retreat_chars(source, cursor, (-*offset) as usize) // ← PANIC if offset == i32::MIN } ``` **Issue:** Negating `i32::MIN` (-2147483648) causes arithmetic overflow. In debug mode this panics, in release mode it wraps to the same value. **Fix:** ```rust let n = offset.unsigned_abs() as usize; let target = if *offset >= 0 { Self::advance_chars(source, cursor, n) } else { Self::retreat_chars(source, cursor, n) }; ``` --- ### 2. **Panic Risk: Out-of-Bounds String Slicing** (src/eval.rs:520, 529) **Severity:** HIGH (panics on malformed input) **advance_chars()** line 520: ```rust fn advance_chars(source: &str, byte_cursor: usize, n: usize) -> usize { source[byte_cursor..] // ← PANIC if byte_cursor > source.len() ``` **retreat_chars()** line 529: ```rust fn retreat_chars(source: &str, byte_cursor: usize, n: usize) -> usize { let boundaries: Vec<usize> = source[..byte_cursor] // ← PANIC if byte_cursor > source.len() ``` **Root cause:** In `apply_template()`, the cursor can advance beyond `source.len()` via patterns. While most code paths check `cursor < source.len()` before slicing (lines 333, 351, 405), the slices at lines 346, 369, 386, 401 assume cursor is in bounds. When cursor exceeds bounds, subsequent calls to advance_chars/retreat_chars panic. **Test case that could trigger (not currently in test suite):** ```rexx data = 'ab' parse var data a 10 b +5 c /* pos 10 exceeds length 2, then +5 from there */ ``` **Fix:** Add bounds checks: ```rust fn advance_chars(source: &str, byte_cursor: usize, n: usize) -> usize { if byte_cursor >= source.len() { return source.len(); } source[byte_cursor..] .char_indices() .nth(n) .map_or(source.len(), |(offset, _)| byte_cursor + offset) } fn retreat_chars(source: &str, byte_cursor: usize, n: usize) -> usize { let byte_cursor = byte_cursor.min(source.len()); let boundaries: Vec<usize> = source[..byte_cursor] .char_indices() .map(|(i, _)| i) .collect(); if n >= boundaries.len() { 0 } else { boundaries[boundaries.len() - n] } } ``` --- ### 3. **Semantic Issue: to_position_value Silently Rounds** (src/eval.rs:543) **Severity:** MEDIUM (incorrect REXX semantics) ```rust fn to_position_value(&self, val: &RexxValue) -> RexxResult<i64> { let d = self.to_number(val)?; let rounded = d.round(0); // ← Silently rounds 3.7 → 4 let s = rounded.to_string(); s.parse::<i64>().map_err(...) } ``` **Issue:** REXX template positions must be exact integers. This silently rounds `3.7` to `4`, inconsistent with `to_integer()` at lines 840-842 which explicitly checks `if d != rounded` and returns error. **Fix:** Match `to_integer()` behavior: ```rust fn to_position_value(&self, val: &RexxValue) -> RexxResult<i64> { let d = self.to_number(val)?; let rounded = d.round(0); if d != rounded { return Err(RexxDiagnostic::new(RexxError::InvalidWholeNumber) .with_detail(format!("'{}' is not a whole number", val))); } let s = rounded.to_string(); s.parse::<i64>().map_err(|_| { RexxDiagnostic::new(RexxError::InvalidWholeNumber) .with_detail(format!("'{}' is not a valid position", val)) }) } ``` --- ## Minor Issues ### 4. **Parser: No Support for Nested Parentheses in Variable Patterns** (src/parser.rs:867-883) **Severity:** LOW (edge case, documented limitation) Variable pattern parsing expects simple `(varname)` form. Input like `parse var data a ((nested))` fails parsing with "expected symbol inside '(' ')'". This is acceptable as nested patterns are uncommon, but consider documenting. --- ### 5. **Performance: Unnecessary Clone in split_template_at_commas** (src/eval.rs:478) **Severity:** LOW (minor allocation) Fast path returns `vec![template.clone()]` when no commas found. Could use `Cow` or return `&ParseTemplate` to avoid clone, but negligible impact. --- ## Correctness Verified ✓ ### REXX Semantics - ✓ **Uppercase conversion**: ARG uppercases, PARSE ARG preserves case - ✓ **Missing arguments**: Return empty string (not uppercase variable name) - ✓ **Empty variable patterns**: Treated as not found per REXX spec - ✓ **Word parsing**: Tabs treated as blanks (lines 254, 259) - ✓ **1-based positions**: Correctly converted to 0-based (line 366) - ✓ **Backward positioning**: Returns empty string when target ≤ cursor ### UTF-8 Safety - ✓ **char_pos_to_byte_offset**: Uses `char_indices()`, respects boundaries - ✓ **Multi-byte characters**: Tests pass for emoji and accented chars ### Edge Cases - ✓ **Empty source string**: Handled (test: parse_var_empty_source) - ✓ **Empty template**: No crash (test: parse_var_no_template) - ✓ **Comma handling**: Empty sub-templates between commas work correctly - ✓ **Pattern not found**: Consumes rest of string - ✓ **Overflow in parse**: `+99999999999` returns Error 38 (InvalidTemplate) --- ## Test Coverage **Excellent:** 42 tests covering all major paths. Suggestions for additional tests: 1. **RelativePos overflow:** `parse var data a -2147483648 b` 2. **Cursor out of bounds:** `parse var "ab" x 10 y +5 z` 3. **Non-integer position:** `parse var data a 3.7 b` (should error after fix) 4. **Position overflow:** `parse value "test" with a 999999999999999999 b` --- ## Summary **Overall assessment:** High-quality implementation with comprehensive test coverage. The implementation correctly handles REXX semantics, UTF-8, and edge cases. Three critical panic risks need fixing before merge: 1. Fix RelativePos i32::MIN overflow (1 line change) 2. Add bounds checks to advance_chars/retreat_chars (4 lines) 3. Fix to_position_value to reject non-integers (3 lines) All tests pass. No arithmetic errors, no division by zero issues, error numbers are correct. The PARSE template engine implementation is algorithmically sound. **Recommendation:** Request fixes for the three critical issues, then approve.

claude[bot] commented

2026-02-09 01:54:49 +00:00

(Migrated from github.com)

Code Review — Phase 4 PARSE Implementation

I've completed a thorough review of this PR focusing on correctness, REXX semantics, error handling, and safety. All 42 tests pass successfully.

Critical Issue Found

src/main.rs:74-76 — Incorrect ARG handling for multi-argument programs

The current code joins all CLI arguments into a single string:

if !args.is_empty() {
    let joined = args.join(" ");
    evaluator.set_main_args(vec![value::RexxValue::new(joined)]);
}

Problem: Per REXX semantics, when a program is called with multiple arguments like patch-rexx script.rexx foo bar baz, the ARG instruction should parse them as separate arguments accessible via comma syntax. With the current implementation, comma-separated parsing gets empty strings because there's only one argument.

Fix: Each CLI argument should be a separate RexxValue:

if !args.is_empty() {
    let rexx_args: Vec<_> = args.iter()
        .map(|s| value::RexxValue::new(s))
        .collect();
    evaluator.set_main_args(rexx_args);
}

Positive Findings

Parser (src/parser.rs:813-1002)

✅ Template parsing correctly handles all REXX pattern types
✅ Error handling uses correct error numbers (Error 20, 25, 38)
✅ parse_value_depth guard prevents WITH from being consumed by implicit concatenation
✅ No panics on malformed input — all paths return proper RexxError

Evaluator (src/eval.rs:258-526)

✅ UTF-8 handling is correct — char_pos_to_byte_offset, advance_chars, retreat_chars use char boundaries
✅ Word parsing correctly implements REXX blank rules (space 0x20 + tab 0x09)
✅ Empty pattern handling matches REXX spec (treated as not found)
✅ Backward positioning handled safely (no underflow)
✅ to_position_value correctly uses to_number → round → parse for REXX numeric conversion

REXX Semantics

✅ ARG uppercases, PARSE ARG preserves case
✅ PULL uppercases stdin
✅ Single target gets verbatim, multiple targets word-parse
✅ Last target gets remainder
✅ Missing args leave variables unset (returning uppercase name)

Error Handling

✅ All error paths use correct REXX error numbers
✅ No unwrap() or panic!() in production code (only in tests)
✅ read_stdin_line gracefully handles EOF
✅ Position overflow/underflow clamped safely

Test Coverage

✅ 42 comprehensive tests covering word parsing, literals, positions, dots, UPPER, VALUE, ARG, PULL, SOURCE, VERSION, variable patterns, UTF-8, edge cases, and error paths
✅ Backward compatibility verified

Minor Observations (Not Blockers)

Performance — src/eval.rs:504-514 retreat_chars allocation
- Allocates Vec for every backward movement
- Not a correctness issue, but could optimize with rev().nth() if this becomes hot
Performance — src/eval.rs:449-455 template cloning
- Fast path exists to avoid clone when no commas present
- Optimization is appropriate
stdin EOF behavior — src/eval.rs:474-476
- Returns empty string on EOF (standard REXX behavior)
- Silently ignores read errors (consistent with classic REXX)

Recommendation

Fix the main.rs argument handling issue, then this is ready to merge. The implementation is otherwise solid, well-tested, and correctly implements REXX PARSE semantics including proper error handling and UTF-8 safety.

## Code Review — Phase 4 PARSE Implementation I've completed a thorough review of this PR focusing on correctness, REXX semantics, error handling, and safety. All 42 tests pass successfully. ### Critical Issue Found **src/main.rs:74-76 — Incorrect ARG handling for multi-argument programs** The current code joins all CLI arguments into a single string: ```rust if !args.is_empty() { let joined = args.join(" "); evaluator.set_main_args(vec![value::RexxValue::new(joined)]); } ``` **Problem:** Per REXX semantics, when a program is called with multiple arguments like `patch-rexx script.rexx foo bar baz`, the ARG instruction should parse them as separate arguments accessible via comma syntax. With the current implementation, comma-separated parsing gets empty strings because there's only one argument. **Fix:** Each CLI argument should be a separate RexxValue: ```rust if !args.is_empty() { let rexx_args: Vec<_> = args.iter() .map(|s| value::RexxValue::new(s)) .collect(); evaluator.set_main_args(rexx_args); } ``` --- ### Positive Findings **Parser (src/parser.rs:813-1002)** - ✅ Template parsing correctly handles all REXX pattern types - ✅ Error handling uses correct error numbers (Error 20, 25, 38) - ✅ parse_value_depth guard prevents WITH from being consumed by implicit concatenation - ✅ No panics on malformed input — all paths return proper RexxError **Evaluator (src/eval.rs:258-526)** - ✅ UTF-8 handling is correct — char_pos_to_byte_offset, advance_chars, retreat_chars use char boundaries - ✅ Word parsing correctly implements REXX blank rules (space 0x20 + tab 0x09) - ✅ Empty pattern handling matches REXX spec (treated as not found) - ✅ Backward positioning handled safely (no underflow) - ✅ to_position_value correctly uses to_number → round → parse for REXX numeric conversion **REXX Semantics** - ✅ ARG uppercases, PARSE ARG preserves case - ✅ PULL uppercases stdin - ✅ Single target gets verbatim, multiple targets word-parse - ✅ Last target gets remainder - ✅ Missing args leave variables unset (returning uppercase name) **Error Handling** - ✅ All error paths use correct REXX error numbers - ✅ No unwrap() or panic!() in production code (only in tests) - ✅ read_stdin_line gracefully handles EOF - ✅ Position overflow/underflow clamped safely **Test Coverage** - ✅ 42 comprehensive tests covering word parsing, literals, positions, dots, UPPER, VALUE, ARG, PULL, SOURCE, VERSION, variable patterns, UTF-8, edge cases, and error paths - ✅ Backward compatibility verified --- ### Minor Observations (Not Blockers) 1. **Performance — src/eval.rs:504-514 retreat_chars allocation** - Allocates Vec for every backward movement - Not a correctness issue, but could optimize with rev().nth() if this becomes hot 2. **Performance — src/eval.rs:449-455 template cloning** - Fast path exists to avoid clone when no commas present - Optimization is appropriate 3. **stdin EOF behavior — src/eval.rs:474-476** - Returns empty string on EOF (standard REXX behavior) - Silently ignores read errors (consistent with classic REXX) --- ### Recommendation **Fix the main.rs argument handling issue**, then this is ready to merge. The implementation is otherwise solid, well-tested, and correctly implements REXX PARSE semantics including proper error handling and UTF-8 safety.

claude[bot] commented

2026-02-09 01:57:52 +00:00

(Migrated from github.com)

Code Review: Phase 4 PARSE Implementation

Overview

This PR implements REXX PARSE instruction with full template grammar support. All 42 tests pass.

Critical Issues

1. Character boundary panic risk in retreat_chars()

Location: src/eval.rs:524
Issue: Could panic if byte_cursor falls between UTF-8 character boundaries.
Recommendation: Add is_char_boundary validation before slicing.

2. Missing error context in position expressions

Location: src/eval.rs:348
Issue: Division by zero in template positions loses context.
Recommendation: Wrap eval_expr with better error messages.

3. i32 overflow in relative positions

Location: src/parser.rs:839, 855
Issue: Large relative positions (+999999999999) fail i32 parse.
Recommendation: Use i64 or proper Error 42 for overflow.

Important Issues

4. Unnecessary clone in split_template_at_commas()

Location: src/eval.rs:469
Impact: Every PARSE without commas clones entire template.
Recommendation: Use borrowed slices.

5. Quadratic behavior in retreat_chars()

Location: src/eval.rs:524
Impact: O(cursor_position) memory for large strings.
Recommendation: Use reverse iteration without collecting.

6. parse_value_depth not reset on error

Location: src/parser.rs:937-938
Impact: Parser state corrupted after PARSE VALUE error.
Recommendation: Decrement depth before propagating error.

7. Stdin errors ignored

Location: src/eval.rs:490
Impact: Returns empty string instead of Error 48.

What Was Done Well

UTF-8 handling with char_indices() (eval.rs:502-516)
Correct REXX error codes (20, 25, 26, 38)
Word parsing with space+tab blanks (eval.rs:406, 411)
42 comprehensive tests including edge cases
Division by zero raises Error 42 correctly

Summary

Critical: 3 | Important: 4 | Minor: 2

Recommendation: Address critical issues 1-3 before merge.

Reviewed by: Claude Sonnet 4.5
Standards: ANSI X3.274-1996

# Code Review: Phase 4 PARSE Implementation ## Overview This PR implements REXX PARSE instruction with full template grammar support. All 42 tests pass. ## Critical Issues ### 1. Character boundary panic risk in retreat_chars() **Location:** src/eval.rs:524 **Issue:** Could panic if byte_cursor falls between UTF-8 character boundaries. **Recommendation:** Add is_char_boundary validation before slicing. ### 2. Missing error context in position expressions **Location:** src/eval.rs:348 **Issue:** Division by zero in template positions loses context. **Recommendation:** Wrap eval_expr with better error messages. ### 3. i32 overflow in relative positions **Location:** src/parser.rs:839, 855 **Issue:** Large relative positions (+999999999999) fail i32 parse. **Recommendation:** Use i64 or proper Error 42 for overflow. ## Important Issues ### 4. Unnecessary clone in split_template_at_commas() **Location:** src/eval.rs:469 **Impact:** Every PARSE without commas clones entire template. **Recommendation:** Use borrowed slices. ### 5. Quadratic behavior in retreat_chars() **Location:** src/eval.rs:524 **Impact:** O(cursor_position) memory for large strings. **Recommendation:** Use reverse iteration without collecting. ### 6. parse_value_depth not reset on error **Location:** src/parser.rs:937-938 **Impact:** Parser state corrupted after PARSE VALUE error. **Recommendation:** Decrement depth before propagating error. ### 7. Stdin errors ignored **Location:** src/eval.rs:490 **Impact:** Returns empty string instead of Error 48. ## What Was Done Well 1. UTF-8 handling with char_indices() (eval.rs:502-516) 2. Correct REXX error codes (20, 25, 26, 38) 3. Word parsing with space+tab blanks (eval.rs:406, 411) 4. 42 comprehensive tests including edge cases 5. Division by zero raises Error 42 correctly ## Summary Critical: 3 | Important: 4 | Minor: 2 **Recommendation:** Address critical issues 1-3 before merge. **Reviewed by:** Claude Sonnet 4.5 **Standards:** ANSI X3.274-1996

claude[bot] commented

2026-02-09 02:11:24 +00:00

(Migrated from github.com)

PR Review: Phase 4 PARSE Implementation

Summary

Comprehensive REXX PARSE implementation with excellent test coverage. No critical issues found — code is safe, correct, and well-tested.

✅ Strengths

Correct REXX Semantics
- Word parsing per ANSI spec (space + tab only)
- Empty patterns handled correctly
- Proper case handling: ARG=uppercase, PARSE ARG=preserves case
UTF-8 Safety
- All positions use char_indices() for proper boundaries
- No panic paths from invalid slicing
- Tests cover multibyte chars
Error Handling
- Correct REXX error numbers (20, 25, 26, 38)
- No unwrap() in production code
- All errors propagated via Result
Test Coverage
- 42 integration tests
- Covers edge cases, UTF-8, error paths

🔍 Issues Found

1. CLI Argument Handling (src/main.rs:74-76) — Medium Severity

Issue: CLI args are joined into single string, losing boundaries.

Current: args.join(" ") creates ["foo bar"] not ["foo", "bar"]

Fix:

evaluator.set_main_args(
    args.iter().map(|s| value::RexxValue::new(s)).collect()
);

2. Division by Zero Error (src/eval.rs:977, 991) — Low Severity

Uses Error 42 (overflow) for div-by-zero. ANSI spec is ambiguous; Regina/ooRexx use 42, so acceptable. Add comment explaining choice.

3. Defensive Bounds Check (src/eval.rs:437) — Very Low

source[cursor..] safe due to invariants, but consider explicit clamp:

let cursor = cursor.min(source.len());

📋 Recommendations

Fix CLI arg handling (only breaking change)
Add test for CLI comma-separated args
Consider RAII guard for parse_value_depth

✅ Verdict: Approve with Minor Fix

Excellent implementation. Merge after fixing CLI args.

Reviewed by: Claude Sonnet 4.5

# PR Review: Phase 4 PARSE Implementation ## Summary Comprehensive REXX PARSE implementation with excellent test coverage. **No critical issues found** — code is safe, correct, and well-tested. --- ## ✅ Strengths 1. **Correct REXX Semantics** - Word parsing per ANSI spec (space + tab only) - Empty patterns handled correctly - Proper case handling: ARG=uppercase, PARSE ARG=preserves case 2. **UTF-8 Safety** - All positions use char_indices() for proper boundaries - No panic paths from invalid slicing - Tests cover multibyte chars 3. **Error Handling** - Correct REXX error numbers (20, 25, 26, 38) - No unwrap() in production code - All errors propagated via Result 4. **Test Coverage** - 42 integration tests - Covers edge cases, UTF-8, error paths --- ## 🔍 Issues Found ### 1. CLI Argument Handling (src/main.rs:74-76) — **Medium Severity** **Issue:** CLI args are joined into single string, losing boundaries. Current: `args.join(" ")` creates ["foo bar"] not ["foo", "bar"] **Fix:** ```rust evaluator.set_main_args( args.iter().map(|s| value::RexxValue::new(s)).collect() ); ``` ### 2. Division by Zero Error (src/eval.rs:977, 991) — **Low Severity** Uses Error 42 (overflow) for div-by-zero. ANSI spec is ambiguous; Regina/ooRexx use 42, so acceptable. Add comment explaining choice. ### 3. Defensive Bounds Check (src/eval.rs:437) — **Very Low** `source[cursor..]` safe due to invariants, but consider explicit clamp: ```rust let cursor = cursor.min(source.len()); ``` --- ## 📋 Recommendations - Fix CLI arg handling (only breaking change) - Add test for CLI comma-separated args - Consider RAII guard for parse_value_depth --- ## ✅ Verdict: **Approve with Minor Fix** Excellent implementation. Merge after fixing CLI args. **Reviewed by:** Claude Sonnet 4.5