Phase 2 delivered: parser #66
No reviewers
Labels
No labels
bug
documentation
duplicate
enhancement
good first issue
help wanted
invalid
question
seq-lisp-2-backport
wontfix
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
navicore/seq-lisp!66
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "self-hosting-parser"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
PR Review: Phase 2 - Self-Hosted Parser
Nice work - a clean recursive descent parser in ~200 lines with solid test coverage. The higher-order function trick to break mutual recursion between parse-expr and parse-list-items is elegant. A few things worth flagging:
strip-quotes - verify substring semantics are clear
The function is: (define (strip-quotes text) (substring text 1 (- (string-length text) 2))). substring takes (str start length). For token hello (7 chars with surrounding quotes): (substring text 1 5) gives hello. Correct. But easy to misread - a comment clarifying that the tokenizer always produces token text starting and ending with a double-quote would help future readers.
Unmatched closing paren is silently consumed as an atom
In parse-expr, a closing paren token appearing outside a list context falls through to the atom branch and becomes the symbol ) via string->symbol. No error is raised. Similarly, an unclosed opening paren in parse-list-items silently returns whatever was accumulated when tokens run out. Acceptable for Phase 2, but worth documenting as a known gap for Phase 3 / error recovery.
eval-with-errors creates an evaluator dependency
The PR description acknowledges this already. Two specific concerns: (1) Fallback type mismatch: If string->symbol fails, it returns name as a string instead of a symbol. Any code expecting a symbol from text-to-atom would get a string, causing silent type errors downstream. (2) Ordering dependency: The parser can only be loaded after the evaluator is initialized - worth a comment since it constrains bootstrapping order.
string-to-integer - minus sign handling assumes pos=0
The minus branch resets acc to 0 and negates the result. Works correctly when called as (string-to-integer text 0 0) since minus only appears at position 0 for valid number tokens. A comment noting this assumption would prevent misuse if the function is reused later.
Test gaps
A few cases not covered: (1) Negative floats: (parse -3.14) - exercises number-token? and string->float-val on a sign-prefixed string. (2) Strings with escape sequences - verifies strip-quotes works correctly after the tokenizer resolves escapes. (3) parse-all with empty input: (parse-all ) - edge case for the parse-all-exprs accumulator path. (4) Resilience to unclosed paren: A test documenting current behavior (partial result, no crash) would catch regressions when error handling is added.
Minor: digit-char? and digit-value duplication
Both implement 0-9 membership via nested if chains. Noted in the PR as a limitation without a native char->int builtin - the two are consistent. Worth combining once that builtin lands.
Overall
Architecture is sound, happy-path test coverage is comprehensive, and the HOF pattern for mutual recursion is clean. The main things to address before production use: silent error swallowing (unmatched tokens become atoms, unclosed lists return partial results silently) and the string->symbol fallback returning a string where a symbol is expected.
Follow-up: Corrections to the previous review
The prior review's "Test gaps" section contains inaccuracies. The following tests it flagged as missing are already present in
tests/lisp/reader/parser.slisp:(test "parse negative float" (assert-eq (parse "-3.14") -3.14))(test "parse string with escape" ...)(test "parse-all empty" (assert-eq (parse-all "") '()))(test "unclosed paren returns partial" (assert-eq (parse "(a b") '(a b)))Test coverage is actually quite good. Two genuine issues not previously called out:
Dead code:
string-starts-with?is defined but never usedDefined in parser.slisp but never called. Was probably a stepping stone during development. Safe to remove.
"extra close paren" test has a likely wrong expected value
The test named "extra close paren parsed as symbol" expects
(list)which equals'()(empty list). But tracing the code:parse-exprhits the atom branch, callstext-to-atom ")", which callsstring->symbol ")", which callseval-with-errorson the string"')". Since')is invalid syntax,eval-with-errorsreturns an error result andstring->symbolfalls back to returning the string")". Soparseof")"would return the string")", not'(). The test name, expected value, orstring->symbolfallback behavior needs clarification.Code Review - Phase 2: Self-Hosted Parser
Overall this is solid, well-structured work. The recursive descent approach is clean, the higher-order function trick to break mutual recursion is elegant, and the test suite is comprehensive. A few things worth looking at.
Potential Bug 1: strip-quotes off-by-one risk
The strip-quotes function uses: (substring text 1 (- (string-length text) 2)). This is correct if substring takes (str start length), which appears to be the convention based on char-at in the tokenizer. But the semantics are non-obvious - a future reader or a refactor to an end-based substring could silently introduce an off-by-one. A brief inline comment noting the calling convention would help.
Potential Bug 2: Test expectation mismatch on extra close paren
The test asserts: (assert-eq (parse close-paren) empty-list). Tracing through: close-paren is not open-paren, not a quote/quasiquote/unquote token, so it falls to the atom branch. text-to-atom calls string->symbol which calls eval-with-errors with quote+close-paren. Evaluating that in the host is malformed (quote with no argument), so string->symbol falls back and returns the string close-paren. The actual return would likely be the string, not the empty list. This test may be asserting the wrong expected value and masking a latent issue.
Potential Bug 3: string->symbol and string->float-val silent fallback to string
Both functions fall back to returning a raw string when eval-with-errors fails. If eval-with-errors fails for any identifier (e.g. a symbol name that is reserved or malformed in the host), downstream consumers expecting a symbol silently get a string instead. This could produce confusing behavior that is hard to debug. At minimum, a comment noting the known failure modes would help.
Design Consideration: Dot Notation
The choice to treat dot as a literal symbol rather than construct a proper cons pair is documented and intentional, and the tests reflect it. Worth noting this means the self-hosted parser cannot round-trip code that relies on proper pair structure. Documenting this as a known compatibility gap would be useful.
Minor notes
float-token? will return #t for strings like 1.2.3 (passes number-token? and contains a dot). string->float-val would then fail and silently return the text as a string. Low risk given the tokenizer produces well-formed tokens, but worth noting. The digit-char? / digit-value nested-if chains are verbose but appropriate given the absence of cond at this stage. The PR description acknowledging these as future builtin candidates is appreciated.
Test Coverage
36 tests is solid for a first pass. A few gaps worth filling eventually: symbols with special characters like !, ->, set! as bare atoms (not just in list position); float variants: 0.5, -0.5, 1.0; strings with multiple escape sequences in sequence; parse-all with mixed multi-level input.
Summary
The architecture is clean and consistent with the tokenizer. The higher-order parser parameter to break the parse-expr/parse-list-items mutual recursion without forward references is the highlight of this PR. The main actionable items are verifying the close-paren edge case test expectation and documenting the string->symbol fallback behavior.