What We Built #3
No reviewers
Labels
No labels
bug
dependencies
documentation
duplicate
enhancement
good first issue
help wanted
invalid
question
refactor
rust
technical-debt
wontfix
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
navicore/patch-seq!3
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "compiler"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
✅ Complete compiler pipeline:
✅ Key design decisions:
✅ Build system:
Working Example
cd /Users/navicore/git/navicore/cem3
just build-examples
./target/examples/hello-world
Output: Hello, World!
PR Review: Complete Compiler Pipeline - This is a well-structured implementation with clean separation of concerns (AST -> Parser -> CodeGen). STRENGTHS: (1) Text-based LLVM IR avoids FFI complexity, (2) Good documentation and error handling, (3) Unit tests for parser and codegen. CRITICAL ISSUES: (1) compiler/src/parser.rs:184-191 - String tokenizer does not handle escaped quotes - will incorrectly parse "Say "hello"", (2) parser.rs:178-224 - No detection of unclosed string literals. IMPORTANT: (1) lib.rs:42-50 - Hardcoded library path should validate libcem3_runtime.a exists, (2) codegen.rs:215-223 - No validation for undefined words (causes linker errors instead of compile errors), (3) codegen.rs:254-284 - Target triple returns "unknown" for unsupported platforms (should error). TEST GAPS: Missing tests for escaped quotes, unclosed strings, undefined words, multi-word programs. SECURITY: Low risk - no unsafe code, PathBuf for paths, but validate clang command args. RATING: Approve with improvements. Action items before merge: Fix string escape handling, add escaped quote test, detect unclosed strings. Nice to have: word validation pass, expand test coverage, configurable library path. Great work on the full pipeline - the text IR approach is maintainable!
Detailed Issue: String Tokenizer Bug
File: compiler/src/parser.rs lines 184-191
The tokenizer does not handle escaped quotes. It treats backslash-quote as a string terminator when it should be part of the string content.
Current code ends string on ANY quote character, not just unescaped ones.
Fix: Track whether previous character was backslash. Only end string when quote is not escaped.
Add test case for strings containing escaped quotes.
Missing Validation: Undefined Words
File: compiler/src/codegen.rs lines 215-223
When user code calls an undefined word, the compiler generates IR that references a non-existent function. This causes a linker error instead of a clear compile-time error.
Recommendation: Add validation pass before codegen that checks all WordCall statements reference either:
This will give users much better error messages at compile time.
Security Note: Clang Command Construction
File: compiler/src/lib.rs lines 42-50
The hardcoded library path "target/release" could be improved for robustness and security:
Example improvement:
Check Path::new("target/release/libcem3_runtime.a").exists() and return clear error if missing.
Overall security risk is low since PathBuf provides safety and this is a compiler tool, but these improvements would help with error messages and robustness.
Test Coverage Recommendations
The existing tests are good but several edge cases are missing:
These tests would help catch parser edge cases and ensure robustness.
The codegen test at line 307 may have a bug - it expects @main but code generates @cem_main. Worth verifying this test is correct.
Pull Request Review: cem3 Compiler Pipeline
Overview
This PR introduces a complete compiler pipeline for cem3, including parser, AST, code generator, and CLI tool. This is a significant milestone that moves the project from a runtime-only foundation to a working compiler. The implementation follows sound architectural principles and demonstrates excellent attention to detail.
Strengths
1. Excellent Architectural Design
2. Code Quality
3. Testing
4. Safety & Security
Issues & Recommendations
Critical Issues
None identified - the code is production-ready for its scope.
Medium Priority
Parser String Handling (parser.rs:99-104): String literal parsing doesn't handle escape sequences. Strings like "Hello\nWorld" won't work correctly. Recommend adding escape sequence handling or documenting the limitation.
Hardcoded Library Path (lib.rs:46): The library path target/release is hardcoded, which may fail in debug builds. Recommend using environment variables or build.rs to locate the library dynamically.
Error Recovery in Parser: The parser fails immediately on first error. For better developer experience, consider collecting multiple errors.
Low Priority
LLVM IR String Escaping (codegen.rs:50-70): The escape function converts char to string to bytes, which is inefficient for multi-byte UTF-8. Consider using encode_utf8 directly.
Missing Drop Implementation (value.rs:48-49): Comment mentions future Drop implementation. Document the memory management strategy for Quotation variant.
Test Coverage Gaps: Some scenarios lack tests: parser error cases, multiple word definitions, string constant deduplication.
Performance Considerations
Good Decisions:
Future Optimizations (Not Blockers):
Security Assessment
No security concerns identified for the defensive use case (compiling trusted .cem source):
Test Coverage Analysis
Test coverage is excellent for a v1 compiler:
Coverage highlights:
Conclusion
This is excellent work. The PR represents a well-architected, thoroughly tested compiler implementation that:
Recommendation: APPROVE with minor suggestions
The identified issues are all non-blocking:
The code is production-ready for its current scope. The suggestions above are opportunities for future improvements, not blockers for this PR.
Great job!
Reviewed files: compiler/src/.rs, runtime/src/.rs, Cargo.toml, justfile, examples/hello-world.cem
Test execution: All CI checks passing (formatting, clippy, 33 tests, build, examples)