cp6 #437
No reviewers
Labels
No labels
bug
dependencies
documentation
duplicate
enhancement
good first issue
help wanted
invalid
question
refactor
rust
technical-debt
wontfix
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
navicore/patch-seq!437
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "cp6"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Code Review — cp6: Byte-clean string literals + OSC byte primitives
Overview
This PR lands two related features:
StringLiteral— changes the AST node payload fromStringtoVec<u8>so\xNNescapes produce the literal byte0xNNrather than the UTF-8 encoding of U+00NN. High-byte values (0x80–0xFF) no longer silently expand to two-byte UTF-8 sequences.int.to-bytes-i32-beandfloat.to-bytes-f32-beproduce 4-byte big-endian strings, plus a real OSC 1.0 encoder as a motivating example.The change is internally consistent: the migration from
String→Vec<u8>touches the parser, AST, codegen, and all test files uniformly.Code Quality
Strengths
unescape_stringchange is exactly right:result.push(byte_val)vs. the oldresult.push(byte_val as char)which silently went through UTF-8 encoding.escape_llvm_stringrefactored to byte-by-byte matching — cleaner and correct for the new payload type.ch.encode_utf8(&mut buf), which preserves valid UTF-8 for human-readable text while treating escape sequences as raw bytes. Good layering.patch_seq_push_string_bytesnull-pointer guard for thelen == 0case is correct.Concerns
1. Backslash escape in
escape_llvm_string— possible pre-existing bug carried forwardcrates/compiler/src/codegen/globals.rs:This emits
\\in LLVM IR. Per the LLVM Language Reference, the only recognized string escape inside an assembly string constant is\XX(two hex digits).\\is\followed by\— the second\begins a new escape expecting two hex chars. The correct encoding of a literal backslash would be\5C.This was present before this PR, but the refactoring is a good time to fix it. If the tests currently pass, it likely means the LLVM assembler is lenient here, but relying on undocumented tolerance is fragile.
Suggestion: change the arm to
b'\\' => result.push_str(r"\5C")to match the hex-escape convention used for every other non-printable byte.2.
patch_seq_push_stringis now silently more permissivecrates/runtime/src/io.rs— before:After:
The old code would panic loudly on non-UTF-8 C strings; the new code accepts them silently. The comment documents the intended callers (variant tags, NULL-FFI fallbacks — all ASCII), so this is unlikely to matter in practice. But the loss of the diagnostic is worth noting: if a future codegen path accidentally routes binary data through
push_stringinstead ofpush_string_bytes, it will produce wrong output silently rather than panicking.Suggestion: add a
debug_assertthat verifies UTF-8 in debug builds, or at minimum note the accepted bytes are expected ASCII in the safety contract.3. Hardcoded binary path in
justfileIf
build-exampleschanges the output naming convention, or runs on a platform where the binary has an extension, this line silently produces a "no such file" error that might be swallowed depending on the shell. Consider usingjust run-example live-coding-csound/test_osc(if such a recipe exists) or at least adding a-fexistence check with a clear error message.4.
osc.zero-pad— out-of-range silent fallthroughThe else-of-else arm silently handles any value ≥ 4 (including 0) as "four NULs". Given the calling context
4 - (len % 4)always yields 1–4, this is currently safe. Adding a comment or an assertion that the expected range is 1–4 would make the constraint explicit.Test Coverage
Good coverage overall:
\x00–\xFF.test_hex_escape_full_byte_rangeis a useful new sanity test.bytes_ops/tests.rscovers positive, negative, truncation, zero, and canonical OSC cases for both builtins.Gaps worth considering:
\00appears in the.strglobal andpush_string_bytesis called with the correct length). A snapshot/golden test for the LLVM IR of"\x00abc"would lock in the end-to-end behavior.osc.stringon a string whose byte-length is already a multiple of 4 plus 0 remainder (e.g. len=4: should produce 4 NUL bytes total, not 0).Performance
Each call to
int_to_bytes_i32_be/float_to_bytes_f32_beallocates a 4-elementVec<u8>before passing it toglobal_bytes. For live-coding OSC at audio rates (potentially thousands of messages per second), this is fine. If this primitive later lands in a tight loop at much higher rates, consider a fixed-size array path.Summary
This is a well-motivated and cleanly executed change. The migration is thorough and consistent. The main actionable items are:
\5Cvs\\inescape_llvm_stringfor backslash bytes.patch_seq_push_stringabout its ASCII-only intended callers.justfile.None of these are blockers. The core semantics change is correct.
Code Review — PR #437: cp6 (Byte-clean strings + binary protocol builtins)
Overview
This PR makes Seq strings genuinely byte-clean by changing
StringLiteral(String)→StringLiteral(Vec<u8>)throughout the compiler, fixing\xNNescapes to produce literal bytes (not UTF-8-encoded codepoints), switching the codegen target for string literals frompatch_seq_push_string(C-string, NUL-terminates) to a newpatch_seq_push_string_bytes(explicit length), and adding two new builtins (int.to-bytes-i32-be,float.to-bytes-f32-be) along with a working OSC 1.0 encoder as a proof-of-concept.This is a clean, well-motivated refactoring. The core change is correct, the diff is mechanically thorough, and the new tests are well-chosen.
Strengths
StringLiteral(String)call site was updated; the compiler would have caught any missed ones, but it's good that none slipped through.\5Cinstead of\\is technically more correct per LLVM IR spec, and the comment explaining why is exactly the right level of detail.escape_llvm_stringunit tests,bytes_opsunit tests, updated parser tests, and end-to-end OSC byte-format tests together give strong coverage of the new behaviour.push_string_bytes: Theif len == 0 { Vec::new() }guard correctly handles zero-length with a potentially-null pointer.test_hex_escape_full_byte_range: Nice sanity test that covers the full0x00–0xFFrange explicitly.Issues and Suggestions
1. Debug-only assertion in
patch_seq_push_string(io.rs)In release builds this silently accepts non-UTF-8 bytes, which could cause silent data corruption if a future codegen path incorrectly routes binary data here. The comment says "the comment above is the contract" — but that relies on the codegen never making a mistake. Consider using
#[cfg(debug_assertions)]with an actual panic path (not just assert), or noting in the function's# Safetydoc that callers must guarantee UTF-8 and any violation is UB/undefined behaviour from the ABI contract perspective.2. Redundant
pub usealiases at the bottom ofbytes_ops.rslib.rsalready re-exports these directly from their full names (patch_seq_float_to_bytes_f32_be). These aliases inside the module are unused externally and don't match the pattern of other runtime modules (e.g.,float_ops,string_opsdon't have internal re-export aliases). They add noise without benefit — remove them or move the alias responsibility entirely tolib.rs.3. LLVM ABI:
i64vsisizeforleninpush_string_bytesThe Rust signature uses
usize. On 64-bit (the only real target today) this is fine, but the hardcodedi64in the LLVM declaration will silently misbehave on 32-bit systems whereusize=i32. If 32-bit is a non-goal, a single comment to that effect would head off future confusion. If 32-bit ever needs to work, this needs a target-width check.4. No integration test for
\xNNhigh-byte round-trip through the full pipelineThe parser tests verify the
Vec<u8>payload; the codegen tests verify LLVM IR emission; thebytes_opsunit tests verify the runtime functions. But there's no integration test that compiles and runs a Seq program containing"\xDC\xFF"and verifies the bytes arrive intact at the runtime (e.g. viastring.byte-lengthor comparing against a known pattern). The OSC tests cover this indirectly, but a targeted integration test makes the guarantee explicit and will catch regressions in the codegen path independently of OSC semantics.5.
osc.zero-pad— silent fallback for out-of-range inputsThe comment says "passing values outside 1..4 silently produces 4 NULs." For any caller other than
osc.stringthis is a silent wrong answer, not a helpful error. Since the current only caller isosc.string(and the formula is statically bounded to 1..4), this isn't a runtime risk today — but ifosc.zero-padis ever used directly by user code, the failure mode is non-obvious. Consider an assertion/panic for the out-of-range case, or rename it to make the contract more obvious (e.g.,osc.zero-pad-1-to-4).6. OSC encoder:
seq:allow(unchecked-mod)placementThe annotation is placed inside the doc comment block, which is the correct Seq convention, but it's worth double-checking that
seq:allowplaced on theword definitionsuppresses the warning for thedropon the Bool result (as opposed to the division itself). If the linter attaches to thedropcall-site rather than the word, the annotation may not fire.7. Little-endian variants
OSC mandates big-endian, so the current scope is correct. But if
bytes_opsis intended as the start of a general binary-encoding module (as the module-level doc implies: "OSC, DNS, NTP, MessagePack, Protobuf, etc."), little-endian variants will be needed shortly — MessagePack uses big-endian, but DNS, NTP, and most "native" protocols vary. No action needed now, just worth a TODO note in the module doc or a tracked issue so this doesn't get forgotten.Minor Nits
// ===...banner comment inio.rsfor the two-linepush_string_bytessection is heavier than necessary given the function-level doc is already clear.bytes_ops/tests.rs,pop_string_bytescallss.as_bytes().to_vec()— sinceSeqStringis already byte-backed, check whetheras_bytes()copies unnecessarily. A borrow might suffice.# Int32 + float32:,if. The Csound POC's bread-and-butter: triggerhas a stray backtick that breaks the inline formatting.Test verification
Manually verified the OSC test vectors:
/synth/play(11 bytes) → pad by 1 → 12 bytes ✓,if(3 bytes) → pad by 1 → 4 bytes ✓1000→0x000003E8✓220.0→ f320x435C0000✓test-osc-msg-if-length-aligned✓All other test vectors check out.
Overall: Approve with the
pub useredundancy and LLVM ABI comment as the only changes I'd call required before merge. The integration test gap and out-of-rangeosc.zero-padbehavior are worth tracking but not blockers.