Ed Sweeney bd1d39f341 init		2026-06-09 09:32:49 -07:00

cpp+ — "C++: The Good Parts"

A thought experiment. The goal is not to make C++ memory-safe — it can't be, in the compositional, compile-time sense Rust means. The goal is to define a curated subset of modern C++ that recovers as much of the Rust experience (value semantics, ADTs, exhaustive pattern matching, RAII ownership) as the language allows, and to be honest about exactly where the remaining gap is and what it costs to paper over.

If you only read one thing, read The Honest Gap at the bottom.

The thesis

C++ already contains a smaller, better language. The "good parts" are the value-semantic, RAII, ownership-typed core. The "bad parts" are the ones that exist for C compatibility, manual memory management, or implicit conversions. A disciplined subset:

1. forbids the features that produce the famous footguns,
2. mandates the constructs that give RAII + value semantics,
3. uses std::variant / std::optional / std::expected as sum types, and
4. accepts that lifetime/aliasing safety is tested, not proven, and instruments the build accordingly.

Items 1–3 are free and get you ~80% of the daily Rust ergonomics. Item 4 is where the categorical difference with Rust lives. We name it rather than hide it.

The subset: forbid / mandate / prefer

Forbidden (linter is `-Werror` on these)

Banned	Why	Use instead
Owning raw pointers (`new`/`delete`, `malloc`/`free`)	manual lifetime = use-after-free, leaks, double-free	`unique_ptr`, `shared_ptr`, value types
C arrays, pointer arithmetic	unbounded access	`std::array`, `std::vector`, `std::span`
C-style casts, `reinterpret_cast`	silent UB	`static_cast`, named conversions
Implicit narrowing conversions	silent truncation	`{}`-init (narrowing is ill-formed), `gsl::narrow`
Raw `union`	type confusion	`std::variant`
Out-params via non-const ref/pointer	aliasing, uninit reads	return values, `struct`/`tuple` returns
Default/implicit conversions on ctors	accidental construction	`explicit` on every single-arg ctor
Macros for logic/constants	no scope, no types	`constexpr`, `consteval`, templates, `enum class`
Naked `new` in expressions	leak on exception	`make_unique`, `make_shared`
`const_cast` away of const	UB on real consts	redesign
Inheritance for code reuse	fragile base class	composition; inheritance only for interfaces (pure virtual)
Exceptions across module/ABI boundaries	(project policy)	`std::expected` at boundaries
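
To make the narrowing row concrete, here is a minimal sketch of what a checked conversion in the prelude might look like. The name try_narrow is hypothetical (it is not gsl::narrow and not a standard facility), and the sketch assumes integer types only.

```cpp
#include <cstdint>
#include <optional>
#include <type_traits>

// With {}-init the compiler already rejects the implicit case:
//   std::uint8_t b{300};   // ill-formed: narrowing conversion
// When a narrowing conversion is genuinely needed, make it explicit and checked.

// Hypothetical helper (an assumption of this sketch, not a library API):
// converts between integer types and reports failure instead of truncating.
template <class To, class From>
[[nodiscard]] constexpr std::optional<To> try_narrow(From value) noexcept {
    static_assert(std::is_integral_v<To> && std::is_integral_v<From>,
                  "this sketch only handles integer types");
    const auto narrowed = static_cast<To>(value);
    // Round-trip check: catches values that do not fit in To.
    if (static_cast<From>(narrowed) != value) { return std::nullopt; }
    // Sign check: catches signed/unsigned flips that survive the round trip.
    if ((narrowed < To{}) != (value < From{})) { return std::nullopt; }
    return narrowed;
}
```

A call site then reads try_narrow<std::uint8_t>(n).value_or(fallback), which keeps the failure case visible in the type instead of hiding it in a cast.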

Mandated

Rule of Zero. Write no destructor, copy, or move ops. If you can't, your type owns a resource — wrap that resource in a type that itself obeys Rule of Zero. Custom special members are a code smell that needs review.
explicit on every constructor that can take one argument.
[[nodiscard]] on every function returning a value that means something (especially expected/optional/status types).
const by default. const locals, const member functions, const & params for non-trivial reads. Mutability is opt-in and visible.
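{}-initialization everywhere (no = init, no () init): narrowing becomes a compile error and the most vexing parse goes away. Known trap: braces prefer an initializer_list constructor, so std::vector<int>{10, 0} holds two elements while std::vector<int>(10, 0) holds ten.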
Ownership in the type. unique_ptr<T> = owned, moves like a Rust Box. T&/span<T> = borrowed, never stored. shared_ptr<T> = shared ownership, used sparingly and named in review as "I genuinely need shared lifetime."
Bounds-checked access in the subset's container wrappers (.at() semantics, or a hardened span), with the cost acknowledged.
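
As an illustration of the Rule of Zero and "ownership in the type" items above, here is a small sketch that wraps a C FILE* once and then writes no special members at all. FileCloser, FileHandle, and LogFile are hypothetical names chosen for this example.

```cpp
#include <cstdio>
#include <memory>
#include <string>
#include <type_traits>

// The one place that knows how to release the resource.
// unique_ptr only invokes the deleter for non-null pointers.
struct FileCloser {
    void operator()(std::FILE* f) const noexcept { std::fclose(f); }
};
using FileHandle = std::unique_ptr<std::FILE, FileCloser>;

// Rule of Zero: no destructor, copy, or move operations are written here.
// LogFile is move-only because FileHandle is, and the file closes on scope exit.
class LogFile {
public:
    explicit LogFile(const std::string& path)
        : handle_{std::fopen(path.c_str(), "w")} {}

    [[nodiscard]] bool is_open() const noexcept { return handle_ != nullptr; }

    [[nodiscard]] bool write_line(const std::string& line) {
        if (!handle_) { return false; }
        return std::fputs(line.c_str(), handle_.get()) >= 0
            && std::fputc('\n', handle_.get()) != EOF;
    }

private:
    FileHandle handle_;
};

// The ownership rules fall out of the member type, checked at compile time.
static_assert(!std::is_copy_constructible_v<LogFile>);
static_assert(std::is_nothrow_move_constructible_v<LogFile>);
```

The design choice is that the only code that mentions fclose is the deleter; every type built on FileHandle inherits correct cleanup and move-only semantics without restating them.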

Preferred idioms (the "feels like Rust" layer)

std::optional<T> for "maybe a value" — your Option<T>.
std::expected<T, E> (C++23) for fallible returns — your Result<T, E>. No exceptions on the happy/expected-error path; exceptions reserved for truly exceptional/programmer-error.
std::variant<A, B, C> for closed sum types — your enum.
std::span<T> for borrowed contiguous views — your &[T].
std::string_view for borrowed strings — your &str. (With the lifetime caveat below.)
Free functions over methods when there's no invariant to protect; enum class always.
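
A short sketch of how these idioms combine at a parsing boundary. parse_port is a hypothetical function invented for this example; it uses std::optional so the sketch builds as C++17, and a std::expected<std::uint16_t, E> return would follow the same shape where the caller needs to know why parsing failed.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical example: parse a TCP port from borrowed text.
// string_view = borrowed input (&str), optional = "maybe a value" (Option<u16>).
[[nodiscard]] inline std::optional<std::uint16_t>
parse_port(std::string_view text) noexcept {
    if (text.empty()) { return std::nullopt; }
    std::uint32_t value{0};
    for (const char c : text) {
        if (c < '0' || c > '9') { return std::nullopt; }   // reject non-digits
        value = value * 10U + static_cast<std::uint32_t>(c - '0');
        if (value > 65535U) { return std::nullopt; }       // reject out-of-range
    }
    return static_cast<std::uint16_t>(value);
}
```

A caller writes parse_port(arg).value_or(8080), or tests the result with if (const auto port = parse_port(arg)), and [[nodiscard]] makes it a warning to ignore the result entirely.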

The ADT + pattern-matching story

This is the part Rust users care about most, so here's the honest side-by-side.

Sum types

Rust:

enum Shape {
    Circle { r: f64 },
    Rect { w: f64, h: f64 },
}

cpp+:

struct Circle { double r; };
struct Rect   { double w, h; };
using Shape = std::variant<Circle, Rect>;

std::variant is a real, type-safe tagged union stored in place: no heap allocation, no inheritance, and a closed set of alternatives fixed in the type. This is genuinely close to a Rust enum in semantics (the one wart is the rare valueless_by_exception state). The gap is almost purely ergonomic.

Pattern matching / exhaustiveness

Rust — the compiler enforces exhaustiveness and binds fields in one form:

let area = match shape {
    Shape::Circle { r } => PI * r * r,
    Shape::Rect { w, h } => w * h,
};

cpp+: std::visit with an overload set. The trick is the "overload" helper:

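#include <numbers>  // std::numbers::pi (C++20)
#include <variant>

// the one piece of boilerplate you write once, project-wide
// (the deduction guide is redundant from C++20 on, but harmless):
template <class... Ts> struct overload : Ts... { using Ts::operator()...; };
template <class... Ts> overload(Ts...) -> overload<Ts...>;

// one lambda per alternative; all of them must return the same type
const double area = std::visit(overload{
    [](const Circle& c) { return std::numbers::pi * c.r * c.r; },
    [](const Rect&   r) { return r.w * r.h; },
}, shape);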

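Exhaustiveness is recovered: if you omit a case, the overload set has no matching operator() for that alternative and the std::visit call fails to compile. So you do get Rust-like "add a variant, every match breaks until you handle it"; this is the single most important property, and std::variant + std::visit preserves it. Two caveats: a generic catch-all lambda (const auto&) behaves like Rust's _ arm and silences the check, and an alternative that implicitly converts to another handler's parameter type (an int alternative reaching a double handler, say) is accepted without complaint.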
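
For reference, here is the same match as a self-contained translation unit, alongside a second visitor that deliberately opts out of exhaustiveness. The function names area and is_round and the constant kPi are assumptions of this sketch; kPi stands in for std::numbers::pi so the example also builds as C++17.

```cpp
#include <variant>

struct Circle { double r; };
struct Rect   { double w, h; };
using Shape = std::variant<Circle, Rect>;

template <class... Ts> struct overload : Ts... { using Ts::operator()...; };
template <class... Ts> overload(Ts...) -> overload<Ts...>;

// Stand-in for std::numbers::pi (hypothetical name, see lead-in).
inline constexpr double kPi = 3.141592653589793;

// Exhaustive: delete either lambda and this function stops compiling.
[[nodiscard]] inline double area(const Shape& shape) {
    return std::visit(overload{
        [](const Circle& c) { return kPi * c.r * c.r; },
        [](const Rect& r)   { return r.w * r.h; },
    }, shape);
}

// Not exhaustive: the generic lambda is a catch-all, like `_ =>` in Rust.
// A new alternative added to Shape would be accepted here silently.
[[nodiscard]] inline bool is_round(const Shape& shape) {
    return std::visit(overload{
        [](const Circle&) { return true; },
        [](const auto&)   { return false; },
    }, shape);
}
```

A project that wants the Rust behavior everywhere would likely ban the const auto& arm in review, or restrict it to visitors where a default is genuinely intended.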

What you don't get:

Destructuring bind in the match arm. You get c.r, not { r }. Structured bindings (auto [w, h] = r;) help inside the lambda but it's still clunkier.
Guards (Shape::Rect { w, h } if w == h =>). You write an if inside the lambda.
Nested patterns (Some(Circle { r })). You nest visits or destructure manually — this gets ugly fast and is the clearest ergonomic loss.
One expression, one form. The overload/visit ritual is heavier at every call site.
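
The losses listed above are easiest to judge on a concrete case. The following sketch translates a nested, guarded Rust match by hand; describe is a hypothetical function, and the Rust version appears only as a comment for comparison.

```cpp
#include <optional>
#include <string>
#include <variant>

struct Circle { double r; };
struct Rect   { double w, h; };
using Shape = std::variant<Circle, Rect>;

template <class... Ts> struct overload : Ts... { using Ts::operator()...; };
template <class... Ts> overload(Ts...) -> overload<Ts...>;

// Rust equivalent, one expression:
//   match maybe_shape {
//       None                                 => "nothing",
//       Some(Shape::Circle { .. })           => "circle",
//       Some(Shape::Rect { w, h }) if w == h => "square",
//       Some(Shape::Rect { .. })             => "rectangle",
//   }
[[nodiscard]] inline std::string describe(const std::optional<Shape>& maybe_shape) {
    // Nested pattern, outer layer: None is handled by a separate test.
    if (!maybe_shape) { return "nothing"; }
    // Nested pattern, inner layer: Some(shape) is unwrapped and visited.
    return std::visit(overload{
        [](const Circle&) -> std::string { return "circle"; },
        [](const Rect& rect) -> std::string {
            const auto [w, h] = rect;          // structured binding replaces { w, h }
            if (w == h) { return "square"; }   // the guard becomes a plain if
            return "rectangle";
        },
    }, *maybe_shape);
}
```

Four Rust arms become one early return, one visit, and one if, and the explicit -> std::string return types are needed so every lambda agrees on the result type.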

The honest ergonomic verdict on ADTs

Semantically: ~90% there. variant/optional/expected are real sum types with compile-time exhaustiveness. Ergonomically: ~60% there. The boilerplate tax (overload, visit, no destructuring, no guards, no nested patterns) is paid at every use site, and nested ADTs are where it stops feeling pleasant. A language-level pattern matching proposal (P1371's inspect, succeeded by P2688's match) would close much of this gap if it lands; it did not make C++26, so it is worth tracking for a later standard.

Tooling sketch

The subset is only real if a machine enforces it. Minimum viable enforcement:

clang-tidy config enabling cppcoreguidelines-*, bugprone-*, and modernize-*, with WarningsAsErrors: '*'. Within cppcoreguidelines-*, the pro-type-* checks catch casts and union access, and the pro-bounds-* checks catch pointer arithmetic and array-to-pointer decay.
-Werror -Wall -Wextra -Wconversion -Wshadow plus -Wnarrowing.
A header-only cpp+ prelude providing overload, the bounds-checked container wrappers, narrow, and using-aliases that nudge toward expected/span.
Sanitizers in CI, always: -fsanitize=address,undefined for the test suite (and a separate MSan build for uninitialized reads; MSan is Clang-only and wants an instrumented standard library to avoid false positives). These are the modern replacement for valgrind: faster, and they instrument the fuzzer's own runs.
Continuous fuzzing: -fsanitize=fuzzer (libFuzzer) or AFL++ on every parsing / untrusted-input boundary. This is the OSS-Fuzz methodology in miniature. It is the backstop for the one thing the linter cannot check — lifetimes and aliasing.
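
To show what the fuzzing item amounts to in practice, here is a minimal libFuzzer harness sketch. LLVMFuzzerTestOneInput is libFuzzer's real entry point; parse_port is a hypothetical function under test, and the property being checked is an assumption chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string_view>

// Hypothetical function under test: parses a decimal TCP port.
[[nodiscard]] inline std::optional<std::uint16_t>
parse_port(std::string_view text) noexcept {
    if (text.empty()) { return std::nullopt; }
    std::uint32_t value{0};
    for (const char c : text) {
        if (c < '0' || c > '9') { return std::nullopt; }
        value = value * 10U + static_cast<std::uint32_t>(c - '0');
        if (value > 65535U) { return std::nullopt; }
    }
    return static_cast<std::uint16_t>(value);
}

// libFuzzer entry point. One possible build line (Clang):
//   clang++ -std=c++20 -g -O1 -fsanitize=fuzzer,address,undefined harness.cpp
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size) {
    // The harness sits on a C ABI boundary, so this one cast is outside the subset.
    const std::string_view input{reinterpret_cast<const char*>(data), size};
    const auto port = parse_port(input);
    // Property: a successful parse implies non-empty input. The sanitizers
    // check everything else (out-of-bounds reads, overflow UB) as a side effect.
    if (port.has_value() && input.empty()) { std::abort(); }
    return 0;  // libFuzzer expects 0 from a normal run
}
```

The harness itself proves nothing; its value comes from running continuously with the sanitizers enabled, so that any input reaching a memory error aborts with a reproducible test case.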

The honest gap

Everything above buys you a great deal: value semantics, RAII ownership, real sum types, compile-time-exhaustive matching, no manual new/delete. For a large class of code it will feel close to Rust, and a disciplined team genuinely ships fewer memory bugs this way. Many high-assurance C++ shops already live here.

What it cannot do, even in principle:

No borrow checker. Nothing in the subset proves, at compile time, that a T&, span<T>, or string_view does not outlive the thing it points into. The linter catches syntactic patterns, but the general case (lifetimes + aliasing across function boundaries) cannot be checked one function at a time, because C++ signatures carry no lifetime information. A sound static check would need either whole-program analysis or Rust-style lifetime annotations, and those annotations are exactly what lets the borrow checker stay local. string_view over a temporary is still a dangling read here, and the compiler will generally accept it.
Safety is sampled, not total. Sanitizers + fuzzing show "no defect observed on the paths and inputs we executed." That's a probability over a sampled state space, not a proof over all of it. Two clean-fuzzed modules tell you nothing about their composition until you fuzz the whole.
The guarantee doesn't compose, and composition is the actual property that made Rust's safety cheap. In Rust, a safe function built from safe functions is safe, for free, forever (given that any unsafe code underneath is sound). Here, the absence of memory bugs is a continuously-earned test result, not a structural fact.
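
The first point is worth seeing in code. In this sketch make_greeting, first_word, and safe_first_word_length are hypothetical names; the dangling line is left commented out because executing it would be undefined behavior, yet it typically compiles without a diagnostic.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

[[nodiscard]] inline std::string make_greeting(const std::string& name) {
    return "hello, " + name;  // returns a fresh temporary std::string
}

// Borrows from its argument: the result is only valid while `text`'s owner lives.
// Nothing in this signature tells the compiler that.
[[nodiscard]] inline std::string_view first_word(std::string_view text) noexcept {
    return text.substr(0, text.find(','));
}

// Accepted by the compiler, wrong at run time (commented out on purpose):
//   const std::string_view bad = first_word(make_greeting("ada"));
//   // the temporary std::string dies at the ';', so `bad` points at freed memory
//
// The subset's answer is a convention, not a proof:
// keep the owner alive in a named value and never store the view.
[[nodiscard]] inline std::size_t safe_first_word_length(const std::string& name) {
    const std::string owned = make_greeting(name);    // owner outlives the view
    const std::string_view word = first_word(owned);  // borrow used only in this scope
    return word.size();
}
```

In Rust the equivalent of the commented-out line is rejected because the signature of first_word would tie the output lifetime to the input; here the only defenses are the convention above and an ASan run that happens to execute the bad path.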

So the honest claim cpp+ can make is "high-assurance tested C++ with Rust-flavored ergonomics," never "memory-safe C++." That distinction is not pedantry — it is the entire debate, and it's the one the C++ committee itself is currently having (Baxter's Safe C++ / Circle borrow checker vs. Stroustrup–Sutter Profiles). cpp+ is squarely the "Profiles" philosophy: curate, lint, check at runtime. Its limits are exactly the limits the Safe C++ camp points at.

That's the takeaway worth sharing: you can get C++ to "no memory bug found after a lot of testing," and disciplined shops do exactly that. What you can't cheaply get is "memory-safe by construction, and that property composes." Rust's borrow checker isn't mainly a bug catcher — it's a mechanism for making the absence of those bugs a free, compositional, compile-time fact. The fact that C++'s own experts are fighting over whether to adopt that very mechanism is the strongest evidence that this is not a zealot's opinion.

Status

Thought experiment. No code yet beyond this sketch. The cheapest next artifact that would make it real: the cpp+ prelude header (overload, container wrappers, aliases) + a pinned clang-tidy config + one worked example translating a small Rust enum-heavy program. That would be enough to argue from concretely without committing to a "language."
