Preface
This is a production guide to modern C++23, not a language course and not a modernization diary. It is written for programmers who already know how software gets built, tested, shipped, and debugged, but who need a reliable way to make good C++ decisions under real design pressure. The subject is not syntax coverage. It is ownership, lifetime, interface shape, failure boundaries, concurrency, data layout, performance evidence, and verification.
The book assumes you do not need help writing a loop or setting up an editor. You need help deciding whether a function should borrow with std::string_view or std::span<const std::byte>, where ownership should transfer, whether failure should throw or return std::expected, when a range pipeline clarifies an algorithm, when a coroutine frame becomes a lifetime risk, and what evidence is strong enough to justify a performance claim. If you want a feature tour, beginner scaffolding, or a header-by-header catalog, this is the wrong book.
That assumption is deliberate. You do not need to arrive as a C++ expert. You do need to arrive as an engineer who is already comfortable reading nontrivial code, reasoning about APIs, tracing control flow across subsystems, and dealing with testing, debugging, performance work, and operational tradeoffs. The book is organized for that reader.
The manuscript moves in seven parts. Parts I and II establish the mental models and everyday vocabulary that keep ordinary code reviewable. Part III moves outward into interfaces, polymorphism, libraries, modules, and ABI reality. Parts IV and V address concurrency, data layout, allocation, and measurement. Part VI covers the verification and diagnostic stack that keeps native systems honest. Part VII closes with complete production shapes: a service, a reusable library, and a reviewer workflow.
That structure exists because the expensive failures in C++ are rarely local. A service keeps request state alive through shared_ptr, fans work out into background tasks, logs failure by side effect, and hangs during shutdown. A library accepts borrowed views at the boundary, then stores them past the caller’s lifetime. A performance refactor removes one copy and quietly adds contention and allocator pressure elsewhere. Each local choice can look reasonable. The failure shows up when ownership, time, error transport, and cost models interact.
Undefined behavior is a related pressure that runs through the entire book rather than living in one quarantine chapter. A dangling reference, a data race, an invalidated iterator, or a view pipeline that quietly outlives its source can turn a plausible design into a system that fails only under optimization, load, or a different toolchain. The ownership, concurrency, performance, and tooling chapters each address the UB risks most relevant to their domain. Treat UB awareness as a thread that connects the manuscript, not as a one-time warning label.
How to read this book
You can read straight through. The sequence is meant to build judgment in layers: first ownership and invariants, then everyday library and language tools, then architecture, concurrency, performance, verification, and full production patterns. But the chapter set also supports targeted entry if you already know the class of problem you are solving:
| Starting point | Recommended chapters |
|---|---|
| Designing interfaces or library boundaries | Chapters 4, 9, 10, 11, and 22 |
| Building or repairing a native service | Chapters 1, 3, 12, 13, 14, 20, and 21 |
| Writing generic and reusable implementation code | Chapters 5, 6, 7, 8, and 22 |
| Working on hot paths, memory behavior, or measurement | Chapters 15, 16, and 17 |
| Hardening verification and build diagnostics | Chapters 18, 19, and 20 |
| Reviewing production C++ changes | Chapters 1, 3, 4, 14, 19, 22, and 23 |
Each chapter declares its prerequisites near the top so you can enter where the problem is and fill in background only when needed. The appendices are compact support material: a decision-oriented feature index, a toolchain and diagnostics baseline, a short-form review checklist, and the canonical glossary for the book’s core terms.
The primary baseline is C++23. C++26 appears only when it changes a decision you would make today or removes a meaningful cost from an existing pattern. Forward-looking material is there to keep the recommendations honest about the near future, not to turn the book into standards commentary.
If the book succeeds, it should leave you with more than vocabulary. You should be able to look at a design or a diff and explain the trade clearly: what it buys, what it risks, what it commits callers or operators to, and what evidence would prove the choice sound. That is the difference between knowing modern C++ features and using modern C++ well.
Book Map
The book is organized around the places where production C++ gets expensive when the design is vague. It starts with ownership, invariants, failure boundaries, and API shape because later architecture depends on those choices being reviewable. It then moves through the library and language tools that change everyday design, into interfaces and packaging, into concurrency and performance, into verification and observability, and finally into complete production patterns.
Undefined behavior is a cross-cutting concern rather than a standalone stop on the tour. The ownership chapters address lifetime and aliasing hazards. The concurrency chapters address data races, shutdown bugs, and work that outlives its owner. The data and performance chapters address invalidation, layout, locality, and false confidence from weak measurement. The verification chapters cover the tooling and runtime signals that catch what code review alone will miss.
Part I: Core Mental Models
Part I establishes the vocabulary the rest of the manuscript depends on: ownership, lifetime, value semantics, invariants, failure boundaries, and how signatures communicate cost and retention. If these terms are fuzzy, later chapters become style arguments instead of engineering arguments.
Part II: Writing Modern C++ Code
Part II covers the standard library and language tools that should change ordinary C++23 design: borrowing types, result and alternative types, concepts, ranges, generators, and compile-time work used with restraint. The focus is not feature coverage. It is which tools change contracts, review burden, and cost models.
Part III: Interfaces, Libraries, and Architecture
Part III moves from local code into subsystem and package boundaries. The question becomes whether interfaces remain honest once callbacks, type erasure, modules, packaging, and ABI constraints enter the picture. This is where local elegance starts competing with long-term composability.
Part IV: Concurrency and Asynchronous Systems
Part IV treats concurrency as ownership across time. Shared state, coroutine suspension, cancellation, and backpressure are presented as lifecycle and throughput problems, not as a catalog of primitives. The goal is to make async work bounded, owned, and stoppable.
Part V: Data, Memory, and Performance
Part V focuses on how representation decisions become runtime behavior. Data layout, container choice, allocation policy, locality, and measurement discipline are treated as one connected cost story rather than as separate optimization tricks.
Part VI: Verification and Delivery
Part VI covers the stack that keeps native systems honest after the design is chosen: tests aimed at boundary failures, sanitizer and static-analysis lanes, build diagnostics, and observability for running systems. The emphasis is on evidence quality, not on checklists for their own sake.
Part VII: Production Patterns
Part VII applies the earlier chapters to complete engineering shapes. Chapter 21 shows the service boundary where ownership, admission control, shutdown, and telemetry all have to line up. Chapter 22 does the same for a reusable library whose contracts must survive other teams. Chapter 23 closes with a reviewer workflow that turns the rest of the book into day-to-day code review behavior.
Appendices
The appendices are intentionally compact. They provide a decision-oriented feature index, a reference toolchain and diagnostics baseline, a short-form review checklist, and the glossary that anchors the book’s core terminology. They are there to speed up applied work, not to become a second textbook.
Ownership, Lifetime, and RAII
The first production question in modern C++ is not “should this be a class” or “can this be zero-copy.” It is simpler and more dangerous: who owns this resource, how long does it stay valid, and what guarantees cleanup when the happy path stops being happy.
That question applies to memory, but memory is only part of the story. Real systems own sockets, file descriptors, mutexes, thread joins, temporary directories, telemetry registrations, process handles, mapped files, transaction scopes, and shutdown hooks. The language gives you enough rope to represent all of them badly. Ownership deserves the first chapter not because it is foundational in an academic sense, but because unclear ownership makes the rest of the design impossible to review with confidence.
In production, the expensive failures are rarely dramatic at the call site. A service starts a background flush, captures a raw pointer to request state, and occasionally crashes during deploy. A connection pool closes on the wrong thread because the last shared_ptr release happened in a callback nobody considered part of shutdown. An initialization path half-builds three resources and leaks the second one when the fourth throws. These are not syntax problems. They are ownership problems that became operational incidents.
RAII is the main reason modern C++ can manage these situations cleanly. It is not an old idiom that survived by habit; it is the mechanism that lets resource lifetime compose with scope, exceptions, early returns, and partial construction. Used well, RAII makes cleanup boring. That is exactly what you want.
Ownership must be legible
Ownership is a contract, not an implementation detail. A reviewer should be able to point at a type or member and answer three questions quickly.
- What does this object own?
- What may it borrow temporarily?
- What event ends the lifetime of the owned resource?
If the answers require reading several helper functions, the design is already too implicit.
This is why modern C++ favors types whose ownership behavior is obvious. std::unique_ptr<T> means exclusive ownership. std::shared_ptr<T> means shared ownership with reference-counted lifetime. A plain object member means the containing object owns that subobject directly. A std::span<T> or std::string_view means borrowing, not retention. These are not stylistic preferences. They are part of how the program communicates lifetime.
The opposite style is familiar and expensive: a raw pointer member that might own, might observe, and might occasionally be null because shutdown is in progress. That design is cheap to type and expensive to reason about.
RAII is about resources, not about new
Many programmers first encounter RAII as “use smart pointers instead of manual delete.” That is directionally correct and far too small.
RAII means tying a resource to the lifetime of an object whose destructor releases it. The resource might be memory. It might just as easily be a file descriptor, a kernel event, a transaction lock, or a metrics registration that must be unregistered before shutdown completes.
What happens without RAII
Before illustrating the RAII pattern, it is worth seeing the manual approach in a fuller form. The following anti-pattern is intentionally buggy, because production codebases still contain code that looks exactly like this.
socket_t create_server_socket(std::uint16_t port) {
socket_t server = ::socket(AF_INET, SOCK_STREAM, 0);
if (server == invalid_socket) {
throw NetworkError{"socket failed"};
}
int opt = 1;
if (::setsockopt(server, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)) < 0) {
::close_socket(server);
throw NetworkError{"setsockopt failed"};
}
sockaddr_in addr{};
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = INADDR_ANY;
addr.sin_port = htons(port);
if (::bind(server, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
::close_socket(server);
throw NetworkError{"bind failed"};
}
if (::listen(server, 16) < 0) {
::close_socket(server);
throw NetworkError{"listen failed"};
}
return server; // RISK: caller now owns the raw descriptor by convention
}
void serve_once(std::uint16_t port) {
socket_t server = create_server_socket(port);
socket_t client = invalid_socket;
try {
sockaddr_in client_addr{};
socket_length addr_len = sizeof(client_addr);
client = ::accept(server,
reinterpret_cast<sockaddr*>(&client_addr),
&addr_len);
if (client == invalid_socket) {
::close_socket(server); // BUG: server will be closed twice (here + in catch)
throw NetworkError{"accept failed"};
}
std::array<char, 8192> buffer{};
auto n = read_from_socket(client, buffer.data(), buffer.size());
if (n <= 0) {
::close_socket(client);
::close_socket(server);
return;
}
process_request(client, std::string_view{buffer.data(), static_cast<std::size_t>(n)}); // RISK: any throw must preserve cleanup correctness
::close_socket(client);
::close_socket(server);
} catch (...) {
if (client != invalid_socket) {
::close_socket(client);
}
::close_socket(server);
throw;
}
}
The problems compound quickly:

- Cleanup is duplicated. ::close_socket(server) appears in the setup helper, the normal path, the early-return path, and the exception path. The more exits you add, the more duplication you carry.
- Duplication turns into bugs. The accept failure path already closes server before throwing, so the catch block closes it a second time. Manual ownership logic tends to drift this way under maintenance.
- Exception safety depends on discipline. process_request may throw. Any maintenance change between acquisition and cleanup has to remember which descriptors are live at that point.
- Transfer is implicit. create_server_socket() returns a raw socket_t, so ownership is now a convention between caller and callee rather than part of the type system.
- Reviews become global. To verify correctness, a reviewer has to inspect the whole function and confirm that every exit path closes every descriptor exactly once.
The RAII alternative eliminates these problems by construction. Each resource is held by an owning object whose destructor performs the release. Stack unwinding does the rest.
The companion web-api project already contains the example we want. Its Socket class in examples/web-api/src/modules/http.cppm wraps a file descriptor and makes the ownership rules explicit:
// From examples/web-api/src/modules/http.cppm
class Socket {
public:
Socket() = default;
explicit Socket(socket_handle fd) noexcept : fd_{fd} {}
Socket(const Socket&) = delete;
Socket& operator=(const Socket&) = delete;
Socket(Socket&& other) noexcept
: fd_{std::exchange(other.fd_, invalid_socket_handle)} {}
Socket& operator=(Socket&& other) noexcept {
if (this != &other) {
close(); // release what this object currently owns
fd_ = std::exchange(other.fd_, invalid_socket_handle);
}
return *this;
}
~Socket() { close(); } // automatic release on every exit path
[[nodiscard]] socket_handle fd() const noexcept { return fd_; }
[[nodiscard]] bool valid() const noexcept { return fd_ != invalid_socket_handle; }
explicit operator bool() const noexcept { return valid(); }
void close() noexcept {
if (fd_ != invalid_socket_handle) {
close_socket(fd_);
fd_ = invalid_socket_handle;
}
}
private:
socket_handle fd_{invalid_socket_handle};
};
That class is enough to explain the whole RAII story:
- Acquisition happens in the constructor: Socket sock{::socket(...)};
- Ownership is unique because copy is deleted.
- Transfer is explicit because moves use std::exchange to leave the source empty.
- Release is automatic because the destructor always calls close().
The surrounding code in the same module shows how this behaves in real use. The following is a partial excerpt: only the ownership-relevant lines are shown, so supporting declarations and unrelated error-handling details are omitted for clarity.
[[nodiscard]] Socket create_server_socket() const {
Socket sock{::socket(AF_INET, SOCK_STREAM, 0)}; // ownership starts here
if (!sock) return {};
int opt = 1;
if (::setsockopt(sock.fd(), SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)) < 0) {
return {}; // sock is destroyed here, so the descriptor closes automatically
}
sockaddr_in addr{};
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = INADDR_ANY;
addr.sin_port = htons(port_);
if (::bind(sock.fd(), reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
return {}; // same: failure path still releases the descriptor
}
if (::listen(sock.fd(), 16) < 0) {
return {};
}
return sock; // move or copy elision transfers ownership to the caller
}
Socket client{::accept(server_sock.fd(), ...)}; // client address parameters omitted for brevity
handle_connection(std::move(client)); // explicit ownership transfer
void handle_connection(Socket client) const {
std::array<char, 8192> buf{};
auto n = read_from_socket(client.fd(), buf.data(), buf.size());
if (n <= 0) return;
Response resp = handler_(req); // request parsing omitted here
auto data = resp.serialize();
(void)write_to_socket(client.fd(), data.data(), static_cast<int>(data.size()));
} // client goes out of scope here and closes automatically
After handle_connection(std::move(client)), the caller-side client no longer owns the descriptor. The move constructor exchanged its file descriptor for invalid_socket_handle, so the moved-from object is harmless when its destructor later runs. Ownership exists in exactly one object at a time.
Notice what disappeared: there is no cleanup ladder, no try/catch whose main job is teardown, and no convention about who owns which descriptor. The type carries the policy. That is the practical value of RAII.
The same pattern applies to many non-memory resources. A scoped registration token unregisters in its destructor. A transaction object rolls back unless explicitly committed. A joined-thread wrapper joins during destruction or refuses to be destroyed while still joinable. Once a codebase thinks this way, cleanup paths become local again instead of scattered through error handling.
Anti-pattern: cleanup by convention
The alternative to RAII is usually not explicit manual cleanup done perfectly. It is cleanup by convention, which means cleanup gets skipped under stress.
// Anti-pattern: ownership and cleanup are split across control flow.
void publish_snapshot(Publisher& publisher, std::string_view path) {
auto* file = ::open_config(path.data());
if (file == nullptr) {
throw ConfigError{"open failed"};
}
auto payload = read_payload(file);
if (!payload) {
::close_config(file); // BUG: one exit path remembered cleanup
throw ConfigError{"parse failed"};
}
publisher.send(*payload); // BUG: if this throws, file leaks
::close_config(file);
}
This is not controversial because manual cleanup is ugly. It is wrong because cleanup policy is now interleaved with every exit path. Once the function acquires a second or third resource, the control flow becomes harder to audit than the work the function actually performs.
The RAII version eliminates every manual release and every conditional cleanup path:
void publish_snapshot(Publisher& publisher, std::string_view path) {
auto file = ConfigFile::open(path); // RAII: destructor calls ::close_config
if (!file) {
throw ConfigError{"open failed"};
}
auto payload = read_payload(*file);
if (!payload) {
throw ConfigError{"parse failed"};
// file releases automatically -- no manual cleanup needed
}
publisher.send(*payload);
// file releases automatically at scope exit, whether normal or exceptional
}
The function now has one concern: its actual logic. Cleanup is invisible because it is guaranteed. Adding a third, fourth, or tenth exit path changes nothing about resource safety. That composability is the real payoff of RAII – not prettier code, but correct code under maintenance pressure.
RAII fixes cleanup-by-convention by moving the release policy into the owning object. Error paths then recover their main job: describing failure rather than describing teardown.
Exclusive ownership should be the default
Most resources in well-designed systems have one obvious owner at any given time. A request object owns its parsed payload. A connection object owns its socket. A batch owns its buffers. That is why exclusive ownership is the right default mental model.
In practice this means preferring plain object members, and reaching for std::unique_ptr only when direct containment is not possible. unique_ptr is not a signal that the design is sophisticated. It is a signal that ownership transfers and destruction are explicit. It also composes well with containers, factories, and failure paths because moved-from state is defined and single ownership stays single.
Shared ownership should be a deliberate exception. There are valid cases: asynchronous fan-out where several components must keep the same immutable state alive, graph-like structures with genuine shared lifetime, caches whose entries remain valid while multiple users still hold them. But shared_ptr is not a safety blanket. It changes destruction timing, adds atomic reference-count traffic, and often hides the real question: why can no component name the owner?
If a review finds shared_ptr at a boundary, the follow-up question should be concrete: what lifetime relationship made exclusive ownership impossible here? If the answer is vague, the shared ownership is probably compensating for a design that never decided where the resource belongs.
A common symptom is shutdown non-determinism. When the last shared_ptr to a resource is released from an unpredictable callback or thread, the destructor runs at an unpredictable time and place:
// Risky: destruction timing depends on which callback finishes last.
void start_fanout(std::shared_ptr<Connection> conn) {
for (auto& shard : shards_) {
shard.post([conn] { // each lambda extends lifetime
conn->send(shard_ping()); // last lambda to finish destroys conn
});
}
// The conn parameter keeps the Connection alive until this function returns.
// After that, whichever lambda finishes last destroys it -- on an arbitrary
// thread, at a time set by scheduling. Destructor side effects (logging,
// metric flush, socket close) now happen at an uncontrolled point.
}
When destruction order matters (and in production it almost always does), prefer unique_ptr with explicit lifetime scoping, and pass non-owning raw pointers or references to work that is guaranteed to complete within the owner’s lifetime.
Borrowing needs tighter discipline than owning
A system with clear ownership still needs non-owning access. Algorithms inspect caller-owned buffers. Validation reads request metadata. Iterators and views traverse storage without copying it. Borrowing is normal. The mistake is letting borrowed state outlive the owner or making the borrow invisible.
Modern C++ gives you useful borrowing vocabulary: references, pointers used explicitly as observers, std::span, and std::string_view. These types help, but they do not enforce a good design by themselves. A view member inside a long-lived object is still a lifetime risk if the owner is elsewhere. A callback that captures a reference to stack state is still wrong if the callback runs later.
That risk becomes more severe under concurrency. A raw pointer or string_view captured into background work is not a small local shortcut. It is a cross-time borrow whose validity now depends on scheduling and shutdown order.
This is why a useful ownership rule is simple: owning types may cross time freely; borrowed types should cross time only when the owner is visibly stronger and longer-lived than the work using the borrow. If you cannot make that argument quickly, copy or transfer ownership instead.
Move semantics define transfer, not optimization
Move semantics are often introduced as a performance topic. In practice they matter first as an ownership topic.
Moving an object states that the resource changes owners while the source remains valid but no longer responsible for the old resource. That is what makes factories, containers, and pipeline stages composable without inventing bespoke transfer APIs for every type.
For resource-owning types, good move behavior is part of the type’s correctness story.
- The moved-to object becomes the owner.
- The moved-from object remains destructible and assignable.
- Double-release cannot occur.
This is one reason direct resource wrappers are worth the small amount of code they require. Once the ownership transfer rules live in the type, callers stop hand-transferring raw handles and hoping conventions line up.
Some types should not be movable, and not every move is cheap. A mutex is typically neither copyable nor movable because moving it would complicate invariants and platform semantics. A large aggregate with direct-buffer ownership may be movable but still not cheap in a hot path. The design question is not “can I default the move operations.” It is “what ownership story should this type allow.”
Lifetime bugs often hide in shutdown and partial construction
Programmers tend to think about lifetime during the main work path. Production bugs often show up during startup failure and shutdown instead.
Partial construction is one example. If an object acquires three resources and the second acquisition throws, the first one must still release correctly. RAII handles this automatically when ownership is layered into members rather than performed manually in constructor bodies with cleanup flags.
The manual approach is fragile:
// Anti-pattern: manual multi-resource construction with cleanup flags.
class Pipeline {
public:
Pipeline(const Config& cfg) {
db_ = ::open_db(cfg.db_path().c_str());
if (!db_) throw InitError{"db open failed"};
cache_ = ::create_cache(cfg.cache_size());
if (!cache_) {
::close_db(db_); // must remember to clean up db_
throw InitError{"cache alloc failed"};
}
listener_ = ::bind_listener(cfg.port());
if (listener_ == invalid_socket) {
::destroy_cache(cache_); // must remember both prior resources
::close_db(db_);
throw InitError{"bind failed"};
}
}
~Pipeline() {
::close_listener(listener_);
::destroy_cache(cache_);
::close_db(db_);
}
private:
db_handle_t db_ = nullptr;
cache_handle_t cache_ = nullptr;
socket_t listener_ = invalid_socket;
};
Every new resource added to this constructor requires updating every prior failure branch. A maintenance change that reorders acquisitions silently breaks the cleanup logic.
The RAII version uses member wrappers and relies on the language rule that already-constructed members are destroyed when a constructor throws:
class Pipeline {
public:
Pipeline(const Config& cfg)
: db_(DbHandle::open(cfg.db_path())) // destroyed automatically if
, cache_(Cache::create(cfg.cache_size())) // a later member throws
, listener_(Listener::bind(cfg.port())) {}
private:
DbHandle db_;
Cache cache_;
Listener listener_;
};
No cleanup flags, no cascading if blocks, no order-sensitive manual teardown. The language does the work.
In the book’s example project, main.cpp shows this principle applied to a complete service startup. Each layer is constructed as a scoped local in main(), and the stack’s natural destruction order handles teardown:
// From examples/web-api/src/main.cpp (simplified)
int main() {
webapi::TaskRepository repo; // 1. domain object
webapi::Router router; // 2. route table
router.get("/tasks", webapi::handlers::list_tasks(repo));
auto handler = webapi::middleware::chain( // 3. middleware
pipeline, router.to_handler());
webapi::http::Server server{port, std::move(handler)}; // 4. server
server.run_until(shutdown_requested);
// destruction unwinds in reverse: server, handler, router, repo
}
No explicit teardown code appears anywhere. If any construction step throws, every previously constructed object is destroyed automatically in reverse order – exactly the guarantee the RAII Pipeline pattern relies on.
Shutdown is the other major pressure point. Destructors run when the system is already under state transition. Background work may still hold references. Logging infrastructure may be partially torn down. A destructor that blocks indefinitely, calls back into unstable subsystems, or depends on thread affinity that the type never documented can turn a tidy ownership model into a deploy-time failure.
The lesson is not to fear destructors, but to keep destructor work narrow and explicit. Release the resource you own. Avoid surprising cross-subsystem behavior. If teardown requires a richer protocol than destruction alone can safely provide, expose an explicit stop or close operation and use the destructor as a final safety net rather than the only cleanup path.
Verification and review
Ownership design needs explicit review because many lifetime bugs are structurally visible before any tool runs.
Useful review questions:
- Is there a single obvious owner for each resource?
- Are borrowed references and views visibly shorter-lived than the owner?
- Is there a single obvious owner for each resource?
- Are borrowed references and views visibly shorter-lived than the owner?
- Is shared_ptr solving a real shared-lifetime problem or avoiding an ownership decision?
- Do move operations preserve single ownership and safe destruction?
- Does shutdown rely on destructor side effects that are broader than resource release?
Dynamic tools still matter. AddressSanitizer catches many use-after-free bugs. Leak sanitizers and platform diagnostics catch forgotten release paths. ThreadSanitizer helps when lifetime errors are exposed by races during shutdown. But tools are strongest when the type system already makes ownership legible.
Takeaways
- Treat ownership as a contract that must be visible in types and object structure.
- Use RAII for every meaningful resource, not only heap memory.
- Prefer exclusive ownership by default and justify shared ownership explicitly.
- Think of move semantics as ownership transfer rules before thinking of them as performance features.
- Review shutdown and partial-construction paths as seriously as the steady-state path.
If a resource can be leaked, double-released, used after destruction, or destroyed on the wrong thread, the problem usually started earlier than the crash. It started when ownership was left implicit.
Values, Identity, and Invariants
Once ownership is clear, the next production question is semantic rather than mechanical: what kind of thing is this object supposed to be?
Modern C++ code gets harder than it needs to be when every type is treated as mutable state with methods attached. Some objects are values. They represent a self-contained piece of meaning, can often be copied or replaced wholesale, and should be easy to compare and test. Some objects carry identity. They represent a specific session, account, worker, or connection whose continuity matters across time even as its fields change. Mixing those roles inside one vague type creates bugs that look unrelated: broken equality, unstable cache keys, accidental aliasing, poor concurrency behavior, and APIs that cannot say whether mutation changes “the same thing” or creates a new one.
This chapter is about keeping those categories sharp and enforcing invariants so the types remain trustworthy under pressure. It is not a chapter about parameter passing or ownership transfer mechanics. Those belong elsewhere. The focus here is modeling: deciding when a type should behave like a value, when identity must remain explicit, and how invariants stop the object graph from becoming a bag of fields.
Values and entities solve different problems
A value is defined by what it contains, not by where it came from. Two values that represent the same configuration, time window, or money amount should usually be interchangeable. You can copy them, compare them, and move them between threads without inventing a story about which one is the “real” instance.
An entity or identity-bearing object is different. A live client session is not interchangeable with another session that happens to have the same fields at one moment. A connection object may reconnect, accumulate statistics, and hold synchronization state while still remaining the same connection from the system’s point of view. Identity exists so the program can talk about continuity over time.
This sounds obvious. The design damage appears when teams fail to decide which category a type belongs to.
If an Order type is mutable, shared, equality-comparable by every field, and also used as a cache key, the program now has several incompatible stories about what that object means. If a configuration snapshot is wrapped in a reference-counted mutable object even though callers only need an immutable set of values, the code has paid for aliasing and lifetime complexity without gaining semantic power.
The useful default is stronger than many codebases realize: if a type does not need continuity across time, design it as a value first.
Value types reduce accidental coupling
Value semantics reduce invisible sharing. A caller gets its own copy or moved instance. Mutation is local. Equality can often be structural. Tests can build small examples without allocating object graphs or mocking infrastructure.
Configuration is a good example. Many systems model configuration as a globally shared mutable object because updates exist somewhere in the product. That choice infects code that only needs a stable snapshot.
Usually the better design is this:
- Parse raw configuration into a validated value object.
- Publish a new snapshot when configuration changes.
- Let consumers hold the snapshot they were given.
That design makes each reader’s world explicit. Code processing one request can reason against one configuration value. There is no half-updated object graph, no lock required merely to read a timeout value, and no mystery about whether two callers are looking at the same mutable instance.
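One way to implement the snapshot pattern is a small publisher that readers query once per request. The sketch below is illustrative rather than this book's API: ConfigStore and its members are invented names, and on toolchains with full C++20 library support a std::atomic<std::shared_ptr<const ServiceConfig>> could replace the mutex.

```cpp
#include <chrono>
#include <memory>
#include <mutex>
#include <string>

// Immutable, validated snapshot plus a publisher (illustrative names).
struct ServiceConfig {
    std::string db_host;
    int db_port = 0;
    std::chrono::seconds timeout{0};
};

class ConfigStore {
public:
    // Publish a whole new snapshot; readers already holding the old one
    // are unaffected and keep seeing a coherent configuration.
    void publish(ServiceConfig next) {
        auto snapshot = std::make_shared<const ServiceConfig>(std::move(next));
        std::lock_guard lock(mutex_);
        current_ = std::move(snapshot);
    }

    // Each caller gets its own reference to one consistent snapshot.
    std::shared_ptr<const ServiceConfig> snapshot() const {
        std::lock_guard lock(mutex_);
        return current_;
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<const ServiceConfig> current_ =
        std::make_shared<const ServiceConfig>();
};
```

A request handler calls snapshot() once on entry and reads only that object for its whole lifetime, which is exactly the guarantee the mutable-global version cannot offer.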
What goes wrong without value semantics
When configuration is modeled as a shared mutable object rather than a value snapshot, aliasing bugs appear:
// Anti-pattern: shared mutable configuration.
struct AppConfig {
    std::string db_host;
    int db_port;
    std::chrono::seconds timeout;
};

// A single global mutable instance, shared by reference.
AppConfig g_config;

void handle_request(RequestContext& ctx) {
    auto conn = connect(g_config.db_host, g_config.db_port);
    // ... long operation ...
    // BUG: another thread calls reload_config(), mutating g_config
    // mid-request. conn was opened with the old host, but now
    // ctx uses the new timeout. The request operates against
    // an incoherent mix of old and new configuration.
    conn.set_timeout(g_config.timeout);
}
With value semantics, each request captures its own immutable snapshot. No lock is needed to read fields, and no mid-flight mutation can create incoherent state:
void handle_request(RequestContext& ctx, const ServiceConfig& config) {
    // config is a value -- it cannot change during this call.
    auto conn = connect(config.db_host(), config.db_port());
    conn.set_timeout(config.timeout());
    // Entire request sees a single consistent configuration.
}
Values compose well. They can sit inside containers, cross threads, participate in deterministic tests, and form stable inputs to hashing or comparison. Identity-bearing objects can do those things too, but they require more rules and more caution. Use that complexity only when the model needs it.
Invariants are the reason to have types at all
A type that permits invalid combinations of state is often just a struct-shaped bug carrier.
An invariant is a condition that should hold whenever an object is observable by the rest of the program. A time window may require start <= end. A money amount may require a currency and a bounded integer representation. A batching policy may require max_items > 0 and flush_interval > 0ms. A connection state object may forbid “authenticated but not connected.”
The point of an invariant is not to make the constructor fancier. It is to shrink the number of invalid states that later code must defend against.
Consider a scheduling subsystem.
class RetryPolicy {
public:
    static auto create(std::chrono::milliseconds base_delay,
                       std::chrono::milliseconds max_delay,
                       std::uint32_t max_attempts)
        -> std::expected<RetryPolicy, ConfigError>;

    auto base_delay() const noexcept -> std::chrono::milliseconds {
        return base_delay_;
    }
    auto max_delay() const noexcept -> std::chrono::milliseconds {
        return max_delay_;
    }
    auto max_attempts() const noexcept -> std::uint32_t {
        return max_attempts_;
    }

private:
    RetryPolicy(std::chrono::milliseconds base_delay,
                std::chrono::milliseconds max_delay,
                std::uint32_t max_attempts) noexcept
        : base_delay_(base_delay),
          max_delay_(max_delay),
          max_attempts_(max_attempts) {}

    std::chrono::milliseconds base_delay_;
    std::chrono::milliseconds max_delay_;
    std::uint32_t max_attempts_;
};
The details of error transport belong more fully in the next chapter, but the modeling point is already clear: a RetryPolicy should not exist in a nonsense state. Once created, code using it should not have to ask whether the delays are inverted or the attempt count is zero unless those are valid meanings the domain actually wants.
If a type does not enforce its invariants, the burden moves outward into every caller and every code review.
What happens when invariants are not enforced
Compare the factory-validated RetryPolicy above with a plain aggregate that leaves validation to callers:
// Anti-pattern: invariants left to the caller.
struct RetryPolicy {
    std::chrono::milliseconds base_delay;
    std::chrono::milliseconds max_delay;
    std::uint32_t max_attempts;
};

void schedule_retries(const RetryPolicy& policy) {
    // Caller forgot to validate. base_delay is negative, max_attempts is 0.
    // This loop does nothing, silently dropping work.
    for (std::uint32_t i = 0; i < policy.max_attempts; ++i) {
        auto delay = std::min(policy.base_delay * (1 << i), policy.max_delay);
        enqueue_after(delay); // never executes when max_attempts == 0
    }
}
Every function that receives RetryPolicy must now independently check for nonsense values, or assume some earlier layer already did. In practice, some callers check and some do not, producing inconsistent behavior depending on the call path. The factory approach shown earlier makes this class of bug structurally impossible: if you have a RetryPolicy, it is valid.
The companion web-api project applies the same pattern to its domain model. Task::validate() is a static factory that returns Result<Task>, rejecting empty or oversized titles at the boundary:
// examples/web-api/src/modules/task.cppm
[[nodiscard]] static Result<Task> validate(Task t) {
    if (t.title.empty()) {
        return make_error(ErrorCode::bad_request, "title must not be empty");
    }
    if (t.title.size() > 256) {
        return make_error(ErrorCode::bad_request, "title exceeds 256 characters");
    }
    return t;
}
Every path that stores a Task goes through validate() first, including updates — the repository re-validates after mutation. The invariant is owned by the type, not by individual callers.
Anti-pattern: entity semantics smuggled into a value type
One recurring failure is a type that looks like a value because it is copied and compared, but actually carries identity-bearing mutation.
// Anti-pattern: one type tries to be both a value and a live entity.
struct Job {
    std::string id;
    std::string owner;
    std::vector<Task> tasks;
    std::mutex mutex; // RISK: identity-bearing synchronization hidden inside data model
    bool cancelled = false;
};
This object cannot honestly behave like a value because copying a mutex and a live cancellation flag has no sensible meaning. It also cannot honestly behave like a narrow entity model because the entire mutable representation is public. The type will infect the rest of the codebase with ambiguity.
The cleaner split is usually:
- a value type such as JobSpec or JobSnapshot for the stable domain data,
- and an identity-bearing runtime object such as JobExecution that owns synchronization, progress, and cancellation state.
That split makes clear which parts are serializable, comparable, cacheable, and safe to move across threads, and which parts model a live process.
The companion web-api project demonstrates this separation. Task is a value type (copyable, comparable, serializable) while TaskRepository is the identity-bearing entity that owns a shared_mutex, an ID generator, and the mutable collection. The value carries domain data; the entity manages lifecycle and synchronization.
Equality should match meaning
One of the best tests for whether a type has a coherent semantic role is whether equality is obvious.
For many value types, equality should be structural. Two validated endpoint configurations with the same host, port, and TLS mode are the same value. Two money amounts with the same currency and minor units are the same value.
For identity-bearing objects, structural equality is often actively misleading. Two live sessions with the same user id and remote address are not the same session. Two connections pointed at the same shard are not interchangeable if each carries different lifecycle state and pending work.
If a team cannot answer what equality should mean for a type, the type is probably mixing value data with identity-bearing runtime concerns.
The companion web-api project keeps this straightforward. Task declares a defaulted three-way comparison, making equality purely structural:
// examples/web-api/src/modules/task.cppm
[[nodiscard]] auto operator<=>(const Task&) const = default;
Because Task is a value, structural equality is the right answer. The identity-bearing TaskRepository has no equality operator at all — comparing two repositories would be meaningless.
This matters in practice. Equality influences cache keys, deduplication logic, diff generation, test assertions, and change detection. A semantically vague type produces semantically vague equality, which then breaks several systems at once.
Shallow copies and aliasing: a concrete trap
When a type looks like a value but shares internal state through pointers or references, copies become aliases rather than independent values:
// Anti-pattern: shallow copy creates aliasing bugs.
struct Route {
    std::string name;
    std::shared_ptr<std::vector<Endpoint>> endpoints; // shared, not owned
};

void reconfigure(Route primary) {
    Route backup = primary; // looks like a copy, but endpoints are shared
    backup.name = "backup-" + primary.name;
    backup.endpoints->push_back(fallback_endpoint()); // BUG: mutates primary too
    // primary.endpoints and backup.endpoints point to the same vector.
    // The caller who passed primary now sees an endpoint they never added.
}
The fix is to give the type genuine value semantics. Either store the vector directly as a member (so copies are deep), or use a copy-on-write strategy, or make the type immutable so sharing is safe:
struct Route {
    std::string name;
    std::vector<Endpoint> endpoints; // owned, copied on assignment

    auto with_endpoint(Endpoint ep) const -> Route {
        Route copy = *this;
        copy.endpoints.push_back(std::move(ep));
        return copy;
    }
};
Now Route behaves as a value. Copies are independent. Mutation through with_endpoint produces a new value without disturbing the original. No aliasing surprise is possible.
Mutation should respect the modeling choice
Values and entities tolerate mutation differently.
For value types, the cleanest design is often immutability after validation or at least mutation through narrow operations that preserve invariants. Replacing a configuration snapshot or producing a new routing table is frequently easier to reason about than mutating one shared instance in place.
For entities, mutation is natural because the object models continuity over time. But that does not justify public writable fields or unconstrained setters. An entity still needs a controlled state machine. A Connection may transition from connecting to ready to draining to closed; it should not permit arbitrary combinations just because the fields are individually legal.
The real design question is not whether mutation is allowed. It is where mutation is allowed and what guarantees survive it.
If mutation can break invariants between two field assignments, the type likely needs a stronger operation boundary. If callers must lock, update three fields, and remember to recompute a derived flag, the invariant was never really owned by the type.
// Anti-pattern: public fields allow invariant-breaking mutation.
struct TimeWindow {
    std::chrono::system_clock::time_point start;
    std::chrono::system_clock::time_point end;
};

void extend_deadline(TimeWindow& window, std::chrono::hours extra) {
    window.end += extra; // fine
}

void shift_start(TimeWindow& window, std::chrono::hours shift) {
    window.start += shift;
    // BUG: if shift is large enough, start > end.
    // Every consumer of TimeWindow must now defend against this.
}
An encapsulated type eliminates this class of bug by making the invariant un-breakable from outside:
class TimeWindow {
public:
    static auto create(system_clock::time_point start,
                       system_clock::time_point end)
        -> std::optional<TimeWindow>
    {
        if (start > end) return std::nullopt;
        return TimeWindow{start, end};
    }

    auto start() const noexcept { return start_; }
    auto end() const noexcept { return end_; }

    auto with_extended_end(std::chrono::hours extra) const -> TimeWindow {
        return TimeWindow{start_, end_ + extra}; // always valid: end moves forward
    }

private:
    TimeWindow(system_clock::time_point s, system_clock::time_point e)
        : start_(s), end_(e) {}

    system_clock::time_point start_;
    system_clock::time_point end_;
};
Callers cannot produce an invalid TimeWindow. The invariant start <= end is enforced once, in the type, rather than diffusely across every mutation site.
Small domain types are worth the ceremony
Experienced programmers sometimes resist tiny wrapper types because they look like ceremony compared with plain integers or strings. In production C++, these types often pay for themselves quickly.
An AccountId, ShardId, TenantName, BytesPerSecond, or Deadline type can eliminate argument swaps, clarify logs, and make invalid combinations harder to express. Just as importantly, these types can carry invariants and conversions locally instead of distributing them across parsing, storage, and formatting code.
The warning is that a wrapper type is only useful if it actually sharpens meaning. A thin shell around std::string that preserves all invalid states and adds no semantic operations is mostly noise. The right question is whether the type enforces or communicates a real distinction the system cares about.
Concurrency gets easier when values stay values
Many concurrency problems are modeling problems in disguise. Shared mutable state is hard largely because the program uses identity-bearing objects where immutable values would have been enough.
Threading a validated snapshot through a pipeline is easy to reason about. Sharing a mutable configuration service object with interior locking across the same pipeline is much harder. Passing a value-oriented request descriptor into work queues is simpler than passing a live session object with hidden aliasing and synchronization.
This does not mean every concurrent system can eliminate entities. But value semantics are one of the most effective ways to reduce the amount of state that must be shared and synchronized. When mutation can be replaced with snapshot publication or message passing of values, both correctness and reviewability improve.
Verification and review
Types that claim semantic roles should be reviewed against those roles directly.
Useful review questions:
- Is this type primarily a value or primarily an identity-bearing object?
- Do its equality, copying, and mutation rules match that choice?
- Which invariants does the type enforce itself?
- Would splitting stable domain data from live runtime state simplify the design?
- Is shared mutable state present because the model truly requires identity, or because value semantics were never attempted?
Testing follows the same logic. Value types deserve property-style tests for invariant preservation, equality, and serialization stability where relevant. Identity-bearing types deserve lifecycle and state-machine tests that verify legal transitions and reject illegal ones.
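A property-style check for the TimeWindow type shown earlier might look like the following condensed sketch. invariant_holds_for is an illustrative helper, and a real suite would drive it with a property-testing framework rather than nested loops:

```cpp
#include <chrono>
#include <optional>

using clock_tp = std::chrono::system_clock::time_point;

// Condensed from the earlier example: create() refuses inverted windows.
class TimeWindow {
public:
    static std::optional<TimeWindow> create(clock_tp start, clock_tp end) {
        if (start > end) return std::nullopt;
        return TimeWindow{start, end};
    }
    clock_tp start() const { return start_; }
    clock_tp end() const { return end_; }
private:
    TimeWindow(clock_tp s, clock_tp e) : start_(s), end_(e) {}
    clock_tp start_, end_;
};

// The property: for any inputs, create() either refuses (exactly when the
// input is invalid) or yields a window whose invariant start <= end holds.
bool invariant_holds_for(int start_s, int end_s) {
    clock_tp base{};
    auto w = TimeWindow::create(base + std::chrono::seconds(start_s),
                                base + std::chrono::seconds(end_s));
    if (!w) return start_s > end_s;   // refusal only for invalid input
    return w->start() <= w->end();    // otherwise the invariant holds
}
```

The test asserts the property over a grid of inputs instead of a handful of hand-picked cases, which is exactly the style of evidence value types make cheap.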
Takeaways
- Default to value semantics when continuity across time is not part of the domain meaning.
- Make identity explicit when the object represents a specific live thing rather than interchangeable data.
- Enforce invariants inside types so callers do not have to rediscover them defensively.
- Let equality, copying, and mutation rules follow the semantic role of the type.
- Split stable domain values from runtime control state when one object is trying to do both jobs.
When a type has a clear answer to “what kind of thing is this,” the rest of the design gets easier: ownership is more obvious, APIs get narrower, tests become simpler, and concurrency stops fighting hidden aliasing. That is why semantic clarity belongs this early in the book.
Errors, results, and failure boundaries
Most large C++ systems do not fail because they chose the wrong single error mechanism. They fail because several mechanisms are used at once with no layer policy. One subsystem throws exceptions, another returns status codes, a third logs and continues, and a fourth converts every failure into false. Individually, each choice may have looked reasonable. Together they produce code where callers cannot tell which operations may fail, which failures are recoverable, where diagnostics were emitted, or whether cleanup and rollback still ran.
The production question is not “exceptions or std::expected?” The production question is where each error model belongs, how failure information crosses boundaries, and which parts of the system are responsible for translation, logging, and crash decisions.
That boundary focus matters because error handling is architectural. A parsing layer, a domain layer, a storage adapter, and a process entry point all face different constraints. Conflating them is what makes code noisy and operationally fragile.
This chapter keeps the distinctions sharp. It does not try to outlaw exceptions or declare std::expected a universal replacement. It argues for a policy that preserves useful failure information without letting every low-level mechanism leak across the whole codebase.
Start by classifying failure
Not every failure deserves the same transport.
At a minimum, production code should distinguish among these categories:
- Invalid input or failed validation.
- Environmental or boundary failure such as file IO, network errors, or storage timeouts.
- Contract violation or impossible internal state.
- Process-level startup or shutdown failure.
These categories influence both recovery and observability. Invalid input is often expected at system edges and should usually become a local error result with enough detail to reject the request or config cleanly. Environmental failure may need translation, retry policy, or escalation. Contract violation often means the program or subsystem has already lost a key invariant; that is closer to crash territory than to “return an error and continue.” Startup failure is special because there may be no useful degraded mode at all. Failing fast can be the correct behavior.
Once these categories are explicit, API design gets easier. Not every function should expose every class of failure directly. A high-level domain function should not need to understand a vendor-specific SQL error enumeration if the only actionable outcomes are not_found, conflict, and temporarily_unavailable.
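One way to make the categories first-class is a small taxonomy that boundary code can dispatch on. The names below are illustrative, not a standard vocabulary, and a real system would attach context rather than just a kind:

```cpp
// The four categories above, as a type the boundary can branch on.
enum class FailureKind {
    invalid_input,        // reject locally with a structured result
    environmental,        // translate, maybe retry or escalate
    contract_violation,   // a bug: surface loudly, often crash territory
    startup_failure,      // fail fast with a clear diagnostic
};

enum class Action { reject_request, retry_or_escalate, crash, exit_process };

// A default mapping from category to handling policy.
constexpr Action default_policy(FailureKind kind) {
    switch (kind) {
        case FailureKind::invalid_input:      return Action::reject_request;
        case FailureKind::environmental:      return Action::retry_or_escalate;
        case FailureKind::contract_violation: return Action::crash;
        case FailureKind::startup_failure:    return Action::exit_process;
    }
    return Action::crash; // unreachable with a valid enumerator
}
```

The mapping itself is less important than the fact that it exists in one place, where reviewers can argue about it, instead of being re-derived ad hoc at every call site.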
The error-code-only approach and its pitfalls
Before std::expected and before exceptions were widely adopted, C++ codebases (and C codebases that C++ inherited) relied on integer error codes and sentinel return values. That approach is still common, and its failure modes are worth examining concretely.
// Error-code-only style: caller must check, but nothing enforces it.
enum ConfigErrorCode { kOk = 0, kFileNotFound = 1, kParseError = 2, kInvalidValue = 3 };

ConfigErrorCode load_service_config(const std::string& path, ServiceConfig* out);

void startup() {
    ServiceConfig cfg;
    load_service_config("/etc/app/config.yaml", &cfg); // BUG: return code silently ignored
    // cfg may be uninitialized garbage -- the program continues anyway.
    listen(cfg.port); // binds to nonsense port or zero
}
The core problem is that error codes are advisory. The compiler does not require the caller to inspect them. Even with [[nodiscard]], a stray cast to void silences the warning, and nothing at the type level forces a decision. In large codebases, audits of C-style error-code APIs routinely find a substantial share of call sites that never check the return value.
The secondary problem is information loss. An integer code cannot carry structured context (which file failed to parse, which value was invalid, what the underlying OS error actually said). Callers who do check the code often log a generic message and discard the detail, producing diagnostics that are useless during incidents.
std::expected solves both problems. The caller must explicitly access either the value or the error, so ignoring failure becomes a visible, reviewable act at the call site. Dereferencing an expected that holds an error through operator* is undefined behavior, which hardened library builds and sanitizers can often catch; the checked value() accessor throws bad_expected_access instead. The error type can carry structured diagnostics without side-channel logging.
The companion web-api project applies this approach throughout. In error.cppm, a single Result<T> alias makes the pattern consistent across the entire codebase:
// examples/web-api/src/modules/error.cppm
template <typename T>
using Result = std::expected<T, Error>;

[[nodiscard]] inline std::unexpected<Error>
make_error(ErrorCode code, std::string detail) {
    return std::unexpected<Error>{Error{code, std::move(detail)}};
}
Error carries a typed ErrorCode enum and a human-readable detail string, structured enough for programmatic branching and descriptive enough for diagnostics. No integer codes are exposed to callers; every failure path goes through Result<T>.
Exceptions are good for unwinding and local clarity
Exceptions remain valuable in C++ because stack unwinding composes naturally with RAII. When a constructor fails halfway through a resource-owning object graph, exceptions let the language drive destruction without hand-written cleanup ladders. When a local implementation has several nested helper calls and any of them can fail in the same way, exceptions can keep the main path readable.
That does not make exceptions a universal boundary model.
Their strengths are real:
- they separate ordinary flow from failure flow,
- they preserve concise code through multiple call layers,
- and they pair well with RAII because cleanup stays automatic.
Their weaknesses are also real:
- they hide failure from the signature,
- they can cross boundaries that were never designed for them,
- and they invite sloppy layering when low-level exception types leak into high-level code.
The right conclusion is modest. Exceptions are often a good internal mechanism inside a layer. They are usually a poor language for broad subsystem boundaries unless the codebase has committed to that model consistently and can enforce it.
std::expected is strong at decision boundaries
std::expected<T, E> is not better than exceptions in the abstract. It is better when the caller is expected to make a visible decision based on the failure.
Parsing, validation, boundary translation, and request-level operations often fall into this category. The call site usually needs to branch, emit a structured rejection, choose retry behavior, or attach context. Returning an expected makes that decision point explicit.
Consider a configuration loader:
enum class ConfigErrorCode {
    file_not_found,
    parse_error,
    invalid_value,
};

struct ConfigError {
    ConfigErrorCode code;
    std::string message;
    std::string source;
};

auto load_service_config(std::filesystem::path path)
    -> std::expected<ServiceConfig, ConfigError>;
This contract tells the reader something important. Failure is part of normal control flow at this boundary. The caller must decide whether to abort startup, fall back to a default environment, or report a clear diagnostic. That is different from a deep internal helper whose only sensible failure policy is to unwind to the boundary that can actually choose.
Compare the expected-based loader with a traditional output-parameter-plus-bool approach to see how much information the old style loses:
// Old style: bool return, output parameter, no structured error.
bool load_service_config(const std::filesystem::path& path,
                         ServiceConfig* out,
                         std::string* error_msg = nullptr);

void startup() {
    ServiceConfig cfg;
    std::string err;
    if (!load_service_config("/etc/app/config.yaml", &cfg, &err)) {
        // What kind of failure? File missing? Parse error? Permission denied?
        // err is a free-form string -- no programmatic branching possible.
        LOG_ERROR("config load failed: {}", err);
        std::exit(1); // only option: cannot distinguish retriable from fatal
    }
}
With std::expected<ServiceConfig, ConfigError>, the caller can branch on ConfigErrorCode::file_not_found versus ConfigErrorCode::parse_error, choose different recovery strategies, and still access a human-readable message for logging. The type system carries the decision-relevant information rather than burying it in a string.
The danger with expected is over-propagation. If every tiny helper returns expected merely because a public boundary does, the implementation can become littered with repetitive forwarding logic that obscures the main algorithm. Keep expected where the error belongs in the design. Do not force it through every private function unless that really improves local clarity.
Anti-pattern: side-effect errors with no boundary policy
One of the most common production failures is logging plus partial status plus occasional throwing, all in the same subsystem.
// Anti-pattern: side effects and transport are mixed.
bool refresh_profile(Cache& cache, DbClient& db, UserId user_id) {
    try {
        auto row = db.fetch_profile(user_id);
        if (!row) {
            LOG_ERROR("profile not found for {}", user_id);
            return false;
        }
        cache.put(user_id, to_profile(*row));
        return true;
    } catch (const DbTimeout& e) {
        LOG_WARNING("db timeout: {}", e.what());
        throw; // RISK: some failures logged here, some rethrown, signature hides both
    }
}
This function is expensive to use because the caller does not know what false means, which failures were already logged, or whether it still needs to add context. If several layers follow this pattern, incidents become noisy and under-explained at the same time.
Boundary code should choose one transport and one logging policy. Either the function returns a structured failure and leaves logging to a higher layer that can attach request context, or it handles the failure completely and makes that clear in the contract. Mixing both is how duplicate logs and missing decisions enter a system.
Anti-pattern: silent failures from unchecked returns
A subtler variant of the side-effect problem is code that converts failures into default values without any signal to the caller.
// Anti-pattern: failure becomes a silent default.
int get_retry_limit(const Config& cfg) {
    auto val = cfg.get_int("retry_limit");
    if (!val) {
        return 3; // silent fallback -- no log, no metric, no trace
    }
    return *val;
}
This is seductive because the code never crashes. But when the configuration file has a typo (retry_limt instead of retry_limit), the system silently uses a hardcoded default. During an incident, operators change the configuration expecting behavior to update, and nothing happens. The bug is invisible precisely because the error was swallowed.
The better approach makes the default explicit and the fallback observable:
auto get_retry_limit(const Config& cfg) -> std::uint32_t {
    constexpr std::uint32_t default_limit = 3;
    auto val = cfg.get_uint("retry_limit");
    if (!val) {
        LOG_INFO("retry_limit not configured, using default={}", default_limit);
        return default_limit;
    }
    return *val;
}
Or, if the caller should decide whether a missing value is acceptable, return the expected or optional directly and let the boundary make the policy choice.
Translate near volatile dependencies
Boundary translation is where most error design work should happen.
A storage adapter may receive driver exceptions, status codes, retry hints, or platform errors. The rest of the system usually does not want those details directly. It wants decision-relevant categories and maybe enough attached context for diagnostics.
That means translation should happen close to the unstable dependency, not in business logic three layers away.
auto AccountRepository::load(AccountId id)
    -> std::expected<AccountSnapshot, AccountLoadError>
{
    try {
        auto row = client_.fetch_account(id);
        if (!row) {
            return std::unexpected(AccountLoadError::not_found(id));
        }
        return to_snapshot(*row);
    } catch (const DbTimeout& e) {
        return std::unexpected(AccountLoadError::temporarily_unavailable(
            id, e.what()));
    } catch (const DbProtocolError& e) {
        return std::unexpected(AccountLoadError::backend_fault(
            id, e.what()));
    }
}
This does not erase useful information. It packages it in a form the caller can act on. Business logic can now distinguish not-found from temporary unavailability without learning the storage client’s failure taxonomy.
The same rule applies to network boundaries, filesystem boundaries, and third-party libraries. Translate once near the edge. Do not let raw backend errors leak until every layer has to understand them.
The companion web-api project shows the same pattern at the HTTP boundary. In handlers.cppm, result_to_response() translates a domain Result<T> into an HTTP response exactly once, at the edge:
// examples/web-api/src/modules/handlers.cppm
template <json::JsonSerializable T>
[[nodiscard]] http::Response
result_to_response(const Result<T>& result, int success_status = 200) {
    if (result) {
        return {.status = success_status, .body = result->to_json()};
    }
    return http::Response::error(result.error().http_status(),
                                 result.error().to_json());
}
Domain logic works exclusively with Result<T>. The translation to HTTP status codes and JSON error bodies happens in a single function at the handler boundary. No domain code imports HTTP concepts, and no handler code inspects error internals beyond calling the translation.
Constructors, destructors, and startup need different rules
Error policy should respect lifecycle context.
Constructors are often a good place to use exceptions because partial construction plus RAII is one of C++’s strongest combinations. A resource-owning object that cannot be made valid should usually refuse to exist. Returning a half-initialized object plus status is rarely an improvement.
Destructors are the opposite. Throwing across destruction is usually catastrophic or forbidden by the design. If cleanup can fail meaningfully, the type may need an explicit close, flush, or commit operation that reports failure while the object is still in a controlled state. The destructor then becomes best-effort cleanup or final safety net.
Startup is its own case. At process startup, configuration loading, dependency initialization, and binding to ports often have only one sensible failure policy: produce a clear diagnostic and fail the process. That is not the same as saying every startup helper should call std::exit. It means the top-level startup boundary should own the decision and the lower layers should return enough structured information to make that failure obvious and precise.
Diagnostics must be rich without being contagious
Good error handling preserves context. Bad error handling spreads context-building code into every branch until the main behavior disappears.
Useful failure information often includes:
- a stable category or code,
- a human-readable message,
- key identifiers such as file path, tenant, shard, or request id,
- and sometimes captured backend detail or stacktrace data when it materially helps debugging.
The trick is to keep the error object meaningful without letting it become a dump of every internal detail. A domain-facing error type should expose what callers need to decide and what operators need to diagnose, not every low-level exception string encountered on the way.
This is one reason named error types matter. expected<T, std::string> is quick to write and weak as a system design. Strings are good final diagnostics and poor architectural contracts.
Where to log
The cleanest default is to log at boundaries that have enough context to make the event operationally useful.
That usually means request boundaries, background job supervisors, startup entry points, and outer retry loops. It usually does not mean every helper that notices failure. Logging too early strips context. Logging at every layer duplicates noise. Logging nowhere until the process dies loses evidence.
The core rule is simple: the layer that decides what the failure means operationally is usually the right place to log it.
That rule works well with expected-style boundaries and with exception translation. Lower layers preserve information. Boundary layers classify, add context, decide recovery, and emit the event once.
Contract violations are not just another error path
Some failures indicate that the program received bad input. Others indicate that the program broke its own assumptions.
If an invariant that should have been enforced earlier is now false, or a supposedly unreachable state is reached, pretending this is just another recoverable business error often hides a deeper bug. That does not always require immediate process termination, but it does require different treatment from routine validation failure.
A good codebase makes these distinctions explicit. Input failure is modeled as input failure. Backend unavailability is modeled as environmental failure. Internal invariant breakage is surfaced as a bug, not normalized into an ordinary “operation failed” code path.
Verification and review
Failure handling should be reviewed as a system property, not function by function in isolation.
Useful review questions:
- Which failures are expected and decision-relevant at this boundary?
- Are exceptions being used internally for clarity, or leaking unpredictably across layers?
- Is expected carrying real decision information, or merely replacing exceptions with boilerplate?
- Where is backend-specific failure translated into stable categories?
- Is logging happening once, at the layer with enough context to make it useful?
Testing should include unhappy paths deliberately. Parse invalid input. Simulate timeouts and not-found cases. Verify translation from backend failures into domain-facing failures. Exercise startup failure paths and explicit close or commit operations. A codebase that only tests happy-path behavior will eventually discover its actual error model in production.
Takeaways
- Choose error transport by layer and boundary, not by ideology.
- Use exceptions where unwinding and local clarity help, especially inside layers and during construction.
- Use std::expected where callers must make explicit decisions based on failure.
- Translate unstable backend errors near the dependency boundary into stable, decision-relevant categories.
- Log at the layer that understands the operational meaning of the failure.
If callers cannot tell what failed, whether it was already logged, and what they are expected to do next, the failure boundary is badly shaped. That is a design flaw long before it becomes an outage.
Parameter passing, return types, and API surface
In C++, a function signature says more than many authors intend. It says whether the callee borrows or retains data. It often implies whether null is meaningful, whether mutation is allowed, whether a copy may happen, and whether failure is ordinary control flow. If the signature gets these semantics wrong, the implementation can still be locally correct while the API remains expensive to use and hard to review.
This chapter is about that semantic surface area. The goal is not to memorize “always pass X by Y.” The goal is to choose parameter and return forms that tell the truth about ownership, lifetime, mutation, nullability, and cost at the boundary where a caller must make decisions.
That boundary focus keeps this chapter distinct from the ones before and after it. Chapter 1 dealt with who owns resources. Chapter 2 dealt with what kinds of objects exist in the model. Chapter 3 dealt with how failure should cross boundaries. Here the question is narrower and more practical: given those design choices, what should a function signature look like so the call contract is obvious?
Signatures are contracts, not type-checking rituals
Many bad C++ APIs come from treating the signature as the smallest set of types that compiles. That approach ignores the fact that parameter and return choices are documentation with teeth.
Take a parser boundary.
auto parse_frame(std::span<const std::byte> bytes)
-> std::expected<Frame, ParseError>;
This single line communicates several things at once: the function borrows contiguous read-only bytes, does not require ownership of the source buffer, produces a frame as an owned value, and makes failure explicit.
Compare that with Frame parse_frame(const std::vector<std::byte>&);. That version implies a container choice the parser does not need, hides failure policy, and says nothing about whether the returned Frame contains borrowed views into the input or independent owned data.
The example project’s HTTP parser follows the same pattern. In examples/web-api/src/modules/http.cppm, parse_request borrows its input and returns an owned result:
[[nodiscard]] inline std::optional<Request>
parse_request(std::string_view raw);
The function accepts a string_view into a stack buffer, parses method, path, headers, and body, and returns a fully owned Request with std::string members. The caller’s buffer can be reused or destroyed immediately after the call returns. That borrow-in, own-out contract is visible in the signature alone.
The difference is not stylistic polish. It is whether the call site can reason about the contract without opening the implementation.
Borrowing parameters should look borrowed
If a function reads caller-owned data during the call and does not retain it, the signature should express borrowing directly.
For text, std::string_view is usually the right parameter type when null termination is irrelevant and no ownership transfer occurs. For contiguous binary or element sequences, std::span<const T> is usually the right read-only form. For mutable borrowed access, std::span<T> or a non-const reference may be appropriate depending on whether the abstraction is sequence-shaped or object-shaped.
This has two advantages.
- Call sites remain flexible. They can pass strings, slices, arrays, vectors, and mapped buffers without forced allocation or container conversion.
- The contract is honest. Borrowing stays borrowing.
The main misuse is allowing borrowed parameters to leak into retained state. A function that accepts string_view and then caches it beyond the call is not clever; it is lying about the contract.
Dangling borrows: the cost of getting this wrong
When a borrowed parameter outlives its source, the result is undefined behavior that often manifests as intermittent corruption rather than a clean crash:
class Logger {
public:
void set_prefix(std::string_view prefix) {
prefix_ = prefix; // BUG: stores a view, not a copy
}
void log(std::string_view message) {
fmt::print("[{}] {}\n", prefix_, message); // reads dangling view
}
private:
std::string_view prefix_; // non-owning -- lifetime depends on caller
};
void configure_logger(Logger& logger) {
std::string name = build_service_name();
logger.set_prefix(name); // name is destroyed at end of scope
} // name destroyed here -- logger.prefix_ is now dangling
The fix is straightforward: if the member must outlive the call, it must own its data.
class Logger {
public:
void set_prefix(std::string prefix) { // takes ownership by value
prefix_ = std::move(prefix);
}
// ...
private:
std::string prefix_; // owning -- no lifetime dependency on caller
};
That is why a useful review heuristic is simple: if the parameter type says borrow, all retention must be visible in the implementation as an explicit copy or transformation to an owning type.
Pass by value when the callee needs its own copy anyway
One of the most useful modern C++ patterns is passing by value when the callee intends to store or otherwise own the argument. This often surprises people trained to avoid copies at all costs.
Consider a request object that stores a tenant name.
class RequestContext {
public:
explicit RequestContext(std::string tenant)
: tenant_(std::move(tenant)) {}
private:
std::string tenant_;
};
This constructor is often better than both const std::string& and std::string_view.
- It is honest that the object will own a string.
- Lvalue callers pay one copy, which was unavoidable anyway.
- Rvalue callers can move directly.
- There is no temptation to retain a borrowed view accidentally.
The rule is not “always pass expensive types by const reference.” The rule is “pass by value when ownership transfer into the callee is the intended contract and the extra move/copy story is acceptable.”
Wrong parameter choices and their costs
The cost of getting parameter passing wrong is not always dramatic, but it compounds across hot paths and large objects.
An unnecessary copy from const std::string& when ownership is needed:
class Registry {
public:
void register_name(const std::string& name) {
names_.push_back(name); // always copies, even if caller passed a temporary
}
private:
std::vector<std::string> names_;
};
// Caller:
registry.register_name(build_name()); // builds a temporary string, copies it,
// then destroys the temporary. The move
// that pass-by-value would have enabled
// is lost.
With pass-by-value-and-move, the temporary is moved directly into the container at zero copy cost:
void register_name(std::string name) {
names_.push_back(std::move(name)); // rvalue callers: moves only, no copy. lvalue callers: one copy, then moves.
}
The example project’s error module applies the same pattern. In examples/web-api/src/modules/error.cppm, make_error takes a std::string by value and moves it into the error object:
[[nodiscard]] inline std::unexpected<Error>
make_error(ErrorCode code, std::string detail) {
return std::unexpected<Error>{Error{code, std::move(detail)}};
}
Callers passing a string literal or a temporary pay zero copy cost; callers passing an lvalue pay one copy that the function would have needed anyway. The signature honestly communicates that detail will be owned by the resulting error.
A forced allocation from const std::vector<T>& when std::span suffices:
// Anti-pattern: forces callers to allocate a vector even if data is in an array or span.
double average(const std::vector<double>& values);
// Caller with a C array or std::array must construct a vector just to call this:
std::array<double, 4> readings = {1.0, 2.0, 3.0, 4.0};
auto avg = average(std::vector<double>(readings.begin(), readings.end())); // pointless heap allocation
With std::span<const double>, the function accepts any contiguous source without forcing a container choice:
double average(std::span<const double> values);
// Now works with vector, array, C array, span -- no allocation required.
auto avg = average(readings);
The tradeoff is that pass-by-value can be wrong for polymorphic types, very large aggregates where copying lvalues is rarely desired, or APIs where retention is conditional and uncommon. As always, the semantic contract comes first.
Non-const reference means more than mutability
A non-const reference parameter is strong syntax. It says the caller must provide a live object, null is not meaningful, and the callee may mutate that exact object. This is sometimes the right contract. It is also overused.
Use a non-const reference when the mutation is central to the operation and callers should see it as the main point of the call. Sorting a vector in place, filling a provided output buffer, or advancing a parser state object may fit.
Do not use non-const references merely to avoid returning a value or because out-parameters feel familiar from C APIs. Out-parameters weaken readability when the result is conceptually the output of the function rather than an object the caller is deliberately handing over for mutation.
In modern C++, returning a value is usually clearer for primary results. Reserve non-const reference parameters for genuine in-place mutation or multi-object coordination where mutation is the real contract.
Raw pointers are mostly for optionality and interop
Raw pointers still have legitimate roles in interfaces. The cleanest modern use is to represent an optional borrowed object or to interoperate with lower-level APIs.
That is a narrower role than many codebases give them.
A T* parameter should usually mean one of two things:
- The callee may receive no object at all.
- The interface is crossing into pointer-based interop or low-level data structures where pointer identity itself matters.
If null is not meaningful, a reference is usually clearer. If ownership is being transferred, std::unique_ptr<T> or another owning type is clearer. If the object is an array or contiguous sequence, std::span<T> is usually clearer. A naked pointer that means “non-null borrowed maybe-single maybe-many maybe-retained” is semantic debt.
The same principle applies to return types. Returning a raw owning pointer from ordinary modern C++ APIs is almost always the wrong signal. Returning a raw observer pointer can be fine when absence is meaningful and lifetime is controlled elsewhere.
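A hypothetical ConfigStore sketches the legitimate observer-pointer case: absence is meaningful, and the pointee's lifetime is owned by the store rather than transferred to the caller.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical sketch: T* return as "optional borrowed object". Null means
// "no override present"; the map owns the strings, so the returned pointer
// is valid as long as the store (and the entry) lives.
class ConfigStore {
public:
    void set(std::string key, std::string value) {
        entries_[std::move(key)] = std::move(value);
    }

    [[nodiscard]] const std::string* find_override(std::string_view key) const {
        auto it = entries_.find(std::string(key));
        return it == entries_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, std::string> entries_;
};
```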
Return owned meaning, not storage accidents
Return types deserve the same discipline as parameters. The main question is whether the caller should receive owned meaning, borrowed access, or a decision-bearing wrapper such as expected or optional.
For many APIs, returning an owned value is the cleanest design even when it involves a move. This keeps lifetime local, makes composition easier, and avoids callers depending on internal storage. Move semantics and guaranteed copy elision make value returns cheap enough in many cases that the clarity win dominates.
Borrowed return types are appropriate only when the source lifetime is obvious, stable, and truly part of the contract. Returning std::string_view into internal storage is fine only when the storage clearly outlives the view and callers can use that fact safely. Across broad boundaries, this is often a poor trade because it exports lifetime reasoning the callee could have kept private.
Optionality and failure should also be explicit in the return type rather than smuggled through sentinel values. A search returning “maybe found” fits std::optional<T> or an observer pointer if lifetime semantics require it. A parse or load operation whose failure is decision-relevant fits std::expected<T, E>. A function that returns an empty string or -1 on failure is usually making the API weaker than the implementation needs it to be.
Anti-pattern: one signature, several hidden stories
This kind of API survives in many codebases because it seems flexible.
// Anti-pattern: signature hides ownership, failure, and buffer contract.
bool encode_record(const Record& record,
std::vector<std::byte>& output,
std::string* error_message = nullptr);
This one function now carries several hidden rules.
- Does it append to output or overwrite it?
- Is error_message optional because diagnostics are not important, or because logging happens elsewhere?
- Can output be partially modified on failure?
- Is false validation failure, encoding bug, capacity issue, or internal exception translation?
Nothing in the signature answers those questions cleanly.
A stronger API usually splits the semantics.
auto encode_record(const Record& record)
-> std::expected<std::vector<std::byte>, EncodeError>;
auto append_encoded_record(const Record& record,
ByteAppender& output)
-> std::expected<void, EncodeError>;
Now the caller chooses between owned result production and append-style mutation, and the failure contract is explicit. The two different operations no longer pretend to be one generic “flexible” interface.
Factories and acquisition functions must state ownership up front
Creation functions are where unclear ownership becomes especially expensive. A factory returning T* leaves callers asking who deletes it. A factory writing into an out-parameter plus bool return often hides partial-construction rules. A factory returning shared_ptr<T> by default may impose shared ownership long before the design proved it necessary.
For ordinary exclusive ownership, std::unique_ptr<T> is usually the clearest result. For value-like created objects, return the value directly or use expected<T, E> when failure belongs at the boundary. For shared ownership, return shared_ptr<T> only when the product being created is genuinely intended for shared lifetime.
The difference is concrete:
// Anti-pattern: raw pointer factory -- caller does not know who owns the result.
Widget* create_widget(const WidgetConfig& cfg);
void setup() {
auto* w = create_widget(cfg);
// Does the caller own w? Does a global registry own it?
// Must the caller call delete? delete[]? A custom deallocator?
// Nothing in the signature answers these questions.
use(w);
// If the caller guesses wrong, the result is a leak or a double-free.
}
// Clear: unique_ptr states exclusive caller ownership unambiguously.
auto create_widget(const WidgetConfig& cfg)
-> std::expected<std::unique_ptr<Widget>, WidgetError>;
void setup() {
auto result = create_widget(cfg);
if (!result) { /* handle error */ }
auto widget = std::move(*result); // ownership transferred, no ambiguity
// widget is destroyed automatically when it leaves scope
}
The example project shows a value-oriented variant of this pattern. In examples/web-api/src/modules/task.cppm, Task::validate is a factory-style function that takes a Task by value and returns Result<Task> (an alias for std::expected<Task, Error>):
[[nodiscard]] static Result<Task> validate(Task t) {
if (t.title.empty()) {
return make_error(ErrorCode::bad_request, "title must not be empty");
}
return t;
}
And in examples/web-api/src/modules/repository.cppm, TaskRepository::create composes with it — accepting a Task by value, validating, assigning an ID, and returning the stored result or the validation error:
[[nodiscard]] Result<Task> create(Task task);
Neither function uses out-parameters or bool return codes. The ownership story is the same as the unique_ptr factory above, adapted for value types: the caller moves a value in, gets back either a valid owned result or an explicit error.
The important point is not the specific vocabulary type. It is that creation boundaries are where ownership should become unmistakable.
API surface is also cost surface
Signature choices influence cost in ways that matter to callers.
A std::function parameter may allocate and type-erase even when the callback is used only synchronously. A std::span<const T> avoids forcing callers into a container representation. A std::string by-value sink constructor may permit efficient moves from temporaries. Returning an owned vector may allocate once but eliminate a long-lived lifetime hazard. These are design tradeoffs, not micro-optimizations.
The right discipline is to expose costs the caller should know about and avoid accidental ones the caller cannot infer. A good signature does not promise zero cost. It makes the important costs unsurprising.
That is also why broad “convenience” overload sets can become harmful. When an API accepts every combination of pointer, string, span, vector, and view, the overload surface can become harder to reason about than the original problem. Prefer a small number of semantically crisp forms.
Verification and review
Function signatures are one of the cheapest places to catch design mistakes early.
Useful review questions:
- Does each parameter truthfully communicate borrow, ownership transfer, mutation, or optionality?
- Is pass-by-value being used where the callee needs ownership, rather than out of habit or dogma?
- Are raw pointers reserved for optional borrowed access or interop, rather than vague contracts?
- Does the return type express owned result, borrowed access, or explicit failure clearly?
- Is the API exposing the important costs and hiding only incidental implementation details?
Tests should exercise signature-driven semantics, not only core behavior. Verify append versus overwrite behavior. Verify that returned views remain valid for the documented lifetime and no longer. Verify that failure leaves output parameters or state in the promised condition. A clear signature still needs evidence behind it.
Takeaways
- Treat signatures as semantic contracts, not just compiler-acceptable types.
- Use borrowing parameter types when the callee only inspects caller-owned data.
- Pass by value when the callee needs to take ownership and that contract should be obvious.
- Use references, pointers, and return wrappers to express mutation, optionality, and failure deliberately.
- Keep the API surface small enough that callers can understand lifetime and cost without reading implementation code.
Good C++ APIs do not merely compile. They tell the truth early enough that callers can use them correctly on the first read. That is what makes signatures worth this much attention.
Standard library types that change design
The standard library matters most when it stops being a utility layer and starts changing what your APIs are allowed to mean. In production C++, that shift happens with a small set of vocabulary types. They make borrowing explicit, separate absence from failure, represent closed sets of alternatives, and keep ownership from leaking through every function signature.
This chapter is not a tour of headers. The question is narrower and more useful: which standard types should change how you design ordinary code in a C++23 codebase, and where do those same types become misleading or expensive?
The pressure shows up at boundaries. A service parses bytes from the network, hands borrowed text into validation, constructs domain values, records partial failures, and emits results into storage or downstream services. If those steps are expressed with raw pointers, sentinel values, and container-heavy signatures, the code compiles but the contracts stay vague. Readers have to infer ownership, lifetime, nullability, and error meaning from implementation details. That is exactly the wrong place to hide them.
Borrowing types change API shape
std::string_view and std::span are the most important everyday design types in modern C++ because they separate access from ownership. That sounds small. It is not. Once a codebase adopts borrowing types consistently, function signatures stop implying allocations they do not need and stop pretending to own data they merely inspect.
Consider a telemetry ingestion layer that parses line-based text records and binary attribute blobs:
struct MetricRecord {
std::string name;
std::int64_t value;
std::vector<std::byte> attributes;
};
auto parse_metric_line(std::string_view line,
std::span<const std::byte> attribute_bytes)
-> std::expected<MetricRecord, ParseError>;
This signature says several important things immediately.
- The function borrows both inputs.
- The text input is not required to be null-terminated.
- The binary input is a contiguous read-only sequence.
- Ownership of the parsed result is transferred only in the return value.
- Failure is not the same thing as absence.
The older alternatives blur those statements. const std::string& suggests string ownership exists somewhere, even when callers are holding a slice into a larger buffer. const std::vector<std::byte>& excludes stack buffers, std::array, memory-mapped regions, and packet views for no good reason. const char* quietly reintroduces lifetime ambiguity and C-string assumptions.
To see the difference concretely, consider how the same boundary looked before borrowing types existed:
// Pre-C++17: raw pointer + length, no type safety on the binary side
auto parse_metric_line(const char* line, std::size_t line_len,
const unsigned char* attr_bytes, std::size_t attr_len,
MetricRecord* out_record) -> int; // 0 = success, -1 = error
// Or the "safe" version that forces callers into specific containers
auto parse_metric_line(const std::string& line,
const std::vector<unsigned char>& attribute_bytes)
-> MetricRecord; // throws on failure, no way to distinguish absence from error
The pointer-and-length version has no type system support for contiguity, read-only access, or even the fact that the binary buffer represents bytes rather than characters. Every caller must manually track two raw values per parameter, and a transposition bug (passing attr_len where line_len was expected) compiles silently. The container-reference version forces every caller to allocate a std::string and a std::vector even when the data already lives in a memory-mapped file or a stack buffer. Neither version communicates the ownership contract through the type system.
Borrowing types do impose discipline. A std::string_view is safe only while its source remains alive and unchanged. A std::span is safe only while the referenced storage remains valid. That is not a weakness of the types. It is the point. They force the boundary to say, in the type system, that this is a borrowing relationship and not an ownership transfer.
The failure mode is storing the borrow when the lifetime guarantee was local.
class RequestContext {
public:
void set_tenant(std::string_view tenant) {
tenant_ = tenant; // BUG: borrowed view may outlive caller storage
}
private:
std::string_view tenant_;
};
This is not a reason to avoid std::string_view. It is a reason to use it only for parameters, local algorithm plumbing, and return types whose lifetime contract is obvious and reviewable. If the object needs to keep the data, store std::string. If a subsystem needs stable binary ownership, store a container or dedicated buffer type.
The example project demonstrates this borrow-then-own transition cleanly. In examples/web-api/src/modules/task.cppm, Task::from_json accepts a string_view to borrow the raw JSON body, but returns an optional<Task> whose std::string members own their data independently:
[[nodiscard]] static std::optional<Task> from_json(std::string_view sv);
The function extracts field values from the borrowed input, moves them into owned strings, and returns a fully self-contained Task. The caller’s buffer can be reused or destroyed immediately. This is the pattern described above: borrow for inspection, own for storage.
In practice, a good review question is simple: does this object merely inspect caller-owned data, or does it need to retain it? If the latter, a borrowing type in storage is already suspicious.
optional, expected, and variant solve different problems
Production code gets expensive when teams use one vocabulary type as a universal answer. std::optional, std::expected, and std::variant all model different semantics. Choosing among them is a design decision, not a style preference.
Use std::optional<T> when the absence of a value is ordinary and not itself an error. A cache lookup may miss. A configuration override may be unset. An HTTP request may or may not include an idempotency key. If callers are expected to branch on presence without explanation, optional is the right signal.
Use std::expected<T, E> when failure information matters to control flow, logging, or user-visible behavior. Parsing, validation, protocol negotiation, and boundary I/O usually belong here. Returning optional from those operations throws away the reason the work failed and forces side channels for diagnostics.
Use std::variant<A, B, ...> when the result is one of several valid domain states, not a success-or-failure pair. A messaging system might model a command as one of several packet shapes. A scheduler might represent work as std::variant<TimerTask, IoTask, ShutdownTask>. That is not failure; it is an explicit closed set.
The mistake is treating these types as interchangeable wrappers around uncertainty.
- optional is for maybe-there.
- expected is for success-or-explanation.
- variant is for one-of-several-valid-forms.
The example project illustrates this distinction concretely. In examples/web-api/src/modules/repository.cppm, a lookup that may simply miss uses optional:
[[nodiscard]] std::optional<Task> find_by_id(TaskId id) const;
There is no error to report — the task either exists or it does not. Returning expected here would force callers to inspect an error they cannot act on. optional is the right signal for ordinary absence.
Once you phrase them that way, many API debates end quickly.
What designs looked like before these types
Before std::optional, the standard idiom for “maybe a value” was a sentinel or an out-parameter:
// Sentinel: -1 means "not found." Every caller must know the convention.
int find_port(const Config& cfg); // returns -1 if unset
// Out-parameter: success indicated by bool return, value written through pointer.
bool find_port(const Config& cfg, int* out_port);
// Nullable pointer: caller must check for null, and ownership is ambiguous.
const Config* find_override(std::string_view key); // null means absent... or error?
Every one of these forces callers to remember an informal protocol. Sentinels like -1 or nullptr are invisible in the type system; nothing prevents a caller from using the sentinel value in arithmetic. Out-parameters invert the data flow and make chaining awkward. With std::optional<int>, the type itself carries the “maybe absent” semantics and the compiler helps enforce the check.
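For contrast, the same lookup in the modern form might look like this (the Config shape here is hypothetical):

```cpp
#include <cassert>
#include <optional>

struct Config {
    std::optional<int> port_override;  // hypothetical configuration shape
};

// Absence lives in the type, not in a -1 convention; callers cannot use
// the "missing" state in arithmetic by accident.
std::optional<int> find_port(const Config& cfg) {
    return cfg.port_override;  // empty means "unset", unambiguously
}

int effective_port(const Config& cfg) {
    return find_port(cfg).value_or(8080);  // explicit default, no sentinel
}
```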
Before std::variant, closed sets of alternatives were modeled with union, an enum discriminator, and manual discipline:
// C-style tagged union: no automatic destruction, no compiler-checked exhaustiveness
enum class ValueKind { Integer, Float, String };
struct Value {
ValueKind kind;
union {
std::int64_t as_int;
double as_float;
char as_string[64]; // fixed buffer, truncation risk
};
};
void process(const Value& v) {
switch (v.kind) {
case ValueKind::Integer: /* ... */ break;
case ValueKind::Float: /* ... */ break;
// Forgot String? Compiles fine. UB at runtime if String arrives.
}
}
The union holds the data but the language provides no guarantee that kind and the active member stay in sync. Adding a new alternative requires updating every switch site manually, and the compiler is not required to warn about missing cases. std::variant makes the active alternative part of the object’s runtime state, destroys the previous value correctly on reassignment, and std::visit rejects at compile time any visitor that cannot handle every alternative, so a newly added form cannot be silently ignored.
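A sketch of the same closed set rebuilt on std::variant, with a single generic visitor the compiler checks against every alternative:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// The active alternative is tracked by the variant itself; reassignment
// destroys the previous value correctly, and std::string needs no fixed
// truncation-prone buffer.
using Value = std::variant<std::int64_t, double, std::string>;

std::string describe(const Value& v) {
    return std::visit([](const auto& x) -> std::string {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, std::int64_t>) {
            return "integer";
        } else if constexpr (std::is_same_v<T, double>) {
            return "float";
        } else {
            return "string";
        }
    }, v);
}
```

A visitor that fails to produce a valid call for one of the alternatives is a compile error, not runtime UB when that alternative finally arrives.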
Suppose a configuration loader may find no override, may parse a valid override, or may reject malformed input. Those are three semantically different outcomes. Cramming them into optional<Config> loses the reason malformed input was rejected. Returning expected<optional<Config>, ConfigError> may look slightly heavier, but it states the contract precisely: absence is normal, malformed input is failure.
The same precision matters across service boundaries. If an internal client library returns variant<Response, RetryAfter, Redirect>, callers can pattern-match on legitimate protocol outcomes. If it returns expected<Response, Error> instead, retry and redirect get misclassified as error paths even when they are part of the intended control flow.
The example project uses this approach at its domain boundary. In examples/web-api/src/modules/error.cppm, a project-wide alias makes the pattern consistent:
template <typename T>
using Result = std::expected<T, Error>;
Then in examples/web-api/src/modules/repository.cppm, creation that can fail with a meaningful reason returns Result<Task>:
[[nodiscard]] Result<Task> create(Task task);
If validation rejects the input, the caller receives an Error with a code and a human-readable detail string — not a bare false or an empty optional. The create_task handler in examples/web-api/src/modules/handlers.cppm translates that Result into an HTTP response at the boundary, without out-parameters or exception handling:
auto result = repo.create(std::move(*parsed));
return result_to_response(result, 201);
expected also changes exception strategy. In codebases that use exceptions sparingly or forbid them across certain boundaries, expected lets failure stay local and explicit without collapsing into status codes and out-parameters. But there is a real tradeoff: plumbing expected through every private helper can turn straight-line code into repetitive propagation logic. Keep it at boundaries where the error information matters. Inside a tightly scoped implementation, a local exception boundary or a smaller helper decomposition may still produce cleaner code.
Containers should not pretend to be contracts
One of the most persistent design mistakes in C++ code is using owning containers as parameter types when the function only needs a sequence. std::vector<T> in a signature is rarely a neutral choice. It says something about allocation strategy, contiguity, and caller representation. Sometimes that is intended. Often it is accidental.
If a function consumes a read-only sequence, accept std::span<const T>. If it needs a mutable view over contiguous caller storage, accept std::span<T>. If it needs ownership transfer, accept the owning type explicitly. If it needs a specific associative container because lookup complexity or key stability is part of the contract, say so directly.
That distinction is especially important in libraries. A compression library that exposes compress(const std::vector<std::byte>&) has silently told every caller how to store input buffers. A better boundary is almost always a borrowing range over bytes, often std::span<const std::byte>. The owning choice then stays with the caller: pooled buffer, memory-mapped file region, stack array, or vector.
The reciprocal mistake is returning views when the function is actually producing owned data. Returning std::span<const Header> from a parser that builds a local vector is wrong. Returning std::vector<Header> or an owning domain object is right. Borrowing types improve APIs when they describe reality. They make APIs worse when used to avoid a copy that the contract requires.
There is also a question of mutation. Passing a mutable container often exposes far more freedom than the algorithm needs. A function that only appends parsed records should not accept an entire mutable map if the real contract is output insertion. In those cases, consider a narrower abstraction: a callback sink, a dedicated appender type, or a constrained generic interface as discussed in the next chapter. Types should express what the callee is allowed to assume, not just what happens to compile.
A real boundary: parsing without contract drift
A native service that ingests protobuf-like frames from a socket often has three distinct layers:
- A transport layer that owns buffers and retries reads.
- A parser that borrows bytes and validates framing.
- A domain layer that owns normalized values.
The standard library types should reinforce those layers, not blur them.
The transport layer might expose owned frame storage because it must manage partial reads, capacity reuse, and backpressure. The parser should typically accept std::span<const std::byte> because it inspects caller-owned bytes and either produces a domain object or returns a parse error. The domain layer should return ordinary values, not spans into packet buffers, because business logic should not inherit transport lifetimes by accident.
That sounds obvious when written out. It becomes less obvious when a performance-minded refactor starts threading string_view and span deeper into the system “to avoid copies.” The copy is sometimes the cost of decoupling a stable domain object from a volatile transport buffer. Eliminating it may shift the cost into lifetime complexity, delayed parsing bugs, and review difficulty.
A useful rule is this: borrow at inspection boundaries, own at semantic boundaries. Parsing code often sits at the edge between them.
Where these types hurt
Vocabulary types improve code only when the semantics stay crisp.
std::string_view hurts when developers treat it as a cheap string substitute rather than a borrow. std::span hurts when the code really needs noncontiguous traversal or stable ownership. std::optional hurts when it erases why work failed. std::variant hurts when the set of alternatives is open-ended or frequently extended across modules. std::expected hurts when used deep inside implementation code that would be clearer with a local exception boundary or a simpler helper split.
Another common failure is stacking wrappers until the API stops speaking human. Types such as expected<optional<variant<...>>, Error> are occasionally correct, but they are never cheap for readers. If a contract requires that much decoding, a named domain type is usually overdue.
The point of vocabulary types is not maximal precision at any syntactic cost. The point is to make the dominant semantics obvious enough that a reviewer can reason about ownership, absence, and failure without reverse-engineering the implementation.
Verification and review
The verification burden here is mostly contractual.
- Review stored string_view and span members as potential lifetime bugs.
- Test parser and boundary APIs with short-lived buffers, sliced inputs, empty inputs, and malformed payloads.
- Check whether optional results are silently discarding errors that matter operationally.
- Audit container parameters for accidental ownership or representation commitments.
- Treat conversions from borrowed views to owned values as meaningful design points, not incidental implementation details.
Sanitizers help, especially when borrowed views cross asynchronous or deferred execution boundaries, but they do not replace API review. Many misuse patterns are logically wrong long before they become dynamically observable.
Takeaways
- Prefer borrowing types for inspection boundaries and owning types for storage boundaries.
- Use optional, expected, and variant for three different meanings: absence, failure, and closed alternatives.
- Do not let containers leak representation choices into APIs unless that representation is part of the contract.
- A removed copy is not automatically a win if it pushes lifetime complexity into unrelated layers.
- When a vocabulary type stops making the contract easier to read, introduce a named domain type instead of stacking wrappers.
Generic code with concepts and constraints
Generic code is valuable when a family resemblance in the problem is real. It is destructive when authors use templates to postpone interface decisions. The production problem is not “how do I make this reusable?” It is “how do I remove duplication without making the call contract, diagnostics, and failure behavior opaque?”
Concepts are the first C++ feature in a long time that directly improves this situation at the boundary where readers need help most. They do not make generic code automatically simple. They make it possible to state what a template expects in terms close to the actual design. That is a major shift from the era of “instantiate it and hope the compiler error points at the right line.”
This chapter focuses on constrained generic code that ordinary product teams can maintain: reusable transforms, narrow extension points, policy objects, and algorithm families whose assumptions must remain reviewable.
Start with the variation, not with the template
Most bad generic code starts from a false premise: “these functions look similar, so I should template them.” Similar surface syntax is not enough. The real question is which parts of the design are allowed to vary and which invariants must stay fixed.
Imagine an internal observability library that writes metric batches to different sinks: in-memory test collectors, local files, and a network exporter. The invariant parts are straightforward: a batch has a schema, timestamps must be monotonic within a flush, serialization failures must be reported, and shutdown must not lose acknowledged data. The variable part is where bytes go.
That points to a narrow generic seam. It does not justify templating the entire pipeline.
If you template everything from parsing through retry logic through transport mechanics, you are no longer writing reusable code. You are building a second language inside the codebase. Concepts help only if the variation boundary was already honest.
Constrain the boundary so the implementation can stay ordinary
The main practical use of concepts is not clever overload ranking. It is telling callers, and the compiler, what operations your algorithm is allowed to assume.
Consider a batching helper that writes already-serialized records to some sink:
template <typename Sink>
concept ByteSink = requires(Sink sink,
std::span<const std::byte> bytes) {
{ sink.write(bytes) } -> std::same_as<std::expected<void, WriteError>>;
{ sink.flush() } -> std::same_as<std::expected<void, WriteError>>;
};
template <ByteSink Sink>
auto flush_batch(Sink& sink,
std::span<const EncodedRecord> batch)
-> std::expected<void, WriteError>
{
for (const auto& record : batch) {
if (auto result = sink.write(record.bytes); !result) {
return std::unexpected(result.error());
}
}
return sink.flush();
}
Notice what this gets right:
- The concept names the role, not the implementation technique.
- The required operations are few and operationally meaningful.
- Failure behavior is part of the contract.
- The function body is ordinary code; nothing about the implementation has become more abstract than the problem demands.
The alternative is the classic unconstrained template:
template <typename Sink>
auto flush_batch(Sink& sink, const auto& batch) {
for (const auto& record : batch) {
sink.push(record.data(), record.size()); // RISK: hidden, undocumented assumptions
}
sink.commit();
}
This version looks shorter. It is worse in every production-relevant way. The assumptions are unstated. The error contract is unclear. The required record shape is accidental. A mismatch produces compiler noise at the use site rather than a crisp statement of the interface.
The error message problem, concretely
To appreciate what concepts actually fix, consider what happens when someone passes a wrong type to the unconstrained version:
struct BadSink {};
BadSink sink;
std::vector<EncodedRecord> batch = /* ... */;
flush_batch(sink, batch);
Without concepts, the compiler instantiates the template body and fails deep inside the implementation. A typical error from a major compiler looks something like:
error: 'class BadSink' has no member named 'push'
in instantiation of 'auto flush_batch(Sink&, const auto&) [with Sink = BadSink; ...]'
required from here
note: in expansion of 'sink.push(record.data(), record.size())'
error: 'class BadSink' has no member named 'commit'
note: in expansion of 'sink.commit()'
This is two errors for a simple case. In production, the template is rarely this shallow. The sink might be passed through three layers of adapters, each a template. The actual error appears at the bottom of a deep instantiation stack, and the programmer must mentally unwind the chain to figure out what went wrong. With heavily nested templates and standard library types involved, these diagnostics routinely span dozens of lines.
With the ByteSink concept, the same mistake produces a single, targeted error at the call site:
error: constraints not satisfied for 'auto flush_batch(Sink&, ...) [with Sink = BadSink]'
note: because 'BadSink' does not satisfy 'ByteSink'
note: because 'sink.write(bytes)' would be ill-formed
The error names the concept, names the unsatisfied requirement, and points at the call site rather than the implementation internals. The programmer knows immediately what interface BadSink needs to provide.
What this replaced: SFINAE
Before concepts, the standard technique for constraining templates was SFINAE (Substitution Failure Is Not An Error). The idea was to make the template signature itself ill-formed for wrong types, so the compiler would silently remove it from overload resolution rather than producing a hard error.
The equivalent of the ByteSink constraint in pre-C++20 code looked like this:
// SFINAE approach (using enable_if to constrain the same interface)
template <typename Sink,
std::enable_if_t<
std::is_same_v<
decltype(std::declval<Sink&>().write(
std::declval<std::span<const std::byte>>())),
std::expected<void, WriteError>
> &&
std::is_same_v<
decltype(std::declval<Sink&>().flush()),
std::expected<void, WriteError>
>,
int> = 0>
auto flush_batch(Sink& sink, std::span<const EncodedRecord> batch)
-> std::expected<void, WriteError>;
This is the same constraint expressed in a form that nobody wants to read. std::enable_if_t with decltype and std::declval is not describing a design intent; it is exploiting a compiler mechanism. The resulting error messages when SFINAE rejects the overload are typically just “no matching function for call to flush_batch” with no indication of which requirement was not met. When multiple SFINAE-guarded overloads exist, the compiler may list every candidate it rejected without explaining why any individual one failed. The concept version is better on all counts: easier to read, better error messages, and simpler to maintain.
Constrain the public surface aggressively so the implementation can remain boring. That is the right trade.
Concepts should describe semantics, not just syntax
A concept that merely checks for the existence of member names is better than nothing, but it can still be a weak design. Production generic code becomes maintainable when the concept corresponds to a semantic role in the system.
SortableRange is better than HasBeginEndAndLessThan. ByteSink is better than HasWriteAndFlush. RetryPolicy is better than CallableWithErrorAndAttemptCount. The more the concept reads like a design term the team already uses, the more useful it becomes in code review and diagnostics.
This matters because concepts serve two audiences.
- The compiler uses them to select and reject instantiations.
- Humans use them to understand what kind of thing the algorithm expects.
If the name and structure only satisfy the compiler, half the value is gone.
That does not mean concepts should try to prove every semantic law. Most useful invariants remain testable rather than statically enforceable. A RetryPolicy concept can require a call signature and result type. It cannot prove that the policy is idempotency-safe for a specific operation. Accept that limit. State what can be checked in the interface and verify the rest in tests and review.
Prefer narrow customization points over template sprawl
Many reusable components do not need a giant concept hierarchy. They need one or two carefully chosen extension points.
Suppose a storage subsystem wants to support several record types that can be serialized into a wire buffer. A common bad move is to define a primary template, invite specializations scattered across the codebase, and let argument-dependent lookup or implicit conversions decide what happens. This makes behavior hard to discover and easy to break.
The cleaner design is usually a narrow customization point with an explicit required signature. That can be a member function if the type owns the behavior, or a non-member operation if the type should stay decoupled from the serialization library. Either way, concepts should constrain the shape and result.
The companion project examples/web-api/ illustrates this pattern concretely. Its JsonSerializable concept requires exactly one operation — a to_json() member that returns something convertible to std::string:
// examples/web-api/src/modules/json.cppm
template <typename T>
concept JsonSerializable = requires(const T& t) {
{ t.to_json() } -> std::convertible_to<std::string>;
};
The concept is narrow by design: it names a single capability rather than bundling serialization, deserialization, and validation into one monolithic requirement. A type opts in simply by providing a conforming to_json() member — no registration, no base class, no scattered specializations. The result is a customization point where all three review questions have short, local answers.
Similarly, the project defines TaskUpdater to constrain callable parameters passed to the repository’s update method:
// examples/web-api/src/modules/repository.cppm
template <typename F>
concept TaskUpdater = std::invocable<F, Task&> &&
requires(F f, Task& t) {
{ f(t) } -> std::same_as<void>;
};
This prevents callers from passing arbitrary callables that return unexpected values or accept the wrong arguments. The concept documents the contract at the boundary rather than relying on template instantiation errors deep in the implementation.
The key is locality. A reviewer should be able to answer three questions quickly.
- What exactly may vary?
- What invariants remain fixed?
- Where does a new model type opt in?
If the answers span ten headers and depend on incidental overload resolution rules, the generic design is already too implicit.
Generic code is not a license to hide costs
Templates are notorious for hiding allocation, copying, and code-size growth behind pretty call sites. Concepts do not solve that. They only make the allowed shapes clearer.
When designing generic code, force yourself to state the operational costs that stay fixed across all models and the costs that vary by model. Does the algorithm require contiguous storage? Does it materialize intermediate buffers? Does a policy type get inlined into every translation unit? Does a concept accept both throwing and non-throwing operations, thereby smearing failure handling across the interface?
These are design questions, not optimization trivia. A template that looks abstract but only performs well for one category of types is often an unstable abstraction. Either narrow the concept or provide distinct overloads for materially different cost models.
This is especially important for headers used across large codebases. Every additional instantiation increases compile cost and potentially code size. If a component crosses shared-library or plugin boundaries, an ordinary virtual interface or type-erased callable may be the better trade, even if it gives up some inlining. Stable boundaries are often worth more than theoretical zero-overhead purity.
When ordinary overloads beat concepts
There is a persistent temptation to use concepts as a proof that a design is modern. Resist it. If you have three known input types and no evidence the set should grow, overloads are often clearer. If you need a runtime-polymorphic boundary, use one. If the variability matters only in tests, a function object or small mockable interface may be easier to maintain than a generic subsystem.
Concepts are strongest when all of the following are true:
- the algorithm genuinely applies to a family of types,
- the required operations can be stated narrowly,
- callers benefit from compile-time rejection,
- and the implementation cost model stays legible.
When those conditions fail, templates become a liability quickly.
Failure modes and boundary conditions
Generic code tends to fail in recurring ways.
Unconstrained seepage is common: one generic function accepts “anything range-like,” another expects “anything writable,” and soon the codebase has accidental compatibility between components that were never designed to work together. Concepts should narrow those seams, not widen them.
Constraint duplication is another recurring problem. Authors copy nearly identical requires clauses across helpers until the interface becomes impossible to evolve. Prefer named concepts for recurring requirements. A named concept is documentation, not just syntax compression.
Semantics drift happens when a concept originally built for a clean role gradually accumulates unrelated requirements because one more caller needed one more operation. When that happens, split the concept or split the algorithm. Do not let one abstraction become the dumping ground for vaguely related use cases.
Finally, watch for diagnostic theater: elaborate concept stacks that look principled but still produce unreadable compiler output. If users cannot tell why their type failed to model the concept, simplify. Good generic design includes failure messages people can act on.
Verification and review
Verification for generic code is not just “instantiate it once.”
- Add static_assert checks for representative positive and negative models when the concept is central enough to deserve stable examples. The companion project shows this directly — task.cppm verifies that the domain type satisfies its serialization contract at compile time:
// examples/web-api/src/modules/task.cppm
static_assert(json::JsonSerializable<Task>,
"Task must satisfy JsonSerializable");
static_assert(json::JsonDeserializable<Task>,
"Task must satisfy JsonDeserializable");
These assertions are living documentation: if someone changes Task in a way that breaks the JSON contract, the compiler rejects the build immediately rather than deferring the failure to a runtime test or, worse, to production.
- Test the algorithm with a small number of materially different model types, not a parade of cosmetic variants.
- Review whether the concept names a real role in the system or merely a bundle of operations.
- Measure compile time and code size if the generic component sits in a hot header path.
- Confirm that error behavior, allocation behavior, and ownership assumptions are visible at the constrained boundary.
Compile-time rejection is helpful only if the rejected program was actually wrong according to a design the team can explain. Otherwise you have built a sophisticated gate around an unclear interface.
Takeaways
- Write generic code only when the variation in the problem is real and durable.
- Use concepts to constrain public boundaries so implementations can remain ordinary and readable.
- Name concepts after semantic roles, not after incidental syntax.
- Prefer narrow customization points over open-ended specialization schemes.
- If compile-time polymorphism makes costs, diagnostics, or boundaries worse, use overloads or runtime abstraction instead.
Ranges, views, and generators
Ranges and generators are attractive because they compress iteration into something that reads like data flow. Sometimes that is exactly what a production codebase needs. Sometimes it is how lifetime bugs, hidden work, and impossible-to-debug lazy behavior enter otherwise plain code.
The useful question is not whether range pipelines are elegant. The useful question is when lazy composition makes the structure of the work clearer than an ordinary loop, and when it obscures ownership, error handling, or cost. C++23 gives you powerful range machinery and std::generator for pull-based sequences. Neither should become the default shape for all iteration.
The sample domains here are realistic ones: filtering logs before export, transforming rows in a batch job, and exposing paged or coroutine-backed sources as pull-based sequences. The design pressure is the same in each case. Work arrives over time. The code wants to express a sequence of transformations. The risks are delayed execution, borrowed state, and confusion about where the data actually lives.
Pipelines earn their place when the dataflow is the main story
An ordinary loop is still the right tool for many jobs. It keeps sequencing obvious, makes side effects explicit, and is easy to step through. A range pipeline earns its place when the essential structure of the work is “take a sequence, discard some elements, transform the survivors, then materialize or consume.”
Suppose a log-export worker receives a batch of parsed records and needs to ship only security-relevant entries after projecting them into an export schema. First, consider the pre-ranges approach using manual iterators:
// Pre-C++20: manual iterator loop with filter + transform
std::vector<ExportRow> export_rows;
for (auto it = records.begin(); it != records.end(); ++it) {
if (it->severity >= Severity::warning && !it->redacted) {
ExportRow row;
row.timestamp = it->timestamp;
row.service = it->service;
row.message = it->message;
export_rows.push_back(row);
}
}
This version works, but the filtering logic and the transformation logic are fused into a single loop body. The reader must parse the if to understand what is kept and parse the body to understand what is produced. In more complex cases, these loops accumulate nested conditions, early continue statements, index arithmetic, and manual bookkeeping that obscure the data flow. Off-by-one errors in index-based variants (for (size_t i = 0; i < records.size(); ++i)) are a persistent source of bugs, especially when the loop body mutates the container or uses the index for more than one purpose.
The range version separates the concerns structurally:
auto export_rows = records
| std::views::filter([](const LogRecord& r) {
return r.severity >= Severity::warning && !r.redacted;
})
| std::views::transform([](const LogRecord& r) {
return ExportRow{
.timestamp = r.timestamp,
.service = r.service,
.message = r.message,
};
})
| std::ranges::to<std::vector>();
This is good range code because the pipeline is the business logic. There is no tricky mutation, no lifetime ambiguity in the source, and a clear materialization point at the end. The intermediate storage never existed conceptually, so not allocating it improves both clarity and cost.
Now compare that with a loop that updates shared counters, emits metrics, mutates records in place, and conditionally retries downstream writes. A pipeline there often hides the part that matters most: sequencing and side effects. Range syntax is not a readability win when the computation is stateful and effect-heavy.
Another common pre-ranges pattern worth examining is the “erase-remove” idiom for in-place filtering:
// Pre-C++20 erase-remove idiom
records.erase(
std::remove_if(records.begin(), records.end(),
[](const LogRecord& r) {
return r.severity < Severity::warning || r.redacted;
}),
records.end());
This is correct but notoriously easy to get wrong. Forgetting the .end() argument to erase is a well-known bug that compiles but leaves the removed elements in the container. The logic is also inverted: you specify what to remove rather than what to keep, which is a common source of predicate errors. C++20 introduced std::erase_if to simplify this, and range pipelines avoid the problem entirely by producing a new view rather than mutating in place.
The companion project examples/web-api/ contains a small but representative example of this pattern. In repository.cppm, the find_completed method filters tasks by completion status using a views::filter pipeline:
// examples/web-api/src/modules/repository.cppm
[[nodiscard]] std::vector<Task> find_completed(bool completed) const {
std::shared_lock lock{mutex_};
auto view = tasks_
| std::views::filter([completed](const Task& t) {
return t.completed == completed;
});
return {view.begin(), view.end()};
}
The pipeline is the business logic — filter by a predicate, then materialize into an owning result before it crosses the API boundary. There is no lifetime ambiguity because the lock keeps the source alive for the duration and the result is an independent vector.
The rule is straightforward. Use range pipelines for linear dataflow over a sequence. Use loops when control flow, mutation, or operational steps are the story.
Views borrow, and laziness moves bugs later in time
The hardest production bug class with ranges is not algorithmic. It is lifetime. Views are often non-owning, and lazy pipelines delay work until iteration. That means the place where a view is constructed and the place where its bug becomes visible may be far apart.
Consider an anti-pattern that shows up in request processing code:
auto tenant_ids() {
    const auto tenants = load_tenants(); // local owning container
    return tenants
        | std::views::transform([](const Tenant& t) {
              return std::string_view{t.id};
          }); // BUG: returned view borrows from a local destroyed at return
}
The code looks tidy. It is wrong. The pipeline borrows from the local tenants container, which is destroyed when the function returns, so every string_view the caller iterates over dangles. (Piping the load_tenants() temporary straight into the adaptor would instead produce an owning std::ranges::owning_view; binding the result to a named local first is precisely what makes this the borrowing, dangling variant.) Returning the view returns a delayed lifetime bug.
This is the central design rule for views: the owner must outlive the view, and that fact must stay obvious to readers. If the lifetime relationship is subtle, the abstraction is already too clever.
There are several safe patterns.
- Build and consume the pipeline in one local scope where the owner is visibly alive.
- Materialize an owning result before crossing a boundary.
- Return an owning range or domain object when the caller cannot reasonably track the source lifetime.
- Use std::generator or another owning abstraction when the sequence is produced over time rather than borrowed from existing storage.
Borrowing is not a defect. Hidden borrowing is.
Do not export deep lazy pipelines as casual APIs
Internally, ranges are often excellent glue. At subsystem boundaries, they deserve more caution. Returning a deep view stack from a public API exports not just a sequence but a bundle of lifetime assumptions, iterator category behavior, evaluation timing, and sometimes surprising invalidation rules.
That is a lot of semantic surface for a caller who may only want “the filtered records.”
In a library or large service boundary, ask what the caller really needs.
- If they need an owned result, return one.
- If they need pull-based traversal over expensive or paged data, consider a generator.
- If they need customizable traversal with local control, expose a callback-based visitor or a dedicated iterator abstraction.
Returning a lazily composed view is strongest inside a local implementation region where one team owns both sides of the contract and the lifetime story is visually short.
The companion project shows two range-aware designs that stay within safe boundaries. First, json.cppm defines a function template that accepts any input_range whose elements satisfy JsonSerializable, combining range constraints with concepts:
// examples/web-api/src/modules/json.cppm
template <std::ranges::input_range R>
requires JsonSerializable<std::ranges::range_value_t<R>>
[[nodiscard]] std::string serialize_array(R&& range) {
std::string result = "[";
bool first = true;
for (const auto& item : range) {
if (!first) result += ',';
result += item.to_json();
first = false;
}
result += ']';
return result;
}
The function returns an owned std::string — no lazy view escapes the boundary. The range constraint ensures that only collections of serializable types are accepted, and the error message from a constraint violation names the concept rather than pointing at the loop body.
Second, middleware.cppm uses std::ranges::rbegin and std::ranges::rend to iterate a middleware collection in reverse, so that the first middleware in the list wraps outermost:
// examples/web-api/src/modules/middleware.cppm
template <std::ranges::input_range R>
requires std::same_as<std::ranges::range_value_t<R>, Middleware>
[[nodiscard]] http::Handler
chain(R&& middlewares, http::Handler base) {
http::Handler current = std::move(base);
for (auto it = std::ranges::rbegin(middlewares);
it != std::ranges::rend(middlewares); ++it)
{
current = apply(*it, std::move(current));
}
return current;
}
This is a good use of range utilities inside a local algorithm: the reverse iteration intent is expressed through rbegin/rend rather than index arithmetic, and the function produces an owned result.
The same caution applies to string_view and span inside pipelines. A transform that produces borrowed slices is fine if the source lifetime stays local and obvious. It is risky if those slices are smuggled across threads, queued for later work, or cached.
std::generator is for pull-based sources, not for replacing every loop
C++23’s std::generator is useful because some sequences are not naturally “stored then traversed.” They are produced incrementally: paged database scans, directory walks, chunked file reads, retry-aware polling, or protocol decoders that yield complete messages as bytes arrive.
This is where generators change design. They let the producer keep state between elements without forcing the caller into callback inversion or hand-written iterator machinery.
A batch job that reads pages from a remote API is a good example. Before generators, expressing an incremental page-fetching sequence required either callback inversion or a hand-written iterator class:
// Pre-C++23: hand-written iterator for paged results
class PagedResultIterator {
public:
using value_type = Row;
using difference_type = std::ptrdiff_t;
PagedResultIterator() = default; // sentinel
explicit PagedResultIterator(Client& client)
: client_(&client) { fetch_next_page(); }
const Row& operator*() const { return rows_[index_]; }
PagedResultIterator& operator++() {
if (++index_ >= rows_.size()) {
if (next_token_.empty()) { client_ = nullptr; return *this; }
fetch_next_page();
}
return *this;
}
bool operator==(const PagedResultIterator& other) const {
return client_ == other.client_;
}
private:
void fetch_next_page() {
auto page = client_->fetch(next_token_);
rows_ = std::move(page.rows);
next_token_ = std::move(page.next_token);
index_ = 0;
}
Client* client_ = nullptr;
std::vector<Row> rows_;
std::string next_token_;
std::size_t index_ = 0;
};
This is roughly thirty lines of boilerplate to express “fetch pages and yield rows.” The same logic with std::generator:
std::generator<Row> paged_rows(Client& client) {
std::string token;
do {
auto page = client.fetch(token);
for (auto& row : page.rows)
co_yield std::move(row);
token = std::move(page.next_token);
} while (!token.empty());
}
The generator version keeps all state (page token, current buffer, position) implicit in the coroutine frame. The control flow is obvious. The hand-written iterator version is error-prone: the sentinel comparison, the index bookkeeping, and the fetch-on-boundary logic are all manual and easy to get subtly wrong.
Materializing all rows before processing may waste memory and delay first useful work. A generator can express a sequence of yielded rows while keeping page tokens, buffers, and retry state local to the producer.
That said, generators are coroutine machinery. They come with suspension points, frame lifetime, and sometimes allocation cost depending on implementation and optimization. They are not free. They are also harder to debug than a local vector and a loop. Use them when incremental production is the real structure of the problem, not as a fashionable replacement for ordinary containers.
Another boundary question matters here. Does the generator yield owned values or borrowed references into internal buffers? Yielding borrowed references can be correct, but only when the lifetime across suspension points is explicit and easy to reason about. In many cases, yielding small owning values is the safer trade.
If your standard library support for std::generator is incomplete on a target toolchain, the same design guidance still applies to an equivalent coroutine-backed generator type. The question is structural, not vendor-specific.
Laziness helps until observability and error handling matter more
Lazy pipelines are appealing because they delay work until needed. That is often useful. It also means that instrumentation, exception or std::expected propagation, and failure attribution may happen later than readers expect.
In a log processing path, a pipeline that filters, parses, enriches, and serializes on demand may look elegant, but operationally it can smear a failure across the eventual consumer. When parsing fails, where does the error belong? When metrics should count discarded records, where is the increment performed? When tracing needs stage-level timing, what exactly is a stage in a fused lazy pipeline?
This is where materialization points matter. Breaking a long pipeline into named phases with owned intermediate results can make the system easier to observe and reason about, even if it costs a little memory. Not every temporary allocation is waste. Some are what let you attach metrics, isolate faults, and put the debugger in the right place.
Do not confuse laziness with efficiency. Sometimes fusing operations reduces work. Sometimes it blocks parallelization, complicates branch prediction, or simply makes the expensive part harder to see. Benchmark the whole path rather than assuming the pipeline form is faster.
Where ranges and generators stop being the right tool
Ranges are a poor fit when mutation is central, when control flow is irregular, when early exits carry important side effects, or when the algorithm is already dominated by external I/O or locking. Generators are a poor fit when a plain container result is cheaper and simpler, when the sequence must be revisited multiple times, or when coroutine lifetime across subsystems would be harder to reason about than a local buffer.
Another common failure is turning pipelines into performance theater. A chain of five adaptors over unstable borrowed state is not better engineering than a loop with three well-named variables. The winning design is the one whose ownership, cost, and failure behavior remain easy to explain.
Verification and review
Ranges and generators deserve specific review questions.
- What owns the data a view is traversing, and is that owner visibly alive for the full iteration?
- Where does lazy work actually execute, and is that timing acceptable for error handling and metrics?
- Is there a deliberate materialization point where ownership or observability should become explicit?
- Would an ordinary loop communicate control flow and side effects more clearly?
- Does a generator yield owned values or borrowed ones, and is that lifetime valid across suspension?
Dynamic tools matter here. AddressSanitizer and UndefinedBehaviorSanitizer are good at exposing view lifetime mistakes once exercised. Benchmarks help when pipelines are adopted for throughput claims. But review still carries most of the burden because many lazy-lifetime bugs are structurally obvious if the ownership story is traced carefully.
Takeaways
- Use range pipelines when linear dataflow is the main story and side effects are secondary.
- Treat every view as a borrowing abstraction whose owner must remain obvious and alive.
- Avoid exporting deep lazy pipelines across broad API boundaries unless the lifetime contract is genuinely clear.
- Use generators for incrementally produced sequences, not as a replacement for simple stored results.
- Insert materialization points when observability, error boundaries, or lifetime clarity matter more than maximal laziness.
Compile-time programming without losing your mind
Compile-time programming is one of the places where C++ expertise most easily turns into self-harm. The language lets you move work into constant evaluation, dispatch on types, reject invalid configurations early, and synthesize tables or metadata before runtime begins. Those are real capabilities. They also consume build time, damage diagnostics, spread logic into headers, and tempt engineers to encode business rules in forms nobody wants to debug.
The right production question is not “can this be done at compile time?” It is “what becomes safer, cheaper, or harder to misuse if this is done at compile time, and is that worth the cost to builds and maintainability?”
That framing keeps compile-time techniques in their proper role: they are engineering tools for eliminating invalid states, verifying fixed configuration, and specializing low-level behavior where the variation is truly static. They are not a moral upgrade over runtime code.
Prefer constexpr code that still reads like ordinary code
The best modern compile-time programming is often just ordinary code written so it can also run during constant evaluation. If a parser helper, small lookup builder, or unit conversion routine can be constexpr without becoming cryptic, that is usually the sweet spot.
This matters because most of the old pain in C++ metaprogramming came from forcing logic into type-level encodings or template recursion that no human would choose if runtime code were allowed. C++20 and C++23 reduced that pressure substantially. You can often write loops, branches, and small local data structures directly in constexpr functions.
That changes the design trade. If a compile-time routine still looks like normal code, review and debugging stay tolerable. If moving work to compile time requires a second, stranger version of the algorithm, the benefit has to be substantial.
The old world: recursive templates and type-level arithmetic
To appreciate how much constexpr changed, consider a common pre-C++11 task: computing a factorial at compile time. Without constexpr, the only option was recursive template instantiation:
// Pre-C++11: compile-time factorial via template recursion
template <int N>
struct Factorial {
static const int value = N * Factorial<N - 1>::value;
};
template <>
struct Factorial<0> {
static const int value = 1;
};
// Usage: Factorial<10>::value
This works, but the logic is encoded in the type system rather than in code. There are no loops, no variables, and no debugger support. Errors from exceeding the recursion depth produce long chains of template instantiation backtraces. More complex computations, such as compile-time string processing or table generation, required increasingly arcane techniques: variadic template packs as value lists, recursive struct hierarchies to simulate arrays, and SFINAE tricks to simulate conditionals.
The modern equivalent is just a function:
constexpr auto factorial(int n) -> int {
int result = 1;
for (int i = 2; i <= n; ++i)
result *= i;
return result;
}
// Usage: constexpr auto f = factorial(10);
Same result, evaluated at compile time, but written as ordinary code that any C++ programmer can read and that a debugger can step through at runtime if needed. This is the shift that matters: compile-time programming no longer requires a separate mental model.
A more realistic example is compile-time lookup table construction. In the old style, generating a table of, say, CRC values required a recursive template that instantiated itself once per table entry, accumulated results through nested type aliases, and was practically impossible to extend or debug. With constexpr, you write a loop that fills a std::array:
constexpr auto build_crc_table() -> std::array<std::uint32_t, 256> {
std::array<std::uint32_t, 256> table{};
for (std::uint32_t i = 0; i < 256; ++i) {
std::uint32_t crc = i;
for (int j = 0; j < 8; ++j)
// ~(crc & 1u) + 1u is an all-ones mask when the low bit is set and zero
// otherwise, so this applies the reflected polynomial 0xEDB88320 without a branch.
crc = (crc >> 1) ^ (0xEDB88320u & (~(crc & 1u) + 1u));
table[i] = crc;
}
return table;
}
constexpr auto crc_table = build_crc_table();
This replaces what would have been hundreds of lines of template machinery with a plain function that happens to run at compile time.
Good candidates are fixed translation tables, protocol field layout helpers, validated lookup maps for small enums, and command metadata assembled from constant inputs. These are cases where the inputs are static by nature and computing the result earlier can remove startup work or make invalid combinations impossible.
The companion project examples/web-api/ contains several compact examples of this pattern. In error.cppm, a constexpr function maps error codes to HTTP status integers:
// examples/web-api/src/modules/error.cppm
[[nodiscard]] constexpr int to_http_status(ErrorCode code) noexcept {
switch (code) {
case ErrorCode::not_found: return 404;
case ErrorCode::bad_request: return 400;
case ErrorCode::conflict: return 409;
case ErrorCode::internal_error: return 500;
}
return 500;
}
This is ordinary code that happens to be usable at compile time. It reads like a runtime function, can be tested at runtime, and can also be evaluated in constexpr contexts — for instance, in static_assert checks that verify the mapping is consistent. A companion function to_reason() does the same for human-readable reason strings, returning std::string_view literals.
Similarly, http.cppm provides constexpr functions for parsing and formatting HTTP method strings:
// examples/web-api/src/modules/http.cppm
[[nodiscard]] constexpr Method parse_method(std::string_view sv) noexcept {
if (sv == "GET") return Method::GET;
if (sv == "POST") return Method::POST;
if (sv == "PUT") return Method::PUT;
if (sv == "PATCH") return Method::PATCH;
if (sv == "DELETE") return Method::DELETE_;
return Method::UNKNOWN;
}
Both functions are compile-time lookup tables expressed as plain control flow. They require no template machinery, produce clear diagnostics when misused, and remain debuggable at runtime. This is where constexpr pays off cleanly: the inputs are drawn from a small, static set, and the mapping is stable enough that compile-time evaluation adds safety without complexity.
Use consteval only when delayed failure would be a design bug
consteval is stronger than constexpr: it requires evaluation at compile time. That is useful when accepting runtime fallback would hide a configuration mistake you never want in production.
Imagine a wire protocol subsystem with a fixed set of message descriptors that must have unique opcodes and bounded payload sizes. Those constraints are not dynamic business logic. They are part of the static shape of the program. Catching a duplicate opcode at compile time is materially better than discovering it during startup or, worse, through a routing bug in integration.
struct MessageDescriptor {
std::uint16_t opcode;
std::size_t max_payload;
};
template <std::size_t N>
consteval auto validate_descriptors(std::array<MessageDescriptor, N> table)
-> std::array<MessageDescriptor, N>
{
for (std::size_t i = 0; i < N; ++i) {
if (table[i].max_payload > 64 * 1024) {
throw "payload limit exceeded";
}
for (std::size_t j = i + 1; j < N; ++j) {
if (table[i].opcode == table[j].opcode) {
throw "duplicate opcode";
}
}
}
return table;
}
constexpr auto descriptors = validate_descriptors(std::array{
MessageDescriptor{0x10, 1024},
MessageDescriptor{0x11, 4096},
MessageDescriptor{0x12, 512},
});
The exact error text and mechanism can be refined, but the design is right. These descriptors are static program structure. Rejecting an invalid table during compilation is worth the cost.
The mistake is using consteval to force evaluation of logic that is not inherently static. If a value may legitimately come from deployment configuration, user input, or external data, trying to drag it into compile time usually produces an awkward and brittle design.
if constexpr should separate real families, not encode arbitrary business logic
if constexpr is one of the most useful tools in modern generic code because it keeps type-dependent branching local and readable. Used well, it lets one implementation adapt to a small number of meaningful model differences without splitting into a forest of specializations.
Used badly, it turns a function template into a dumping ground for unrelated behavior.
The right use case is something like storage strategy differences between trivially copyable payloads and non-trivial domain objects, or a formatting helper that handles byte buffers differently from structured records while preserving one public contract. The variation belongs to representation or capability.
Before if constexpr, this kind of type-dependent branching required either tag dispatch or SFINAE overload sets:
// Pre-C++17 tag dispatch: two overloads selected by a type trait
template <typename T>
void serialize_impl(const T& val, Buffer& buf, std::true_type /*trivially_copyable*/) {
buf.append(reinterpret_cast<const std::byte*>(&val), sizeof(T));
}
template <typename T>
void serialize_impl(const T& val, Buffer& buf, std::false_type /*trivially_copyable*/) {
val.serialize(buf); // requires a member function
}
template <typename T>
void serialize(const T& val, Buffer& buf) {
serialize_impl(val, buf, std::is_trivially_copyable<T>{});
}
This works but scatters a single logical function across multiple overloads. The reader must trace through the tag dispatch to understand the branching. With if constexpr, the same logic is local and linear:
template <typename T>
void serialize(const T& val, Buffer& buf) {
if constexpr (std::is_trivially_copyable_v<T>) {
buf.append(reinterpret_cast<const std::byte*>(&val), sizeof(T));
} else {
val.serialize(buf);
}
}
Both branches exist in the same function. The discarded branch is not instantiated, so it does not need to compile for the actual type. The intent is immediately visible.
The wrong use case is encoding every product-specific rule as another compile-time branch because “the compiler can optimize it away.” That approach ties application policy to type structure and makes the function harder to review each time a new condition is added. When the branching is really about runtime business meaning rather than static type capability, ordinary runtime code is usually clearer.
Compile-time branching is best when it explains a stable family relationship. If it is there mostly to avoid writing a second straightforward function, it is often a mistake.
The main costs are build time, diagnostic quality, and organizational drag
Runtime code has visible execution cost. Compile-time code has visible team cost.
Large constant-evaluated tables, heavily instantiated templates, and header-defined helper frameworks slow incremental builds and make dependency graphs more fragile. Diagnostics from failed constant evaluation can still be difficult to interpret, especially once several templates and concepts stack together. And because compile-time machinery often lives in headers, implementation details leak farther across the codebase than their runtime equivalents would.
This is why production compile-time programming should stay close to a few recurring wins.
- Reject statically invalid program structure early.
- Remove small startup work for fixed data.
- Specialize low-level operations based on static capability.
- Keep generated tables and metadata consistent with declared types.
Outside those zones, the return on investment drops quickly.
There is also an organizational cost. Once a team normalizes elaborate compile-time infrastructure, more engineers start building on top of it because it exists, not because it is the clearest solution. The abstraction surface expands. Fewer people can confidently review it. Eventually the project has two complexity layers: runtime code and the compile-time framework that shapes it.
That is why restraint matters more here than almost anywhere else in modern C++.
Code generation is sometimes better than metaprogramming
If the source of truth is external or large, code generation is often the better engineering trade. A protocol schema, telemetry catalog, SQL query inventory, or command registry drawn from external definitions may be easier to validate and evolve with a generator than with an elaborate tower of templates and constexpr parsers.
This is not an admission of defeat. It is a recognition that some complexity is easier to manage in build tooling than in the C++ type system. Generated C++ can still expose clean typed interfaces. The difference is where the complexity lives and how visible the failure modes are.
As a rule, prefer compile-time programming inside C++ when the source data is small, static, and naturally expressed in code. Prefer code generation when the source data is large, external, or already maintained in another format. The break-even point arrives earlier than template enthusiasts like to admit.
Failure modes and boundaries
Compile-time programming tends to fail in familiar ways.
One failure mode is replacing readable runtime code with dense template machinery to save a startup cost that was never measured. Another is pulling deploy-time configuration into compile time, which forces rebuilds for changes that should have remained operational choices. Another is treating constexpr success as proof that the overall design is better, even when build time and diagnostics have become markedly worse.
There is also a boundary around what compile time can prove. It can validate fixed shapes, constant relationships, and type-level capability. It cannot replace integration testing, resource-boundary testing, or operational verification. A compile-time validated dispatch table can still point to handlers whose runtime side effects are wrong.
Keep compile-time logic close to the part of the design that is truly static. Do not let it metastasize into a general architecture style.
Verification and review
Verification here includes both correctness and cost.
- Add focused static_assert checks for core compile-time helpers when they encode rules you do not want to regress.
- Keep representative runtime tests even for compile-time-built tables and metadata; constant evaluation does not prove dynamic correctness.
- Watch incremental build times when adding header-heavy compile-time infrastructure.
- Review error messages from failure cases. If the diagnostics are unusable, the abstraction is not production-ready.
- Ask whether the same outcome could be achieved with simpler runtime code or with code generation.
The last question is the one teams skip most often, and it is usually the most valuable.
Takeaways
- Prefer constexpr code that still looks like ordinary code.
- Use consteval only when runtime fallback would represent a real design error.
- Apply if constexpr to stable capability differences, not to arbitrary business branching.
- Count build time, diagnostics, and reviewability as first-class costs.
- When compile-time machinery stops clarifying the static structure of the program, step back to simpler runtime code or to generation tooling.
Interface Design and Dependency Direction
This chapter assumes you have already worked through function signature design, ownership, invariants, and failure boundaries. The question here is not how to write a function, but how to shape a boundary that survives team growth and system pressure.
The Production Problem
Most interface damage does not come from obviously bad code. It comes from locally reasonable choices that harden into system-wide coupling. A storage layer returns database row types because that is what it already has. A service boundary takes a giant configuration object because future options might matter. A library accepts callbacks as std::function everywhere because it seems flexible. Six months later, tests require half the dependency graph, call sites leak transport concerns, and changing an implementation detail becomes a breaking change.
This chapter is about source-level interface design: what a boundary exposes, which direction dependencies should point, and how to keep policy, ownership, and representation from leaking across layers. It is not a chapter about runtime dispatch mechanics; that belongs in the next chapter. It is not a chapter about binary compatibility or distribution; that belongs in the chapter after that. The focus here is narrower and more important: deciding what one part of the program is allowed to know about another.
The core rule is simple: dependencies should point toward stable policy and domain meaning, not toward volatile implementation detail. In practice, that means interfaces should be built from concepts the caller already understands, not from the callee’s storage, transport, framework, or logging choices.
Why This Gets Expensive
Bad dependency direction multiplies cost in ways that code review often misses.
If domain logic depends directly on SQL row types, protobuf-generated classes, HTTP request wrappers, or filesystem traversal state, then every test, benchmark, and refactor must drag those details with it. The dependency graph becomes wider than the design requires. Build times increase because transitive includes and templates spread implementation detail everywhere. Review quality drops because boundary violations become normal. Most importantly, design choices that should have been local stop being local.
The cost goes beyond compile time. It is also conceptual stability. A good interface can survive a database change, a queue replacement, or a logging rewrite. A bad one requires the rest of the codebase to relearn internal facts that were never their concern.
Start From the Boundary Question
Before writing an interface, force the production question into one sentence.
In a native service, the question is usually not “how do I expose the repository?” It is “how does the order workflow ask for customer credit state?” In a shared library, it is not “how do I surface the parser internals?” It is “what contract does the caller need in order to validate and transform incoming records?”
That shift matters because it changes the shape of the types. Interfaces designed around implementation nouns tend to leak mechanisms. Interfaces designed around work and invariants tend to stay narrow.
An interface should answer four questions clearly:
- What capability does the caller need?
- Which side owns data and lifetime?
- Where does failure get translated into the error model from Chapter 3?
- Which policies are fixed here, and which remain caller choices?
If those answers are vague, the interface is probably mixing layers.
Dependency Direction Means Policy Direction
Dependency inversion is often explained mechanically: depend on abstractions, not concretions. That is correct and not sufficient. The useful test is whether the dependency arrow follows stable policy.
In a service, business rules change slower than transport glue. Fraud policy should not depend on HTTP handlers. Order validation should not depend on SQL record wrappers. Domain logic can define the port it needs, and the database or network adapter can implement that port.
That does not mean every boundary needs an abstract base class. Many do not. Sometimes the right boundary is a free function taking domain data. Sometimes it is a concept-constrained template in an internal library. Sometimes it is a value-type request and result object with no virtual dispatch anywhere. The design decision is not “where do I place an interface type?” It is “which side gets to name the contract?”
The side with the more stable vocabulary should usually name it.
Anti-pattern: Interface Defined by the Dependency
When implementation detail names the contract, the dependency arrow is already wrong.
// Anti-pattern: domain code now depends on storage representation.
struct AccountRow {
std::string id;
std::int64_t cents_available;
bool is_frozen;
std::string fraud_flag;
};
class AccountsTable {
public:
virtual std::expected<AccountRow, DbError>
fetch_by_id(std::string_view id) = 0;
virtual ~AccountsTable() = default;
};
std::expected<PaymentDecision, PaymentError>
authorize_payment(AccountsTable& table, const PaymentRequest& request);
This looks testable because it uses an abstract base class. It is still the wrong seam. The payment workflow should not know that available credit is stored as cents next to a fraud flag string loaded from a table row. The abstraction preserved the dependency but did not improve its direction.
A better port is named by the workflow and returns the minimum stable facts the workflow needs.
struct CreditState {
Money available;
bool frozen;
RiskLevel risk;
};
class CreditPolicyPort {
public:
virtual std::expected<CreditState, PaymentError>
load_credit_state(AccountId account) = 0;
virtual ~CreditPolicyPort() = default;
};
std::expected<PaymentDecision, PaymentError>
authorize_payment(CreditPolicyPort& credit, const PaymentRequest& request);
Now the workflow depends on domain meaning rather than storage shape. The adapter that talks to SQL does the translation. That translation is work, but it is the right work: contain volatility near the volatile thing.
Anti-pattern: Fat Interfaces That Attract Everything Nearby
A bloated interface does not just violate aesthetics. It creates coupling gravity: every new feature gets bolted onto the existing surface because adding a method is easier than rethinking the boundary.
// Anti-pattern: a "god interface" that mixes query, mutation, lifecycle,
// metrics, and configuration concerns in one surface.
class UserService {
public:
virtual std::expected<UserProfile, ServiceError>
get_profile(UserId id) = 0;
virtual void update_profile(UserId id, const ProfilePatch& patch) = 0;
virtual void ban_user(UserId id, std::string_view reason) = 0;
virtual std::vector<AuditEntry>
get_audit_log(UserId id, TimeRange range) = 0;
virtual void flush_cache() = 0;
virtual MetricsSnapshot get_metrics() const = 0;
virtual void set_rate_limit(RateLimitConfig config) = 0;
virtual ~UserService() = default;
};
This interface has at least four unrelated axes of change: user data access, moderation policy, operational observability, and runtime configuration. A caller who only needs to read a profile now transitively depends on audit, caching, metrics, and rate-limiting types. Test doubles must implement seven methods to fake one. Adding a new moderation action forces recompilation of read-only consumers. The interface is not flexible. It is a dependency sink that makes every change expensive and every test fragile.
The fix is to split along responsibility boundaries:
class UserProfileQuery {
public:
virtual std::expected<UserProfile, ServiceError>
get_profile(UserId id) = 0;
virtual ~UserProfileQuery() = default;
};
class ModerationActions {
public:
virtual void ban_user(UserId id, std::string_view reason) = 0;
virtual std::vector<AuditEntry>
get_audit_log(UserId id, TimeRange range) = 0;
virtual ~ModerationActions() = default;
};
Now read-only consumers depend only on UserProfileQuery, moderation tools depend on ModerationActions, and operational concerns live in yet another interface. Each can evolve independently. Test doubles are trivial.
Anti-pattern: Leaking Implementation Details Through the Interface
Even a small interface can damage a system if it exposes the wrong types.
// Anti-pattern: interface leaks the JSON library into every consumer.
#include <nlohmann/json.hpp>
class RetryConfigProvider {
public:
virtual nlohmann::json load_retry_config() = 0;
virtual ~RetryConfigProvider() = default;
};
Every translation unit that includes this header now depends on the JSON library, whether or not it cares about JSON. Changing to TOML, YAML, or a binary config format becomes a breaking change across the entire codebase. The JSON library’s compile time, macro definitions, and transitive includes spread into unrelated components. Worse, callers must navigate a JSON tree to extract retry parameters—initial backoff, max backoff, max attempts—scattering implicit schema knowledge throughout the codebase.
The fix is to return domain-meaningful types:
struct RetryConfig {
std::chrono::milliseconds initial_backoff;
std::chrono::milliseconds max_backoff;
std::uint32_t max_attempts;
};
class RetryConfigProvider {
public:
virtual std::expected<RetryConfig, ConfigError>
load_retry_config() = 0;
virtual ~RetryConfigProvider() = default;
};
Now the JSON dependency stays inside the adapter implementation. Consumers work with typed, validated values. The interface communicates domain meaning, not serialization format.
Anti-pattern: Wrong Abstraction Level
Interfaces pitched at the wrong level of abstraction force callers to do work that should be encapsulated, or prevent them from doing work they need.
// Anti-pattern: too low-level. Caller must assemble SQL semantics
// even though this is supposed to abstract away storage.
class DataStore {
public:
virtual std::expected<RowSet, DbError>
execute_query(std::string_view sql) = 0;
virtual std::expected<std::size_t, DbError>
execute_update(std::string_view sql) = 0;
virtual ~DataStore() = default;
};
This interface claims to abstract storage, but it exposes SQL as a string protocol. Callers must know the schema, construct correct SQL, and parse RowSet results. The abstraction prevents neither SQL injection nor schema coupling. It is a pass-through that adds indirection without reducing dependency.
Conversely, an interface can be too high-level and prevent legitimate use:
// Anti-pattern: too high-level. No way to paginate, filter,
// or control what gets loaded.
class OrderRepository {
public:
virtual std::vector<Order> get_all_orders() = 0;
virtual ~OrderRepository() = default;
};
The right abstraction level matches the operations the caller actually performs, using domain vocabulary and offering enough control to be efficient.
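One hedged sketch of that middle ground: a query struct that names exactly the control the caller needs (filtering, pagination) in domain vocabulary. OrderQuery and InMemoryOrderRepository are hypothetical names invented for this illustration, not part of the example project.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical domain types for the sketch.
struct Order { std::string customer; bool open = true; };

// The query names the caller's legitimate controls, nothing more:
// no SQL strings, no "load everything" all-or-nothing call.
struct OrderQuery {
    std::string customer;      // empty means any customer
    bool open_only = false;
    std::size_t offset = 0;    // pagination
    std::size_t limit = 50;
};

class InMemoryOrderRepository {
public:
    void add(Order o) { orders_.push_back(std::move(o)); }

    std::vector<Order> find(const OrderQuery& q) const {
        std::vector<Order> out;
        std::size_t skipped = 0;
        for (const auto& o : orders_) {
            if (!q.customer.empty() && o.customer != q.customer) continue;
            if (q.open_only && !o.open) continue;
            if (skipped++ < q.offset) continue;  // offset counts matches
            if (out.size() == q.limit) break;
            out.push_back(o);
        }
        return out;
    }

private:
    std::vector<Order> orders_;
};
```

A SQL-backed implementation would translate the same OrderQuery into a parameterized statement; the caller never learns which happened.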
Keep Interfaces Small by Separating Commands From Queries
Bloated interfaces usually come from mixing unrelated reasons to change. A boundary that retrieves state, mutates state, emits audit events, opens transactions, and exposes metrics snapshots all at once is not flexible. It is a dependency sink.
Splitting commands from queries is often enough to recover clarity. Query paths usually want value-oriented request and result types, predictable cost, and no hidden mutation. Command paths usually want explicit ownership transfer, clear side effects, and strong failure semantics. Treating them as one interface encourages accidental coupling because callers start depending on whatever was convenient to put there last quarter.
Smaller interfaces also help review quality. A reviewer can ask whether each function belongs to the same boundary at all. Once an interface becomes a bag of nearby operations, that question stops being easy.
The TaskRepository in examples/web-api/src/modules/repository.cppm illustrates a narrow interface that stays focused. Its public surface is only CRUD: create, find_by_id, find_all, find_completed, update, remove, and size. There is no logging method, no configuration knob, no metric snapshot, no cache flush. Locking strategy (std::shared_mutex), storage representation (std::vector<Task>), and ID generation (std::atomic<TaskId>) are private. Callers depend on domain operations, not on how the repository happens to implement them.
Data Shapes: Accept Stable Views, Return Owned Meaning
Chapter 4 covers local signature choices. At interface boundaries, the same rules become architectural.
Inputs should usually accept non-owning views when the callee does not need to retain data: std::string_view, std::span<const std::byte>, spans of domain objects, or lightweight request structs referencing caller-owned data. That keeps call sites cheap and honest.
Outputs should usually return owned values or domain objects with clear lifetimes. Returning a view into adapter-owned storage, a borrowed pointer into a cache line, or an iterator into internal state turns a boundary into a lifetime puzzle. That is rarely worth it.
The asymmetry is deliberate. Borrow from callers when cost matters and retention does not. Return ownership when crossing back out, because the callee controls its internals and should not force callers to care how long those internals stay alive.
There are exceptions. Hot-path parsers, zero-copy data pipelines, and memory-mapped processing stages may intentionally return views. When they do, the lifetime boundary must be part of the interface contract, not tribal knowledge. A type like ParsedFrameView tied to a specific buffer owner is much safer than leaking naked std::string_view or raw pointers and hoping reviewers notice the coupling.
Do Not Smuggle Policy Through Optional Parameters
One of the fastest ways to make an interface unclear is to use configuration objects or default parameters to push policy decisions into places where callers cannot reason about them.
If a function has flags such as skip_cache, best_effort, emit_audit, allow_stale, and retry_count, the function is probably doing too many jobs. The problem is not aesthetics. The problem is that callers can now form combinations whose semantics are unclear, untested, or operationally dangerous.
Prefer one of three alternatives:
- Split the capability into separate operations with clearer names.
- Promote policy to an explicit type whose invalid states are impossible or obvious.
- Move policy selection up a layer so the lower-level interface stays deterministic.
An interface is easier to evolve when policy is named explicitly instead of hidden in parameter soup.
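The second alternative can be sketched like this. ReadPolicy and ConfigReader are hypothetical names for illustration; the point is that one named enumerator replaces a pair of booleans (skip_cache, allow_stale) whose four combinations include at least one that is meaningless.

```cpp
#include <string>

// Instead of read(key, /*skip_cache=*/true, /*allow_stale=*/false, ...),
// the caller names one coherent policy. Contradictory flag combinations
// simply cannot be expressed.
enum class ReadPolicy {
    FreshOnly,      // bypass the cache, go to the source of truth
    CachedOk,       // a cache hit is acceptable, but it must not be stale
    StaleTolerated  // any cached value is acceptable
};

struct LookupResult { std::string value; bool from_cache; };

class ConfigReader {
public:
    LookupResult read(const std::string& key, ReadPolicy policy) const {
        const bool cache_usable =
            (policy == ReadPolicy::CachedOk && cache_fresh_) ||
            (policy == ReadPolicy::StaleTolerated && cache_present_);
        if (cache_usable) return {cached_value_, true};
        return {load_from_source(key), false};
    }

private:
    std::string load_from_source(const std::string&) const { return "fresh"; }
    bool cache_present_ = true;
    bool cache_fresh_ = false;   // a cached value exists but is stale
    std::string cached_value_ = "stale-cached";
};
```

Call sites now read as policy statements: reader.read(key, ReadPolicy::StaleTolerated) documents itself in a way that read(key, false, true) never will.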
Testability Is a Consequence, Not the Goal
Teams often justify an interface by saying it improves testing. That is backward. The primary question is whether the boundary reflects the real design. If it does, testing usually gets easier. If it does not, test doubles only preserve the mistake.
For example, introducing a repository interface solely so unit tests can fake database access is weak reasoning if the domain still depends on table-shaped data and transport-shaped errors. The tests may become easier to write while the design stays wrong.
Good boundaries produce better tests because they isolate policy from mechanism. You can test business logic against simple fakes because the business logic asks for domain facts, not framework objects. You can integration-test the adapter separately because translation is contained in one place. This is a stronger outcome than “we can mock it now.”
Use Concepts and Templates Internally, Not as Public Escape Hatches
Modern C++ makes it easy to encode interfaces as constraints rather than virtual classes. This is often the right choice inside a component or within a tightly controlled codebase. A constrained template can keep code allocation-free, inlineable, and more expressive than a deep hierarchy.
But a public interface that tries to be everything through templates often stops being an interface at all. It becomes a policy surface, a compile-time integration mechanism, and a documentation burden at the same time. Error messages get worse, build dependencies widen, and call-site expectations become murky.
Use concept-constrained interfaces when all of the following are true:
- The caller and callee are built together.
- The customization point is central to performance or representation.
- You can state the semantic contract clearly, not just the syntax contract.
If those conditions do not hold, a smaller value-oriented API or a runtime boundary is often better.
Failure Translation Belongs at the Boundary
An interface is also where failure semantics become explicit. If the adapter speaks SQL exceptions, gRPC status codes, or platform error values, that does not mean the rest of the system should.
Translate failures as close to the volatile dependency as practical. The domain-facing interface should expose failure categories the caller can actually act on. This keeps business logic from depending on transport or vendor error taxonomies, and it makes logging and retry behavior much easier to reason about.
Do not over-normalize into useless generic errors. “Operation failed” is not a boundary model. The point is to expose stable decision-relevant categories while containing unstable backend detail.
The example project in examples/web-api/ shows this pattern concretely. The result_to_response() function in handlers.cppm sits at the boundary between domain logic and HTTP transport:
// examples/web-api/src/modules/handlers.cppm
template <json::JsonSerializable T>
[[nodiscard]] http::Response
result_to_response(const Result<T>& result, int success_status = 200) {
if (result) {
return {.status = success_status, .body = result->to_json()};
}
return http::Response::error(result.error().http_status(),
result.error().to_json());
}
Domain code works exclusively with Result<T> and ErrorCode from the error module. The HTTP status code mapping is defined once in error.cppm via to_http_status(), and the translation into an HTTP response happens at the handler layer. No domain type knows what an HTTP response looks like. No handler leaks domain error internals into the transport. The boundary translates, and each side speaks its own vocabulary.
When Not to Abstract
Some code should depend directly on a concrete type. Over-abstraction creates indirection, hides cost, and makes simple paths harder to read.
If a type is local to one subsystem, has one obvious implementation, and changing it would not produce a different deployment or test strategy, a direct dependency is often correct. Internal helper types, parsers, allocators scoped to a component, and single-backend pipeline stages do not become better because they grew ports.
The test is not whether abstraction is theoretically possible. The test is whether a boundary isolates a real axis of change or policy. If not, keep the dependency concrete and local.
Verification and Review Questions
Interface design should be reviewed with the same discipline as performance or concurrency.
Ask these questions:
- Does the interface expose domain meaning or implementation detail?
- Are ownership and lifetime obvious at the boundary?
- Are failure types translated into something decision-relevant?
- Could a caller use this API correctly without knowing storage, transport, or framework internals?
- Does the dependency arrow point toward the more stable policy vocabulary?
- Is any abstraction here justified by a real axis of change, or only by a desire to mock?
Verification goes beyond code review. Integration tests should exercise adapters at the actual boundary where translation happens. Build profiling is also useful: if a supposedly clean interface still drags large transitive dependencies everywhere, the design may be source-level coupling in disguise.
Takeaways
Interface design is mostly about deciding what must not leak.
Keep dependency direction aligned with stable policy, not convenient implementation. Accept cheap borrowed inputs when retention is unnecessary, but return owned meaning when crossing a boundary. Split interfaces by responsibility instead of building bags of operations. Translate failures where volatile dependencies enter the system. Abstract only where there is a real design seam.
If a caller must understand your database schema, transport wrapper, framework handle, or internal storage lifetime in order to use your API correctly, the boundary is already doing too much. That is the signal to redesign before the coupling becomes normal.
Runtime Polymorphism, Type Erasure, and Callbacks
This chapter assumes you already know how to define a good source-level boundary. The question now is how to represent runtime variability at that boundary without lying about cost, lifetime, or ownership.
The Production Problem
Sooner or later, production C++ code must choose behavior at runtime. A service picks a retry strategy from configuration. A scheduler stores work submitted by arbitrary callers. A library accepts hooks for telemetry, filtering, or authentication. A plugin host loads behavior from separately compiled modules. None of these problems can be solved with templates alone, because the concrete type is not known at compile time where the decision matters.
This is where teams often reach for a familiar abstraction and keep using it everywhere. Some use virtual functions for everything. Some wrap everything in std::function. Some build hand-rolled type erasure around void* and function pointers because they fear allocations. The result is usually an interface that works but hides important facts: whether the callable is owned, whether it can allocate, whether it can throw, whether it can outlive captured state, and whether dispatch cost matters in the actual hot path.
This chapter separates the main runtime indirection tools in modern C++23: classic virtual dispatch, type erasure, and callback forms. The goal is not to crown one as best. The goal is to choose the smallest tool that matches the lifetime, ownership, and performance requirements of the boundary you are building.
First Decide What Kind of Variability You Need
Not all runtime flexibility is the same.
There are at least four common cases:
- A stable object protocol with multiple long-lived implementations.
- A callable submitted for later execution.
- A short-lived hook invoked synchronously during one operation.
- A plugin or extension point crossing packaging or ABI boundaries.
These cases look similar from ten thousand feet because each involves calling “something dynamic.” They differ sharply in what the callee needs to own, how long the behavior must live, and which costs matter.
If you skip this classification step, the wrong abstraction can look acceptable for years. A synchronous hook might accidentally require heap allocation because it is modeled as std::function. A background task system might store borrowed callback state because its API looked convenient. A plugin system might expose C++ class hierarchies across compiler boundaries and call that architecture. The mistakes are structural, not syntactic.
Virtual Dispatch: Good for Stable Object Protocols
Virtual functions are still the clearest tool when you need a stable protocol over a family of long-lived objects. A storage backend interface, a message sink, or a strategy object chosen once and reused heavily can all fit this model.
The strengths are well understood:
- Ownership is usually explicit in the surrounding object graph.
- The protocol is easy to document as named operations.
- The interface can evolve carefully through additional methods only when necessary.
- Tooling, debuggers, and reviewers understand it immediately.
The weaknesses are just as real. Hierarchies tempt over-generalization. Per-call dispatch is indirect and hard to inline. The interface must commit to object identity and mutation semantics even when a simpler callable would do. Public inheritance also couples the caller to one representation of customization: objects with vtables.
Use virtual dispatch when the abstraction is naturally an object protocol. Do not use it just because behavior varies.
Anti-pattern: Deep Inheritance Hierarchies and Fragile Base Classes
Virtual dispatch becomes a liability when it grows into deep or wide hierarchies where the base class accumulates obligations over time.
// Anti-pattern: a growing base class that every derived type must satisfy.
class Widget {
public:
virtual void draw(Canvas& c) = 0;
virtual void handle_input(const InputEvent& e) = 0;
virtual Size preferred_size() const = 0;
virtual void set_theme(const Theme& t) = 0;
virtual void serialize(Archive& ar) = 0; // added in v2
virtual void animate(Duration dt) = 0; // added in v3
virtual AccessibilityInfo accessibility() = 0; // added in v4
virtual ~Widget() = default;
};
Every new virtual method forces every derived class to implement it or inherit a possibly wrong default. Classes that only need drawing must still address input, serialization, animation, and accessibility. Testing a simple leaf widget requires constructing Canvas, InputEvent, Theme, Archive, and AccessibilityInfo objects. The base class becomes a change amplifier: a single addition to Widget triggers recompilation and potential modification of every derived class across the codebase.
This is the fragile base class problem. The hierarchy appears extensible but is actually brittle because the base class interface keeps growing to serve every consumer.
Diamond inheritance and semantic ambiguity
Multiple inheritance of interface hierarchies introduces diamond problems that virtual inheritance only partially addresses.
class Readable {
public:
virtual std::expected<std::size_t, IoError>
read(std::span<std::byte> buffer) = 0;
virtual void close() = 0; // close the read side
virtual ~Readable() = default;
};
class Writable {
public:
virtual std::expected<std::size_t, IoError>
write(std::span<const std::byte> data) = 0;
virtual void close() = 0; // close the write side
virtual ~Writable() = default;
};
// Diamond: what does close() mean here? Read side? Write side? Both?
class ReadWriteStream : public virtual Readable, public virtual Writable {
public:
// Single close() must now serve two different semantic contracts.
// Callers holding a Readable* expect close() to close the read side.
// Callers holding a Writable* expect close() to close the write side.
// There is no way to satisfy both through one override.
void close() override { /* ??? */ }
};
Virtual inheritance solves the layout duplication but not the semantic conflict. The result is code that compiles but whose behavior depends on which base pointer the caller holds. This ambiguity is structural. It does not go away with more careful implementation.
Contrast: type erasure avoids these problems
Type erasure sidesteps hierarchies entirely. Each erased wrapper defines its own minimal contract without forcing unrelated types into a common base.
// No base class. No hierarchy. No diamond.
// Any type that is callable with the right signature works.
using DrawAction = std::move_only_function<void(Canvas&)>;
using InputHandler = std::move_only_function<bool(const InputEvent&)>;
struct WidgetBehavior {
DrawAction draw;
InputHandler handle_input;
};
// A simple widget only provides what it needs.
// No obligation to implement serialize, animate, or accessibility.
WidgetBehavior make_label(std::string text) {
return {
.draw = [t = std::move(text)](Canvas& c) { c.draw_text(t); },
.handle_input = [](const InputEvent&) { return false; }
};
}
There is no base class to grow. Adding animation support does not force label widgets to change. Testing draw does not require constructing an InputEvent. Each concern is independently composable. The cost is that you lose the named-object-protocol clarity of a class hierarchy, which may matter if the protocol is genuinely stable and rich. The tradeoff is worth evaluating case by case.
Type Erasure: Good for Owned Runtime Flexibility
Type erasure is the right tool when you need to store or pass runtime-selected behavior without exposing the concrete type, but you do not need an inheritance hierarchy in the user model.
std::function is the most familiar example, but in C++23 std::move_only_function is often the better default when copyability is not part of the contract. Many submitted tasks, completion handlers, and deferred operations are naturally move-only because they own buffers, promises, file handles, or cancellation state. Requiring copyability there is not flexible. It is misleading.
Type erasure buys three things:
- The caller can provide arbitrary callable types.
- The callee can own the callable past the current stack frame.
- The public contract can talk about invocation semantics instead of concrete class design.
It also introduces costs that must be treated as design facts rather than implementation trivia: possible heap allocation, indirect call overhead, larger object representation, and sometimes loss of noexcept or cv/ref-qualification detail unless you model it carefully.
For many systems, these costs are acceptable. For some, especially hot dispatch loops or high-rate scheduler internals, they are decisive. Measure in the actual workload before arguing from aesthetics.
The middleware system in examples/web-api/src/modules/middleware.cppm demonstrates type erasure composition in practice. Middleware is defined as std::function<http::Response(const http::Request&, const http::Handler&)> – a type-erased callable that wraps a handler and produces a new handler. The apply() function composes one middleware with one handler; chain() composes a range of middlewares around a base handler by folding in reverse order:
// examples/web-api/src/modules/middleware.cppm
template <std::ranges::input_range R>
requires std::same_as<std::ranges::range_value_t<R>, Middleware>
[[nodiscard]] http::Handler
chain(R&& middlewares, http::Handler base) {
http::Handler current = std::move(base);
for (auto it = std::ranges::rbegin(middlewares);
it != std::ranges::rend(middlewares); ++it)
{
current = apply(*it, std::move(current));
}
return current;
}
No inheritance hierarchy is involved. Each middleware (logging, CORS, content-type enforcement) is an independent callable. They compose through chain() without knowing about each other. The result is a single http::Handler that can be handed to the server. This is exactly the strength of type erasure: composable behavior without coupling implementations into a class tree.
Common pitfall: std::function forces copyability on move-only state
std::function requires its target to be copyable. This seems harmless until real callback state enters the picture.
// This will not compile. std::function requires CopyConstructible.
auto handler = std::function<void()>{
[conn = std::make_unique<DbConnection>()](){ conn->heartbeat(); }
};
Teams work around this by wrapping unique pointers in shared pointers, adding reference counting and shared mutation to code that was naturally single-owner. The workaround compiles but weakens the ownership model.
// Workaround: shared_ptr "fixes" compilation but lies about ownership.
auto conn = std::make_shared<DbConnection>();
auto handler = std::function<void()>{
[conn]() { conn->heartbeat(); }
};
// Now conn is shared. Who shuts it down? When? The ownership story is gone.
std::move_only_function avoids this entirely. If your callback is submitted, queued, or deferred and will not be copied, it is the correct default in C++23.
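When std::move_only_function is not yet available, the mechanism behind it is small enough to sketch: type erasure through a unique_ptr to a callable base, with copying deleted. MoveOnlyFn is a hypothetical name for this sketch; prefer the standard type where the toolchain provides it.

```cpp
#include <memory>
#include <utility>

// Minimal move-only callable wrapper: movable, never copyable, so a
// lambda that captures a unique_ptr can be stored without workarounds.
template <class Sig> class MoveOnlyFn;

template <class R, class... Args>
class MoveOnlyFn<R(Args...)> {
    struct Base {
        virtual R call(Args... args) = 0;
        virtual ~Base() = default;
    };
    template <class F>
    struct Impl final : Base {
        F f;
        explicit Impl(F fn) : f(std::move(fn)) {}
        R call(Args... args) override { return f(std::forward<Args>(args)...); }
    };
    std::unique_ptr<Base> self_;

public:
    MoveOnlyFn() = default;
    template <class F>
    MoveOnlyFn(F f) : self_(std::make_unique<Impl<F>>(std::move(f))) {}
    MoveOnlyFn(MoveOnlyFn&&) = default;              // movable...
    MoveOnlyFn& operator=(MoveOnlyFn&&) = default;
    MoveOnlyFn(const MoveOnlyFn&) = delete;          // ...but never copyable
    R operator()(Args... args) { return self_->call(std::forward<Args>(args)...); }
    explicit operator bool() const { return self_ != nullptr; }
};
```

The unique_ptr capture that std::function rejects compiles cleanly here, and single ownership of the connection survives intact.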
Callback Forms: Borrowed, Owned, and One-Shot
The word “callback” hides three different contracts.
Borrowed synchronous callbacks
These are invoked during the current call and not retained. Logging visitors, per-record validation hooks, or traversal callbacks often fit here. The key property is that the callee does not store the callable.
In this case, forcing the API through an owning wrapper is usually unnecessary. A templated callable parameter may be simplest when the call stays internal. If you need a non-templated surface, a lightweight borrowed callable view can work, but standard C++23 does not yet ship function_ref. Many teams therefore use constrained templates for synchronous hooks inside a component and reserve type erasure for cases that genuinely need ownership.
Owned deferred callbacks
A queue, scheduler, timer wheel, or asynchronous client often needs to retain work beyond the current stack frame. This is the natural home for std::move_only_function or a custom erased task type with explicit allocation rules.
Here the questions are concrete:
- Does the queue own the callable?
- Is copying part of the API, or only moving?
- Can submission allocate?
- Must the callable be noexcept?
- What happens at shutdown to unrun callbacks?
These are interface questions, not implementation details.
One-shot completion handlers
A completion path that fires exactly once is often better modeled as move-only from the start. This aligns the type with reality and prevents accidental sharing of stateful handlers.
class TimerQueue {
public:
using Task = std::move_only_function<void() noexcept>;
void schedule_after(std::chrono::milliseconds delay, Task task);
};
This signature says something important. The queue takes ownership, may run later, and expects the task not to throw across the scheduler boundary. That is much more precise than a generic std::function<void()> parameter.
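A single-threaded sketch of that contract, to make the ownership and shutdown questions concrete. A production implementation would use std::move_only_function (or an equivalent move-only task type), a worker thread, and a condition variable; std::function and a manual advance() are used here only to keep the sketch portable and testable, and the names are illustrative.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

class TimerQueue {
public:
    using Task = std::function<void()>;

    // The queue takes ownership of the task; it may run later or never.
    void schedule_after(std::chrono::milliseconds delay, Task task) {
        entries_.push_back({now_ + delay, std::move(task)});
    }

    // Advance virtual time and run everything that has come due, in order.
    void advance(std::chrono::milliseconds dt) {
        now_ += dt;
        std::stable_sort(entries_.begin(), entries_.end(),
                         [](const Entry& a, const Entry& b) { return a.due < b.due; });
        std::size_t i = 0;
        for (; i < entries_.size() && entries_[i].due <= now_; ++i) {
            entries_[i].task();  // each task runs exactly once
        }
        entries_.erase(entries_.begin(), entries_.begin() + i);
    }

    // Shutdown policy made explicit in the API: pending tasks are
    // dropped, not run. A different system might choose to drain them.
    void shutdown() { entries_.clear(); }

private:
    struct Entry { std::chrono::milliseconds due; Task task; };
    std::vector<Entry> entries_;
    std::chrono::milliseconds now_{0};
};
```

Note that the shutdown question from the list above is answered in code, where callers can see it, rather than in a wiki page.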
Anti-pattern: Flexible Surface, Ambiguous Lifetime
Many callback bugs come from APIs that look flexible but fail to say who owns what.
// Anti-pattern: ambiguous retention and capture lifetime.
class EventSource {
public:
void on_message(std::function<void(std::string_view)> handler);
};
void wire(EventSource& source, Session& session) {
source.on_message([&session](std::string_view payload) {
session.record(payload);
});
// RISK: if EventSource stores the handler past Session lifetime, this dangles.
}
The reference capture is the immediate hazard, but the deeper problem is that the API failed to state whether the handler is borrowed for one operation, stored until unregistration, invoked concurrently, or called during destruction. The abstraction chose generality over a usable contract.
A stronger design names the ownership and lifetime model explicitly. If the callback is retained, registration should return a handle or registration object whose lifetime governs subscription. If the callback is synchronous, the API should not accept an owning erased callable just to appear generic.
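A sketch of the handle-based variant, with hypothetical names. The handler stays registered exactly as long as the returned Subscription object lives, so the lifetime rule is enforced by scope instead of remembered by convention. (The handle must still not outlive the EventSource itself; a production version would also guard against that.)

```cpp
#include <functional>
#include <map>
#include <string_view>

class EventSource {
public:
    // RAII registration handle: destruction unregisters the handler.
    class Subscription {
    public:
        Subscription(EventSource* src, int id) : src_(src), id_(id) {}
        ~Subscription() { if (src_) src_->handlers_.erase(id_); }
        Subscription(const Subscription&) = delete;
        Subscription& operator=(const Subscription&) = delete;
        Subscription(Subscription&& o) noexcept : src_(o.src_), id_(o.id_) { o.src_ = nullptr; }

    private:
        EventSource* src_;
        int id_;
    };

    // [[nodiscard]] makes "register and drop the handle" a visible bug.
    [[nodiscard]] Subscription on_message(std::function<void(std::string_view)> h) {
        const int id = next_id_++;
        handlers_.emplace(id, std::move(h));
        return Subscription(this, id);
    }

    void emit(std::string_view payload) {
        for (auto& [id, h] : handlers_) h(payload);
    }

private:
    std::map<int, std::function<void(std::string_view)>> handlers_;
    int next_id_ = 0;
};
```

In the earlier wire() example, the Subscription would live next to session, so the capture can never outlive its referent without the compiler forcing someone to write that decision down.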
The main.cpp in examples/web-api/ demonstrates scoped lifetime discipline for captured references. The TaskRepository is declared before the router and handlers, and the server is declared last. Because C++ destroys local variables in reverse order of construction, the repository outlives every handler and middleware that captures a reference to it. This ordering is not accidental – it is the structural guarantee that no erased callable will ever dangle on its captured repo reference. When callbacks capture by reference, the enclosing scope must enforce that the referent outlives the callable. Scoping order in main() is where that guarantee lives.
Virtual Protocol or Erased Callable?
A practical decision rule helps.
Choose a virtual interface when behavior is naturally described as a named object protocol with multiple operations or meaningful state transitions. A metrics sink with record_counter, record_histogram, and flush is an object protocol. A storage backend with get, put, and remove is an object protocol.
Choose an erased callable when the abstraction is fundamentally “here is work or logic to invoke later” rather than “here is an object with a semantic role.” Retry policies, completion handlers, task submissions, and predicate hooks usually fit this pattern better.
When teams confuse the two, the code gets awkward fast. Virtual single-method types often signal that a callback would be clearer. Conversely, enormous callable signatures with tuples of parameters often signal that a real protocol object should exist.
The example project in examples/web-api/ shows both sides of this choice in the same codebase. Router in router.cppm is an object protocol: it has a fluent registration API (get, post, put_prefix, etc.) and manages a route table with multiple named operations. This is naturally a named object with state, not a single callable. By contrast, http::Handler is defined as std::function<Response(const Request&)> – a type-erased callable. Each handler is “here is work to invoke when a request matches.” The router’s to_handler() method collapses the entire route table into a single erased callable, bridging the two models cleanly.
Cost Models Matter, but Usually in Specific Places
Runtime indirection always has a cost model. The mistake is to discuss it abstractly.
Virtual dispatch cost is usually tiny relative to IO, parsing, locking, allocation, or cache misses. In a hot numeric inner loop or per-packet classifier running millions of times per second, it may be very real. Type erasure may allocate; whether that matters depends on submission rate, allocator behavior, and tail latency sensitivity. Small-buffer optimization can help, but relying on unspecified thresholds is not a stable contract.
Do not guess. If dispatch is on a hot path, measure branch behavior, instruction footprint, allocation rate, and end-to-end throughput in representative scenarios. If it is not hot, choose the abstraction that makes ownership and lifetime easiest to reason about.
Exceptions, Cancellation, and Shutdown Semantics
Runtime indirection often sits at boundaries where failure handling gets sloppy. A callback API that does not document whether exceptions may cross it is unfinished. A task queue that accepts work without defining shutdown semantics is unfinished. A plugin hook that can reenter the host while invariants are half-updated is unfinished.
Decide explicitly:
- Can the callable throw? If not, encode that in the type when practical.
- If invocation fails, who translates or logs the failure?
- Can callbacks run concurrently?
- How are pending callbacks cancelled or drained during shutdown?
- Is reentrancy allowed?
These decisions matter more than whether the abstraction used a vtable or an erased function wrapper.
Packaging Boundaries Change the Answer
Inside one program built as a unit, all three techniques are fair game. Across plugin boundaries or public SDKs, the answer changes. Exposing C++ runtime polymorphism across binary boundaries brings ABI assumptions with it. Erased callables that capture allocator or exception behavior can also become unstable across module or shared-library boundaries.
That is why chapter 11 treats packaging and ABI as a separate problem. Runtime flexibility inside a process is one design question. Binary contracts between separately built artifacts are another. Do not let a convenient in-process abstraction accidentally become your distribution contract.
Verification and Review Questions
Review runtime indirection by asking what the type promises, not what the implementation currently does.
- Does the boundary own the callable or merely borrow it?
- Can the callable be retained, moved across threads, or invoked during shutdown?
- Is copyability required by the contract, or only inherited from a convenient wrapper?
- Are exception and cancellation semantics explicit?
- Is dispatch cost material in this workload, or are hidden allocations the real issue?
- Would a named protocol object be clearer than a giant callable signature, or vice versa?
Verification should include allocation tracing for callback-heavy paths, sanitizer coverage for lifetime bugs in captured state, and targeted benchmarks when dispatch sits in a measured hot path. Unit tests alone are weak here because the most expensive failures are usually lifetime races, shutdown bugs, and throughput collapse under sustained load.
Takeaways
Runtime indirection is not one thing. It is a set of tools for different lifetime and ownership models.
Use virtual dispatch for stable object protocols. Use type erasure when you need owned runtime-selected behavior without exposing a hierarchy. Use callback forms that match whether invocation is synchronous, deferred, or one-shot. Prefer std::move_only_function over std::function when ownership is single and copyability would misstate the contract.
Most importantly, make lifetime, retention, exception behavior, and shutdown semantics visible at the boundary. Hidden allocations are annoying. Hidden lifetime rules are how systems fail.
Modules, Libraries, Packaging, and ABI Reality
This chapter assumes the source-level interface is already well designed. The question now is how that interface behaves when code is built, shipped, versioned, and consumed across real toolchains.
The Production Problem
Teams often talk about modules, libraries, and ABI as if they were one topic. They are related and not interchangeable.
Modules are mostly about source organization, dependency hygiene, and build scalability. Library packaging is about how code is distributed and linked: static library, shared library, header-only package, source distribution, internal monorepo component, or plugin SDK. ABI is about whether separately built binaries can agree on layout, calling convention, exception behavior, allocation ownership, symbol naming, and object lifetime.
Treating these as one problem causes expensive mistakes. A team adopts C++20 modules and assumes this somehow stabilizes a public binary boundary. Another ships a shared library whose public headers expose std::string, std::vector, exceptions, and inline-heavy templates across compilers, then discovers that “works on our build agent” is not a compatibility strategy. A plugin host exports C++ class hierarchies and learns too late that compiler version changes are now deployment events.
This chapter keeps the distinctions sharp. Source hygiene is valuable. Distribution choices are architectural. ABI stability is a contract you either design intentionally or do not offer.
Modules Solve Source Problems, Not Binary Problems
C++ modules help with parse cost, macro isolation, and dependency control. Those are real wins, especially in large codebases with header-heavy libraries. A well-factored module interface can reduce accidental exposure of implementation detail and make the intended import surface clearer.
But modules do not create a portable binary contract. They do not erase compiler ABI differences. They do not guarantee the same layout rules, exception interoperability, or standard library binary compatibility across vendors. They are not a substitute for packaging strategy.
What modules replace: the header inclusion model and its hazards
Without modules, C++ compilation is textual inclusion. Every #include pastes a header’s full text into the translation unit. This creates three classes of real problems.
Include order dependencies. If header A defines a macro or type that header B consumes, swapping #include order can silently change behavior or break compilation. This is not hypothetical. Large codebases accumulate implicit ordering contracts that no one documents.
// order_matters.cpp
#include <windows.h> // defines min/max as macros
#include <algorithm> // std::min/std::max are now broken
auto x = std::min(1, 2); // compilation error or wrong overload
Macro pollution. Every macro defined in every transitively included header is visible everywhere below. A library that #defines ERROR, OK, TRUE, CHECK, or Status can silently collide with unrelated code. The classic defenses (#undef after inclusion, NOMINMAX, parenthesizing calls as (std::min)(a, b)) are fragile and must be remembered at every inclusion site.
// some_vendor_lib.h
#define STATUS int
#define ERROR -1
// your_code.cpp
#include "some_vendor_lib.h"
#include "your_domain.h" // any enum named ERROR or type named STATUS is now broken
enum class STATUS { OK, ERROR }; // fails to compile: STATUS expands to int, ERROR to -1
Transitive dependency explosion. Including one header can pull in hundreds of others. A seemingly small change to an internal header triggers recompilation of thousands of translation units. Build times scale with total transitive include depth, not with the actual dependency graph of the program.
Modules address all three problems: they do not leak macros, they have well-defined import semantics independent of order, and they export only what is explicitly declared. This is a meaningful improvement for source hygiene, even though it does not touch binary compatibility.
Module syntax in practice
The example project in examples/web-api/ is structured as eight C++20 module interface units. Each .cppm file declares a named module, explicitly imports its dependencies, and exports only the public surface. Here is the structure of a typical module:
// examples/web-api/src/modules/error.cppm
module;
// Global module fragment: standard headers that predate modules
#include <cstdint>
#include <expected>
#include <format>
#include <string>
#include <string_view>
export module webapi.error; // module declaration — names this module
export namespace webapi {
enum class ErrorCode : std::uint8_t { not_found, bad_request, conflict, internal_error };
struct Error { /* ... */ };
template <typename T>
using Result = std::expected<T, Error>;
} // only what is inside 'export' is visible to importers
Modules that depend on other modules use import declarations rather than #include:
// examples/web-api/src/modules/handlers.cppm
module;
#include <format>
#include <string>
// ...
export module webapi.handlers;
import webapi.error; // typed error model
import webapi.http; // Request, Response, Handler
import webapi.json; // JsonSerializable concept
import webapi.repository; // TaskRepository
import webapi.task; // Task, TaskId
The global module fragment (between module; and export module ...;) is where standard library headers live, since they predate the module system and must be included textually. Each import names a specific module – there is no transitive inclusion. handlers.cppm imports webapi.error but does not accidentally pull in everything that error.cppm itself includes. The export keyword controls visibility: only exported names are reachable by importers. Private helpers and non-exported types stay invisible.
The consuming side is simpler too. In main.cpp, six import declarations replace what would have been a chain of #include directives and all their transitive dependencies:
// examples/web-api/src/main.cpp
import webapi.handlers;
import webapi.http;
import webapi.middleware;
import webapi.repository;
import webapi.router;
import webapi.task;
No include guards, no macro collisions, no order sensitivity. The build system sees the module dependency graph directly and compiles modules in the right order. This is the source-level improvement that modules deliver.
That means the first decision is not “should we use modules?” It is “what are we promising consumers?”
If the answer is internal source reuse inside one repository with one toolchain baseline, modules may be excellent. If the answer is “we ship a public SDK consumed by unknown build systems and compiler versions,” modules may still help your own build, but they do not remove the need for strict binary-boundary discipline.
Packaging Choices Express Operational Intent
Packaging is where architecture meets deployment.
Header-only or source-distributed libraries
These avoid many ABI promises because consumers compile the code into their own program. The cost is compile time, larger dependency surfaces, and more exposure of implementation detail. Templates, concepts, and inline functions fit naturally here. This is often a good choice for internal generic utilities or narrow public libraries where performance and optimizer visibility matter more than distribution simplicity.
Static libraries
Static linkage simplifies deployment and avoids some runtime compatibility issues. It can still create ODR and allocator-boundary problems if the public interface is careless, but it usually reduces cross-version operational complexity. Static libraries fit well for internal components deployed as one unit or for consumers who prefer self-contained binaries.
Shared libraries and SDKs
These offer deployment and patching advantages, but now you own a real binary boundary. That means symbol visibility, versioning policy, exception rules, allocator ownership, and data layout are no longer private engineering choices. They are part of your product behavior.
Plugin boundaries
These are the harshest case because host and plugin may be built separately, loaded dynamically, upgraded independently, and sometimes compiled with different flags or compilers. Here, the safest public boundary is often C ABI plus opaque handles and explicit function tables, even if the internal implementation is modern C++ throughout.
The packaging decision should be made from operational constraints, not from what looks elegant in local code.
Internal Library Versus Public Binary Contract
Many libraries never need stable ABI. That is normal.
If producer and consumer are rebuilt together from the same commit and toolchain, source compatibility matters much more than ABI stability. In that environment, modern C++ APIs can be expressive. Returning vocabulary types, using templates, adopting modules, and relying on inlining may all be good tradeoffs.
The moment you need independently upgradeable binaries, the constraints change. Now even innocent-looking public types become liability. A changed private member order, different standard library implementation, different compiler, or different exception model can break consumers without any source-level signature change.
Do not accidentally promise stable ABI just because you shipped a DLL once.
ABI Stability Requires Deliberate Narrowing
Stable ABI is expensive because it forbids many convenient language habits at the boundary.
These are common sources of ABI fragility:
- Exposing standard library types in public binary interfaces.
- Exposing class layouts whose size or members may change.
- Throwing exceptions across compiler or runtime boundaries.
- Allocating on one side and freeing on the other without a shared allocator contract.
- Exporting inline-heavy templates or virtual hierarchies as the binary extension mechanism.
That does not mean standard C++ library types are bad. It means they are often the wrong public binary boundary.
Concrete ABI breakage scenarios
These are real scenarios, not theoretical risks.
Adding a private member changes class size. A consumer compiled against v1 of a library allocates objects based on sizeof(Widget) at their compile time. If v2 adds a private member, the library’s methods now write past what the consumer allocated. The result is silent memory corruption, not a linker error.
// v1: shipped in libwidget.so
class EXPORT Widget {
int x_;
int y_;
public:
void move(int dx, int dy); // accesses x_, y_
};
// sizeof(Widget) == 8 for the consumer
// v2: added a z-index member
class EXPORT Widget {
int x_;
int y_;
int z_; // sizeof(Widget) is now 12
public:
void move(int dx, int dy); // same signature, same symbol
};
// Consumer still allocates 8 bytes. Library writes 12. Corruption.
Different standard library implementations. A shared library built with libstdc++ exposes std::string in its API. A consumer built with libc++ links against it. The two implementations have different internal layouts (SSO buffer sizes, pointer arrangements). Calling across this boundary corrupts string state. There is no compile-time or link-time diagnostic.
Compiler flag mismatch. Building the library with -fno-exceptions and the consumer with exceptions enabled can produce incompatible stack unwinding behavior. Building with different -std= flags can change the layout of standard types. Building with different struct packing or alignment flags changes ABI silently.
ODR violations from header-only libraries
Header-only libraries are popular because they avoid binary distribution complexity. They introduce a different class of problems: One Definition Rule violations.
If two translation units include the same header-only library but compile with different flags, preprocessor definitions, or template arguments that affect inline function behavior, the linker may silently pick one definition and discard the other. The program contains code compiled against two different assumptions linked into one binary.
// translation_unit_a.cpp
#define LIBRARY_USE_SSE 1
#include "header_only_math.hpp" // vector ops use SSE intrinsics
// translation_unit_b.cpp
// LIBRARY_USE_SSE not defined
#include "header_only_math.hpp" // vector ops use scalar fallback
// Both define the same inline functions with different bodies.
// Linker picks one. Half the program uses the wrong implementation.
// No diagnostic. Possible wrong results or crashes.
This is not a contrived scenario. Libraries that use #ifdef to select code paths, or that behave differently based on NDEBUG, _DEBUG, or platform macros, can produce ODR violations in any project that mixes compilation settings. AddressSanitizer’s ODR violation detection and the gold linker’s --detect-odr-violations option can catch some of these, but not all.
For a stable shared-library or plugin contract, prefer opaque handles, narrow C-style value types, explicit ownership functions, versioned structs, and clear lifetime rules. Internally, use modern C++ aggressively. At the boundary, be conservative because consumers pay for your binary ambiguity.
Anti-pattern: Public Binary Surface Mirrors Internal C++ Types
// Anti-pattern: fragile ABI surface for a shared library.
class EXPORT Session {
public:
virtual std::string send(const std::string& request) = 0;
virtual ~Session() = default;
};
std::unique_ptr<Session> create_session();
This interface is attractive inside one build. As a public SDK boundary it is risky.
std::string representation and allocator behavior are implementation details. std::unique_ptr bakes in deleter and runtime assumptions. Virtual dispatch across the boundary ties host and consumer to compatible object model details. Exceptions may also leak unless documented and controlled. The interface has effectively made your compiler, standard library, and build flags part of the contract.
For a true cross-binary boundary, a versioned C ABI is often safer.
struct session_v1;
struct request_buffer {
const std::byte* data;
std::size_t size;
};
struct response_buffer {
const std::byte* data;
std::size_t size;
};
struct session_api_v1 {
std::uint32_t struct_size;
session_v1* (*create)() noexcept;
void (*destroy)(session_v1*) noexcept;
status_code (*send)(session_v1*, request_buffer, response_buffer*) noexcept;
void (*release_response)(response_buffer*) noexcept;
};
This is less pretty and much more honest. The boundary names allocation ownership, versioning surface, and error transport explicitly. Internally, the implementation can still use std::expected, std::pmr, coroutines, modules, and any C++23 technique that stays behind the wall.
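To show the boundary actually holding, here is a sketch of the implementation side of such a table. The boundary structs mirror the declarations above; the status_code values, the impl::Session class, and the function names are illustrative assumptions, not from a real SDK:

```cpp
// Sketch: implementing the v1 boundary over internal C++ (names assumed).
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>
#include <string>
#include <string_view>

enum status_code : std::uint32_t { sc_ok = 0, sc_error = 1 };
struct request_buffer { const std::byte* data; std::size_t size; };
struct response_buffer { const std::byte* data; std::size_t size; };

namespace impl {
class Session {                                  // modern C++ stays behind the wall
public:
    std::string send(std::string_view request) {
        return std::string(request) + "/ok";     // stand-in for real work
    }
};
} // namespace impl

struct session_v1 { impl::Session inner; };      // opaque to consumers

session_v1* session_create() noexcept { return new (std::nothrow) session_v1{}; }
void session_destroy(session_v1* s) noexcept { delete s; }

status_code session_send(session_v1* s, request_buffer req,
                         response_buffer* out) noexcept {
    if (s == nullptr || out == nullptr) return sc_error;
    try {
        std::string result = s->inner.send(
            {reinterpret_cast<const char*>(req.data), req.size});
        auto* copy = new std::byte[result.size()];   // library-side allocation
        std::memcpy(copy, result.data(), result.size());
        *out = {copy, result.size()};
        return sc_ok;
    } catch (...) {
        return sc_error;                             // exceptions never cross
    }
}

void session_release_response(response_buffer* r) noexcept {
    delete[] r->data;                                // freed where it was allocated
    *r = {nullptr, 0};
}
```

The consumer never sees sizeof(session_v1), never frees library memory with its own allocator, and never observes a C++ exception. All three properties come from the boundary shape, not from documentation.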
The Pimpl Tradeoff Still Exists
For C++ consumers within one toolchain family, the pimpl pattern still has a place. It can reduce rebuild fanout, hide private members, and preserve class size across some implementation changes. It also adds indirection, allocation, and complexity. Pimpl is not a free modernization badge.
Use it when all of the following are true:
- You need to hide representation or reduce compile-time exposure.
- The object is not so hot that another pointer chase is a measurable problem.
- The library truly benefits from keeping class layout stable.
Do not reach for pimpl just because headers are messy. Modules may solve that source problem better for internal builds. Pimpl is primarily a representation and compatibility tool, not a style requirement.
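For reference, the minimal pimpl shape looks like this, collapsed here into one listing; Widget and its members are illustrative:

```cpp
// widget.h — public header: no private members exposed, class size is one pointer.
#include <memory>

class Widget {
public:
    Widget();
    ~Widget();                        // defined in the .cpp, where Impl is complete
    Widget(Widget&&) noexcept;
    Widget& operator=(Widget&&) noexcept;
    int area() const;
private:
    struct Impl;                      // incomplete here; layout hidden from consumers
    std::unique_ptr<Impl> impl_;      // the extra pointer chase the text warns about
};

// widget.cpp — representation can change without touching the header.
struct Widget::Impl {
    int width = 3;
    int height = 4;
};
Widget::Widget() : impl_(std::make_unique<Impl>()) {}
Widget::~Widget() = default;          // must live where Impl is complete
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;
int Widget::area() const { return impl_->width * impl_->height; }
```

Note the destructor and move operations are declared in the header but defined in the source file: std::unique_ptr cannot destroy an incomplete type, so the special members must be emitted where Impl is visible.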
Modules in Real Build Systems
C++23-first advice has to stay realistic. Modules are valuable but still operationally uneven across toolchains, package managers, and mixed-language build systems.
Inside a controlled build environment with GCC 14+, Clang 18+, or MSVC 17.10+, modules can reduce parse overhead and make dependency intent clearer. In heterogeneous environments, the module artifact model, build graph integration, and package-manager support may still impose friction. That friction is not an argument against modules. It is a reminder that adoption belongs to build architecture, not just to language enthusiasm.
A good default is pragmatic:
- Use modules first for internal components built together.
- Avoid making public package consumption depend on module support unless your consumer ecosystem is controlled.
- Keep the binary contract decision separate from the module decision.
The examples/web-api/ project demonstrates this pragmatic approach. Its eight .cppm files (error, task, json, http, repository, handlers, middleware, router) form a clear module dependency graph built together from one CMakeLists. Standard library headers remain in the global module fragment because they are not yet modularized on all toolchains. The project does not attempt to export its modules as a public package – it uses modules to organize its own source. That is the right starting point.
Versioning Policy Is Part of the Interface
Packaging without versioning policy is wishful thinking. Consumers need to know what kind of change is allowed between releases: source-compatible only, ABI-compatible within a major version, or no promises beyond exact build matching.
This policy affects technical design. If ABI compatibility within a major version matters, your public types must be narrowed aggressively and your rollout process must include ABI review. If consumers rebuild from source, the policy can be looser and the interfaces more idiomatic.
Versioning goes beyond semantic version numbers. It includes symbol versioning where available, inline namespace strategy for source-level APIs, feature detection, deprecation windows, and package metadata that correctly describes compiler and runtime requirements.
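The inline namespace strategy deserves a short illustration, because the mechanism is easy to misremember. A sketch, with the library and function names assumed:

```cpp
// Sketch: inline namespace as a versioning seam (mylib and parse are illustrative).
#include <string>
#include <string_view>

namespace mylib {
inline namespace v2 {                 // current default: mylib::parse is mylib::v2::parse
std::string parse(std::string_view s) { return std::string(s) + "/v2"; }
}
namespace v1 {                        // older behavior remains reachable explicitly
std::string parse(std::string_view s) { return std::string(s) + "/v1"; }
}
}
```

Because the inline namespace participates in name mangling, bumping v2 to v3 changes the exported symbols. Consumers built against the old symbols fail at link time instead of silently binding to incompatible code, which is exactly the kind of failure a versioning policy should prefer.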
Memory, Exceptions, and Ownership Across the Boundary
Most cross-library failures are not glamorous. They come from ownership mismatches.
If one side allocates memory and the other deallocates it, the allocator contract must be explicit. If exceptions are allowed to cross the boundary, runtime and compiler assumptions must align. If the boundary uses callbacks, retention and thread-affinity rules must be documented. If background work continues during unload, the packaging design is already unsafe.
// Anti-pattern: cross-boundary allocation mismatch.
// Library (built with MSVC debug runtime, uses debug heap):
EXPORT char* get_name() {
char* buf = new char[64];
std::strcpy(buf, "session-001");
return buf;
}
// Consumer (built with MSVC release runtime, uses release heap):
void use_library() {
char* name = get_name();
// ...
delete[] name; // CRASH: freeing debug-heap memory on release heap
}
The fix is to never let allocation and deallocation cross the boundary. The library that allocates must also provide the deallocation function, or the boundary must use caller-provided buffers.
// Safe: library owns both allocation and deallocation.
EXPORT char* get_name();
EXPORT void free_name(char* name);
// Also safe: caller provides the buffer.
EXPORT status_code get_name(char* buffer, std::size_t buffer_size);
These are the details that turn a locally clean interface into an operable library.
Verification and Review Questions
Review packaging and ABI separately from source-level API quality.
- Is this library intended for same-build consumption, source consumption, or independently upgradeable binary consumption?
- Are we using modules to improve source hygiene while mistakenly assuming they solve ABI?
- Does the public boundary expose types whose layout or runtime behavior we do not actually control?
- Is allocation ownership explicit across the boundary?
- Are versioning and compatibility promises documented and testable?
- Would a C ABI plus opaque handles be safer for this plugin or SDK than exported C++ classes?
Verification should include build-matrix testing across supported compilers and standard libraries, symbol visibility inspection, ABI comparison tooling where relevant, and packaging tests that simulate real consumer integration. A unit test suite inside the producer repository is not enough evidence for a public binary contract.
Takeaways
Modules, packaging, and ABI are three different design axes.
Use modules to improve source boundaries and build scalability. Choose packaging based on deployment and consumer constraints. Promise stable ABI only when you are willing to narrow the public boundary and verify it continuously. Inside the implementation, use modern C++23 freely. At true binary boundaries, prefer explicit ownership, explicit versioning, and conservative surface area.
The sharpest mistake in library design is exporting internal elegance as public binary policy. Source-level beauty does not make ABI risk disappear. Only deliberate boundary design does.
Shared state, synchronization, and contention
This chapter assumes you already reason precisely about ownership and invariants in single-threaded code. The question now is what survives once multiple threads can observe and mutate the same state.
The production problem
Most concurrency failures are not caused by missing primitives. They are caused by vague sharing policy.
A cache is “mostly reads.” A connection pool has “just one mutex.” A metrics registry uses atomics “for speed.” A request coordinator stores a few counters and a queue behind a lock and appears correct in code review. Then production load arrives. Lock hold time couples unrelated work. A reader observes half-updated state because the invariant spans two fields. A background cleanup path waits under the same mutex as the hot path. Throughput collapses before anyone sees a crash, and once crashes do appear they are often data-race UB, deadlock, or starvation rather than a clean failure.
This chapter is about the shape of shared mutable state in production C++23 systems. The core question is not which mutex type to memorize. It is how much state should be shared at all, which invariants require synchronization, how contention appears under real traffic, and what reviewable designs look like when sharing is unavoidable.
Keep the boundary with the next chapters clear. This chapter is about simultaneous access to shared mutable state. Chapter 13 is about coroutine lifetime across suspension. Chapter 14 is about orchestrating groups of tasks, cancellation, and backpressure. Those topics interact, but they are not interchangeable.
Shared mutable state is a cost center
Shared state buys coordination convenience at the price of coupling.
Once two threads can mutate the same object graph, local reasoning stops being enough. Every access now depends on synchronization policy, lock ordering, memory ordering, wakeup behavior, and destruction timing. Reviewers must ask not only whether an operation is correct in isolation, but whether the state can be observed between two steps, whether a callback re-enters under a lock, whether waiting code holds resources needed for progress, and whether contention turns a correct design into a slow one.
That cost means the first concurrency decision should be structural:
- Can this state become thread-confined instead?
- Can updates be batched, sharded, snapshotted, or message-passed?
- If sharing is unavoidable, what invariant must be preserved atomically?
Teams often skip directly to lock selection. That is backward. The expensive choice is the sharing topology, not the spelling of std::mutex.
Start by narrowing the sharing surface
The safest shared state is the state you never share.
In service code, many apparently shared structures can be split by traffic key, request lifetime, or ownership role. Metrics ingestion can aggregate per thread and merge periodically. A session table can be sharded by session identifier. A cache can separate immutable value blobs from a small mutable index. A queue consumer can own its local work buffer and publish only completed snapshots.
These moves matter more than switching from one primitive to another because they reduce the number of places where correctness depends on interleaving.
Three reductions are especially common in production systems:
Thread confinement
Let one thread or one executor own mutation. Everyone else communicates by message, snapshot, or handoff. This is often the simplest answer for request schedulers, connection managers, and event loops. The benefit is not just fewer locks; the invariants stay local.
Sharding
Partition state by a stable key so that contention is proportional to hot-key concentration rather than total traffic. Sharding does not remove synchronization, but it narrows the blast radius of each critical section.
Snapshotting
If readers dominate and they can tolerate slightly stale data, publish immutable snapshots and update them off to the side. Readers get cheap, stable access. Writers pay the complexity cost once.
None of these is free. Confinement can create bottlenecks. Sharding complicates cross-shard operations. Snapshotting increases allocation and copy cost. But those are explicit costs, which is better than paying accidental contention everywhere.
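As a concrete shape for the snapshotting reduction, here is a minimal sketch (ConfigStore is an assumed name, not from the example project). Readers copy one shared_ptr under a short lock and then read with no synchronization at all; writers rebuild off to the side and publish by pointer swap:

```cpp
// Sketch: snapshot publication — readers share an immutable map, writers swap it.
#include <map>
#include <memory>
#include <mutex>
#include <string>

class ConfigStore {
public:
    using Map = std::map<std::string, std::string>;

    // Readers: grab the current snapshot; afterwards no lock is needed.
    std::shared_ptr<const Map> snapshot() const {
        std::scoped_lock lock(mtx_);              // held only for one pointer copy
        return current_;
    }

    // Writers: pay the full copy cost once, then publish under the short lock.
    void set(std::string key, std::string value) {
        auto next = std::make_shared<Map>(*snapshot());   // rebuild off to the side
        (*next)[std::move(key)] = std::move(value);
        std::scoped_lock lock(mtx_);
        current_ = std::move(next);               // old snapshots stay valid for readers
    }

private:
    mutable std::mutex mtx_;
    std::shared_ptr<const Map> current_ = std::make_shared<Map>();
};
```

Note the cost model stated in the text made explicit: every write copies the whole map, and readers may keep a stale snapshot alive. Both are deliberate, visible prices rather than accidental contention.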
What happens without synchronization
Before discussing which primitives to use, it is worth seeing what happens when they are absent. The following code has a data race, which is undefined behavior in C++.
// BUG: data race — two threads read and write counter without synchronization.
struct metrics {
int request_count = 0;
int error_count = 0;
};
metrics g_metrics;
void record_request(bool success) {
++g_metrics.request_count; // unsynchronized read-modify-write
if (!success) ++g_metrics.error_count; // same
}
This is not just a correctness risk; it is undefined behavior per the standard. The compiler and hardware are free to reorder, tear, or elide these operations. In practice, counters lose updates, report impossible values, or are optimized away entirely because the compiler assumes no concurrent access. ThreadSanitizer will flag the race immediately, but sanitizers are not always running in production.
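For counters that really are independent facts, the mechanical fix is per-counter atomics. A sketch of the repaired version:

```cpp
// Fix sketch: each counter is an independent narrow fact; relaxed atomics suffice.
#include <atomic>
#include <cstdint>

struct metrics {
    std::atomic<std::uint64_t> request_count{0};
    std::atomic<std::uint64_t> error_count{0};
};

metrics g_metrics;

void record_request(bool success) {
    // fetch_add is one atomic read-modify-write: no torn or lost updates.
    g_metrics.request_count.fetch_add(1, std::memory_order_relaxed);
    if (!success)
        g_metrics.error_count.fetch_add(1, std::memory_order_relaxed);
}
```

The caveat: relaxed ordering gives no cross-field consistency. A reader may momentarily observe the error increment before the matching request increment. That is fine for monitoring counters and wrong for any invariant that spans both fields.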
A subtler variant involves multi-field invariants:
// BUG: readers can observe state_ == READY while payload_ is half-written.
struct shared_result {
std::string payload_;
enum { EMPTY, READY } state_ = EMPTY;
};
// Writer thread:
result.payload_ = build_payload(); // not yet visible to readers
result.state_ = READY; // may be reordered before payload_ write
// Reader thread:
if (result.state_ == READY)
process(result.payload_); // may see partially constructed string
Even if state_ were atomic, the write to payload_ could be reordered past it without an appropriate memory order. The lesson: data races are not just about single variables. They are about the visibility ordering of related mutations.
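The repaired shape makes the flag atomic and uses release/acquire ordering so the payload write is guaranteed visible before any reader observes the READY state. A sketch:

```cpp
// Fix sketch: release/acquire ordering publishes payload_ before state_ flips.
#include <atomic>
#include <string>

struct shared_result {
    std::string payload_;
    std::atomic<int> state_{0};      // 0 = EMPTY, 1 = READY
};

void write_result(shared_result& r, std::string p) {
    r.payload_ = std::move(p);                        // ordered before the store below
    r.state_.store(1, std::memory_order_release);     // publishes payload_
}

bool try_read(shared_result& r, std::string& out) {
    if (r.state_.load(std::memory_order_acquire) != 1) // pairs with the release store
        return false;
    out = r.payload_;                                  // guaranteed fully constructed
    return true;
}
```

This works only because the protocol is one-shot: one writer, one transition, readers that never mutate. The moment a second writer or a reset path appears, the design is back in mutex territory.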
Raw mutex misuse vs. scoped guards
Manual lock/unlock is the oldest source of mutex bugs. Consider:
// BUG: exception between lock and unlock leaks the lock.
std::mutex mtx;
std::vector<int> data;
void push(int value) {
mtx.lock();
data.push_back(value); // may throw (allocation failure)
mtx.unlock(); // never reached if push_back throws — deadlock on next access
}
If push_back throws, unlock() is skipped. Every subsequent thread that tries to acquire mtx will block forever. This is not hypothetical; allocation failure under memory pressure or a throwing copy constructor will trigger it.
The fix is mechanical: use RAII guards.
void push(int value) {
std::scoped_lock lock(mtx);
data.push_back(value); // if this throws, ~scoped_lock releases the mutex
}
std::scoped_lock handles single and multiple mutexes with deadlock avoidance. std::unique_lock adds the ability to defer locking, transfer ownership, and use condition variables. Prefer scoped_lock unless you need the extra flexibility.
// unique_lock: needed when the lock must be released before scope exit.
void transfer_expired(registry& reg, std::vector<session>& out) {
std::unique_lock lock(reg.mutex_);
auto expired = reg.extract_expired(); // modifies registry under lock
lock.unlock(); // release before expensive cleanup
for (auto& s : expired)
s.close_socket(); // no lock held — safe to block
// out is caller-owned, no synchronization needed
out.insert(out.end(),
std::make_move_iterator(expired.begin()),
std::make_move_iterator(expired.end()));
}
Deadlock from inconsistent lock ordering
When code acquires multiple mutexes, inconsistent ordering is the classic deadlock source.
// BUG: deadlock if thread 1 calls transfer(a, b) while thread 2 calls transfer(b, a).
struct account {
std::mutex mtx;
int balance = 0;
};
void transfer(account& from, account& to, int amount) {
std::lock_guard lock_from(from.mtx); // locks 'from' first
std::lock_guard lock_to(to.mtx); // then 'to' — opposite order on another thread
from.balance -= amount;
to.balance += amount;
}
Thread 1 locks a.mtx and waits for b.mtx. Thread 2 locks b.mtx and waits for a.mtx. Neither can proceed. std::scoped_lock solves this by using std::lock internally to acquire both mutexes with deadlock avoidance:
void transfer(account& from, account& to, int amount) {
std::scoped_lock lock(from.mtx, to.mtx); // deadlock-free acquisition
from.balance -= amount;
to.balance += amount;
}
This is a correctness boundary, not just a convenience. Any design requiring multiple mutexes should either use std::scoped_lock for simultaneous acquisition or enforce a documented total ordering on locks. Ad hoc ordering disciplines rarely survive refactoring.
The performance cost of over-synchronization
Contention is not just about correctness. Excessive locking serializes work that could run in parallel.
// Over-synchronized: every stat update contends on one lock.
class request_stats {
std::mutex mtx_;
uint64_t total_requests_ = 0;
uint64_t total_bytes_ = 0;
uint64_t error_count_ = 0;
public:
void record(uint64_t bytes, bool error) {
std::scoped_lock lock(mtx_);
++total_requests_;
total_bytes_ += bytes;
if (error) ++error_count_;
}
};
On a 64-core machine handling millions of requests per second, every thread serializes on one cache line. Lock acquisition, cache-line bouncing, and scheduler wakeups dominate. The better design depends on tolerances:
- If exact consistency between fields is unnecessary, use per-thread counters and merge periodically.
- If only approximate totals are needed, use std::atomic<uint64_t> with memory_order_relaxed for each counter independently.
- If cross-field consistency is required (e.g., error rate = errors / total), keep the mutex but shard by thread or request key.
The point is not that mutexes are slow. It is that one mutex shared across all cores turns a parallel workload into a serial bottleneck. Measure lock hold time and wait time separately; high wait time with low hold time is the signature of over-synchronization.
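One way to implement the sharded-counter suggestion looks like this. The shard count, the alignment value, and the thread-to-shard mapping are illustrative choices, not tuned values:

```cpp
// Sketch: sharded counter — threads update separate cache lines, readers merge.
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

class sharded_counter {
    struct alignas(64) shard {                  // one cache line per shard
        std::atomic<std::uint64_t> value{0};
    };
    static constexpr std::size_t kShards = 16;  // illustrative; tune to core count
    std::array<shard, kShards> shards_;

    static std::size_t my_shard() {
        // Hash the thread id; a real implementation might cache a TLS index.
        return std::hash<std::thread::id>{}(std::this_thread::get_id()) % kShards;
    }

public:
    void add(std::uint64_t n) {
        shards_[my_shard()].value.fetch_add(n, std::memory_order_relaxed);
    }
    std::uint64_t total() const {               // approximate while writers run
        std::uint64_t sum = 0;
        for (const auto& s : shards_) sum += s.value.load(std::memory_order_relaxed);
        return sum;
    }
};
```

Each thread mostly touches its own cache line, so the hot path has no shared write contention. The price is that total() is a merge, not a snapshot: it is exact only once writers quiesce.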
Design around invariants, not fields
Locks do not protect variables. They protect invariants.
That distinction matters because production objects rarely fail at the field level. They fail when multiple fields must change together and one thread can observe the state between those changes.
A connection pool is not correct because available_count is atomic. It is correct if the following relationship always holds under concurrent access: checked-out connections are not also on the idle list, closed connections are not reissued, and waiters are woken when progress becomes possible. Those are invariant statements. If the design does not name them explicitly, the synchronization boundary is already underspecified.
This is where coarse-grained locking sometimes wins. If one mutex cleanly covers one invariant domain, that may be a better design than several finer locks that allow impossible intermediate states or require fragile lock ordering. Fine-grained locking is not advanced by default. It is often just harder to review.
Anti-pattern: one lock around a growing service object
The most common failure shape is not “no synchronization.” It is “one reasonable lock that gradually became the service boundary.”
// Anti-pattern: one mutex protects unrelated invariants and long operations.
class session_registry {
public:
std::optional<session_info> find(session_id id) {
std::scoped_lock lock(mutex_);
auto it = sessions_.find(id);
if (it == sessions_.end()) {
return std::nullopt;
}
return it->second;
}
void expire_idle_sessions(std::chrono::steady_clock::time_point now) {
std::scoped_lock lock(mutex_);
for (auto it = sessions_.begin(); it != sessions_.end();) {
if (it->second.expires_at <= now) {
close_socket(it->second.socket); // RISK: blocking work under the lock.
it = sessions_.erase(it);
} else {
++it;
}
}
}
private:
std::mutex mutex_;
std::unordered_map<session_id, session_info> sessions_;
};
This object may survive early testing because it is locally simple. It fails later because unrelated work now shares one queueing point. A read blocks on cleanup. Cleanup holds the mutex while doing I/O. Future features will add metrics, callbacks, and logging inside the same critical section because the lock already exists.
The problem is not just duration of the critical section. It is that the object has no explicit invariant boundaries. Lifetime management, lookup, expiration, and side effects have been collapsed into one synchronization domain.
The better direction is usually to separate state transition from external action: identify which sessions should expire under the lock, move them out, release the lock, then close sockets afterward. That shortens lock scope and makes the invariant easier to state: the protected region updates the registry; external cleanup happens after ownership has been transferred out of the shared structure.
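That repair can be sketched directly against the anti-pattern above. The surrounding types are stubbed here so the shape stands alone; only expire_idle_sessions carries the point:

```cpp
// Sketch: state transition under the lock, external action after it (stub types).
#include <chrono>
#include <mutex>
#include <unordered_map>
#include <vector>

using session_id = int;
struct session_info {
    std::chrono::steady_clock::time_point expires_at;
    int socket = -1;
};
inline int g_closed = 0;
void close_socket(int) { ++g_closed; }        // stub for blocking cleanup

class session_registry {
public:
    void insert(session_id id, session_info info) {
        std::scoped_lock lock(mutex_);
        sessions_[id] = info;
    }
    void expire_idle_sessions(std::chrono::steady_clock::time_point now) {
        std::vector<session_info> expired;    // ownership leaves the registry
        {
            std::scoped_lock lock(mutex_);    // lock covers only the state change
            for (auto it = sessions_.begin(); it != sessions_.end();) {
                if (it->second.expires_at <= now) {
                    expired.push_back(it->second);
                    it = sessions_.erase(it);
                } else {
                    ++it;
                }
            }
        }                                     // lock released before any I/O
        for (auto& s : expired)
            close_socket(s.socket);           // blocking work, no lock held
    }
    std::size_t size() const {
        std::scoped_lock lock(mutex_);
        return sessions_.size();
    }
private:
    mutable std::mutex mutex_;
    std::unordered_map<session_id, session_info> sessions_;
};
```

The invariant is now stateable in one sentence: while the mutex is held, the map is consistent; once a session leaves the map, exactly one caller owns its cleanup.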
Minimize lock scope, but do not split logic blindly
“Keep lock scope small” is correct and incomplete.
A critical section should contain exactly the work required to preserve the invariant, no more and no less. That means:
- No blocking I/O under the lock.
- No callbacks into foreign code under the lock.
- No allocation-heavy or logging-heavy slow paths under the lock if they can be moved out.
- No splitting of logically atomic state updates merely to make the scope visually shorter.
The last point is where teams get into trouble. A lock that protects a multi-step invariant may need to span several operations. If you release it between steps to look “faster,” you may create impossible states. Optimize after you can state what must remain atomic.
Atomics are for narrow facts, not complex ownership
Atomics are useful and easy to misuse.
Use atomics when the shared fact is truly narrow: a stop flag, a generation counter, an index into a ring buffer, a reference count in an already-sound ownership model, or a statistics counter where relaxed ordering is sufficient. Avoid using atomics as a substitute for structured ownership or for multi-field invariants.
The example project’s TaskRepository (examples/web-api/src/modules/repository.cppm) illustrates the distinction. The ID generator is a single monotonic counter — a textbook narrow fact — so it uses std::atomic<TaskId> with memory_order_relaxed. The task collection, on the other hand, is a multi-field invariant (the vector contents must be consistent with the IDs issued), so it is protected by a shared_mutex. Mixing those two strategies in one class is perfectly sound because the scopes do not overlap: the atomic handles one independent fact, the mutex handles everything else.
An atomic counter does not make a queue safe. An atomic pointer does not make object lifetime trivial. A handful of memory_order arguments does not fix a design that lets one thread observe partially published state.
C++23 gives useful tools here, including std::atomic::wait and notify_one or notify_all. They can remove some condition-variable boilerplate for narrow state transitions. They do not change the need to design the state machine first.
If a reviewer cannot explain which value transitions are legal and why the chosen ordering is sufficient, the atomic code is not done.
Reader-heavy data wants different shapes
Contention is often caused less by write frequency than by read design.
A configuration table, routing map, or feature-policy snapshot may be read on every request and updated rarely. Protecting that with a central mutex works functionally and creates avoidable tail latency. In these cases, immutable snapshots or copy-on-write style publication often produce a better system than finer locking.
The tradeoff is explicit:
- Readers get stable, low-contention access.
- Writers pay copy and publication cost.
- Memory pressure may increase due to overlapping generations.
- Staleness must be acceptable for the domain.
That is often the right trade in request routing, authorization policy, and read-mostly metadata. It is the wrong trade for highly write-heavy order books or frequently mutating shared indexes.
When neither extreme applies — reads are frequent but writes happen on every create, update, or delete request — std::shared_mutex with std::shared_lock for readers and std::scoped_lock for writers is the pragmatic middle ground. The example project’s TaskRepository (examples/web-api/src/modules/repository.cppm) follows exactly this pattern:
// repository.cppm — reader-writer locking in practice
class TaskRepository {
    mutable std::shared_mutex mutex_;
    std::vector<Task> tasks_;
    std::atomic<TaskId> next_id_{1};

public:
    // Reads take shared_lock — multiple readers proceed in parallel.
    [[nodiscard]] std::optional<Task> find_by_id(TaskId id) const {
        std::shared_lock lock{mutex_};
        auto it = std::ranges::find(tasks_, id, &Task::id);
        if (it == tasks_.end()) return std::nullopt;
        return *it;
    }

    // Writes take scoped_lock — exclusive access preserves invariants.
    [[nodiscard]] Result<Task> create(Task task) {
        std::scoped_lock lock{mutex_};
        // validate, assign id, store ...
    }
};
Every read path (find_by_id, find_all, find_completed, size) acquires a shared_lock, allowing concurrent readers. Every write path (create, update, remove) acquires a scoped_lock for exclusive access. The mutex protects the invariant domain — the relationship between tasks_ and next_id_ — not individual fields.
The corresponding stress test (examples/web-api/tests/test_repository.cpp) validates this design under concurrent load:
void test_concurrent_access() {
    webapi::TaskRepository repo;
    constexpr int num_threads = 8;
    constexpr int ops_per_thread = 100;

    std::vector<std::jthread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&repo, i]() {
            for (int j = 0; j < ops_per_thread; ++j) {
                auto title = std::format("Task-{}-{}", i, j);
                auto result = repo.create(webapi::Task{.title = std::move(title)});
                assert(result.has_value());
            }
        });
    }
    threads.clear(); // jthreads auto-join

    assert(repo.size() == num_threads * ops_per_thread);
}
Eight threads hammering create concurrently, then verifying the total count matches expectations. This is a baseline correctness test, not a contention benchmark — but it catches data races and lost updates that would surface under ThreadSanitizer.
Condition variables and wakeup discipline
Condition variables are where many otherwise careful designs become hand-wavy.
The rule is simple: the wait predicate is part of the invariant, not a convenience expression. A waiting thread must re-check a predicate that is protected by the same synchronization domain that makes the predicate meaningful. Notifications are signals to re-check, not proofs that progress is guaranteed.
In practical terms:
- Name the predicate precisely: queue not empty, shutdown requested, capacity available, generation changed.
- Update the predicate state before notifying.
- Keep waiting code robust against spurious wakeups and shutdown races.
- Decide whether waking one waiter or all waiters matches the progress model.
Most broken condition-variable code is not broken because the author forgot the loop. It is broken because the predicate is underspecified or split across state that different code paths update inconsistently.
Hidden shared state is still shared state
Concurrency bugs often hide in objects that do not look shared from the call site.
Examples include:
- Allocators or memory resources used by many threads.
- Logging sinks with internal buffers.
- Reference-counted handles with shared control blocks.
- Caches behind seemingly pure helper APIs.
- Global registries used for plugin discovery, metrics, or tracing.
These deserve the same scrutiny as explicit shared maps and queues. “This helper is thread-safe” is not enough. Ask whether it serializes all callers, whether it allocates under contention, whether it can call user code while holding internal locks, and whether it introduces contention in the hot path without making that cost visible in the API.
Measuring contention changes the design
Correctness is only the first gate. After that, shared-state design is a measurement problem.
Contention rarely appears as a clear source-level smell. It shows up as queueing time, lock hold distributions, convoy behavior, cache-line bouncing, and scheduler-visible stalls. That means verification must include operational evidence:
- Measure lock hold time and wait time on hot paths.
- Track tail latency, not only throughput averages.
- Observe hot-key skew when using sharding.
- Profile allocation inside critical sections.
- Use ThreadSanitizer for race detection and targeted stress tests for deadlock and starvation patterns.
A design that is logically correct but collapses at the ninety-ninth percentile is still a bad concurrency design.
Review questions for shared state
Before approving concurrent shared-state code, ask:
- What exact invariant does each lock or atomic protect?
- Could this state be confined, sharded, or snapshotted instead of shared?
- Does any critical section perform I/O, allocation-heavy work, logging, or callbacks?
- Are cross-field updates truly atomic with respect to observers?
- Are condition-variable predicates precise and updated under the right synchronization domain?
- Where will contention appear under burst load or hot-key skew?
- What evidence do we have beyond “it passed tests”?
If those questions do not have crisp answers, the design is not ready for production load.
Takeaways
Shared mutable state is not the default shape of concurrent design. It is the expensive shape.
When sharing is unavoidable, define invariants before choosing primitives. Prefer confinement, sharding, and snapshots over ever more clever locking. Use mutexes to protect invariant domains, atomics for narrow facts, and condition variables only with clearly stated predicates. Then measure the result under realistic load, because correct synchronization can still produce the wrong system if contention dominates behavior.
Coroutines, tasks, and suspension boundaries
This chapter assumes you already understand failure transport and concurrent shared-state design. The focus here is narrower: what a coroutine owns, what survives suspension, and what goes wrong when asynchronous control flow hides lifetime.
The production problem
Coroutines make asynchronous code easier to read and easier to lie about.
A request handler that previously nested callbacks can become straight-line code. A streaming parser can yield values naturally. A background refresh job can co_await timers and I/O instead of hand-writing a state machine. Those are real gains. But the machinery did not disappear. The state machine still exists. It now lives in a coroutine frame whose lifetime, ownership, and resumption context must be designed on purpose.
Production failures with coroutines usually have one of four shapes:
- Borrowed data outlives its source across suspension.
- A task has no clear owner, so work outlives the component that started it.
- Failure and cancellation paths are implicit, so suspended work resumes into invalid assumptions.
- Execution hops across threads or executors in ways the code does not make obvious.
This chapter keeps the scope local. The question is not yet how a whole task tree should be managed under cancellation pressure. That is Chapter 14. The question here is what each coroutine actually is: a resource-owning object with suspension points that define lifetime boundaries.
What coroutines replace: callback hell and manual state machines
To appreciate coroutine design tradeoffs, see what they displace. Pre-coroutine asynchronous code relies on continuation-passing style, where each step chains a callback into the next. A simple “fetch, validate, store” sequence looks like this:
// Continuation-passing style — correct but unreadable at scale.
void handle_request(request req, std::function<void(response)> done) {
    fetch_profile(req.user_id, [req, done](std::expected<profile, error> prof) {
        if (!prof) { done(error_response(prof.error())); return; }
        validate_access(prof->role, req.resource,
            [req, prof = *prof, done](std::expected<bool, error> ok) {
                if (!ok || !*ok) { done(denied_response()); return; }
                store_audit_log(req, prof,
                    [req, prof, done](std::expected<void, error> result) {
                        if (!result) { done(error_response(result.error())); return; }
                        done(success_response(prof));
                    });
            });
    });
}
Every step nests deeper. Error handling is duplicated at each level. Lifetime of captured values must be managed manually — capture by value inflates copies, capture by reference invites dangling. Adding timeout, cancellation, or retry logic multiplies the nesting further. This is not a strawman; it is the shape of real pre-coroutine async C++ in production codebases.
The coroutine equivalent:
task<response> handle_request(request req) {
    auto prof = co_await fetch_profile(req.user_id);
    if (!prof) co_return error_response(prof.error());

    auto ok = co_await validate_access(prof->role, req.resource);
    if (!ok || !*ok) co_return denied_response();

    auto result = co_await store_audit_log(req, *prof);
    if (!result) co_return error_response(result.error());

    co_return success_response(*prof);
}
Sequential reading, single error path per step, no nesting. The improvement is real. But the state machine did not vanish — it moved into the coroutine frame. The rest of this chapter is about what that means for ownership and lifetime.
The example project’s handler layer (examples/web-api/src/modules/handlers.cppm) shows a synchronous version of this same structural benefit. Each handler is a function that accepts const http::Request& (borrowed) and returns http::Response (owned). The control flow is straight-line: parse the path parameter, validate input, call the repository, translate the result to HTTP. Error handling is local to each step, not nested inside callbacks. While these handlers are not coroutines, they demonstrate the same principle — when each step is sequential and error paths are flat, business logic stays readable and reviewable.
A coroutine is a state machine with storage
Treating coroutines as syntax sugar is the fastest way to ship lifetime bugs.
When a function becomes a coroutine, some state moves into a frame. Parameters may be copied or moved there. Locals that live across suspension reside there. Awaiter state may influence when and where execution resumes. Destruction may happen on success, on failure, on cancellation, or when the owning task object is destroyed. None of that is cosmetic.
This matters because ordinary stack intuition stops being reliable. In a non-coroutine function, a local dies when control exits the scope. In a coroutine, a local that spans suspension may live much longer than the caller expects, while a borrowed view into caller-owned storage may become invalid long before resumption.
The core review question is simple: what data must remain valid from one suspension point to the next?
Suspension points are lifetime boundaries
Every co_await is a boundary where ordinary assumptions should be re-checked.
Before suspension, ask:
- Which references, spans, string views, iterators, and pointers will still be needed afterward?
- Who owns the storage they refer to?
- Can the awaited operation outlive the caller, request, or component that initiated it?
- On which executor or thread will resumption occur?
This is the local equivalent of API lifetime review. If the coroutine keeps borrowed data across suspension, you must prove that the owner outlives the coroutine or change the coroutine to take ownership.
That is why coroutine APIs often need stricter parameter choices than synchronous ones. A synchronous helper might safely accept std::string_view because it finishes immediately. An asynchronous task that suspends usually should not keep that view unless the ownership contract is extremely tight and documented.
The example project’s Request::path_param_after() (examples/web-api/src/modules/http.cppm) illustrates the safe side of this boundary. It returns std::optional<std::string_view> pointing into the request’s path member. That is safe here because the handler executes synchronously — the Request object outlives the entire handler call. If these handlers were coroutines that suspended mid-execution, the same string_view would become a dangling reference the moment the request buffer was recycled. The design works precisely because the lifetime contract is simple: the request lives for the duration of the handler, and handlers do not suspend.
Anti-pattern: borrowed request state survives suspension
// Anti-pattern: borrowed data may dangle after suspension.
task<parsed_request> parse_and_authorize(
    std::string_view body,
    const auth_context& auth) {
    auto token = co_await fetch_access_token(auth.user_id());
    co_return parse_request(body, token); // BUG: body may refer to caller-owned storage.
}
This code looks efficient because it avoids a copy. It is only correct if the caller guarantees that body remains valid until the coroutine completes. In service code that often means until network I/O, authentication lookup, retries, and timeout handling have all finished. That is not a small promise. It is usually the wrong one.
The safer default is to move ownership into the coroutine when the data is needed after suspension.
task<parsed_request> parse_and_authorize(
    std::string body,
    auth_context auth) {
    auto token = co_await fetch_access_token(auth.user_id());
    co_return parse_request(body, token);
}
The copy or move is visible and reviewable. The coroutine frame now owns what it needs. If that allocation cost matters, measure it and redesign around message boundaries or storage reuse. Do not silently borrow across time.
More lifetime traps: locals, temporaries, and lambda captures
The borrowed-parameter anti-pattern above is the most common case, but coroutine lifetime bugs take other forms that deserve explicit attention.
Dangling reference to a caller’s local
// BUG: coroutine captures a reference to a local that dies when the caller returns.
task<void> start_processing(dispatcher& d) {
    std::vector<record> batch = build_batch();
    co_await d.schedule([&batch] { // lambda captures batch by reference
        process(batch);            // batch may be destroyed if start_processing
    });                            // is suspended and its caller exits
}
When start_processing suspends at co_await, the coroutine frame keeps batch alive — but only if the frame itself is alive. If the task is detached or the parent scope exits, the frame is destroyed, and the lambda’s reference dangles. The fix: capture by value, or ensure the parent scope outlives the scheduled work through structured ownership.
Temporary lifetime collapse
// BUG: temporary string destroyed before coroutine body executes.
task<void> log_message(std::string_view msg);

void caller() {
    log_message("request started"s + request_id()); // temporary std::string
    // temporary is destroyed here, before the coroutine even begins if lazy-start
}
With a lazy-start coroutine, the temporary std::string is destroyed at the semicolon, but the coroutine has not yet executed. Even with eager-start coroutines, if the frame stores msg as a string_view, it points to freed memory after the first suspension. The solution is to accept std::string by value in the coroutine signature so the frame owns a copy.
The this pointer across suspension
// BUG: 'this' may dangle if the object is moved or destroyed while suspended.
class connection {
    std::string peer_addr_;
public:
    task<void> run() {
        auto data = co_await read_socket(); // suspended here
        log("received from " + peer_addr_); // 'this' may be invalid
    }
};
If a connection object is moved into another container, or destroyed while run() is suspended, this becomes invalid at resumption. Member coroutines are safe only when the object’s lifetime is guaranteed to exceed the coroutine’s. In practice, this often means the coroutine should take a shared_ptr<connection> or the owning scope must be structured to prevent destruction during suspension.
These are not exotic edge cases. They are the normal failure modes of coroutine lifetime in production code.
Task types are ownership contracts
The return type of a coroutine is not decoration. It defines ownership, result transport, and destruction behavior.
A useful task type answers at least these questions:
- Does destroying the task cancel work, detach it, block, or leak it?
- Is the result observed exactly once, many times, or not at all?
- How are exceptions transported?
- Can the task start eagerly, or only when awaited?
- Is cancellation represented explicitly?
Many coroutine bugs are really task-type bugs. A detached “fire-and-forget” coroutine is not an asynchronous style choice. It is an ownership claim that no later code needs to know when the work finishes, whether it failed, or whether it should be canceled during shutdown. That claim is rarely true in production services.
The conservative default is simple: every started task should have a clear owner and a visible completion path. If you cannot name the owner, you are building orphaned work.
Eager versus lazy start changes failure timing
Whether a coroutine begins running immediately or only when awaited affects correctness, not just performance.
An eager task may start side effects before the caller stores the handle. A lazy task may delay work until orchestration code decides where and when it should run. Both are valid. What matters is that the behavior is consistent and documented.
This influences failure boundaries. If task construction can start work, exceptions and cancellation may become observable before any parent scope thinks the task is “live.” If work starts only on first await or explicit scheduling, ownership is usually easier to reason about.
The recommendation for production code is not one universal policy. It is that the task abstraction must make the policy obvious enough that reviewers do not need to inspect promise-type internals to know when side effects begin.
Resumption context is part of correctness
Coroutine code often reads like it stays on one thread. That is an illusion.
An awaited operation may resume on an I/O thread, a scheduler pool, a UI-affine thread, or an executor chosen by the awaiter. If the code touches thread-confined state after resumption, or if it expects continuation on a specific executor, that requirement must be explicit in the abstraction.
This is where teams recreate callback-era bugs with prettier syntax. The control flow looks sequential, so reviewers stop asking where the continuation runs. Then a coroutine resumes on a pool thread and touches request-local state that was only safe on the initiating executor.
Make resumption policy visible in one of three ways:
- The task type carries a clear scheduler or executor contract.
- The code explicitly switches context before using thread-affine resources.
- The component is designed so post-await code is executor-agnostic.
If none is true, the coroutine is relying on ambient behavior. Ambient behavior breaks under refactoring.
Coroutines do not remove error-boundary design
A co_await does not answer whether failure should throw, return std::expected, request cancellation, or terminate a larger operation. It merely changes the control-flow shape.
That means the error-boundary decisions from Chapter 3 still apply. Keep them consistent inside task APIs:
- Use exceptions for domains where stack unwinding across internal layers is acceptable and well understood.
- Use result types when failure is expected, compositional, and part of ordinary business flow.
- Decide how cancellation appears in the value space, exception space, or task state.
- Make timeout handling explicit rather than burying it in an awaiter with surprising policy.
The failure model should be readable at the call site. “This coroutine may suspend” is not enough information. The caller also needs to know what completion means and what failed completion looks like.
Generators and tasks solve different problems
C++23 coroutine support enables both pull-style generators and asynchronous tasks. Do not blur them.
Generators are about staged production of values with a local consumer. They often work well for streaming parse pipelines, tokenization, batched traversal, or incremental transformation. Their main concerns are iterator validity, producer lifetime, and whether yielded references remain valid.
Tasks are about eventual completion of asynchronous work. Their concerns are ownership, scheduling, cancellation, and result transport.
They share machinery and deserve different review questions. A generator bug is often “what does this yielded reference point to?” A task bug is often “who owns this work after suspension?” Keeping those categories separate makes code review sharper.
Destruction and cancellation must compose
Coroutine cleanup paths are easy to ignore because the happy path looks linear.
Ask what happens if the owning scope exits while the coroutine is suspended. Does destruction request cancellation? Does it wait for child operations? Can it race with completion? Are outstanding registrations, file descriptors, timers, or buffers released exactly once?
These are not implementation details. They are the semantic contract of the task abstraction.
If coroutine destruction merely drops the handle while the underlying operation continues somewhere else, that is detach-by-destruction. Sometimes that is intentional. More often it is a shutdown bug waiting to happen.
Verification for coroutine code
Testing coroutine logic with only success-path unit tests is not enough. Verification should target boundary behavior:
- Lifetime tests that force the caller-owned data to disappear before resumption.
- Cancellation tests that interrupt the coroutine at multiple suspension points.
- Scheduler tests that resume on unexpected executors to catch thread-affinity assumptions.
- Failure-path tests for exceptions, error results, and timeout races.
- Sanitizer runs for use-after-free and race detection when coroutine state interacts with shared objects.
For high-value components, it is often worth writing deterministic test awaiters or fake schedulers so resumption order can be controlled instead of guessed.
Review questions for coroutines
Before approving coroutine code, ask:
- What lives in the coroutine frame, and who owns it?
- Which borrowed views or references survive across suspension?
- Who owns the task after it is started?
- When do side effects begin: on construction, on scheduling, or on first await?
- On what context does resumption happen?
- How are failure, timeout, and cancellation represented?
- What happens if the initiating component shuts down mid-suspension?
If those answers are vague, the coroutine is not simpler than callback code. It is just easier to misread.
Takeaways
Coroutines improve control-flow clarity. They do not remove lifetime design.
Treat every suspension point as a boundary where ownership, resumption context, and failure semantics must still make sense. Prefer task types with explicit ownership and completion behavior. Move data into the coroutine frame when it must survive suspension. Keep generators and asynchronous tasks conceptually separate. Most importantly, do not confuse sequential-looking source code with sequential lifetime. Coroutine correctness depends on what persists across time, not on how tidy the co_await chain looks.
Structured concurrency, cancellation, and backpressure
This chapter assumes you already understand local coroutine lifetime and suspension hazards. The focus now is system shape: how groups of tasks begin, end, fail, and apply pressure to each other under real load.
The production problem
Many asynchronous systems fail even when each individual task looks reasonable.
A request fans out to four backends and returns once three respond, but the fourth keeps running after the client disconnects. A worker pipeline accepts input faster than downstream storage can commit it, so memory usage climbs until the process is killed. A shutdown path waits forever because background tasks were detached instead of being part of a supervised tree. A retry storm consumes the very capacity needed for recovery. None of these is primarily a local coroutine bug. They are orchestration bugs.
Structured concurrency is the discipline of making concurrent work follow lexical and ownership structure. Tasks belong to parents. Lifetimes are bounded. Failure propagates somewhere definite. Cancellation is not advisory folklore. Backpressure is part of admission policy, not a dashboard surprise.
This chapter is about those system-level rules. Chapter 12 dealt with shared mutable state. Chapter 13 dealt with what one coroutine owns across suspension. Here the unit of reasoning is a set of tasks that together implement a request path, stream processor, or bounded service stage.
Unstructured work scales failure, not just throughput
The simplest way to start concurrent work is to launch tasks wherever needed and hope completion sorts itself out. That style is attractive because it minimizes immediate coordination. It is also how systems accumulate invisible work.
Detached tasks, ad hoc thread pools, and fire-and-forget retries have three predictable consequences:
- Lifetime becomes non-local. The code that started work is no longer responsible for proving when it ends.
- Failure becomes observational. Errors surface only if someone remembered to log or poll them.
- Capacity becomes fictional. The system keeps accepting work because no parent scope owns admission pressure.
A service can survive this for months if traffic is light and shutdown is rare. Under burst load, deployment churn, or slow downstream dependencies, the hidden work becomes the system.
Fire-and-forget: a catalog of failures
Before contrasting with structured concurrency, it is worth seeing exactly how unstructured work fails. “Fire-and-forget” is not one anti-pattern; it is several, each with a distinct failure mode.
Resource leaks from ownerless work
// Anti-pattern: detached task leaks a database connection on cancellation.
void on_request(request req) {
    std::jthread([req = std::move(req)] {
        auto conn = db_pool.acquire(); // acquired, never returned on some paths
        auto result = conn.execute(req.query);
        send_response(req.client, result);
    }).detach(); // no owner, no cancellation, no cleanup guarantee
}
If the process begins shutting down, detached threads do not receive stop requests. The database connection is not returned to the pool. Multiply this by thousands of in-flight requests during a rolling deployment: the database sees connection exhaustion, and the old process either lingers waiting for work it cannot cancel or exits while detached threads still reference destroyed globals.
Unobserved exceptions vanish silently
// Anti-pattern: exception in detached task is never observed.
void start_background_sync() {
    auto handle = std::async(std::launch::async, [] {
        auto data = fetch_remote_config(); // throws on network error
        apply_config(data);
    });
    // handle is destroyed here — std::async's destructor blocks,
    // but if this were a custom fire-and-forget task, the exception
    // would be silently swallowed.
}
With std::async, the destructor blocks (which may be its own surprise). But with most custom task types that support detach, destroying the handle without observing the result means exceptions evaporate. The system continues with stale configuration, and the failure appears only as a mysterious behavioral regression hours later.
Shutdown hangs from orphaned work
// Anti-pattern: shutdown cannot complete because background tasks were never tracked.
class ingestion_service {
    void ingest(message msg) {
        // "just kick off enrichment in the background"
        pool_.submit([msg = std::move(msg), this] {
            auto enriched = enrich(msg); // calls external service, may block
            store_.write(enriched);
        });
    }

    void shutdown() {
        store_.close();   // closes storage
        pool_.shutdown(); // waits for in-flight tasks
        // BUG: in-flight tasks may call store_.write() after store_ is closed
        // BUG: enrich() may block indefinitely — pool shutdown hangs
    }
};
The pool has tasks, but the service has no model of what those tasks need or how to cancel them. Shutdown either hangs (waiting for a blocked external call) or races (closing dependencies while tasks still use them). In production, this turns a clean restart into a process kill, which turns into data loss.
The structured alternative in brief
The structured answer to all three problems is the same principle: the scope that creates work owns its completion.
// Structured: parent scope owns child tasks, propagates cancellation, awaits completion.
task<void> on_request(request req, std::stop_token stop) {
    auto conn = co_await db_pool.acquire(stop); // respects cancellation
    auto result = co_await conn.execute(req.query, stop);
    co_await send_response(req.client, result);
    // conn returned to pool when coroutine frame is destroyed
    // if stop is triggered, co_await points observe it and unwind cleanly
}
The parent request scope can cancel the token on client disconnect or deadline. The coroutine’s awaitables check the token at each suspension point. Resources are released through normal RAII. No work outlives its owner. The contrast with fire-and-forget is not style preference; it is the difference between a system that can shut down cleanly and one that cannot.
Structured concurrency means parent scopes own child work
The central idea is simple: if a scope starts child tasks to complete its job, those children should finish, fail, or be canceled before the scope is considered done.
That gives you three properties that ad hoc async code rarely has by default:
- The lifetime of work is bounded by a parent operation.
- Failures can be aggregated or escalated in one place.
- Cancellation and shutdown can follow a tree instead of searching the process for loose ends.
This does not require one specific library. It requires design discipline. A request handler that fans out to multiple backends should not return while those backend calls continue running unless the business contract explicitly permits detached follow-up work and names its owner. A batch consumer should not enqueue downstream tasks without also deciding who drains them on shutdown and who absorbs overload.
Structured concurrency is an ownership rule for time. If Chapter 1 taught that every resource needs an owner, this chapter applies the same principle to concurrent work.
Cancellation must be a first-class contract
Cancellation is often described as a courtesy. In production it is load control.
Once a client disconnects, a deadline expires, or a parent task fails, continuing child work may waste CPU, memory, database capacity, and retry budget. Worse, uncanceled work competes with useful work. Systems under pressure often fail because they keep doing tasks that no longer matter.
Modern C++ gives useful building blocks: std::stop_source, std::stop_token, and std::jthread. The example project’s Server::run(std::stop_token) (examples/web-api/src/modules/http.cppm) uses them directly: the accept loop checks stop_token.stop_requested() on every iteration, and select() with a one-second timeout ensures the check happens promptly even when no clients are connecting. The token is the contract; the timeout-based polling is the mechanism.
But the primitives alone are not enough. The harder question is semantic:
- Which operations are cancelable?
- At what boundaries is cancellation observed?
- What cleanup is guaranteed before completion is reported?
- Is partial progress committed, rolled back, or made visible with compensation?
If those questions are unanswered, wiring a stop token through a few functions is theater.
Cancellation also needs direction. Parent-to-child propagation should be the default. Child-to-parent escalation depends on policy: one child failure may cancel siblings, may degrade the result, or may be recorded while work continues. The point is that the rule must be explicit at the scope that owns the group.
Anti-pattern: fan-out without bounded ownership
// Anti-pattern: child work outlives the request and overload has no admission limit.
task<aggregate_reply> handle_request(request req) {
    auto a = fetch_profile(req.user_id);
    auto b = fetch_inventory(req.item_id);
    auto c = fetch_pricing(req.item_id);
    co_return aggregate_reply{
        co_await a,
        co_await b,
        co_await c,
    };
}
This code is tidy and underspecified.
What cancels the three child operations if the client times out after the first await? What prevents ten thousand concurrent requests from starting thirty thousand backend calls immediately? If fetch_inventory hangs, do the others keep running? If one call fails fast, should the rest be canceled or allowed to complete because partial results are useful? For that matter, do the three fetches run concurrently at all, or does a lazy task type quietly serialize them at the co_await points?
The problem is not that fan-out is bad. It is that the code does not show a supervision policy.
In a structured design, the request scope owns a cancellation source or token, child tasks are started within that scope, deadlines are attached, and concurrency against downstream dependencies is bounded by permits or semaphores. The exact abstraction varies by codebase. The essential property is that the request does not create anonymous work.
Deadlines and budgets beat best-effort timeouts
Timeouts are often implemented locally and inconsistently. One dependency has a 200 ms timeout, another has 500 ms, and the caller has a 300 ms deadline that nobody propagates. The result is wasted work and confusing telemetry.
A better model is budget propagation. A parent operation carries a deadline or remaining budget. Child operations derive their own limits from that budget instead of inventing unrelated ones. This keeps cancellation and latency intent aligned.
The tradeoff is that downstream APIs must accept deadline or cancellation context explicitly, and timeout behavior becomes visible in signatures or task builders. That is a good cost. Hidden timeout policy is usually worse than noisy timeout policy.
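The core of budget propagation fits in a few lines of std::chrono. This request_deadline type and its clamping policy are illustrative; real codebases usually thread something equivalent through their task builders:

```cpp
#include <algorithm>
#include <chrono>

// Sketch of budget propagation: one deadline travels with the request, and
// each child call derives its timeout from the remaining budget, capped by
// a local maximum. request_deadline is a hypothetical example type.
using steady = std::chrono::steady_clock;

struct request_deadline {
    steady::time_point at;

    std::chrono::milliseconds remaining() const {
        auto left = std::chrono::duration_cast<std::chrono::milliseconds>(
            at - steady::now());
        return std::max(left, std::chrono::milliseconds{0});
    }

    // A child never gets more time than the parent has left.
    std::chrono::milliseconds child_timeout(
        std::chrono::milliseconds local_cap) const {
        return std::min(remaining(), local_cap);
    }
};
```

The 200/500/300 ms mismatch from the paragraph above disappears by construction: a dependency's 500 ms local cap is clamped to whatever the caller's 300 ms deadline has left.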
Backpressure is admission control, not complaint logging
Backpressure means the system has a deliberate answer to “what happens when work arrives faster than we can finish it?”
Without that answer, work piles up in queues, buffers, retry loops, and coroutine frames. Memory climbs first, latency second, and only then does the outage become obvious. An unbounded queue is not elasticity. It is a promise to convert overload into delayed failure.
Real backpressure mechanisms are concrete:
- Bounded queues that reject or defer new work.
- Semaphores or permits that limit concurrent access to scarce dependencies.
- Producer throttling when downstream stages are saturated.
- Load shedding when serving all traffic would destroy latency for all traffic.
- Batch sizing and flush policy that match downstream commit cost.
Each mechanism encodes business policy. Which work can be dropped? Which must wait? Which clients receive an explicit overload signal? Those are product decisions expressed as concurrency control.
Bounded concurrency is usually better than bigger pools
When a dependency slows down, many teams first increase pool sizes or queue depths. That often amplifies the problem.
If a database can sustain fifty useful concurrent requests, allowing two hundred in-flight operations mostly increases contention and timeout overlap. The same applies to CPU-heavy parsing stages, compression work, and remote service calls with their own internal bottlenecks.
Bound concurrency where the scarce resource actually is. Make that bound visible in code and telemetry. Then decide what should happen when the bound is reached: wait, fail fast, degrade, or redirect. Bigger pools without policy only hide overload until the whole system is saturated.
Pipelines need pressure to travel upstream
Pipelines are where backpressure discipline becomes unavoidable.
Consider a message consumer that parses records, enriches them with remote lookups, and writes batches to storage. If parsing outruns storage, some stage must slow down. If enrichment outruns parsing, the enrichment stage should not keep creating more in-flight requests just because it can. If shutdown begins, all stages need a coordinated drain or cancel policy.
Good pipeline design therefore names:
- The maximum in-flight work per stage.
- The maximum queue depth between stages.
- Whether a full queue blocks producers, drops input, or triggers load shedding.
- Whether cancellation drains partially completed batches or discards them.
- Which metrics reveal saturation before memory pressure becomes critical.
This is not optional infrastructure polish. It is the difference between a system that degrades gracefully and one that accumulates work until it dies.
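The second item on that list — a bounded inter-stage queue — can be sketched with a mutex and a capacity check. This is a minimal illustration, not a production queue (no blocking waits, no shutdown signaling):

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Sketch of a bounded stage queue: a full queue answers the producer
// explicitly instead of absorbing unlimited work. Whether the producer
// then blocks, drops, or sheds load is the policy the text asks you to name.
template <typename T>
class bounded_queue {
    mutable std::mutex m_;
    std::deque<T> items_;
    std::size_t cap_;
public:
    explicit bounded_queue(std::size_t cap) : cap_{cap} {}

    bool try_push(T v) { // false == backpressure signal to the producer
        std::lock_guard lk{m_};
        if (items_.size() >= cap_) return false;
        items_.push_back(std::move(v));
        return true;
    }

    std::optional<T> try_pop() {
        std::lock_guard lk{m_};
        if (items_.empty()) return std::nullopt;
        T v = std::move(items_.front());
        items_.pop_front();
        return v;
    }
};
```

The capacity constructor argument is exactly the "maximum queue depth between stages" the list above says a pipeline design must name.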
Failure propagation needs policy, not hope
Once work is structured into groups, failure handling becomes a design choice instead of an accident.
Common policies include:
- Fail-fast groups, where one child failure cancels siblings because the result is useless without all parts.
- Best-effort groups, where some child failures are tolerated and recorded.
- Quorum groups, where enough successful children satisfy the operation and the rest are canceled.
- Supervisory loops, where failures restart isolated child work under rate limits and budgets.
All four are valid in the right domain. What matters is that the code and abstraction make the policy apparent. Silent continuation after child failure is not resilience; it is ambiguity.
Shutdown is the truth serum
Systems with weak concurrency structure often look fine until shutdown.
A clean shutdown path forces all the hidden questions into the open. Which tasks are still running? Which can be interrupted safely? Which queues must drain? Which side effects may be committed after shutdown begins? Which background loops own the stop source, and who awaits their completion?
This is why shutdown tests are disproportionately valuable. They expose detached work, missing cancellation points, unbounded queues, and tasks with no owner. If a subsystem cannot describe how it stops under load, it does not fully own its concurrency model.
The example project implements a complete structured shutdown chain that puts these principles into practice. The flow spans three files:
Signal handler sets the flag (examples/web-api/src/main.cpp):
std::atomic<bool> shutdown_requested{false};

extern "C" void signal_handler(int /*sig*/) {
    shutdown_requested.store(true, std::memory_order_release);
}
Server::run_until bridges the flag to a jthread stop token (examples/web-api/src/modules/http.cppm):
void run_until(const std::atomic<bool>& should_stop) {
    std::jthread server_thread{[this](std::stop_token st) {
        run(st);
    }};
    while (!should_stop.load(std::memory_order_acquire)) {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    server_thread.request_stop();
    // jthread auto-joins on destruction
}
Server::run checks the token on every accept loop iteration:
void run(std::stop_token stop_token) {
    // ...
    while (!stop_token.stop_requested()) {
        // select() with 1-second timeout, then check stop again
        // accept and handle connection ...
    }
    std::println("Server shutting down gracefully");
}
The ownership chain is clear: main owns the Server, the Server owns the jthread, and the jthread owns the stop_source. When Ctrl+C arrives, the signal handler sets the atomic flag, run_until observes it and calls request_stop(), the accept loop exits on the next iteration, and the jthread destructor joins the thread. No detached work, no orphaned connections, no race between shutdown and in-flight requests. This is structured concurrency applied to a real service lifecycle.
Verification and telemetry for structured async systems
Unit tests alone do not validate structured concurrency or backpressure. You need evidence that the system behaves sanely under stress.
Useful verification includes:
- Load tests that drive the system past nominal capacity and confirm bounded memory.
- Cancellation tests that inject disconnects, deadline expiry, and partial child failure.
- Shutdown tests that start work, trigger stop, and verify prompt quiescence.
- Metrics for in-flight tasks, queue depth, permit utilization, rejection rate, deadline expiry, and cancellation latency.
- Traces or logs that show parent-child linkage so orphaned work is visible.
If observability cannot reveal where work is accumulating or which parent owns it, the structure exists only in the author’s head.
Review questions for structured concurrency
Before approving an asynchronous orchestration design, ask:
- Who owns each group of child tasks?
- What event cancels them: parent completion, failure, deadline, shutdown, or overload?
- What bounds concurrent work against each scarce dependency?
- What happens when the queue or permit limit is reached?
- Does failure cancel siblings, degrade gracefully, or wait for quorum?
- Can shutdown complete promptly under peak load?
- Which metrics prove that backpressure is working?
If the answers are unclear, the system is probably running on optimism and spare capacity.
Takeaways
Structured concurrency is ownership applied to time and task trees.
Do not launch anonymous work. Make parent scopes own child lifetimes, propagate cancellation deliberately, and bound concurrency where resources are actually scarce. Treat backpressure as admission policy, not as a tuning afterthought. A system that cannot say when work stops, who cancels it, and how overload is limited does not yet have a concurrency model. It only has asynchronous code.
Data layout, containers, and memory behavior
Production problem
Many C++ performance failures are representation failures disguised as algorithm discussions. A team debates whether a lookup should be O(1) or O(log n) while the real problem is that the hot path walks heap nodes, touches too many cache lines, or drags cold fields through every iteration. The compiler cannot optimize around a bad data shape. It can only make the chosen shape slightly less expensive.
This chapter is about the decisions that determine how data lives in memory: container selection, element layout, iteration order, invalidation behavior, and how views expose stored data without obscuring lifetime. The focus is not “which container is fastest” in the abstract. The question is narrower and more useful: what representation makes the dominant operations cheap in the system you are actually building?
Use realistic pressure when answering that question. Think about a request router with millions of route entries, a telemetry pipeline ingesting dense time-series samples, a game or simulation update loop scanning component state every frame, or an analytics job walking event batches. In those systems, container choice is not an implementation detail. It is part of the performance contract.
When big-O stops explaining runtime
Asymptotic complexity is necessary and insufficient. It filters out obviously bad designs, but it does not describe memory traffic, branch predictability, prefetch behavior, false sharing, or the price of indirection. A std::list insert is constant time, but that fact is almost irrelevant if every useful operation also requires pointer chasing through cold memory. A sorted std::vector can beat a hash table for moderate sizes because contiguous binary search and cheap iteration often dominate nominal lookup complexity.
That mismatch matters because production workloads are rarely uniform. Some operations dominate the total cost. Others are latency-sensitive enough that a few extra cache misses matter more than one extra comparison. Before choosing a container, state the actual workload shape in plain language:
- Is the dominant operation full traversal, point lookup, append, erase in the middle, or batched rebuild?
- Is the data mostly read-only after construction, or constantly mutated?
- Do you care about stable addresses, stable iteration order, or predictable invalidation rules?
- Is the data set small enough to fit in private cache, or large enough that TLB and memory bandwidth behavior dominate?
If those answers are missing, container selection will drift toward folklore.
Default toward contiguous storage
For data that is traversed often, contiguous storage should be the default starting point. std::vector, std::array, std::span, std::mdspan, flat buffers, and columnar arrays win repeatedly because hardware rewards predictable access. Sequential scans let the processor prefetch effectively, amortize TLB work, and keep branch behavior simple. That advantage is often larger than the algorithmic edge of a theoretically “more advanced” structure.
The difference is not subtle. Consider summing values stored in a std::vector<int> versus a std::list<int> with the same element count:
#include <cstdint>
#include <list>
#include <numeric>
#include <vector>

// Contiguous: hardware prefetcher has a good day.
std::int64_t sum_vector(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), std::int64_t{0});
}

// Scattered: every node is a pointer chase. Each dereference is
// a potential cache miss if nodes were allocated at different times
// and landed on different cache lines or pages.
std::int64_t sum_list(const std::list<int>& l) {
    return std::accumulate(l.begin(), l.end(), std::int64_t{0});
}
On typical hardware with one million elements, the vector version runs 10-50x faster than the list version for pure traversal. The list carries per-node overhead (two pointers per element on most implementations, plus allocator metadata), but the dominant cost is not space – it is that advancing to the next node requires loading a pointer whose target address the prefetcher cannot predict. Every step risks a last-level-cache miss at 50-100 ns each, while the vector scan hits L1 or L2 almost every time because the hardware prefetcher recognizes the sequential pattern.
This is not a contrived worst case. It is the default behavior of std::list when nodes are allocated over time through a general-purpose allocator. Even std::list with a pool allocator that places nodes contiguously only partially recovers, because the next-pointer indirection and per-node overhead remain.
This is why many high-performance designs look boring at first glance. They store records in std::vector, sort once, then answer queries with binary search or batched scans. They keep hot data compact. They rebuild indexes in coarse batches instead of maintaining pointer-rich structures incrementally. They move work from random access toward regular access.
That does not mean std::vector is universally right. It means the burden of proof usually falls on the non-contiguous alternative. If a node-based or hash-based structure is required, the reason should be concrete: stable iterators across mutation, truly heavy mid-sequence insertion, concurrent ownership patterns, external handles that must remain valid, or lookup patterns that stay large and sparse enough for hashing to pay off.
Containers encode tradeoffs, not just operations
Experienced reviewers should read a container choice as a set of promises and costs.
std::vector promises contiguous storage, cheap append at the end, and efficient traversal. It charges you with occasional reallocation, iterator invalidation on growth, and expensive mid-sequence erase. It is often the right answer for batches, indexes, dense state, and tables that can be rebuilt or compacted.
std::deque relaxes contiguity to preserve cheaper growth at both ends and to avoid whole-buffer relocation. That can be valuable for queue-like workloads, but traversal locality is weaker than std::vector, and treating it as “basically a vector” is a mistake when the code is scan-heavy.
Ordered associative containers such as std::map and std::set buy stable ordering and stable references under many mutations, but they pay with node allocation, indirection, and branch-heavy traversal. They are justified when ordering is semantically required or when mutation patterns make rebuild-once strategies impossible. They are bad defaults for hot lookups on read-mostly data.
std::unordered_map and std::unordered_set trade ordering for average-case lookup speed. But they still carry a real memory cost: buckets, load factors, node storage in many implementations, and unpredictable probe behavior. They are valuable for large key spaces with frequent lookup by exact key. They are less compelling when iteration dominates, when memory footprint matters, or when the working set is small enough that sorted contiguous data stays in cache.
C++23 adds std::flat_map and std::flat_set (in <flat_map> and <flat_set>), which formalize what production codebases have done for years: a sorted contiguous key-value array that is frequently better for read-mostly indexes. Prior to C++23, teams relied on Boost.Container, Abseil, or hand-rolled equivalents. The standard versions accept underlying container template parameters, so you can back them with std::vector (the default), std::deque, or std::pmr::vector as locality and allocation needs dictate. Note that std::flat_map invalidates iterators on mutation just like std::vector does, and insertion into the middle is O(n) due to element shifting. It is a read-optimized structure, not a write-optimized one.
Layout inside the element matters as much as the container
Choosing std::vector<Order> is not enough. The shape of Order can still waste bandwidth. If every iteration reads price, quantity, and timestamp, but each object also carries a large symbol string, audit metadata, retry policy, and debugging state, then a scan over the vector is still dragging cold bytes through cache.
This is where hot/cold splitting matters. Keep the fields touched together in time physically close together. Move infrequently used state behind a separate table, side structure, or handle if it materially reduces scan cost. Do not over-abstract this into a generalized “entity system” unless the codebase really needs one. Often the right move is simpler: a compact hot record plus an auxiliary store for cold metadata.
The same pressure drives the array-of-structures versus structure-of-arrays decision. Array-of-structures is easier to reason about when objects move through the system as units. Structure-of-arrays wins when processing is columnar: filter all timestamps, compute on all prices, aggregate all counters, or feed vectorized kernels. The representation should match the dominant access pattern, not an imagined object model.
To see the tradeoff concretely, compare these two representations of the same data:
#include <array>
#include <cstddef>
#include <cstdint>
#include <span>
#include <vector>

// Array of Structures (AoS): natural object-oriented layout.
// Each Tick is self-contained. Good when you routinely need all
// fields of a single tick (e.g., serializing one record, looking
// up a specific event).
struct Tick {
    std::int64_t timestamp_ns;
    std::int32_t instrument_id;
    double bid;
    double ask;
    char exchange[8];      // cold: rarely used in hot aggregation
    std::uint32_t seq_no;  // cold
    std::uint16_t flags;   // cold
    // sizeof(Tick) ~ 48 bytes with padding on most ABIs
};

double mid_price_sum_aos(std::span<const Tick> ticks) {
    double total = 0.0;
    for (const auto& t : ticks) {
        // Each iteration loads a full 48-byte Tick, but only
        // reads bid and ask (16 bytes). The remaining 32 bytes
        // pollute cache lines and reduce effective bandwidth.
        total += (t.bid + t.ask) * 0.5;
    }
    return total;
}

// Structure of Arrays (SoA): columnar layout.
// Each field lives in its own contiguous array.
struct TickColumns {
    std::vector<std::int64_t> timestamp_ns;
    std::vector<std::int32_t> instrument_id;
    std::vector<double> bid;
    std::vector<double> ask;
    std::vector<std::array<char, 8>> exchange;
    std::vector<std::uint32_t> seq_no;
    std::vector<std::uint16_t> flags;

    void append(std::int64_t ts, std::int32_t id, double b, double a) {
        timestamp_ns.push_back(ts);
        instrument_id.push_back(id);
        bid.push_back(b);
        ask.push_back(a);
        // ...other columns omitted for brevity
    }
};

double mid_price_sum_soa(const TickColumns& ticks) {
    double total = 0.0;
    // Only bid[] and ask[] are touched. Each cache line is 100%
    // useful payload. The compiler can auto-vectorize this loop,
    // and the prefetcher has two clean sequential streams.
    for (std::size_t i = 0; i != ticks.bid.size(); ++i) {
        total += (ticks.bid[i] + ticks.ask[i]) * 0.5;
    }
    return total;
}
With one million ticks, the SoA version typically runs 2-4x faster for columnar aggregation on modern x86 hardware. The reason is bandwidth efficiency: the AoS loop loads roughly 48 bytes per element but uses only 16, wasting two thirds of every cache line fetch. The SoA loop touches only the two 8-byte arrays it needs, and both are perfectly sequential. The compiler is also far more likely to emit SIMD instructions for the SoA version because there is no interleaving stride to complicate vectorization.
The cost of SoA appears elsewhere. Adding a new tick requires appending to every column vector, which is awkward and error-prone. Passing “one tick” to a function requires either an index plus a reference to the whole table, or a temporary struct assembled from columns. If the dominant operation is per-record processing that touches most fields, the AoS layout avoids that assembly cost and keeps related data together.
A practical middle ground is hot/cold splitting without full SoA: keep a compact “hot” struct with only the fields the hot path needs, and store cold fields in a parallel side table indexed by the same position.
This representation is not “more modern” by itself. It is better only if the workload repeatedly processes columns independently or if compact numeric columns materially improve memory behavior. If downstream logic constantly needs a full logical tick object with many correlated fields, the conversion cost or loss of clarity may erase the win.
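The hot/cold middle ground mentioned above can be sketched with two parallel vectors sharing an index. The order_hot/order_cold names and fields are illustrative:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Sketch of hot/cold splitting without full SoA: the scan touches only the
// compact hot records; cold metadata lives in a parallel table at the same
// position. Types and fields are illustrative.
struct order_hot {
    double price;
    std::int32_t quantity;
};

struct order_cold {
    std::string symbol;
    std::string audit_note;
};

struct order_table {
    std::vector<order_hot> hot;
    std::vector<order_cold> cold; // hot[i] and cold[i] describe the same order

    double notional() const { // hot path streams dense ~16-byte records
        double total = 0.0;
        for (const auto& h : hot) total += h.price * h.quantity;
        return total;
    }
};
```

The discipline cost is keeping the two vectors in lockstep on insert and erase; the payoff is that the hot loop never drags string storage through cache.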
Stable handles are expensive; ask whether you really need them
Many poor representations originate in an unstated requirement for address stability. Code stores raw pointers or references into container elements, so the container choice becomes constrained by lifetime and invalidation concerns rather than by access cost. That often leads to node-based structures that preserve addresses at the price of worse locality everywhere else.
Sometimes that tradeoff is correct. More often the deeper problem is that the system has coupled identity to physical address. If components need durable external references, use explicit handles, indices with generation counters, or keys into a stable indirection layer. That lets the underlying storage stay compact and movable while keeping external references safe.
This is not free. Handles add lookup steps, validation work, and failure modes around stale references. But the cost is explicit and localized, which is usually preferable to poisoning the entire representation with pointer-stable containers just to satisfy a few long-lived aliases.
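A common concrete form of that indirection layer is an index plus a generation counter. This slot_store is a minimal sketch (free-slot reuse omitted for brevity); the essential property is that a stale handle fails validation instead of dangling:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Sketch of generational handles: storage stays compact and movable, and
// external references are validated on every access. Illustrative only.
struct handle {
    std::uint32_t index;
    std::uint32_t generation;
};

template <typename T>
class slot_store {
    struct slot {
        T value;
        std::uint32_t generation = 1;
        bool alive = false;
    };
    std::vector<slot> slots_;
public:
    handle insert(T v) {
        slots_.push_back({std::move(v), 1, true});
        return {static_cast<std::uint32_t>(slots_.size() - 1), 1};
    }

    void erase(handle h) {
        if (get(h) != nullptr) {
            slots_[h.index].alive = false;
            ++slots_[h.index].generation; // invalidates outstanding handles
        }
    }

    T* get(handle h) { // nullptr == stale or never valid
        if (h.index >= slots_.size()) return nullptr;
        auto& s = slots_[h.index];
        if (!s.alive || s.generation != h.generation) return nullptr;
        return &s.value;
    }
};
```

This is the explicit, localized cost the paragraph describes: one extra lookup and a validation branch, in exchange for a vector that remains free to reallocate and compact.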
Invalidation rules are part of API design
A container is not only a storage mechanism. It creates or destroys API guarantees. Returning std::span<T> into a std::vector<T> tells callers that the view is valid only while the underlying storage remains alive and unmodified in invalidating ways. Returning iterators into a hash table exposes rehash sensitivity. Returning references into a node container exposes object lifetime and synchronization assumptions.
That is why representation and interface design cannot be fully separated. If a module wants freedom to compact, sort, rebuild, or reallocate internally, it should not leak raw iterators or long-lived references casually. Prefer value results, short-lived callback-based access, copied summaries, or opaque handles when the implementation needs movement freedom.
Ranges make this cleaner but also easier to misuse. A view pipeline can look purely functional while still borrowing storage whose lifetime is narrower than the pipeline’s use. If the underlying data lives in a transient buffer, query object, or request-local arena, a beautifully composed range expression can still be a lifetime bug. The storage model remains primary.
Dense data beats clever object models on hot paths
Data-intensive systems often degrade when teams model the problem domain too literally. A log-processing stage becomes a graph of objects with virtual methods and scattered ownership because the domain sounds object-oriented. Under load, the profiler reports cache misses and allocator churn rather than expensive arithmetic.
For hot paths, prefer representations that make the dominant walk simple and dense. A packet classifier might store parsed header fields in packed arrays and keep only rare extension data elsewhere. A recommendation engine might separate immutable item features from request-local scoring buffers. An order book might keep price levels in contiguous arrays indexed by normalized tick offsets instead of trees of heap nodes if the price band is bounded enough.
These designs can look less elegant than an object graph. They are often more honest. Hardware executes memory traffic, not class diagrams.
Cache-hostile structures in practice
It is worth seeing exactly what a cache-hostile structure looks like and why it hurts, because the failure mode is common and invisible in code review.
#include <string>
#include <utility>

// A naive priority queue built from scattered heap nodes.
// Each node is individually allocated and linked by pointer.
// Convention in these examples: a lower number means higher priority,
// so the list is kept sorted ascending and the head is the top job.
struct Job {
    int priority;
    std::string description; // may allocate on heap
    Job* next;
};

class NaivePriorityQueue {
    Job* head_ = nullptr;

public:
    void insert(int priority, std::string desc) {
        auto* node = new Job{priority, std::move(desc), nullptr};
        // Sorted insert: walk the list to find position.
        // Each step dereferences a pointer to a random heap location.
        Job** pos = &head_;
        while (*pos && (*pos)->priority <= priority)
            pos = &(*pos)->next;
        node->next = *pos;
        *pos = node;
    }

    // Find highest-priority job. Cheap -- it is at the head.
    // But any operation that scans (e.g., "remove by description",
    // "count jobs above threshold") pointer-chases through
    // potentially thousands of cold cache lines.
    Job* top() const { return head_; }

    ~NaivePriorityQueue() {
        while (head_) { auto* n = head_; head_ = head_->next; delete n; }
    }
};
Compare with the cache-friendly alternative that stores the same data contiguously:
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

struct Job {
    int priority;
    std::string description;
};

class FlatPriorityQueue {
    std::vector<Job> jobs_;

public:
    void insert(int priority, std::string desc) {
        jobs_.push_back({priority, std::move(desc)});
        // Could maintain sorted order with std::lower_bound + insert,
        // or just push_back and sort/partial_sort when needed.
    }

    // Rebuild top in O(n) but with contiguous memory access.
    // For scan-heavy workloads this dominates pointer-chasing.
    const Job& top() const {
        return *std::min_element(jobs_.begin(), jobs_.end(),
                                 [](const auto& a, const auto& b) {
                                     return a.priority < b.priority;
                                 });
    }
};
The flat version may perform more comparisons for top(), but it does so while streaming through contiguous memory. In practice, for collections under a few thousand elements, the flat version often wins overall: the linked version's O(1) head access is swamped by its O(n) sorted insert, which pays a potential cache miss for every node visited. For larger collections, std::priority_queue (which maintains a binary heap inside a contiguous std::vector) is the standard tool for exactly this reason.
The general lesson: pointer-linked structures pay a per-node tax on every traversal step that is invisible in algorithmic analysis but dominant in wall-clock time.
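For completeness, the standard tool mentioned above takes one comparator argument to match the convention used in these examples, where the smallest priority value is on top:

```cpp
#include <functional>
#include <queue>
#include <vector>

// Usage sketch: std::priority_queue keeps a binary heap inside a contiguous
// std::vector. std::greater makes it a min-heap, so the smallest priority
// value surfaces first.
int top_after_pushes() {
    std::priority_queue<int, std::vector<int>, std::greater<int>> q;
    q.push(3);
    q.push(1);
    q.push(2);
    return q.top(); // O(log n) push, O(1) top, contiguous storage throughout
}
```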
Common failure modes
Several recurring mistakes are worth naming explicitly.
Choosing a container by operation cheat sheet rather than by measured workload shape. “Need fast lookup” is too vague. How many elements? What key distribution? How often do you iterate? How often do you rebuild? Without those numbers, “fast” has no content.
Mixing hot and cold fields because a single struct feels tidier. Compact layout is not premature optimization when the code traverses millions of elements per second.
Allowing incidental aliases to dictate representation. A few long-lived pointers should not force the entire system onto node-based storage if a handle layer would isolate the requirement.
Treating views as a lifetime abstraction. They are not. std::span and ranges make non-owning access explicit; they do not make non-owning access safe by themselves.
Overfitting to one microbenchmark. A representation that wins isolated lookup tests may lose badly in full pipelines where decoding, filtering, aggregation, and serialization interact.
What to verify in real code
Representation work should show up in code review and measurement plans, not just in implementation.
Reviewers should ask:
- Which operations dominate time and memory traffic?
- Does the chosen container optimize those operations or merely make one local call site convenient?
- Are stable addresses truly required, or is the code leaking representation constraints through aliases?
- Which invalidation rules are now part of the API contract?
- Are hot fields physically grouped together, or are scans dragging cold state through cache?
- If views are returned, what storage and mutation conditions bound their lifetime?
The next chapter turns those questions into a cost model. For now, the central point is simpler: representation decisions are often the first-order performance decision. Before discussing allocators, inlining, or benchmarking methodology, get the shape of the data right.
Takeaways
- Start with the dominant access pattern, not with container folklore.
- Prefer contiguous storage by default for scan-heavy and read-mostly workloads.
- Treat stable addresses and stable iterators as expensive requirements that need justification.
- Separate hot and cold data when repeated traversal makes bandwidth the bottleneck.
- Use handles or indirection layers deliberately when identity must survive storage movement.
- Treat invalidation and view lifetime as API-level consequences of representation.
Allocation, locality, and cost models
Production problem
Once the data shape is reasonable, the next performance question is where the bytes come from and how often they move. Many teams jump from “this path is slow” straight to allocator tweaks or pool designs without first stating the cost model. That is backwards. Before changing allocation strategy, you need to know which costs dominate: allocation latency, synchronization in the allocator, page faults, cache and TLB misses from scattered objects, copy cost from oversized values, or instruction overhead from abstraction layers.
This chapter builds that model. The point is not to memorize allocator APIs. It is to reason concretely about what a design forces the machine to do. Claims such as “std::function is zero cost,” “arenas are always faster,” or “small allocations are cheap now” are not engineering arguments. The argument starts when you can identify the actual object graph, the number and timing of allocations, the ownership horizon of the objects involved, and the locality consequences of each layer of indirection.
Keep the boundary with the previous chapter sharp. Chapter 15 asked, “What should the representation be?” This chapter asks, “Given a representation, what costs does it impose over time?” Containers still appear here, but the emphasis is allocation frequency, lifetime clustering, object graph depth, and locality, not container semantics.
Start with an allocation inventory
The first useful performance model is embarrassingly simple: list what allocates on the path you care about.
Most codebases are worse at this than they think. A request parser allocates strings for every header value, a routing layer stores callbacks in type-erased wrappers, a JSON transformation materializes intermediate objects, and a logging path formats into temporary buffers. Each decision may be locally reasonable. Together they create a request that performs dozens or hundreds of allocations before any real business work begins.
The example project’s parse_request() in examples/web-api/src/modules/http.cppm illustrates this pattern concretely. Each call allocates a std::string for every header name and value (line 191: headers.emplace_back(std::string(name), std::string(value))), plus a std::string for the path and the body. For a request with ten headers, that is at least twelve heap allocations before any handler runs. This is a natural candidate for PMR optimization: a std::pmr::monotonic_buffer_resource backed by a stack buffer could supply all of those strings from a single arena, eliminating per-header allocator calls entirely and making teardown a bulk operation when the request scope ends.
Inventory work should separate three questions:
- Which operations allocate on the steady-state hot path?
- Which allocations are one-time setup or batch rebuild costs?
- Which allocations are avoidable with different ownership or data flow, rather than with a better allocator?
That last question matters most. If a system allocates because its design insists on decomposing dense processing into many short-lived heap objects, changing the allocator may reduce pain without fixing the design. The highest-leverage change is often to stop needing the allocations.
Here is a concrete example of what allocation-heavy code looks like on a hot path, and what the alternative can be:
// Allocation-heavy: every event creates a temporary string,
// a vector, and a map entry. Under load this path may perform
// 5-10 heap allocations per event.
struct Event {
std::string type;
std::string payload;
std::vector<std::string> tags;
std::unordered_map<std::string, std::string> metadata;
};
void process_batch_heavy(std::span<const RawEvent> raw,
std::vector<Event>& out) {
for (const auto& r : raw) {
Event e;
e.type = parse_type(r); // allocates
e.payload = parse_payload(r); // allocates
e.tags = parse_tags(r); // allocates vector + each string
e.metadata = parse_meta(r); // allocates map buckets + nodes
out.push_back(std::move(e)); // may reallocate out's buffer
}
}
// Allocation-light: pre-sized arena, string views into stable
// input buffer, fixed-capacity inline storage.
struct EventView {
std::string_view type;
std::string_view payload;
// Fixed-capacity inline storage for tags: a std::array plus a count.
// (boost::static_vector or a similar small-vector type is an alternative.)
std::array<std::string_view, 8> tags;
std::uint8_t tag_count = 0;
};
void process_batch_light(std::string_view input_buffer,
std::span<const RawEvent> raw,
std::vector<EventView>& out) {
out.clear();
out.reserve(raw.size()); // one allocation, amortized
for (const auto& r : raw) {
EventView e;
e.type = parse_type_view(r, input_buffer);
e.payload = parse_payload_view(r, input_buffer);
e.tag_count = parse_tags_view(r, input_buffer, e.tags);
out.push_back(e);
}
// Zero heap allocations per event if input_buffer is stable
// and out has sufficient capacity.
}
The light version imposes constraints: the input buffer must outlive the views, tags are bounded, and metadata is handled differently. Those constraints are the cost of avoiding allocations. Whether that cost is acceptable depends on the workload, but making it visible is the point.
Allocation cost is more than the call to new
Engineers sometimes talk about allocation as though the only cost were the allocator function call. In production, that is usually a minority of the bill. Allocation also affects cache locality, synchronization behavior, fragmentation, page working set, and destruction cost later. If an object graph spreads logically adjacent data across unrelated heap locations, every later traversal pays for that decision. If per-request allocations hit a shared global allocator from many threads, allocator contention becomes part of latency variance. If many short-lived objects are destroyed individually, cleanup traffic can dominate tail latency during bursts.
This is why “we pooled it, so the problem is solved” is often false. A pool may reduce allocator call overhead and even reduce contention, but if the resulting object graph is still pointer-heavy and scattered, traversal remains expensive. Conversely, a design that stores request-local state in contiguous buffers may perform very few allocations and enjoy better locality even with the default allocator.
Lifetime clustering usually beats clever reuse
Objects that die together should often be allocated together. This is the core intuition behind arenas and monotonic resources: if a batch of data shares a lifetime boundary, paying for individual deallocation is wasted work. Request-local parse trees, temporary token buffers, graph search scratch state, and one-shot compilation metadata are classic examples.
C++23 still relies on std::pmr for the standard vocabulary here. The value is not stylistic. It is architectural. Memory resources let you express that a family of objects belongs to a shared lifetime region without hard-wiring a custom allocator type through every template instantiation.
struct RequestScratch {
std::pmr::monotonic_buffer_resource arena;
std::pmr::vector<std::pmr::string> tokens{&arena};
std::pmr::unordered_map<std::pmr::string, std::pmr::string> headers{&arena};
explicit RequestScratch(std::span<std::byte> buffer)
: arena(buffer.data(), buffer.size()) {}
};
This design says something important: the strings and containers are not independent heap citizens. They are request-scoped scratch. That reduces allocation overhead and makes teardown a bulk operation.
A more complete example shows the difference in practice. Compare standard allocation versus pmr with a stack-local buffer for a request-processing path:
#include <memory_resource>
#include <vector>
#include <string>
#include <array>
// Standard allocation: every string, every vector growth, and the
// map internals go through the global allocator. Under contention
// from many threads, this serializes on allocator locks.
void handle_request_standard(std::span<const std::byte> input) {
std::vector<std::string> tokens;
std::unordered_map<std::string, std::string> headers;
parse(input, tokens, headers); // many small allocations
route(tokens, headers);
// Destruction: each string freed individually, each map node freed.
}
// PMR with stack buffer: small requests never touch the heap.
// The monotonic_buffer_resource first allocates from the stack buffer.
// If the request is large enough to exhaust it, it falls back to
// the upstream resource (default: new/delete).
void handle_request_pmr(std::span<const std::byte> input) {
std::array<std::byte, 4096> stack_buf;
std::pmr::monotonic_buffer_resource arena{
stack_buf.data(), stack_buf.size(),
std::pmr::null_memory_resource()
// null_memory_resource: fail loudly if buffer is exceeded.
// Replace with std::pmr::new_delete_resource() to allow
// fallback to heap for oversized requests.
};
std::pmr::vector<std::pmr::string> tokens{&arena};
std::pmr::unordered_map<std::pmr::string, std::pmr::string>
headers{&arena};
parse_pmr(input, tokens, headers);
route_pmr(tokens, headers);
// Destruction: arena destructor releases everything in one shot.
// No per-string, per-node deallocation calls.
}
The pmr version eliminates all per-object deallocation calls and avoids global allocator contention entirely for requests that fit within the stack buffer. On a high-throughput server handling small requests, this can reduce allocator overhead by an order of magnitude. The tradeoff is that std::pmr containers carry an extra pointer to the memory resource (increasing sizeof slightly) and that the monotonic resource does not reclaim memory from individual deallocations – it only grows until the resource itself is destroyed. This is fine for request-scoped scratch; it is wrong for long-lived containers that grow and shrink over time.
But monotonic allocation is not a universal upgrade. It is a bad fit when objects need selective deallocation, when memory spikes from one pathological request must not bloat the steady-state footprint, or when accidentally retaining a single object would retain an entire arena. Regional allocation sharpens lifetime assumptions. If the assumptions are wrong, the failure is bigger than with individual ownership.
Locality is about graph shape, not just raw bytes
It is possible to have a low allocation count and still have terrible locality. A handful of large allocations containing arrays of pointers to separately allocated nodes can be worse than many small allocations if traversal constantly bounces between pages. The cost model therefore needs one more question: when the hot code walks this structure, how many pointer dereferences does it perform before it reaches useful payload?
Pointer-rich designs are often semantically attractive because they mirror domain relationships directly. Trees point to children. Polymorphic objects point to implementations. Pipelines store chains of heap-allocated stages. Sometimes that is unavoidable. Often it is laziness disguised as modeling.
The cure is not “never use pointers.” The cure is to distinguish identity and topology from storage. A graph can be stored in contiguous node arrays with index-based adjacency. A polymorphic pipeline can often be represented as a small closed std::variant of step types when the set of operations is known. A string-heavy parser can intern repeated tokens or keep slices into a stable input buffer rather than allocating owned strings for every field.
Those are not language-trick optimizations. They are graph-shape decisions. They reduce the amount of memory chasing required before useful work begins.
The hidden cost of std::shared_ptr
std::shared_ptr deserves special attention because its costs are frequently underestimated. The allocation cost is the most visible: std::make_shared performs one allocation for the control block and the managed object together, while constructing from a raw pointer performs two. But allocation is only the beginning.
The deeper cost is reference counting. Every copy of a std::shared_ptr performs an atomic increment; every destruction performs an atomic decrement with acquire-release semantics. On x86, an atomic increment is relatively cheap (a locked instruction, roughly 10-20 ns under no contention), but under cross-core sharing, the cache line holding the control block bounces between cores. Under heavy contention, this serializes otherwise parallel work.
// Looks innocent: passing shared_ptr by value into a thread pool.
// Each enqueue copies the shared_ptr (atomic increment), and each
// task completion destroys it (atomic decrement + potential dealloc).
void submit_work(std::shared_ptr<Config> cfg,
ThreadPool& pool,
std::span<const Request> requests) {
for (const auto& req : requests) {
// Copies cfg: atomic ref-count increment per task.
pool.enqueue([cfg, &req] {
handle(req, *cfg);
});
}
// If 10,000 requests are enqueued, that is 10,000 atomic
// increments on submission and 10,000 atomic decrements
// on completion, all contending on the same cache line.
}
// Fix: cfg outlives all tasks, so pass a raw pointer or reference.
void submit_work_fixed(const Config& cfg,
ThreadPool& pool,
std::span<const Request> requests) {
for (const auto& req : requests) {
pool.enqueue([&cfg, &req] {
handle(req, cfg);
});
}
// Zero reference-counting overhead. Caller guarantees
// cfg lives until all tasks complete.
}
The rule is not “never use std::shared_ptr.” It is: do not use shared ownership to avoid thinking about lifetime. When an object has a clear owner and borrowers, express that with a unique owner and references or views. Reserve std::shared_ptr for genuinely shared, non-deterministic lifetimes. And never pass std::shared_ptr by value when a const& or raw reference suffices – each copy is an atomic round-trip you are paying for nothing.
The example project demonstrates this principle in practice. In examples/web-api/src/modules/handlers.cppm, every handler factory (e.g., list_tasks(), get_task(), create_task()) takes TaskRepository& by reference and captures it by reference in the returned lambda. The repository is owned by main() and outlives all handlers, so there is no need for std::shared_ptr<TaskRepository>. This avoids atomic reference-count traffic on every request and keeps the handler’s capture small – a single pointer rather than a two-pointer-wide shared_ptr plus its control block.
Additional costs that accumulate: std::shared_ptr is two pointers wide (pointer to object + pointer to control block), doubling the size of a raw pointer. Containers of std::shared_ptr therefore have worse cache density. The weak reference count adds another atomic variable. And custom deleters stored in the control block add type-erased indirection at destruction time.
Hidden allocation is a design smell
Modern C++ provides abstractions that are appropriate only when their cost model remains visible enough for the code under review. The problem is not abstraction itself. The problem is abstraction whose allocation behavior is implicit, workload-dependent, or implementation-defined in ways the team ignores.
std::string may allocate or may fit in a small-string buffer. std::function may allocate for larger callables and may not for smaller ones. Type-erased wrappers, coroutine frames, regex engines, locale-aware formatting, and stream-based composition can all allocate in ways that disappear from the immediate call site.
None of these types are forbidden. They become dangerous when used on hot paths without explicit evidence. If a service constructs a std::function per message, or repeatedly turns stable string slices into owned std::string objects because downstream APIs demand ownership by default, the real issue is not just “too many allocations.” It is that the API surface obscures where cost enters.
Review hot-path abstractions with the same seriousness you apply to thread synchronization. Ask:
- Can this wrapper allocate?
- Can it force an extra indirection?
- Can it enlarge object size enough to reduce packing density?
- Can the same behavior be expressed with a closed alternative such as
std::variant, a template boundary, or a borrowed view?
The right answer depends on code size, ABI, compile times, and substitution flexibility. The point is to make the trade explicit.
Pools, freelist reuse, and their failure modes
Pooling is attractive because it offers a story of reuse and predictability. Sometimes that story is true. Fixed-size object pools can help when allocation size is uniform, object lifetime is short, reuse is heavy, and allocator contention matters. Slab-like designs can also improve spatial locality relative to fully general heap allocation.
But pools fail in repeatable ways.
Variable object sizes lead to multiple pools or internal fragmentation that erases the gain. Pools can hide unbounded retention when objects are “reused later” but rarely enough that memory never returns to the system. Per-thread pools complicate balancing under skewed workloads. Lifetime logic starts bending around pool availability instead of domain ownership. And developers stop measuring because the existence of a pool feels like optimization.
The operational rule is blunt: use pooling to support a known workload shape, not as a generic performance gesture. If you cannot describe the allocation distribution and reuse pattern, you are not ready to design the pool.
Value size and parameter surfaces still matter
Allocation is only part of the cost model. Large value types copied casually through APIs can be just as destructive. A “convenient” record type that embeds several std::string members, optional blobs, and vectors may avoid some heap traffic by using move semantics, but it still enlarges working-set size, increases cache pressure, and makes pass-by-value choices more expensive.
This is where Chapter 4’s API guidance re-enters the picture. Passing by value is excellent when ownership transfer or cheap move is the actual contract. It is poor when a path repeatedly copies or moves large aggregate objects merely to satisfy generic convenience. A cost model must include object size, move cost, and how often data crosses layer boundaries.
Small values are easier to move through the system. Large values often want stable storage plus borrowed access, summary extraction, or decomposition into hot and cold parts. If those options complicate the API, that complication may be justified. Performance design is full of cases where cleaner interfaces at one layer create avoidable cost everywhere else.
A practical cost model for review
For production work, an informal but explicit model is usually enough. For the path under review, write down:
- Number of steady-state allocations per operation.
- Whether those allocations are thread-local, globally contended, or hidden behind abstractions.
- The lifetime grouping of allocated objects.
- Number of pointer indirections on the hot traversal path.
- Approximate hot working-set size.
- Whether traversal is contiguous, strided, hashed, or graph-like.
- Whether destruction is individual, batched, or region-based.
That list will not yield a cycle-accurate prediction. It will stop hand-waving. It lets reviewers distinguish “this feels costly” from “this design guarantees allocator traffic, scattered reads, and poor teardown behavior under burst load.”
Boundary conditions
A cost model is not a license to over-specialize everything. Sometimes a heap allocation is correct because the object truly outlives local scopes and participates in shared ownership. Sometimes type erasure is the right trade for substitution across library boundaries. Sometimes arena allocation is inappropriate because retention risk or debugging complexity outweighs the throughput gain.
The goal is not maximal local speed. The goal is predictable, explainable cost under real system pressure. If a design slightly increases steady-state cost while dramatically improving correctness or evolvability at a non-hot boundary, that can be the right call. Cost models exist to support tradeoffs, not to abolish them.
What to verify before you tune
Before introducing custom allocators, pooling, or broad pmr plumbing, verify four things.
First, confirm the path is hot enough that allocation and locality matter materially. Second, confirm the current design actually allocates or scatters data in the way you think it does. Third, confirm the objects involved share the lifetime shape your proposed allocator strategy assumes. Fourth, confirm the new design does not simply move cost elsewhere, such as larger retained memory, worse debugging ergonomics, or more complex ownership boundaries.
The next chapter addresses the evidence side directly. Cost models are hypotheses. They become engineering only when benchmarking and profiling test the right hypothesis.
Takeaways
- Start with an allocation inventory before reaching for allocator techniques.
- Treat allocation cost as latency, locality, contention, retention, and teardown cost, not just the price of new.
- Cluster objects by lifetime when their destruction boundary is genuinely shared.
- Use std::pmr to express regional memory strategy when it matches ownership, not as decorative modernity.
- Be suspicious of abstractions whose allocation and indirection behavior is hidden on hot paths.
- Design pools for a measured workload shape or not at all.
Benchmarking and profiling without lying to yourself
Production problem
Performance discussions become expensive when teams confuse numbers with evidence. A benchmark reports a 20 percent win, but the production service does not improve. A profiler shows a hot function, but the real issue is lock contention or off-CPU waiting. A regression slips into main because the benchmark measured the wrong input shape or because someone “optimized” dead work the compiler already removed.
This chapter is about measurement discipline. The previous chapters covered representation and cost modeling. Here the question is different: how do you gather evidence that is strong enough to change code, justify complexity, or reject a supposed optimization? The answer requires benchmark design, profiler literacy, and a refusal to let attractive charts substitute for causal reasoning.
Performance work in modern C++ is especially vulnerable to self-deception because the language exposes many powerful local transformations. You can change container type, ownership shape, inlining boundaries, allocator strategy, range pipelines, coroutine structure, and type-erasure choices. Some of those changes matter. Many do not. Measurement is how you tell the difference.
Choose the right instrument for the question
Not every performance question should start with a microbenchmark.
If you are deciding between two data representations for a tight loop, a controlled benchmark may be exactly right. If you are debugging why request latency spikes under burst load, profiling a realistic system or gathering production traces is more appropriate. If you suspect lock contention, scheduler behavior and blocking time matter more than a standalone throughput loop. If a regression appears only in end-to-end service traffic, a synthetic isolated benchmark may actively mislead.
Use a simple hierarchy:
- Use microbenchmarks for narrowly scoped, well-isolated questions.
- Use profilers for discovering where time or samples actually go in a process.
- Use production-like load tests for interactions among queues, threads, I/O, caches, and contention.
- Use production observability to confirm the change matters in the environment that pays the bill.
Confusing these layers is one of the fastest ways to waste weeks.
A benchmark must state its claim
A trustworthy benchmark starts with a sentence, not with code. “Compare lookup latency of sorted contiguous storage versus hash lookup for a read-mostly route table of 1k, 10k, and 100k entries with realistic key lengths.” That is a claim. “Benchmark containers” is not.
The sentence should identify:
- The operation under test.
- The data size and distribution.
- The mutation versus read ratio.
- The machine or environment assumptions that matter.
- The decision the benchmark is supposed to inform.
If you cannot write that sentence, you do not yet know what the benchmark means.
This requirement matters because performance is workload-shaped. A benchmark over random integer keys may tell you nothing about a production router that uses string views with strong prefix locality. A benchmark over uniform hash hits may hide the collision behavior of real skewed keys. A benchmark that rebuilds a container every iteration may punish a design whose production cost is dominated by lookup after a one-time build.
Benchmark the whole relevant operation
One common lie is measuring a convenient fragment instead of the decision boundary that matters. For example, a parsing pipeline is “optimized” by measuring only token conversion after the input is already resident in cache and after allocations are pre-reserved. A container comparison measures lookup while excluding construction, sorting, deduplication, and memory reclamation even though the production workload rebuilds the structure frequently.
A benchmark does not need to be huge. It does need to include the costs the design imposes in reality. If an API choice forces allocation, copying, hashing, or validation before the line you currently measure, those costs belong in the measurement unless you can justify excluding them.
This is where the cost model from Chapter 16 should inform the benchmark. Measure the actual boundary where the costs accumulate. Otherwise the result is technically correct and operationally useless.
In a real project, the benchmarkable surfaces are often less obvious than vector<int> but more instructive. Consider the example project in examples/web-api/:
- json::serialize_array() (json.cppm) iterates a range and builds a JSON array by repeated string concatenation. Benchmarking this function across varying collection sizes (10, 100, 1000 tasks) would reveal whether the concatenation strategy or the to_json() cost per element dominates, and whether pre-reserving the result string matters.
- TaskRepository::find_by_id() (repository.cppm) performs a linear scan with std::ranges::find under a shared_lock. A benchmark comparing this against an unordered_map<TaskId, Task> alternative would need to include the locking cost and test at realistic repository sizes – not just the raw find.
- Router::to_handler() (router.cppm) captures the route table by value and returns a lambda that performs a linear scan on every request dispatch. Benchmarking route dispatch at 5, 50, and 500 registered routes would show whether the linear scan remains acceptable or whether a sorted-vector or trie approach is justified. The benchmark must include the full dispatch path – matching method, comparing patterns, and invoking the handler – not just the loop.
These are the kinds of benchmark claims worth writing in plain language before writing code: “Compare route dispatch latency with linear scan versus sorted vector at 5, 50, and 500 routes with realistic HTTP method and path distributions.”
Control for compiler and harness artifacts
C++ can produce especially misleading microbenchmarks because the optimizer is extremely willing to remove, fold, hoist, and vectorize code that is not anchored to observable behavior. Benchmark harnesses exist partly to prevent this, but they do not eliminate the need for skepticism.
At minimum:
- Ensure results are consumed in a way the compiler cannot elide.
- Separate one-time setup from per-iteration work deliberately.
- Warm up enough to avoid measuring first-touch effects accidentally.
- Control data initialization so each iteration exercises the intended branch and cache behavior.
- Inspect generated code when a surprising result appears.
If a benchmark claims that a complex operation takes almost no time, assume the optimizer removed work until proven otherwise. If a benchmark shows enormous variance, assume the environment is unstable or the workload is underspecified until proven otherwise.
Flawed benchmark: dead code elimination
This is the single most common microbenchmark lie. The compiler sees that a result is never used and removes the computation entirely:
// BROKEN: the compiler may eliminate the entire loop because
// 'total' is never observed.
static void BM_bad_dce(benchmark::State& state) {
std::vector<double> data(1'000'000, 1.0);
for (auto _ : state) {
double total = 0.0;
for (double d : data)
total += d * d;
// total is dead. Optimizer removes the loop.
// Benchmark reports ~0 ns/iteration.
}
}
// FIXED: benchmark::DoNotOptimize prevents the compiler from
// proving the result is unused.
static void BM_good_dce(benchmark::State& state) {
std::vector<double> data(1'000'000, 1.0);
for (auto _ : state) {
double total = 0.0;
for (double d : data)
total += d * d;
benchmark::DoNotOptimize(total);
}
}
benchmark::DoNotOptimize is not magic. On most implementations it acts as an opaque read of the value (often an inline asm that the compiler treats as potentially observing the variable). Use it on the final result, not on every intermediate step, or you risk inhibiting legitimate optimizations the production code would also benefit from. If you are unsure whether DCE is affecting your results, compile with -S and inspect the assembly.
Flawed benchmark: measuring setup instead of work
// BROKEN: construction cost dominates. The benchmark is
// measuring vector allocation and initialization, not lookup.
static void BM_bad_lookup(benchmark::State& state) {
for (auto _ : state) {
std::vector<int> v(1'000'000);
std::iota(v.begin(), v.end(), 0);
auto it = std::lower_bound(v.begin(), v.end(), 500'000);
benchmark::DoNotOptimize(it);
}
}
// FIXED: setup goes outside the timing loop.
static void BM_good_lookup(benchmark::State& state) {
std::vector<int> v(1'000'000);
std::iota(v.begin(), v.end(), 0);
for (auto _ : state) {
auto it = std::lower_bound(v.begin(), v.end(), 500'000);
benchmark::DoNotOptimize(it);
}
}
Flawed benchmark: wrong baseline
Comparing two designs against an unfair baseline is subtler and more dangerous:
// MISLEADING: comparing hash lookup against linear scan.
// Concludes "hash map is 100x faster" -- but the real alternative
// in production is sorted vector with binary search, which may
// be within 2x and uses half the memory.
static void BM_linear_scan(benchmark::State& state) {
std::vector<std::pair<int,int>> data(100'000);
// ... fill with random kv pairs, unsorted ...
for (auto _ : state) {
auto it = std::find_if(data.begin(), data.end(),
[](const auto& p) { return p.first == 42; });
benchmark::DoNotOptimize(it);
}
}
The right baseline is the realistic alternative, not the worst possible option. Always state what the benchmark is comparing against and why that alternative is the one the team would actually choose.
Flawed benchmark: warm cache illusion
// MISLEADING: data fits in L1 cache and is hot from the previous
// iteration. Production accesses the same structure after
// processing unrelated data that evicts it from cache.
static void BM_warm_cache(benchmark::State& state) {
std::vector<int> v(1'000); // ~4 KB, fits in L1
std::iota(v.begin(), v.end(), 0);
for (auto _ : state) {
int sum = 0;
for (int x : v) sum += x;
benchmark::DoNotOptimize(sum);
}
// Reports ~50 ns. In production, with cache-cold data,
// the same operation takes 10-50x longer.
}
If the production access pattern encounters cold data, either make the working set large enough to exceed cache, or explicitly flush cache lines between iterations (platform-specific and fragile, but sometimes necessary for honest results).
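One way to enlarge the working set and defeat the prefetcher is to visit a big buffer in shuffled order. The sketch below is illustrative (the helper names are not from any harness), and the "bigger than last-level cache" sizing is an assumption about the target machine:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Sketch: build a working set plus a shuffled visitation order. Sized
// beyond the last-level cache, each pass touches lines in an order the
// hardware prefetcher cannot predict, so reads stay cache-cold.
auto make_cold_workload(std::size_t n, unsigned seed)
    -> std::pair<std::vector<int>, std::vector<std::size_t>> {
  std::vector<int> data(n, 1);
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::shuffle(order.begin(), order.end(), std::mt19937{seed});
  return {std::move(data), std::move(order)};
}

long long sum_shuffled(std::vector<int> const& data,
                       std::vector<std::size_t> const& order) {
  long long total = 0;
  for (std::size_t idx : order) total += data[idx];  // cache-cold reads
  return total;
}
```

In a benchmark, n would be chosen so n * sizeof(int) comfortably exceeds the last-level cache (for example tens of megabytes), and the timing loop would call sum_shuffled once per iteration.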
Google Benchmark pitfalls
Google Benchmark (benchmark::) is widely used and generally solid, but several recurring mistakes deserve mention:
- Forgetting benchmark::ClobberMemory(). DoNotOptimize prevents dead stores of a value, but it does not force the compiler to assume memory has changed. If your benchmark modifies a data structure in place, the compiler may hoist reads above writes across iterations. Use benchmark::ClobberMemory() after mutations to force a reload:
static void BM_modify(benchmark::State& state) {
std::vector<int> v(10'000, 0);
for (auto _ : state) {
for (auto& x : v) x += 1;
benchmark::ClobberMemory();
// Without ClobberMemory, the compiler could theoretically
// observe that v is never read and eliminate the writes,
// or combine multiple iterations into one.
}
}
- Not using state.SetItemsProcessed(). Without it, the output shows only time per iteration, making it hard to compare benchmarks that process different batch sizes. Always call state.SetItemsProcessed(state.iterations() * num_items) so the output includes a throughput column.
- Ignoring state.PauseTiming()/state.ResumeTiming() overhead. These calls use clock reads that themselves take 20-100 ns on many platforms. If the operation you are measuring takes less than a microsecond, the pause/resume overhead dominates. For sub-microsecond work, keep setup outside the loop entirely or amortize it across many iterations.
- Benchmarking only one size. Use ->Range(8, 1 << 20) or ->DenseRange() to test across sizes. A design that wins at 1K elements may lose at 1M. Performance is not a scalar.
Use a serious harness when possible. The exact library is less important than the discipline: stable repetition, clear setup boundaries, and explicit prevention of dead-code elimination. When a benchmark is intentionally partial because the repository does not standardize on a harness, say so and document the omitted scaffolding.
Measuring noise instead of signal
Even with a correct benchmark, environmental noise can dominate the result. Frequency scaling, thermal throttling, background processes, NUMA effects, and interrupt coalescing all inject variance that has nothing to do with your code change.
Practical defenses:
- Pin CPU frequency during benchmarks (cpupower frequency-set -g performance on Linux, or disable turbo boost). A benchmark that runs at 4.5 GHz on one iteration and 3.2 GHz on the next is measuring the governor, not your code.
- Isolate cores (the isolcpus kernel parameter, or taskset/numactl) to prevent scheduler interference.
- Run multiple trials and report median, not mean. Median is robust to occasional spikes from interrupts or page faults. Mean is dragged by outliers.
- Require statistical significance before declaring a win. A 3% improvement with 5% coefficient of variation is noise. Google Benchmark supports --benchmark_repetitions=N and reports stddev; use it.
- Compare on the same machine, same boot, same binary when possible. Cross-machine comparisons require careful normalization and are generally less trustworthy.
If a benchmark result changes by more than 1-2% between identical runs on an idle machine, the benchmark setup needs fixing before the result means anything.
Distribution matters more than a single number
Average runtime is a weak summary. Many production systems care about percentiles, variance, and worst-case behavior under skew. A representation that improves mean throughput while making tail latency worse under bursty allocation or lock contention may still be a regression. Likewise, a benchmark that reports only “nanoseconds per iteration” can hide bimodal behavior caused by rehash, page faults, branch predictor flips, or occasional large allocations.
Read performance numbers the way you would read availability or latency telemetry. Ask about spread, not just center. Ask whether outliers are noise, environment instability, or real behavior from the design. Ask whether the benchmark shape forces rare expensive events often enough to matter.
For CI regression control, this means using thresholds and trend analysis carefully. A noisy benchmark can create false alarms that train teams to ignore the signal. A too-forgiving threshold can let meaningful regressions accumulate. Stable benchmark design is usually more valuable than elaborate reporting.
Profilers answer different questions
A profiler is not a slower benchmark. It is a sampling or instrumentation tool for understanding where time, allocations, cache misses, or waits occur in a real process. Use it when you do not yet know where the bottleneck is, or when a microbenchmark result needs validation against full-system behavior.
Different profilers reveal different failure classes:
- CPU sampling profilers answer where active CPU time is spent.
- Allocation profilers answer which paths allocate and retain memory.
- Hardware-counter-aware tools answer where cache misses, branch mispredicts, or stalled cycles cluster.
- Concurrency and tracing tools answer where threads block, wait, or contend.
Do not ask one tool to answer a question it cannot see. A CPU profiler will not explain why threads are mostly idle waiting on a lock. An allocation flame graph will not tell you whether a faster allocator would matter if traversal cost still dominates. A wall-clock trace may show a slow request without distinguishing CPU work from scheduler delay.
On Linux, that may mean combining perf, allocator profiling, and tracing. On Windows, it may mean ETW-based tools, Visual Studio Profiler, or Windows Performance Analyzer. On macOS, Instruments fills a similar role. The tool choice is secondary to the habit: pair the question with the instrument that can actually answer it.
Correlate benchmarks with profiles
Benchmarking and profiling should constrain each other.
If a microbenchmark says a change should help because it reduces allocations, the profiler in a realistic process should show fewer allocations or less time in allocation-heavy paths. If a profile says a loop is hot because of cache misses in a pointer-rich traversal, a benchmark should isolate that traversal shape and test alternatives. If the two disagree, do not average them into comfort. Investigate the mismatch.
Common causes of mismatch include:
- The benchmark data shape does not match production.
- The benchmark isolated a cost that is drowned out end to end.
- The profiler points at a symptom rather than the root cause.
- The measured change affected code size, inlining, or branch behavior in the full binary differently than in isolation.
Good performance work narrows these gaps. Bad performance work ignores them.
Beware “representative” inputs that are not
Teams often sabotage measurement by using tidy synthetic inputs. Keys are uniformly random. Messages are the same size. Queues are never bursty. Hash tables never experience realistic load factors. Parsers never see malformed or adversarial data. These inputs are easier to generate and easier to stabilize. They are also often wrong.
Representative input does not mean copying production traffic blindly. It means preserving the properties that drive cost: size distribution, skew, repetition, mutation ratio, working-set size, and failure-path frequency. For a cache, that may mean a Zipf-like access pattern rather than uniform keys. For a parser, it may mean a realistic mix of short and long fields plus a small rate of malformed records. For a scheduler or queue, it may mean burst patterns rather than a flat arrival rate.
When data privacy or operational constraints prevent real traces, at least synthesize distributions intentionally. A benchmark over unrealistic inputs is not neutral. It actively trains the team on the wrong problem.
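Synthesizing a skewed distribution deliberately takes only a few lines of standard library. The sketch below is illustrative (zipf_keys and its parameters are hypothetical names), using std::discrete_distribution to approximate Zipf-like access:

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Illustrative sketch: generate cache-benchmark keys with Zipf-like skew,
// so a few hot keys dominate the stream the way production traffic
// usually does, instead of the uniform keys most benchmarks default to.
std::vector<std::size_t> zipf_keys(std::size_t universe, std::size_t count,
                                   double skew, unsigned seed) {
  std::vector<double> weights(universe);
  for (std::size_t rank = 0; rank < universe; ++rank)
    weights[rank] = 1.0 / std::pow(static_cast<double>(rank + 1), skew);
  std::mt19937 gen{seed};
  std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
  std::vector<std::size_t> keys(count);
  for (auto& k : keys) k = pick(gen);  // key 0 is drawn far more often than key N-1
  return keys;
}
```

With skew near 1.0, the first handful of keys receives a large share of accesses, which keeps the hot working set small; a cache benchmark driven by these keys behaves very differently from one driven by uniform keys over the same universe.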
Performance claims must survive code review
Treat performance changes as reviewable design work, not as heroic experiments. A credible change should come with a compact evidence package:
- The performance question being answered.
- The benchmark or profile setup.
- The workload assumptions.
- The before-and-after result, including variance or percentile data when relevant.
- The tradeoffs introduced: code complexity, memory footprint, API restrictions, portability, or maintenance cost.
This forces a useful discipline. It prevents “seems faster on my machine” from entering the codebase as institutional memory. It also creates artifacts future reviewers can re-run when compilers, standard libraries, or workload shape changes alter the answer.
Regression control is an engineering system, not a dashboard
It is tempting to add a benchmark job to CI and call performance solved. In practice, regression control works only when the measured benchmarks are stable, cheap enough to run at the right frequency, and tied to code paths the team actually cares about. A flaky nightly benchmark suite that no one trusts is not safety. It is ritual.
A practical setup usually includes a small set of highly stable microbenchmarks for known hot kernels, a separate heavier performance workflow for broader load tests, and production observability that tracks latency, throughput, CPU time, and memory effects after release. The layers differ in cost and fidelity. You need all three because no single layer is enough.
What honest measurement looks like
Honest measurement is modest. It does not promise universal truths from one benchmark, confuse profile heat with immediate blame, or assume an optimization matters just because it is visible in assembly. It ties a number to a workload, a question, and a decision.
Hardware changes, compilers improve, standard library implementations shift, and production traffic evolves. The habit worth building is not attachment to one profiler or framework. It is the refusal to make performance claims without evidence that matches the decision being made.
Takeaways
- Pick the measurement tool that matches the question: benchmark, profile, load test, or production telemetry.
- Write the benchmark claim in plain language before writing benchmark code.
- Measure the full relevant operation, not the most convenient fragment.
- Treat optimizer artifacts, harness mistakes, and unrealistic inputs as default suspects.
- Look at variance and percentiles, not only means.
- Require performance changes to carry a reviewable evidence package.
Testing Strategy for Resource and Boundary Bugs
Most expensive C++ bugs are not “algorithm returns the wrong number” bugs. They are resource and boundary bugs: a file descriptor that stays open on an error path, a temporary file that survives a failed commit, a cancellation path that leaks work, a parser that accepts malformed input until one particular byte pattern explodes under load, or a library boundary that quietly translates a domain failure into process termination.
Happy-path unit tests do not put enough pressure on these designs. They tend to validate nominal behavior while leaving lifetime transitions, cleanup guarantees, and edge contracts mostly unexercised. In modern C++, that is a bad bargain. Ownership and failure handling are explicit enough that you can design tests around them, and you should. If a component owns scarce resources, crosses process or API boundaries, or behaves differently under timeout, cancellation, malformed input, or partial failure, its test strategy should be built around those facts.
This chapter is about test design, not tooling. The goal is to decide what evidence to ask for before code ships. Sanitizers, static analysis, and build diagnostics belong in the next chapter. Runtime logs, metrics, traces, and crash evidence belong in the chapter after that. Here the question is simpler: what tests prove that ownership, cleanup, and boundary behavior stay correct when the system is under stress?
Start from failure shape, not test pyramid slogans
Generic testing advice becomes weak quickly in C++ because the expensive failures are not evenly distributed. If a service spends most of its risk budget in shutdown, cancellation, temporary-file replacement, buffer lifetime, and external protocol translation, then the test suite should spend most of its effort there as well.
That means starting with failure shape.
For each component, ask four questions.
- Which resources must be released, rolled back, or committed exactly once?
- Which boundaries translate errors, ownership, or representation between subsystems?
- Which inputs or schedules are too large to enumerate but cheap to generate?
- Which behaviors depend on time, concurrency, or cancellation rather than simple call order?
Those questions push you toward different test forms. Resource cleanup usually needs deterministic failure injection and postcondition checks. Boundary translation usually needs contract tests against realistic payloads and error classes. Large input surfaces usually need properties and fuzzing. Time-sensitive concurrency usually needs controllable clocks, executors, and shutdown orchestration instead of sleep-based tests.
Coverage numbers do not answer those questions. A line can run and still fail to prove that rollback happened, ownership remained valid, or shutdown drained background work without use-after-free risk. Treat coverage as a lagging completeness signal, not as the organizing principle of the suite.
Test resource lifecycles at the level the business cares about
The right test for a resource bug almost never asserts that some helper was called. It asserts the observable contract around acquisition, commit, rollback, and release.
Consider a service that rewrites an on-disk snapshot atomically. The production rule is not “call write, then rename, then remove on failure.” The production rule is “either the new snapshot becomes visible, or the old one remains intact and the staging file is cleaned up.” A useful test targets that rule directly.
Intentional partial: a seam that makes rollback testable
struct file_system {
virtual ~file_system() = default;
virtual auto write(std::filesystem::path const& path,
std::span<char const> bytes)
-> std::expected<void, std::error_code> = 0;
virtual auto rename(std::filesystem::path const& from,
std::filesystem::path const& to)
-> std::expected<void, std::error_code> = 0;
virtual void remove(std::filesystem::path const& path) noexcept = 0;
};
enum class snapshot_error {
staging_write_failed,
commit_failed,
};
auto write_snapshot_atomically(file_system& fs,
std::filesystem::path const& target,
std::span<char const> bytes)
-> std::expected<void, snapshot_error>
{
auto staging = target;
staging += ".tmp";
if (auto r = fs.write(staging, bytes); !r) {
return std::unexpected(snapshot_error::staging_write_failed);
}
if (auto r = fs.rename(staging, target); !r) {
fs.remove(staging);
return std::unexpected(snapshot_error::commit_failed);
}
return {};
}
TEST(write_snapshot_atomically_cleans_up_staging_file_on_commit_failure)
{
fake_file_system fs;
fs.fail_rename_with(make_error_code(std::errc::device_or_resource_busy));
auto result = write_snapshot_atomically(
fs,
"cache/index.bin",
std::span<char const>{"new snapshot"sv.data(), "new snapshot"sv.size()});
ASSERT_FALSE(result.has_value());
EXPECT_EQ(result.error(), snapshot_error::commit_failed);
EXPECT_FALSE(fs.exists("cache/index.bin.tmp"));
EXPECT_EQ(fs.read("cache/index.bin"), "old snapshot");
}
This is the right kind of seam because it sits at the business boundary. The test does not mock half the standard library. It creates one replaceable interface around the external effect and checks the postconditions the caller depends on.
That tradeoff matters. Over-mocking infrastructure produces brittle tests that ratify the implementation order of syscalls rather than the safety properties of the operation. Under-seaming the design leaves failure paths untestable except by broad integration tests. The middle ground is to isolate the resource boundary once, then write tests against commit and rollback behavior.
When over-mocking hides real bugs
Consider the difference between a test that checks implementation details and one that checks the safety property. Teams often write tests like this:
// BAD: This test passes, but proves nothing about cleanup.
TEST(write_snapshot_calls_remove_on_rename_failure)
{
strict_mock_file_system fs;
EXPECT_CALL(fs, write(_, _)).WillOnce(Return(std::expected<void, std::error_code>{}));
EXPECT_CALL(fs, rename(_, _)).WillOnce(Return(
std::unexpected(make_error_code(std::errc::device_or_resource_busy))));
EXPECT_CALL(fs, remove("cache/index.bin.tmp")).Times(1);
write_snapshot_atomically(fs, "cache/index.bin", std::span<char const>{"data"sv});
}
This test verifies that remove is called. It does not verify that the staging file is actually gone or that the original file is untouched. If someone refactors the cleanup to use std::filesystem::remove_all or changes the staging path convention, this test breaks – but a real bug where remove silently fails and leaves the staging file behind would pass. The earlier test against fake_file_system is stronger because it asserts observable postconditions, not call sequences.
Resource-leak tests: verify cleanup, not just happy-path ownership
Scoped RAII is not enough if error paths skip construction or move ownership incorrectly. A surprisingly common pattern is a resource that leaks only on a specific failure path:
class connection_pool {
public:
auto acquire() -> std::expected<pooled_connection, pool_error>;
void release(pooled_connection conn) noexcept;
};
// This function has a leak on the second acquire failure.
auto transfer(connection_pool& pool, transfer_request const& req)
-> std::expected<receipt, transfer_error>
{
auto src = pool.acquire();
if (!src) return std::unexpected(transfer_error::no_connection);
auto dst = pool.acquire();
if (!dst) {
// BUG: forgot to release src back to the pool.
return std::unexpected(transfer_error::no_connection);
}
// ... perform transfer, release both on success ...
pool.release(std::move(*src));
pool.release(std::move(*dst));
return receipt{};
}
A test that only exercises the success path never sees the leak. A test that only checks the return value on failure also misses it. The test that catches it asserts pool state:
TEST(transfer_releases_source_connection_when_dest_acquire_fails)
{
counting_connection_pool pool{.max_connections = 1};
auto result = transfer(pool, make_request());
ASSERT_FALSE(result.has_value());
EXPECT_EQ(pool.available(), 1); // Source connection must be returned.
}
This is the pattern: if you own a scarce resource, your failure-path tests should assert that the resource is released, not just that an error was returned.
Exception safety: the gap between “compiles” and “correct”
Even noexcept-free code paths deserve testing when exception safety matters. A container or cache that provides the strong exception guarantee should be tested for it:
TEST(cache_insert_preserves_existing_entries_on_allocation_failure)
{
lru_cache<std::string, std::string> cache(/*capacity=*/4);
cache.insert("key1", "value1");
cache.insert("key2", "value2");
failing_allocator::arm_failure_after(1); // Fail during insert internals.
auto result = cache.insert("key3", "value3");
EXPECT_FALSE(result.has_value());
// Strong guarantee: pre-existing entries are intact.
EXPECT_EQ(cache.get("key1"), "value1");
EXPECT_EQ(cache.get("key2"), "value2");
EXPECT_EQ(cache.size(), 2);
}
If the cache provides only the basic guarantee, the test should still verify that no resources leaked and the cache is in a valid (if modified) state. The worst outcome is no test at all – the cache silently corrupts its internal structure on exception, and callers discover the problem under production allocation pressure.
The same pattern applies to sockets, transactions, lock-guarded registries, temporary directories, subprocess handles, and thread-owning services. Ask what the stable contract is on success, partial failure, retry, and shutdown. Test that.
Boundary tests should prove translation, not just parsing
Modern C++ code often spends its complexity budget at boundaries: network protocols, file formats, process boundaries, plugin APIs, database clients, and C interfaces. Bugs here are expensive because they corrupt assumptions on both sides. A boundary test should verify three things.
First, valid inputs map to the internal representation without lifetime tricks. If a parser stores std::string_view into longer-lived state, boundary tests should prove that the view refers to stable ownership or that the representation copies when necessary. Second, invalid or partial inputs fail with the right error category. A parse failure, transport failure, and business-rule rejection should not collapse into one generic error path unless that is explicitly the API contract. Third, output formatting or translation back out of the component preserves invariants such as ordering, escaping, units, and versioning.
Use realistic artifacts here. For a configuration loader, keep real sample files beside the tests. For an HTTP or RPC edge, keep representative payloads, including malformed headers, oversized bodies, duplicate fields, bad encodings, and unsupported versions. For a library with a C API, write tests at the ABI-facing surface rather than only against the internal C++ wrapper. If the boundary promises not to throw, test that promise under allocator pressure and invalid inputs.
These tests do not have to be large. They do have to be concrete. “Round-trips one JSON object” is weak. “Rejects duplicate primary key fields with a schema error and leaves previous configuration active” is strong.
Boundary edge cases that curated examples miss
Pay attention to boundary conditions that look harmless in isolation but interact badly:
// A parser that stores string_view into a longer-lived config object.
// This test passes because the input string outlives the config.
TEST(config_parser_reads_server_name)
{
std::string input = R"({"server": "prod-01"})";
auto cfg = parse_config(input);
EXPECT_EQ(cfg.server_name(), "prod-01"); // PASSES -- but fragile.
}
// This test exposes the dangling view.
TEST(config_survives_input_destruction)
{
auto cfg = []{
std::string input = R"({"server": "prod-01"})";
return parse_config(input);
}();
// input is destroyed. If server_name() holds a string_view into it,
// this is use-after-free. It may still "pass" without sanitizers.
EXPECT_EQ(cfg.server_name(), "prod-01");
}
The first test is the one most teams write. The second is the one that catches the real bug. Pair it with AddressSanitizer (next chapter) to turn the silent corruption into a hard failure.
Other frequently missed boundary edge cases worth explicit tests:
- Empty input, single-byte input, and input exactly at buffer boundaries.
- Payloads where string fields contain embedded nulls, since std::string_view::size() and C strlen() disagree.
- Error responses from dependencies that arrive as valid protocol frames with unexpected status codes, not just connection failures.
- Inputs that are valid in one version of a schema but illegal in another, especially when version negotiation is involved.
The example project demonstrates several of these patterns concretely. In examples/web-api/tests/test_http.cpp, test_parse_request_malformed() feeds the string "not a valid request" into the parser and asserts that parse_request() returns std::nullopt rather than crashing or producing a half-initialized Request. This is a malformed-input boundary test that catches parsers which assume well-formed input. The test also exercises missing headers (test_header_missing()), confirming that the std::optional-returning header() method handles absence correctly rather than returning a dangling view or a default-constructed string.
In examples/web-api/tests/test_task.cpp, the boundary validation tests follow the same approach. test_task_validation_rejects_empty_title() and test_task_validation_rejects_long_title() (the latter constructing a 257-character string) verify that domain invariants hold at the extremes. These are not implementation-order tests – they assert the business rule that a task title must be non-empty and within a length bound, and they check that the error is reported through std::expected with the correct ErrorCode::bad_request, not swallowed or converted to an exception.
Failure injection is more valuable than more mocks
C++ error paths are where ownership mistakes become production incidents. If you only test success, you are effectively declaring the error-handling code unreviewed.
Deterministic failure injection is the practical answer. Introduce failures where the component crosses a resource boundary or scheduling boundary: file open, rename, memory allocation inside a bounded component, task submission, timer expiration, downstream RPC call, or durable commit. Then verify that the operation leaves the system in a valid state.
The important word is deterministic. Randomly failing syscalls can be useful in chaos environments, but they are weak as regression tests. A regression test should be able to say exactly which operation fails and what state must hold afterward.
Design the seams accordingly.
- File and network adapters should be replaceable at the operation boundary.
- Clock and timer sources should be injectable so timeout tests do not sleep.
- Task scheduling should allow a test executor that advances work deliberately.
- Shutdown and cancellation should expose a completion point that tests can await.
This design pressure is healthy. If a component cannot be forced through its failure modes without global monkey-patching, it is usually too entangled with the environment.
Avoid one common overreach: simulating allocator failure everywhere. Allocation-failure testing can be useful for hard real-time or infrastructure components with strong recovery guarantees, but in many codebases it produces noise and unrealistic control flow. Use it where the contract actually depends on low-memory survival. For most service code, I/O failure, timeout, cancellation, and partial-commit behavior are the higher-value targets.
Property tests and fuzzers belong at input-rich boundaries
Some boundaries are too large for curated examples alone. Parsers, decoders, compressors, SQL-like query fragments, binary message readers, path normalizers, and command-line interpreters all accept vast input spaces. Here property-based tests and fuzzing pay for themselves.
The point is not novelty. The point is to encode invariants that should survive many inputs.
Examples of good properties:
- Parsing valid configuration preserves semantic equality after serialize-then-parse.
- Invalid UTF-8 never produces a successful normalized identifier.
- A message decoder either returns a fully formed value or a structured error; it never leaves partially initialized output observable.
- Path normalization is idempotent for already-normalized relative paths within the accepted domain.
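The first property can be exercised without a dedicated framework: generate inputs, apply the round trip, assert the invariant. The toy escape codec below is hypothetical; the shape of the test loop is what carries over:

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// Toy codec (illustrative): escapes backslash and newline. The property
// under test is decode(encode(x)) == x for arbitrary byte strings.
std::string encode(std::string const& s) {
  std::string out;
  for (char c : s) {
    if (c == '\\' || c == '\n') out += '\\';
    out += (c == '\n') ? 'n' : c;
  }
  return out;
}

std::string decode(std::string const& s) {
  std::string out;
  for (std::size_t i = 0; i < s.size(); ++i) {
    if (s[i] == '\\' && i + 1 < s.size()) {
      ++i;
      out += (s[i] == 'n') ? '\n' : s[i];
    } else {
      out += s[i];
    }
  }
  return out;
}

// Hand-rolled property test: many generated inputs, one invariant.
// Embedded nulls and escape characters are deliberately in the domain.
void roundtrip_property(unsigned seed, int trials) {
  std::mt19937 gen{seed};
  std::uniform_int_distribution<int> len(0, 64);
  std::uniform_int_distribution<int> byte(0, 127);
  for (int t = 0; t < trials; ++t) {
    std::string input;
    for (int n = len(gen); n > 0; --n)
      input += static_cast<char>(byte(gen));
    assert(decode(encode(input)) == input);  // the invariant under test
  }
}
```

A real project would use a property-testing library or a fuzzer for the generation step, but even this loop puts far more pressure on the codec than a handful of curated examples.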
Fuzzing is especially strong for native code because malformed inputs often drive control flow into rarely tested branches where lifetime mistakes and undefined behavior live. But keep the chapter boundary straight: fuzzing is still a testing strategy. Its value comes from generating pressure on contracts and invariants. The next chapter explains how sanitizers make fuzzing much more productive by turning silent memory corruption into actionable failures.
Use seed corpora that look like production traffic, not just arbitrary bytes. Otherwise the fuzzer spends too much time exploring input shapes your real system would reject at an outer layer. For protocol readers, include truncated messages, duplicate fields, bad lengths, unsupported versions, and compression edge cases. For text formats, include overlong tokens, invalid escapes, and mixed line endings.
Concurrency and cancellation tests need controllable time
Many C++ teams know that sleep-based tests are flaky, then keep writing them because the production code hard-wires real clocks and thread pools. The result is a false economy: tests pass locally, fail in CI, and still miss the real shutdown bug.
If a component depends on deadlines, retries, stop requests, or background draining, design it so tests can control time and scheduling. std::stop_token and std::jthread help express cancellation intent, but they do not remove the need for deterministic orchestration. A task queue that runs on an injectable executor is easier to verify than one that immediately spawns detached work. A retry loop that takes a clock and sleep strategy is easier to test than one that directly calls std::this_thread::sleep_for.
Good concurrency tests typically assert one of these behaviors.
- A stop request prevents new work from starting.
- In-flight work observes cancellation at defined suspension points.
- Shutdown waits for owned work and does not use freed state afterward.
- Backpressure limits queue growth instead of converting overload into unbounded memory growth.
- Timeout paths return consistent error categories and release owned resources.
Notice that none of those are “called callback X before callback Y.” They are lifecycle guarantees. That is where concurrency bugs become expensive.
The example project’s test_concurrent_access() in examples/web-api/tests/test_repository.cpp shows this concretely. It spawns 8 std::jthreads, each creating 100 tasks concurrently, then asserts that the final repository size equals 800. This tests the invariant that the shared_mutex-protected TaskRepository does not lose or duplicate entries under concurrent writes. The same file also demonstrates update-validation testing: test_update_validates() mutates a task’s title to an empty string inside the update() callback and asserts that the repository rejects the mutation with ErrorCode::bad_request. This is a boundary-meets-concurrency test – it verifies that the re-validation step inside the write-locked update() path catches invariant violations even when the updater callable is provided by the caller.
The project’s CMake configuration (examples/web-api/CMakeLists.txt) also supports running these tests under sanitizers via ENABLE_ASAN and ENABLE_TSAN options. Running the concurrent test under ThreadSanitizer provides mechanical evidence that the locking protocol is correct, rather than relying on the test passing by luck on a particular scheduler interleaving.
Integration tests should validate whole cleanup stories
Not every resource bug can be proven with isolated tests. Some failures emerge only when the real file system, process model, sockets, or thread scheduling are involved. You still want focused unit and property tests, but you also need a smaller set of integration tests that validate end-to-end cleanup behavior.
For a service, that may mean starting the process with a temporary data directory, sending realistic requests, forcing a failure at the storage layer, then verifying restart behavior and on-disk state. For a library, it may mean exercising the public API from a tiny host program that loads configuration, starts background work, cancels it, and unloads cleanly. For tooling, it may mean invoking the real executable against fixture trees and checking exit codes, stderr, and file-system postconditions.
Keep these tests scenario-based and scarce. They are slower and harder to diagnose than unit tests. Their job is to validate a full cleanup story: partial writes do not become committed state, repeated starts do not inherit garbage from failed shutdown, and external contracts remain stable under failure.
What to stop testing
Weak tests consume review time without improving confidence.
Stop writing tests that merely restate the current implementation structure.
- Tests that verify every helper invocation but never assert an externally meaningful postcondition.
- Mock-heavy tests that would fail if you merged two internal functions even though the contract stayed correct.
- Sleep-based async tests whose real assertion is “the machine was idle enough today.”
- Snapshot tests for logs or error strings when the contract is the error category and structured fields, not the prose.
- Broad integration suites used as a substitute for precise failure-path tests.
The discipline is to spend test budget where the bug classes live. In C++, those bug classes cluster around ownership, boundaries, cancellation, and malformed input. Design for those explicitly.
Takeaways
Testing strategy in modern C++ should follow failure economics, not generic layering slogans. Resource-owning code needs deterministic failure-path tests. Boundary-heavy code needs contract tests with realistic artifacts. Input-rich code needs properties and fuzzing. Concurrent code needs controllable time and scheduling. Integration tests should validate whole cleanup stories, not replace focused tests.
Use this chapter to decide what behavior must be proven before shipping. Use the next chapter to decide which compilers, sanitizers, analyzers, and build diagnostics should mechanically search for bugs while those tests run.
Review questions:
- What are the resource commit, rollback, and release guarantees of this component?
- Which boundary translations need concrete contract tests with realistic fixtures?
- Which failure points can be injected deterministically today, and which require redesign to become testable?
- Which input surfaces deserve property tests or fuzzing rather than example-only coverage?
- Which time, cancellation, or shutdown behaviors are currently tested by sleeping instead of by controlled scheduling?
Sanitizers, Static Analysis, and Build Diagnostics
Testing strategy tells you what behavior to exercise. This chapter is about the mechanical bug-finding stack that should run alongside that exercise: compiler warnings, sanitizers, static analysis, and build configurations that preserve useful diagnostics. These tools do not tell you whether the design is correct. They tell you whether the program stepped into a bug class that humans routinely miss in review.
That distinction matters. A test can assert that cancellation leaves no visible partial state. AddressSanitizer can tell you that the cleanup path touched freed memory while attempting to honor that contract. A contract test can prove that a parser rejects malformed input. UndefinedBehaviorSanitizer can tell you that one rejection path signed-overflowed while computing a buffer size. Observability can later tell you that a production build is crashing in a path you never exercised under sanitizer. Each layer answers a different question.
For native systems, the cost of skipping this layer is predictable. The bug is found later, reproduced less reliably, and diagnosed with worse evidence. If the build only produces optimized binaries with stripped symbols, weak warnings, and no analyzer or sanitizer jobs, the team has chosen slower debugging as policy.
Treat diagnostics as build products, not developer preferences
The first mistake is organizational, not technical. Teams often treat warnings and analysis as optional local tooling, which means they drift by compiler, machine, and mood. Production C++ needs the opposite posture. Diagnostic fidelity should be part of the build contract.
At minimum, your repository should define named build modes that answer distinct questions.
| Build mode | Primary question | Typical characteristics |
|---|---|---|
| Fast developer build | Can I iterate quickly on logic? | Debug info, assertions, no or low optimization |
| Address/UB sanitizer build | Did execution hit memory or undefined-behavior bugs? | -O1, debug info, frame pointers, ASan and UBSan |
| Thread sanitizer build | Did concurrent execution hit a data race or lock-order problem? | Dedicated job, reduced parallelism, TSan only |
| Static analysis build | Does the code trigger warning patterns or analyzable defects before execution? | Compiler warnings, clang-tidy, analyzer jobs |
| Release-with-symbols build | Will production behavior remain diagnosable? | Release optimization, external symbols, build IDs, stable source mapping |
Trying to collapse those into one universal configuration usually fails. TSan carries too much overhead for every build. ASan and UBSan alter memory layout and timing. Deep analysis jobs are slower than normal edit-compile-run loops. The right answer is not one magical build. The right answer is a deliberate matrix.
That matrix should live in versioned build scripts or presets, not in tribal knowledge. If the repository cannot tell a new engineer exactly how to produce a sanitized binary or a release artifact with symbols, the workflow is not mature enough.
The example project in examples/web-api/ demonstrates this with named CMake options that map directly to the build lanes above:
# examples/web-api/CMakeLists.txt
option(ENABLE_ASAN "Enable AddressSanitizer + UBSan" OFF)
option(ENABLE_TSAN "Enable ThreadSanitizer" OFF)

add_library(project_sanitizers INTERFACE)

if(ENABLE_ASAN)
  target_compile_options(project_sanitizers INTERFACE
    -fsanitize=address,undefined -fno-omit-frame-pointer)
  target_link_options(project_sanitizers INTERFACE
    -fsanitize=address,undefined)
endif()

if(ENABLE_TSAN)
  target_compile_options(project_sanitizers INTERFACE -fsanitize=thread)
  target_link_options(project_sanitizers INTERFACE -fsanitize=thread)
endif()
A new engineer clones the repository and runs cmake -G Ninja -DENABLE_ASAN=ON or -DENABLE_TSAN=ON. The lanes are discoverable, version-controlled, and produce distinct binaries. That is what “named build modes” looks like in practice.
Warnings are a policy surface
Compiler warnings are the cheapest analysis you have, and teams still waste them. One common failure mode is warning inflation: thousands of preexisting warnings train everyone to ignore the channel. The other is warning minimalism: fear of noise causes the team to enable so little that suspicious code passes silently.
The practical target is narrower and stricter.
- Enable a serious warning set on all supported compilers.
- Treat warnings as errors for owned code once the warning baseline is under control.
- Keep suppressions local, versioned, and explained.
- Review new suppressions like code changes, because that is what they are.
This is not about aesthetic cleanliness. Warnings often expose real review problems: narrowing conversions, missing overrides, ignored return values, shadowing that hides ownership state, switch exhaustiveness gaps, or accidental copies in hot paths. Some warnings are stylistic and should stay off. That is fine. The point is to make the enabled set defensible and stable.
Be especially careful with blanket suppression at target level. When a third-party header or generated source is noisy, isolate it rather than muting the same diagnostic across the repository. Teams often create future blind spots by solving one vendor problem with project-wide suppression.
Sanitizers turn silent corruption into actionable failures
Sanitizers are valuable because they change failure mode. Instead of a memory bug manifesting as a distant crash or impossible state, it stops near the violation with a stack trace and an explanation of the bug class.
For most production C++ codebases, three sanitizer configurations carry the highest value.
AddressSanitizer and leak detection
AddressSanitizer is the standard first line because it finds a wide set of bugs that otherwise waste enormous time: use-after-free, heap buffer overflow, stack use-after-return in some configurations, double free, and related memory-lifetime violations. Leak detection, where available, adds another useful signal for test processes and short-lived tools.
ASan is especially effective when paired with the testing strategies from the previous chapter. Failure-path tests, fuzzers, and integration scenarios drive execution into branches where ownership mistakes live. ASan then converts those mistakes into reproducible failures.
The bug that “works” without ASan
This is the canonical case that wastes days of debugging time in codebases that skip sanitizer builds:
auto get_session_name(session_registry& registry, session_id id)
    -> std::string_view
{
    auto it = registry.find(id);
    if (it == registry.end()) return {};
    return it->second.name();  // Returns view into the session object.
}

void log_and_remove_session(session_registry& registry, session_id id)
{
    auto name = get_session_name(registry, id);
    registry.erase(id);                        // Session destroyed. name is now dangling.
    audit_log("removed session: {}", name);    // Use-after-free.
}
Without ASan, this code will usually pass tests and even run correctly in production for months. The freed memory still contains the old string data until something else overwrites it. The test passes. Code review might not catch it – the function looks straightforward. When it does fail, the symptom is garbled log output or a crash in an unrelated allocation, nowhere near the actual bug.
Under ASan, this produces an immediate, precise failure:
==41032==ERROR: AddressSanitizer: heap-use-after-free on address 0x6020000000d0
READ of size 12 at 0x6020000000d0 thread T0
    #0 0x55a3c1 in log_and_remove_session(session_registry&, session_id)
        src/session_manager.cpp:47
    #1 0x55a812 in handle_disconnect src/connection.cpp:103

0x6020000000d0 is located 0 bytes inside of 32-byte region
freed by thread T0 here:
    #0 0x4c1a30 in operator delete(void*)
    #1 0x55a7f1 in session_registry::erase(session_id)
        src/session_manager.cpp:31

previously allocated by thread T0 here:
    #0 0x4c1820 in operator new(unsigned long)
    #1 0x55a620 in session_registry::insert(session_id, session_info)
        src/session_manager.cpp:22
The report identifies the exact read, the exact free, and the exact allocation. Compare that with the alternative: a corrupted log entry three weeks from now that nobody connects to this code path.
Typical build characteristics look like this:
clang++ -std=c++23 -O1 -g -fno-omit-frame-pointer \
    -fsanitize=address,undefined
The exact flags vary by toolchain, but the principles are stable: keep enough optimization to preserve realistic structure, keep debug info, and keep frame pointers so stacks are usable.
UndefinedBehaviorSanitizer
UBSan is the companion that catches dangerous behavior not always visible as memory corruption: misaligned access, invalid shifts, bad enum values, null dereference in some contexts, signed overflow depending on configuration, and other undefined or suspicious operations. The important operational lesson is that undefined behavior is often input-sensitive and build-sensitive. The same code may pass tests for months, then fail only on a new compiler or after an inlining change. UBSan helps surface those hazards while the bug is still local enough to fix sanely.
Do not over-interpret it, though. UBSan is not a proof system. It only reports behavior that the exercised execution encountered and that the enabled checks can see.
A concrete example: signed overflow in size calculations is a common source of security bugs that compilers are free to exploit.
auto compute_buffer_size(std::int32_t width, std::int32_t height, std::int32_t channels)
    -> std::int32_t
{
    return width * height * channels;  // Signed overflow if product exceeds INT32_MAX.
}
For width=4096, height=4096, channels=4, the product is 67,108,864 – safe. For width=32768, height=32768, channels=4, the product is 4,294,967,296, which overflows a 32-bit signed integer. Without UBSan, the compiler may optimize downstream bounds checks away entirely because signed overflow is undefined. UBSan catches this at the overflowing multiplication:
runtime error: signed integer overflow: 1073741824 * 4 cannot be
represented in type 'int'
The fix is to use unsigned arithmetic or to check for overflow before the multiplication. The point is that this class of bug is silent, optimizer-sensitive, and often security-relevant – the class of bug UBSan is built to catch.
ThreadSanitizer
TSan is expensive and often noisy around custom synchronization, lock-free code, and some coroutine or foreign-runtime integrations. It is still worth running because data races remain among the most expensive native bugs to diagnose after the fact.
The data race that tests never catch
Data races are invisible to testing without TSan because they depend on scheduling. Consider a metrics counter shared between a request handler and a background reporter:
struct service_stats {
    std::int64_t requests_handled = 0;  // No synchronization.
    std::int64_t bytes_processed = 0;
};

// Thread 1: request handler
void handle_request(service_stats& stats, request const& req) {
    process(req);
    stats.requests_handled++;            // Data race: unsynchronized write.
    stats.bytes_processed += req.size();
}

// Thread 2: periodic reporter
void report_stats(service_stats const& stats) {
    log_metrics("requests", stats.requests_handled);  // Data race: unsynchronized read.
    log_metrics("bytes", stats.bytes_processed);
}
This code will pass every test you write. It will run correctly for months on x86 where the memory model is relatively forgiving. It becomes a problem when the compiler reorders the writes, when the optimizer lifts the read into a register, or when someone ports to ARM. The bug is real today but the symptoms are deferred.
TSan catches it immediately:
WARNING: ThreadSanitizer: data race (pid=28511)
  Write of size 8 at 0x7f8e3c000120 by thread T1:
    #0 handle_request(service_stats&, request const&)
        src/handler.cpp:24
    #1 worker_loop src/server.cpp:88

  Previous read of size 8 at 0x7f8e3c000120 by thread T2:
    #0 report_stats(service_stats const&)
        src/reporter.cpp:12
    #1 reporter_loop src/server.cpp:102

  Location is global 'g_stats' of size 16 at 0x7f8e3c000120

  Thread T1 (tid=28513, running) created by main thread at:
    #0 pthread_create
    #1 start_workers src/server.cpp:71

  Thread T2 (tid=28514, running) created by main thread at:
    #0 pthread_create
    #1 start_reporter src/server.cpp:76
The fix is to use std::atomic<std::int64_t> with appropriate memory ordering, or to protect the struct with a mutex if the fields must be read consistently together. The important point is that no amount of conventional testing would have found this – the test needs TSan to convert a scheduling-dependent corruption into a deterministic failure.
The operational pattern is usually different from ASan. Run TSan in a narrower CI lane or nightly job. Feed it tests that deliberately stress shared-state paths, shutdown, retries, and cancellation. Keep the suppression file short and justified. If TSan reports a race in supposedly benign statistics code, do not dismiss it reflexively. Benign races have a habit of becoming real ones after the next feature.
Avoid stacking TSan with other heavy sanitizers in the same build. Separate jobs make failures easier to interpret and keep the timing distortion manageable.
The example project’s TaskRepository (in examples/web-api/src/modules/repository.cppm) is a case where TSan validates a correct synchronization pattern. The repository protects its internal std::vector<Task> with a std::shared_mutex, using std::shared_lock for read paths (find_by_id, find_all) and std::scoped_lock for write paths (create, update, remove). Building with -DENABLE_TSAN=ON and exercising concurrent readers and writers confirms that this locking discipline has no data races – evidence that conventional tests alone cannot provide.
Static analysis scales review attention
Static analysis is most useful when it is selective and boring. If the analyzer produces pages of stylistic noise, the team will stop reading it. If it is tuned toward patterns that actually matter in your codebase, it extends what reviewers can catch.
Useful targets in modern C++ typically include:
- Dangling views or references caused by temporary lifetime mistakes.
- Missing or misapplied override, noexcept, or [[nodiscard]] where the API contract depends on them.
- Suspicious ownership transfer patterns involving raw pointers, moved-from objects, or smart-pointer aliasing.
- Error-handling mistakes such as ignored results, swallowed status values, or inconsistent translation at boundaries.
- Expensive accidental copies across hot or high-volume interfaces.
- Concurrency hazards such as locking inconsistencies or unsafe capture of shared state.
Compiler-integrated analysis, clang-tidy, the Clang static analyzer, and platform-specific tools such as MSVC /analyze each catch somewhat different things. Use more than one if your toolchain supports it, but keep the output curated. A small enforced rule set that consistently catches real problems is better than a sprawling configuration everybody bypasses.
This is also where repository-specific knowledge belongs. If your service code should never ignore std::expected results from transport adapters, add checks and wrappers that make that hard to do silently. If your library forbids exceptions at the ABI boundary, analyze for that policy directly or enforce it via build and API structure. Static analysis improves when it knows what your contracts are.
Preserve diagnostic quality in release artifacts
One of the most damaging habits in native development is treating debuggability as a debug-build-only concern. Production failures happen in release builds. If those artifacts do not preserve enough information to map crashes and latency problems back to code, you have made later observability much harder.
Release artifacts should normally preserve at least these properties.
- External symbol files or symbol servers so stacks can be symbolized after deployment.
- Build IDs or equivalent version fingerprints that unambiguously map a dump or trace to an exact binary.
- Source revision metadata embedded in the artifact or attached in deployment metadata.
- Enough unwind support for usable native stack traces.
- Stable compiler and linker settings recorded somewhere repeatable.
Depending on platform and sensitivity, that may also include frame pointers, split DWARF or PDB handling, map files, and archived link commands. The exact mechanics are toolchain-specific. The policy is not: if you cannot reproduce the diagnostic shape of the shipped binary, incident response slows down immediately.
This is why build diagnostics belong in the chapter with sanitizers and analysis rather than in the observability chapter. Observability consumes these artifacts later, but the decision to produce them is a build decision.
CI should stage cost, not pretend cost does not exist
A mature pipeline does not run every expensive check on every edit. It stages them by cost and by bug class.
For example:
- Pull request gate: fast build, serious warnings, targeted tests, and at least one ASan/UBSan configuration on changed targets.
- Scheduled or nightly jobs: broader sanitizer coverage, TSan, deeper static analysis, and fuzz targets with sanitizer enabled.
- Release qualification: clean release-with-symbols build, packaging checks, and verification that symbol publication and build metadata succeeded.
The tradeoff is obvious: slower checks find bugs later in the day. The answer is not to drop them. The answer is to place them where they are sustainable and visible.
Do not let sanitizer or analyzer failures become advisory-only noise. If a lane is too flaky to gate anything, fix the flakiness or narrow its scope. A permanently red analysis job is organizationally equivalent to not having the job.
What these tools will not do
This tooling stack is powerful, but its limits should stay explicit.
- Sanitizers do not prove correctness; they only instrument exercised executions.
- Static analysis does not understand every project-specific invariant unless you encode those invariants into the code and configuration.
- Warning cleanliness does not imply good API design or good failure handling.
- A perfectly diagnosable build can still ship the wrong behavior.
That is why Part VI has three chapters instead of one. Testing strategy defines what must be exercised. Mechanical tooling catches classes of bugs while that exercise happens. Observability explains how to understand failures that still reach production.
Takeaways
In production C++, diagnostics must be designed, not hoped for. Keep a versioned build matrix with distinct jobs for fast iteration, sanitizers, analysis, and release-with-symbols. Treat warnings as a policy surface. Run ASan and UBSan routinely, TSan deliberately, and static analysis selectively enough that people still read the output. Preserve symbolization and build identity in release artifacts.
The central tradeoff is cost versus signal. Sanitizers and analysis slow the pipeline and occasionally require suppressions. Shipping without them costs far more when a native bug escapes. Choose the cost while the code is still local.
Review questions:
- Which sanitizer configurations are mandatory for this target, and are they actually exercised by meaningful tests?
- Which warnings are enforced repository-wide, and where are suppressions reviewed?
- Which analyzer checks reflect real project contracts rather than generic style preferences?
- Can a release crash or dump be mapped back to an exact binary, symbol set, and source revision?
- Which expensive checks run later by design, and is that staging explicit rather than accidental?
Observability for Native Systems
Testing and sanitizers reduce how often bugs escape. They do not eliminate production failures, and they do not explain live behavior under real load, real data, real dependency failures, and real rollout conditions. That is what observability is for.
In native systems, poor observability is unusually expensive. When a managed service stalls, you may still have a runtime with rich exceptions, heap snapshots, and standardized tracing hooks. In C++ you may instead have a partially symbolized crash, a thread pool stuck behind one blocked dependency, RSS growth that does not cleanly map to logical ownership, and operators who only know that latency tripled after a deployment. If logs, metrics, traces, and dump artifacts were not designed into the system before the incident, the investigation starts from guesswork.
This chapter is about runtime evidence. Keep the boundary sharp. Tests ask whether the code meets a contract before shipping. Sanitizers and static analysis mechanically search for bug classes in development and CI. Observability answers a different question: when a native system is running for real, what signals let engineers explain failure, overload, or degradation fast enough to act?
Start from operating questions
Observability is weak when it begins as “add some logs.” It gets strong when it begins with operating questions.
For a service, those questions may be:
- Why did request latency rise even though CPU stayed moderate?
- Which dependency failures are causing retries, queue growth, or partial work?
- Did shutdown hang because work ignored cancellation or because a downstream dependency never drained?
- Is memory growth caused by leaks, fragmentation, caching, or backlog?
For a library, the questions shift:
- Which host operation triggered the failure and with what inputs or version context?
- Is the library spending time on parsing, waiting, locking, allocation, or I/O?
- Can the host correlate library failures with its own request or job identifiers?
Those questions determine which fields, metrics, and spans are worth recording. Without them, teams default to verbose but low-value telemetry: string-heavy logs, counters with no dimensions, or traces that show everything except queueing, retries, and cancellation.
What debugging looks like without observability
To make the value concrete, consider a service that processes file uploads. A user reports that uploads are timing out. Here is the handler with no structured observability:
void handle_upload(const http_request& req) {
    std::println("INFO: Processing upload");
    std::println("INFO: Starting validation");
    auto validation = validate(req);
    if (!validation) {
        std::println(stderr, "ERROR: Validation failed: {}",
                     validation.error().message());
        return;
    }
    std::println("INFO: Storing file");
    auto store_result = store(req);
    if (!store_result) {
        std::println(stderr, "ERROR: Store failed: {}",
                     store_result.error().message());
        return;
    }
    notify(req);
    std::println("INFO: Upload complete");
}
Which upload failed? Was it the same user or a different one? What was it waiting on – disk I/O, a downstream service, a lock? How long did the store call take? Was the failure retried? None of these questions are answerable from the logs this function produces. The on-call engineer resorts to grepping logs by timestamp ranges, guessing at correlation, and asking the user to reproduce.
Now the same function with structured logging, correlation IDs, and dimensional metrics:
void handle_upload(upload_context& ctx) {
    auto span = ctx.tracer().start_span("handle_upload", {
        {"request_id", ctx.request_id()},
        {"user_id", ctx.user_id()},
        {"file_size", ctx.file_size()},
        {"shard", ctx.shard_id()},
    });
    ctx.log(severity::info, "upload_started", {
        {"request_id", ctx.request_id()},
        {"file_name", ctx.file_name()},
        {"file_size", std::to_string(ctx.file_size())},
    });

    auto validation = validate(ctx);
    if (!validation) {
        ctx.log(severity::warning, "validation_failed", {
            {"request_id", ctx.request_id()},
            {"reason", validation.error().category()},
        });
        ctx.metrics().increment("upload_failures", 1,
            {{"reason", "validation"}, {"shard", ctx.shard_id()}});
        return;
    }

    auto store_result = store(ctx);
    if (!store_result) {
        ctx.log(severity::error, "store_failed", {
            {"request_id", ctx.request_id()},
            {"dependency", "blob_store"},
            {"error_class", store_result.error().category()},
            {"latency_ms", std::to_string(store_result.elapsed_ms())},
        });
        ctx.metrics().increment("upload_failures", 1,
            {{"reason", "store"}, {"shard", ctx.shard_id()}});
        return;
    }

    ctx.metrics().observe_latency("upload_duration_ms", span.elapsed_ms(),
        {{"shard", ctx.shard_id()}});
    ctx.log(severity::info, "upload_complete", {
        {"request_id", ctx.request_id()},
        {"latency_ms", std::to_string(span.elapsed_ms())},
    });
}
Now the on-call engineer filters by request_id, sees that the timeout happened during store with dependency=blob_store, checks the upload_duration_ms histogram by shard, and discovers that shard-3 latency spiked at 09:40. The blob store dashboard confirms the dependency was degraded. Total investigation time drops from hours to minutes.
Both versions of handle_upload do the same work: validate, store, notify. The difference is not more code. It is code that was written with operating questions in mind from the start.
Logs should explain decisions and state transitions
Logs are most useful when they capture decisions the system made and the state that mattered at the time, not when they narrate every function call. In native systems, this discipline matters even more because the volume and overhead of logging can become a performance problem quickly.
Good production logs are structured, sparse, and stable.
- Structured means the important fields are emitted as machine-readable key-value data, not hidden in prose.
- Sparse means the default path is quiet and the exceptional path is informative.
- Stable means field names and meanings do not drift every sprint.
When an operation fails, the log record should usually capture identity, classification, and local operating context.
- Request or job identifier.
- Operation or route name.
- Failure category, not just a formatted message.
- Retryability or permanence if the code knows it.
- Resource indicators that change diagnosis, such as queue depth, shard, peer, or attempt number.
- Version or build metadata when rollout state matters.
Avoid two common mistakes.
First, do not make logs the only source of truth for metrics-like questions. If you need to know retry rate or queue depth, emit those as metrics instead of forcing operators to reconstruct them from text. Second, do not log high-cardinality payloads or sensitive blobs just because an incident once needed them. Put those behind deliberate sampling or debug paths.
std::source_location can be useful in low-volume internal diagnostics or infrastructure code, especially when you need a stable call-site tag without hand-maintaining strings. It is not a substitute for a meaningful operation name. A log saying source=foo.cpp:412 is weaker than one saying operation=manifest_reload phase=commit.
The example project’s request_logger() middleware (in examples/web-api/src/modules/middleware.cppm) shows a minimal but structured starting point for per-request logging:
// examples/web-api/src/modules/middleware.cppm — request_logger()
return [](const http::Request& req, const http::Handler& next) -> http::Response {
    auto start = std::chrono::steady_clock::now();
    auto resp = next(req);
    auto elapsed = std::chrono::steady_clock::now() - start;
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
    std::println("[{}] {} {} → {} ({} μs)",
                 "LOG",
                 http::method_to_string(req.method),
                 req.path,
                 resp.status,
                 us);
    return resp;
};
Every response carries method, path, status code, and latency. That is not yet production-grade structured logging – the fields are embedded in formatted text rather than emitted as machine-queryable key-value pairs – but it demonstrates the right instinct: capture identity (route), outcome (status), and timing (latency) at a single cross-cutting point. Moving to a structured JSON or key-value emitter is a natural evolution.
Unstructured versus structured logging in practice
The difference between unstructured and structured logs matters most during incidents, when the person reading logs is under time pressure and may not have written the code.
// Unstructured: human-readable but machine-hostile.
log("Failed to connect to database server db-prod-3 after 3 retries "
"(last error: connection refused), request will be dropped");
This line contains useful information, but extracting it requires parsing English. You cannot filter by retry count, dependency name, or error class without fragile regex. Across a fleet of instances, aggregating failure patterns from lines like this is expensive and error-prone.
// Structured: same information, machine-queryable.
ctx.log(severity::error, "dependency_connect_failed", {
    {"dependency", "db-prod-3"},
    {"attempts", "3"},
    {"last_error", "connection_refused"},
    {"action", "request_dropped"},
    {"request_id", ctx.request_id()},
});
Now dependency_connect_failed events can be counted, filtered by dependency name, and correlated with specific requests. The field names are stable across code changes, so dashboards and alerts do not break when someone rewords a log message.
Metrics should track throughput, saturation, and failure shape
Metrics answer questions that logs answer poorly: rates, distributions, long-term drift, and comparative behavior across instances. For native systems, the most useful metrics usually fall into three families.
The first is throughput and latency: request rate, task completion rate, retry rate, and latency histograms for important stages. Use histograms for latency, not just averages. Native performance failures are often tail problems.
The second is saturation: queue depth, worker utilization, open file descriptors, connection pool occupancy, allocator pressure, pending timers, and outstanding background tasks. These tell you whether the system is busy in a healthy way or accumulating work it cannot retire.
The third is failure shape: counts by error category, timeout counts, cancellation counts, parse failures, dropped work, crash-loop restarts, and degraded-mode activations. These reveal whether the system is failing because dependencies are slow, because input quality changed, or because internal backpressure kicked in.
The example project’s ErrorCode enum (in examples/web-api/src/modules/error.cppm) illustrates the foundation for failure-shape metrics. The closed set – not_found, bad_request, conflict, internal_error – with its constexpr to_http_status() mapping gives every failure a stable category. In a production evolution, you would increment a counter dimensioned by ErrorCode at the handler boundary, turning the type system’s classification into queryable metric labels without inventing ad-hoc string tags.
Use labels conservatively. High-cardinality labels are one of the fastest ways to turn a metrics system into an expensive liability. Request ID, user ID, file path, arbitrary exception text, and raw peer address usually do not belong as labels. Region, route, dependency name, result category, and bounded shard identifiers often do.
Gauges also deserve suspicion. They are easy to add and easy to misread. If a queue depth gauge jumps, is that a brief burst or a persistent trend? Pair gauges with rates or histograms when possible so operators can tell whether the state is draining.
Traces need to follow asynchronous ownership, not just synchronous calls
In distributed services and async native systems, traces are the only practical way to see where end-to-end time went. But C++ code often loses trace value by failing to preserve context across executors, callbacks, thread hops, and coroutine suspension.
If a request enters a service, enqueues background work, awaits a downstream call, and later resumes on another worker, the trace should still describe one coherent operation. That requires explicit propagation of trace context at the boundaries where ownership of work crosses time.
This is where earlier design decisions matter. Structured concurrency and explicit cancellation scopes make tracing cleaner because the parent-child relationships are already meaningful. Detached work and ad hoc thread spawning make traces fragment into unrelated spans.
Record spans for stages that correspond to actual waiting or service boundaries.
- Time spent queued before work starts.
- Time spent executing local CPU work.
- Time spent waiting on downstream I/O.
- Time spent retrying or backing off.
- Time lost to cancellation, shutdown, or overload shedding.
Do not create spans for every helper function. That produces trace noise without causal value. The purpose of tracing is to explain latency structure and dependency shape, not to restate the call graph.
The middleware pipeline in the example project (see middleware::chain() in examples/web-api/src/modules/middleware.cppm) is a natural carrier for trace context propagation. Each middleware wraps the next handler and has access to both the request and the response. A tracing middleware inserted into the pipeline can start a span before calling next, attach it to the request context, and close the span after the response returns. Because the pipeline already composes as a chain of std::function wrappers, adding a tracing stage requires no changes to handler signatures – a clean cross-cutting insertion point for trace propagation.
Without trace context: the invisible queue
A common failure mode in async native services is latency that lives in queueing, not in execution. Without trace propagation across executor boundaries, this is invisible:
// No trace context propagation. The span only covers execution, not waiting.
void enqueue_work(thread_pool& pool, request req) {
    pool.submit([req = std::move(req)] {
        auto span = tracer::start_span("process_request"); // Starts when work RUNS.
        process(req);
    });
    // Time between submit() and when the lambda actually executes is lost.
    // If the pool is saturated, requests wait 500ms in the queue,
    // but traces show 2ms of execution time. Operators see low latency
    // in traces while users experience high latency. The queue time is a
    // blind spot.
}
With proper context propagation, the full picture is visible:
void enqueue_work(thread_pool& pool, request req, trace_context ctx) {
    auto enqueue_time = steady_clock::now();
    pool.submit([req = std::move(req), ctx = std::move(ctx), enqueue_time] {
        auto queue_span = ctx.start_span("queued", {
            {"queue_ms", std::to_string(duration_cast<milliseconds>(
                             steady_clock::now() - enqueue_time).count())},
        });
        queue_span.end();
        auto exec_span = ctx.start_span("process_request");
        process(req);
    });
}
Now the trace shows two spans – queueing and execution – both linked to the parent request. When queue time dominates, it is immediately visible in the trace waterfall. This is the kind of latency that metrics alone (average processing time) systematically hide.
Crash diagnostics are part of observability, not a separate emergency hobby
Native services and tools need a crash story before the first crash. That story includes more than “enable dumps.” You need to know where dumps go, how they are symbolized, how they map to exact builds, how operators correlate them with logs and traces, and which process metadata is attached.
At minimum, a crash event should be linkable to:
- Exact binary or build ID.
- Symbol files produced by the matching build.
- Deployment metadata such as version, environment, and rollout ring.
- Recent structured breadcrumbs around the failing operation.
- Thread identities and, where possible, stack traces for relevant threads.
The build work from the previous chapter is what makes this viable. Symbol servers, build IDs, and release metadata are build concerns. Their operational payoff shows up here when an incident happens at 03:00 and the on-call engineer needs a useful stack instead of raw addresses.
Crash reporting also needs policy. Some components should fail fast because continuing risks data corruption. Others can isolate a failing request or plugin and keep the host alive. Observability should make that decision legible after the fact. If the process aborts intentionally on invariant violation, emit enough context before termination that the crash is distinguishable from an arbitrary segfault.
Resource visibility matters more in native systems
A recurring operational problem in C++ services is confusing logical work growth with memory bugs. RSS climbs, latency rises, and everyone asks whether there is a leak. Sometimes there is. Often the explanation is less clean: allocator retention, oversized caches, unbounded queues, stalled consumers, mmap growth, file descriptor leaks, or fragmentation under a new traffic pattern.
You cannot solve that with one metric. You need a set of resource signals that connect runtime behavior to likely causes.
- RSS and virtual memory for coarse process shape.
- Allocator-specific statistics where available.
- Queue depth and backlog age for in-memory work accumulation.
- Open file descriptor or handle counts.
- Active thread count and blocked-thread indicators.
- Connection pool occupancy and timeout counts.
- Cache size and eviction metrics for components that intentionally retain memory.
The point is not to expose every allocator bin or kernel counter. The point is to make the likely failure modes distinguishable. If memory climbs while queue depth and backlog age also climb, overload is a stronger hypothesis than a pure leak. If memory climbs while queue depth stays flat and handle counts rise, a resource leak becomes more plausible. Observability should narrow the search space.
Libraries need host-owned telemetry boundaries
A reusable library should not assume a global logging framework, metrics backend, or tracing SDK. That creates the same dependency inversion problems discussed earlier in the book, now in operational form. Libraries should instead expose a narrow diagnostics boundary that the host can implement.
Intentional partial: a library-facing diagnostics sink
enum class severity { debug, info, warning, error };

struct diagnostic_field {
    std::string_view key;
    std::string_view value;
};

struct diagnostics_sink {
    virtual ~diagnostics_sink() = default;
    virtual void record_event(severity level,
                              std::string_view event_name,
                              std::span<diagnostic_field const> fields) noexcept = 0;
    virtual void increment_counter(std::string_view name,
                                   std::int64_t delta,
                                   std::span<diagnostic_field const> dimensions) noexcept = 0;
};
This kind of interface keeps the library honest. It can report parse failures, retries, cache evictions, or reload timings without hard-coding a vendor SDK into every binary that uses it. The host decides how to attach request IDs, export metrics, or bridge into traces.
The tradeoff is deliberate abstraction work. For a small internal-only component, direct integration may be acceptable. For a reusable library, host-owned telemetry is usually the cleaner long-term design.
What to avoid
Native observability goes wrong in familiar ways.
- Logging every allocation, lock acquisition, or function entry in the hope that more data is always safer.
- Using high-cardinality identifiers as metric dimensions.
- Emitting traces that follow synchronous helper calls but drop async context on executor or coroutine boundaries.
- Shipping binaries with weak symbolization and expecting crash analysis to work later.
- Treating logs as the only operational interface instead of combining logs, metrics, traces, and dumps.
- Making library telemetry depend on a specific service logging stack.
All of these create cost without proportional diagnostic value.
Takeaways
Observability in native systems is runtime evidence design. Start from concrete operating questions. Use logs for decisions and state transitions, metrics for rates and saturation, traces for end-to-end latency structure, and crash artifacts for postmortem debugging. Preserve the async and ownership boundaries that make these signals meaningful. Expose resource signals that let operators distinguish leaks, backlog, fragmentation, and dependency stalls.
The main tradeoff is overhead versus explanation quality. Rich telemetry adds CPU, memory, storage, and design complexity. Sparse telemetry lengthens incidents and leaves native failures ambiguous. Choose the smallest signal set that answers your real operating questions, then make it stable.
Review questions:
- Which operating questions can this service or library answer today without guessing?
- Which log fields are stable and structured enough to support automation, not just humans reading text?
- Which metrics distinguish throughput, saturation, and failure shape rather than mixing them together?
- Does trace context survive executor hops, callbacks, and coroutine suspension boundaries?
- Can a production crash be correlated with exact build identity, symbols, and nearby operational context?
Building a Small Service in Modern C++
Small services are where many C++ teams either get disciplined or get hurt. The codebase is still small enough that people are tempted to improvise, but the process already has real failure modes: overload, half-configured startup, partial writes, dependency timeouts, queue growth, shutdown races, and production debugging with incomplete evidence. The language does not rescue you from that.
This chapter is not a framework tutorial. The production question is narrower: what shape should a small C++23 service have if you want ownership, failure handling, concurrency, and operations to remain reviewable six months later? The answer is not “use all the latest features.” The answer is to choose a service shape that keeps lifetime obvious, async work owned, resource limits explicit, and diagnosis possible under pressure.
The sample system is a small configuration-backed service that accepts requests, validates them, performs bounded background work, persists state, and exposes metrics and health information. The details are ordinary on purpose. Most production services are not conceptually novel. They fail because basic engineering boundaries were left vague.
Define the unit of ownership before the unit of deployment
The first architectural mistake in small services is organizing around endpoints, handlers, or framework callbacks instead of around owned resources. A deployable service owns a fixed set of long-lived things: configuration, listeners, executors, connection pools, storage adapters, telemetry sinks, and shutdown coordination. If those are not represented explicitly, the code drifts toward globals, shared singletons, detached work, and shutdown by hope.
The service object should therefore model ownership directly. It should be the place where long-lived dependencies are constructed, started, and stopped. That does not mean one giant god object. It means one clear root that owns the parts whose lifetimes must end together.
Intentional partial: a service root that owns time and shutdown
struct service_components {
    config cfg;
    request_router router;
    storage_client storage;
    bounded_executor executor;
    telemetry sink;
    http_listener listener;
};

class service {
public:
    explicit service(service_components components)
        : components_(std::move(components)) {}

    auto run(std::stop_token stop) -> std::expected<void, service_error>;
    void request_stop() noexcept;

private:
    auto start() -> std::expected<void, service_error>;
    auto drain() noexcept -> void;

    service_components components_;
    std::atomic<bool> stopping_{false};
};
This is intentionally boring. The service has one ownership root, one stop path, and one place to reason about startup and drain order. A small service does not need architectural theater.
The example project in examples/web-api/ follows this pattern. Its main.cpp constructs every long-lived resource in a single scoped sequence – repository, router, middleware pipeline, server – and wires them together before the process begins accepting work:
// examples/web-api/src/main.cpp — scoped multi-resource construction
webapi::TaskRepository repo;
// ... seed data ...
webapi::Router router;
router
    .get("/health", webapi::handlers::health_check(repo))
    .get("/tasks", webapi::handlers::list_tasks(repo))
    // ... remaining routes ...
std::vector<webapi::middleware::Middleware> pipeline{
    webapi::middleware::request_logger(),
    webapi::middleware::require_json(),
};
auto handler = webapi::middleware::chain(pipeline, router.to_handler());
webapi::http::Server server{port, std::move(handler)};
server.run_until(shutdown_requested);
There is one ownership root (main), one shutdown coordination point (shutdown_requested), and every resource has a clear scope. When run_until returns, destruction proceeds in reverse declaration order – server first, then handler, then router, then repository. No shared pointers, no global registries, no detached work.
That root should usually own concrete infrastructure types, not a graph of heap-allocated interfaces stitched together with shared ownership. Dependency inversion still matters, but the inversion point is usually at boundaries such as storage, transport, or telemetry adapters. Within the process, static ownership is simpler and cheaper than a forest of std::shared_ptr objects whose real owners no longer exist on paper.
Anti-pattern: shared_ptr soup for request state
A common failure mode is using std::shared_ptr to extend request lifetimes across callbacks, queues, and retries without an explicit ownership model. The code compiles and appears safe, but nobody can say when request resources actually release, whether cancellation reaches all holders, or whether shutdown can complete deterministically.
// BAD: shared_ptr soup — every callback extends lifetime indefinitely
void handle_request(std::shared_ptr<http_request> req) {
    auto ctx = std::make_shared<request_context>(req->parse_body());
    ctx->db_future = db_.async_query(ctx->query, [ctx](auto result) {
        ctx->result = result;
        cache_.async_store(ctx->key, ctx->result, [ctx](auto status) {
            ctx->respond(status); // when does ctx die? who knows
        });
    });
    // ctx is now kept alive by two lambdas, the future, and possibly
    // a retry timer. cancellation cannot reach it. shutdown cannot
    // drain it. memory profile is non-deterministic.
}
The fix is to extract an owned work item and move it through the pipeline with clear handoff points.
// BETTER: owned work item with explicit lifetime boundaries
struct request_work {
    parsed_query query;
    std::stop_token stop;
    response_sink sink; // move-only, writes exactly once
};

void handle_request(http_request& req, std::stop_token stop) {
    auto work = request_work{
        .query = req.parse_body(),
        .stop = stop,
        .sink = req.take_response_sink(),
    };
    executor_.submit(std::move(work));
    // work is now owned by the executor. cancellation reaches it
    // through stop_token. shutdown drains the executor.
}
The owned work item makes the design questions visible: what data survives the request boundary, who can cancel it, and where does it end up during shutdown.
Startup should either produce a running service or fail cleanly
Many service incidents start before the first request. Configuration is partially loaded. One subsystem is healthy, another is not. Threads start before health state exists. Background timers begin before dependencies are validated. The process reports “ready” because some constructor returned.
The correct startup question is not whether each individual component can be initialized. It is whether the process can reach a coherent running state. Startup should therefore be staged around dependency validation and explicit failure boundaries.
Useful startup order usually looks like this:
- Load and validate immutable configuration.
- Construct resource-owning adapters with explicit limits.
- Verify downstream dependencies needed for readiness.
- Start listeners and background work only after the process is coherent.
- Publish readiness only after the previous steps succeed.
In C++23, std::expected is often a better fit than exceptions for this path because startup naturally accumulates infrastructure failures that need translation into stable operational categories. A service that fails because the config file is malformed, the port is unavailable, or the storage schema is incompatible should expose those as deliberate startup failures, not as whatever exception text happened to leak through an implementation detail.
The tradeoff is verbosity. std::expected asks you to write out translation points explicitly. That cost is usually worth paying in service startup, where hidden exception paths make process state harder to reason about. Inside leaf functions or internal helpers, exceptions may still be acceptable if the boundary that contains them is clear. What matters is that startup exposes one coherent contract to the top level.
Request handling should treat borrowed data as short-lived
Small services often fail by turning temporary request data into longer-lived internal state. Headers become std::string_view members in async jobs. Parsed payload views are retained in caches. Callbacks capture references to objects whose lifetime ended with the request. The service works until a slow path, retry path, or queue delay makes the mistake observable.
The rule is simple: borrowed views are excellent for synchronous inspection and terrible as implicit storage. Use std::string_view, std::span, and range views aggressively inside a request path when the lifetime is local and obvious. Convert to owning representations before the data crosses time, threads, queues, or retry boundaries.
That decision is one of the main reasons service code benefits from explicit request models. Parse and validate into a value type that owns what background work must retain. Keep that model small enough that copying it is a conscious cost, then move it into async work when the design requires time decoupling.
The handler functions in the example project (see examples/web-api/src/modules/handlers.cppm) follow this principle. Each handler factory returns a std::function<Response(const Request&)> – a value type that captures a reference to the repository and nothing else. The handler receives the request by const reference, extracts what it needs (a path parameter, a parsed body), performs the operation, and returns a response value. No request state leaks beyond the function call:
// examples/web-api/src/modules/handlers.cppm — value-type handler design
[[nodiscard]] inline http::Handler
get_task(TaskRepository& repo) {
    return [&repo](const http::Request& req) -> http::Response {
        auto segment = req.path_param_after("/tasks/");
        if (!segment) {
            return http::Response::error(400, R"({"error":"missing task id"})");
        }
        auto id_result = parse_task_id(*segment);
        if (!id_result) {
            return http::Response::error(
                id_result.error().http_status(), id_result.error().to_json());
        }
        auto task = repo.find_by_id(*id_result);
        if (!task) {
            return http::Response::error(404,
                std::format(R"({{"error":"task {} not found"}})", *id_result));
        }
        return http::Response::ok(task->to_json());
    };
}
The borrowed string_view from path_param_after never outlives the synchronous handler call. The Task returned from the repository is a value copy. The response is constructed and returned by value. There is no lifetime ambiguity.
This is where many C++ service codebases overuse std::shared_ptr<request_context>. Shared ownership looks like a convenient escape hatch for async lifetimes, but it often hides the actual design choice: which parts of the request need to survive, who owns them, and when they may be discarded. In a small service, it is usually better to extract an owned work item and move it into the queue than to extend the lifetime of an entire request graph.
Concurrency should be bounded, owned, and cancelable
The service concurrency model matters more than the individual primitive names. A small service rarely needs a large custom scheduler. It does need three things.
First, the amount of concurrent work must be bounded. If overload can translate directly into unbounded queue growth, you have not designed a service; you have delayed an outage. Bounded executors, semaphores, admission control, and per-request time budgets are more valuable than clever thread-pool internals.
Second, work must be owned. Detached threads and fire-and-forget tasks are attractive because they make local code short. They also destroy shutdown semantics. If the service can enqueue work, the service should know when that work starts, when it finishes, and how cancellation reaches it.
Third, cancellation must be part of the normal model rather than an afterthought. std::jthread and std::stop_token help here because they make stop propagation part of the type-level contract. They do not solve everything. You still need work units that check the token at sensible boundaries and storage or network operations that map cancellation into consistent errors. But they force the question into the code instead of leaving it in comments.
Anti-pattern: blocking the event loop
One of the most common service failures is performing synchronous blocking work on a thread that should be driving I/O or dispatching requests. The service appears healthy under light load, then collapses under traffic because the event loop is stuck in a database call, a DNS resolution, or a file read.
// BAD: synchronous blocking on the listener thread
void on_request(http_request& req) {
    auto record = db_.query_sync(req.key());    // blocks for 5-200ms
    auto enriched = enrich(record);             // CPU work, fine
    auto blob = fs::read_file(enriched.path()); // blocks again
    req.respond(200, serialize(blob));
}
// Under 50 concurrent requests, the listener thread is blocked
// for the entire duration of each request. Tail latency explodes.
// New connections queue at the OS level with no backpressure signal.
The fix is to dispatch blocking work to a bounded executor and keep the listener thread non-blocking.
// BETTER: dispatch blocking work off the listener thread
void on_request(http_request& req) {
    auto work = request_work{req.key(), req.take_response_sink()};
    if (!executor_.try_submit(std::move(work))) {
        req.respond(503, "overloaded"); // explicit rejection
        metrics_.increment("request.rejected.overload");
    }
    // listener thread returns immediately, ready for next connection
}

// In the executor's worker threads:
void process(request_work work) {
    auto record = db_.query_sync(work.key);
    auto enriched = enrich(record);
    auto blob = fs::read_file(enriched.path());
    work.sink.respond(200, serialize(blob));
}
Anti-pattern: no graceful shutdown
Services that lack explicit shutdown logic produce use-after-free bugs, partial writes, orphaned connections, and hung processes that must be SIGKILL-ed by the orchestrator. The failure is rarely visible in development because the process exits quickly. In production, in-flight work and background timers create real races.
// BAD: shutdown by destruction order and hope
class service {
    http_listener listener_;
    std::vector<std::jthread> workers_;
    database_pool db_;
public:
    ~service() {
        // db_ destructor closes connections (maybe)
        // workers_ destructors request stop and join (maybe)
        // listener_ destructor closes the socket (maybe)
        // destruction runs in reverse declaration order, so db_ is
        // destroyed BEFORE workers_ finish joining — any worker still
        // holding a connection now touches a dead pool: use-after-free
    }
};
The fix is to make shutdown an explicit, ordered operation that drains work before destroying resources.
// BETTER: explicit drain-then-destroy shutdown
class service {
    database_pool db_;          // destroyed last
    http_listener listener_;
    bounded_executor executor_; // owns worker threads
    std::atomic<bool> stopping_{false};
public:
    void shutdown() noexcept {
        stopping_.store(true, std::memory_order_relaxed);
        listener_.stop_accepting();               // 1. stop new work
        executor_.drain(std::chrono::seconds{5}); // 2. finish in-flight
        db_.close();                              // 3. release deps
        metrics_.flush();                         // 4. final telemetry
    }
    // destructor now only releases already-drained resources
};
The key insight is that destruction order is a language mechanism, not a shutdown policy. The two must be designed together, and explicit drain logic should precede any resource teardown that in-flight work might depend on.
The example project demonstrates a clean version of this pattern. In examples/web-api/src/main.cpp, a signal handler sets an atomic flag; Server::run_until() (in examples/web-api/src/modules/http.cppm) polls that flag and forwards the stop request to a std::jthread:
// examples/web-api/src/main.cpp — signal handler
namespace {
std::atomic<bool> shutdown_requested{false};

extern "C" void signal_handler(int) {
    shutdown_requested.store(true, std::memory_order_release);
}
}

// examples/web-api/src/modules/http.cppm — Server::run_until()
void run_until(const std::atomic<bool>& should_stop) {
    std::jthread server_thread{[this](std::stop_token st) {
        run(st); // checks st.stop_requested() each accept loop iteration
    }};
    while (!should_stop.load(std::memory_order_acquire)) {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    server_thread.request_stop();
    // jthread auto-joins on destruction
}
The flow is: SIGINT/SIGTERM sets the flag, run_until forwards the stop to the jthread’s stop_source, the accept loop exits, the jthread joins, run_until returns, and main proceeds to destroy resources in reverse declaration order. Every step is explicit, ordered, and cooperative – no detached threads, no SIGKILL required.
Coroutines can improve structure if the service already benefits from asynchronous composition, especially around I/O-heavy request paths. They are a bad bargain when used only to avoid writing callbacks while the lifetime model remains vague. If a coroutine frame captures borrowed request data, executor references, and cancellation state without a clear owner, you have compressed the bug, not removed it. Use coroutines when they simplify a design whose ownership model is already sound.
Backpressure is a product decision, not a queue detail
In a small service, backpressure is where local technical choices become user-visible policy. When the system is saturated, what happens? Do requests block, fail fast, shed optional work, degrade to stale data, or time out after bounded waiting? If the answer is “the queue grows,” the service is still missing an operational decision.
Modern C++ helps you implement these decisions, but it does not choose them. std::expected can represent overload as a stable error. Value-typed work items make queue costs visible. std::chrono-based deadlines can be threaded through the call graph explicitly. Structured cancellation lets a request abandon subwork once the caller no longer benefits from it. None of these replace the need to decide the overload behavior.
For small services, the usual recommendation is to prefer explicit rejection over silent latency inflation. A bounded queue with clear rejection metrics is easier to operate than a “helpful” queue that absorbs bursts until memory and tail latency become somebody else’s incident. The tradeoff is harsher user-visible failure under load. That is still usually the correct tradeoff because it preserves system shape and makes capacity problems measurable.
The middleware pipeline as cross-cutting composition
The example project’s middleware system (in examples/web-api/src/modules/middleware.cppm) demonstrates how cross-cutting concerns compose without coupling to each other or to handler logic. A Middleware is a std::function that takes a request and the next handler, and returns a response. middleware::chain() folds a range of middlewares around a base handler:
// examples/web-api/src/modules/middleware.cppm
template <std::ranges::input_range R>
    requires std::same_as<std::ranges::range_value_t<R>, Middleware>
[[nodiscard]] http::Handler
chain(R&& middlewares, http::Handler base) {
    http::Handler current = std::move(base);
    for (auto it = std::ranges::rbegin(middlewares);
         it != std::ranges::rend(middlewares); ++it)
    {
        current = apply(*it, std::move(current));
    }
    return current;
}
In main.cpp, the pipeline is assembled declaratively:
std::vector<webapi::middleware::Middleware> pipeline{
    webapi::middleware::request_logger(),
    webapi::middleware::require_json(),
};
auto handler = webapi::middleware::chain(pipeline, router.to_handler());
Each middleware is independent, testable in isolation, and composed by value. Adding a new cross-cutting concern – rate limiting, authentication, trace propagation – means appending to the vector, not modifying handler signatures. This is the functional decorator pattern applied to HTTP processing, and it keeps the service shape flat even as operational concerns grow.
Keep dependency boundaries narrow and translating
The code inside a small service often depends on databases, RPC clients, filesystems, clocks, and telemetry vendors. The mistake is either to abstract all of them immediately or to let vendor types flow through the entire codebase. Both approaches age badly.
Narrow boundary adapters are the practical middle ground. The service layer should depend on contracts expressed in the language of the service: persist this record, fetch this snapshot, emit this metric, publish this event. The adapter translates to and from the external API, error model, and allocation behavior.
This gives the service a place to normalize timeouts, classify failures, add observability fields, and control allocation or copying decisions. It also stops transport-specific details from leaking into application logic. A handler should receive a domain-relevant failure class that the service can act on consistently.
Do not overgeneralize these interfaces. A small service usually needs thin ports, not enterprise-wide universes of abstractions. The adapter exists to preserve ownership and failure boundaries, not to simulate a platform team.
Observability should follow the service shape
If the service root, request model, and concurrency model are explicit, observability gets easier. Request identifiers, queue depth, active work count, dependency latency, cancellation counts, startup failures, and shutdown duration all map naturally to named boundaries. If the codebase is built from hidden globals and detached work, telemetry also becomes vague because no one can say where work begins or ends.
A small service should usually expose at least these signals.
- Startup success or failure by category.
- Request rate, latency histogram, and failure categories.
- Queue depth and rejection count for bounded work.
- Downstream dependency latency and timeout counts.
- Shutdown duration and count of canceled in-flight operations.
Anything beyond that should be justified by an operating question, not by fear. The goal is not maximum telemetry volume. The goal is fast diagnosis when the service is saturated, misconfigured, or stuck in shutdown.
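A fragment of what tying those signals to named boundaries can look like; the counter names and the `admit` function are illustrative assumptions, not a prescribed API. The point is that the rejection counter increments at exactly one place, the admission boundary, so its meaning is unambiguous.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Illustrative counter set mapping to the signal list above.
struct service_metrics {
    std::atomic<std::uint64_t> requests_total{0};
    std::atomic<std::uint64_t> queue_rejections{0};
    std::atomic<std::uint64_t> dependency_timeouts{0};
    std::atomic<std::uint64_t> canceled_in_flight{0};
};

// Admission is a named boundary, so queue_rejections has an exact meaning:
// work refused because the bounded queue was full.
bool admit(service_metrics& m, std::size_t queue_depth, std::size_t bound) {
    m.requests_total.fetch_add(1, std::memory_order_relaxed);
    if (queue_depth >= bound) {
        m.queue_rejections.fetch_add(1, std::memory_order_relaxed);
        return false;
    }
    return true;
}
```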
Verification should target lifecycle, not just behavior
The most valuable tests for a small service are rarely “returns 200 for valid input.” They are tests that prove lifecycle behavior under pressure: invalid configuration blocks readiness, overload produces explicit rejection, canceled work does not commit partial state, shutdown drains owned tasks without use-after-free risk, and dependency failures remain classified correctly.
That test mix usually includes focused unit tests around adapters and translation, integration tests around startup and shutdown stories, sanitizer-backed runs for memory and concurrency hazards, and observability assertions where operational contracts matter. For example, if overload rejection is the chosen policy, the service should expose a metric or structured event that proves the policy is happening in the field.
Notice what this chapter is not repeating. It is not re-explaining test taxonomies, sanitizers, or telemetry pipelines. Those were earlier chapters. The synthesis point here is that service shape determines whether those tools can produce useful evidence.
Where this shape stops being enough
The recommendations in this chapter are for a genuinely small service: one process, a modest number of long-lived dependencies, bounded background work, and a codebase where one team can still hold the whole runtime model in its head. Beyond that size, you may need more explicit subsystem ownership, stronger component isolation, service-level admission control, or a dedicated async framework whose lifecycle model is already opinionated.
You should also choose a different shape when the domain is dominated by one constraint this chapter only touches lightly: extreme low-latency trading, hard real-time behavior, plugin-hosting with hostile extensions, or public network servers with specialized protocol stacks. The same principles still apply, but the engineering center of gravity shifts.
Takeaways
A good small C++23 service is built around owned resources, explicit startup and shutdown, bounded concurrency, short-lived borrowing, narrow dependency adapters, and observability tied to real lifecycle boundaries. The code should make it obvious what the process owns, how work is admitted, how cancellation propagates, and what state remains valid during failure.
The tradeoffs are deliberate. Explicit boundaries add boilerplate. Bounded queues reject work sooner. Value-typed work items may copy more than view-heavy designs. Narrow adapters add translation code. Those costs are usually cheap compared with debugging a service whose lifetime and overload behavior are implicit.
Review questions:
- What is the single ownership root for long-lived service resources?
- Which request data crosses time or thread boundaries, and does it become owned before it does?
- Where is concurrent work bounded, and what explicit overload policy follows from that bound?
- How does cancellation reach in-flight work, and what state is guaranteed after shutdown completes?
- Which dependency failures are translated into stable service-level categories rather than leaked as vendor detail?
Building a Reusable Library in Modern C++
Application code can get away with local convenience for a surprisingly long time. Library code cannot. Once an API escapes its original caller set, every vague ownership contract, every overloaded error channel, every accidental allocation, and every unstable type dependency becomes somebody else’s problem. The library may still compile. It becomes expensive to trust.
The production question for this chapter is therefore not “how do I write a nice API?” It is “what does a reusable modern C++23 library have to make explicit so that other teams can adopt it without inheriting hidden coupling, unstable behavior, or unverifiable claims?” The answer starts with scope. A good reusable library makes a narrow promise, expresses that promise in types and contracts that survive real use, and refuses to leak its implementation economics into the whole dependency graph.
The sample shape here is a parsing and transformation library used by several services and command-line tools. That is a useful domain because it brings together most of the hard pressures: input boundaries, allocation behavior, diagnostics, performance expectations, extensibility, and packaging.
Start by choosing a narrow promise
Many bad libraries fail before the first line of public API. They are designed as “the shared place for everything related to X.” That usually means they accumulate multiple responsibilities, grow extension points for unrelated concerns, and become difficult to version because every change touches somebody.
A reusable library should instead make one narrow promise. Parse and validate this format. Normalize these records. Expose this storage abstraction. Compute these derived values. The promise can be substantial. It should still have a clear center.
This sounds obvious, but it changes interface design immediately. A narrow promise leads to a small vocabulary of public types, a small set of failure categories, and a smaller number of places where callers need customization. A vague promise pushes complexity outward into templates, callback forests, configuration maps, and documentation that reads like a negotiated truce.
The most important library design decision is therefore not whether to use modules, concepts, or std::expected. It is where the public contract stops.
Public types should encode ownership and invariants directly
Callers should not need repository context to answer basic questions about ownership, lifetime, mutability, or validity. If the library returns a view, the caller should know what owns the underlying data. If it accepts a callback, the callback lifetime and threading expectations should be obvious. If a configuration object may be invalid, the invalid state should be representable only long enough to validate it.
Value types are often the right center of gravity for library APIs because they travel well across teams and tests. They make copying cost visible, move semantics deliberate, and invariants attachable to construction or validation boundaries. Borrowed inputs such as std::string_view and std::span are still excellent for call boundaries, but they should be used only when the library can complete its work within the borrowed lifetime or copy what it must retain.
Intentional partial: a caller-facing API with explicit contracts
enum class parse_error {
invalid_syntax,
unsupported_version,
duplicate_key,
resource_limit_exceeded,
};
struct parse_options {
std::size_t max_document_bytes = 1 << 20;
bool allow_comments = false;
};
struct document {
std::pmr::vector<entry> entries;
};
[[nodiscard]] auto parse_document(std::string_view input,
parse_options const& options,
std::pmr::memory_resource& memory)
-> std::expected<document, parse_error>;
This excerpt does a few useful things. It separates caller-controlled policy from input bytes. It makes allocation strategy visible without forcing a global allocator policy. It returns a domain error rather than a transport-specific or parser-internal type. It also makes result checking harder to forget through [[nodiscard]] and std::expected.
The tradeoff is that the function signature is less minimal than document parse(std::string_view). That is fine. Reusable libraries are not rewarded for looking compact in slides. They are rewarded for making costs and contracts legible.
Make failure shape stable
Application code can sometimes afford to let exceptions and internal error categories drift because one team controls both sides. Libraries should be much stricter. The caller must know which failures are part of the contract, which are programming errors, and which remain implementation accidents.
That usually leads to one of three designs.
- Use std::expected or a similar result type for routine domain failures the caller is expected to handle.
- Reserve exceptions for invariant violations, exceptional environment failures, or APIs whose ecosystem already assumes exception-based use.
- Translate low-level errors into a stable public error vocabulary at the boundary.
The right choice depends on the domain. Parsing, validation, and recoverable business-rule failures usually fit std::expected well. Low-level infrastructure libraries integrated into exception-based application frameworks may reasonably use exceptions. What matters most is consistency. A library that returns std::expected for some recoverable failures, throws for others, and leaks std::system_error from one backend but not another is forcing callers to reverse-engineer policy from implementation.
Do not make the public error surface too granular. Callers need distinctions they can act on. They rarely benefit from twenty parser-internal states that only the library can interpret. Keep the stable categories small, then provide optional richer diagnostics through separate channels.
Separate mechanism from policy without abstracting everything
Reusable libraries often need some customization: allocation, logging hooks, host-owned diagnostics, clock sources, I/O adapters, or user-defined handlers. This is where teams either over-template the entire surface or bury the library inside runtime-polymorphic interfaces that allocate and dispatch everywhere.
The better approach is to choose a small number of explicit policy seams and keep the core mechanism concrete. Concepts help when the customization must stay zero-overhead and compile-time checked. Type erasure or callback interfaces help when binary boundaries, plugin models, or operational decoupling matter more than template transparency.
For example, an internal parsing engine probably does not need to be a giant policy-based template on logging, allocation, diagnostics, and error formatting simultaneously. It can parse concretely, accept a std::pmr::memory_resource&, and optionally emit diagnostics through a narrow sink interface. That keeps most call sites simple while still allowing hosts to control the expensive or environment-specific parts.
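One way that seam can look in miniature: a concrete core that takes the caller's std::pmr::memory_resource and an optional diagnostic sink. All names here are illustrative, and because the input is borrowed, the returned views are valid only while the caller keeps the input alive.

```cpp
#include <cstddef>
#include <functional>
#include <memory_resource>
#include <string_view>
#include <vector>

// Hypothetical diagnostic event and sink seam; the core stays concrete.
struct diagnostic { std::string_view message; std::size_t offset; };
using diagnostic_sink = std::function<void(diagnostic const&)>;

// Concrete mechanism: split on spaces, allocating through the caller's
// resource and reporting oddities through the optional sink.
std::pmr::vector<std::string_view>
tokenize(std::string_view input,
         std::pmr::memory_resource& mem,
         diagnostic_sink const& sink = {}) {
    std::pmr::vector<std::string_view> out{&mem};
    std::size_t i = 0;
    while (i < input.size()) {
        if (input[i] == ' ') { ++i; continue; }
        std::size_t start = i;
        while (i < input.size() && input[i] != ' ') ++i;
        auto tok = input.substr(start, i - start);
        if (tok.size() > 8 && sink)
            sink({"token unusually long", start});
        out.push_back(tok);
    }
    return out;
}
```

The hosts that care can pass a monotonic buffer and a sink that forwards to their own telemetry; everyone else calls it with two arguments and never sees either seam.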
This is also where library authors need discipline about dependencies. If the public headers include a networking stack, formatting library, metrics SDK, and filesystem abstraction just to support optional features, the library has already lost portability and build hygiene. Optional operational concerns belong behind narrow seams or in companion adapters, not in the center of the API.
Mistake: exposing internal types in public headers
One of the most common library design failures is leaking implementation types into the public API surface. This creates hidden coupling: callers transitively depend on headers they never asked for, build times grow, and internal refactors become breaking changes.
// BAD: public header pulls in implementation details
#pragma once
#include <boost/asio/io_context.hpp> // transport detail
#include <spdlog/spdlog.h> // logging detail
#include "internal/parser_state_machine.h" // implementation detail
class document_parser {
public:
document_parser(boost::asio::io_context& io,
std::shared_ptr<spdlog::logger> log);
auto parse(std::string_view input) -> document;
private:
boost::asio::io_context& io_; // caller now depends on Boost.Asio
std::shared_ptr<spdlog::logger> log_; // caller now depends on spdlog
internal::parser_state_machine fsm_; // caller now depends on internal layout
};
// Every caller's translation unit now includes Boost.Asio and spdlog headers.
// Changing the logging library is a breaking change for all consumers.
The fix is to keep the public header minimal and push implementation types behind forward declarations, PIMPL, or narrow callback interfaces.
// BETTER: public header exposes only the library's own vocabulary
#pragma once
#include <cstddef>
#include <expected>
#include <functional>
#include <memory>
#include <string_view>
namespace mylib {
enum class parse_error { invalid_syntax, resource_limit_exceeded };
struct document; // the library's own value type; the real header includes its full definition
struct diagnostic_event {
std::string_view message;
std::size_t line;
};
using diagnostic_sink = std::function<void(diagnostic_event const&)>;
class document_parser {
public:
struct options {
std::size_t max_bytes = 1 << 20;
diagnostic_sink on_diagnostic = {}; // optional, no spdlog dependency
};
explicit document_parser(options opts = {});
~document_parser();
document_parser(document_parser&&) noexcept;
document_parser& operator=(document_parser&&) noexcept;
[[nodiscard]] auto parse(std::string_view input)
-> std::expected<document, parse_error>;
private:
struct impl;
std::unique_ptr<impl> impl_; // Boost, spdlog, FSM all hidden here
};
} // namespace mylib
// Callers include only standard headers. Internal deps are invisible.
// Changing from spdlog to another logger requires zero caller changes.
Mistake: poor error reporting
Libraries that report errors as raw integers, bare std::string messages, or platform-specific exception types force callers to reverse-engineer failure semantics from implementation details. The result is fragile error handling that breaks whenever the library changes its internals.
// BAD: error reporting through mixed, unstable channels
auto parse(std::string_view input) -> document {
if (input.empty())
throw std::runtime_error("empty input"); // string-based
if (input.size() > max_size)
return {}; // default-constructed "null" document — is this an error?
if (!validate_header(input))
throw parser_exception(ERR_INVALID_HEADER); // internal enum leaked
// caller must catch two exception types AND check for empty documents
// ... success path elided
}
// BETTER: single, stable error channel with actionable categories
[[nodiscard]] auto parse(std::string_view input)
-> std::expected<document, parse_error>
{
if (input.empty())
return std::unexpected(parse_error::invalid_syntax);
if (input.size() > max_size)
return std::unexpected(parse_error::resource_limit_exceeded);
if (!validate_header(input))
return std::unexpected(parse_error::invalid_syntax);
// one return type, one error vocabulary, no exceptions for routine failures
// ... parse and return the document
}
For richer diagnostics beyond the category, provide a separate channel (a diagnostic sink, an error details accessor, or a structured log) rather than overloading the primary error type with implementation-specific fields that callers cannot act on programmatically.
Versioning and ABI need a policy, not optimism
Even an internal shared library benefits from treating versioning as part of design rather than release paperwork. The practical question is what kinds of change the library promises callers they can survive. Source compatibility, ABI compatibility, wire-format stability, serialized-data stability, and semantic compatibility are related but different promises.
For many C++ libraries, the easiest honest policy is source compatibility within a major version and no blanket ABI promise across arbitrary toolchains. That is often a stronger real-world posture than pretending ABI stability while exposing standard-library types, inline-heavy templates, or platform-dependent layout in the public surface.
If ABI stability does matter, the design must change accordingly. That usually means narrower exported surfaces, opaque types, PIMPL-like boundaries, stricter exception policy, reduced template exposure, and controlled compiler and standard-library assumptions. Those are not finishing touches. They affect the entire API shape.
Mistake: breaking ABI with inline changes
A change that appears safe at the source level can break binary compatibility silently. Adding a member to a class, changing a default parameter value in an inline function, or reordering fields all change the ABI without any compiler diagnostic.
// v1.0 — shipped as shared library
struct document {
std::pmr::vector<entry> entries;
// sizeof(document) == N, known to callers at compile time
};
// v1.1 — "just added a field"
struct document {
std::pmr::vector<entry> entries;
std::optional<metadata> meta; // sizeof(document) changed
// callers compiled against v1.0 still assume size N
// stack allocations, memcpy, placement new — all wrong
};
The fix for ABI-stable libraries is to hide layout behind a PIMPL boundary, so that callers never depend on sizeof or field offsets.
// ABI-stable public header
class document {
public:
document();
~document();
document(document&&) noexcept;
document& operator=(document&&) noexcept;
[[nodiscard]] auto entries() const -> std::span<entry const>;
[[nodiscard]] auto metadata() const -> std::optional<metadata_view>;
private:
struct impl;
std::unique_ptr<impl> impl_;
};
// In the .cpp file (not visible to callers):
struct document::impl {
std::pmr::vector<entry> entries;
std::optional<metadata> meta;
// add fields freely — callers see only the pointer
};
PIMPL adds one heap allocation per object and one indirection per access. For types created infrequently (documents, connections, sessions), this is nearly always an acceptable cost. For types created millions of times per second in a hot loop, it is not, and ABI stability for those types should be reconsidered.
Versioning pattern: inline namespaces for source versioning
When the library must support multiple API versions simultaneously (for example, during a migration period), inline namespaces let you version the symbols without forcing callers to change their code.
namespace mylib {
inline namespace v2 {
struct document { /* v2 layout */ };
auto parse_document(std::string_view input) -> std::expected<document, parse_error>;
}
namespace v1 {
struct document { /* v1 layout, kept for compatibility */ };
auto parse_document(std::string_view input) -> std::expected<document, parse_error>;
}
}
// Callers using `mylib::document` get v2 by default.
// Callers that need v1 use `mylib::v1::document` explicitly.
// Linker symbols are distinct, so v1 and v2 can coexist in one binary.
Modules improve build structure and distribution hygiene, but they do not erase ABI reality. Likewise, concepts improve diagnostics and constraints, but they do not automatically make a heavily templated library versionable. Treat these tools as local improvements, not policy substitutes.
Documentation should answer integration questions
Library documentation for experienced programmers should focus on adoption decisions, not feature advertising. A caller evaluating a reusable library needs to know:
- What problem the library owns and what it refuses to own.
- Which inputs are borrowed and which outputs own storage.
- Which failures are routine and how they are reported.
- Which allocations, copies, or background work are part of normal use.
- Which thread-safety guarantees exist, if any.
- Which versioning and compatibility promises are real.
Short, production-looking examples help here, especially when they show the normal error path and configuration boundary. Huge tutorial walkthroughs usually do not. The purpose of the docs is to let another team integrate the library without learning internal folklore.
Performance claims deserve the same restraint. Do not say the library is fast. Say what was measured, under what workload, against what baseline, and where the cost model is sensitive. Parsing libraries often have costs that depend heavily on allocation strategy, input size distribution, and failure rate. Say that directly.
Verification should match the library’s public promises
A reusable library needs stronger verification discipline than a leaf application component because callers cannot inspect all of your assumptions. Tests should therefore map directly to public promises.
If the library guarantees stable parsing behavior for a documented format version, keep fixture-based contract tests. If it guarantees that malformed input returns structured failures without undefined behavior, run fuzzing and sanitizers on the public parse entry points. If it claims bounded allocations under certain modes, benchmark or instrument that claim. If it exposes host-provided diagnostics, test that the sink receives stable event categories rather than implementation chatter.
This is also where compatibility checks belong. If source compatibility across minor versions is part of the promise, keep integration tests or sample clients that exercise old call patterns. If ABI matters, test produced artifacts in a way that actually checks symbol and layout expectations. “It still compiled on my machine” is not a compatibility policy.
Know when not to build a library yet
Teams often create shared libraries too early. If only one application uses the code, if the domain vocabulary is still moving weekly, or if the supposed reuse is mostly organizational aspiration, forcing a stable public surface can freeze bad assumptions into place. Sometimes the right answer is to keep the component internal until the contract stabilizes.
This is not an argument against reuse. It is an argument for charging the real cost of public APIs. Once other teams depend on the library, changing semantics, ownership, or error policy becomes much more expensive than changing internal code. Reuse is valuable when the problem and contract are mature enough to deserve that cost.
Takeaways
A good reusable C++23 library makes a narrow promise, encodes ownership and invariants directly in public types, keeps failure shape stable, exposes only a few deliberate customization seams, and states versioning and performance claims honestly. It should minimize dependency drag and make adoption decisions easy for experienced callers.
The tradeoffs are the usual ones but paid more explicitly than in applications. Richer signatures can look less elegant. Stable error categories require translation. ABI-aware design limits public template freedom. Narrow seams require discipline about what not to customize. Those are acceptable costs because a library is a long-lived contract, not just a pile of reusable code.
Review questions:
- What narrow promise does this library make, and what important responsibilities does it explicitly refuse?
- Which public types communicate ownership, lifetime, and invariants without hidden assumptions?
- Is the public error surface small, stable, and actionable for callers?
- Which customization seams are genuinely necessary, and which dependencies should be pushed behind adapters instead of into public headers?
- What compatibility promise is actually being made: source, ABI, serialized format, semantic behavior, or some combination?
Reviewer’s Checklist for Modern C++ Code
Most code review failures in C++ are not failures of intelligence. They are failures of review posture. The reviewer follows the diff mechanically, comments on naming or formatting, maybe spots one local bug, and misses the system question the change actually introduced: new ownership across threads, a hidden allocation in a hot path, an exception boundary leak, a silently widened API contract, a queue with no admission control, a sanitizer lane that can no longer exercise the risky path.
Modern C++ makes many expensive decisions visible, but only if reviewers ask the right questions. This closing chapter turns the rest of the book into a practical review routine. It is not a replacement for the appendix checklist, and it is not a style guide. It is the set of questions that should shape how an experienced reviewer reads a change in production C++.
The central idea is simple: review for failure economics before review for local elegance. A line that looks tidy can still expand lifetime, weaken invariants, or increase operational cost. The checklist below is organized around the places where those failures usually hide.
First pass: identify what kind of change this really is
Before reading line by line, classify the change. A reviewer who misclassifies the change asks the wrong questions.
Is this primarily:
- a new ownership or lifetime path?
- an API or contract change?
- a concurrency or cancellation change?
- a data-layout or performance-sensitive change?
- a tooling, verification, or build-pipeline change?
- a service-behavior change with operational consequences?
Many pull requests contain more than one category, but one usually dominates. Start there. If a change adds a background queue, it is not mainly a refactor. If a function now returns std::string_view, it is not merely a micro-optimization. If a library starts exposing a templated callback type in public headers, it is not only a convenience improvement. Review should center on the dominant risk first.
This classification step also tells you what evidence to expect. An API change should come with contract and compatibility reasoning. A concurrency change should come with cancellation and shutdown evidence. A performance claim should come with measurement. A tooling change should explain which bug class becomes easier or harder to catch.
Ownership and lifetime questions
Ownership review remains the highest-value pass in production C++ because lifetime bugs tend to look locally harmless. Ask these questions early.
Who owns each new resource, and where does that ownership end? If the answer depends on reading five files and inferring framework behavior, the design is already weak. Ownership should usually be obvious from types, object graph, and construction sites.
Did the change introduce borrowing across time? This includes storing std::string_view, std::span, iterators, references, or range views in state that may outlive the source object, request, frame, or container epoch. When the code crosses an async boundary, queue, callback, coroutine suspension, or detached thread, re-check every borrowed type aggressively.
Did the change replace clear ownership with shared ownership as a convenience? std::shared_ptr is sometimes the right tool. It is also a common way to postpone a design decision. Reviewers should ask what concrete lifetime problem shared ownership solves here and whether a moved value, owned work item, or explicit parent owner would make the design easier to reason about.
Were move and copy costs changed intentionally? Value semantics are powerful, but reviewers should still ask whether new copies are part of the contract or just accidental byproducts of interface design.
Good review comments in this area are concrete. “This looks risky” is weak. “This queue now stores std::string_view derived from request-local storage, so queued work can outlive the buffer” is strong.
What the reviewer should flag: dangling references
Dangling references are the single most productive category of bugs for reviewers to catch because they are invisible to the type system and often survive testing until a race or reallocation makes them observable.
// FLAG THIS: dangling reference from temporary
auto& config = get_configs()["database"];
// if get_configs() returns by value, the temporary map is destroyed
// at the semicolon. config is now a dangling reference.
// fix: auto config = get_configs()["database"]; (copy the value)
// FLAG THIS: reference into a vector that may reallocate
auto& first = items.front();
items.push_back(new_item); // may reallocate, invalidating first
use(first); // undefined behavior
// FLAG THIS: string_view outliving its source
std::string_view name = get_user().name();
// if get_user() returns by value, the std::string inside the
// temporary is destroyed. name now points to freed memory.
// fix: std::string name = std::string{get_user().name()};
// FLAG THIS: lambda capturing reference to local
auto make_callback(request& req) {
auto& headers = req.headers(); // reference to req's member
return [&headers]() { // captures by reference
log(headers); // dangling if req is destroyed
};
// fix: capture by value, or capture a copy of headers
}
The reviewer’s question is always the same: does the referenced object provably outlive every use of the reference? If the answer requires reasoning about framework scheduling, queue timing, or caller discipline, the code should copy instead.
What the reviewer should flag: missing noexcept on move operations
A move constructor or move assignment operator that is not marked noexcept silently degrades standard library container performance. std::vector will copy instead of move during reallocation if the move constructor can throw, because the strong exception guarantee requires it.
// FLAG THIS: move constructor without noexcept
class connection {
std::unique_ptr<socket> sock_;
std::string endpoint_;
public:
connection(connection&& other) // missing noexcept!
: sock_(std::move(other.sock_))
, endpoint_(std::move(other.endpoint_))
{}
// std::vector<connection> will COPY during reallocation
// instead of moving. For large vectors, this is a silent
// performance cliff. (A move-only type would still be moved,
// but reallocation then loses the strong exception guarantee.)
};
// CORRECT:
connection(connection&& other) noexcept
: sock_(std::move(other.sock_))
, endpoint_(std::move(other.endpoint_))
{}
// now vector::push_back uses move during reallocation
This also applies to move assignment. Both std::move_if_noexcept and container implementations check noexcept at compile time. If either move operation is potentially throwing, the fallback is always more expensive.
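This check can be mechanized. A minimal sketch (socket_t and the defaulted members are stand-ins for the example's type) of the static_assert guard a reviewer can ask for next to the type:

```cpp
#include <memory>
#include <string>
#include <type_traits>

struct socket_t {}; // stand-in for the example's socket type

class connection {
    std::unique_ptr<socket_t> sock_;
    std::string endpoint_;
public:
    connection() = default;
    connection(connection&&) noexcept = default;
    connection& operator=(connection&&) noexcept = default;
};

// If a later change makes either move operation potentially throwing,
// the build fails here instead of std::vector silently falling back
// to copies during reallocation.
static_assert(std::is_nothrow_move_constructible_v<connection>);
static_assert(std::is_nothrow_move_assignable_v<connection>);
```

The asserts live next to the type definition, so the person who adds a throwing member sees the failure in their own build, not in a distant container benchmark.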
What the reviewer should flag: exception-unsafe resource acquisition
Resource leaks hide in code that acquires multiple resources without RAII guards, especially when the acquisitions are interleaved with operations that can throw.
// FLAG THIS: raw acquire/release with throwing code between
void setup_pipeline(config const& cfg) {
auto* buf = allocate_buffer(cfg.buffer_size); // raw allocation
auto fd = open_file(cfg.path); // may throw
auto conn = connect_to_db(cfg.db_url); // may throw
register_pipeline(buf, fd, conn);
// if open_file throws, buf leaks
// if connect_to_db throws, buf leaks AND fd leaks
}
// CORRECT: RAII from the first acquisition
void setup_pipeline(config const& cfg) {
auto buf = std::unique_ptr<std::byte[]>(
allocate_buffer(cfg.buffer_size));
auto fd = owned_fd{open_file(cfg.path)}; // RAII wrapper
auto conn = connect_to_db(cfg.db_url); // already RAII (or should be)
register_pipeline(buf.release(), fd.release(), std::move(conn));
// every intermediate throw is safe — destructors clean up
}
The pattern to watch for is any gap between resource acquisition and RAII ownership. Even a single line of throwing code in that gap is a leak. One historical hazard has closed: before C++17, a new expression passed directly as a function argument could leak if another argument's evaluation threw first, but since C++17 argument evaluations can no longer interleave, so that particular leak is gone. std::make_unique remains preferable anyway, because it keeps allocation and ownership in a single visible expression.
Invariants and error boundary questions
The next pass is about invalid state and failure shape. Does the change make invalid state easier to create, easier to observe, or harder to recover from? Construction paths, configuration objects, partial initialization, and mutation APIs are common places where invariants become weaker silently.
Then ask how failure is reported. Are recoverable domain failures represented consistently, or did the change add a second error channel? Did a previously contained dependency error leak into a broader layer? Did a function marked [[nodiscard]] or returning std::expected gain call sites that ignore the result? If exceptions are involved, did the change widen the exception boundary accidentally?
Reviewers should also examine rollback and cleanup behavior, especially around resource-owning operations. If the new path fails halfway through, what remains true? Are partially written files removed, transactions canceled, temporary state discarded, background work stopped, and telemetry still emitted in the right category?
One useful discipline is to ask for the failure story in one sentence. “If dependency X times out after state Y is reserved, the request returns dependency_timeout, the reservation is released, and no background retry survives shutdown.” If the author cannot say that succinctly, the error boundary may not be clear enough yet.
Interface and library surface questions
Any public or widely shared interface deserves a separate review pass because local implementation quality does not compensate for a weak contract.
Did parameter and return types become more honest or less honest? Returning std::span<const std::byte> may communicate borrowing clearly; returning a reference to internal mutable state may hide coupling. Accepting std::string_view may be right for read-only parse calls and wrong for objects that retain the string. Review should focus on what the signature now promises about ownership, cost, and failure.
If templates, concepts, callbacks, or type erasure were added, why this form? Concepts can improve diagnostics and prevent nonsense instantiations, but they also expand compile-time surface. Type erasure can stabilize call sites, but may add allocation or indirect-call cost. New genericity should pay for itself.
For library changes, ask whether the public surface leaked implementation detail. Did a new header expose a transport type, allocator strategy, synchronization primitive, or error type that callers should not have to know? Did a seemingly harmless inline helper change the ABI or source compatibility story? Did the docs or examples change with the contract, or is the new behavior discoverable only by reading the diff?
Reviewing interfaces well means thinking like the next caller, not like the current author.
What the reviewer should flag: implicit conversion and narrowing in APIs
Public interfaces that silently accept wrong types or narrow values are a source of bugs that survive unit tests and only appear under production data.
// FLAG THIS: implicit conversion hides a bug
class rate_limiter {
public:
    rate_limiter(int max_requests, int window_seconds);
};
// caller writes:
rate_limiter limiter_ok(30, 60);  // OK: 30 requests per 60 seconds
rate_limiter limiter_bug(60, 30); // compiles fine, but the arguments
                                  // are swapped: 60 req per 30 s
// no type safety distinguishes max_requests from window_seconds

// BETTER: use distinct types or a builder
struct max_requests { int value; };
struct window_seconds { int value; };
rate_limiter(max_requests max, window_seconds window);
// rate_limiter limiter(window_seconds{60}, max_requests{30}); // compile error

// FLAG THIS: silent sign conversion at a call site
void set_buffer_size(std::size_t bytes);
int user_input = get_config_value("buffer_size"); // may be negative
set_buffer_size(user_input); // silent conversion: -1 becomes SIZE_MAX
// fix: validate before conversion, or use std::size_t throughout
Concurrency, time, and shutdown questions
Concurrency review is mostly lifecycle review with time added. The key question is not whether the code uses the right primitive names. It is whether work remains owned and bounded as it moves.
Ask whether the change introduced detached work, hidden threads, executor hops, or coroutine suspension points whose owner is not obvious. Ask how stop requests propagate. Ask whether deadlines and retries are explicit or hidden behind helper layers. Ask whether queue growth is bounded and what overload policy now applies.
If a change touches locks or shared state, review contention and invariants together. Does the lock protect the full invariant or only some fields? Did a callback get invoked while a lock is held? Did statistics or cache updates introduce a race that will be rationalized later as benign? ThreadSanitizer may catch some of this. Review should still try to eliminate the ambiguity before runtime.
Shutdown deserves its own question in almost every service or tool change: after this diff, what work can still be running when destruction begins, and how does it stop? If the answer is unclear, the review is not done.
Data layout and cost model questions
Many performance bugs enter code review disguised as harmless abstraction. Reviewers should therefore ask where cost moved, not just whether the code “looks efficient.”
Did the change add allocations on a hot path, widen objects that live in large containers, increase indirection, or turn a local value into heap-managed shared state? Did a range pipeline improve clarity while preserving lifetime, or did it create hidden iteration, temporary materialization, or dangling-view risk? Did a container choice change the memory and invalidation model in ways the author has not discussed?
The standard for review here is evidence proportional to the claim. If the change says it improves performance, ask for benchmark or profile data. If it says the extra allocation is negligible, ask under which workload. If the answer is “it probably doesn’t matter,” decide whether this code is actually in a place where probably is acceptable.
Not every change needs a benchmark. Performance-sensitive changes do need a cost model that survives basic questioning.
Verification and delivery questions
Good C++ review continues past the source diff. The final pass is about whether the repository still has a credible way to prove the change sound.
Which tests now cover the risky path? If the change adds a rollback branch, overload behavior, or host-library boundary, is there a test that exercises it deliberately? If the change affects memory, concurrency, or input handling, do sanitizer or fuzzing lanes still cover the path? If the build or CI configuration changed, did the diagnostic matrix become stronger or weaker?
Operational changes deserve observability review as well. If the service now rejects work earlier, can operators distinguish that from dependency failure? If a library added new diagnostics, are they stable enough for hosts to consume? If crash or symbol handling changed, can the shipped artifact still be diagnosed later?
This is also where reviewers should challenge missing evidence instead of reverse-engineering it themselves. Review is not unpaid archaeology. If a change requires a new test, benchmark, sanitizer run, or migration note, ask for it plainly.
How to write useful review comments
A good review comment identifies the risk, the violated or unclear contract, and the evidence needed to resolve it. It does not merely express taste.
Strong comments tend to look like this:
- “This callback captures `this` and is stored in work that can outlive shutdown. Who owns that lifetime after `request_stop()`?”
- “The public API now returns a `std::string_view` into parser-owned storage. Where is that storage guaranteed to outlive the caller’s use?”
- “This queue is bounded, but the overload behavior is still implicit. Do we reject, block, or drop optional work, and where is that surfaced operationally?”
- “The change claims a latency win. Which benchmark or profile run demonstrates that the new allocation pattern is better under realistic input sizes?”
Weak comments are vague, purely stylistic, or framed as personal preference when the issue is actually semantic.
Reviewers should also say when evidence is sufficient. If ownership is clear, tests hit the risky path, and the contract is improved, state that. Good review is not only about blocking changes. It is about making the reasoning around acceptance explicit.
When to block the change
Not every unresolved issue deserves a hard stop. Some do.
Block the change when ownership is unclear, when borrowed state may outlive its source, when the failure contract is inconsistent, when concurrency is unbounded or shutdown is undefined, when a public interface change lacks compatibility reasoning, when a performance claim lacks required evidence, or when the verification story no longer exercises the risky path.
Do not block merely because you would have written the code differently. Modern C++ already has enough accidental complexity. Review should not add another layer of taste-driven friction.
Takeaways
The most effective C++ reviewers read for ownership, failure shape, interface honesty, concurrency lifecycle, cost movement, and verification evidence before they read for local elegance. They classify the change first, ask questions tied to production risk, and insist on evidence where the code alone is not enough.
That posture is what turns the rest of this book into day-to-day engineering behavior. The point of precise ownership models, explicit failure boundaries, bounded concurrency, honest APIs, and diagnostic discipline is not that they look modern. It is that they let a reviewer explain why a change is safe, risky, or incomplete in concrete terms.
Review questions:
- What is the dominant production risk introduced by this change, and did the review focus there first?
- Which ownership, lifetime, or borrowing assumptions now cross time, threads, or API boundaries?
- How did the change alter failure reporting, rollback guarantees, or invariant preservation?
- What cost model changed, and what evidence supports any performance or efficiency claim?
- Which tests, sanitizer lanes, diagnostics, or operational signals now prove the risky behavior remains sound?
Appendix A: C++23 Feature Index by Engineering Use
This is not a changelog of everything added in C++23. It is the short list of language and library facilities that materially change design in a C++23 codebase. Use it to locate the right tool for an engineering pressure, then read the referenced chapters for the tradeoffs.
Make Ownership and Retention Obvious
| Engineering pressure | Primary tools | Use them when | Main risk | See |
|---|---|---|---|---|
| Exclusive ownership of a resource | std::unique_ptr, direct members, move-only RAII wrappers | One component clearly owns cleanup and transfer should be explicit | Hiding ownership behind raw pointers or shared ownership by convenience | Chapters 1, 4 |
| Borrowing without transfer | std::string_view, std::span | The callee inspects caller-owned contiguous data and does not retain it | Storing the borrow or carrying it across async boundaries | Chapters 4, 5 |
| Shared lifetime across tasks or subsystems | std::shared_ptr, std::weak_ptr | There is a real multi-owner lifetime that cannot be expressed more simply | Lost destruction timing, refcount traffic, and vague ownership | Chapters 1, 12, 13 |
Represent Absence, Failure, and Alternatives Precisely
| Engineering pressure | Primary tools | Use them when | Main risk | See |
|---|---|---|---|---|
| Ordinary absence | std::optional | Missing data is expected and not itself an error | Erasing why work failed when callers need diagnostics | Chapters 3, 5 |
| Recoverable failure at a boundary | std::expected | The caller must make a visible decision based on failure | Propagating expected so deep that local code becomes ceremony | Chapters 3, 5, 21, 22 |
| Closed set of valid alternatives | std::variant | Several states are all legitimate and must be handled explicitly | Using it where the state space is open-ended or unstable | Chapters 5, 10 |
Constrain Generic Code Without Lying About It
| Engineering pressure | Primary tools | Use them when | Main risk | See |
|---|---|---|---|---|
| Reject nonsense template use early | concepts, requires, named constraints | The template surface is public or heavily reused | Constraint sets that are harder to understand than the code they protect | Chapters 6, 22 |
| Adapt to ranges instead of container types | std::ranges, views, range algorithms | The algorithm works on a sequence abstraction rather than one owning container | Composing pipelines whose lifetime or traversal cost is no longer obvious | Chapters 7, 15 |
Express Work Over Time Deliberately
| Engineering pressure | Primary tools | Use them when | Main risk | See |
|---|---|---|---|---|
| Cooperative stop and thread ownership | std::jthread, std::stop_source, std::stop_token | Work must stop cleanly during shutdown, deadline expiry, or parent failure | Wiring tokens through code without defining what cancellation means | Chapters 14, 21 |
| Suspended async work with local clarity | coroutines, coroutine-based task types | Async control flow is complex enough that callback state machines are obscuring ownership and failure | Borrowed state outliving the caller or tasks with no clear owner | Chapters 13, 14 |
| Incremental pull-style production | generators | The consumer should pull a sequence lazily without materializing it all at once | Yielding references into storage whose lifetime is unstable | Chapter 7 |
Control Allocation and Data Movement
| Engineering pressure | Primary tools | Use them when | Main risk | See |
|---|---|---|---|---|
| Caller-controlled allocation strategy | std::pmr facilities | Allocation policy materially affects performance or integration | Exposing allocator policy where it is not part of the contract | Chapters 16, 22 |
| Stable contiguous ownership for hot paths | std::vector, std::array, flat contiguous representations | Locality and predictable traversal matter more than node stability | Overfitting container choice to one benchmark or one workload | Chapters 15, 16, 17 |
Move Work to Compile Time Carefully
| Engineering pressure | Primary tools | Use them when | Main risk | See |
|---|---|---|---|---|
| Remove runtime branching or validation cost | constexpr, consteval, compile-time tables | The payoff is real and the generated surface stays readable | Turning build times and diagnostics into the new bottleneck | Chapter 8 |
| Make contracts visible to the compiler | type traits, constrained templates, structural compile-time checks | Incorrect instantiations or illegal states should fail early | Producing metaprogramming that only its author can review | Chapters 6, 8 |
Keep the Index Honest
- Prefer the tool that makes the contract easiest to review, not the tool with the newest spelling.
- If a feature removes one local cost by creating hidden lifetime, shutdown, ABI, or measurement risk elsewhere, the feature is not helping.
- When in doubt, choose the representation that makes ownership, failure, and cost visible in the type system or at the boundary.
Appendix B: Toolchain Baseline
This appendix defines the reference build baseline behind the examples and recommendations in the manuscript. It is not a promise that every example behaves identically on every platform. It is the minimum environment the book assumes when it talks about C++23 support, warnings, sanitizers, and diagnosable release artifacts.
Reference Compilers
| Toolchain | Baseline | Notes |
|---|---|---|
| GCC | 14+ | Strong general C++23 coverage; good Linux sanitizer environment |
| Clang | 18+ | Reference environment for sanitizer-heavy investigation and many diagnostics |
| MSVC | 17.10+ in Visual Studio 2022 | Required Windows baseline; expect some sanitizer and library support differences |
The baseline is intentionally conservative. If a sample depends on behavior or library support that is weaker on one of these toolchains, the surrounding chapter should say so explicitly.
Default Build Expectations
All substantial examples should assume:
- `-std=c++23` or the equivalent vendor flag.
- Debug information in all developer and diagnostic builds.
- Assertions enabled in at least one fast local build configuration.
- Warning sets strong enough to catch narrowing, shadowing, ignored results, missing overrides, and suspicious conversions before review.
- Release artifacts that retain enough symbol and build identity information to support crash analysis later.
The point is not one universal flag block. The point is a stable diagnostic posture across supported compilers.
Required Build Modes
| Build mode | Question it answers | Typical characteristics |
|---|---|---|
| Fast developer build | Can I iterate on logic quickly? | Debug info, assertions, low optimization or none |
| ASan + UBSan build | Did execution hit memory or undefined-behavior bugs? | Debug info, frame pointers, moderate optimization |
| TSan build | Did concurrent execution hit a data race or lock-order bug? | Separate job, heavier overhead, focused workload |
| Static-analysis build | Does the code trip known defect patterns before runtime? | Serious warnings plus analyzer or lint passes |
| Release-with-symbols build | Will the shipped binary still be diagnosable? | Production optimization, external symbols or symbol server, build IDs |
Trying to collapse these into one magic build usually weakens signal. Sanitizers distort timing. ThreadSanitizer is too heavy for every edit-build-run cycle. Release verification needs the same optimization and packaging shape the shipped artifact will use.
Warning Policy
Treat warnings as a repository policy surface, not as a developer preference.
- Enable a serious warning baseline on every supported compiler.
- Treat warnings as errors for owned code once the warning baseline is under control.
- Keep suppressions local and justified.
- Isolate noisy third-party or generated code rather than muting diagnostics across the whole project.
The manuscript does not prescribe an identical flag list for every environment because vendors differ. It does prescribe the review posture: a warning should either identify a real risk or be turned off on purpose, not be ignored as background noise.
Sanitizer Baseline
Use sanitizers as named verification lanes, not as occasional rescue tools.
- AddressSanitizer is the default memory-safety lane.
- UndefinedBehaviorSanitizer runs with ASan where supported and where the signal is useful.
- ThreadSanitizer runs separately on code that genuinely exercises shared-state and shutdown paths.
- Fuzzing, stress tests, parser tests, and cancellation tests should preferentially run under sanitizer builds when those paths are high value.
Clang on Linux is the reference environment for the strongest sanitizer coverage. That is not a reason to ignore Windows or MSVC; it is a reason to avoid pretending all sanitizer behavior is equally mature everywhere.
Release Diagnostics Baseline
Release artifacts should preserve enough information to answer production questions later.
- Produce symbol files or an equivalent symbolization path.
- Attach a build ID or other exact binary identity.
- Record source revision or package version in build metadata.
- Keep stack unwinding usable in the shipped configuration.
- Preserve the commands or presets needed to reproduce the release build shape.
If a crash in production cannot be mapped back to an exact binary and symbols, the toolchain policy is not complete.
Version Policy Notes
- The manuscript targets C++23 as the default working language.
- C++26 material belongs only where it changes a present-day design decision.
- If a feature is technically standardized but not yet dependable across the baseline toolchains, treat it as provisional and say so near the example.
The book is not trying to win a standards-timeline argument. It is trying to describe what an experienced engineer can rely on in a real codebase.
Appendix C: Code Review Checklists
This appendix is the short form of the review posture developed across the book and expanded in Chapter 23. Use it as a compact pass over a diff, not as a substitute for thinking about the dominant risk in the change.
Ownership and Lifetime
- What new resources exist after this change, and who owns each one?
- Did the change introduce stored std::string_view, std::span, iterators, references, or views whose source may disappear first?
- Did a borrow start crossing time, threads, queues, callbacks, or coroutine suspension points?
- Is std::shared_ptr solving a real multi-owner lifetime, or compensating for the absence of a named owner?
- If move or copy behavior changed, is that part of the contract or an accidental cost shift?
Invariants and Failure Boundaries
- What invariant must remain true before and after the new code runs?
- If the operation fails halfway through, what cleanup or rollback is guaranteed?
- Does the change keep the existing error model, or did it add a second error channel?
- Are std::expected, [[nodiscard]], and status-bearing return values still checked consistently?
- Were low-level errors translated near the boundary, or did vendor detail leak upward?
Interfaces and Library Surfaces
- Do parameter and return types now communicate ownership, retention, mutability, and cost more honestly or less honestly?
- Did the public surface widen in a way that commits callers to templates, callbacks, allocators, synchronization, or vendor types they should not need to know about?
- If concepts, type erasure, or callbacks were added, why is this the right seam?
- If the change affects a library boundary, what compatibility promise changed: source, ABI, serialized format, or semantics?
- Can a caller understand the contract from the signature and surrounding docs, or only from implementation detail?
Concurrency, Cancellation, and Shutdown
- Who owns each new piece of asynchronous work?
- What stops it: parent completion, failure, deadline, explicit stop, or shutdown?
- Is concurrent work bounded at the scarce resource, or can overload turn into queue growth and latent failure?
- Do locks protect invariants, or only fields?
- Does any callback, logging path, I/O path, or foreign code now run while a lock is held?
- After this change, what work can still be running when destruction starts, and how does it stop?
Data Layout and Performance
- Did the change add allocation, indirection, copying, or object-size growth on a meaningful path?
- If a container or representation changed, what workload assumption made that choice better?
- Did a ranges or view-based rewrite preserve lifetime and traversal clarity?
- If the change claims a performance win, where is the benchmark, profile, or production evidence?
- If the change claims the cost is negligible, under what input shape and load is that true?
Verification and Delivery
- Which tests exercise the risky path introduced by the change?
- Which sanitizer, static-analysis, or build-diagnostics lanes still cover it?
- If the diff changes overload behavior, retries, shutdown, or failure classification, is that visible in logs, metrics, traces, or crash diagnostics?
- Did the build or CI configuration become more diagnosable or less diagnosable?
- Can the shipped artifact still be mapped back to exact symbols, source, and build identity if it fails in production?
When the Checklist Should Block Approval
Block the change when ownership is unclear, borrowed state may outlive its source, failure policy is inconsistent, async work has no owner, overload behavior is implicit, a public boundary widened without compatibility reasoning, or a performance claim lacks evidence proportional to the claim.
Appendix D: Glossary
These definitions are the canonical meanings of the book’s core terms. They are intentionally short. If a chapter relies on one of these words, it should use it in this sense unless it says otherwise.
Core Contracts
Ownership The responsibility for keeping a resource valid and releasing it exactly once when its useful lifetime ends.
Borrowing Using data or a resource without taking over its lifetime. A borrow is valid only while the owner keeps the underlying object alive and in the required state.
Lifetime The interval during which an object, resource, or view remains valid to use. Lifetime is broader than scope once work crosses callbacks, threads, queues, or suspension points.
RAII Resource Acquisition Is Initialization: tying resource ownership to object lifetime so destruction performs cleanup automatically and composes with exceptions and early returns.
Contract What a type or function promises about ownership, validity, failure, cost, and observable behavior. Contracts should be readable from the boundary, not inferred from implementation folklore.
Invariant A property of an object or subsystem that must remain true for the design to be considered valid. Synchronization protects invariants, not merely fields.
Value semantics A design in which objects behave like self-contained values: copying duplicates the value, moving transfers it, and equality is about state rather than object identity.
Identity The fact that a particular object instance matters as itself rather than only as the value it currently holds. Identity usually brings aliasing, lifetime, or synchronization consequences.
Failure and Boundaries
Failure boundary The layer where failure is translated into the form the next layer is expected to handle: exception, std::expected, status, process exit, log and continue, or some other deliberate policy.
Boundary translation Converting unstable or low-level failures into a smaller, stable vocabulary near the dependency that produced them.
Undefined behavior Program behavior for which the C++ standard imposes no requirements. In practice it means the compiler is free to assume the bug does not occur, which can turn a local mistake into arbitrary results.
Reviewable Easy enough to reason about that a competent reviewer can identify ownership, invariants, failure shape, and meaningful costs without reverse-engineering the whole subsystem.
Concurrency and Throughput
Structured concurrency Designing concurrent work so child tasks belong to parent scopes, have bounded lifetimes, and are canceled or completed before the parent is considered done.
Cancellation The deliberate request to stop work that is no longer useful because a parent failed, a deadline expired, shutdown began, or overload policy requires shedding work.
Suspension boundary A point such as co_await where control may resume later, elsewhere, or not at all unless the owning task and its data remain valid.
Contention Delay caused by multiple threads or tasks competing for the same lock, core time, queue capacity, cache lines, or dependency budget.
Backpressure The system’s explicit answer to work arriving faster than it can be completed. Good backpressure is an admission policy, not an oversized queue.
Throughput The amount of useful work completed per unit time. Higher throughput is not automatically better if it is bought with unacceptable tail latency, memory growth, or operational risk.
Data and Performance
Locality How well the runtime layout of code and data matches the access pattern of the hardware. Good locality usually reduces cache misses, pointer chasing, and memory stalls.
Cost model The concrete explanation of where time, memory, allocation, copying, synchronization, and indirection costs enter a design.
Hot path Code that runs often enough, or under enough pressure, that small costs in allocation, layout, or synchronization materially affect system behavior.
Vocabulary type A standard type such as std::span, std::string_view, std::optional, std::expected, or std::variant used to make contracts legible at boundaries.
Packaging and Operations
ABI Application Binary Interface: the binary-level contract covering calling conventions, layout, symbol names, exception behavior, and other details that determine whether separately built components can link and run together.
Observability The runtime evidence needed to explain behavior in production: logs, metrics, traces, crash artifacts, and the metadata that makes them diagnosable.
Diagnostic build A build configuration optimized to find or explain bugs rather than only to maximize runtime speed, typically through warnings, assertions, sanitizers, debug info, and symbolization.