Historical revision

Undefined Behavior — 2026-06-23 10:44 UTC

rev_c5b9f33afad9498a9df970d6a2dd3447

Undefined Behavior

Any C/C++ code of even modest size (nontrivial) almost certainly contains Undefined Behavior (UB). "Nontrivial" here means code that goes beyond a few-line example and performs real functionality, that is, code where pointers, integer arithmetic, type casting, concurrency, and the like begin to intertwine. This document explains why UB in C/C++ cannot be attributed solely to individual carelessness. The fact that even experts who have written C/C++ daily for nearly 30 years continuously produce UB shows that this is not a matter of individual skill, but something closer to a fundamental cost imposed by the language and its ecosystem.[1]

Put simply, if you were to compose a clean definition and write it out, it would look something like the following.

UB is not a problem caused by the compiler maliciously optimizing your code into failure. UB is the right it grants the compiler to assume that "such a situation will never occur."

Code containing UB runs on the premise that the compiler, ABI, hardware, and runtime all operate under the assumption that "no well-formed C code would ever do this." Even if a human reader finds the intent obvious, the issue arises because there may be no means to express that intent between compiler stages or across module boundaries. In other words, the way a computer and a human communicate is fundamentally different.

When reading C/C++ standard documents, you need to distinguish between expressions that look similar. This is a chronic problem with C-family languages.

Category	Meaning	Compiler/Implementation Obligation	Example
Defined behavior	The standard specifies the result	Must follow the standard	Modulo 2^n wraparound in unsigned integer arithmetic
Implementation-defined behavior	The implementation must determine and document the result	Documentation is required	Whether `char` is signed or unsigned
Unspecified behavior	Possible outcomes exist, but which one is chosen is not specified	Must fall within the allowed range	Certain evaluation-order issues
Undefined Behavior (UB)	The standard places no requirements	No guarantees whatsoever	Signed integer overflow, null pointer dereference, out-of-bounds array access
Ill-formed	The program itself violates the rules	A diagnostic is usually required	Syntax errors, certain type errors

The C++ working draft also treats undefined behavior as "no requirements," explaining that possible outcomes range broadly from being ignored, to unpredictable execution, to termination with a diagnostic.7 cppreference likewise lists out-of-bounds memory access, signed integer overflow, null pointer dereference, and strict aliasing violations as examples of C/C++ UB.6 8

Distinctions:

Implementation-defined behavior is a documented implementation choice. Risk remains.
Unspecified behavior is one of several possible choices.
UB has no fixed set of choices at all.

UB is broader in scope than "random results occur" — it does not mean the program crashes in a bizarre way every single time. Code may appear to work under normal conditions, yet after changing the compiler version or optimization flags, that code path may be treated as impossible.

In other words, UB is called "undefined behavior" precisely because it is unpredictable.

1. `-O0` does not turn UB into defined behavior

Passing -O0 (Optimization Level 0, the compiler flag that disables optimizations) does not transform UB into defined behavior. Developers often assume that "turning off all optimizations means the compiler translates source code one-to-one into machine code, so unpredictable behavior (UB) can be avoided" — but this is a misconception.1

The problem is that not all UB manifests in the same way. Why would it be undefined behavior in the first place? Let's start by splitting it into two broad categories.

First, UB that directly intersects with hardware or the ABI. Cases such as misaligned pointer access, out-of-bounds memory access, invalid function pointer calls, and accessing an object outside its valid lifetime can all result in crashes, traps, or stray memory accesses regardless of optimization level. In these cases, -O0 does not make the program safe; it only means the compiler is somewhat less likely to aggressively rewrite the code.

Second, UB that surfaces when compiler optimization passes treat it as a given. Examples include null check elimination, simplification of conditional expressions exploiting signed overflow, dead code elimination, and loop optimization. These transformations tend to be more visible at -O2, -O3, with LTO, or with target-specific optimizations. At -O0, the same code may appear to execute as intended on the surface. However, that does not mean the code has become defined behavior; it is closer to saying that, in that particular build, the UB has not yet been actively exploited.

This process can be likened to a game of "telephone." The intent of the source code is continuously transformed as it passes through the compiler's Intermediate Representation (IR), optimization stages, the ABI, and hardware instructions. Even if the code happens to behave as intended under -O0 today, there is no guarantee the same result will hold under a future compiler version or a different architecture.

The classic analysis from LLVM and John Regehr connects directly to this point: a compiler may assume that UB never occurs in a valid C/C++ program, and that assumption feeds directly into case analysis, dead code elimination, null check elimination, and loop optimization.3 4

To be precise, UB is still UB even at -O0. It is just that aggressive code elimination and rewriting based on UB typically surface only when optimizations are enabled. Confusing the two makes it hard to counter the argument "but null checks don't get eliminated at -O0, do they?"

Consider the following code.

C

int *p = get_pointer();
int value = *p;          /* (1) Dereference succeeded = the compiler can assume `p` is non-null afterward */
if (p == NULL) {         /* (2) "Since the dereference succeeded, `p` cannot be `NULL`" */
        return -1;       /* (3) The entire branch is eliminated as dead code */
}
return value;            /* Because `value` is actually used, the load in (1) is not eliminated */

A human reader might interpret this as "read the pointer, and if it's null, return an error." But if p is null, UB already occurs at the dereference in (1). In other words, this code is not code that has a null check; it is code where the null check comes too late.

When optimizations are enabled, the situation becomes more severe. The compiler can conclude that in a valid program the dereference at (1) succeeded, and from that fact infer that p is not null. The subsequent if (p == NULL) branch can then be eliminated as dead code.

C

int *p = get_pointer(); 
int value = *p; 
return value;

Does this mean the compiler is maliciously removing defensive code? No. The C/C++ standard allows the compiler to interpret code under the assumption that null pointer dereference never occurs in a valid program. On a path that contains UB, "an unexpected value is produced" is less accurate than "that path may be treated as unreachable." The null check elimination example in the LLVM article takes exactly this form, and it has led to real vulnerabilities in the Linux kernel.4

For example, the fact that signed integer overflow is UB is not simply "the value might go wrong."

C

int greater_after_increment(int x)
{
        return x + 1 > x;
}

A human thinks that when x == INT_MAX, an overflow might corrupt the result. In C/C++, signed integer overflow is UB. The compiler may assume that in a valid program x + 1 does not overflow, and the function may be optimized to always return true.

In an optimized build, this function can be simplified as follows.

C

int greater_after_increment(int x)
{
        return 1;
}

At -O0, the actual addition and comparison may remain intact. However, that does not mean the x == INT_MAX input constitutes defined behavior. If signed integer overflow occurs on that input, the standard's guarantees no longer apply; it is simply that in that particular build the compiler chose not to fold the conditional expression by exploiting the UB.

The conclusion, therefore, is not "UB is caused by optimization settings," nor is it "UB always manifests the same way at -O0." Whether it surfaces as a hardware trap, silently proceeds by chance, or causes defensive code to vanish during an optimization pass is determined by the build configuration and the platform. And what makes this truly insidious is precisely that it works fine on your own machine.4

2. Common UB Cases

The cases below lurk in code that is far more ordinary than the commonly cited memory-safety trio (double-free, use-after-free, out-of-bounds access). This is not an exhaustive list; an exhaustive list would read less like documentation and more like an exorcism ritual.

Personal note: I think C programmers would do well to adopt a religion for the sake of their peace of mind.

2.1 Misaligned Pointer Access (C23 6.3.2.3)

C

int foo(const int *p)
{
        return *p;
}

If p does not satisfy the alignment requirement for int (typically an address that is a multiple of sizeof(int)), dereferencing it is UB and the outcome varies by platform.

Linux on Alpha: on certain loads, the kernel receives a trap and emulates the intended behavior in software; on other loads, the process crashes with SIGBUS.
SPARC: SIGBUS.
x86/amd64: it works without issue in most cases and sometimes even appears to behave like an atomic read, because x86 is a cache-coherency-lenient architecture.

Personal note: "It works on my machine" is the single most meaningless sentence you can utter in front of the C standard. Just because it ran quietly on x86 does not mean the standard has vouched for your code; it simply means the hardware that day had no complaints. It is important to treat a computer the way you would treat a woman during her period.

The compiler has absolutely no obligation to generate safe machine code for a misaligned pointer. It can, at any point, use optimization as justification for emitting a different load instruction, and at that moment the thin mercy the hardware had been granting disappears and the program responds immediately with a crash.

Atomicity is similar.

The moment you ask whether a store/load on a misaligned std::atomic<int> is atomic, you have already taken the wrong path. Because it is UB, the question simply does not apply. In practice, atomicity can break. This becomes even clearer if you picture an object straddling a page boundary.

More strictly speaking, the cast itself can be the problem.

C

bool parse_packet(const uint8_t *bytes)
{
        const int *magic_intp = (const int *)bytes;   /* ✗ UB already occurs here */
        int magic_raw = foo(magic_intp);              /* May crash on SPARC */
        int magic = ntohl(magic_raw);                 /* This line itself is fine */
        /* ... */
}

What makes this even more uncomfortable is that the issue is not limited to the dereference inside foo(). The cast that creates a pointer to an incorrectly aligned object type is already UB. The naive assumption that "just casting is fine as long as you don't read" is exactly where the trouble starts.

7 A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned59) for the referenced type, the behavior is undefined.

(C23 6.3.2.3p7: if the resulting pointer from a conversion is not correctly aligned for the target type, the conversion itself is UB.)

Most C/C++ programmers believe that a cast itself is safe and that trouble only surfaces when reading or writing through the pointer (*p), but the standard is explicit: a conversion to a misaligned pointer is UB on the spot. Before you ever access memory, the act of casting already gives the compiler license to do damage. The standard also permits the compiler to assign meaning to the low bits of an int *, such as for garbage collection or security tagging.

The UB in this example does not end with alignment alone. Reading the original uint8_t array through an int lvalue also violates the strict aliasing (effective type) rule (see section 2.7). Even if bytes happens to satisfy 4-byte alignment and thus avoids the alignment UB, the aliasing UB from reading through int remains independently. The reason code that casts a packet to an integer for parsing is dangerous is not that it breaks one rule, but that it steps on both alignment and effective type in the same line. The safe path is to copy the bytes in via memcpy.

2.2 The `isxdigit(char)` problem (C23 7.4, 6.2.5)

C

bool bar(char ch)
{
        return isxdigit(ch);   /* ✗ UB possible on platforms where char is signed */
}

In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined

Functions in the isxdigit family are defined to accept a value representable as unsigned char, or EOF. EOF is an int and is a negative value not representable as unsigned char. Therefore the argument type is int.

On platforms where char is signed (implementation-defined, 6.2.5), passing a byte value of 0x80 or higher converts it to a negative int, and a valid implementation like the following ends up reading from the wrong memory location.

C

int isxdigit(int c)
{
        if (c == EOF) {
                return false;
        }
        return some_array[c];   /* reads outside the array if `c` is negative */
}

If that memory happens to be an I/O-mapped region, the result may not end with a mere random value or crash. In an embedded environment, a motor could start spinning. If a machine physically moves because you called isxdigit() incorrectly once, you are entitled to go home that day, slide your C book under your pillow, and sleep on it.

The defensive form is as follows.

C

  /* Good example */
bool bar(char ch)
{
        return isxdigit((unsigned char)ch) != 0;   /* ✓ */
}

2.3 Converting `float` to `int` (C23 6.3.1.4)

C

int milliseconds(float seconds)
{
        int tmp = (int)(seconds * 1000.0);   /* ✗ Out-of-range access is UB */
        return tmp + 1;                       /* ✗ Signed integer overflow is also a separate UB */
}

If the value of the integral part cannot be represented by the integer type, the behavior is undefined.

When converting a finite floating-point value to an integer type, if the integer part cannot be represented in the target type, the behavior is undefined (6.3.1.4). By extension, non-finite values such as NaN and Infinity are also UB.

The tricky part is that even performing a safe comparison is not straightforward. Casting INT_MAX to float provides no guarantee of exact representation, so the comparison can be inaccurate. The defensive code looks like this.

C

int milliseconds(float seconds)
{
        const float ftmp = seconds * 1000.0f;

        if (!isfinite(ftmp)) {
                return 0;   /* or another error report */
        }
        if ((float)(INT_MIN + 1000) > ftmp) {
                return 0;
        }
        if ((float)(INT_MAX - 1000) < ftmp) {
                return 0;
        }

        /* Now safe to convert */
        const int tmp = (int)ftmp;
        if (INT_MAX == tmp) {
                return 0;
        }

        /* Now safe to add */
        return tmp + 1;
}

"I just wanted to convert seconds to milliseconds" — and yet this much guard code is required. Even a quick scan of open-source projects turns up plenty of code that simply multiplies and casts without any such checks.

Personal note: at this point, C is a language that teaches you to check boundary conditions and avoid mistakes before it teaches you to calculate anything.

2.4 Address 0 and the Null Pointer (C23 6.3.2.3, 6.5.4.2)

There is no guarantee that NULL in the C standard (the null pointer constant, which is either the integer constant 0 or a value converted from nullptr) is identical to hardware address 0. This is because the standard operates strictly on top of the C abstract machine. The only guarantee provided is that "comparing NULL to 0 evaluates as equal," and that 0 may in fact be the result of converting to the platform's native NULL representation (for example, 0xffff).

Dereferencing such a null pointer is unambiguous UB. C standard section 6.5.4.2 states this as follows.

"If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined."

The standard declares that dereferencing (*) a pointer holding an "invalid value" produces UB, and footnote 101 of that clause explicitly identifies the null pointer as the primary example of such an invalid value.

Due to this characteristic of the language, there are two traps that frequently appear in real-world practice.

C

memset(&ptr, 0, sizeof(ptr));   /* ✗ No guarantee that `ptr` becomes NULL */

You cannot assume that zeroing out a struct makes all member pointers NULL. This works on most modern platforms, but it is not a guarantee at the standard level.

C

void (*func_ptr)() = NULL;
func_ptr();   /* ✗ Null dereference, example of UB in 6.5.4.2 */

Dereferencing a null pointer is UB regardless of its value (6.5.4.2). Section 6.3.2.3 specifies that NULL is not equal to "any object or function," so even if a real object or vector table exists at address 0, standard C does not allow you to treat that location like an ordinary object. The idea of "just issue a call instruction with an all-zeroes bit pattern" also falls apart once the definition of "all zeroes" itself becomes ambiguous, as it does on 16-bit x86 where address 0 could mean 0000:0000 or CS:0000.

In OS kernel and embedded work, this is no joke.

Personal note: ordinary application developers typically encounter address 0 as a bug, but kernel and embedded folks run into address 0 like a familiar neighbor. Sometimes they even stop in for a drink at the local bar.

2.5 Variadic Arguments and Type Mismatch

C

execl("/bin/sh", "sh", "-c", "date", NULL);   /* ✗ */
execl("/bin/sh", "sh", "-c", "date", 0);      /* ✗ */
execl("/bin/sh", "sh", "-c", "date", (char *)NULL);   /* ✓ */

The sentinel that terminates a variadic argument list must be a pointer type, but the NULL macro can be interpreted as the integer 0, so an explicit cast is required.

There is an ABI-level nuance here. On implementations where NULL is defined as ((void *)0) (which covers most POSIX systems), passing execl(..., NULL) does hand off a pointer, so it works in practice. The case that truly breaks is an LP64 environment where NULL is defined as the integer 0, int is 32 bits, and pointers are 64 bits. Variadic calls carry no type information, so the caller pushes a 32-bit int, while execl internally pulls a 64-bit pointer via va_arg(ap, char *). The width mismatch means the upper bits may contain garbage, and the sentinel may no longer be recognized as NULL. Casting to (char *)NULL ensures the value is passed at pointer width regardless of how NULL is defined.

C

uint64_t blah = 123;
printf("%ld\n", blah);   /* ✗ format and argument type mismatch = UB */

printf("%" PRIu64 "\n", blah);   /* ✓ */

uint64_t should be printed using macros such as PRIu64. For types whose signedness is implementation-defined, such as uid_t, the workaround is to cast to uintmax_t and print with PRIuMAX.

2.6 Division by Zero

Everyone knows that division by zero is UB, yet it comes up more often than you might expect. The denominator more frequently comes from untrusted input than people realize, and divide-by-zero can become an attack surface.

For reference, the C23 standard uses the word "undefined" 283 times. Behaviors that are undefined by omission are not even counted in that number.

Personal note: if "undefined" appears 283 times, that is perhaps not a warning but a philosophy.

2.7 Strict Aliasing

In C/C++, you cannot read an object through just any pointer type. When the compiler determines that two types cannot alias each other, it is allowed to assume that two pointers of those types do not point to the same memory.

C

int f(int *i, float *f)
{
        *i = 1;
        *f = 0.0f;
        return *i;
}

If you force i and f to point to the same address, it may look to a human reader as though *f = 0.0f would overwrite the bits of *i. Under the strict aliasing rule, however, an int * and a float * are generally not considered to point to the same object, so the compiler is free to treat return *i as return 1.

Using memcpy for byte copying is generally the safer workaround.

C

float read_float_from_bytes(const unsigned char bytes[sizeof(float)])
{
        float value;
        memcpy(&value, bytes, sizeof(value));
        return value;
}

char, unsigned char, and std::byte are treated as special channels for reading an object's representation. Code that reinterprets memory through arbitrary type pointers must follow separate rules.

2.8 Object Lifetime and Placement New

In C++, having a memory address does not mean a live object of the desired type exists at that location. An object's storage and its lifetime are distinct concepts.

C++

alignas(std::string) unsigned char buffer[sizeof(std::string)];

auto *s = reinterpret_cast<std::string *>(buffer);
// *s must not be used here. The `std::string` object has not yet been started.

Actually creating an object requires a lifetime-starting operation such as placement new.

C++

auto *s = new (buffer) std::string("hello");
std::cout << *s << "\n";
s->~basic_string();

C++'s object lifetime rules frequently cause issues when building allocators, arenas, serialization layers, ECS systems, and custom containers. "Having allocated the memory" and "having a live object of that type" are not the same thing.

2.9 Data Race

In C++, a data race is UB. If two threads access the same memory, at least one of those accesses is a write, and there is no appropriate synchronization, the entire program becomes undefined.

C++

int counter = 0;

void worker()
{
        counter++;   // When multiple threads execute concurrently, a data race occurs
}

In such cases, use atomic types.

C++

std::atomic<int> counter = 0;

void worker()
{
        counter.fetch_add(1, std::memory_order_relaxed);
}

The reason a data race is UB is not simply that values can be wrong. It is because the compiler and CPU are free to reorder memory accesses and hold values in registers. A concurrent data race is not merely a value error; it is closer to a state that lies outside the language model entirely.

2.10 Shift UB

Bit shifts are another common source of UB.

C

int x = 1 << 31;       /* UB possible if `int` is 32-bit */
int y = 1 << -1;       /* UB */
int z = 1 << 32;       /* UB if `int` is 32-bit */

Shifting by a negative count or by an amount greater than or equal to the type's width is UB. A left shift on a signed type that produces a result not representable in that type is also problematic. Bitwise code is better written using explicit unsigned types such as uint32_t or uint64_t, and shift counts should be validated before use.

Note that the rules for signed left shift differ in C++20.

3. Bonus: Integer Promotion (Not UB, but a Trap)

While not UB itself, integer promotion rules are difficult to evaluate quickly when skimming code. These rules silently convert what appears to be a small-type operation into an int operation.

C

unsigned char a = 0xff;
unsigned char b = 1;
unsigned char zero = 0;
bool overflowed = (a + b) == zero;
/* `overflowed` is 0 (false). `a` and `b` are promoted to `int`, becoming `0x100`, so they differ from `zero` */

C

unsigned char a = 0x80;
uint64_t b = a << 24;
/* `b` is `0xffffffff80000000` (18446744071562067968), not `0x80000000` (2147483648) */
/* This happens even if all variables are `unsigned`. `a` is promoted to `int` before being shifted, then sign-extended */

Even when all variables are unsigned, promotion happens at the int level, and the sign extension of the result can change the final value. It is practically impossible for a human to correctly apply promotion rules at skimming speed.

Strictly speaking, the a << 24 shown above is both an integer promotion trap and UB in its own right. Shifting the value 0x80 (promoted to int) left by 24 bits yields 2^31, which cannot be represented as a signed int, making this a signed left shift that is UB. This is the point where the latest specifications of C and C++ diverge. C++20 mandates two's complement representation for signed integers and defines signed left shift as a modulo operation, so this shift becomes defined behavior. C23 likewise fixes signed integer representation to two's complement, but signed left shift overflow remains UB; pinning down the representation does not automatically define arithmetic or shift overflow. Consequently, the same 0x80 << 24 is UB under C23 but defined behavior under C++20 and later. Either way, writing bitwise operations with explicit unsigned types such as uint32_t or uint64_t is the safer approach.

4. How Do Other Languages Handle This Cost?

UB in C/C++ cannot be explained away as "mistakes made by careless programmers." If even experts keep running into it over 30 years, I think we need to look at the fundamental costs built into the language and its ecosystem rather than attributing the problem to individual skill.

Other languages handle this cost primarily in one of four ways.

Strategy	Representative Languages	Meaning
Block it at compile time	Rust, SPARK, Kotlin/Swift null safety	Dangerous states are rejected by the type system or static verification
Surface it as an exception/panic at runtime	Java, C#, Go, Swift, Python	Turn failure into a defined failure
Isolate dangerous areas with `unsafe`	Rust, C# unsafe, Go unsafe, Swift	Mark dangerous operations syntactically
Define behavior explicitly	Java integer overflow, C# checked/unchecked	Specify the result even if it is slow or strange

4.1 Rust

The type system and borrow checker block many UB categories from C/C++ at compile time. Array out-of-bounds access results in a panic rather than UB, and use-after-free, double-free, and data races are mostly prevented at the compilation stage in safe Rust.

Rust

let value = vec![1, 2, 3];
let x = value[10];   // Panic, not UB

Rust does have UB, but responsibility is isolated to unsafe blocks.

Rust

unsafe {
    let p = 1 as *const i32;
    let x = *p;   // UB possible
}

What isolating things to unsafe blocks means is this.

In C/C++, landmines are scattered along ordinary paths, but Rust at least warns you before you enter safe territory that you could lose a leg. Of course, once you open an unsafe block, you are responsible again.

4.2 Zig

Zig is a lower-level language that is more explicit than C. It surfaces overflow, alignment, optional values, and error handling at the language level. In debug/safe builds, integer overflow can be trapped, while in release-fast builds some checks are dropped for performance. Dangerous conversions such as @intCast, @alignCast, and @ptrCast are also made visible in the code. Because pointers, alignment, and unsafe operations remain, Zig is not a language where UB disappears entirely. And honestly, I don't know it that well either.

4.3 Go

Go favors defined behavior. Array and slice out-of-bounds accesses and nil dereferences typically result in a panic, and because Go has a GC, use-after-free and double-free do not occur in ordinary code. The unsafe package can be used to reopen C-style dangers.

Go

arr := []int{1, 2, 3}
fmt.Println(arr[10])   // Panic, not UB

4.4 Java / C#

Most C-style UB is converted into runtime exceptions.

C#

int[] arr = { 1, 2, 3 };
Console.WriteLine(arr[10]);   // IndexOutOfRangeException

Java

int[] arr = {1, 2, 3};
System.out.println(arr[10]);   // ArrayIndexOutOfBoundsException

Out-of-bounds accesses throw exceptions, null references throw exceptions, memory deallocation is handled by the GC, and type mismatches are caught by runtime or compile-time checks. C# allows pointer usage via unsafe blocks, but the dangerous region is syntactically marked. Java also has Unsafe, JNI, and the Panama pathway, but these are separated from ordinary code.

C#

unsafe {
    int* p = stackalloc int[10];
}

4.5 Swift / Kotlin

Nullability is encoded in the type system, which greatly reduces null pointer problems.

Kotlin

var name: String = "abc"
name = null            // compile error
var maybeName: String? = null

Code

var name: String = "abc"
var maybeName: String? = nil

Array out-of-bounds accesses are handled as traps or exceptions, so failures are surfaced explicitly rather than silently corrupting state as UB would.

4.6 Python / JavaScript

C-style UB is largely absent. Instead, runtime errors and peculiar defined behaviors are common.

Python

arr = [1, 2, 3]
arr[10]   # IndexError

JavaScript

const arr = [1, 2, 3];
console.log(arr[10]);   // undefined

JavaScript trades UB for a different kind of confusion: undefined, NaN, and implicit coercion. It does not have the compiler-assumes-this-is-impossible mode that collapses the entire meaning of a program.

4.7 Ada / SPARK

It provides range types, runtime checks, and a strong type system.

Code

subtype Percent is Integer range 0 .. 100;

SPARK is a formally verifiable subset of Ada, focused on reducing overflow, out-of-bounds accesses, and data races through mathematical proofs and static analysis.

Ada is sometimes described as a more natural choice than C in safety-critical domains such as defense, aviation, and rail, but is there really a compelling reason to use Ada?

4.8 Explicitly Defined Behavior: C# checked / Java overflow

C# lets you choose between checked (exception on overflow) and unchecked (wraparound) for integer overflow, and whichever behavior you select is fully defined.

C#

checked {
    int x = int.MaxValue;
    x = x + 1;   // OverflowException
}

Java's int overflow is also not UB; it is defined as two's complement wraparound.

Java

int x = Integer.MAX_VALUE;
System.out.println(x + 1);   // Integer.MIN_VALUE

You can still introduce bugs, but this is not the kind of UB where the compiler assumes "no overflow occurs" and rewrites your code accordingly.

4.9 Comparison Table

Language	Primary Strategy	UB Risk
C	Developer responsibility; many sources of UB	Very High
C++	RAII and type-system tools available, yet many sources of UB remain	Very High
Rust	safe/unsafe separation, borrow checker	Low in safe Rust
Zig	Explicit marking of unsafe operations, checks per build mode	Medium
Go	panic/GC/defined behavior first	Low, except `unsafe`
Java	Runtime exceptions/GC/defined overflow	Low
C#	Runtime exceptions/GC/`checked` support	Low, except `unsafe`
Swift/Kotlin	Null safety, runtime trap	Low
Python	Runtime error	Low
JavaScript	Many well-defined quirks, few UB cases	Low
Ada/SPARK	Range types / verification / contracts	Designable to be very low

4.10 Rice's Theorem and the "Universal Checker" Problem

When people ask about checking for UB, they usually think along these lines.

Can't the compiler or a static analyzer just find all UB ahead of time?

General UB detection runs into the same undecidability as the halting problem. For example, a question like "Does this program divide by zero on some input?" can directly encode whether an arbitrary program P terminates. If you can rephrase whether P halts as whether something divides by zero, then a universal checker that solves the latter immediately becomes a machine that solves the halting problem. It would be nice if such a machine existed, but unfortunately it does not.

Rice's Theorem generalizes this limitation from one problem to all semantic properties. It states that no algorithm can decide a non-trivial semantic property of computable functions in general.13 In other words, it is not just division by zero that is blocked; most questions about what a program actually does hit the same wall. Concretely, these include the following.

Is this pointer always valid at the point of execution?
Does this integer addition overflow on any input?
Is this array access in-bounds on every execution path?
Do these two threads ever encounter a data race in actual execution?
Does this function necessarily terminate on a given input?

Static analysis demands answers to these questions before execution. In the general case, it runs straight into the undecidability wall described above. That the halting problem is undecidable is itself a classical result from Turing's 1936 paper.14

Options available to a static analyzer:

Choice	Consequence
Allow false positives	Warns on code that is actually safe
Allow false negatives	Misses some UB
Restrict the scope of analysis	Sound only within a specific language subset or specific coding conventions
Require annotations	Developers must annotate preconditions, invariants, and ownership information

A universal UB detector cannot be expected from a general algorithm. Such a tool would be essentially equivalent to a "compiler that finds all bugs," and at that point it would be less a compiler and more a crystal ball.

This does not mean that theoretical limits cripple practical tools. Because no general solution exists, the approach is to carve out narrow, well-scoped problems and solve those.

Sanitizers catch UB on the paths that actually execute.
Static analyzers conservatively flag suspicious sites.
Type systems prevent dangerous expressions from being written in the first place.
Fuzzing feeds in inputs that humans would never think to try.
Rules such as MISRA/CERT prohibit use of the dangerous subsets of C.

Rice's theorem explains "why, even after using multiple tools, a human still has to look at the end." C/C++ UB checking sits at the intersection of mathematical limits and practical cost, which is precisely why this problem is so persistently exhausting.

5. Practical Countermeasures

C/C++ continues to be used in performance-critical code, ABI layers, operating systems, embedded systems, game engines, drivers, and legacy codebases. As of 2026, the reality is closer to "enable your tools, establish coding rules, and mark danger zones" than to "just be careful."

Sanitizers (ASan, UBSan, TSan)
Static analyzers
Fuzzers
Maximizing compiler warnings
MISRA / CERT rules
Code review
LLM-assisted UB auditing
Documenting unsafe boundaries

For new C/C++ code, maintain at least one sanitizer build.

Bash

clang -Wall -Wextra -Wpedantic -Wconversion -Wshadow -fsanitize=undefined,address -fno-omit-frame-pointer main.c

Bash

gcc -Wall -Wextra -Wpedantic -Wconversion -Wshadow -fsanitize=undefined,address -fno-omit-frame-pointer main.c

UBSan is a tool for catching various UBs at runtime. The Clang documentation explains that UBSan inserts compile-time instrumentation to detect out-of-bounds accesses, invalid shifts, null/misaligned pointer dereferences, signed integer overflow, and more at runtime.[fn:fn-9]

Sanitizers are not proofs of correctness. Installing CCTV does not mean no thief exists; it only means you can see the thieves who happen to be caught on camera.

Only paths that are actually executed get checked.
Not all UB can be caught.
The problems that surface may vary depending on optimization settings and build options.
Runtime use may be restricted in kernel, embedded, or real-time environments.
Do not assume that enabling sanitizers means production performance and behavior will be identical.

Example tool combinations:

Stage	Tool
During Development	UBSan, ASan, TSan
CI	Sanitizer builds, warnings as errors, fuzz testing
Before Release	Static analyzers: `clang-tidy`, `cppcheck`, Coverity/PVS-Studio, etc.
Regulatory/Embedded	MISRA C/C++, CERT C/C++, SPARK/Ada review
Code Review	Focused review of unsafe/pointer/casting/lifetime/concurrency boundaries

Even when an LLM audit is applied to a codebase renowned for maturity and strict authorship (such as OpenBSD), UB candidates pour out, and there are documented cases where actual out-of-bounds writes were discovered and patched. LLM audits have their limits too. Confirming what they surface ultimately still requires a human expert. The work resembles housecleaning, except that picking up the broom means reading the standard document and the assembly side by side. Honestly, can anyone do all of this?[fn:fn-000001]

Summary:

C/C++ pushes failures into "undefined," while modern languages pull failures up into one of the following: compile error, runtime exception, unsafe boundary, or explicit wraparound.

UB can be thought of as the distance between "works on my machine" and "valid per the standard."

References

[1] Everything in C is undefined behavior. Thomas Habets, 2026-05-19. A post covering various UB cases and an anecdote about an OpenBSD LLM audit.

blog.habets.se: Everything in C is undefined behavior - Cases involving alignment, isxdigit, float-to-int, null pointer, varargs, and integer promotion, plus an OpenBSD LLM audit anecdote

[2] C23 Standard sections cited in this post. The normative clauses behind the examples discussed.

6.3.2.3 (pointers, null pointer) - Misaligned pointers, comparison of NULL with objects and functions
6.3.1.4 (real floating to integer) - UB when float-to-int conversion is out of range
7.4 (character handling), 6.2.5 (char signedness, implementation-defined) - The isxdigit(char) problem
6.5.4.2 (Address and indirection operators)

[3] A Guide to Undefined Behavior in C and C++. John Regehr. A classic introduction to UB.

blog.regehr.org: A Guide to Undefined Behavior, Part 1 - the mechanism by which compilers transform code under the assumption that "UB never occurs"

[4] What Every C Programmer Should Know About Undefined Behavior. Chris Lattner, LLVM Project Blog (3-part series).

Part 1 - the principles behind how UB enables performance optimizations
Part 2 - surprising effects such as null check elimination, with case studies from Linux, OpenSSL, and glibc

[5] Semantics of UB and the motivation for optimization. A supplementary perspective.

Raph Levien: With Undefined Behavior, Anything is Possible - distinctions among the portable/semi-portable/standard C camps, and strict aliasing
Paul J. Lucas: Undefined Behavior in C and C++ - the origin of "nasal demons" (John F. Woods, comp.lang.c, 1992) and the disappearing-if case study

[6] cppreference.

[7] C++ working draft.

eel.is C++ draft: Definitions - terminology definitions including undefined behavior, implementation-defined behavior, and related concepts

[8] cppreference C++.

cppreference C++: Undefined behavior - examples including signed overflow, out-of-bounds access, uninitialized scalars, invalid scalars, and null pointers

[9] Clang UndefinedBehaviorSanitizer.

LLVM Clang documentation: UndefinedBehaviorSanitizer - UBSan's purpose, types of UB it can detect, and usage

[10] SEI CERT C Coding Standard.

SEI CERT C Coding Standard - a collection of rules in C that lead to UB and security vulnerabilities

[11] C++ Core Guidelines.

C++ Core Guidelines - guidelines related to lifetime, bounds, type safety, and resource safety

[12] What every compiler writer should know about programmers.

Wang et al. "What Every Compiler Writer Should Know about Programmers" - a paper examining the conflicts between UB in C and the tension between compiler optimizations and programmer expectations

[13] Rice's theorem.

H. G. Rice. "Classes of Recursively Enumerable Sets and Their Decision Problems". Transactions of the American Mathematical Society, 74(2), 358-366, 1953.
Wolfram MathWorld: Rice's Theorem

[14] Turing and undecidability.

Alan M. Turing. "On Computable Numbers, with an Application to the Entscheidungsproblem". 1936.

Categories: Undefined Behavior | C++ Memory Safety Debate | Programming Idioms | WIKI

→ 현재 버전 보기