The Undefined Behaviour Bogeyman

2 February 2024

Introduction

It's taken me a while to get properly used to the idea of the Rust programming language. I still haven't actually properly learned the language, though I've been exposed to it in various ways; I've resisted doing so, in fact, I think it's fair to say, even though I've long been aware of C and C++'s propensity for Undefined Behaviour, that catch-all clause of the language specifications which, essentially, allows that a program that violates the language's rules can have any behaviour at all. I'll get to learning Rust eventually. In the meantime, I have some C++ projects going strong, and I'm happy to say that I have no great regrets about choosing C++ for these, despite its lack of memory safety and the presence of Undefined Behaviour, even though Rust shows a clear path forward that I will eventually set out upon.

That I'm a touch blasé about Undefined Behaviour may be surprising. Undefined Behaviour, some will warn us, is really bad. A definitely-possible result is that it will open a hole in the afflicted software and make it vulnerable to attack. Undefined Behaviour ("UB" henceforth) in the wrong bit of critical system software could theoretically lead to whole system compromise. Further, because there are so many ways you can get undefined behaviour in C/C++, and in particular because it can be introduced quite easily by accident, writing C and C++ should be considered akin to walking through a minefield.

For those who haven't already abandoned hope and switched to using Rust, a common follow-on thought is this: we should make C/C++ safer, by removing as many forms of UB as we reasonably can.

Except, I'm now going to explain that most forms of UB are not actually all that bad in practice. And, while I'm all for making the languages safer—in terms of making it harder to unwittingly introduce bugs—I think that trying to reduce the number of causes of UB from C/C++ is largely a fool's errand that won't lead to more secure software. I'll be focussing on one particular (and seemingly contentious) cause of UB as an example, but I believe the logic extends to most other forms of UB in these languages as well. (There are a few key causes of UB that would be beneficial to remove, but these are so fundamental that we can't remove them without making radical changes to the language and/or its implementations).

It's not to say that avoiding all UB isn't a worthwhile goal—Rust is clearly aiming to allow doing so and for good reason. My argument here is not so much "we shouldn't worry about UB at all", rather, it is that if we are already choosing to use a language for which UB is fundamentally ingrained, then we shouldn't obsess too much over many of the individual causes of UB.

It will take me a little while to get to the crux of the argument though. For now, bear with me; we need to cover some background.

Things that UB Prophets of Doom (self-appointed) may rightly point out are:

There are a heap of possible UBs—a C11 language spec draft that I referred to while writing this listed well over 100 different possible causes;
Actual program behaviour when UB is present can vary when different optimisation options are used to compile the same program;
Further, different compilers, and different versions of the same compiler, can cause different actual behaviour in the presence of UB.

This last one is a good opportunity for discussion, so let's take it. I'll come back later to address the first point; the 2nd we can take as a given.

The overflow issue (again...)

One typical example of UB in C/C++ is that resulting from signed integer overflow. If I have two values of type int and add them together, and the result is too large to "fit into" a variable of type int, the behaviour is actually undefined. This (apparently) surprises and/or annoys some people who expect that the behaviour in this case should match "that of the underlying architecture", that is, they expect (or desire) the result of such an addition to be the same as whatever the result of using the underlying processor's add instruction would be. Up to a point in history, these expectations/desires would almost certainly have been fulfilled by just about any compiler that you might have thrown at your code, because compilers weren't all that sophisticated—at least by today's standards. There, of course, lies the snake; things changed, compilers got more complex, and out of a desire to have them generate faster and/or smaller code, their developers incorporated optimisations which drastically altered the actual behaviour of code which exhibited¹ UB. As a result, some programs containing the affected forms of UB that previously worked as intended suddenly ceased to do so when compiled with a newer compiler.

Most (all?) extant architectures use 2's complement representation for integers and their underlying arithmetic instructions (at the processor level) having "wrapping" behaviour for operations which overflow: if I take the largest representable value for an int, and I add one, the result of wrapping is that the value becomes the smallest representable value, which is a negative value. This is convenient at the hardware level for a number of reasons which I won't go into now (but which I'm sure many readers will be aware of anyway). You might think that this is not however a particularly useful behaviour to have, in general, in a C program; it turns out though that there are cases where it could be useful that do crop up, often enough: because we know that adding positive numbers results in a negative number if overflow occurs, assuming 2's complement wrapping on overflow gives us a way to check if overflow occurred. And so (bugged) code similar to the following was, historically, sometimes written:


int a = ...;
...
a += 1;
if (a < 0) {
    printf("Overflow occurred!\n");
    return ERR_OVERFLOW;
}
...

This code adds one to an integer variable and checks if the result is smaller than zero. The intention is to detect overflow so that it can correctly be reported, rather than continue with an arithmetically incorrect value (and perhaps produce incorrect output as a result); however, it contains a bug: it assumes integer overflow will exhibit wrapping behaviour, when in fact the behaviour in this case is undefined. In practice, it worked, for a while, because compilers tended to translate the addition operation to an "add" instruction in the target processor; However, at some point, compilers started compiling this code as if the check wasn't there! As an optimisation this is actually sound, since overflow would be UB and in the presence of UB no particular behaviour is required of the program as a whole; there is no way the condition a < 0 can be true if UB is not present, and so the condition can soundly be assumed to never hold. Hence, it seems that the compiler "exploits" the (conditional, in this case) presence of UB to perform optimisations. And that's often how the compiler behaviour has often been described, even by myself; it is not, however, a very good description of what the compiler actually does (even though the effective result is the same).

Note: I have in fact written about this before, and am somewhat treading the same ground, but this time I will go into a bit more detail about how things work "from the perspective" of the compiler, and I'll provide some additional justification for the notion that 2's-complement wrapping on overflow isn't really any better than UB.

The real reason that the check gets eliminated is a combination of optimisation passes that most modern compilers implement. The most important in this context are value range analysis (VRA) and dead code elimination (DCE). To explain how they could work to remove the overflow check in the above example, we will need to expand the example a little, so let's do that now:


int a = foo();               /* 1;     a : [MIN_INT, MAX_INT] */
if (a < 0) {
    return ERR_FOO_FAILED;   /* 2;     a : [MIN_INT, 0) */
}
/* blank */                  /* 3      a : [0, MAX_INT] */
a += 1;                      /* 4      a : [1, MAX_INT] */
if (a < 0) {                 /* 5      (condition necessarily false) */
    printf("Overflow occurred!\n");
    return ERR_OVERFLOW;
}
...

Now we have a usable example annotated with some points of interest, let's talk about value range analysis or VRA. This is an information-gathering or analysis pass that decides, at each point of the program, a possible range of values that each scalar variable might have. If we look at the above program and consider the (only) variable, 'a', a VRA pass would determine that:

After /* 1 */, assuming no further information about the function foo is available, a can have any value: its value range is [MIN_INT, MAX_INT].
At /* 2 */, we have passed (and evaluated as true) the condition a < 0, and so the known range of the variable can be adjusted accordingly, to: [MIN_INT, 0). Note that the function exits at this point via return, so we do not propagate this range further.
At /* 3 */ on the other hand, the condition evaluated as false, so the range is: [0, MAX_INT]. This comes from taking the range on input to the branch (the if statement), as at /* 1 */, and applying the branch condition (which is a >= 0) to reduce the range. Note that if the other branch, /* 2 */, did not exit the function, we would instead need to union the value ranges from both branches (and so would again have a variable with any possible value).
At /* 4 */ we can adjust the range due to addition of a constant: it becomes [1, MAX_INT]. This assumes overflow is not possible. If we had to account for overflow (with wrapping semantics), the range would instead be represented as [MIN_INT, MAX_INT], aka any value, once more; this particular analysis is always representing the value as a single range between two limits, so it can't do any better than that.
Finally, at /* 5 */, the range is propagated from /* 4 */, and so is [1, MAX_INT]. Note that this makes the condition in the if statement impossible; no values in the range [1, MAX_INT] are less than 0.

Once VRA has done its thing, the dead code elimination (DCE) pass, a transform pass, can proceed to remove the if statement at /* 5 */, since its condition cannot be true.

Astute readers will understand immediately why it can be beneficial for the precision of the analysis to assume no overflow during VRA: it allows retaining a significantly reduced value range, which can benefit later optimisation passes. In this case, it was unfortunate that this resulted in an (incorrect) check designed to detect overflow being optimised away, but it's important to understand that at no point was there any kind of attempt by the compiler to exploit a specific case of undefined behaviour that it detected.

Now, I must point out that range analysis (VRA) and the exact sequence I outlined above aren't the only way that an incorrect bounds check could be eliminated: perhaps there are others that do more directly decide that some execution path contains UB and so the condition that path is predicated on must be false. The VRA example shows however how simply assuming the absence of UB, without actually detecting its presence, can result in optimisations with unexpected effects in case UB does occur. Althought the example isn't one of useful optimisation, I hope it's clear how the technique is at least hypothetically useful for real optimisations (such as eliminating impossible branches) without UB necessarily being involved.

This compiler behaviour of inadvertently removing incorrectly coded attempts to check for overflow has been around for a long time, but there are people even today who rail quite hard against it. Typical arguments have included:

UB was never meant to allow that;
That shouldn't be UB because something as trivial as overflowing an integer value shouldn't be able to compromise my system;
That shouldn't be UB because there is no real advantage to making it be UB (it doesn't really help optimisation) and because UB could compromise my system.

The first argument is hardly worth wasting time on; even if it were true, the ship has already sailed.

I think the best argument of these is the 3rd (I will get to the 2nd shortly), though even for that one the VRA/DCE example I gave illustrates how having overflow be UB can be helpful to optimisation; it's difficult to judge from this artificial example, though, how important this is when applied to real code. The problem is that optimisation is very complex and involves a range of passes, some of which may run multiple times, and for which the interactions between are significant. It is entirely conceivable that VRA leading to dead branch elimination would further improve a second VRA pass, and so on several times over, possibly leading to VRA being able to determine a single value rather than a range for some set of values, allowing constant folding to occur, and so on, reducing even a large function to just a few instructions. It would be easy enough, I am sure, to construct an(other) artificial example demonstrating this, but the question remains as to whether this ever really happens in real applications (or perhaps the question is really: does it happen often enough to justify this optimisation in face of the known potential pitfalls of performing it)? I suspect the answer, especially in C code, is "not really". I am less certain for C++, where there can be a tendency to use metaprogramming techniques and rely heavily on optimisation to remove abstraction costs.

Regarding the notion that a trivial integer overflow shouldn't result in potential system compromise, the issue I take is that even with integer overflow having undefined behaviour, you're unlikely to see an outcome where this results in such a vulnerability—with the obvious exception that we've already covered, where optimsation nullifies an incorrectly coded attempt to check for overflow (because this is sometimes used to check validity before using a value as in index into an array or buffer, and an out-of-bounds access can most definitely lead to a security hole). I suspect compilers could do a better job warning about such cases, and that would presumably alleviate the problem, but I'm also reasonably sure that vulnerabilities caused by the particular problem of silently "disabled" overflow checks are extremely rare (I will circle back to this in a bit). But really, other than in the theoretical case of overflow checks being disabled, integer overflow is not a vulnerability-introducing type of UB. And, indeed, most forms of UB are not. I mentioned earlier:

There are a heap of possible UBs—a C11 language spec draft that I referred to while writing this listed well over 100 different possible causes

... and this is something that UB doom prophets will loudly lament: there are so many UBs, and the list keeps growing with each revision of the language, that avoiding UB is impossible. Indeed, I have seen claims that the ever increasing UB will inevitably lead to programs which were previously UB-free suddenly being prone to it. This lament comes, I think, from a failure to really understand where the "new" explicit UB is coming from. The list of UBs in the language specification was never really exhaustive; the first example, in fact, of something that causes a program to exhibit UB is:

A "shall" or "shall not" requirement that appears outside of a constraint is violated

Note: a "constraint" in C-standardese is essentially a rule which is imposed on the implementation (i.e. a rule which specifies semantics of the language). So a requirement "outside of a constraint" is a rule imposed instead on the program(mer).

This immediately, implicitly, expands to a whole slew of clauses detailing constructs that exhibit UB, some of which are probably also explicated in the list and some of which aren't. But nearly all such clauses, and nearly all specific instances of new UB, relate to new functionality in the newer revision of the language. In C11, for example, the formal addition of support for concurrency (including threads and atomic operations) was introduced, along with some other new features, and with these came a bunch of additions to the list of UB in the annex. This wasn't new undefined behaviour that could potentially now occur in previously UB-free programs: it was new ways that UB could be present due to (mis)use of previously unavailable features. Examples include:

The execution of a program contains a data race
A member of an atomic structure or union is accessed
A function declared with a _Noreturn function specifier returns to its caller

None of these things had any defined behaviour prior to C11, even though they weren't listed in the C99 annex; they were UB by fundamental definition. They are not, in fact, new forms of UB that can apply to older programs; they are rather forms of UB that apply to newer programs that are using newer language constructs.

In addition, many things that cause UB are anyway harmless (in the sense that they will not introduce vulnerabilities). One example (that probably shouldn't be UB at all) is when "An attempt is made to use the value of a void expression". Unsurprisingly, compilers diagnose this at compile time; there is no risk of this "Undefined Behaviour" having any behavior, at all, at run time. (Unless I'm missing something, this specific case UB could easily be removed if the language committee cared to do so, but I suspect they have more important things to worry about; this one really does not matter).

So how does integer overflow tie into this? Well, I have seen criticisms regarding the undefined behavior of integer overflow in C/C++, even from individuals who clearly recognised that you cannot eliminate UB from C/C++ entirely (at least without making drastic changes to the language, or for individual implementations, significantly impacting performance or requiring special hardware). This claim is usually accompanied by a further claim that defining overflow behaviour as 2's complement wrapping (at least on target architectures whose arithmetic instructions do exactly that) would improve the situation. I refute these claims, and particularly the second; I think that removing this one form of UB from C/C++, by itself, is not worthwhile; defining it instead to have wrapping behaviour, or leaving it implementation-defined (which would probably amount to the same thing), would accomplish very little.

I have mentioned already that I don't believe that the UB-nature of integer overflow by itself leads to vulnerabilities. What does lead to vulnerabities is 2's complement wrapping behavior on overflow, which is the typical actually observable behaviour of integer overflow, even though the behaviour is actually undefined by the language specification; and which, oddly enough, is also a commonly nominated behaviour of those who argue that UB-on-overflow is problematic. The reason that wrapping causes vulnerabilities is because it leads to out-of-bounds access (eg buffer overflow). The latter is UB itself, regardless, and it doesn't matter at all whether the bad buffer index value was produced via a defined wrapping overflow or not. The only real difference is that in the latter case (UB on overflow), a present but incorrectly coded check for overflow can be ineffective in the presence of a compiler optimisation pass which assumes that UB will not be present, as I've already discussed. This case is actually very rare, however; in most actual cases of actual vulnerability, integer overflow that leads to buffer overflow is completely unaffected by whether the overflow is technically UB or whether it is specifically defined to use 2's complement wrapping. In short, most integer overflow vulnerabilities are caused by a complete lack of handling of overflow, in combination with 2's complement wrapping (whether as a result of defined or undefined behavour), not by incorrect coding of such handling.

As evidence of this, on searching for integer-overflow related CVEs, it is immediately apparent that the vast majority of these CVEs are caused by wrapping behaviour that incidentally results, and not from any other possible behaviour (note that many of the CVE records resulting from the query do not provide any information to make certain of this, and most of them seem to refer to cases of unsigned overflow which is not UB in any case and in fact already has wrapping behaviour proscribed; I'll still consider those as evidence, because they are still examples of where having a defined behaviour for overflow did not prevent a vulnerability).

Some examples:

CVE-2024-22211, an unsigned multiplication overflow in FreeRDP, leading to buffer overflow.
CVE-2023-52339, an unsigned multiplication overflow in libebml, leading to buffer overflow.
CVE-2023-51714, a potential addition overflow in Qt's HTTP2 implementation. Note that, although the MR comment suggests the existing check is an incorrect check for overflow, it appears to me that it was not intended as an overflow check at all, i.e. the problem is really a missing overflow check, not an incorrect overflow check. This is borne out by the fact that the patch leaves the existing check in place, and adds an additional, separate, check for the overflow condition.
CVE-2023-47996, an unsigned addition overflow in FreeImage.
CVE-2023-4722, a multiplication overflow without check in GPAC (a multimedia framework).
CVE-2023-45681, an unsigned multiplication overflow in st_vorbis, leading to buffer overflow.
CVE-2023-40745, integer overflow in libTiff leading to buffer overflow. Probably refers to this, in which case it's an unchecked unsigned multiplication overflow.

Judging from that list, it's not the case that overflow-related vulnerabilities are being caused by existing attempted overflow checks being nullified by optimisation, and it's not the case that defining (signed) integer overflow to have wrapping behaviour would have prevented these CVEs, even if all of them were using signed arithmetic (the majority actually use unsigned arithmetic, which is perhaps unsurprising, since buffer indexes are naturally unsigned). The majority of issues are simply missing overflow checks. In short, the UB-nature of signed integer overflow is not causing vulnerabilities; 2's complement wrapping is in fact the culprit. (This wasn't an exhaustive study, obviously; I'll leave that for whoever wants to prove me right or wrong).

Note: for reference, the CVE at the top of the list when I search is CVE-2024-22211, the issue in FreeRDP involving unsigned integer overflow.

Some will argue that future compiler versions may change this, by implementing more advanced analyses and optimisations which are even more able to exploit the presence of UB. After all, this has happened historically, as we discussed earlier, and strictly speaking, UB allows anything to eventuate. But, there's no clear evidence I can see of any ongoing trend for compiler optimisation advances to continue creating new problems for old code. As well, I'm dubious that there's really much scope left for the kind of such advances that would cause these issues. (Of course, this is still ultimately an unknown, but I have enough of a handle on the state-of-the-art in program analysis to feel comfortable in having this scepticism, and regardless also doubt that most integer overflow actually presents any real opportunity for optimisation).

Fortunately for those that believe the benefit to optimisation of having undefined behaviour on overflow is minimal or non-existent and that the risk of the presence of UB is too great, compiler options are generally available to produce a defined result instead. GCC and Clang both support -fwrapv, and even give the option of trapping instead via -ftrapv. I hope it's clear now why the latter is the better option (though if using GCC I would recommend instead using -fsanitize=signed-integer-overflow -fsanitize-undefined-trap-on-error rather than -ftrapv, since it is produces much better code which still traps on overflow). If, for whatever reason, you cannot use—or refuse to use—these options: I hope I've made it clear that having integer overflow defined with wraparound semantics wouldn't help.

Enter Titular Character

Once people begin to understand what UB actually means they begin to be really concerned about the ramifications. I know, because I've been there; the realisation that there are no requirements at all on the compiler leads quickly to the realisation that, when UB is present, really anything can happen. I jokingly referred to "UB Prophets of Doom" earlier, but I fully understand the notion that UB is a scary concept—a bogeyman. I also understand the concern that something so easy to overlook as integer overflow can introduce it.

But, I've moved past that concern. The danger from most forms of UB, generally, is actually insignificant. Yes, there are a number of unexpected behaviours that have been seen to occur, which may trip us up before we become very familiar with the rules of the language. But, despite the hypothetically infinite range of results, the typical effects of most UB are pretty constrained. I always try to write UB-free code, but if it turns out there's a potential integer overflow in code that I've written, I'm not going to be too upset. Unless, of course, it opens up the potential for buffer overflow; I recommend compilation with options that will cause trap on overflow. (To be clear, I will of course fix any such bug that is found in my code).

The code I've written in C++ mostly has limited communication with untrusted entities (via deliberate design) and so I'm not even overly concerned about buffer overflows. And, to be honest, if a little arrogant, I mostly trust myself to write code that's free of them anyway. But, I'm not infallible, and now, if I had to write something that had to process untrusted input, I'd probably not choose C++—because buffer overflow is the real villian in this show, along with other out-of-bounds access and use-after-free. Unfortunately we're not going to see those have defined behaviour in C/C++ any time soon. There are tools, though—sanitizers and static analyzers—that are available to help avoid and mitigate even those problems and I would strongly suggest that if, like me, you still find yourself writing (or running) code in these languages, that you make use of those tools.

The ultimate answer though, if the Undefined Behaviour Bogeyman still haunts your nightmares: yeah, probably just use Rust (but stay clear of unsafe!), or whatever other memory-safe language that suits your needs.

Thanks and gratitude to Laurie Tratt and Edd Barrett for reading a draft of this post and providing useful feedback.

Note #1. it was pointed out to me that the term "invoke" when discussing UB is misleading, because to say that something "invokes UB" implies that the effects of UB in a program will only be visible after the "invoking" action. This is not the case, and so I've avoided this term. Regardless of the precise wording I've used in individual places, note that the effect of UB is not guaranteed to be invisible before the effects of code with well-defined behaviour that precedes it in execution order as it appears in the source code.

Surrounded By Bugs

The Undefined Behaviour Bogeyman

Introduction

The overflow issue (again...)

Enter Titular Character