UINT32_C Macro Considered Harmful

The C99 family of INTN_C and UINTN_C macros fills a real gap in the language, but it also lays extremely nasty traps for the unwary. The evolution of how the macros are defined in the C99 language standard shows that they were poorly defined from the beginning.

Consider code such as this:

unsigned x = 1024 / UINT32_C(1 << 8);

Depending on how the UINT32_C macro is implemented, the value of x might end up being 262144 or 4, or perhaps something different altogether.

That’s not how the macro is supposed to be used, you say? That may be true, but then why does the compiler not complain? No error, not even a warning. Or were the language designers really so foolish as to expect every programmer to know the entire text of the Standard by heart? Say it ain’t so…

What Actually Happens

Different compilers define the UINTN_C macros very differently. For example, Microsoft’s Visual Studio version 10 SP1 defines UINT32_C as follows:

#define UINT32_C(x)     ((x) + (UINT32_MAX - UINT32_MAX))

On the other hand, GCC (mingw32 version 3.3.3) defines it like this:

#define UINT32_C(val) val##UL

That’s a very different definition. One of the key differences is that Microsoft puts parentheses around the macro argument, and GCC does not. That explains the behavior of the initial example—if the argument is an expression, operator precedence may cause different results with vs. without parentheses, yet both versions of the macro expand to perfectly valid C.
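To make the difference concrete, here is a minimal sketch that places both styles of definition side by side. The names PAREN_UINT32_C and PASTE_UINT32_C are made up for this illustration so that the two variants can coexist in one file, the first one substitutes a plain 0xFFFFFFFFu for UINT32_MAX, and the code assumes an int of at least 32 bits.

```c
#include <assert.h>

/* Hypothetical stand-in for the parenthesizing style of definition. */
#define PAREN_UINT32_C(x)   ((x) + (0xFFFFFFFFu - 0xFFFFFFFFu))

/* Hypothetical stand-in for the suffix-pasting style of definition. */
#define PASTE_UINT32_C(val) val##UL

/* Expands to 1024 / ((1 << 8) + 0u), i.e. 1024 / 256. */
unsigned with_parens(void)
{
    return 1024 / PAREN_UINT32_C(1 << 8);
}

/* Expands to 1024 / 1 << 8UL, which parses as (1024 / 1) << 8UL. */
unsigned with_paste(void)
{
    return 1024 / PASTE_UINT32_C(1 << 8);
}

/* The same source expression yields two different values. */
void check_both_styles(void)
{
    assert(with_parens() == 4u);
    assert(with_paste() == 262144u);
}
```

In the pasting variant, the ## operator glues the UL suffix onto the last token of the argument only, and division binds tighter than the shift, so the division happens first.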

The Problem

The reason why the UINT32_C macro is harmful in the example at the beginning of this article is that its usage violates the language definition, yet the violation is subtle enough that it is impossible for any normal compiler to detect.

The crux of the problem is that the C preprocessor is a relatively simple language which is intentionally separate from the C language proper, and may be implemented as a standalone program. But that separation means the preprocessor has no semantic information, and that is why normal language tools can’t provide any diagnostics.

There is no remotely portable way to express a constraint such as “the UINT32_C macro takes a single argument which is an integer literal”. To the preprocessor, it’s all just text.

On the other hand, by the time the compiler proper (which has quite a bit of semantic information) gets to process the source code, the preprocessing phase is finished and there’s no trace of any macros.

The Confusion

Looking at the revisions of the C99 standard, it is obvious that the UINTN_C macros were troubled from the very beginning. The text in the Standard defining these macros (7.18.4, Macros for integer constants) changed in two out of three Technical Corrigenda (TCs).

TC1 reflects the fact that before the ink was even dry on the official C99 standard (in October 1999), Douglas A. Gwyn had already noticed that the macros could not be implemented as specified because typical implementations do not support integer constants smaller than int (such as char or short).
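The consequence can be observed, as a rough sketch, by checking the size of the constant that UINT8_C produces. The function names below are hypothetical, and the expectation that the constant has the size of int is an assumption that holds for libraries which define UINT8_C without a cast; it is not guaranteed for every implementation ever shipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the type uint8_t itself: 1 wherever the type exists. */
size_t size_of_uint8_type(void)
{
    return sizeof(uint8_t);
}

/* Size of the constant produced by UINT8_C. There is no suffix that
   yields a constant narrower than int, so the result is assumed here
   to have the (promoted) type int. */
size_t size_of_uint8_constant(void)
{
    return sizeof(UINT8_C(200));
}

/* Assumption: the library defines UINT8_C without a cast. */
void check_uint8_constant(void)
{
    assert(size_of_uint8_type() == 1);
    assert(size_of_uint8_constant() == sizeof(int));
}
```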

TC3 solved another problem, which is reflected in the difference between Microsoft’s and GCC’s definition of UINT32_C. Prior to TC3, the Standard said: “The argument in any instance of these macros shall be a decimal, octal, or hexadecimal constant […]”. That wording does not rule out a suffixed constant. The trouble with GCC’s definition is that when the argument is, for example, 1UL, it expands to 1ULUL, which is not a valid constant. Microsoft’s implementation, on the other hand, would still work.
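The double suffix is easy to see without triggering a compile error by stringizing the result of the paste. This is only a sketch: PASTE_UL is a hypothetical stand-in for a suffix-pasting UINT32_C, and STR/XSTR are the usual two-level stringizing helpers, not anything taken from a real header.

```c
#include <assert.h>
#include <string.h>

#define STR(x)  #x
#define XSTR(x) STR(x)          /* expands its argument, then stringizes */

/* Hypothetical stand-in for a suffix-pasting UINT32_C. */
#define PASTE_UL(val) val##UL

/* Unsuffixed argument: the paste produces the valid constant 1UL. */
const char *spelling_unsuffixed(void)
{
    return XSTR(PASTE_UL(1));
}

/* Suffixed argument: the paste produces 1ULUL, which the compiler
   proper would reject if it were used as a constant. */
const char *spelling_suffixed(void)
{
    return XSTR(PASTE_UL(1UL));
}

void check_spellings(void)
{
    assert(strcmp(spelling_unsuffixed(), "1UL") == 0);
    assert(strcmp(spelling_suffixed(), "1ULUL") == 0);
}
```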

TC3 changed the text as follows: “The argument in any instance of these macros shall be an unsuffixed integer constant […]” (emphasis added). That actually makes the GCC implementation valid. It should be noted that GCC didn’t just make their implementation up—the Standard itself says (in 7.18.4.1) that “UINT64_C(0x123) might expand to the integer constant 0x123ULL”.

Other implementations had trouble too. For example, IBM’s compiler apparently used casts, which weren’t explicitly disallowed in the original Standard but were indirectly outlawed by the following TC1 addition: “Each invocation of one of these macros shall expand to an integer constant expression suitable for use in #if preprocessing directives”.

Users are in turn confused by the fact that the UINTN_C macros are very unusual in that they do not accept expressions as arguments—although such misuse merely produces unexpected results rather than triggering errors.

Missing Rationale

All these problems stem from the fact that the rationale of the UINTN_C macros was (or is) unclear to both implementors and users. The actual C99 Rationale says nothing on the subject, which is likely a significant contributor to the trouble.

Following the TC1 and TC3 changes, it is clear that one major intended use for the macros was preprocessor expressions. That is why the macros cannot use casts, and it is also why there is very limited need for the macros in the first place (since in most places, casts can be used). That’s not to say the macros are useless, only that they have a very specific, limited use.
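A small sketch of that use follows. BUFFER_BYTES, USE_LARGE_BUFFER and CAST_UINT32_C are hypothetical names invented for the example, and the remark about the cast-based variant assumes the usual case where uint32_t is a typedef rather than a macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical configuration constant built with the standard macro. */
#define BUFFER_BYTES UINT32_C(4096)

/* The macro must expand to something the preprocessor can evaluate. */
#if BUFFER_BYTES > UINT32_C(1024)
#define USE_LARGE_BUFFER 1
#else
#define USE_LARGE_BUFFER 0
#endif

/* A cast-based definition could not be used this way:
 *
 *     #define CAST_UINT32_C(x) ((uint32_t)(x))
 *     #if CAST_UINT32_C(4096) > 1024
 *
 * Inside #if the preprocessor knows nothing about types; an identifier
 * that is not a macro is replaced by 0, leaving ((0)(4096)) > 1024,
 * which is a syntax error.
 */

int uses_large_buffer(void)
{
    return USE_LARGE_BUFFER;
}

void check_large_buffer(void)
{
    assert(uses_large_buffer() == 1);
}
```

In ordinary expressions outside #if, a cast such as (uint32_t)4096 does the job just as well, which is why the macros are rarely needed there.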

The original intent was most likely for the macros to be implemented via suffixes, which the Standard explicitly gives as a possible example. Sadly, the Standard was worded so badly that such an implementation was in fact illegal prior to TC3, which also explains the convoluted macros used by Microsoft. This is a reflection of the difficulty the drafters of the Standard face when converting intent and logic to legalese.

The Moral of the Story?

If there is any, it’s as follows: The C preprocessor is a powerful tool, but it needs to be used with great care. Otherwise there’s a risk of creating problems rather than solving them.

2 Responses to UINT32_C Macro Considered Harmful

  1. Joshua Rodd says:

    Quite curious what you’ve been researching on where you ran across this!

  2. Michal Necasek says:

    Nothing really, just regular coding… only then I started researching why different compilers produce completely different results 🙂
