Let us talk a little bit about integer underflow and undefined behaviour in C before we discuss the puzzle I want to share in this post.
#include <stdio.h>
int main()
{
int i;
for (i = 0; i < 6; i--)
printf(".");
return 0;
}
This code invokes undefined behaviour. The value in variable i decrements to INT_MIN after |INT_MIN| iterations. In the next iteration, there is a negative overflow which is undefined for signed integers in C. On many implementations though, INT_MIN - 1 wraps around to INT_MAX. Since INT_MAX is not less than 6, the loop terminates. With such implementations, this code prints |INT_MIN| + 1 dots. With 32-bit integers, that amounts to 2147483649 dots. Here is one such example output:
$ gcc -std=c89 -Wall -Wextra -pedantic foo.c && ./a.out | wc -c
2147483649
It is worth noting that the above behaviour is only one of the many possible ones. The code invokes undefined behaviour and the ISO standard imposes no requirements on a specific implementation of the compiler regarding what the behaviour of such code should be. For example, an implementation could also exploit the undefined behaviour to turn the loop into an infinite loop. In fact, GCC does optimize it to an infinite loop if we compile the code with the -O2 option.
# This never terminates!
$ gcc -O2 -std=c89 -Wall -Wextra -pedantic foo.c && ./a.out
Let us take a look at the puzzle now.
Add or modify exactly one operator in the following code such that it prints exactly 6 dots.
for (i = 0; i < 6; i--)
printf(".");
An obvious solution is to change i-- to i++.
for (i = 0; i < 6; i++)
printf(".");
There are a few more solutions to this puzzle. One of the solutions is very interesting. We will discuss the interesting solution in detail below.
Update on 02 Oct 2011: The puzzle has been solved in the comments section. We will discuss the solutions now. If you want to think about the problem before you see the solutions, this is a good time to pause and think about it. There are spoilers ahead.
Here is a list of some solutions:
for (i = 0; i < 6; i++)
for (i = 0; i < 6; ++i)
for (i = 0; -i < 6; i--)
for (i = 0; i + 6; i--)
for (i = 0; i ^= 6; i--)
The last solution involving the bitwise XOR operation is not immediately obvious. A little analysis is required to understand why it works.
Let us generalize the puzzle by replacing \( 6 \) in the loop with an arbitrary positive integer \( n. \) The loop in the last solution now becomes:
for (i = 0; i ^= n; i--)
printf(".");
If we denote the value of the variable i set by the execution of i ^= n after \( k \) dots are printed as \( f(k), \) then
\[
f(k) =
\begin{cases}
n & \text{if } k = 0, \\
n \oplus (f(k - 1) - 1) & \text{if } k \geq 1
\end{cases}
\]
where \( k \) is a nonnegative integer, \( n \) is a positive integer, and the symbol \( \oplus \) denotes bitwise XOR operation on two nonnegative integers.
Note that \( f(0) \) represents the value of i set by the execution of i ^= n when no dots have been printed yet.
If we can show that \( n \) is the least value of \( k \) for which \( f(k) = 0, \) it would prove that the loop terminates after printing \( n \) dots.
We will see in the next section that for odd values of \( n, \) \[ f(k) = \begin{cases} n & \text{if } k \text{ is even}, \\ 1 & \text{if } k \text{ is odd}. \end{cases} \] Therefore there is no value of \( k \) for which \( f(k) = 0 \) when \( n \) is odd. As a result, the loop never terminates when \( n \) is odd.
We will then see that for even values of \( n \) and \( 0 \leq k \leq n, \) \[ f(k) = 0 \iff k = n. \] Therefore the loop terminates after printing \( n \) dots when \( n \) is even.
We will first prove a few lemmas about some interesting properties of the bitwise XOR operation. We will then use them to prove the claims made in the previous section.
Lemma 1. For an odd positive integer \( n, \) \[ n \oplus (n - 1) = 1 \] where the symbol \( \oplus \) denotes bitwise XOR operation on two nonnegative integers.
Proof. Let the binary representation of \( n \) be \( b_m \dots b_1 b_0 \) where \( m \) is a nonnegative integer and \( b_m \) represents the most significant nonzero bit of \( n. \) Since \( n \) is an odd number, \( b_0 = 1. \) Thus \( n \) may be written as \[ b_m \dots b_1 1. \] As a result \( n - 1 \) may be written as \[ b_m \dots b_1 0. \] The bitwise XOR of both binary representations is \( 1. \)
Lemma 2. For a nonnegative integer \( n, \) \[ n \oplus 1 = \begin{cases} n + 1 & \text{if } n \text{ is even}, \\ n - 1 & \text{if } n \text{ is odd}. \end{cases} \] where the symbol \( \oplus \) denotes bitwise XOR operation on two nonnegative integers.
Proof. Let the binary representation of \( n \) be \( b_m \dots b_1 b_0 \) where \( m \) is a nonnegative integer and \( b_m \) represents the most significant nonzero bit of \( n. \)
If \( n \) is even, \( b_0 = 0. \) In this case, \( n \) may be written as \( b_m \dots b_1 0. \) Thus \( n \oplus 1 \) may be written as \( b_m \dots b_1 1. \) Therefore \( n \oplus 1 = n + 1. \)
If \( n \) is odd, \( b_0 = 1. \) In this case, \( n \) may be written as \( b_m \dots b_1 1. \) Thus \( n \oplus 1 \) may be written as \( b_m \dots b_1 0. \) Therefore \( n \oplus 1 = n - 1. \)
Note that for odd \( n, \) lemma 1 can also be derived as a corollary of lemma 2 in this manner: \[ n \oplus (n - 1) = n \oplus (n \oplus 1) = (n \oplus n) \oplus 1 = 0 \oplus 1 = 1. \]
Lemma 3. If \( x \) is an even nonnegative integer and \( y \) is an odd positive integer, then \( x \oplus y \) is odd, where the symbol \( \oplus \) denotes bitwise XOR operation on two nonnegative integers.
Proof. Let the binary representation of \( x \) be \( b_{xm_x} \dots b_{x1} b_{x0} \) and that of \( y \) be \( b_{ym_y} \dots b_{y1} b_{y0} \) where \( m_x \) and \( m_y \) are nonnegative integers and \( b_{xm_x} \) and \( b_{ym_y} \) represent the most significant nonzero bits of \( x \) and \( y, \) respectively.
Since \( x \) is even, \( b_{x0} = 0. \) Since \( y \) is odd, \( b_{y0} = 1. \)
Let \( z = x \oplus y \) with a binary representation of \( b_{zm_z} \dots b_{z1} b_{z0} \) where \( m_z \) is a nonnegative integer and \( b_{zm_z} \) is the most significant nonzero bit of \( z. \)
We get \( b_{z0} = b_{x0} \oplus b_{y0} = 0 \oplus 1 = 1. \) Therefore \( z \) is odd.
Theorem 1. Let \( \oplus \) denote bitwise XOR operation on two nonnegative integers and \[ f(k) = \begin{cases} n & \text{if } k = 0, \\ n \oplus (f(k - 1) - 1) & \text{if } k \geq 1 \end{cases} \] where \( k \) is a nonnegative integer and \( n \) is an odd positive integer. Then \[ f(k) = \begin{cases} n & \text{if } k \text{ is even}, \\ 1 & \text{if } k \text{ is odd}. \end{cases} \]
Proof. This is a proof by mathematical induction. We have \( f(0) = n \) by definition. Therefore the base case holds good.
Let us assume that \( f(k) = n \) for some even \( k \) (induction hypothesis). Let \( k' = k + 1 \) and \( k'' = k + 2. \)
If \( k \) is even, we get \begin{align*} f(k') & = n \oplus (f(k) - 1) && \text{(by definition)} \\ & = n \oplus (n - 1) && \text{(by induction hypothesis)} \\ & = 1 && \text{(by lemma 1)},\\ f(k'') & = n \oplus (f(k') - 1) && \text{(by definition)} \\ & = n \oplus (1 - 1) && \text{(since \( f(k') = 1 \))} \\ & = n \oplus 0 \\ & = n. \end{align*}
Since \( f(k'') = n \) and \( k'' \) is the next even number after \( k, \) the induction step is complete. The induction step shows that for every even \( k, \) \( f(k) = n \) holds good. It also shows that as a result of \( f(k) = n \) for every even \( k, \) we get \( f(k') = 1 \) for every odd \( k'. \)
Theorem 2. Let \( \oplus \) denote bitwise XOR operation on two nonnegative integers and \[ f(k) = \begin{cases} n & \text{if } k = 0, \\ n \oplus (f(k - 1) - 1) & \text{if } k \geq 1 \end{cases} \] where \( k \) is a nonnegative integer, \( n \) is an even positive integer, and \( 0 \leq k \leq n. \) Then \[ f(k) = 0 \iff k = n. \]
Proof. We will first show by the principle of mathematical induction that for even \( k, \) \( f(k) = n - k. \) We have \( f(0) = n \) by definition, so the base case holds good. Now let us assume that \( f(k) = n - k \) holds good for any even \( k \) where \( 0 \leq k \leq n \) (induction hypothesis).
Since \( n \) is even (by definition) and \( k \) is even (by induction hypothesis), \( f(k) = n - k \) is even. As a result, \( f(k) - 1 \) is odd. By lemma 3, we conclude that \( f(k + 1) = n \oplus (f(k) - 1) \) is odd.
Now we perform the induction step as follows: \begin{align*} f(k + 2) & = n \oplus (f(k + 1) - 1) && \text{(by definition)} \\ & = n \oplus (f(k + 1) \oplus 1) && \text{(by lemma 2 for odd \( n \))} \\ & = n \oplus ((n \oplus (f(k) - 1)) \oplus 1) && \text{(by definition)} \\ & = (n \oplus n ) \oplus ((f(k) - 1) \oplus 1) && \text{(by associativity of XOR)} \\ & = 0 \oplus ((f(k) - 1) \oplus 1) \\ & = (f(k) - 1) \oplus 1 \\ & = (f(k) - 1) - 1 && \text{(from lemma 2 for odd \( n \))} \\ & = f(k) - 2 \\ & = n - k - 2 && \text{(by induction hypothesis).} \end{align*} This completes the induction step and proves that \( f(k) = n - k \) for even \( k \) where \( 0 \leq k \leq n. \)
We have shown above that \( f(k) \) is even for every even \( k \) where \( 0 \leq k \leq n, \) which makes \( f(k + 1) \) odd for every odd \( k + 1. \) This means that \( f(k) \) cannot be \( 0 \) for any odd \( k. \) Therefore \( f(k) = 0 \) is possible only for even \( k. \) Solving \( f(k) = n - k = 0, \) we conclude that \( f(k) = 0 \) if and only if \( k = n. \)
#include <stdio.h>
int main()
{
https://susam.net/
printf("hello, world\n");
return 0;
}
This code compiles and runs successfully.
$ c99 hello.c && ./a.out
hello, world
However, the C99 standard does not mention anywhere that a URL is a valid syntactic element in C. How does this code work then?
Update on 04 Jun 2011: The puzzle has been solved in the comments section. If you want to think about the problem before you see the solutions, this is a good time to pause and think about it. There are spoilers ahead.
The code works fine because https: is a label and // following it begins a comment. In case you are wondering if // is indeed a valid comment in C, yes, it is, since C99. Download the C99 standard, go to section 6.4.9 (Comments) and read the second point which mentions this:
Except within a character constant, a string literal, or a comment, the characters // introduce a comment that includes all multibyte characters up to, but not including, the next new-line character. The contents of such a comment are examined only to identify multibyte characters and to find the terminating new-line character.
Here is a fun puzzle that involves complex type declarations in C:
Without using typedef, declare x as a pointer to a function that takes one argument which is an array of 10 pointers to functions which in turn take int * as their only argument, and returns a pointer to a function which has int * argument and void return type.
Here is a simpler way to state this puzzle:
Without using typedef, declare x as a pointer that is equivalent to the following declaration of x:
typedef void (*func_t)(int *);
func_t (*x)(func_t [10]);
If you want to think about this puzzle, this is a good time to pause and think about it. There are spoilers ahead.
Let me describe how I solve such problems. Let us start from the right end of the problem and work our way to the left end defining each part one by one.
void x(int *)
A function that has int * argument and void return type.
void (*x)(int *)
A pointer to a function that has int * argument and void return type.
void (*x())(int *)
A function that returns a pointer to a function that has int * argument and void return type.
void (*x(void (*)(int *)))(int *)
A function that has a pointer to a function that has int * argument and void return type as argument, and returns a pointer to a function which has int * argument and void return type.
void (*x(void (*[10])(int *)))(int *)
A function that has an array of 10 pointers to functions that have int * argument and void return type as argument, and returns a pointer to a function which has int * argument and void return type.
void (*(*x)(void (*[10])(int *)))(int *)
A pointer to a function that has an array of 10 pointers to functions that have int * argument and void return type as argument, and returns a pointer to a function which has int * argument and void return type.
Here is an example that uses the above pointer declaration in a program in order to verify that it works as expected:
#include <stdio.h>
/* A function which has int * argument and void return type. */
void g(int *a)
{
printf("g(): a = %d\n", *a);
}
/* A function which takes an array of 10 pointers to g()-like functions
and returns a pointer to a g()-like function. */
void (*f(void (*a[10])(int *)))(int *)
{
int i;
for (i = 0; i < 10; i++)
a[i](&i);
return g;
}
int main()
{
/* An array of 10 pointers to g(). */
void (*a[10])(int *) = {g, g, g, g, g, g, g, g, g, g};
/* A pointer to function f(). */
void (*(*x)(void (*[10])(int *)))(int *) = f;
/* A pointer to function g() returned by f(). */
void (*y)(int *a) = x(a);
int i = 10;
y(&i);
return 0;
}
Here is the output of this program:
$ gcc -Wall -Wextra -pedantic -std=c99 foo.c && ./a.out
g(): a = 0
g(): a = 1
g(): a = 2
g(): a = 3
g(): a = 4
g(): a = 5
g(): a = 6
g(): a = 7
g(): a = 8
g(): a = 9
g(): a = 10
The book The C Programming Language, Second Edition has some good examples of complicated declarations of pointers in Section 5.12 (Complicated Declarations). Here are a couple of them:
char (*(*x())[])()
x: function returning pointer to array[] of pointer to function returning char
char (*(*x[3])())[5]
x: array[3] of pointer to function returning pointer to array[5] of char
Here is a C puzzle that involves some analysis of the machine code generated from it followed by manipulation of the runtime stack. The solution to this puzzle is implementation-dependent. Here is the puzzle:
Consider this C code:
#include <stdio.h>
void f()
{
}
int main()
{
printf("1\n");
f();
printf("2\n");
printf("3\n");
return 0;
}
Define the function f() such that the output of the above code is:
1
3
Printing 3 in f() and exiting is not allowed as a solution.
If you want to think about this problem, this is a good time to pause and think about it. There are spoilers ahead.
The solution essentially involves figuring out what code we can place in the body of f() such that it causes the program to skip over the machine code generated for the printf("2\n") operation. I'll share two solutions for two different implementations: one for GCC and one for Visual Studio 2005.
Let us first see step by step how I approached this problem for GCC. We add a statement char a = 7; to the function f(). The code looks like this:
#include <stdio.h>
void f()
{
char a = 7;
}
int main()
{
printf("1\n");
f();
printf("2\n");
printf("3\n");
return 0;
}
There is nothing special about the number 7 here. We just want to define a variable in f() and assign some value to it.
Then we compile the code and analyze the machine code generated for f() and main() functions.
$ gcc -c overwrite.c && objdump -d overwrite.o

overwrite.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <f>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   c6 45 ff 07             movb   $0x7,-0x1(%rbp)
   8:   c9                      leaveq
   9:   c3                      retq

000000000000000a <main>:
   a:   55                      push   %rbp
   b:   48 89 e5                mov    %rsp,%rbp
   e:   bf 00 00 00 00          mov    $0x0,%edi
  13:   e8 00 00 00 00          callq  18 <main+0xe>
  18:   b8 00 00 00 00          mov    $0x0,%eax
  1d:   e8 00 00 00 00          callq  22 <main+0x18>
  22:   bf 00 00 00 00          mov    $0x0,%edi
  27:   e8 00 00 00 00          callq  2c <main+0x22>
  2c:   bf 00 00 00 00          mov    $0x0,%edi
  31:   e8 00 00 00 00          callq  36 <main+0x2c>
  36:   b8 00 00 00 00          mov    $0x0,%eax
  3b:   c9                      leaveq
  3c:   c3                      retq
When main() calls f(), the microprocessor saves the return address (where the control must return to after f() is executed) on the stack. The line at offset 1d in the listing above for main() is the call to f(). After f() is executed, the instruction at offset 22 is executed. Therefore the return address that is saved on the stack is the address at which the instruction at offset 22 would be present at runtime.
The instructions at offsets 22 and 27 are the instructions for the printf("2\n") call. These are the instructions we want to skip over. In other words, we want to modify the return address in the stack from the address of the instruction at offset 22 to that of the instruction at offset 2c. This is equivalent to skipping 10 bytes (0x2c - 0x22 = 10) of machine code or adding 10 to the return address saved in the stack.
Now how do we get hold of the return address saved in the stack when f() is being executed? This is where the variable a we defined in f() helps. The instruction at offset 4 is the instruction generated for assigning 7 to the variable a.
From the knowledge of how the microprocessor works and from the machine code generated for f(), we find that the following sequence of steps is performed during the call to f():
f() pushes the content of the RBP (base pointer) register into the stack.
f() copies the content of the RSP (stack pointer) register to the RBP register.
f() stores the byte value 7 at the memory address specified by the content of RBP minus 1. This achieves the assignment of the value 7 to the variable a.
After 7 is assigned to the variable a, the stack is in the following state:
Address | Content | Size (in bytes) |
---|---|---|
&a + 9 | Return address (old RIP) | 8 |
&a + 1 | Old base pointer (old RBP) | 8 |
&a | Variable a | 1 |
If we add 9 to the address of the variable a, i.e., &a, we get the address where the return address is stored. We saw earlier that if we increment this return address by 10 bytes, it solves the problem. Therefore here is the solution code:
#include <stdio.h>
void f()
{
char a;
(&a)[9] += 10;
}
int main()
{
printf("1\n");
f();
printf("2\n");
printf("3\n");
return 0;
}
Finally, we compile and run this code and confirm that the solution works fine:
$ gcc overwrite.c && ./a.out
1
3
Now we will see another example solution, this time for Visual Studio 2005.
Like before, we define a variable a in f(). The code now looks like this:
#include <stdio.h>
void f()
{
char a = 7;
}
int main()
{
printf("1\n");
f();
printf("2\n");
printf("3\n");
return 0;
}
Then we compile the code and analyze the machine code generated from it.
C:\>cl overwrite.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.42 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

overwrite.c
Microsoft (R) Incremental Linker Version 8.00.50727.42
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:overwrite.exe
overwrite.obj

C:\>dumpbin /disasm overwrite.obj
Microsoft (R) COFF/PE Dumper Version 8.00.50727.42
Copyright (C) Microsoft Corporation.  All rights reserved.

Dump of file overwrite.obj

File Type: COFF OBJECT

_f:
  00000000: 55                 push        ebp
  00000001: 8B EC              mov         ebp,esp
  00000003: 51                 push        ecx
  00000004: C6 45 FF 07        mov         byte ptr [ebp-1],7
  00000008: 8B E5              mov         esp,ebp
  0000000A: 5D                 pop         ebp
  0000000B: C3                 ret
  0000000C: CC                 int         3
  0000000D: CC                 int         3
  0000000E: CC                 int         3
  0000000F: CC                 int         3
_main:
  00000010: 55                 push        ebp
  00000011: 8B EC              mov         ebp,esp
  00000013: 68 00 00 00 00     push        offset $SG2224
  00000018: E8 00 00 00 00     call        _printf
  0000001D: 83 C4 04           add         esp,4
  00000020: E8 00 00 00 00     call        _f
  00000025: 68 00 00 00 00     push        offset $SG2225
  0000002A: E8 00 00 00 00     call        _printf
  0000002F: 83 C4 04           add         esp,4
  00000032: 68 00 00 00 00     push        offset $SG2226
  00000037: E8 00 00 00 00     call        _printf
  0000003C: 83 C4 04           add         esp,4
  0000003F: 33 C0              xor         eax,eax
  00000041: 5D                 pop         ebp
  00000042: C3                 ret

  Summary

           B .data
          57 .debug$S
          2F .drectve
          43 .text
Just like in the previous objdump listing, in this listing too, the instruction at offset 4 shows where the variable a is allocated and the instructions at offsets 25, 2A, and 2F are the instructions we want to skip, i.e., instead of returning to the instruction at offset 25, we want the microprocessor to return to the instruction at offset 32. This involves skipping 13 bytes (0x32 - 0x25 = 13) of machine code.
Unlike the previous objdump listing, in this listing we see that the Visual Studio I am using is a 32-bit one, so it generates machine code that uses 32-bit registers like EBP, ESP, etc. Thus the stack looks like this after 7 is assigned to the variable a:
Address | Content | Size (in bytes) |
---|---|---|
&a + 5 | Return address (old EIP) | 4 |
&a + 1 | Old base pointer (old EBP) | 4 |
&a | Variable a | 1 |
If we add 5 to the address of the variable a, i.e., &a, we get the address where the return address is stored. Here is the solution code:
#include <stdio.h>
void f()
{
char a;
(&a)[5] += 13;
}
int main()
{
printf("1\n");
f();
printf("2\n");
printf("3\n");
return 0;
}
Finally, we compile and run this code and confirm that the solution works fine:
C:\>cl /w overwrite.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.42 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

overwrite.c
Microsoft (R) Incremental Linker Version 8.00.50727.42
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:overwrite.exe
overwrite.obj

C:\>overwrite.exe
1
3
The machine code that the compiler generates for a given C code is highly dependent on the implementation of the compiler. In the two examples above, we have two different solutions for two different compilers.
Even with the same brand of compiler, the way it generates machine code for a given code may change from one version of the compiler to another. Therefore, it is very likely that the above solution would not work on another system (such as your system) even if you use the same compiler that I am using in the examples above.
However, we can arrive at the solution for an implementation of the compiler by determining what number to add to &a to get the address where the return address is saved on the stack and what number to add to this return address to make it point to the instruction we want to skip to after f() returns.
A particular type of question comes up often in C programming forums. Here is an example of such a question:
#include <stdio.h>
int main()
{
int i = 5;
printf("%d %d %d\n", i, i--, ++i);
return 0;
}
The output is 5 6 5 when compiled with GCC and 6 6 6 when compiled with the C compiler that comes with Microsoft Visual Studio. The versions of the compilers with which I got these results are:
Here is another example of such a question:
#include <stdio.h>
int main()
{
int a = 5;
a += a++ + a++;
printf("%d\n", a);
return 0;
}
In this case, I got the output 17 with both the compilers.
The behaviour of such C programs is undefined. Consider the following two statements:
printf("%d %d %d\n", i, i--, ++i);
a += a++ + a++;
We will see below that in both the statements, the variable is modified twice between two consecutive sequence points. If the value of a variable is modified more than once between two consecutive sequence points, the behaviour is undefined. Such code may behave differently when compiled with different compilers.
Before looking at the relevant sections of the C99 standard, let us see what the book The C Programming Language, Second Edition says about such C statements. In Section 2.12 (Precedence and Order of Evaluation) of the book, the authors write:
C, like most languages, does not specify the order in which the operands of an operator are evaluated. (The exceptions are &&, ||, ?:, and ','.) For example, in a statement like
x = f() + g();
f may be evaluated before g or vice versa; thus if either f or g alters a variable on which the other depends, x can depend on the order of evaluation. Intermediate results can be stored in temporary variables to ensure a particular sequence.
In the next paragraph, they write,
Similarly, the order in which function arguments are evaluated is not specified, so the statement
printf("%d %d\n", ++n, power(2, n)); /* WRONG */
can produce different results with different compilers, depending on whether n is incremented before power is called. The solution, of course, is to write
++n;
printf("%d %d\n", n, power(2, n));
They provide one more example in this section:
One unhappy situation is typified by the statement
a[i] = i++;
The question is whether the subscript is the old value of i or the new. Compilers can interpret this in different ways, and generate different answers depending on their interpretation.
To read more about this, download the C99 standard, go to section 5.1.2.3 (Program execution), and see the second point which mentions:
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects,^{11)} which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (A summary of the sequence points is given in annex C.)
Then go to section 6.5 and see the second point which mentions:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression.^{72)} Furthermore, the prior value shall be read only to determine the value to be stored.^{73)}
Finally go to Annex C (Sequence Points). It lists all the sequence points. For example, the following is mentioned as a sequence point:
The call to a function, after the arguments have been evaluated (6.5.2.2).
This means that in the statement
printf("%d %d %d\n", i, i--, ++i);
there is a sequence point after the evaluation of the three arguments (i, i--, and ++i) and before the printf() function is called. But none of the items specified in Annex C implies that there is a sequence point between the evaluation of the arguments. Yet the value of i is modified more than once during the evaluation of these arguments. This makes the behaviour of this statement undefined. Further, the value of i is being read not only for determining what it must be updated to but also for use as arguments to the printf() call. This also makes the behaviour of this code undefined.
Let us see another example of a sequence point from Annex C.
The end of a full expression: an initializer (6.7.8); the expression in an expression statement (6.8.3); the controlling expression of a selection statement (if or switch) (6.8.4); the controlling expression of a while or do statement (6.8.5); each of the expressions of a for statement (6.8.5.3); the expression in a return statement (6.8.6.4).
Therefore in the statement
a += a++ + a++;
there is a sequence point at the end of the complete expression (marked with a semicolon) but there is no other sequence point before it. Yet the value of a is modified twice before the sequence point. Thus the behaviour of this statement is undefined.
One day we were discussing how we could obfuscate the main() function in C such that the main() function didn't seem to appear in the code. I wrote the following code and posted it to the mailing list:
#include <stdio.h>
#define decode(s,t,u,m,p,e,d) m ## s ## u ## t
#define begin decode(a,n,i,m,a,t,e)
int begin()
{
printf("Stumped?\n");
}
This program compiles and runs successfully. It produces the following output:
Stumped?
The mailing list does not exist anymore. The community disappeared gradually and the mailing list was removed. But the code seems to have survived in the inboxes of some subscribers because if we search the web, we can find many occurrences of this code.
Here is a quick explanation of the code. In C, ## is the preprocessor operator for concatenation. begin() is replaced with decode(a,n,i,m,a,t,e)(), decode(a,n,i,m,a,t,e)() is replaced with m ## a ## i ## n(), and m ## a ## i ## n() is replaced with main(). Thus effectively, begin() is replaced with main().