Broken C code

A few weeks back, someone asked on debian-68k about an unexpected result of some piece of software when it was compiled on ARAnyM, an m68k emulator that can run Linux.

His initial idea was that this could perhaps be a bug in ARAnyM, since it only occurred inside ARAnyM, and not on any of the other architectures he tried. In fact, it was a bug in his code.

Let's have a look at what's going on here. The example program which Sergei provided and which exhibits the problematic behaviour is spread over two files. The first file contains this:

#include <stdio.h>

void *f();

main() {
	void *a;
	a = f();
	printf("%d\n", (long)a);
}
while the second contains this:
#include <stdio.h>

long f();

long f() {
	long a = 84;
	printf("%d\n", a);
	return a;
}

Many people familiar with the C programming language will intuitively expect that this program, when run, will output the following:

84
84

In fact, when compiled and run on an m68k machine, the output is as follows:

84
-1072577264

What's going on here? In order to understand that, we have to take a look at the assembly code. Running "objdump -d" on the binary gives (amongst other things) the following output:

800004c4 <main>:
800004c4:	4e56 0000	linkw %fp,#0
800004c8:	61ff 0000 001a	bsrl 800004e4 <f>
800004ce:	2f08		movel %a0,%sp@-
800004d0:	4879 8000 05aa	pea 800005aa <_IO_stdin_used+0x4>
800004d6:	61ff ffff feec	bsrl 800003c4 <printf@plt>
800004dc:	508f		addql #8,%sp
800004de:	4e5e		unlk %fp
800004e0:	4e75		rts
800004e2:	4e75		rts

800004e4 <f>:
800004e4:	4e56 0000	linkw %fp,#0
800004e8:	4878 0054	pea 54 <_init-0x800002f0>
800004ec:	4879 8000 05aa	pea 800005aa <_IO_stdin_used+0x4>
800004f2:	61ff ffff fed0	bsrl 800003c4 <printf@plt>
800004f8:	7054		moveq #84,%d0
800004fa:	4e5e		unlk %fp
800004fc:	4e75		rts
800004fe:	4e75		rts

So what's happening here?

  • main acquires some stack space, and then (0x800004c8) immediately branches to a subroutine 26 bytes ahead: our function f.
  • f also acquires stack space (0x800004e4), then pushes two values on the stack with pea (...4e8 and 4ec), and calls the printf function (4f2). The first of these two values is not really an address but the constant 84 (0x54) that is hardcoded in the binary; the second is the address of our format string.
  • f then moves the constant value 84 to the D0 register (...4f8), frees its stack space (4fa), and returns (4fc; the second rts can be ignored, that's a harmless quirk in the compiler).
  • main now copies whatever is in the A0 register to the stack (4ce), pushes the exact same format string to the stack (4d0), and again calls the printf function (4d6; the difference in hexadecimal values between the two calls is due to the fact that the bsr opcode uses program-counter-relative addressing).
  • Finally, main cleans up after itself (4dc and 4de), and returns.
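
The two calls to printf are encoded differently even though they reach the same function, and the arithmetic can be checked by hand. The following is a minimal sketch in C, assuming that the displacement stored in a bsr instruction is added to the address of the instruction plus two (the location just past the opcode word). The helper names bsr_target and check_bsr_targets are made up for this illustration; the addresses and displacements are copied from the objdump listing above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compute where a bsr instruction lands.
 * Assumption: the displacement is relative to the instruction's own
 * address plus 2. Unsigned arithmetic wraps modulo 2^32, so a negative
 * displacement in two's complement (such as 0xfffffeec) works too. */
uint32_t bsr_target(uint32_t insn_addr, uint32_t disp)
{
	return (uint32_t)(insn_addr + 2u + disp);
}

/* Check the three bsrl instructions from the listing. */
void check_bsr_targets(void)
{
	/* main calling f: 26 bytes (0x1a) ahead */
	assert(bsr_target(0x800004c8u, 0x0000001au) == 0x800004e4u);
	/* main calling printf: displacement is -0x114 */
	assert(bsr_target(0x800004d6u, 0xfffffeecu) == 0x800003c4u);
	/* f calling printf: displacement is -0x130 */
	assert(bsr_target(0x800004f2u, 0xfffffed0u) == 0x800003c4u);
}
```

Both calls to printf resolve to 0x800003c4; the encoded values differ only because each displacement is measured from a different starting address.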

The difference should be clear: the f function stores its result value in the D0 register, but main goes looking for it in A0.

The reason for this discrepancy is very simple. The m68k processors have three sets of registers: one set is for integer values (D0 through D7), one set is for address values (A0 through A7), and one is for floating-point values (FP0 through FP7). The m68k ABI specifies that integer return values should be stored in integer registers, that address return values should be stored in address registers, and that floating-point return values should be stored in floating-point registers. This makes sense, since register-indirect addressing modes require the address to be stored in an address register, and calculating values requires the value to be stored in either a floating-point or integer register. So when main looked for a return value in A0, it found something, but obviously not what it should have found...
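
As a rough illustration of that mismatch, here is a toy model in C. It is only a sketch and not how registers are really accessed from C: struct toy_regs and all function names are invented for this example, and only the two registers that matter here are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the two return registers involved; not real machine state. */
struct toy_regs {
	uint32_t d0;	/* where an integer return value is placed */
	uint32_t a0;	/* where an address return value is placed */
};

/* The callee was compiled as "long f()": it leaves its result in d0
 * and does not touch a0 at all. */
void toy_f_returning_long(struct toy_regs *r)
{
	r->d0 = 84;
}

/* A caller compiled against "void *f()" fetches the result from a0,
 * so it gets whatever happened to be there before the call. */
uint32_t toy_caller_expecting_pointer(struct toy_regs *r)
{
	toy_f_returning_long(r);
	return r->a0;
}

/* A caller compiled against "long f()" fetches the result from d0,
 * which is where the callee actually put it. */
uint32_t toy_caller_expecting_long(struct toy_regs *r)
{
	toy_f_returning_long(r);
	assert(r->d0 == 84);	/* sanity check for this sketch only */
	return r->d0;
}
```

If a0 holds some leftover value before the call, the first caller returns that leftover value and the second returns 84, which mirrors the two lines of output seen on m68k.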

This issue would have been a non-issue had this program been written the way it should have, like so:

#include <stdio.h>
void* f();
int main() {
	long* a;
	a=(long*)f();
	printf("%ld\n", *a);
	return 0;
}

void* f() {
	static long a=84;
	printf("%ld\n", a);
	return &a;
}

To wrap up: there's a reason why compilers emit warnings if you do something strange or "clever". In this particular case, the warning was suppressed, since both files contained a declaration of an f() function, even though the two declarations differed. This is a horrible hack that tries to work around those warnings, rendering them utterly useless.

Please, pretty please, with sugar on top: don't do something like that. If you write C code, please compile it with -Wall -Werror, and make sure it compiles that way. If you want to access a function in a different file, create a common header file that both files #include, so that the compiler can notice differences between the declaration and the definition of a function. And don't expect that something will work because you fixed it for a common and well-known case, because there will often be other places where your bug still triggers. (In this case, by declaring the function as one returning a long value, they made sure it could not break on 64-bit architectures. However, as shown above, that doesn't make the bug go away...)
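
A minimal sketch of the common-header layout, squeezed into one listing so that it can be compiled as is. The file name f.h and the function call_f are hypothetical; in a real project the three marked parts would be separate files, with both .c files including the header.

```c
#include <assert.h>
#include <stdio.h>

/* ---- part 1: would live in a shared header, say f.h (hypothetical) ---- */
long f(void);	/* the single declaration that everybody sees */

/* ---- part 2: would live in the file that defines f and includes f.h ---- */
long f(void)
{
	long a = 84;
	printf("%ld\n", a);
	return a;
}

/* ---- part 3: would live in the calling file, which also includes f.h ---- */
long call_f(void)
{
	long a = f();		/* same type as the declaration, no cast needed */
	assert(a == 84);	/* sanity check for this sketch only */
	printf("%ld\n", a);
	return a;
}
```

Had the calling file carried its own "void *f();" next to this header, the compiler would have rejected the conflicting declarations instead of silently building a broken binary.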

Writing clean and portable code is way more fun. Trust me.