Broken C code
A few weeks back, someone asked on debian-68k about an unexpected result of some piece of software when it was compiled on ARAnyM, an m68k emulator that can run Linux.
His initial idea was that this could perhaps be a bug in ARAnyM, since it only occurred to him inside ARAnyM, and not on any of the other architectures he tried. In fact, it was a bug in his code.
Let's have a look at what's going on here. The example program which Sergei provided and which exhibits the problematic behaviour is spread over two files. The first file contains this:
#include <stdio.h> void *f(); main() { void *a; a = f(); printf("%d\n", (long)a); }while the second contains this:
#include <stdio.h> long f(); long f() { long a = 84; printf("%d\n", a); return a; }
Many people familiar with the C programming language will intuitively expect that this program, when run, will output the following:
84 84
When in fact, when compiled and ran on an m68k machine, the output is as follows:
84 -1072577264
What's going on here? In order to understand that, we have to take a look at the assembly code. This is done by use of "objdump -d", and gives (amongst others) the following output:
800004c4 <main>: 800004c4: 4e56 0000 linkw %fp,#0 800004c8: 61ff 0000 001a bsrl 800004e4 <f> 800004ce: 2f08 movel %a0,%sp@- 800004d0: 4879 8000 05aa pea 800005aa <_IO_stdin_used+0x4> 800004d6: 61ff ffff feec bsrl 800003c4 <printf@plt> 800004dc: 508f addql #8,%sp 800004de: 4e5e unlk %fp 800004e0: 4e75 rts 800004e2: 4e75 rts 800004e4 <f>: 800004e4: 4e56 0000 linkw %fp,#0 800004e8: 4878 0054 pea 54 <_init-0x800002f0> 800004ec: 4879 8000 05aa pea 800005aa <_IO_stdin_used+0x4> 800004f2: 61ff ffff fed0 bsrl 800003c4 <printf@plt> 800004f8: 7054 moveq #84,%d0 800004fa: 4e5e unlk %fp 800004fc: 4e75 rts 800004fe: 4e75 rts
So what's happening here?
- main acquires some stack space, and then (0x800004c8) immediately jumps to a subroutine 26 bytes ahead; our function f.
- f also acquires stack space (0x800004e4), then pushes two addresses on the stack (...4e8 and 4ec), and jumps to the printf function. The first of these two addresses is the constant 84 that is hardcoded in the binary; the second address is our format string.
- f then moves the constant value 84 to the D0 register (...4f8), frees its stack space (4fa), and returns (4fc; the second rts can be ignored, that's a harmless quirk in the compiler).
- main now copies whatever is in the A0 register to the stack (4ce), pushes the exact same format string to the stack (4d0), and again jumps to the printf function (4d6; the difference in hexadecimal values is due to the fact that the bsr opcode uses processor-indirect addressing).
- Finally, main clears up what it's been doing, and exits
The difference should be clear: the f function stores its result value in the D0 register, but main goes looking for it in A0.
The reason for this discrepancy very simple. The m68k processors has three sets of registers: one set is for integer values (D0 through D7), one set is for address values (A0 through A7), and one is for floating-point values (FP0 through FP7). The m68k ABI specifies that integer return values should be stored in integer registers, that address return values should be stored in address registers, and that floating-point return values should be stored in floating-point registers. This makes sense, since register-indirect addressing modes require the address to be stored in an address register; and calculating values requires the value to be stored in either a floating-point or integer register. When main, then, tried to look for a return value in A0, it found something, but obviously not what it should have found...
This issue would have been a non-issue had this function been written the way it should have, like so:
#include <stdio.h> void* f(); int main() { long* a; a=(long*)f(); printf("%Ld\n", *a); return 0; } void* f() { static long a=84; printf("%Ld\n", a); return &a; }
To close up: there's a reason why compilers emit warnings if you do something strange or "clever". In this particular case, the warning was suppressed, since both files contained a declaration of an f() function, even if the declaration was different. This is a horrible hack that tries to work around those warnings, rendering them utterly useless.
Please, pretty please, with sugar on top: don't do something like that. If you write C code, please compile it with -Wall -Werror, and make sure it compiles that way. If you want to access a function in a different file, create a common header file that both will #include, so that the compiler can notice differences between declaration and definition of a function; and don't expect that something will work because you fixed it for a common and well-known case, because there will often still be other places where your bug will still trigger. In this case, by declaring the function as one returning a long value, they made sure it could not break on 64-bit architectures. However, as shown above, that doesn't make the bug go away...)
Writing clean and portable code is way more fun. Trust me.
Odd thing is, back when I was programming on the Amiga (using the same processor family), the C convention was that all return variables were returned in D0, even pointers and floating point. (The exceptions were doubles, in which case D0/D1 was used.)
Perhaps Linux uses a different convention, but I have to say I've never encountered a compiler which behaves the way described in your post.
I always use -Wmissing-prototypes -Wstrict-prototypes. These switches will also force you to mark all functions as static, when you don't prototype them anyway. The -Wstrict-prototypes forces you to fully prototype functions, so:
int foo();
isn't allowed. You must specify:
int foo(void);