cat hello.c
#include <stdio.h>
int main(void) {
printf("Hello World!\n");
return 0;
}
A simple, standard, C program. Compiling and running it shouldn't produce any surprises... except when it does.
./hello
bash: ./hello: No such file or directory
The first time I got that, it was quite confusing. So, strace to the rescue -- I thought. But nope:
strace ./hello
execve("./hello", ["./hello"], [/* 14 vars */]) = -1 ENOENT (No such file or directory)
(output trimmed)
No luck there. What's going on? No, it's not some race condition whereby
the file is being removed. A simple ls -l will show it's there, and
that it's executable. So what? When I first encountered this, I was at a
loss. Eventually, after searching for several hours, I figured it out
and filed it under "surprising things with an easy fix". And didn't
think of it anymore for a while, because once you understand what's
going on, it's not that complicated. But I recently realized that it's
not that obvious, and I've met several people who were at a loss when
encountering this, and who didn't figure it out. So, here goes:
If you tell the kernel to run some application (i.e., if you run one of
the exec system calls), it will open the binary and try to figure out
what kind of application it's dealing with. It may be a script with a
shebang line (in which case the kernel calls the appropriate
interpreter), or it may be an ELF binary, or whatever.
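As a rough illustration of that first step, the sketch below classifies a file by its leading bytes. It assumes head -c and od are available; classify_exec is a made-up helper name, and the kernel's real checks go well beyond a look at the magic number.

```shell
# classify_exec is a hypothetical helper, not a standard tool.
# It looks only at the first four bytes of the file, which is a
# simplification of what the kernel's binary format handlers do.
classify_exec() {
    # Hex dump of the first four bytes, e.g. "7f454c46" for "\177ELF".
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        7f454c46) echo "ELF binary" ;;
        2321*)    echo "script with a shebang line" ;;
        *)        echo "something else" ;;
    esac
}

# Try it on two throwaway files in a temporary directory.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho hi\n' > "$demo_dir/script"
printf '\177ELF' > "$demo_dir/fake-elf"
classify_exec "$demo_dir/script"
classify_exec "$demo_dir/fake-elf"
rm -r "$demo_dir"
```

Pointing classify_exec at ./hello should report an ELF binary, which is exactly why the error discussed here is surprising: the format itself is recognized.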
If it is an ELF binary, the kernel checks if the architecture of the
binary matches the CPU we're running on. If that's the case, it will
just execute the instructions in the binary. If it's not an ELF binary,
or if the architecture doesn't match, it will fall back on some other
mechanism (e.g., the binfmt_misc subsystem could have some emulator
set up to run binaries for the architecture in question, or may have
been set up to run java on jar files, etc). Eventually, if all else
fails, the kernel will return an error:
./hello: cannot execute binary file: Exec format error
"Exec format error" is the error message for ENOEXEC, the error code which the kernel returns if it determines that it cannot run the given binary. This makes it fairly obvious what's wrong, and why.
Now assume the kernel is biarch -- that is, it runs on an architecture which can run binaries for two CPU ISAs; e.g., this may be the case for an x86-64 machine, where the CPU can run binaries for that architecture as well as binaries for the i386 architecture (and its 32-bit descendants) without emulation. If the kernel was built with support for running 32-bit x86 binaries (which is the case for most binary distributions these days), then running i386 ELF binaries is possible, in theory. As far as the kernel is concerned, at least.
So, the kernel maps the i386 binary into memory, ready to jump to its entry point. And here is where it gets interesting: when the binary in question uses shared libraries, the kernel needs to open not only the binary itself, but also the runtime dynamic linker (RTDL), whose path is recorded inside the binary. It is then the job of the RTDL to map the shared libraries into memory for the process to use, before jumping to the binary's code.
But what if the RTDL isn't installed? Well, then the kernel won't find it. The RTDL is just a file, so that means the kernel will return an ENOENT error code -- the error message for which is "No such file or directory".
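The same kind of failure is easy to reproduce without a foreign-architecture binary: a script whose shebang line names a missing interpreter fails in a comparable way, because the file you ran exists but the file the kernel needs in order to run it does not. A minimal sketch, assuming the temporary directory allows execution; the interpreter path is deliberately made up:

```shell
demo_dir=$(mktemp -d)

# A script whose interpreter (a made-up path) does not exist.
printf '#!%s/no-such-interpreter\n' "$demo_dir" > "$demo_dir/script"
chmod +x "$demo_dir/script"

# The script itself is present and executable...
ls -l "$demo_dir/script"

# ...yet running it fails. bash reports "bad interpreter: No such file
# or directory"; other shells may word the error differently.
if "$demo_dir/script"; then
    status=0
else
    status=$?
fi
echo "exit status: $status"

rm -r "$demo_dir"
```

bash is a little more helpful for scripts than for ELF binaries here, since it names the missing interpreter in its message. For a binary with a missing RTDL, you only get the bare error.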
And there you have it. The solution is simple, and so is the explanation; even so, it can be a bit baffling: the system tells you "the file isn't there", without telling you which file is missing. Since you passed it only one file, it's reasonable for you to believe the missing file is the binary you're asking the system to execute. But usually, that's not the problem: the problem is that the file containing the RTDL is not installed, which is just a result of you not having enabled multiarch.
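To find out which file is actually missing, you can ask the binary which RTDL it requests and check whether that path exists. The sketch below assumes readelf from binutils is installed; parse_interp and check_interp are hypothetical helper names, and ./hello stands in for whatever binary refuses to start.

```shell
# Extract the interpreter path from `readelf -l` output read on stdin.
# The relevant line looks like:
#   [Requesting program interpreter: /lib/ld-linux.so.2]
parse_interp() {
    sed -n 's/.*Requesting program interpreter: \(.*\)]$/\1/p'
}

# Report whether the RTDL requested by the binary "$1" is present.
check_interp() {
    # LC_ALL=C keeps readelf's output untranslated so the pattern matches.
    interp=$(LC_ALL=C readelf -l "$1" 2>/dev/null | parse_interp)
    if [ -z "$interp" ]; then
        echo "$1: no interpreter found (static binary, not ELF, or readelf missing)"
    elif [ -e "$interp" ]; then
        echo "$1: interpreter $interp is installed"
    else
        echo "$1: interpreter $interp is MISSING"
    fi
}

# Usage: pass the binary that refuses to start.
if [ -x ./hello ]; then
    check_interp ./hello
fi
```

If the last branch fires, the path it prints is the file the "No such file or directory" message was really about.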
Solution:
dpkg --add-architecture <target architecture>
apt update
apt install libc6:<target architecture>
Obviously you might need to add some more libraries, too, but that's not usually a problem.