Dear lazyweb,
In an effort to improve the performance of nbd-server, I wrote a patch to make it use some common sendfile() implementations (specifically, the Linux- and FreeBSD-style sendfile() calls). Unfortunately, when I test the Linux version (I haven't tested the FreeBSD version yet), the server outputs some garbage in front of the actual data it needs to send; as a result, the client obviously can't make heads or tails of it, and the connection is dropped.
However, when I run it inside gdb, everything is fine. When I run the server under strace, I don't see any obvious errors. I tried using the DODBG version of the server (see the code for details), but that didn't help me.
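For reference, the two sendfile() flavours have different signatures: on Linux it is sendfile(out_fd, in_fd, &offset, count) and returns the byte count, while on FreeBSD it is sendfile(fd, s, offset, nbytes, hdtr, &sbytes, flags) and reports the count through sbytes. Below is a minimal sketch of a wrapper that hides the difference. The name send_file_range() is hypothetical (it is not nbd-server's backend_send()), the FreeBSD branch is untested, and the pread()/write() fallback for other systems is an assumption. Both kernels may send fewer bytes than requested, so the sketch loops.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#if defined(__linux__)
#include <sys/sendfile.h>
#elif defined(__FreeBSD__)
#include <sys/socket.h>
#include <sys/uio.h>
#endif

/* Hypothetical helper: send len bytes of file fh, starting at offset,
 * to descriptor net. Returns 0 on success, -1 on error (errno set). */
static int send_file_range(int fh, int net, off_t offset, size_t len)
{
    while (len > 0) {
#if defined(__linux__)
        off_t pos = offset;                 /* kernel advances pos, not offset */
        ssize_t n = sendfile(net, fh, &pos, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0) {                       /* file shorter than expected */
            errno = EIO;
            return -1;
        }
        offset += n;
        len -= (size_t)n;
#elif defined(__FreeBSD__)
        /* Untested sketch: FreeBSD reports progress through sbytes. */
        off_t sent = 0;
        int rc = sendfile(fh, net, offset, len, NULL, &sent, 0);
        if (rc < 0 && errno != EINTR && errno != EAGAIN)
            return -1;
        if (rc == 0 && sent == 0) {
            errno = EIO;
            return -1;
        }
        offset += sent;
        len -= (size_t)sent;
#else
        /* Assumed fallback: plain pread() + write() through a bounce buffer. */
        char buf[65536];
        size_t chunk = len < sizeof buf ? len : sizeof buf;
        ssize_t n = pread(fh, buf, chunk, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0) {
            errno = EIO;
            return -1;
        }
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = write(net, buf + done, (size_t)(n - done));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;
        }
        offset += n;
        len -= (size_t)n;
#endif
    }
    return 0;
}
```

On Linux, passing an offset pointer means sendfile() leaves the file position of fh untouched, so no prior seek is needed on that path.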
At this point, I'm pretty clueless as to what is going on. If anyone were to give me some hints or pointers, I would be eternally grateful.
Hi,
I would be wary of taking the address of a function argument:
ssize_t backend_send(int fh, int net, off_t offset, size_t len) {
        return sendfile(net, fh, &offset, len);
}
That &offset... maybe the optimizer is playing foul games on you? A simple check would be to copy it into a local variable first.
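A minimal, Linux-only sketch of the local-copy check suggested above: copy offset into a local before taking its address, then confirm the kernel advanced the copy by exactly the number of bytes sendfile() reports. A mismatch would point at something clobbering the value. The stderr message is an illustrative addition, not something nbd-server prints.

```c
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/types.h>

/* Linux-only sketch: never hand the kernel the address of the argument
 * itself; use a local copy and sanity-check how far it moved. */
static ssize_t backend_send(int fh, int net, off_t offset, size_t len)
{
    off_t pos = offset;                     /* local copy, as suggested */
    ssize_t n = sendfile(net, fh, &pos, len);
    if (n >= 0 && pos != offset + n)
        fprintf(stderr, "sendfile: offset moved by %lld, returned %zd\n",
                (long long)(pos - offset), n);
    return n;
}
```

This version does not loop, so the caller still has to handle a return value smaller than len.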
Regards -- tomás
I didn't look at it in detail, but this hunk looks wrong:
     DEBUG4("(READ from fd %d offset %Lu len %u), ", fhandle, foffset, len);
     myseek(fhandle, foffset);
-    return read(fhandle, buf, len);
+    return backend_send(fhandle, client->net, a, len);
 }
E.g., you used to read from offset foffset, and now you read from offset a.
If I were faced with this, I'd do five things:
1) Check the file position (via fdopen/ftell) after sendfile; perhaps it isn't positioned properly, and that might tell you something.
2) Remove the myseek(), since sendfile doesn't need the in_fd to be prepositioned; the offset tells it where to start.
3) Try sendfile with myseek and a NULL offset, and see what happens.
4) Run everything through valgrind to see if you have memory corruption.
5) Look at the kernel code.
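Steps 1) and 3) can be combined in a small Linux-only sketch. It reads the position with lseek(fd, 0, SEEK_CUR), which queries the descriptor directly instead of going through fdopen()/ftell(); the helper name sendfile_then_tell() is hypothetical. Per the Linux sendfile(2) man page, with a non-NULL offset the call reads from *offset, advances *offset and leaves the file position alone; with NULL it reads from, and advances, the current file position.

```c
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: call sendfile() and return where the in_fd file
 * position ended up afterwards (or -1 if sendfile failed). */
static off_t sendfile_then_tell(int fh, int net, off_t *offset, size_t len)
{
    ssize_t n = sendfile(net, fh, offset, len);
    if (n < 0) {
        perror("sendfile");
        return -1;
    }
    return lseek(fh, 0, SEEK_CUR);
}
```

If, after a NULL-offset call, the position is not where myseek() left it plus the bytes sent, that is a clue worth chasing.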
These are all just ways to get more clues. Looking at the docs, it seems like you're doing things properly. If so, that would point to a kernel bug; however, the fact that it works correctly under gdb seems to argue against that, and in favor of memory corruption. WAG: perhaps sendfile() doesn't like pointers to variables on the stack (for the off_t)?
On closer inspection, though, it could just be that you're using a to compute foffset via the get_filepos() function and using foffset to seek the file, but then passing a as the argument to sendfile(), which could differ from foffset depending on the value of fi.startoff [see get_filepos()]. Perhaps you meant to use foffset in the call to backend_send() instead?
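To make the suspected mix-up concrete, here is a hedged sketch of a multi-file export. struct part, map_offset() and read_part() are hypothetical stand-ins for nbd-server's file list, get_filepos() and read path, assuming startoff is the export offset at which each file begins. The point is that sendfile() must receive the per-file offset (foffset), not the export-wide offset a; the two only coincide when startoff is 0.

```c
#include <stddef.h>
#include <sys/sendfile.h>
#include <sys/types.h>

/* Hypothetical: export offsets [startoff, startoff + size) live in fh. */
struct part {
    int fh;
    off_t startoff;
    off_t size;
};

/* Hypothetical analogue of get_filepos(): map export offset a to a
 * descriptor, an offset within that file, and the bytes left in it. */
static int map_offset(const struct part *parts, size_t nparts, off_t a,
                      int *fh, off_t *foffset, off_t *avail)
{
    for (size_t i = 0; i < nparts; i++) {
        if (a >= parts[i].startoff && a < parts[i].startoff + parts[i].size) {
            *fh = parts[i].fh;
            *foffset = a - parts[i].startoff;
            *avail = parts[i].size - *foffset;
            return 0;
        }
    }
    return -1;
}

/* Send up to len bytes starting at export offset a (one file at most). */
static ssize_t read_part(const struct part *parts, size_t nparts, int net,
                         off_t a, size_t len)
{
    int fh;
    off_t foffset, avail;
    if (map_offset(parts, nparts, a, &fh, &foffset, &avail) < 0)
        return -1;
    if ((off_t)len > avail)
        len = (size_t)avail;                 /* caller loops for the rest */
    return sendfile(net, fh, &foffset, len); /* foffset, not a */
}
```

With a second file starting at export offset 8, a request at a = 10 must read from offset 2 of that file; passing 10 would read past its end.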
Weird and unfortunate that gdb changes the behavior. Heisenbugs are such fun.
As a wild-ass guess, does anything change if you make sure the data buffer you're sending from is page-aligned?
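If you want to try that guess, a page-aligned buffer can be obtained with posix_memalign() and sysconf(_SC_PAGESIZE), as in this sketch (the helper name alloc_page_aligned() is hypothetical). Keep in mind that sendfile() copies inside the kernel and takes no user buffer, so alignment would mainly matter on the buffer-based read() path.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <unistd.h>

/* Allocate len bytes aligned to the system page size.
 * Returns NULL on failure; release the result with free(). */
static void *alloc_page_aligned(size_t len)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    if (pagesize <= 0)
        return NULL;
    if (posix_memalign(&buf, (size_t)pagesize, len) != 0)
        return NULL;
    return buf;
}
```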