Awtch
Lessons learned: no unicode in filenames on my blog.
I had the great idea, a few days ago, to write a short blog post about my trip to Cologne, and to give the file a name with a UTF-8 character in it (the ö, to be exact). Except that I forgot how my blog actually works...
I use blosxom to manage my RSS feed and other parts of the blog; but to integrate it properly in my website, I wrote a whole lot of stuff around that. It's not just blosxom:
- All files are in a subversion repository.
- On the server, there is a file:/// checkout of the subversion repository in /var/local/blosxom. Files in this checkout are owned by the www-data user.
- The subversion repository has a rather long post-commit hook:
wouter@samba:/var/lib/svn/blog/hooks$ grep -v '^#' post-commit | wc -l
20
It takes care of updating the checkout in /var/local/blosxom, and then fixes the timestamps on those files based on the timestamp of the original commit in the subversion repository. It's an ugly amalgam of svnlook date, svnlook history, touch and things, but it works. Sort of. Except if I remove a file, or if the svnlook history thing doesn't find the file itself. Finally, it calls blosxom itself, in the mode in which it creates files on disk rather than trying to do CGI output.
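The timestamp-fixing step described above could look roughly like the sketch below. This is not the actual hook; the function names svn_date_to_touch and fix_mtime are made up for illustration, and the sketch assumes GNU touch (for the -d date format) and that the third line of svnlook history output is the most recent revision for the path.

```shell
#!/bin/sh
# Hypothetical sketch, not the real post-commit hook.

# svnlook date prints e.g. "2003-02-22 17:44:49 -0600 (Sat, 22 Feb 2003)";
# drop the parenthesised part so that touch -d can parse the rest.
svn_date_to_touch() {
    printf '%s\n' "$1" | sed 's/ (.*$//'
}

# Set the mtime of a checked-out file to the date of the last commit
# that touched it. Arguments: repository path, path inside the
# repository, file in the checkout.
fix_mtime() {
    repos=$1
    path=$2
    file=$3
    # Skip the two header lines of svnlook history; take the newest revision.
    rev=$(svnlook history "$repos" "$path" | awk 'NR == 3 { print $1 }')
    # If history finds nothing, leave the file alone rather than guessing.
    [ -n "$rev" ] || return 1
    # -c: never create the file if it is missing from the checkout.
    touch -c -d "$(svn_date_to_touch "$(svnlook date -r "$rev" "$repos")")" "$file"
}
```

Using touch -c here means that a file which svn up failed to fetch is not created as an empty post, which is one of the failure modes this kind of hook can run into.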
Apparently, however, subversion treats UTF-8 filenames differently depending on whether you check out from a repository using an http:// or a file:// URL. As a result, there were some issues in the post-commit subversion hook that I wrote, resulting in the svn up part of the post-commit not entirely succeeding. Or some such. Then, the touch is done, and the svn up doesn't work at all anymore. In short, things started to break horribly, resulting in empty posts (because the files were created by touch rather than svn up), the comment thing being confused about filenames (and, as a result, postgres complaining about incorrect UTF-8 encodings), and other similar ugliness.
So, I just renamed the file, and did a cleanup of the subversion checkout in /var/local/blosxom. I'll just have to cope with the fact that my setup doesn't like unicode, I guess. Or, perhaps, finally switch to ikiwiki some day, which I've been thinking about ever since Joey first blogged about it a few years ago. But that's not urgent...
Hello,
I would actually suspect it's not the difference between http:// and file:// access that's screwing up filenames here. Subversion, in its infinite wisdom, stores filenames in unicode and converts them to/from the local encoding.
Normally, locale environment variables are set from pam_env or the shell's global rc file. So it's possible they are not set in the post-commit hook. And hell knows what filesystem encoding subversion assumes with the C locale, but it might not be anything sensible. I would expect it to use ascii and screw up the non-ascii names, or iso-8859-1 in a misguided attempt at backward compatibility.
Note: I'd only override LC_CTYPE to avoid possible damage caused by parsed output of some tool suddenly becoming localized or similar collateral damage from locales.
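If the missing locale is indeed the cause, the override could be sketched in the hook as follows. The locale name en_US.UTF-8 is an assumption (it has to be one that actually exists on the server, see locale -a), and set_utf8_ctype is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical fragment for the top of the post-commit hook.

# Set only LC_CTYPE, so that filename encoding becomes UTF-8 while
# messages, dates and number formats stay in the default C locale.
set_utf8_ctype() {
    # en_US.UTF-8 is an assumed default; pass another locale name to override.
    LC_CTYPE=${1:-en_US.UTF-8}
    export LC_CTYPE
}

set_utf8_ctype
# ...followed by the existing update step, for example:
# svn update /var/local/blosxom
```

Note that if LC_ALL happens to be set in the hook's environment, it takes precedence over LC_CTYPE, so this only helps when LC_ALL is unset.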