Things not to do when validating email addresses
ERROR: WOUTER+TIMHOTEL@GREP.BE is not a valid email address
True, because I entered it in lowercase.
- Email address local parts (the bit before the @) are case sensitive. That means you can't just convert an address to all caps and assume that mail sent to it will still arrive.
- Local parts can be composed of any character from the following list:
  - alphanumeric characters (a-z, A-Z, 0-9), and
  - !#$%&'*+-/=?^_`{}|~.
This means that if you're trying to validate an email address with a regular expression that does more than check that there is exactly one @ in the address, you're almost always wrong.
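As a rough illustration of how little a validator can safely do, here is a minimal Python sketch. The function names `split_address` and `normalize` are made up for this example, and the check that both halves are non-empty is my own small addition on top of the "exactly one @" rule. Lowercasing the domain relies on domain names being case insensitive; the local part is deliberately left alone.

```python
def split_address(address: str) -> tuple[str, str]:
    """Split an address into (local_part, domain) or raise ValueError.

    Hypothetical helper: the only checks are "exactly one @" and
    "neither side is empty" (the second check is an assumption of this
    sketch, not something the text above requires).
    """
    if address.count("@") != 1:
        raise ValueError("expected exactly one '@' in the address")
    local_part, domain = address.split("@")
    if not local_part or not domain:
        raise ValueError("local part and domain must both be non-empty")
    return local_part, domain


def normalize(address: str) -> str:
    """Lowercase the domain only; never change the case of the local part."""
    local_part, domain = split_address(address)
    return f"{local_part}@{domain.lower()}"
```

With this, `normalize("wouter+timhotel@GREP.BE")` leaves the `wouter+timhotel` part exactly as it was typed, `+` included.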
Next time you try to tell me my mail address is invalid, go read the RFC first.
Morons.
git-annex awesomeness
So a few days ago, there was this:
21:24 < wouter> hum.
21:24 < wouter> Anyone know of a tool to manage scanned documents?
21:25 < wouter> the idea being that I can tell this tool "here's a bunch of newly-scanned documents", and it will upload them to a server
21:25 < wouter> and it should allow me to easily find a specific file later on
21:25 < wouter> and I'd also like version control there
21:26 < wouter> and I do _not_ want to download the entire repository of scanned documents on my laptop (that's why I have a server)
21:26 < wouter> and perhaps I'd also like a pony to go with that.
21:29 < wouter> oh, yes, and I do _not_ want a webbrowser as the primary interface (that might be okay to look things up, but not to store stuff)
The answer, as it turned out, was git-annex: a tool to manage files with git, without checking them into git.
What, I hear you say? Yes, that sounds a little weird, doesn't it?
Perhaps it's easiest to explain with a little example.
$ git annex add 2011-11-07-belgacom.pdf
$ ls -l 2011-11-07-belgacom.pdf
lrwxrwxrwx 1 wouter wouter 191 nov 7 14:46 2011-11-07-belgacom.pdf -> ../.git/annex/objects/xx/3F/SHA256-s1537334--c44e1a057e247bfe7c196ac146c8a0ca32096c0b10df6c18fd3f1c2e99ecddbf/SHA256-s1537334--c44e1a057e247bfe7c196ac146c8a0ca32096c0b10df6c18fd3f1c2e99ecddbf
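Judging from the listing above, the object name that the symlink points to appears to combine the hash backend (SHA256), the file size in bytes (s1537334), and the SHA-256 digest of the content. The Python sketch below rebuilds a name of that shape for an arbitrary file. It is only an illustration of the naming pattern seen here, not git-annex's own code, and the function name `sha256_key` is made up.

```python
import hashlib
import os


def sha256_key(path: str) -> str:
    """Build a name shaped like 'SHA256-s<size>--<hex digest>' for a file.

    Assumption: this mirrors the pattern visible in the symlink target
    above; other backends or git-annex versions may name objects differently.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large scans need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    size = os.path.getsize(path)
    return f"SHA256-s{size}--{digest.hexdigest()}"
```

Because the name depends only on the content and its size, two identical scans would end up with the same name, whatever the files themselves are called.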
The file is now known to git-annex, and I can have it do all kinds of useful things with it now:
$ git annex drop 2011-11-07-belgacom.pdf
drop 2011-11-07-belgacom.pdf (unsafe)
Could only verify the existence of 0 out of 1 necessary copies
No other repository is known to contain the file.
(Use --force to override this check, or adjust annex.numcopies.)
failed
git-annex: drop: 1 failed
Oops, we hadn't copied it to anywhere else yet. We don't want to lose our data!
$ git annex move --to server 2011-11-07-belgacom.pdf
move 2011-11-07-belgacom.pdf (checking server...) (to server...)
SHA256-s1537334--c44e1a057e247bfe7c196ac146c8a0ca32096c0b10df6c18fd3f1c2e99ecddbf
1537334 100% 9.22MB/s 0:00:00 (xfer#1, to-check=0/1)
sent 30 bytes received 1537668 bytes 1025132.00 bytes/sec
total size is 1537334 speedup is 1.00
ok
$
What just happened? git-annex copied the file to a git remote called "server", and then dropped it from my local copy. It's no longer here! The symlink in my local directory is now a dead link; I cannot open it anymore.
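If you want a script to tell whether a file's content is currently on the laptop, checking for a dead symlink is enough. Here is a small Python sketch, assuming the symlink layout shown earlier; the function name `content_is_present` is made up for this example.

```python
import os


def content_is_present(path: str) -> bool:
    """Return True if path is a symlink whose target currently exists.

    os.path.exists() follows symlinks, so it returns False for a
    dangling link such as an annexed file whose content was moved away.
    """
    return os.path.islink(path) and os.path.exists(path)
```

For the file above, this would report False after the move, since the link no longer resolves to anything on the laptop.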
But, no worries! If we ever need it again, it's just a single command away.
$ git annex get 2011-11-07-belgacom.pdf
get 2011-11-07-belgacom.pdf (from server...)
SHA256-s1537334--c44e1a057e247bfe7c196ac146c8a0ca32096c0b10df6c18fd3f1c2e99ecddbf
1537334 100% 9.58MB/s 0:00:00 (xfer#1, to-check=0/1)
sent 30 bytes received 1537668 bytes 3075396.00 bytes/sec
total size is 1537334 speedup is 1.00
ok
This allows me to save space on my laptop while not having to care where the files are -- they're just there. And it gets more awesome once you know that git-annex can store multiple copies of each file (so you have automatic distributed backups, as with regular git), and that you can enforce a minimum number of copies. Also, git-annex supports multiple backends -- you can store your data in Amazon S3, or on an encrypted USB drive, or whatever, and have git-annex manage it transparently for you.
I said this already on IRC, but: Joey, I owe you beer.