email addresses

Things not to do when validating email addresses

ERROR: WOUTER+TIMHOTEL@GREP.BE is not a valid email address

True, because I entered it in lowercase.

  1. Email address local parts (the bit before the @) are case sensitive. That means you can't just convert it to all caps and assume it will arrive.
  2. Local parts can be composed of any character from the following list:
    • alphanumeric characters (a-z, A-Z, 0-9), and
    • !#$%&'*+-/=?^_`{}|~.

This means that if you're trying to validate an email address with a regular expression that does more than check that there is exactly one @ in the address, you're almost always wrong.
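For what it's worth, a check of that kind can be tiny. Here is a minimal sketch in plain shell. The function name looks_like_email is made up for illustration, and the extra requirement that something must stand on both sides of the @ is my own assumption rather than anything the RFC demands in this form. Note that it never changes the case of its input.

```shell
# Hypothetical helper: accept an address if it contains exactly one "@"
# with at least one character on each side. Nothing else is checked,
# and the address is never upper- or lowercased.
looks_like_email() {
    case "$1" in
        *@*@*) return 1 ;;   # more than one @
        ?*@?*) return 0 ;;   # one @, non-empty local part and domain
        *)     return 1 ;;   # no @ at all, or an empty side
    esac
}

# Usage: the address from the error message above passes as entered.
looks_like_email 'wouter+timhotel@grep.be' && echo "accepted"
```

Anything stricter than this is better left to the mail server: send a confirmation mail and see whether it arrives.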

Next time you try to tell me my mail address is invalid, go read the RFC first.

Morons.

git-annex

git-annex awesomeness

So a few days ago, there was this:

21:24 < wouter> hum.
21:24 < wouter> Anyone know of a tool to manage scanned documents?
21:25 < wouter> the idea being that I can tell this tool "here's a bunch of newly-scanned documents", and it will upload them to a server
21:25 < wouter> and it should allow me to easily find a specific file later on
21:25 < wouter> and I'd also like version control there
21:26 < wouter> and I do _not_ want to download the entire repository of scanned documents on my laptop (that's why I have a server)
21:26 < wouter> and perhaps I'd also like a pony to go with that.
21:29 < wouter> oh, yes, and I do _not_ want a webbrowser as the primary interface (that might be okay to look things up, but not to store stuff)

The answer, as it turned out, was git-annex: a tool to manage files with git, without checking them into git.

What, I hear you say? Yes, that sounds a little weird, doesn't it?

Perhaps it's easiest to explain with a little example.

$ git annex add 2011-11-07-belgacom.pdf
$ ls -l 2011-11-07-belgacom.pdf
lrwxrwxrwx 1 wouter wouter 191 nov  7 14:46 2011-11-07-belgacom.pdf ->
../.git/annex/objects/xx/3F/SHA256-s1537334--c44e1a057e247bfe7c196ac146c8a0ca32096c0b10df6c18fd3f1c2e99ecddbf/SHA256-s1537334--c44e1a057e247bfe7c196ac146c8a0ca32096c0b10df6c18fd3f1c2e99ecddbf

The file is now known to git-annex, and I can have it do all kinds of useful things with it now:

$ git annex drop 2011-11-07-belgacom.pdf
drop 2011-11-07-belgacom.pdf (unsafe)
  Could only verify the existence of 0 out of 1 necessary copies

  No other repository is known to contain the file.

  (Use --force to override this check, or adjust annex.numcopies.)
failed
git-annex: drop: 1 failed

Oops, we hadn't copied it to anywhere else yet. We don't want to lose our data!

$ git annex move --to server 2011-11-07-belgacom.pdf
move 2011-11-07-belgacom.pdf (checking server...) (to server...)
SHA256-s1537334--c44e1a057e247bfe7c196ac146c8a0ca32096c0b10df6c18fd3f1c2e99ecddbf
     1537334 100%    9.22MB/s    0:00:00 (xfer#1, to-check=0/1)

sent 30 bytes  received 1537668 bytes  1025132.00 bytes/sec
total size is 1537334  speedup is 1.00
ok
$

What just happened? git-annex copied the file to a git remote called "server", and then dropped it from my local repository. It's no longer here! The symlink in my local directory is now a dead link; I cannot open the file anymore.

But, no worries! If we ever need it again, it's just a single command away.

$ git annex get 2011-11-07-belgacom.pdf
get 2011-11-07-belgacom.pdf (from server...) 
SHA256-s1537334--c44e1a057e247bfe7c196ac146c8a0ca32096c0b10df6c18fd3f1c2e99ecddbf
     1537334 100%    9.58MB/s    0:00:00 (xfer#1, to-check=0/1)

sent 30 bytes  received 1537668 bytes  3075396.00 bytes/sec
total size is 1537334  speedup is 1.00
ok

This allows me to save space on my local laptop while not having to care where the files are -- they're just there. And it gets more awesome if you know that git-annex can store multiple copies of each file (so you have automatic distributed backups, as with regular git), where you can enforce the minimum number of copies. Also, git-annex supports multiple backends -- you can store your data in Amazon S3, or on an encrypted USB drive, or whatever, and have git-annex manage it transparently for you.
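The minimum number of copies is the annex.numcopies setting that the failed drop mentioned earlier. A minimal sketch of raising it, assuming it is set through plain git config as that message suggests (newer git-annex releases may offer a more direct way). The scratch repository below is only a stand-in so the example touches nothing real:

```shell
# Sketch only: a throwaway repository stands in for the real one.
repo=$(mktemp -d)
git init --quiet "$repo"

# Ask for two copies of every annexed file. git-annex should then refuse
# to drop a file unless it can verify that many copies elsewhere.
git -C "$repo" config annex.numcopies 2

# Read the value back to confirm it was stored.
git -C "$repo" config --get annex.numcopies
```

In a real setup you would run the git config line inside the annexed repository itself rather than in a temporary one.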

I said this already on IRC, but: Joey, I owe you beer.
