Things not to do when validating email addresses
ERROR: WOUTER+TIMHOTEL@GREP.BE is not a valid email address
True, because I entered it in lowercase.
- Email address local parts (the bit before the @) are case sensitive. That means you can't just convert it to all caps and assume it will arrive.
- Local parts can be composed of any character from the following list:
  - alphanumeric characters (a-z, A-Z, 0-9), and
  - !#$%&'*+-/=?^_`{}|~.
This means that if you're trying to validate an email address with a regular expression that does more than check whether there's exactly one @ in the address, you're almost always wrong.
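To make that concrete, here is a rough Python sketch. The STRICT pattern is a made-up example of the kind of over-strict regex that ends up in web forms, not one taken from any particular site, and has_exactly_one_at is a hypothetical helper name. It only shows that addresses built from the characters listed above pass the one-@ check while the strict pattern rejects them.

```python
import re

# Hypothetical over-strict pattern, typical of copy-pasted form validators.
STRICT = re.compile(r"^[a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,4}$")


def has_exactly_one_at(address: str) -> bool:
    """Minimal check: exactly one @, with something on both sides of it."""
    local, _, domain = address.partition("@")
    return bool(local) and bool(domain) and "@" not in domain


# All of these use only characters from the list above in the local part.
samples = [
    "wouter+timhotel@grep.be",
    "o'brien@example.com",
    "a!b#c@example.org",
]

for address in samples:
    print(address, bool(STRICT.match(address)), has_exactly_one_at(address))
```

Every sample is rejected by the strict pattern and accepted by the minimal check.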
Next time you try to tell me my mail address is invalid, go read the RFC first.
Morons.
Overzealous, bug-infected email checking is most annoying.
Another fun variant is when people implement websites that fail to escape '+' in links and therefore treat it as a space...
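A small Python sketch of that '+' problem, assuming a hypothetical confirmation link on example.com with an email query parameter. In a query string an unescaped '+' is decoded as a space, so the address has to be percent-encoded before it goes into the link.

```python
from urllib.parse import parse_qs, quote, urlsplit

address = "wouter+timhotel@grep.be"
base = "https://example.com/confirm?email="  # hypothetical URL, for illustration

# Pasting the address in as-is leaves the '+' unescaped.
unescaped_link = base + address
# quote() with safe="" percent-encodes both '+' and '@'.
escaped_link = base + quote(address, safe="")


def email_from_link(link: str) -> str:
    """Decode the email parameter the way a typical web framework would."""
    return parse_qs(urlsplit(link).query)["email"][0]


print(email_from_link(unescaped_link))  # the '+' comes back as a space
print(email_from_link(escaped_link))    # the original address survives
```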
I suggest reading [1] which explains the situation nicely and gives sound arguments against following the RFC exactly as well.
[1] http://www.regular-expressions.info/email.html
Actually, the only arguments they give against following the RFC exactly are 'it gets you a very long regex that is hard to read'. That's not a very sound argument.
Besides, I'm not suggesting you follow the RFC exactly. This will do: .@...*
And then use standard SQL escaping if you're going to put it in a database, and standard command escaping if you're going to send an email. You don't need to use an overly strict regex for that.
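As a rough illustration in Python: the pattern below is the one from this comment, used unanchored, and the subscribers table and store_address function are made up for the example. Instead of hand-written SQL escaping it uses a parameterized query, which is the usual way to get the same effect with Python's sqlite3 module.

```python
import re
import sqlite3

# The loose pattern from the comment: one character, an @, then at least two more.
LOOSE = re.compile(r".@...*")


def store_address(conn: sqlite3.Connection, address: str) -> None:
    """Accept anything that looks vaguely like an address and store it safely."""
    if not LOOSE.search(address):
        raise ValueError(f"not an email address: {address!r}")
    # The ? placeholder lets the driver handle quoting, so characters such as
    # ' in the local part cannot break out of the SQL statement.
    conn.execute("INSERT INTO subscribers (email) VALUES (?)", (address,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscribers (email TEXT)")  # hypothetical schema
store_address(conn, "o'brien+list@example.com")
rows = conn.execute("SELECT email FROM subscribers").fetchall()
print(rows)
```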
While point #2 is definitely annoying and point #1 is strictly speaking correct, I don't think it's useful to uphold #1.
I doubt there are many real-world mail systems that would actually reject mail to Wouter@example.com and accept mail to wouter@example.com. I also doubt that it's an example of Postel's principle of being liberal in what you accept.
But above all, it just seems completely mad to have a mail system where Jan.Jansen@example.com goes to a different person than jan.jansen@example.com.
Postel's principle of being liberal in what you accept died a horrible death for email many spammers ago.
No, I'm not advocating that Wouter@foo goes to a different person than wouter@foo. But don't try to send me mail to WOUTER@FOO, because that will be rejected. Go fix your mail system.
What gain is there in enforcing case sensitivity for those addresses? Wouldn't it be inconvenient if the postman didn't deliver mail to my home address because it was spelled oosterstraat instead of the correct Oosterstraat?
In the email context, I ran into this problem with mailing list software: people subscribed to a mailing list as J.Doe@example.com but tried to unsubscribe as j.doe@example.com, which failed. I don't think people find this expected or useful behaviour.
Your email address in uppercase is a syntactically valid one. Only it may not actually exist.
There can even be spaces in the local part. It is very rare but perfectly possible, and it requires quoting the local part, like this: "foo bar"@example.com.
This would be an invalid check.
Email addresses may have more than one @ in them, as long as the others are quoted. This probably rarely (never?) happens in the real world, but it's still possible for an email address to contain more than one @.
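If you need to separate the local part from the domain, one way to cope with a quoted @ is to split at the last @ rather than the first. A minimal Python sketch, assuming the domain is an ordinary host name (which cannot contain an @); split_address is a hypothetical helper, not a full address parser.

```python
def split_address(address: str) -> tuple[str, str]:
    """Split at the LAST @, so an @ inside a quoted local part is left alone."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"no usable @ in {address!r}")
    return local, domain


print(split_address('"foo@bar"@example.com'))
print(split_address('"foo bar"@example.com'))
print(split_address("wouter+timhotel@grep.be"))
```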
Quoting the above website:
Is this the kind of idiot you'd take advice from? Not to mention "his pet regex" rejects any IDN email address as well.
This is why I have
recipient_delimiter = -
in my /etc/postfix/main.cf
Sadly, it's almost impossible to have a regex validate email addresses correctly.
1 many MTAs and consumer email service providers (CESPs) only implement a subset, and most are case insensitive (so WOO=woo=Woo)
2 some CESPs implement addresses that violate the standard *cough*aol*cough*yahoo*cough*
3 spam makes people create the craziest filters in the hope of minimizing it
4 some people just copy and paste very bad existing validators (most PHP validators consider the '+' illegal)
ps. you can have more than one @ in an email address as per the RFCs, it's for routing