Note: This is a cross-post of an article published on the Mozilla Webdev blog
A number of high-profile websites (like LinkedIn and last.fm) have disclosed possible password leaks from their databases. The suspected leaks put huge amounts of important, private user data at risk.
What’s common to both these cases is the weak security they employed to “safekeep” their users’ login credentials. In the case of LinkedIn, it is alleged that an unsalted SHA-1 hash was used, in the case of last.fm, the technology used is, allegedly, an even worse, unsalted MD5 hash.
Let’s take a look at the most obvious mistakes our protagonists made here and then we’ll discuss the password hashing standards that Mozilla web projects routinely apply in order to mitigate these risks
Nobody should store plain-text passwords in a database. If you do, and someone steals the data through any sort of security hole, they’ve got all your user’s plain text passwords.
Smart mathematicians came up with something called a hashing function or “one-way function” H: password -> H(password). MD5 and SHA-1 mentioned above are examples of those. The idea is that you give this function an input (the password), and it gives you back a “hash value”. It is easy to calculate this hash value when you have the original input, but prohibitively hard to do the opposite. So we create the hash value of all passwords, and only store that. If someone steals the database, they will only have the hashes, not the passwords. And because those are hard or impossible to calculate from the hashes, the stolen data is useless.
For starters, people pick poor passwords. Write this one in stone, as it’ll be true as long as passwords exist. So a smart attacker can start with a copy of Merriam-Webster, throw in a few numbers here and there, calculate the hashes for all those words (remember, it’s easy and fast) and start comparing those hashes against the database they just stole. Because your password was “cheesecake1″, they just guessed it. Whoops! To add insult to injury, they just guessed everyone’s password who also used the same phrase, because the hashes for the same password are the same for every user.
Worse yet, you can actually buy(!) precomputed lists of straight hashes (called Rainbow Tables) for alphanumeric passwords up to about 10 characters in length. Thought “FhTsfdl31a” was a safe password? Think again.
This attack is called an offline dictionary attack and is well-known to the security community.
The standard way to deal with this is by adding a per-user salt. That’s a long, random string added to the password at hashing time: H: password -> H(password + salt). You then store salt and hash in the database, making the hash different for every user, even if they happen to use the same password. In addition, the smart attacker cannot pre-compute the hashes anymore, because they don’t know your salt. So after stealing the data, they’ll have to try every possible password for every possible user, using each user’s personal salt value.
The hashing function has two steps. First, the password is hashed with an algorithm called HMAC, together with a local salt: H: password -> HMAC(local_salt + password). The local salt is a random value that is stored only on the server, never in the database. Why is this good? If an attacker steals one of our password databases, they would need to also separately attack one of our web servers to get file access in order to discover this local salt value. If they don’t manage to pull off two successful attacks, their stolen data is largely useless.
As a second step, this hashed value (or strengthened password, as some call it) is then hashed again with a slow hashing function called bcrypt. The key point here is slow. Unlike general-purpose hash functions, bcrypt intentionally takes a relatively long time to be calculated. Unless an attacker has millions of years to spend, they won’t be able to try out a whole lot of passwords after they steal a password database. Plus, bcrypt hashes are also salted, so no two bcrypt hashes of the same password look the same.
So the whole function looks like: H: password -> bcrypt(HMAC(password, local_salt), bcrypt_salt).