LinkedIn hash leak analysis

Today it was reported that LinkedIn had been compromised at some point and that 6.5 million unsalted SHA1 password hashes had been posted online. LinkedIn has since confirmed that the hashes belong to its accounts. Before the official confirmation, though, I was curious.

Trying to confirm

The first question was, “Is this real?” I’ve had a LinkedIn account for years. I also assumed a leaked password file would be widely linked to, so if it had been posted as plain text, Google would probably index it within minutes. I searched for the SHA1 values of some candidate passwords, but none of them returned any results. That’s probably a good thing for me.
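
For anyone repeating this check, the hash has to be computed exactly the way an unsalted dump would store it. Below is a minimal sketch. It assumes GNU coreutils `sha1sum`; on macOS, `shasum` prints the same "digest  -" format. The function name `sha1_hex` is only illustrative. Using `printf '%s'` matters because `echo` appends a newline, and that newline changes the digest entirely. A password passed as an argument like this can end up in your shell history, so this is best tried with throwaway examples.

```shell
#!/usr/bin/env bash
# Print the lowercase hex SHA1 of a string, as it would appear in an
# unsalted hash dump. printf '%s' is used instead of echo because echo
# appends a trailing newline, which would be hashed too.
sha1_hex() {
  printf '%s' "$1" | sha1sum | cut -d' ' -f1
}

sha1_hex 'password'   # prints 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8
```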

Without that shortcut, though, it took some digging through news stories to find and download a copy of the full list. With the file in hand, I again searched for the hashes of passwords I might have used for the account, and again found no matches.
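
Checking one hash at a time gets tedious, so here is a minimal sketch of a batch check. It makes two assumptions: a private file holds one candidate password per line, and the leaked file holds one hex SHA1 hash per line. The function `check_candidates` and the file names are hypothetical. The sketch looks only for the exact hash, and each candidate triggers a full rescan of the list. That cost is fine for a handful of guesses. Keep the candidates file private and delete it when you are done.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each candidate password in file $1, hash it with
# unsalted SHA1 and report whether that exact hash appears in file $2.
check_candidates() {
  local candidates=$1 leaked=$2 pw h
  # "|| [[ -n $pw ]]" still processes a last line that lacks a newline.
  while IFS= read -r pw || [[ -n $pw ]]; do
    [[ -z $pw ]] && continue          # skip blank lines
    h=$(printf '%s' "$pw" | sha1sum | cut -d' ' -f1)
    # -F: fixed string, -i: ignore hex case, -q: only the exit status matters
    if grep -qiF -- "$h" "$leaked"; then
      printf 'FOUND: %s\n' "$pw"
    fi
  done < "$candidates"
}
```

For example, `check_candidates candidates.txt hashes` prints a `FOUND:` line for each candidate whose hash is in the list and nothing for the rest.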

Outside sources

This is where the human-networking side of the profession comes into play. After I sent the file to a few people I respect and consider reliable sources of information, I heard back from them: some, like me, didn’t find their password in it, but others did. The report I really latched onto came from someone who found the SHA1 hash of a 30-character random password in the list. A random password of that length is essentially impossible to guess, so finding its hash was strong evidence that the list came from real account data.

It’s real: now what?

First off, shame on LinkedIn. They failed to take simple steps, such as salting, to protect the stored passwords. Now that the hashes have leaked, it’s relatively easy to attempt to crack them.
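
To illustrate why missing salts matter, compare the two functions below. Without a salt, everyone who chose the same password gets the same hash, so one lookup table or one cracked entry covers all of them at once. With a random per-user salt, the attacker has to work on each hash separately. The functions `unsalted` and `salted`, and the "salt:digest" storage format, are hypothetical and for demonstration only. Even salted, SHA1 is fast to compute, which makes guessing cheap. Deliberately slow functions such as bcrypt, scrypt, or PBKDF2 are the usual recommendation for real password storage.

```shell
#!/usr/bin/env bash
# Unsalted: the same password always yields the same digest.
unsalted() {
  printf '%s' "$1" | sha1sum | cut -d' ' -f1
}

# Hypothetical salted scheme: 8 random bytes (16 hex chars) are prepended to
# the password before hashing, and the salt is stored next to the digest.
salted() {
  local salt
  salt=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
  printf '%s:%s\n' "$salt" "$(printf '%s%s' "$salt" "$1" | sha1sum | cut -d' ' -f1)"
}

unsalted password   # same digest every time
unsalted password
salted password     # different salt, so a different digest each run
salted password
```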

Why is the list not comprehensive? When I first scrolled through the file, I saw an odd pattern: the characters in columns 7 and 8 stayed the same while everything else changed. They were either “a8” or “a9”. It would be very strange to sort a file by those columns and then include only a portion of it. To check whether what I saw at the top and bottom of the file held throughout, I ran it through a quick series of pipes that counts how often each two-character value appears in those columns:

cut -c 7,8 hashes | sort | uniq -c

It turned out I was wrong: the distribution was fairly uniform. I haven’t come up with a solid explanation for why the file seemed clustered around those values at those offsets. My guess is that it relates to the hashes of already compromised passwords, which have had their first characters replaced with a string of zeroes.
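
If the zeroing hunch is right, lines beginning with zeroes should be far more common than chance predicts. A genuine SHA1 hash starts with five hex zeroes with probability 16^-5, about 1 in 1,048,576, so among 6.5 million hashes only about six would be expected. Below is a sketch for testing that, plus a lookup that tries both the intact and the zeroed form of a candidate's hash. A zeroed entry would not match an exact search. The five-zero prefix length is an assumption taken from public reports about this dump, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Count lines whose hash begins with five zero hex digits
# (prefix length assumed from public reports about this dump).
count_zeroed() {
  grep -c '^00000' "$1"
}

# Succeed if the SHA1 of password $1 appears in file $2 either intact or
# with its first five hex digits replaced by zeroes.
in_list_any_form() {
  local h
  h=$(printf '%s' "$1" | sha1sum | cut -d' ' -f1)
  grep -qiF -e "$h" -e "00000${h:5}" "$2"
}
```

Running `count_zeroed hashes` against the downloaded file gives the count to compare with the roughly six chance matches. Running `in_list_any_form 'candidate' hashes && echo found` repeats the earlier password check without missing zeroed entries.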

Why are there only hashes, with no usernames or email addresses paired with them? Whoever compromised the accounts is holding onto something valuable. People tend to reuse the same usernames and passwords across several sites, and an email address often serves as the username. Whoever holds the full records therefore has information that can be sold on the black market. Releasing the hashes offloads the work of cracking them, but the value is in tying them back to an account. It would be bad if any of your financial accounts used the same email and password combination as your LinkedIn account. It would be devastating if you used that password for your email account, since access to an inbox often provides the ability to reset passwords elsewhere.

The simple reminder

Change your password if you used the site, and change any other passwords that are the same. Even if you use a simple variant of one base password, try to have some variety between sites. Make certain that you use a unique password for your email. Consider using unique passwords for other high-value accounts, too. Ideally, every account would have its own unique, complex password, but the limits of human memory often make that a challenge.
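
If you do want a separate random password for each site, here is one minimal way to generate one from a shell. It assumes a Unix-like system with `/dev/urandom`. The 20-character default, the character set, and the name `gen_password` are arbitrary illustrative choices, not a standard. The output is meant to be stored somewhere safe, for example a password manager, rather than memorized.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a random password of length $1 (default 20).
# tr -dc keeps only the listed characters from the random byte stream;
# LC_ALL=C makes tr treat the input as raw bytes.
gen_password() {
  local len=${1:-20}
  LC_ALL=C tr -dc 'A-Za-z0-9@#%+=_.-' < /dev/urandom | head -c "$len"
  echo
}

gen_password 24
```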

In summary, reusing passwords can cost you: a password leaked from a low-value site may also unlock an important one. Vendors keep adding features, and once-free sites make more use of financial data, including credit card information. As a result, accounts that used to be low-value to you may now carry a higher risk.

