by Paul Ducklin
In case you’ve never heard of it, Have I Been Pwned, or HIBP as it is widely known, is an online service run out of Queensland in Australia by a data breach researcher called Troy Hunt.
The idea behind HIBP is straightforward: to give you a quick way of checking your own online accounts against data breaches that are already known to be public.
Of course, you’d hope that a company that suffered a data breach would let you know itself, so you wouldn’t need a third party website like HIBP to find out.
But there are numerous problems with relying on the combined goodwill and ability of a company that’s just suffered a breach, not least that the scale of the breach might not be obvious at first, if the company even realises at all.
And even if the company does do its best to identify the victims of the breach, it may not have up-to-date contact data for you; its warning emails might get lost in transit; or it might not be sure which users were affected.
In case you’re unsure, the word pwned is pronounced to rhyme with owned, and it’s what you might call doubleslang – a new jargon word created by deliberately misspelling the existing jargon word “owned”, used to describe a database or a computer system that has been breached by an attacker.
Ironically, perhaps, the fact that it’s hard for a company to be certain how many records were stolen during an attack can have two different outcomes:
- The company might fail to inform everyone who was actually affected, due to underestimating the extent of the attack.
- The company might decide to tell all its customers that they might have been affected, even those who weren’t, due to being unable to estimate the extent of the attack at all.
Indeed, Hunt’s HIBP database started back in 2013, when Adobe suffered a massive data breach that proved just how hard it can be even for a large and well-established company to figure out what happened after a cyberattack.
The art-and-design software giant admitted in October 2013 that its network had been breached, with its Chief Security Officer claiming that “certain information relating to 2.9 million Adobe customers” had been stolen.
That estimate was soon increased to 38 million, but the breach ultimately turned out to have exposed the encrypted-but-highly-crackable passwords of about 150 million accounts, making the breach 50 times bigger than first thought.
Check for yourself
Hunt therefore set out to collect and collate personal information from data breaches that had already become public and make it securely searchable via his HIBP service.
After all, this was stolen data that was as good as available to anyone with enough patience to hunt it down for themselves for evil purposes, so why not try to use it for good instead?
The first 10 breach data dumps that he processed were as follows [link gives JSON data]:
    HIBP breach name   Date added  Took place  Notes
    -----------------  ----------  ----------  -------------------------------------------------
    Vodafone           2013-11-30  2013-11-30  IDs, credit cards and SMS messages.
    Adobe              2013-12-04  2013-10-04  153 million Adobe accounts.
    Stratfor           2013-12-04  2011-12-24  860,000 accounts, 10,000s of credit cards, 100s of GBs of email.
    Yahoo              2013-12-04  2012-07-11  500,000 usernames and passwords.
    Sony               2013-12-04  2011-06-02  Numerous breaches, from PSN to Sony Pictures.
    Gawker             2013-12-04  2010-12-11  Information about 1.3M users.
    PixelFederation    2013-12-06  2013-12-04  38,000 gamers' account details.
    Snapchat           2014-01-02  2014-01-01  4.6 million usernames and phone numbers.
    BattlefieldHeroes  2014-01-23  2011-06-26  500,000 gamers' usernames and passwords.
    WPT                2014-02-01  2014-01-04  175,000 World Poker Tour usernames and passwords.
Astonishingly, his service now includes billions of records from 538 breaches over the past eight years.
But did they get your password?
Fortunately, not every breached data record directly exposes the victim’s password, even if password data was amongst the information stolen.
Organizations that care about cybersecurity avoid storing actual passwords at all, typically saving a one-way hashed representation of your password instead.
This hashed version of the password can be quickly computed from the real password, which only ever needs to be stored temporarily in memory, but a cryptographic hash can’t be wrangled backwards to extract the original password, or indeed to learn anything about it.
Hashing stored passwords doesn’t absolve you from keeping the hashes secure, of course, because stolen hashes can be “cracked” one-at-a-time by trying passwords one after the other, based on a list of likely choices known as a dictionary.
The hashing process is a second layer of defence: the more unusual your choice of password, and the longer it is, the less likely it is that a crook will be able to find a hash to match it in a stolen database, and therefore the less useful a database of stolen hashes will be.
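To see why a predictable password offers so little protection once its hash is stolen, here is a minimal sketch of a dictionary attack against a single unsalted SHA-1 hash. The function name crack_hash and the four-word dictionary are made up for illustration, and the sketch assumes the GNU sha1sum tool is installed; real cracking tools are vastly faster and try millions of candidates.

```shell
# crack_hash is a hypothetical helper: it hashes each candidate password
# read from stdin and prints the one whose SHA-1 matches the target hash.
crack_hash() {
    # $1 = target SHA-1 hash as 40 lowercase hex digits
    while IFS= read -r candidate; do
        guess=$(printf '%s' "$candidate" | sha1sum | cut -d' ' -f1)
        if [ "$guess" = "$1" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1    # no candidate matched
}

# A "stolen" hash, computed here so the example is self-contained.
stolen=$(printf '%s' 'iloveyou2' | sha1sum | cut -d' ' -f1)

# A tiny made-up dictionary; real ones hold millions of entries.
printf '%s\n' 123456 password iloveyou2 qwerty | crack_hash "$stolen"
```

The attack succeeds only because iloveyou2 is on the list of guesses. A long, random password would not be, which is exactly the second layer of defence described above.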
Note that properly-stored authentication databases don’t just store a hash of your password, they also store a unique random string of characters colloquially known as a salt that is combined with your password before it’s hashed. This ensures that if two users choose the same password, their hashes are nevertheless completely different, so every possible password needs to be tried separately for every possible user. If salts are used, there’s no way to compute a general-purpose lookup table that converts hashes directly back to passwords, because you’d need a new lookup table for each user.
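The effect of a salt can be sketched with standard command line tools. The helper names new_salt and hash_with_salt are invented for this example, and a single pass of SHA-256 is used purely to keep the illustration short: it is an assumption of the sketch, not a recommendation. Production systems typically use a deliberately slow password hashing function such as PBKDF2, bcrypt, scrypt or Argon2.

```shell
# Illustration only: one pass of SHA-256 is NOT a recommended way to
# store passwords. Both helper names below are hypothetical.

# new_salt prints 16 random bytes as 32 hex digits.
new_salt() {
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# hash_with_salt combines the salt ($1) with the password ($2) before hashing.
hash_with_salt() {
    printf '%s%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

password='ucanttouchthis'
salt_alice=$(new_salt)
salt_bob=$(new_salt)

# Same password, different salts: the stored hashes do not match.
hash_alice=$(hash_with_salt "$salt_alice" "$password")
hash_bob=$(hash_with_salt "$salt_bob" "$password")
echo "alice: $salt_alice $hash_alice"
echo "bob:   $salt_bob $hash_bob"
```

Because the two users end up with unrelated hashes, a crook who cracks one of them learns nothing about the other, and has to repeat the whole guessing process for every account.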
What if the passwords weren’t hashed?
But what about passwords that were acquired by crooks in their raw, un-hashed form?
That’s not supposed to happen, but:
- Sloppy internet services sometimes store plaintext passwords on purpose, even though they know they shouldn’t, although that’s fortunately less and less common these days.
- Keylogging malware on your laptop can capture passwords as you type them in and upload the raw data directly to crooks who use the passwords themselves, sell them on to other crooks, or both.
- Memory-scraping malware on servers can sniff out passwords while they are being checked, even if they are purged from memory immediately after use and never get written to disk.
- Poor coding by a service provider could result in passwords being saved in plaintext form by mistake, for example to a logfile, where they might go unnoticed by the Good Guys for months or even years.
Google notoriously admitted in 2019 that it had inadvertently, albeit only occasionally, been logging unencrypted passwords for 14 years.
Facebook admitted, at about the same time, to a similar blunder affecting millions of Facebook and Instagram accounts.
In that sort of situation, you probably wouldn’t expect your password to show up in a public dump that might end up on HIBP, given that your password probably wasn’t exposed due to any specific hacking incident at any particular company.
Worse still, if your password gets sniffed out and collected in its raw form, then the crooks can simply start using it right away without doing any hash cracking first, and neither the randomness nor the length of your password would help to protect it better.
Sure, you’re much more likely to guess the password iloveyou2 than the password P6GZ54EN5OTV, but if you acquire the password in its original form then you don’t need to guess at all, so that even C5eblGtr35fDn3TW$/"eeX is no safer than either of them.
Hunt therefore also offers a public service called Pwned Passwords, where you can look up your own password in a database of just over 600 million already-recovered passwords, whether those passwords were stolen due to a large-scale corporate data breach, a carefully planned ransomware attack, a long-running malware infestation, or any other cause.
Assuming that you use a password manager, or choose long and complex passwords of your own that don’t follow any obvious pattern, it’s reasonable to assume that each of your passwords is globally unique…
…so that if you find your password on Hunt’s Pwned Passwords list (which is a whopping 10GB download) then it’s equally reasonable to assume that it’s not there by chance.
It’s there because it’s no longer a secret: someone else already stole it, stored it for later, and then either leaked it themselves, got hacked, sold it on, or dumped it publicly for nuisance value.
In short, you’d jolly well better change it right away!
Avoiding a 10GB download
If you don’t have the time or energy to download 10GB or more of Pwned Passwords data, you can look up your password without giving it away directly.
Hunt stores the 600 million passwords as SHA-1 hashes, so they all come out as 20-byte numbers, each represented as 40 hex digits. (Two hex digits of 4 bits each make up one byte of 8 bits.)
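If you want to check those sizes for yourself, a quick sketch like this one will do it, assuming the GNU sha1sum tool is available. The sample password is arbitrary.

```shell
# Hash a sample password and keep only the hex digest (sha1sum also prints a filename column).
hash=$(printf '%s' 'ucanttouchthis' | sha1sum | cut -d' ' -f1)

echo "hex digits: ${#hash}"            # 40 hex digits
echo "bytes:      $(( ${#hash} / 2 ))" # two hex digits per byte, so 20 bytes
```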
You simply hash your own password and look up the hash in two stages, as shown below, so you never directly reveal what password you were interested in.
Let’s assume your password is ucanttouchthis. (Don’t choose this one – as you will see below, numerous others have thought of it already!)
Take the first five hex digits of your SHA-1 password hash and visit a special URL that ends with those digits, denoting a 20-bit number from 0 to just over a million (2^20 = 1,048,576).
That brings up a page of approximately 600 password hashes for each 5-digit prefix, and you search through that much more manageable list for the final 35 hex digits of your hash, like this:
    $ echo -n 'ucanttouchthis' | sha1sum
    2b355435e608aad0476ce74001d44aada409c1ab  -

    # First 5 digits are 2B355 in hex
    # You're looking for the remaining digits 435E...C1AB

    $ curl https://api.pwnedpasswords.com/range/2B355
    0060C6035CFE881ED8490EE2CBAC18247B5:2
    02475EE4CCEA7E427D129134D879B56C67C:5
    02FBDEF169D2AC92C53D132CBC5D9DDAB4F:1
    039864D5A4F176ACF5F43D86B348DDB95F3:1
    041F4A10B74CD813905BD39D78DEA151A84:1
    . . . .
    42C90D0D51A2FE0F8FC026C971B9D00975E:4
    435E608AAD0476CE74001D44AADA409C1AB:29   <-- FOUND! 29 people chose this one
    437D7DF02F1A8E026DDDB4562408349F514:2
    . . . .
    FDC80988BBAD077D55ECF2845A53BEA423A:1
    FE2D8DFE4473E34DD26F3EBDFD69B49564F:2
    FE89BBEC3DA79E0D8AECDF831876040B18F:6
    FE970AFD7CB1B928119427AAFA4283EAF20:1
    FFCBEE88564A963B41549D864A5D12F9B9C:2
    $
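If you’d rather not search the reply by eye, the two-stage lookup can be wrapped up in a small script. This is only a sketch: the function names sha1_upper, count_in_range and pwned_count are invented here, it assumes sha1sum and curl are installed, and it assumes each line of the reply has the SUFFIX:COUNT form shown in the listing above.

```shell
# sha1_upper prints the SHA-1 of $1 as 40 uppercase hex digits,
# matching the case used in the range replies.
sha1_upper() {
    printf '%s' "$1" | sha1sum | cut -d' ' -f1 | tr 'a-f' 'A-F'
}

# count_in_range reads a range reply (SUFFIX:COUNT lines) on stdin and
# prints the count for the suffix in $1, or 0 if the suffix is absent.
count_in_range() {
    # tr strips carriage returns in case the reply uses CRLF line endings.
    count=$(tr -d '\r' | grep "^$1:" | cut -d: -f2)
    echo "${count:-0}"
}

# pwned_count ties the stages together. Only the 5-digit prefix is sent
# over the network; the 35-digit suffix is matched locally.
pwned_count() {
    full=$(sha1_upper "$1")
    prefix=$(printf '%s' "$full" | cut -c1-5)
    suffix=$(printf '%s' "$full" | cut -c6-40)
    curl -s "https://api.pwnedpasswords.com/range/$prefix" | count_in_range "$suffix"
}

# Offline check using three lines copied from the reply shown above.
sample_reply='42C90D0D51A2FE0F8FC026C971B9D00975E:4
435E608AAD0476CE74001D44AADA409C1AB:29
437D7DF02F1A8E026DDDB4562408349F514:2'
printf '%s\n' "$sample_reply" | count_in_range 435E608AAD0476CE74001D44AADA409C1AB   # prints 29
```

Calling pwned_count with a password of your choice performs the live lookup. A result of 0 means the hash was not in the list at the time you asked, not that the password is a good one.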
You can even add a header to the web request to say “pad out the number of replies”, so that between 800 and 1000 hashes (some of them bogus) are included every time and the length of the reply doesn’t identify which prefix you searched for.
    $ curl -H 'Add-Padding: true' https://api.pwnedpasswords.com/range/2B355
    [. . . Your hash will definitely come back if it is present in the . . .]
    [. . . database, but there will be no predictable reply length from . . .]
    [. . . which an observer could infer which prefix you searched for. . . .]
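Once a padded reply arrives, you may want to discard the bogus entries before counting or storing anything. The sketch below assumes that padding entries are the ones reported with a count of 0, which is how the service’s API documentation describes them as far as we know; treat that as an assumption to verify. The helper name strip_padding and the all-zeros sample suffix are made up.

```shell
# strip_padding reads a range reply on stdin and drops every line whose
# count is 0, on the assumption that those lines are padding.
strip_padding() {
    tr -d '\r' | grep -v ':0$'
}

# Two genuine lines from the earlier reply plus one invented padding line.
padded_reply='42C90D0D51A2FE0F8FC026C971B9D00975E:4
00000000000000000000000000000000000:0
435E608AAD0476CE74001D44AADA409C1AB:29'

printf '%s\n' "$padded_reply" | strip_padding
```

Only the two lines with non-zero counts survive, so a lookup on the filtered reply behaves exactly as it would on an unpadded one.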
If you aren’t comfortable using command line tools such as curl or wget, you can just paste the link with the 5-digit prefix into your browser and then search with Ctrl-F in the single page that comes back.
If you download the raw Pwned Passwords data and divide it into the same 2^20 sections as Hunt himself, you will know exactly how many hashes end up in each of the one million sections, a number that will vary randomly from section to section. You will therefore be able to predict how long the reply for each section will be, even if it’s encrypted, and therefore to infer which prefix was used simply from the length of the reply. Adding fake data so that the replies have randomized, variable lengths makes this sort of prediction impossible.
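The arithmetic, and the inference an eavesdropper might attempt, can be sketched as follows. The four section sizes are invented for the example (real figures would come from the downloaded data), and prefixes_matching is a hypothetical helper. In practice several sections may share the same size, so an observer might only narrow the candidates down rather than pinpoint one.

```shell
# Five hex digits give 16^5 = 2^20 possible prefixes.
sections=$(( 1 << 20 ))
echo "sections: $sections"                               # 1048576
echo "average per section: $(( 600000000 / sections ))"  # 572, close to the "approximately 600" quoted earlier

# Made-up hash counts for four neighbouring sections.
sizes='2B355 571
2B356 603
2B357 589
2B358 560'

# prefixes_matching prints every prefix whose section holds exactly $1 hashes.
prefixes_matching() {
    printf '%s\n' "$sizes" | awk -v n="$1" '$2 == n { print $1 }'
}

# An observer who sees an unpadded reply with 603 entries can work backwards:
prefixes_matching 603    # prints 2B356
```

With padding switched on, every reply carries somewhere between 800 and 1000 entries regardless of the section, so the observed size no longer lines up with any entry in the attacker’s table.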
And that brings us to the headline, right here at the end.
HIBP is going to start receiving password hashes for its database from none other than the US Federal Bureau of Investigation (FBI)!
As Hunt himself explains:
[FBI investigators] play integral roles in combating everything from ransomware to child abuse to terrorism and in the course of their investigations, they regularly come across compromised passwords. Often, these passwords are being used by criminal enterprises to exploit the online assets of the people who created them. Wouldn’t it be great if we could do something meaningful to combat that?
And so, the FBI reached out and we began a discussion about what it might look like to provide them with an avenue to feed compromised passwords into HIBP and surface them via the Pwned Passwords feature.
In other words, if your password ends up in the hands of a crook in a way that neither you nor any of your service providers are likely to have noticed, you are unlikely ever to receive a breach notification warning about any sort of “compromise”…
…but there’s now a place that you can check securely for breached passwords anyway, even if you can never be sure exactly how the crooks acquired those passwords in the first place.