Standing up your own Have-I-Been-Pwned Passwords Server

Posted on October 26, 2017
hibp, linux, python, db, postgres

Please read the caveats below this introduction before trying to play along at home.

Have I Been Pwned (HIBP) is a great service by Troy Hunt that allows you to check if logins (and passwords) associated with your email address have been in publicised website breaches.

In August, Troy Hunt added an entirely new feature to HIBP: Checking passwords against a database of 306 million breached passwords that he compiled.

This database can also be queried on HIBP directly and in the HIBP API.

As Troy Hunt explains in his own blog posts, it is now a NIST recommendation to check user passwords against known compromised passwords.

So, if you have users and maintain a database of logins for them, this is something that could be very useful for you. I am one of those people who have users with logins of passwords.

However, it is obviously a bad idea to send (potential) user passwords to someone else on the internet, even someone as trustworthy as Troy Hunt!

So I set to replicate the PwnedPasswords API so I could host it inside my (client’s) own infrastructure.

Here’s how to do that!

What you will need:

A server (I am using an ESXi VMWare hosted virtual machine with 4GB of RAM and 80GB of disk running Ubuntu 16.04)
A database (PostgreSQL 9.6)
A webapp for the API (I wrote a Python Flask-based app)
An integration for your signup / password change frontend that queries the webapp

This post is accompanied by this webapp: https://github.com/duk3luk3/pwndwords

Caveats

You will be setting up a database with approx. 16GB of data and 11GB of index. That is not big data but it’s not trivial data anymore either. If you make a mistake you can easily kick off a database operation that will sit there spinning for ten minutes.
This post is written with a private on-premises deployment in mind. If you want to make this publicly available you will need to take additional measures, e.g. rate-limiting, that have not been taken account here.
You should recommend / urge you users to use password managers and randomly generated passwords / diceware passphrases. This API does not do anything to estimate password quality which has its own bag of caveats.

And now let’s go ~~

The database

I am not an expert on databases in any way shape or form, but here is what I came up with:

Postgres 9.6 on Ubuntu 16.04
Main ‘passwords’ table with id and passwords
Additional table for auxiliary data (e.g. hit counter)

Install pg9.6 on Ubuntu 16.04:

Follow apt source list setup instructions here: https://www.postgresql.org/download/linux/ubuntu/
sudo aptitude install postgresql-9.6

You may want to edit the postgres configuration to tune its memory allocations (especially the shared_buffers and work_mem settings). See here for tuning PostgreSQL.

Setup a user:

sudo -u postgres -i
createuser -s -r -d erlacher
logout

Setup a database:

createdb passwords

Create extension (pgcrypto extension for digest function), table and index:

CREATE EXTENSION pgcrypto;
CREATE TABLE passwords ( id bigserial PRIMARY KEY, hash bytea);
CREATE INDEX ON passwords (substring(hash for 7));

This is an index on the first 7 bytes of the sha1 hash. For our list of 300 million passwords, that is a very useful index size.

Get and clean data

(If you are reading this in the future, V2 of the password list may have been released and be clean so you might not need to do this step.)

Download the files from https://haveibeenpwned.com/Passwords
Unzip them with p7zip -d
Clean duplicates from the files:

$ time sort --parallel=4 -u pwned-passwords-1.0.txt pwned-passwords-update-1.txt pwned-passwords-update-2.txt > pwned_all_uniq.txt

real	4m35.591s
user	1m42.476s
sys	0m26.056s

(Without --parallel it took 12 minutes to sort just the first file - but the 4 minutes here may be cache effects.)

Load into database

If you’ve been playing around with the db and want to clean it up before loading the full list, do this:

TRUNCATE passwords RESTART IDENTITY;

Then load it:

$ sed -e 's/^/\\\\x/' pwned_all_uniq.txt | time psql passwords -c "copy passwords (hash) from STDIN"

The sed here is necessary to prepend the escape sequence \\x to all hashes so that pg will recognize them as hexadecimal strings.

Query

To utilize the index, we need to invoke it by using a where condition that matches the index expression:

prepare pw_lookup (bytea) as select * from passwords WHERE substring(hash for 7) = substring($1 for 7) and hash = $1;
explain analyze execute pw_lookup(digest('sommernacht','sha1'));

and we get this:

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on passwords  (cost=29579.61..2284020.12 rows=1 width=29) (actual time=16.376..16.376 rows=1 loops=1)
   Recheck Cond: ("substring"(hash, 1, 7) = '\x431ce891a0129c'::bytea)
   Filter: (hash = '\x431ce891a0129ceba34e9ef44638a6dbd065f28a'::bytea)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on passwords_substring_idx  (cost=0.00..29579.61 rows=1601472 width=0) (actual time=7.395..7.395 rows=1 loops=1)
         Index Cond: ("substring"(hash, 1, 7) = '\x431ce891a0129c'::bytea)
 Execution time: 16.396 ms
(7 rows)

Webserver

The server will be a Flask app hosted with apache2 and mod_wsgi.

Install apache2, mod_wsgi (for Python3!) and python3 stuff:

sudo apt-get install apache2 libapache2-mod-wsgi-py3 python3-pip python3-venv
sudo a2enmod ssl wsgi

Create the app, or just clone mine:

sudo mkdir /opt/password-lookup
sudo chown erlacher:tumuser /opt/password-lookup/
git clone git@github.com:duk3luk3/pwndwords.git /opt/password-lookup

You can install the requirements into your system globally, or use a venv. I like to use venvs:

cd /opt/password-lookup
python3 -m venv .venv
cp activate_this.py .venv/bin/
. ./.venv/bin/activate
pip install -r requirements.txt

The very special python3 distribution in Ubuntu lacks activate_this.py so you need to copy that from the repository.

Now grab /etc/apache2/sites-available/default-ssl.conf and set it up to create a wsgi service:

<IfModule mod_ssl.c>
	<VirtualHost _default_:443>
		ServerAdmin ADMIN@EXAMPLE.COM

		DocumentRoot /opt/password-lookup

		WSGIDaemonProcess passwords user=USER group=GROUP threads=5
		WSGIScriptAlias /passwords /opt/password-lookup/venv.wsgi

		<Directory /opt/password-lookup>
			WSGIProcessGroup passwords
			WSGIApplicationGroup %{GLOBAL}
			Order deny,allow
			Allow from all
			Require all granted
		</Directory>

		SSLCertificateFile    /etc/ssl/certs/CERTIFICATE.pem
		SSLCertificateKeyFile /etc/ssl/private/KEY.key
		SSLCACertificateFile  /etc/ssl/certs/ca-certificates.crt

Fill in appropriate values for all the CAPITALISED placeholder values in there.

Integration

OK, this bit is really gnarly and consists mainly of things copy-pasted off of StackOverflow, so please don’t copy and paste this in turn, take it only as an instructive proof-of-concept.

You should also consider that ideally you should not have the API exposed publicly unless you take additional measures for rate-limiting.

This code also should hash the passwords with SHA-1 before submitting them to the API for just a little bit more feel-good factor.

jQuery AJAX code:

<script src="/jquery-1.10.2.js"></script>
<script>
$(function() {
//setup before functions
var typingTimer;                //timer identifier
var doneTypingInterval = 800;  //time in msa
var jqxhr;
   
//on keyup, start the countdown
$('#pwedit').keyup(function(){
    clearTimeout(typingTimer);
    $('#pwhint_pwnd').css('display','none');
    $('#pwhint_ok').css('display','none');
    $('#pwhint_error').css('display','none');
    $('#pwedit').css('background-color','');
    if ($('#pwedit').val()) {
        typingTimer = setTimeout(doneTyping, doneTypingInterval);
    }
});

//user is "finished typing," do something
function doneTyping () {
//alert('done typing');
var inputval = $('#pwedit').val();
$('#pwhint_working').css('display','');
jqxhr = $.ajax({
  url: 'https://pwndwords.in.tum.de/passwords/?password=' + encodeURIComponent(inputval),
  timeout: 5000,
  statusCode: {
    200: function(data, textStatus, xhr) {
        if (xhr == jqxhr) {
                $('#pwedit').css('background-color','#FF9999');
                $('#pwhint_working').css('display','none');
                $('#pwhint_pwnd').css('display','');
        }
    },         
    404: function(xhr, textStatus, errorThrown) {
        if (xhr == jqxhr) {
                $('#pwedit').css('background-color','#99FF99');
                $('#pwhint_pwnd').css('display','none');
                $('#pwhint_working').css('display','none');
                $('#pwhint_ok').css('display','');
        }              
    }          
  },           
  error: function(xhr, textStatus, errorThrown) {
        if (xhr == jqxhr && xhr.status != 404) {
                console.log('Error trying to reach pwndwords.in.tum.de: ' + textStatus + ' (' + errorThrown + ')');
                $('#pwhint_working').css('display','none');
                $('#pwhint_error').css('display','');
        }      
  }
});    
}
});            
</script>

HTML form part with the inputs:

                <table>
                        <tr>
                                <td>Old password:</td>
                                <td><input type=password name=password_old size=60 /></td>
                        </tr>
                        <tr>
                                <td>New Password:</td>
                                <td><input id='pwedit' type=password name=password_new size=60 /></td>
                                <td>
                                        <span id="pwhint_working" style="display: none;"><img src="/24px-spinner-black.gif"> Checking password, please wait.</span>
                                        <span id="pwhint_pwnd" style="background-color: #ff9999; display: none;">This password has been pwned and is not allowed.</span>
                                        <span id="pwhint_ok" style="display: none;">This password has not previously been pwned. (But that does not mean it is a good password)</span>
                                        <span id="pwhint_error" style="display: none;">There was an error checking your password :-(</span>
                                </td>
                        </tr>
                        <tr>
                                <td>Repeat New Password:</td>
                                <td><input type=password name=password_repeat size=60 /></td>
                        </tr>
                </table>