I had built and was managing a web site that takes registrations from thousands of people around the state for a variety of sporting events. One of the goals of the site is to collect better quality data for the people running the events, i.e. they basically needed to get a better handle on how many people were actually attending.
One of the other goals of the site was to make it as easy and hassle-free as possible for anyone to register. This meant that requiring people to sign up for an account with usernames and passwords was undesirable, so make it possible for someone to just sign up with all their info in one session (i.e. never authenticated), then never return.
This also meant that if someone started to register one day, but abandoned their “shopping cart” (so to speak), and then came back the next day, they happily re-entered all their info again – which caused duplicate records to appear in the database. Someone accidentally closes their browser – another duplicate record. Someone signs up their friend on their behalf, not knowing their friend had already signed up – another duplicate record. Someone with very little computer experience gets an error (e.g. “Date of birth must be entered.”) and responds by closing the browser and restarting – and doing this multiple times – we got five duplicate records from this person.
So I built an automated alert system which would email the team coordinators a list of the duplicate records, based on a simple case-insensitive match on first name + surname (we did have one case last year where two different people happened to have the same name, but this is a very rare occurrence when you’re talking about only a few thousand people). I also built a de-duplicator which allowed me to compare two records side-by-side and delete one of them.
In the crunch week (the week before nominations close), we were getting 40+ registrations per day – and each day I was deleting 4 or 5 duplicate records. I thought there must be a better way.
So I (naïvely) quietly added a simple validation check to the signup page – if the player’s name was already registered it showed the error message “Sorry, a registration under you name has already been created. Please login to change your registration.” along with my contact details.
It worked, kind of – immediately I got 3 emails and 1 phone call from people who had started their signup, having earlier ignored or missed the email with their login details, and tried to sign up again. I made sure they could login, and they were able to update their existing registration without creating duplicate records. I was quietly optimistic that it would work better now.
Unfortunately, I was wrong. A few days later (today, actually) I decided to do an audit to see if my change had actually made things better or not. I suspected that some people might ignore that error message and just put in a different name. My suspicion was warranted, as it turns out.
So far I’ve gone through over 800 records and found variances of “Bloggs, Joe” or “Bloggs, Joe B” or “Bloggso, Joe” or “Bloggs Is My Name, Joe” scattered throughout. All my validation had done was put a roadblock in front of the users, who simply drove around it by putting in a slightly different name (I saw a lot of them simply put in their middle name), and now (more importantly) my de-duplicator is useless because it only finds matches on exact given names and surnames.
I’ve removed the validation. The duplicate records are manageable, and the system is overall easier for everyone with less validation.