Avoiding username collision

The problem

Suppose you are launching a new web application. You have a list of users with personal information, and you need to mail them instructions for how to sign on to their new accounts. For security reasons, you cannot send both their username and their password in the same letter. Instead, you must mail each user either their username or their password, along with instructions for how to construct the other field from personal information. It is important that the user can easily construct the unknown field, but that another person who happens upon the letter cannot. Each username must be unique.

In this case it is best to mail each user their password. This allows us to create strong, secure passwords, and to have our usernames be constructible from personal information.

A potential solution

One solution to this problem is to assign each user a username consisting of the first initial of their first name, their last name, and the last four digits of their social security number. A person named John Smith with a social security number of 123-45-6789 would have the username JSmith6789. Assuming a user keeps their SSN secret, this satisfies the security criterion for the usernames, and at first glance it also seems to satisfy the uniqueness criterion. After all, if you have the username JSmith6789, the odds are very small that another user has the same first initial, last name, and last four digits of their SSN.
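As a minimal sketch of this scheme, the username could be built as follows. The function name `make_username` and the rule of ignoring any non-digit characters in the SSN are our own assumptions for illustration, not part of a real system.

```python
def make_username(first_name: str, last_name: str, ssn: str) -> str:
    """Build first initial + last name + last four SSN digits.

    Hypothetical helper: any non-digit characters in `ssn` (such as
    dashes) are ignored, and exactly nine digits are expected.
    """
    digits = [ch for ch in ssn if ch.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected a nine-digit SSN")
    if not first_name or not last_name:
        raise ValueError("first and last name are required")
    return first_name[0].upper() + last_name + "".join(digits[-4:])


print(make_username("John", "Smith", "123-45-6789"))  # JSmith6789
```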

The birthday problem

However, it’s not enough that there is a low probability that JSmith6789 will encounter another user with the same username. We need to ensure that no two of our users will have the same username. To illustrate how different these probabilities can be, consider the Birthday Problem. Among a group of 25 people there is a fairly small chance (1-(364/365)^24 ≈ 6.4%) that another person in the group will share your birthday. However, there is a very high chance (1-(364/365)(363/365) … (341/365) ≈ 56.9%) that some two people in the group will share a birthday.
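Both birthday probabilities can be checked with a few lines of Python. This is a sketch that assumes 365 equally likely birthdays and ignores leap years; the function names are ours.

```python
from math import prod


def prob_someone_shares_yours(group_size: int, days: int = 365) -> float:
    """Chance that at least one of the other people shares YOUR birthday."""
    return 1 - ((days - 1) / days) ** (group_size - 1)


def prob_any_pair_shares(group_size: int, days: int = 365) -> float:
    """Chance that at least two people in the group share a birthday."""
    # Probability that all birthdays are distinct:
    # (365/365)(364/365)...((365 - group_size + 1)/365)
    all_distinct = prod((days - i) / days for i in range(group_size))
    return 1 - all_distinct


print(f"{prob_someone_shares_yours(25):.1%}")  # about 6.4%
print(f"{prob_any_pair_shares(25):.1%}")       # about 56.9%
```

The first function fixes one person and compares everyone else against them. The second considers every pair at once, which is why its result is so much larger.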

Collision probability

Our potential solution might work for a small set of users, but if we have a large number of users, say 50,000, we may run into problems. Using census data, we find that approximately 1.006% of the population has “Smith” as a last name and 11.9% of the population has a first name beginning with the letter “J”. If we assume the distribution of names in our group of users is similar to that of the US, and that first initials and last names are independent, we are likely to have around 59 people with the name J. Smith.

Each J. Smith has 10^4 = 10,000 possibilities for the last 4 digits of their social security number. Assuming the last 4 digits are distributed evenly, if we pick two J. Smiths at random there is only a 1/10,000 = 0.01% chance that they have the same last 4 digits. However, using the techniques we used to solve the birthday problem, we can calculate that there is a 15.75% chance that there will be two J. Smiths in our group of users with the same last 4 digits of their SSN.
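The calculation can be sketched in Python as below. The constant and function names are ours, and truncating the expected count of about 59.9 down to 59 is an assumption made to match the figure used in this post.

```python
from math import prod


def collision_probability(n_users: int, n_values: int) -> float:
    """Chance that at least two of n_users share one of n_values
    equally likely values (the generalized birthday problem)."""
    if n_users > n_values:
        return 1.0  # pigeonhole: a collision is certain
    all_distinct = prod((n_values - i) / n_values for i in range(n_users))
    return 1 - all_distinct


TOTAL_USERS = 50_000
SHARE_SMITH = 0.01006   # share of last names that are "Smith"
SHARE_J = 0.119         # share of first names starting with "J"

# Assumes first initial and last name are independent.
expected_j_smiths = TOTAL_USERS * SHARE_SMITH * SHARE_J  # about 59.9
j_smiths = int(expected_j_smiths)                        # 59, as in the post

p = collision_probability(j_smiths, 10**4)
print(f"{j_smiths} J. Smiths, collision chance {p:.2%}")  # roughly 15.75%
```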

What does this mean

We made several assumptions about the name and SSN distribution of our users above, and we stopped short of full statistical analysis, but our results are alarming. If there is a 15.75% chance that there will be two people named J. Smith in our group of users with colliding usernames, we cannot be confident of avoiding username collision.

What to do

Since our first proposed username scheme looks flawed, where can we go from here? Luckily, due to the nature of the birthday problem, adding more information to our usernames will greatly reduce our chances of username collision. For example, if we add the month and day of the user’s birthday to the end of the username (JSmith67890131), the chance of two J. Smiths having the same username drops to about 0.05%. If we are still unsatisfied we could add additional data such as the full first name or birth year to further decrease the chance of a collision. Check out the code used to calculate these probabilities here.
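To see how much the extra field helps, we can compare the two schemes side by side. This is a sketch under the same assumptions as before: 59 J. Smiths, evenly distributed SSN digits, and 365 equally likely birthdays. The names below are ours and are not taken from the linked code.

```python
from math import prod


def collision_probability(n_users: int, n_values: int) -> float:
    """Chance that at least two of n_users share one of n_values
    equally likely values."""
    if n_users > n_values:
        return 1.0
    return 1 - prod((n_values - i) / n_values for i in range(n_users))


J_SMITHS = 59
SSN_SUFFIXES = 10**4   # last four digits of the SSN
BIRTHDAYS = 365        # month and day, ignoring February 29

# Number of distinct suffixes each scheme can produce for a "JSmith".
schemes = {
    "SSN digits only": SSN_SUFFIXES,
    "SSN digits + birthday": SSN_SUFFIXES * BIRTHDAYS,
}

for name, n_values in schemes.items():
    p = collision_probability(J_SMITHS, n_values)
    print(f"{name}: {p:.3%}")
# SSN digits only: roughly 15.75%
# SSN digits + birthday: roughly 0.047%
```

Multiplying the number of possible suffixes by 365 cuts the collision chance by a factor of several hundred, in line with the figure quoted above.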

 

Comments
  1. James B says:

    First, I like the solution and the use of statistics. However, I take issue with the problem. Using this approach you end up with user names that users hate and/or that give up quite a bit of personal information if compromised. If the service already has a user’s private information, why not just have them verify the information and then they can pick from a list of verified collision free options?

    • Tyson Gern says:

      Thanks for the comment, James. After the initial login we could prompt the users to change their username, possibly to their email address.

  2. Tyson Gern says:

    You bring up some great points, Can. For this post, I was imagining a situation where the login information would be physically mailed to the users. A one-time invite url is great for emailed login invitations, but is not practical in this situation.

    Once the user has logged in for the first time, I would advise requiring them to change their username and password. While this may not be an ideal user experience, it will help the users to create a username/password combination that they are likely to remember.
