
Entropy is not Entropy

Why don’t you call it entropy? In the first place, a mathematical development very much like yours already exists in Boltzmann’s statistical mechanics, and in the second place, no one understands entropy very well, so in any discussion you will be in a position of advantage. – John von Neumann to Claude Shannon

In the context of information theory, entropy is a measure of *unpredictability* or *information content*. The entropy of a single toss of a fair coin (for which heads and tails are equally probable) is 1 bit. A cryptographically secure random number generator has approximately 1 bit of entropy per bit of output: 0 and 1 are equally probable outputs at any point in the stream. English text has fairly low entropy because it is fairly predictable; there are many more e’s than z’s, for example.
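The fair-coin figure falls out of Shannon’s formula directly; a minimal sketch:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin: heads and tails equally probable, 1 bit per toss.
print(shannon_entropy([0.5, 0.5]))   # 1.0

# A biased coin is more predictable, hence carries less entropy.
print(shannon_entropy([0.9, 0.1]))   # ~0.469
```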

Entropy often comes up in discussions of passwords and password selection. There are tools online that purport to measure the relative strength of passwords by estimating the entropy of the password based on the unpredictability of the sequence of bytes. This measure of entropy may not be related in any direct way to the actual entropy of the password, which may be determined by the password selection process.

For example, we have generated passwords here by selecting two dictionary words separated by a digit. If you eliminate short words (fewer than 4 letters) and long words, it is difficult to find more than 20,000 or so words, even including foreign and technical terms. Two words selected at random from a dictionary of 20,000 words, combined with a single digit (e.g., *mahua8tynd*), yield 20000*20000*10 possible combinations, or 4,000,000,000. That seems like a lot, but log2(20000*20000*10) is just under 32, meaning that passwords generated by this mechanism carry about 32 bits of entropy, no more than a password consisting of 4 random 8-bit characters, which contains exactly 32. But online password checkers almost uniformly credit such a password with more entropy than was actually present at the time of creation.
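The arithmetic above can be checked in a couple of lines:

```python
import math

# Entropy of the generation process: two words from a 20,000-word
# dictionary, joined by a single random digit.
combinations = 20000 * 20000 * 10
print(combinations)             # 4000000000
print(math.log2(combinations))  # ~31.9 bits

# Compare: 4 fully random 8-bit characters.
print(math.log2(2 ** 32))       # 32.0 bits
```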

http://rumkin.com/tools/password/passchk.php

reports 41 bits of entropy for *mahua8tynd*. What’s wrong here? Entropy may not be entropy. While the tool no doubt properly estimates the entropy of the password string based on the expected value of each character (roughly, a measure of the compressibility of the string), the degree of freedom in the process that generated the password is considerably less.

A conservative approach to security asks us to assume that an adversary knows how our locks are constructed, how keys are made, etc. and relies on possession of the key itself to control who can open a lock. But unlike a physical key, a password is something we are expected to remember; the greater the entropy, the lower the likelihood a password is memorable.

As an approach to generating passwords that have more entropy, but are still somewhat more memorable than *lJSKKZtHyNZr1*, I wrote *mkpasswd*. *mkpasswd* was inspired by the babble strings produced by the original Bellcore S/Key OTP generator; however, its purpose is merely to produce passwords with a promise of 66 bits of entropy (in the default configuration). The dictionary differs from the S/Key original in that only 3- and 4-letter words are used. The security of the generated passphrases is reducible to the security of the underlying system RNG (e.g., /dev/random). Six words are selected at random from a dictionary of 2048 words, yielding 2048^6 = 2^66 possible passphrases.
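The scheme is easy to sketch. This is not the actual *mkpasswd* implementation (see the repo for that), just a minimal illustration of the idea; the wordlist here is a synthetic stand-in, since any 2048 distinct words give the same entropy arithmetic:

```python
import itertools
import math
import secrets
import string

# Stand-in dictionary: 2048 distinct 3-letter strings. The real
# mkpasswd ships a curated list of 3- and 4-letter English words.
words = ["".join(t) for t in itertools.islice(
    itertools.product(string.ascii_lowercase, repeat=3), 2048)]

def make_passphrase(n_words=6, sep=" "):
    # secrets.choice draws from the OS CSPRNG, so the entropy claim
    # reduces to the security of the system RNG, as with mkpasswd.
    return sep.join(secrets.choice(words) for _ in range(n_words))

# Six independent draws from 2048 words: 2048**6 == 2**66 possibilities.
assert 2048 ** 6 == 2 ** 66
print(math.log2(2048 ** 6))  # 66.0 bits
print(make_passphrase())
```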

To make passphrases more legible, the -s option inserts spaces, and the -d option inserts dashes. It is up to the user whether to include these in the password itself. A sample of the output:

msierchio@belden-sf:~ > mkpasswd -s

Pea Yawn Lin Fun Muff Balk

The code is released under the two-clause BSD license. It’s on github here:

https://github.com/pivotalops/security.git

h/t and thanks to Brian Cunnie, who helped with getting this in shape, and got it queued for inclusion in *homebrew*.

Comments


Awesome, cool stuff Mike! The issue here is there are two definitions of entropy:

1. The entropy of the *random process* used to generate a particular password, which has to do with the degrees of freedom in the process

2. The entropy of a *particular password*, which is calculated with no knowledge of the process used to generate it, which can be calculated in a number of different ways

Those online strength calculators obviously have to do the latter, which I guess makes sense in a context where one can safely assume an adversary doesn’t know the process by which you generate passwords. On the other hand, if, as you say, we take a more conservative approach and assume that an adversary knows how our “locks” are constructed, the former calculation is the way to go. Of course, doing the former calculation is harder, because its input is a random process, not merely a string.

One can think of the latter calculation as an attempt to estimate the former, by making some assumptions/guesses about how the password was generated from looking at the password itself. I was curious, so I peeked under the hood at the JavaScript used by the password checker you linked to, and it makes some pretty wacky assumptions! In particular, it treats all non-letters the same. So both

^341^)8@#05&*6%%#$7(9!24%

and

!111111111111111111111111

have the same entropy!
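The general flavor of that flaw is easy to reproduce. The estimator below is a hypothetical sketch of the common charset-size approach (not the actual rumkin.com code): infer which character classes appear, multiply out a charset size, and charge log2(size) per character. Any two same-length strings drawn from the same classes score identically:

```python
import math

def naive_charset_entropy(password):
    """Naive strength estimate: len(password) * log2(charset_size),
    where the charset is inferred from the character classes present.
    All non-letters are lumped into one class, the way many online
    checkers do. Illustrative sketch only."""
    size = 0
    if any(c.islower() for c in password):
        size += 26
    if any(c.isupper() for c in password):
        size += 26
    if any(not c.isalpha() for c in password):
        size += 33  # digits and punctuation treated as one class
    return len(password) * math.log2(size)

a = "^341^)8@#05&*6%%#$7(9!24%"
b = "!111111111111111111111111"
# Same length, same inferred charset: identical scores, despite the
# obvious difference in quality.
print(naive_charset_entropy(a) == naive_charset_entropy(b))  # True
```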

A better measure of password quality is Kolmogorov complexity: since 1111111111111111 can be abbreviated to something like {1}16, it contains far less information than its length suggests.