Hacker News

French dude gets in trouble with the law and has his encryption cracked. Hacker News goes into an extended (and extremely interesting) thread about the entropy content of grammatical English. Probably not that different in terms of entropy per character, though.

That being said, I don't understand your calculation of 4·lg(100,000). That seems like it can't possibly be correct: for starters, it's entirely independent of the entropy content of the words in the dictionary. I could have 100,000 strong random passwords of 1,000 characters each in my dictionary. Could you explain this a bit, please? There's obviously something I'm missing.

Edit: Aah I think I get it - the assumption is both me and the attacker know the dictionary so the entropy content of the words doesn't matter, the only thing that matters is the joint probability distribution of the combinations I'm choosing. Is that correct?
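That reading can be checked directly: assuming each word is picked uniformly and independently from a dictionary the attacker also knows, the entropy depends only on the number of equally likely passphrases, never on what the words themselves look like. A minimal sketch (dictionary sizes here are just illustrative):

```python
import math

def passphrase_entropy_bits(dictionary_size: int, num_words: int) -> float:
    """Entropy in bits of a passphrase built by choosing num_words
    uniformly and independently (with replacement) from a dictionary
    of dictionary_size entries that the attacker is assumed to know."""
    return num_words * math.log2(dictionary_size)

# Word content is irrelevant: a 100,000-entry list of short English
# words and a 100,000-entry list of 1000-character random strings
# yield identical entropy, because each pick is one of 100,000
# equally likely outcomes either way.
print(passphrase_entropy_bits(100_000, 4))  # ~66.4 bits
```

If the words were chosen non-uniformly, the joint distribution's entropy would be lower, which is why the uniform calculation is the relevant one for a scheme like this.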



Oh yeah, my assumption is that it's 4 words chosen from a 100,000-word dictionary. I honestly have no idea if that's a reasonable estimate, but it stuck in my head from XKCD's original correct-horse-battery-staple comic. Of course, in real life an attacker won't necessarily know the distribution you've pulled your password from, but by using the exact distribution in your calculations you have an ironclad lower bound.


Diceware uses a 7,776-word dictionary; xkcd uses a shorter one.


My God, where on Earth did I get 100,000? I mean, that's a lot of words, plausibly more than I will ever hear. And how did it just cancel out anyway, so that I got roughly the right entropy? This will frustrate me until I understand how I fucked it up so totally.
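For what it's worth, the numbers land close largely by coincidence: 4 draws from a 100,000-word list comes out within a couple of bits of 5 Diceware words (7,776 = 6^5 entries, one per roll of five dice), while the xkcd comic's own estimate of roughly 44 bits corresponds to 4 words from a list of about 2,048. A quick check of all three:

```python
import math

schemes = [
    ("4 words, 100,000-word dictionary", 100_000, 4),
    ("5 Diceware words (7,776 = 6^5)",   7_776,   5),
    ("xkcd: 4 words, ~2,048-word list",  2_048,   4),
]

for label, dict_size, words in schemes:
    # Uniform, independent picks: entropy = words * log2(dictionary size)
    bits = words * math.log2(dict_size)
    print(f"{label}: {bits:.1f} bits")
# 66.4, 64.6, and 44.0 bits respectively
```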



