Thursday, October 29, 2009
The Odds of the Curse
[Updated: I goofed in my numbers, so I've revised this post significantly.] So Arnold Schwarzenegger has embedded a big "fuck you" to Assemblyman Tom Ammiano, hidden in the first letters of the lines of the text explaining why he vetoed an otherwise uncontroversial measure. This was first uncovered by the SF Bay Guardian.
In a follow-up post, they discuss the odds that this was an innocent coincidence rather than a carefully planned message.
Of course, they got their number from a politician, so it's wrong:
At any rate, Supervisor David Chiu has done the math and concludes that it's highly unlikely this was a mistake:
"Assuming it was real, I calculated the probability that this is pure chance. Assuming it's a 1/26 chance for each particular letter, the probability that this is random is one out of 8,031,810,176."
OK, that's 1 in 8 billion.
Well, letters are obviously not equiprobable: more words start with A than with Z, so those odds are wrong. A commenter at the Bay Guardian came up with this calculation:
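Chiu's figure is just 26 raised to the 7th power: the number of equally likely 7-letter sequences if every letter were equally probable. A minimal Python check, under exactly that (unrealistic) uniform assumption:

```python
# Chiu's model: each of the 7 line-initial letters is drawn
# independently and uniformly from the 26-letter alphabet.
ALPHABET_SIZE = 26
target = "FUCKYOU"

# Number of equally likely 7-letter sequences; FUCKYOU is exactly one of them.
odds_against = ALPHABET_SIZE ** len(target)
print(f"1 in {odds_against:,}")  # prints "1 in 8,031,810,176"
```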
using the 2of12 list from the 12dicts file found at http://wordlist.sourceforge.net/, I calculated the probability of a word starting with the following letters as follows:
f = 4.40%; u = 3.59%; c = 9.30%; k = 0.66%; y = 0.29%; o = 2.66%; u = 3.59%
for an overall probability of 2.39E-12, or approximately 1 in 370,855,495,993. so a much lower probability than that calculated by Supervisor Chiu.
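The commenter's estimate is simply the product of the seven per-letter percentages. Multiplying the rounded values listed above gives about 2.7 × 10^-12, or roughly 1 in 372 billion, which matches the quoted "1 in 370,855,495,993" up to rounding. A small sketch of that multiplication (the helper name `sequence_probability` is mine, not the commenter's):

```python
import math

# Per-letter probabilities quoted by the commenter (share of 2of12 dictionary
# entries starting with each letter). "u" appears twice in FUCKYOU but only
# needs one entry here, since the lookup is repeated per letter.
quoted = {"f": 0.0440, "u": 0.0359, "c": 0.0930, "k": 0.0066,
          "y": 0.0029, "o": 0.0266}

def sequence_probability(word: str, freq: dict[str, float]) -> float:
    """Probability of `word` as a sequence of independent first letters."""
    return math.prod(freq[ch] for ch in word)

p = sequence_probability("fuckyou", quoted)
print(f"p = {p:.3g}, about 1 in {1 / p:,.0f}")
```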
That's 1 in 370 billion. But again, that is wrong. Suppose English kept only one word starting with, say, the letter T (with all the other T-words reassigned to other letters). The probability of finding a word starting with T in a text would then depend on which word that is: if it's "THE", quite likely, since that's the most frequent word in English; if it's "THEREMIN", not so much. What matters is not how many words start with a given letter, but how frequently the words that start with it are actually used.
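The difference between counting dictionary entries and counting actual usage can be seen on a toy example. This is a minimal sketch; the function `first_letter_shares` and the sample sentence are hypothetical, chosen only so that "the" is frequent and "theremin" is rare:

```python
from collections import Counter

def first_letter_shares(words: list[str], by_token: bool) -> dict[str, float]:
    """Share of words starting with each letter.

    by_token=False counts each distinct word once (dictionary-style);
    by_token=True counts every occurrence (usage-weighted).
    """
    pool = words if by_token else sorted(set(words))
    counts = Counter(w[0] for w in pool)
    return {letter: n / len(pool) for letter, n in counts.items()}

# Hypothetical toy text: "the" occurs three times, "theremin" once.
sample = "the cat and the dog saw the theremin".split()

shares_by_type = first_letter_shares(sample, by_token=False)
shares_by_token = first_letter_shares(sample, by_token=True)
print(f"t by type:  {shares_by_type['t']:.3f}")   # 0.333 (2 of 6 distinct words)
print(f"t by token: {shares_by_token['t']:.3f}")  # 0.500 (4 of 8 occurrences)
```

The dictionary-style count treats "the" and "theremin" as equally important; the usage-weighted count does not.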
Another commenter had a better idea: "spike" counted how often each of these letters appears as the first letter of a word in a large text, and came up with odds of 1 in 600 billion. This captures both how many words start with each letter and how frequently those words are used. However, his text has only 30,000 words, while there are over 100,000 words in an English dictionary, so his result is not statistically valid. Still, it's a good ballpark estimate.
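spike's exact procedure isn't described, so the following is only a guess at the general idea: count how often each letter begins a word token in a text, then multiply the shares for F, U, C, K, Y, O, U, assuming the line-initial words are independent. The function `acrostic_probability` and the sample text are hypothetical; a real estimate would need a large body of English prose.

```python
import math
import re
from collections import Counter

def acrostic_probability(text: str, target: str = "fuckyou") -> float:
    """Estimate P(target) assuming each line-initial word is drawn
    independently from the word tokens of `text` (usage-weighted)."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        raise ValueError("text contains no words")
    counts = Counter(w[0] for w in words)
    # Share of word tokens starting with each target letter, multiplied together.
    # Letters that never start a word get a share of 0.
    return math.prod(counts[ch] / len(words) for ch in target.lower())

# Tiny stand-in text, only to show the call.
sample = "Fun under cold kites: you often understand."
print(acrostic_probability(sample))  # about 4.9e-06 (4 / 7**7), since 'u' starts 2 of the 7 words
```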
So far, everyone (I'm not sure how spike did it) has calculated the odds of finding the letters of FUCKYOU. But the same letters also spell CFKOUUY, FCUKYOU, YOUUFCK, and plenty of other anagrams, so the odds above also cover many other letter sequences that need to be ruled out. If you treat the two U's as distinct, U1 and U2, then exactly two orderings of the letters C, F, K, O, U1, U2, Y spell the right result: FU1CKYOU2 and FU2CKYOU1. There are 7*6*5*4*3*2*1 = 7! = 5040 possible orderings of these letters. So there is a 2/5040 (that is, 1/2520) chance that, if the first letters of the words are drawn from the set {CFKOUUY}, they spell out FUCKYOU.
So if we take spike's value (1 in 600 billion) and multiply by 2/5040, we get 1 in about 1.5 million billion.
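The counting in that paragraph can be checked by brute force: label the two U's, enumerate all 7! orderings, and count the ones that read FUCKYOU once the labels are dropped. A short sketch using the standard library:

```python
from itertools import permutations

# Label the two U's so every letter is distinct, as in the text.
letters = ["C", "F", "K", "O", "U1", "U2", "Y"]
orderings = list(permutations(letters))

# Count orderings that read FUCKYOU once the labels are dropped
# (x[0] turns "U1"/"U2" back into "U").
hits = sum(1 for p in orderings if "".join(x[0] for x in p) == "FUCKYOU")
print(hits, len(orderings))               # 2 5040
print(len(set(permutations("CFKOUUY"))))  # 2520 distinct letter strings
```

Counting unlabeled strings directly gives the same ratio: 1 out of 7!/2! = 2520.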
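A quick check of the arithmetic in that step, using exact fractions so nothing is lost to floating-point rounding (the variable names are just labels for the two factors in the text):

```python
from fractions import Fraction

spike = Fraction(1, 600_000_000_000)  # spike's estimate, 1 in 600 billion
ordering = Fraction(2, 5040)          # share of orderings that spell FUCKYOU

combined = spike * ordering
print(combined)  # 1/1512000000000000, i.e. about 1 in 1.5 million billion
```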
No matter how you slice it, the odds are much, much, much lower than any of the numbers suggested by the commenters at the Bay Guardian (1 in 8 billion from Chiu, 1 in 370 billion from frouglas, 1 in 600 billion from spike).
For comparison, the odds of O.J. Simpson being not guilty according to the DNA evidence were 1 in 170 million.
Comments:
Sweet. Do you have an issue with the message itself? I appreciate the creativity and departure from politics as usual.
