Calculating odds of winning?

I'm trying to calculate the odds of someone winning. For example, if there are 1,000 tickets and 4 people buy 250 tickets each, that should mean they each have a 25% chance.
I checked http://www.calculatorsoup.com/calculators/games/odds.php but it says a 20% chance...
I'm not sure if I'm being stupid here or not, and I'm unsure how to do this. So far I've done the following:
p = 250 / (1000 + 250) * 100
but it gives me 20...

Do not think too much, the answer is obvious here:
P = 250 / 1000
Hence the probability to win is 25%.
The only case where you should be using the formula P = A / (A + B) is the following:
You have A red balls and B blue balls and you want to know the probability of picking a red one amongst all of them. Then the probability is P = A / (A + B).

Your A and B values are 750 and 250 respectively, which gives you 25%.
Reason: if the winning ticket is among the 250 tickets you bought, then you will win; this is why B = 250. But if it is among the other 750, you won't win; so your A is 750. Then 250 / (250 + 750) = 1/4 = 0.25.

The link you provided is an odds calculator for sports betting, but your question is about probability.
You are correct in answering that the chance of winning for each of the 4 people is 25%. Each person has a 250/1000 or 1/4 chance of winning.

but it says 20% chance...
I assume you're entering 1000 and 250 in that site's input fields? Note how the input is defined: it's asking for A:B odds, which means the total is both numbers combined and the B value is the number of chances to win.
Consider the default example on that page: 3:2 odds of winning. That is, out of a total of 5, 3 will lose and 2 will win, so you have a 40% chance of winning.
So if you put 1000 and 250 into the inputs, then out of a total of 1250 there will be 250 winners, so any given player has a 20% chance of winning.
p = 250 / (1000 + 250) * 100 but it gives me 20...
That's correct.
If there is a total of 1000 and there will be 250 winners, then it would be 750:250 odds, or 3:1 odds, which is 25%. To verbalize it, you might say something like:
In this contest, any given player has 250 chances to win and 750 chances to lose. So their odds are 3:1.
If the tickets were distributed more oddly, each person would have individual odds which may not reduce to smaller numbers. For example, if someone bought 3 tickets, their odds would be 997:3.
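To make the odds-versus-probability distinction concrete, here is a minimal R sketch (the function names are just for illustration) that converts between the two forms:
prob_win <- function(mine, total) mine / total                 # probability of winning
odds_against <- function(mine, total) c(lose = total - mine, win = mine)
odds_to_prob <- function(lose, win) win / (lose + win)         # A:B odds back to a probability
prob_win(250, 1000)        # 0.25 -> a 25% chance
odds_against(250, 1000)    # 750:250, i.e. 3:1 against
odds_to_prob(750, 250)     # 0.25 -> the same 25%
odds_to_prob(1000, 250)    # 0.20 -> the 20% the calculator reports if you enter 1000:250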

Related

Distribution of money following different beta-distributions

I am trying to find a methodology (or even better, the code) to do the following in Netlogo. Any help is appreciated (I could always try to rewrite the code from R or Matlab to Netlogo):
I have $5000, which I want to distribute following different beta distributions among 10 000 actors. The maximum amount an actor may receive is $1.
Basically, I am looking for a way to assign random numbers in the [0,1] interval to the 10 000 actors, following different beta distributions, where the mean of the distributed values remains equal to 0.5. This way the purchasing power of the population (10 000 actors with a mean of 0.5 is $5000) remains equal for beta(1,1) (a uniform population) as well as, for example, beta(4,1) (a rich population).
An example with 5 actors distributing $2.50:
beta(1,1): 0.5 - 0.5 - 0.5 - 0.5 - 0.5 (mean 0.5)
beta(4,1): 0.1 - 0.2 - 0.5 - 0.7 - 1.0 (mean 0.5)
I've been thinking. If there is no apparent solution to this, maybe the following could work. We can write the shape of the frequency distribution of beta(4,1) as y = ax^2 + b for some values of a and b (both increasing exponentially).
In my case, the integral over (0,1) of y = ax^2 + b should be 5000. Playing around with values for a and b should give me the shape of beta(4,1).
The number of actors having 0.1 should then be the integral over (0, 0.1) of y = ax^2 + b, where a and b are the parameters of the shape resembling beta(4,1).
Is my reasoning clear enough? Could someone extend this reasoning? Is there a link between the beta distribution and a function of a,b,x ?
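On that last question: yes, there is a direct link. The Beta(a, b) density is f(x) = x^(a-1) * (1-x)^(b-1) / B(a, b), so beta(4,1) is simply f(x) = 4x^3 on [0,1] (a cubic, rather than y = ax^2 + b). A minimal R sketch illustrating this, plus one possible way (an assumption on my part, not the only approach) to draw actor values and rescale them toward a 0.5 mean:
x <- seq(0, 1, by = 0.01)
all.equal(dbeta(x, 4, 1), 4 * x^3)     # TRUE: the beta(4,1) density is the polynomial 4x^3

n_actors <- 10000
draws <- rbeta(n_actors, 4, 1)         # raw beta(4,1) draws, mean around 0.8
scaled <- draws * 0.5 / mean(draws)    # crude rescaling toward mean 0.5 (stays in [0,1]
                                       # only because the raw mean is above 0.5)
mean(scaled)                           # ~0.5, so total purchasing power is ~$5000
range(scaled)                          # no actor exceeds the $1 cap in this case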

Calculate original set size after hash collisions have occurred

You have an empty ice cube tray which has n little ice cube buckets, forming a natural hash space that's easy to visualize.
Your friend has k pennies which he likes to put in ice cube trays. He uses a random number generator repeatedly to choose which bucket to put each penny. If the bucket determined by the random number is already occupied by a penny, he throws the penny away and it is never seen again.
Say your ice cube tray has 100 buckets (i.e., it would make 100 ice cubes). If you notice that your tray has c = 80 pennies, what is the most likely number of pennies (k) that your friend had to start out with?
If c is low, the odds of a collision are low enough that the most likely value of k is c itself. E.g. if c = 3, then it's most likely that k was 3. However, collisions become increasingly likely as k grows; after, say, k = 14 the odds are that there has been about 1 collision, so perhaps k = 15 is maximally likely if c = 14.
Of course if n == c then there would be no way of knowing, so let's set that aside and assume c < n.
What's the general formula for estimating k given n and c (given c < n)?
The problem as it stands is ill-posed.
Let n be the number of buckets.
Let X be the random variable for the number of pennies your friend started with.
Let Y be the random variable for the number of filled trays.
What you are asking for is the mode of the distribution P(X|Y=c).
(Or maybe the expectation E[X|Y=c] depending on how you interpret your question.)
Let's take a really simple case: the distribution P(X|Y=1). Then
P(X=k|Y=1) = (P(Y=1|X=k) * P(X=k)) / P(Y=1)
= (1/n^(k-1) * P(X=k)) / P(Y=1)
since all k pennies must have landed in the same bucket, which happens with probability n * (1/n)^k = 1/n^(k-1).
Since P(Y=1) is a normalizing constant, we can say P(X=k|Y=1) is proportional to 1/n^(k-1) * P(X=k).
But P(X=k) is a prior probability distribution. You have to assume some probability distribution on the number of coins your friend has to start with.
For example, here are two priors I could choose:
My prior belief is that P(X=k) = 1/2^k for k > 0.
My prior belief is that P(X=k) = 1/2^(k-100) for k > 100.
Both would be valid priors; the second assumes that X > 100. Both would give wildly different estimates for X: prior 1 would estimate X to be around 1 or 2; prior 2 would estimate X to be 100.
I would suggest that, if you continue to pursue this question, you just go ahead and pick a prior. Something like this would work nicely: WolframAlpha. That's a geometric distribution with support k > 0 and mean 10^4.
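As an illustration of that recipe (a sketch with assumed numbers, not part of the original answer): pick a prior on k, estimate the likelihood P(Y = c | X = k) by simulating penny throws, and take the posterior mode.
set.seed(1)
n <- 100            # buckets in the tray
c_obs <- 80         # filled buckets observed
ks <- c_obs:400     # candidate values of k (the range is an assumption)

# Monte Carlo estimate of P(Y = c_obs | X = k): throw k pennies into n buckets,
# count how often exactly c_obs distinct buckets end up occupied
likelihood <- sapply(ks, function(k) {
  mean(replicate(2000, length(unique(sample.int(n, k, replace = TRUE))) == c_obs))
})

prior <- dgeom(ks - 1, prob = 1/150)   # assumed geometric prior on k with mean 150
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)
ks[which.max(posterior)]               # posterior mode: the most likely starting k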

sample size for A/B fisher test significance

Given the results for a simple A / B test...
          A     B
clicked   8    60
ignored 192  1940
(i.e. a conversion rate of 4% for A and 3% for B)
... a fisher test in R quite rightly says there's no significant difference
> fisher.test(data.frame(A=c(8,192), B=c(60,1940)))
...
p-value = 0.3933
...
But what function is available in R to tell me how much I need to increase my sample size to get to a p-value of say 0.05?
I could just increase the A values (in their proportion) until I get to it but there's got to be a better way? Perhaps pwr.2p2n.test [1] is somehow usable?
[1] http://rss.acs.unt.edu/Rdoc/library/pwr/html/pwr.2p2n.test.html
power.prop.test() should do this for you. In order to get the math to work I converted your 'ignored' data to impressions by summing up your columns.
> power.prop.test(p1=8/200, p2=60/2000, power=0.8, sig.level=0.05)
Two-sample comparison of proportions power calculation
n = 5300.739
p1 = 0.04
p2 = 0.03
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number in *each* group
That gives 5301, which is for each group, so your sample size needs to be 10600. Subtracting out the 2200 that have already run, you have 8400 "tests" to go.
In this case:
sig.level is the p-value threshold you want the test to reach (0.05 here).
power is the likelihood of finding significant results that exist within your sample. This is somewhat arbitrary, 80% is a common choice. Note that choosing 80% means that 20% of the time you won't find significance when you should. Increasing the power means you'll need a larger sample size to reach your desired significance level.
If you want to estimate how much longer it will take to reach significance, divide 8400 by the number of impressions per day. That can help determine whether it's worthwhile to continue the test.
You can also use this function to determine required sample size before testing begins. There's a nice write-up describing this on the 37 Signals blog.
power.prop.test() is a base R function, so you won't need to install or load any packages. Other than that, I can't say how similar it is to pwr.2p2n.test().
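If you would rather stay within the pwr package, the equivalent calculation should look roughly like this (a sketch, assuming pwr is installed; ES.h() converts the two proportions into the effect size the package works with):
library(pwr)
h <- ES.h(8/200, 60/2000)                            # effect size for p1 = 0.04 vs p2 = 0.03
pwr.2p.test(h = h, sig.level = 0.05, power = 0.8)    # solves for n per group (equal group sizes)
pwr.2p2n.test() is the variant for unequal group sizes; both should land in the same ballpark as the power.prop.test() result above.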

Compound interest but with a twist: "compound tax"

Let's say that I have a diminishing value that should be portrayed both on a monthly basis and on a weekly basis.
For example, I know that the value, say 100 000, diminishes by 30%/year, which by the normal "periodic compound" formulas works out to 2.21%/month and 0.51%/week.
However, looking at the results of these calculations over an entire year, I do not get the same end value. Only if I treat it as "interest" (i.e. the percentage is ADDED to the value, NOT taken away) do I get matching values from the weekly and monthly calculations.
What is the correct formula for calculating this "compound taxation" problem?
I don't know if I fully understand your question.
You cannot calculate diminishing interest the way you did.
If your value (100 000) diminishes by 30%/year, this means that at the end of year 1 your value is 70 000.
The way you calculated your compounding would only work if "diminishing by 30%" meant dividing by 1.3 (100000/1.3 ≈ 76 923), rather than multiplying by 0.7.
Your mistake:
You made your calculation this way:
(1+x)^12 - 1 = 30%, so x = 0.0221: the monthly interest is 2.21%
(1+x)^52 - 1 = 30%, so x = 0.0051: the weekly interest is 0.51%
But what you should have done is:
(1-x)^12 = 1 - 30%, so x = 0.0292: the monthly rate is 2.92%
(1-x)^52 = 1 - 30%, so x = 0.0068: the weekly rate is 0.68%
You cannot calculate the compound interest as if the value were increasing by 30% when it is decreasing by 30%.
It's easy to see why the equivalent periodic rate for an increasing value is smaller than the one for a decreasing value.
Example:
Let's say your investment makes 30% per year.
At the end of the first month you have more money invested, so you need a smaller percentage return in later months to make as much money as you did in the first month.
Therefore, for an increasing value, the equivalent monthly rate i = 2.21% is smaller than 30/12 = 2.5%.
The same reasoning, reversed, applies to a decreasing value: i = 2.92% > 30/12 = 2.5%.
Note:
(1+x)^12 - 1 = 30% is not equivalent to (1-x)^12 = 1 - 30%.
A decrease cannot be treated as the mirror image of an increase:
following your approach, adding 10% to 1 and then undoing that increase would return 1, because undoing means dividing: (1+10%)/(1+10%) = 1.
But adding 10% and then subtracting 10% does not return 1: (1+10%)*(1-10%) = 0.99.
Hope I understood your question and that this helps.
Engaging psychic debugging...
diminishes by 30%/year, which by the normal "periodic compound" formulas works out to 2.21%/month and 0.51%/week.
You are doing an inappropriate calculation.
You are correct in saying that 30% annual growth is approximately 2.21% monthly growth. This is because 30% annual growth means multiplication by 1.30 (since 100% + 30% = 130%, or 1.30), and making this monthly gives:
1.30 ^ (1/12) = 1.0221 (approx)
However, it does not follow from this that 30% annual shrinkage is approx 2.21% monthly shrinkage. To work out the monthly shrinkage we must note that 30% shrinkage is multiplication by 0.70 (since 100% - 30% = 70%, or 0.70), and make this monthly in the same way:
0.70 ^ (1/12) = 0.9707 (approx)
Multiplication by 0.9707 is monthly shrinkage of 2.929% (approx).
Hopefully this will give you the tools you need to correct your calculations.
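A quick R check of the arithmetic above (verification only, using the same numbers):
1.30^(1/12) - 1             # 0.0221  -> 2.21% monthly growth for +30%/year
1 - 0.70^(1/12)             # 0.0293  -> 2.93% monthly shrinkage for -30%/year
1 - 0.70^(1/52)             # 0.0068  -> 0.68% weekly shrinkage for -30%/year
100000 * (1 - 0.0221)^12    # ~76,500: treating 2.21% as the shrinkage rate does NOT give 70,000
100000 * (1 - 0.0293)^12    # ~70,000: the corrected monthly rate reproduces the 30% annual drop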

How should I order these "helpful" scores?

Under the user generated posts on my site, I have an Amazon-like rating system:
Was this review helpful to you: Yes | No
If there are votes, I display the results above that line like so:
5 of 8 people found this reply helpful.
I would like to sort the posts based upon these rankings. If you were ranking from most helpful to least helpful, how would you order the following posts?
a) 1/1 = 100% helpful
b) 2/2 = 100% helpful
c) 999/1000 = 99.9% helpful
d) 3/4 = 75% helpful
e) 299/400 = 74.8% helpful
Clearly, it's not right to sort just on the percent helpful; somehow the total votes should be factored in. Is there a standard way of doing this?
UPDATE:
Using Charles' formulas to calculate the Agresti-Coull lower range and sorting on it, this is how the above examples would sort:
1) 999/1000 (99.9%) = 95% likely to fall in 'helpfulness' range of 99.2% to 100%
2) 299/400 (74.8%) = 95% likely to fall in 'helpfulness' range of 69.6% to 79.3%
3) 3/4 (75%) = 95% likely to fall in 'helpfulness' range of 24.7% to 97.5%
4) 2/2 (100%) = 95% likely to fall in 'helpfulness' range of 23.7% to 100%
5) 1/1 (100%) = 95% likely to fall in 'helpfulness' range of 13.3% to 100%
Intuitively, this feels right.
UPDATE 2:
From an application point of view, I don't want to be running these calculations every time I pull up a list of posts. I'm thinking I'll update and store the Agresti-Coull lower bound either on a regular, cron-driven schedule (updating only those posts which have received a vote since the last run) or whenever a new vote is received.
For each post, generate bounds on how helpful you expect it to be. I prefer to use the Agresti-Coull interval. Pseudocode:
float AgrestiCoullLower(int n, int k) {
    //float conf = 0.05; // 95% confidence interval
    float kappa = 2.24140273; // In general, kappa = ierfc(conf/2)*sqrt(2)
    float kest = k + kappa^2/2;
    float nest = n + kappa^2;
    float pest = kest/nest;
    float radius = kappa*sqrt(pest*(1-pest)/nest);
    return max(0, pest-radius); // Lower bound
    // Upper bound is min(1, pest+radius)
}
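For anyone who wants to run this directly, here is a rough R translation of the pseudocode above (a sketch; the function name is mine, and kappa = qnorm(1 - conf/4) matches the ierfc(conf/2)*sqrt(2) comment and the constants quoted in this answer):
agresti_coull_lower <- function(n, k, conf = 0.05) {
  kappa <- qnorm(1 - conf/4)     # 2.2414 for conf = 0.05
  kest <- k + kappa^2/2
  nest <- n + kappa^2
  pest <- kest/nest
  radius <- kappa * sqrt(pest * (1 - pest) / nest)
  max(0, pest - radius)          # lower bound; the upper bound is min(1, pest + radius)
}
agresti_coull_lower(1000, 999)   # ~0.992, matching the 99.2% lower bound above
agresti_coull_lower(2, 2)        # ~0.237, matching the 23.7% lower bound above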
Then take the lower end of the estimate and sort on this. So the 2/2 is (by Agresti-Coull) 95% likely to fall in the 'helpfulness' range 23.7% to 100%, so it sorts below the 999/1000 which has range 99.2% to 100% (since .237 < .992).
Edit: Since some people seem to have found this helpful (ha ha), let me note that the algorithm can be tweaked based on how confident/risk-averse you want to be. The less confidence you need, the more willing you will be to abandon the 'proven' (high-vote) reviews for the untested but high-scoring reviews. A 90% confidence interval gives kappa = 1.95996398, an 85% confidence interval gives 1.78046434, a 75% confidence interval gives 1.53412054, and the all-caution-to-the-wind 50% confidence interval gives 1.15034938.
The 50% confidence interval gives
1) 999/1000 (99.7%) = 50% likely to fall in 'helpfulness' range of 99.7% to 100%
2) 299/400 (72.2%) = 50% likely to fall in 'helpfulness' range of 72.2% to 77.2%
3) 2/2 (54.9%) = 50% likely to fall in 'helpfulness' range of 54.9% to 100%
4) 3/4 (45.7%) = 50% likely to fall in 'helpfulness' range of 45.7% to 91.9%
5) 1/1 (37.5%) = 50% likely to fall in 'helpfulness' range of 37.5% to 100%
which isn't that different overall, but it does prefer the 2/2 to the safety of the 3/4.
This question is probably better asked on http://stats.stackexchange.com .
I guess you still want to order by increasing 'helpfulness'.
If you want to know how precise a given number is, the simplest way is to use the square root of the variance of the Binomial distribution with n equal to the total number of responses and p the fraction of responses which were 'helpful'.
A very simple solution would be to ignore everything with less than a cut-off amount of votes, and then sort by percentage.
For example (require at least five votes)
1. 99.9% (1000 votes)
2. 74.8% (400 votes)
3-5. waiting for five votes
It depends on the expected rate of positive feedback and the number of people that vote on average.
If, as in the example you give, you will sometimes have 5 or 10 people voting and other times 1,000, then I would suggest the Wilson midpoint:
(x + z^2/2) / (n + z^2), the midpoint of the Adjusted Wald interval / Wilson score,
where:
n = Sum(all_votes),
x = Sum(positive_votes),
z = 1.96 (a fixed value, the normal quantile for 95% confidence)
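A minimal R version of that score for sorting (a sketch; the function name is illustrative):
wilson_midpoint <- function(positive, total, z = 1.96) {
  (positive + z^2/2) / (total + z^2)
}
wilson_midpoint(c(1, 2, 999, 3, 299), c(1, 2, 1000, 4, 400))   # score each example post
# Sort posts by this score in descending order to rank from most to least helpful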

Resources