Help with Probability Equation - math

I'm putting together an app for fun, and I need to work out a probability equation for the following scenario:
Suppose I have a number of attempts at something, and each attempt has a success rate that is known ahead of time. After making all of those attempts, what are the odds that at least one success happens?
For example there are three attempts (all will be taken individually).
The first is known to have a 60% success rate.
The second is known to have a 30% success rate.
The third is known to have a 75% success rate.
What are the odds of a success occurring if all three attempts are made?
I've tried several formulas and can't pinpoint the correct one.
Thanks for the help!

Probability of winning is probability of not losing all three:
1 - (1 - 0.6)(1 - 0.3)(1 - 0.75)

1 - .4 * .7 * .25
That is, find the probability that all attempts fail, and take its complement. So in general, given a finite sequence of independent events with probabilities P[0], ..., P[n], the probability that at least one event is successful is 1 - (1 - P[0]) * (1 - P[1]) * ... * (1 - P[n])
And here's a Perl one-liner to compute the value (input is a whitespace-separated list of success rates):
perl -0777 -ane '$p=1; $p*=1-$_ foreach @F; print 1-$p . "\n"'
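The same complement rule can be sketched in Python. This is only an illustration and not part of the original answers: the name `at_least_one_success` is hypothetical, and the sketch assumes the attempts are independent and that rates are given as numbers between 0 and 1.

```python
import math

def at_least_one_success(rates):
    """Probability that at least one independent attempt succeeds.

    `rates` holds per-attempt success probabilities in [0, 1].
    (Hypothetical helper name, used only for this illustration.)
    """
    # Probability that every attempt fails, then take the complement.
    all_fail = math.prod(1.0 - p for p in rates)
    return 1.0 - all_fail

result = at_least_one_success([0.6, 0.3, 0.75])
print(round(result, 6))  # evaluates 1 - 0.4 * 0.7 * 0.25
```

`math.prod` needs Python 3.8 or later; on older versions a plain loop that multiplies the failure probabilities does the same job.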

Compute the chance of "all failures": the product of all the 1-pj, where pj is the jth chance of success. The probability of "at least 1 success" is 1 minus that product. (Probability computations that represent probabilities as anything but numbers between 0 and 1 are crazy, so if you absolutely need percentages as input or output, do your transformations at the start or end!)
Edit: here's some executable pseudocode -- i.e., Python -- with percentages as input and output, using your numbers (the original ones and the ones you changed in a comment):
$ cat proba.py
def totprob(*percents):
    totprob_failure = 1.0
    for pc in percents:
        prob_this_failure = 1.0 - pc/100.0
        totprob_failure *= prob_this_failure
    return 100.0 * (1.0 - totprob_failure)
$ python -c'import proba; print proba.totprob(60,30,75)'
93.0
$ python -c'import proba; print proba.totprob(2,30,75)'
82.85
$

Related

R Precision for Double - Why does the code return a negative value when a positive outcome is expected?

I am testing 2 ways of calculating Prod(b-a), where a and b are vectors of length n. Prod(b-a)=(b1-a1)(b2-a2)(b3-a3)...(bn-an), where b_i>a_i>0 for all i=1,2,3,...,n. For some special cases, another way (Method 2) of calculating this prod(b-a) is more efficient. It uses the following formula, which expands the terms and sums them:
Prod(b-a) = sum over all subsets S of {1,...,n} of (-1)^|S| * prod(b_i for i not in S) * prod(a_i for i in S)
My question is this: when a_i is very close to b_i, the true outcome can be very, very close to 0, something like 10^(-16). Method 1 (subtract and multiply) always returns a positive output. Method 2, using the formula, sometimes returns a negative output (about 7~8% of the time in my experiment). Mathematically, these 2 methods should return exactly the same output, but on a computer they apparently produce different outputs.
Here is my code to run the test. When I run the testing code 10000 times, about 7~8% of the runs for Method 2 return a negative output. According to the official documentation, the R double has a precision of "2.225074e-308", as indicated by the R parameter ".Machine$double.xmin". Why does it get into negative values when the differences are between 10^(-16) and 10^(-18)? Any help that sheds light on this will be appreciated. I would also love some suggestions on how to practically increase the precision to the higher level indicated by the R documentation.
########## Testing code 1.
ftest1case<-function(a,b) {
  n<-length(a)
  if (length(b)!=n) stop("--------- length a and b are not right.")
  if ( any(b<a) ) stop("---------- b has to be greater than a all the time.")
  out1<-prod(b-a)
  out2<-0
  N<-2^n
  for ( i in 1:N ) {
    tidx<-rev(as.integer(intToBits(x=i-1))[1:n])
    tsign<-ifelse( (sum(tidx)%%2)==0,1.0,-1.0)
    out2<-out2+tsign*prod(b[tidx==0])*prod(a[tidx==1])
  }
  c(out1,out2)
}
########## Testing code 2.
ftestManyCases<-function(N,printFreq=1000,smallNum=10^(-20))
{
  tt<-matrix(0,nrow=N,ncol=2)
  n<-12
  for ( i in 1:N) {
    a<-runif(n,0,1)
    b<-a+runif(n,0,1)*0.1
    tt[i,]<-ftest1case(a=a,b=b)
    if ( (i%%printFreq)==0 ) cat("----- i = ",i,"\n")
    if ( tt[i,2]< smallNum ) cat("------ i = ",i, " ---- Negative summation found.\n")
  }
  tout<-apply(tt,2,FUN=function(x) { round(sum(x<smallNum)/N,6) } )
  names(tout)<-c("PerLess0_Method1","PerLee0_Method2")
  list(summary=tout, data=tt)
}
######## Step 1. Test for 1 case.
n<-12
a<-runif(n,0,1)
b<-a+runif(n,0,1)*0.1
ftest1case(a=a,b=b)
######## Step 2 Test Code 2 for multiple cases.
N<-300
tt<-ftestManyCases(N=N,printFreq = 100)
tt[[1]]
It's hard for me to imagine when an algorithm that consists of generating 2^n permutations and adding them up is going to be more efficient than a straightforward product of differences, but I'll take your word for it that there are some special cases where it is.
As suggested in the comments, the root of your problem is the accumulation of floating-point errors when adding values of different magnitudes. This is a general property of floating-point arithmetic rather than anything specific to R.
First, a simplified example:
n <- 12
set.seed(1001)
a <- runif(n,0,1)
b <- a + 0.01
prod(b-a) ## 1e-24
out2 <- 0
N <- 2^n
out2v <- numeric(N)
for ( i in 1:N ) {
  tidx <- rev(as.integer(intToBits(x=i-1))[1:n])
  tsign <- ifelse( (sum(tidx)%%2)==0,1.0,-1.0)
  j <- as.logical(tidx)
  out2v[i] <- tsign*prod(b[!j])*prod(a[j])
}
sum(out2v) ## -2.011703e-21
Using extended precision (with 1000 bits of precision) to check that the simple/brute force calculation is more reliable:
library(Rmpfr)
a_m <- mpfr(a, 1000)
b_m <- mpfr(b, 1000)
prod(b_m-a_m)
## 1.00000000000000857647286522936696473705868726043995807429578968484409120647055193862325070279593735821154440625984047036486664599510856317884962563644275433171621778761377125514191564456600405460403870124263023336542598111475858881830547350667868450934867675523340703947491662460873009229537576817962228e-24
This proves the point in this case, but in general doing extended-precision arithmetic will probably kill any performance gains you would get.
Redoing the permutation-based calculation with mpfr values (using out2 <- mpfr(0, 1000), and going back to the out2 <- out2 + ... running summation rather than accumulating the values in a vector and calling sum()) gives an accurate answer (at least to the first 20 or so digits, I didn't check farther), but takes 6.5 seconds on my machine (instead of 0.03 seconds when using regular floating-point).
Why is this calculation problematic? First, note the difference between .Machine$double.xmin (approx 2e-308), which is the smallest positive normalized floating-point value that the system can store, and .Machine$double.eps (approx 2e-16), which is the smallest value x such that 1+x != 1, i.e. the smallest relative value that can be added without being lost entirely to rounding (values a little bit bigger than this magnitude will experience severe, but not total, loss of precision).
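The distinction between range and precision can be checked directly. The following is a small Python sketch rather than R; it assumes IEEE 754 doubles, which is what both R and CPython use on common platforms, and `sys.float_info.min` and `sys.float_info.epsilon` play the role of `.Machine$double.xmin` and `.Machine$double.eps`.

```python
import sys

tiny = sys.float_info.min      # smallest positive normalized double, about 2.2e-308
eps = sys.float_info.epsilon   # gap between 1.0 and the next double, about 2.2e-16

# Range: tiny values are representable on their own.
tiny_is_positive = tiny > 0.0

# Precision: next to 1.0, anything much smaller than eps is rounded away.
eps_survives = (1.0 + eps) > 1.0
small_is_lost = (1.0 + 1e-18) == 1.0

print(tiny, eps, tiny_is_positive, eps_survives, small_is_lost)
```

So a result near 1e-24 is perfectly storable, but it cannot survive as the difference of terms whose own magnitude is close to 1.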
Now look at the distribution of the values in out2v:
hist(out2v)
There are clusters of negative and positive numbers of similar magnitude. If our summation happens to add a bunch of values that almost cancel (so that the result is very close to 0), then add that to another value that is not nearly zero, we'll get bad cancellation.
It's entirely possible that there's a way to rearrange this calculation so that bad cancellation doesn't happen, but I couldn't think of one easily.
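As a cross-check of the argument above, here is a hedged Python sketch that swaps Rmpfr for exact rational arithmetic from the standard `fractions` module. The helper names `prod` and `expanded` are made up for this illustration, and the random inputs will not match the R values even though the seed number is the same. It shows that the two methods agree exactly when no rounding takes place, so any disagreement in floating point comes purely from rounding.

```python
from fractions import Fraction
from functools import reduce
from itertools import product
from operator import mul
import random

def prod(values):
    """Product of an iterable; works for floats and Fractions alike."""
    return reduce(mul, values, 1)

def expanded(a, b):
    """Method 2: sum the 2^n signed terms of the expanded product."""
    total = 0
    for picks in product((0, 1), repeat=len(a)):
        # pick == 1 takes a[i] (and flips the sign); pick == 0 takes b[i].
        term = prod(ai if pick else bi for pick, ai, bi in zip(picks, a, b))
        total += -term if sum(picks) % 2 else term
    return total

random.seed(1001)  # arbitrary seed, only for repeatability
n = 12
a = [random.random() for _ in range(n)]
b = [ai + 0.01 for ai in a]

direct_float = prod(bi - ai for ai, bi in zip(a, b))
expanded_float = expanded(a, b)

# Fractions represent the same doubles exactly, so no rounding occurs.
fa, fb = [Fraction(x) for x in a], [Fraction(x) for x in b]
direct_exact = prod(y - x for x, y in zip(fa, fb))
expanded_exact = expanded(fa, fb)

print("direct (float):   ", direct_float)
print("expanded (float): ", expanded_float)
print("exact:            ", float(direct_exact))
print("exact methods agree:", direct_exact == expanded_exact)
```

Like the mpfr version, the exact computation is much slower than plain doubles, so it is useful as a check rather than as a replacement.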

F#: integer (%) integer - How is it calculated?

So in my textbook there is this example of a recursive function in F#:
let rec gcd = function
    | (0,n) -> n
    | (m,n) -> gcd(n % m,m);;
My textbook demonstrates this function by executing:
gcd(36,116);;
Since m = 36 and not 0, it of course goes to the second clause, like this:
gcd(116 % 36,36)
gcd(8,36)
gcd(36 % 8,8)
gcd(4,8)
gcd(8 % 4,4)
gcd(0,4)
and now it hits the first clause, so the entire thing evaluates to 4.
What I don't get is this percent sign (%), or whatever the operator is called in this context. For instance, I don't get how
116 % 36 = 8
I have turned this over in my head so many times now, and I can't figure out how this can turn into 8.
I know this is probably a silly question for those of you who know this, but I would very much appreciate your help all the same.
% is a questionable version of modulo, which is the remainder of an integer division.
For positive operands, you can think of % as the remainder of the division. See, for example, Wikipedia on Euclidean division. Consider 9 % 4: 4 fits into 9 twice, but two times four is only eight. Thus, there is a remainder of one.
If there are negative operands, % effectively ignores the signs to calculate the remainder and then uses the sign of the dividend as the sign of the result. This corresponds to the remainder of an integer division that rounds to zero, i.e. -2 / 3 = 0.
This is a mathematically unusual definition of division and remainder that has some bad properties. Normally, when calculating modulo n, adding or subtracting n on the input has no effect. Not so for this operator: 2 % 3 is not equal to (2 - 3) % 3.
I usually have the following defined to get useful remainders when there are negative operands:
/// Euclidean remainder, the proper modulo operation
let inline (%!) a b = (a % b + b) % b
So far, this operator was valid for all cases I have encountered where a modulo was needed, while the raw % repeatedly wasn't. For example:
When filling rows and columns from a single index, you could calculate rowNumber = index / nCols and colNumber = index % nCols. But if index can be negative, this mapping becomes invalid, while Euclidean division and remainder remain valid.
If you want to normalize an angle to [0, 2pi), angle %! (2. * System.Math.PI) does the job, while the "normal" % might give you a headache.
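The difference between the two remainders can be sketched in Python. Note that Python's built-in % already takes the sign of the divisor, so the truncating behaviour described above for F# is rebuilt by hand here. Both function names are hypothetical, and `euclid_rem` mirrors the `(%!)` definition only for a positive divisor.

```python
def trunc_rem(a, b):
    """Remainder with the sign of the dividend, as described for F#'s %.

    Python's own % follows the sign of the divisor instead, so the
    truncating behaviour is rebuilt by hand (hypothetical helper).
    """
    q = abs(a) // abs(b)          # quotient magnitude
    if (a < 0) != (b < 0):
        q = -q                    # round the quotient toward zero
    return a - b * q

def euclid_rem(a, b):
    """Mirror of the answer's (%!) operator: (a % b + b) % b, for b > 0."""
    return trunc_rem(trunc_rem(a, b) + b, b)

print(trunc_rem(2, 3), trunc_rem(2 - 3, 3))    # the two results differ
print(euclid_rem(2, 3), euclid_rem(2 - 3, 3))  # the two results agree
```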
Because
116 / 36 = 3
116 - (3*36) = 8
Basically, the % operator, known as the modulo operator, divides one number by another and gives the remainder that is left once it can't divide any further. Usually, the first time you use it is to check whether a number is even or odd, by doing something like this in F#:
let firstUsageModulo = 55 % 2 = 0 // false, because the remainder is 1, not 0
The 8 in the first step means that 36 fits into 116 three whole times (3 * 36 = 108), which leaves a remainder of 116 - 108 = 8.
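To see every quotient and remainder in the textbook trace, here is a small Python sketch. The function name `gcd_trace` is made up for this illustration. For non-negative operands, Python's % and `divmod` give the same remainders as F#'s %.

```python
def gcd_trace(m, n):
    """Follow gcd(m, n) -> gcd(n % m, m) until m reaches 0, recording each call."""
    calls = [(m, n)]
    while m != 0:
        q, r = divmod(n, m)   # n = q * m + r, with 0 <= r < m for positive m
        m, n = r, m
        calls.append((m, n))
    return n, calls

result, calls = gcd_trace(36, 116)
for m, n in calls:
    print(f"gcd({m},{n})")
print("result:", result)
```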
Just to help you in the future with similar problems: in IDEs such as Xamarin Studio and Visual Studio, if you hover the mouse cursor over an operator such as % you should get a tooltip, thus:
[Image: modulo operator tooltip]
Even if you don't understand the tool tip directly, it'll give you something to google.

Understanding the probability of a double-six if i roll two dice

The probability of a double-six in one throw of two dice is 1/36 or 0.028.
If I threw a pair of dice a hundred times, would 3 (0.028 * 100) be
1. The number of times (3) I would get a double-six
OR
2. The probability (3%) of getting a double-six on all throws.
I have a feeling the correct answer is number 1, because intuitively the chance of getting a double six every time on a hundred throws seems to be a lot lower than 3%.
Please explain, as simply as you can, which is the correct understanding and why.
The probability of not having a double six in one throw (all but one outcome divided by all outcomes):
35/36
The probability of not having a double six in N throws:
(35/36)**N /* where ** is raising to the N-th power */
The probability of having at least one double six in N throws
P(N) = 1 - (35/36)**N
if N == 100 we have
P(100) == 0.94022021...
Option 1 is nearly right, but with a twist in the interpretation: 2.8 is the average number of double sixes per series if you were to repeat the experiment of 100 throws many times. The correct answer for option 2 was given by Dmitry.
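The three quantities discussed in this thread can be computed side by side. This is a minimal Python sketch with illustrative variable names, assuming fair dice and independent throws.

```python
p_double_six = 1 / 36
throws = 100

# Chance of at least one double six in 100 throws (complement of "never").
p_at_least_one = 1 - (1 - p_double_six) ** throws

# Chance that all 100 throws are double sixes.
p_every_throw = p_double_six ** throws

# Average number of double sixes per series of 100 throws.
expected_count = throws * p_double_six

print(p_at_least_one, p_every_throw, expected_count)
```

The expected count is about 2.8 and the chance of at least one double six is about 94%, while the chance of a double six on every single throw is vanishingly small.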
Please ask math-oriented questions in the math forum math.stackexchange.

How to generate random arithmetic expressions for game

I would like to know if you can help me with this problem for my game. I'm currently using lots of switch, if-else, etc. in my code and I'm not liking it at all.
I would like to generate 2 random arithmetic expressions, each having one of the forms below:
1) number
e.g.: 19
2) number operation number
e.g.: 22 * 4
3) (number operation number) operation number
e.g.: (10 * 4) / 5
4) ((number operation number) operation number) operation number
e.g.: ((25 * 2) / 10) - 2
Once I have the 2 arithmetic expressions, the game consists of comparing them and determining which is larger.
I would like to know how I can randomly choose the numbers and operations for each arithmetic expression so that the result is an integer (not a float) and so that both expressions have results that are as close as possible. The individual numbers shouldn't be higher than 30.
I mean, I wouldn't like one result to be 1000 and the other 14, because it would probably be too easy to spot which side is larger, so they should be like:
expression 1: ((25 + 15) / 10) * 4 (which is 16)
expression 2: (( 7 * 2) + 10) / 8 (which is 3)
The results (16 and 3) are integers and close enough to each other.
The possible operations are +, -, * and /.
It should be possible to match two expressions with different forms, like
(( 7 * 2) + 10) / 8
and
(18 / 3) * 2
I really appreciate all the help that you can give me.
Thanks in advance!!
Best regards.
I think a reasonable way to approach this is to start with a value for the total and recursively construct a random expression tree to reach that total. You can choose how many operators you want in each equation and ensure that all values are integers. Plus, you can choose how close you want the values of two equations, even making them equal if you wish. I'll use your expression 1 above as an example.
((25 + 15) / 10) * 4 = 16
We start with the total 16 and make that the root of our tree:
16
To expand a node (leaf), we select an operator and set that as the value of the node, and create two children containing the operands. In this case, we choose multiplication as our operator.
Multiplication is the only operator that will really give us trouble in trying to keep all of the operands integers. We can satisfy this constraint by constructing a table of divisors of integers in our range [1..30] (or maybe a bit more, as we'll see below). In this case our table would have told us that the divisors of 16 are {2,4,8}. (If the list of divisors for our current value is empty, we can choose a different operator, or a different leaf altogether.)
We choose a random divisor, say 4 and set that as the right child of our node. The left child is obviously value/right, also an integer.
  *
 / \
4   4
Now we need to select another leaf to expand. We can randomly choose a leaf, randomly walk the tree until we reach a leaf, randomly walk up and right from our current child node (left) until we reach a leaf, or whatever.
In this case our selection algorithm chooses to expand the left child and the division operator. In the case of division, we generate a random number for the right child (in this case 10), and set left to value*right. (Order is important here! Not so for multiplication.)
    *
   / \
  ÷   4
 / \
40  10
This demonstrates why I said that the divisor table might need to go beyond our stated range as some of the intermediate values may be a bit larger than 30. You can tweak your code to avoid this, or make sure that large values are further expanded before reaching the final equation.
In the example we do this by selecting the leftmost child to expand with the addition operator. In this case, we can simply select a random integer in the range [1..value-1] for the right child and value-right for the left.
      *
     / \
    ÷   4
   / \
  +   10
 / \
25  15
You can repeat for as many operations as you want. To reconstruct the final equation, you simply need to perform an in-order traversal of the tree. To parenthesize as in your examples, you would place parentheses around the entire equation when leaving any interior (operator) node during the traversal, except for the root.
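A rough Python sketch of this top-down construction follows. It is one possible reading of the approach, not a definitive implementation. The names `proper_divisors` and `build_expression` are made up, and the operand bounds are arbitrary. It always expands the left child, which gives the left-nested forms from the question, and it does not enforce the limit of 30 on intermediate values.

```python
import random

def proper_divisors(value):
    """Divisors of value other than 1 and value itself."""
    return [d for d in range(2, value) if value % d == 0]

def build_expression(value, operators, rng):
    """Return a string with `operators` operators that evaluates to `value`.

    Assumes value is a positive integer. Only the left child is ever
    expanded, which yields the left-nested forms from the question.
    """
    if operators == 0:
        return str(value)
    choices = ["-", "/"]                 # always possible for positive values
    if value >= 2:
        choices.append("+")
    if proper_divisors(value):
        choices.append("*")
    op = rng.choice(choices)
    if op == "+":
        right = rng.randint(1, value - 1)
        left = value - right
    elif op == "-":
        right = rng.randint(1, 10)       # arbitrary bound for this sketch
        left = value + right
    elif op == "*":
        right = rng.choice(proper_divisors(value))
        left = value // right
    else:                                # division: order matters
        right = rng.randint(2, 5)        # arbitrary bound for this sketch
        left = value * right
    inner = build_expression(left, operators - 1, rng)
    if operators > 1:
        inner = f"({inner})"             # parenthesize interior operator nodes
    return f"{inner} {op} {right}"

rng = random.Random(7)                   # arbitrary seed
for target in (16, 3):
    print(build_expression(target, 3, rng), "=", target)
```

Because the construction starts from the total, two expressions with close results can be produced by picking two nearby targets first and building an expression for each.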

How can I modify this (simple) equation to produce my desired result?

I have a database of 817 items, each given a "rank" of 1 to 817 (the smaller the number, the "better" the item). This rank is based off of many factors that indicate quality.
Now, I need to assign a "value" to these items, with the item at rank 1 being valued the most, and the value decreasing with rank (non-linear).
The easiest first attempt was to simply choose an arbitrary base (100,000) and divide by the rank:
$value = 100000 / $rank;
/**
* Rank : Value
* 1 : 100,000
* 2 : 50,000
* 3 : 33,333
* etc.
*/
This produces a very steep decay (strictly speaking hyperbolic, 1/rank, rather than exponential), as shown by the red line in this image:
However, I wish to value these items in a manner that looks more like the blue line above. How can I change my formula to achieve this?
Try 1/sqrt(x) (i.e., pow(x, -1/2)) for starters. If that still doesn't decay slowly enough, try a smaller fractional power.
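To compare the candidates numerically, here is a short Python sketch. The function name `value_power` and the sample ranks are chosen only for illustration. The base of 100,000 and the item count of 817 come from the question.

```python
TOTAL_ITEMS = 817
BASE = 100000

def value_power(rank, exponent=0.5, base=BASE):
    """base / rank**exponent; exponent=1 reproduces the original formula."""
    return base / rank ** exponent

# Columns: rank, original 1/rank value, slower 1/sqrt(rank) value.
for rank in (1, 2, 3, 100, TOTAL_ITEMS):
    original = value_power(rank, exponent=1)
    slower = value_power(rank, exponent=0.5)
    print(f"{rank:>4}  {original:>10.0f}  {slower:>10.0f}")
```

Lowering `exponent` toward 0 flattens the curve further, so it can be tuned until the shape resembles the desired line.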
Why don't you go with linear?
value = n - rank
where n is the count of your items, i.e. 817.
I haven't tried it, but you could use an exponent with base 2 instead of dividing 100,000 by the rank.
UPDATES
value = pow(2, n - rank)
