What is the logic behind multiplication in probability? - math

Suppose two coins are tossed; find the probability of getting two heads.
The probability of getting one head is 1/2 and the probability of getting
another head is 1/2, so we multiply 1/2 * 1/2. My question is: why do we multiply?
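One way to see where the product comes from is to list every equally likely outcome of the two tosses and count. The Python sketch below does that; it assumes fair, independent coins, and the variable names are only illustrative.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two fair coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Exactly one of the four outcomes is two heads.
favourable = [o for o in outcomes if o == ("H", "H")]
p_two_heads = Fraction(len(favourable), len(outcomes))

# Each of the 2 results of the first coin pairs with each of the 2 results
# of the second coin, giving 2 * 2 = 4 outcomes, so the fraction that is
# "head and head" is (1/2) * (1/2).
p_head = Fraction(1, 2)
print(p_two_heads, p_head * p_head)  # prints: 1/4 1/4
```

Counting pairs is what the multiplication does: half of all outcomes start with a head, and half of those also end with a head.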

Related

Conditional Binomial - Does sample space change given previous results?

I am brushing up on some probability using R - and came across the following question:
It is estimated that approximately 20% of marketing calls result in a sale. What is the probability that the last 4 marketing calls made on a day (where 12 calls are made) are the only ones to result in a sale?
My initial thought was that the calls are independent, so the number of sales follows a binomial distribution and I can use dbinom(4,12,.2). However, after looking at it further, I'm not sure whether the sample space shrinks to just the last 4 calls, i.e. dbinom(4,4,.2).
Also, my understanding is that the binomial PMF,
$P(Z = z) = \binom{n}{z} p^z (1-p)^{n-z}$,
which I believe R replicates in the dbinom function, gives the probability of any 4 successes, not specifically the last 4 calls.
Is it a simple case of removing the "n choose z" factor from the PMF? Is there an equivalent function in R?
I haven't looked at probability in a while, so I appreciate any assistance!
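To make the distinction in the question concrete, here is a small Python sketch (Python rather than R, with illustrative variable names). It compares the binomial PMF for any 4 sales out of 12 with the probability of one specific ordering: 8 calls without a sale followed by 4 sales. It assumes the calls are independent and each has the same 20% chance of a sale.

```python
import math

n, z, p = 12, 4, 0.2

# Binomial PMF: any 4 of the 12 calls are sales
# (the quantity dbinom(4, 12, .2) is meant to give).
any_four = math.comb(n, z) * p**z * (1 - p)**(n - z)

# One specific ordering: the first 8 calls fail, the last 4 succeed.
# This is the PMF without the "n choose z" counting factor.
last_four_only = (1 - p)**(n - z) * p**z

print(any_four, last_four_only)
```

Under those assumptions the two values differ by exactly the factor choose(12, 4) = 495, which counts the possible positions of the 4 sales.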

Probability of selecting exactly n elements

I have about 100 000 event probabilities stored in a vector.
I want to know whether it is possible to calculate the probability that exactly n of the events occur (e.g. what is the probability that exactly 1000 events occur).
I managed to calculate several probabilities in R :
p is the vector containing all the probabilities
probability of none : prod(1-p)
probability of at least one : 1 - prod(1-p)
I found how to calculate the probability of exactly one event :
sum(p * (prod(1-p) / (1-p)))
But I don't know how to generate a formula for n events.
I do not know R, but I know how I would solve this with programming.
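This is a straightforward dynamic programming problem. We start with a vector v = [1.0] of probabilities, where v[k] is the probability that exactly k of the events processed so far have occurred. Then in untested Python:
    for p_i in probabilities:
        # Zero occurrences so far: event i must not occur either.
        next_v = [(1 - p_i) * v[0]]
        v.append(0.0)
        for j in range(len(v) - 1):
            # j+1 occurrences: j before and event i occurs,
            # or j+1 before and event i does not occur.
            next_v.append(v[j] * p_i + v[j + 1] * (1 - p_i))
        # Renormalize to limit roundoff errors
        total = sum(next_v)
        for j in range(len(next_v)):
            next_v[j] /= total
        v = next_v
And now the answer can be read off the vector: v[n] is the probability that exactly n events occur.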
This approach is equivalent to calculating Pascal's triangle row by row, throwing away the old row when you're done.
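As a sanity check on the row-by-row idea, the sketch below wraps the recurrence in a function and compares its first two entries with the closed forms from the question (probability of none, probability of exactly one). The function name `exact_count_distribution` and the four sample probabilities are made up for illustration, and the events are assumed independent.

```python
import math

def exact_count_distribution(probabilities):
    """Return v where v[k] = P(exactly k of the independent events occur)."""
    v = [1.0]  # with no events processed, zero occurrences is certain
    for p_i in probabilities:
        next_v = [0.0] * (len(v) + 1)
        for k, mass in enumerate(v):
            next_v[k] += mass * (1 - p_i)  # event i does not occur
            next_v[k + 1] += mass * p_i    # event i occurs
        v = next_v
    return v

p = [0.1, 0.25, 0.5, 0.05]  # hypothetical example values
v = exact_count_distribution(p)

# Closed forms from the question, for comparison.
none = math.prod(1 - q for q in p)
exactly_one = sum(q * none / (1 - q) for q in p)
print(v[0], none)
print(v[1], exactly_one)
```

The double loop is quadratic in the number of events, so for roughly 100 000 probabilities a vectorized version would likely be needed in practice.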

How to compute in R the probability of at most one success in 10 Bernoulli trials with P=0.1

Compute the probability of at most one success in 10 Bernoulli trials with P=0.1. I've tried it myself:
pbinom(0,10,.1)
I don't know whether it is right, as I don't know if "at most" here means the same as "at least".
Having at most one success means that you can have either zero success or one success.
If X is the number of successes, then the probability of having at most 1 success is p(X<=1)=p(X=0)+p(X=1).
In R, the function giving the probability p(X=x) is dbinom (R calls it the density function).
So you want dbinom(0,10,0.1)+dbinom(1,10,0.1), which is the same as pbinom(1,10,0.1); pbinom(0,10,.1) gives only p(X=0).
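To show how "at most one" differs from "at least one", here is a short Python sketch that evaluates both from the binomial PMF. The helper name `binom_pmf` is illustrative; it is a hand-written stand-in for what dbinom computes, not R's implementation.

```python
import math

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.1

# At most one success: zero successes or one success.
at_most_one = binom_pmf(0, n, p) + binom_pmf(1, n, p)

# At least one success: everything except zero successes.
at_least_one = 1 - binom_pmf(0, n, p)

print(at_most_one, at_least_one)
```

The two numbers come out to roughly 0.736 and 0.651, so the two phrases do not describe the same event.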

Error probability function

I have DNA amplicons with base mismatches which can arise during the PCR amplification process. My interest is, what is the probability that a sequence contains errors, given the error rate per base, number of mismatches and the number of bases in the amplicon.
I came across an article [Cummings, S. M. et al (2010). Solutions for PCR, cloning and sequencing errors in population genetic analysis. Conservation Genetics, 11(3), 1095–1097. doi:10.1007/s10592-009-9864-6]
that proposes this formula to calculate the probability mass function in such cases.
I implemented the formula with R as shown here
pcr.prob <- function(k,N,eps){
  v = numeric(k)
  for(i in 1:k) {
    v[i] = choose(N,k-i) * (eps^(k-i)) * (1 - eps)^(N-(k-i))
  }
  1 - sum(v)
}
From the article: suppose we analysed an 800 bp amplicon using a PCR of 30 cycles with 1.85e-5 misincorporations per base per cycle, and found 10 unique sequences that are each 3 bp different from their most similar sequence. The probability that a novel sequence was generated by three independent PCR errors equals P = 0.0011.
However when I use my implementation of the formula I get a different value.
pcr.prob(3,800,0.0000185)
[1] 5.323567e-07
What could I be doing wrong in my implementation? Am I misinterpreting something?
Thanks
I think they've got the right number (0.00113), but it is badly explained in their paper.
The calculation you want to be doing is:
pbinom(3, 800, 1-(1-1.85e-5)^30, lower=FALSE)
That is, the probability of seeing more than three modifications in 800 independent bases, given 30 amplification cycles that each have a 1.85e-5 chance of going wrong. The term 1-(1-1.85e-5)^30 is the probability that a base does not stay correct through all 30 cycles.
Somewhat statsy, may be worth a move…
Thinking about this more, you will start to see floating-point inaccuracies when working with very small probabilities here. For example, 1-x where x is a small number will start to go wrong when the absolute value of x is less than about 1e-10. Working with log-probabilities is a good idea at this point; the log1p function in particular is a great help. Using:
pbinom(3, 800, 1-exp(log1p(-1.85e-5)*30), lower=FALSE)
will continue to work even when the error incorporation rate is very low.
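For readers who want to check the number outside R, the Python sketch below mirrors the calculation under the same assumptions (800 independent bases, 30 cycles, 1.85e-5 misincorporations per base per cycle). The helper `binom_upper_tail` is a hand-written illustration, not R's pbinom. It also uses expm1 in place of 1-exp(...), a small substitution of my own that avoids the final subtraction from 1.

```python
import math

def binom_upper_tail(k, n, p):
    """P(X > k) for X ~ Binomial(n, p), via 1 minus the lower tail."""
    lower = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
    return 1.0 - lower

rate, cycles, n_bases = 1.85e-5, 30, 800

# Per-base probability of at least one error over all cycles,
# 1 - (1 - rate)^cycles, computed with log1p/expm1 so that very
# small rates do not lose precision.
per_base = -math.expm1(cycles * math.log1p(-rate))

prob = binom_upper_tail(3, n_bases, per_base)
print(prob)
```

The result is close to 0.0011, in line with the value quoted from the article.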

Fit negative binomial distribution in R

I have a data set derived from the sport Snooker:
https://www.dropbox.com/s/1rp6zmv8jwi873s/snooker.csv
Column "playerRating" can take the values from 0 to 1, and describes how good a player is:
0: bad player
1: good player
Column "suc" is the number of consecutive balls potted by each player with the specific rating.
I am trying to prove 2 things regarding the number of consecutive balls potted until the first miss:
The distribution of successes follows a negative binomial
The number of successes depends on the player's rating, i.e. if a player is really good, he will manage to pot more consecutive balls.
I am using the "fitdistrplus" package to fit my data; however, I am unable to find a way of using "playerRating" as an input parameter.
Any help would be much appreciated!

Resources