I'm new to R, and it's been a while since I last did statistics. I don't understand what the parameters of rbinom are. What is the difference between n (="number of observations") and size (="number of trials")?
This question is related, but I don't understand the answer there.
Thank you very much.
A binomial distribution usually has two parameters, an integer which indicates the number of attempts and so the maximum possible value (here called the size) and a success probability for each attempt between 0 and 1. The expectation is then the product of these two parameters.
For a random sample from this distribution, you are also interested in having a particular number of observations.
So in rbinom(n, size, prob) you have
n being the number of sample observations
size being the integer parameter of the binomial distribution, using 1 if you want a Bernoulli distribution
prob for the probability parameter of the binomial distribution
As an example, you might get
set.seed(2021)
rbinom(5, 100, 0.2)
# 19 23 22 19 21
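To make the difference between n and size concrete, here is a small sketch (the seed and the variable names are arbitrary choices for illustration, not from the question). n controls how many numbers come back; size controls how large each number can be:

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# 5 observations, each one a count of successes out of 100 trials
x <- rbinom(n = 5, size = 100, prob = 0.2)
length(x)    # 5

# 5 observations with size = 1: Bernoulli draws, each either 0 or 1
b <- rbinom(n = 5, size = 1, prob = 0.2)

# With many observations the sample mean approaches size * prob = 20
big <- rbinom(n = 1e5, size = 100, prob = 0.2)
mean(big)
```

Changing n changes the length of the result; changing size changes the range of each value (0 to size).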
Related
I need help with the gamma distribution in R. I want to draw N random samples (N=1200) of different sizes n (n=10, n=100, etc.) from a gamma distribution where alpha=4.3 and beta=2.1.
I have to call set.seed(a), where a is a four-digit number, and I have to find the maximum likelihood estimates of both alpha and beta, as well as the method-of-moments estimates of both (in the interval (0.1, 100)).
I'm having a problem with how to start the simulation code; do I start by defining X as a seq and then set a seed? If so, where/ how to put the N and different values of n in consideration? And the alpha and beta ...?
How do I use the Binomial function to solve this experiment:
number of trials -> n=18,
p=10%
success x=2
The answer is 28%. I am using Binomial(18, 0.1), but how do I pass x=2?
julia> d=Binomial(18,0.1)
Binomial{Float64}(n=18, p=0.1)
pdf(d,2)
How can I solve this in Julia?
What you want is the probability mass function (PMF): the probability that, in a binomial experiment of n independent Bernoulli trials, each with probability p of success, we obtain exactly x successes.
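Written out for the numbers in the question, the PMF gives the following (a worked check by hand, rounded to four decimal places):

```latex
P(X = x) = \binom{n}{x}\, p^{x} (1-p)^{n-x},
\qquad
P(X = 2) = \binom{18}{2} (0.1)^{2} (0.9)^{16}
         = 153 \times 0.01 \times 0.9^{16}
         \approx 0.2835
```

This agrees with the expected answer of about 28%.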
The way to answer this question in Julia is to use the Distributions package: first create the distribution object with parameters n and p, and then call the function pdf on this object and the value x:
using Distributions
n = 18 # number of trials in our experiments
p = 0.1 # probability of success of a single trial
x = 2 # number of successes for which we want to compute the probability/PMF
binomialDistribution = Binomial(n,p)
probOfTwoSuccesses = pdf(binomialDistribution,x)
Note that all the other probability-related functions (cdf, quantile, and also rand) work in the same way: you first build the distribution object, which embeds the specific distribution parameters, and then you call the function on the distribution object and the value you are interested in, e.g. quantile(binomialDistribution, 0.9) for the 90% quantile.
I want to identify the probability of certain events occurring for a range.
Min = 600 Max = 50,000 Most frequent outcome = 600
I generated a sequence of events: numbers <- seq(600,50000,by=1)
This is where I get stuck. I'm not sure whether I'm using the wrong distribution or whether my attempt at execution is going down the wrong path.
qpois(numbers, lambda = 600) produces NaNs
So the desired outcome is a set of weighted probabilities (weighted towards the most frequent value of 600). I then want to be able to assess, for example, whether the likelihood of an outlier event above 30000 is 5%, or make other cuts like that, by summing the probabilities for those numbers.
A bit rusty, haven't used this for a few years, so any online resources for a refresher are also appreciated!
Firstly, I think you're looking for ppois rather than qpois. The function qpois(p, 600) takes a vector p of probabilities. If you do qpois(0.75, 600) you will get 616, meaning that 75% of observations will be at or below 616.
ppois is the inverse of qpois. If you do ppois(616, 600) you will get (approximately) 0.75.
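A minimal sketch of how the two functions relate (the variable names are mine). For a discrete distribution, qpois returns the smallest count whose cumulative probability reaches the requested level, so the round trip through ppois lands at or just above 0.75 rather than exactly on it:

```r
lambda <- 600

q <- qpois(0.75, lambda)   # smallest count k with P(X <= k) >= 0.75

ppois(q, lambda)           # at least 0.75
ppois(q - 1, lambda)       # strictly below 0.75
dpois(q, lambda)           # probability of observing exactly q events
```

dpois is the third member of the family: it gives the probability of one specific count, which is what you would sum over a range of values.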
As for your specific distribution, it can't be a Poisson distribution. Let's see what a Poisson distribution with a mean of 600 looks like:
x <- 500:700
plot(x, dpois(x, 600), type = "h")
Getting a value of greater than even 900 has (essentially) a zero probability:
1 - ppois(900, 600)
#> [1] 0
So if your data contains values of 30,000 or 50,000 as well as 600, it's certainly not a Poisson distribution.
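As a side note on the 0 printed above: the true tail probability is not exactly zero; it is just too small to survive the subtraction from 1. A sketch using the lower.tail argument of ppois (the variable name tail_900 is mine) shows the actual magnitude:

```r
# Subtracting from 1 loses the tiny tail in floating-point arithmetic
1 - ppois(900, 600)                               # 0

# Asking for the upper tail directly keeps it
tail_900 <- ppois(900, 600, lower.tail = FALSE)   # P(X > 900)
tail_900 > 0                                      # TRUE
```

Either way the conclusion stands: values in the tens of thousands are not plausible under a Poisson distribution with mean 600.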
Without knowing more about your actual data, it's not really possible to say what distribution you have. Perhaps if you include a sample of it in your question we could be of more help.
EDIT
With the sample of numbers provided in the comments, we can have a look at the actual empirical distribution:
hist(numbers, 200)
and if we want to know the probability at any point, we can create the empirical cumulative distribution function like this:
get_probability_of <- ecdf(numbers)
This allows us to do:
number <- 1:50000
plot(number, get_probability_of(number), ylab = "probability", type = "l")
and
get_probability_of(30000)
#> [1] 0.83588
Which means that the probability of getting a number higher than 30,000 is
1 - get_probability_of(30000)
#> [1] 0.16412
However, in this case, we know how the distribution is generated, so we can calculate the exact theoretical cdf just using some simple geometry (I won't show my working here because although it is simple, it is rather lengthy, dull, and not applicable to other distributions):
cdf <- function(x) ifelse(x < 600, 0, ifelse(x > 50000, 1, 1 - ((49400 - (x - 600)) / 49400)^2))
and
cdf(30000)
#> [1] 0.8360898
which is very close to the empirical value, but exact rather than estimated from a sample.
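The sample from the comments is not reproduced here, so the following is only a sketch under an assumption: the formula above is the cdf of the minimum of two independent uniform draws on [600, 50000], so I simulate numbers that way. The generator, the seed, and the names lo, hi and theoretical_cdf are mine, not the asker's.

```r
set.seed(42)      # arbitrary seed
lo <- 600
hi <- 50000
n_draws <- 1e5

# ASSUMPTION: data generated as the minimum of two uniforms on [lo, hi]
numbers <- pmin(runif(n_draws, lo, hi), runif(n_draws, lo, hi))

get_probability_of <- ecdf(numbers)

# Same formula as the theoretical cdf, clamped to [lo, hi]
theoretical_cdf <- function(x) {
  z <- pmin(pmax(x, lo), hi)
  1 - ((hi - z) / (hi - lo))^2
}

empirical <- get_probability_of(30000)
exact <- theoretical_cdf(30000)
abs(empirical - exact)   # small; shrinks as n_draws grows
```

The empirical value wobbles from sample to sample, while the closed form does not, which is the sense in which the theoretical cdf is the more accurate of the two.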
I have a bunch of random variables (X1,....,Xn) which are i.i.d. Exp(1/2) and represent the duration of a certain event. So this distribution obviously has an expected value of 2, but I am having problems defining it in R. I did some research and found something about a so-called Monte Carlo simulation, but I can't seem to find what I am looking for in it.
An example of what I want to estimate: let's say we have 10 random variables (X1,..,X10) distributed as above, and we want to determine, for example, the probability P([X1+...+X10<=25]).
Thanks.
You don't actually need monte carlo simulation in this case because:
If Xi ~ Exp(λ) then the sum (X1 + ... + Xk) ~ Erlang(k, λ) which is just a Gamma(k, 1/λ) (in (k, θ) parametrization) or Gamma(k, λ) (in (α,β) parametrization) with an integer shape parameter k.
From wikipedia (https://en.wikipedia.org/wiki/Exponential_distribution#Related_distributions)
So, P([X1+...+X10<=25]) can be computed by
pgamma(25, shape=10, rate=0.5)
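One way to sanity-check this without simulation is the standard Erlang/Poisson identity: the sum of k exponential waiting times is at most t exactly when a Poisson process with the same rate has produced at least k events by time t. A sketch (variable names are mine):

```r
k <- 10        # number of exponential variables
rate <- 0.5    # rate of each Exp(1/2) variable
t_max <- 25    # upper limit for the sum

p_gamma <- pgamma(t_max, shape = k, rate = rate)

# P(sum <= t_max) = P(Poisson(rate * t_max) >= k)
p_pois <- ppois(k - 1, lambda = rate * t_max, lower.tail = FALSE)

all.equal(p_gamma, p_pois)   # TRUE
```

Both routes give the same number, which is reassuring if you are unsure about the shape/rate versus shape/scale parametrization.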
Are you aware of the rexp() function in R? Have a look at the documentation page by typing ?rexp in the R console.
A quick answer to your Monte Carlo estimation of desired probability:
mean(rowSums(matrix(rexp(1000 * 10, rate = 0.5), 1000, 10)) <= 25)
I have generated 1000 sets of 10 exponential samples, putting them into a 1000 * 10 matrix. We take the row sums and get a vector of 1000 entries. The proportion of values between 0 and 25 is an empirical estimate of the desired probability.
Thanks, this was helpful! Can I use replicate with this code, to make it look like this: F <- function(n, B=1000) mean(replicate(B,(rexp(10, rate = 0.5))))? I am unable to get the right result with it.
replicate here generates a matrix, too, but it is a 10 * 1000 matrix (as opposed to a 1000 * 10 one in my answer), so you now need to take colSums. Also, where did you put n?
The correct function would be
F <- function(n, B=1000) mean(colSums(replicate(B, rexp(10, rate = 0.5))) <= n)
For a non-Monte Carlo method for your given example, see the other answer. The exponential distribution is a special case of the gamma distribution, and the latter has an additivity property.
I am giving you the Monte Carlo method because you named it in your question, and it is applicable beyond your example.
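To see how close the Monte Carlo estimate gets, here is a sketch that also reports a standard error and compares against the closed-form gamma probability. The function name prob_sum_le, its arguments, and the seed are my own choices for illustration (I avoided the name F because it masks R's shorthand for FALSE):

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# Monte Carlo estimate of P(X1 + ... + Xk <= limit) for i.i.d. Exp(rate)
prob_sum_le <- function(limit, k = 10, rate = 0.5, B = 1e5) {
  # k rows, B columns: each column is one simulated set of k durations
  sums <- colSums(matrix(rexp(k * B, rate = rate), nrow = k))
  est <- mean(sums <= limit)
  c(estimate = est, std_error = sqrt(est * (1 - est) / B))
}

mc <- prob_sum_le(25)
exact <- pgamma(25, shape = 10, rate = 0.5)

mc
abs(mc[["estimate"]] - exact)   # typically within a few standard errors
```

The standard error shrinks like 1/sqrt(B), so quadrupling B roughly halves the uncertainty of the estimate.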
I'm just learning how to use R. I'm practicing some statistics, such as the Normal distribution, Poisson, etc.
When I try to calculate probabilities and the answer is a number very close to zero (0), the program shows 0 as the result, so I can't see the full answer, and I need the full answer. There is always a probability, even a small one!!
My question is: can I turn off the automatic rounding, or which code can I use to get the full answer?
Example:
1-pbinom(q =10, size = 10,prob = 0.8)
Result:
0
The pbinom function gives the cumulative distribution function, that is, the probability that a value is less than or equal to a particular value. So with a discrete distribution like the binomial distribution with 10 draws,
pbinom(10, 10, .8)
# [1] 1
tells you that there is a 100% chance you will observe 10 or fewer successes.
Perhaps you're thinking of the probability density function (or probability mass function since this is a discrete distribution) dbinom
dbinom(10, 10, .8)
# [1] 0.1073742
means that there is a roughly 11% chance that all your draws will be successes. It's also true that
sum(dbinom(0:10, 10, .8))
# [1] 1
that is, the sum of the probabilities of getting 0 through 10 successes is exactly 1.
So with these cases you are getting the exact answer. R does round values in the console according to the options(digits=) setting, but that's not what's happening here.
pbinom is the distribution function for the binomial distribution, which is discrete and can thus be exactly 1 (as in your example). You might have been thinking of continuous distributions like the normal or gamma distributions. In that case, subtracting a value extremely close to 1 from 1 in floating-point arithmetic can make a tiny tail probability come out as exactly 0, for example
> 1 - pnorm(10, 0, 1)
[1] 0
However, the p[dist] functions have an argument lower.tail=FALSE designed to address this problem:
> pnorm(10, 0, 1, lower.tail=FALSE)
[1] 7.619853e-24
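A short sketch pulling both answers together (variable names are mine). For the binomial case, "at least 10 successes" is the upper tail above 9, and for tails too small even for lower.tail = FALSE, the log.p argument returns the logarithm of the probability instead:

```r
# Discrete case: P(X >= 10) = P(X > 9), which equals P(X = 10) here
p_ge_10 <- pbinom(9, size = 10, prob = 0.8, lower.tail = FALSE)
all.equal(p_ge_10, dbinom(10, 10, 0.8))   # TRUE

# Continuous case: subtraction loses the tail, lower.tail = FALSE keeps it
naive <- 1 - pnorm(10)                    # 0
direct <- pnorm(10, lower.tail = FALSE)   # tiny but positive

# Tails too small for a double can still be handled on the log scale
log_tail <- pnorm(40, lower.tail = FALSE, log.p = TRUE)
log_tail                                  # a large negative number
```

Note that P(X > 10) for 10 trials really is exactly 0, so the original 1 - pbinom(10, 10, 0.8) was not a rounding problem at all.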