sample() function in R

My question is quite simple: I'm trying to simulate 500 draws from a distribution using the sample() function in R.
Let's take the binomial distribution B(10,0.5) as an example.
This is what I did:
draw = 1:500
data = sample(x=draw, size=10, replace=TRUE, prob=rep(0.5, each=500))
However, whenever I draw the histogram, it looks random and doesn't resemble a binomial distribution. What am I doing wrong?
Note: I know there is an rbinom() function in R that does this. I am trying to understand how the sample() function works.

data = rbinom(n=500,size=10,prob=.5)
hist(data)

sample(x = c(1,0),size = 10,replace=TRUE,prob = c(0.5,0.5))
To see the binomial distribution, generate this vector many times and plot a histogram of the sums.
draws = c()
for (i in 1:500) {
  draws = c(draws, sum(sample(x = c(1, 0), size = 10, replace = TRUE, prob = c(0.5, 0.5))))
}
hist(draws)
In this example, sample returns 10 values (size = 10), each either 1 or 0 (x = c(1,0)), with equal probability (prob = c(0.5,0.5)). replace=TRUE just means that either item can be drawn more than once. These 1's and 0's are the results of 10 Bernoulli trials with probability 0.5. The binomial distribution is the probability distribution of the number of successes (1's) in a series of n Bernoulli trials, each with probability p; here n=10 and p=0.5. Calling sample once gives the 10 trial outcomes, and summing that vector gives one draw from the binomial. We take 500 such draws and plot a histogram.
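The same simulation can be written without growing a vector inside a loop. The following is a minimal sketch using base R only; the names n_draws, n_trials, p, empirical, and exact are illustrative and do not come from the original answer. It puts the simulated relative frequencies next to the exact binomial probabilities from dbinom() so the two can be compared directly.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

n_draws  <- 500   # number of binomial draws to simulate
n_trials <- 10    # Bernoulli trials per draw
p        <- 0.5   # success probability of each trial

# Each replicate sums 10 Bernoulli(0.5) outcomes, giving one B(10, 0.5) draw.
draws <- replicate(
  n_draws,
  sum(sample(x = c(1, 0), size = n_trials, replace = TRUE, prob = c(p, 1 - p)))
)

# Empirical relative frequencies of 0..10 successes next to the exact pmf.
empirical  <- as.numeric(table(factor(draws, levels = 0:n_trials))) / n_draws
exact      <- dbinom(0:n_trials, size = n_trials, prob = p)
comparison <- round(cbind(successes = 0:n_trials, empirical, exact), 3)
print(comparison)
```

With only 500 draws the empirical column will wobble around the exact one; increasing n_draws should tighten the agreement.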

Related

How to simulate a iid process in R language?

I'm new to statistics and received the question below, which needs to be answered in R:
Simulate an i.i.d. process $\{X_t\}_{t=1,\dots,n}$ following the standard normal distribution, $X_t \sim \mathrm{Normal}(0,1)$, with
sample size n = 1000 and simulation time N = 500. Compute the sample means $\bar{X}^{(1)}, \dots, \bar{X}^{(N)}$,
where $\bar{X}^{(i)}$ is the sample mean from the i-th simulation. Plot the histogram of $\bar{X}^{(1)}, \dots, \bar{X}^{(N)}$.
My thought is:
since the sample size is n=1000, I should
set.seed(1) # Setting a seed
X1 <- rnorm(1000) # Simulating X1
to compute the sample mean of X1-XN
result.mean <- mean(X1)
plot the histogram for mean X1-XN
plot(result.mean, type = 'h')
However, I'm not sure what to do with the simulation time N = 500. The plot I generated is a histogram with just one bar, so I'm pretty sure the simulation time should be used somewhere.
What is the purpose of the simulation here? And is my approach correct for the i.i.d. case? Thank you.
The base (stats) R function for drawing random numbers from a normal distribution is rnorm, whose defaults are a mean of 0 and a standard deviation of 1. We draw 1000 samples from it and take the mean of that vector. We repeat that 500 times with replicate and plot the resulting means in a histogram.
hist(replicate(500, mean(rnorm(1000)), simplify = "vector"))
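To make the role of N more concrete, here is a minimal sketch under the question's stated values (n = 1000, N = 500); the variable name sample_means is illustrative. Each of the N simulations contributes one sample mean, and for standard normal data the standard deviation of those means should be close to 1 / sqrt(n).

```r
set.seed(1)  # arbitrary seed, only for reproducibility

n <- 1000  # sample size within one simulation
N <- 500   # number of simulations

# One sample mean per simulation: a numeric vector of length N.
sample_means <- replicate(N, mean(rnorm(n)))

hist(sample_means, main = "Sample means of N(0, 1), n = 1000",
     xlab = "sample mean")

# The spread of the sample means should be close to 1 / sqrt(n).
c(observed_sd = sd(sample_means), theoretical_sd = 1 / sqrt(n))
```

A single simulation gives only one mean, which is why the one-bar plot in the question appears; the histogram needs all N means.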

Binomial Experiment

How do I use the Binomial function to solve this experiment:
number of trials -> n=18,
p=10%
success x=2
The answer is 28%. I am using Binomial(18, 0.1), but how do I pass x=2?
julia> d=Binomial(18,0.1)
Binomial{Float64}(n=18, p=0.1)
pdf(d,2)
How can I solve this in Julia?
What you want is the Probability Mass Function, aka the probability, that in a binomial experiment of n Bernoulli independent trials with a probability p of success on each individual trial, we obtain exactly x successes.
The way to answer this question in Julia is, using the Distributions package, to first create the "distribution" object with parameters n and p, and then call the function pdf on this object and the variable x:
using Distributions
n = 18 # number of trials in our experiments
p = 0.1 # probability of success of a single trial
x = 2 # number of successes for which we want to compute the probability/PMF
binomialDistribution = Binomial(n,p)
probOfTwoSuccesses = pdf(binomialDistribution,x)
Note that all the other probability-related functions (like cdf and quantile, but also rand) work in the same way: you first build the distribution object, which embeds the specific distribution parameters, and then you call the function on the distribution object and the variable you are looking for, e.g. quantile(binomialDistribution,0.9) for the 90% quantile.
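As a hand check of the value that pdf returns, the binomial probability mass function can be evaluated directly with the n, p, and x from the question (the intermediate value of $0.9^{16}$ is rounded):

$$
P(X = 2) = \binom{18}{2} (0.1)^{2} (0.9)^{16} = 153 \times 0.01 \times 0.1853 \approx 0.2835
$$

This agrees with the expected answer of roughly 28%.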

Is there an R function to find the probability of certain data being created from a Beta distribution?

I have a vector x of values in R. I want to know the probability that the data was made from a Beta(20,40) distribution. I am using R.
When I make this function call
dbeta(x, 10, 20)
I get the probability for each entry in the vector.
0.065278039 0.003434240 0.036265577 0.175467370 0.018132789 0.065278039
0.175467370 0.175467370
I was wondering if it is possible to output one number to show the probability that the entire data vector was made from a Beta distribution.
For example, the probability of dataset $x$ being generated from a Beta(20,40) distribution is some number.
Thanks!
You can try performing a hypothesis test such as Kolmogorov-Smirnov or Hoeffding and compare the data from your dataset with Beta(20,40).
These tests are used to evaluate the hypothesis that two samples are drawn from the same distribution; the Kolmogorov-Smirnov test can also compare a single sample against a reference distribution, as here.
Something like ks.test(x, y = 'pbeta', shape1 = 20, shape2 = 40) should do the job.
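A minimal sketch of that call follows. The data vector x here is hypothetical, generated with rbeta() purely so the example runs on its own; in practice x would be the vector from the question.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# Hypothetical data: 200 values that really do come from Beta(20, 40).
x <- rbeta(200, shape1 = 20, shape2 = 40)

# One-sample Kolmogorov-Smirnov test against the Beta(20, 40) CDF.
result <- ks.test(x, "pbeta", shape1 = 20, shape2 = 40)

result$statistic  # largest distance between empirical and reference CDF
result$p.value    # a small p-value is evidence against Beta(20, 40)
```

Note that the p-value is not the probability that the data came from Beta(20,40). It is the probability of seeing a discrepancy at least this large if the data did come from that distribution.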

How to generate random values from "inverse log-normal distribution" in R

I am trying to replicate one part of the simulating programming results in a working paper, which says the authors 'generate random values from the right-skewed "inverse log-normal" distribution with an expected median value of w_median = 0.85, with an additional condition, 0 <= w <= 1,' which obviously means random values are within 0 and 1. I am using R, and there are functions for generating "log-normal" distributions like dlnorm, plnorm, qlnorm, rlnorm, and it's quite obvious to generate random values from log-normal distributions with those functions like:
rand_val <- rlnorm(1000, meanlog=log(0.85))
hist(rand_val, breaks=100)
median(rand_val) # 0.8856299
min(rand_val) # 0.04660691
max(rand_val) # 23.33998
But I have no idea about how to generate the random values from "inverse log-normal" distribution. There was essentially the same question raised before (Inverse of the lognormal distribution), and they suggested using qlnorm function, but I am not sure how that function works for generating random values from inverse log-normal distribution, especially with additional conditions of mine as mentioned: 1) expected median value = 0.85; 2) random values are within 0 to 1. Thanks in advance!

Defining exponential distribution in R to estimate probabilities

I have a bunch of random variables (X1,....,Xn) which are i.i.d. Exp(1/2) and represent the duration of a certain event. So this distribution obviously has an expected value of 2, but I am having problems defining it in R. I did some research and found something about so-called Monte Carlo simulation, but I can't seem to find what I am looking for in it.
An example of what I want to estimate: let's say we have 10 random variables (X1,..,X10) distributed as above, and we want to determine, for example, the probability P([X1+...+X10<=25]).
Thanks.
You don't actually need Monte Carlo simulation in this case because:
If Xi ~ Exp(λ) then the sum (X1 + ... + Xk) ~ Erlang(k, λ) which is just a Gamma(k, 1/λ) (in (k, θ) parametrization) or Gamma(k, λ) (in (α,β) parametrization) with an integer shape parameter k.
From wikipedia (https://en.wikipedia.org/wiki/Exponential_distribution#Related_distributions)
So, P([X1+...+X10<=25]) can be computed by
pgamma(25, shape=10, rate=0.5)
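As a sanity check on the parametrization, the pgamma result can be compared with the Erlang CDF written out as a finite sum. This is a sketch using base R only; the names k, lambda, x, p_gamma, and p_erlang are illustrative.

```r
k      <- 10    # number of exponential variables being summed
lambda <- 0.5   # rate of each Exp(1/2) variable (mean 2)
x      <- 25    # threshold for the sum

# CDF of the sum via the gamma distribution with shape k and rate lambda.
p_gamma <- pgamma(x, shape = k, rate = lambda)

# Erlang CDF written out: 1 - sum_{j=0}^{k-1} exp(-lambda*x) * (lambda*x)^j / j!
j <- 0:(k - 1)
p_erlang <- 1 - sum(exp(-lambda * x) * (lambda * x)^j / factorial(j))

c(p_gamma = p_gamma, p_erlang = p_erlang)
```

The two values should agree up to floating-point error, which confirms that rate = 0.5 (not scale = 0.5) is the right argument for Exp(1/2) variables with mean 2.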
Are you aware of the rexp() function in R? Have a look at its documentation page by typing ?rexp in the R console.
A quick Monte Carlo estimate of the desired probability:
mean(rowSums(matrix(rexp(1000 * 10, rate = 0.5), 1000, 10)) <= 25)
I have generated 1000 sets of 10 exponential samples, putting them into a 1000 * 10 matrix. We take the row sums and get a vector of 1000 entries. The proportion of values between 0 and 25 is an empirical estimate of the desired probability.
Thanks, this was helpful! Can I use replicate with this code, to make it look like this: F <- function(n, B=1000) mean(replicate(B,(rexp(10, rate = 0.5))))? With this I am unable to get the right result.
replicate here generates a matrix, too, but it is a 10 * 1000 matrix (as opposed to the 1000 * 10 one in my answer), so you now need to take colSums. Also, where did you put n?
The correct function would be
F <- function(n, B=1000) mean(colSums(replicate(B, rexp(10, rate = 0.5))) <= n)
For a non-Monte Carlo method for your given example, see the other answer. The exponential distribution is a special case of the gamma distribution, and the latter has an additivity property.
I am giving you the Monte Carlo method because you named it in your question, and it is applicable beyond your example.
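To see how the two approaches line up, here is a minimal sketch that runs the corrected function next to the exact gamma result. The seed, the choice of B = 10000, and the names estimate and exact are assumptions made for illustration.

```r
set.seed(2024)  # arbitrary seed, only for reproducibility

# Monte Carlo estimate of P(X1 + ... + X10 <= n) for i.i.d. Exp(rate = 0.5).
# B is the number of simulated sums; more simulations reduce the noise.
F <- function(n, B = 1000) {
  sums <- colSums(replicate(B, rexp(10, rate = 0.5)))
  mean(sums <= n)
}

estimate <- F(25, B = 10000)
exact    <- pgamma(25, shape = 10, rate = 0.5)

c(monte_carlo = estimate, exact = exact)
```

The estimate varies from run to run, and its standard error shrinks roughly like 1 / sqrt(B), so a larger B should bring it closer to the exact value.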
