R: 1000 rnorm samples

x <- rnorm(25) will produce a single sample of size 25 from the standard normal distribution.
How do I take 1000 samples of size 25 from standard normal distribution at the same time?
I would like to do this efficiently, so that I will be able to compute things such as the mean and standard deviation for each of the 1000 samples and compare them via a histogram.
[Also: I would then like to uniformly and randomly select one of these 1000 samples and bootstrap it.]

X <- matrix(rnorm(25000), 1000, 25)
Each row of X is a sample of size 25 from the standard normal distribution. There are 1000 rows.
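From there, the per-sample statistics and the bootstrap step are short; a minimal sketch (the object names and the 2000 bootstrap replicates are illustrative choices, not part of the answer above):
sample_means <- rowMeans(X)    # mean of each of the 1000 samples
sample_sds <- apply(X, 1, sd)  # standard deviation of each sample
hist(sample_means)
hist(sample_sds)
chosen <- X[sample(nrow(X), 1), ]  # one row, i.e. one sample, chosen uniformly at random
boot_means <- replicate(2000, mean(sample(chosen, replace = TRUE)))
hist(boot_means)  # bootstrap distribution of the chosen sample's mean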

Related

How to simulate an i.i.d. process in R?

I'm new to statistics and received the question below, which needs to be answered in R:
Simulate an i.i.d. process {X_t}, t = 1, ..., n, following the standard normal, X_t ~ Normal(0, 1), with sample size n = 1000 and simulation time N = 500. Compute the sample means X̄(1), ..., X̄(N), where X̄(i) is the sample mean from the i-th simulation. Plot the histogram of X̄(1), ..., X̄(N).
My thought is: since the sample size is n = 1000, I should do
set.seed(1)        # setting a seed
X1 <- rnorm(1000)  # simulating X1
then compute the sample mean of X1, ..., XN:
result.mean <- mean(X1)
and plot the histogram of the means of X1, ..., XN:
plot(result.mean, type = 'h')
However, I'm not sure what to do with the simulation time N = 500. The plot I generated is just a one-bar histogram, so I'm pretty sure the simulation time should be used somewhere.
What is the purpose of the simulation here, and is my approach correct for the i.i.d. case? Thank you.
To draw random numbers from a normal distribution, the base R (stats) function is rnorm, whose defaults are a mean of 0 and a standard deviation of 1. Draw a vector of 1000 numbers and take its mean; repeat that 500 times with replicate, and throw the 500 resulting means into a histogram.
hist(replicate(500, mean(rnorm(1000))))
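An equivalent formulation (a sketch, not part of the answer above) keeps all the draws in a matrix with one column per simulation and takes column means, which also leaves the raw samples available for further use:
X <- matrix(rnorm(1000 * 500), nrow = 1000)  # 500 columns, each a sample of size 1000
hist(colMeans(X))                            # histogram of the 500 sample means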

R - generating a random sample [duplicate]

I'm having trouble figuring out how to perform the following:
Generate a plot for a normally distributed random variable X with a mean of 250 and variance of 625 (SD 25).
Generate a random sample (n = 15) from a normally distributed variable Z with mean = 10, variance = 400 (SD 20). Using this sample, estimate the population mean for Z and a 95% confidence interval.
Essentially the main part I'm struggling with is generating a random sample/variable. Thanks!
The function rnorm(n = 15, mean = 10, sd = sqrt(400)) will supply you with the numbers you want...
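The question also asks for the estimated population mean and a 95% confidence interval; a minimal sketch of that step, assuming the usual t-based interval is acceptable (the object name z is illustrative):
z <- rnorm(n = 15, mean = 10, sd = sqrt(400))
mean(z)             # point estimate of the population mean
t.test(z)$conf.int  # 95% confidence interval (the default conf.level)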
The rnorm() function draws random samples from a normal distribution.
From the normal distribution page in the R manual:
rnorm(n, mean = 0, sd = 1)
n: number of observations. If length(n) > 1, the
length is taken to be the number required.
mean: vector of means.
sd: vector of standard deviations.
So if you need 15 draws with mean 250 and sd 25, rnorm(15, 250, 25).
[1] 250.0760 251.0984 201.1045 231.8379 213.2640 263.3968 274.8070 225.1520
[9] 260.0468 275.5306 295.3408 241.8458 229.2726 285.6786 232.1860
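For the first part of that question, plotting X with mean 250 and SD 25, one option is to plot the density curve rather than a random sample; a sketch:
curve(dnorm(x, mean = 250, sd = 25), from = 150, to = 350,
      xlab = "x", ylab = "density")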

Weighted Likelihood of an Event Occurring

I want to identify the probability of certain events occurring for a range.
Min: 600
Max: 50,000
Most frequent outcome: 600
I generated a sequence of events: numbers <- seq(600,50000,by=1)
This is where I get stuck. I'm not sure whether I'm using the wrong distribution or whether my attempted execution is going down the wrong path.
qpois(numbers, lambda = 600) produces NaNs
So the desired outcome is an output of weighted probabilities (weighted towards the mean of 600), and then to be able to assess whether the likelihood of an outlier event above 30,000 is 5%, or other cuts like that, by summing the probabilities for those numbers.
A bit rusty, haven't used this for a few years, so any online resources for a refresher are also appreciated!
Firstly, I think you're looking for ppois rather than qpois. The function qpois(p, 600) takes a vector p of probabilities. If you do qpois(0.75, 600) you will get 616, meaning that 75% of observations will be at or below 616.
ppois is the inverse of qpois. If you do ppois(616, 600) you will get (approximately) 0.75.
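To see the round trip in code (outputs are approximate):
qpois(0.75, lambda = 600)  # 616: the 0.75 quantile
ppois(616, lambda = 600)   # roughly 0.75: the inverse operation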
As for your specific distribution, it can't be a Poisson distribution. Let's see what a Poisson distribution with a mean of 600 looks like:
x <- 500:700
plot(x, dpois(x, 600), type = "h")
Getting a value of greater than even 900 has (essentially) a zero probability:
1 - ppois(900, 600)
#> [1] 0
So if your data contains values of 30,000 or 50,000 as well as 600, it's certainly not a Poisson distribution.
Without knowing more about your actual data, it's not really possible to say what distribution you have. Perhaps if you include a sample of it in your question we could be of more help.
EDIT
With the sample of numbers provided in the comments, we can have a look at the actual empirical distribution:
hist(numbers, 200)
and if we want to know the probability at any point, we can create the empirical cumulative distribution function like this:
get_probability_of <- ecdf(numbers)
This allows us to do:
number <- 1:50000
plot(number, get_probability_of(number), ylab = "probability", type = "l")
and
get_probability_of(30000)
#> [1] 0.83588
This means that the probability of getting a number higher than 30,000 is
1 - get_probability_of(30000)
#> [1] 0.16412
However, in this case, we know how the distribution is generated, so we can calculate the exact theoretical cdf just using some simple geometry (I won't show my working here because although it is simple, it is rather lengthy, dull, and not applicable to other distributions):
cdf <- function(x) ifelse(x < 600, 0, 1 - ((49400 - (x - 600)) / 49400)^2)
and
cdf(30000)
#> [1] 0.8360898
which is very close to, but more theoretically accurate than the empirical value.

Simulating the t-distribution -- random samples

I am new to simulation exercises in R. I want to create 1000 samples of size 25 from a t distribution with degrees of freedom 10.
Do I need to create a single vector of data from the rt generator, and then sample repeatedly from that? So, for example, I could create the vector:
singlevector <- rt(5000, 10), which generates 5000 draws from a t-distribution with df = 10. So I would treat this as my population and then sample from it. I chose the population size of 5000 arbitrarily here.
OR, should I create my 1000 samples calling on this random t generator every time?
In other words, create a matrix with 25 rows and 1000 columns, each column containing the vector from a new call of rt(25, 10).
Since you are sampling independent, identically distributed values, all three of these approaches are statistically equivalent:
1. call the random number generator once to get as many (or more) values as you need, then sample that vector without replacement;
2. call the random number generator 1000 times, picking 25 values each time;
3. call the random number generator once, picking 25000 values, then subdivide the vector into individual samples in order (rather than randomly).
The latter two are not just statistically but computationally equivalent. In the first approach the order of the samples gets scrambled, but that makes no difference to the statistical properties.
Approach #1:
set.seed(101)
x1 <- rt(25000, 10)
## randomly assign the 25000 draws to 1000 groups of 25, then bind into a matrix
r1 <- do.call(cbind, split(x1, sample(0:24999) %/% 25))
Illustrating the equivalence of #2 and #3:
set.seed(101)
r2 <- replicate(1000, rt(25, 10))
set.seed(101)
r3 <- matrix(rt(25000,10),nrow=25)
identical(r2,r3) ## TRUE
In general, solution #3 is fastest, but all of these approaches are very fast for problems of this order of magnitude (roughly 5 milliseconds for #3 vs 10 milliseconds for #2 for 25 x 1000 samples on my laptop); I would pick whichever approach is easiest for you to understand when you read the code.
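Whichever approach you pick, each column of the resulting 25 x 1000 matrix is one sample, so per-sample statistics are one-liners (a small illustrative follow-on using r3 from above):
sample_means <- colMeans(r3)  # 1000 means, one per sample of size 25
hist(sample_means)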

Truncated lognormal distribution with R, random values

I need to generate random values that represent times (in seconds) that follow a lognormal distribution with:
Min: 120 seconds
Max: 1260 seconds
Mean: 356 seconds
SD: 98 seconds
I am generating 100 random numbers:
library(EnvStats)
sample1 <- rlnormTrunc(100,356,98,120,1260)
and when I calculate the mean, it is not 356 but higher, about 490 seconds. Why?
I don't understand what I am doing wrong, as I thought I was going to get the same mean.
Does anyone have an answer for this?
The reason is that you are comparing different distributions, so when you generate random numbers from these distributions, their means differ.
If we take the normal distribution as an example, then
set.seed(111)
sample1 <- rnorm(n=10000,mean=356,sd=98)
mean(sample1) #355.7724
the mean is indeed almost 356. But if we take the truncated normal distribution, then
set.seed(111)
sample2 <- rnormTrunc(n = 100000, mean = 356, sd = 98, min = 120, max = 1260)
mean(sample2) #357.9636
the mean is slightly different: around 358, not 356. The reason the difference is so small is that, as seen in the histogram
hist(rnorm(n=10000,mean=356,sd=98),breaks=100,xlim=c(0,1300))
abline(v=120,col="red")
abline(v=1260,col="red")
by truncating, you take out only very infrequent values (smaller than 120 or bigger than 1260).
The lognormal, by contrast, is a fat-tailed distribution, skewed to the right: it places far more probability on extreme values, far beyond 1260, than the normal distribution does. If you truncate the distribution between 120 and 1260
hist(rlnormTrunc(10000,meanlog=356,sdlog=98,min=120,max=1260),breaks=100)
you get
set.seed(111)
mean(rlnormTrunc(10000,meanlog=356,sdlog=98,min=120,max=1260)) #493.3903
In each of the examples above you calculate the mean of a random set of values with a different range, because the distributions are different; that is why you end up with different mean values.
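One caveat the answer above does not mention, and worth checking against the EnvStats documentation: the meanlog and sdlog arguments of rlnormTrunc are the mean and standard deviation on the log scale, not in seconds. If the intent was a truncated lognormal whose underlying arithmetic mean is 356 and standard deviation is 98, convert the parameters first; a sketch using the standard lognormal moment conversions (object names here are illustrative):
library(EnvStats)  # for rlnormTrunc
m <- 356
s <- 98
meanlog <- log(m^2 / sqrt(m^2 + s^2))  # mean on the log scale
sdlog <- sqrt(log(1 + (s / m)^2))      # sd on the log scale
set.seed(111)
sample3 <- rlnormTrunc(10000, meanlog = meanlog, sdlog = sdlog, min = 120, max = 1260)
mean(sample3)  # close to 356: with these parameters the truncation bounds lie far out in the tails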
