R - generating a random sample [duplicate] - r

I'm having trouble figuring out how to perform the following:
Generate a plot for a normally distributed random variable X with a mean of 250 and variance of 625 (SD 25).
Generate a random sample (n=15) from a normally distributed variable Z with mean=10 variance = 400. Using this sample estimate the population mean for Z and 95% confidence interval.
Essentially the main part I'm struggling with is generating a random sample/variable. Thanks!

The call rnorm(n = 15, mean = 10, sd = sqrt(400)) will give you the sample you want.

The rnorm() function draws random samples from a normal distribution.
From the normal distribution page in the R manual:
rnorm(n, mean = 0, sd = 1)
n: number of observations. If length(n) > 1, the
length is taken to be the number required.
mean: vector of means.
sd: vector of standard deviations.
So if you need 15 draws with mean 250 and sd 25, use rnorm(15, 250, 25):
[1] 250.0760 251.0984 201.1045 231.8379 213.2640 263.3968 274.8070 225.1520
[9] 260.0468 275.5306 295.3408 241.8458 229.2726 285.6786 232.1860
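The question also asks for a density plot and a confidence interval, which the answers above do not cover. A minimal sketch follows, assuming a t-based interval is acceptable (it treats the population standard deviation as unknown; if the stated variance of 400 is taken as known, a z-based interval with qnorm would be the alternative). The seed and the variable names are arbitrary choices, not part of the original question.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Density curve for X ~ Normal(mean = 250, sd = 25)
curve(dnorm(x, mean = 250, sd = 25), from = 150, to = 350,
      xlab = "x", ylab = "density")

# Sample of 15 from Z ~ Normal(mean = 10, variance = 400)
z <- rnorm(n = 15, mean = 10, sd = sqrt(400))

# Point estimate and t-based 95% confidence interval for the mean
z_mean <- mean(z)
z_se   <- sd(z) / sqrt(length(z))
z_ci   <- z_mean + c(-1, 1) * qt(0.975, df = length(z) - 1) * z_se

z_mean
z_ci
```

The same interval is returned by t.test(z)$conf.int, which is a handy cross-check.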

Related

How to simulate a iid process in R language?

I'm new to statistics and received the question below, which needs to be answered in R:
Simulate an i.i.d process {Xt}t=1,···,n following standard normal Xt ∼ Normal(0,1) with
sample size n = 1000 and simulation time N = 500. Compute the sample mean ̄X(1),··· , ̄X(N),
where ̄X(i) is the sample mean from the i-th simulation. Plot the histogram for ̄X(1),··· , ̄X(N).
my thought is:
as sample size n=1000, then I should
set.seed(1) # Setting a seed
X1 <- rnorm(1000) # Simulating X1
to compute the sample mean of X1-XN
result.mean <- mean(X1)
plot the histogram for mean X1-XN
plot(result.mean, type = 'h')
However, I'm not sure what to do with the simulation time N = 500. The plot I generated is just a one-bar histogram, so I'm pretty sure the simulation time should be used.
What is the purpose of simulation here, and is my thinking correct for the i.i.d. case? Thank you.
The base (stats) R function for drawing random numbers from a normal distribution is rnorm, whose defaults are a mean of 0 and a standard deviation of 1. We draw 1000 values from it and take the mean of that vector. We then repeat that 500 times with replicate and put the resulting 500 means into a histogram.
hist(replicate(500, mean(rnorm(1000)), simplify = "vector"))
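The one-liner can be unpacked into named steps, which also makes the role of N explicit. This is only a sketch: the seed, the variable names n, N and xbar, and the plot labels are assumptions chosen for illustration. The last line compares the spread of the simulated means with the theoretical value 1 / sqrt(n) for standard normal data.

```r
set.seed(1)   # arbitrary seed for reproducibility

n <- 1000     # sample size within each simulation
N <- 500      # number of simulations

# One sample mean per simulation: a numeric vector of length N
xbar <- replicate(N, mean(rnorm(n)))

hist(xbar, main = "Sample means of 500 simulations", xlab = "sample mean")

# The spread of the sample means should be near 1 / sqrt(n)
c(simulated = sd(xbar), theoretical = 1 / sqrt(n))
```

A single simulation yields a single mean, which is why the original attempt produced a one-bar plot; the histogram only becomes informative once there are N means to summarise.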

How to generate a sample of artificial data with a particular variance?

I am trying to generate a data set with the following information->
sample size:200
variance = 2
mean = 20
I have tried generating it using the rnorm() function, but it only takes the standard deviation as an argument. I have also tried taking the square root of the standard deviation to get the desired variance, but that doesn't work either.
How can I generate such a data set with that mean and variance in RStudio?
Thank you.
x = rnorm(200, 20, sd=sqrt(2))
c(mean(x), var(x))
[1] 20.064919 1.981597
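To see why the sd argument has to be sqrt(2) rather than 2, it may help to compare the two calls on a large sample. This is a sketch only; the sample size of 1e5 and the seed are arbitrary choices made so that the sample variances land close to their targets.

```r
set.seed(123)  # arbitrary seed for reproducibility
n <- 1e5       # large n so the sample variance is close to the target

var_wrong <- var(rnorm(n, mean = 20, sd = 2))        # variance passed as sd
var_right <- var(rnorm(n, mean = 20, sd = sqrt(2)))  # sd = sqrt(variance)

c(wrong = var_wrong, right = var_right)
```

The first value comes out near 4 (that is, 2 squared) and the second near 2. With only 200 observations the sample variance will still vary somewhat around 2 from one draw to the next.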

Weighted Likelihood of an Event Occurring

I want to identify the probability of certain events occurring for a range.
Min = 600 Max = 50,000 Most frequent outcome = 600
I generated a sequence of events: numbers <- seq(600,50000,by=1)
This is where I get stuck. Not sure if using the wrong distribution or attempt at execution is going down the wrong path.
qpois(numbers, lambda = 600) produces NaNs
So the desired outcome is a set of weighted probabilities (weighted to the mean of 600), so that I can then assess, for example, whether the likelihood of an outlier event above 30000 is 5%, or make other cuts like that, by summing the probabilities for those numbers.
A bit rusty, haven't used this for a few years so any online resources to refresh is also appreciated!
Firstly, I think you're looking for ppois rather than qpois. The function qpois(p, 600) takes a vector p of probabilities. If you do qpois(0.75, 600) you will get 616, meaning that 75% of observations will be at or below 616.
ppois is the inverse of qpois: it takes a value and returns the cumulative probability of being at or below it. If you do ppois(616, 600) you will get (approximately) 0.75.
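The relationship between the two functions can be checked directly. In this small sketch the names lambda, q and p are just illustrative. Because the Poisson distribution is discrete, the round trip through qpois and ppois returns a probability at or slightly above 0.75 rather than exactly 0.75.

```r
lambda <- 600

q <- qpois(0.75, lambda)   # smallest q with P(X <= q) >= 0.75
p <- ppois(q, lambda)      # P(X <= q), so at least 0.75

c(q = q, p = p)

# Upper-tail probability, P(X > q), computed two equivalent ways
c(1 - p, ppois(q, lambda, lower.tail = FALSE))
```

For very small tail probabilities, lower.tail = FALSE is usually the safer form, since 1 - ppois(...) can round to exactly 0.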
As for your specific distribution, it can't be a Poisson distribution. Let's see what a Poisson distribution with a mean of 600 looks like:
x <- 500:700
plot(x, dpois(x, 600), type = "h")
Getting a value of greater than even 900 has (essentially) a zero probability:
1 - ppois(900, 600)
#> [1] 0
So if your data contains values of 30,000 or 50,000 as well as 600, it's certainly not a Poisson distribution.
Without knowing more about your actual data, it's not really possible to say what distribution you have. Perhaps if you include a sample of it in your question we could be of more help.
EDIT
With the sample of numbers provided in the comments, we can have a look at the actual empirical distribution:
hist(numbers, 200)
and if we want to know the probability at any point, we can create the empirical cumulative distribution function like this:
get_probability_of <- ecdf(numbers)
This allows us to do:
number <- 1:50000
plot(number, get_probability_of(number), ylab = "probability", type = "l")
and
get_probability_of(30000)
#> [1] 0.83588
Which means that the probability of getting a number higher than 30,000 is
1 - get_probability_of(30000)
#> [1] 0.16412
However, in this case, we know how the distribution is generated, so we can calculate the exact theoretical cdf just using some simple geometry (I won't show my working here because although it is simple, it is rather lengthy, dull, and not applicable to other distributions):
cdf <- function(x) ifelse(x < 600, 0, 1 - ((49400 - (x - 600)) / 49400)^2)
and
cdf(30000)
#> [1] 0.8360898
which is very close to, but more theoretically accurate than the empirical value.
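One way to sanity-check a closed-form cdf like this is inverse-transform sampling: invert the cdf, feed it uniform random numbers, and compare the simulated proportion with the formula. The sketch below assumes the data really follow the cdf given above on the range 600 to 50000; inv_cdf is a hypothetical helper derived by solving u = cdf(x) for x, and it is not how the sample from the comments was produced.

```r
# cdf as given in the answer above
cdf <- function(x) ifelse(x < 600, 0, 1 - ((49400 - (x - 600)) / 49400)^2)

# Inverse of that cdf on [600, 50000], solved by hand (hypothetical helper)
inv_cdf <- function(u) 50000 - 49400 * sqrt(1 - u)

set.seed(2024)              # arbitrary seed for reproducibility
sim <- inv_cdf(runif(1e5))  # simulated draws from the assumed distribution

c(empirical = mean(sim <= 30000), theoretical = cdf(30000))
```

The two numbers should agree to roughly two decimal places, mirroring the comparison between the empirical and theoretical values above.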

Simulate a set of number less than 20 that follows normal distribution [duplicate]

How can I get R to give me a sample of size less than 20 such that the sample has mean = 0 and variance = var?
MWE
rnorm(20, mean=0, sd=1)
As the sample size gets larger, the sample gets closer to normal. How then can I make R give me a sample of n < 20 with a mean equal to zero and variance equal to whatever I specify?
Try
rnorm(20, mean=0, sd=sqrt(var))
For what it's worth
your last paragraph sounds like a slightly confused statement of the Central Limit Theorem; there's no difficulty in creating small data sets that are normal, or Gaussian (the CLT says that sums of N independent, identically distributed variables approach normality as N goes to infinity ...)
it would be worth using v rather than var to denote your variance, since var() is a built-in function in R (this is mostly harmless, but occasionally confusing).
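If the requirement is that the sample itself has mean exactly 0 and variance exactly v (rather than being drawn from a population with those parameters), one common approach is to standardise the draws and rescale them. This is a sketch under that assumption; n = 15 and v = 4 are hypothetical values chosen for illustration.

```r
set.seed(7)   # arbitrary seed for reproducibility
n <- 15       # any sample size of at least 2
v <- 4        # target variance (hypothetical value)

raw <- rnorm(n)
# Centre to mean 0, scale to sd 1, then stretch to sd sqrt(v)
x <- sqrt(v) * (raw - mean(raw)) / sd(raw)

c(mean = mean(x), var = var(x))
```

Note that after this rescaling the values are no longer independent draws from a normal distribution, because the sample mean and variance have been forced to fixed values.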

R 1000 rnorm samples

x <- rnorm(25) will produce a single sample of size 25 from the standard normal distribution.
How do I take 1000 samples of size 25 from standard normal distribution at the same time?
I would like to do this efficiently, so that I will be able to compute things such as the mean and standard deviation for each of the 1000 samples and compare them via a histogram.
[Also: I would then like to uniformly and randomly select one of these 1000 samples and bootstrap it]
X <- matrix(rnorm(25000), 1000, 25)
Each row of X is a sample of size 25 from the standard normal distribution. There are 1000 rows.
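Building on that matrix, the per-sample summaries and the bootstrap step from the question could look roughly like this. It is a sketch: the seed, the variable names, and the choice of 2000 bootstrap resamples of the mean are assumptions, not something specified in the question.

```r
set.seed(10)  # arbitrary seed for reproducibility
X <- matrix(rnorm(25000), nrow = 1000, ncol = 25)

row_means <- rowMeans(X)       # mean of each of the 1000 samples
row_sds   <- apply(X, 1, sd)   # standard deviation of each sample

hist(row_means, main = "Means of 1000 samples", xlab = "sample mean")
hist(row_sds, main = "SDs of 1000 samples", xlab = "sample sd")

# Pick one row uniformly at random, then bootstrap its mean
picked <- X[sample(nrow(X), 1), ]
boot_means <- replicate(2000, mean(sample(picked, replace = TRUE)))
sd(boot_means)   # bootstrap estimate of the standard error of the mean
```

rowMeans is vectorised and fast; apply with sd is slower but still fine at this size.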
