How to calculate medians in R and create a histogram with a normal distribution mu=16 and sigma=4
I think you probably want a sample drawn from 1000 observations but shrunk to a smaller size. To do that, you'll need the sample() function:
set.seed(12)
s1 <- sample(x = 1:1000, size = 10)
s2 <- sample(x = 1:1000, size = 40)
median(s1)
median(s2)
hist(s1)
hist(s2)
The second option is to go with rnorm(), a function that generates a random sample from a normal distribution based on the specified parameters.
set.seed(12)
s1 = rnorm(1000, mean = 0, sd = 1)
s2 = rnorm(1000, mean = 35, sd = 0.1)
median(s1)
median(s2)
hist(s1)
hist(s2)
P.S. I set the seed so that the results are reproducible. You may skip that line.
Note that the second option assumes a normal (Gaussian) distribution.
Learn more about probability distributions here:
http://blog.cloudera.com/blog/2015/12/common-probability-distributions-the-data-scientists-crib-sheet/
n and u are numbers, not vectors. There must be some more information given, such as which distribution you're sampling from, and the population mean and standard deviation. For example, if you want to generate a sample of 1000 from a normal distribution with a mean 0 and sd 1, you would use
sample = rnorm(1000, 0, 1)
From there you can plot the histogram and calculate median:
median(sample)
hist(sample)
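For the distribution named in the question (mu = 16, sigma = 4), the same idea can be extended to many samples at once. The following is only a sketch: the number of samples m and the sample size n are hypothetical values, since the question does not state them.

```r
set.seed(12)
m <- 1000  # number of samples (hypothetical value)
n <- 10    # size of each sample (hypothetical value)

# draw m samples of size n from N(16, 4) and keep the median of each one
medians <- replicate(m, median(rnorm(n, mean = 16, sd = 4)))

# histogram of the m sample medians
hist(medians, main = "Sample medians, mean = 16, sd = 4", xlab = "median")
```

replicate() simply evaluates the expression m times and collects the results, so medians is a numeric vector of length m.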
Related
I'm trying to assess the feasibility of an instrumental variable in my project with a variable I haven't seen before. The variable is essentially an interaction between the mean and standard deviation of a sample drawn from a Gaussian, and I'm trying to see what this distribution might look like. Below is what I'm trying to do; any help is much appreciated.
Generate a set of 1000 individuals with a variable x following the Gaussian distribution, draw 50 random samples of 5 individuals from this distribution with replacement, calculate the mean and standard deviation of x for each sample, create an interaction variable named y which is calculated by multiplying the mean and standard deviation of x for each sample, and plot the distribution of y.
Beginners version
There might be more efficient ways to code this, but this is easy to follow, I guess:
stat_pop <- rnorm(1000, mean = 0, sd = 1)
N = 50
# As Ben suggested, we create a data.frame filled with NA values
samples <- data.frame(mean = rep(NA, N), sd = rep(NA, N))
# Now we use a loop to populate the data.frame
for(i in 1:N){
  # draw 5 samples from population (without replacement)
  # I assume you want to replace for each turn of taking 5
  # If you want to replace between drawing each of the 5,
  # I think it should be obvious how to adapt the following code
  smpl <- sample(stat_pop, size = 5, replace = FALSE)
  # the data.frame currently has two columns. In each row i, we put mean and sd
  samples[i, ] <- c(mean(smpl), sd(smpl))
}
# $ is used to get a certain column of the data.frame by the column name.
# Here, we create a new column y based on the existing two columns.
samples$y <- samples$mean * samples$sd
# plot a histogram
hist(samples$y)
Most functions here accept positional arguments, i.e., you are not required to name every parameter. E.g., rnorm(1000, mean = 0, sd = 1) is the same as rnorm(1000, 0, 1) and even the same as rnorm(1000), since 0 and 1 are the default values of mean and sd.
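A quick way to convince yourself of this is to reset the seed before each call and compare the results. This is just an illustrative check; the variable names a, b and d are made up for the example.

```r
# same seed before every call, so the random draws are comparable
set.seed(1); a <- rnorm(5, mean = 0, sd = 1)  # named arguments
set.seed(1); b <- rnorm(5, 0, 1)              # positional arguments
set.seed(1); d <- rnorm(5)                    # defaults: mean = 0, sd = 1

identical(a, b) && identical(a, d)
```

All three calls produce the same five numbers, so the comparison evaluates to TRUE.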
Somewhat more efficient version
In R, explicit loops are often slower than vectorized alternatives and are therefore commonly avoided. For your question, it does not make any noticeable difference. However, for large data sets, performance should be kept in mind. The following might be a bit harder to follow:
stat_pop <- rnorm(1000, mean = 0, sd = 1)
N = 50
n = 5
# again, I set replace = FALSE here; if you meant to replace each individual
# (so the same individual can be drawn more than once in each "draw 5"),
# set replace = TRUE
# replicate repeats the "draw 5" action N times
smpls <- replicate(N, sample(stat_pop, n, replace = FALSE))
# we transform the output and turn it into a data.frame to make it
# more convenient to work with
samples <- data.frame(t(smpls))
samples$mean <- rowMeans(samples)
samples$sd <- apply(samples[, c(1:n)], 1, sd)
samples$y <- samples$mean * samples$sd
hist(samples$y)
General note
Usually, you should do some research on the problem before posting here. Then, you either find out how it works by yourself, or you can provide an example of what you tried. To this end, you can simply google each of the steps you outlined (e.g., google "generate random standard distribution R" in order to find out about the function rnorm()).
Run ?rnorm to get help on the function in RStudio.
Does anybody know how I could possibly simulate data with a correlation between a count variable and a continuous variable? Right now the best idea that I have is to just transform the count variable to make it approximately normal, and then to simulate the data using this R code:
set.seed(2018)
x = rnorm(n = 1000, mean = 0, sd = 1)
y = rnorm(n = 1000, mean = .3*x, sd = sqrt(1-.3^2))  # slope and sd use the same target correlation, 0.3
cor(x,y)
However, I really think it would be preferable if I could actually make Y a count variable (because they tend to typically be right-skewed). Also, I want to be able to specify specific correlations between x and y. E.g., simulate data with a 0.5 correlation between x and y etc.
Edit: I'm still looking for help!
You can use runif to simulate the continuous variable, then feed the result as the lambda (rate) parameter of rpois:
set.seed(1)
continuous <- runif(100, 0, 10)
counts <- rpois(100, continuous)
plot(continuous, counts)
cor(counts, continuous)
#> [1] 0.7852701
Created on 2020-12-11 by the reprex package (v0.3.0)
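If you need to steer the correlation towards a chosen value, one option is a Gaussian-copula style construction. Note that this is a different technique from the runif/rpois answer above, and it is only a sketch: rho and lambda are hypothetical values, and the correlation you get between x and the counts is close to, but typically a little below, rho because the counts are discrete.

```r
set.seed(2018)
n      <- 10000
rho    <- 0.5  # target correlation on the latent normal scale (hypothetical value)
lambda <- 3    # Poisson mean of the count variable (hypothetical value)

x <- rnorm(n)
# latent normal variable whose correlation with x is rho in expectation
z <- rho * x + sqrt(1 - rho^2) * rnorm(n)
# map z to (0, 1) with pnorm, then to Poisson counts with the quantile function
y <- qpois(pnorm(z), lambda)

cor(x, y)    # approximately rho
table(y)     # y is a right-skewed count variable
```

To hit a specific Pearson correlation more exactly, you could adjust rho upwards slightly and re-check cor(x, y).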
I want to create a sampling distribution for a mean. I have a variable x with at least ten thousand values. I want to take 500 samples (n=10) and then show the distribution of the sample means in a histogram. I think it worked with the following, but can anyone check if this is what I meant and tell me what the 2 within the apply function stands for?
x <- rnorm(10000, 7.5, 1.5)
draws = sample(x, size = 10 * 500, replace = TRUE)
draws = matrix(draws, 10)
drawmeans = apply(draws, 2, mean)
hist(drawmeans)
Any help would be sincerely appreciated!
You could do this using replicate if you wanted; it is one of many different ways. For a data frame df with a column Scores:
out = replicate(500, mean(sample(df$Scores,10)))
hist(out)
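As for the 2 in the question's apply() call: it is the MARGIN argument. MARGIN = 1 applies the function to each row, MARGIN = 2 to each column. Since the question's draws matrix has 10 rows and 500 columns, apply(draws, 2, mean) returns one mean per column, i.e. one per sample. A small illustration with a made-up matrix m:

```r
m <- matrix(1:6, nrow = 2)  # 2 rows, 3 columns, filled column by column

apply(m, 1, sum)  # MARGIN = 1: one result per row    -> 9 12
apply(m, 2, sum)  # MARGIN = 2: one result per column -> 3 7 11

# for means over columns, colMeans() gives the same values as apply(m, 2, mean)
all.equal(apply(m, 2, mean), colMeans(m))
```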
I have n>2 independent continuous random variables (RVs). For example, say I have 4 uniform RVs with different upper and lower bounds.
W~U[-1,5], X~U[0,1], Y~U[0,2], Z~U[0.5,2]
I am trying to find the approximate PDF for the sum of these RVs, i.e. for T=W+X+Y+Z. As I don't need a closed-form solution, I have sampled 1 million points for each of them to get 1 million samples for T. Is it possible in R to get the approximate PDF, or a way to get the approximate probability P(t<T), from the samples I have drawn? For example, is there an easy way to calculate P(0.5<T) in R? My priority here is to get the probability first, even if getting the density function is not possible.
Thanks
Consider the ecdf function:
set.seed(123)
W <- runif(1e6, -1, 5)
X <- runif(1e6, 0, 1)
Y <- runif(1e6, 0, 2)
Z <- runif(1e6, 0.5, 2)
T <- Reduce(`+`, list(W, X, Y, Z))
cdfT <- ecdf(T)
1 - cdfT(0.5) # Pr(T > 0.5)
# [1] 0.997589
See How to calculate cumulative distribution in R? for more details.
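For the approximate PDF part of the question, a kernel density estimate is one possibility. The sketch below uses base R's density() and approxfun(); the names T_sum, dens and pdfT are made up for the example, and it uses fewer draws than the answer above purely to keep it fast.

```r
set.seed(123)
N <- 1e5  # fewer draws than above, only for speed (assumption)
T_sum <- runif(N, -1, 5) + runif(N, 0, 1) + runif(N, 0, 2) + runif(N, 0.5, 2)

# kernel density estimate evaluated on a grid of points
dens <- density(T_sum)

# turn the grid into a function by linear interpolation; 0 outside the grid
pdfT <- approxfun(dens$x, dens$y, yleft = 0, yright = 0)

pdfT(4.75)          # approximate density at t = 4.75
mean(T_sum > 0.5)   # empirical Pr(T > 0.5), equivalent to 1 - ecdf(T_sum)(0.5)
```

The density estimate depends on the bandwidth that density() picks, so treat pdfT as a smoothed approximation rather than the exact PDF.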
I want to generate some Weibull random numbers in a given interval. For example 20 random numbers from the Weibull distribution with shape 2 and scale 30 in the interval (0, 10).
The rweibull function in R produces random numbers from a Weibull distribution with given shape and scale values, but not restricted to an interval. Can someone please suggest a method? Thank you in advance.
Use the distr package. It makes this kind of thing very easy.
library(distr)
# create the truncated distribution
d <- Truncate(Weibull(shape = 2, scale = 30), lower = 0, upper = 10)
# The d object has four slots: d, r, p, q, which correspond to the [drpq] prefixes of the standard R distribution functions
# draw 10 random numbers
d@r(10)
# draw a histogram
hist(d@r(10000))
Using base R, you can generate random numbers, keep those that fall into the target interval, and generate some more if you end up with fewer than you need.
rweibull_interval <- function(n, shape, scale = 1, min = 0, max = 10) {
  weib_rnd <- rweibull(10*n, shape, scale)
  weib_rnd <- weib_rnd[weib_rnd > min & weib_rnd < max]
  if (length(weib_rnd) < n) {
    return(c(weib_rnd, rweibull_interval(n - length(weib_rnd), shape, scale, min, max)))
  } else {
    return(weib_rnd[1:n])
  }
}
set.seed(1)
rweibull_interval(20, 2, 30, 0, 10)
[1] 9.308806 9.820195 7.156999 2.704469 7.795618 9.057581 6.013369 2.570710 8.430086 4.658973
[11] 2.715765 8.164236 3.676312 9.987181 9.969484 9.578524 7.220014 8.241863 5.951382 6.934886
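Another base R possibility, which avoids discarding draws, is inverse-CDF sampling: draw uniforms between the CDF values at the two interval ends and push them through the quantile function. This is a different technique from the rejection approach above, and rweibull_trunc is a hypothetical helper name used only for this sketch.

```r
rweibull_trunc <- function(n, shape, scale = 1, min = 0, max = 10) {
  # CDF values at the interval ends
  p_lo <- pweibull(min, shape, scale)
  p_hi <- pweibull(max, shape, scale)
  # uniform draws restricted to [p_lo, p_hi], mapped back through the quantile function
  u <- runif(n, p_lo, p_hi)
  qweibull(u, shape, scale)
}

set.seed(1)
x <- rweibull_trunc(20, shape = 2, scale = 30, min = 0, max = 10)
range(x)  # both ends lie inside the interval (0, 10)
```

Every uniform draw yields exactly one value inside the interval, so no recursion or oversampling is needed. The numbers will differ from the output above because the random draws are used differently.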