Calling rnorm with a vector of means - r

When I call rnorm with a single value as the mean, it's obvious what happens: 20 values are generated from Normal(10, 1).
y <- rnorm(20, mean=10, sd=1)
But I see examples where a whole vector is passed to rnorm (or rcauchy, etc.); in that case I am not sure what the R machinery really does. For example:
a = c(10,22,33,44,5,10,30,22,100,45,97)
y <- rnorm(a, mean=a, sd=1)
Any ideas?

The number of random numbers rnorm generates equals the length of a. From ?rnorm:
n: number of observations. If
‘length(n) > 1’, the length is taken
to be the number required.
To see what is happening when a is passed to the mean argument, it's easier if we change the example:
a = c(0, 10, 100)
y = rnorm(a, mean=a, sd=1)
[1] -0.4853138 9.3630421 99.7536461
So we generate length(a) random numbers, where the i-th number has mean a[i].
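As a quick check of the point above, here is a minimal sketch showing that only the length of the first argument matters. The seed value is arbitrary and the names y1 and y2 are just for illustration.

```r
a <- c(0, 10, 100)

set.seed(42)                      # arbitrary seed, only for reproducibility
y1 <- rnorm(a, mean = a, sd = 1)  # n given as a vector: its length (3) is used

set.seed(42)
y2 <- rnorm(length(a), mean = a, sd = 1)

length(y1)         # 3
identical(y1, y2)  # TRUE: the values stored in n are ignored
```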

A better example:
a <- c(0,10,100)
b <- c(2,4,6)
y <- rnorm(6,a,b)
y
Result:
[1] -1.2261425 10.1596462 103.3857481 -0.7260817 7.0812499 97.8964131
As you can see, for the first and fourth elements of y, rnorm takes the first element of a as the mean and the first element of b as the sd.
For the second and fifth elements of y, rnorm takes the second element of a as the mean and the second element of b as the sd.
For the third and sixth elements of y, rnorm takes the third element of a as the mean and the third element of b as the sd.
You can experiment with different numbers in the first argument of rnorm to see what is happening.
For example, what happens if you use 5 instead of 6 as the first argument when calling rnorm?
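One way to explore that question without reading the pattern out of noisy draws is to set sd = 0, so that every draw equals its mean and the recycling becomes directly visible. This is only an illustrative trick, not part of the answer above; the names y5 and means_used are made up for the sketch.

```r
a <- c(0, 10, 100)

# With sd = 0 every draw equals its mean, which exposes how `mean` is recycled.
y5 <- rnorm(5, mean = a, sd = 0)
y5  # 0 10 100 0 10

# The same recycling written out explicitly
means_used <- rep_len(a, 5)
all.equal(y5, means_used)  # TRUE
```

With n = 5 the sd vector is recycled in the same way, so the fourth and fifth draws reuse the first and second mean/sd pairs and the third pair is used only once.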

Related

Distribution of mean*standard deviation of sample from gaussian

I'm trying to assess the feasibility of an instrumental variable in my project, using a variable I haven't seen before. The variable is essentially an interaction between the mean and standard deviation of a sample drawn from a Gaussian, and I'm trying to see what its distribution might look like. Below is what I'm trying to do; any help is much appreciated.
Generate a set of 1000 individuals with a variable x following the gaussian distribution, draw 50 random samples of 5 individuals from this distribution with replacement, calculate the means and standard deviation of x for each sample, create an interaction variable named y which is calculated by multiplying the mean and standard deviation of x for each sample, plot the distribution of y.
Beginner's version
There might be more efficient ways to code this, but this is easy to follow, I guess:
stat_pop <- rnorm(1000, mean = 0, sd = 1)
N = 50
# As Ben suggested, we create a data.frame filled with NA values
samples <- data.frame(mean = rep(NA, N), sd = rep(NA, N))
# Now we use a loop to populate the data.frame
for(i in 1:N){
# Draw one sample of 5 individuals from the population, without replacement
# within the sample. Every iteration samples from the full population again,
# so individuals are replaced between samples. If the same individual may
# appear more than once within a sample of 5, set replace = TRUE instead.
smpl <- sample(stat_pop, size = 5, replace = FALSE)
# the data.frame currently has two columns. In each row i, we put mean and sd
samples[i, ] <- c(mean(smpl), sd(smpl))
}
# $ is used to get a certain column of the data.frame by the column name.
# Here, we create a new column y based on the existing two columns.
samples$y <- samples$mean * samples$sd
# plot a histogram
hist(samples$y)
Function arguments in R can be matched by position, i.e., you are not required to name every parameter. E.g., rnorm(1000, mean = 0, sd = 1) is the same as rnorm(1000, 0, 1) and even the same as rnorm(1000), since 0 and 1 are the default values for mean and sd.
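The equivalence of the three calls can be checked directly by resetting the seed before each one. This is a small sketch; the seed and the variable names are arbitrary.

```r
set.seed(1)
x_named <- rnorm(1000, mean = 0, sd = 1)   # arguments matched by name

set.seed(1)
x_positional <- rnorm(1000, 0, 1)          # arguments matched by position

set.seed(1)
x_default <- rnorm(1000)                   # mean and sd left at their defaults

identical(x_named, x_positional) && identical(x_named, x_default)  # TRUE
```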
Somewhat more efficient version
In R, explicit loops are often slower than vectorized alternatives, so it is worth avoiding them where a vectorized function exists. For your question it does not make any noticeable difference, but for large data sets performance should be kept in mind. The following might be a bit harder to follow:
stat_pop <- rnorm(1000, mean = 0, sd = 1)
N = 50
n = 5
# again, I set replace = FALSE here; if you meant to replace each individual
# (so the same individual can be drawn more than once in each "draw 5"),
# set replace = TRUE
# replicate repeats the "draw 5" action N times
smpls <- replicate(N, sample(stat_pop, n, replace = FALSE))
# we transform the output and turn it into a data.frame to make it
# more convenient to work with
samples <- data.frame(t(smpls))
samples$mean <- rowMeans(samples)
samples$sd <- apply(samples[, c(1:n)], 1, sd)
samples$y <- samples$mean * samples$sd
hist(samples$y)
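For the "with replacement" reading of the question, a compact variant is sketched below. It keeps the matrix that replicate returns instead of converting it to a data.frame; the seed and the names draws, sample_mean and sample_sd are assumptions made for this sketch.

```r
set.seed(2024)  # arbitrary seed, only for reproducibility
stat_pop <- rnorm(1000, mean = 0, sd = 1)
N <- 50   # number of samples
n <- 5    # individuals per sample

# Each column of `draws` is one sample of n individuals (an n x N matrix);
# replace = TRUE allows the same individual to appear twice within a sample.
draws <- replicate(N, sample(stat_pop, n, replace = TRUE))

sample_mean <- colMeans(draws)        # one mean per sample
sample_sd   <- apply(draws, 2, sd)    # one standard deviation per sample
y <- sample_mean * sample_sd          # the interaction variable
```

Calling hist(y) afterwards plots the distribution, as in the code above.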
General note
Usually, you should do some research on the problem before posting here. Then you either find out how it works by yourself, or you can provide an example of what you tried. To this end, you can simply google each of the steps you outlined (e.g., google "generate random standard distribution R" to find out about the function rnorm()).
Run ?rnorm in the R console (or in RStudio) to get help on the function.

R: compute an integral with an unknown parameter equal to a certain value (for example: int x = 0.6)

I am trying to simulate values from an integral with an unknown parameter (to create a climatological forecaster).
My function is: $\int_{x = 0}^{x = 0.25} 4 y^{-1/x} \, dx$
Normally one inputs the variable y and gets a value as output.
However, I want to input the value this integral is equal to and get the value of y as output.
I have 3 runif vectors of length 1 000, 10 000 and 100 000 (with values between 0 and 1), which I use as my input values.
Say the first value is 0.3 and the second value is 0.78.
I want to calculate for which y the integral above is equal to 0.3 (or equal to 0.78 for the second value).
How can I do this in R?
I've tried some things with the integrate function, but then I need a value for y to make that work.
You are trying to solve a non-linear equation with an integral inside.
Intuitively, you start with an interval that contains the desired y. Then you try different values of y, calculate the integral for each, and narrow the interval based on the result.
You can implement that in R using integrate and optimize as below:
f <- function(x, y) {
4*y^(-1/x)
}
intf <- function(y) {
integrate(f, 0, 0.25, y=y)
}
objective <- function(y, value) {
abs(intf(y)$value - value)
}
optimize(objective, c(1, 10), value=0.3)
#$minimum
#[1] 1.14745
#
#$objective
#[1] 1.540169e-05
optimize(objective, c(1, 10), value=0.78)
#$minimum
#[1] 1.017891
#
#$objective
#[1] 0.0001655954
Here, f is the function to be integrated, intf calculates the integral for a given y, and objective measures the distance between the value of the integral against the desired value.
Since the optimize function finds the minimum of a function, it finds the y for which the integral is closest to the target value.
Note that non-linear equations with an integral inside are in general tough to solve. This case seems manageable since the function is monotonic and continuous in y. The solution y should be unique and can be easily found by narrowing down the interval.
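Because the integral is monotonic in y on this interval, a root finder can be used instead of minimizing the absolute distance. The sketch below swaps optimize for uniroot from base R and reuses the same integrand and the same bracket c(1, 10); solve_y and y_hat are hypothetical names introduced here, and the bracket is assumed to contain the solution for the targets of interest.

```r
# Integrand, as in the code above
f <- function(x, y) 4 * y^(-1 / x)

# Integral over x from 0 to 0.25 for a given y
intf <- function(y) integrate(f, 0, 0.25, y = y)$value

# Solve intf(y) = value for y by finding the root of intf(y) - value.
# The interval must bracket the root (the function changes sign across it).
solve_y <- function(value, interval = c(1, 10)) {
  uniroot(function(y) intf(y) - value, interval = interval, tol = 1e-8)$root
}

targets <- c(0.3, 0.78)            # the two example values from the question
y_hat <- sapply(targets, solve_y)  # one solution per target value
y_hat
```

The same sapply call works on a whole runif vector, provided every target lies between the values of the integral at the two ends of the bracket; targets very close to 0 would need a wider interval.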

Problem with creating a lot of new vectors

I want to do the following:
Draw 50 numbers from a normal distribution with
mean = 10 and standard deviation = 20, and repeat this 100 times.
For each draw I want to compute its standard deviation and arithmetic mean.
At the end I want to create a vector of length 100 containing the absolute value of the difference between the standard deviation and the arithmetic mean (i.e. I want to create a vector x with x[i] = |a-b|, where a is the standard deviation of the 50 numbers in the i-th draw and b is the mean of the 50 numbers in the i-th draw).
What I did:
Creating the 100 draws from the normal distribution above:
replicate(100, rnorm(50, 10, 20), simplify = FALSE)
Now I have a problem. I know that I can use the functions "mean" and "sd" to compute the arithmetic mean and standard deviation, but I have to define the numbers that I drew as a vector. What I mean:
Numbers from the first draw - vector 1
Numbers from the second draw - vector 2
And so on.
Then I can compute their arithmetic mean and standard deviation.
Then I can compute |a-b| (defined above), and at the end I will create the vector with x[i] = |a-b|.
I have an idea but I don't know how to write it.
This is a matter of assigning the result of replicate to a variable (of class "list", since simplify = FALSE) and then sapply the mean and sd functions.
set.seed(1234) # Make the results reproducible
repl <- replicate(100, rnorm(50, 10, 20), simplify = FALSE)
mu <- sapply(repl, mean)
s <- sapply(repl, sd)
D <- abs(s - mu)
head(D)
#[1] 16.761930 7.953432 6.833691 12.491605 5.490149 6.850794
A one-liner could be
D2 <- sapply(repl, function(x) abs(sd(x) - mean(x)))
identical(D, D2)
#[1] TRUE
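If the list is not needed for anything else, the default simplify = TRUE returns a matrix with one draw per column, and the same vector can be computed column-wise. This is a sketch of that alternative; draws and D_mat are names chosen here, and the seed is reset so both versions use the same random numbers.

```r
set.seed(1234)
repl <- replicate(100, rnorm(50, 10, 20), simplify = FALSE)
D <- sapply(repl, function(x) abs(sd(x) - mean(x)))

set.seed(1234)
draws <- replicate(100, rnorm(50, 10, 20))   # 50 x 100 matrix, one draw per column
D_mat <- abs(apply(draws, 2, sd) - colMeans(draws))

length(D_mat)        # 100
all.equal(D, D_mat)  # TRUE
```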

Vector of different n values in rbeta

I would like to simultaneously use vectors of different parameter values in rbeta and get out a vector whose length is the sum of the elements of the n vector. For example,
n <- c(10, 20, 30)
alpha <- c(1,2,3)
beta <- c(3,2,1)
rbeta(n, alpha, beta)
The bottom line doesn't do what I would like. I want the output to be a vector of length 10+20+30 = 60, with the first 10 elements being 10 samples from a beta(1,3), the next 20 elements from a beta(2,2) and the next 30 elements from a beta(3,1). What is the best way to do this?
In general when applying a function to the elements of a vector, you’d need to lapply over your input vector:
unlist(lapply(n, rbeta, 2, 1))
However, in your case you can simply sum all the ns:
rbeta(sum(n), 2, 1)
If you have multiple parameters for alpha and beta, you can use Map instead (careful, arguments are inverted compared to lapply):
unlist(Map(rbeta, n, alpha, beta))
For your revised question I think judicious use of rep() will make it work.
n <- c(10, 20, 30)
alpha <- c(1,2,3)
beta <- c(3,2,1)
rbeta(sum(n),rep(alpha,n),rep(beta,n))
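To see that the rep() approach gives the layout asked for, each draw can be labelled with the index of the parameter pair it came from. This is a sketch; the group vector and the seed are additions made for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- c(10, 20, 30)
alpha <- c(1, 2, 3)
beta <- c(3, 2, 1)

# One parameter pair per draw: alpha[i] and beta[i] are repeated n[i] times
x <- rbeta(sum(n), rep(alpha, n), rep(beta, n))

# Label each draw with the index of the parameter pair it was drawn from
group <- rep(seq_along(n), times = n)

length(x)               # 60
table(group)            # 10, 20 and 30 draws in groups 1, 2 and 3
tapply(x, group, mean)  # sample means; theory gives alpha / (alpha + beta)
```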

How to generate such random numbers in R

I want to generate bivariates in the following way. I have four lists with equal length n. I need to use the first two lists as means lists, and the latter two as variance lists, and generate normal bivariates.
For example n=2, I have the lists as (1, 2), (3, 4), (5, 6), (7, 8), and I need
matrix(c(rnorm(1, mean=1, sd=sqrt(5)), rnorm(1, mean=2, sd=sqrt(6)), rnorm(1, mean=3, sd=sqrt(7)), rnorm(1, mean=4, sd=sqrt(8))), ncol=2)
How can I do this in R in a more functional way?
Here is one way:
m <- 1:4
s <- 5:8
rnorm(n = 4, mean = m, sd = s)
[1] 4.599257 1.661132 16.987241 3.418957
This works because, like many R functions, rnorm() is 'vectorized', meaning that it allows you to call it once with vectors as arguments, rather than many times in a loop that iterates through the elements of the vectors. Note that s is passed as sd here; if 5:8 are variances, as in the question, use sd = sqrt(s).
Your main task, then, is to convert the 'lists' in which you've got your arguments right now into vectors that can be passed to rnorm().
NOTE: If you want to produce more than one (let's say 3) random variate for each mean/sd combination, rnorm(n=rep(3,4), mean=m, sd=s) will not work. You'll have to either: (a) repeat elements of the m and s vectors like so rnorm(n=3*4, mean=rep(m, each=3), sd=rep(s, each=3)); or (b) use mapply() as described in DWin's answer.
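Option (a) from the note can be sketched as follows. The reshaping into a matrix and the name draws are additions for illustration, and the seed is arbitrary.

```r
m <- 1:4
s <- 5:8

# length(n) > 1, so only its length is used: this returns 4 values, not 12
length(rnorm(n = rep(3, 4), mean = m, sd = s))   # 4

# Option (a): repeat each mean/sd three times and ask for 12 draws
set.seed(7)  # arbitrary seed, only for reproducibility
x <- rnorm(n = 3 * 4, mean = rep(m, each = 3), sd = rep(s, each = 3))

# One row per mean/sd combination, three draws per row
draws <- matrix(x, nrow = 4, byrow = TRUE)
dim(draws)   # 4 3
```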
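I'm taking you at your word that you have a list, i.e. an R list (note that the code below passes the last two lists to rnorm as standard deviations; wrap them in sqrt() if they hold variances):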
plist <- list( a=list(1, 2), b=list(3, 4), c=list(5, 6), d=list(7, 8))
means <-plist[c("a","b")] # or you could use means <- plist[1:2]
vars <- plist[c("c","d")]
mapply(rnorm, n=rep(1,4), unlist(means), unlist(vars))
#[1] 3.9382147 1.0502025 0.9554021 -7.3591917
You used the term bivariate. Did you really want to have x,y pairs that had a specific correlation?
