Vector of different n values in rbeta

I would like to simultaneously use vectors of different parameter values in rbeta and get out a vector whose length is the sum of the elements of the n vector. For example,
n <- c(10, 20, 30)
alpha <- c(1,2,3)
beta <- c(3,2,1)
rbeta(n, alpha, beta)
The bottom line doesn't do what I would like. I want the output to be a vector of length 10+20+30 = 60, with the first 10 elements being 10 samples from a beta(1,3), the next 20 elements from a beta(2,2) and the next 30 elements from a beta(3,1). What is the best way to do this?

In general when applying a function to the elements of a vector, you’d need to lapply over your input vector:
unlist(lapply(n, rbeta, 2, 1))
However, when the shape parameters are the same for every group, as in rbeta(n, 2, 1), you can simply draw sum(n) values in one call:
rbeta(sum(n), 2, 1)
If alpha and beta also vary from group to group, you can use Map instead (careful: the argument order is reversed compared to lapply, because the function comes first):
unlist(Map(rbeta, n, alpha, beta))
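To see that Map() really pairs the i-th elements of n, alpha and beta, it can help to look at the list before unlisting it. A minimal sketch using the question's values; draws and x are just illustrative names:

```r
n     <- c(10, 20, 30)
alpha <- c(1, 2, 3)
beta  <- c(3, 2, 1)

# Map() calls rbeta(n[i], alpha[i], beta[i]) for i = 1, 2, 3 and returns a list
draws <- Map(rbeta, n, alpha, beta)
lengths(draws)        # one list element per parameter set: 10, 20, 30 values

# unlist() concatenates the groups in order: 10 + 20 + 30 = 60 values
x <- unlist(draws)
length(x)
```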

For your revised question I think judicious use of rep() will make it work.
n <- c(10, 20, 30)
alpha <- c(1,2,3)
beta <- c(3,2,1)
rbeta(sum(n),rep(alpha,n),rep(beta,n))
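One way to check that rep() lines each draw up with its own (alpha, beta) pair is to compare the group means with the Beta mean alpha / (alpha + beta). A minimal sketch; the larger n values and the seed are assumptions chosen only so the comparison is stable:

```r
n     <- c(10000, 20000, 30000)   # larger than the question's n so group means are stable
alpha <- c(1, 2, 3)
beta  <- c(3, 2, 1)

set.seed(1)
x     <- rbeta(sum(n), rep(alpha, n), rep(beta, n))
group <- rep(seq_along(n), n)     # 1 for the first 10000 draws, 2 for the next 20000, ...

empirical   <- tapply(x, group, mean)
theoretical <- alpha / (alpha + beta)   # mean of Beta(a, b) is a / (a + b)
round(cbind(empirical, theoretical), 3)
```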

Related

R code Gaussian mixture -- numerical expression has 2 elements: only the first used

I'm trying to create a Gaussian Mix function according to these parameters:
For each sample, roll a die with k sides
If the j-th side comes up, draw a sample from Normal(mu_j, sd_j), where mu_j and sd_j are the mean and standard deviation of the j-th Normal distribution. This means you should have k different Normal distributions to choose from. Note that mu_j is the mathematical way of referring to the j-th element of a vector called mus.
The resulting sample from this Normal is then from a Gaussian Mixture.
Where:
n, an integer that represents the number of independent samples you want from this random variable
mus, a numeric vector with length k
sds, a numeric vector with length k
prob, a numeric vector with length k that gives the probability of choosing each of the Gaussians. It should default to NULL.
This is what I came up with so far:
n <- c(1)
mus <- c()
sds <- c()
prob <- c()
rgaussmix <- function(n, mus, sds, prob = NULL){
if(length(mus) != length(sds)){
stop("mus and sds have different lengths")
}
for(i in 1:seq_len(n)){
if(is.null(prob)){
rolls <- c(NA, n)
rolls <- sample(c(1:length(mus)), n, replace=TRUE)
avg <- rnorm(length(rolls), mean=mus[rolls], sd=sds[rolls])
}else{
rolls <- c(NA, n)
rolls <- sample(c(1:length(mus), n, replace=TRUE, p=prob))
avg <- rnorm(length(rolls), mean=mus[rolls], sd=sds[rolls])
}
}
return(avg)
}
rgaussmix(2, 1:3, 1:3)
It seems to match most of the requirements, but it keeps giving me the following error:
numerical expression has 2 elements: only the first used
number of items to replace is not a multiple of replacement length
I've tried looking at the lengths of multiple variables, but I can't seem to figure out where the error is coming from!
Could someone please help me?
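If you do seq_len(2) it gives you:
[1] 1 2
So 1:seq_len(n) is 1:(1:2), which doesn't make sense: : only uses the first element of a vector, hence the "only the first used" message. seq_len(n) on its own is already the sequence you want to loop over.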
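If you keep the loop, the usual idiom is for (i in seq_len(n)) with no colon in front. A small sketch of why seq_len() is preferred over 1:n: for n = 0, the colon counts down instead of giving an empty sequence.

```r
n <- 2
seq_len(n)    # already the full index vector 1, 2
for (i in seq_len(n)) cat("iteration", i, "\n")

# Edge case: seq_len(0) is empty, so a loop over it never runs,
# whereas 1:0 is c(1, 0) and a loop over it would run twice.
seq_len(0)
1:0
```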
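You can also avoid the loop altogether: draw all n component indices with one sample() call and pass the matching means and sds to a single rnorm() call, because rnorm() is vectorised over mean and sd. For example, if you do: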
rnorm(3,c(0,10,20),1)
[1] -0.507961 8.568335 20.279245
It gives you the 1st sample from the 1st mean, the 2nd sample from the 2nd mean, and so on. So you can simplify your function to:
rgaussmix <- function(n, mus, sds, prob = NULL){
if(length(mus) != length(sds)){
stop("mus and sds have different lengths")
}
if(is.null(prob)){
prob = rep(1/length(mus),length(mus))
}
rolls <- sample(length(mus), n, replace=TRUE, prob=prob)
avg <- rnorm(n, mean=mus[rolls], sd=sds[rolls])
avg
}
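A way to check that the prob argument is respected: use components whose means are far apart relative to their standard deviations, so each draw can be traced back to its component by rounding. This is a sketch; the function is restated so the block runs on its own, and the means 0, 100, 200 and weights 0.2, 0.3, 0.5 are arbitrary test values.

```r
rgaussmix <- function(n, mus, sds, prob = NULL) {
  if (length(mus) != length(sds)) {
    stop("mus and sds have different lengths")
  }
  if (is.null(prob)) {
    prob <- rep(1 / length(mus), length(mus))
  }
  rolls <- sample(length(mus), n, replace = TRUE, prob = prob)  # which Gaussian for each draw
  rnorm(n, mean = mus[rolls], sd = sds[rolls])                  # one vectorised draw per roll
}

set.seed(1)
x <- rgaussmix(100000, mus = c(0, 100, 200), sds = c(1, 1, 1), prob = c(0.2, 0.3, 0.5))

# With sd = 1 and means 100 apart, round(x / 100) recovers the component (0, 1 or 2)
component <- round(x / 100)
prop.table(table(component))   # should be close to 0.2, 0.3, 0.5
```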
You can plot the results:
plot(density(rgaussmix(10000,c(0,5,10),c(1,1,1))),main="mixture of 0,5,10")

How to vectorise sampling from non-identically distributed Bernoulli random variables?

Given a sequence of independent but not identically distributed Bernoulli trials with success probabilities given by a vector, e.g.:
x <- seq(0, 50, 0.1)
prob <- - x*(x - 50)/1000 # trial probabilities for trials 1 to 501
What is the most efficient way to obtain a random variate from each trial? I am assuming that vectorisation is the way to go.
I know of two functions that give Bernoulli random variates:
rbernoulli from the package purrr, which does not accept a vector of success probabilities as an input. In this case it may be possible to wrap the function in an apply-type operation.
rbinom with arguments size = 1 gives Bernoulli random variates. It also accepts a vector of probabilities, so that:
rbinom(n = length(prob), size = 1, prob = prob)
gives an output with the right length. However, I am not entirely sure that this is actually what I want. The bits in the helpfile ?rbinom that seem relevant are:
The length of the result is determined by n for rbinom, and is the
maximum of the lengths of the numerical arguments for the other
functions.
The numerical arguments other than n are recycled to the length of the
result. Only the first elements of the logical arguments are used.
However, n is a parameter with no default, so I am not sure what the first sentence means. I presume the second sentence means that I get what I want, since only size = 1 should be recycled. However this thread seems to suggest that this method does not work.
This blog post gives some other methods as well. One commenter mentions my suggested idea using rbinom.
Another way to test that rbinom is vectorised over prob takes advantage of the fact that the sum of N independent Bernoulli(p) random variables is a Binomial(N, p) random variable. If the i-th draw really uses prob[i], the plot of n against x should reproduce the shape of prob:
x <- seq(0, 50, 0.1)
prob <- -x*(x - 50)/1000
n <- rbinom(length(prob), size=1000, prob=prob)  # the i-th draw uses prob[i]
par(mfrow=c(1, 2))
plot(prob ~ x)
plot(n ~ x)
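The same check can be made numeric instead of visual: if rbinom() uses the i-th probability for the i-th draw, then n / 1000 should track prob element by element, with standard deviation sqrt(p(1 - p)/1000), which is at most about 0.016 here. A minimal sketch, with a seed added (an assumption) for reproducibility:

```r
x    <- seq(0, 50, 0.1)
prob <- -x * (x - 50) / 1000                          # 501 probabilities between 0 and 0.625

set.seed(42)
n <- rbinom(length(prob), size = 1000, prob = prob)   # one Binomial(1000, prob[i]) per i

max(abs(n / 1000 - prob))   # largest gap between observed proportion and prob[i]
cor(n / 1000, prob)         # close to 1 if draws and probabilities are matched up
```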
If you don't trust random strangers on the internet and do not understand documentation, maybe you can convince yourself by testing. Just set the random seed to get reproducible results:
x <- seq(0, 50, 0.1)
prob <- - x*(x - 50)/1000
#501 separate draws of 1 random number each
set.seed(42)
res1 <- sapply(prob, rbinom, n = 1, size = 1)
#501 "simultaneous" (vectorized) draws
set.seed(42)
res2 <- rbinom(501, 1, prob)
identical(res1, res2)
#[1] TRUE
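Another common vectorised approach, not from the thread itself, is to compare uniform draws with the probabilities: runif(k) < prob is TRUE in position i with probability prob[i]. A sketch that checks the per-trial frequencies of both methods over many replications; bern_binom and bern_unif are illustrative helper names, and 2000 replications is an arbitrary choice.

```r
x    <- seq(0, 50, 0.1)
prob <- -x * (x - 50) / 1000

# One Bernoulli(prob[i]) draw per trial, two equivalent ways (hypothetical helpers)
bern_binom <- function(p) rbinom(length(p), size = 1, prob = p)
bern_unif  <- function(p) as.integer(runif(length(p)) < p)

set.seed(1)
reps <- 2000
# replicate() gives a 501 x 2000 matrix; rowMeans() is each trial's success frequency
freq_binom <- rowMeans(replicate(reps, bern_binom(prob)))
freq_unif  <- rowMeans(replicate(reps, bern_unif(prob)))

max(abs(freq_binom - prob))
max(abs(freq_unif - prob))
```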

Estimate a probability from elements of a list in R

I have a list of 100,000 simulated values of T in R (min: 1.5, max: 88.8) and I want to calculate the probability of T being between 10 and 50.
I simulated 100,000 values of T, where T is t(y) %*% M %*% y, M is an 8x8 matrix of constant values and y is an 8x1 matrix. The element in the i-th row of y is equal to a_i + b_i, where a is a vector of constants and b is a vector whose elements follow a Normal(0, sd = 2) distribution (each element is a different simulated draw from N(0, 2)).
Is it in a vector or a list? If it's a vector, the following should work. If it's in a list, you may use unlist() to convert it to a vector.
mylist <- runif(100000,1.5,88.8) #this is just to generate a random number vector
length(which(mylist>=10 & mylist<=50))/length(mylist)
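Since TRUE counts as 1 and FALSE as 0, the same proportion can be written as the mean of the logical vector, which avoids which() and the explicit division. A sketch with the same kind of stand-in data (seed added as an assumption):

```r
set.seed(1)
mylist <- runif(100000, 1.5, 88.8)   # stand-in for the simulated T values

p1 <- length(which(mylist >= 10 & mylist <= 50)) / length(mylist)
p2 <- mean(mylist >= 10 & mylist <= 50)   # mean of a logical vector = proportion of TRUEs
c(p1, p2)
```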
set.seed(42)
myrandoms <- rnorm(100000, mean=5, sd=2)
mydistr <- ecdf(myrandoms)
#probability of being between 1 and 3:
diff(mydistr(c(1, 3)))
#[1] 0.13781
#compare with normal distribution
diff(pnorm(c(1, 3), mean=5, sd=2))
#[1] 0.1359051
If you really have a list, use myrandoms <- do.call(c, mylist) to make it a vector.
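For the setup described in the question, the T values themselves can be simulated without a loop: store the 100,000 y vectors as the columns of an 8 x 100,000 matrix Y; then t(y) %*% M %*% y for every column at once is colSums(Y * (M %*% Y)). M and a below are hypothetical placeholders (the question does not give them), so the resulting probability is meaningless by itself; only the mechanics carry over. The name T_sim avoids masking T, which R uses as a shorthand for TRUE.

```r
set.seed(1)
k    <- 8
nsim <- 100000

# Hypothetical constants: replace with the real 8 x 8 matrix M and vector a
M <- diag(k)
a <- rep(1, k)

# Each column of Y is one y = a + b, with b ~ N(0, sd = 2) element-wise
Y <- a + matrix(rnorm(k * nsim, mean = 0, sd = 2), nrow = k)

# Quadratic form t(y) %*% M %*% y for every column at once
T_sim <- colSums(Y * (M %*% Y))

# Estimated P(10 <= T <= 50)
mean(T_sim >= 10 & T_sim <= 50)
```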

How to generate such random numbers in R

I want to generate bivariates in the following way. I have four lists with equal length n. I need to use the first two lists as means lists, and the latter two as variance lists, and generate normal bivariates.
For example n=2, I have the lists as (1, 2), (3, 4), (5, 6), (7, 8), and I need
matrix(c(rnorm(1, mean=1, sd=sqrt(5)), rnorm(1, mean=2, sd=sqrt(6)), rnorm(1, mean=3, sd=sqrt(7)), rnorm(1, mean=4, sd=sqrt(8))), ncol=2)
How can I do this in R in a more functional way?
Here is one way:
m <- 1:4
s <- sqrt(5:8)   # the question's lists hold variances; rnorm() wants standard deviations
rnorm(n = 4, mean = m, sd = s)
This works because, like many R functions, rnorm() is 'vectorized', meaning that it allows you to call it once with vectors as arguments, rather than many times in a loop that iterates through the elements of the vectors.
Your main task, then, is to convert the 'lists' in which you've got your arguments right now into vectors that can be passed to rnorm().
NOTE: If you want to produce more than one (let's say 3) random variate for each mean/sd combination, rnorm(n=rep(3,4), mean=m, sd=s) will not work, because when length(n) > 1 rnorm() takes length(n) = 4 as the number of draws. You'll have to either: (a) repeat elements of the m and s vectors like so rnorm(n=3*4, mean=rep(m, each=3), sd=rep(s, each=3)); or (b) use mapply() as described in DWin's answer.
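Both options from the note, sketched with the question's means and standard deviations (3 draws per pair is arbitrary). With mapply() the result is simplified to a 3 x 4 matrix, one column per mean/sd pair.

```r
m <- 1:4
s <- sqrt(5:8)   # standard deviations (square roots of the question's variances)

# (a) expand the parameter vectors so there is one entry per draw
a_draws <- rnorm(n = 3 * 4, mean = rep(m, each = 3), sd = rep(s, each = 3))

# (b) one rnorm(3, m[i], s[i]) call per pair; mapply() simplifies to a 3 x 4 matrix
b_draws <- mapply(rnorm, n = 3, mean = m, sd = s)

# The pitfall: with length(n) > 1, only length(n) = 4 draws are made, not 12
length(rnorm(n = rep(3, 4), mean = m, sd = s))
```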
I'm taking you at your word that you have a list, i.e. an R list:
plist <- list( a=list(1, 2), b=list(3, 4), c=list(5, 6), d=list(7, 8))
means <-plist[c("a","b")] # or you could use means <- plist[1:2]
vars <- plist[c("c","d")]
mapply(rnorm, n=rep(1,4), mean=unlist(means), sd=sqrt(unlist(vars)))  # sd is the square root of the variance
You used the term bivariate. Did you really want to have x,y pairs that had a specific correlation?
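If correlated pairs are what you want, one base-R sketch (not from the answers) builds each pair from two independent standard normals: with z1, z2 ~ N(0, 1), x = m1 + sd1 * z1 and y = m2 + sd2 * (rho * z1 + sqrt(1 - rho^2) * z2) have the requested means and variances and correlation rho. The value rho = 0.5 and the helper name rbivnorm are hypothetical; the means and variances are the question's.

```r
m1 <- c(1, 2); m2 <- c(3, 4)     # means (first two lists)
v1 <- c(5, 6); v2 <- c(7, 8)     # variances (last two lists)
rho <- 0.5                       # hypothetical correlation, not given in the question

rbivnorm <- function(m1, m2, v1, v2, rho) {
  k  <- length(m1)
  z1 <- rnorm(k)
  z2 <- rnorm(k)
  x  <- m1 + sqrt(v1) * z1
  y  <- m2 + sqrt(v2) * (rho * z1 + sqrt(1 - rho^2) * z2)
  cbind(x = x, y = y)            # one row per (x, y) pair
}

rbivnorm(m1, m2, v1, v2, rho)

# Sanity check with many pairs sharing the same parameters
set.seed(1)
big <- rbivnorm(rep(1, 1e5), rep(3, 1e5), rep(5, 1e5), rep(7, 1e5), rho)
cor(big[, "x"], big[, "y"])      # close to rho
```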

Calling rnorm with a vector of means

When I call rnorm passing a single value as mean, it's obvious what happens: 20 values are drawn from Normal(10, 1).
y <- rnorm(20, mean=10, sd=1)
But I see examples where a whole vector is passed to rnorm (or rcauchy, etc.), and in that case I am not sure what R actually does. For example:
a = c(10,22,33,44,5,10,30,22,100,45,97)
y <- rnorm(a, mean=a, sd=1)
Any ideas?
The number of random numbers rnorm generates equals the length of a. From ?rnorm:
n: number of observations. If ‘length(n) > 1’, the length is taken to be the number required.
To see what is happening when a is passed to the mean argument, it's easier if we change the example:
a = c(0, 10, 100)
y = rnorm(a, mean=a, sd=1)
[1] -0.4853138 9.3630421 99.7536461
So we generate length(a) random numbers, the i-th of which has mean a[i].
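Because only length(n) matters when n has more than one element, rnorm(a, ...) is the same as rnorm(length(a), ...). A small sketch with a fixed seed; note the surprising case where a has length 1, because then its value, not its length, is used as the number of draws:

```r
a <- c(0, 10, 100)

set.seed(1); y1 <- rnorm(a, mean = a, sd = 1)
set.seed(1); y2 <- rnorm(length(a), mean = a, sd = 1)
identical(y1, y2)        # TRUE: same number of draws, same means

# Pitfall: a single-element vector is treated as a count
a1 <- 5
length(rnorm(a1, mean = a1, sd = 1))   # 5 draws, not 1
```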
A better example:
a <- c(0,10,100)
b <- c(2,4,6)
y <- rnorm(6,a,b)
y
result
[1] -1.2261425 10.1596462 103.3857481 -0.7260817 7.0812499 97.8964131
As you can see, for the first and fourth elements of y, rnorm takes the first element of a as the mean and the first element of b as the sd.
For the second and fifth elements of y, rnorm takes the second element of a as the mean and the second element of b as the sd.
For the third and sixth elements of y, rnorm takes the third element of a as the mean and the third element of b as the sd.
You can experiment with different numbers in the first argument of rnorm to see what is happening.
For example, what happens if you use 5 instead of 6 as the first argument when calling rnorm?
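One way to answer that last question concretely: with n = 5 the mean and sd vectors are recycled element-wise to length 5, so the fourth and fifth draws reuse a[1], b[1] and a[2], b[2]. A sketch that makes the recycling explicit with rep_len() and a fixed seed:

```r
a <- c(0, 10, 100)
b <- c(2, 4, 6)

set.seed(1); y_recycled <- rnorm(5, a, b)
set.seed(1); y_explicit <- rnorm(5, mean = rep_len(a, 5), sd = rep_len(b, 5))
identical(y_recycled, y_explicit)   # TRUE

rep_len(a, 5)   # means actually used: 0 10 100 0 10
```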
