R code Gaussian mixture -- numerical expression has 2 elements: only the first used

I'm trying to create a Gaussian mixture function according to these parameters:
For each sample, roll a die with k sides
If the j-th side appears from the roll, draw a sample from Normal(muj, sdj) where muj and sdj are the mean and standard deviation for the j-th Normal distribution respectively. This means you should have k different Normal distributions to choose from. Note that muj is the mathematical form of referring to the j-th element in a vector called mus.
The resulting sample from this Normal is then from a Gaussian Mixture.
Where:
n, an integer that represents the number of independent samples you want from this random variable
mus, a numeric vector with length k
sds, a numeric vector with length k
prob, a numeric vector with length k that indicates the probability of choosing the different Gaussians. This should default to NULL.
This is what I came up with so far:
n <- c(1)
mus <- c()
sds <- c()
prob <- c()
rgaussmix <- function(n, mus, sds, prob = NULL){
if(length(mus) != length(sds)){
stop("mus and sds have different lengths")
}
for(i in 1:seq_len(n)){
if(is.null(prob)){
rolls <- c(NA, n)
rolls <- sample(c(1:length(mus)), n, replace=TRUE)
avg <- rnorm(length(rolls), mean=mus[rolls], sd=sds[rolls])
}else{
rolls <- c(NA, n)
rolls <- sample(c(1:length(mus), n, replace=TRUE, p=prob))
avg <- rnorm(length(rolls), mean=mus[rolls], sd=sds[rolls])
}
}
return(avg)
}
rgaussmix(2, 1:3, 1:3)
It seems to match most of the requirements, but it keeps giving me the following error:
numerical expression has 2 elements: only the first used
number of items to replace is not a multiple of replacement length
I've tried looking at the lengths of multiple variables, but I can't seem to figure out where the error is coming from!
Could someone please help me?

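seq_len(n) already returns the whole index sequence. For example, seq_len(2) gives you:
[1] 1 2
So 1:seq_len(n) amounts to 1:(1:2), which doesn't make sense: the colon operator uses only the first element of its right-hand side and warns that the "numerical expression has 2 elements: only the first used". Loop over seq_len(n) directly instead.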
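As a minimal sketch of the loop idiom (the variable name draws and the squared values are purely illustrative, not part of the original code):

```r
n <- 3

seq_len(n)   # 1 2 3: this is already the full index sequence

# Writing 1:seq_len(n) would mean 1:c(1, 2, 3); the colon operator
# uses only the first element of c(1, 2, 3) and emits the warning.

# Iterate over seq_len(n) directly
draws <- numeric(n)
for (i in seq_len(n)) {
  draws[i] <- i^2
}
draws        # 1 4 9

# seq_len(0) is empty, so a loop over it is skipped entirely,
# whereas 1:0 would iterate over 1 and 0
length(seq_len(0))
```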
You can also avoid the loop entirely, because rnorm is vectorized over its mean and sd arguments. For example, if you do:
rnorm(3,c(0,10,20),1)
[1] -0.507961 8.568335 20.279245
The 1st sample is drawn using the 1st mean, the 2nd sample using the 2nd mean, and so on. So you can simplify your function to:
rgaussmix <- function(n, mus, sds, prob = NULL){
  if(length(mus) != length(sds)){
    stop("mus and sds have different lengths")
  }
  # With no prob given, every component is equally likely
  if(is.null(prob)){
    prob <- rep(1/length(mus), length(mus))
  }
  # Roll the k-sided die n times, then draw from the chosen components
  rolls <- sample(length(mus), n, replace=TRUE, prob=prob)
  rnorm(n, mean=mus[rolls], sd=sds[rolls])
}
You can plot the results:
plot(density(rgaussmix(10000,c(0,5,10),c(1,1,1))),main="mixture of 0,5,10")
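A quick numerical sanity check can complement the plot. The sketch below repeats the simplified function so that it runs on its own; the weights in w, the seed, and the sample size are arbitrary choices for illustration. The sample mean of a mixture should be close to the weighted average of the component means.

```r
# Same logic as the simplified function above, repeated so this runs standalone
rgaussmix <- function(n, mus, sds, prob = NULL) {
  if (length(mus) != length(sds)) stop("mus and sds have different lengths")
  if (is.null(prob)) prob <- rep(1 / length(mus), length(mus))
  rolls <- sample(length(mus), n, replace = TRUE, prob = prob)
  rnorm(n, mean = mus[rolls], sd = sds[rolls])
}

set.seed(123)              # arbitrary seed, for reproducibility only
w   <- c(0.2, 0.8)         # hypothetical mixing weights
mus <- c(0, 10)
x   <- rgaussmix(1e5, mus, sds = c(1, 1), prob = w)

mean(x)        # empirical mean of the mixture sample
sum(w * mus)   # theoretical mixture mean: 0.2 * 0 + 0.8 * 10 = 8
```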

Related

Stochastic parameter estimation

I have made a stochastic pandemic simulator which takes the probabilities of an infection, a recovery, or neither, and uses a Gillespie algorithm with vectors to determine the number of people in each category at each time. I want to carry out a simulation study and use maximum likelihood estimation to get parameter estimates for my simulations. It worked perfectly for the SI model, but in this model I get the following warnings that I cannot understand. When I run just the function MLE I get scalars, and I can even produce the vector J. But when I try to use optim, it tells me that the function PL isn't a scalar when I know it is. Any help would be greatly appreciated, thanks.
#SIR 100 DAYS WITH 10 INTERVALS A DAY
T<-100 #Setting the number of intervals
dt<-0.01 #Setting the interval lengths
B<-1.5 #Setting Beta
N<-50 #Setting population size
Y<-0.5 #Setting recovery rate
r<-function(i){runif(1,0,1)} #Random number generator
S<-c(1:T)
I<-c(1:T)
R<-c(1:T)
I1<-c(1:T)
I2<-c(1:T)
I3<-c(1:T)
It<-c(1:T)
Time<-c(1:T)
I[1]<-1
S[1]<-N-I[1]
R[1]<-0
It[1]<-I[1]
P1<-function(t){(B)*(I[t])*(S[t])*(dt)*(1/N)} #Creates first event interval(Infection)
P2<-function(t){(Y)*(I[t])*(dt)+(B)*(I[t])*(S[t])*(dt)*(1/N)} #Creates 2nd event interval(Recovery)
P3<-function(t){1} #Creates 3rd event interval (No transition)
PI1<-function(t){(I1[t])/I[t]} #Creates interval for recovery from first group
PI2<-function(t){((I1[t])/I[t])+((I2[t])/I[t])} #Creates interval for recovery from third group
PI3<-function(t){1} #Creates interval for recovery from first group
for(i in 2:T){
x<-r(i)
if(x<P1(i-1)){ #If an infection occurs
S[i]<-S[i-1]-1
I[i]<-I[i-1]+1
R[i]<-R[i-1]
It[i]<-It[i-1]+1
}
else if(x<P2(i-1)){ #If a recovery occurs
S[i]<-S[i-1]
I[i]<-I[i-1]-1
R[i]<-R[i-1]+1
It[i]<-It[i-1]}
else{ #If no transition occurs
S[i]<-S[i-1]
I[i]<-I[i-1]
R[i]<-R[i-1]
It[i]<-It[i-1]}
}
n<-c(1:T)
for(i in 1:T){
n[i]<-S[i]+I[i]+R[i]}
n
S
I
R
Data<-cbind.data.frame(Time,S,I,R,n,It) #Create a dataframe for ease of manipulations
Data$EventInfection<-0
Data$EventRecovery<-0
Data$EventNotransition<-0
for(i in 2:T){if(Data$It[i]>Data$It[i-1]){Data$EventInfection[i]<-1} #Event indicators to make likelihood easier
else if(Data$R[i]>Data$R[i-1]){Data$EventRecovery[i]<-1}
else{Data$EventNotransition[i]<-1}}
PL<-function(i,b,y){((b*S[i]*I[i]*dt*(1/N))^Data$EventInfection[[i]])*((I[i]*(y)*dt)^Data$EventRecovery[[i]])*((1-(b*S[i]*I[i]*dt*(1/N))-((y)*I[i]*dt))^Data$EventNotransition[[i]])}
MLE<-function(b,y){
J<<-c(1:T)
for(i in 1:T){
J[i]<<-log(PL(i,b,y))}
return(sum(J))}
MLE(1,0.5)
optim(c(1,1), MLE, y=1)
Warning messages:
1: In J[i] <- log(PL(i, b, y)) :
number of items to replace is not a multiple of replacement length
2: In J[i] <- log(PL(i, b, y)) :
number of items to replace is not a multiple of replacement length
3: In J[i] <- log(PL(i, b, y)) :
number of items to replace is not a multiple of replacement length
4: In J[i] <- log(PL(i, b, y)) :
number of items to replace is not a multiple of replacement length
5: In J[i] <- log(PL(i, b, y)) :
number of items to replace is not a multiple of replacement length
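MLE() takes two arguments, b and y, but optim() passes the whole parameter vector c(1,1) as the first argument and forwards y=1 as an extra argument. Inside MLE, b is therefore a vector of length two, so PL(i,b,y) returns two values and the assignment to J[i] triggers the warning. optim() expects the first argument of the function to be the vector of parameters to optimize over. If you wanted to optimize b and y, for example, this will work.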
Ti <- T # number of intervals (named T in the question)
MLE <- function(b){
  J <<- numeric(Ti)
  for(i in 1:Ti){
    J[i] <<- log(PL(i, b[1], b[2]))
  }
  return(sum(J))
}
MLE(c(1, 0.5))
optim(c(1, 1), MLE)
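Now b is b[1] and y is b[2]. I'm not sure if that's what you wanted to optimize, though. Also note that optim() minimizes by default, so to maximize the log-likelihood either return -sum(J) or pass control = list(fnscale = -1).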
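The same pattern can be sketched on a toy problem that is unrelated to the SIR simulation: a log-likelihood that takes a single parameter vector, maximized with optim(). The data obs, the function loglik, and the log-sd parametrization are assumptions made only for this illustration.

```r
set.seed(1)
obs <- rnorm(200, mean = 3, sd = 2)   # toy data, not the SIR output

# The objective takes ONE vector: par[1] is the mean, par[2] is log(sd).
# Working with log(sd) keeps the standard deviation positive.
loglik <- function(par) {
  sum(dnorm(obs, mean = par[1], sd = exp(par[2]), log = TRUE))
}

# optim() minimizes by default; fnscale = -1 turns it into maximization
fit <- optim(c(0, 0), loglik, control = list(fnscale = -1))

fit$par[1]        # estimated mean, close to mean(obs)
exp(fit$par[2])   # estimated standard deviation
fit$convergence   # 0 indicates successful convergence
```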

Matrices in R: number of items to replace is not a multiple of replacement length

I'm trying to initialize an empty matrix X with n rows and 4 columns, and then assign a vector of random stock values to each column. Each column represents a different stock.
I know I should apply X[,i]=cumsum(X[,i]) eventually, to get actual stock values, but that can only be done after allocating the values in the first place.
#Inputs mean return, volatility, time period and time step
mu=0.25; sigma=2; T=1; n=2^(12); X0=5;
#Generating trajectories for stocks
#NOTE: Seed is fixed. Changing seed will produce
#different trajectories
dt=T/n
t=seq(0,T,by=dt)
set.seed(201)
X <- matrix(nrow = n, ncol = 4)
for(i in 1:4){
X[,i] <- c(X0,mu*dt+sigma*sqrt(dt)*rnorm(n,mean=0,sd=1))
}
After running the code, I get the error message:
Error in X[, i] <- c(X0, mu * dt + sigma * sqrt(dt) * rnorm(n, mean = 0, :
number of items to replace is not a multiple of replacement length
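Judging from the code above, one likely cause is that c(X0, ...) has n + 1 elements (the starting value plus n increments) while X has only n rows. A minimal sketch, assuming the intent is to keep the starting value in the first row, allocates n + 1 rows; the name Tend replaces T here only to avoid masking the shorthand for TRUE.

```r
# Inputs: mean return, volatility, time horizon, number of steps, start value
mu <- 0.25; sigma <- 2; Tend <- 1; n <- 2^12; X0 <- 5
dt <- Tend / n

set.seed(201)
X <- matrix(nrow = n + 1, ncol = 4)   # one extra row for the starting value X0
for (i in 1:4) {
  # n increments plus the starting value: length n + 1, matching nrow(X)
  steps  <- c(X0, mu * dt + sigma * sqrt(dt) * rnorm(n, mean = 0, sd = 1))
  X[, i] <- cumsum(steps)             # cumulative sum gives the trajectory
}

dim(X)     # n + 1 rows, 4 columns
X[1, ]     # every trajectory starts at X0
```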

How can I simulate m random samples of size n from a given distribution with R?

I know how to generate a random sample of size n from a standard statistical distribution, say exponential. But if I want to generate m such random samples of size n (i.e. m vectors of dimension n) how can I do it?
To create an n by m matrix containing m samples of size n you can use:
x <- replicate(m, rnorm(n, ...))
Substitute rnorm with another distribution if desired. If you then want to store these in separate individual vectors, you can use
v <- x[ , i]
This puts the ith column of x into v, which corresponds to the ith sample. It may be simpler to just use a for loop, though:
for(i in 1:m){
name <- paste("V", i, sep = "")
assign(name, rnorm(n, ...))
}
This generates a random sample at each iteration, and for stage i, names the sample Vi. By the end of it you'll have m random samples named V1, V2, ..., Vm.
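As a concrete sketch for the exponential case mentioned in the question (the values of m, n, and rate are arbitrary), replicate() returns an n by m matrix, and a list is one alternative to creating m separately named variables with assign():

```r
set.seed(42)
m <- 5      # number of samples
n <- 10     # size of each sample

# Each column of x is one exponential sample of size n
x <- replicate(m, rexp(n, rate = 2))
dim(x)      # 10 rows, 5 columns

# Collect the samples in a list instead of V1, ..., Vm
samples <- lapply(seq_len(m), function(i) x[, i])
length(samples)        # 5
length(samples[[1]])   # 10

colMeans(x)            # per-sample means, each roughly 1 / rate
```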

Matrix computation with for loop

I am a newcomer to R, having migrated from GAUSS because of license verification issues.
I want to speed up the following code, which creates an n×k matrix A. Given the n×1 vector x and the parameter vectors mu and sig (both k-dimensional), A is created as A[i,j]=dnorm(x[i], mu[j], sig[j]). The following code works fine for small values such as n=40, k=4, but slows down significantly when n is around 10^6 and k is about the same size as n^{1/3}.
I am doing a simulation experiment to verify bootstrap validity, so I need to compute the matrix A repeatedly, #ofsimulation × #bootstrap times, and it becomes rather time-consuming as I want to experiment with many different values of n and k. I vectorized the code as much as I could (thanks to the vector arguments of dnorm), but is there any way to speed it up further?
Preemptive thanks for any help.
x = rnorm(40)
mu = c(-1,0,4,5)
sig = c(2^2,0.5^2,2^2,3^2)
n = length(x)
k = length(mu)
A = matrix(NA,n,k)
for(j in 1:k){
A[,j]=dnorm(x,mu[j],sig[j])
}
Your method can be put into a function like this
A.fill <- function(x,mu,sig) {
k <- length(mu)
n <- length(x)
A <- matrix(NA,n,k)
for(j in 1:k) A[,j] <- dnorm(x,mu[j],sig[j])
A
}
and it's clear that you are filling the matrix A column by column.
R stores the entries of a matrix columnwise (just like Fortran).
This means that the matrix can be filled with a single call of dnorm using suitable repetitions of x, mu, and sig. The vector z will hold the columns of the desired matrix stacked, and the matrix to be returned can then be formed from that vector just by specifying the number of rows and columns. See the following function
B.fill <- function(x,mu,sig) {
k <- length(mu)
n <- length(x)
z <- dnorm(rep(x,times=k),rep(mu,each=n),rep(sig,each=n))
B <- matrix(z,nrow=n,ncol=k)
B
}
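To see why the stacking in B.fill lines up, here is a tiny illustration with made-up values (3 observations and 2 components). It shows the column-wise fill of matrix() and how rep(..., times=) and rep(..., each=) pair every x[i] with every mu[j].

```r
x  <- c(10, 20, 30)   # illustrative observations, n = 3
mu <- c(1, 2)         # illustrative means, k = 2
n  <- length(x)
k  <- length(mu)

# matrix() fills column by column: column 1 is 1 2 3, column 2 is 4 5 6
M <- matrix(1:6, nrow = 3)
M[, 2]                # 4 5 6

# x is repeated as a whole block for each column ...
rep(x, times = k)     # 10 20 30 10 20 30
# ... while each mean is held fixed for the n rows of its column
rep(mu, each = n)     # 1 1 1 2 2 2
```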
Let's make an example with your data and test this as follows:
N <- 40
set.seed(11)
x <- rnorm(N)
mu <- c(-1,0,4,5)
sig <- c(2^2,0.5^2,2^2,3^2)
A <- A.fill(x,mu,sig)
B <- B.fill(x,mu,sig)
all.equal(A,B)
# [1] TRUE
This works for any n and k, since all three arguments passed to dnorm have length n*k; n does not need to be an integer multiple of k.
Addition
As noted in the comments B.fill is quite slow for large values of n.
The reason lies in the construct rep(...,each=...).
So is there a way to speed up A.fill?
I tested this function:
C.fill <- function(x,mu,sig) {
k <- length(mu)
n <- length(x)
sapply(1:k,function(j) dnorm(x,mu[j],sig[j]), simplify=TRUE)
}
This function is about 20% faster than A.fill.
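Timings like this depend on the machine and on n and k, so it is worth measuring locally. The sketch below repeats A.fill and C.fill so it runs on its own and uses system.time(); the sizes and parameter values are arbitrary, with k chosen to be roughly n^(1/3).

```r
A.fill <- function(x, mu, sig) {
  k <- length(mu)
  n <- length(x)
  A <- matrix(NA, n, k)
  for (j in 1:k) A[, j] <- dnorm(x, mu[j], sig[j])
  A
}

C.fill <- function(x, mu, sig) {
  k <- length(mu)
  sapply(1:k, function(j) dnorm(x, mu[j], sig[j]), simplify = TRUE)
}

set.seed(11)
n   <- 1e5                       # illustrative size
k   <- 46                        # roughly n^(1/3)
x   <- rnorm(n)
mu  <- seq(-5, 5, length.out = k)
sig <- rep(1, k)

system.time(A <- A.fill(x, mu, sig))   # elapsed time for the for loop
system.time(C <- C.fill(x, mu, sig))   # elapsed time for sapply
all.equal(A, C)                        # both should give the same matrix
```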

generate random integers between two values with a given probability using R

I have the following four number sets:
A=[1,207];
B=[208,386];
C=[387,486];
D=[487,586].
I need to generate 20000 random numbers between 1 and 586 in which the probability that the generated number belongs to A is 1/2 and to B,C,D is 1/6.
How can I do this using R?
You can use sample directly, more specifically its prob argument. Just divide each group's probability evenly over its numbers, so that the weights cover all 586 values. Each number in category A gets a weight of 0.5/207, and so on.
A <- 1:207
B <- 208:386
C <- 387:486
D <- 487:586
L <- sapply(list(A, B, C, D), length)
x <- sample(c(A, B, C, D),
size = 20000,
prob = rep(c(1/2, 1/6, 1/6, 1/6) / L, L),
replace = TRUE)
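To check that the weights behave as intended, one can tabulate how often each group is hit. This sketch repeats the sampling so it runs on its own; the seed and the use of cut() for labelling are choices made for the illustration.

```r
set.seed(1)
A <- 1:207
B <- 208:386
C <- 387:486
D <- 487:586
L <- lengths(list(A, B, C, D))          # group sizes: 207 179 100 100

x <- sample(c(A, B, C, D),
            size = 20000,
            prob = rep(c(1/2, 1/6, 1/6, 1/6) / L, L),
            replace = TRUE)

# Label each draw by the group it falls in and compute the proportions
grp <- cut(x, breaks = c(0, 207, 386, 486, 586), labels = c("A", "B", "C", "D"))
prop.table(table(grp))                  # roughly 1/2, 1/6, 1/6, 1/6
```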
I would use the Roulette selection method. Here is a brief explanation.
Take a line of length 1 unit and break it into pieces in proportion to the probability values. In our case, the first piece will have length 1/2 and each of the next three pieces will have length 1/6. Now sample a number between 0 and 1 from the uniform distribution. As all numbers are equally likely, the probability that the sampled number falls in a given piece equals the length of that piece. Hence, whichever piece the number falls in, sample from the corresponding vector. (The R code below can be run for a large number of draws to check that this holds.)
It is called Roulette selection because of another analogy for the same situation: take a circle and split it into sectors where the angle of each sector is proportional to the probability values. Now sample a number from the uniform distribution again, see which sector it falls in, and sample from the corresponding vector.
A <- 1:207
B <- 208:386
C <- 387:486
D <- 487:586
cumList <- list(A,B,C,D)
probVec <- c(1/2,1/6,1/6,1/6)
cumProbVec <- cumsum(probVec)
ret <- NULL
for( i in 1:20000){
rand <- runif(1)
whichVec <- which(rand < cumProbVec)[1]
ret <- c(ret,sample(cumList[[whichVec]],1))
}
#Testing the results
length(which(ret %in% A)) # Almost 1/2*20000 of the values
length(which(ret %in% B)) # Almost 1/6*20000 of the values
length(which(ret %in% C)) # Almost 1/6*20000 of the values
length(which(ret %in% D)) # Almost 1/6*20000 of the values
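The loop above grows ret one element at a time, which gets slow for many draws. A possible vectorized variant of the same roulette idea is sketched below; findInterval() locates the piece each uniform draw falls in, and the names groups, nDraws, and breaks are introduced only for this illustration.

```r
set.seed(1)
groups  <- list(A = 1:207, B = 208:386, C = 387:486, D = 487:586)
probVec <- c(1/2, 1/6, 1/6, 1/6)
nDraws  <- 20000

# Interior cut points of the unit line: 1/2, 2/3, 5/6
breaks <- cumsum(probVec)[-length(probVec)]

# One uniform number per draw; findInterval counts the cut points at or
# below it, so adding 1 gives the group index 1..4
u        <- runif(nDraws)
whichVec <- findInterval(u, breaks) + 1L

# Fill all draws of a given group in one call to sample()
ret <- integer(nDraws)
for (g in seq_along(groups)) {
  idx      <- which(whichVec == g)
  ret[idx] <- sample(groups[[g]], length(idx), replace = TRUE)
}

table(whichVec) / nDraws   # roughly 1/2, 1/6, 1/6, 1/6
```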
