I first want to simulate correlated multivariate normal (MVN) data using the mvrnorm function from the MASS package. Then I want to repeat this simulation m times and fill the results into a matrix, so that the first simulation's two variables go in columns 1 and 2, the second's in columns 3 and 4, and so on.
So far I did the following:
library(MASS)

SimYCB <- c(73.1, 60.6, 59.6, 54.5, 57.9, 61.14)
SimPCB <- c(15.7, 18.25, 22.38, 20.22, 16.53, 18.616)
SimCB <- data.frame(SimYCB, SimPCB)
n <- 20
m <- 1000
MVSimCB <- matrix()
for(i in 1:m) {
  MVSimCB[,i] <- mvrnorm(n, mu=mean(SimCB), Sigma=cov(SimCB))
}
What is the mistake?
Maybe this helps. Two things go wrong in your code: matrix() creates a 1 x 1 matrix with no columns to assign into, and mean() does not return per-column means of a data frame (use colMeans() instead). Preallocate the matrix with the right dimensions and fill two columns per draw:
MVSimCB <- matrix(NA, nrow=n, ncol=m)
set.seed(24)
for(i in seq(1, m, by=2)) {
  MVSimCB[, i:(i+1)] <- mvrnorm(n, mu=colMeans(SimCB), Sigma=cov(SimCB))
}
Or you could use replicate:
set.seed(24)
MVSimCB2 <- do.call(cbind, replicate(m/2, mvrnorm(n, mu=colMeans(SimCB),
Sigma=cov(SimCB)), simplify=FALSE))
all.equal(MVSimCB, MVSimCB2, check.attributes=FALSE)
#[1] TRUE
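As a quick, illustrative sanity check, each simulated pair of columns should roughly reproduce the target moments (only roughly, since n = 20):
colMeans(MVSimCB[, 1:2])  # compare with colMeans(SimCB)
cov(MVSimCB[, 1:2])       # compare with cov(SimCB)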
Related
I would like to code a loop for cross-validation: compute the MSE for one- through four-step forecasts and store the results in a matrix. The problem is that the columns for the 1- to 3-step forecasts get overwritten, so I end up with just the 4-step forecast in every column. Can anybody help?
library(forecast)

k <- 20
n <- length(xy) - 1
h <- 4
start <- tsp(xy)[1] + k
j <- n - k
mseQ1 <- matrix(NA, j, h)
colnames(mseQ1) <- paste0('h=', 1:h)
for(i in 1:j)
{
  xtrain <- window(xy, end=start+(i-1))
  xvalid <- window(xy, start=start+i, end=start+i)
  qualifiedETS <- ets(xtrain, alpha=NULL, beta=NULL, additive.only=TRUE, opt.crit="mse")
  fcastHW <- forecast(qualifiedETS, h=h)
  mseQ1[i,] <- ((fcastHW[['mean']]-xvalid)^2)
}
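One hedged guess, since the data is not shown: xvalid is a single observation while fcastHW[['mean']] holds h forecasts, so the subtraction recycles that single value across all four columns. A minimal sketch of a fix, assuming xy has frequency 1, validates each forecast horizon against its own actual value:
for(i in 1:(j-h+1))
{
  xtrain <- window(xy, end=start+(i-1))
  xvalid <- window(xy, start=start+i, end=start+i+h-1)  # h actual values, not 1
  qualifiedETS <- ets(xtrain, alpha=NULL, beta=NULL, additive.only=TRUE, opt.crit="mse")
  fcastHW <- forecast(qualifiedETS, h=h)
  mseQ1[i,] <- (fcastHW[['mean']]-xvalid)^2
}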
I'm running some Monte Carlo simulations of OLS estimation in which I conduct several versions of the same simulation for different beta values. To do this, I have set up a for loop to run the simulation (which has 1000 repetitions), and wrapped a second loop around this in which I want to assign the beta values.
So, I have set up 4 matrices to store the results of each version of the simulation, and I want to identify which matrix to write to using the for loop counter.
Here is a simple example of my setup:
reps = 1000
mat1 = matrix(NA, nrow=reps, ncol=2)
mat2 = matrix(NA, nrow=reps, ncol=2)
mat3 = matrix(NA, nrow=reps, ncol=2)
mat4 = matrix(NA, nrow=reps, ncol=2)
for(i in 1:4){
  # Here I am going to alter my beta values for each iteration of i
  for(j in 1:reps){
    # Here I run my simulation and store values to mat1, mat2, mat3, mat4.
    # I want to store to mat1 on the first iteration of i, mat2 on the second, etc.
    model <- lm(Y~X)
    mat[["i"]][j,1] <- model$coef[1]
    mat[["i"]][j,2] <- model$coef[2]
  }
}
For iteration 1 of the i loop I want mat[["i"]][j,1] to address mat1, for iteration 2 mat2, etc. Obviously this does not work as I have coded it here, and I cannot figure out how to make it work.
I could accomplish this with if else statements on the value of i, but I'd like to avoid this if possible.
EDIT
Thanks for the help everyone! This worked:
reps = 1000
myMatList <- list()
for(i in 1:4){
  # Here I alter my beta values for each iteration of i
  myMatList[[i]] <- matrix(NA, nrow=reps, ncol=2)
  for(j in 1:reps){
    # Results for the i-th beta setting go into myMatList[[i]]
    model <- lm(Y~X)
    myMatList[[i]][j,1] <- model$coef[1]
    myMatList[[i]][j,2] <- model$coef[2]
  }
}
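A hypothetical follow-up, in case it is useful: the list layout also makes post-processing easy, e.g. summarising the estimates for each beta setting in one call:
sapply(myMatList, colMeans)  # 2 x 4 matrix: mean intercept and slope per setting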
I am not sure, as your code does not run as posted, but I think this might help:
l <- list()
reps = 10
for(i in 1:4) {
  l[[i]] <- matrix(NA, nrow=reps, ncol=2)
}
l[[1]][1, 1]
# [1] NA
l[[1]][1, ]
# [1] NA NA
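Equivalently, as a one-line sketch, lapply builds the same list without an explicit loop:
l <- lapply(1:4, function(i) matrix(NA, nrow=reps, ncol=2))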
My question is about how to improve the performance of a function that downsamples from the columns of a matrix without replacement (a.k.a. "rarefaction" of a matrix). I know this has come up here before, but I could not find a clear answer that (a) does what I need and (b) does it quickly.
Here is my function:
downsampled <- function(data, samplerate=0.8) {
  data.test <- apply(data, 2, function(q) {
    names(q) <- rownames(data)
    samplepool <- character()
    for (i in names(q)) {
      samplepool <- append(samplepool, rep(i, times=q[i]))
    }
    sampled <- sample(samplepool, size=samplerate*length(samplepool), replace=FALSE)
    tab <- table(sampled)
    mat <- match(names(tab), names(q))
    toret <- numeric(length(q))
    names(toret) <- names(q)
    toret[mat] <- tab
    return(toret)
  })
  return(data.test)
}
I need to be downsampling matrices with millions of entries. I find this is quite slow (here I'm using a 1000x1000 matrix, which is about 20-100x smaller than my typical data size):
mat <- matrix(sample(0:40, 1000*1000, replace=TRUE), ncol=1000, nrow=1000)
colnames(mat) <- paste0("C", 1:1000)
rownames(mat) <- paste0("R", 1:1000)
system.time(matd <- downsampled(mat,0.8))
## user system elapsed
## 69.322 21.791 92.512
Is there a faster/easier way to perform this operation that I haven't thought of?
I think you can make this dramatically faster. If I understand what you are trying to do correctly, you want to down-sample each cell of the matrix, such that if samplerate = 0.5 and the cell of the matrix is mat[i,j] = 5, then you want to sample up to 5 things where each thing has a 0.5 chance of being sampled.
To speed things up, rather than doing all these operations on columns of the matrix, you can loop through each cell, draw that many uniform numbers with runif (e.g., if mat[i,j] = 5, generate 5 random numbers between 0 and 1 and count how many are < samplerate), and write that count to a new matrix. I think this effectively achieves the same down-sampling scheme, but much more efficiently (both in running time and in lines of code).
# Sample matrix
set.seed(23)
n <- 1000
mat <- matrix(sample(0:10,n*n,replace=T),ncol=n,nrow=n)
colnames(mat) <- paste0("C",1:n)
rownames(mat) <- paste0("R",1:n)
# Old function: downsampled() exactly as defined in the question above
# New function
downsampled2 <- function(mat, samplerate=0.8) {
new <- matrix(0, nrow(mat), ncol(mat))
colnames(new) <- colnames(mat)
rownames(new) <- rownames(mat)
for (i in 1:nrow(mat)) {
for (j in 1:ncol(mat)) {
new[i,j] <- sum(runif(mat[i,j], 0, 1) < samplerate)
}
}
return(new)
}
# Compare times
system.time(downsampled(mat,0.8))
## user system elapsed
## 26.840 3.249 29.902
system.time(downsampled2(mat,0.8))
## user system elapsed
## 4.704 0.247 4.918
Using an example 1000 x 1000 matrix, the new function I provided runs about 6 times faster.
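For what it's worth, the per-cell draw in downsampled2 is exactly a binomial sample (the number of mat[i,j] uniform draws falling below samplerate), so the double loop can be collapsed into a single vectorized rbinom call. This is only a sketch, and downsampled3 is a hypothetical name:
downsampled3 <- function(mat, samplerate=0.8) {
  # rbinom is vectorized over size, so all cells are drawn in one call
  draws <- rbinom(length(mat), size=as.vector(mat), prob=samplerate)
  matrix(draws, nrow=nrow(mat), dimnames=dimnames(mat))
}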
One source of savings would be to remove the for loop that grows samplepool one element at a time, using a single rep call instead. Here is a reproducible example:
myRows <- 1:5
names(myRows) <- letters[1:5]
# get the repeated values for sampling
samplepool <- rep(names(myRows), myRows)
Within your function, this would be
samplepool <- rep(names(q), q)
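For illustration only, with the loop removed the per-column helper might shrink to something like this (resample_col is a hypothetical name; the logic otherwise mirrors the question):
resample_col <- function(q, samplerate=0.8) {
  samplepool <- rep(names(q), q)  # vectorized replacement for the append loop
  sampled <- sample(samplepool, size=samplerate*length(samplepool))
  tab <- table(sampled)
  toret <- setNames(numeric(length(q)), names(q))
  toret[names(tab)] <- tab
  toret
}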
In a fairly big data frame, I have to pick some random rows and apply a function to them. In my example, the first function I use is the variance, and then a function close to the real one in my script, called f below. I will not detail the purpose of f, but it deals with truncated Gaussian distributions and maximum-likelihood estimation.
My problem is that my code is way too slow with the second function, and I suspect a bit of optimization of the for loop or the sample call could help.
Here is the code:
df <- as.data.frame(matrix(0, 2e+6, 2))
df$V1 <- runif(nrow(df), 0, 1)
df$V2 <- sample(1:10, nrow(df), replace=TRUE)
nb.perm <- 100  # number of permutations
res <- c()
for(i in 1:nb.perm) res <- rbind(res, tapply(df[sample(1:nrow(df)),"V1"], df$V2, var))
library(truncnorm)
f <- function(d)  # d is a vector
{
  f2 <- function(x) -sum(log(dtruncnorm(d, a=0, b=1, mean=x[1], sd=x[2])))
  res <- optim(par=c(mean(d), sd(d)), fn=f2)
  if(res$convergence != 0) warning("Optimization has not converged")
  return(list(res1=res$par[1], res2=res$par[2]^2))
}
res2 <- c()
for(i in 1:nb.perm) res2 <- rbind(res2, tapply(df[sample(1:nrow(df)),"V1"], df$V2, function(x) f(x)$res2))
I hope I am clear enough.
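Not a full answer, but one hedged suggestion: growing res and res2 with rbind copies the whole matrix on every iteration, so preallocating should help a little, e.g. for the variance version:
res <- matrix(NA_real_, nrow=nb.perm, ncol=length(unique(df$V2)))
for(i in 1:nb.perm) res[i,] <- tapply(df[sample(1:nrow(df)),"V1"], df$V2, var)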
I have a function myF(g,m,alpha,gam,theta,beta) which returns three parameter estimates. I want to iterate this function as in for (i in 1:10). How can I do this in R?
myF <- function(g,m,alpha,gam,theta,beta){
  dat <- sim.data(g,m,alpha,gam,theta,beta)
  time <- dat$times
  delta <- dat$cens
  i <- dat$group
  X1 <- dat$cov  # cov ~ rbinom
  n <- length(levels(as.factor(i)))
  di <- aggregate(delta, by=list(i), FUN=sum)[,2]
  D <- sum(di)
  loglik <- function(par){
    .........................................
    return(-lik)
  }
  initial <- c(0.5, 0.5, -0.5, 0.5)
  maxF <- nlm(loglik, initial)
  return(c(theta=exp(maxF$estimate[2]), beta1=maxF$estimate[3], alpha=exp(maxF$estimate[2])))
}
This can easily be done using replicate:
replicate(10, myF(g,m,alpha,gam,theta,beta))
This will create a 3 x 10 matrix of parameter estimates, where each column is the result of a separate iteration.
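A hypothetical usage note: transposing the result gives one row per iteration, which is often easier to inspect:
est <- replicate(10, myF(g, m, alpha, gam, theta, beta))
t(est)  # 10 x 3: one row per iteration, one column per estimate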