I'm running some Monte Carlo simulations of OLS estimation in which I conduct several versions of the same simulation for different beta values. To do this, I have set up a for loop to run the simulation (which has 1000 repetitions), and wrapped a second loop around this in which I want to assign the beta values.
So, I have set up 4 matrices to store the results of each version of the simulation, and I want to identify which matrix to write to using the for loop counter.
Here is a simple example of my setup:
reps = 1000
mat1 = matrix(NA, nrow=reps, ncol=2)
mat2 = matrix(NA, nrow=reps, ncol=2)
mat3 = matrix(NA, nrow=reps, ncol=2)
mat4 = matrix(NA, nrow=reps, ncol=2)
for(i in 1:4){
#Here I am going to alter my beta values for each iteration of i
for(j in 1:reps){
#Here I run my simulation and store values to mat1, mat2, mat3, mat4
#I want to store to mat1 on the first iteration of i, mat2 on the second, etc.
model <- lm(Y~X)
mat[["i"]][j,1] <- model$coef[1]
mat[["i"]][j,2] <- model$coef[2]
}
}
On iteration 1 of the i loop I want mat[["i"]][j,1] to refer to column 1 of mat1, on iteration 2 to column 1 of mat2, and so on. Obviously this does not work as coded here, and I cannot figure out how to make it work.
I could accomplish this with if else statements on the value of i, but I'd like to avoid this if possible.
EDIT
Thanks for the help everyone! This worked:
reps = 1000
myMatList <- list()
for(i in 1:4){
#Here I am going to alter my beta values for each iteration of i
myMatList[[i]] <- matrix(NA, nrow=reps, ncol=2)
for(j in 1:reps){
#Run the simulation here; results for beta setting i go into myMatList[[i]]
model <- lm(Y~X)
myMatList[[i]][j,1] <- coef(model)[1]
myMatList[[i]][j,2] <- coef(model)[2]
}
}
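For anyone finding this later, here is a self-contained version of the list-based loop above, as a sketch. The data-generating process (Y = b0 + b1*X + standard normal noise), the sample size n = 100 and the four (intercept, slope) pairs in betas are hypothetical placeholders, not taken from the question. The point is that the outer counter i selects both the beta setting and the list element, and coef() returns both estimates, so a single assignment fills the row.

```r
set.seed(42)
reps <- 1000
n <- 100                                   # hypothetical sample size
# hypothetical (intercept, slope) settings, one per outer iteration
betas <- list(c(1, 0.5), c(1, 1), c(2, 0.5), c(2, 1))

myMatList <- vector("list", length(betas))
for (i in seq_along(betas)) {
  myMatList[[i]] <- matrix(NA_real_, nrow = reps, ncol = 2,
                           dimnames = list(NULL, c("b0_hat", "b1_hat")))
  for (j in 1:reps) {
    X <- rnorm(n)
    Y <- betas[[i]][1] + betas[[i]][2] * X + rnorm(n)
    # coef() returns (intercept, slope), filling both columns of row j
    myMatList[[i]][j, ] <- coef(lm(Y ~ X))
  }
}

# average estimates for each beta setting (one column per setting)
sapply(myMatList, colMeans)
```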
I'm not sure, since your code does not run as posted, but I think this might help:
l <- list()
reps = 10
for(i in 1:4) {
l[[i]] <- matrix(NA, nrow=reps, ncol=2)
}
l[[1]][1, 1] # [1] NA
l[[1]][1, ]
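If mat1 to mat4 already exist as separate objects, another option (a sketch, not from the answers here) is to gather them into a named list with mget(), index that list with the loop counter, and optionally copy the results back with list2env(). The values written below are placeholders standing in for the model coefficients.

```r
reps <- 10
mat1 <- matrix(NA, nrow = reps, ncol = 2)
mat2 <- matrix(NA, nrow = reps, ncol = 2)
mat3 <- matrix(NA, nrow = reps, ncol = 2)
mat4 <- matrix(NA, nrow = reps, ncol = 2)

# Collect the existing matrices into a named list: mats[["mat1"]], ..., mats[["mat4"]]
mats <- mget(paste0("mat", 1:4), envir = environment())

for (i in 1:4) {
  for (j in 1:reps) {
    # placeholder values; in the simulation this would be coef(model)
    mats[[i]][j, ] <- c(i, j)
  }
}

# Optional: overwrite mat1..mat4 with the filled-in versions
invisible(list2env(mats, envir = environment()))
mat3[1, ]
#[1] 3 1
```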
I am trying to write the following function without the for loop. Note that I am trying to replicate the function diag().
selfdiag <- function(a) {
j <- c()
for (i in 1:ncol(a)) {
j[i] <- a[i, i]
}
return(j)
}
Consider that:
mat <- matrix(rnorm(4), ncol = 2)
The function selfdiag() should create the same result as diag().
Thanks for any help on this.
You can create a data frame with the row and column indices for the diagonal and use it to extract the diagonal values from the matrix.
mat <- matrix(rnorm(4), ncol = 2)
The diag() way to do it -
diag(mat)
[1] -0.5004046 -0.8785558
The other way to do it -
rows_cols <- data.frame(rows = 1:ncol(mat), cols = 1:ncol(mat))
mat2 <- mat[as.matrix(rows_cols)]
mat2
[1] -0.5004046 -0.8785558
Hope this helps!
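The same matrix-indexing idea works without building a data frame: indexing a matrix with a two-column matrix of (row, column) pairs returns one element per pair. Alternatively, in column-major storage the diagonal elements sit nrow(a) + 1 positions apart. A minimal sketch (selfdiag2 and selfdiag3 are hypothetical names); both stop at min(nrow, ncol) as diag() does, whereas the loop in the question would fail with "subscript out of bounds" when ncol > nrow.

```r
# Index with (row, col) pairs: a[cbind(k, k)] returns a[1,1], a[2,2], ...
selfdiag2 <- function(a) {
  k <- seq_len(min(dim(a)))
  a[cbind(k, k)]
}

# Linear indexing: diagonal elements are nrow(a) + 1 apart in column-major order
selfdiag3 <- function(a) {
  a[seq(1, by = nrow(a) + 1, length.out = min(dim(a)))]
}

mat <- matrix(rnorm(4), ncol = 2)
all.equal(selfdiag2(mat), diag(mat))
#[1] TRUE
```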
Given a data matrix with n rows and m columns, I would like to calculate the total sum of squares in R.
For this I've tried a loop that iterates through the rows of each column and saves the results in a vector. These are then added to the "TSS" vector where each value is the SS of one column. The sum of this vector should be the TSS.
set.seed(2020)
m <- matrix(c(sample(1:100, 80)), nrow = 40, ncol = 2)
tss <- c()
for(j in 1:ncol(m)){
tssVec <- c()
for(i in 1:nrow(m)){
b <- sum(((m[i,]) - mean(m[,j]))^2)
tssVec <- c(tssVec, b)
}
tss <- c(tss, sum(tssVec))
}
sum(tss)
The output is 136705.6, which cannot be right. As a novice coder, I am unfortunately stuck.
Any help is appreciated!
There are many ways to compute the TSS; of course, they all give the same result. I would do something like:
Method 1, using ANOVA:
n <- as.data.frame(m)
mylm <- lm(V1 ~ V2, data = n)
SSTotal <- sum(anova(mylm)[, "Sum Sq"])
Method 2:
SSTotal <- var( m[,1] ) * (nrow(m)-1)
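In the question's loop, m[i,] picks the whole of row i, so the mean of column j is subtracted from both columns and each column's sum of squares is inflated; indexing the single cell m[i, j] fixes it. Note also that both methods above give the TSS of the first column only (V1 is the response in Method 1). If I read the question correctly (TSS = sum over columns of each column's sum of squares about its own mean), you need to add up the per-column values. A sketch comparing the corrected loop with two vectorised versions:

```r
set.seed(2020)
m <- matrix(sample(1:100, 80), nrow = 40, ncol = 2)

# Corrected loop: m[i, j] (one cell) instead of m[i, ] (whole row)
tss <- c()
for (j in 1:ncol(m)) {
  tssVec <- c()
  for (i in 1:nrow(m)) {
    b <- (m[i, j] - mean(m[, j]))^2
    tssVec <- c(tssVec, b)
  }
  tss <- c(tss, sum(tssVec))
}
tss_loop <- sum(tss)

# Vectorised: subtract each column's mean, square, and add everything up
tss_sweep <- sum(sweep(m, 2, colMeans(m))^2)

# Method 2 applied to every column, then summed
tss_var <- sum(apply(m, 2, var) * (nrow(m) - 1))
```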
I would like to code a loop for cross-validation: computing the MSE of one- to four-step-ahead forecasts and storing the results in a matrix. The problem is that the columns for the 1- to 3-step forecasts get overwritten, and I get just the 4-step forecast in all columns. Can anybody help?
k<-20
n<-length(xy)-1
h<-4
start <- tsp(xy) [1]+k
j <- n-k
mseQ1 <- matrix(NA,j,h)
colnames(mseQ1) <- paste0('h=',1:h)
for(i in 1:j)
{
xtrain <- window(xy, end=start+(i-1))
xvalid <- window(xy, start=start+i, end=start+i)
qualifiedETS <- ets(xtrain, alpha=NULL, beta=NULL, additive.only=TRUE, opt.crit="mse")
fcastHW <- forecast(qualifiedETS, h=h)
mseQ1[i,] <- ((fcastHW[['mean']]-xvalid)^2)
}
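As far as I can tell, the problem is that xvalid holds a single observation (time start+i), so the squared difference is a single value that gets recycled across the whole row of mseQ1. The validation window needs h observations, and the loop has to stop h - 1 origins earlier so that those observations exist. A sketch under stated assumptions: the series is annual (frequency 1, so adding 1 to a time moves one observation), xy is simulated placeholder data, and a naive "repeat the last observed value" forecast stands in for ets()/forecast() so the example runs without the forecast package. In the real loop you would keep fcastHW[['mean']] in place of fc.

```r
# Hypothetical annual series standing in for xy
set.seed(1)
xy <- ts(cumsum(rnorm(40)), start = 1980)

k <- 20
h <- 4
start <- tsp(xy)[1] + k
j <- length(xy) - k - h           # last origin that still has h future observations
mseQ1 <- matrix(NA, j, h)
colnames(mseQ1) <- paste0('h=', 1:h)

for (i in 1:j) {
  xtrain <- window(xy, end = start + (i - 1))
  # h validation points, not just one
  xvalid <- window(xy, start = start + i, end = start + i + h - 1)
  # Stand-in for forecast(qualifiedETS, h = h)[['mean']]: repeat the last value
  fc <- rep(xtrain[length(xtrain)], h)
  mseQ1[i, ] <- (fc - as.numeric(xvalid))^2
}

colMeans(mseQ1)                   # MSE for each forecast horizon
```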
My question is about how to improve the performance of a function that downsamples from the columns of a matrix without replacement (a.k.a. "rarefication" of a matrix; I know this has been mentioned here before, but I could not find a clear answer that a) does what I need and b) does it quickly).
Here is my function:
downsampled <- function(data,samplerate=0.8) {
data.test <- apply(data,2,function(q) {
names(q) <- rownames(data)
samplepool <- character()
for (i in names(q)) {
samplepool <- append(samplepool,rep(i,times=q[i]))
}
sampled <- sample(samplepool,size=samplerate*length(samplepool),replace = F)
tab <- table(sampled)
mat <- match(names(tab),names(q))
toret <- numeric(length = length(q))
names(toret) <- names(q)
toret[mat] <- tab
return(toret)
})
return(data.test)
}
I need to downsample matrices with millions of entries, and I find this quite slow (here I'm using a 1000 x 1000 matrix, which is about 20-100x smaller than my typical data):
mat <- matrix(sample(0:40,1000*1000,replace=T),ncol=1000,nrow=1000)
colnames(mat) <- paste0("C",1:1000)
rownames(mat) <- paste0("R",1:1000)
system.time(matd <- downsampled(mat,0.8))
## user system elapsed
## 69.322 21.791 92.512
Is there a faster/easier way to perform this operation that I haven't thought of?
I think you can make this dramatically faster. If I understand what you are trying to do correctly, you want to down-sample each cell of the matrix, such that if samplerate = 0.5 and the cell of the matrix is mat[i,j] = 5, then you want to sample up to 5 things where each thing has a 0.5 chance of being sampled.
To speed things up, rather than doing all these operations on columns of the matrix, you can just loop through each cell of the matrix and draw the thinned count for that cell with runif (e.g., if mat[i,j] = 5, generate 5 random numbers between 0 and 1 and count how many are < samplerate), then store that count in a new matrix. This is not exactly the same scheme as yours: each cell is thinned independently, so a column's total is random (about samplerate times the original total) instead of being fixed at samplerate times the total. If that difference is acceptable, it is much more efficient, both in running time and in lines of code.
# Sample matrix
set.seed(23)
n <- 1000
mat <- matrix(sample(0:10,n*n,replace=T),ncol=n,nrow=n)
colnames(mat) <- paste0("C",1:n)
rownames(mat) <- paste0("R",1:n)
# Old function
downsampled<-function(data,samplerate=0.8) {
data.test<-apply(data,2,function(q){
names(q)<-rownames(data)
samplepool<-character()
for (i in names(q)) {
samplepool=append(samplepool,rep(i,times=q[i]))
}
sampled=sample(samplepool,size=samplerate*length(samplepool),replace = F)
tab=table(sampled)
mat=match(names(tab),names(q))
toret=numeric(length = length(q))
names(toret)<-names(q)
toret[mat]<-tab
return(toret)
})
return(data.test)
}
# New function
downsampled2 <- function(mat, samplerate=0.8) {
new <- matrix(0, nrow(mat), ncol(mat))
colnames(new) <- colnames(mat)
rownames(new) <- rownames(mat)
for (i in 1:nrow(mat)) {
for (j in 1:ncol(mat)) {
new[i,j] <- sum(runif(mat[i,j], 0, 1) < samplerate)
}
}
return(new)
}
# Compare times
system.time(downsampled(mat,0.8))
## user system elapsed
## 26.840 3.249 29.902
system.time(downsampled2(mat,0.8))
## user system elapsed
## 4.704 0.247 4.918
Using an example 1000 x 1000 matrix, the new function I provided runs about 6 times faster.
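The per-cell draw in downsampled2() (counting how many of mat[i,j] uniforms fall below samplerate) is a Binomial(mat[i,j], samplerate) draw, and rbinom() is vectorised over its size argument, so the double loop can be replaced by a single call. A sketch (downsampled3 is a hypothetical name, and I have not timed it on your data); it has the same per-cell thinning behaviour as downsampled2(), not the fixed column totals of the original function.

```r
downsampled3 <- function(mat, samplerate = 0.8) {
  # One Binomial(mat[i,j], samplerate) draw per cell, in a single vectorised call;
  # same distribution as sum(runif(mat[i,j]) < samplerate)
  out <- rbinom(length(mat), size = mat, prob = samplerate)
  dim(out) <- dim(mat)
  dimnames(out) <- dimnames(mat)
  out
}

set.seed(23)
n <- 1000
mat <- matrix(sample(0:10, n * n, replace = TRUE), ncol = n, nrow = n)
matd3 <- downsampled3(mat, 0.8)
```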
One source of savings would be to replace the for loop that grows samplepool with append() by a single vectorised call to rep(). Here is a reproducible example:
myRows <- 1:5
names(myRows) <- letters[1:5]
# get the repeated values for sampling
samplepool <- rep(names(myRows), myRows)
Within your function, this would be
samplepool <- rep(names(q), q)
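Putting that into the column-wise approach, here is a sketch that keeps the question's scheme (each column keeps floor(samplerate x total) items drawn without replacement) but avoids the growing vector and the character names: it samples row indices and counts them with tabulate(). downsampled_exact is a hypothetical name. Indexing via sample.int() sidesteps sample()'s surprising behaviour when it is given a single number (sample(5, 1) draws from 1:5).

```r
downsampled_exact <- function(data, samplerate = 0.8) {
  # assumes nrow(data) > 1, so apply() returns a matrix
  out <- apply(data, 2, function(q) {
    pool <- rep(seq_along(q), q)                    # row index i repeated q[i] times
    size <- floor(samplerate * length(pool))        # items kept in this column
    drawn <- pool[sample.int(length(pool), size)]   # without replacement
    tabulate(drawn, nbins = length(q))              # count draws per row
  })
  dimnames(out) <- dimnames(data)
  out
}

set.seed(7)
m <- matrix(sample(0:10, 50, replace = TRUE), nrow = 10,
            dimnames = list(paste0("R", 1:10), paste0("C", 1:5)))
d <- downsampled_exact(m, 0.8)
colSums(d)
```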
I first want to simulate correlated MVN data using the mvrnorm function from the MASS package. Then I want to repeat this simulation in a loop and fill a matrix with the results, so that the first results are in columns 1 and 2, the second in columns 3 and 4, and so on.
So far I did the following:
SimYCB <- c(73.1,60.6,59.6,54.5,57.9,61.14)
SimPCB <- c(15.7,18.25,22.38,20.22,16.53,18.616)
SimCB <- data.frame(SimYCB,SimPCB)
n=20
m=1000
MVSimCB = matrix()
for(i in 1:m)
{MVSimCB[,i]=mvrnorm(n, mu=mean(SimCB),
Sigma=cov(SimCB))}
What is the mistake?
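Maybe this helps. There are a few problems in your loop: mean() on a data frame returns NA (with a warning), so use colMeans() instead; matrix() with no arguments creates a 1 x 1 matrix, so preallocate n rows and m columns; and each mvrnorm() call returns an n x 2 matrix, so it has to fill two columns at a time: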
library(MASS)
MVSimCB <- matrix(NA, nrow=n, ncol=m)
set.seed(24)
for(i in seq(1,m, by=2)){
MVSimCB[, i:(i+1)] <- mvrnorm(n, mu=colMeans(SimCB), Sigma=cov(SimCB))
}
Or you could use replicate():
set.seed(24)
MVSimCB2 <- do.call(cbind, replicate(m/2, mvrnorm(n, mu=colMeans(SimCB),
Sigma=cov(SimCB)), simplify=FALSE))
all.equal(MVSimCB, MVSimCB2, check.attributes=FALSE)
#[1] TRUE