Saving output from for-loop to 3D array in R

I am working in R and want to save the outputs of a 'for' loop into a 3D array. I have been unable to adapt a similar example answered here for my purposes, so I'd like to share a different example.
I have a mostly completed 'for' loop that fits a linear model and extracts its slope and intercept for N iterations. Each iteration uses a new set of y-values whose noise is drawn from a t-distribution ('rt').
The desired output is a 3D array with two slices, here named "out2". One slice is named "Intercept" and the other "Slope". Each column in both slices holds the results of the model fitted with a different number of degrees of freedom (dfs).
set.seed(14)
x <- sample(0:50, 15) # Generate x-values for simulation
true.a <- 1.5 # Intercept for linear relationship
true.m <- 5 # Slope for linear relationship
dfs <- c(1,2,3,4,6,8,10,15,20,25) # Degrees of freedom
N <- 1000 # Reps in for-loop
out2 <- array(NA, dim=c(N, length(dfs), 2))
dimnames(out2) <- list(NULL, dfs, c("Intercept", "Slope"))
for(j in 1:length(dfs)) {
  df.tdist <- dfs[j]
  for(i in 1:N) {
    y <- true.a + true.m * x + 25*rt(15,df.tdist)
    fit <- lm(y ~ x)
    out2[ ] <- ?????????????
    # The output array 'out2' will consist of two "slices", one with intercepts
    # and one with slopes. The length of each slice is 1000 rows, and the
    # width of each slice is 10 columns
  }
}
Thanks greatly in advance for your feedback.
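One way the missing assignment could be filled in is sketched below. It assumes the first array dimension indexes the repetition i, the second the degrees-of-freedom setting j, and the third the coefficient, which matches the dimnames given in the question. Since coef(fit) returns the intercept followed by the slope, it can be written straight into out2[i, j, ]. Everything apart from that assignment is taken from the question.

```r
set.seed(14)
x      <- sample(0:50, 15)                    # x-values for the simulation
true.a <- 1.5                                 # true intercept
true.m <- 5                                   # true slope
dfs    <- c(1, 2, 3, 4, 6, 8, 10, 15, 20, 25) # degrees of freedom
N      <- 1000                                # repetitions per df value

# rows = repetitions, columns = df values, slices = coefficients
out2 <- array(NA_real_, dim = c(N, length(dfs), 2),
              dimnames = list(NULL, as.character(dfs), c("Intercept", "Slope")))

for (j in seq_along(dfs)) {
  for (i in seq_len(N)) {
    y   <- true.a + true.m * x + 25 * rt(length(x), dfs[j])
    fit <- lm(y ~ x)
    # coef(fit) is c(intercept, slope), matching the order of the third dimension
    out2[i, j, ] <- coef(fit)
  }
}
```

A single slice can then be pulled out by name, for example out2[, , "Slope"], which is an N x length(dfs) matrix.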

Related

Creating a point distance component to a monte carlo simulation function in R

I am attempting to do some Monte Carlo simulations, where I have a population of 325 samples in a field. I want to create a list of composite samples (samples consisting of multiple subsamples) from the dataset, while increasing sample size, repeated 100 times. I have created the function that will do so, and have supplied that below in the code.
##Create an example data set
# x and y are coordinates
x <- c(1:100)
y <- rev(c(1:100))
## z and w are soil test values
set.seed(2345)
z <- rnorm(100,mean=50, sd=10)
set.seed(2345)
w <- rnorm(100, mean=75, sd=5)
data <- data.frame(x, y, z, w)
##Initialize list
data.step.sim.list <- list()
## Code that increases sample size
for(i in seq_len(nrow(data))){
thisdat <- replicate(100,data[sample(1:nrow(data), size=i, replace = F),], simplify = F)
data.step.sim.list[[i]] <- thisdat
}
The result is a list of length n (n being the number of rows in the dataset). Each element is itself a list of 100 data frames (one per replication), and the data frames in element i have i rows.
I have x and y coordinates for each sample as well, and want to stipulate that each subsample collected must be at least 'm' meters from the other subsamples in the same composite.
I have written the function below to compute the distance between two points, but I cannot find a way to work it into my current code. Would anyone know how to do this?
#function to compute distances
calc.dist <- function(x1, y1, x2, y2) {
d <- sqrt(((x2 - x1)^2) + ((y2 - y1)^2))
return(d)
} #end function calc.dist
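One possible approach, sketched below, is rejection sampling: draw a candidate subsample, compute all pairwise distances with the base R function dist(), and redraw if any pair is closer than m. The helper name sample_min_dist and its max_tries argument are hypothetical, not part of the question's code. The helper returns NULL when no valid draw is found, which will happen for large sample sizes where the constraint cannot be met.

```r
# Example data as in the question (coordinates only)
data <- data.frame(x = 1:100, y = rev(1:100))

# Hypothetical helper: draw `size` rows whose pairwise distances are all >= m
sample_min_dist <- function(data, size, m, max_tries = 1000) {
  for (attempt in seq_len(max_tries)) {
    idx <- sample(nrow(data), size)
    # dist() returns all pairwise Euclidean distances between the chosen points
    if (size < 2 || min(dist(data[idx, c("x", "y")])) >= m) {
      return(data[idx, , drop = FALSE])
    }
  }
  NULL  # no valid draw found within max_tries attempts
}

set.seed(2345)
composite <- sample_min_dist(data, size = 5, m = 5)
```

Inside the question's loop, the call to sample() within replicate() could be swapped for this helper. Rejection sampling gets slow as size grows relative to the field, so a sequential scheme that adds one valid point at a time may suit larger composites better.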

How do I add new columns to a data set for each regression loop iteration?

I'm trying to test the predictive power of a model by splitting the observations into a test group (1/4 of the rows) and a training group (3/4). I fit a first-order regression on the training sample, then use its coefficients to produce predicted values from the independent variable of the test sample. For each iteration of the loop, I would like to add these predicted values as a new column next to the dependent variable test data.
For context: TSIP500 is the full sample, iv is the independent variable, and dv is the dependent variable. The 50 iterations are just a test run that keeps the number of iterations small.
I was having trouble with the predict function so I did the equation manually. My code is below:
for(i in 1:50){
test_index <- sample(nrow(TSIP500iv), (1/4)*nrow(TSIP500iv), replace=FALSE)
train_500iv <- TSIP500[-test_index,"distance"]
test_500iv <- TSIP500[test_index,"distance"]
train_500dv <- TSIP500[-test_index,"percent_of_max"]
test_500dv <- TSIP500[test_index,"percent_of_max"]
reg_model <- lm(train_500dv~train_500iv)
int <- reg_model$coeff[1]
B1 <- reg_model$coeff[2]
predicted <- (int + B1*test_500iv)
predicted <- data.frame(predicted)
test_500dv <- data.frame(test_500dv)
test_500dv[,i] <- apply(predicted)
}
I've tried different approaches for the last line, but I always get just a single column added. Any help would be tremendously appreciated.
for(i in 1:50){
test_index <- sample(nrow(TSIP500iv), (1/4)*nrow(TSIP500iv), replace=FALSE)
train_500iv <- TSIP500[-test_index,"distance"]
test_500iv <- TSIP500[test_index,"distance"]
train_500dv <- TSIP500[-test_index,"percent_of_max"]
test_500dv <- TSIP500[test_index,"percent_of_max"]
reg_model <- lm(train_500dv~train_500iv)
int <- reg_model$coeff[1]
B1 <- reg_model$coeff[2]
temp_results <- paste('pred',i,sep='_')
assign(temp_results, as.data.frame(int + B1*test_500iv))
test_500dv <- cbind(data.frame(test_500dv),temp_results)
}
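An alternative, sketched below, is to preallocate a matrix with one column per iteration and fill column i inside the loop. Because each iteration draws a different test set, the sketch also records which rows were used. TSIP500 is not available here, so a synthetic data frame with the same column names stands in for it (an assumption). Fitting lm() with a formula and a data argument is what lets predict() accept newdata directly, which may be one reason predict() was awkward with the vector-based fit.

```r
set.seed(1)
# Synthetic stand-in for TSIP500 (assumption: same column names as the question)
TSIP500 <- data.frame(distance = runif(200, 0, 100))
TSIP500$percent_of_max <- 90 - 0.5 * TSIP500$distance + rnorm(200, sd = 5)

n_iter <- 50
n_test <- floor(nrow(TSIP500) / 4)

# One column per iteration: the test rows used and the predictions for them
test_rows <- matrix(NA_integer_, n_test, n_iter)
predicted <- matrix(NA_real_, n_test, n_iter,
                    dimnames = list(NULL, paste0("pred_", seq_len(n_iter))))

for (i in seq_len(n_iter)) {
  test_index <- sample(nrow(TSIP500), n_test)
  fit <- lm(percent_of_max ~ distance, data = TSIP500[-test_index, ])
  test_rows[, i] <- test_index
  predicted[, i] <- predict(fit, newdata = TSIP500[test_index, ])
}
```

The observed values for iteration i are then TSIP500$percent_of_max[test_rows[, i]], which can be compared against predicted[, i].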

Computing Spearman's rho for increasing subsets of rows in for Loop

I am trying to write a for loop in R that runs correlations on multiple subsets of a data frame and then stores the results in a vector.
What I have in this loop is a data frame with 2 columns, x and y, and 30 rows of different continuous measurement values in each column. The process should be repeated 100 times. The data can be invented.
What I need is to compute Spearman's rho between x and y for the first five rows and then for increasing subsets (e.g., the first six rows, the first seven rows, etc.). Then I'd need to store the rho results in a vector that I can use further.
What I had in mind (but does not work):
sortvector <- 1:(30)
for (i in 1:100)
{
sortvector <- sample(sortvector, replace = F)
xtemp <- x[sortvector]
rho <- cor.test(xtemp,y, method="spearman")$estimate
}
The problem is that the code gives me one value of rho for the whole dataframe, but I need it for increments of subsets.
How can I get rho for subsets of increasing size in a for loop? And how can I store the coefficients in a vector that I can use afterwards?
Any help would be much appreciated, thanks.
Cheers
The easiest approach is to convert the for loop into an sapply call, which returns a vector of rho values, one per reshuffle of x:
sortvector <- 1:(30)
x <- rnorm(30)
y <- rnorm(30)
rho <- sapply(1:100, function(i) {
sortvector <- sample(sortvector, replace = F)
xtemp <- x[sortvector]
cor.test(xtemp, y, method = "spearman")$estimate
})
head(rho)
Output:
rho rho rho rho rho rho
0.014460512 -0.239599555 0.003337041 -0.126585095 0.007341491 0.264516129
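The code above returns one rho per reshuffle of the full 30 rows. For the increasing subsets the question asks about, a sketch along the following lines may be closer: it applies the same sapply idea over the subset sizes 5 to 30 instead of over repetitions. The data are invented, as in the answer, and the vector name rho_by_n is an assumption.

```r
set.seed(42)
x <- rnorm(30)
y <- rnorm(30)

sizes <- 5:30
# Spearman's rho between x and y using only the first n rows, for each n
rho_by_n <- sapply(sizes, function(n) {
  cor(x[seq_len(n)], y[seq_len(n)], method = "spearman")
})
names(rho_by_n) <- sizes
```

To repeat this 100 times on reshuffled rows, the sapply call could be wrapped in replicate(), permuting x and y together with the same index before each pass.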

Calculating divergence between joint posterior distributions

I wish to calculate the distance between two 3-dimensional posterior distributions. The draws are stored at two 30,000x3 matrices.
So far I have been successful in calculating Total Variation distance between two 2-dimensional posteriors (two 30,000x2 matrices) by splitting the grid into bins. However, I am having trouble calculating the divergence between posteriors with more parameters. Some examples of related distance measures can be found here.
NOTE: I do not wish to calculate the distance between the marginals (column-wise entries), but rather to obtain a single overall value from comparing the joint distributions in R.
I would really appreciate it if somebody could point out what I am missing here.
EDIT 1: Some example code for calculating Total variation distance between posterior samples stored in two matrices has been added below:
EDIT 2: This is an R question.
set.seed(123)
comparison.2D <- matrix(rnorm(40000*2,0,1),ncol=2)
ground.truth.2D <- matrix(rnorm(40000*2,0,2),ncol=2)
# Function to calculate TVD between matrices with 2 columns:
Total.Variation.Distance.2D<-function(true,
comparison,
burnin,
window.size){
# Bandwidth for theta.1.
my_bw_x<-window.size
# Bandwidth for theta.2.
my_bw_y<-window.size
range_x<-range(c(true[-c(1:burnin),1],comparison[-c(1:burnin),1]))
range_y<-range(c(true[-c(1:burnin),2],comparison[-c(1:burnin),2]))
xx <- seq(range_x[1],range_x[2],by=my_bw_x)
yy <- seq(range_y[1],range_y[2],by=my_bw_y)
true.pointidxs <- matrix( c( findInterval(true[-c(1:burnin),1], xx),
findInterval(true[-c(1:burnin),2], yy) ), ncol=2)
comparison.pointidxs <- matrix( c( findInterval(comparison[-c(1:burnin),1], xx),
findInterval(comparison[-c(1:burnin),2], yy) ), ncol=2)
# Count the frequencies in the corresponding cells:
square.mat.dims <- max(length(xx), length(yy))
frequencies.true <- frequencies.comparison <- matrix(0, ncol=square.mat.dims, nrow=square.mat.dims)
for (i in 1:dim(true.pointidxs)[1]){
frequencies.true[true.pointidxs[i,1], true.pointidxs[i,2]] <- frequencies.true[true.pointidxs[i,1],
true.pointidxs[i,2]] + 1
frequencies.comparison[comparison.pointidxs[i,1], comparison.pointidxs[i,2]] <- frequencies.comparison[comparison.pointidxs[i,1],
comparison.pointidxs[i,2]] + 1
}# End for
# Normalize frequencies matrix:
frequencies.true <- frequencies.true/dim(true.pointidxs)[1]
frequencies.comparison <- frequencies.comparison/dim(comparison.pointidxs)[1]
TVD <-0.5*sum(abs(frequencies.comparison-frequencies.true))
return(TVD)
}# End function
TVD.2D <- Total.Variation.Distance.2D(true=ground.truth.2D, comparison=comparison.2D,burnin=10000,window.size=0.05)
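The binning idea can be extended to any number of columns by labelling each draw with the grid cell it falls in and comparing cell frequencies, rather than filling a dense frequency matrix. The sketch below is one way to do that; the function name tvd_binned and its arguments are hypothetical, and it expects draws with burn-in already removed. The number of cells grows quickly with dimension, so with a fixed number of draws the estimate tends to drift toward 1 unless the bins are made wider.

```r
# Hypothetical helper: binned total variation distance for matrices with d columns
tvd_binned <- function(a, b, bin.width) {
  stopifnot(ncol(a) == ncol(b))
  # Label every draw with the grid cell it falls in, one index per column
  cell_key <- function(m) {
    idx <- lapply(seq_len(ncol(m)), function(k) {
      lo <- min(a[, k], b[, k])            # common lower edge for both samples
      floor((m[, k] - lo) / bin.width)
    })
    do.call(paste, c(idx, sep = "_"))
  }
  keys_a <- cell_key(a)
  keys_b <- cell_key(b)
  cells  <- union(keys_a, keys_b)          # only cells that are occupied
  p <- as.numeric(table(factor(keys_a, levels = cells))) / nrow(a)
  q <- as.numeric(table(factor(keys_b, levels = cells))) / nrow(b)
  0.5 * sum(abs(p - q))
}

set.seed(123)
comparison.3D   <- matrix(rnorm(30000 * 3, 0, 1), ncol = 3)
ground.truth.3D <- matrix(rnorm(30000 * 3, 0, 2), ncol = 3)
TVD.3D <- tvd_binned(ground.truth.3D, comparison.3D, bin.width = 0.5)
```

Only occupied cells are stored, so memory scales with the number of draws instead of the full grid size.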

Computing linear regressions for every possible permutation of matrix columns

I have a (k x n) matrix. I have initially managed to linearly regress (using the lm function) column 1 with each and every other column and extracted only the coefficients.
fore.choose <- matrix(0, 1, NCOL(assets))
for(i in seq(1, NCOL(assets), 1))
{
abc <- lm(assets[,1]~assets[,i])$coefficients
fore.choose[1,i] <- abc[2:length(abc)]
}
The coefficients are placed in the fore.choose matrix.
What I now need to do is to linearly regress column 2 with each and every other column, and then column 3 and so on and so forth and extract only the coefficients.
The output will be a square matrix of OLS univariate coefficients. Kind of similar to a correlation matrix, but it is the beta coefficients I am interested in.
fore.choose <- matrix(0, 1, NCOL(assets))
will initially need to become
fore.choose <- matrix(0, NCOL(assets), NCOL(assets))
I'd just compute the coefficients directly from the correlation matrix. For the regression of x on y (x as the dependent variable), the slope is beta = cor(x,y)*sd(x)/sd(y), like this:
# set up some sample data
set.seed(1)
d <- matrix(rnorm(50), ncol=5)
# get the coefficients
s <- apply(d, 2, sd)
cor(d)*outer(s, s, "/")
You could also use lsfit to get the coefficients of one term on all the others at once and then only have one loop to do:
sapply(1:ncol(d), function(i) {
coef(lsfit(d[,i], d))[2,]
})
I'm sure there must be a more elegant way than nested loops.
fore.choose <- matrix(NA, NCOL(assets), NCOL(assets))
abc <- NULL
for(i in seq_len(ncol(assets))){ # loop over "dependent" columns
for(j in seq_len(ncol(assets))){ # loop over "independent" columns
abc <- lm(assets[,i]~assets[,j])$coefficients
fore.choose[i,j] <- abc[-1]
}
}
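As a quick sanity check, the sketch below runs the nested-loop version, the correlation-matrix shortcut, and the lsfit version on the same invented data and compares the results. In all three, entry [i, j] is the slope from regressing column i on column j. The matrix names by_loop, by_cor and by_lsfit are assumptions.

```r
set.seed(1)
d <- matrix(rnorm(50), ncol = 5)   # invented data standing in for `assets`
p <- ncol(d)

# Nested loops: entry [i, j] is the slope of column i regressed on column j
by_loop <- matrix(NA_real_, p, p)
for (i in seq_len(p)) {
  for (j in seq_len(p)) {
    by_loop[i, j] <- coef(lm(d[, i] ~ d[, j]))[2]
  }
}

# Shortcut: slope = cor(i, j) * sd(i) / sd(j)
s <- apply(d, 2, sd)
by_cor <- cor(d) * outer(s, s, "/")

# lsfit: regress every column on column j in one call, one call per j
by_lsfit <- sapply(seq_len(p), function(j) coef(lsfit(d[, j], d))[2, ])

print(all.equal(unname(by_loop), unname(by_cor)))
print(all.equal(unname(by_loop), unname(by_lsfit)))
```

The diagonal is 1 in every version, since each column regressed on itself has slope 1.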
