Computing linear regressions for every possible permutation of matrix columns - r

I have a (k x n) matrix. So far I have managed to regress column 1 (using the lm function) on each of the other columns and to extract only the coefficients.
fore.choose <- matrix(0, 1, NCOL(assets))
for (i in seq(1, NCOL(assets), 1)) {
  abc <- lm(assets[, 1] ~ assets[, i])$coefficients
  fore.choose[1, i] <- abc[2:length(abc)]
}
The coefficients are placed in the fore.choose matrix.
What I now need to do is to linearly regress column 2 with each and every other column, and then column 3 and so on and so forth and extract only the coefficients.
The output will be a square matrix of OLS univariate coefficients. Kind of similar to a correlation matrix, but it is the beta coefficients I am interested in.
fore.choose <- matrix(0, 1, NCOL(assets))
will initially need to become
fore.choose <- matrix(0, NCOL(assets), NCOL(assets))

I'd just compute the coefficients directly from the correlation matrix, using the fact that the slope from regressing x on y is beta = cor(x,y)*sd(x)/sd(y), like this:
# set up some sample data
set.seed(1)
d <- matrix(rnorm(50), ncol=5)
# get the coefficients: entry [i, j] is the slope from regressing column i on column j
s <- apply(d, 2, sd)
cor(d) * outer(s, s, "/")
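A quick sanity check against lm, if you want to convince yourself the shortcut is right (reusing the sample data above):
coef(lm(d[, 1] ~ d[, 2]))[2]        # slope from regressing column 1 on column 2
(cor(d) * outer(s, s, "/"))[1, 2]   # same value from the shortcut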
You could also use lsfit to regress every column on one column at once, so you only need a single loop:
sapply(1:ncol(d), function(i) {
  coef(lsfit(d[, i], d))[2, ]
})
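The result is a square matrix whose [i, j] entry is the slope from regressing column i on column j, so it should agree with the correlation-based shortcut above (a quick check, reusing d and s):
beta <- sapply(1:ncol(d), function(i) coef(lsfit(d[, i], d))[2, ])
all.equal(unname(beta), unname(cor(d) * outer(s, s, "/")))   # should be TRUE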

I'm sure there must be a more elegant way than two nested loops.
fore.choose <- matrix(NA, NCOL(assets), NCOL(assets))
abc <- NULL
for (i in seq_len(ncol(assets))) {    # loop over "dependent" columns
  for (j in seq_len(ncol(assets))) {  # loop over "independent" columns
    abc <- lm(assets[, i] ~ assets[, j])$coefficients
    fore.choose[i, j] <- abc[-1]
  }
}

Related

How to create correlation matrix after mice multiple imputation

I'm using the mice package to create multiple imputations. I want to create a correlation matrix (and a matrix of p-values for the correlation coefficients). I use miceadds::micombine.cor to do this, but that gives a data frame with the variable names in the first two columns, followed by a number of columns containing r, p, t-values, and the like.
I'm looking for a way to turn this data frame into a "good old" matrix with the correlation coefficient between x and y in position [x, y], and a matrix of p-values. Does anyone have an easy way to do this?
Here's some code to reproduce:
library(mice)
library(miceadds)
library(missForest)   # for prodNA()

mt.mis <- prodNA(mtcars, noNA = 0.1)   # mtcars with 10% of the values set to NA
imputed <- mice(mt.mis, m = 5, maxit = 5, method = "pmm")
correlations <- miceadds::micombine.cor(mi.res = imputed, variables = c(1:3))
What I'm looking for is something like the output from cor(mtcars). Who can help?
I ended up writing my own function. Can probably be done much more efficiently, but this is what I made.
cormatrix <- function(r, N) {
  x <- 1
  cormatrix <- matrix(nrow = N, ncol = N)  # create empty matrix
  for (i in 1:N) {
    for (j in i:N) {
      if (j > i) {
        cormatrix[i, j] <- r[x]
        cormatrix[j, i] <- r[x]
        x <- x + 1
      }
    }
  }
  diag(cormatrix) <- 1
  cormatrix
}
You can call it with the output of micombine.cor and the number of variables in your model as arguments. So for example cormatrix(correlations$r,ncol(df)).
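The same helper should also work for the p-values, assuming the micombine.cor output carries a p-value column named p (check the column names of your own output):
p.matrix <- cormatrix(correlations$p, 3)   # 3 = number of variables passed to micombine.cor
diag(p.matrix) <- NA                       # no p-value for a variable with itself
p.matrix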

Find slope by linear regression of 2 matrices (R)

I have 2 matrices. One (quantitymatrix) contains the quantities a client bought of each product; the other (pricematrix) contains the unit prices at which the client bought those products.
How can I run a linear regression with the matrices so that I obtain the slope for each product?
Your data:
quantity <- matrix(c(4,2,6, 9,4,3, 1,1,2, 3,1,5), 3, 4)
price <- matrix(c(1,0.5,8, 4.2,1.2,2, 2,5,2, 1,2.5,1), 3, 4)
First, transform your two matrices into a single data frame. (You can avoid this if you want, but I think it makes things much more straightforward.)
df <- data.frame(quantity = as.numeric(quantity),
                 price    = as.numeric(price),
                 product  = rep(1:4, each = 3),
                 ID       = 1:3)
Then, run the linear models by groups:
lms <- by(df, df$product, FUN = function(x) lm(price~quantity, data = x))
And get the slopes:
slopes <- sapply(lms, coef)[2,]
If, however, you want to keep the original matrices as they are, you can run a simple loop:
slopes <- numeric(dim(price)[2])
for (i in 1:dim(price)[2]) {
  model <- lm(price[, i] ~ quantity[, i])
  slopes[i] <- coef(model)[2]
}
NB: this solution assumes that the two matrices have identical dimensions.
And if you want to avoid loops, the following solution may be faster:
f <- function(x, y) coef(lm(y ~ x))[2]                          # slope of y regressed on x
l <- function(m) lapply(seq_len(ncol(m)), function(i) m[, i])   # split a matrix into a list of columns
mapply(f, l(quantity), l(price))
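A quick check that the loop version and the mapply() version agree (reusing the objects defined above):
all.equal(unname(slopes), unname(mapply(f, l(quantity), l(price))))   # should be TRUE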

Computing Spearman's rho for increasing subsets of rows in for Loop

I am trying to fit a for Loop in R in order to run correlations for multiple subsets in a data frame and then store the results in a vector.
What I have in this loop is a data frame with 2 columns, x and y, and 30 rows of different continuous measurement values in each column. The process should be repeated 100 times. The data can be invented.
What I need is to compute Spearman's rho for the first five rows (between x and y) and then for increasing subsets (e.g., the first six rows, the first seven rows, etc.). Then I'd need to store the rho results in a vector that I can use further.
What I had in mind (but does not work):
sortvector <- 1:(30)
for (i in 1:100) {
  sortvector <- sample(sortvector, replace = F)
  xtemp <- x[sortvector]
  rho <- cor.test(xtemp, y, method = "spearman")$estimate
}
The problem is that the code gives me only one value of rho for the whole data frame, but I need it for increments of subsets.
How can I get rho for subsets of increasing size in a for loop? And how can I store the coefficients in a vector that I can use afterwards?
Any help would be much appreciated, thanks.
Cheers
The easiest approach is to convert the for loop into an sapply call, which returns a vector of rho's as the result of your bootstrapping:
sortvector <- 1:(30)
x <- rnorm(30)
y <- rnorm(30)
rho <- sapply(1:100, function(i) {
  sortvector <- sample(sortvector, replace = F)
  xtemp <- x[sortvector]
  cor.test(xtemp, y, method = "spearman")$estimate
})
head(rho)
Output:
rho rho rho rho rho rho
0.014460512 -0.239599555 0.003337041 -0.126585095 0.007341491 0.264516129
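The bootstrap above does not yet cover the "increasing subsets" part of the question. A minimal sketch of that part (with invented data; the names ns and rho.sub are just placeholders I chose):
set.seed(1)
x <- rnorm(30)
y <- rnorm(30)
ns <- 5:30                                # first 5 rows, first 6 rows, ..., all 30 rows
rho.sub <- sapply(ns, function(n) cor.test(x[1:n], y[1:n], method = "spearman")$estimate)
names(rho.sub) <- ns
head(rho.sub)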

Saving output from for-loop to 3D array in R

I am working in R to save the outputs from a 'for' loop into a 3D array. I have been unable to adapt a similar example answered here for my purposes, so I'd like to share a different example.
I have a mostly-completed "for" loop that generates slopes and intercepts from a linear model for N iterations, with each iteration using a new set of y-values whose noise is drawn from a t-distribution (rt).
The desired output is a 3D array with two slices, here named "out2". One slice is named "Intercept" and the other "Slope". Each column in both slices holds the results from models generated with a different number of degrees of freedom (dfs).
set.seed(14)
x <- sample(0:50, 15) # Generate x-values for simulation
true.a <- 1.5 # Intercept for linear relationship
true.m <- 5 # Slope for linear relationship
dfs <- c(1,2,3,4,6,8,10,15,20,25) # Degrees of freedom
N <- 1000 # Reps in for-loop
out2 <- array(NA, dim=c(N, length(dfs), 2))
dimnames(out2) <- list(NULL, dfs, c("Intercept", "Slope"))
for (j in 1:length(dfs)) {
  df.tdist <- dfs[j]
  for (i in 1:N) {
    y <- true.a + true.m * x + 25 * rt(15, df.tdist)
    fit <- lm(y ~ x)
    out2[ ] <- ?????????????
    # The output array 'out2' will consist of two "slices", one with intercepts
    # and one with slopes. The length of each slice is 1000 rows, and the
    # width of each slice is 10 columns
  }
}
Thanks greatly in advance for your feedback.
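One way the missing line could be filled in, assuming the intercept and slope are taken straight from coef(fit) (a sketch, not from the original post):
out2[i, j, ] <- coef(fit)   # coef(fit)[1] is the intercept, coef(fit)[2] is the slope
After the loop finishes, out2[, , "Intercept"] and out2[, , "Slope"] are the two 1000 x 10 slices.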

Create a matrix out of remaining data from a random row selection of a matrix, and use data to calculate RMSE in R

I have a matrix, A, with 42 rows and 2 columns. I then have a function that randomly selects 12 of these rows, runs a linear regression on the randomly selected rows, and outputs the coefficients (slope and intercept) of that regression.
In R, I then want to get the other 30 rows of the original matrix that were not selected by my random function, and use that data together with the newly calculated coefficients to generate predicted y-values. That gives me 30 y-values, from which I would like to calculate the RMSE (http://upload.wikimedia.org/math/e/f/b/efb7882a7dbfa5fe48d771565d2675f3.png) against one of the columns of the new 30-row matrix.
The code below is what I currently have right now:
# Calibration Equation 1 (TC OFF)
A <- matrix(c(Box.CR, Box.DC.ww), nrow = 42)   # Box.CR and Box.DC.ww are the user's data vectors
randco <- function(A) {
  B <- A[sample(42, 12), ]
  lm(B[, 2] ~ B[, 1])$coefficients
}
Z <- t(replicate(10000, randco(A)))
One way to get the 30 unselected rows, provided the rows of A are unique and B (the 12-row sample) is kept outside the function, is to match on a pasted-together row key:
arows <- apply(A, 1, paste, collapse="_")   # one key per row of A
brows <- apply(B, 1, paste, collapse="_")   # one key per sampled row
A[-match(brows, arows), ]                   # the rows of A that are not in B
Alternative method, converting the matrix to a data.table (not recommended if your sole purpose is what's described above):
library(data.table)
A <- as.data.table(A)
B <- A[sample(nrow(A), 12)]   # the 12 randomly selected rows
setkey(A)
setkey(B)
A[!B]                         # anti-join: rows of A that are not in B
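To finish the RMSE part of the question, one possible sketch, assuming A is still the original 42 x 2 matrix and column 2 holds the observed y-values (the helper name randco2 and that column choice are assumptions on my part):
randco2 <- function(A) {
  idx  <- sample(nrow(A), 12)                       # rows used for the calibration fit
  fit  <- lm(A[idx, 2] ~ A[idx, 1])                 # same regression as randco
  rest <- A[-idx, ]                                 # the 30 rows that were not selected
  pred <- coef(fit)[1] + coef(fit)[2] * rest[, 1]   # predicted y-values for those rows
  sqrt(mean((rest[, 2] - pred)^2))                  # RMSE against the observed column 2
}
rmse <- replicate(10000, randco2(A))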
