Correlation coefficient for three variables in r - r

For three n-dimensional non-zero-variance variables a, b, and c, n > 2, if
r(ab), r(bc), and r(ac) are Pearson’s correlation coefficients between a and b, between b and c, and between a and c, respectively, then correlation coefficient r(abc) among a, b, and c is defined as:
r^2(abc) = ( r^2(ab) + r^2(bc) + r^2(ac) ) - ( 2 x r(ab) x r(bc) x r(ac) )
I was able to get the code for a manual way of doing it:
a <- c(4, 6, 2, 7)
b <- c(8, 1, 3, 5)
c <- c(6, 3, 1, 9)
al <- data.frame(a, b, c)
al
ab_cor <- cor(al$a, al$b, method = c("pearson"))
bc_cor <- cor(al$b, al$c, method = c("pearson"))
ac_cor <- cor(al$a, al$c, method = c("pearson"))
abc_cor <- sqrt( ( (ab_cor)^2 + (bc_cor)^2 + (ac_cor)^2 ) - ( 2 * ab_cor * bc_cor * ac_cor) )
abc_cor
But I was wondering if this could be done with less lines of code, for example with a for loop. Addittionaly, how would I write it so that I could do it with more than 3 variables as well, for example, r(abcd) i.e. r(ab), r(ac), r(ad), r(bc), r(bd), and r(cd).

The cor function already creates a matrix of the correlations. You just need to pick out the relevant ones and then use some vector operations.
cs <- cor(al, method = "pearson")
cs <- cs[upper.tri(cs)]
#sqrt(sum(cs^2)) - 2*prod(cs)
# apparently it's
sqrt(sum(cs^2) - 2*prod(cs))
This generalizes to your larger case as well assuming that you have all the variables you want in your al data.frame.

Related

Apply function to cartesian product of numeric and function type

I have a function
eval_ = function(f, i) f(i)
for a list of functions, say
fns = list(function(x) x**2, function(y) -y)
and a vector of integers, say
is = 1:2
I would like to get eval_ evaluated at all combinations of fns and is.
I tried the following:
cross = expand.grid(fns, is)
names(cross) = c("f", "i")
results = sapply(1:nrow(cross), function(i) do.call(eval_, cross[i,]))
This throws an error:
Error in f(i) : could not find function "f"
I think that the underlying problem is, that cross is a data.frame and can not carry functions. Hence, it puts the function into a list and then carries a list (indeed, class(cross[1,][[1]]) yields "list". My ugly hack is to change the third line to:
results = sapply(
1:nrow(cross),
function(i) do.call(eval_, list(f = cross[i,1][[1]], i = cross[i,2]))
)
results
#[1] 1 -1 4 -2
This works, but it defeats the purpose of do.call and is very cumbersome.
Is there a nice solution for this kind of problem?
Note: I would like a solution that generalizes well to cases where the cross product is not only over two, but possibly an arbitrary amount of lists, e.g. functions that map R^n into R.
Edit:
For a more involved example, I think of the following:
fns = list(mean, sum, median)
is1 = c(1, 2, 4, 9), ..., isn = c(3,6,1,2) and my goal is to evaluate the functions on the cartesian product spanned by is1, ..., isn, e.g. on the n-dimensional vector c(4, ..., 6).
You can use mapply() for this:
eval_ <- function(f, i) f(i)
fns <- list(function(x) x**2, function(y) -y)
is <- 1:2
cross <- expand.grid(fns = fns, is = is)
cross$result <- mapply(eval_, cross$fn, cross$is)
print(cross)
#> fns is result
#> 1 function (x) , x^2 1 1
#> 2 function (y) , -y 1 -1
#> 3 function (x) , x^2 2 4
#> 4 function (y) , -y 2 -2
An attempt for my "more involved example" with n = 2.
Let X = expand.grid(c(1, 2, 4, 9), c(3,6,1,2)).
The following pattern generalizes to higher dimensions:
nfns = length(fns)
nn = nrow(X)
res = array(0, c(nfns, nn))
for(i in 1:nfns){
res[i,] = apply(X, MARGIN = 1, FUN = fns[[i]])
}
The shape of the margin of X (i.e. nrow(X)) must correspond to the shape of the slice res[i,] (i.e. nn). The function must map the complement of the margin of X (i.e. slices of the form X[i,]) to a scalar. Note that a function that is not scalar has components that are scalar, i.e. in a non-scalar case, we would loop over all components of the function.

Extract co-linear columns name - R

Based on the answer for this question and its script, how can I print to the console the co-linear columns names?
Script:
library(corrplot)
library(caret)
x <- seq(0, 100, 1)
# colinear with x
y <- x + 2.3
# almost colinear with x / some small gaussian noise
z <- x + rnorm(mean = 0, sd = 5, n = 101)
# uncorrrelated gaussian
w <- rnorm(mean = 0, sd = 1, n = 101)
a <- z+seq(101, 200, 1)/.33 + rnorm(mean = 0, sd = 5, n = 1001)
b <- a -2.3
# this frame is made to exemplify the procedure
df <- data.frame(x = x, y = y, z = z, w = w, a=a, b=b)
corrplot(cor(df))
#drop perfectly multicollinear variables
constant<-rep(1,nrow(df))
tmp<-lm(constant ~ ., data=df)
to_keep<-tmp$coefficients[!is.na(tmp$coefficients)]
to_keep<-names(to_keep[-which(names(to_keep) == "(Intercept)")])
df_result<-df[to_keep]
corrplot(cor(df_result))
You want the variables not included in to_keep. Based off how to_keep is defined, you can write to_drop <- tmp$coefficients[is.na(tmp$coefficients)] to get the coefficients with NA values (meaning there are no estimates for the corresponding variables because they are collinear with others). Then, to print the names of those coefficients, you can simply do print(names(to_drop)).
However, keep in mind that: 1. this will only drop perfectly collinear variables in a hacky way and 2. the way this method decides which variables out of a set of perfectly collinear variables to drop is rather arbitrary (it will depend on the other of variables in your data).

R Loop: Perform a Function for Every 3 Rows

I have 2000 wheat plants, growing over the course of 40 days.
I'd like to perform the coeff function on each plant to find the coefficients of the quadratic equation the 3 time points make. (a, b, and c)
(1) The coef(lm(y~poly(x,2,raw=TRUE)) function works exactly the way I want it to.
(2) However, the way my data is presented, requires me to manually set x and y.
(3) Thus, I melted my data, and ordered it.
(4) I'd like to make a loop that will take the first three in column "Day" and set that as x. Then I'd like it to take the first three in column "Height" and set that as y.
Then I'd like to perform the coeff function.
Last I'd like it to present the coefficient outputs I need, preferably in a new data table.
Then repeat for every three rows, which represent each wheat ID, for all wheat plants.
1) This function works, giving me coefficients: a, b, c
x<-c(1,2,3)
y<-c(1,10,4)
coef(lm(y~poly(x,2,raw=TRUE)))
2) This is what my data originally looked like
A = matrix(c(5, 4, 2, 10, 10, 4, 5, 15, 6),nrow=3, ncol=3)
colnames(A)<-c("10", "25", "40")
rownames(A)<-c("Wheat 1", "Wheat 2", "Wheat 3")
A
3) This is my melted format
A.melted<-as.data.frame(melt(A, id.vars="ID"))
A.melted<-A.melted[with(A.melted,order(Var1)),]
colnames(A.melted) <- c("WheatID", "Day", "Height")
A.melted$Day<-as.numeric(as.character(A.melted$Day))
A.melted
#
4) This is what I am trying to do with my loop....
for every 3 rows,
x<-A.melted[,2]
y<-A.melted[,3]
coef(lm(y~poly(x,2,raw=TRUE)))
something to compile the coefficients: a, b, c
I am just not familiar with the syntax of loops, and I'd love any tips and suggestions. Perusing Google tells me that one should not do loops unless it is absolutely required since I may run into more problems- thus I am open to non loop techniques as well.
If you want to do it in a loop try this. The crucial part is to use seq together with a by = argument to let the index take the steps you need.
library(tibble)
df <- tibble(
WheatID = rep(NA_character_, nrow(A)),
Intercept = rep(NA_real_, nrow(A)),
poly1 = rep(NA_real_, nrow(A)),
poly2 = rep(NA_real_, nrow(A))
)
cnt <- 1
for (i in seq(1, nrow(A.melted), by = 3)) {
x <- A.melted$Day[i + 0:2]
y <- A.melted$Height[i + 0:2]
df$WheatID[cnt] <- as.character(A.melted$WheatID[i])
df[cnt, 2:4] <- coef(lm(y~poly(x,2,raw=TRUE)))
cnt <- cnt + 1
}
df
Note: I am not a data.table guy. Therefore, I present you with a tibble.
We can do this with the help of data.table, see ?data.table:
library(data.table)
A.models = A.melted[, model := list(.(lm(Height ~ poly(Day, 2),
data = list(.(.SD[WheatID == .BY[[1]]]))))),
by = WheatID]
A.models[, coefs := list(.(coefficients(model[[1]]))),
by = WheatID]
You can access each model like this:
A.models[WheatID == "Wheat 1", model[[1]]]
and even
A.models[WheatID == "Wheat 1", summary(model[[1]])]
The magic here happens because data.table takes in J expressions, not only functions.
This is something you can do with data.table package.
data.list <- split(A.melted, f = (1:nrow(A.melted) - 1) %/% 3)
coefs <- lapply(data.list, function(x) {
coefs <- coef(lm(Day ~ poly(Height, raw=TRUE), data = x))
data.table(
intercept = coefs[1],
poly.height = coefs[2]
)
})
coefs <- rbindlist(coefs)
Or you could perform apply() directly on the original matrix:
x <- as.numeric(colnames(A))
apply(A, 1, function(y) coef(lm(y~poly(x,2,raw=TRUE))))
Wheat 1 Wheat 2 Wheat 3
(Intercept) -3.88888889 -0.555555556 6.666667e-01
poly(x, 2, raw = TRUE)1 1.11111111 0.477777778 1.333333e-01
poly(x, 2, raw = TRUE)2 -0.02222222 -0.002222222 -2.417315e-18
Or you could transpose the data and use the coef(...) call directly:
x <- as.numeric(colnames(A))
coef(lm(t(A) ~ poly(x, 2, raw = TRUE)))

Creating block matrix via loop

I'm trying to create a block matrix using a loop in R, which depend on some variable I call T. The two matrices used to construct the block matrix could look like this:
A=matrix(c(1,0.3,0.3,1.5),nrow=2)
B=matrix(c(0.5,0.3,0.3,1.5),nrow=2)
So depending on what i set T to, I need different results. For T=2:
C=rbind(cbind(A,B),cbind(B,A))
For T=3:
C=rbind(cbind(A,B,B),cbind(B,A,B),cbind(B,B,A))
For T=5:
C=rbind(cbind(A,B,B,B,B),cbind(B,A,B,B,B),cbind(B,B,A,B,B),cbind(B,B,B,A,B),cbind(B,B,B,B,A))
So basically, I'm just trying to create a loop or something similar, where I can just specify my T and it will create the block matrix for me depending on T.
Thanks
You can do that:
N <- nrow(A)
C <- matrix(NA,N*T,N*T)
for (i in 1:T){
for (j in 1:T){
if (i == j)
C[(i-1)*N+1:N, (j-1)*N+1:N] <- A
else
C[(i-1)*N+1:N, (j-1)*N+1:N] <- B
}
}
From your explanation I suppose that you want single A and T-1 Bs in your final matrix.
If that is correct then here is a quick try using the permn function from the combinat library. All I am doing is generating the expression using the permutation and then evaluating it.
A = matrix(c(1,0.3,0.3,1.5),nrow=2)
B = matrix(c(0.5,0.3,0.3,1.5),nrow=2)
T = 5
x = c("A", rep("B",T-1))
perms = unique(permn(x)) #permn generates non-unique permutations
perms = lapply(perms, function(xx) {xx=paste(xx,collapse=","); xx=paste("cbind(",xx,")")})
perms = paste(perms, collapse=",")
perms = paste("C = rbind(",perms,")",collapse=",")
eval(parse(text=perms))
With the blockmatrix package this is pretty straightforward.
library(blockmatrix)
# create toy matrices (block matrix elements)
# with values which makes it easier to track them in the block matrix in the example here
A <- matrix("a", nrow = 2, ncol = 2)
B <- matrix("b", nrow = 2, ncol = 2)
# function for creating the block matrix
# n: number of repeating blocks in each dimension
# (I use n instead of T, to avoid confusion with T as in TRUE)
# m_list: the two matrices in a list
block <- function(n, m_list){
# create a 'layout matrix' of the block matrix elements
m <- matrix("B", nrow = n, ncol = n)
diag(m) <- "A"
# build block matrix
as.matrix(blockmatrix(dim = dim(m_list[[1]]), value = m, list = m_list))
}
# try with different n
block(n = 2, m_list = list(A = A, B = B))
block(n = 3, m_list = list(A = A, B = B))
block(n = 5, m_list = list(A = A, B = B))

Passing multiple arguments to Reduce

I have a list of data.frames, and would like to operate on their columns, using various weights.
For example, subtracting the first columns from the second column (solved, see below); or subtracting the first and third from twice the second (unsolved).
Thanks to the generous help obtained in response to this question, I have a solution to the the problem in two dimensions without weights using Reduce.
I would like to have the flexibility to operate with weights - and in higher dimesions.
What I have so far is:
priceList <- data.frame(aaa = rnorm(100, 100, 10), bbb = rnorm(100, 100, 10),
ccc = rnorm(100, 100, 10), ddd = rnorm(100, 100, 10),
eee = rnorm(100, 100, 10), fff = rnorm(100, 100, 10),
ggg = rnorm(100, 100, 10)
)
colDiff <- function(x)
{
Reduce('-', rev(x))
}
tradeLegsList <- combn(names(priceList), 3, function(x) priceList[x], simplify = FALSE)
tradeList <- lapply(tradeLegsList, colDiff)
From what I can tell, Reduce is not designed to take multiple arguments.
I can do this the long way with 2* tradeLegsList[[1]]$bbb - tradeLegsList[[1]]$aaa - tradeLegsList[[1]]$ccc, and some loops, but it doesn't seem like the R way.
Is there a way to pass in a weight vector?
Ideally, I would to pass an argument such as w = c(-1, 2, -1) to the colDiff (or Reduce) function ... or something similar.
True, Reduce is not geared to allow multiple arguments, just two for each reduction. Therefore it is easiest to premultiply the elements in the list you are Reduce-ing.
Below is a solution that does this using mapply within your colDiff function definition.
Change your definifion of colDiff to allow a weight vector, and apply this using mapply
with SIMPLIFY = F.
EDIT
In light of the comments, weighting depends on the number of columns and there being no need for the rev
The weighting by length
length(x) == 1 -> w = 1
length(x) == 2 -> w = c(-1, 1),
length(x) == 3 -> w = c(-1, 2, -1),
length(x) == 4 -> w = c(-1, 1, -1, +1)
weighting <- function(i){
switch(i, 1, c(-1,1), c(-1,2,-1), c(-1,1,-1, 1))
}
colDiff <- function(x)
{
w = weighting(length(x))
Reduce('+', mapply('*', x, e2 = w, SIMPLIFY = F))
}
Then something like this would work
tradeList <- lapply(tradeLegsList, colDiff)
you could also keep with the functional programming theme and use Map which is a simple wrapper for mapply with SIMPLIFY = F
colDiff <- function(x)
{
w = weighting(length(x))
Reduce('+', Map('*', x , e2 = w))
}
you could also prefine the weighting within the function colDiff (which may be easier).
weighting[[2]] is weighting for when there are 2 columns, weighting[[3]] when there are 3.
colDiff <- function(x)
{
weighting <- list(1, c(-1,1), c(-1,2,-1), c(-1,1,-1, 1))
w = weighting[[length(x)]]
Reduce('+', Map('*', x , e2 = w))
}

Resources