R match variables from different lists - r

I want to write a loop in which values from three different lists are passed to another function:
x = list(value1, value2, value3)
y = list(value1, value2, value3)
z = list(value1, value2, value3)
Example: function(x1, y1, z1)
I want to pass the values at the same position in each list (x1, y1, z1, ..., x3, y3, z3) to the function. How could I do that?

This could be achieved using e.g. mapply:
x <- list(1, 2, 3)
y <- list(4, 5, 6)
z <- list(7, 8, 9)
mapply(function(vx, vy, vz) vx + vy + vz, x, y, z)
#> [1] 12 15 18
or using purrr::pmap, which returns a list (purrr::pmap_dbl returns a numeric vector instead):
purrr::pmap(list(x, y, z), function(vx, vy, vz) vx + vy + vz)
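Since the question asks for a loop, the same pairing can also be sketched with an explicit index loop. This is only an illustration: `f` is a hypothetical stand-in for whatever function is actually being called.

```r
x <- list(1, 2, 3)
y <- list(4, 5, 6)
z <- list(7, 8, 9)

# Hypothetical stand-in for the function from the question
f <- function(vx, vy, vz) vx + vy + vz

# Pre-allocate the result, then walk the three lists by position
res <- numeric(length(x))
for (i in seq_along(x)) {
  res[i] <- f(x[[i]], y[[i]], z[[i]])
}
res
#> [1] 12 15 18
```

The loop is longer than mapply, but it makes the position-by-position matching explicit.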

You get value1 from list x with x[[1]]. Likewise, you get value3 from list y with y[[3]], and so on.
You can then use these values in your function in various ways, depending on what the function has to do with them.
@stefan has shown two efficient ways to do that.
If your function takes three values at the same index and returns a single value, such as mean, you can combine the values into a matrix with three columns and three rows, and then apply the function to each row or each column using a single index.
# plain numeric vectors, so that cbind() builds a 3 x 3 numeric matrix
x = c(-3, 0, 2)
y = c(10, -1, 20)
z = c(0.7, 0.5, 0.9)
myData = cbind(x,y,z)
myData
# x y z
# [1,] -3 10 0.7
# [2,] 0 -1 0.5
# [3,] 2 20 0.9
To get the mean of the value1s :
mean(myData[1,])
# [1] 2.566667
If you want to compute the mean and the standard deviation of each row, you can write a function to do that:
myfun = function(x){
Mean = apply(x, 1, mean)
SD = apply(x, 1, sd)
result = rbind(Mean, SD)
return(result)
}
and then apply the function to your matrix:
myfun(myData)
# [,1] [,2] [,3]
# Mean 2.566667 -0.1666667 7.633333
# SD 6.698010 0.7637626 10.723961
This is just an example. Many other, more efficient, ways are possible.
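As one possible shorter variant of the example above, the row means can be computed with the vectorized rowMeans(), keeping apply() only for the standard deviation. The names `row_mean` and `row_sd` are introduced here for illustration.

```r
x <- c(-3, 0, 2)
y <- c(10, -1, 20)
z <- c(0.7, 0.5, 0.9)
myData <- cbind(x, y, z)

# rowMeans() is a vectorized replacement for apply(myData, 1, mean)
row_mean <- rowMeans(myData)

# Base R has no built-in row-wise sd, so apply() is still used here
row_sd <- apply(myData, 1, sd)

rbind(Mean = row_mean, SD = row_sd)
```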

Related

Tensorflow: Find greater than pairs and stack along axis

The problem I have using tensorflow is as follows:
For one tensor X with dims n X m
X = [[x11,x12...,x1m],[x21,x22...,x2m],...[xn1,xn2...,xnm]]
I want to get an n X m X m tensor, that is, n stacked m X m matrices.
Each m X m matrix is the result of:
tf.math.greater(tf.reshape(x,(-1,1)), x) where x is a row of X
In words, for every row k in X, I'm trying to get the pairs i,j where xki > xkj. This gives me a matrix, and then I want to stack those matrices along the first axis, to get an n X m X m cube.
Example:
X = [[1,2],[4,3], [5,7]]
Result = [[[False, False],[True, False]],[[False, True],[False, False]], [[False, False],[True, False]]]
Result has shape 3 X 2 X 2
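Reshaping each row separately is the same as reshaping all rows at once: expand X to shapes (n, m, 1) and (n, 1, m) and let broadcasting do the pairwise comparison. Try this: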
import tensorflow as tf

def fun(X):
    n, m = X.shape
    X1 = tf.expand_dims(X, -1)       # shape (n, m, 1)
    X2 = tf.reshape(X, (n, 1, m))    # shape (n, 1, m)
    return tf.math.greater(X1, X2)   # broadcasts to (n, m, m)

X = tf.Variable([[1,2],[4,3], [5,7]])
print(fun(X))
Output:
tf.Tensor(
[[[False False]
[ True False]]
[[False True]
[False False]]
[[False False]
[ True False]]], shape=(3, 2, 2), dtype=bool)

How to mean center variables based on binary condition in r

I have a dataframe ("md") containing several variables, of which one is binary ("adopter"). I would like to mean center three of the other (continuous) variables, let's say X, Y, and Z, but only for the rows where adopter = 1. The others, for which adopter = 0, should remain unchanged.
In the end I would like to end up with a new dataframe containing all variables as before, but with the X, Y, and Z for which adopter = 1 being mean centered, while leaving the X, Y, and Z for which adopter = 0 being unchanged.
My dataframe looks like this (117 observations in total):
adopter  X    Y    Z     A     B
0        0.5  2.3  4.5   3     4.7
1        1.5  6.5  -2.3  69.3  -2.5
...      ...  ...  ...
So the new dataframe should contain the mean-centered values of X, Y, and Z for the second row in this example, since adopter = 1 there, and leave the rest unchanged.
I know how to mean center all X, Y, and Z:
md_cen <- md
covs_to_center <- c("X", "Y", "Z")
md_cen[covs_to_center] <- scale(md_cen[covs_to_center],
scale = FALSE)
But I cannot figure out how to build the condition "only if adopter == 1" into it. I also tried applying a function:
center_apply <- function(x) {
apply(x, 2, function(y) y - mean(y))}
However, this leaves me again with the mean centered versions for all X, Y, Z, of course, and on top the new dataset contains only those three variables.
Can anyone help me out here, please?
The basic way to accomplish what you're trying to do is to use the split-apply-combine workflow. That is:
1. Split your data frame up into coherent and useful sub-parts.
2. Do the thing you want to each sub-part.
3. Reconstitute the parts into the whole.
First, here's a toy dataset:
covs_to_center <- c("X", "Y", "Z")
set.seed(123)
md <- data.frame(
adopter = sample(0:1, 10, replace = T),
X = rnorm(10, 2, 1),
Y = rnorm(10, 3, 2),
Z = rnorm(10, 5, 10),
A = rnorm(10, 40, 50),
B = rnorm(10, 0, 2)
)
md
## adopter X Y Z A B
## 1 0 3.7150650 6.5738263 -11.866933 74.432013 -2.24621717
## 2 0 2.4609162 3.9957010 13.377870 67.695883 -0.80576967
## 3 0 0.7349388 -0.9332343 6.533731 36.904414 -0.93331071
## 4 1 1.3131471 4.4027118 -6.381369 24.701867 1.55993024
## 5 0 1.5543380 2.0544172 17.538149 20.976450 -0.16673813
## 6 1 3.2240818 0.8643526 9.264642 5.264651 0.50663703
## 7 1 2.3598138 2.5640502 2.049285 29.604136 -0.05709351
## 8 1 2.4007715 0.9479911 13.951257 -23.269818 -0.08574091
## 9 0 2.1106827 1.5422175 13.781335 148.447798 2.73720457
## 10 0 1.4441589 1.7499215 13.215811 100.398100 -0.45154197
A base R solution:
md_base <- data.frame(row_num = 1:nrow(md), md)
# append column of row numbers to make it easier to recombine things later
md_split <- split(md_base, md_base$adopter)
# this is a list of 2 data frames, corresponding to the 2 possible outcomes
# of the adopter variable
md_split$`1`[, covs_to_center] <-
apply(md_split$`1`[, covs_to_center], 2, function(y) y - mean(y))
# grab the data frame that had a 1 in the response column; apply the centering
# function to the correct variables in that data frame
md_new <- do.call(rbind, md_split)
# glue the data frame back together; it will be ordered by adopter
rownames(md_new) <- NULL
# remove row name artifact created by joining
md_new <- md_new[order(md_new$row_num), names(md_new) != "row_num"]
# sort by the row_num column, then drop it
This is pretty clunky, and I'm sure it could be improved. Here's a tidyverse equivalent that produces the same output:
library(tidyverse)
md %>%
group_by(adopter) %>%
mutate(across(all_of(covs_to_center), function(y) y - adopter * mean(y))) %>%
ungroup()
The idea behind this is: group by adopter (much like the split() approach), calculate the mean() of the relevant variables within each group, and then subtract the mean of the subgroup multiplied by the adopter variable (meaning when adopter == 0, nothing will be subtracted).
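For comparison, here is a minimal base R sketch that skips the split and recombine steps and instead centers in place with a logical row index. It assumes the same column names as above; the toy data, `md_cen`, and `is_adopter` are names made up for this illustration.

```r
covs_to_center <- c("X", "Y", "Z")
set.seed(123)
md <- data.frame(
  adopter = rep(c(0, 1), 5),   # fixed pattern, for illustration only
  X = rnorm(10, 2, 1),
  Y = rnorm(10, 3, 2),
  Z = rnorm(10, 5, 10)
)

md_cen <- md
is_adopter <- md_cen$adopter == 1

# Subtract the adopter-group mean from the adopter rows only
for (v in covs_to_center) {
  md_cen[is_adopter, v] <- md_cen[is_adopter, v] - mean(md_cen[is_adopter, v])
}
```

Row order and all other columns stay untouched, so no helper column of row numbers is needed.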

Creating a new column from a matrix

I have an n by 2 matrix, for example:
x <- matrix(1:4, nrow = 2, ncol = 2)
I have to create a new column which will store the result
(a11+a12)-a22, (a21+a22)-a32, ...
and so on. a32 is not there so it is considered as 0. Is there an easy way to do this in R ?
I have tried to use the apply() function with no luck. The desired output is a column with values
0
6
Something like this?
x <- matrix(1:4, nrow = 2, ncol = 2)
# obtain the row sum of x
rs = rowSums(x)
# obtain the last column from the matrix
x = x[,ncol(x)]
# remove the first value and add a 0 at the end
# since your last value will always be 0
x = x[-1]
x = c(x, 0)
rs - x
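The same idea can be sketched without overwriting x, and with the result attached as a new column as the question asks. The names `next_a2` and `x_new` and the column name `new` are made up for this illustration.

```r
x <- matrix(1:4, nrow = 2, ncol = 2)

# Second column shifted up by one row, padded with 0 for the missing last row
next_a2 <- c(x[-1, 2], 0)

# Append (a_i1 + a_i2) - a_(i+1)2 as a third column
x_new <- cbind(x, new = rowSums(x) - next_a2)
x_new
```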

How to write and calculate Sum/ Product (of a function) in R

Assuming that I have a function, let's say f(x).
How can I write the product or sum of this function for given limits in x.
For instance product of f for x=1 until x=5
f(1)*f(2)*f(3)*f(4)*f(5)
Additionally I need to figure this out for sums/double sums.
Consider f(x,y) and the sum while x runs from 1 to 3 and y runs from 0 to x-1.
If written in mathematica, it would be this:
Sum[f[x, y], {x, 1, 3}, {y, 0, x - 1}]
and the output would be this
f[1, 0] + f[2, 0] + f[2, 1] + f[3, 0] + f[3, 1] + f[3, 2]
f is not defined for simplicity.
EDIT: example as requested:
f <- function (x,y) {
x + 2*y
}
Calculate sum where x runs from 1 to 3 and y runs from 0 to x-1.
(this is equal to 22 btw)
You can do this:
f <- function (x,y) {
x + 2*y
}
#calculate f for all combinations (outer() requires f to be vectorized)
tmp <- outer(1:3, 0:2, f)
#discard undesired combinations and sum
sum(tmp[lower.tri(tmp, diag = TRUE)])
#[1] 22
Alternatively, you can use lapply to create only the desired combinations. This is much slower:
inds <- lapply(1:3, function(x) data.frame(x = x, y = 0:(x-1)))
inds <- do.call(rbind, inds)
sum(do.call(f, inds))
#[1] 22
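The question also asks about products, and about functions that may not be vectorized. A minimal sketch for both cases uses sapply with prod and sum; `g` is a hypothetical one-argument function chosen only for illustration.

```r
# Hypothetical one-argument function, used only for illustration
g <- function(x) x + 1

# Product g(1) * g(2) * ... * g(5)
p <- prod(sapply(1:5, g))
p
#> [1] 720

# Two-argument function from the question
f <- function(x, y) x + 2 * y

# Double sum: x runs from 1 to 3, y runs from 0 to x - 1
total <- sum(sapply(1:3, function(x) {
  sum(sapply(0:(x - 1), function(y) f(x, y)))
}))
total
#> [1] 22
```

Because f is called on one pair at a time here, this form also works when f is not vectorized, at the cost of speed.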

Weighted Pearson's Correlation?

I have a 2396x34 double matrix named y wherein each row (2396) represents a separate situation consisting of 34 consecutive time segments.
I also have a numeric[34] named x that represents a single situation of 34 consecutive time segments.
Currently I am calculating the correlation between each row in y and x like this:
crs[,2] <- cor(t(y),x)
What I need now is to replace the cor function in the above statement with a weighted correlation. The weight vector xy.wt is 34 elements long so that a different weight can be assigned to each of the 34 consecutive time segments.
I found the Weighted Covariance Matrix function cov.wt and thought that if I first scale the data it should work just like the cor function. In fact you can specify for the function to return a correlation matrix as well. Unfortunately it does not seem like I can use it in the same manner because I cannot supply my two variables (x and y) separately.
Does anyone know of a way I can get a weighted correlation in the manner I described without sacrificing much speed?
Edit: Perhaps some mathematical function could be applied to y prior to the cor function in order to get the same results that I'm looking for. Maybe if I multiply each element by xy.wt/sum(xy.wt)?
Edit #2 I found another function corr in the boot package.
corr(d, w = rep(1, nrow(d))/nrow(d))
d
A matrix with two columns corresponding to the two variables whose correlation we wish to calculate.
w
A vector of weights to be applied to each pair of observations. The default is equal weights for each pair. Normalization takes place within the function so sum(w) need not equal 1.
This also is not what I need but it is closer.
Edit #3
Here is some code to generate the type of data I am working with:
x<-cumsum(rnorm(34))
y<- t(sapply(1:2396,function(u) cumsum(rnorm(34))))
xy.wt<-1/(34:1)
crs<-cor(t(y),x) #this works but I want to use xy.wt as weight
Unfortunately the accepted answer is wrong when y is a matrix of more than one row. The error is in the line
vy <- rowSums( w * y * y )
We want to multiply each column of y by the corresponding element of w, but w * y recycles w down the columns of y, so the weights are applied along the row index instead. Thus
> f(x, y[1, , drop = FALSE], xy.wt)
[1] 0.103021
is correct, because in this case the multiplication is performed element-wise, which is equivalent to column-wise multiplication here, but
> f(x, y, xy.wt)[1]
[1] 0.05463575
gives a wrong answer due to the row-wise multiplication.
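The recycling behaviour can be seen on a small made-up example; `y_demo` and `w_demo` are hypothetical names, not part of the original data.

```r
# Small hypothetical example: 2 rows (situations), 3 columns (time segments)
y_demo <- matrix(1, nrow = 2, ncol = 3)
w_demo <- c(1, 2, 3)

# Plain multiplication recycles w_demo down the columns of y_demo
wrong <- w_demo * y_demo
wrong
#>      [,1] [,2] [,3]
#> [1,]    1    3    2
#> [2,]    2    1    3

# Transposing first lines each weight up with its column
right <- t(t(y_demo) * w_demo)
right
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    1    2    3

# sweep() expresses the same column-wise scaling directly
right2 <- sweep(y_demo, 2, w_demo, `*`)
```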
We can correct the function as follows
f2 <- function( x, y, w = rep(1,length(x))) {
stopifnot(length(x) == dim(y)[2] )
w <- w / sum(w)
# Center x and y, using the weighted means
x <- x - sum(x * w)
ty <- t(y - colSums(t(y) * w))
# Compute the variance
vx <- sum(w * x * x)
vy <- colSums(w * ty * ty)
# Compute the covariance
vxy <- colSums(ty * x * w)
# Compute the correlation
vxy / sqrt(vx * vy)
}
and check the results against those produced by corr from the boot package:
> res1 <- f2(x, y, xy.wt)
> res2 <- sapply(1:nrow(y),
+ function(i, x, y, w) corr(cbind(x, y[i,]), w = w),
+ x = x, y = y, w = xy.wt)
> all.equal(res1, res2)
[1] TRUE
which in itself gives another way that this problem could be solved.
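A similar row-by-row check is possible with stats::cov.wt, by binding x to one row of y at a time. The sketch below uses a reduced data set (5 rows instead of 2396) and a helper `wcor`, written here only for comparison, that computes the weighted correlation from its definition. Like the corr loop, it is likely to be much slower than the vectorized f2.

```r
set.seed(1)
x <- cumsum(rnorm(34))
y <- t(sapply(1:5, function(u) cumsum(rnorm(34))))  # 5 rows instead of 2396
xy.wt <- 1 / (34:1)

# Weighted correlation of x with each row of y via stats::cov.wt
res_covwt <- sapply(seq_len(nrow(y)), function(i) {
  cov.wt(cbind(x, y[i, ]), wt = xy.wt / sum(xy.wt), cor = TRUE)$cor[1, 2]
})

# Direct computation from the definition, for one pair of vectors
wcor <- function(a, b, w) {
  w <- w / sum(w)
  a <- a - sum(w * a)   # center with the weighted mean
  b <- b - sum(w * b)
  sum(w * a * b) / sqrt(sum(w * a^2) * sum(w * b^2))
}
res_def <- sapply(seq_len(nrow(y)), function(i) wcor(x, y[i, ], xy.wt))

all.equal(res_covwt, res_def)
```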
You can go back to the definition of the correlation.
f <- function( x, y, w = rep(1,length(x))) {
stopifnot( length(x) == dim(y)[2] )
w <- w / sum(w)
# Center x and y, using the weighted means
x <- x - sum(x*w)
y <- y - apply( t(y) * w, 2, sum )
# Compute the variance
vx <- sum( w * x * x )
vy <- rowSums( w * y * y ) # Incorrect: see Heather's remark, in the other answer
# Compute the covariance
vxy <- colSums( t(y) * x * w )
# Compute the correlation
vxy / sqrt(vx * vy)
}
f(x,y)[1]
cor(x,y[1,]) # Identical
f(x, y, xy.wt)
Here is a generalization to compute the weighted Pearson correlation between two matrices (instead of a vector and a matrix, as in the original question):
matrix.corr <- function (a, b, w = rep(1, nrow(a))/nrow(a))
{
# normalize weights
w <- w / sum(w)
# center matrices
a <- sweep(a, 2, colSums(a * w))
b <- sweep(b, 2, colSums(b * w))
# compute weighted correlation
t(w*a) %*% b / sqrt( colSums(w * a**2) %*% t(colSums(w * b**2)) )
}
Using the above example and the corrected function f2 from Heather's answer, we can verify it:
> sum(matrix.corr(as.matrix(x, nrow=34),t(y),xy.wt) - f2(x,y,xy.wt))
[1] 1.537507e-15
In terms of calling syntax, this resembles the unweighted cor:
> a <- matrix( c(1,2,3,1,3,2), nrow=3)
> b <- matrix( c(2,3,1,1,7,3,5,2,8,1,10,12), nrow=3)
> matrix.corr(a,b)
[,1] [,2] [,3] [,4]
[1,] -0.5 0.3273268 0.5 0.9386522
[2,] 0.5 0.9819805 -0.5 0.7679882
> cor(a, b)
[,1] [,2] [,3] [,4]
[1,] -0.5 0.3273268 0.5 0.9386522
[2,] 0.5 0.9819805 -0.5 0.7679882
