'apply' on two different data frames at once in R

I am wondering whether there is a way to use an apply-family function to evaluate two different data frames at once. Or is there a better way to solve this problem? I can only think of a loop, and that is too slow:
# example data
df_model <- data.frame(DY = c(93,100,107), CC=rnorm(1:3, mean = 0.1))
df_data <- data.frame(DY = rep(c(93,100,107),each = 3), CC = c(rnorm(1:3),rnorm(1:3),rnorm(1:3)))
In this example, I would like to get a vector of three elements as output, computed as follows (shown here for the first case):
#example procedure case 1
collect <- matrix(0,ncol=3,nrow=3)
collect[1,] <- dnorm(df_data[which(df_data$DY == df_model$DY[1]), ]$CC, df_model[1,]$CC, log = TRUE)
As input, I envisage:
a list/vector of the CC values in df_data, subset by the corresponding day DY (0.07624536 1.32623789 0.92921693),
evaluated against the single CC value (0.00049671) of df_model on that same day DY.
In the end, I would like to collect the vectors (collect in the example) into a matrix with as many rows as df_model$DY (three) and three columns, containing the evaluation of df_data against df_model on each day DY.
[,1] [,2] [,3]
[1,] -0.9218075 -1.7977334 -1.3501992
[2,] -0.9356356 -0.9850012 -1.1753341
[3,] -1.2152926 -0.9195071 -2.4127840
This needs to be done as efficiently as possible.
I can do it in a loop (above you see the first case for the loop), but I am sure there are better ways.
I looked into the apply function family, but I get confused because I am evaluating two different data frames. Any help/pointers would be much appreciated!

We can use mapply or Map:
mapply(function(x, y) dnorm(df_data$CC[df_data$DY == x], y, log = TRUE),
       df_model$DY, df_model$CC)
Output:
# [,1] [,2] [,3]
#[1,] -1.5031401 -2.7449464 -1.734319
#[2,] -0.9237629 -0.9243094 -1.115875
#[3,] -4.9848319 -1.1494313 -1.187122
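The Map variant returns a list (one vector per df_model row); here is a minimal sketch of getting the same matrix by binding that list, assuming the same df_model and df_data as above:
# Map returns a list of log-density vectors, one per df_model row
out <- Map(function(x, y) dnorm(df_data$CC[df_data$DY == x], y, log = TRUE),
           df_model$DY, df_model$CC)
do.call(cbind, out)  # bind the list into the same 3 x 3 matrix as the mapply output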

Related

R - Outer product of two vectors, using custom FUN to return lists [duplicate]

I have two sequences which I'm trying to combine using an outer product, but returning lists.
For example,
s1 <- seq(1,3)
s2 <- seq(4,6)
and I'm trying to end up with a matrix of lists like this:
c(1,4) c(2,4) c(3,4)
c(1,5) c(2,5) c(3,5)
c(1,6) c(2,6) c(3,6)
so I'm trying to use the outer product function with a custom function:
listify <- function(a, b) {
  lst <- cbind(a, b)
}
outer(s1, s2, FUN = "listify")
which should be doing what I want but I can't seem to get it to work. I've read this question which suggests the usage of cbind, but as I said, I've gotten no results. I just get an error:
Error in dim(robj) <- c(dX, dY) :
dims [product 121] do not match the length of object [242]
I can get pretty close (or maybe all the way there?) by (a) making your function return a list, and (b) vectorizing it. From ?outer:
FUN... must be a vectorized function (or the name of one) expecting at least two arguments and returning a value with the same length as the first (and the second).
listify <- function(a, b) {
  list(cbind(a, b))
}
lv = Vectorize(listify)
s1 <- seq(1, 3)
s2 <- seq(4, 6)
result = t(outer(s1, s2, FUN = lv))
result
# [,1] [,2] [,3]
# [1,] Integer,2 Integer,2 Integer,2
# [2,] Integer,2 Integer,2 Integer,2
# [3,] Integer,2 Integer,2 Integer,2
result[1, 1]
# [[1]]
# a b
# [1,] 1 4
result[2, 1]
# [[1]]
# a b
# [1,] 1 5
Note that each element of the matrix is a list of length 1.
As pointed out in comments, without more info I don't see any advantage of having these single-element lists in a matrix. A long-form expand.grid table or a 3-d array both seem potentially easier to work with.
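For reference, a minimal sketch of the long-form expand.grid alternative mentioned above (the column names a and b are arbitrary):
long <- expand.grid(a = s1, b = s2)  # one row per (a, b) combination
head(long, 3)
#   a b
# 1 1 4
# 2 2 4
# 3 3 4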

Divide each cell of a large matrix by the sum of its row

I have a site by species matrix. The dimensions are 375 x 360. Each value represents the frequency of a species in samples of that site.
I am trying to convert this matrix from frequencies to relative abundances at each site.
I've tried a few ways to achieve this and the only one that has worked is using a for loop. However, this takes an incredibly long time or simply never finishes.
Is there a function or a vectorised method of achieving this? I've included my for-loop as an example of what I am trying to do.
relative_abundance <- matrix(0, nrow = nrow(data), ncol = ncol(data),
                             dimnames = dimnames(data))
for(i in 1:nrow(relative_abundance)){
  for(j in 1:ncol(relative_abundance)){
    species_freq <- data[i,j]
    row_sum <- sum(data[i,])
    relative_abundance[i,j] <- species_freq/row_sum
  }
}
You could do this using apply, but scale in this case makes things even simpler. Assuming you want to divide columns by their sums:
set.seed(0)
relative_abundance <- matrix(sample(1:10, 360*375, TRUE), nrow= 375)
freqs <- scale(relative_abundance, center = FALSE,
               scale = colSums(relative_abundance))
The matrix is too big to output here, but here's what it should look like:
> head(freqs[, 1:5])
[,1] [,2] [,3] [,4] [,5]
[1,] 0.004409603 0.0014231499 0.003439803 0.004052685 0.0024026910
[2,] 0.001469868 0.0023719165 0.002457002 0.005065856 0.0004805382
[3,] 0.001959824 0.0018975332 0.004914005 0.001519757 0.0043248438
[4,] 0.002939735 0.0042694497 0.002948403 0.002532928 0.0009610764
[5,] 0.004899559 0.0009487666 0.000982801 0.001519757 0.0028832292
[6,] 0.001469868 0.0023719165 0.002457002 0.002026342 0.0009610764
And a sanity check:
> head(colSums(freqs))
[1] 1 1 1 1 1 1
Using apply:
freqs2 <- apply(relative_abundance, 2, function(i) i/sum(i))
This has the advantage of being easily changed to run by rows, but the results will be joined as columns anyway, so you'd have to transpose the result.
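For example, the row-wise version with the transpose would be (a sketch using the same relative_abundance matrix):
freqs_rows <- t(apply(relative_abundance, 1, function(i) i/sum(i)))
head(rowSums(freqs_rows))  # sanity check: each row now sums to 1
# [1] 1 1 1 1 1 1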
Firstly, you could just do
relative_abundance[i,j] <- data[i,j]/sum(data[i,])
so you don't create the intermediate variables...
But to vectorise it, I suggest computing the row sums with the rowSums function (fast), and then dividing each column by them using apply:
rs <- rowSums(data)
relative_freq <- apply(data, 2, function(x) x/rs)
Using some simple linear algebra we can produce faster results. Simply multiply on the left by a diagonal matrix with the scaling factors you need, like this:
library(Matrix)
set.seed(0)
relative_abundance <- matrix(sample(1:10, 360*375, TRUE), nrow= 375)
Diagonal_Matrix <- diag(1/rowSums(relative_abundance))
And then we multiply from the left:
row_normalized_matrix <- Diagonal_Matrix %*% relative_abundance
If you want to normalize column-wise, simply use:
Diagonal_Matrix <- diag(1/colSums(relative_abundance))
and multiply from the right.
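That is, for the column-wise case (a one-line sketch):
col_normalized_matrix <- relative_abundance %*% Diagonal_Matrix  # columns now sum to 1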
You can do something like this
relative_abundance <- matrix(sample(1:10, 360*375, TRUE), nrow= 375)
datnorm <- relative_abundance/rowSums(relative_abundance)
This will be faster if relative_abundance is a matrix rather than a data.frame.
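A quick sanity check, mirroring the colSums check above:
head(rowSums(datnorm))
# [1] 1 1 1 1 1 1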

Combining deeply-nested vectors from multiple lists

I wish to combine equivalent, deeply-nested columns from all elements of a reasonably long list. What I would like to do, though it's not possible in R, is this:
combined.columns <- my.list[[1:length(my.list)]]$my.matrix[,"my.column"]
The only thing I can think of is to manually type out all the elements in cbind() like this:
combined.columns <- cbind(my.list[[1]]$my.matrix[,"my.column"], my.list[[2]]$my.matrix[,"my.column"], . . . )
This answer is pretty close to what I need, but I can't figure out how to make it work for the extra level of nesting.
There must be a more elegant way of doing this, though. Any ideas?
Assuming all your matrices have the same column name you wish to extract, you could use sapply:
set.seed(123)
my.list <- vector("list")
my.list[[1]] <- list(my.matrix = data.frame(A = rnorm(10, sd = 0.3), B = rnorm(10, sd = 0.3)))
my.list[[2]] <- list(my.matrix = data.frame(C = rnorm(10, sd = 0.3), B = rnorm(10, sd = 0.3)))
my.list[[3]] <- list(my.matrix = data.frame(D = rnorm(10, sd = 0.3), B = rnorm(10, sd = 0.3)))
sapply(my.list, FUN = function(x) x$my.matrix[, "B"])
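Each list element contributes one column; a quick structural check (res is just a hypothetical name for the sapply result above):
res <- sapply(my.list, FUN = function(x) x$my.matrix[, "B"])
dim(res)  # ten values of B from each of the three list elements
# [1] 10  3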
Free data:
myList <- list(list(myMat = matrix(1:10, 2, dimnames = list(NULL, letters[1:5])),
                    myVec = 1:10),
               list(myMat = matrix(10:1, 2, dimnames = list(NULL, letters[1:5])),
                    myVec = 10:1))
We can get column a of myMat a few different ways. Here's one that uses with.
sapply(myList, with, myMat[,"a"])
# [,1] [,2]
# [1,] 1 10
# [2,] 2 9
This mapply version might be better for a more recursive type of problem. It works too and might be faster than sapply:
mapply(function(x, y, z) x[[y]][,z] , myList, "myMat", "a")
# [,1] [,2]
# [1,] 1 10
# [2,] 2 9
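A third option, assuming the same myList, is to extract with lapply and then bind; a minimal sketch:
cols <- lapply(myList, function(x) x$myMat[, "a"])  # one length-2 vector per element
do.call(cbind, cols)  # same 2-row matrix as above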

How can I make processing of matrices and vectors regular (as, e.g., in Matlab)

Suppose I have a function that takes an argument x of dimension 1 or 2. I'd like to do something like
x[1, i]
regardless of whether I got a vector or a matrix (or a table of one variable, or two).
For example:
x = 1:5
x[1,2] # this won't work...
Of course I can check to see which class was given as an argument, or force the argument to be a matrix, but I'd rather not do that. In Matlab, for example, vectors are matrices with all but one dimension of size 1 (and can be treated as either row or column, etc.). This makes code nice and regular.
Also, does anyone have an idea why in R vectors (or in general one dimensional objects) aren't special cases of matrices (or multidimensional objects)?
Thanks
In R, it is the other way round; matrices are vectors. The matrix-like behaviour comes from some extra attributes on top of the atomic vector part of the object.
To get the behaviour you want, you'd need to make the vector a matrix, by setting dimensions on it with dim() or via explicit coercion.
> vm <- 1:5
> dim(vm) <- c(1,5)
> vm
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
> class(vm)
[1] "matrix"
Next you'll need to maintain the dimensions when subsetting; by default R drops singleton dimensions, which in the case of vm above is the row dimension. You prevent that with drop = FALSE in the call to `[`(); the default behaviour is drop = TRUE:
> vm[, 2:4]
[1] 2 3 4
> vm[, 2:4, drop = FALSE]
[,1] [,2] [,3]
[1,] 2 3 4
You could add a class to your matrices and write methods for [ for that class where the argument drop is set to FALSE by default:
class(vm) <- c("foo", class(vm))
`[.foo` <- function(x, i, j, ..., drop = FALSE) {
  clx <- class(x)
  class(x) <- clx[clx != "foo"]
  x[i, j, ..., drop = drop]
}
which in use gives:
> vm[, 2:4]
[,1] [,2] [,3]
[1,] 2 3 4
i.e. maintains the empty dimension.
Making this fool-proof and pervasive will require a lot more effort but the above will get you started.

R: how to store a vector of vectors

I'm trying to write a function to determine the euclidean distance between x (one point) and y (a set of n points).
How should I pass y to the function? Until now, I used a matrix like that:
[,1] [,2] [,3]
[1,] 0 2 1
[2,] 1 1 1
Which would pass the points (0,2,1) and (1,1,1) to that function.
However, when I pass x as a normal (column) vector, the dimensions of the two variables don't match inside the function.
I either have to transpose x or y, or store the set of vectors some other way.
My question: What is the standard way to save more than one vector in R? (my matrix y)
Is it just my y transposed, or maybe a list or data frame?
There is no standard way, so you should just pick the most effective one, which in turn depends on what this vector of vectors looks like right after creation (it is best to avoid any conversion that is not necessary) and on the speed of the function itself.
I believe that a data.frame with columns x, y and z should be a pretty good choice; the distance function will then be quite simple and fast:
d <- function(x, y) sqrt((y$x - x[1])^2 + (y$y - x[2])^2 + (y$z - x[3])^2)
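For example, with the two points from the question stored that way, and the reference point c(0, 2, 2) also used in the next answer (a sketch):
y <- data.frame(x = c(0, 1), y = c(2, 1), z = c(1, 1))  # the two example points as rows
d(c(0, 2, 2), y)
# [1] 1.000000 1.732051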
The apply function with the MARGIN argument set to 1 seems the most obvious:
> x
[,1] [,2] [,3]
[1,] 0 2 1
[2,] 1 1 1
> apply(x , 1, function(z) crossprod(z, 1:length(z) ) )
[1] 7 6
> 2*2+1*3
[1] 7
> 1*1+2*1+3*1
[1] 6
So if you wanted distances, then the square root of the crossproduct of the differences to a chosen point seems to work:
> apply(x, 1, function(z) sqrt(sum(crossprod(z - c(0,2,2), z - c(0,2,2)))))
[1] 1.000000 1.732051
