scaling standardized or weighting - r

I hope I can find an answer to this question here. I have this piece of code that I am trying to analyze closely:
alphas <- matrix(runif(900), ncol=3, byrow=TRUE)
z <- t(apply(alphas, 1, cumsum))
for(i in 1:nrow(z)){
z[i, ] <- z[i, ] / (1:ncol(z))
}
I am trying to understand what the line z[i, ] <- z[i, ] / (1:ncol(z)) does to the matrix alphas. I know we are dividing by the sequence of column indices of the input matrix. I also know that when using apply with margin 1, we apply the function we are interested in, in this case cumsum, over the rows of the matrix alphas. That's basically what I know; I have no clue why the next line is there or what it does to my matrix alphas.
I would appreciate some insights.
Thank you very much.

With your code you are calculating row-wise cumulative means of your alphas.
The line in your loop performs a vector division that turns the cumulative sums in each row into cumulative averages.
Look what ncol(z) yields
> ncol(z)
[1] 3
So basically what you're doing with z[i, ] / (1:ncol(z)) in your loop is dividing each row element-wise by a sequence whose length equals the number of columns, i.e. c(1, 2, 3) or just 1:3.
Consider the first row of your alphas and your z.
set.seed(42) # for sake of reproducibility
alphas <- matrix(runif(900), ncol=3, byrow=TRUE)
z <- t(apply(alphas, 1, cumsum))
> alphas[1, ]
[1] 0.9148060 0.9370754 0.2861395
> z[1, ]
[1] 0.914806 1.851881 2.138021
> cbind(alphas[1, 1], mean(c(alphas[1, 1:2])), mean(c(alphas[1, 1:3])))
[,1] [,2] [,3]
[1,] 0.914806 0.9259407 0.7126737
The core of your loop yields
> z[1, ] / 1:ncol(z)
[1] 0.9148060 0.9259407 0.7126737
So each element of z[1, ] is divided by the corresponding element of the divisor vector, yielding the running means of the cells of alphas[1, ].
Your loop simply does this for every row of your z matrix.
As an aside, in R it is more idiomatic to wrap this in a function and apply it over the rows instead of writing an explicit loop. Since you understand apply(), you will understand sapply(), which we use after first defining a function.
FUN1 <- function(i){
z[i, ] / 1:ncol(z)
}
M <- t(sapply(1:nrow(z), FUN1))
> head(M, 3)
[,1] [,2] [,3]
[1,] 0.9148060 0.9259407 0.7126737
[2,] 0.8304476 0.7360966 0.6637630
[3,] 0.7365883 0.4356275 0.5094157
This yields the same as your loop, but in a more idiomatic R style.
In one step we can do this with:
z <- t(sapply(seq_len(nrow(alphas)),
function(i) cumsum(alphas[i, ]) / seq_along(alphas[i, ])))
> head(z, 3)
[,1] [,2] [,3]
[1,] 0.9148060 0.9259407 0.7126737
[2,] 0.8304476 0.7360966 0.6637630
[3,] 0.7365883 0.4356275 0.5094157
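The row-wise division can also be written without any explicit loop. The following is a minimal sketch using base R only. The name cum_means is introduced here for illustration and is not part of the original code. sweep() divides column j of the cumulative-sum matrix by j, which is what z[i, ] / (1:ncol(z)) does for each row.

```r
set.seed(42)
alphas <- matrix(runif(900), ncol = 3, byrow = TRUE)

# Row-wise cumulative sums (apply returns them column-wise, hence t())
z <- t(apply(alphas, 1, cumsum))

# Divide column j by j: turns cumulative sums into cumulative means
cum_means <- sweep(z, 2, seq_len(ncol(z)), "/")

# Reference: the explicit loop from the question
z_loop <- z
for (i in seq_len(nrow(z_loop))) {
  z_loop[i, ] <- z_loop[i, ] / seq_len(ncol(z_loop))
}

all.equal(cum_means, z_loop)  # TRUE
```

The last column of cum_means is the plain row mean of alphas, which is a quick sanity check on the result.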

Related

Apply() cannot be applied to this list?

I have created an example below, where I am trying to make a list of each row of a matrix, then use apply().
mat<-matrix(rexp(9, rate=.1), ncol=3)
my_list2 <- list()
for(i in 1:nrow(mat)) {
my_list2[[i]] <- mat[i,]
}
#DO NOT CHANGE THIS:
apply(my_list2[[i]],2,sum)
However the apply() function does not work, giving a dimension error. I understand that apply() is not the best function to use here but it is present in a function that I need so I cannot change that line.
Does anyone have any idea how I can change my "my_list2" to work better? Thank you!
Edit:
Here is an example that works (not reproducible):
[screenshot: Example]
Note that both the example above and this example have type "list".
This answer addresses "how to properly get a list of matrices", not how to resolve the use of apply.
By default in R, when you subset a matrix to a single column or a single row, it reduces the dimensionality. For instance,
mtx <- matrix(1:6, nrow = 2)
mtx
# [,1] [,2] [,3]
# [1,] 1 3 5
# [2,] 2 4 6
mtx[1,]
# [1] 1 3 5
mtx[,3]
# [1] 5 6
If you want a single row or column but to otherwise retain dimensionality, add the drop=FALSE argument to the [-subsetting:
mtx[1,,drop=FALSE]
# [,1] [,2] [,3]
# [1,] 1 3 5
mtx[,3,drop=FALSE]
# [,1]
# [1,] 5
# [2,] 6
In this way, your code to produce sample data can be adjusted to be:
set.seed(42) # important for reproducibility in questions on SO
mat<-matrix(rexp(9, rate=.1), ncol=3)
my_list2 <- list()
for(i in 1:nrow(mat)) {
my_list2[[i]] <- mat[i,,drop=FALSE]
}
my_list2
# [[1]]
# [,1] [,2] [,3]
# [1,] 1.983368 0.381919 3.139846
# [[2]]
# [,1] [,2] [,3]
# [1,] 6.608953 4.731766 4.101296
# [[3]]
# [,1] [,2] [,3]
# [1,] 2.83491 14.63627 11.91598
And then you can use akrun's most recent code to get the column-wise sums (apply with margin 2) within each list element, i.e., one of
lapply(my_list2, apply, 2, sum)
lapply(my_list2, function(z) apply(z, 2, sum))
lapply(my_list2, \(z) apply(z, 2, sum)) # R-4.1 or later
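The list itself can also be built without a for loop. This is a small sketch under the same assumptions as the sample data above. The names dims and col_sums are only illustrative. It shows that each element keeps its dimensions, so the fixed apply() call has something to work with.

```r
set.seed(42)
mat <- matrix(rexp(9, rate = 0.1), ncol = 3)

# Build the list without a for loop; drop = FALSE keeps each row a 1 x 3 matrix
my_list2 <- lapply(seq_len(nrow(mat)), function(i) mat[i, , drop = FALSE])

# Every element still has a dim attribute, so apply() accepts it
dims <- lapply(my_list2, dim)

# Column sums of a one-row matrix are just that row's values
col_sums <- lapply(my_list2, apply, 2, sum)
```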
In your screenshot it works because the object part of the list ex[[1]] is an array. And in your example the elements of your list are vectors. You could try the following:
mat<-matrix(rexp(9, rate=.1), ncol=3)
my_list2 <- list()
for(i in 1:nrow(mat)) {
my_list2[[i]] <- as.matrix(mat[i,])
}
#DO NOT CHANGE THIS:
apply(my_list2[[1]],2,sum)
apply(my_list2[[2]],2,sum)
apply(my_list2[[3]],2,sum)
Note that apply() cannot be applied to all three elements of the list in one call. To do it in one line, that line would have to be changed.

Calculating distance between two points for multiple records for matching rows - loop over rows of two matrices

I have got two matrices with coordinates and I am trying to compute distances between points in matching rows, i.e. between row 1 in first matrix and row 1 in second matrix.
What I am getting is the distance between row 1 and all the other rows. This creates memory issues as I have 800,000 rows. Does anyone know how to compute only the distances between matching rows?
I am using
dist1 <- distm(FareStageMatrix[1:25000,], LSOACentroidMatrix[1:25000,], fun=distHaversine)
I am trying to create something like this but doesn't seem to work
for(i in 1:nrow(FareStageMatrix)) {
for(j in 1:nrow(LSOACentroidMatrix)) {
my_matrix[i] <- my_matrix[distm(FareStageMatrix[i], LSOACentroidMatrix[i], fun=distHaversine)]
}
}
changed to
for (i in 1:nrow(FareStageMatrix)){
for (i in 1:nrow(LSOACentroidMatrix)){
r1<-FareStageMatrix[i,]
r2<-LSOACentroidMatrix[i,]
results[i]<-distm(r1, r2, fun=distHaversine)
}
}
Is that something that should be working?
It seems I have managed to find a solution to that:
results<-matrix(NA,nrow(FareStageMatrix))
# a single loop is enough: the same index i selects the matching row in both matrices
for (i in seq_len(nrow(FareStageMatrix))){
r1<-FareStageMatrix[i,]
r2<-LSOACentroidMatrix[i,]
results[i]<-distm(r1, r2, fun=distHaversine) ## Example function
}
where FareStageMatrix and LSOACentroidMatrix are matrices with coordinates
It seems to have calculated one distance for a given pair of points
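For 800,000 rows, calling a distance function once per row is slow, and the loop can be avoided. The sketch below is a hypothetical base R helper (haversine_rowwise is not part of geosphere) that computes the haversine distance for all matching rows in one vectorized pass. It assumes two-column matrices of (longitude, latitude) in degrees and a spherical earth with radius 6378137 m. As far as I know, geosphere's own distHaversine(p1, p2) also accepts two matrices with the same number of rows and returns one distance per row, which would make the helper unnecessary.

```r
# Hypothetical helper, not part of geosphere: haversine distance in metres
# between row i of p1 and row i of p2 (columns: longitude, latitude in degrees).
haversine_rowwise <- function(p1, p2, r = 6378137) {
  stopifnot(nrow(p1) == nrow(p2))
  to_rad <- pi / 180
  lon1 <- p1[, 1] * to_rad
  lat1 <- p1[, 2] * to_rad
  lon2 <- p2[, 1] * to_rad
  lat2 <- p2[, 2] * to_rad
  a <- sin((lat2 - lat1) / 2)^2 +
    cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# Made-up coordinates for illustration only
set.seed(1)
a_pts <- cbind(runif(5, -10, 10), runif(5, -10, 10))
b_pts <- cbind(runif(5, -10, 10), runif(5, -10, 10))
d <- haversine_rowwise(a_pts, b_pts)  # one distance per matching row
```

The result is a plain numeric vector with one entry per row, so memory use grows linearly with the number of rows rather than quadratically as with the full distm() matrix.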
I've adapted geosphere's distGeo function (geodesic distance) for this purpose.
library(geosphere)
source("https://raw.githubusercontent.com/RomanAbashin/distGeo_v/master/distGeo_v.R")
Data
set.seed(1702)
m1 <- matrix(runif(20000, -10, 10), ncol = 2)
m2 <- matrix(runif(20000, -10, 10), ncol = 2)
Code
result <- distGeo_v(m1[, 1], m1[, 2],
m2[, 1], m2[, 2])
Result
> head(m1)
[,1] [,2]
[1,] 8.087152 9.227607
[2,] 9.528334 9.103403
[3,] 5.637921 -2.213228
[4,] -2.473758 -9.812986
[5,] -2.844036 -5.245779
[6,] -4.824615 -4.330890
> head(m2)
[,1] [,2]
[1,] 0.1673027 0.6483745
[2,] -2.5033184 0.1386050
[3,] 4.8589785 5.1996968
[4,] 8.3239454 -8.9810949
[5,] 0.8280422 -7.8272613
[6,] -6.2633738 -5.8725562
> head(result)
[1] 1292351.3 1661739.3 824260.0 1189476.4 496403.2 233480.2

Finding cumulative sum and then average the values in R

I want to compute cumulative sum for the first (n-1) columns(if we have n columns matrix) and subsequently average the values. I created a sample matrix to do this task. I have the following matrix
ma = matrix(c(1:10), nrow = 2, ncol = 5)
ma
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
I wanted to find the following
ans = matrix(c(1,2,2,3,3,4,4,5), nrow = 2, ncol = 4)
ans
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
The following are my r function.
ColCumSumsAve <- function(y){
for(i in seq_len(dim(y)[2]-1)) {
y[,i] <- cumsum(y[,i])/i
}
}
ColCumSumsAve(ma)
However, when I run the above function its not producing any output. Are there any mistakes in the code?
Thanks.
There were several mistakes.
Solution
This is what I tested and what works:
colCumSumAve <- function(m) {
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum))
res <- t(Reduce(`/`, list(t(csum), 1:ncol(m))))
res[, 1:(ncol(m)-1)]
}
Test it with:
> colCumSumAve(ma)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
which is correct.
Explanation:
colCumSumAve <- function(m) {
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum)) # calculate row-wise cumsum
res <- t(Reduce(`/`, list(t(csum), 1:ncol(m))))
# This is the trickiest part.
# Because `csum` is a matrix, it is treated like a (column-major) vector
# when `Reduce`-ing using `/` with the vector `1:ncol(m)`.
# To get quasi-row-wise treatment, I change the orientation
# of the matrix with `t()`.
# As a consequence, the output is in this transposed
# orientation. So I transpose again by applying `t()`
# to the entire result at the end, to get back the original
# input matrix orientation.
# `Reduce` using `/` over the list of `t(csum)` and
# `1:ncol(m)` has the effect of dividing the `csum` values by their
# corresponding column position.
res[, 1:(ncol(m)-1)] # removes last column for the answer.
# This could of course be done right at the beginning,
# saving the calculation of values in the last column,
# but that calculation is not the speed-limiting step
# (since it is vectorized);
# rather, `apply` and `Reduce` are the speed-limiting parts.
}
Well, okay, I could do then:
colCumSumAve <- function(m) {
csum <- t(apply(X=m[, 1:(ncol(m)-1)], MARGIN=1, FUN=cumsum))
t(Reduce(`/`, list(t(csum), 1:ncol(csum)))) # divisor must match the trimmed csum, not the full m
}
or:
colCumSumAve <- function(m) {
m <- m[, 1:(ncol(m)-1)] # remove last column
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum))
t(Reduce(`/`, list(t(csum), 1:ncol(m))))
}
This is actually the more optimized solution, then.
Original Function
Your original function only makes assignments in the for-loop and doesn't return anything.
So I first copied your input into res, processed it with your for-loop, and then returned res.
ColCumSumsAve <- function(y){
res <- y
for(i in seq_len(dim(y)[2]-1)) {
res[,i] <- cumsum(y[,i])/i
}
res
}
However, this gives:
> ColCumSumsAve(ma)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1.5 1.666667 1.75 9
[2,] 3 3.5 3.666667 3.75 10
The problem is that cumsum(y[, i]) accumulates down column i (across the rows) instead of along each row (across the columns), and each column is then divided by its column index.
Corrected Original Function
After some fiddling, I realized that the correct solution is:
ColCumSumsAve <- function(y){
res <- matrix(NA, nrow(y), ncol(y)-1)
# create empty matrix with the dimensions of y minus last column
for (i in 1:(nrow(y))) { # go through rows
for (j in 1:(ncol(y)-1)) { # go through columns
res[i, j] <- sum(y[i, 1:j])/j # for each position do this
}
}
res # return `res`ult by calling it at the end!
}
with the testing:
> ColCumSumsAve(ma)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
Note: dim(y)[2] is ncol(y), and dim(y)[1] is nrow(y);
instead of seq_len(), 1: is shorter and I guess even slightly faster.
Note: My solution given first will be faster, since it uses apply, vectorized cumsum and Reduce. For-loops in R are slower.
Late Note: I am not so sure that the first solution is faster. Since R-3.x it seems that for loops are faster. Reduce will be the speed-limiting function and can sometimes be incredibly slow.
k <- t(apply(ma,1,cumsum))[,-ncol(ma)] # use ncol(ma): k does not exist yet on the right-hand side
for (i in 1:ncol(k)){
k[,i] <- k[,i]/i
}
k
This should work.
All you need is rowMeans:
nc <- 4
cbind(ma[,1],sapply(2:nc,function(x) rowMeans(ma[,1:x])))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
Here's how I did it
> t(apply(ma, 1, function(x) cumsum(x) / 1:length(x)))[,-NCOL(ma)]
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
This applies the cumsum function row-wise to the matrix ma and then divides by the correct length to get the average (cumsum(x) and 1:length(x) will have the same length). Then simply transpose with t and remove the last column with [,-NCOL(ma)].
The reason why there is no output from your function is because you aren't returning anything. You should end the function with return(y) or simply y as Marius suggested. Regardless, your function doesn't seem to give you the correct response anyway.
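The "no output" behaviour can be reproduced in isolation. The sketch below uses two illustrative functions (no_return and with_return are names made up here) with deliberately simple arithmetic. It only demonstrates the return value, not the correct cumulative averages.

```r
ma <- matrix(1:10, nrow = 2, ncol = 5)

# The for loop is the last expression, and a for loop evaluates to NULL
# (invisibly), so this function returns NULL and prints nothing.
no_return <- function(y) {
  for (i in seq_len(ncol(y) - 1)) {
    y[, i] <- y[, i] / i
  }
}

# Ending the function with the object makes it the return value.
with_return <- function(y) {
  for (i in seq_len(ncol(y) - 1)) {
    y[, i] <- y[, i] / i
  }
  y
}

is.null(no_return(ma))  # TRUE
dim(with_return(ma))    # 2 5
```

Inside the function, y is a local copy, so ma itself is never modified. The changes are only visible through the returned value.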

Multiply a matrix' columns by its columns

I have a 4x100 matrix where I would like to multiply column 1 with row 1 of its transpose, and so on, and store these matrices somewhere so that I can take the sum of these new matrices later on.
I really don't know where to start, because I get 4x4 matrices after the column-row multiplication and therefore cannot store them in a matrix.
data:
mm num[1:4,1:100]
mm_t num[1:100,1:4]
I'm thinking of creating a list in some way
list1=list()
for(i in 1:100){
list1[i] <- mm[,i]%*%mm_t[i,]
}
but I need some more indices i think because this just leaves me with a number in each argument..
First, your call for data is not clear. Second, are you trying to multiply each value by itself, or do matrix multiplication?
We create a 4x100 matrix and its transpose:
mm <- matrix(1:400, nrow = 4, ncol = 100)
mm.t <- t(mm)
Then we can do the matrix multiplication (which is what you did, and you get a 4 x 4 matrix from the definition of matrix multiplication https://www.wikiwand.com/en/Matrix_multiplication)
If we want to multiply each element by itself (so mm[1, 1] by mm[1, 1]) then:
mm * mm
This will result in 4x100 matrix where each value is the square of the original value.
If we want the matrix multiplication of each column with itself, then:
sapply(1:100, function(x) {
mm[, x] %*% mm[, x]
})
This results in 100 values: each one is the dot product of a length-4 column with itself.
Let's start with some sample data. Please get in the habit of including things like this in your question:
nr = 4
nc = 100
set.seed(47)
mm = matrix(runif(nr * nc), nrow = nr)
Here's a working answer, very similar to your attempt:
result = list()
for (i in 1:ncol(mm)) result[[i]] = mm[, i] %*% t(mm[, i])
result[1:2]
# [[1]]
# [,1] [,2] [,3] [,4]
# [1,] 0.9544547 0.3653018 0.7439585 0.8035430
# [2,] 0.3653018 0.1398132 0.2847378 0.3075428
# [3,] 0.7439585 0.2847378 0.5798853 0.6263290
# [4,] 0.8035430 0.3075428 0.6263290 0.6764924
#
# [[2]]
# [,1] [,2] [,3] [,4]
# [1,] 0.3289532 0.3965557 0.2231443 0.2689613
# [2,] 0.3965557 0.4780511 0.2690022 0.3242351
# [3,] 0.2231443 0.2690022 0.1513691 0.1824490
# [4,] 0.2689613 0.3242351 0.1824490 0.2199103
As to why yours didn't work, we can experiment and see that indeed we get a number rather than a matrix. The reason is that when you subset a single row or column of a matrix, the dimensions are "dropped" and it is coerced to a plain vector. And when you matrix multiply two vectors, you get their dot product.
mmt = t(mm)
mm[, 1] %*% mmt[1, ]
# [,1]
# [1,] 2.350646
dim(mm[, 1])
# NULL
dim(mmt[1, ])
# NULL
We can avoid this by specifying drop = FALSE in the subset code
dim(mmt[1, , drop = FALSE])
# [1] 1 4
And thus slightly modify your attempt, just adding drop = FALSE will make it work.
res2 = list()
for (i in 1:ncol(mm)) res2[[i]] = mm[, i] %*% mmt[i, , drop = FALSE]
identical(result, res2)
# [1] TRUE
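Since the stated goal is the sum of these 4x4 matrices, the list can be collapsed with Reduce(). The sketch below reuses the sample data from above. The names outer_list and total are illustrative. It also checks the identity that the sum of the column outer products equals mm %*% t(mm), so if only the sum is needed, the list is not required at all.

```r
set.seed(47)
mm <- matrix(runif(4 * 100), nrow = 4)

# List of 100 outer products, one 4 x 4 matrix per column of mm
outer_list <- lapply(seq_len(ncol(mm)), function(i) mm[, i] %*% t(mm[, i]))

# Element-wise sum of all matrices in the list
total <- Reduce(`+`, outer_list)

# The same sum in a single call: tcrossprod(mm) is mm %*% t(mm)
all.equal(total, tcrossprod(mm))  # TRUE
```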

How to combine subsequent list elements into a new list in R?

For example: I have a list of matrices, and I would like to evaluate their differences, sort of a 3-D diff. So if I have:
m1 <- matrix(1:4, ncol=2)
m2 <- matrix(5:8, ncol=2)
m3 <- matrix(9:12, ncol=2)
mat.list <- list(m1,m2,m3)
I want to obtain
mat.diff <- list(m2-m1, m3-m2)
The solution I found is the following:
mat.diff <- mapply(function (A,B) B-A, mat.list[-length(mat.list)], mat.list[-1])
Is there a nicer/built-in way to do this?
You can do this with just lapply or other ways of looping:
mat.diff <- lapply( tail( seq_along(mat.list), -1 ),
function(i) mat.list[[i]] - mat.list[[ i-1 ]] )
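Another base R option, shown here as a sketch, is Map(), which pairs each matrix with its predecessor. Unlike mapply() with its default simplification, which collapses the equally sized results into a single matrix, Map() always returns a list.

```r
m1 <- matrix(1:4, ncol = 2)
m2 <- matrix(5:8, ncol = 2)
m3 <- matrix(9:12, ncol = 2)
mat.list <- list(m1, m2, m3)

# First argument: matrices 2..n; second argument: matrices 1..(n-1).
# `-` is applied pairwise, giving m2 - m1 and m3 - m2.
mat.diff <- Map(`-`, mat.list[-1], mat.list[-length(mat.list)])
```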
You can use combn to generate the index pairs of the matrices and apply a function to each combination. Non-consecutive pairs yield NULL.
combn(1:length(l),2,FUN=function(x) ## l is the list of matrices
if(diff(x) == 1) ## apply just for consecutive indices
l[[x[2]]]-l[[x[1]]],
simplify = FALSE) ## to get a list
Using @Arun's data, I get:
[[1]]
[,1] [,2]
[1,] 4 4
[2,] 4 4
[[2]]
NULL
[[3]]
[,1] [,2]
[1,] 4 4
[2,] 4 4

Resources