Convert ranks assigned by rank function into values - r

Sorry for the dummy question
My sample data looks like this
DF
a b c
0.01 0.02 0.03
0.08 0.09 0.10
I use rank to assign ranks for values in DF as
s <- sapply(DF, rank, ties.method ="average")
How can I then assign values to the ranks? Or perhaps I am misunderstanding something.
Thank you for any suggestions.

Not sure I understand the question. Assuming you want ranks within each column (variable):
> set.seed(100)
> df<-matrix(rnorm(6),ncol=3)
> df
[,1] [,2] [,3]
[1,] -0.5021924 -0.07891709 0.1169713
[2,] 0.1315312 0.88678481 0.3186301
> s <- apply(df,2, rank, ties.method ="average")
> s
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 2 2
Or, if you want ranks across all of the data:
> s <- matrix(rank(c(df), ties.method ="average"), ncol=3)
> s
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 6 5
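If the goal is to go from ranks back to the original values (one possible reading of the question), the following is a minimal sketch for a single vector. It assumes ties.method = "first" so that the ranks are unique integers that can be used as indices; the vector x is made up for illustration and is not from the question.

```r
# Hypothetical example vector (not from the question)
x <- c(0.08, 0.01, 0.05)

# Integer ranks; "first" breaks ties by position, so every rank is unique
r <- rank(x, ties.method = "first")

# The value with rank k is the k-th smallest element, i.e. sort(x)[k]
recovered <- sort(x)[r]
print(recovered)
```

With ties.method = "average", tied values receive fractional ranks, so the ranks cannot be used directly as indices in this way.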

Related

Change elements in one matrix based on positions given by another matrix in R

Let's say I have a symmetric matrix A, for example:
> A <- matrix(runif(16),nrow = 4,byrow = T)
> ind <- lower.tri(A)
> A[ind] <- t(A)[ind]
> A
[,1] [,2] [,3] [,4]
[1,] 0.4212778 0.6874073 0.1551896 0.46757640
[2,] 0.6874073 0.5610995 0.1779030 0.54072946
[3,] 0.1551896 0.1779030 0.9515304 0.79429777
[4,] 0.4675764 0.5407295 0.7942978 0.01206526
I also have a 4 x 3 matrix B that gives specific positions of matrix A, for example:
> B<-matrix(c(1,2,4,2,1,3,3,2,4,4,1,3),nrow=4,byrow = T)
> B
[,1] [,2] [,3]
[1,] 1 2 4
[2,] 2 1 3
[3,] 3 2 4
[4,] 4 1 3
The B matrix represents the following positions of A: (1,1), (1,2), (1,4), (2,2), (2,1), (2,3), (3,3), (3,2), (3,4), (4,4), (4,1), (4,3).
I want to change the values of A that are NOT in the positions given by B, replacing them by Inf. The result I want is:
[,1] [,2] [,3] [,4]
[1,] 0.4212778 0.6874073 Inf 0.46757640
[2,] 0.6874073 0.5610995 0.1779030 Inf
[3,] Inf 0.1779030 0.9515304 0.79429777
[4,] 0.4675764 Inf 0.7942978 0.01206526
How can I do that quickly, avoiding a for loop (which I'm able to code)? I've seen many similar posts, but none of them gave me what I want. Thank you!
You want to do something like matrix subsetting (e.g., P[Q]) except that you can't use negative indexing in matrix subsetting (e.g., P[-Q] is not allowed). Here's a work-around.
Store the elements you want to retain from A in a 2-column matrix where each row is a coordinate of A:
Idx <- cbind(rep(1:4, each=ncol(B)), as.vector(t(B)))
Create a matrix where all values are Inf, and then overwrite the values you wanted to "keep" from A:
Res <- matrix(Inf, nrow=nrow(A), ncol=ncol(A))
Res[Idx] <- A[Idx]
Result
Res
# [,1] [,2] [,3] [,4]
#[1,] 0.9043131 0.639718071 Inf 0.19158238
#[2,] 0.6397181 0.601327568 0.007363378 Inf
#[3,] Inf 0.007363378 0.752123162 0.61428003
#[4,] 0.1915824 Inf 0.614280026 0.02932679
Here is a one-liner
A[cbind(1:nrow(A), sum(c(1:ncol(A))) - rowSums(B))] <- Inf
[,1] [,2] [,3] [,4]
[1,] 0.4150663 0.23440503 Inf 0.6665222
[2,] 0.2344050 0.38736067 0.01352211 Inf
[3,] Inf 0.01352211 0.88319263 0.9942303
[4,] 0.6665222 Inf 0.99423028 0.7630221
Another way would be to identify the cells with apply and set them to Inf.
cnum <- 1:ncol(A)
A[cbind(1:nrow(A), apply(B, 1, function(x) cnum[-which(cnum %in% x)]))] <- Inf
A
# [,1] [,2] [,3] [,4]
# [1,] 0.9148060 0.9370754 Inf 0.8304476
# [2,] 0.9370754 0.5190959 0.7365883 Inf
# [3,] Inf 0.7365883 0.4577418 0.7191123
# [4,] 0.8304476 Inf 0.7191123 0.9400145
Note: the output above was generated after calling set.seed(42).
A <- matrix(runif(16),nrow = 4,byrow = T)
ind <- lower.tri(A)
A[ind] <- t(A)[ind]
## >A[]
## [,1] [,2] [,3] [,4]
## [1,] 0.07317535 0.167118857 0.0597721 0.2128698
## [2,] 0.16711886 0.008661005 0.6419335 0.6114373
## [3,] 0.05977210 0.641933514 0.7269202 0.3547959
## [4,] 0.21286984 0.611437278 0.3547959 0.4927997
The first thing to notice is that the matrix B is not very helpful in its current form, because the information we need is the row index paired with each value in B.
B<-matrix(c(1,2,4,2,1,3,3,2,4,4,1,3),nrow=4,byrow = T)
> B
## [,1] [,2] [,3]
## [1,] 1 2 4
## [2,] 2 1 3
## [3,] 3 2 4
## [4,] 4 1 3
So we can create that simply by using melt (from the reshape2 package) and taking the Var1 and value columns.
> library(reshape2)
> melt(B)
## Var1 Var2 value
## 1 1 1 1
## 2 2 1 2
## 3 3 1 3
## 4 4 1 4
## 5 1 2 2
## 6 2 2 1
## 7 3 2 2
## 8 4 2 1
## 9 1 3 4
## 10 2 3 3
## 11 3 3 4
## 12 4 3 3
We need to replace the entries of A whose positions are not listed in B with Inf. This is not easy to do directly, so an easy way out is to create another matrix filled with Inf and copy in the values of A at the positions given by melt(B).
> C<-matrix(Inf,nrow(A),ncol(A))
idx <- as.matrix(melt(B)[,c("Var1","value")])
C[idx]<-A[idx]
> C
## [,1] [,2] [,3] [,4]
## [1,] 0.07317535 0.167118857 Inf 0.2128698
## [2,] 0.16711886 0.008661005 0.6419335 Inf
## [3,] Inf 0.641933514 0.7269202 0.3547959
## [4,] 0.21286984 Inf 0.3547959 0.4927997
Another approach that accomplishes matrix subsetting (e.g., P[Q]) would be to create the index Q manually. Here's one approach.
Figure out which column index is "missing" from each row of B:
col_idx <- apply(B, 1, function(x) (1:nrow(A))[-match(x, 1:nrow(A))])
Create subsetting matrix Q
Idx <- cbind(1:nrow(A), col_idx)
Do the replacement
A[Idx] <- Inf
Of course, you can make this a one-liner if you really want to:
A[cbind(1:nrow(A), apply(B, 1, function(x) (1:nrow(A))[-match(x, 1:nrow(A))]))] <- Inf
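The answers above either rebuild the matrix or locate the single missing column per row. As an alternative sketch, a logical mask marks the positions to keep and everything else is overwritten. The seed and the names keep and A_masked are arbitrary choices for illustration, so the numbers will differ from those in the question.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
A <- matrix(runif(16), nrow = 4, byrow = TRUE)
A[lower.tri(A)] <- t(A)[lower.tri(A)]
B <- matrix(c(1, 2, 4, 2, 1, 3, 3, 2, 4, 4, 1, 3), nrow = 4, byrow = TRUE)

# Logical mask: TRUE at every (row, column) position listed in B
keep <- matrix(FALSE, nrow(A), ncol(A))
keep[cbind(as.vector(row(B)), as.vector(B))] <- TRUE

# Overwrite everything that is not marked as "keep"
A_masked <- A
A_masked[!keep] <- Inf
print(A_masked)
```

Unlike the approaches that subtract row sums or look up the one missing column, this version does not assume that exactly one column is absent from each row of B.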

Apply function on each element of a list of matrices

I have a list of matrices.
(below is a simplified example, I actually have a list of 3 matrices, the first one being in 2D, while the second and third ones are in 3D)
> a <- matrix(-1:2, ncol = 2)
> b <- array(c(-2:5), dim=c(2, 2, 2))
> c_list <- list(a,b)
> c_list
[[1]]
[,1] [,2]
[1,] -1 1
[2,] 0 2
[[2]]
, , 1
[,1] [,2]
[1,] -2 0
[2,] -1 1
, , 2
[,1] [,2]
[1,] 2 4
[2,] 3 5
I'd like to apply the function max(0,c_list) to each and every element (without a loop), in order to have the same type of object as "c_list" but with the negative values replaced by zeros.
> output
[[1]]
[,1] [,2]
[1,] 0 1
[2,] 0 2
[[2]]
, , 1
[,1] [,2]
[1,] 0 0
[2,] 0 1
, , 2
[,1] [,2]
[1,] 2 4
[2,] 3 5
I've managed to do it for a single matrix or for a list with mapply or lapply, but not for a list of matrices.
Answer : either Sotos' answer
output <- lapply(c_list, function(i)replace(i, i < 0, 0))
or Moody_Mudskipper's answer
output <- lapply(c_list,pmax,0)
You can use pmax: it preserves the shape of the source matrix and is vectorized, so it is faster than looping with max.
lapply(c_list,pmax,0)
Using apply and lapply:
a <- matrix(-1:2, ncol = 2)
b <- matrix(-3:0, ncol = 2)
c <- list(a,b)
d <- lapply(c, function(m) {
  apply(m, c(1, 2), function(x) max(0, x))
})
Output:
> d
[[1]]
[,1] [,2]
[1,] 0 1
[2,] 0 2
[[2]]
[,1] [,2]
[1,] 0 0
[2,] 0 0
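The apply-based answer above uses two-dimensional margins, so it covers matrices only. As a quick check that the pmax approach also handles the 3-D element from the question, here is a small sketch using the question's own c_list; the dim inspection at the end is added purely for illustration.

```r
a <- matrix(-1:2, ncol = 2)
b <- array(-2:5, dim = c(2, 2, 2))
c_list <- list(a, b)

# pmax works elementwise and copies the dim attribute from its first argument
output <- lapply(c_list, pmax, 0)

# Each element keeps its original dimensions (2 x 2 and 2 x 2 x 2)
print(lapply(output, dim))
```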

Mean on the third dimension in R

Is there any quick way or built-in function in R to calculate the mean values over the third dimension?
For example my array is:
, , 1
[,1] [,2]
[1,] 1 3
[2,] 2 4
, , 2
[,1] [,2]
[1,] 11 13
[2,] 12 14
, , 3
[,1] [,2]
[1,] 21 23
[2,] 22 24
My output would be:
[,1] [,2]
[1,] mean(c(1,11,21)) mean(c(3,13,23))
[2,] mean(c(2,12,22)) mean(c(4,14,24))
Thanks!
?apply is your friend for these types of tasks.
# Make the sample data
j <- array(c(1:4, 11:14, 21:24), c(2,2,3))
# For each combination in the 1st and 2nd dimension
# average over the values in the 3rd.
apply(j, c(1,2), mean)
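For larger arrays, base R's rowMeans with the dims argument is a possible alternative to apply. A minimal sketch, reusing the sample array j from above (the names m_apply and m_fast are only for illustration):

```r
j <- array(c(1:4, 11:14, 21:24), c(2, 2, 3))

# Reference result: average over the 3rd dimension with apply
m_apply <- apply(j, c(1, 2), mean)

# rowMeans with dims = 2 treats the first two dimensions as "rows"
# and averages over all remaining dimensions (here, the 3rd)
m_fast <- rowMeans(j, dims = 2)

print(m_fast)
```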

Construct dynamic-sized array in R

I was wondering what the ways are to construct a dynamically sized array in R.
As one example, I want to construct an n-vector whose length n is determined dynamically. The following code works:
> x=NULL
> n=2;
> for (i in 1:n) x[i]=i;
> x
[1] 1 2
For another example, I want to construct a n by 2 matrix where the number of rows n is dynamically determined. But I fail even at assigning the first row:
> tmp=c(1,2)
> x=NULL
> x[1,]=tmp
Error in x[1, ] = tmp : incorrect number of subscripts on matrix
> x[1,:]=tmp
Error: unexpected ':' in "x[1,:"
Thanks and regards!
I think the answers you are looking for are rbind() and cbind():
> x=NULL # could also use x <- c()
> rbind(x, c(1,2))
[,1] [,2]
[1,] 1 2
> x <- rbind(x, c(1,2))
> x <- rbind(x, c(1,2)) # now extend row-wise
> x
[,1] [,2]
[1,] 1 2
[2,] 1 2
> x <- cbind(x, c(1,2)) # or column-wise
> x
[,1] [,2] [,3]
[1,] 1 2 1
[2,] 1 2 2
The strategy of assigning to "new indices" on the fly, as you attempted, works in some languages (and for plain vectors in R), but it cannot be done that way for a matrix in R.
You can also use sparse matrices provided in the Matrix package. They would allow assignments of the form M <- sparseMatrix(i=200, j=50, x=234) resulting in a single value at row 200, column 50 and 0's everywhere else.
require(Matrix)
M <- sparseMatrix(i=200, j=50, x=234)
M[1,1]
# [1] 0
M[200, 50]
# [1] 234
But I think the use of sparse matrices is best reserved for later use after mastering regular matrices.
It is possible to dimension the array after we fill it (in a one-dimensional, vector, fashion)
Emulating the 1-dimension snippet of the question, here's the way it can be done with higher dimensions.
> x=c()
> tmp=c(1,2)
> n=6
> for (i in seq(1, by=2, length=n)) x[i:(i+1)] =tmp;
> dim(x) = c(2,n)
> x
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 1 1 1 1
[2,] 2 2 2 2 2 2
>
Rather than using i:(i+1) as index, it may be preferable to use seq(i, length=2) or better yet, seq(i, length=length(tmp)) for a more generic approach, as illustrated below (for a 4 x 7 array example)
> x=c()
> tmp=c(1,2,3,4)
> n=7
> for (i in seq(1, by=length(tmp), length=n))
x[seq(i, length=length(tmp))] = tmp;
> dim(x) = c(length(tmp),n)
> x
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 1 1 1 1 1 1
[2,] 2 2 2 2 2 2 2
[3,] 3 3 3 3 3 3 3
[4,] 4 4 4 4 4 4 4
>
We can also obtain a similar result by re-assigning x with cbind/rbind, as follows.
> tmp=c(1,2)
> n=6
> x=rbind(tmp)
> for (i in 1:n) x=rbind(x, tmp);
> x
[,1] [,2]
tmp 1 2
tmp 1 2
tmp 1 2
tmp 1 2
tmp 1 2
tmp 1 2
tmp 1 2
Note: one can get rid of the "tmp" names (these are a side effect of the rbind), with
> dimnames(x)=NULL
You can rbind it:
tmp = c(1,2)
x = NULL
rbind(x, tmp)
I believe this is the approach you need:
arr <- array(1)
arr <- append(arr,3)
arr[1] <- 2
print(arr[1])
(found on rosettacode.org)
When I want to dynamically construct an array (matrix), I do it like so:
n <- 500
new.mtrx <- matrix(ncol = 2, nrow = n)
head(new.mtrx)
[,1] [,2]
[1,] NA NA
[2,] NA NA
[3,] NA NA
[4,] NA NA
[5,] NA NA
[6,] NA NA
Your matrix is now ready to accept vectors.
Assuming you already have a vector, you pass that to the matrix() function. Notice how values are "broken" into the matrix (column wise). This can be changed with byrow argument.
matrix(letters, ncol = 2)
[,1] [,2]
[1,] "a" "n"
[2,] "b" "o"
[3,] "c" "p"
[4,] "d" "q"
[5,] "e" "r"
[6,] "f" "s"
[7,] "g" "t"
[8,] "h" "u"
[9,] "i" "v"
[10,] "j" "w"
[11,] "k" "x"
[12,] "l" "y"
[13,] "m" "z"
n = 5
x = c(1,2) %o% rep(1,n)
x
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 1 1 1 1
# [2,] 2 2 2 2 2
x = rep(1,n) %o% c(1,2)
x
# [,1] [,2]
# [1,] 1 2
# [2,] 1 2
# [3,] 1 2
# [4,] 1 2
# [5,] 1 2
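Repeated rbind inside a loop copies the growing matrix on every iteration. When the rows are produced one at a time, one common alternative is to collect them in a list and bind them once at the end. The sketch below assumes a made-up row generator, c(i, i^2), purely for illustration.

```r
n <- 4  # number of rows, known only at run time in a real use case

# Collect each row in a pre-sized list
rows <- vector("list", n)
for (i in seq_len(n)) {
  rows[[i]] <- c(i, i^2)  # hypothetical row contents
}

# Bind all rows into an n x 2 matrix in a single call
x <- do.call(rbind, rows)
print(x)
```

If n is not known in advance, the list can also be grown with rows[[length(rows) + 1]] <- new_row and bound in the same way.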

How to keep certain values in an array in R?

Suppose I have a data array,
dat <- array(NA, c(115,45,248))
Q1: What do I do if I want to get a new data array,
datnew <- array(NA, c(115,45,248))
in which all the positive values remain and the negative values are changed to NA?
Q2: What do I do if I want to get a new data array,
datnew <- array(NA,c(115,45,31))
by averaging over the third dimension, but averaging only every 8 values?
Thanks a lot.
For question 2,
you can reverse the order of the dimensions, then add a dimension representing the groups to average over, then use apply:
tmp <- array( 1:32, c(2,2,8) )
tmp2 <- array( aperm(tmp), c(4,2,2,2) )
apply( tmp2, 2:4, mean )
Answer to Q1:
dat[dat < 0] <- NA
We treat dat as if it were a vector (which it is, just with a dim attribute).
Answer to Q2:
Following Greg's nice, succinct solution, the solution I had in mind when posting my comment earlier was this (using Greg's tmp)
foo <- function(x, grp) aggregate(x, by = list(grp = grp), mean)$x
apply(tmp, 2:1, foo, grp = gl(2,4))
Examples:
Q1
> dat <- array(rnorm(3*3*3), c(3,3,3))
> dat
, , 1
[,1] [,2] [,3]
[1,] 0.1427815 0.1642626 -0.6876034
[2,] 0.6791252 2.1420478 -0.7073936
[3,] -0.9695173 -1.1050933 -0.3068230
, , 2
[,1] [,2] [,3]
[1,] 0.8246182 0.5132398 2.5428203
[2,] -0.4328711 0.9080648 -0.1231653
[3,] -0.7798170 -1.1160706 -0.9237559
, , 3
[,1] [,2] [,3]
[1,] -0.79505298 0.8795420 0.4520150
[2,] 0.04154077 -1.0422061 0.4657002
[3,] -0.67168971 0.7925304 -0.5461143
> dat[dat < 0] <- NA
> dat
, , 1
[,1] [,2] [,3]
[1,] 0.1427815 0.1642626 NA
[2,] 0.6791252 2.1420478 NA
[3,] NA NA NA
, , 2
[,1] [,2] [,3]
[1,] 0.8246182 0.5132398 2.542820
[2,] NA 0.9080648 NA
[3,] NA NA NA
, , 3
[,1] [,2] [,3]
[1,] NA 0.8795420 0.4520150
[2,] 0.04154077 NA 0.4657002
[3,] NA 0.7925304 NA
Q2
> foo <- function(x, grp) aggregate(x, by = list(grp = grp), mean)$x
> apply(tmp, 2:1, foo, grp = gl(2,4))
, , 1
[,1] [,2]
[1,] 7 9
[2,] 23 25
, , 2
[,1] [,2]
[1,] 8 10
[2,] 24 26
> all.equal(apply(tmp, 2:1, foo, grp = gl(2,4)), apply( tmp2, 2:4, mean ))
[1] TRUE
For question 1:
tmp2 <- ifelse(tmp1 < 0, NA, tmp1)
For question 2 see Greg's solution.
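Both Q2 solutions above return the result with the dimensions in (block, column, row) order. If the original (row, column, block) orientation is wanted, one possible sketch is to split the third dimension in place and average over the within-block position. It reuses the toy tmp array; k, tmp4 and res are names chosen here for illustration.

```r
tmp <- array(1:32, c(2, 2, 8))  # same toy array as in the answers above
k <- 4                          # block length along the third dimension

# Split the third dimension into (position within block, block number)
tmp4 <- array(tmp, c(dim(tmp)[1:2], k, dim(tmp)[3] / k))

# Average over the within-block position (dimension 3)
res <- apply(tmp4, c(1, 2, 4), mean)
print(dim(res))
```

For the 115 x 45 x 248 array in the question, the same idea with k = 8 would give a 115 x 45 x 31 result, provided the third dimension is an exact multiple of k.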
