max.col with the value not the index - r

If I have a matrix:
mod_xgb_softprob$pred[1:3,1:3]
[,1] [,2] [,3]
[1,] 6.781361e-04 6.781361e-04 6.781422e-04
[2,] 2.022457e-07 2.022457e-07 4.051039e-07
[3,] 6.714367e-04 6.714367e-04 6.714399e-04
Generated by:
> dput(mod_xgb_softprob$pred[1:3,1:3])
structure(c(0.00067813612986356, 2.02245701075299e-07, 0.000671436660923064,
0.00067813612986356, 2.02245701075299e-07, 0.000671436660923064,
0.000678142241667956, 4.05103861567113e-07, 0.000671439862344414
), .Dim = c(3L, 3L))
I can transform it into a data frame and get the column with the highest value:
x <- mymatrix %>% as.data.frame %>% mutate(max_prob = max.col(., ties.method = "last"))
Looks like this:
> x
V1 V2 V3 max_prob
1 6.781361e-04 6.781361e-04 6.781422e-04 3
2 2.022457e-07 2.022457e-07 4.051039e-07 3
3 6.714367e-04 6.714367e-04 6.714399e-04 3
If I wanted max_prob to be the actual value not the column index, how would I do that?

If you don't mind base R you can use apply. For example:
> x <- matrix(rnorm(9), ncol = 3)
> apply(x, 1, max)
[1] 0.246652 1.063506 2.148525
gives the maximum of each row of x (the second argument, 1, tells apply to work over rows).

Besides the apply method from @Mariane and the matrix indexing from @lmo's comment, you can also use matrixStats::rowMaxs:
matrixStats::rowMaxs(mymatrix)
# [1] 6.781422e-04 4.051039e-07 6.714399e-04
If you have a data frame, you can use do.call(pmax, ...) to calculate the parallel maxima of the input columns:
mymatrix %>% as.data.frame %>% mutate(max_val = do.call(pmax, .))
# V1 V2 V3 max_val
#1 6.781361e-04 6.781361e-04 6.781422e-04 6.781422e-04
#2 2.022457e-07 2.022457e-07 4.051039e-07 4.051039e-07
#3 6.714367e-04 6.714367e-04 6.714399e-04 6.714399e-04
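As a minimal sketch of the same idea without dplyr, the snippet below calls pmax directly on the columns of a small data frame. The data frame df and its values are made up for illustration, and the na.rm variant assumes you want missing values skipped rather than propagated.

```r
# Hypothetical example data (not from the question)
df <- data.frame(a = c(1, 5, NA), b = c(4, 2, 6))

# A data frame is a list of columns, so do.call hands each column to pmax
row_max <- do.call(pmax, df)

# Extra arguments can be appended to the argument list;
# na.rm = TRUE makes pmax ignore missing values within a row
row_max_narm <- do.call(pmax, c(df, na.rm = TRUE))
```

Without na.rm = TRUE, any row containing an NA yields NA, which may or may not be what you want.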

Another option uses max.col, seq_along and some index arithmetic. If m is your matrix, the following works as well:
mc <- max.col(m, ties.method = 'last')
m[(mc - 1) * nrow(m) + seq_along(mc)]
The result:
[1] 6.781422e-04 4.051039e-07 6.714399e-04
With cbind you can then bind this result to the matrix again:
> cbind(m, m[(mc - 1) * nrow(m) + seq_along(mc)])
[,1] [,2] [,3] [,4]
[1,] 6.781361e-04 6.781361e-04 6.781422e-04 6.781422e-04
[2,] 2.022457e-07 2.022457e-07 4.051039e-07 4.051039e-07
[3,] 6.714367e-04 6.714367e-04 6.714399e-04 6.714399e-04
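The arithmetic above works because R stores matrices column by column, so the element in row i and column j sits at linear position (j - 1) * nrow(m) + i. A small sketch that checks this against ordinary two-index subsetting, using the matrix from the question (the names lin_idx, by_linear and by_pair are only for illustration):

```r
# Matrix from the question's dput output
m <- structure(c(0.00067813612986356, 2.02245701075299e-07, 0.000671436660923064,
                 0.00067813612986356, 2.02245701075299e-07, 0.000671436660923064,
                 0.000678142241667956, 4.05103861567113e-07, 0.000671439862344414),
               .Dim = c(3L, 3L))

mc <- max.col(m, ties.method = "last")

# Column-major linear index of element (i, mc[i]) for each row i
lin_idx <- (mc - 1) * nrow(m) + seq_along(mc)
by_linear <- m[lin_idx]

# The same elements, fetched one at a time with [row, column] subsetting
by_pair <- vapply(seq_along(mc), function(i) m[i, mc[i]], numeric(1))

identical(by_linear, by_pair)
```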

This is a variation on @h3rm4n's answer, but you can use a special kind of matrix subsetting as well:
> x[cbind(1:nrow(x), max.col(x))]
[1] 6.781361e-04 4.051039e-07 6.714367e-04
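Indexing with a two-column matrix such as cbind(i, j) extracts one element per row of that index matrix: the element at row i and column j. Note that max.col breaks ties at random by default and, in that mode, treats entries within a relative tolerance of 1e-5 as tied (see ?max.col), which is why rows 1 and 3 above return a value that is not the exact maximum. Pass ties.method = "first" or "last" to avoid this.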
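Because max.col breaks ties at random by default, a hedged variant of this indexing fixes ties.method so the result is reproducible. The sketch assumes m is the matrix from the question and compares the result with apply(m, 1, max); idx and row_max are illustrative names.

```r
# Matrix from the question's dput output
m <- structure(c(0.00067813612986356, 2.02245701075299e-07, 0.000671436660923064,
                 0.00067813612986356, 2.02245701075299e-07, 0.000671436660923064,
                 0.000678142241667956, 4.05103861567113e-07, 0.000671439862344414),
               .Dim = c(3L, 3L))

# Two-column index matrix: one (row, column) pair per row of m
idx <- cbind(seq_len(nrow(m)), max.col(m, ties.method = "last"))

# Matrix indexing picks exactly one element per (row, column) pair
row_max <- m[idx]

# Cross-check against the straightforward row-wise maximum
all.equal(row_max, apply(m, 1, max))
```

seq_len(nrow(m)) is used instead of 1:nrow(m) so that a matrix with zero rows does not produce the index c(1, 0).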

Related

merging matrix columns that exists inside a numerical list

I have created a list like the following one that contains all combinations of a specific character inside a string. The code that creates the list is as follows :
library(stringr)
test = str_locate_all("TTEST" , "T")
ind1 = lapply( lapply(1:nrow(test[[1]]), combn , x=test[[1]][,1]) , t )
ind1[[1]] = rbind(ind1[[1]], 0 )
and the list that I'm getting looks like
[[1]]
[,1]
[1,] 1
[2,] 2
[3,] 5
[4,] 0
[[2]]
[,1] [,2]
[1,] 1 2
[2,] 1 5
[3,] 2 5
[[3]]
[,1] [,2] [,3]
[1,] 1 2 5
What I want now is to combine/collapse the columns (wherever there is more than one) and unlist the whole object to create a final vector that looks like c(1, 2, 5, 0, 1:2, 1:5, 2:5, 1:2:5), so that I can use it with the expand.grid() function later.
I tried to solve it partially with the following code, but the ":" character ended up in a different position than I wanted.
do.call(paste, c( as.data.frame(ind1[[2]]) ,collapse=":") )
[1] "1 2:1 5:2 5"
Here is an idea via base R where we convert the list elements to data frames and use do.call to paste them, i.e.
unlist(lapply(ind1, function(i) do.call(paste, c(as.data.frame(i), sep = ':'))))
#[1] "1" "2" "5" "0" "1:2" "1:5" "2:5" "1:2:5"
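The difference between the attempt in the question and this answer comes down to sep versus collapse in paste. A minimal sketch, using a plain list of two columns with the same values as ind1[[2]] (the name cols is only for illustration):

```r
# The two columns of ind1[[2]] from the question, as a plain list
cols <- list(c(1, 1, 2), c(2, 5, 5))

# sep joins the corresponding elements of each column: one string per row
with_sep <- do.call(paste, c(cols, sep = ":"))

# collapse first pastes row-wise with the default sep = " ",
# then glues all the resulting strings into a single string
with_collapse <- do.call(paste, c(cols, collapse = ":"))

print(with_sep)
print(with_collapse)
```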

Replacing matrix columns in R

I created a matrix in R
C<-matrix(c(0),nrow=6,ncol=6,byrow = FALSE)
Now I would like to replace the first column of the matrix with the value 1, the second and third columns with standard normal random variables, and the last three columns with the values of another matrix.
C<-matrix(c(0),nrow=6,ncol=6,byrow = FALSE)
other.matrix<-matrix(runif(18), nrow = 6, ncol = 3)
C[,1]<-1
C[,2]<-rnorm(6)
C[,3]<-rnorm(6)
C[,4:6]<-other.matrix
To access the rows and columns of matrices (and for that matter, data.frames) in R you can use [] brackets and i,j notation, where i is the row and j is the column. For example, the 3rd row and 2nd column of your matrix C can be addressed with
C[3,2]
#[1] 0
Use <- to assign new values to the rows/columns you have selected.
For the first three columns, you can use
C<-matrix(c(0),nrow=6,ncol=6,byrow = FALSE)
C[ ,1] <- 1; C[ ,2] <- rnorm(6); C[ ,3] <- rnorm(6)
Let's now say your other matrix is called D and looks like
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 0.6527716 0.81793644 0.67138209 0.3175264 0.1067119 0.5907180 0.4619992
[2,] 0.2268516 0.90893913 0.62917211 0.1768426 0.3659889 0.0339911 0.2322981
[3,] 0.9264116 0.81693835 0.59555163 0.6960895 0.1667125 0.6631861 0.9718530
[4,] 0.2613363 0.06515864 0.04971742 0.7277188 0.2580444 0.3718222 0.8028141
[5,] 0.2526979 0.49294947 0.97502566 0.7962410 0.8321882 0.2981480 0.7098733
[6,] 0.4245959 0.95951112 0.45632856 0.8227812 0.3542232 0.2680804 0.7042317
Now let's say you want columns 3, 4, and 5 from D as the last three columns in C. Then you can simply say
C[ ,4:6] <- D[ ,3:5]
And your result is
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 -1.76111875 0.4621061 0.67138209 0.3175264 0.1067119
[2,] 1 0.40036245 0.9054436 0.62917211 0.1768426 0.3659889
[3,] 1 -1.03238266 -0.6705829 0.59555163 0.6960895 0.1667125
[4,] 1 -0.47064774 0.3119684 0.04971742 0.7277188 0.2580444
[5,] 1 -0.01436411 -0.4688032 0.97502566 0.7962410 0.8321882
[6,] 1 -1.18711832 0.8227810 0.45632856 0.8227812 0.3542232
One thing to note is that this requires C and D to have the same number of rows.
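Putting the steps together, here is a minimal sketch with an explicit check on the row counts. D is filled with runif values purely as stand-in data, and the seed is arbitrary.

```r
set.seed(42)                                # arbitrary seed, for reproducibility only

C <- matrix(0, nrow = 6, ncol = 6)
D <- matrix(runif(42), nrow = 6, ncol = 7)  # stand-in for the "other" matrix

# Column-wise assignment needs matching row counts
stopifnot(nrow(C) == nrow(D))

C[, 1]   <- 1            # constant column
C[, 2:3] <- rnorm(12)    # two columns of standard normal draws (6 rows each)
C[, 4:6] <- D[, 3:5]     # copy columns 3 to 5 of D into the last three columns
```

Assigning rnorm(12) to C[, 2:3] fills both columns in one step, since the 12 values are placed column by column.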

R: Creating a data frame from list with missing values.

I have a list here that looks like this:
head(h)
[[1]]
[1] "gene=dnaA" "locus_tag=CD630_00010" "location=1..1320"
[[2]]
character(0)
[[3]]
[1] "locus_tag=CD630_05950" "location=719777..720313"
[[4]]
[1] "gene=dnrA" "locus_tag=CD630_00010" "location=50..1320"
I'm having trouble manipulating this list to create a data.frame with three columns. For the rows with missing gene info, I want to list them as "gene=unnamed", and I want to drop the empty elements completely, giving a matrix like this:
[,1] [,2] [,3]
[1,] "gene=dnaA" "locus_tag=CD630_00010" "location=1..1320"
[2,] "gene=thrA" "locus_tag=CD630_05950" "location=719777..720313"
[3,] "gene=dnrA" "locus_tag=CD630_00010" "location=50..1320"
This is what I have right now, but I get an error about missing values in the gene column. Any suggestions?
h <- data.frame(h[lapply(h,length)>0])
h <- t(h)
rownames(h) <- NULL
# Data
l <- list(c("gene=dnaA","locus_tag=CD630_00010", "location=1..1320"),
character(0), c("locus_tag=CD630_05950", "location=719777..720313"),
c("gene=dnrA","locus_tag=CD630_00010" ,"location=50..1320" ))
# Manipulation
n <- sapply(l, length)
seq.max <- seq_len(max(n))
# Pad every element to the same length (missing entries become NA)
df <- t(sapply(l, "[", i = seq.max))
# Move NAs to the front of each row so a missing gene lands in column 1
df <- t(apply(df,1,function(x){
c(x[is.na(x)],x[!is.na(x)])}))
# Drop rows that are entirely NA, then label the remaining gaps
df <- df[rowSums(!is.na(df))>0, ]
df[is.na(df)] <- "gene=unnamed"
Output:
[,1] [,2] [,3]
[1,] "gene=dnaA" "locus_tag=CD630_00010" "location=1..1320"
[2,] "gene=unnamed" "locus_tag=CD630_05950" "location=719777..720313"
[3,] "gene=dnrA" "locus_tag=CD630_00010" "location=50..1320"
There are a number of methods for binding lists with unequal lengths. See bind_rows from dplyr, rbind.fill from plyr or rbindlist from data.table. Here is an approach using base R:
## Sample data
h <- list(letters[1:3],
character(0),
letters[4:5])
out <- do.call(rbind, lapply(h, `length<-`, 3)) # fix lengths and make matrix
out <- out[rowSums(!is.na(out))>0, ] # remove empty rows
out[is.na(out)] <- "gene=unnamed" # rename NA
data.frame(out)
# X1 X2 X3
# 1 a b c
# 2 d e gene=unnamed
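Both answers above place the filler by position, which works only if the missing field is always the same one. As an alternative sketch (not taken from either answer), the fields can be matched by their key prefix instead. The helper pick and the vector keys are hypothetical names introduced here, and the approach assumes every entry has the form key=value.

```r
h <- list(c("gene=dnaA", "locus_tag=CD630_00010", "location=1..1320"),
          character(0),
          c("locus_tag=CD630_05950", "location=719777..720313"),
          c("gene=dnrA", "locus_tag=CD630_00010", "location=50..1320"))

h <- h[lengths(h) > 0]                      # drop empty elements
keys <- c("gene", "locus_tag", "location")  # expected fields, in column order

# Hypothetical helper: return the entry of x starting with "key=",
# or "key=unnamed" when no such entry exists
pick <- function(key, x) {
  hit <- x[startsWith(x, paste0(key, "="))]
  if (length(hit) == 0) paste0(key, "=unnamed") else hit[1]
}

# One row per list element, one column per key
out <- t(sapply(h, function(x) vapply(keys, pick, character(1), x = x)))
out
```

With this layout a missing location or locus_tag would be labelled correctly too, at the cost of a little more code.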

Triplicates in R

I have a set of 80 samples, with 2 variables, each measured as triplicate:
sample var1a var1b var1c var2a var2b var2c
1 -169.784 -155.414 -146.555 -175.295 -159.534 -132.511
2 -180.577 -180.792 -178.192 -177.294 -171.809 -166.147
3 -178.605 -184.183 -177.672 -167.321 -168.572 -165.335
and so on. How do I apply functions like mean, sd, se etc. for each row for var1 and var2? Also, the dataset contains NAs. Thanks for bothering with such basic questions
What is your expected result when there are NAs? apply(df[-1], 1, mean) (or whatever function) will work, but it would give NA as a result for the row. If you can replace NA with 0 then you could do df[is.na(df)] <- 0 first, and then the apply function in order to get the results.
One approach could be to reshape your data set. Another might be to apply a function over the rows of a subset of the data frame.
So, for var2X you have:
apply(dat[5:7], 1, function(x){m <- mean(x); s <- sd(x); da <-c(m, s) })
[,1] [,2] [,3]
[1,] -155.78000 -171.750000 -167.076000
[2,] 21.63763 5.573734 1.632348
and for var1X:
apply(dat[2:4], 1, function(x){m <- mean(x); s <- sd(x); da <-c(m, s) })
[,1] [,2] [,3]
[1,] -157.25100 -179.853667 -180.153333
[2,] 11.72295 1.443055 3.520835
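Since the question mentions NAs and a standard error, here is a hedged sketch that skips missing values with na.rm = TRUE instead of replacing them with 0. It uses the var1 values from the question, with one NA injected for illustration; the helper row_stats is a hypothetical name, and the standard error is assumed to be sd / sqrt(n) over the non-missing replicates.

```r
dat <- data.frame(
  sample = 1:3,
  var1a = c(-169.784, -180.577, -178.605),
  var1b = c(-155.414, -180.792, NA),   # NA injected here for illustration
  var1c = c(-146.555, -178.192, -177.672)
)

# Hypothetical helper: mean, sd and standard error of one row, ignoring NAs
row_stats <- function(x) {
  n <- sum(!is.na(x))                  # number of usable replicates
  s <- sd(x, na.rm = TRUE)
  c(mean = mean(x, na.rm = TRUE), sd = s, se = s / sqrt(n))
}

# apply returns one column per row of dat, so transpose to get one row per sample
stats_var1 <- t(apply(dat[2:4], 1, row_stats))
stats_var1
```

The same call with dat[5:7] would cover var2 in the full data set.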

Getting selected matrix columns from a list of matrices

I have a list of matrices with identical dimensions, for example:
mat.list=rep(list(matrix(rnorm(n=12,mean=1,sd=1), nrow = 3, ncol=4)),3)
I'm looking for an efficient way to retrieve a column from each matrix in the list where the column index of interest from each matrix is specified by a vector. For example, for this vector of column indices:
idx.vec=c(3,2,3)
I would like to obtain column 3 from matrix 1, column 2 from matrix 2, and column 3 from matrix 3, as a matrix so that this matrix dimensions are the number of rows of the matrices in the list by the number of matrices in the list.
For this example the result would therefore be:
cbind(mat.list[[1]][,3],mat.list[[2]][,2],mat.list[[3]][,3])
[,1] [,2] [,3]
[1,] 1.4852810 1.305448 1.4852810
[2,] 1.8647327 -1.237507 1.8647327
[3,] -0.0416013 2.156055 -0.0416013
One possible approach would be mapply('[', mat.list, TRUE, idx.vec). The trick is to use '[' for subsetting and TRUE as an argument to select all the rows. Here is how it works:
'['(matrix(1:4, ncol = 2), TRUE, 2)
# [1] 3 4
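Another (ugly) approach would be lapply(mat.list, "[",,idx.vec)[[1]]. Note that this takes columns idx.vec from the first matrix only, so it matches the desired result here only because rep() makes all three matrices identical: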
> set.seed(1)
> mat.list=rep(list(matrix(rnorm(n=12,mean=1,sd=1), nrow = 3, ncol=4)),3)
> idx.vec=c(3,2,3)
> lapply(mat.list, "[",,idx.vec)[[1]]
[,1] [,2] [,3]
[1,] 1.487429 2.5952808 1.487429
[2,] 1.738325 1.3295078 1.738325
[3,] 1.575781 0.1795316 1.575781
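To see how the per-matrix selection behaves when the matrices actually differ, here is a small sketch. It builds the list with replicate (an assumption, since the question uses rep and so gets three identical matrices) and checks the mapply result against the explicit cbind from the question.

```r
set.seed(1)

# replicate re-evaluates the expression each time, so the matrices differ
mat.list <- replicate(3, matrix(rnorm(12, mean = 1, sd = 1), nrow = 3, ncol = 4),
                      simplify = FALSE)
idx.vec <- c(3, 2, 3)

# Pair each matrix with its own column index and pull out that column
res <- mapply(function(m, j) m[, j], mat.list, idx.vec)

# Explicit construction from the question, for comparison
expected <- cbind(mat.list[[1]][, 3], mat.list[[2]][, 2], mat.list[[3]][, 3])

all.equal(res, expected, check.attributes = FALSE)
```

An anonymous function is used here instead of '[' with TRUE purely for readability; both select all rows of the chosen column.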
