Expand columns of matrix in R - r

I have a matrix "a" like the following:
a<-rbind(c("a1","ost1;ost2;ost3","utr;body;pro"),
c("a2","idh1;idh2","pro;body"),
c("a3","dnm1","body"))
>a
[,1] [,2] [,3]
[1,] "a1" "ost1;ost2;ost3" "utr;body;pro"
[2,] "a2" "idh1;idh2" "pro;body"
[3,] "a3" "dnm1" "body"
I want to get a matrix "b" like this
[,1] [,2] [,3]
[1,] "a1" "ost1" "utr"
[2,] "a1" "ost2" "body"
[3,] "a1" "ost3" "pro"
[4,] "a2" "idh1" "pro"
[5,] "a2" "idh2" "body"
[6,] "a3" "dnm1" "body"
OK, I got it:
b<-do.call(rbind, (apply(a, 1, function(x) {do.call(cbind, strsplit(x,";"))})))

Your solution, without the unnecessary parentheses:
do.call(rbind, apply(a, 1, function(x) do.call(cbind, strsplit(x, ";"))))
This also works:
do.call(rbind, lapply(apply(a, 1, strsplit, ';'), do.call, what = cbind))
Not that there is anything wrong with using anonymous functions (function(x){...}), but some people find it more "elegant" without any.
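For comparison, here is a minimal sketch of a different route that avoids apply altogether: split each delimited column once, then repeat the id column to match. It assumes (as in the example data) that columns 2 and 3 of a row always hold the same number of ";"-separated items; the names genes and parts are made up for illustration.

```r
a <- rbind(c("a1", "ost1;ost2;ost3", "utr;body;pro"),
           c("a2", "idh1;idh2", "pro;body"),
           c("a3", "dnm1", "body"))

# Split columns 2 and 3 on ";"; each result is a list with one element per row.
genes <- strsplit(a[, 2], ";")
parts <- strsplit(a[, 3], ";")

# Assumption: both columns carry the same number of items in every row.
stopifnot(identical(lengths(genes), lengths(parts)))

# Repeat each id once per item in its row, then bind the flattened pieces.
b <- cbind(rep(a[, 1], lengths(genes)), unlist(genes), unlist(parts))
b
```

Because strsplit is vectorised over the whole column, this sketch calls it twice in total rather than once per row.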

Related

R as.Date() has two different return formats

Here is my observation. Given this sample data
> xx
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "ABC" "20" "04" "13" "C" "00700000"
[2,] "XYZ" "20" "04" "13" "C" "00800000"
> class(xx)
[1] "matrix"
I wrote a simple function to parse this data matrix:
foo <- function(xx)
{
year=2000+as.integer(xx[2])
month = as.integer(xx[3])
day=as.integer(xx[4])
as.Date(sprintf("%02d-%02d-%04d",month,day, year), format="%m-%d-%Y")
}
When applying this function to xx using apply(), I got these:
> apply(xx, 1, foo)
[1] 18365 18365
I don't know why 18365 was displayed here. Maybe 18365 represents NA? Some posts on Stack Overflow said as.Date is sensitive to Sys.setlocale(). But if I just type this in the same terminal:
> as.Date(sprintf("%02d-%02d-%04d", 4, 23, 2020), format="%m-%d-%Y")
[1] "2020-04-23"
So this seems to show that as.Date() has no Sys.setlocale() problem. Can someone point out what the problem is? Thanks
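apply will do that: it coerces the results to a plain vector, which drops the Date class, so you see the underlying number of days since 1970-01-01 (18365 is 2020-04-13, not NA). See the reference to as.vector in the help file for apply.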
1) Instead, use ISOdate and as.Date.
as.Date(ISOdate(as.integer(m[, 2]) + 2000L, m[, 3], m[, 4]))
## [1] "2020-04-13" "2020-04-13"
2) or as.Date and paste:
as.Date(paste(as.integer(m[, 2]) + 2000L, m[, 3], m[, 4], sep = "-"))
## [1] "2020-04-13" "2020-04-13"
Note
Lines <- '"ABC" "20" "04" "13" "C" "00700000"
"XYZ" "20" "04" "13" "C" "00800000"'
m <- matrix(scan(text = Lines, what = "", quiet = TRUE), 2, byrow = TRUE)
m
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] "ABC" "20" "04" "13" "C" "00700000"
## [2,] "XYZ" "20" "04" "13" "C" "00800000"
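To make the explanation concrete, here is a small sketch that reproduces the effect and shows two ways to get dates back. It rebuilds the question's matrix and function (with the argument renamed to row); everything else is base R.

```r
xx <- rbind(c("ABC", "20", "04", "13", "C", "00700000"),
            c("XYZ", "20", "04", "13", "C", "00800000"))

# The question's parser, applied to one row of the matrix at a time.
foo <- function(row) {
  year  <- 2000L + as.integer(row[2])
  month <- as.integer(row[3])
  day   <- as.integer(row[4])
  as.Date(sprintf("%02d-%02d-%04d", month, day, year), format = "%m-%d-%Y")
}

# apply() collapses the results into a plain vector, so the Date class is lost.
res <- apply(xx, 1, foo)
class(res)

# Option 1: reattach the class; the numbers are days since 1970-01-01.
dates1 <- as.Date(res, origin = "1970-01-01")

# Option 2: loop with lapply() and combine with c(), which keeps the Date class.
dates2 <- do.call(c, lapply(seq_len(nrow(xx)), function(i) foo(xx[i, ])))
```

Both options give a Date vector; the vectorised as.Date calls in the answer above remain the simpler choice when no per-row logic is needed.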

Explicitly set sequence of vectors in paste function

Say I wanted to add a minus sign - in front of all values in both columns of a data.frame datasets::cars using apply:
> apply(cars[1:5,], 2, paste0, "-")
speed dist
[1,] "4-" "2-"
[2,] "4-" "10-"
[3,] "7-" "4-"
[4,] "7-" "22-"
[5,] "8-" "16-"
Note that here the minus is behind the numbers, not in front. So I came up with the following, which gives the desired output:
> apply(cars[1:5,], 2, function(x) paste0("-", x))
speed dist
[1,] "-4" "-2"
[2,] "-4" "-10"
[3,] "-7" "-4"
[4,] "-7" "-22"
[5,] "-8" "-16"
However, this got me wondering: Is there a way to directly specify the position of the minus or, conversely, the position of the margin values in the paste function?
The syntax of paste0 is paste0(..., collapse = NULL), i.e. it takes its arguments in order of appearance and pastes them together. The syntax of apply is apply(X, MARGIN, FUN, ...), where ... stands for additional arguments that are passed to FUN (here paste0) after the subsetted element of X, in positions 2, 3 and so on. Because apply always passes x in first place, and paste0 has no named argument for the pieces to be pasted, there is no way around the anonymous function with paste0.
I.e. the argument must be FUN = function(x) paste0("-", x) to force paste0 to put the "-" first.
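The limitation is specific to paste0. As a sketch of one possible workaround, a function whose template is a named argument lets apply's extra arguments control the position. Here sprintf is swapped in for paste0: its fmt argument is matched by name, so the column values passed first by apply land in the %s slot.

```r
x <- cars[1:5, ]

# fmt is matched by name; the column that apply() passes first fills "%s".
in_front <- apply(x, 2, sprintf, fmt = "-%s")
behind   <- apply(x, 2, sprintf, fmt = "%s-")

in_front
behind
```

This only works because sprintf exposes a named template argument; with paste0 itself the anonymous function is still required.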
You can try using some regex
> sapply(cars[1:5,], function(x) sub("(.*)", "-\\1", x)) # in front
speed dist
[1,] "-4" "-2"
[2,] "-4" "-10"
[3,] "-7" "-4"
[4,] "-7" "-22"
[5,] "-8" "-16"
> sapply(cars[1:5,], function(x) sub("(.*)", "\\1-", x)) # behind
speed dist
[1,] "4-" "2-"
[2,] "4-" "10-"
[3,] "7-" "4-"
[4,] "7-" "22-"
[5,] "8-" "16-"
> sapply(cars[1:5,], function(x) sub("(.{1})(.*)", "\\1-\\2", x)) # between
speed dist
[1,] "4-" "2-"
[2,] "4-" "1-0"
[3,] "7-" "4-"
[4,] "7-" "2-2"
[5,] "8-" "1-6"

mean of triplicate measurements in 3 matrices

I have 3 matrices that store values from triplicate measurements and would like to take the mean of the 3 matrices.
So let's say the three matrices are:
m1<-t(matrix(c("text", 1:3), ncol=2, nrow=4))
m2<-t(matrix(c("text", 1:3), ncol=2, nrow=4))
m3<-t(matrix(c("text", 1:3), ncol=2, nrow=4))
> m1
[,1] [,2] [,3] [,4]
[1,] "text" "1" "2" "3"
[2,] "text" "1" "2" "3"
> m2
[,1] [,2] [,3] [,4]
[1,] "text" "1" "2" "3"
[2,] "text" "1" "2" "3"
> m3
[,1] [,2] [,3] [,4]
[1,] "text" "1" "2" "3"
[2,] "text" "1" "2" "3"
I would like to have this for every position of the matrices:
mean(m1[i,j], m2[i,j], m3[i,j])
So I tried it with 2 for loops:
for(i in ncol(m1)){
for(j in nrow(m1)){
means[i,j]<-mean(m1[i,j], m2[i,j], m3[i,j])
}
which obviously doesn't work
The text in the first column isn't an issue if NA is returned.
Could anyone help me, please?
Thanks!
We can place it in a list, convert to numeric and use Reduce
lst <- lapply(list(m1[, -1], m2[,-1], m3[, -1]), as.numeric)
Reduce(`+`,lst)/length(lst)
If there are many matrices starting with 'm', we can use mget
lst <- lapply(mget(paste0("m", 1:3)), function(x) as.numeric(x[,-1]))
and then do the Reduce step.
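One thing to watch: as.numeric on a matrix returns a plain vector, so the result above loses its rows and columns. The following sketch keeps the matrix shape. The helper name to_num is made up for illustration, and it assumes the first column is the only text column. Note also that mean(a, b, c) would not average three values anyway, because mean only averages its first argument.

```r
m1 <- t(matrix(c("text", 1:3), ncol = 2, nrow = 4))
m2 <- m1
m3 <- m1

# Hypothetical helper: drop the text column and convert the rest to numeric,
# rebuilding the matrix so the dimensions survive the conversion.
to_num <- function(m) {
  out <- m[, -1, drop = FALSE]
  matrix(as.numeric(out), nrow = nrow(out))
}

lst <- lapply(list(m1, m2, m3), to_num)

# Element-wise sum of the three matrices, divided by their count.
means <- Reduce(`+`, lst) / length(lst)
means
```

The result has one row per original row and one column per numeric column, so means[i, j] is the mean of position [i, j + 1] across the three inputs.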

cbind values to sublists in R

I have two matrices in a list:
colList <- list()
colList[["V1"]] <- as.matrix(c("asd", "asd", "asd"))
colList[["V2"]] <- as.matrix(c("das", "das", "das"))
And I want to cbind the values of a data.frame value.frame$keyID to each sublist. The first value (2000) to the first sublist, the second value (3000) to the second sublist.
Here the value.frame:
value.frame <- data.frame(keyID =c("2000", "3000"))
The result should look like this:
colList <- list()
colList[["V1"]] <- matrix(c("asd", "asd", "asd", 2000, 2000, 2000),
nrow=3,
ncol=2)
colList[["V2"]] <- matrix(c("das", "das", "das", 3000, 3000, 3000),
nrow=3,
ncol=2)
I tried it with the following code, but the result is not the desired one. Hope someone can help me.
mapply( cbind, colList, paste(value.frame[,1]))
Using lapply and seq_along
nms <- names(colList)
colList <- lapply(seq_along(colList), x=colList,
y=as.character(value.frame$keyID), function(j, x, y) {
cbind(x[[j]], y[j])
})
names(colList) <- nms
colList[["V1"]]
[,1] [,2]
[1,] "asd" "2000"
[2,] "asd" "2000"
[3,] "asd" "2000"
colList[["V2"]]
[,1] [,2]
[1,] "das" "3000"
[2,] "das" "3000"
[3,] "das" "3000"
You could do this with mapply using the option SIMPLIFY=FALSE
mapply(cbind, colList, as.character(value.frame$keyID), SIMPLIFY=FALSE)
#$V1
# [,1] [,2]
#[1,] "asd" "2000"
#[2,] "asd" "2000"
#[3,] "asd" "2000"
#$V2
# [,1] [,2]
#[1,] "das" "3000"
#[2,] "das" "3000"
#[3,] "das" "3000"
Or using Map which is a wrapper for mapply(..., SIMPLIFY=FALSE)
Map(cbind, colList, as.character(value.frame$keyID))
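To see why the attempt in the question did not give the desired result, here is a short sketch comparing the default mapply call with the Map version. By default mapply tries to simplify its result, and since both cbind results have the same length they get collapsed into a single matrix instead of staying a list.

```r
colList <- list()
colList[["V1"]] <- as.matrix(c("asd", "asd", "asd"))
colList[["V2"]] <- as.matrix(c("das", "das", "das"))
value.frame <- data.frame(keyID = c("2000", "3000"))
keys <- as.character(value.frame$keyID)

# Default SIMPLIFY = TRUE: the two 3 x 2 results are flattened into one 6 x 2 matrix.
simplified <- mapply(cbind, colList, keys)
dim(simplified)

# Map() never simplifies, so each sublist keeps its own 3 x 2 matrix.
result <- Map(cbind, colList, keys)
str(result)
```

The names V1 and V2 are carried over from colList in both cases, so no extra names(...) <- step is needed with Map.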

R flatten out list hierarchy to matrix or data.frame

I would like to flatten out a list hierarchy (similar to JSON) to a matrix or data frame. Let's say that I create the following list:
a <- list(
b1 = list(
c1 = list(
d1 = data.frame()
),
c2 = data.frame()
),
b2 = data.frame()
)
Where each letter is another level or step down the hierarchy. Then I want a function, e.g. listToMatrix(mylist = a, steps = 2), that generates the following:
[,1] [,2]
[1,] "b1" "c1"
[2,] "b1" "c2"
[3,] "b2" "b2"
Observe that the function's argument steps = 2 implies that it should only go 2 steps down the hierarchy. Also, if there aren't enough levels available in one direction, see b2, then it should keep the previous list name in the matrix.
Any suggestions? :)
Here is a solution. So it reads easily here, I have broken the code into two parts. Later, you can easily merge the two parts into a single function.
First, a function that gets a matrix of all the names, using recursion:
anames <- function(x) {
require(plyr)
if (is.data.frame(x)) return(NA)
y <- do.call(rbind.fill.matrix,
mapply(cbind, names(x), lapply(x, anames),
SIMPLIFY = FALSE))
colnames(y) <- NULL
return(y)
}
anames(a)
# [,1] [,2] [,3] [,4]
# [1,] "b1" "c1" "d1" NA
# [2,] "b1" "c2" NA NA
# [3,] "b2" NA NA NA
Then, a function that applies the given steps input, and fills the NAs like you requested:
listToMatrix <- function(myList, steps = Inf) {
a <- anames(myList)
steps <- min(steps, ncol(a) - 1)
cols.idx <- seq_len(steps)
a <- a[, cols.idx, drop = FALSE]
for (j in tail(cols.idx, -1))
a[, j] <- ifelse(is.na(a[, j]), a[, j - 1], a[, j])
return(a)
}
listToMatrix(a, 2)
# [,1] [,2]
# [1,] "b1" "c1"
# [2,] "b1" "c2"
# [3,] "b2" "b2"
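If you would rather not depend on plyr, the same idea can be sketched in base R. This is an alternative approach, not the code above: it first collects every path of names down to a data frame, then truncates or pads each path to the requested number of steps. The helper names list_paths and paths_to_matrix are made up for illustration, and the sketch assumes every list level is fully named.

```r
a <- list(
  b1 = list(
    c1 = list(d1 = data.frame()),
    c2 = data.frame()
  ),
  b2 = data.frame()
)

# Hypothetical helper: collect the names along each path that ends at a data frame.
# is.data.frame() is checked first because a data frame is itself a list.
list_paths <- function(x, prefix = character()) {
  if (is.data.frame(x) || !is.list(x)) return(list(prefix))
  out <- list()
  for (nm in names(x)) out <- c(out, list_paths(x[[nm]], c(prefix, nm)))
  out
}

# Hypothetical helper: cut each path to `steps` names, repeating the last
# available name when a path is shorter than `steps`.
paths_to_matrix <- function(paths, steps) {
  rows <- lapply(paths, function(p) {
    p <- p[seq_len(min(steps, length(p)))]
    c(p, rep(p[length(p)], steps - length(p)))
  })
  do.call(rbind, rows)
}

res <- paths_to_matrix(list_paths(a), steps = 2)
res
```

Building the rows with rbind keeps the result a matrix even for steps = 1, where it has a single column.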
