How to add prefixes to matrix given by a vector? - r

Let's consider matrix:
example_matrix <- matrix(c("big", "small", "big_something",
"small_really", "small", "big_enough",
"themendous", "big", "small"),ncol = 3, nrow = 3)
> example_matrix
[,1] [,2] [,3]
[1,] "big" "small_really" "themendous"
[2,] "small" "small" "big"
[3,] "big_something" "big_enough" "small"
And some vector:
group_vector <- c("group1_big", "group2_small")
This vector shows which words in the matrix should get a group prefix. We should end up with:
[,1] [,2] [,3]
[1,] "group1_big" "small_really" "themendous"
[2,] "group2_small" "group2_small" "group1_big"
[3,] "big_something" "big_enough" "group2_small"
i.e. we replaced every "big" in example_matrix with "group1_big" and every "small" with "group2_small", without touching "big_enough" or "small_really" (replacing only exact matches of "big" and "small").
My Idea
Let's consider the first case, i.e. replacing every "big" with "group1_big". My idea was to check which elements in example_matrix end with "big" and add the prefix "group1_" to each of them:
> apply(example_matrix, 2, function(x) endsWith(x, "big"))
[,1] [,2] [,3]
[1,] TRUE FALSE FALSE
[2,] FALSE FALSE TRUE
[3,] FALSE FALSE FALSE
And my idea how it can be replaced was the following:
apply(example_matrix, 2, function(x) if endsWith(x, "big") paste0(group_vector[1], x) else x)
That is, a condition: if a specific element really ends with "big", we add the prefix; if not, we leave it unchanged.
This code however produces error:
Error: unexpected symbol in "apply(example_matrix, 2, function(x) if endsWith"
Do you know what I'm doing wrong and what's the solution to this problem?

Here's one way using str_replace_all from stringr :
example_matrix[] <- stringr::str_replace_all(example_matrix,
setNames(group_vector, sprintf('\\b%s\\b',
sub('group\\d+_', '', group_vector))))
example_matrix
# [,1] [,2] [,3]
#[1,] "group1_big" "small_really" "themendous"
#[2,] "group2_small" "group2_small" "group1_big"
#[3,] "big_something" "big_enough" "group2_small"
To understand this, break it down into smaller steps.
sub removes 'group' + number + underscore from group_vector.
sub('group\\d+_', '', group_vector)
#[1] "big" "small"
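We add a word boundary on each side so that the pattern matches only the exact word ('big') and not longer words that contain it ('big_something'); the underscore counts as a word character, so there is no boundary between 'big' and '_'.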
sprintf('\\b%s\\b', sub('group\\d+_', '', group_vector))
#[1] "\\bbig\\b" "\\bsmall\\b"
Now create a named vector which can be used in str_replace_all :
setNames(group_vector, sprintf('\\b%s\\b', sub('group\\d+_', '', group_vector)))
# \\bbig\\b \\bsmall\\b
# "group1_big" "group2_small"
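As for the error in the question: `if` needs its condition in parentheses, and it only tests a single value, so it would not work element-wise on a whole column even with the parentheses added. A base R alternative, given here as a minimal sketch and not as the method of the answer above, is to look up whole strings with match(). The names keys, idx and result are illustrative.

```r
example_matrix <- matrix(c("big", "small", "big_something",
                           "small_really", "small", "big_enough",
                           "themendous", "big", "small"), ncol = 3, nrow = 3)
group_vector <- c("group1_big", "group2_small")

# Words to look for: strip the "groupN_" prefix from each replacement
keys <- sub("^group\\d+_", "", group_vector)

# match() compares whole strings, so "big_enough" never matches "big";
# it returns NA for elements that are not in keys
idx <- match(example_matrix, keys)

# Overwrite only the matched cells, keeping the matrix shape
result <- example_matrix
result[!is.na(idx)] <- group_vector[idx[!is.na(idx)]]
result
```

Because match() works on complete strings, no regular expression or word boundary is needed for the lookup itself.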

Related

Efficiently finding minimum cells values from a set of matrices in R

I have a list of matrices (size n*n), and I need to create a new matrix giving the minimum value observed for each cell, based on my list.
For instance, with the following matrices list:
> a = list(matrix(rexp(9), 3), matrix(rexp(9), 3), matrix(rexp(9), 3))
> a
[[1]]
[,1] [,2] [,3]
[1,] 0.5220069 0.39643016 0.04255687
[2,] 0.4464044 0.66029350 0.34116609
[3,] 2.2495949 0.01705576 0.08861866
[[2]]
[,1] [,2] [,3]
[1,] 0.3823704 0.271399 0.7388449
[2,] 0.1227819 1.160775 1.2131681
[3,] 0.1914548 1.004209 0.7628437
[[3]]
[,1] [,2] [,3]
[1,] 0.2125612 0.45379057 1.5987420
[2,] 0.3242311 0.02736743 0.4372894
[3,] 0.6634098 1.15401347 0.9008529
The output should be:
[,1] [,2] [,3]
[1,] 0.2125612 0.271399 0.04255687
[2,] 0.1227819 0.02736743 0.34116609
[3,] 0.1914548 0.01705576 0.08861866
I tried using an apply loop with the following code (using melt and dcast from the reshape2 library):
library(reshape2)
all = melt(a)
allComps = unique(all[,c(1:2)])
allComps$min=apply(allComps, 1, function(x){
g1 = x[1]
g2 = x[2]
b = unlist(lapply(a, function(y){
return(y[g1,g2])
}))
return(b[which(b==min(b))])
})
dcast(allComps, Var1~Var2)
It works but it is taking a very long time to run when applied on large matrices (6000*6000). I am looking for a faster way to do this.
Use Reduce with pmin :
Reduce(pmin, a)
# [,1] [,2] [,3]
#[1,] 0.02915345 0.03157736 0.3142273
#[2,] 0.57661027 0.05621098 0.1452668
#[3,] 0.48021473 0.18828404 0.4787604
data
set.seed(123)
a = list(matrix(rexp(9), 3), matrix(rexp(9), 3), matrix(rexp(9), 3))
It may be worth storing the matrices in an array instead of a list, which can be done with simplify2array. In an array, the minimum over specific dimensions can be found using min in apply.
A <- simplify2array(a)
apply(A, 1:2, min)
We can also build the array directly (the dimensions c(3, 3, 3) are hard-coded for three 3x3 matrices):
apply(array(unlist(a), c(3, 3, 3)), 1:2, min)
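The approaches above should agree with each other. The following sketch checks that on the seeded example data; do.call(pmin, a) is an extra variant not mentioned in the answers, and the variable names are illustrative.

```r
set.seed(123)
a <- list(matrix(rexp(9), 3), matrix(rexp(9), 3), matrix(rexp(9), 3))

# Pairwise element-wise minimum, folded over the list
by_reduce <- Reduce(pmin, a)

# pmin() accepts any number of arguments, so the list can be passed at once
by_docall <- do.call(pmin, a)

# Stack the matrices into a 3 x 3 x 3 array and take min over the third dimension
by_array <- apply(simplify2array(a), 1:2, min)

all.equal(by_reduce, by_docall)
all.equal(by_reduce, by_array)
```

For large matrices the pmin-based versions are likely the better choice, since apply calls min once per cell.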

Explicitly set sequence of vectors in paste -function

Say I wanted to add a minus sign - in front of all values in both columns of a data.frame datasets::cars using apply:
> apply(cars[1:5,], 2, paste0, "-")
speed dist
[1,] "4-" "2-"
[2,] "4-" "10-"
[3,] "7-" "4-"
[4,] "7-" "22-"
[5,] "8-" "16-"
Note that here the minus is behind the numbers, not in front. So I came up with the following, which gives the desired output:
> apply(cars[1:5,], 2, function(x) paste0("-", x))
speed dist
[1,] "-4" "-2"
[2,] "-4" "-10"
[3,] "-7" "-4"
[4,] "-7" "-22"
[5,] "-8" "-16"
However, this got me wondering: Is there a way to directly specify the position of the minus or, conversely, the position of the margin values in the paste function?
The syntax of paste0 is paste0(..., collapse = NULL), i.e. it takes its arguments in the order of their appearance and pastes them together. The syntax of apply is apply(X, MARGIN, FUN, ...), where ... stands for additional arguments that are passed to FUN (here paste0) after the subsetted element from X, in positions 2, 3 and so on. Because apply always passes x in first place, there is no way around the anonymous function.
That is, the argument must be FUN = function(x) paste0("-", x) to force paste0 to put the "-" first.
You can try using some regex
> sapply(cars[1:5,], function(x) sub("(.*)", "-\\1", x)) # in front
speed dist
[1,] "-4" "-2"
[2,] "-4" "-10"
[3,] "-7" "-4"
[4,] "-7" "-22"
[5,] "-8" "-16"
> sapply(cars[1:5,], function(x) sub("(.*)", "\\1-", x)) # behind
speed dist
[1,] "4-" "2-"
[2,] "4-" "10-"
[3,] "7-" "4-"
[4,] "7-" "22-"
[5,] "8-" "16-"
> sapply(cars[1:5,], function(x) sub("(.{1})(.*)", "\\1-\\2", x)) # between
speed dist
[1,] "4-" "2-"
[2,] "4-" "1-0"
[3,] "7-" "4-"
[4,] "7-" "2-2"
[5,] "8-" "1-6"
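A different technique from paste0, sketched here as an alternative and not taken from the answers above, is sprintf. Its format string is the named argument fmt, so when fmt is passed by name through apply, the column values fill the %s placeholder wherever it sits in the string. The names front and behind are illustrative.

```r
# apply() passes each column as the first unnamed argument; because fmt is
# given by name, the column ends up in sprintf()'s "..." and fills %s
front  <- apply(cars[1:5, ], 2, sprintf, fmt = "-%s")
behind <- apply(cars[1:5, ], 2, sprintf, fmt = "%s-")

front
behind
```

The position of the value is then controlled by the format string alone, without an anonymous function.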

Using R `outer` with `%in%` operator

I am trying to perform the following outer operation:
x <- c(1, 11)
choices <- list(1:10, 10:20)
outer(x, choices, FUN=`%in%`)
I expect the following matrix:
[,1] [,2]
[1,] TRUE FALSE
[2,] FALSE TRUE
which would correspond to the following operations:
outer(x, choices, FUN=paste, sep=" %in% ")
[,1] [,2]
[1,] "1 %in% 1:10" "1 %in% 10:20"
[2,] "11 %in% 1:10" "11 %in% 10:20"
But for some reason I am getting:
[,1] [,2]
[1,] FALSE FALSE
[2,] FALSE FALSE
What is happening?
As expressed in the comments, the table argument of match (the function called by %in%) isn't intended to be a list (if it is, it gets coerced to a character vector). You should use vapply:
vapply(choices,function(y) x %in% y,logical(length(x)))
# [,1] [,2]
#[1,] TRUE FALSE
#[2,] FALSE TRUE
Another way, close to your train of thought, would be to use expand.grid() to create the combinations and then apply %in% to the two columns pairwise with mapply, i.e.
d1 <- expand.grid(x, choices)
matrix(mapply(`%in%`, d1$Var1, d1$Var2), nrow = length(x))
#or you can use Map(`%in%`, ...) in order to keep results in a list
OR
As @nicola suggests, in order to make things better,
d1 <- expand.grid(list(x), choices)
mapply(`%in%`, d1$Var1, d1$Var2)
both giving,
[,1] [,2]
[1,] TRUE FALSE
[2,] FALSE TRUE
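To stay with outer itself, the function can be wrapped in Vectorize so that each value is tested against one list element at a time. This is a sketch under that assumption; the helper name in_each and its argument names are made up for illustration.

```r
x <- c(1, 11)
choices <- list(1:10, 10:20)

# outer() calls FUN once on the fully expanded vectors, so %in% receives a
# list as its table argument and match() converts that list to character
plain <- outer(x, choices, FUN = `%in%`)

# Vectorize() wraps the function in mapply(), so every call sees a single
# value and a single element of the list
in_each <- Vectorize(function(value, set) value %in% set)
fixed <- outer(x, choices, FUN = in_each)

plain
fixed
```

The vapply answer above does the same work more directly; the wrapper is only useful when the outer call shape should be kept.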

How to edit elements of a 2xn matrix along the rows

I am having trouble writing a for loop that takes the following matrix:
> cd
[,1] [,2]
[1,] -142.5066 -132.9431
[2,] -161.6038 -166.9276
and relabels elements of the second row, column by column. Specifically, I want to do the following:
if cd[1,k] > cd[2,k], then I want cd[2,k] to be 'STOP'
else cd[2,k] stays as it is
My current for loop over the k columns is:
for(k in 1:2){
if(cd[1,k]>cd[2,k]){
cd[2,k]<-'STOP'
}else{
cd[2,k]<-cd[2,k]
}
print(cd)
}
The output is the following:
[,1] [,2]
[1,] "-142.50660967154" "-132.943085827163"
[2,] "STOP" "-166.92760911847"
[,1] [,2]
[1,] "-142.50660967154" "-132.943085827163"
[2,] "STOP" "-166.92760911847"
Essentially, after running the loop, I want the result to be:
> cd
[,1] [,2]
[1,] -142.5066 -132.9431
[2,] STOP STOP
Thank you again.
In R, all elements of a matrix have to be the same type. At first all the elements are numeric, so the first comparison works, but once one element is set to "STOP" the whole matrix becomes character, and "-3" is less than "-4" when compared as strings. Your options are to use a data frame instead of a matrix, or to use as.numeric(cd[1,k])...
> cd <- matrix(c(-1,-2,-3,-4), nrow = 2)
> for(k in 1:2){
+ if(as.numeric(cd[1,k])>as.numeric(cd[2,k])){
+ cd[2,k]<-'STOP'
+ } else{cd[2,k]<-cd[2,k]}
+ print(cd)
+ }
[,1] [,2]
[1,] "-1" "-3"
[2,] "STOP" "-4"
[,1] [,2]
[1,] "-1" "-3"
[2,] "STOP" "STOP"
A matrix can hold only one data type, so once "STOP" is assigned the numeric values are coerced to character. If you want separate types, use a data.frame. If a character matrix is acceptable, a vectorized ifelse replaces the loop:
cd <- matrix(c(-142.5066, -132.9431, -161.6038, -166.9276), nrow = 2, byrow = TRUE)
cd[2, ] <- ifelse(cd[1,] > cd[2, ], "STOP", cd[2,])
cd
[,1] [,2]
[1,] "-142.5066" "-132.9431"
[2,] "STOP" "STOP"
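If the numbers are still needed afterwards, one option is to leave the matrix numeric and keep the result of the comparison in a separate vector. This is a minimal sketch of that idea, not something proposed in the answers; stop_flag and labels are illustrative names.

```r
cd <- matrix(c(-142.5066, -132.9431, -161.6038, -166.9276), nrow = 2, byrow = TRUE)

# Numeric comparison of row 1 against row 2, one logical value per column
stop_flag <- cd[1, ] > cd[2, ]

# Build display labels without touching cd, so no coercion happens
labels <- ifelse(stop_flag, "STOP", format(cd[2, ]))

is.numeric(cd)
stop_flag
labels
```

The comparison stays numeric throughout, so the string-ordering problem from the loop cannot occur.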
Instead of using a loop, you could try the following (the comparison is between row 1 and row 2, and only row 2 is overwritten):
cd[2, cd[1, ] > cd[2, ]] <- 'STOP'
cd
[,1] [,2]
[1,] "-142.5066" "-132.9431"
[2,] "STOP" "STOP"

How to use some apply function to solve what requires two for-loops in R

I have a matrix, named "mat", and a smaller matrix, named "center".
temp = c(1.8421,5.6586,6.3526,2.904,3.232,4.6076,4.8,3.2909,4.6122,4.9399)
mat = matrix(temp, ncol=2)
[,1] [,2]
[1,] 1.8421 4.6076
[2,] 5.6586 4.8000
[3,] 6.3526 3.2909
[4,] 2.9040 4.6122
[5,] 3.2320 4.9399
center = matrix(c(3, 6, 3, 2), ncol=2)
[,1] [,2]
[1,] 3 3
[2,] 6 2
I need to compute the distance (here the squared Euclidean distance) between each row of mat and every row of center. For example, the distance between mat[1,] and center[1,] can be computed as
diff = mat[1,]-center[1,]
t(diff)%*%diff
[,1]
[1,] 3.92511
Similarly, I can find the distance of mat[1,] and center[2,]
diff = mat[1,]-center[2,]
t(diff)%*%diff
[,1]
[1,] 24.08771
Repeat this process for each row of mat, I will end up with
[,1] [,2]
[1,] 3.925110 24.087710
[2,] 10.308154 7.956554
[3,] 11.324550 1.790750
[4,] 2.608405 16.408805
[5,] 3.817036 16.304836
I know how to implement it with for-loops. I was really hoping someone could tell me how to do it with some kind of an apply() function, maybe mapply() I guess.
Thanks
apply(center, 1, function(x) colSums((x - t(mat)) ^ 2))
# [,1] [,2]
# [1,] 3.925110 24.087710
# [2,] 10.308154 7.956554
# [3,] 11.324550 1.790750
# [4,] 2.608405 16.408805
# [5,] 3.817036 16.304836
If you want apply for the expressiveness of the code, that's one thing, but it's still looping, just with different syntax. This can be done without any loops, or with a very small one across center instead of mat. I'd just transpose first, because it's wise to get into the habit of moving as much work as possible out of the apply statement. (The BrodieG answer is pretty much identical in function.) These work because R automatically recycles the smaller vector along the matrix, and does so much faster than apply or for.
tm <- t(mat)
apply(center, 1, function(m){
colSums((tm - m)^2) })
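The fully loop-free variant mentioned above is not spelled out, so here is one possible sketch. It assumes the standard expansion of the squared distance, |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, which is a swapped-in technique and not taken from the answer; sq_dist and via_apply are illustrative names.

```r
temp <- c(1.8421, 5.6586, 6.3526, 2.904, 3.232, 4.6076, 4.8, 3.2909, 4.6122, 4.9399)
mat <- matrix(temp, ncol = 2)
center <- matrix(c(3, 6, 3, 2), ncol = 2)

# Squared norms of every row, combined pairwise, minus twice the dot products
sq_dist <- outer(rowSums(mat^2), rowSums(center^2), "+") - 2 * mat %*% t(center)

# Same result with the small apply loop across center, for comparison
via_apply <- apply(center, 1, function(m) colSums((t(mat) - m)^2))

all.equal(sq_dist, via_apply)
```

Everything here is a matrix operation, so the cost is dominated by one matrix product. Rounding can differ in the last digits from the direct subtraction, which is why the comparison uses all.equal.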
Use dist and then extract the relevant submatrix:
ix <- 1:nrow(mat)
as.matrix( dist( rbind(mat, center) )^2 )[ix, -ix]
# 6 7
# 1 3.925110 24.087710
# 2 10.308154 7.956554
# 3 11.324550 1.790750
# 4 2.608405 16.408805
# 5 3.817036 16.304836
REVISION: simplified slightly.
You could use outer as well
d <- function(i, j) sum((mat[i, ] - center[j, ])^2)
outer(1:nrow(mat), 1:nrow(center), Vectorize(d))
This will solve it
t(apply(mat,1,function(row){
d1<-sum((row-center[1,])^2)
d2<-sum((row-center[2,])^2)
return(c(d1,d2))
}))
Result:
[,1] [,2]
[1,] 3.925110 24.087710
[2,] 10.308154 7.956554
[3,] 11.324550 1.790750
[4,] 2.608405 16.408805
[5,] 3.817036 16.304836

Resources