Say I want to add a minus sign (-) in front of all values in both columns of the data frame datasets::cars using apply:
> apply(cars[1:5,], 2, paste0, "-")
speed dist
[1,] "4-" "2-"
[2,] "4-" "10-"
[3,] "7-" "4-"
[4,] "7-" "22-"
[5,] "8-" "16-"
Note that here the minus sign ends up behind the numbers, not in front. So I came up with the following, which gives the desired output:
> apply(cars[1:5,], 2, function(x) paste0("-", x))
speed dist
[1,] "-4" "-2"
[2,] "-4" "-10"
[3,] "-7" "-4"
[4,] "-7" "-22"
[5,] "-8" "-16"
However, this got me wondering: Is there a way to directly specify the position of the minus or, conversely, the position of the margin values in the paste function?
The syntax of paste0 is paste0(..., collapse = NULL), i.e. it takes its arguments in order of appearance and pastes them together. The syntax of apply is apply(X, MARGIN, FUN, ...), where ... stands for additional arguments that are passed to FUN (here paste0) after the subsetted element of X, in positions 2, 3 and so on. Because apply always passes x in first place, and paste0 has no named argument for the strings to paste, there is no way around the anonymous function with paste0.
I.e. the argument must be FUN = function(x) paste0("-", x) to force paste0 to put the "-" first.
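If a function other than paste0 is acceptable, the anonymous function can be avoided. A minimal sketch, assuming sprintf is a fair substitute: its format argument is named fmt, so it can be supplied by name through apply's ..., and the column that apply passes first then fills the %s placeholder. The names via_sprintf and m are only illustrative.

```r
# sprintf's format is matched by name, so the column passed by apply fills %s
via_sprintf <- apply(cars[1:5, ], 2, sprintf, fmt = "-%s")
via_sprintf

# Vectorised alternative without apply: assigning into m[] keeps the matrix shape
m <- as.matrix(cars[1:5, ])
m[] <- paste0("-", m)
m
```

Moving the "%s" inside fmt (for example "%s-") changes where the value lands, which is the positional control the question asks about.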
You can try using some regex:
> sapply(cars[1:5,], function(x) sub("(.*)", "-\\1", x)) # in front
speed dist
[1,] "-4" "-2"
[2,] "-4" "-10"
[3,] "-7" "-4"
[4,] "-7" "-22"
[5,] "-8" "-16"
> sapply(cars[1:5,], function(x) sub("(.*)", "\\1-", x)) # behind
speed dist
[1,] "4-" "2-"
[2,] "4-" "10-"
[3,] "7-" "4-"
[4,] "7-" "22-"
[5,] "8-" "16-"
> sapply(cars[1:5,], function(x) sub("(.{1})(.*)", "\\1-\\2", x)) # between
speed dist
[1,] "4-" "2-"
[2,] "4-" "1-0"
[3,] "7-" "4-"
[4,] "7-" "2-2"
[5,] "8-" "1-6"
Let's consider a matrix:
example_matrix <- matrix(c("big", "small", "big_something",
"small_really", "small", "big_enough",
"themendous", "big", "small"),ncol = 3, nrow = 3)
> example_matrix
[,1] [,2] [,3]
[1,] "big" "small_really" "themendous"
[2,] "small" "small" "big"
[3,] "big_something" "big_enough" "small"
And some vector:
group_vector <- c("group1_big", "group2_small")
This vector shows which words in the matrix should be replaced by their group-prefixed versions. We should end up with:
[,1] [,2] [,3]
[1,] "group1_big" "small_really" "themendous"
[2,] "group2_small" "group2_small" "group1_big"
[3,] "big_something" "big_enough" "group2_small"
i.e. we replaced every "big" in example_matrix with "group1_big" and every "small" with "group2_small", without touching "big_enough" or "small_really" (replacing only exact matches of "big" and "small").
My Idea
Let's consider the first case, i.e. replacing every "big" with "group1_big". My idea was to check which elements of example_matrix end with "big" and add the prefix "group1_" to each of them:
> apply(example_matrix, 2, function(x) endsWith(x, "big"))
[,1] [,2] [,3]
[1,] TRUE FALSE FALSE
[2,] FALSE FALSE TRUE
[3,] FALSE FALSE FALSE
And my idea for how to replace them was the following:
apply(example_matrix, 2, function(x) if endsWith(x, "big") paste0(group_vector[1], x) else x)
So the condition is: if the specific element really ends in "big", we add the prefix; if not, we leave it.
This code, however, produces an error:
Error: unexpected symbol in "apply(example_matrix, 2, function(x) if endsWith"
Do you know what I'm doing wrong and what's the solution to this problem?
Here's one way using str_replace_all from stringr:
example_matrix[] <- stringr::str_replace_all(example_matrix,
setNames(group_vector, sprintf('\\b%s\\b',
sub('group\\d+_', '', group_vector))))
example_matrix
# [,1] [,2] [,3]
#[1,] "group1_big" "small_really" "themendous"
#[2,] "group2_small" "group2_small" "group1_big"
#[3,] "big_something" "big_enough" "group2_small"
To understand this, break it down into smaller steps.
sub removes 'group', the number and the underscore from group_vector:
sub('group\\d+_', '', group_vector)
#[1] "big" "small"
We add word boundaries (\b) around each word so that the pattern matches only the whole word: 'big' matches, but 'big_something' does not, because the underscore counts as a word character.
sprintf('\\b%s\\b', sub('group\\d+_', '', group_vector))
#[1] "\\bbig\\b" "\\bsmall\\b"
Now create a named vector that can be used in str_replace_all:
setNames(group_vector, sprintf('\\b%s\\b', sub('group\\d+_', '', group_vector)))
# \\bbig\\b \\bsmall\\b
# "group1_big" "group2_small"
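As for the error in the question: `if` needs its condition in parentheses, and it expects a single TRUE/FALSE rather than a whole column, so it is not the right tool inside apply here. Note also that endsWith(x, "big") would match any longer word ending in "big". A base R sketch of an exact-match replacement, assuming the words to replace are whatever follows the "groupN_" prefix (the names targets, idx, hit and result are illustrative):

```r
example_matrix <- matrix(c("big", "small", "big_something",
                           "small_really", "small", "big_enough",
                           "themendous", "big", "small"), ncol = 3, nrow = 3)
group_vector <- c("group1_big", "group2_small")

# Words to look for: the part of each label after the "groupN_" prefix
targets <- sub("^group\\d+_", "", group_vector)

# For each cell, the index of the exactly equal target (NA if there is none)
idx <- match(example_matrix, targets)

# Replace only the cells that matched; everything else is left untouched
result <- example_matrix
hit <- !is.na(idx)
result[hit] <- group_vector[idx[hit]]
result
```

Because match compares whole strings, "big_something" and "small_really" are never candidates, so no regular expression is needed.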
I am trying to get this simple 'for loop' to work. I can't get F4 to be a 6848x2 matrix. I just want to divide each row of one matrix by the corresponding row of another. Here's what I have...
> dim(F3)
[1] 6848 2
> head(F3)
[,1] [,2]
[1,] 140.9838 516.0239
[2,] 140.9838 516.0239
[3,] 140.9838 516.0239
[4,] 140.9838 516.0239
[5,] 140.9838 516.0239
[6,] 175.5093 515.2280
> dim(scale)
[1] 6848 1
F4 <- matrix(, nrow = nrow(F1), ncol = 1)
for (i in 1:t){
F4[i,]<-(F3[i]/scale[i])} #ONLY WANT F3(i) ROW TO BE DIVIDED BY SCALE(i) ROW
> dim(F4) #DOESN'T GIVE ME 6848x2 Matrix
[1] 6848 1
No need to use a for loop here. Here is a vectorized solution:
F3/as.vector(scale) ## BAD! use of built-in function "scale" as a variable!
Example:
mat <- matrix(1:8,4,2)
sx <- matrix(1:4,4,1)
mat /as.vector(sx)
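as.vector drops the dimensions of the one-column matrix, so it is recycled down each column of the larger matrix; dividing the two matrices directly fails because their dimensions differ.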
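To make the recycling concrete, here is the same toy example with the result checked against sweep, which states the row-wise intent explicitly. This is only a sketch; res, res_sweep and direct are illustrative names.

```r
mat <- matrix(1:8, 4, 2)
sx  <- matrix(1:4, 4, 1)

# Each row i of mat is divided by sx[i]; the vector is recycled column by column
res <- mat / as.vector(sx)
res

# Equivalent formulation: sweep the per-row values out along margin 1 (rows)
res_sweep <- sweep(mat, 1, as.vector(sx), "/")

# Dividing the matrices directly fails because a 4x2 and a 4x1 array do not conform
direct <- tryCatch(mat / sx, error = function(e) e)
inherits(direct, "error")
```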
I have a matrix, named "mat", and a smaller matrix, named "center".
temp = c(1.8421,5.6586,6.3526,2.904,3.232,4.6076,4.8,3.2909,4.6122,4.9399)
mat = matrix(temp, ncol=2)
[,1] [,2]
[1,] 1.8421 4.6076
[2,] 5.6586 4.8000
[3,] 6.3526 3.2909
[4,] 2.9040 4.6122
[5,] 3.2320 4.9399
center = matrix(c(3, 6, 3, 2), ncol=2)
[,1] [,2]
[1,] 3 3
[2,] 6 2
I need to compute the (squared Euclidean) distance between each row of mat and every row of center. For example, the distance between mat[1,] and center[1,] can be computed as
diff = mat[1,]-center[1,]
t(diff)%*%diff
[,1]
[1,] 3.92511
Similarly, I can find the distance between mat[1,] and center[2,]:
diff = mat[1,]-center[2,]
t(diff)%*%diff
[,1]
[1,] 24.08771
Repeating this process for each row of mat, I end up with
[,1] [,2]
[1,] 3.925110 24.087710
[2,] 10.308154 7.956554
[3,] 11.324550 1.790750
[4,] 2.608405 16.408805
[5,] 3.817036 16.304836
I know how to implement it with for-loops. I was really hoping someone could tell me how to do it with some kind of an apply() function, maybe mapply() I guess.
Thanks
apply(center, 1, function(x) colSums((x - t(mat)) ^ 2))
# [,1] [,2]
# [1,] 3.925110 24.087710
# [2,] 10.308154 7.956554
# [3,] 11.324550 1.790750
# [4,] 2.608405 16.408805
# [5,] 3.817036 16.304836
If you want apply for expressiveness of code, that's one thing, but it is still looping, just with different syntax. This can be done without any loops, or with a very small loop across center instead of mat. I'd transpose first, because it's wise to get into the habit of moving as much work as possible out of the apply call. (The BrodieG answer is pretty much identical in function.) These work because R automatically recycles the smaller vector along the matrix, and does so much faster than apply or for.
tm <- t(mat)
apply(center, 1, function(m){
colSums((tm - m)^2) })
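For the fully loop-free variant, one possible sketch (not taken from any of the answers here) expands the squared distance as |a|^2 + |b|^2 - 2 a.b and computes all pairs with one outer sum and one matrix product. The name d2 is illustrative, and the expansion can lose a little floating-point precision when points are very close together.

```r
temp <- c(1.8421, 5.6586, 6.3526, 2.904, 3.232, 4.6076, 4.8, 3.2909, 4.6122, 4.9399)
mat <- matrix(temp, ncol = 2)
center <- matrix(c(3, 6, 3, 2), ncol = 2)

# Squared norms of every row, combined pairwise, minus twice the cross products
d2 <- outer(rowSums(mat^2), rowSums(center^2), "+") - 2 * tcrossprod(mat, center)
d2

# Cross-check against the apply-based version from the answer
all.equal(d2, apply(center, 1, function(m) colSums((t(mat) - m)^2)))
```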
Use dist and then extract the relevant submatrix:
ix <- 1:nrow(mat)
as.matrix( dist( rbind(mat, center) )^2 )[ix, -ix]
#           6         7
# 1 3.925110 24.087710
# 2 10.308154 7.956554
# 3 11.324550 1.790750
# 4 2.608405 16.408805
# 5 3.817036 16.304836
REVISION: simplified slightly.
You could use outer as well
d <- function(i, j) sum((mat[i, ] - center[j, ])^2)
outer(1:nrow(mat), 1:nrow(center), Vectorize(d))
This will solve it:
t(apply(mat,1,function(row){
d1<-sum((row-center[1,])^2)
d2<-sum((row-center[2,])^2)
return(c(d1,d2))
}))
Result:
[,1] [,2]
[1,] 3.925110 24.087710
[2,] 10.308154 7.956554
[3,] 11.324550 1.790750
[4,] 2.608405 16.408805
[5,] 3.817036 16.304836
I have two equally long datasets, 'vpXmin' and 'vpXmax', created from 'vp':
> head(vpXmin)
vp
[1,] 253641 2621722
[2,] 253641 2622722
[3,] 253641 2623722
[4,] 253641 2624722
[5,] 253641 2625722
[6,] 253641 2626722
> head(vpXmax)
vp
[1,] 268641 2621722
[2,] 268641 2622722
[3,] 268641 2623722
[4,] 268641 2624722
[5,] 268641 2625722
[6,] 268641 2626722
I want to join the corresponding rows of these datasets using 'rbind', creating a separate matrix for each pair; e.g.
l1<-rbind(vpXmax[1,],vpXmin[1,])
l2<-rbind(vpXmax[2,],vpXmin[2,])
... ...
Even though I'm not familiar with R loops, I want to handle such a large dataset with a loop ... but I failed while trying this:
for (i in 1:length(vp)){rbind(vpXmax[i,],vpXmin[i,])}
Any idea why? Also, please gimme some good references for learning different kinds of loops using R, if any. thanks in advance.
Maybe something like:
vpXmax <- matrix(1:10,ncol=2)
vpXmin <- matrix(11:20,ncol=2)
l <- lapply(1:nrow(vpXmin),function(i) rbind(vpXmax[i,],vpXmin[i,]) )
Then, instead of l1, l2 etc etc you have
l[[1]]
# [,1] [,2]
#[1,] 1 6
#[2,] 11 16
l[[2]]
# [,1] [,2]
#[1,] 2 7
#[2,] 12 17
And although it is probably not ideal, there is one major thing wrong with your initial loop.
You aren't assigning your output, so you need to use assign or <- in some way to actually make an object. However, using assign is pretty much a flag to set off alarm bells that there is a better way to do things, and <- would require pre-allocating or other stuffing around.
Nevertheless, it will work, albeit polluting your work space with l1 l2... ln objects:
for (i in 1:nrow(vpXmax)) {assign(paste0("l",i), rbind(vpXmax[i,],vpXmin[i,]) )}
> l1
# [,1] [,2]
#[1,] 1 6
#[2,] 11 16
> l2
# [,1] [,2]
#[1,] 2 7
#[2,] 12 17
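One more thing worth checking in the original loop is its upper bound. A small sketch, assuming vp is a matrix like the toy data above: length() of a matrix counts all cells rather than rows, so 1:length(vp) would run past the last row, whereas seq_len(nrow(...)) stops at it. The list name pairs is illustrative.

```r
vpXmax <- matrix(1:10, ncol = 2)
vpXmin <- matrix(11:20, ncol = 2)

length(vpXmax)  # number of cells (rows * columns)
nrow(vpXmax)    # number of rows, which is what the loop should iterate over

# Pre-allocate a list and fill it, instead of creating l1, l2, ... with assign
pairs <- vector("list", nrow(vpXmax))
for (i in seq_len(nrow(vpXmax))) {
  pairs[[i]] <- rbind(vpXmax[i, ], vpXmin[i, ])
}
pairs[[2]]
```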
As @ToNoy indicates, it is not obvious what kind of output you want. The easiest way to proceed would be to create a list in which each element is the result of rbind-ing the corresponding rows of the two original data frames.
A <- data.frame("a" = runif(100, -1, 0), "b" = runif(100, 0, 1))
Z <- data.frame("a" = runif(100, -2, -1), "b" = runif(100, 1, 2))
output <- vector("list", nrow(A))
for (i in 1:nrow(A)) {
output[[i]] <- rbind(A[i, ], Z[i, ])
}
I have a matrix "a" like the following:
a<-rbind(c("a1","ost1;ost2;ost3","utr;body;pro"),
c("a2","idh1;idh2","pro;body"),
c("a3","dnm1","body"))
>a
[,1] [,2] [,3]
[1,] "a1" "ost1;ost2;ost3" "utr;body;pro"
[2,] "a2" "idh1;idh2" "pro;body"
[3,] "a3" "dnm1" "body"
I want to get a matrix "b" like this
[,1] [,2] [,3]
[1,] "a1" "ost1" "utr"
[2,] "a1" "ost2" "body"
[3,] "a1" "ost3" "pro"
[4,] "a2" "idh1" "pro"
[5,] "a2" "idh2" "body"
[6,] "a3" "dnm1" "body"
OK, got it:
b<-do.call(rbind, (apply(a, 1, function(x) {do.call(cbind, strsplit(x,";"))})))
Your solution, without the unnecessary parentheses:
do.call(rbind, apply(a, 1, function(x) do.call(cbind, strsplit(x, ";"))))
This also works:
do.call(rbind, lapply(apply(a, 1, strsplit, ';'), do.call, what = cbind))
Not that there is anything wrong with using anonymous functions (function(x){...}), but some people find it more "elegant" without any.
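To see why these one-liners work, it may help to run the steps separately. The sketch below iterates over row indices with lapply instead of apply; that is a swapped-in choice, made on the assumption that you want a list back even when every row happens to split into the same number of pieces (a case in which apply may simplify its result to a matrix). The names pieces, block1 and blocks are illustrative.

```r
a <- rbind(c("a1", "ost1;ost2;ost3", "utr;body;pro"),
           c("a2", "idh1;idh2", "pro;body"),
           c("a3", "dnm1", "body"))

# Step 1: splitting one row gives a list with one character vector per column
pieces <- strsplit(a[1, ], ";")

# Step 2: cbind recycles the single "a1" against the three-element columns
block1 <- do.call(cbind, pieces)
block1

# Step 3: do the same for every row, then stack the resulting blocks
blocks <- lapply(seq_len(nrow(a)), function(i) do.call(cbind, strsplit(a[i, ], ";")))
b <- do.call(rbind, blocks)
b
```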