R loop over two or more vectors simultaneously - parallel - r

I was looking for a method to iterate over two or more character vectors/lists in R simultaneously, e.g. is there some way to do something like:
foo <- c('a','c','d')
bar <- c('aa','cc','dd')
for(i in foo){
print(foo[i], bar[i])
}
Desired result:
'a', 'aa'
'c', 'cc'
'd', 'dd'
In Python we can do simply:
foo = ('a', 'c', 'd')
bar = ('aa', 'cc', 'dd')
for i, j in zip(foo, bar):
    print(i, j)
But can we do this in R?

Like this?
foo <- c('a','c','d')
bar <- c('aa','cc','dd')
for (i in 1:length(foo)){
print(c(foo[i],bar[i]))
}
[1] "a" "aa"
[1] "c" "cc"
[1] "d" "dd"
Works under the condition that the vectors are the same length.

In R, you would rather iterate over indices than over the vectors directly:
for (i in seq_len(min(length(foo), length(bar)))){
print(c(foo[i], bar[i]))  # print() takes one object, so combine the pair with c()
}

Another option is to use mapply. This wouldn't make a lot of sense for printing, but I'm assuming you have an interest in doing this for something more interesting than print.
foo <- c('a','c','d')
bar <- c('aa','cc','dd')
invisible(
mapply(function(f, b){ print(c(f, b))},
foo, bar)
)
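If the goal is to collect one result per pair rather than print it, a minimal sketch along the same lines might look like the following. The join_pair helper and the "-" separator are made up for illustration; USE.NAMES = FALSE only drops the names that mapply would otherwise take from foo.

```r
foo <- c("a", "c", "d")
bar <- c("aa", "cc", "dd")

# Hypothetical helper: combine one element from each vector
join_pair <- function(f, b) paste0(f, "-", b)

# mapply walks both vectors in parallel, much like Python's zip()
joined <- mapply(join_pair, foo, bar, USE.NAMES = FALSE)
print(joined)
```

Because the function returns a single string each time, mapply simplifies the result to a plain character vector.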

Maybe someone arriving based on the title makes good use of this:
foo<-LETTERS[1:10]
bar<-LETTERS[1:3]
i = 0
for (j in 1:length(foo)){
i = i + 1
if (i > length(bar)){
i = 1
}
print(paste(foo[j],bar[i]) )
}
[1] "A A"
[1] "B B"
[1] "C C"
[1] "D A"
[1] "E B"
[1] "F C"
[1] "G A"
[1] "H B"
[1] "I C"
[1] "J A"
which is "equivalent" to the following (although using for makes assignments easier):
suppressWarnings(invisible(
mapply(function(x, y){
print(paste(x, y))},
foo, bar)
))
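If only the recycled pairing is needed, a sketch that avoids both the counter and the warning could make the recycling explicit with rep_len. The names bar_recycled and pairs are introduced here for illustration only.

```r
foo <- LETTERS[1:10]
bar <- LETTERS[1:3]

# Repeat the shorter vector until it is as long as the longer one
bar_recycled <- rep_len(bar, length(foo))

# paste is vectorised, so no loop is needed
pairs <- paste(foo, bar_recycled)
print(pairs)
```

This assumes foo is the longer vector; with the lengths swapped, rep_len would truncate bar instead of extending it.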

Related

How to disambiguate repeated strings by appending varying length strings?

I saw the clever code submitted by Gabor G. in response to this question about disambiguation of strings. His answer, slightly modified, is:
uniqName <- function(x){
thenames <- ave(x,x,FUN = function(z){
znam <- if (length(z) == 1) z else sprintf("%s%02d", z, seq_along(z))
return(znam)
})
return(thenames)
}
I wanted to go for an "invisible" version of that, and tried to come up with a compact function that would append N spaces to the (N+1)th occurrence of a name.
(Gabor's code calculates an integer and appends that, so the number of characters appended is constant). The best I could do was the following clunky function ("fatit")
spacify <- function (x){
fatit <-function(x){
k = vector(length=length(x))
for(jp in 1:length(x)){
k[jp]=sprintf('%s%s',x[jp],paste0(rep(' ',jp),collapse=''))
}
return(k)
}
spaceOut <- ave(x,x, FUN = function(z) if (length(z) == 1) z else fatit(z) )
return(spaceOut)
}
Is there some cleaner, more compact, way to set the number of characters to append based on length(z) in the fatit function ?
Note:
foo <- c('a', 'b', 'c', 'a', 'b', 'a', 'c', 'd', 'e')
uniqName(foo)
[1] "a01" "b01" "c01" "a02" "b02" "a03" "c02" "d" "e"
spacify(foo)
[1] "a " "b " "c " "a  " "b  " "a   " "c  " "d" "e"
We can take advantage of make.unique by stripping the numbers that make the strings unique, and using them (... + 1) as the number of characters to append, i.e.
i1 <- as.numeric(gsub('\\D+', '', make.unique(x)))
i1[is.na(i1)] <- 0 #because where there is no number it returns NA
paste0(x, sapply(i1 + 1, function(i) paste(rep(' ', each = i), collapse = '')))
#[1] "a " "b " "c " "a  " "b  " "a   " "c  " "d " "e "
We can take advantage of the stri_pad_right function from stringi:
library(stringi)
f <- function(x){
ave(x, x, FUN = function(z){
if(length(z) == 1) z else stri_pad_right(z, nchar(z[1]) + seq_along(z))
})
}
x <- c('a', 'b', 'c', 'a', 'b', 'a', 'c', 'd', 'e')
f(x)
# [1] "a " "b " "c " "a  " "b  " "a   " "c  " "d" "e"
Using stringr::str_pad(..., side = 'right') is conceptually similar.
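For a version without extra packages, a rough base-R sketch could compute the occurrence number of each element with ave and build the padding with strrep (available from R 3.3.0). The variable names are illustrative, and it assumes the same rule as spacify: names that occur only once are left untouched.

```r
x <- c('a', 'b', 'c', 'a', 'b', 'a', 'c', 'd', 'e')

# Position of each element within its group of identical names (1, 2, 3, ...)
occurrence <- ave(seq_along(x), x, FUN = seq_along)

# Size of the group each element belongs to
group_size <- ave(seq_along(x), x, FUN = length)

# Singletons get no padding; repeated names get one space per occurrence
pad <- ifelse(group_size == 1, 0, occurrence)
result <- paste0(x, strrep(" ", pad))
print(result)
```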

How do I replicate a nested for loop with mapply?

I would like to vectorize the creation of a list in R, but can only get what I want with a nested for loop. I've included a vastly simplified version of my problem for reproducibility. Can someone help me to modify or replace my mapply function?
Desired functionality:
my_list <- list()
A <- c("one", "two", "three", "four")
B <- c("left", "right")
for (a in A) {
for (b in B) {
my_list <- c(my_list, paste(a, b))
}
}
print(my_list)
output (edited white space for brevity):
[[1]] [1] "one left"
[[2]] [1] "one right"
[[3]] [1] "two left"
[[4]] [1] "two right"
[[5]] [1] "three left"
[[6]] [1] "three right"
[[7]] [1] "four left"
[[8]] [1] "four right"
My attempt to vectorize this:
combinate <- function(a, b) {
return(paste(a, b))
}
mapply(combinate, a=A, b=B, SIMPLIFY=FALSE)
output:
$one [1] "one left"
$two [1] "two right"
$three [1] "three left"
$four [1] "four right"
I'm not concerned about labels; I'm concerned about getting all eight results from looping over both lists. I have found documentation that mapply is doing exactly what it is supposed to by pairing the first items from both lists, then the second items from both lists, etc. repeating shorter lists. But after much searching, I can't find what must be there, a way to pair all list items combinatorically like the nested for loop.
We can do this with expand.grid and paste (note that the first vector varies fastest here, so the order differs from the nested loop)
v1 <- do.call(paste, expand.grid(A, B))
Or with outer
v1 <- c(outer(A, B, paste))
If these need to be in a list
as.list(v1)
Checking against the OP's output (the t() puts the elements in the nested-loop order)
identical(as.list( c(t(outer(A, B, paste)))), my_list)
#[1] TRUE
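If the nested-loop order matters and the transpose feels indirect, one possible sketch is to repeat each vector explicitly before pasting. The names nested_order and result are introduced here for illustration.

```r
A <- c("one", "two", "three", "four")
B <- c("left", "right")

# Outer loop variable: each element of A repeated once per element of B
# Inner loop variable: all of B repeated once per element of A
nested_order <- paste(rep(A, each = length(B)), rep(B, times = length(A)))

result <- as.list(nested_order)
print(nested_order)
```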

How to use sub on a vector in R?

Consider this R code and output:
> the_string <- "a, b, c"
> the_vec <- strsplit(the_string, ",")
> str(the_vec)
List of 1
$ : chr [1:3] "a" " b" " c"
> str(sub("^ +", "", the_vec))
chr "c(\"a\", \" b\", \" c\")"
Looks like sub returns a single character array instead of a vector of character arrays. I'm hoping for:
chr [1:3] "a" "b" "c"
How do I get that?
Edit: the_string will come from users, so I want to tolerate a variable number of spaces, zero to many.
Edit: the tokens may have spaces in the middle that should be preserved. So, "a, b c,d" should result in c('a', 'b c', 'd').
the_string <- "a, b, c"
the_vec <- unlist(strsplit(the_string, ", "))
If you add the space after the comma and unlist the entire thing you get the vector.
Update:
If the string has a varying amount of space between characters, I would remove all of the excess spaces and then run the same as above. I chose 5 but maybe your string has more. Also I added a second step to split characters that do not have a comma in between characters.
a <- "a, b, c, d, e, f g, h,i"
a <- gsub("( {2,5})", " ",a)
a <- unlist(strsplit(a, ", |,"))
unlist(strsplit(a, " "))
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i"
strsplit creates a list in which each element is a vector holding the pieces of the corresponding item in the original vector, e.g.:
strsplit( c("a, b, c", "d, e"), ",")
[[1]]
[1] "a" " b" " c"
[[2]]
[1] "d" " e"
Here you only have one item in the input vector, so the result is all in the first item of the list:
the_string <- "a, b, c"
the_list <- strsplit(the_string, ",")
sub("^ +", "", the_list[[1]])
[1] "a" "b" "c"
If you don't use [[1]] or unlist, the_list is coerced to a character vector using as.character:
as.character(the_list)
[1] "c(\"a\", \" b\", \" c\")"
One base-R solution
lapply(the_vec, function(x) sub("^ +", "", x))[[1]]
[1] "a" "b" "c"
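To handle a variable number of spaces while keeping spaces inside tokens, as the question's edits ask, one option is to split on the comma alone and trim each piece afterwards. This is only a sketch: it assumes R 3.2.0 or later for trimws, and the input string is made up to mirror the question's example.

```r
the_string <- "a,   b c,d"

# Split on commas only, then strip leading/trailing whitespace from each token
tokens <- trimws(strsplit(the_string, ",", fixed = TRUE)[[1]])
print(tokens)
```

Spaces in the middle of a token, as in "b c", are left alone because trimws only touches the ends.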

Concatenating groups of vector character elements

I don't know the proper technical terms for this kind of operation, so it has been difficult to search for existing solutions. I thought I would try to post my own question and hopefully someone can help me out (or point me in the right direction).
I have a vector of characters and I want to collect them in groups of twos and threes. To illustrate, here is a simplified version:
The table I have:
"a"
"b"
"c"
"d"
"e"
"f"
I want to run through the vector and concatenate groups of two and three elements. This is the end result I want:
"a b"
"b c"
"c d"
"d e"
"e f"
And
"a b c"
"b c d"
"c d e"
"d e f"
I solved this the simplest and dirtiest way possible by using for-loops, but it takes a long time to run and I am convinced it can be done more efficiently.
Here is my ghetto-hack:
t1 <- c("a", "b", "c", "d", "e", "f")
t2 <- rep("", length(t1)-1)
# seq_len() avoids a precedence trap: 1:length(t1)-1 parses as (1:length(t1))-1
for (i in seq_len(length(t1)-1)) {
t2[i] = paste(t1[i], t1[i+1])
}
t3 <- rep("", length(t1)-2)
for (i in seq_len(length(t1)-2)) {
t3[i] = paste(t1[i], t1[i+1], t1[i+2])
}
I was looking into sapply and tapply etc. but I can't seem to figure out how to use "the following element" in the vector.
Any help will be rewarded with my eternal gratitude!
-------------- Edit --------------
Run times of the suggestions using input data with ~ 3 million rows:
START: [1] "2016-11-20 19:24:50 CET"
For-loop: [1] "2016-11-20 19:28:26 CET"
rollapply: [1] "2016-11-20 19:38:55 CET"
apply(matrix): [1] "2016-11-20 19:42:15 CET"
paste t1[-length...]: [1] "2016-11-20 19:42:37 CET"
grep: [1] "2016-11-20 19:44:30 CET"
Have you considered the zoo package? For example
library('zoo')
input<-c('a','b','c','d','e','f')
output<-rollapply(data=input, width=2, FUN=paste, collapse=" ")
output
will return
"a b" "b c" "c d" "d e" "e f"
The width argument controls how many elements to concatenate. I expect you'll have improved runtimes here too, but I haven't tested this.
For groups of two, we can do this with
paste(t1[-length(t1)], t1[-1])
#[1] "a b" "b c" "c d" "d e" "e f"
and for higher numbers, one option is shift from data.table
library(data.table)
v1 <- do.call(paste, shift(t1, 0:2, type="lead"))
grep("NA", v1, invert=TRUE, value=TRUE)
#[1] "a b c" "b c d" "c d e" "d e f"
Or
n <- length(t1)
n1 <- 3
apply(matrix(t1, ncol=n1, nrow = n+1)[seq(n-(n1-1)),], 1, paste, collapse=' ')
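The shifted-vector idea used for pairs above can be generalised to groups of any size without extra packages. The ngrams function below is a hypothetical helper written for this sketch, not something from base R or the answers above.

```r
# Hypothetical helper: paste together every run of n consecutive elements of x
ngrams <- function(x, n) {
  stopifnot(n >= 1, length(x) >= n)
  k <- length(x) - n + 1  # number of groups
  # n shifted copies of x, each of length k: x[1:k], x[2:(k+1)], ...
  pieces <- lapply(seq_len(n), function(j) x[j:(j + k - 1)])
  do.call(paste, pieces)
}

t1 <- c("a", "b", "c", "d", "e", "f")
t2 <- ngrams(t1, 2)
t3 <- ngrams(t1, 3)
print(t2)
print(t3)
```

Since paste is vectorised over the shifted copies, this should scale similarly to the paste(t1[-length(t1)], t1[-1]) approach, though that has not been timed here.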

Pasting two strings using paste function and its collapse argument

I am trying to paste two vectors
vector_1 <- c("a", "b")
vector_2 <- c("x", "y")
paste(vector_1, vector_2, collapse = " + ")
The output I get is
"a x + b y"
My desired output is
"a + b + x + y"
paste with more than one argument will paste together term-by-term.
> paste(c("a","b","c"),c("A","B","C"))
[1] "a A" "b B" "c C"
the result being the length of the longest vector, with the shorter term recycled. That enables things like this to work:
> paste("A",c("1","2","BBB"))
[1] "A 1" "A 2" "A BBB"
> paste(c("1","2","BBB"),"A")
[1] "1 A" "2 A" "BBB A"
then sep is used within the elements and collapse to join the elements.
> paste(c("a","b","c"),c("A","B","C"))
[1] "a A" "b B" "c C"
> paste(c("a","b","c"),c("A","B","C"),sep="+")
[1] "a+A" "b+B" "c+C"
> paste(c("a","b","c"),c("A","B","C"),sep="+",collapse="#")
[1] "a+A#b+B#c+C"
Note that once you use collapse you get a single result rather than three.
You seem to not want to combine your two vectors element-wise, so you need to turn them into one vector, which you can do with c(), giving us the solution:
> c(vector_1, vector_2)
[1] "a" "b" "x" "y"
> paste(c(vector_1, vector_2), collapse=" + ")
[1] "a + b + x + y"
Note that sep isn't needed - you are just collapsing the individual elements into one string.
