Avoiding nested loops but iterate over 2 values using (l)apply? - r

I'd like to get better at writing elegant code in R, and am trying to avoid writing nested loops, but cannot figure out an (l)apply solution to my problem.
I have a set of paired files, each of which has two variables associated with them - a name and a number. The filenames are long, so I'd like to generate a vector of filenames that can then be accessed by my own custom downstream function for reading them into a dataframe, plotting, etc.
For example, the files look like:
5_simulationA.k 5_simulationA.b
10_simulationA.k 10_simulationA.b
5_simulationB.k 5_simulationB.b
10_simulationB.k 10_simulationB.b
The ".k" and ".b" files are mates of a pair and must stay together for downstream processing.
I could read in these files by writing a nested loop that would look something like,
K_files = c()
B_files = c()
for (i in c("A", "B", "C")){ # iterate over letter variable
  for (n in c(5, 10, 15)){ # iterate over numbers of the files
    # paste0 joins the pieces without the spaces that paste inserts by default
    k_filename = paste0(n, "_simulation", i, ".k")
    b_filename = paste0(n, "_simulation", i, ".b")
    K_files = c(K_files, k_filename)
    B_files = c(B_files, b_filename)
  }
}
This is of course very ugly and un-R-like. I would love to find a way to do this with the very powerful apply or lapply statements, or any other elegant solutions anyone might have. Thanks!

Base R function outer is meant for this kind of problem.
L <- c("A", "B", "C")
N <- c(5, 10, 15)
f <- function(i, n, e) paste0(n, "_simulation", i, e)
sapply(c(".k", ".b"), function(.e) outer(L, N, f, e = .e))
# .k .b
# [1,] "5_simulationA.k" "5_simulationA.b"
# [2,] "5_simulationB.k" "5_simulationB.b"
# [3,] "5_simulationC.k" "5_simulationC.b"
# [4,] "10_simulationA.k" "10_simulationA.b"
# [5,] "10_simulationB.k" "10_simulationB.b"
# [6,] "10_simulationC.k" "10_simulationC.b"
# [7,] "15_simulationA.k" "15_simulationA.b"
# [8,] "15_simulationB.k" "15_simulationB.b"
# [9,] "15_simulationC.k" "15_simulationC.b"
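Since each row of that matrix is one .k/.b pair, the pairs can be handed to a downstream function without splitting them up. The following is only a sketch: process_pair is a hypothetical stand-in for the asker's own reader, and it simply joins the two names so the pairing is visible.

```r
L <- c("A", "B", "C")
N <- c(5, 10, 15)
f <- function(i, n, e) paste0(n, "_simulation", i, e)

# 9 x 2 character matrix with columns ".k" and ".b"; each row is one pair
files <- sapply(c(".k", ".b"), function(.e) outer(L, N, f, e = .e))

# Hypothetical placeholder for the real downstream function
process_pair <- function(k_file, b_file) paste(k_file, b_file, sep = " <-> ")

# Walk the two columns in parallel so mates always travel together
pairs <- mapply(process_pair, files[, ".k"], files[, ".b"], USE.NAMES = FALSE)
pairs[1]
# [1] "5_simulationA.k <-> 5_simulationA.b"
```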

From OP's example output filenames, it looks like we want all combinations of n and i. expand.grid returns a dataframe of all combinations of ns and is. We could then use apply to loop through its rows to generate the filenames:
i <- c("A", "B", "C")
n <- c(5, 10, 15)
combi <- expand.grid(n = n, i = i)
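invisible(apply(combi, 1, function(x){
  # apply() coerces combi to a character matrix, which pads n to a common width (" 5"), hence trimws()
  k_filename = paste0(trimws(x[1]), "_simulation", x[2], ".k")
  b_filename = paste0(trimws(x[1]), "_simulation", x[2], ".b")
  print(k_filename)
  print(b_filename)
}))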
Notice that I used invisible to suppress the output of apply since we are only interested in the side-effects (read/write files). Alternatively, we can use pwalk from purrr, which takes each column of the same expand.grid dataframe as input and creates the filenames silently:
library(dplyr)
library(purrr)
combi %>%
pwalk(~ {
k_filename = paste0(.x, "_simulation", .y, ".k")
b_filename = paste0(.x, "_simulation", .y, ".b")
print(k_filename)
print(b_filename)
})
Output:
[1] "5_simulationA.k"
[1] "5_simulationA.b"
[1] "10_simulationA.k"
[1] "10_simulationA.b"
[1] "15_simulationA.k"
[1] "15_simulationA.b"
[1] "5_simulationB.k"
[1] "5_simulationB.b"
[1] "10_simulationB.k"
[1] "10_simulationB.b"
[1] "15_simulationB.k"
[1] "15_simulationB.b"
[1] "5_simulationC.k"
[1] "5_simulationC.b"
[1] "10_simulationC.k"
[1] "10_simulationC.b"
[1] "15_simulationC.k"
[1] "15_simulationC.b"
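If the goal is just the two filename vectors from the question, a loop may not be needed at all, because paste0 is vectorized over the columns of the expand.grid result. A minimal sketch, assuming the same i and n as above (stringsAsFactors = FALSE is my addition, to keep i as plain character):

```r
i <- c("A", "B", "C")
n <- c(5, 10, 15)

# One row per (n, i) combination; n varies fastest
combi <- expand.grid(n = n, i = i, stringsAsFactors = FALSE)

# paste0 works on whole columns, so every filename is built in one call
combi$k_filename <- paste0(combi$n, "_simulation", combi$i, ".k")
combi$b_filename <- paste0(combi$n, "_simulation", combi$i, ".b")

K_files <- combi$k_filename
B_files <- combi$b_filename
head(K_files, 3)
# [1] "5_simulationA.k"  "10_simulationA.k" "15_simulationA.k"
```

Keeping both names in the same data frame row also means the .k and .b mates cannot drift apart.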

library(tidyverse)
Type = c("A", "B", "C")
Index = c(5, 10, 15)
crossing(Type, Index) %>%
mutate(k_filename = map2_chr(Index, Type, ~paste(.x, "_simulation", .y, ".k", sep="")),
b_filename = map2_chr(Index, Type, ~paste(.x, "_simulation", .y, ".b", sep=""))) -> names
After that, you can access the k_filename or b_filename using pull
K_files <- names %>% pull(k_filename)

Related

mean for replicate lists in R?

I have simulation and data structures as follows (just a toy example):
foo = function(mu=0,lambda=1){
x1 = rnorm(1,mu) #X~N(μ,1)
y1 = rexp(1,lambda) #Y~Exp(λ)
list(x=x1,y=y1)
}
mu = 1; lambda = 2 #true values: E(X)=μ=1; E(Y)=1/λ=0.5
set.seed(0); out = replicate(1000, foo(mu,lambda), simplify=FALSE)
# str(out)
This gives a list out of length 1000, where each element is itself a list with components x and y.
I want to compute the mean of the 1000 x values and the mean of the 1000 y values.
Of course, I can reach my goal in a clumsy way:
m = c() #for storing simulated values
for(i in 1:2){
s = sapply( 1:1000, function(j)out[[j]][i] )
m[i] = mean( as.numeric(s) )
}
m
# [1] 0.9736922 0.4999028
Is there a simpler and more efficient way to compute the means? I also tried lapply(out, mean)
and Reduce("+",out)/1000, but both failed...
This is another option if the sublists are always the same length:
> rowMeans(matrix(unlist(out),2))
[1] 0.9736922 0.4999028
Or:
> rowMeans(replicate(1000,unlist(foo(mu,lambda))))
x y
0.9736922 0.4999028
An option is to use purrr::transpose
library(purrr)
out %>% transpose() %>% map(~ mean(unlist(.x)[1:1000]))
# Or: out[1:1000] %>% transpose() %>% map(~ mean(unlist(.x)))
#$x
#[1] 0.9736922
#
#$y
#[1] 0.4999028
Or a base R solution using lapply (which is essentially the same as your explicit for loop):
lapply(c("x", "y"), function(var) mean(sapply(out[1:1000], "[[", var)))
#[[1]]
#[1] 0.9736922
#
#[[2]]
#[1] 0.4999028
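On the Reduce attempt from the question: "+" is not defined for lists, so each sublist has to be turned into a numeric vector first. The sketch below shows that variant next to a vapply version; foo is rewritten compactly here and the names m_reduce and m_vapply are only illustrative.

```r
foo <- function(mu = 0, lambda = 1) {
  list(x = rnorm(1, mu), y = rexp(1, lambda))
}
set.seed(0)
out <- replicate(1000, foo(1, 2), simplify = FALSE)

# unlist() turns each sublist into a named numeric vector that "+" can add
m_reduce <- Reduce(`+`, lapply(out, unlist)) / length(out)

# vapply() extracts one component at a time and checks each value is a single number
m_vapply <- vapply(c("x", "y"),
                   function(v) mean(vapply(out, `[[`, numeric(1), v)),
                   numeric(1))

all.equal(m_reduce, m_vapply)
# [1] TRUE
```

Both results are named numeric vectors with elements x and y.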

Combining elements in a string vector with defined element size and accounting for uneven sizes

Given is vector:
vec <- c(LETTERS[1:10])
I would like to be able to combine it in a following manner:
resA <- c("AB", "CD", "EF", "GH", "IJ")
resB <- c("ABCDEF","GHIJ")
where elements of the vector vec are merged together according to the desired size of a new element constituting the resulting vector. This is 2 in the case of resA and 6 in the case of resB.
Desired solution characteristics
The solution should allow for flexibility with respect to the element sizes, i.e. I may want to have vectors with elements of size 2 or 20
There may be not enough elements in the vector to match the desired chunk size, in that case last element should be shortened accordingly (as shown)
This shouldn't make a difference, but the solution should work on words as well
Attempts
Initially, I was thinking of using something on the lines:
c(
paste0(vec[1:2], collapse = ""),
paste0(vec[3:4], collapse = ""),
paste0(vec[5:6], collapse = "")
# ...
)
but this would have to be adapted to jump through the remaining pairs/bigger groups of the vec and handle last group which often would be of a smaller size.
Here is what I came up with. Using Harlan's idea in this question, you can split the vector in different number of chunks. You also want to use your paste0() idea in lapply() here. Finally, you unlist a list.
unlist(lapply(split(vec, ceiling(seq_along(vec)/2)), function(x){paste0(x, collapse = "")}))
# 1 2 3 4 5
#"AB" "CD" "EF" "GH" "IJ"
unlist(lapply(split(vec, ceiling(seq_along(vec)/5)), function(x){paste0(x, collapse = "")}))
# 1 2
#"ABCDE" "FGHIJ"
unlist(lapply(split(vec, ceiling(seq_along(vec)/3)), function(x){paste0(x, collapse = "")}))
# 1 2 3 4
#"ABC" "DEF" "GHI" "J"
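The same split/ceiling idea can be wrapped in a small helper so the chunk size becomes a parameter. This is a sketch; chunk_paste is a name made up for illustration.

```r
chunk_paste <- function(x, n) {
  # Elements 1..n get group 1, n+1..2n get group 2, and so on;
  # the last group is simply shorter when length(x) is not a multiple of n
  groups <- ceiling(seq_along(x) / n)
  unname(vapply(split(x, groups), paste0, character(1), collapse = ""))
}

chunk_paste(LETTERS[1:10], 2)
# [1] "AB" "CD" "EF" "GH" "IJ"
chunk_paste(LETTERS[1:10], 6)
# [1] "ABCDEF" "GHIJ"
chunk_paste(c("AB", "CDE", "F", "GHIJ"), 2)   # words are grouped, not split
# [1] "ABCDE" "FGHIJ"
```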
vec <- c(LETTERS[1:10])
f1 <- function(x, n){
f <- function(x) paste0(x, collapse = '')
regmatches(f(x), gregexpr(f(rep('.', n)), f(x)))[[1]]
}
f1(vec, 2)
# [1] "AB" "CD" "EF" "GH" "IJ"
or
f2 <- function(x, n)
apply(matrix(x, nrow = n), 2, paste0, collapse = '')
f2(vec, 5)
# [1] "ABCDE" "FGHIJ"
or
f3 <- function(x, n) {
f <- function(x) paste0(x, collapse = '')
strsplit(gsub(sprintf('(%s)', f(rep('.', n))), '\\1 ', f(x)), '\\s+')[[1]]
}
f3(vec, 4)
# [1] "ABCD" "EFGH" "IJ"
I would say the last is the best of these, since for the others n must be a factor of length(x) or you will get warnings, recycling, or dropped elements
edit - more
f4 <- function(x, n) {
f <- function(x) paste0(x, collapse = '')
Vectorize(substring, USE.NAMES = FALSE)(f(x), which((seq_along(x) %% n) == 1),
which((seq_along(x) %% n) == 0))
}
f4(vec, 2)
# [1] "AB" "CD" "EF" "GH" "IJ"
or
f5 <- function(x, n)
mapply(function(x) paste0(x, collapse = ''),
split(x, c(0, head(cumsum(rep_len(sequence(n), length(x)) %in% n), -1))),
USE.NAMES = FALSE)
f5(vec, 4)
# [1] "ABCD" "EFGH" "IJ"
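For the character-level variants, substring is already vectorized over its start and stop arguments, so the chunk boundaries can be computed directly. A sketch, assuming a non-empty input; chunk_chars is a hypothetical name:

```r
chunk_chars <- function(x, n) {
  s <- paste0(x, collapse = "")                 # one long string
  starts <- seq(1, nchar(s), by = n)            # first character of each chunk
  ends <- pmin(starts + n - 1, nchar(s))        # clip the last chunk to the string length
  substring(s, starts, ends)
}

chunk_chars(LETTERS[1:10], 4)
# [1] "ABCD" "EFGH" "IJ"
chunk_chars(c("AB", "CDE", "F", "GHIJ"), 2)
# [1] "AB" "CD" "EF" "GH" "IJ"
```

Because the last end position is clipped rather than matched by a pattern, n does not have to divide the number of characters.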
Here is another way, working with the original array.
A side note: working with words is not straightforward, since there are at least two ways to understand it. You can either keep each word intact or collapse the words first and work with individual characters. The next function can deal with both options.
vec <- c(LETTERS[1:10])
vec2 <- c("AB","CDE","F","GHIJ")
cuts <- function(x, n, bychar=F) {
if (bychar) x <- unlist(strsplit(paste0(x, collapse=""), ""))
ii <- seq_along(x)
li <- split(ii, ceiling(ii/n))
return(sapply(li, function(y) paste0(x[y], collapse="")))
}
cuts(vec2,2,F)
# 1 2
# "ABCDE" "FGHIJ"
cuts(vec2,2,T)
# 1 2 3 4 5
# "AB" "CD" "EF" "GH" "IJ"

reassign values in a list without looping

test <- list(a = list("first"= 1, "second" = 2),
b = list("first" = 3, "second" = 4))
In the list above, I would like to reassign the "first" elements to equal, let's say, five. This for loop works:
for(temp in c("a", "b")) {
test[[temp]]$first <- 5
}
Is there a way to do the same using a vectorized operation (lapply, etc)? The following extracts the values, but I can't get them reassigned:
lapply(test, "[[", "first")
Here is a vectorised one-liner using unlist and relist:
relist((function(x) ifelse(grepl("first",names(x)),5,x))(unlist(test)),test)
$a
$a$first
[1] 5
$a$second
[1] 2
$b
$b$first
[1] 5
$b$second
[1] 4
You can do it like this:
test <- lapply(test, function(x) {x$first <- 5; x})
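A related option, shown here only as a sketch, is modifyList from the utils package, which replaces named components and leaves the rest untouched. That can be handy when several components need new values at once.

```r
test <- list(a = list(first = 1, second = 2),
             b = list(first = 3, second = 4))

# modifyList(x, val) overwrites the components of x that are named in val
test <- lapply(test, modifyList, val = list(first = 5))

str(test)
# List of 2
#  $ a:List of 2
#   ..$ first : num 5
#   ..$ second: num 2
#  $ b:List of 2
#   ..$ first : num 5
#   ..$ second: num 4
```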

is there a way to extend LETTERS past 26 characters e.g., AA, AB, AC...?

I use LETTERS most of the time for my factors but today I tried to go beyond 26 characters:
LETTERS[1:32]
I was expecting an automatic extension to AA, AB, AC..., but was disappointed. Is this simply a limitation of LETTERS, or is there a way to get what I'm looking for using another function?
Would 702 be enough?
LETTERS702 <- c(LETTERS, sapply(LETTERS, function(x) paste0(x, LETTERS)))
If not, how about 18,278?
MOAR_LETTERS <- function(n=2) {
n <- as.integer(n[1L])
if(!is.finite(n) || n < 2)
stop("'n' must be a length-1 integer >= 2")
res <- vector("list", n)
res[[1]] <- LETTERS
for(i in 2:n)
res[[i]] <- c(sapply(res[[i-1L]], function(y) paste0(y, LETTERS)))
unlist(res)
}
ml <- MOAR_LETTERS(3)
str(ml)
# chr [1:18278] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" ...
This solution uses recursion. Usage is a bit different in the sense MORELETTERS is not a long vector you will have to store and possibly expand as your inputs get larger. Instead, it is a function that converts your numbers into the new base.
extend <- function(alphabet) function(i) {
base10toA <- function(n, A) {
stopifnot(n >= 0L)
N <- length(A)
j <- n %/% N
if (j == 0L) A[n + 1L] else paste0(Recall(j - 1L, A), A[n %% N + 1L])
}
vapply(i-1L, base10toA, character(1L), alphabet)
}
MORELETTERS <- extend(LETTERS)
MORELETTERS(1:1000)
# [1] "A" "B" ... "ALL"
MORELETTERS(c(1, 26, 27, 1000, 1e6, .Machine$integer.max))
# [1] "A" "Z" "AA" "ALL" "BDWGN" "FXSHRXW"
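One way to sanity-check values like these is to invert the conversion, reading each label as a number whose digits run from 1 to 26. The function below is a sketch for illustration only; letters_to_index is a made-up name and is not part of the answer above.

```r
letters_to_index <- function(s, alphabet = LETTERS) {
  base <- length(alphabet)
  vapply(strsplit(s, ""), function(chars) {
    digits <- match(chars, alphabet)        # "A" -> 1, ..., "Z" -> 26
    powers <- rev(seq_along(digits)) - 1    # leftmost letter gets the highest power
    sum(digits * base^powers)
  }, numeric(1))
}

letters_to_index(c("A", "Z", "AA", "ALL"))
# [1]    1   26   27 1000
```

For example, "ALL" is 1 * 26^2 + 12 * 26 + 12 = 1000.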
You can make what you want like this:
LETTERS2<-c(LETTERS[1:26], paste0("A",LETTERS[1:26]))
Another solution for excel style column names, generalized to any number of letters
#' Excel Style Column Names
#'
#' @param n maximum number of letters in column name
excel_style_colnames <- function(n){
unlist(Reduce(
function(x, y) as.vector(outer(x, y, 'paste0')),
lapply(1:n, function(x) LETTERS),
accumulate = TRUE
))
}
A variant on eipi10's method (ordered correctly) using data.table:
library(data.table)
BIG_LETTERS <- c(LETTERS,
do.call("paste0",CJ(LETTERS,LETTERS)),
do.call("paste0",CJ(LETTERS,LETTERS,LETTERS)))
Yet another option:
l2 = c(LETTERS, sort(do.call("paste0", expand.grid(LETTERS, LETTERS[1:3]))))
Adjust the two instances of LETTERS inside expand.grid to get the number of letter pairs you'd like.
A function to produce Excel-style column names, i.e.
# A, B, ..., Z, AA, AB, ..., AZ, BA, BB, ..., ..., ZZ, AAA, ...
letterwrap <- function(n, depth = 1) {
args <- lapply(1:depth, FUN = function(x) return(LETTERS))
x <- do.call(expand.grid, args = list(args, stringsAsFactors = F))
x <- x[, rev(names(x)), drop = F]
x <- do.call(paste0, x)
if (n <= length(x)) return(x[1:n])
return(c(x, letterwrap(n - length(x), depth = depth + 1)))
}
letterwrap(26^2 + 52) # through AAZ
## This will take a few seconds:
# x <- letterwrap(1e6)
It's probably not the fastest, but it extends indefinitely and is nicely predictable. Took about 20 seconds to produce through 1 million, BDWGN.
(For a few more details, see here: https://stackoverflow.com/a/21689613/903061)
A little late to the party, but I want to play too.
You can also use sub, and sprintf in place of paste0 and get a length 702 vector.
c(LETTERS, sapply(LETTERS, sub, pattern = " ", x = sprintf("%2s", LETTERS)))
Here's another addition to the list. This seems a bit faster than Gregor's (comparison done on my computer - using length.out = 1e6 his took 12.88 seconds, mine was 6.2), and can also be extended indefinitely. The flip side is that it's 2 functions, not just 1.
make.chars <- function(length.out, case, n.char = NULL) {
if(is.null(n.char))
n.char <- ceiling(log(length.out, 26))
m <- sapply(n.char:1, function(x) {
rep(rep(1:26, each = 26^(x-1)) , length.out = length.out)
})
m.char <- switch(case,
'lower' = letters[m],
'upper' = LETTERS[m]
)
dim(m.char) <- dim(m)
apply(m.char, 1, function(x) paste(x, collapse = ""))
}
get.letters <- function(length.out, case = 'upper'){
max.char <- ceiling(log(length.out, 26))
grp <- rep(1:max.char, 26^(1:max.char))[1:length.out]
unlist(lapply(unique(grp), function(n) make.chars(length(grp[grp == n]), case = case, n.char = n)))
}
##
make.chars(5, "lower", 2)
#> [1] "aa" "ab" "ac" "ad" "ae"
make.chars(5, "lower")
#> [1] "a" "b" "c" "d" "e"
make.chars(5, "upper", 4)
#> [1] "AAAA" "AAAB" "AAAC" "AAAD" "AAAE"
tmp <- get.letters(800)
head(tmp)
#> [1] "A" "B" "C" "D" "E" "F"
tail(tmp)
#> [1] "ADO" "ADP" "ADQ" "ADR" "ADS" "ADT"
Created on 2019-03-22 by the reprex package (v0.2.1)

elementwise combination of two lists in R

Say I have two lists:
list.a <- as.list(c("a", "b", "c"))
list.b <- as.list(c("d", "e", "f"))
I would like to combine these lists recursively, such that the result would be a list of combined elements as a vector like the following:
[[1]]
[1] a d
[[2]]
[1] a e
[[3]]
[1] a f
[[4]]
[1] b d
and so on. I feel like I'm missing something relatively simple here. Any help?
Cheers.
expand.grid(list.a, list.b) gives you the desired result in a data.frame. This tends to be the most useful format for working with data in R. However, you could get the exact structure you ask for (save the ordering) with a call to apply and lapply:
result.df <- expand.grid(list.a, list.b)
result.list <- lapply(apply(result.df, 1, identity), unlist)
If you want this list ordered by the first element:
result.list <- result.list[order(sapply(result.list, head, 1))]
You want mapply (if by "recursively" you mean "in parallel"):
mapply(c, list.a, list.b, SIMPLIFY=FALSE)
Or maybe this is more what you want:
unlist(lapply(list.a, function(a) lapply(list.b, function (b) c(a, b))), recursive=FALSE)
Surprised nobody has mentioned this simple one liner:
as.list(outer(list.a,list.b, paste))
[[1]]
[1] "a d"
[[2]]
[1] "b d"
[[3]]
[1] "c d"
[[4]]
[1] "a e"
This gets you what you are looking for:
unlist(lapply(list.a, function(X) {
lapply(list.b, function(Y) {
c(X, Y)
})
}), recursive=FALSE)
Here is a function you can pass lists to to expand
expand.list <- function(...){
lapply(as.data.frame(t((expand.grid(...)))),c, recursive = TRUE, use.names = FALSE)}
expand.list(list.a, list.b)
Here is a somewhat brute force approach that will, given they are the same dimensions, append list.b to list.a recursively using the append function.
# CREATE LIST OBJECTS
list.a <- as.list(c("a", "b", "c"))
list.b <- as.list(c("d", "e", "f"))
# CREATE AN EMPTY LIST TO POPULATE
list.ab <- list()
# DOUBLE LOOP TO CREATE RECURSIVE COMBINATIONS USING append
ct=0
for( i in 1:length(list.a) ) {
for (j in 1:length(list.b) ) {
ct=ct+1
list.ab[[ct]] <- append(list.a[[i]], list.b[[j]])
}
}
# PRINT RESULTS
list.ab
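The double loop can also be expressed by enumerating the index pairs with expand.grid and combining them with Map, which avoids growing the list one element at a time. This is a sketch under the same list.a and list.b as above; list.ab2 is just an illustrative name.

```r
list.a <- as.list(c("a", "b", "c"))
list.b <- as.list(c("d", "e", "f"))

# expand.grid varies its first column fastest, so putting b first gives
# the order a d, a e, a f, b d, ...
grid <- expand.grid(b = seq_along(list.b), a = seq_along(list.a))

list.ab2 <- Map(function(i, j) c(list.a[[i]], list.b[[j]]), grid$a, grid$b)

list.ab2[[4]]
# [1] "b" "d"
```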

Resources