Zipping lists in R

As a guideline, I prefer to apply functions to the elements of a list using lapply or *ply (from plyr) rather than iterating through them explicitly. This works well when I have to process one list at a time, but when the function takes multiple arguments, I usually fall back to a for loop.
I was wondering if it's possible to have a cleaner construct that is still functional in nature. One possible approach would be to define a function similar to Python's zip(x, y), which takes the input lists and returns a list whose i-th element is list(x[[i]], y[[i]]), and then apply the function to this list. My question is whether this is the cleanest approach. I am not worried about performance optimization, but rather about clarity and elegance.
Below is the naive example:
A <- as.list(0:9)
B <- as.list(0:9)
f <- function(x, y) x^2+y
OUT <- list()
for (n in 1:10) OUT[[n]] <- f(A[[n]], B[[n]])
OUT
[[1]]
[1] 0
[[2]]
[1] 2
...
And here is the zipped example (which could be extended to arbitrary arguments):
zip <- function(x, y){
stopifnot(length(x)==length(y))
z <- list()
for (i in seq_along(x)){
z[[i]] <- list(x[[i]], y[[i]])
}
z
}
E <- zip(A, B)
lapply(E, function(x) f(x[[1]], x[[2]]))
[[1]]
[1] 0
[[2]]
[1] 2
...
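The zip idea can be extended to any number of lists, as the question suggests. A minimal sketch, assuming all inputs have the same length; zip_n is a hypothetical name, not an existing R function. do.call() then spreads each tuple over f's arguments, so the lambda no longer needs to index x[[1]] and x[[2]] by hand.

```r
# Same toy data and function as in the question
A <- as.list(0:9)
B <- as.list(0:9)
f <- function(x, y) x^2 + y

# zip_n(): hypothetical generalisation of zip() to any number of lists.
# Element i of the result is list(l1[[i]], l2[[i]], ...).
zip_n <- function(...) {
  args <- list(...)
  n <- unique(lengths(args))
  stopifnot(length(n) == 1)          # all inputs must have the same length
  lapply(seq_len(n), function(i) lapply(args, `[[`, i))
}

# do.call() passes each zipped tuple as the arguments of f
OUT2 <- lapply(zip_n(A, B), function(e) do.call(f, e))
```

If the lists are passed with names (zip_n(x = A, y = B)), do.call() matches them to f's arguments by name rather than by position.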

I think you're looking for mapply:
‘mapply’ is a multivariate version of ‘sapply’. ‘mapply’ applies
‘FUN’ to the first elements of each ... argument, the second
elements, the third elements, and so on. Arguments are recycled
if necessary.
For your example, use mapply(f, A, B)
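A short illustration of the points in the quoted documentation, using the A, B and f from the question. Note that mapply() simplifies its result by default, so here it returns a numeric vector rather than a list like OUT; Map() is the wrapper around mapply() that does not simplify.

```r
A <- as.list(0:9)
B <- as.list(0:9)
f <- function(x, y) x^2 + y

# mapply() simplifies by default: a plain numeric vector here
v <- mapply(f, A, B)

# Map(), i.e. mapply(..., SIMPLIFY = FALSE), keeps a list,
# matching the OUT list built by the for loop in the question
OUT <- Map(f, A, B)

# Recycling: the shorter argument is reused, so this computes
# 1+1, 2+2, 3+1, 4+2
r <- mapply(function(x, y) x + y, 1:4, 1:2)
```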

I came across a similar problem today, and after learning how to use mapply, I now know how to solve it.
mapply is so cool!
Here is an example:
en = c("cattle", "chicken", "pig")
zh = c("牛", "鸡", "猪")  # Chinese for cattle, chicken, pig
dict <- new.env(hash = TRUE)
Add <- function(key, val) dict[[key]] <- val
mapply(Add, en, zh)
## cattle chicken pig
## "牛" "鸡" "猪"
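Because Add() is called only for its side effect on dict, wrapping the mapply() call in invisible() avoids printing the returned vector. A small sketch of reading the dictionary back afterwards, using base R's [[ on environments and mget():

```r
en <- c("cattle", "chicken", "pig")
zh <- c("牛", "鸡", "猪")   # Chinese for cattle, chicken, pig
dict <- new.env(hash = TRUE)
Add <- function(key, val) dict[[key]] <- val
invisible(mapply(Add, en, zh))   # called only for its side effect on dict

# Single lookup
dict[["pig"]]
# Several lookups at once; mget() returns a named list
unlist(mget(c("cattle", "pig"), envir = dict))
```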

I think you could do this with what I call an 'implicit loop' (the name doesn't capture it fully, but whatever), taking advantage of the fact that *apply can loop over any vector, including a vector of indices:
OUT <- lapply(1:10, function(x) (A[[x]]^2 + B[[x]]))
or
OUT <- lapply(1:10, function(x) f(A[[x]], B[[x]]))
Note that you could then also use vapply (or sapply) to control the output type (i.e., if you don't want a list).
(By the way, I don't quite see what you want the zip function for, so I'm sorry if I missed your point.)
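A sketch of the index-based loop together with the vapply() variant, using the objects from the question; seq_along(A) replaces the hard-coded 1:10 so the code follows the length of A.

```r
A <- as.list(0:9)
B <- as.list(0:9)
f <- function(x, y) x^2 + y

# Loop over indices; seq_along(A) adapts to the length of A
OUT_list <- lapply(seq_along(A), function(i) f(A[[i]], B[[i]]))

# vapply() returns an atomic vector and checks that every call
# yields exactly one number (numeric(1) is the template)
OUT_vec <- vapply(seq_along(A), function(i) f(A[[i]], B[[i]]), numeric(1))
```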

Related

R: Reference list item within the same list

In R, we can reference items created within that same list, e.g.:
list(a = a <- 1, b = a)
I am curious whether there is a way to write a function that takes the place of a = a <- 1; that is, whether something like
`%=%` <- function(x,y) {
envir <- environment()
char_x <- deparse(substitute(x))
assign(char_x, y, parent.env(envir))
unlist(lapply(setNames(seq_along(x),char_x), function(T) y))
}
# does not work
list(a%=%1, b=a)
is possible in R (i.e. returns the list given above)?
Edit: I think this boils down to asking: can we call list with a language object that preserves all aspects of writing the list call by hand? (Specifically, one that sets the list's names attribute from the left-hand side of the language element.)
It seems to me that the code below shows that such a solution is hopeless.
library(rlang)  # provides expr()
my_call <- do.call(substitute, list(expr(expr = {x = y}), list(x=quote(a), y=1)))
equals <- languageEl(my_call, which = 1)
str(equals)
do.call(list, list(equals))
Welp, the clever folk behind tibble have figured this out in their lst() function (also in package dplyr)
library(dplyr)
lst(a=1, b=a, c=c(3,4), d=c)
What a useful feature!
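For comparison, the same left-to-right evaluation can be sketched in base R by capturing the arguments unevaluated and evaluating them one by one in a scratch environment. lst_base is a hypothetical name, and this is not how tibble implements lst(): the sketch requires every argument to be named and does not support tidy-eval features such as !! or splicing.

```r
# lst_base(): hypothetical base-R sketch, not tibble's implementation.
# Evaluates named arguments left to right, so later ones can refer to earlier ones.
lst_base <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]   # unevaluated arguments
  nms <- names(exprs)
  stopifnot(!is.null(nms), all(nzchar(nms)))   # this sketch requires names
  env <- new.env(parent = parent.frame())      # scratch env for earlier results
  out <- list()
  for (nm in nms) {
    val <- eval(exprs[[nm]], env)
    assign(nm, val, envir = env)               # make it visible to later args
    out[nm] <- list(val)                       # list() wrapper keeps NULL values
  }
  out
}

lst_base(a = 1, b = a, c = c(3, 4), d = c)
```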

A replacement for `subset()` for a list of data.frames

Function foo1 can subset (using subset()) a list of data.frames by one or more requested variables (e.g., by = ESL == 1 or by = ESL == 1 & type == 4).
However, I'm aware of the dangers of using subset() in R. So, in foo1 below, what can I use instead of subset() to get the same output?
foo1 <- function(data, by){
s <- substitute(by)
L <- split(data, data$study.name) ; L[[1]] <- NULL
lapply(L, function(x) do.call("subset", list(x, s))) ## What to use instead of `subset`
## to get the same output?
}
# EXAMPLE OF USE:
D <- read.csv("https://raw.githubusercontent.com/izeh/i/master/k.csv", header=TRUE) # DATA
foo1(D, ESL == 1)
You can compute on the language. Building on my answer to "Working with substitute after $ sign in R":
foo1 <- function(data, by){
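s <- substitute(by)                  # e.g. the call ESL == 1
L <- split(data, data$study.name) ; L[[1]] <- NULL
E <- quote(x$a)                      # template call x$a
E[[3]] <- s[[2]]                     # swap a for the column name: x$ESL
s[[2]] <- E                          # s is now x$ESL == 1
eval(bquote(lapply(L, function(x) x[.(s),])))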
}
foo1(D, ESL == 1)
This gets more complex for arbitrary subset expressions. You'd need a recursive function that crawls the parse tree and inserts the calls to $ at the right places.
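A minimal sketch of such a recursive rewriter, under stated assumptions: add_dollar is a hypothetical name, the data frame is assumed to be called x (as in the function above), and edge cases such as empty arguments (x[, 1]) inside the expression are not handled.

```r
# add_dollar(): hypothetical recursive helper. It walks the parse tree and
# rewrites every symbol that names a column into x$<symbol>.
add_dollar <- function(e, cols) {
  if (is.symbol(e)) {
    if (as.character(e) %in% cols) return(call("$", quote(x), e))
    return(e)
  }
  if (is.call(e)) {
    if (identical(e[[1]], as.name("$"))) return(e)   # already x$col: leave as is
    # Skip e[[1]] (the function being called) and recurse into the arguments
    for (i in seq_along(e)[-1]) e[[i]] <- add_dollar(e[[i]], cols)
  }
  e   # constants (numbers, strings) are returned unchanged
}

d <- data.frame(ESL = c(1, 0, 1), type = c(4, 4, 2))
s2 <- add_dollar(quote(ESL == 1 & type == 4), names(d))
s2                                   # x$ESL == 1 & x$type == 4

# eval() runs in sel's frame, where x is the data frame
sel <- function(x, cond) x[eval(cond), ]
sel(d, s2)
```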
Personally, I'd just use package data.table, where this is easier because you don't need $, i.e., you can just do eval(bquote(lapply(L, function(x) setDT(x)[.(s),]))) without changing s. On the other hand, I wouldn't do this at all: there is really no reason to split before subsetting.
I would guess (based on general knowledge and a quick skim of the answers to the "dangers of subset()" question) that the dangers of subset are intrinsic dangers of non-standard evaluation (NSE); if you want to be able to pass a generic expression and have it evaluated within the context of a data frame, I think you're more or less stuck with subset() or something like it.
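As one example of "something like it": a sketch that swaps do.call("subset", ...) for base R's eval(), using each data frame as the evaluation environment and the caller's frame as the enclosure. foo2 and the small D below are made up for illustration (the CSV from the question is not used). This is still NSE, so it shares subset()'s caveats.

```r
# foo2(): sketch of foo1 with eval() in place of subset()
foo2 <- function(data, by) {
  s <- substitute(by)                 # capture the unevaluated condition
  env <- parent.frame()               # where non-column names are looked up
  L <- split(data, data$study.name); L[[1]] <- NULL
  lapply(L, function(x) {
    keep <- eval(s, x, env)           # columns of x first, then env
    x[keep & !is.na(keep), , drop = FALSE]   # drop NAs as subset() does
  })
}

# Small made-up data set standing in for the CSV in the question
D <- data.frame(study.name = c("a", "a", "b", "b", "c"),
                ESL = c(1, 0, 1, 1, 0))
foo2(D, ESL == 1)
```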
If you were willing to use a more constrained set of expressions, such as var, vals (selecting the rows where the variable named by the string var takes one of the values in the vector vals), you could use
d[d[[var]] %in% vals, ]
Here var is a string, not a naked R symbol ("cyl" rather than cyl); it's unambiguous that you want to extract it from the data frame.
You could extend this to a vector of variables and a list of vectors of values:
for (i in seq_along(vars)) {
d <- d[d[[vars[i]]] %in% vals[[i]], ]
}
but if you want the full flexibility of expressions (e.g. to be able to use either ESL == 1 & type == 4 or ESL == 1 | type == 4, or inequalities based on numeric variables) I think you're stuck with an NSE-based approach.
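The vars/vals loop above can also be written without an explicit loop, using Map() (as in the zipping question) to build one logical vector per condition and Reduce() to AND them together. filter_in is a hypothetical name, and mtcars stands in for d.

```r
# filter_in(): hypothetical helper. Keeps rows where each column named in
# `vars` takes one of the matching values in `vals` (conditions are ANDed).
filter_in <- function(d, vars, vals) {
  stopifnot(is.character(vars), length(vars) == length(vals))
  # One logical vector per condition, built pairwise with Map()
  conds <- Map(function(v, x) d[[v]] %in% x, vars, vals)
  # Combine with &; the TRUE start value keeps all rows when vars is empty
  keep <- Reduce(`&`, conds, TRUE)
  d[keep, , drop = FALSE]
}

res <- filter_in(mtcars, c("cyl", "gear"), list(c(4, 6), 4))
```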
It's conceivable that the newer "tidy eval" machinery (in the rlang package, which documents it in some detail) would give you a slightly more principled approach, but I don't think the dangers would go away completely.

Getting lost using which() and regex in R

OK, I have a little problem which I believe I can solve with which and grepl (alternatives are welcome), but I am getting lost:
my_query<- c('g1', 'g2', 'g3')
my_data<- c('string2','string4','string5','string6')
I would like to return the indices of the elements of my_query that match anywhere in my_data. In the example above, only 'g2' occurs in my_data (as a substring of 'string2'), so the result would be 2.
It seems to me that there is no easy way to do this without a loop. For each element in my_query, we can use either of the below functions to get TRUE or FALSE:
f1 <- function (pattern, x) length(grep(pattern, x)) > 0L
f2 <- function (pattern, x) any(grepl(pattern, x))
For example,
f1(my_query[1], my_data)
# [1] FALSE
f2(my_query[1], my_data)
# [1] FALSE
Then we use an *apply loop to apply, say, f2 to every element of my_query:
which(unlist(lapply(my_query, f2, x = my_data)))
# [1] 2
Thanks, that seems to work. To be honest, I preferred your original one-line version. I am not sure why you edited it to create another function that is then called with *apply. Is there any advantage compared to which(lengths(lapply(my_query, grep, my_data)) > 0L)?
Well, I am not entirely sure. When I read ?lengths:
One advantage of ‘lengths(x)’ is its use as a more efficient
version of ‘sapply(x, length)’ and similar ‘*apply’ calls to
‘length’.
I don't know how much more efficient lengths is compared with sapply. Anyway, if lengths is still a loop, then my original suggestion which(lengths(lapply(my_query, grep, my_data)) > 0L) performs two loops. My edit essentially combines the two loops into one, hopefully for some speed-up (even if tiny).
You can still write my edit as a single line:
which(unlist(lapply(my_query, function (pattern, x) any(grepl(pattern, x)), x = my_data)))
or
which(unlist(lapply(my_query, function (pattern) any(grepl(pattern, my_data)))))
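Another variation, sketched with vapply(): the logical(1) template guarantees one TRUE/FALSE per query, so the unlist() step disappears. The fixed = TRUE argument is an assumption (it treats each query as a literal substring); drop it if the queries are regular expressions.

```r
my_query <- c('g1', 'g2', 'g3')
my_data  <- c('string2', 'string4', 'string5', 'string6')

# One logical per query: does it occur in any element of my_data?
hits <- vapply(my_query,
               function(p) any(grepl(p, my_data, fixed = TRUE)),
               logical(1))
which(hits)
```

Because my_query is a character vector, vapply() uses it for the names, so which() returns the named index g2 = 2.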
Expanding on a comment originally posted by @Gregor, you could try:
which(colSums(sapply(my_query, grepl, my_data)) > 0)
# g2
#  2
The function colSums is vectorized and poses no performance problem. The sapply() loop seems inevitable here, since we need to check each element of the query vector. The result of the loop is a logical matrix in which each column corresponds to an element of my_query and each row to an element of my_data. Wrapping this matrix in which(colSums(..) > 0) gives the index numbers of all columns that contain at least one TRUE, i.e., a match with an entry of my_data.

Print.myclass function in R

I am trying to develop my first R package, and I am facing some issues with methods for "myclass" objects, which I will try to describe.
Assume a data.frame X with n <- nrow(X) rows and K <- ncol(X) columns.
My main package function (too big to post here) is, let's say,
fun1 <- function(X){
# do stuff...
out <- list(index = index, A = A, B = B)  # index is a character vector; ... etc.
class(out) <- "myclass"  # must come before return(), otherwise it never runs
return(out)
}
and it returns a list. I then use this output in a print.myclass method for the print generic. However, in my print method I want to use the data frame X from my main function without asking the user to pass it as an argument (i.e., print(out, X)) and without storing it in the output list out (at least not visibly to the user). Is there any way to do that? Thanks in advance!
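One common option, sketched here as an assumption rather than as a solution from the thread, is to attach X to the result as an attribute: print.myclass() can read it with attr(), while the list elements the user sees stay unchanged. The body of fun1 below is a made-up placeholder.

```r
# Hypothetical stand-in for the real fun1: builds a small result list.
fun1 <- function(X) {
  out <- list(index = names(X), A = colMeans(X), B = nrow(X))
  attr(out, "data") <- X      # keep X attached, but outside the list elements
  class(out) <- "myclass"     # set the class before returning
  out
}

# S3 method for the print() generic; X is recovered from the attribute.
print.myclass <- function(x, ...) {
  X <- attr(x, "data")
  cat("myclass object built from a data frame with",
      nrow(X), "rows and", ncol(X), "columns\n")
  cat("Columns:", paste(x$index, collapse = ", "), "\n")
  invisible(x)
}

out <- fun1(data.frame(a = 1:3, b = 4:6))
print(out)
```

The attribute is still visible through attributes() or str(), so this only hides X from the default display. In a package, the method also needs to be registered, e.g. with S3method(print, myclass) in the NAMESPACE file.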

R: How to use as.call with vectors as optional parameters?

I'm trying to write a wrapper for a function so that I can use lists as input. I cannot change the function itself, so I need a workaround outside of it. I use as.call(), and it works without optional arguments, but I can't make it work when an optional argument is a vector.
Example:
# function I cannot change
func <- function(..., opt=c(1,2)) {
cl <- match.call(expand.dots = FALSE)
names <- lapply(cl[[2]],as.character)
ev <- parent.frame()
classes <- unlist(lapply(names,function(name){class(get(name,envir=ev))}))
print(c(opt,names, classes))
}
a <- structure(1:3, class="My_Class")
b <- structure(letters[1:3], class="My_Class")
lst <- list(a, b)
names(lst) <- c("a","b")
# Normal result
func(a,b,opt=c(3,4))
# This should give the same but it doesn't
call <- as.call(append(list(func), list(names(lst), opt=c(3,4))))
g <- eval(call, lst)
Instead of a list as the optional argument, I also tried c(), but that doesn't work either. Does anybody have a suggestion or a relevant help page? ?call wasn't very clear about my problem.
(I already asked a previous question on this topic, "R: How to use list elements like arguments in ellipsis?", but left out the detail about the optional parameter, and now I cannot figure it out.)
This produces the same result for me under both versions:
call <- as.call(c(list(quote(func)), lapply(names(lst), as.name), list(opt=c(3,4))))
g <- eval(call, lst)
EDIT: updated as per Hadley's suggestions in the comments.
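The key change is that as.name() turns the strings in names(lst) into symbols, so func's match.call() sees a and b as in the direct call, while opt = c(3, 4) is spliced in as an ordinary named value. A sketch that wraps this into a reusable helper; call_with_list is a hypothetical name, and func, a, b and lst are copied from the question so the block runs on its own.

```r
# func, a, b and lst repeated from the question
func <- function(..., opt = c(1, 2)) {
  cl <- match.call(expand.dots = FALSE)
  names <- lapply(cl[[2]], as.character)
  ev <- parent.frame()
  classes <- unlist(lapply(names, function(name) class(get(name, envir = ev))))
  print(c(opt, names, classes))
}
a <- structure(1:3, class = "My_Class")
b <- structure(letters[1:3], class = "My_Class")
lst <- list(a = a, b = b)

# call_with_list(): hypothetical wrapper. Builds fun_name(a, b, <extra args>)
# with the list's names as symbols, then evaluates it with lst as the data.
call_with_list <- function(fun_name, lst, ...) {
  cl <- as.call(c(list(as.name(fun_name)), lapply(names(lst), as.name), list(...)))
  eval(cl, lst, parent.frame())   # lst first, then the caller's environment
}

res_wrapper <- call_with_list("func", lst, opt = c(3, 4))
res_direct  <- func(a, b, opt = c(3, 4))
```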
