Is there a method to stop lapply() from returning NULL values for each element of the list when a function doesn't have a return()?
Here's a pretty basic example:
x <- function(x) {
return(NULL) }
a.list <- list(a=1,b=2,c=3)
lapply(a.list, x)
The output is:
$a
NULL
$b
NULL
$c
NULL
My goal is to not have that output, at all.
Update: my use case is as follows. I'm using lapply() to pump out xtable() text and I'm sink()'ing it to an Rnw file. So this NULL output is bugging up my automation.
Two options come to mind:
Either
trash_can <- lapply(a.list, x)
or
invisible(lapply(a.list, x))
The first one makes me wonder if there is an analog of Linux's /dev/null in R that you can use to redirect stuff that you don't want. The only problem with creating the variable trash_can is that it will hang around and use up memory unless you rm(trash_can). But I don't think that's a problem here.
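For what it's worth, here is a minimal sketch of the invisible() option in a setting like the one in the question. The helper print_block() and its cat() body are hypothetical stand-ins for the real xtable() call, and capture.output() is used in place of sink() only so that the result can be inspected.

```r
# Hypothetical stand-in for a function that is called only for its printed output
print_block <- function(v) {
  cat("value:", v, "\n")
  NULL  # nothing useful to return
}

a.list <- list(a = 1, b = 2, c = 3)

# capture.output() plays the role of sink() here: it records what would be printed
captured <- capture.output(invisible(lapply(a.list, print_block)))
captured
```

Only the three lines written by cat() are recorded; the list of NULLs is never printed because invisible() suppresses auto-printing of the lapply() result.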
You did
R> x <- function(x) { return(NULL) }
R> a.list <- list(a=1,b=2,c=3)
R> res <- lapply(a.list, x)
R> res
$a
NULL
$b
NULL
$c
NULL
R>
and as you asked lapply to sweep over all elements of the list, you can hardly complain that you get results (in res) for all elements of a.list. That is correct.
What is nice about the NULL values, though, is that it is trivial to have them skipped in the next aggregation step:
R> do.call(rbind, res)
NULL
R>
So I've mostly used this approach of returning NULL when the data had an issue or another irregularity arose, as you can easily aggregate the 'good' results afterwards.
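A small sketch of that pattern with mixed results, assuming a hypothetical worker safe_row() that returns NULL for inputs it cannot handle and a one-row data frame otherwise:

```r
# Hypothetical worker: returns NULL for inputs it cannot handle
safe_row <- function(v) {
  if (v < 0) return(NULL)
  data.frame(value = v, root = sqrt(v))
}

res <- lapply(list(4, -1, 9), safe_row)

# rbind() ignores the NULL entry, so only the two valid rows remain
combined <- do.call(rbind, res)
combined
```

The failed element simply disappears from the aggregated data frame, so no separate filtering step is needed.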
You could just do
a.list <- a.list[!sapply(a.list, is.null)]
I think you might want to take a look at l_ply from the plyr package. It is supposed to return nothing, and it has all the properties of lapply, plus some more.
These days I would use purrr::walk, since it is meant for calling a function for its side effects without returning a result.
I have an example function below that reads in a date as a string and returns it as a date object. If it reads a string that it cannot convert to a date, it returns an error.
testFunction <- function (date_in) {
return(as.Date(date_in))
}
testFunction("2010-04-06") # this works fine
testFunction("foo") # this returns an error
Now, I want to use lapply and apply this function over a list of dates:
dates1 = c("2010-04-06", "2010-04-07", "2010-04-08")
lapply(dates1, testFunction) # this works fine
But if I want to apply the function over a list when one string in the middle of two good dates returns an error, what is the best way to deal with this?
dates2 = c("2010-04-06", "foo", "2010-04-08")
lapply(dates2, testFunction)
I presume that I want a try catch in there, but is there a way to catch the error for the "foo" string whilst asking lapply to continue and read the third date?
Use a tryCatch expression around the function that can throw the error message:
testFunction <- function (date_in) {
return(tryCatch(as.Date(date_in), error=function(e) NULL))
}
The nice thing about the tryCatch function is that you can decide what to do in the case of an error (in this case, return NULL).
> lapply(dates2, testFunction)
[[1]]
[1] "2010-04-06"
[[2]]
NULL
[[3]]
[1] "2010-04-08"
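If only the successfully parsed dates are needed afterwards, the NULL entries can be dropped and the rest combined into a single Date vector. This is a sketch of one way to do that with base R, not the only one:

```r
testFunction <- function(date_in) {
  tryCatch(as.Date(date_in), error = function(e) NULL)
}
dates2 <- c("2010-04-06", "foo", "2010-04-08")
parsed <- lapply(dates2, testFunction)

# Drop the NULL entries, then combine the remaining Date objects into one vector
good <- do.call(c, Filter(Negate(is.null), parsed))
good
```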
One could try to keep it simple rather than to make it complicated:
Use the vectorised date parsing
R> as.Date( c("2010-04-06", "foo", "2010-04-08") )
[1] "2010-04-06" NA "2010-04-08"
You can trivially wrap na.omit() or whatever around it. Or find the index of NAs and extract accordingly from the initial vector, or use the complement of the NAs to find the parsed dates, or, or, or. It is all here already.
You can make your testFunction() do something. Use the test there -- if the returned (parsed) date is NA, do something.
Add a tryCatch() block or a try() to your date parsing.
The whole thing is a little odd as you go from a one-type data structure (vector of chars) to something else, but you can't easily mix types unless you keep them in a list type. So maybe you need to rethink this.
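A sketch of the "find the index of NAs" idea. The explicit format argument is my addition: as far as I know, without it as.Date() still raises an error when the first non-NA element cannot be parsed, whereas with a format every unparseable string just becomes NA.

```r
dates2 <- c("2010-04-06", "foo", "2010-04-08")

# With an explicit format, unparseable strings become NA instead of raising an error
parsed <- as.Date(dates2, format = "%Y-%m-%d")
ok <- !is.na(parsed)

parsed[ok]   # the dates that parsed
dates2[!ok]  # the offending input strings
```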
You can also accomplish this kind of task with the purrr helper functions map and possibly. For example
library(purrr)
map(dates2, possibly(testFunction, NA))
Here possibly will return NA (or whatever value you specified) if an error occurs.
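If purrr is not available, the same idea can be sketched in base R. The helper name with_default() is hypothetical; it only illustrates the principle of wrapping a function so that errors turn into a default value, and is not how purrr implements possibly().

```r
# Hypothetical base-R helper in the spirit of purrr::possibly()
with_default <- function(f, otherwise) {
  function(...) tryCatch(f(...), error = function(e) otherwise)
}

testFunction <- function(date_in) as.Date(date_in)
dates2 <- c("2010-04-06", "foo", "2010-04-08")

# The wrapped function never throws; failures yield NA
parsed <- lapply(dates2, with_default(testFunction, NA))
parsed
```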
Assuming the testFunction() is not trivial and/or that one cannot alter it, it can be wrapped in a function of your own, with a tryCatch() block. For example:
> FaultTolerantTestFunction <- function(date_in) {
+ # Return NA on error directly; no <<- needed, so no global variable is created
+ tryCatch(testFunction(date_in), error = function(e) NA)
+ }
> FaultTolerantTestFunction('bozo')
[1] NA
> FaultTolerantTestFunction('2010-03-21')
[1] "2010-03-21"
(I hope that this question hasn't been asked before).
For convenience I am using abbreviations for functions like "cn" instead of "colnames". However, for colnames/rownames the abbreviated functions only work for reading purposes. I am not able to set colnames with that new "cn" function. Can anyone explain the black magic behind the colnames function? This is the example:
cn <- match.fun(colnames)
x <- matrix(1:2)
colnames(x) <- "a" # OK, works.
cn(x) <- "b" # Error in cn(x) <- "b" : could not find function "cn<-"
Thank you, echasnovski, for the link to that great website.
It has helped me a lot to better understand R!
http://adv-r.had.co.nz/Functions.html#replacement-functions
In R, special "replacement functions" like foo<- can be defined. E.g. we can define a function
`setSecondElement<-` <- function(x, value){
x[2] <- value
return(x)
}
# Let's try it:
x <- 1:3
setSecondElement(x) <- 100
print(x)
# [1] 1 100 3
The colnames<- function works essentially the same. However, "behind the scenes" it will check if x is a data.frame or matrix and set either names(x) or dimnames(x)[[2]]. Just execute the following line in R and you'll see the underlying routine.
print( `colnames<-` )
For my specific problem the solution turns out to be very simple. Remember that I'd like to have a shorter version of colnames which shall be called cn. I can either do it like this:
cn <- match.fun(colnames);
`cn<-` <- function(x, value){
colnames(x) <- value
return(x)
}
More easily, as Stéphane Laurent points out, the definition of `cn<-` can be simplified to:
`cn<-` <- `colnames<-`
There is a minor difference between these approaches. The first approach defines a new function, which in turn calls the colnames<- function. The second approach copies the reference to the colnames<- function, so cn<- makes exactly the same function call. The second approach is slightly more efficient, since one additional function call is avoided.
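To make the mechanism concrete, here is a short sketch showing that an assignment of the form cn(x) <- value is evaluated roughly as an ordinary call to the function named cn<-. The variable names are for illustration only.

```r
# Read accessor and replacement function under the short name
cn <- colnames
`cn<-` <- `colnames<-`

x <- matrix(1:4, nrow = 2)
cn(x) <- c("a", "b")

# R evaluates the line above roughly as this explicit call
y <- matrix(1:4, nrow = 2)
y <- `cn<-`(y, value = c("a", "b"))

identical(x, y)
```

This is why defining only cn is not enough: the reading function and the replacement function are two separate objects with different names.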
OK, I have a little problem which I believe I can solve with which and grepl (alternatives are welcome), but I am getting lost:
my_query<- c('g1', 'g2', 'g3')
my_data<- c('string2','string4','string5','string6')
I would like to return the index of each element of my_query that matches something in my_data. In the example above, only 'g2' occurs in my_data (inside 'string2'), so the result would be 2.
It seems to me that there is no easy way to do this without a loop. For each element in my_query, we can use either of the below functions to get TRUE or FALSE:
f1 <- function (pattern, x) length(grep(pattern, x)) > 0L
f2 <- function (pattern, x) any(grepl(pattern, x))
For example,
f1(my_query[1], my_data)
# [1] FALSE
f2(my_query[1], my_data)
# [1] FALSE
Then, we use *apply loop to apply, say f2 to all elements of my_query:
which(unlist(lapply(my_query, f2, x = my_data)))
# [1] 2
Thanks, that seems to work. To be honest, I preferred your original one-line version. I am not sure why you edited it to create another function that is then called with *apply. Is there any advantage compared with which(lengths(lapply(my_query, grep, my_data)) > 0L)?
Well, I am not entirely sure. When I read ?lengths:
One advantage of ‘lengths(x)’ is its use as a more efficient
version of ‘sapply(x, length)’ and similar ‘*apply’ calls to
‘length’.
I don't know how much more efficient lengths is compared with sapply. Anyway, if it is still a loop, then my original suggestion which(lengths(lapply(my_query, grep, my_data)) > 0L) performs two loops. My edit essentially combines the two loops into one, hopefully for a small speed-up (if it is not too tiny to matter).
You can still arrange my new edit into a single line:
which(unlist(lapply(my_query, function (pattern, x) any(grepl(pattern, x)), x = my_data)))
or
which(unlist(lapply(my_query, function (pattern) any(grepl(pattern, my_data)))))
Expanding on a comment posted initially by @Gregor, you could try:
which(colSums(sapply(my_query, grepl, my_data)) > 0)
#g2
# 2
The function colSums is vectorized and represents no problem in terms of performance. The sapply() loop seems inevitable here, since we need to check each element within the query vector. The result of the loop is a logical matrix, with each column representing an element of my_query and each row an element of my_data. By wrapping this matrix into which(colSums(..) > 0) we obtain the index numbers of all columns that contain at least one TRUE, i.e., a match with an entry of my_data.
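A variant of the same loop using vapply(), sketched under the assumption that the queries are literal strings rather than regular expressions. That is why fixed = TRUE is added here, which the answers above do not use.

```r
my_query <- c("g1", "g2", "g3")
my_data <- c("string2", "string4", "string5", "string6")

# One logical per query element: does it occur in any element of my_data?
hit <- vapply(my_query,
              function(p) any(grepl(p, my_data, fixed = TRUE)),
              logical(1))

which(unname(hit))
```

vapply() checks that every iteration returns exactly one logical, which makes the result type predictable even if my_query is empty.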
I am using the 'try' function to create some subsets, but I cannot manage to keep only the results that worked with the 'try' function. Below is the line of code I have.
list_shp_Deforested_2000_Africa <- lapply(list_shp_FC_Africa, function(x){try(x[x$D_90_00!=100,])})
Does somebody know a function which will allow me to keep only the results that worked? Thanks for your help.
When you ask about the try function, you should read help(try) first. The last line in the examples does what you are interested in (where you should substitute list_shp_Deforested_2000_Africa for res).
unlist(res[sapply(res, function(x) !inherits(x, "try-error"))])
You can Filter the list to not include those that inherit "try-error"
Filter(function(x) !inherits(x, "try-error"), list_shp_Deforested_2000_Africa)
Or, if you used tryCatch and returned NULL whenever there is an error, it might be cleaner
L <- lapply(1:10, function(x) tryCatch(if(runif(1) > 0.5) stop() else 42, error=function(e) NULL))
Filter(length, L)
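A self-contained toy version of the try-error filtering, for readers without the shapefile data. The inputs are made up for illustration; log10() fails on the character element.

```r
# Toy input: log10() raises an error for the character element
inputs <- list(1, "a", 100)
res <- lapply(inputs, function(v) try(log10(v), silent = TRUE))

# TRUE for every element where try() caught an error
failed <- vapply(res, inherits, logical(1), what = "try-error")

good <- res[!failed]
which(failed)  # positions of the inputs that raised an error
```

Keeping the failed vector around is handy when you also want to report which inputs could not be processed.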
As a guideline, I prefer to apply functions to the elements of a list using lapply or *ply (from plyr) rather than explicitly iterating through them. This works well when I have to process one list at a time. When the function takes multiple arguments, however, I usually write a loop.
I was wondering if it's possible to have a cleaner construct, still functional in nature. One possible approach could be to define a function similar to Python, zip(x,y), which takes the input lists, and returns a list, whose i-th element is list(x, y), and then apply the function to this list. But my question is whether I am using the cleanest approach or not. I am not worried about performance optimization, but rather clarity/elegance.
Below is the naive example.
A <- as.list(0:9)
B <- as.list(0:9)
f <- function(x, y) x^2+y
OUT <- list()
for (n in 1:10) OUT[[n]] <- f(A[[n]], B[[n]])
OUT
[[1]]
[1] 0
[[2]]
[1] 2
...
And here is the zipped example (which could be extended to arbitrary arguments):
zip <- function(x, y){
stopifnot(length(x)==length(y))
z <- list()
for (i in seq_along(x)){
z[[i]] <- list(x[[i]], y[[i]])
}
z
}
E <- zip(A, B)
lapply(E, function(x) f(x[[1]], x[[2]]))
[[1]]
[1] 0
[[2]]
[1] 2
...
I think you're looking for mapply:
‘mapply’ is a multivariate version of ‘sapply’. ‘mapply’ applies
‘FUN’ to the first elements of each ... argument, the second
elements, the third elements, and so on. Arguments are recycled
if necessary.
For your example, use mapply(f, A, B)
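A short sketch with the data from the question, also showing Map(), the base R wrapper around mapply() that always returns a list:

```r
A <- as.list(0:9)
B <- as.list(0:9)
f <- function(x, y) x^2 + y

out_vec  <- mapply(f, A, B)  # simplified to a numeric vector
out_list <- Map(f, A, B)     # always a list, like lapply()

# Map() also reproduces the zip() helper from the question
E <- Map(list, A, B)
```

Use Map() when you want lapply()-style list output, and mapply() when a simplified vector or matrix is more convenient.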
I came across a similar problem today. After learning how to use mapply, I now know how to solve it.
mapply is so cool!!
Here is an example:
en = c("cattle", "chicken", "pig")
zh = c("牛", "鸡", "猪")
dict <- new.env(hash = TRUE)
Add <- function(key, val) dict[[key]] <- val
mapply(Add, en, zh)
## cattle chicken pig
## "牛" "鸡" "猪"
I think you could do this with what I call an 'implicit loop' (this name does not hit it fully, but whatever), taking into account that you can loop over vectors within *apply:
OUT <- lapply(1:10, function(x) (A[[x]]^2 + B[[x]]))
or
OUT <- lapply(1:10, function(x) f(A[[x]], B[[x]]))
Note that you could then also use vapply (or sapply) to control the output type (i.e. if you don't want a list).
(By the way, I don't quite see what you want to achieve with the zip function, so I am sorry if I missed your point.)
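A sketch of the vapply variant mentioned above, using seq_along() instead of the hard-coded 1:10 so that it follows the length of A:

```r
A <- as.list(0:9)
B <- as.list(0:9)
f <- function(x, y) x^2 + y

# numeric(1) declares that each iteration must return exactly one number
OUT <- vapply(seq_along(A), function(i) f(A[[i]], B[[i]]), numeric(1))
OUT
```

The result is a plain numeric vector rather than a list, and vapply() raises an error if any call to f returns something other than a single number.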