I have written a function that returns a data frame with two variables. As a simple example, let's take:
test <- function(x) {y <- matrix( 5 , nrow= x , ncol = 2)
z<- data.frame(y)
return(z) }
I want to find out for which x values this function gives an error. (In this example I think it fails for negative values, but I just want to convey the concept.) So I try:
z <- rep(0)
testnumbers <- c(0,1,2,3,4,-1,5)
for (i in 1:length(testnumbers)) {
tempo <- tryCatch( testfun(testnumbers[i]) , error= function(e) return(0) )
if (tempo == 0 ) z[i] <- {testnumbers[i] next}
}
What is wrong with my process, and how can I find the inputs for which my function does not work?
If you're looking to run all of the testnumbers regardless of any of them failing, I suggest a slightly different tack.
Base R
This borrows from Rui's use of inherits which is more robust and unambiguous. It goes one step further by preserving not just which one had the error, but the actual error text as well:
testfun <- function(x) {
y <- matrix(5, nrow = x, ncol = 2)
z <- as.data.frame(y)
z
}
testnumbers <- c(0, 1, 2, 3, 4, -1, 5)
rets <- setNames(
lapply(testnumbers, function(n) tryCatch(testfun(n), error=function(e) e)),
testnumbers
)
sapply(rets, inherits, "error")
# 0 1 2 3 4 -1 5
# FALSE FALSE FALSE FALSE FALSE TRUE FALSE
Filter(function(a) inherits(a, "error"), rets)
# $`-1`
# <simpleError in matrix(5, nrow = x, ncol = 2): invalid 'nrow' value (< 0)>
(The setNames(lapply(...), ...) is there because the inputs are numbers, so sapply(..., simplify=FALSE) would not preserve the names, something I thought was important.)
All of this falls in line with what some consider good practice: if you're doing one function to a lot of "things", then do it in a list, and therefore in one of the *apply functions.
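As a possible follow-up to the approach above, here is a minimal sketch that condenses the failures into a small summary table. The names failed and report are illustrative and not part of the original answer; conditionMessage() is the base R accessor for the text of a condition object.

```r
# Same example function and inputs as in the answer above.
testfun <- function(x) {
  y <- matrix(5, nrow = x, ncol = 2)
  as.data.frame(y)
}
testnumbers <- c(0, 1, 2, 3, 4, -1, 5)

# Run every input; keep the condition object whenever a call fails.
rets <- lapply(testnumbers, function(n) {
  tryCatch(testfun(n), error = function(e) e)
})

# Logical flag per input: did this call return an error object?
failed <- vapply(rets, inherits, logical(1), what = "error")

# Hypothetical summary table: one row per failing input with its message.
report <- data.frame(
  input = testnumbers[failed],
  message = vapply(rets[failed], conditionMessage, character(1)),
  stringsAsFactors = FALSE
)
print(report)
```

Using vapply instead of sapply here is a design choice: it guarantees a logical (or character) vector even if the input list happens to be empty.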
tidyverse
There is a function in purrr that formalizes this a little: safely, which returns a function wrapped around its argument. For instance:
library(purrr)
safely(testfun)
# function (...)
# capture_error(.f(...), otherwise, quiet)
# <environment: 0x0000000015151d90>
It is returning a function that can then be passed. A one-time call would look like one of the following:
safely(testfun)(0)
# $result
# [1] V1 V2
# <0 rows> (or 0-length row.names)
# $error
# NULL
testfun_safe <- safely(testfun)
testfun_safe(0)
# $result
# [1] V1 V2
# <0 rows> (or 0-length row.names)
# $error
# NULL
To use it here, you can do:
rets <- setNames(
lapply(testnumbers, safely(testfun)),
testnumbers
)
str(rets[5:6])
# List of 2
# $ 4 :List of 2
# ..$ result:'data.frame': 4 obs. of 2 variables:
# .. ..$ V1: num [1:4] 5 5 5 5
# .. ..$ V2: num [1:4] 5 5 5 5
# ..$ error : NULL
# $ -1:List of 2
# ..$ result: NULL
# ..$ error :List of 2
# .. ..$ message: chr "invalid 'nrow' value (< 0)"
# .. ..$ call : language matrix(5, nrow = x, ncol = 2)
# .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
Filter(Negate(is.null), sapply(rets, `[[`, "error"))
# $`-1`
# <simpleError in matrix(5, nrow = x, ncol = 2): invalid 'nrow' value (< 0)>
and to get to the results of all runs (including the errant one):
str(sapply(rets, `[[`, "result"))
# List of 7
# $ 0 :'data.frame': 0 obs. of 2 variables:
# ..$ V1: num(0)
# ..$ V2: num(0)
# $ 1 :'data.frame': 1 obs. of 2 variables:
# ..$ V1: num 5
# ..$ V2: num 5
# $ 2 :'data.frame': 2 obs. of 2 variables:
# ..$ V1: num [1:2] 5 5
# ..$ V2: num [1:2] 5 5
# $ 3 :'data.frame': 3 obs. of 2 variables:
# ..$ V1: num [1:3] 5 5 5
# ..$ V2: num [1:3] 5 5 5
# $ 4 :'data.frame': 4 obs. of 2 variables:
# ..$ V1: num [1:4] 5 5 5 5
# ..$ V2: num [1:4] 5 5 5 5
# $ -1: NULL
# $ 5 :'data.frame': 5 obs. of 2 variables:
# ..$ V1: num [1:5] 5 5 5 5 5
# ..$ V2: num [1:5] 5 5 5 5 5
or just the results without the failed run:
str(Filter(Negate(is.null), sapply(rets, `[[`, "result")))
# List of 6
# $ 0:'data.frame': 0 obs. of 2 variables:
# ..$ V1: num(0)
# ..$ V2: num(0)
# $ 1:'data.frame': 1 obs. of 2 variables:
# ..$ V1: num 5
# ..$ V2: num 5
# $ 2:'data.frame': 2 obs. of 2 variables:
# ..$ V1: num [1:2] 5 5
# ..$ V2: num [1:2] 5 5
# $ 3:'data.frame': 3 obs. of 2 variables:
# ..$ V1: num [1:3] 5 5 5
# ..$ V2: num [1:3] 5 5 5
# $ 4:'data.frame': 4 obs. of 2 variables:
# ..$ V1: num [1:4] 5 5 5 5
# ..$ V2: num [1:4] 5 5 5 5
# $ 5:'data.frame': 5 obs. of 2 variables:
# ..$ V1: num [1:5] 5 5 5 5 5
# ..$ V2: num [1:5] 5 5 5 5 5
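For readers without purrr, the idea behind safely can be approximated in base R. The sketch below is only an illustration of the pattern (a wrapper that always returns a list with result and error), not purrr's actual implementation; safely_base is a hypothetical name.

```r
# Hypothetical base R stand-in for the result/error pattern of purrr::safely().
safely_base <- function(f) {
  function(...) {
    tryCatch(
      list(result = f(...), error = NULL),
      error = function(e) list(result = NULL, error = e)
    )
  }
}

testfun <- function(x) {
  y <- matrix(5, nrow = x, ncol = 2)
  as.data.frame(y)
}

ok  <- safely_base(testfun)(2)   # result is a 2-row data frame, error is NULL
bad <- safely_base(testfun)(-1)  # result is NULL, error holds the condition

str(ok, max.level = 1)
str(bad, max.level = 1)
```

Because both branches return a list of the same shape, downstream code can pull out "result" or "error" uniformly, which is the property the Filter() calls above rely on.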
You were actually quite close. I'm not sure which change did the trick in the end, but I:
Changed the 1:length(testnumbers), as this is unnecessary
Changed return(0) to a character
Wrapped your if in another if, as it kept failing when the length was larger than 1 or could not be assessed.
Then you get the correct results. You could try and change the code bit by bit to see what was wrong.
test <- function(x) {y <- matrix( 5 , nrow = x , ncol = 2)
z<- data.frame(y)
return(z) }
errored <- numeric()
testnumbers <- c(0,1,2,3,4,-1,5)
for (i in testnumbers) {
tempo <- tryCatch(test(i), error = function(e) "error")
if (length(tempo) == 1) {
if (tempo == "error") errored <- c(errored, i)
}
}
errored
# [1] -1
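A short sketch of why the original if (tempo == 0) test is fragile: comparing a data frame with a number yields one logical value per cell, whereas if expects a single TRUE or FALSE. Depending on the R version, a longer condition triggers a warning or an error. The variable cmp is introduced here purely for illustration.

```r
test <- function(x) {
  y <- matrix(5, nrow = x, ncol = 2)
  data.frame(y)
}

tempo <- test(2)       # a successful call returns a 2 x 2 data frame
cmp <- tempo == 0      # element-wise comparison: one logical per cell

length(cmp)            # 4 values, so not usable as a single if() condition
length(tempo)          # 2, the number of columns
length("error")        # 1, which is what the outer length() check looks for
```

This is why the outer if (length(tempo) == 1) guard works: a data frame with two columns has length 2, so only the single-string error marker reaches the inner comparison.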
You need tryCatch to return the error, not zero.
testfun <- function(x) {
y <- matrix(5, nrow = x, ncol = 2)
z <- as.data.frame(y)
z
}
testnumbers <- c(0, 1, 2, 3, 4, -1, 5)
z <- numeric(length(testnumbers))
for (i in seq_along(testnumbers)) {
tempo <- tryCatch(testfun(testnumbers[i]), error = function(e) e)
if (inherits(tempo, "error")) {
z[i] <- testnumbers[i]
}
}
z
#[1] 0 0 0 0 0 -1 0
Also,
In order to coerce a matrix to data.frame use as.data.frame.
I have removed the calls to return since the last value of a function is its return value.
rep(0) is the same as just 0, replaced by numeric(length(testnumbers)).
seq_along(testnumbers) is always better than 1:length(testnumbers). Try it with testnumbers of length zero and see what happens.
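To make the last point concrete, here is a small sketch of the zero-length case. The counters are illustrative names only.

```r
empty <- numeric(0)

bad_idx  <- 1:length(empty)   # 1:0 counts down, giving the two values 1 and 0
good_idx <- seq_along(empty)  # a zero-length index vector

# Count how many times each loop body would run.
bad_iterations <- 0
for (i in bad_idx) bad_iterations <- bad_iterations + 1

good_iterations <- 0
for (i in good_idx) good_iterations <- good_iterations + 1

print(c(bad = bad_iterations, good = good_iterations))
```

With 1:length(empty) the loop body runs twice on indices that do not exist, while seq_along(empty) correctly skips the loop.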
I have the following two data frames that are in a list called df.list:
df1 <- data.frame(name=c("a","b","c"),total=c("1","2","3"),other=c("100","200","300"))
df2 <- data.frame(name=c("d","e","f"),total=c("4","5","6"),other=c("100","200","300"))
df.list <- list(df1,df2)
[[1]]
name total other
1 a 1 100
2 b 2 200
3 c 3 300
[[2]]
name total other
1 d 4 100
2 e 5 200
3 f 6 300
I want to be able to go through each data frame in the list, convert the total and other columns to numeric, and assign the result back to df.list.
I tried the following, but it does not seem to work:
lapply(df.list, function(x) as.numeric(x[2:3]))
We may use type.convert directly on the list
df.list2 <- type.convert(df.list, as.is = TRUE)
Checking the structure:
str(df.list2)
List of 2
$ :'data.frame': 3 obs. of 3 variables:
..$ name : chr [1:3] "a" "b" "c"
..$ total: int [1:3] 1 2 3
..$ other: int [1:3] 100 200 300
$ :'data.frame': 3 obs. of 3 variables:
..$ name : chr [1:3] "d" "e" "f"
..$ total: int [1:3] 4 5 6
..$ other: int [1:3] 100 200 300
If we want to loop instead, note that as.integer/as.numeric work on vectors, not on whole data frames. So we need a second, inner loop over the columns:
df.list2 <- lapply(df.list, function(x) {
x[2:3] <- lapply(x[2:3], as.integer)
x})
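The following sketch illustrates the difference on a single data frame: calling as.numeric on a two-column subset fails, while applying it column by column works. The variable err is illustrative, and stringsAsFactors = FALSE is set explicitly as an assumption so that the columns are character on older R versions too.

```r
df1 <- data.frame(
  name  = c("a", "b", "c"),
  total = c("1", "2", "3"),
  other = c("100", "200", "300"),
  stringsAsFactors = FALSE
)

# df1[2:3] is still a data frame (a list of columns), so this call errors;
# the error message is captured instead of stopping the script.
err <- tryCatch(as.numeric(df1[2:3]), error = function(e) conditionMessage(e))
print(err)

# Converting each column separately and assigning back keeps the data frame.
df1[2:3] <- lapply(df1[2:3], as.numeric)
str(df1)
```

Assigning into df1[2:3] rather than returning the lapply result directly is what preserves the untouched name column.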
Or maybe this one (mutate() and across() come from dplyr, so it is loaded as well):
library(dplyr)
library(purrr)
df.list %>%
map(., ~mutate(.x, across(c(other, total), ~as.numeric(.x)))) %>%
str()
List of 2
$ :'data.frame': 3 obs. of 3 variables:
..$ name : chr [1:3] "a" "b" "c"
..$ total: num [1:3] 1 2 3
..$ other: num [1:3] 100 200 300
$ :'data.frame': 3 obs. of 3 variables:
..$ name : chr [1:3] "d" "e" "f"
..$ total: num [1:3] 4 5 6
..$ other: num [1:3] 100 200 300
You can create a function that works on a single data frame, such as functional_as_numeric() below, and then apply it to each element of the list with map() from {purrr}. Personally, I find {purrr}'s interface more consistent and easier to follow than the traditional *apply() functions.
library(dplyr)
library(purrr)
functional_as_numeric <- function(df) {
df %>% mutate(
total = as.numeric(total),
other = as.numeric(other)
)
}
df.list.result <- df.list %>%
purrr::map(functional_as_numeric)
str(df.list.result)
List of 2
$ :'data.frame': 3 obs. of 3 variables:
..$ name : chr [1:3] "a" "b" "c"
..$ total: num [1:3] 1 2 3
..$ other: num [1:3] 100 200 300
$ :'data.frame': 3 obs. of 3 variables:
..$ name : chr [1:3] "d" "e" "f"
..$ total: num [1:3] 4 5 6
..$ other: num [1:3] 100 200 300
After a global tidyverse update, I noticed a change of behaviour in my code, and after much research I have been unable to solve the issue. Basically, I need to convert a list of elements (including lists) to a data frame.
Here is a reprex:
Data
x <- list(
col1 = list("a", "b", "c", NA),
col2 = list(1, 2, 3, 4),
col3 = list("value1", "value2", "value1", c("value1", "value2")))
Expected behaviour and output (before tidyverse update):
x <- data.frame((sapply(x, c)))
x <- purrr::map_df(x, function(x) sapply(x, function(x) unlist(x))) %>% as.data.frame()
> x
# col1 col2 col3
# 1 a 1 value1
# 2 b 2 value2
# 3 c 3 value1
# 4 <NA> 4 value1, value2
> str(x)
# 'data.frame': 4 obs. of 3 variables:
# $ col1: chr "a" "b" "c" NA
# $ col2: num 1 2 3 4
# $ col3:List of 4
# ..$ : chr "value1"
# ..$ : chr "value2"
# ..$ : chr "value1"
# ..$ : chr "value1" "value2"
Problem encountered after update
x <- data.frame((sapply(x, c)))
x <- purrr::map_df(x, function(x) sapply(x, function(x) unlist(x)))
# Error: Argument 1 must have names.
# Run `rlang::last_error()` to see where the error occurred.
# In addition: Warning message:
# Outer names are only allowed for unnamed scalar atomic inputs
> rlang::last_error()
# <error/rlang_error>
# Argument 1 must have names.
# Backtrace:
# 1. purrr::map_df(x, function(x) sapply(x, function(x) unlist(x)))
# 2. dplyr::bind_rows(res, .id = .id)
# Run `rlang::last_trace()` to see the full context.
This error seems to be well known, and I have explored many options with the purrr::flatten_() family and others found on Stack Overflow, but was not able to solve it.
Thank you for any help!
The first part of your attempt gives you a list for every column, irrespective of its length.
x <- data.frame((sapply(x, c)))
str(x)
#'data.frame': 4 obs. of 3 variables:
# $ col1:List of 4
# ..$ : chr "a"
# ..$ : chr "b"
# ..$ : chr "c"
# ..$ : logi NA
# $ col2:List of 4
# ..$ : num 1
# ..$ : num 2
# ..$ : num 3
# ..$ : num 4
# $ col3:List of 4
# ..$ : chr "value1"
# ..$ : chr "value2"
# ..$ : chr "value1"
# ..$ : chr "value1" "value2"
You can unlist the above for columns with only 1 element.
x[] <- lapply(x, function(p) if(max(lengths(p)) == 1) unlist(p) else p)
x
# col1 col2 col3
#1 a 1 value1
#2 b 2 value2
#3 c 3 value1
#4 <NA> 4 value1, value2
str(x)
#'data.frame': 4 obs. of 3 variables:
# $ col1: chr "a" "b" "c" NA
# $ col2: num 1 2 3 4
# $ col3:List of 4
# ..$ : chr "value1"
# ..$ : chr "value2"
# ..$ : chr "value1"
# ..$ : chr "value1" "value2"
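As an alternative sketch (not the method used in the answer above), the same shape can be built without the intermediate sapply step by deciding per column whether to flatten it or keep it as a list column. It relies on I() to protect the list column inside data.frame(); cols and df are illustrative names.

```r
x <- list(
  col1 = list("a", "b", "c", NA),
  col2 = list(1, 2, 3, 4),
  col3 = list("value1", "value2", "value1", c("value1", "value2"))
)

# Flatten columns whose entries all have length 1; wrap the rest in I()
# so that data.frame() stores them as list columns.
cols <- lapply(x, function(p) {
  if (all(lengths(p) == 1)) unlist(p) else I(p)
})

df <- do.call(data.frame, c(cols, list(stringsAsFactors = FALSE)))
str(df)
```

One difference from the answer above: the list column created this way additionally carries the "AsIs" class, which is usually harmless but shows up in str().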
One option utilizing dplyr, tibble and purrr could be:
library(dplyr)
library(tibble)
library(purrr)
imap_dfc(x, ~ tibble(!!.y := .x)) %>%
mutate(across(where(~ all(lengths(.) == 1)), ~ unlist(.)))
col1 col2 col3
<chr> <dbl> <list>
1 a 1 <chr [1]>
2 b 2 <chr [1]>
3 c 3 <chr [1]>
4 <NA> 4 <chr [2]>
Not a tidyverse solution, but it seems to work:
library( rlist )
as.data.frame( rlist::list.cbind( x ) )
# col1 col2 col3
# 1 a 1 value1
# 2 b 2 value2
# 3 c 3 value1
# 4 NA 4 value1, value2
I'm new to R programming and am trying to write a loop to extract a number from a list containing data frames. However, I can't seem to subset the list correctly. This is probably basic, but it's driving me nuts by now!
df1 <- tibble("a"=c(1,2,3,4,5), "b"=c(22,23,24,25,26), c=c("alpha", "beta","alpha", "beta", "alpha"))
df2 <- tibble("a"=c(1,2,3,4,5), "b"=c(22,23,24,25,26), c=c("alpha", "beta","alpha", "beta", "alpha"))
df3 <- tibble("a"=c(1,2,3,4,5), "b"=c(22,23,24,25,26), c=c("alpha", "beta","alpha", "beta", "alpha"))
df4 <- tibble("a"=c(1,2,3,4,5), "b"=c(22,23,24,25,26), c=c("alpha", "beta","alpha", "beta", "alpha"))
list <- c(df1, df2, df3, df4)
res <- vector("numeric",4)
df2[[2,2]]
for (i in list){
res[i] <- i[[2,2]]
}
I get this (and similar) errors: "Error in i[[2, 2]] : incorrect number of subscripts"
Thankful for any help.
I suppose you're trying to pick out the second row of the second column in each dataframe that is part of a list:
library(tidyverse)
res <- list(df1 = tibble("a"=c(1,2,3,4,5), "b"=c(22,23,24,25,26),
c=c("alpha", "beta","alpha", "beta", "alpha")),
df2 = tibble("a"=c(1,2,3,4,5), "b"=c(22,23,24,25,26),
c=c("alpha", "beta","alpha", "beta", "alpha")),
df3 = tibble("a"=c(1,2,3,4,5), "b"=c(22,23,24,25,26),
c=c("alpha", "beta","alpha", "beta", "alpha")),
df4 = tibble("a"=c(1,2,3,4,5), "b"=c(22,23,24,25,26),
c=c("alpha", "beta","alpha", "beta", "alpha"))) %>%
# if you want the answer in a dataframe
purrr::map_df(~ .x %>%
dplyr::select(2) %>% # Pick the second column
dplyr::slice(2)) %>% # Pick the second row's value
unlist() # if you want it as a vector
> res
b1 b2 b3 b4
23 23 23 23
(Up front, I'm going to use mylist instead of list.)
Your data is not a list of frames, which is how you appear to be trying to use it.
mylist <- c(df1, df2, df3, df4)
str(mylist)
# List of 12
# $ a: num [1:5] 1 2 3 4 5
# $ b: num [1:5] 22 23 24 25 26
# $ c: chr [1:5] "alpha" "beta" "alpha" "beta" ...
# $ a: num [1:5] 1 2 3 4 5
# $ b: num [1:5] 22 23 24 25 26
# $ c: chr [1:5] "alpha" "beta" "alpha" "beta" ...
# $ a: num [1:5] 1 2 3 4 5
# $ b: num [1:5] 22 23 24 25 26
# $ c: chr [1:5] "alpha" "beta" "alpha" "beta" ...
# $ a: num [1:5] 1 2 3 4 5
# $ b: num [1:5] 22 23 24 25 26
# $ c: chr [1:5] "alpha" "beta" "alpha" "beta" ...
Instead, use list:
mylist <- list(df1, df2, df3, df4)
str(mylist)
# List of 4
# $ :Classes 'tbl_df', 'tbl' and 'data.frame': 5 obs. of 3 variables:
# ..$ a: num [1:5] 1 2 3 4 5
# ..$ b: num [1:5] 22 23 24 25 26
# ..$ c: chr [1:5] "alpha" "beta" "alpha" "beta" ...
# $ :Classes 'tbl_df', 'tbl' and 'data.frame': 5 obs. of 3 variables:
# ..$ a: num [1:5] 1 2 3 4 5
# ..$ b: num [1:5] 22 23 24 25 26
# ..$ c: chr [1:5] "alpha" "beta" "alpha" "beta" ...
# $ :Classes 'tbl_df', 'tbl' and 'data.frame': 5 obs. of 3 variables:
# ..$ a: num [1:5] 1 2 3 4 5
# ..$ b: num [1:5] 22 23 24 25 26
# ..$ c: chr [1:5] "alpha" "beta" "alpha" "beta" ...
# $ :Classes 'tbl_df', 'tbl' and 'data.frame': 5 obs. of 3 variables:
# ..$ a: num [1:5] 1 2 3 4 5
# ..$ b: num [1:5] 22 23 24 25 26
# ..$ c: chr [1:5] "alpha" "beta" "alpha" "beta" ...
for (i in mylist) { print(i[[2,2]]); break; }
# [1] 23
(Notional for loop just to demonstrate that i[[2,2]] does work after all.)
And to your intended use, some working examples:
res <- sapply(mylist, function(x) x[[2,2]])
res
# [1] 23 23 23 23
### identical results, might be more obscure (and perhaps less flexible)
res <- sapply(mylist, `[[`, c(2, 2))
### identical results, less R-idiomatic
res <- vector("numeric", 4)
for (i in seq_along(mylist)) { res[[i]] <- mylist[[i]][[2,2]]; }
res
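The difference between c() and list() can also be checked with plain data frames, with no tibble needed. This is a minimal sketch with made-up data; vapply is used as a type-checked variant of the sapply call above.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(22, 23, 24, 25, 26))

flat   <- c(df, df)     # concatenates the columns into one flat list
nested <- list(df, df)  # keeps each data frame as a single element

length(flat)    # 4: two columns from each data frame
length(nested)  # 2: one entry per data frame

mylist <- list(df, df, df, df)
# vapply checks that every extracted value is a single number.
res <- vapply(mylist, function(x) x[[2, 2]], numeric(1))
print(res)
```

Compared with sapply, vapply fails loudly if any element returns something other than one numeric value, which can help when the data frames do not all share the same layout.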
I have a list like this:
x = list(a = 1:4, b = 3:10, c = NULL)
x
#$a
#[1] 1 2 3 4
#
#$b
#[1] 3 4 5 6 7 8 9 10
#
#$c
#NULL
and I want to extract all elements that are not null. How can this be done? Thanks.
Here's another option:
Filter(Negate(is.null), x)
What about:
x[!unlist(lapply(x, is.null))]
Here is a brief description of what is going on.
lapply tells us which elements are NULL
R> lapply(x, is.null)
$a
[1] FALSE
$b
[1] FALSE
$c
[1] TRUE
Next we convert the list into a vector:
R> unlist(lapply(x, is.null))
a b c
FALSE FALSE TRUE
Then we negate it, so that TRUE becomes FALSE and vice versa:
R> !unlist(lapply(x, is.null))
a b c
TRUE TRUE FALSE
Finally, we select the elements using the usual notation:
x[!unlist(lapply(x, is.null))]
x[!sapply(x,is.null)]
This generalizes to any logical statement about the list elements: just substitute your own test for "is.null".
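A small sketch of that generalization, using the example list from the question. The predicate in the last line (keeping elements with more than five values) is an arbitrary illustration, as are the variable names.

```r
x <- list(a = 1:4, b = 3:10, c = NULL)

# Drop NULL elements; vapply guarantees a logical vector.
non_null <- x[!vapply(x, is.null, logical(1))]

# Same selection via element lengths (this would also drop zero-length vectors).
non_empty <- x[lengths(x) > 0]

# Any other per-element test can be swapped in for is.null.
long_ones <- x[vapply(x, function(el) length(el) > 5, logical(1))]

names(non_null)
names(non_empty)
names(long_ones)
```

Note that the lengths() variant is not strictly equivalent to the is.null test: it also removes elements such as integer(0) that are empty but not NULL.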
Simpler and likely quicker than the above, the following works for lists of any non-recursive (in the sense of is.recursive) values:
example_1_LST <- list(NULL, a=1.0, b=Matrix::Matrix(), c=NULL, d=4L)
example_2_LST <- as.list(unlist(example_1_LST, recursive=FALSE))
str(example_2_LST) prints:
List of 3
$ a: num 1
$ b:Formal class 'lsyMatrix' [package "Matrix"] with 5 slots
.. ..# x : logi NA
.. ..# Dim : int [1:2] 1 1
.. ..# Dimnames:List of 2
.. .. ..$ : NULL
.. .. ..$ : NULL
.. ..# uplo : chr "U"
.. ..# factors : list()
$ d: int 4