Find which elements of a nested list are dataframes

Find which elements of a nested list are dataframes - r

Suppose that I have a nested list like the following
test <- list(
a = data.frame(x = 1),
b = "foo",
c = list(
d = 1:5,
e = data.frame(y = 1),
f = "a",
list(g = "hello")
)
)
test
#> $a
#> x
#> 1 1
#>
#> $b
#> [1] "foo"
#>
#> $c
#> $c$d
#> [1] 1 2 3 4 5
#>
#> $c$e
#> y
#> 1 1
#>
#> $c$f
#> [1] "a"
#>
#> $c[[4]]
#> $c[[4]]$g
#> [1] "hello"
I want to know the location of character elements in this nested list. In this
case, I want to return a named vector or a named list with TRUE if the element
is a character and FALSE otherwise.
I can do that with rapply, that unlists everything:
rapply(test, is.character)
#> a.x b c.d c.e.y c.f c.g
#> FALSE TRUE FALSE FALSE TRUE TRUE
However, I can’t do that to find all dataframes because rapply() also unlists
dataframes (note that the first element is a.x and not only a).
rapply(test, is.data.frame)
#> a.x b c.d c.e.y c.f c.g
#> FALSE FALSE FALSE FALSE FALSE FALSE
Therefore, is there a way to find which elements of a nested list are dataframes?
Note that the solution should work with any number of levels in the nested
list.
I’m looking for a solution in base R only.

1) rrapply
library(rrapply)
cls <- c("data.frame", "ANY")
rrapply(test, f = is.data.frame, classes = cls, how = "unlist")
## a b c.d c.e c.f c.g
## TRUE FALSE FALSE TRUE FALSE FALSE
2) recursion
findDF <- function(x) {
if (is.data.frame(x)) TRUE
else if (is.list(x)) lapply(x, findDF)
else FALSE
}
unlist(findDF(test))
## a b c.d c.e c.f c.g
## TRUE FALSE FALSE TRUE FALSE FALSE

Related

Recasting nested list of any depth

Assume this simplified example:
L <- list()
L$Foo <- list()
L$Foo$Bar <- list()
L$Foo$Bar$VAR <- TRUE
L$Lorem <- list()
L$Lorem$Ipsum <- list()
L$Lorem$Ipsum$Dolor <- list()
L$Lorem$Ipsum$Dolor$VAR <- TRUE
I will then melt this list with reshape2::melt(L). That will output the following:
value L3 L2 L4 L1
1 TRUE VAR Bar <NA> Foo
2 TRUE Dolor Ipsum VAR Lorem
After some operations on certain cells in the value column, I'm then looking to recast this melted list into the exact same nested list structure as L—the only difference being that I updated a few of the value instances.
Any ideas how to achieve this? Please keep in mind that the nested lists can have any, and varying, depth.

An option is relist, after we unlisted L
tmp <- unlist(L)
# make small changes
tmp[] <- FALSE
relist(tmp, L)
Result
$Foo
$Foo$Bar
$Foo$Bar$VAR
[1] FALSE
$Lorem
$Lorem$Ipsum
$Lorem$Ipsum$Dolor
$Lorem$Ipsum$Dolor$VAR
[1] FALSE
L looks like
$Foo
$Foo$Bar
$Foo$Bar$VAR
[1] TRUE
$Lorem
$Lorem$Ipsum
$Lorem$Ipsum$Dolor
$Lorem$Ipsum$Dolor$VAR
[1] TRUE

An alternative could be to use rrapply() in the rrapply-package which has options how = "melt" and how = "unmelt" to transform between nested lists and melted data.frames:
library(rrapply)
L <- list(Foo = list(Bar = list(VAR = TRUE)), Lorem = list(Ipsum = list(Dolor = list(VAR = TRUE))))
## melt to data.frame
(L1 <- rrapply(L, how = "melt"))
#> L1 L2 L3 L4 value
#> 1 Foo Bar VAR <NA> TRUE
#> 2 Lorem Ipsum Dolor VAR TRUE
## cast back to nested list
L2 <- rrapply(L1, how = "unmelt")
str(L2)
#> List of 2
#> $ Foo :List of 1
#> ..$ Bar:List of 1
#> .. ..$ VAR: logi TRUE
#> $ Lorem:List of 1
#> ..$ Ipsum:List of 1
#> .. ..$ Dolor:List of 1
#> .. .. ..$ VAR: logi TRUE
identical(L2, L)
#> [1] TRUE
An important advantage with respect to relist() is that no list skeleton object is needed (see ?relist), so we are not constrained by the list format defined in the skeleton object when modifying the melted data.frame, e.g.:
L_unlist <- unlist(as.relistable(L))
## this change has no effect when relisting
## as the original list is used as skeleton
names(L_unlist)[1] <- "Foo.Bar.Test"
relist(L_unlist)
#> $Foo
#> $Foo$Bar
#> $Foo$Bar$VAR
#> [1] TRUE
#>
#> $Lorem
#> $Lorem$Ipsum
#> $Lorem$Ipsum$Dolor
#> $Lorem$Ipsum$Dolor$VAR
#> [1] TRUE
#>
#> attr(,"class")
#> [1] "relistable" "list"
## here it does behave as expected
L_melt <- rrapply(L, how = "melt")
L_melt[1, "L3"] <- "Test"
rrapply(L_melt, how = "unmelt")
#> $Foo
#> $Foo$Bar
#> $Foo$Bar$Test
#> [1] TRUE
#>
#> $Lorem
#> $Lorem$Ipsum
#> $Lorem$Ipsum$Dolor
#> $Lorem$Ipsum$Dolor$VAR
#> [1] TRUE

Coerce arguments to simplest type

When dealing with user input using packages shiny or plumber one often needs to convert character arguments to numeric or logical.
I would like to do it automatically, what's an efficient way to do it ?
expected (this or similar) :
convert_args <- ...
fun <- function(a, b, c, d){
convert_args()
dplyr::lst(a, b, c , d)
}
fun("na","true","1","foo")
#> $a
#> [1] NA
#>
#> $b
#> [1] TRUE
#>
#> $c
#> [1] 1
#>
#> $d
#> [1] "foo"

One option is to use readr::parse_guess which as the name suggests tries to guess the type of the character vector.
convert_args <- function(x) {
lapply(x, readr::parse_guess)
}
convert_args(c("NA","true","1","foo"))
#[[1]]
#[1] NA
#[[2]]
#[1] TRUE
#[[3]]
#[1] 1
#[[4]]
#[1] "foo"
This doesn't directly work when we have "na"
readr::parse_guess("na")
#[1] "na"
but as #Moody_Mudskipper mentions it can be resolved specifying na argument in parse_guess
readr::parse_guess("na", c("na", "NA"))
#[1] NA

I built a wrapper around readr::parse_guess thanks to #Ronak's solution to get exactly the expected output.
I also added an option to evaluate the unconverted character input as it's a common task as well.
convert_args <- function(na = c("", "NA"), locale = readr::default_locale(),
trim_ws = TRUE, guess_integer = FALSE, eval = FALSE){
if(!requireNamespace("readr"))
stop("convert_args() requires package readr to be installed")
args <- as.list(eval.parent(quote(match.call())))[-1]
args <- lapply(args, readr::parse_guess, na, locale, trim_ws, guess_integer)
if (eval){
args <- lapply(args, function(arg) {
if(is.character(arg))
eval(parse(text = arg, parent.frame(2)))
else
arg
})
}
list2env(args, envir = parent.frame())
invisible(NULL)
}
fun <- function(a, b, c, d){
convert_args()
dplyr::lst(a, b, c , d)
}
fun("NA","true","1","head(cars,2)")
#> Loading required namespace: readr
#> $a
#> [1] NA
#>
#> $b
#> [1] TRUE
#>
#> $c
#> [1] 1
#>
#> $d
#> [1] "head(cars,2)"
fun2 <- function(a, b, c, d){
convert_args(eval = TRUE, na = c("na","NA"))
dplyr::lst(a, b, c , d)
}
fun2("na","true","1","head(cars,2)")
#> $a
#> [1] NA
#>
#> $b
#> [1] TRUE
#>
#> $c
#> [1] 1
#>
#> $d
#> speed dist
#> 1 4 2
#> 2 4 10
Created on 2019-06-21 by the reprex package (v0.3.0)

How to list only variables that have been evaluated?

When using ls() in a function, it lists arguments of the function even if they've not been evaluated yet (even if they are missing from the call with no default value).
fun <- function(x,y,z,m){
a <- 1
y <- 1
force(z)
print(ls())
mget(ls())
}
fun(i,j,42)
# [1] "a" "m" "x" "y" "z"
Error in mget(ls()) : object 'i' not found
How can I list only evaluated variables ?
In that case I would be happy with a modified list giving either of :
# [1] "a" "y" "z"
# [1] "a" "y"
Alternatively (or additionally), a logical list telling me if arguments have been evaluated (or overwritten) would be great : in that case list(x = FALSE, y = TRUE, z = TRUE, m = FALSE)

Well, this is kind of close, there is a is_promise function in pryr. It expects a symbol but the unexported version is_promise2 can take a name. So something like this maybe
fun <- function(x,y,z,m){
a <- 1
y <- 1
force(z)
mget(ls()[!sapply(ls(), pryr:::is_promise2, environment())])
}
fun(i, j, 42)
which at least gets rid of the message about i. But doesn't seem to capture x. But just like is_promise2 does, I think you're going to have to dip into c/c++ land to find out information about evaluation/promise status because I think R tries to hide most of that from the user.

MrFlick's answer is what I was looking for, additional relevant information can be gathered using the function below, which is wrapped around trace for ease of use.
Better sample data
defined_in_global <- 1
enclosing_fun <- function(){
defined_in_enclos <- quote(qux)
function(not_evaluated,
overridden = "bar",
forced = "baz",
defined_in_global,
defined_in_enclos,
missing_with_default = 1,
missing_overriden,
missing_absent){
overridden <- TRUE
missing_overridden <- "a"
new_var <- 1
}
}
How to use, without trying to evaluate
fun <- enclosing_fun()
diagnose_vars(fun)
fun(not_evaluated = foo)
#> Tracing fun(not_evaluated = foo) on exit
#> name evaluable type is_formal missing absent_from_call is_promise has_default_value default_value called_with_value exists_in_parent exists_in_enclos
#> 1 not_evaluated FALSE <NA> TRUE FALSE FALSE TRUE FALSE NA foo FALSE FALSE
#> 2 overridden TRUE logical TRUE FALSE TRUE FALSE TRUE "bar" <NA> FALSE FALSE
#> 3 forced FALSE <NA> TRUE TRUE TRUE TRUE TRUE "baz" <NA> FALSE FALSE
#> 4 defined_in_global FALSE <NA> TRUE TRUE TRUE FALSE FALSE NA <NA> TRUE TRUE
#> 5 defined_in_enclos FALSE <NA> TRUE TRUE TRUE FALSE FALSE NA <NA> FALSE TRUE
#> 6 missing_with_default FALSE <NA> TRUE TRUE TRUE TRUE TRUE 1 <NA> FALSE FALSE
#> 7 missing_overriden FALSE <NA> TRUE TRUE TRUE FALSE FALSE NA <NA> FALSE FALSE
#> 8 missing_absent FALSE <NA> TRUE TRUE TRUE FALSE FALSE NA <NA> FALSE FALSE
#> 9 missing_overridden TRUE character FALSE NA NA NA NA NA <NA> FALSE FALSE
#> 10 new_var TRUE double FALSE NA NA NA NA NA <NA> FALSE FALSE
How to use, trying to evaluate
diagnose_vars(fun, eval = TRUE)
fun(not_evaluated = foo)
#> Tracing fun(not_evaluated = foo) on exit
#> name evaluable type is_formal missing absent_from_call is_promise has_default_value default_value called_with_value exists_in_parent exists_in_enclos
#> 1 not_evaluated TRUE <NA> TRUE FALSE FALSE TRUE FALSE NA foo FALSE FALSE
#> 2 overridden FALSE logical TRUE FALSE TRUE FALSE TRUE "bar" <NA> FALSE FALSE
#> 3 forced FALSE character TRUE TRUE TRUE TRUE TRUE "baz" <NA> FALSE FALSE
#> 4 defined_in_global TRUE <NA> TRUE TRUE TRUE FALSE FALSE NA <NA> TRUE TRUE
#> 5 defined_in_enclos TRUE <NA> TRUE TRUE TRUE FALSE FALSE NA <NA> FALSE TRUE
#> 6 missing_with_default FALSE double TRUE TRUE TRUE TRUE TRUE 1 <NA> FALSE FALSE
#> 7 missing_overriden TRUE <NA> TRUE TRUE TRUE FALSE FALSE NA <NA> FALSE FALSE
#> 8 missing_absent TRUE <NA> TRUE TRUE TRUE FALSE FALSE NA <NA> FALSE FALSE
#> 9 missing_overridden FALSE character FALSE NA NA NA NA NA <NA> FALSE FALSE
#> 10 new_var FALSE double FALSE NA NA NA NA NA <NA> FALSE FALSE
The code
diagnose_vars <- function(f, eval = FALSE, on.exit = TRUE, ...) {
eval(substitute(
if(on.exit) trace(..., what =f, exit = quote({
diagnose_vars0(eval, print = TRUE)
untrace(f)}))
else trace(..., what =f, tracer = diagnose_vars0(eval, print = TRUE),
exit = substitute(untrace(f)), ...)
))
invisible(NULL)
}
diagnose_vars0 <- function(eval = FALSE, print = FALSE){
f_env <- parent.frame()
mc <- eval(quote(match.call()), f_env)
f <- eval.parent(mc[[1]],2)
f_parent_env <- parent.frame(2)
f_enclos <- rlang::fn_env(f)
vars <- ls(f_env)
fmls <- eval(quote(formals()), f_env)
fml_nms <- names(fmls)
fml_syms <- rlang::syms(fml_nms)
mc_args <- as.list(mc)[-1]
# compute complete df cols when possible
is_formal <- vars %in% fml_nms
# build raw df, with NA cols when necessary to initiate
data <- data.frame(row.names = vars,
name = vars,
evaluable = NA,
type = NA,
is_formal,
missing = NA,
absent_from_call = NA,
is_promise = NA,
has_default_value = NA)
# absent_from_call : different from missing when variable is overriden
data[fml_nms, "absent_from_call"] <- ! fml_nms %in% names(mc_args)
# promise
data[fml_nms, "is_promise"] <- sapply(fml_nms, pryr:::is_promise2, f_env)
# missing
data[fml_nms, "missing"] <- sapply(fml_syms, function(x)
eval(substitute(missing(VAR), list(VAR = x)), f_env))
# has default values
formal_has_default_value <- !sapply(fmls,identical, alist(x=)[[1]])
data[fml_nms, "has_default_value"] <- formal_has_default_value
# default values
data$default_value <-
vector("list",length(vars))
data$default_value[] <- NA
data[fml_nms[formal_has_default_value], "default_value"] <-
sapply(fmls[formal_has_default_value], deparse)
# called_with_value
data[names(mc_args), "called_with_value"] <-
sapply(mc_args, deparse)
# exists
data$exists_in_parent <- sapply(vars, exists, envir= f_parent_env)
data$exists_in_enclos <- sapply(vars, exists, envir= f_enclos)
# types
if(eval){
types <- sapply(vars, function(x)
try(eval(bquote(typeof(.(as.symbol(x)))), f_env),silent = TRUE))
data$type <- ifelse(startsWith(types,"Error"), NA, types)
data$evaluable <- is.na(data$type)
} else {
data$evaluable <-
with(data,!is_formal | (!is_promise & !missing))
data$type[data$evaluable] <-
sapply(mget(vars[data$evaluable], f_env), typeof)
}
# arrange
data <- rbind(data[fml_nms,],data[!data$name %in% fml_nms,])
row.names(data) <- NULL
if (print) print(data) else data
}

iteratively adding elements to list in one step

I would like to add list elements iteratively in R, so that later elements can use the elements created earlier. The desired behavior is as follows:
lst <- list(a = 1,
b = 2,
c = b)
lst
## $a
## [1] 1
##
## $b
## [1] 2
##
## $c
## [1] 2
I know that I can easily accomplish the same using e.g.
lst <- list(a = 1,
b = 2)
lst[['c']] <- lst[['b']]
But I was wondering, if I could do this in one step.

Here's another way
rev(within(list(), { a = 1; b = 2; c = b }))
# $a
# [1] 1
#
# $b
# [1] 2
#
# $c
# [1] 2

Update: This is now possible with the lst function of the tibble package:
tibble::lst(a = 1, b = 2, c = b)
## $a
## [1] 1
##
## $b
## [1] 2
##
## $c
## [1] 2
My previous workaround was using mutate from plyr:
mylist <- function(...) plyr::mutate(.data=list(), ...)
mylist(a = 1,
b = 2,
c = b)
## $a
## [1] 1
##
## $b
## [1] 2
##
## $c
## [1] 2

A more classic idea:
mylist = function(...)
{
args = as.list(substitute(list(...)))[-1]
lapply(args, eval, envir = args)
}
mylist(a = 1, b = 2, c = a + b)
#$a
#[1] 1
#
#$b
#[1] 2
#
#$c
#[1] 3
For a strict iterative approach, a loop is needed:
mylist = function(...)
{
args = as.list(substitute(list(...)))[-1]
for(i in seq_along(args)) args[[i]] = eval(args[[i]], envir = args)
return(args)
}
mylist(a = 1, b = a + 1, c = b + 1)
#$a
#[1] 1
#
#$b
#[1] 2
#
#$c
#[1] 3

replace <NA> with NA

I have a data frame containing entries; It appears that these values are not treated as NA since is.na returns FALSE. I would like to convert these values to NA but could not find the way.

Use dfr[dfr=="<NA>"]=NA where dfr is your dataframe.
For example:
> dfr<-data.frame(A=c(1,2,"<NA>",3),B=c("a","b","c","d"))
> dfr
A B
1 1 a
2 2 b
3 <NA> c
4 3 d
> is.na(dfr)
A B
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,] FALSE FALSE
[4,] FALSE FALSE
> dfr[dfr=="<NA>"] = NA **key step**
> is.na(dfr)
A B
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,] TRUE FALSE
[4,] FALSE FALSE

The two classes where this is likely to be an issue are character and factor. This should loop over a dtaframe and convert the "NA" values into true <NA>'s but just for those two classes:
make.true.NA <- function(x) if(is.character(x)||is.factor(x)){
is.na(x) <- x=="NA"; x} else {
x}
df[] <- lapply(df, make.true.NA)
(Untested in the absence of a data example.) The use of the form: df_name[] will attempt to retain the structure of the original dataframe which would otherwise lose its class attribute. I see that ujjwal thinks your spelling of NA has flanking "<>" characters so you might try this functions as more general:
make.true.NA <- function(x) if(is.character(x)||is.factor(x)){
is.na(x) <- x %in% c("NA", "<NA>"); x} else {
x}

You can do this with the naniar package as well, using replace_with_na and associated functions.
dfr <- data.frame(A = c(1, 2, "<NA>", 3), B = c("a", "b", "c", "d"))
library(naniar)
# dev version - devtools::install_github('njtierney/naniar')
is.na(dfr)
#> A B
#> [1,] FALSE FALSE
#> [2,] FALSE FALSE
#> [3,] FALSE FALSE
#> [4,] FALSE FALSE
dfr %>% replace_with_na(replace = list(A = "<NA>")) %>% is.na()
#> A B
#> [1,] FALSE FALSE
#> [2,] FALSE FALSE
#> [3,] TRUE FALSE
#> [4,] FALSE FALSE
# You can also specify how to do this for many variables
dfr %>% replace_with_na_all(~.x == "<NA>")
#> # A tibble: 4 x 2
#> A B
#> <int> <int>
#> 1 2 1
#> 2 3 2
#> 3 NA 3
#> 4 4 4
You can read more about using replace_with_na here

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Find which elements of a nested list are dataframes - r

Related

Recasting nested list of any depth

Coerce arguments to simplest type

How to list only variables that have been evaluated?

iteratively adding elements to list in one step

replace <NA> with NA

Categories

Resources