R - Partially flatten nested lists with dataframes

I have a list with this structure:
$ List (length 13) ; 13 Types
$ --- Lists (Length 4) ; Each have 4 subsets of the same original data
$ ------- Dataframes 1, 2, 3, and 4 ; for each of 13 types
I want
$ List (length 52) ; 52 Versions (Type_Subset)
$ --- Dataframes 1, 2, 3, ... 52 ; As separate elements in list
How would I do this using the below mtcars example?
df <- list(Blue = list(mtcars[1:3,], mtcars[4:6,], mtcars[7:9,]),
Red = list(mtcars[10:12,], mtcars[13:15,], mtcars[16:18,]),
Green = list(mtcars[18:20,], mtcars[21:23,], mtcars[24:26,]))
# Need function on df ...
# new_df <- SingleNestLevel(df)
# Which yields:
list(Blue1 = mtcars[1:3,],
Blue2 = mtcars[4:6,],
Blue3 = mtcars[7:9,],
Red1 = mtcars[10:12,],
Red2 = mtcars[13:15,],
Red3 = mtcars[16:18,],
Green1 = mtcars[18:20,],
Green2 = mtcars[21:23,],
Green3 = mtcars[24:26,])
Note: I have looked at analogous questions like this one, but I want to convert to one nested level, not flatten my structure entirely.

Just use unlist() with recursive = FALSE, which removes only one level of nesting:
unlist(your_list, recursive = FALSE)
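As a quick check of that suggestion, here is a minimal sketch applied to the df from the question (the name new_df is only illustrative). Because the inner lists are unnamed, unlist() builds each element name by appending a running index to the outer name; if the inner lists had names of their own, the result would instead be joined with a dot, e.g. Blue.a.

```r
# Nested list from the question: 3 types, each holding 3 data frames
df <- list(Blue = list(mtcars[1:3, ], mtcars[4:6, ], mtcars[7:9, ]),
           Red = list(mtcars[10:12, ], mtcars[13:15, ], mtcars[16:18, ]),
           Green = list(mtcars[18:20, ], mtcars[21:23, ], mtcars[24:26, ]))

# Remove one level of nesting only; the data frames themselves stay intact
new_df <- unlist(df, recursive = FALSE)

names(new_df)                 # Blue1, Blue2, Blue3, Red1, ...
length(new_df)                # 9
is.data.frame(new_df$Blue1)   # TRUE
```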

I think this generalizes your issue to any nested list:
library(purrr)
new_df <- flatten(df) %>%
setNames(paste0(rep(names(df), times = map_int(df, ~length(.x))),
unlist(map(df, ~1:length(.x)))))
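For readers who want to avoid the purrr dependency, the same naming logic can be sketched in base R. This is an illustrative alternative, not the answerer's code; the names n and flat are made up for the example. It also works when the inner lists have different lengths.

```r
df <- list(Blue = list(mtcars[1:3, ], mtcars[4:6, ], mtcars[7:9, ]),
           Red = list(mtcars[10:12, ], mtcars[13:15, ], mtcars[16:18, ]),
           Green = list(mtcars[18:20, ], mtcars[21:23, ], mtcars[24:26, ]))

# Number of data frames inside each outer element
n <- unname(lengths(df))

# Concatenate the inner lists into one flat list of data frames
flat <- do.call(c, unname(df))

# Build names such as Blue1, Blue2, ... from the outer name and a per-group counter
names(flat) <- paste0(rep(names(df), times = n), sequence(n))

names(flat)
```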

Using the data.table library, you can try:
library(data.table)
df <- copy(mtcars[1:27,]) # copy the built-in mtcars dataset
setDT(df, keep.rownames = TRUE)[,v1 :=rep(unlist(lapply(c("Blue","Red", "Green"),paste, 1:3, sep = "")),
each = 3)] # add a temporary grouping variable v1
df <- split(df, df$v1) # split into a list
df <- lapply(df, function(x) x[,v1 := NULL]) # remove the temporary variable v1
df # Returns
$Blue1
rn mpg cyl disp hp drat wt qsec vs am gear carb
1: Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2: Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
$Blue2
rn mpg cyl disp hp drat wt qsec vs am gear carb
1: Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
2: Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
3: Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
$Blue3
rn mpg cyl disp hp drat wt qsec vs am gear carb
1: Duster 360 14.3 8 360.0 245 3.21 3.57 15.84 0 0 3 4
2: Merc 240D 24.4 4 146.7 62 3.69 3.19 20.00 1 0 4 2
3: Merc 230 22.8 4 140.8 95 3.92 3.15 22.90 1 0 4 2
$Green1
rn mpg cyl disp hp drat wt qsec vs am gear carb
1: Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
2: Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
3: Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
$Green2
rn mpg cyl disp hp drat wt qsec vs am gear carb
1: Dodge Challenger 15.5 8 318 150 2.76 3.520 16.87 0 0 3 2
2: AMC Javelin 15.2 8 304 150 3.15 3.435 17.30 0 0 3 2
3: Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
$Green3
rn mpg cyl disp hp drat wt qsec vs am gear carb
1: Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
2: Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
3: Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
$Red1
rn mpg cyl disp hp drat wt qsec vs am gear carb
1: Merc 280 19.2 6 167.6 123 3.92 3.44 18.3 1 0 4 4
2: Merc 280C 17.8 6 167.6 123 3.92 3.44 18.9 1 0 4 4
3: Merc 450SE 16.4 8 275.8 180 3.07 4.07 17.4 0 0 3 3
$Red2
rn mpg cyl disp hp drat wt qsec vs am gear carb
1: Merc 450SL 17.3 8 275.8 180 3.07 3.73 17.60 0 0 3 3
2: Merc 450SLC 15.2 8 275.8 180 3.07 3.78 18.00 0 0 3 3
3: Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.25 17.98 0 0 3 4
$Red3
rn mpg cyl disp hp drat wt qsec vs am gear carb
1: Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
2: Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
3: Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
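Note that the output above comes back in alphabetical order (Blue, Green, Red) because split() orders groups by factor level. A minimal base R sketch of the same idea that keeps the original order is shown below; the names labels, groups and parts are illustrative only.

```r
# Blue1, Blue2, Blue3, Red1, ..., Green3
labels <- paste0(rep(c("Blue", "Red", "Green"), each = 3), 1:3)

# One label per row (3 rows per group); explicit levels preserve the order
groups <- factor(rep(labels, each = 3), levels = labels)

# Split the first 27 rows of mtcars into 9 data frames of 3 rows each
parts <- split(mtcars[1:27, ], groups)

names(parts)
```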


Passing a quoted function argument to a three-dots argument inside another function using base R

I would like to pass quoted variables in the group argument of geom_col_wrap to the split_group function.
# I deleted the rest of the function for readability
geom_col_wrap = function(data, mapping, group, ...) {
data |>
split_group(group)
}
# This function was based on the `tidytable` package
split_group = function(data, ...) {
by_quote = as.list(substitute(...()))
by = sapply(by_quote, deparse)
split = vctrs::vec_split(data, data[c(by)])
out = split[["val"]]
names = do.call(paste, c(split[["key"]], sep = "_"))
names(out) = names
return(out)
}
split_group uses substitute to quote variables, and that is where the problem lies. How can I make split_group recognize the quoted variables passed through the group argument? I know it is easy to solve using rlang, but I need a base R solution.
split_group(mtcars, vs, am)
$`0_1`
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
...
$`1_1`
mpg cyl disp hp drat wt qsec vs am gear carb
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
...
$`1_0`
mpg cyl disp hp drat wt qsec vs am gear carb
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
...
$`0_0`
mpg cyl disp hp drat wt qsec vs am gear carb
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
...
geom_col_wrap(
mtcars,
mapping = ggplot2::aes(x = cyl, y = hp, color = am),
group = c(vs, am)
)
Error in `[.data.frame`(data, c(by)) : undefined columns selected
This error comes from as.list(substitute(...())). It does not unquote the group argument. Why?
Note: I cannot use the dots argument to solve the problem.
Using the miraculous ...() idiom:
split_group <- \(x, ...) split(x, x[, sapply(substitute(...()), as.character)])
split_group(mtcars, vs, am)
# $`0.0`
# mpg cyl disp hp drat wt qsec vs am gear carb
# Hornet Sportabout 18.7 8 360.0 175 3.15 3.44 17.02 0 0 3 2
# Duster 360 14.3 8 360.0 245 3.21 3.57 15.84 0 0 3 4
# Merc 450SE 16.4 8 275.8 180 3.07 4.07 17.40 0 0 3 3
# ...
#
# $`1.0`
# mpg cyl disp hp drat wt qsec vs am gear carb
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
# Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# ...
#
# $`0.1`
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# Porsche 914-2 26 4 120.3 91 4.43 2.140 16.70 0 1 5 2
# ...
#
# $`1.1`
# mpg cyl disp hp drat wt qsec vs am gear carb
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
# Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
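To see what the ...() idiom captures, here is a small sketch. The helper names capture_names and capture_names_alt are hypothetical; the second one uses the more conventional substitute(list(...)) spelling for comparison. It also shows why passing c(vs, am) as a single argument fails in the question: it is captured as one expression, not two column names.

```r
# Capture the unevaluated arguments passed through ... and deparse them
capture_names <- function(...) {
  unname(vapply(as.list(substitute(...())), deparse, character(1)))
}

# Same idea with a more conventional spelling: drop the leading `list` symbol
capture_names_alt <- function(...) {
  unname(vapply(as.list(substitute(list(...)))[-1], deparse, character(1)))
}

capture_names(vs, am)       # two separate names
capture_names_alt(vs, am)   # same result
capture_names(c(vs, am))    # a single string, "c(vs, am)"
```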
You basically need the base R version of rlang's {{ group }} or !!enquo(group) workflow: use substitute() to capture your group argument, and then splice it in with .(group) inside bquote().
bquote() only builds the expression, though, so we then need eval() to evaluate it.
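A stripped-down sketch of that substitute()/bquote()/eval() pattern may make it clearer. The functions inner, outer_naive and outer_fixed are hypothetical stand-ins for split_group and geom_col_wrap, and they handle a single column only.

```r
# inner() quotes its `col` argument, like split_group() does
inner <- function(data, col) {
  col_name <- deparse(substitute(col))
  data[[col_name]]
}

# Naive forwarding: inner() sees the symbol `col`, not `mpg`
outer_naive <- function(data, col) {
  inner(data, col)
}

# Capture the caller's expression, splice it into the call, then evaluate
outer_fixed <- function(data, col) {
  col <- substitute(col)             # the symbol the caller typed, e.g. mpg
  eval(bquote(inner(data, .(col))))  # builds and runs inner(data, mpg)
}

outer_naive(mtcars, mpg)   # NULL, because there is no column called "col"
outer_fixed(mtcars, mpg)   # the mpg column
```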
Another thing: you're using deparse() in split_group(), which would convert c(vs, am) to the single string "c(vs, am)". Instead we'll need to mimic tidyselect so that c() style selection works (and a single column still works without c()).
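The tidyselect-style trick can be tried in isolation. In this sketch, each column name is bound to its position, so evaluating the quoted expression against that list turns names into column indices; the name cols is illustrative.

```r
# Map every column name of mtcars to its position: mpg = 1, cyl = 2, ...
cols <- as.list(seq_along(mtcars))
names(cols) <- names(mtcars)

# Evaluating a quoted selection against `cols` yields column positions
eval(quote(c(vs, am)), cols)   # positions of vs and am
eval(quote(vs), cols)          # a single column works without c()

# The positions can then be used for ordinary indexing
names(mtcars)[eval(quote(c(vs, am)), cols)]
```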
Put together it looks like this.
split_group = function(data, ...) {
by_quote = as.list(substitute(...()))
# Mimic tidyselect
cols = as.list(seq_along(data))
names(cols) = names(data)
by = unlist(lapply(by_quote, eval, cols))
split = vctrs::vec_split(data, data[c(by)])
out = split[["val"]]
names = do.call(paste, c(split[["key"]], sep = "_"))
names(out) = names
return(out)
}
geom_col_wrap = function(data, mapping, group, ...) {
# Use substitute/bquote to "unquote" group arg inside split_group function
# Much like using `{{ group }}` or `!!enquo(group)` in rlang
group = substitute(group)
eval(bquote(
data |>
split_group(.(group))
))
}
geom_col_wrap(
mtcars,
mapping = ggplot2::aes(x = cyl, y = hp, color = am),
group = c(vs, am)
)
#> $`0_1`
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#>
#> $`1_1`
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#>
#> $`1_0`
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#>
#> $`0_0`
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#> Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Any reason you can't use rlang? vctrs depends on rlang so you're already sort of using it anyway.

How can I assign NA?

I use the mtcars dataframe and the following code to assign NA to all values of the columns drat and wt that are <= 3.
df <- mtcars %>%
mutate(across(c(drat, wt), ~ifelse(.x<=3, NA, .x)))
How can I modify the code so that it also assigns NA to the values of the column qsec when the value of drat or wt in the same row is <= 3? In the end, I want each row where drat or wt is NA to also have NA in the column qsec. Thanks
We may use if_any on the columns that are changed to NA to return a logical vector to replace values in 'qsec'
library(dplyr)
mtcars1 <- mtcars %>%
mutate(across(c(drat, wt), ~ifelse(.x<=3, NA, .x)),
qsec = ifelse(if_any(c(drat, wt), is.na), NA, qsec))
-output
> head(mtcars1)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 NA NA 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 NA NA 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 NA NA 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 NA 3.460 NA 1 0 3 1
1) dplyr Continue the mutate to set qsec to NA if either drat or wt is NA. (If you meant and rather than or then replace | with & .)
mtcars %>%
mutate(across(c(drat, wt), ~ifelse(.x<=3, NA, .x)),
qsec = ifelse(is.na(drat) | is.na(wt), NA, qsec))
giving:
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 NA NA 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 NA NA 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 NA NA 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 NA 3.460 NA 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 NA 5.250 NA 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 NA 5.424 NA 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 NA NA 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 NA NA 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 NA NA 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 NA NA 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 NA 3.520 NA 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 NA NA 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 NA NA 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 NA NA 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 NA NA 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 NA NA 1 1 4 2
2) Base R With base R we can use within like this giving the same result.
within(mtcars, {
drat <- ifelse(drat <= 3, NA, drat)
wt <- ifelse(wt <= 3, NA, wt)
qsec <- ifelse(is.na(drat) | is.na(wt), NA, qsec)
})
or at the expense of some redundancy we could use transform:
transform(mtcars,
drat = ifelse(drat <= 3, NA, drat),
wt = ifelse(wt <= 3, NA, wt),
qsec = ifelse(drat <= 3 | wt <= 3, NA, qsec))
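The redundancy is needed because transform() evaluates every argument against the original columns, so later arguments do not see values assigned by earlier ones. A small sketch with a made-up data frame d illustrates this:

```r
d <- data.frame(x = c(1, 5))

# Both arguments are evaluated against the ORIGINAL x
out <- transform(d, x = x * 2, y = x + 1)

out$x   # doubled values
out$y   # computed from the original x, not the doubled one
```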
Since the question is not tagged tidyverse, here is a base R way with is.na<-.
is.na(mtcars$drat) <- mtcars$drat <= 3
is.na(mtcars$wt) <- mtcars$wt <= 3
is.na(mtcars$qsec) <- with(mtcars, is.na(drat) | is.na(wt))
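As a sanity check, the within() and is.na<- approaches can be compared on copies of mtcars, using <= 3 as in the question. This is only a sketch; the names res_within and res_isna are illustrative.

```r
# Approach 1: within()
res_within <- within(mtcars, {
  drat <- ifelse(drat <= 3, NA, drat)
  wt <- ifelse(wt <= 3, NA, wt)
  qsec <- ifelse(is.na(drat) | is.na(wt), NA, qsec)
})

# Approach 2: is.na<- on a copy, so the built-in dataset is left untouched
res_isna <- mtcars
is.na(res_isna$drat) <- res_isna$drat <= 3
is.na(res_isna$wt) <- res_isna$wt <= 3
is.na(res_isna$qsec) <- with(res_isna, is.na(drat) | is.na(wt))

# Both approaches should agree
all.equal(res_within, res_isna)
```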

Split dataframe column in R

I have a tibble
a <- tribble(~names,"|david:123|",)
and I've seen code that does the following, but I'm not sure what it does.
a %>% split(.$names)
split splits the dataframe based on the values in a column. You have provided only one row of data, which is not enough to demonstrate what it does. Let's consider the built-in mtcars dataset.
The unique values in cyl column of mtcars dataset are 6, 4, 8.
unique(mtcars$cyl)
#[1] 6 4 8
When we use mtcars %>% split(.$cyl), it divides the mtcars dataset into a list of length 3, where each element is a data frame holding the rows for one unique cyl value.
temp <- mtcars %>% split(.$cyl)
temp[[1]]
# mpg cyl disp hp drat wt qsec vs am gear carb
#Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#...
temp[[2]]
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#...
temp[[3]]
# mpg cyl disp hp drat wt qsec vs am gear carb
#Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#...
As we can see, temp[[1]] has all the rows where cyl = 4, temp[[2]] has the rows where cyl = 6, and temp[[3]] has all the rows where cyl = 8.
Similarly, in your case, a %>% split(.$names) splits the dataframe/tibble into a list with one element per unique value of names. .$names extracts the names column from the dataframe.
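A tiny made-up example with a repeated value shows the effect more directly than a one-row table. The data frame a2 and its values are invented for illustration, and base R split() is called without the pipe.

```r
# Three rows, two distinct names
a2 <- data.frame(names = c("david", "anna", "david"),
                 value = c(1, 2, 3))

# One list element per unique value of `names`
parts <- split(a2, a2$names)

names(parts)        # one entry per distinct name, in sorted order
nrow(parts$david)   # both "david" rows end up together
parts$anna$value
```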

retain dplyr::group_by columns when using tidyr::nest()

Take the following example in R
library(dplyr)
library(tidyr)
mtcars_cyl <- mtcars %>% group_by(cyl) %>% nest()
If we look at the column names of mtcars_cyl$data, we see that cyl is no longer included.
mtcars_cyl$data[[1]] %>% colnames()
[1] "mpg" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
I was expecting to find some method/option for retaining the group_by columns within data, but finding a solution is escaping me. I can understand this might be a niche need. As an example, one might want to create a table of each group_by data frame and include cyl as a column in that output.
library(pander)
mtcars_cyl$data %>% pander::pander()
In other cases, when using in combination with purrr, one might need to include the group_by columns in a function call.
You can use split(mtcars, mtcars$cyl) instead. This gives a list of data frames.
split(mtcars, mtcars$cyl)
#> $`4`
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#> Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#> Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#>
#> $`6`
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#>
#> $`8`
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#> Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
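A short check confirms that, unlike the nested data column, each piece returned by split() still carries the grouping column. The name by_cyl is illustrative.

```r
by_cyl <- split(mtcars, mtcars$cyl)

# Every piece still contains the cyl column
vapply(by_cyl, function(d) "cyl" %in% names(d), logical(1))

# Each piece holds exactly one cyl value
vapply(by_cyl, function(d) unique(d$cyl), numeric(1))

# Group sizes
vapply(by_cyl, nrow, integer(1))
```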
Generally, I tend to use nest() but also miss the grouping variables.
It is rarely a problem in workflows where the nested data is passed to purrr::pmap functions. This workflow allows for subsetting data with nest and applying functions to the nested data frames, including the grouping variables.
library(dplyr)
library(tidyr)
mtcars_cyl <- mtcars %>% group_by(cyl) %>% nest()
# The nested data
mtcars_cyl
# A tibble: 3 x 2
cyl data
<dbl> <list>
1 6 <tibble [7 x 10]>
2 4 <tibble [11 x 10]>
3 8 <tibble [14 x 10]>
# The nested data is summarized and returned with the grouping variable intact
mtcars_cyl %>%
  purrr::pmap_dfr(function(cyl, data) {
    data %>%
      summarise_if(is.numeric, mean) %>%
      mutate(cyl = cyl)
  })
# A tibble: 3 x 11
mpg disp hp drat wt qsec vs am gear carb cyl
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 19.7 183. 122. 3.59 3.12 18.0 0.571 0.429 3.86 3.43 6
2 26.7 105. 82.6 4.07 2.29 19.1 0.909 0.727 4.09 1.55 4
3 15.1 353. 209. 3.23 4.00 16.8 0 0.143 3.29 3.5 8
For an in-depth discussion of split vs nest, see this

how to give right context to subset in R

I am wondering what is the right way of making subset understand context of each variable. For instance, consider the following function:
> f <- function(num) {
subset(mtcars, carb == num)
}
> f(2)
mpg cyl disp hp drat wt qsec vs am gear carb
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
Now, consider the case where the argument of f is coincidentally also named carb:
> f <- function(carb) {
subset(mtcars, carb == carb)
}
> f(2)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
..............
This obviously doesn't work, because every row is returned. I am wondering what the right way of fixing this is. I would have expected the following to work, but it doesn't. Could someone elaborate?
> f <- function(carb, env=parent.frame()) {
+ subset(mtcars, carb == eval(substitute(carb), env))
+ }
Thanks in advance
This works for me.
carb <- 2
mtcars[mtcars$carb == carb, ]
mpg cyl disp hp drat wt qsec vs am gear carb
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
Or alternatively
x <- 2
with(mtcars, mtcars[carb == x, ]) # same output as above
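Wrapped back into a function, the first idea might look like the sketch below. The function names f_subset and f_index are illustrative. Inside subset(), a bare carb resolves to the column on both sides of ==, whereas with explicit indexing mtcars$carb is the column and the bare carb is the function argument.

```r
# The problematic version: both `carb`s refer to the column, so all rows match
f_subset <- function(carb) {
  subset(mtcars, carb == carb)
}

# Explicit indexing: mtcars$carb is the column, carb is the argument
f_index <- function(carb) {
  mtcars[mtcars$carb == carb, ]
}

nrow(f_subset(2))       # every row of mtcars
nrow(f_index(2))        # only the rows with carb equal to 2
unique(f_index(2)$carb)
```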
