Is there a better way to select the rows whose names begin with a given prefix?
Example:
df
k1 k2 p1 p2 l perda lP lucroVar
C16-C12 6.02 12.12 5.35 0.48 4.87 -1.23 3.96 79.84
C47-C12 6.62 12.12 4.63 0.48 4.15 -1.35 3.07 75.45
C7-C12 7.02 12.12 4.30 0.48 3.82 -1.28 2.98 74.90
C21-C12 7.12 12.12 4.19 0.48 3.71 -1.29 2.88 74.20
C12-C13 12.12 13.12 0.48 0.24 0.24 -0.76 0.32 24.00
C12-C43 12.12 13.62 0.48 0.16 0.32 -1.18 0.27 21.33
* The real data frame has 8000 rows.
The 2 following options work:
df[substr(rownames(df),1,3)=='C12',]
or
df[grep('^C12',rownames(df)),]
I would like to write something like this instead:
df['C12*',]
k1 k2 p1 p2 l perda lP lucroVar
C12-C13 12.12 13.12 0.48 0.24 0.24 -0.76 0.32 24.00
C12-C43 12.12 13.62 0.48 0.16 0.32 -1.18 0.27 21.33
In SQL there is "like 'C12%'".
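Not that I would recommend doing this, but you could override [.data.frame so that a character row index is treated as a regular expression matched against the row names: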
`[.data.frame` <- function(x, i, ...) {
base::`[.data.frame`(x, if (is.character(i)) grepl(i, rownames(x)) else i, ...)
}
letters[1]
# [1] "a"
mtcars[1, ]
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21 6 160 110 3.9 2.62 16.46 0 1 4 4
mtcars['M', ]
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
# Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
# Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
# Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
# Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
# AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
# Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
And to return to normal, remove the override (only [.data.frame was defined, so that is the only object to delete):
rm('[.data.frame')
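A less invasive option than overriding the subsetting method is a small wrapper function. The sketch below assumes a literal prefix match is what is wanted; the name rows_starting_with is made up for illustration and is not part of base R or any package. startsWith() is a base R function (R 3.3.0 and later).

```r
# Hypothetical helper (the name is illustrative only):
# keep the rows whose names start with a literal prefix.
rows_starting_with <- function(df, prefix) {
  # startsWith() compares literally, so regex metacharacters in
  # `prefix` need no escaping; drop = FALSE keeps a data frame
  # even when `df` has a single column.
  df[startsWith(rownames(df), prefix), , drop = FALSE]
}

merc <- rows_starting_with(datasets::mtcars, "Merc")
rownames(merc)
```

With the data frame from the question this would be called as rows_starting_with(df, "C12"), which is close in spirit to SQL's like 'C12%'.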
Why do you dislike your approach, and what do you mean by "better"? More concise syntax, or faster?
dplyr can do the same, but it is actually more convoluted, since I think you need to turn the row names into an explicit column first:
library(dplyr)
a <- data.frame(a=(1:4), row.names=c("C12","CC12", "C1","12"))
# as_tibble() replaces the deprecated tbl_df()
as_tibble(cbind(a = a, b = rownames(a))) %>%
  filter(grepl("^C12", b))
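The same idea can be written with tibble::rownames_to_column(), which moves the row names into a regular column in one step. This is only a sketch; the column name b is kept from the example above, and startsWith() is swapped in for the regular expression on the assumption that a literal prefix is intended.

```r
library(dplyr)
library(tibble)

a <- data.frame(a = 1:4, row.names = c("C12", "CC12", "C1", "12"))

res <- a %>%
  rownames_to_column("b") %>%      # row names become column `b`
  filter(startsWith(b, "C12"))     # literal prefix test, no regex needed

res
```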
Related
I use the mtcars dataframe and I use the following code to assign NA to all the values of columns drat and wt <=3.
df <- mtcars %>%
mutate(across(c(drat, wt), ~ifelse(.x<=3, NA, .x)))
How can I modify the code so that it also assigns NA to the column qsec whenever the value of drat or wt in the same row is <=3? In the end, every row where drat or wt is NA should also have NA in qsec. Thanks
We may use if_any on the columns that are changed to NA to return a logical vector to replace values in 'qsec'
library(dplyr)
mtcars1 <- mtcars %>%
mutate(across(c(drat, wt), ~ifelse(.x<=3, NA, .x)),
qsec = ifelse(if_any(c(drat, wt), is.na), NA, qsec))
-output
> head(mtcars1)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 NA NA 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 NA NA 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 NA NA 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 NA 3.460 NA 1 0 3 1
1) dplyr Continue the mutate to set qsec to NA if either drat or wt is NA. (If you meant and rather than or then replace | with & .)
mtcars %>%
mutate(across(c(drat, wt), ~ifelse(.x<=3, NA, .x)),
qsec = ifelse(is.na(drat) | is.na(wt), NA, qsec))
giving:
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 NA NA 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 NA NA 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 NA NA 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 NA 3.460 NA 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 NA 5.250 NA 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 NA 5.424 NA 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 NA NA 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 NA NA 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 NA NA 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 NA NA 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 NA 3.520 NA 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 NA NA 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 NA NA 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 NA NA 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 NA NA 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 NA NA 1 1 4 2
2) Base R With base R we can use within like this giving the same result.
within(mtcars, {
drat <- ifelse(drat <= 3, NA, drat)
wt <- ifelse(wt <= 3, NA, wt)
qsec <- ifelse(is.na(drat) | is.na(wt), NA, qsec)
})
or at the expense of some redundancy we could use transform:
transform(mtcars,
drat = ifelse(drat <= 3, NA, drat),
wt = ifelse(wt <= 3, NA, wt),
qsec = ifelse(drat <= 3 | wt <= 3, NA, qsec))
Since the question is not tagged tidyverse, here is a base R way with is.na<-. Note that it modifies mtcars in place, and that the comparison must be <= to match the question.
is.na(mtcars$drat) <- mtcars$drat <= 3
is.na(mtcars$wt) <- mtcars$wt <= 3
is.na(mtcars$qsec) <- with(mtcars, is.na(drat) | is.na(wt))
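As a quick sanity check, the dplyr and is.na<- variants can be run side by side and compared column by column. This is a sketch rather than part of any answer; the names via_dplyr and via_base are made up here, the comparison uses <= 3 as in the question, and datasets::mtcars is used so that an already modified mtcars in the workspace does not interfere.

```r
library(dplyr)

# dplyr version: blank out drat/wt, then qsec where either is NA
via_dplyr <- datasets::mtcars %>%
  mutate(across(c(drat, wt), ~ ifelse(.x <= 3, NA, .x)),
         qsec = ifelse(is.na(drat) | is.na(wt), NA, qsec))

# base R version on a copy, using the replacement form of is.na()
via_base <- datasets::mtcars
is.na(via_base$drat) <- via_base$drat <= 3
is.na(via_base$wt)   <- via_base$wt <= 3
is.na(via_base$qsec) <- with(via_base, is.na(drat) | is.na(wt))

# TRUE for a column means both approaches produced the same values
same <- sapply(c("drat", "wt", "qsec"),
               function(col) identical(via_dplyr[[col]], via_base[[col]]))
same
```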
I would like to combine two lists of data frames element-wise and return a list of data frames. The following code works for the mtcars dataset:
list1=split(mtcars[c(1:16),-11],mtcars[c(1:16),2])
list2=split(data.frame(mtcars[c(1:16),]),mtcars[c(1:16),2])
newList=Map(cbind, list1, list2)
How do I modify the Map function to just bind a specific column(s) from list2? Thanks
Since @thelatemail doesn't want to add an answer, here is a purrr version of his answer.
library(purrr)
map2(list1, map(list2, `[`, 'carb'), cbind)
#Or
#map2(list1, map(list2, `[`, 'carb'), dplyr::bind_cols)
#$`4`
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 22.8 4 108.0 93 3.85 2.32 18.61 1 1 4 1
#2 24.4 4 146.7 62 3.69 3.19 20.00 1 0 4 2
#3 22.8 4 140.8 95 3.92 3.15 22.90 1 0 4 2
#$`6`
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#3 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#4 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#5 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#6 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#$`8`
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#2 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#3 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#4 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#5 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#6 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#7 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
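For comparison, the same result can be had without purrr by giving Map an anonymous function that subsets the second data frame before binding. This is a minimal base R sketch; the variable cols is introduced here for illustration and can hold more than one column name.

```r
list1 <- split(mtcars[1:16, -11], mtcars[1:16, 2])
list2 <- split(mtcars[1:16, ], mtcars[1:16, 2])

# Columns to take from each element of list2 (illustrative choice)
cols <- c("carb")

# Bind only the selected columns of y onto x, pair by pair
newList <- Map(function(x, y) cbind(x, y[cols]), list1, list2)

names(newList)
```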
I'm looking for an approach to do something like this
# this doesnt work
# accumulate(1:8, ~filter(mtcars, carb >= .x))
So that I can examine some summary statistics at different cutoff values. I could simply do
# this works but redundant filtering is done
map2(list(mtcars), 1:8, ~filter(.x, carb >= .y))
But since my data is rather large, it doesn't make sense to re-test rows that were already filtered out in the previous step. In essence, the code above duplicates the original data frame a number of times and then filters each copy separately. I was looking at accumulate from the purrr package, but that function doesn't seem to fit this problem (I'm hoping that I'm wrong about this). A base R solution could be
# something like this works, but is ugly
output <- vector("list", length(1:8) + 1)
output[[1]] <- mtcars
for (i in 1:8) {
output[[i + 1]] <- filter(output[[i]], carb >= i)
}
output[[1]] <- NULL
but that's not particularly elegant. How can I accomplish this better?
# the above code assumes
library(tidyverse)
mtcars <- as_tibble(mtcars)
This is an example of something the output could be used for:
Your initial example accumulate(1:8, ~filter(mtcars, carb >= .x)) doesn't work because it uses the accumulated value (.x) as the filtering criteria, rather than the "next" value (.y). Try this:
library(tidyverse)
accumulate(2:8, function(x,y) filter(x, carb >= y), .init=mtcars)
#> [[1]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#> Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#> Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#> Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#>
#> [[2]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> 3 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> 4 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> 5 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> 6 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> 7 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> 8 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> 9 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> 10 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> 11 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> 12 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> 13 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> 14 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> 15 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> 16 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#> 17 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#> 18 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> 19 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#> 20 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> 21 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#> 22 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> 23 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> 24 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#> 25 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#>
#> [[3]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> 3 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> 4 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> 5 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> 6 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> 7 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> 8 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> 9 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> 10 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> 11 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> 12 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> 13 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> 14 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> 15 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#>
#> [[4]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> 3 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> 4 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> 5 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> 6 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> 7 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> 8 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> 9 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> 10 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> 11 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> 12 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#>
#> [[5]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6
#> 2 15.0 8 301 335 3.54 3.57 14.6 0 1 5 8
#>
#> [[6]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6
#> 2 15.0 8 301 335 3.54 3.57 14.6 0 1 5 8
#>
#> [[7]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 15 8 301 335 3.54 3.57 14.6 0 1 5 8
#>
#> [[8]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 15 8 301 335 3.54 3.57 14.6 0 1 5 8
Created on 2019-11-21 by the reprex package (v0.3.0)
The .init argument starts you off with mtcars, and then each step filters with an increment from the sequence (y) and passes off the filtered dataframe as the "accumulated" value (x) to the next iteration.
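To connect this back to the stated goal of summary statistics per cutoff, here is a small sketch that seeds the accumulation with the first cutoff and then summarises each step. The names cutoffs, steps and summary_tbl, and the choice of mean mpg as the statistic, are assumptions made for illustration.

```r
library(dplyr)
library(purrr)

cutoffs <- 1:8

# Seed with the first cutoff; every later step filters the previous
# result (acc) rather than the full data.
steps <- accumulate(
  cutoffs[-1],
  function(acc, cutoff) dplyr::filter(acc, carb >= cutoff),
  .init = dplyr::filter(datasets::mtcars, carb >= cutoffs[1])
)

# One row of summary statistics per cutoff
summary_tbl <- data.frame(
  cutoff   = cutoffs,
  n        = map_int(steps, nrow),
  mean_mpg = map_dbl(steps, ~ mean(.x$mpg))
)
summary_tbl
```

Seeding with the first cutoff yields exactly one element per cutoff, so there is no extra leading element to drop afterwards.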
I have a huge dataframe and I am applying a function that has multiple outputs on one column and would like to add these outputs as columns in the dataframe.
Example function:
measure <- function(x){ # useless function for illustrative purposes
one <- x+1
two <- x^2
three <- x/2
m <- c(one,two,three)
names(m) <- c('Plus1','Square','Half')
return(m)
}
My current method which is very inefficient:
a <- mtcars %>% group_by(cyl) %>% mutate(Plus1 = measure(wt)[1], Square = measure(wt)[2],
Half = measure(wt)[3]) %>% as.data.frame()
Output:
head(a,15)
mpg cyl disp hp drat wt qsec vs am gear carb Plus1 Square Half
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 3.62 3.875 4.215
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 3.62 3.875 4.215
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 3.32 4.190 4.150
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 3.62 3.875 4.215
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 4.44 4.570 5.070
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 3.62 3.875 4.215
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 4.44 4.570 5.070
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 3.32 4.190 4.150
9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 3.32 4.190 4.150
10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 3.62 3.875 4.215
11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 3.62 3.875 4.215
12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 4.44 4.570 5.070
13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 4.44 4.570 5.070
14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 4.44 4.570 5.070
15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 4.44 4.570 5.070
Is there any more efficient way to do this? My actual function has 13 outputs and it is taking very long to apply to my large dataframe. Please help!
There are various ways to solve this. One option is to return a tibble from the function, split the data frame by group, compute the statistics for each group, and bind the results together.
library(tidyverse)
measure <- function(x){
tibble(Plus1 = x+1,Square = x^2,Half = x/2)
}
bind_cols(mtcars %>% arrange(cyl),
mtcars %>%
group_split(cyl) %>%
map_df(~measure(.$wt)))
# mpg cyl disp hp drat wt qsec vs am gear carb Plus1 Square Half
#1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 3.320 5.382400 1.1600
#2 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 4.190 10.176100 1.5950
#3 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 4.150 9.922500 1.5750
#4 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 3.200 4.840000 1.1000
#5 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 2.615 2.608225 0.8075
#6 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 2.835 3.367225 0.9175
#....
This calls measure only once per group, irrespective of the number of values it returns, whereas the original attempt called it n times to extract n values.
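If the original row order should be kept, a variation is to let mutate() unpack the tibble directly. This sketch assumes dplyr 1.0.0 or later, where an unnamed data frame result inside mutate() is spread into separate columns; it is offered as an alternative, not as the answerer's method.

```r
library(dplyr)

measure <- function(x) {
  tibble(Plus1 = x + 1, Square = x^2, Half = x / 2)
}

res <- datasets::mtcars %>%
  group_by(cyl) %>%
  mutate(measure(wt)) %>%   # unnamed tibble result becomes three columns
  ungroup()

head(res)
```

As before, measure is called once per group, and no arrange() or bind_cols() step is needed to line the rows up.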
Take the following example in R
library(dplyr)
library(tidyr)
mtcars_cyl <- mtcars %>% group_by(cyl) %>% nest()
If we look at the column names of the nested data in mtcars_cyl, we see that cyl is no longer included.
mtcars_cyl$data[[1]] %>% colnames()
[1] "mpg" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
I was expecting to find some method/option for retaining the group_by columns within data, but finding a solution is escaping me. I can understand this might be a niche need. As an example, one might want to create a table of each group_by data frame and include cyl as a column in that output.
library(pander)
mtcars_cyl$data %>% pander::pander()
In other cases, when using in combination with purrr, one might need to include the group_by columns in a function call.
You can use split(mtcars, mtcars$cyl) instead. This gives a list of data frames.
split(mtcars, mtcars$cyl)
#> $`4`
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#> Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#> Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#>
#> $`6`
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#>
#> $`8`
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#> Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
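Within dplyr, group_split() gives a similar list and keeps the grouping column by default. This is a sketch for comparison; unlike split(), it returns tibbles, so the row names are dropped and the list elements are unnamed.

```r
library(dplyr)

pieces <- datasets::mtcars %>%
  group_by(cyl) %>%
  group_split()          # grouping column cyl is retained in each piece

# cyl is still present in every piece
lapply(pieces, colnames)
```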
Generally, I tend to use nest(), but I also miss the grouping variables.
It is rarely a problem in workflows where the nested data is passed to the purrr::pmap functions. This workflow allows subsetting data with nest and applying functions to the nested data frames while still having access to the grouping variables.
library(dplyr)
library(tidyr)
mtcars_cyl <- mtcars %>% group_by(cyl) %>% nest()
# The nested data
mtcars_cyl
# A tibble: 3 x 2
cyl data
<dbl> <list>
1 6 <tibble [7 x 10]>
2 4 <tibble [11 x 10]>
3 8 <tibble [14 x 10]>
# The nested data is summarized and returned with the grouping variable intact
mtcars_cyl %>%
  purrr::pmap_dfr(function(cyl, data) {
    data %>%
      summarise_if(is.numeric, mean) %>%
      mutate(cyl = cyl)   # re-attach the grouping value
  })
# A tibble: 3 x 11
mpg disp hp drat wt qsec vs am gear carb cyl
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 19.7 183. 122. 3.59 3.12 18.0 0.571 0.429 3.86 3.43 6
2 26.7 105. 82.6 4.07 2.29 19.1 0.909 0.727 4.09 1.55 4
3 15.1 353. 209. 3.23 4.00 16.8 0 0.143 3.29 3.5 8
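If the grouping column really has to live inside each nested tibble, it can be copied back in after nesting. The following is a sketch under that assumption; the name with_key is made up, and the ungroup() call is there only to keep the outer mutate() simple.

```r
library(dplyr)
library(tidyr)
library(purrr)

with_key <- datasets::mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup() %>%
  # d is one nested tibble, g is the matching cyl value
  mutate(data = map2(data, cyl, function(d, g) mutate(d, cyl = g)))

colnames(with_key$data[[1]])
```

The cyl column is appended as the last column of each nested tibble, so functions applied to data can use it directly.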
For an in-depth discussion on split vs nest see this