Specify a row on mutate - r

I have this code, which adds a different text to each row of the data frame.
mt<- mtcars %>% group_by(cyl,am) %>% nest() %>%
mutate(formula = "Add separate model for each row in text like mpg~wt for one row, mpg~wt+hp for another etc.")
mt$formula[[1]] <- "mpg~wt"
mt$formula[[2]] <- "mpg~wt+drat"
mt$formula[[3]] <- "mpg~wt+qsec"
mt$formula[[4]] <- "mpg~wt+gear"
mt$formula[[5]] <- "mpg~wt"
mt$formula[[6]] <- "mpg~wt+qsec"
Is there a more elegant way of referring to row number 1, like specifying cyl==6 and am==1 instead of the line number?
Thanks

One option would be to use dplyr::case_when:
library(tidyr)
library(dplyr)
mt<- mtcars %>% group_by(cyl,am) %>% nest()
mt %>%
mutate(formula = case_when(
cyl == 6 & am == 1 ~ "mpg~wt",
cyl == 8 & am == 0 ~ "mpg~wt+drat"
))
#> # A tibble: 6 x 4
#> # Groups: cyl, am [6]
#> cyl am data formula
#> <dbl> <dbl> <list> <chr>
#> 1 6 1 <tibble [3 x 9]> mpg~wt
#> 2 4 1 <tibble [8 x 9]> <NA>
#> 3 6 0 <tibble [4 x 9]> <NA>
#> 4 8 0 <tibble [12 x 9]> mpg~wt+drat
#> 5 4 0 <tibble [3 x 9]> <NA>
#> 6 8 1 <tibble [2 x 9]> <NA>
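If every group needs its own formula, a lookup table joined on the grouping columns may scale better than a long case_when(). The following is only a sketch: the `formulas` table is a hypothetical helper, and the pairings simply follow the row order from the question.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical lookup table: one formula per (cyl, am) group.
# The pairings mirror the row-by-row assignments in the question.
formulas <- tribble(
  ~cyl, ~am, ~formula,
  6,    1,   "mpg~wt",
  4,    1,   "mpg~wt+drat",
  6,    0,   "mpg~wt+qsec",
  8,    0,   "mpg~wt+gear",
  4,    0,   "mpg~wt",
  8,    1,   "mpg~wt+qsec"
)

mt <- mtcars %>%
  group_by(cyl, am) %>%
  nest() %>%
  # Attach the formula by group key instead of by row position
  left_join(formulas, by = c("cyl", "am"))
```

Groups missing from the lookup table would get NA, just like the unmatched rows in the case_when() output above.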

Related

unnest() doesn't work with list columns any more

You can comfortably create nested models with map() resulting in list columns:
df <- mtcars %>%
nest(data = -c(cyl)) %>%
mutate(aov = map(data, ~ aov(mpg ~ hp, data = .x))) %>%
mutate(dunned = map(data, ~ rstatix::dunn_test(mpg ~ hp, data = .x)))
df
# A tibble: 3 × 4
cyl data aov dunned
<dbl> <list> <list> <list>
1 6 <tibble [7 × 10]> <aov> <rstatix_test [6 × 9]>
2 4 <tibble [11 × 10]> <aov> <rstatix_test [45 × 9]>
3 8 <tibble [14 × 10]> <aov> <rstatix_test [36 × 9]>
However, dissolving these list columns with unnest() works only sometimes, e.g. here:
df %>% unnest(dunned)
# A tibble: 87 × 12
cyl data aov .y. group1 group2 n1 n2 statistic p
<dbl> <list> <lis> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
1 6 <tibble… <aov> mpg 105 110 1 3 1.62 0.106
2 6 <tibble… <aov> mpg 105 123 1 2 0 1
3 6 <tibble… <aov> mpg 105 175 1 1 0.661 0.509
However, it doesn't work in other cases:
df %>% unnest(aov) # Error: Input must be a vector, not a <aov/lm> object.
df %>% unnest_wider(aov) # Error: Input must be list of vectors
df %>% unnest_legacy(aov) # Error: Each column must either be a list of vectors or a list of data frames [lm]
Before tidyr 1.0, unnest() would work on the above objects, but now it doesn't (see unnest_legacy(aov)). Why?
My guess is that it depends on the output format: if each element is a data frame (denoted by e.g. [6 × 9]), unnesting is no problem.
Question:
How can you dissolve list columns (created by common models such as aov and lm) for which none of the unnest/unnest_wider/unnest_legacy options work?
Expected Result:
I'm looking for a solution which is more general than a "trick" I know for the aov case:
df <- mtcars %>%
nest(data = -c(cyl)) %>%
mutate(aov = map(data, ~ aov(mpg ~ hp, data = .x))) %>%
mutate(tidied = map(aov, ~ broom::tidy(.x)))
which unfolds easily:
df %>% unnest(tidied)
# A tibble: 6 × 9
cyl data aov term df sumsq meansq statistic
<dbl> <list> <list> <chr> <dbl> <dbl> <dbl> <dbl>
1 6 <tibble… <aov> hp 1 0.205 0.205 0.0821
2 6 <tibble… <aov> Resi… 5 12.5 2.49 NA
3 4 <tibble… <aov> hp 1 55.7 55.7 3.40
4 4 <tibble… <aov> Resi… 9 148. 16.4 NA
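Building on the broom "trick" above, one possible pattern is to first turn each model into something unnest() accepts (a data frame, or a plain vector) and only then unnest. This is a sketch rather than a general answer: it uses lm() instead of aov(), the column names `fit`, `tidied` and `r2` are made up for illustration, and it assumes broom is installed.

```r
library(dplyr)
library(tidyr)
library(purrr)

df <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(
    fit    = map(data, ~ lm(mpg ~ hp, data = .x)),
    # Data frame per model: one row per coefficient, so unnest() can expand it
    tidied = map(fit, broom::tidy),
    # Single number per model: stored as an ordinary column, nothing to unnest
    r2     = map_dbl(fit, ~ summary(.x)$r.squared)
  )

# Drop the model objects themselves and expand only the tidy summaries
coefs <- df %>%
  select(cyl, r2, tidied) %>%
  unnest(tidied)
```

The model column itself stays a list column; only the derived data frames are unnested.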

Since .key is deprecated, how is it possible to rename the data column in nest()?

Before .key was deprecated I did this:
library(tidyverse)
mtcars %>% group_by(cyl) %>% nest(.key = "my_name")
The help of nest() points out that now this is performed using tidy select, but I don't know how.
You can use the new nest_by() function in dplyr 1.0.0, which works similarly to what you had previously with nest().
library(dplyr)
mtcars %>% group_by(cyl) %>% nest_by(.key = "my_name")
# cyl my_name
# <dbl> <list<tbl_df[,10]>>
#1 4 [11 × 10]
#2 6 [7 × 10]
#3 8 [14 × 10]
You can also do the same without grouping.
mtcars %>% nest_by(cyl, .key = "my_name")
You can use group_cols() to refer to grouping variables:
mtcars %>% group_by(cyl) %>% nest(my_name = !group_cols())
#> # A tibble: 3 x 2
#> # Groups: cyl [3]
#> cyl my_name
#> <dbl> <list>
#> 1 6 <tibble [7 × 10]>
#> 2 4 <tibble [11 × 10]>
#> 3 8 <tibble [14 × 10]>
mtcars %>% nest(my_name = !cyl)
#> # A tibble: 3 x 2
#> cyl my_name
#> <dbl> <list>
#> 1 6 <tibble [7 × 10]>
#> 2 4 <tibble [11 × 10]>
#> 3 8 <tibble [14 × 10]>
The name can be provided directly in the arguments of nest:
mtcars %>% nest( my_name = -cyl ) # Nest by everything except cyl
# # A tibble: 3 x 2
# cyl my_name
# <dbl> <list>
# 1 6 <tibble [7 × 10]>
# 2 4 <tibble [11 × 10]>
# 3 8 <tibble [14 × 10]>
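If the grouped nest() call should stay as it is, renaming the default column afterwards is another option. A minimal sketch, assuming the list-column still gets the default name `data`:

```r
library(dplyr)
library(tidyr)

renamed <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%                # list-column gets the default name "data"
  rename(my_name = data)    # rename it after the fact

names(renamed)
```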

Nesting tibbles after grouping by factor variables produces NULL elements in R

I was reviewing some of my old R code when I stumbled upon several errors.
After running each line and playing around with my data I discovered that tidyr::nest()ing tibbles dplyr::group(ed)_by factor variables produced one or more NULL elements.
Here is an example with the mtcars data:
library(dplyr)
library(tidyr)
mtcars %>%
as_tibble() %>%
select(cyl, carb, mpg) %>%
mutate(cyl = factor(cyl),
carb = factor(carb)) %>%
group_by(cyl, carb) %>%
nest()
# A tibble: 9 x 3
# cyl carb data
# <fct> <fct> <list>
# 1 6 4 <NULL>
# 2 4 1 <tibble [5 x 1]>
# 3 6 1 <tibble [3 x 1]>
# 4 8 2 <NULL>
# 5 8 4 <NULL>
# 6 4 2 <tibble [6 x 1]>
# 7 8 3 <NULL>
# 8 6 6 <NULL>
# 9 8 8 <NULL>
I thought nest() is taking factors as.numeric() and "getting confused" when different variables present same-named groups. But then I tried:
mtcars %>%
as_tibble() %>%
select(cyl, carb, mpg) %>%
mutate(cyl = factor(cyl) %>% as.numeric(),
carb = factor(carb) %>% as.numeric()) %>%
group_by(cyl, carb) %>%
nest()
and got the same result as when nesting with non-factorial variables:
# A tibble: 9 x 3
# cyl carb data
# <dbl> <dbl> <list>
# 1 2 4 <tibble [4 x 1]>
# 2 1 1 <tibble [5 x 1]>
# 3 2 1 <tibble [2 x 1]>
# 4 3 2 <tibble [4 x 1]>
# 5 3 4 <tibble [6 x 1]>
# 6 1 2 <tibble [6 x 1]>
# 7 3 3 <tibble [3 x 1]>
# 8 2 5 <tibble [1 x 1]>
# 9 3 6 <tibble [1 x 1]>
Compare with:
mtcars %>%
as_tibble() %>%
select(cyl, carb, mpg) %>%
group_by(cyl, carb) %>%
nest()
# A tibble: 9 x 3
# cyl carb data
# <dbl> <dbl> <list>
# 1 6 4 <tibble [4 x 1]>
# 2 4 1 <tibble [5 x 1]>
# 3 6 1 <tibble [2 x 1]>
# 4 8 2 <tibble [4 x 1]>
# 5 8 4 <tibble [6 x 1]>
# 6 4 2 <tibble [6 x 1]>
# 7 8 3 <tibble [3 x 1]>
# 8 6 6 <tibble [1 x 1]>
# 9 8 8 <tibble [1 x 1]>
Since my code used to work fine until last month, I wonder if tidyr was updated lately and the way of handling factor groups by nest() was changed?
Is it generally advisable not to nest data grouped by factor variables or rather not to group_by() on factor variables?
Edit:
In the issue mentioned by aosmith, Hadley refers to group_nest(), which seems to resolve the problem (caveat: this function reorders the tibble!). Nevertheless, I still wonder why nest() is producing NULLs...
mtcars %>%
as_tibble() %>%
select(cyl, carb, mpg) %>%
mutate(cyl = factor(cyl),
carb = factor(carb)) %>%
group_by(cyl, carb) %>%
group_nest() %>%
unnest %>%
all.equal(.,
mtcars %>%
as_tibble() %>%
select(cyl, carb, mpg) %>%
mutate(cyl = factor(cyl),
carb = factor(carb)))
# [1] TRUE
As aosmith suggested, this has recently been fixed in the dev version of tidyr. Since I didn't recognize that from the linked issue and couldn't manage to install the dev version, I submitted this question as another issue. Hadley just answered it.
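To check whether an installed tidyr version is affected, the nested result can be tested for NULL elements directly. This is just a sketch; the names `nested` and `null_groups` are made up for illustration, and on a version that contains the fix the check should come back FALSE.

```r
library(dplyr)
library(tidyr)
library(purrr)

nested <- mtcars %>%
  as_tibble() %>%
  select(cyl, carb, mpg) %>%
  mutate(cyl = factor(cyl),
         carb = factor(carb)) %>%
  group_by(cyl, carb) %>%
  nest()

# TRUE for every group whose data element came back as NULL
null_groups <- map_lgl(nested$data, is.null)
any(null_groups)

# Rows that survived nesting; should equal nrow(mtcars) if nothing was lost
sum(map_int(nested$data, ~ if (is.null(.x)) 0L else nrow(.x)))
```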

Accessing grouping variables in purrr::map() with nested dataframes

I'm using tidyr::nest() in combination with purrr::map() (-family) to group a data.frame into groups and then do some fancy stuff with each subset. Consider following example, and please ignore the fact that I don't need nest() and map() to do this (this is an oversimplified example):
library(dplyr)
library(purrr)
library(tidyr)
mtcars %>%
group_by(cyl) %>%
nest() %>%
mutate(
wt_mean = map_dbl(data,~mean(.x$wt))
)
# A tibble: 8 x 4
cyl gear data cly2
<dbl> <dbl> <list> <dbl>
1 6 4 <tibble [4 x 9]> 6
2 4 4 <tibble [8 x 9]> 4
3 6 3 <tibble [2 x 9]> 6
4 8 3 <tibble [12 x 9]> 8
5 4 3 <tibble [1 x 9]> 4
6 4 5 <tibble [2 x 9]> 4
7 8 5 <tibble [2 x 9]> 8
8 6 5 <tibble [1 x 9]> 6
Usually when I do this type of operation, I need access to the grouping variable (cyl in this case) within map(). But these grouping variables appear as vectors with length corresponding to the number of rows in the nested dataframe, and therefore don't lend themselves easily.
Is there a way I could run the following operation? I would want the mean of wt to be divided by the number of cylinders (cyl) per group (i.e. row).
mtcars %>%
group_by(cyl,gear) %>%
nest() %>%
mutate(
wt_mean = map_dbl(data,~mean(.x$wt)/cyl)
)
Error in mutate_impl(.data, dots) :
Evaluation error: Result 1 is not a length 1 atomic vector.
Take cyl out of the map call:
mtcars %>%
group_by(cyl,gear) %>%
nest() %>%
mutate(
wt_mean = map_dbl(data, ~mean(.x$wt)) / cyl
)
# A tibble: 8 x 4
cyl gear data wt_mean
<dbl> <dbl> <list> <dbl>
1 6 4 <tibble [4 x 9]> 0.516
2 4 4 <tibble [8 x 9]> 0.595
3 6 3 <tibble [2 x 9]> 0.556
4 8 3 <tibble [12 x 9]> 0.513
5 4 3 <tibble [1 x 9]> 0.616
6 4 5 <tibble [2 x 9]> 0.457
7 8 5 <tibble [2 x 9]> 0.421
8 6 5 <tibble [1 x 9]> 0.462
map_dbl() sees cyl as a length-8 vector because nest() removes the grouping from the data frame. Using cyl inside the map_* call (as in the OP's example) therefore produces eight length-8 vectors instead of eight single numbers.
Two other approaches, both giving the same result as above but keeping the grouping variable inside the map_* call, per the OP's specs:
Regrouping after nest():
mtcars %>%
group_by(cyl,gear) %>%
nest() %>%
group_by(cyl, gear) %>%
mutate(wt_mean = map_dbl(data,~mean(.x$wt)/cyl))
Using map2_dbl() to iterate over cyl in parallel with data:
mtcars %>%
group_by(cyl,gear) %>%
nest() %>%
mutate(wt_mean = map2_dbl(data, cyl,~mean(.x$wt)/ .y))
In the new release of dplyr (0.8.0), you can now use group_map(), which I find very handy for this use case. This is the example by GitHub user @yutannihilation:
library(dplyr, warn.conflicts = FALSE)
mtcars %>%
group_by(cyl) %>%
group_map(function(data, group_info) {
tibble::tibble(wt_mean = mean(data$wt) / group_info$cyl)
})
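As far as I know, from dplyr 0.8.1 onwards group_map() returns a list, and group_modify() is the variant that returns a grouped tibble. A sketch of the same idea with group_modify(), where `.x` is the subset for one group and `.y` is a one-row tibble holding the group keys:

```r
library(dplyr)

res <- mtcars %>%
  group_by(cyl) %>%
  # .x: rows of the current group (without cyl); .y: one-row tibble with cyl
  group_modify(~ tibble::tibble(wt_mean = mean(.x$wt) / .y$cyl))

res
```

The grouping column is added back to the result automatically, so the function only has to return the new column.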

Tidyverse and R: how to count rows in a tibble of a nested dataframe

So, I've checked multiple posts and haven't found anything. According to this, my code should work, but it isn't.
Objective: I want to essentially print out the number of subjects--which in this case is also the number of rows in this tibble.
Code:
data<-read.csv("advanced_r_programming/data/MIE.csv")
make_LD<-function(x){
LongitudinalData<-x%>%
group_by(id)%>%
nest()
structure(list(LongitudinalData), class = "LongitudinalData")
}
print.LongitudinalData<-function(x){
paste("Longitudinal dataset with", x[["id"]], "subjects")
}
x<-make_LD(data)
print(x)
Here's the head of the dataset I'm working on:
> head(x)
[[1]]
# A tibble: 10 x 2
id data
<int> <list>
1 14 <tibble [11,945 x 4]>
2 20 <tibble [11,497 x 4]>
3 41 <tibble [11,636 x 4]>
4 44 <tibble [13,104 x 4]>
5 46 <tibble [13,812 x 4]>
6 54 <tibble [10,944 x 4]>
7 64 <tibble [11,367 x 4]>
8 74 <tibble [11,517 x 4]>
9 104 <tibble [11,232 x 4]>
10 106 <tibble [13,823 x 4]>
Output:
[1] "Longitudinal dataset with subjects"
I've tried every possible combination from the aforementioned stackoverflow post and none seem to work.
Here are two options:
library(tidyverse)
# Create a nested data frame
dat = mtcars %>%
group_by(cyl) %>%
nest() %>% as_tibble()
cyl data
1 6 <tibble [7 x 10]>
2 4 <tibble [11 x 10]>
3 8 <tibble [14 x 10]>
dat %>%
mutate(nrow=map_dbl(data, nrow))
dat %>%
group_by(cyl) %>%
mutate(nrow = nrow(data.frame(data)))
cyl data nrow
1 6 <tibble [7 x 10]> 7
2 4 <tibble [11 x 10]> 11
3 8 <tibble [14 x 10]> 14
There is a specific function for this in the tidyverse: n()
You can simply do: mtcars %>% group_by(cyl) %>% summarise(rows = n())
> mtcars %>% group_by(cyl) %>% summarise(rows = n())
# A tibble: 3 x 2
cyl rows
<dbl> <int>
1 4 11
2 6 7
3 8 14
In more sophisticated cases, in which subjects may span across multiple rows ("long format data"), you can do (assuming hp denotes the subject):
> mtcars %>% group_by(cyl, hp) %>% #always group by subject-ID last
+ summarise(n = n()) %>% #observations per subject and cyl
+ summarise(n = n()) #subjects per cyl (implicitly summarises across all group-variables except the last)
`summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
# A tibble: 3 x 2
cyl n
<dbl> <int>
1 4 10
2 6 4
3 8 9
Note that n in the last case is smaller than in the first because cars with the same cyl and hp are now counted as just one "subject".
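Coming back to the print method in the question: the object is a list that wraps the nested tibble, so x[["id"]] looks for a list element named "id", finds none and returns NULL, which paste() silently drops. A possible fix is to count the rows of the wrapped tibble instead. The sketch below uses mtcars grouped by cyl as a stand-in for the MIE.csv data grouped by id, which is an assumption made only to keep the example runnable.

```r
library(dplyr)
library(tidyr)

make_LD <- function(x) {
  nested <- x %>%
    group_by(cyl) %>%   # stand-in for group_by(id) in the question
    nest()
  structure(list(nested), class = "LongitudinalData")
}

print.LongitudinalData <- function(x, ...) {
  # x[[1]] is the nested tibble; it has one row per subject
  n_subjects <- nrow(x[[1]])
  cat("Longitudinal dataset with", n_subjects, "subjects\n")
  invisible(x)
}

ld <- make_LD(mtcars)
print(ld)
#> Longitudinal dataset with 3 subjects
```

Using cat() rather than paste() also makes the method actually print the message instead of returning a string.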
