tidyr::pivot_wider(): reorder column names grouping by `names_from`

I would like to reorder the columns so that they are grouped by names_from instead of values_from. Here is my minimal example:
library(dplyr)
mtcars %>%
  tidyr::pivot_wider(names_from = gear, values_from = c(vs, am, carb))
output:
mpg cyl disp hp drat wt qsec vs_4 vs_3 vs_5 am_4 am_3 am_5 carb_4 carb_3 carb_5
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 NA NA 1 NA NA 4 NA NA
2 21 6 160 110 3.9 2.88 17.0 0 NA NA 1 NA NA 4 NA NA
3 22.8 4 108 93 3.85 2.32 18.6 1 NA NA 1 NA NA 1 NA NA
Here is the output I want:
mpg cyl disp hp drat wt qsec vs_4 am_4 carb_4 vs_3 am_3 carb_3 vs_5 am_5 carb_5
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 NA NA NA NA NA NA
2 21 6 160 110 3.9 2.88 17.0 0 1 4 NA NA NA NA NA NA
Thanks in advance!

As far as I know, this can't be accomplished with pivot_wider and must be done afterwards.
Here is a long-winded attempt, but it does the job:
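library(tidyverse)
suffixes <- unique(mtcars$gear)
pivoted <- mtcars %>%
  tidyr::pivot_wider(names_from = gear, values_from = c(vs, am, carb))
# collect the pivoted column names suffix by suffix (4, 3, 5); "$" anchors the
# match at the end of the name so that e.g. "_4" cannot also match "_45"
names_to_order <- map(suffixes, ~ names(pivoted)[grep(paste0("_", .x, "$"), names(pivoted))]) %>% unlist()
names_id <- setdiff(names(pivoted), names_to_order)
# all_of() is the tidyselect way to select columns from character vectors
pivoted %>%
  select(all_of(names_id), all_of(names_to_order))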
#> # A tibble: 32 x 16
#> mpg cyl disp hp drat wt qsec vs_4 am_4 carb_4 vs_3 am_3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 NA NA
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 NA NA
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 1 NA NA
#> 4 21.4 6 258 110 3.08 3.22 19.4 NA NA NA 1 0
#> 5 18.7 8 360 175 3.15 3.44 17.0 NA NA NA 0 0
#> 6 18.1 6 225 105 2.76 3.46 20.2 NA NA NA 1 0
#> 7 14.3 8 360 245 3.21 3.57 15.8 NA NA NA 0 0
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 2 NA NA
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 2 NA NA
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 NA NA
#> # ... with 22 more rows, and 4 more variables: carb_3 <dbl>, vs_5 <dbl>,
#> # am_5 <dbl>, carb_5 <dbl>
Created on 2020-02-25 by the reprex package (v0.3.0)
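For readers on a newer tidyr: as far as I know, tidyr 1.2.0 added a names_vary argument to pivot_wider(), and names_vary = "slowest" varies the names_from values slowest, which gives exactly this grouping without any post-processing. A minimal sketch, assuming tidyr >= 1.2.0 is installed:

```r
library(dplyr)
library(tidyr)

# Assumes tidyr >= 1.2.0 (names_vary). "slowest" keeps all value columns for
# one gear together: vs_4, am_4, carb_4, then vs_3, am_3, carb_3, then gear 5
res <- mtcars %>%
  pivot_wider(
    names_from = gear,
    values_from = c(vs, am, carb),
    names_vary = "slowest"
  )

names(res)
```

Adding names_sort = TRUE would additionally order the groups by gear (3, 4, 5) instead of by first appearance (4, 3, 5).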


Replace values with NAs based on a column condition

I have a data frame with several columns, one of which (inside_calibration_range) tells me whether the data in the other columns can be "trusted", containing either "yes" or "no". What I would like to do is simply replace the values in the whole row with NA every time the inside_calibration_range column contains "no".
I looked at the dplyr::na_if() and replace_with_na_all() functions, but (I may be wrong) they do not seem to accept conditions; instead, they replace specific values throughout the whole data frame.
Suppose that rows where cyl equals 6 cannot be trusted in mtcars. We can then mutate across everything() and set every column to NA wherever that condition holds:
library(tidyverse)
data(mtcars)
as_tibble(mtcars %>% mutate(across(everything(), ~replace(., cyl == 6 , NA))))
# A tibble: 32 × 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 NA NA NA NA NA NA NA NA NA NA NA
2 NA NA NA NA NA NA NA NA NA NA NA
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 NA NA NA NA NA NA NA NA NA NA NA
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 NA NA NA NA NA NA NA NA NA NA NA
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
10 NA NA NA NA NA NA NA NA NA NA NA
# … with 22 more rows
# ℹ Use `print(n = ...)` to see more rows
To set only some columns to NA instead of all of them, select those columns in across():
as_tibble(mtcars %>% mutate(across(c(mpg, disp), ~replace(., cyl == 6 , NA))))
# A tibble: 32 × 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 NA 6 NA 110 3.9 2.62 16.5 0 1 4 4
2 NA 6 NA 110 3.9 2.88 17.0 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 NA 6 NA 110 3.08 3.22 19.4 1 0 3 1
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 NA 6 NA 105 2.76 3.46 20.2 1 0 3 1
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
10 NA 6 NA 123 3.92 3.44 18.3 1 0 4 4
# … with 22 more rows
# ℹ Use `print(n = ...)` to see more rows
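Applied to the layout described in the question, the same idea might look like the sketch below. The data frame and its value_a/value_b columns are hypothetical stand-ins; only the inside_calibration_range column name comes from the question. Excluding the flag column from across() keeps the "yes"/"no" values visible after the replacement.

```r
library(dplyr)
library(tibble)

# Hypothetical data shaped like the question's: value columns plus a yes/no flag
df <- tibble(
  value_a = c(1.2, 3.4, 5.6),
  value_b = c(10, 20, 30),
  inside_calibration_range = c("yes", "no", "yes")
)

# Every column except the flag becomes NA in rows flagged "no"
res <- df %>%
  mutate(across(-inside_calibration_range,
                ~ replace(.x, inside_calibration_range == "no", NA)))

res
```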

Add dynamic value to row by group

I'd like to add a row for each group, where the entry for a particular column is the mean of that column's values within the group. It's easy to add a constant value:
library(dplyr)
mtcars %>% group_by(cyl) %>% group_modify(~add_row(.x, .before=0, carb=2))
# A tibble: 35 x 11
# Groups: cyl [3]
cyl mpg disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 4 NA NA NA NA NA NA NA NA NA 2
2 4 22.8 108 93 3.85 2.32 18.6 1 1 4 1
3 4 24.4 147. 62 3.69 3.19 20 1 0 4 2
4 4 22.8 141. 95 3.92 3.15 22.9 1 0 4 2
But when I try to dynamically add e.g. the mean of all carbs for that group, it doesn't recognise carb as a column:
mtcars %>% group_by(cyl) %>% group_modify(~add_row(.x, .before=0, carb=mean(carb)))
Error in mean(carb) : object 'carb' not found
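The error occurs because group_modify() hands each group's data to the formula as `.x`, so a bare `carb` is looked up outside the data and not found. A minimal sketch of one fix (my own, not taken from the thread) that refers to the column through `.x`:

```r
library(dplyr)

# Inside group_modify(), the current group's rows are available as `.x`
# (without the grouping column cyl), so the column is referenced as .x$carb
res <- mtcars %>%
  group_by(cyl) %>%
  group_modify(~ tibble::add_row(.x, .before = 1, carb = mean(.x$carb)))

res
```

The first row of each group then holds the group mean of carb, with NA in the other columns.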
Alternatively:
library(tidyverse)
mtcars %>%
group_by(cyl) %>%
summarise(carb = mean(carb)) %>%
bind_rows(mtcars) %>%
arrange(cyl)
#> # A tibble: 35 x 11
#> cyl carb mpg disp hp drat wt qsec vs am gear
#> * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 1.55 NA NA NA NA NA NA NA NA NA
#> 2 4 1 22.8 108 93 3.85 2.32 18.6 1 1 4
#> 3 4 2 24.4 147. 62 3.69 3.19 20 1 0 4
#> 4 4 2 22.8 141. 95 3.92 3.15 22.9 1 0 4
#> 5 4 1 32.4 78.7 66 4.08 2.2 19.5 1 1 4
#> 6 4 2 30.4 75.7 52 4.93 1.62 18.5 1 1 4
#> 7 4 1 33.9 71.1 65 4.22 1.84 19.9 1 1 4
#> 8 4 1 21.5 120. 97 3.7 2.46 20.0 1 0 3
#> 9 4 1 27.3 79 66 4.08 1.94 18.9 1 1 4
#> 10 4 2 26 120. 91 4.43 2.14 16.7 0 1 5
#> # ... with 25 more rows

unnest_longer gives dollar sign instead of normal tibble

Can anybody explain the following difference?
library(tidyverse)
tribble(~id,
        c(1:10)) %>%
  unnest_longer(id) %>%
  mutate(data = map(.x = id, ~mtcars)) %>%
  unnest_longer(data)
gives:
# A tibble: 320 x 2
id data$mpg $cyl $disp $hp $drat $wt $qsec $vs $am $gear $carb
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 1 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 1 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 1 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 1 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
whereas
library(tidyverse)
tribble(~id,
        c(1:10)) %>%
  unnest_longer(id) %>%
  mutate(data = map(.x = id, ~mtcars)) %>%
  unnest(data)
gives the result I want.
# A tibble: 320 x 12
id mpg cyl disp hp drat wt qsec vs am gear carb
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 1 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 1 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 1 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 1 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
Why are there $ signs in the first example?
Thanks in advance!
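The $ signs appear because unnest_longer() leaves `data` as a single data-frame (packed) column, and {tibble} prints the columns inside a packed column with a $ prefix. This is a feature introduced in {tibble} version 2.0.1 (see the {tibble} documentation).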
To get the same results as tidyr::unnest(), you'll need to add tidyr::unpack().
library(tidyverse)
tribble(
~id,
c(1:10)
) %>%
tidyr::unnest_longer(id) %>%
dplyr::mutate(data = purrr::map(
.x = id,
~ mtcars
)) %>%
tidyr::unnest_longer(data) %>%
tidyr::unpack(cols = data)
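To see the same printing in isolation, here is a minimal sketch with a hypothetical toy tibble whose `data` column is itself a data frame, i.e. the kind of packed column unnest_longer() produced in the question. unpack() then promotes the inner columns to ordinary top-level columns.

```r
library(tibble)
library(tidyr)

# Toy values; a named data frame passed to tibble() becomes a packed column
packed <- tibble(
  id = 1:3,
  data = tibble(mpg = c(21, 22.8, 21.4), cyl = c(6, 4, 6))
)
packed  # the header shows data$mpg and $cyl

# unpack() spreads the inner columns into regular columns
flat <- unpack(packed, cols = data)
names(flat)
```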

Pivot_wider / spread with just 1 as the value instead of values_from?

I would like to take a feature and spread its values as columns, with 1/0 for true/false, e.g.
mtcars %>%
pivot_wider(names_from = cyl,
values_from = 1)
This appears to have done something: cyl has been spread into columns, but the values are things like 21, 21.4 or NA. (values_from = 1 selects the first column, mpg, which is why mpg is missing from the result.)
> mtcars %>%
+ pivot_wider(names_from = cyl,
+ values_from = 1)
# A tibble: 32 x 12
disp hp drat wt qsec vs am gear carb `6` `4` `8`
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 160 110 3.9 2.62 16.5 0 1 4 4 21 NA NA
2 160 110 3.9 2.88 17.0 0 1 4 4 21 NA NA
3 108 93 3.85 2.32 18.6 1 1 4 1 NA 22.8 NA
4 258 110 3.08 3.22 19.4 1 0 3 1 21.4 NA NA
5 360 175 3.15 3.44 17.0 0 0 3 2 NA NA 18.7
6 225 105 2.76 3.46 20.2 1 0 3 1 18.1 NA NA
7 360 245 3.21 3.57 15.8 0 0 3 4 NA NA 14.3
8 147. 62 3.69 3.19 20 1 0 4 2 NA 24.4 NA
9 141. 95 3.92 3.15 22.9 1 0 4 2 NA 22.8 NA
10 168. 123 3.92 3.44 18.3 1 0 4 4 19.2 NA NA
I tried using values_fill like so:
> mtcars %>%
+ pivot_wider(names_from = cyl,
+ values_from = 1,
+ values_fill = list(1 = 0))
Error: unexpected '=' in:
" values_from = 1,
values_fill = list(1 ="
How can I spread cyl across columns with binary 1 or 0 values depending on whether cyl is 4, 6 or 8?
Is pivot_wider() what I want?
Set mpg to 1 and set the fill for mpg to 0 like this:
mtcars %>%
mutate(mpg = 1) %>%
pivot_wider(names_from = cyl, values_from = mpg, values_fill = list(mpg = 0))
## # A tibble: 32 x 12
## disp hp drat wt qsec vs am gear carb `6` `4` `8`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 160 110 3.9 2.62 16.5 0 1 4 4 1 0 0
## 2 160 110 3.9 2.88 17.0 0 1 4 4 1 0 0
## 3 108 93 3.85 2.32 18.6 1 1 4 1 0 1 0
## ... etc ...
Or, given the problem pivot_wider() currently has with ordering the new columns (it keeps them in order of first appearance, `6` `4` `8`, as shown above), you may prefer the older spread():
mtcars %>%
mutate(mpg = 1) %>%
spread(cyl, mpg, fill = 0)
## disp hp drat wt qsec vs am gear carb 4 6 8
## 1 71.1 65 4.22 1.835 19.90 1 1 4 1 1 0 0
## 2 75.7 52 4.93 1.615 18.52 1 1 4 2 1 0 0
## 3 78.7 66 4.08 2.200 19.47 1 1 4 1 1 0 0
## ... etc ...
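In more recent tidyr releases (1.1.0 or later, as far as I know), pivot_wider() accepts a scalar values_fill and a names_sort argument, which addresses the column-ordering problem without falling back to spread(). A minimal sketch, assuming such a version; the helper column name `flag` is hypothetical, and using it instead of overwriting mpg keeps mpg in the result.

```r
library(dplyr)
library(tidyr)

res <- mtcars %>%
  mutate(flag = 1) %>%        # constant value to spread
  pivot_wider(
    names_from = cyl,
    values_from = flag,
    values_fill = 0,          # cells with no match become 0 instead of NA
    names_sort = TRUE         # new columns ordered 4, 6, 8, not by first appearance
  )

res
```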
Alternatively, specify values_fn like this:
mtcars %>%
pivot_wider(names_from = cyl, values_from = mpg,
values_fn = list(mpg = ~ 1), values_fill = list(mpg = 0))
## # A tibble: 32 x 12
## disp hp drat wt qsec vs am gear carb `6` `4` `8`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 160 110 3.9 2.62 16.5 0 1 4 4 1 0 0
## 2 160 110 3.9 2.88 17.0 0 1 4 4 1 0 0
## 3 108 93 3.85 2.32 18.6 1 1 4 1 0 1 0
## ...etc...
An option would be to use the names and values from cyl and then recode this based on is.na:
mtcars %>%
pivot_wider(names_from = cyl,
values_from = cyl) %>%
mutate_at(vars(!!!syms(as.character(unique(mtcars$cyl)))), ~if_else(is.na(.), 0, 1))
# A tibble: 32 x 13
# mpg disp hp drat wt qsec vs am gear carb `6` `4` `8`
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 160 110 3.9 2.62 16.5 0 1 4 4 1 0 0
# 2 21 160 110 3.9 2.88 17.0 0 1 4 4 1 0 0
# 3 22.8 108 93 3.85 2.32 18.6 1 1 4 1 0 1 0
# 4 21.4 258 110 3.08 3.22 19.4 1 0 3 1 1 0 0
# 5 18.7 360 175 3.15 3.44 17.0 0 0 3 2 0 0 1
# 6 18.1 225 105 2.76 3.46 20.2 1 0 3 1 1 0 0
# 7 14.3 360 245 3.21 3.57 15.8 0 0 3 4 0 0 1
# 8 24.4 147. 62 3.69 3.19 20 1 0 4 2 0 1 0
# 9 22.8 141. 95 3.92 3.15 22.9 1 0 4 2 0 1 0
#10 19.2 168. 123 3.92 3.44 18.3 1 0 4 4 1 0 0
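mutate_at() has been superseded by across() since dplyr 1.0.0. A sketch of the same recoding written with across() and all_of(), so the `!!!syms()` splicing is no longer needed (the name `cyl_cols` is just illustrative):

```r
library(dplyr)
library(tidyr)

cyl_cols <- as.character(sort(unique(mtcars$cyl)))  # "4" "6" "8"

res <- mtcars %>%
  pivot_wider(names_from = cyl, values_from = cyl) %>%
  # recode every new cyl column: NA (no match) -> 0, any value -> 1
  mutate(across(all_of(cyl_cols), ~ if_else(is.na(.x), 0, 1)))

res
```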

Filtering data frame by condition including data after that condition

Is there an easy way to filter my data frame so that the first row meeting some condition, and every row after it, are filtered out? The catch is that I want this to be robust to the case where the condition is never met, in which case the whole data frame should be returned. Check out my examples below if that sounds confusing:
library(dplyr)
## Works
mtcars %>%
as_tibble() %>%
filter(between(row_number(), 1, which(mpg == 17.8)))
#> # A tibble: 11 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> 11 17.8 6 168. 123 3.92 3.44 18.9 1 0 4 4
## Doesn't work
mtcars %>%
as_tibble() %>%
filter(between(row_number(), 1, which(mpg == 30.5)))
#> Error in filter_impl(.data, quo): Evaluation error: Expecting a single value: [extent=0]..
Created on 2018-08-12 by the reprex package (v0.2.0).
You could include an ifelse() statement to check whether the value is present in the data frame. You also need to select the first row where the condition holds, to account for cases where the value appears more than once (in your example, 21.0):
library(dplyr)
## 30 does not occur in mpg, so this returns the whole tibble
mtcars %>%
  as_tibble() %>%
  filter(between(row_number(), 1, ifelse(!any(mpg == 30), n(), which(mpg == 30)[1] - 1)))
## 21 first occurs in row 1, so this returns a tibble with 0 rows
mtcars %>%
  as_tibble() %>%
  filter(between(row_number(), 1, ifelse(!any(mpg == 21), n(), which(mpg == 21)[1] - 1)))
## 21.4 first occurs in row 4, so this returns rows 1 to 3
mtcars %>%
  as_tibble() %>%
  filter(between(row_number(), 1, ifelse(!any(mpg == 21.4), n(), which(mpg == 21.4)[1] - 1)))
## returns:
# A tibble: 3 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
I think your specific example does not work because no mpg equals 30.5. However, you get the same error with mpg equal to 21.0 because two rows have that value, and between() expects a single value for its bounds. You will need to choose whether you want the first or the last instance of that condition:
library(tidyverse)
#max row
mtcars %>%
  as_tibble() %>%
  filter(between(row_number(), 1, tail(which(mpg == 21.0), 1)))
#> # A tibble: 2 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
or
#min row
mtcars %>%
as_tibble() %>%
filter(between(row_number(), 1, which(mtcars$mpg == 21.0)[1]))
#> # A tibble: 1 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
The example I chose just happened to be rows 1 and 2, but it illustrates the idea.
EDIT
The other answer by Lamia is much more elegant, and I probably overthought this, but I felt I needed to come up with something:
library(dplyr)
filter_if_condition <- function(.data, condition, yes) {
  test_cond <- enquo(condition)
  yes_filter <- enquo(yes)
  # apply the `yes` filter only if at least one row satisfies `condition`
  if (.data %>% filter(!!test_cond) %>% nrow() > 0) {
    .data %>% filter(!!yes_filter)
  } else {
    .data
  }
}
mtcars %>%
as_tibble() %>%
filter_if_condition(366.0 %in% mpg, between(row_number(), 1, which(mpg == 366)[1]))
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ... with 22 more rows
mtcars %>%
as_tibble() %>%
filter_if_condition(18.1 %in% mpg, between(row_number(), 1, which(mpg == 18.1)[1]))
#> # A tibble: 6 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
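A different approach, not taken in either answer above: dplyr's cumany() is TRUE from the first matching row onwards, so negating it drops that row and everything after it, and no special case is needed when nothing matches. A minimal sketch on mtcars (the object names are just for illustration):

```r
library(dplyr)

cars <- as_tibble(mtcars)

# Drop the first row where mpg == 17.8 and every row after it
before_match <- cars %>% filter(!cumany(mpg == 17.8))

# Keep the matching row too (as in the question's working example) by lagging
up_to_match <- cars %>% filter(!dplyr::lag(cumany(mpg == 17.8), default = FALSE))

# No row has mpg == 30.5, so the whole tibble comes back instead of an error
no_match <- cars %>% filter(!cumany(mpg == 30.5))
```

Because cumany() only looks at the first match, duplicated values such as 21.0 need no extra handling.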
