Replace values with NAs based on a column condition - r

I have a dataframe with different columns, one of which tells me whether the data in the other columns can be "trusted" or not; it contains a "yes" or a "no" (column name: inside_calibration_range). What I would like to do is simply replace the values in the whole row with NA every time I have a "no" in the inside_calibration_range column.
I looked at the dplyr::na_if and replace_with_na_all() functions, but (I may be wrong) they do not seem to accept a condition on another column; they only replace specific values across the whole dataframe.

Suppose the rows of mtcars where cyl equals 6 cannot be trusted. We can mutate across every column and set it to NA for that condition:
library(tidyverse)
data(mtcars)
as_tibble(mtcars %>% mutate(across(everything(), ~replace(., cyl == 6 , NA))))
# A tibble: 32 × 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 NA NA NA NA NA NA NA NA NA NA NA
2 NA NA NA NA NA NA NA NA NA NA NA
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 NA NA NA NA NA NA NA NA NA NA NA
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 NA NA NA NA NA NA NA NA NA NA NA
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
10 NA NA NA NA NA NA NA NA NA NA NA
# … with 22 more rows
# ℹ Use `print(n = ...)` to see more rows
Select only some columns instead of all:
as_tibble(mtcars %>% mutate(across(c(mpg, disp), ~replace(., cyl == 6 , NA))))
# A tibble: 32 × 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 NA 6 NA 110 3.9 2.62 16.5 0 1 4 4
2 NA 6 NA 110 3.9 2.88 17.0 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 NA 6 NA 110 3.08 3.22 19.4 1 0 3 1
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 NA 6 NA 105 2.76 3.46 20.2 1 0 3 1
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
10 NA 6 NA 123 3.92 3.44 18.3 1 0 4 4
# … with 22 more rows
# ℹ Use `print(n = ...)` to see more rows
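Applied to the column from the question, a minimal sketch might look like the following. The data frame df and its sample and conc columns are invented for illustration; only inside_calibration_range and its "yes"/"no" values come from the question. The flag column is left out of across() here, so the condition is never evaluated against a column that has already been overwritten.

```r
library(dplyr)

# Hypothetical data; only the flag column name comes from the question.
df <- tibble(
  sample = c("a", "b", "c"),
  conc = c(1.2, 5.8, 3.1),
  inside_calibration_range = c("yes", "no", "yes")
)

# Set every column except the flag to NA wherever the flag is "no".
out <- df %>%
  mutate(across(-inside_calibration_range,
                ~ replace(.x, inside_calibration_range == "no", NA)))
```

If the flag itself should also become NA in those rows, that can be done in a separate mutate() step afterwards.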

Related

Rolling average indexed on multiple variables

I'm working with a dataframe that indexes values by three variables, date, campaign and country. Every other value is indexed according to these three values, as follows:
# Groups: date, campaign [1,325]
date campaign country cost clicks
<date> <dbl> <chr> <dbl> <dbl>
1 2021-03-01 10127671839 0 0.45 7
2 2021-03-01 10127671839 AD 0.47 10
3 2021-03-01 10127671839 AE 0.39 11
4 2021-03-01 10127671839 AF 0.27 2
5 2021-03-01 10127671839 AG 0 0
6 2021-03-01 10127671839 AI 1.28 2
7 2021-03-01 10127671839 AL 0.66 6
8 2021-03-01 10127671839 AM 0.33 2
9 2021-03-01 10127671839 AO 0 0
10 2021-03-01 10127671839 AR 0 0
# … with 335,215 more rows
What I'm trying to do is create a moving average of those values (in the table above, "cost" and "clicks") that is still indexed on country, campaign and date.
Edit: I found a good function that works when there are only two index variables (in here: Rolling mean (moving average) by group/id with dplyr), but I am not skilled enough to tweak the code into working for three or more variables.
I think zoo::rollmean works well here, and dplyr::group_by can handle as many index variables as you need:
library(dplyr)
mtcars %>%
group_by(cyl, am, vs) %>%
mutate(across(c(mpg,disp), list(rm = ~ zoo::rollmeanr(., 2, fill = NA))))
# # A tibble: 32 x 13
# # Groups: cyl, am, vs [7]
# mpg cyl disp hp drat wt qsec vs am gear carb mpg_rm disp_rm
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 NA NA
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 21 160
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 NA NA
# 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 NA NA
# 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 NA NA
# 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 19.8 242.
# 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 16.5 360
# 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 NA NA
# 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 23.6 144.
# 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 18.6 196.
# # ... with 22 more rows
With a window of 2 and fill=NA, the first row in each group has no history to average over, so it is NA. If you prefer the first row in a group to be an average of itself, you can instead use partial=TRUE (which requires rollapplyr):
mtcars %>%
group_by(cyl, am, vs) %>%
mutate(across(c(mpg,disp), list(rm = ~ zoo::rollapplyr(., 2, FUN = mean, partial = TRUE))))
# # A tibble: 32 x 13
# # Groups: cyl, am, vs [7]
# mpg cyl disp hp drat wt qsec vs am gear carb mpg_rm disp_rm
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 21 160
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 21 160
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 22.8 108
# 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 21.4 258
# 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 18.7 360
# 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 19.8 242.
# 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 16.5 360
# 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 24.4 147.
# 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 23.6 144.
# 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 18.6 196.
# # ... with 22 more rows
I've used the align="right" variants of zoo's functions, assuming that your moving average is historical and that time increases in subsequent rows. If these assumptions are not true, make sure you intentionally choose between the align-variants.
I used dplyr::across here to handle an arbitrary number of columns in one step. Because I passed a named list of "tilde-functions", across appended the name of each function to the name of each column (mpg_rm, disp_rm). You can break it out into individual mutate assignments if you prefer, for readability, maintainability, or if you need different sets of arguments for each column.
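For the data in the question, a rough sketch could look like this. It assumes the average should run over consecutive dates within each campaign and country pair, so date is used for ordering rather than grouping (grouping by date as well would leave one row per group). The ads tibble and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical miniature of the asker's data: 3 dates x 2 countries.
ads <- tibble(
  date = rep(as.Date("2021-03-01") + 0:2, times = 2),
  campaign = 10127671839,
  country = rep(c("AD", "AE"), each = 3),
  cost = c(0.47, 0.50, 0.53, 0.39, 0.41, 0.46),
  clicks = c(10, 12, 8, 11, 9, 13)
)

rolled <- ads %>%
  arrange(campaign, country, date) %>%   # time must increase within each group
  group_by(campaign, country) %>%
  mutate(across(c(cost, clicks),
                list(rm = ~ zoo::rollmeanr(.x, 2, fill = NA)))) %>%
  ungroup()
```

The result keeps date, campaign and country as they were and adds cost_rm and clicks_rm.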

Mutate a dynamic column name with conditions using other dynamic column names

I'm trying to use dplyr::mutate to change a dynamic column with conditions using other columns dynamically.
I've got this bit of code:
d <- mtcars %>% tibble
fld_name <- "mpg"
other_fld_name <- "cyl"
d <- d %>% mutate(!!fld_name := ifelse(!!other_fld_name < 5,NA,!!fld_name))
which sets mpg to
mpg
<chr>
1 mpg
2 mpg
3 mpg
4 mpg
5 mpg
6 mpg
7 mpg
8 mpg
9 mpg
10 mpg
It seems to select the field on the LHS of the assignment operator, but just pastes the field name as a string on the RHS.
Removing the unquotes on the RHS yields the same result.
Any help is much appreciated.
Use get to retrieve the column values instead:
library(tidyverse)
d <- mtcars %>% tibble
fld_name <- "mpg"
other_fld_name <- "cyl"
d %>% mutate(!!fld_name := ifelse(get(other_fld_name) < 5 ,NA, get(fld_name)))
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 NA 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 NA 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 NA 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ... with 22 more rows
Created on 2021-06-22 by the reprex package (v2.0.0)
We can also use the ensym function to turn a variable name stored as a string into a symbol, and then unquote it with !!, like the following:
library(rlang)
d <- mtcars %>% tibble
fld_name <- "mpg"
other_fld_name <- "cyl"
d %>%
mutate(!!ensym(fld_name) := ifelse(!!ensym(other_fld_name) < 5, NA, !!ensym(fld_name)))
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 NA 4 108 93 3.85 2.32 18.6 1 1 4 1
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
8 NA 4 147. 62 3.69 3.19 20 1 0 4 2
9 NA 4 141. 95 3.92 3.15 22.9 1 0 4 2
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
# ... with 22 more rows
We could also use .data
library(dplyr)
d %>%
mutate(!! fld_name := case_when(.data[[other_fld_name]] >=5 ~
.data[[fld_name]]))
-output
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 NA 4 108 93 3.85 2.32 18.6 1 1 4 1
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
8 NA 4 147. 62 3.69 3.19 20 1 0 4 2
9 NA 4 141. 95 3.92 3.15 22.9 1 0 4 2
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
# … with 22 more rows
data
d <- mtcars %>%
as_tibble
fld_name <- "mpg"
other_fld_name <- "cyl"
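If this needs to be reused, the .data approach can be wrapped in a small helper. This is only a sketch: na_below and its argument names are hypothetical, it assumes dplyr >= 1.0.0 for the "{target}" := glue syntax, and it assumes the target column is of type double.

```r
library(dplyr)

# Hypothetical helper: set `target` to NA where `condition_col` < `threshold`.
# Column names are passed as strings and looked up through the .data pronoun.
na_below <- function(data, target, condition_col, threshold) {
  data %>%
    mutate("{target}" := if_else(.data[[condition_col]] < threshold,
                                 NA_real_,          # assumes a double column
                                 .data[[target]]))
}

result <- na_below(as_tibble(mtcars), "mpg", "cyl", 5)
```

Using .data[[...]] on the right-hand side avoids the original problem, where !!fld_name inserted the string "mpg" itself rather than the column it names.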

How to check for duplication under some conditions in a data frame?

I have a data frame with given structure
'data.frame': 3005 obs. of 6 variables:
$ trial_id : int 2184 2184 2184 2184 2184 2184 2184 2184 2184 2184 ...
$ ctri_number: Factor w/ 278 levels "CTRI/2016/06/006993 ",..: 134 134 134 134 134 134 134 134 134 134 ...
$ noofsites : int 13 13 13 13 13 13 13 13 13 13 ...
$ Sites_Names: chr "Acharya Tulsi Regional Cancer Treatment And Research Institute" "City Cancer Centre" "Curie Manavata Cancer Centre" "Government Stanley Medical College and Hospital" ...
$ noofcom : int 13 13 13 13 13 13 13 13 13 13 ...
$ ECs_Names : Factor w/ 2493 levels "\"Aakash Healthcare Institutional Ethics Committee Aakash Healthcare Super Specialty Hospital Hospital Plot, Ro"| __truncated__,..: 218 210 211 1007 834 859 2047 2058 2096 2212 ...
There are a total of 278 unique trial_ids.
Each trial has more than one site (noofsites > 1), and each site name is in its own row, so for every trial the number of rows equals noofsites.
For every single trial, I want to check whether any Sites_Names value is duplicated. I don't want to check for duplicate site names across the whole data frame, only within each trial.
How can this be achieved?
Thank you in advance.
Perhaps you want to group_by(trial_id) before checking for duplicates, e.g.
library(tidyverse)
df1 <- mtcars %>%
as_tibble() %>%
add_count(disp, name = "duplicates?")
df1
# A tibble: 32 x 12
# mpg cyl disp hp drat wt qsec vs am gear carb `duplicates?`
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 2
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 2
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 1
# 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 1
# 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 2
# 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 1
# 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 2
# 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 1
# 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 1
#10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 2
df2 <- mtcars %>%
group_by(carb) %>%
add_count(disp, name = "duplicates?") %>%
ungroup()
df2
# A tibble: 32 x 12
# mpg cyl disp hp drat wt qsec vs am gear carb `duplicates?`
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 2
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 2
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 1
# 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 1
# 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 1
# 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 1
# 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 1
# 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 1
# 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 1
#10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 2
identical(df1, df2)
# FALSE
Here, duplicates are noted in the last column of each tibble. If you group_by(carb) before checking for duplicates, you get fewer duplicates than if you search the entire dataframe for duplicates (df1 and df2 are not identical). I.e. the dataset is grouped by "carb" and each group is searched for duplicates separately. Does this solve your problem?
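With the column names from the question, the same idea might be sketched as below. The trials tibble is a made-up stand-in for the real data (the second trial_id and "Site B" are invented), and n_in_trial is a hypothetical column name.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data frame.
trials <- tibble(
  trial_id = c(2184, 2184, 2184, 2190, 2190),
  Sites_Names = c("City Cancer Centre", "Curie Manavata Cancer Centre",
                  "City Cancer Centre", "City Cancer Centre", "Site B")
)

# Count each site name within its own trial and keep only the repeated ones.
dupes <- trials %>%
  add_count(trial_id, Sites_Names, name = "n_in_trial") %>%
  filter(n_in_trial > 1)
```

Here "City Cancer Centre" is flagged for trial 2184 only, even though it also appears once in the other trial. Names that differ in case or surrounding whitespace count as different values, so it may be worth cleaning Sites_Names first.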

Unpack a data frame column into multiple columns

I am trying to create a function that will mutate a column if it exists. If the column does exist, I return a data frame with two columns. I'd like help unpacking this data frame column, into its component columns:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
my_transformation = function(df){
df %>%
mutate(across(any_of('cyl'), function(x) tibble(a = x + 3, b = x + 1)))
}
df_1 = as_tibble(mtcars)
df_2 = df_1 %>% select(-cyl)
my_transformation(df_1)
#> # A tibble: 32 x 11
#> mpg cyl$a $b disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 9 7 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 9 7 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 7 5 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 9 7 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 11 9 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 9 7 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 11 9 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 7 5 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 7 5 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 9 7 168. 123 3.92 3.44 18.3 1 0 4 4
#> # … with 22 more rows
my_transformation(df_2)
#> # A tibble: 32 x 10
#> mpg disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 168. 123 3.92 3.44 18.3 1 0 4 4
#> # … with 22 more rows
Created on 2020-08-22 by the reprex package (v0.3.0)
As you can see, when calling my_transformation(df_1), there are two subcolumns: cyl$a and cyl$b. How do I get these to be regular columns?
I have tried unnest(cyl) but had no success.
I think what you're after is something like
mtcars %>% mutate(across(cyl, list(a = ~ .x + 3, b = ~ .x + 1)))
# mpg cyl disp hp drat wt qsec vs am gear carb cyl_a cyl_b
# 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 9 7
# 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 9 7
# 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 7 5
# 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 9 7
# 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 11 9
# 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 9 7
# ...
Note that the .fns argument of across can take a list of (lambda) functions; so if you replace function(x) tibble(a = ..., b = ...) with list(a = ~ ..., b = ~ ...) the new mutate (dplyr >= 1.0.0) will automatically create columns cyl_a and cyl_b.
So the only way I've found to drop the "nesting" for dataframe columns is by not supplying an LHS argument to mutate as documented here
Unfortunately, using across to check for missing columns is not possible, as it uses .names to assign something on the LHS.
Therefore, I'm taking the approach of inserting the missing column if it is missing and then calling mutate without across.
library(dplyr)
library(tibble)
my_transformation = function(df){
cols <- c(cyl = NA_real_)
df %>%
add_column(!!!cols[!names(cols) %in% names(.)]) %>%
mutate(tibble(a = cyl + 3, b = cyl + 1))
}
df_1 = as_tibble(mtcars)
df_2 = df_1 %>% select(-cyl)
my_transformation(df_1)
#> # A tibble: 32 x 13
#> mpg cyl disp hp drat wt qsec vs am gear carb a b
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 9 7
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 9 7
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 7 5
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 9 7
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 11 9
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 9 7
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 11 9
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 7 5
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 7 5
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 9 7
#> # … with 22 more rows
my_transformation(df_2)
#> # A tibble: 32 x 13
#> mpg disp hp drat wt qsec vs am gear carb cyl a b
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 160 110 3.9 2.62 16.5 0 1 4 4 NA NA NA
#> 2 21 160 110 3.9 2.88 17.0 0 1 4 4 NA NA NA
#> 3 22.8 108 93 3.85 2.32 18.6 1 1 4 1 NA NA NA
#> 4 21.4 258 110 3.08 3.22 19.4 1 0 3 1 NA NA NA
#> 5 18.7 360 175 3.15 3.44 17.0 0 0 3 2 NA NA NA
#> 6 18.1 225 105 2.76 3.46 20.2 1 0 3 1 NA NA NA
#> 7 14.3 360 245 3.21 3.57 15.8 0 0 3 4 NA NA NA
#> 8 24.4 147. 62 3.69 3.19 20 1 0 4 2 NA NA NA
#> 9 22.8 141. 95 3.92 3.15 22.9 1 0 4 2 NA NA NA
#> 10 19.2 168. 123 3.92 3.44 18.3 1 0 4 4 NA NA NA
#> # … with 22 more rows
Created on 2020-08-23 by the reprex package (v0.3.0)
I'm not a huge fan of this solution, but it does work. I'm considering creating a GitHub issue for cases where you want mutate to return an output data frame column, but only if an input column exists.
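Another option that may be worth trying is tidyr::unpack(), which flattens a data frame column into regular columns. The sketch below keeps the original across(any_of(...)) call and adds an unpack step; it assumes tidyr >= 1.0.0, and the names_sep = "_" choice (giving cyl_a and cyl_b) is just one possibility.

```r
library(dplyr)
library(tidyr)

my_transformation <- function(df) {
  df %>%
    # Creates a data frame column named cyl, but only if cyl exists.
    mutate(across(any_of("cyl"), function(x) tibble(a = x + 3, b = x + 1))) %>%
    # Flattens that data frame column; selects nothing if cyl is absent.
    unpack(any_of("cyl"), names_sep = "_")
}

df_1 <- as_tibble(mtcars)
df_2 <- df_1 %>% select(-cyl)

res_1 <- my_transformation(df_1)
res_2 <- my_transformation(df_2)
```

Note that in this version the original cyl column is replaced by cyl_a and cyl_b, and a data frame without cyl is returned unchanged.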

tidyr::pivot_wider() reorder column names grouping by `names_from`

I would like to reorder the columns grouping by names_from instead of values_from, here is my minimal example:
mtcars %>%
tidyr::pivot_wider(names_from = gear, values_from = c(vs, am, carb))
output:
mpg cyl disp hp drat wt qsec vs_4 vs_3 vs_5 am_4 am_3 am_5 carb_4 carb_3 carb_5
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 NA NA 1 NA NA 4 NA NA
2 21 6 160 110 3.9 2.88 17.0 0 NA NA 1 NA NA 4 NA NA
3 22.8 4 108 93 3.85 2.32 18.6 1 NA NA 1 NA NA 1 NA NA
Here is what I want the output:
mpg cyl disp hp drat wt qsec vs_4 am_4 carb_4 vs_3 am_3 carb_3 vs_5 am_5 carb_5
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 NA NA NA NA NA NA
2 21 6 160 110 3.9 2.88 17.0 0 1 4 NA NA NA NA NA NA
Thanks in advance!
As far as I know, this can't be accomplished with pivot_wider and must be done afterwards.
Here is a long-winded attempt, but it does the job:
library(tidyverse)
suffixes <- unique(mtcars$gear)
pivoted <- mtcars %>%
tidyr::pivot_wider(names_from = gear, values_from = c(vs, am, carb))
names_to_order <- map(suffixes, ~ names(pivoted)[grep(paste0("_", .x), names(pivoted))]) %>% unlist
names_id <- setdiff(names(pivoted), names_to_order)
pivoted %>%
select(all_of(names_id), all_of(names_to_order))
#> # A tibble: 32 x 16
#> mpg cyl disp hp drat wt qsec vs_4 am_4 carb_4 vs_3 am_3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 NA NA
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 NA NA
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 1 NA NA
#> 4 21.4 6 258 110 3.08 3.22 19.4 NA NA NA 1 0
#> 5 18.7 8 360 175 3.15 3.44 17.0 NA NA NA 0 0
#> 6 18.1 6 225 105 2.76 3.46 20.2 NA NA NA 1 0
#> 7 14.3 8 360 245 3.21 3.57 15.8 NA NA NA 0 0
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 2 NA NA
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 2 NA NA
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 NA NA
#> # ... with 22 more rows, and 4 more variables: carb_3 <dbl>, vs_5 <dbl>,
#> # am_5 <dbl>, carb_5 <dbl>
Created on 2020-02-25 by the reprex package (v0.3.0)
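For readers on a newer tidyr: assuming tidyr 1.2.0 or later, pivot_wider() has a names_vary argument, and names_vary = "slowest" should produce the requested ordering directly. A short sketch:

```r
library(dplyr)
library(tidyr)

# names_vary = "slowest" keeps all values_from columns together
# for each value of names_from (requires tidyr >= 1.2.0).
wide <- mtcars %>%
  pivot_wider(names_from = gear,
              values_from = c(vs, am, carb),
              names_vary = "slowest")

names(wide)
```

The default, names_vary = "fastest", gives the vs_4, vs_3, vs_5, am_4, ... ordering shown in the question.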
