mutate with case_when - multiple LHS/RHS OR evaluations - r

I'm not sure of the best way to ask this question.
I would like to mutate using case_when (or if_else if that works better) to check whether a value appears in any of several columns.
For example, in mtcars I would like to check whether any of the columns vs, am, gear or carb contains 1 or 2, and set a new variable newVar to 1 if so. I could do the following:
library(dplyr)
mtcars %>%
mutate(newVar = case_when(vs %in% c(1, 2) | am %in% c(1, 2) | gear %in% c(1, 2) | carb %in% c(1, 2) ~ 1,
TRUE ~ 0))
Is there a prettier way to do this? I want to check across 10+ columns so it gets long. Something like:
mtcars %>%
mutate(newVar = case_when(c(vs, am, gear, carb) %in% c(1, 2) ~ 1,
TRUE ~ 0))

I think base R works well here. Select the columns you want to check, compare them with 1 and 2 to get a logical matrix, and take its row-wise sum to calculate newVar (any count above 0 means at least one match).
df <- mtcars
cols <- c("vs", "am", "gear", "carb")
df$newVar <- +(rowSums(df[cols] == 1 | df[cols] == 2) > 0)
df
# mpg cyl disp hp drat wt qsec vs am gear carb newVar
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 1
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 1
#Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 1
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 1
#Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 1
#Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 1
#Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 0
#Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 1
#Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 1
#Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 1
#....
We can also use apply() for row-wise manipulation:
df$newVar <- +(apply(df[cols] == 1 | df[cols] == 2, 1, any))

We can use a tidyverse option to create the column:
library(dplyr)
library(purrr)
mtcars %>%
mutate(newVar = select(., vs:carb) %>%
map(~ .x %in% 1:2) %>%
reduce(`|`) %>%
as.integer)
#  mpg cyl disp hp drat wt qsec vs am gear carb newVar
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 1
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 1
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 1
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 1
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 1
#6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 1
#7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 0
#8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 1
# ...
Or with base R
nm1 <- c("vs", "am", "gear", "carb")
mtcars$newVar <- +(Reduce(`|`, lapply(mtcars[nm1], `%in%`, 1:2)))
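With dplyr 1.0.4 or later (an assumption about the installed version), if_any() lets case_when() test a whole set of columns in a single condition, which is close to the syntax sketched in the question. A minimal sketch; cols simply holds the question's four columns and could be replaced by any tidyselect expression such as vs:carb:

```r
library(dplyr)

cols <- c("vs", "am", "gear", "carb")

res <- mtcars %>%
  mutate(newVar = case_when(
    # TRUE when at least one of the selected columns holds 1 or 2
    if_any(all_of(cols), ~ .x %in% c(1, 2)) ~ 1,
    TRUE ~ 0
  ))

head(res[, c(cols, "newVar")], 8)
```

If every column has to match instead, if_all() is the counterpart.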

Related

Using dplyr, how should I create a column of strings repeating a character based on the value of another column?

With mtcars for example, I'd like to create a new column carb_dots such that when carb = 4, carb_dots = "...."
Using dplyr, I've tried
library(dplyr)
mtcars2 <- mtcars %>% mutate(carb_dots = rep(".", carb))
This errors with
Error in mutate_impl(.data, dots) :
Evaluation error: invalid 'times' argument.
What should I do? Thanks for your suggestions.
With the addition of stringr, you can do:
library(dplyr)
library(stringr)
mtcars %>%
mutate(carb_dots = str_dup(".", carb))
mpg cyl disp hp drat wt qsec vs am gear carb carb_dots
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 ....
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 ....
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 .
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 .
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 ..
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 .
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 ....
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 ..
9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 ..
10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 ....
We can use base R's strrep(), which is vectorised over its times argument:
library(dplyr)
mtcars %>%
mutate(carb_dots = strrep(".", carb))
# mpg cyl disp hp drat wt qsec vs am gear carb carb_dots
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 ....
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 ....
#Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 .
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 .
#Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 ..
#Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 .
#Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 ....
#...
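If we need to use rep(), we have to work row by row, because rep(".", carb) passes a whole column of times values for a single "." (hence the invalid 'times' error):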
mtcars %>%
rowwise %>%
mutate(carb_dots = paste(rep(".", carb), collapse=""))
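If you want to keep rep() but avoid rowwise(), one option is to map over carb with purrr::map_chr(), building one string per row. A sketch, assuming purrr is installed:

```r
library(dplyr)
library(purrr)

res <- mtcars %>%
  # map_chr() calls the lambda once per element of carb and returns a character vector
  mutate(carb_dots = map_chr(carb, ~ paste(rep(".", .x), collapse = "")))

head(res[, c("carb", "carb_dots")])
```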

Using `dplyr::na_if` with a probability to create missing data?

I'm interested in simulating data with a chance of missingness. How can I do this using dplyr::na_if?
Intuitively I wanted to do something like:
mtcars %>%
mutate(mpg = na_if(mpg, rbinom(n = n(),
1,
prob = .5) == 1))
But I think this is wrong because na_if is really for matching x and y. How do I use na_if to create a probability of missingness?
(edit: Also if there is a better function for creating missing data in the tidyverse please let me know in the comments)
You don't need na_if here; just use if_else. rbinom is also overkill, since runif works fine:
mtcars %>%
mutate(mpg = if_else(runif(n = n()) > 0.5, NA_real_, mpg))
With a slight modification of your code:
mtcars %>%
mutate(mpg = if_else(rbinom(n(), 1, prob = 0.5) == 1, NA_real_, mpg))
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 NA 6 258.0 110 3.08 3.215 19.44 1 0 3 1
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
6 NA 6 225.0 105 2.76 3.460 20.22 1 0 3 1
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
10 NA 6 167.6 123 3.92 3.440 18.30 1 0 4 4
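To simulate missingness in several columns at once, the same idea can be wrapped in a small helper and applied with across() (dplyr 1.0 or later assumed). add_missing() is a hypothetical helper written for this sketch, not a dplyr function; it blanks each value independently with probability prob:

```r
library(dplyr)

# Hypothetical helper: set each element of x to NA with probability `prob`.
# Assigning NA through `[<-` keeps the column's original type.
add_missing <- function(x, prob = 0.5) {
  x[runif(length(x)) < prob] <- NA
  x
}

set.seed(123)  # make the simulated pattern reproducible
sim <- mtcars %>%
  mutate(across(c(mpg, hp, wt), ~ add_missing(.x, prob = 0.2)))

colSums(is.na(sim))  # number of NAs introduced per column
```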

How to swap TRUE and FALSE in a dataframe with dplyr

I'm looking to change values into their opposite (T becomes F, and vice versa) in specific rows in a column in a datatable
I know that x <- !x works for T/F variables, but how do I finish this dplyr approach?
library(dplyr)
library(datatable)
library(magrittr)
mtcars$selected <- T
mtcars %>% select(selected) %>% slice(c(1,4,5)) %>% mutate(??)
If you just want to subset those rows, then @Shree's answer is likely right. If you want to invert just those rows but keep all the others, then something like:
In dplyr:
library(dplyr)
mtcars %>%
mutate(selected = TRUE) %>%
# the heart of the answer
mutate(selected = if_else(row_number() %in% c(1, 4, 5), !selected, selected))
# mpg cyl disp hp drat wt qsec vs am gear carb selected
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 FALSE
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 TRUE
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 TRUE
# 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 FALSE
# 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 FALSE
# 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 TRUE
# 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 TRUE
# 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 TRUE
# ...
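Since flipping a logical is the same as an exclusive-or with TRUE, base R's xor() can stand in for the if_else() call: rows whose index is in the target set get flipped and the rest are left alone. A sketch:

```r
library(dplyr)

res <- mtcars %>%
  mutate(selected = TRUE) %>%
  # xor(x, TRUE) is !x and xor(x, FALSE) is x, so only rows 1, 4 and 5 change
  mutate(selected = xor(selected, row_number() %in% c(1, 4, 5)))

head(res$selected, 8)
```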
You said datatable; I think you meant data.table, in which case:
library(data.table)
DT <- as.data.table(mtcars)
DT[, selected := TRUE]
DT[, selected := ifelse(.I %in% c(1, 4, 5), !selected, selected)]
head(DT, n = 8)
# mpg cyl disp hp drat wt qsec vs am gear carb selected
# 1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 FALSE
# 2: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 TRUE
# 3: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 TRUE
# 4: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 FALSE
# 5: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 FALSE
# 6: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 TRUE
# 7: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 TRUE
# 8: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 TRUE
Or pipe-wise as
library(magrittr)
DT <- as.data.table(mtcars)
DT %>%
.[, selected := TRUE] %>%
.[, selected := ifelse(.I %in% c(1, 4, 5), !selected, selected)]
head(DT, n = 8)
# mpg cyl disp hp drat wt qsec vs am gear carb selected
# 1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 FALSE
# 2: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 TRUE
# 3: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 TRUE
# 4: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 FALSE
# 5: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 FALSE
# 6: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 TRUE
# 7: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 TRUE
# 8: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 TRUE
In Base R, you can do it this way:
mtcars$selected <- TRUE
mtcars$selected[c(1, 4, 5)] <- !mtcars$selected[c(1, 4, 5)]
head(mtcars, n = 8)
# mpg cyl disp hp drat wt qsec vs am gear carb selected
# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 FALSE
# Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 TRUE
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 TRUE
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 FALSE
# Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 FALSE
# Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 TRUE
# Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 TRUE
# Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 TRUE
Here's one way -
mtcars %>%
select(selected) %>%
slice(c(1,4,5)) %>%
mutate(
selected = !selected # or as.logical(1 - selected)
)

Programming dplyr operations

Any idea how I can manipulate dplyr variables programmatically?
This works:
out = "new_var"
mtcars %>%
mutate(!!out := mpg/carb)
But I really need to be able to adjust the variables in the division. I thought I could do it like this:
out = "new_var"
numer = "mpg"
denom = "carb"
mtcars %>%
mutate(!!out := !! quo(numer/denom))
but no dice:
Error in mutate_impl(.data, dots) :
Evaluation error: non-numeric argument to binary operator.
The result should look like:
mpg cyl disp hp drat wt qsec vs am gear carb new_var
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 5.250000
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 5.250000
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 22.800000
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 21.400000
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 9.350000
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 18.100000
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 3.575000
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 12.200000
...
Any idea how this works?
SOLVED:
myFunction = function(df, col, col2, new_col) {
col <- enquo(col)
col2 <- enquo(col2)
new_col <- quo_name(enquo(new_col))
df %>%
mutate(!!new_col := (!!col)/(!!col2))
}
myFunction(mtcars, mpg, wt, mpg_based_new_col)
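With a recent rlang/dplyr (rlang 0.4.x or later and dplyr 1.0 or later are assumed here), the enquo()/!! pairs can be collapsed into the embrace operator {{ }}, and the glue-style "{{ new_col }}" on the left of := turns the bare argument into a column name. my_ratio() is only an illustrative name for this sketch:

```r
library(dplyr)

# Illustrative rewrite of myFunction using {{ }} (embracing)
my_ratio <- function(df, col, col2, new_col) {
  df %>%
    mutate("{{ new_col }}" := {{ col }} / {{ col2 }})
}

res <- my_ratio(mtcars, mpg, wt, mpg_based_new_col)
head(res$mpg_based_new_col)
```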
If you want to use a character value inside the expression, convert it to a symbol with the rlang::sym() function (or the base as.name() function) and then unquote it. For example:
out = "new_var"
numer = rlang::sym("mpg")
denom = rlang::sym("carb")
library(tidyverse)
mtcars %>%
mutate(!!out := (!!numer)/(!!denom))
Note how each symbol is unquoted with !! separately, rather than unquoting the entire expression.
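When the column names arrive as strings, another option (dplyr 0.8 or later, assumed) is the .data pronoun, which looks columns up by name so no symbols have to be built at all:

```r
library(dplyr)

out   <- "new_var"
numer <- "mpg"
denom <- "carb"

res <- mtcars %>%
  # .data[[name]] fetches a column by its string name inside the data mask
  mutate(!!out := .data[[numer]] / .data[[denom]])

head(res$new_var)
```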

Adding row in dplyr across a selected number of columns

Within a dplyr workflow, I would like to add a row that fills a selected set of columns.
Desired results
Starting with the mtcars data and applying function(s) with the goal of adding the string "A" to columns 2:5, one should arrive at the following result:
mpg cyl disp hp drat wt qsec vs am gear carb
NA A A A A NA NA NA NA NA NA
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
The following criteria were met:
For the columns whose indices are passed in the vars() call, the string "A" was added
For the remaining columns the NA value was provided
Approach
require(dplyr)
mtcars %>%
mutate_at(.cols = vars(2:5),
.funs = add_row(. = "A", .before = 1))
Naturally, this results in an error message:
Error: Unsupported index type: NULL
Hence my question: how can I use add_row, or a similar approach, to force a value into a set of columns initially passed via vars()?
Side notes
I don't mind doing this via rbind but I would like to keep my %>% workflow:
%>% - receive object
Add something across first row to columns x:y %>%
Add something across first row to columns m:n %>%
Other manipulations
Add the row, then update it:
mtcars %>%
head %>%
add_row(.before = 1) %>%
mutate_at(.cols = vars(2:5),
funs(ifelse(is.na(.), "A", .)))
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 NA A A A A NA NA NA NA NA NA
# 2 21.0 6 160 110 3.9 2.620 16.46 0 1 4 4
# 3 21.0 6 160 110 3.9 2.875 17.02 0 1 4 4
# 4 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# 5 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# 6 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# 7 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Note: this replaces every NA in columns 2:5 with "A", not only those in the new row, and it converts those columns to character.
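In dplyr 1.0 or later, mutate_at() is superseded and funs() is deprecated, so across() is the current way to do the same thing. Testing row_number() == 1 instead of is.na() also confines the "A" to the new row, leaving any pre-existing NAs alone. A sketch, assuming tibble (for add_row()) is installed:

```r
library(dplyr)
library(tibble)

res <- mtcars %>%
  head() %>%
  add_row(.before = 1) %>%
  # if_else() needs both branches to share a type, so convert to character first
  mutate(across(2:5, ~ if_else(row_number() == 1, "A", as.character(.x))))

res
```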
