Any idea how I can manipulate dplyr variables programmatically?
This works:
out = "new_var"
mtcars %>%
mutate(!!out := mpg/carb)
But I really need to be able to change the variables used in the division. I thought I could do it like this:
out = "new_var"
numer = "mpg"
denom = "carb"
mtcars %>%
mutate(!!out := !! quo(numer/denom))
but no dice:
Error in mutate_impl(.data, dots) :
Evaluation error: non-numeric argument to binary operator.
The result should look like:
mpg cyl disp hp drat wt qsec vs am gear carb new_var
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 5.250000
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 5.250000
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 22.800000
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 21.400000
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 9.350000
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 18.100000
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 3.575000
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 12.200000
...
Any idea how this works?
SOLVED -------------------------------------------------
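library(dplyr)
myFunction = function(df, col, col2, new_col) {
  # capture the bare column names passed by the caller as quosures
  col <- enquo(col)
  col2 <- enquo(col2)
  # the new column's name as a string, for use on the left of :=
  new_col <- quo_name(enquo(new_col))
  df %>%
    mutate(!!new_col := (!!col)/(!!col2))
}
myFunction(mtcars, mpg, wt, mpg_based_new_col)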
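Assuming dplyr 1.0 or later is installed, the same function can be sketched with the {{ }} ("curly-curly") operator, which does the enquo()/!! pair in one step. The "{{ new_col }}" := form builds the new column's name from the bare argument. The function name ratio_fn is hypothetical.

```r
library(dplyr)

# Hypothetical name; same behaviour as myFunction, written with {{ }}.
ratio_fn <- function(df, col, col2, new_col) {
  df %>%
    # "{{ new_col }}" turns the bare argument into the column name;
    # {{ col }} and {{ col2 }} refer to the columns passed by the caller.
    mutate("{{ new_col }}" := {{ col }} / {{ col2 }})
}

res <- ratio_fn(mtcars, mpg, wt, mpg_based_new_col)
head(res)
```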
If you want to turn a character value into a symbol that you can unquote, you can use the rlang::sym() function (or just the base as.name() function). For example:
library(tidyverse)
out = "new_var"
numer = rlang::sym("mpg")
denom = rlang::sym("carb")
mtcars %>%
  mutate(!!out := (!!numer)/(!!denom))
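Note how we unquote each variable separately with !! rather than the entire expression. quo(numer/denom) captures the expression numer/denom itself, so inside mutate() it divides the two character strings "mpg" and "carb", which is what triggers the "non-numeric argument to binary operator" error.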
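Another option, assuming a reasonably recent dplyr, is the .data pronoun: .data[["mpg"]] looks a column up by a name stored in a string, so the column names can stay plain character values without sym(). A minimal sketch reusing the out, numer and denom values from the question:

```r
library(dplyr)

out   <- "new_var"
numer <- "mpg"
denom <- "carb"

# .data[[ ]] finds a column by the name held in a string,
# so no sym()/!! is needed on the right-hand side.
res <- mtcars %>%
  mutate(!!out := .data[[numer]] / .data[[denom]])
head(res)
```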
I'm interested in simulating data with a chance of missingness. How can I do this using dplyr::na_if?
Intuitively I wanted to do something like:
mtcars %>%
mutate(mpg = na_if(mpg, rbinom(n = n(),
1,
prob = .5) == 1))
But I think this is wrong because na_if is really for matching x and y. How do I use na_if to create a probability of missingness?
(edit: Also if there is a better function for creating missing data in the tidyverse please let me know in the comments)
You don't need na_if here; just use if_else. rbinom is also overkill, since runif works fine.
mtcars %>%
mutate(mpg = if_else(runif(n = n()) > 0.5, NA_real_, mpg))
With a slight modification of your code:
mtcars %>%
mutate(mpg = if_else(rbinom(n(), 1, prob = 0.5) == 1, NA_real_, mpg))
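To inject missingness into several columns at once, a sketch using across() (dplyr 1.0 or later assumed) and base replace() could look like this. The helper name add_missing and its arguments are hypothetical. replace() keeps each column's original type, so there is no need to pick NA_real_ or NA_character_ per column, and every column gets its own independent random draws.

```r
library(dplyr)

# Hypothetical helper: each value in `cols` independently becomes NA
# with probability `p`; other columns are left untouched.
add_missing <- function(df, cols, p = 0.5) {
  df %>%
    mutate(across(all_of(cols), ~ replace(.x, runif(length(.x)) < p, NA)))
}

set.seed(1)
res <- add_missing(mtcars, c("mpg", "hp"), p = 0.3)
head(res)
```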
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 NA 6 258.0 110 3.08 3.215 19.44 1 0 3 1
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
6 NA 6 225.0 105 2.76 3.460 20.22 1 0 3 1
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
10 NA 6 167.6 123 3.92 3.440 18.30 1 0 4 4
I'm not sure of the best way to ask this question.
I would like to mutate using case_when (or if_else if that works better) to examine if a value exists in any of a range of columns.
E.g. in mtcars I would like to check whether any of the columns vs, am, gear or carb contain 1 or 2 and, if so, set a new variable newVar to 1. I could do the following:
mtcars %>%
mutate(newVar = case_when(vs %in% c(1, 2) | am %in% c(1, 2) | gear %in% c(1, 2) | carb %in% c(1, 2) ~ 1,
TRUE ~ 0))
Is there a prettier way to do this? I want to check across 10+ columns so it gets long. Something like:
mtcars %>%
mutate(newVar = case_when(c(vs, am, gear, carb) %in% c(1, 2) ~ 1,
TRUE ~ 0))
I think base R works well here. Select the columns you want to check, then take the row-wise sum of the logical matrix to calculate newVar.
df <- mtcars
cols <- c("vs", "am", "gear", "carb")
df$newVar <- +(rowSums(df[cols] == 1 | df[cols] == 2) > 0)
df
# mpg cyl disp hp drat wt qsec vs am gear carb newVar
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 1
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 1
#Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 1
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 1
#Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 1
#Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 1
#Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 0
#Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 1
#Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 1
#Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 1
#....
We can also use apply for row-wise manipulation
df$newVar <- +(apply(df[cols] == 1 | df[cols] == 2, 1, any))
We can also use a tidyverse option to create the column:
library(dplyr)
library(purrr)
mtcars %>%
mutate(newVar = select(., vs:carb) %>%
map(~ .x %in% 1:2) %>%
reduce(`|`) %>%
as.integer)
#. mpg cyl disp hp drat wt qsec vs am gear carb newVar
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 1
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 1
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 1
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 1
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 1
#6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 1
#7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 0
#8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 1
# ...
Or with base R
nm1 <- c("vs", "am", "gear", "carb")
mtcars$newVar <- +(Reduce(`|`, lapply(mtcars[nm1], `%in%`, 1:2)))
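With a newer dplyr (if_any() was added in dplyr 1.0.4, so that version is an assumption here), the check can stay inside case_when(), close to the syntax the question was reaching for. if_any() is TRUE for a row when the predicate holds in at least one of the selected columns, so the list of columns can grow without the condition getting longer.

```r
library(dplyr)

cols <- c("vs", "am", "gear", "carb")

# if_any() is TRUE for a row when .x %in% c(1, 2) holds in at least one of `cols`
res <- mtcars %>%
  mutate(newVar = case_when(if_any(all_of(cols), ~ .x %in% c(1, 2)) ~ 1,
                            TRUE ~ 0))
head(res)
```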
I am trying to convert a large number of numeric variables into factor variables using a 'codebook' of factor levels (formatted as a list of named lists). I can do this one by one using mutate() and recode_factor(), but would like to do them all in one go using mutate_at(). How might I go about this?
codebook <- list(
vs = list(`0` = 'V-shaped',
`1` = 'straight'),
am = list(`0` = 'automatic',
`1` = 'manual')
)
mtcars %>%
mutate(vs = recode_factor(vs, levels = !!!(pluck(codebook, 'vs'))))
mtcars %>%
mutate_at(vars(names(codebook)),
funs(recode_factor(., levels = !!!(pluck(codebook, 'somehow_pass_column_name_here?')))))
One option would be to loop through the names of the 'codebook'
library(tidyverse)
names(codebook) %>%
map(~ mtcars %>%
transmute(!! .x := recode_factor(!! rlang::sym(.x),
levels = !!!(pluck(codebook, .x))))) %>%
bind_cols(mtcars %>%
select(-one_of(names(codebook))), .)
or use a for loop
library(magrittr)
for(nm in names(codebook)) {
mtcars %<>%
mutate(!! nm := recode_factor(!! rlang::sym(nm),
levels = !!!(pluck(codebook, nm))))
}
You could still use mutate for multiple variables, unless that's what you meant by one-by-one. I'm not well versed in mutate_at, so maybe someone else knows that method.
mtcars %>%
mutate(vs = recode_factor(vs, levels = !!!(pluck(codebook, 'vs'))),
am = recode_factor(am, levels = !!!(pluck(codebook, 'am'))))
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160.0 110 3.90 2.620 16.46 V-shaped manual 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 V-shaped manual 4 4
3 22.8 4 108.0 93 3.85 2.320 18.61 straight manual 4 1
4 21.4 6 258.0 110 3.08 3.215 19.44 straight automatic 3 1
5 18.7 8 360.0 175 3.15 3.440 17.02 V-shaped automatic 3 2
6 18.1 6 225.0 105 2.76 3.460 20.22 straight automatic 3 1
7 14.3 8 360.0 245 3.21 3.570 15.84 V-shaped automatic 3 4
8 24.4 4 146.7 62 3.69 3.190 20.00 straight automatic 4 2
9 22.8 4 140.8 95 3.92 3.150 22.90 straight automatic 4 2
10 19.2 6 167.6 123 3.92 3.440 18.30 straight automatic 4 4
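For the mutate_at() idea in the question, a sketch with across() and cur_column() (both assumed available, dplyr 1.0 or later) can look up each column's entry in the codebook by name. do.call() is used here instead of !!! because !!! inside the lambda would likely be expanded when mutate() captures its arguments, before across() runs and cur_column() becomes available.

```r
library(dplyr)

codebook <- list(
  vs = list(`0` = 'V-shaped', `1` = 'straight'),
  am = list(`0` = 'automatic', `1` = 'manual')
)

# cur_column() returns the name of the column across() is currently processing;
# do.call() passes that column's codebook entries to recode_factor() as arguments.
res <- mtcars %>%
  mutate(across(all_of(names(codebook)),
                ~ do.call(recode_factor, c(list(.x), codebook[[cur_column()]]))))
head(res)
```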
I have this function
var_sup <- function(var1,var2)
{
df$RD <- ifelse(df[var1]>df[var2],1,0)
df$RD <- as.numeric(df$RD)
return(df)
}
I want to rewrite it with dplyr so that I can call it like this:
var_sup(num, num2), without quotes!
compare_sup <- function (var1,var2) {
# capture the argument without evaluating it
var1 <- quo_name(enquo(var1))
var2 <- quo_name(enquo(var2))
# construct the expression
df %>%
mutate(RD = ifelse(!!var1 > !!var2 ,1,0))
}
I tried that, but I get an error.
Thank you.
The following works for me:
library(tidyverse)
compare_sup <- function (var1, var2) {
  # capture the arguments without evaluating them
  var1 <- enquo(var1)
  var2 <- enquo(var2)
  # construct the expression
  mtcars %>%
    mutate(RD = ifelse(!!var1 > !!var2, 1, 0))
}
compare_sup(drat, wt) %>% head
# mpg cyl disp hp drat wt qsec vs am gear carb RD
#1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 1
#2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 1
#3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1
#4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 0
#5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 0
#6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 0
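I basically removed the quo_name() from the function (and used mtcars as the data set). quo_name() turns the captured argument into a character string, so !!var1 > !!var2 would compare two strings rather than two columns; keeping the quosures returned by enquo() makes them refer to the columns.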
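If the data frame should be an argument rather than hard-coded, a sketch with the {{ }} operator (dplyr 1.0 or later assumed) avoids both enquo() and !!. The name compare_sup_df is hypothetical. as.numeric() on the logical comparison gives the same 1/0 values as ifelse(..., 1, 0).

```r
library(dplyr)

# Hypothetical variant of compare_sup(): the data frame is passed in,
# and {{ }} replaces the enquo() / !! pair.
compare_sup_df <- function(df, var1, var2) {
  df %>%
    # TRUE/FALSE becomes 1/0, like ifelse(..., 1, 0)
    mutate(RD = as.numeric({{ var1 }} > {{ var2 }}))
}

res <- compare_sup_df(mtcars, drat, wt)
head(res)
```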
Within a dplyr workflow, I would like to add a row that fills only a selected set of columns.
Desired results
Starting with the mtcars data and applying function(s) that add the string "A" to columns 2:5, one should arrive at the following result:
mpg cyl disp hp drat wt qsec vs am gear carb
NA A A A A NA NA NA NA NA NA
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
The following criteria were met:
For the columns whose indices are passed in the vars() call, the string "A" was added
For the remaining columns, NA was provided
Approach
require(dplyr)
mtcars %>%
mutate_at(.cols = vars(2:5),
.funs = add_row(. = "A", .before = 1))
Naturally, this results in an error message:
Error: Unsupported index type: NULL
Hence my question: how can I use add_row, or a similar approach, to force a value into a set of columns initially selected via vars()?
Side notes
I don't mind doing this via rbind but I would like to keep my %>% workflow:
%>% - receive object
Add something across first row to columns x:y %>%
Add something across first row to columns m:n %>%
Other manipulations
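Add the row, then update it:
library(dplyr)
library(tibble)
mtcars %>%
  head %>%
  add_row(.before = 1) %>%
  # .vars (not .cols) is mutate_at()'s column argument; a ~ lambda replaces the deprecated funs()
  mutate_at(vars(2:5),
            ~ ifelse(is.na(.), "A", .))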
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 NA A A A A NA NA NA NA NA NA
# 2 21.0 6 160 110 3.9 2.620 16.46 0 1 4 4
# 3 21.0 6 160 110 3.9 2.875 17.02 0 1 4 4
# 4 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# 5 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# 6 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# 7 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
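Note: this replaces every NA in columns 2:5 with "A", so any NAs already present in those columns (not only in the new first row) also become "A". It also converts those columns to character.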
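If only the new first row should receive "A", one possible sketch (assuming dplyr 1.0 or later for across()) converts the target columns to character and overwrites just position 1, leaving any existing NAs further down untouched. as_tibble() is an addition here; it drops mtcars' row names (the car models) before add_row() runs.

```r
library(dplyr)
library(tibble)

res <- mtcars %>%
  head() %>%
  as_tibble() %>%              # drops the row names (car models)
  add_row(.before = 1) %>%     # new all-NA first row
  # columns 2:5 become character so they can hold "A";
  # replace() changes only element 1, i.e. the new row
  mutate(across(2:5, ~ replace(as.character(.x), 1, "A")))
res
```

Each further "add something to columns m:n" step from the side notes could be chained the same way with another mutate(across(...)) call.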