How do I convert this code from mutate_ to mutate? - r

I am trying to create a copy of a column based on a variable - that is, the new column's name is constant, but which one it copies changes. This is what I would do previously:
library(dplyr)
x <- "mpg"
mtcars %>%
mutate_(Target = x)
To receive results like this:
However, when you run this, you now receive a warning:
Warning message:
mutate_() is deprecated.
Please use mutate() instead
It suggests looking at https://tidyeval.tidyverse.org/ for guidance; I've had a quick skim, but didn't spot this as a use case in the document. (It doesn't seem to cover the problem of converting existing code, but maybe I'm just not understanding it well enough?)
How do I move this code from mutate_() to mutate()?

mutate() uses dplyr's non-standard (tidy) evaluation, so the string has to be converted to a symbol and unquoted:
mtcars %>% mutate(Target = !!sym(x))
# mpg cyl disp hp drat wt qsec vs am gear carb Target
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 21.0
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 21.0
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 22.8
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 21.4
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 18.7
#6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 18.1
...
Here sym takes a string as input and turns it into a symbol, which you then unquote using the bang-bang operator !!.
Also note that mutate_ has been deprecated.
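As a possible alternative to !!sym(x), dplyr's .data pronoun can be subscripted with a string. This is a minimal sketch, assuming a reasonably recent dplyr; the name res is only for illustration:

```r
library(dplyr)

x <- "mpg"  # name of the column to copy, chosen at run time

# .data[[x]] looks up the column whose name is stored in x
res <- mtcars %>% mutate(Target = .data[[x]])

head(res[, c("mpg", "Target")])
```

Both forms should give the same Target column; .data[[x]] avoids the explicit symbol conversion.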

We can use mutate_at, which also works for multiple columns:
library(dplyr)
mtcars %>%
  # ~ . returns the selected column unchanged; all_of() makes the string lookup explicit
  mutate_at(vars(all_of(x)), list(Target = ~ .))
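In dplyr 1.0.0 and later the scoped verbs such as mutate_at are superseded by across(). A hedged sketch of the same copy, assuming dplyr >= 1.0.0 (res is an illustrative name):

```r
library(dplyr)

x <- "mpg"

# all_of() selects the column named in x; .names sets the output name.
# A fixed .names without a {.col} placeholder only makes sense for one column.
res <- mtcars %>%
  mutate(across(all_of(x), ~ .x, .names = "Target"))

head(res[, c("mpg", "Target")])
```

For several columns, a pattern such as .names = "{.col}_copy" would keep the new names distinct.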

You could use rlang::sym or base R's get:
library(dplyr)
mtcars %>% mutate(Target = !!rlang::sym(x))
mtcars %>% mutate(Target = get(x))

You can also do it the basic way, in base R:
x <- mtcars$mpg
mtcars$Target <- x
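When the column name itself is held in a string, as in the question, base R's [[ ]] indexing does the lookup. A small sketch that works on a copy (df is an illustrative name) so that mtcars is left untouched:

```r
x <- "mpg"             # column name stored as a string

df <- mtcars           # copy, so the original data set is not modified
df$Target <- df[[x]]   # [[ ]] accepts the column name as a string

head(df[, c("mpg", "Target")])
```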

Related

Turning data manipulation into a function in R

I have downloaded an .ods file from this website (UK office for national statistics). Because of the way the sheet is structured, I import it as two separate dataframes:
library(readODS)
income_pretax <- read_ods('/Users/c.robin/Downloads/NS_Table_3_1a_1819.ods', range = "A4:U103")
income_posttax <- read_ods('/Users/c.robin/Downloads/NS_Table_3_1a_1819.ods', range = "A104:U203")
I want to do some cleaning on both dataframes: renaming two of the variables and recasting one of them as numeric. This is what I have for this, which works on a single df:
income_pretax <- income_pretax %>%
rename(pp_tot_income_pretax = 'Percentile point\nTotal income before tax',
'2008-09' = '2008-09(a)')
income_pretax['2008-09'] <- as.numeric(income_pretax$'2008-09')
I'm struggling to get the above into a function though. I think it should be something like the below, but honestly I have no idea how to tell R that I'm passing multiple dataframes to the function, nor how to handle multiple variables. Can anyone advise on this?
##Attempting a function
cleanvars <- function(data, varlist){
data <- data %>%
rename(pp_tot_income_pretax = {{varlist}})
data['2008-09'] <- as.numeric(data$'2008-09')
}
You can pass a named vector (new name = old name) to the function and wrap it in all_of():
library(dplyr)
cleanvars <- function(data, varlist){
  data %>% rename(all_of(varlist))
}
cleanvars(mtcars %>% head, c('new_mpg' = 'mpg', 'new_cyl' = 'cyl'))
# new_mpg new_cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
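To cover the rest of the question (the numeric conversion and running the same cleaning on several data frames), one option is to return the piped result from the function and apply it with lapply. This is only a sketch: clean_vars, lookup, numeric_cols and the tiny stand-in data frames are all hypothetical, and it assumes dplyr >= 1.0.0 for across().

```r
library(dplyr)

# Hypothetical helper: rename via a lookup vector, then coerce columns to numeric.
# The value of the pipeline is the return value, so nothing is lost on exit.
clean_vars <- function(data, lookup, numeric_cols) {
  data %>%
    rename(all_of(lookup)) %>%
    mutate(across(all_of(numeric_cols), as.numeric))
}

# Small stand-ins for the two imported sheets (made-up values)
pretax <- data.frame(pp = 1:2, y = c("10", "20"), stringsAsFactors = FALSE)
names(pretax) <- c("Percentile point", "2008-09(a)")
posttax <- pretax

# new name = old name
lookup <- c(pp_tot_income = "Percentile point", "2008-09" = "2008-09(a)")

# Apply the same cleaning to every data frame in a named list
cleaned <- lapply(list(pretax = pretax, posttax = posttax),
                  clean_vars, lookup = lookup, numeric_cols = "2008-09")

str(cleaned$pretax)
```

Keeping the data frames in a named list means the function only ever receives one data frame at a time.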
We can do this in base R:
nm1 <- c('mpg', 'cyl')
nm2 <- paste0("new_", nm1)
i1 <- match(nm1, names(mtcars))
names(mtcars)[i1] <- nm2

Can you use "starts_with" as shorthand within a simple "as.numeric" function to query multiple columns?

I have a dataframe with multiple columns of a numeric type, where I want to query if a range of values exist in any of them, and bring back a true/false binary flag with as.numeric.
So I can do this the long way with:
df <- df %>%
  mutate(flag = as.numeric(days_dry %in% c(1:28) |
                           days_frozen %in% c(1:28) |
                           days_fresh %in% c(1:28)))
But I have a bunch of columns I want to query. Why can't I bring back the same result with this?:
df <- df %>%
  mutate(flag = as.numeric(vars(starts_with("days_")) %in% c(1:28)))
I get no error, but it doesn't bring back any cases which match the criteria.
There might be a better way, but ...
mtcars %>%
mutate(flag = rowSums(sapply(cbind(select(., starts_with("c"))), `%in%`, 4:6)) > 0) %>%
head()
# mpg cyl disp hp drat wt qsec vs am gear carb flag
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 TRUE
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 TRUE
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 TRUE
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 TRUE
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 FALSE
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 TRUE
The premise is to use cbind(select(., <>)) to form an inner frame mid-pipe. From there, we sapply over its columns, converting each into a column of logicals. The last step uses rowSums(.) > 0 to determine whether a row has at least one TRUE. An alternative to rowSums is Reduce(`|`, ...); that is elegant in a list-processing kind of way, but it is also slower (especially with multiple matching columns).
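The two ways of combining the per-column matches can be compared side by side in base R. This is a sketch on made-up data; df, its days_ columns and the flag_* names are hypothetical stand-ins for the asker's frame:

```r
# Hypothetical stand-in for the asker's data
df <- data.frame(
  id          = 1:4,
  days_dry    = c(0, 10, 40, 0),
  days_frozen = c(0, 0, 5, 99),
  days_fresh  = c(30, 0, 0, 0)
)

# Pick the columns whose names start with "days_"
day_cols <- startsWith(names(df), "days_")

# One logical vector per column: is the value in 1:28?
hits <- lapply(df[day_cols], `%in%`, 1:28)

# Variant 1: OR the logical vectors together pairwise
df$flag_reduce <- as.numeric(Reduce(`|`, hits))

# Variant 2: bind into a logical matrix and count TRUEs per row
df$flag_rowsums <- as.numeric(rowSums(do.call(cbind, hits)) > 0)

df
```

Both variants flag a row as 1 when at least one days_ column falls in the range.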

Make a function work on the LHS of the assignment operator in R

It seems like I should know the answer to this but I don't. How can I write a function that will work on the left-hand side of the assignment operator? E.g., in the example below, how can I make a function called my.rownames that I can put on the LHS of <- to assign rownames to foo?
# get rownames and change them
foo <- rownames(mtcars)
foo <- paste("x",foo)
# put altered rownames back
rownames(mtcars) <- foo
# create a new function my.rownames
my.rownames <- rownames
# works
my.rownames(mtcars)
# doesn't work
my.rownames(mtcars) <- foo
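According to ?rownames:
row.names returns a character vector.
row.names<- returns a data frame with the row names changed.
So the getter and the setter are two separate functions, and copying rownames does not copy `rownames<-`. Define the replacement form as well:
`my.rownames<-` <- `rownames<-`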
The help page also notes:
There are generic functions for getting and setting row names, with default methods for arrays. The description here is for the data.frame method.
.rowNamesDF<- is a (non-generic replacement) function to set row names for data frames, with extra argument make.names. This function only exists as workaround as we cannot easily change the row.names<- generic without breaking legacy code in existing packages.
With the replacement function defined, the assignment should work:
data(mtcars)
my.rownames(mtcars) <- foo
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
#x Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#x Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#x Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#x Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#x Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#x Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
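A replacement function can also be written from scratch rather than copied. In R, it is named `name<-`, takes the object as its first argument and the new value in an argument called value, and returns the modified object. A minimal sketch (df is an illustrative copy of part of mtcars):

```r
# Replacement function: must be named `<name><-`, its last argument must be
# called `value`, and it must return the modified object.
`my.rownames<-` <- function(x, value) {
  rownames(x) <- value
  x
}

df <- head(mtcars, 3)
my.rownames(df) <- paste("x", rownames(df))

rownames(df)
```

R rewrites my.rownames(df) <- v into a call to `my.rownames<-`(df, value = v) and assigns the result back to df, which is why returning x matters.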

mutate several features simultaneously with a custom function

I've found several SO posts on this already but cannot see how to apply to my specific problem.
I have a dataframe with a number of features that I would like to simultaneously mutate. I want to write over them rather than create new features.
E.g. using mtcars. Suppose I want to amend am, gear and carb, setting each of those 3 features to 1 if it is greater than 0 and to 0 otherwise. How could I do that?
mtcars %>% mutate_at(vs:carb, funs(???))
I want to apply a custom function of this form ifelse(x > 0, 1, 0) where x is either of the 3 features being worked on.
How can I achieve this?
You need to use vars() for vs:carb to parse, and you use . as a stand-in for the argument in funs:
mtcars %>% mutate_at(vars(vs:carb), funs(ifelse(. > 0, 1, 0)))
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 1 1
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 1 1
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 1 1
# 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 1 1
# 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 1 1
# 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 1 1
# ...
This is explained in the ?funs help page:
A list of functions specified by:
Their name, "mean"
The function itself, mean
A call to the function with . as a dummy argument, mean(., na.rm = TRUE)
With this corresponding to the third bullet.
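Note that funs() has been deprecated since dplyr 0.8.0. A sketch of the same overwrite with across() and a formula lambda, assuming dplyr >= 1.0.0 (res is an illustrative name):

```r
library(dplyr)

# Overwrite vs, am, gear and carb in place: 1 if greater than 0, else 0.
# .x stands for the current column, like . in funs().
res <- mtcars %>%
  mutate(across(vs:carb, ~ ifelse(.x > 0, 1, 0)))

head(res)
```

Because no .names argument is given, across() writes the results back over the selected columns instead of creating new ones.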

Assign new variable created with mutate_ to a name that will be passed as a string in dplyr

I want to use strings with dplyr expressions and in particular I want to pass expressions as strings to mutate to create new variables and assign names to these variables that will be passed also as strings.
My code at this point is the following:
library(dplyr)
library(tidyverse)
data(mtcars)
mutate_expr = "gear * carb"
mtcars %>% mutate_(mutate_expr)
The new variable is named here 'gear*carb'. How could I give it the name 'gear_carb', passing the name to the dplyr expression as a string?
You now do this with tidyeval:
library(dplyr)
mutate_expr <- quo(gear * carb)
mtcars %>% mutate(new_col = !!mutate_expr) %>% head()
#> mpg cyl disp hp drat wt qsec vs am gear carb new_col
#> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 16
#> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 16
#> 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 4
#> 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 3
#> 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 6
#> 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 3
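If you must store the expression as a string, parse it with rlang::parse_expr instead of using quo (sym is not enough in this context, because it only turns a single name into a symbol and cannot parse gear * carb). Storing code as a character string is generally a bad idea, though.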
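If both the expression and the new column name really do arrive as strings, one way to combine them is rlang::parse_expr for the right-hand side and !! with := for the name. This is a hedged sketch; new_name and res are illustrative names not taken from the question:

```r
library(dplyr)

mutate_expr <- "gear * carb"   # expression stored as a string
new_name    <- "gear_carb"     # desired column name, also a string

# parse_expr() turns the string into an R expression;
# !!new_name := ... lets the column name come from a variable.
res <- mtcars %>%
  mutate(!!new_name := !!rlang::parse_expr(mutate_expr))

head(res[, c("gear", "carb", "gear_carb")])
```

Parsing strings evaluates whatever code they contain, so this is best kept to input you control.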

Resources