I'm new to writing functions in R, but want to write a function to add 1% of the median of a variable to itself, using dplyr, and replace the variable with this transformation.
x is a numeric variable.
add_median <- function(df, x) {
x <- enquo(x)
x <- quo_name(x)
mutate(x=x+.01*median(x, na.rm=T))
}
When I run newDF <- DF %>% add_median(variable_of_interest), I get the following error:
Error in 0.01 * median(x, na.rm = T) : non-numeric argument to binary operator
What am I doing wrong here?
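In the original function, quo_name(x) converts x to a character string (the column's name), so .01 * median(x, na.rm = T) operates on text rather than on the column's values, hence the "non-numeric argument" error. The data frame df is also never passed to mutate. We could instead embrace the argument with {{}} and use the := operator instead of = in mutate, which is needed when the left-hand side is an injected name: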
library(dplyr)
add_median <- function(df, x) {
df %>%
mutate({{x}} := {{x}} + .01 * median({{x}}, na.rm = TRUE))
}
If we need to change multiple columns, pass their names as a character vector and use mutate_at:
add_median_multiple <- function(df, vec){
df %>%
mutate_at(vars(vec), ~ . + .01 * median(., na.rm = TRUE))
}
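In dplyr 1.0.0 and later, mutate_at is superseded by across. A minimal sketch of the same idea, assuming a recent dplyr is installed; the function name add_median_across is made up here for illustration:

```r
library(dplyr)

# Hypothetical name; intended to behave like add_median_multiple() above.
add_median_across <- function(df, vec) {
  df %>%
    # all_of() selects the columns named in the character vector `vec`
    mutate(across(all_of(vec), ~ .x + 0.01 * median(.x, na.rm = TRUE)))
}

res <- add_median_across(head(mtcars), c("mpg", "wt"))
res[, c("mpg", "wt")]
```

Wrapping the vector in all_of() makes it explicit that vec holds column names rather than being a column itself.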
-testing
data(mtcars)
head(mtcars) %>%
add_median(mpg)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.21 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.21 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 23.01 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.61 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.91 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.31 6 225 105 2.76 3.460 20.22 1 0 3 1
comparison with original 'mpg' column
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
add_median_multiple(head(mtcars), c('mpg', 'wt'))
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.21 6 160 110 3.90 2.65045 16.46 0 1 4 4
#Mazda RX4 Wag 21.21 6 160 110 3.90 2.90545 17.02 0 1 4 4
#Datsun 710 23.01 4 108 93 3.85 2.35045 18.61 1 1 4 1
#Hornet 4 Drive 21.61 6 258 110 3.08 3.24545 19.44 1 0 3 1
#Hornet Sportabout 18.91 8 360 175 3.15 3.47045 17.02 0 0 3 2
#Valiant 18.31 6 225 105 2.76 3.49045 20.22 1 0 3 1
Related
I want to rename() some variables in my data programmatically, so I can do it via map at some point.
I'm looking for the equivalent of,
library(tidyverse)
mtcars %>% rename(
"MPG" = "mpg"
)
but using variables from the environment instead. I tried !!sym() by doing the following,
library(tidyverse)
new_name <- "MPG"
old_name <- "mpg"
mtcars %>% rename(
!!sym(new_name) = !!sym(old_name)
)
However, I get the error Error: unexpected ')' in ")". I am not sure what I am missing here!
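An expression such as !!sym(new_name) is not allowed on the left-hand side of =, so this is a syntax error raised before rename() even runs. We could instead build a named vector with setNames and splice it in with !!!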
head(mtcars %>%
rename(!!! setNames(old_name, new_name)))
-output
MPG cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
You can use {{}} together with := -
library(dplyr)
new_name <- "MPG"
old_name <- "mpg"
mtcars %>% rename({{new_name}} := {{old_name}}) %>% head
# MPG cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
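For renaming several columns at once, a named character vector can also be passed through all_of(). This is a sketch assuming dplyr 1.0.0 or later; the vector name lookup and the second pair (CYL = "cyl") are added here only for illustration:

```r
library(dplyr)

# Names are the new column names, values are the existing ones.
lookup <- c(MPG = "mpg", CYL = "cyl")

renamed <- mtcars %>% rename(all_of(lookup))
names(renamed)[1:2]
```

Because the whole mapping lives in one vector, it can be built programmatically (for example inside a map call) before being handed to rename.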
I'm looking to add the input of a function to the column name created in the function.
new_col = function(dat_, period_) {
dat_ %>% mutate(paste0('mpg',period_) = mpg + period_)
}
mtcars %>% new_col(20)
Here, a column mpg20 should be created whose values are equal to mpg + 20.
You can use !! together with := :
library(dplyr)
new_col = function(dat_, period_) {
dat_ %>% mutate(!!paste0('mpg',period_) := mpg + period_)
}
mtcars %>% new_col(20)
# mpg cyl disp hp drat wt qsec vs am gear carb mpg20
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 41.0
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 41.0
#Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 42.8
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 41.4
#Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 38.7
#Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 38.1
#...
#...
The !! and := combination is used when the new column's name is stored in a character string (here, the result of paste0) on the left-hand side.
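Recent versions of dplyr (1.0.0 and later, via rlang) also accept glue-style strings on the left-hand side of :=, which avoids paste0 altogether. A small sketch under that assumption; the name new_col_glue is hypothetical:

```r
library(dplyr)

# Hypothetical variant of new_col() using a glue-style name.
new_col_glue <- function(dat_, period_) {
  # "{period_}" is replaced by the value of period_ when the name is built
  dat_ %>% mutate("mpg{period_}" := mpg + period_)
}

out <- new_col_glue(head(mtcars), 20)
names(out)
```

Single braces interpolate an ordinary value such as period_; double braces would be used if the argument were itself a column passed unquoted.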
I have a set of variables, say Var1, Var2 to Varn. They all take three possible values: 0, 1, and 2. I want to replace every 2 with 1,
like so
df$Var1[df$Var1 >= 1] <- 1
This does the job. But when I try to write a function to do this
MakeBinary <- function(varName, dfName){dfName$varName[dfName$varName >= 1] <- 1}
and use this function like:
MakeBinary(Var2, df)
I got an error message: Error in $<-.data.frame(*tmp*, "varName", value = numeric(0)) :
replacement has 0 rows, data has 512.
I just want to know why I got this message. Thanks. My sample size is 512.
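The $ operator does not evaluate its argument: dfName$varName looks for a column literally named varName, finds none, and returns NULL, which is why the replacement has 0 rows. If we are passing the column name as a string, use [[ instead of $ and return the dataset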
MakeBinary <- function(varName, dfName){
dfName[[varName]][dfName[[varName]] >= 1] <- 1
dfName
}
MakeBinary("Var2", df)
example with mtcars
MakeBinary("carb", head(mtcars))
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 1
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 1
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 1
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Unquoted variable names can be passed as well, but they need to be converted to strings inside the function
MakeBinary <- function(varName, dfName){
varName <- deparse(substitute(varName))
dfName[[varName]][dfName[[varName]] >= 1] <- 1
dfName
}
MakeBinary(Var2, df)
Using a reproducible example with mtcars
MakeBinary(carb, head(mtcars))
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 1
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 1
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 1
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
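To see the difference between the two operators outside a function, here is a tiny check on a toy data frame (the data are made up for illustration):

```r
# Toy data frame with one column, Var2.
df <- data.frame(Var2 = c(0, 1, 2))
varName <- "Var2"

df$varName      # NULL: `$` looks for a column literally called "varName"
df[[varName]]   # the Var2 column, looked up via the string stored in varName
```

This is the same lookup that happens inside the function, where varName is an argument rather than a column of the data frame.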
When I create a new variable, is there a way to specify in the function where to place it?
Right now, it adds it to the end of the dataframe, but for ease of viewing in Excel for example, I'd like to place a new calculated column beside the columns I used for the calculation.
Here's an example of code:
rawdata2 <- (rawdata1 %>% unite(location, locations1,locations2, locations3,
na.rm = TRUE, remove=TRUE)
%>% select(-location7, -location16)
%>% unite(Sector, Sectors, na.rm=TRUE, remove=TRUE)
%>% unite(TypeofSpace, TypesofSpace, type.of.spaceOther, na.rm=TRUE,
remove=TRUE)
)
You can rearrange the columns in your data frame. It looks like you are using dplyr::select in your example.
library(dplyr)
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
mtcars2 <- mtcars %>%
select(mpg, carb, everything()) ## moves carb up behind mpg
head(mtcars2)
# mpg carb cyl disp hp drat wt qsec vs am gear
# Mazda RX4 21.0 4 6 160 110 3.90 2.620 16.46 0 1 4
# Mazda RX4 Wag 21.0 4 6 160 110 3.90 2.875 17.02 0 1 4
# Datsun 710 22.8 1 4 108 93 3.85 2.320 18.61 1 1 4
# Hornet 4 Drive 21.4 1 6 258 110 3.08 3.215 19.44 1 0 3
# Hornet Sportabout 18.7 2 8 360 175 3.15 3.440 17.02 0 0 3
# Valiant 18.1 1 6 225 105 2.76 3.460 20.22 1 0 3
You can do the same thing with base subsetting. For example, in a data frame with 11 columns you can move the 11th column to directly behind the first with
mtcars3 <- mtcars[,c(1,11,2:10)]
identical(mtcars2, mtcars3)
# [1] TRUE
I ended up using relocate, documentation here: dplyr.tidyverse.org/reference/relocate.html
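A short sketch of how relocate can be used, assuming dplyr 1.0.0 or later. The same version of mutate also takes .before and .after arguments, so a new column can be placed as it is created; the column wt_per_hp is invented here for illustration:

```r
library(dplyr)

# Move an existing column: carb goes directly after mpg.
moved <- mtcars %>% relocate(carb, .after = mpg)
names(moved)[1:3]

# Create and place in one step; wt_per_hp is a made-up example column.
placed <- mtcars %>% mutate(wt_per_hp = wt / hp, .after = hp)
names(placed)[4:5]
```

Unlike select(mpg, carb, everything()), these forms do not require listing the leading columns by hand.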
I'm trying to calculate a new column with a user defined function that needs data from same row and a fixed value valid for all rows:
myfunc <- function(ds,colname,val1,col1,col2){
# content of new column <colname> should be computed from:
ds[colname] = val1 + ds[col1] * ds[col2] # for each row of ds
return(ds)
}
v1 = 2
data(mtcars)
mt = head(mtcars)
mt
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
apply(mt,'newcol',v1,mt$wt,mt$qsec)
mt
What I would like to see in mt$newcol in the first row is: 2 + 2.620 * 16.46 (-> 45.13), and similarly for all other rows.
So, how can I send a fixed value (v1) and two values from each row to my function and store returned value in this row in a new column?
Thanks
dplyr approach:
library(dplyr)
data(mtcars)
myfunc <- function(ds, new_column, val1, col1, col2){
name <- rownames(ds)
ds <- ds %>%
mutate(!!as.name(new_column) := val1 + !!as.name(col1) + !!as.name(col2),
car_name = name) %>%
select(car_name, mpg:!!as.name(new_column))
return(ds)
}
head(
myfunc(ds = mtcars,
new_column = "new_column",
val1 = 2,
col1 = "hp",
col2 = "vs")
)
output
car_name mpg cyl disp hp drat wt qsec vs am gear carb new_column
1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 112
2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 112
3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 96
4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 113
5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 177
6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 108
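The example above adds the two columns, whereas the question's formula multiplies them (val1 + col1 * col2). A minimal sketch of that formula using the .data pronoun for string column names; the function name myfunc_data is hypothetical:

```r
library(dplyr)

# Hypothetical variant of myfunc(): column names arrive as strings.
myfunc_data <- function(ds, new_column, val1, col1, col2) {
  ds %>%
    # .data[[...]] looks up a column by the string held in col1 / col2
    mutate(!!new_column := val1 + .data[[col1]] * .data[[col2]])
}

mt <- myfunc_data(head(mtcars), "newcol", 2, "wt", "qsec")
mt$newcol[1]   # 2 + 2.620 * 16.46 for the first row
```

The question's own myfunc should also work without apply if it is called directly with the column names as strings and the result is assigned back, for example mt <- myfunc(mt, 'newcol', v1, 'wt', 'qsec').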