R function with variable number of arguments for ifelse - r

I have a script that creates a column so that I know which rule should be applied to each row in a dataframe.
EndoSubset$FU_Group<-ifelse(EndoSubset$IMorNoIM=="No_IM","Rule1",
ifelse(EndoSubset$IMorNoIM=="IM","Rule2",
ifelse(EndoSubset$IMorNoIM=="AnotherIM","Rule3",
"NoRules")))
I want to make this into a function so that there can be any number of rules and any number of conditions for a column so it could be:
EndoSubset$FU_Group<-ifelse(EndoSubset$IMorNoIM=="No_IM","Rule1",
ifelse(EndoSubset$IMorNoIM=="IM","Rule2",
ifelse(EndoSubset$IMorNoIM=="AnotherIM","Rule3",
ifelse(EndoSubset$IMorNoIM=="SomeOtherIM","Rule4",
ifelse(EndoSubset$IMorNoIM=="LotsOfIM","Rule5",
"NoRules")))))
I understand that I can use the ellipsis for this, but I don't understand how to use it for both the condition strings ("No_IM", "IM", "AnotherIM", etc.) and the rule strings ("Rule1", "Rule2", "Rule3", etc.) at the same time.

This answer is based upon another, incomplete answer that has been deleted.
You can use case_when() from the dplyr package to achieve this. It takes an arbitrary number of conditions. Since you don't give a reproducible example, I show how this works with mtcars:
library(dplyr)
mtcars$cyl_group <- case_when(mtcars$cyl == 4 ~ "Rule1",
mtcars$cyl == 6 ~ "Rule2",
TRUE ~ "NoRules")
mtcars[2:5, ]
## mpg cyl disp hp drat wt qsec vs am gear carb cyl_group
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Rule2
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Rule1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Rule2
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 NoRules
As you can see, you can easily connect a condition with a value using ~. Your two examples can probably be solved like this (I cannot check this, since you don't give your data):
EndoSubset$FU_Group <- case_when(EndoSubset$IMorNoIM == "No_IM" ~ "Rule1",
EndoSubset$IMorNoIM == "IM" ~ "Rule2",
EndoSubset$IMorNoIM == "AnotherIM" ~ "Rule3",
TRUE ~ "NoRules")
EndoSubset$FU_Group <- case_when(EndoSubset$IMorNoIM == "No_IM" ~ "Rule1",
EndoSubset$IMorNoIM == "IM" ~ "Rule2",
EndoSubset$IMorNoIM == "AnotherIM" ~ "Rule3",
EndoSubset$IMorNoIM == "SomeOtherIM" ~ "Rule4",
EndoSubset$IMorNoIM == "LotsOfIM" ~ "Rule5",
TRUE ~ "NoRules")
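For the "any number of rules" part of the question, one option that avoids the ellipsis altogether is a lookup table: a named character vector whose names are the conditions and whose values are the rules. The helper below is a minimal base R sketch under that assumption; assign_rules, rules and default are hypothetical names, not part of dplyr or of the original script. Like the TRUE ~ "NoRules" branch above, it maps NA and unmatched inputs to the default.

```r
# Hypothetical helper: look up each value of x in a named vector of rules.
# Values with no matching name (including NA) fall back to `default`.
assign_rules <- function(x, rules, default = "NoRules") {
  out <- unname(rules[as.character(x)])
  out[is.na(out)] <- default
  out
}

# Conditions are the names, rules are the values; add as many pairs as needed.
rules <- c(No_IM = "Rule1", IM = "Rule2", AnotherIM = "Rule3")

assign_rules(c("IM", "No_IM", "Other", NA), rules)
#> [1] "Rule2"   "Rule1"   "NoRules" "NoRules"
```

With the data from the question this would presumably be called as EndoSubset$FU_Group <- assign_rules(EndoSubset$IMorNoIM, rules).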

Related

Using glue-like constructs on RHS in R/Tidyeval

I've spent hours trying to make glue work on the RHS of a formula and am out of clues. Here is a simple reprex.
meta <- function(x, var, suffix){
x<- x %>% mutate("{{var}}_{suffix}":= 5)
x<- x %>% mutate("{{var}}_{suffix}_new":= {{var}} - "{{var}}_{suffix}")
}
x<- meta(mtcars, mpg, suf)
#Should be equivalent to
x<- mtcars %>% mutate(mpg_suf:= 5)
x<- x%>% mutate(mpg_suf_new:= mpg - mpg_suf)
#N: Tried https://stackoverflow.com/questions/70427403/how-to-correctly-glue-together-prefix-suffix-in-a-function-call-rhs but none of the methods in it worked, unfortunately
Meta function gives me "Error in local_error_context(dots = dots, .index = i, mask = mask) :
promise already under evaluation: recursive default argument reference or earlier problems? "
I went through all the SO hits for this error message, but nothing has worked so far.
Would really appreciate any insights. Thank you!
Here is a working version:
meta <- function(x, var, suffix){
new_name <- rlang::englue("{{ var }}_{{ suffix }}")
x %>%
mutate("{new_name}" := 5) %>%
mutate("{new_name}_new" := {{ var }} - .data[[new_name]])
}
names(meta(mtcars, mpg, suf))
#> [1] "mpg" "cyl" "disp" "hp"
#> [5] "drat" "wt" "qsec" "vs"
#> [9] "am" "gear" "carb" "mpg_suf"
#> [13] "mpg_suf_new"
To understand what is going on:
Learn about the difference between "{{ var }}" and "{var}" in tidyeval glue strings: https://rlang.r-lib.org/reference/glue-operators.html
Learn about englue() to create glue strings outside of the LHS of :=: https://rlang.r-lib.org/reference/englue.html. This part is not necessary but I thought it was nicer to create and reuse a variable.
The tricky part: you create a new column with a constructed name and then want to use the column that this name refers to. You have to subset it with .data, see: https://rlang.r-lib.org/reference/dot-data.html
See also the general topic: https://rlang.r-lib.org/reference/topic-data-mask-programming.html
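The difference between the two glue operators can be seen in a small sketch. Here "{{ var }}" injects the name of a bare column argument, while "{label}" interpolates an ordinary string from the function's environment. The function add_one and the argument label are hypothetical names chosen for illustration, and a reasonably recent dplyr (1.0 or later) is assumed.

```r
library(dplyr)

# Hypothetical helper: `var` is a bare column name, `label` is a plain string.
add_one <- function(data, var, label) {
  data %>%
    # LHS: "{{ var }}" becomes the column name, "{label}" the string value.
    # RHS: {{ var }} refers to the column itself.
    mutate("{{ var }}_{label}" := {{ var }} + 1)
}

res <- add_one(head(mtcars), mpg, "plus1")
tail(names(res), 1)
#> [1] "mpg_plus1"
```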
I think it's best if we define the pieces we need first, then we can use them as needed on the LHS or the RHS of the calculation. I will add that it doesn't make much sense to me to pass the suffix argument as a bare name. I think it would be a clearer choice to make it string only.
library(dplyr)
meta <- function(x, var, suffix) {
var <- rlang::as_name(enquo(var))
suffix <- rlang::as_name(enquo(suffix)) # Remove this to make "suffix" string only.
new_var <- glue::glue("{var}_{suffix}")
x %>%
mutate("{new_var}" := 5,
"{new_var}_new" := !!sym(var) - !!sym(new_var))
}
mtcars %>%
head() %>%
meta(mpg, suf)
mpg cyl disp hp drat wt qsec vs am gear carb mpg_suf mpg_suf_new
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 5 16.0
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 5 16.0
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 5 17.8
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 5 16.4
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 5 13.7
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 5 13.1

How to use a short script to eliminate all but one duplicate column variables based on the prefix of the colname

I want to know how to use a short script to eliminate all but one duplicate column variables based on the prefix of the colname without inputting the variables I want to remove by hand.
For example, I created repeats of the mtcars$am variables, called am1, am2, am3, and am4 in a data frame called mtcars_example_2. I removed the original am variable in the mtcars_example_2 data frame.
I can eliminate all variables with the prefix "am" except am1, storing the result in a new data frame called mtcars_example_3, using the code below, which lists every variable to remove by hand:
## long way of removing all variable with am prefix that were not am1
mtcars_example_3 <-
mtcars_example_2 %>%
select(
-c(
"am2", "am3", "am4"
)
)
But this seems like the long way of doing this. Is there a faster way that does not require me to individually type in the names of each of the variables that I want to remove from the data?
Is this possible? If so, how can this be done?
Thanks ahead of time.
Here is the code for the example:
# example data
## loads packages
library(tidyverse)
## creates mtcars_example data
mtcars_example_1 <- data.frame(mtcars)
mtcars_example_2 <- data.frame(mtcars_example_1)
## creates duplicate variables, based on am variable
mtcars_example_2$am1 <- mtcars_example_1$am
mtcars_example_2$am2 <- mtcars_example_1$am
mtcars_example_2$am3 <- mtcars_example_1$am
mtcars_example_2$am4 <- mtcars_example_1$am
## removes original variable
mtcars_example_2 <-
mtcars_example_2 %>%
select(
-c(
"am"
)
)
## long way of removing all variable with am prefix that were not am1
mtcars_example_3 <-
mtcars_example_2 %>%
select(
-c(
"am2", "am3", "am4"
)
)
You can remove all the variables that start with am but keep am1 :
library(dplyr)
mtcars_example_2 %>% select(-starts_with('am'), am1) %>% head
# mpg cyl disp hp drat wt qsec vs gear carb am1
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 4 4 1
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 4 4 1
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 4 1 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 3 1 0
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 3 2 0
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 3 1 0
Depending on your actual scenario you can also use regex to remove columns.
mtcars_example_2 %>% select(-matches('am[2-4]')) %>% head
We could also do
library(dplyr)
mtcars_example_2 %>%
select(-contains('am'), am1)
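The same idea can also be sketched in base R without naming am1 explicitly: find every column with the prefix, keep the first match, and drop the rest. The toy data frame df below is an assumption made so the snippet runs on its own; with the data from the question, mtcars_example_2 would take its place.

```r
# Toy data frame with the same shape of problem: am1..am4 are duplicates.
df <- data.frame(mpg = c(21, 22.8), am1 = c(1, 1), am2 = c(1, 1),
                 am3 = c(1, 1), am4 = c(1, 1))

am_cols <- grep("^am", names(df), value = TRUE)  # all columns with the prefix
to_drop <- am_cols[-1]                           # everything but the first match
df_clean <- df[, setdiff(names(df), to_drop), drop = FALSE]

names(df_clean)
#> [1] "mpg" "am1"
```

Anchoring the pattern with ^ restricts the match to names that start with the prefix, which is stricter than contains('am').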

Apply variable function to columns in data.table

I'm wondering if there's a way to apply a function in a string variable to .SD cols in a data.table.
I can generalize all other parts of function calls using a data.table, including input and output columns, which I'm very happy about. But the final piece seems to be applying a variable function to a data.table, which is something I believe I've done before with dplyr and do.call.
library(data.table)
mtcars <- as.data.table(mtcars)
returnNames <- "calculatedColumn"
SDnames <- c("mpg","hp")
myfunc <- function(data) {
print(data)
return(data[,1]*data[,2])
}
This obviously works:
mtcars[,eval(returnNames) := myfunc(.SD),.SDcols = SDnames,by = cyl]
But if I want to apply a dynamic function, something like this does not work:
functionCall <- "myfunc"
mtcars[,eval(returnNames) := lapply(.SD,eval(functionCall)),.SDcols = SDnames,by = cyl]
I get this error:
Error in `[.data.table`(mtcars, , `:=`(eval(returnNames), lapply(.SD, : attempt to apply non-function
Is using "apply" with "eval" the right idea, or am I on the wrong track entirely?
You don't want lapply. Since myfunc takes a data.table with multiple columns, you just want to feed such a data table into the function as one object.
To get the function you need get instead of eval
On the left-hand-side of :=, you can just put the character vector in parentheses, eval isn't needed
mtcars[, (returnNames) := get(functionCall)(.SD)
, .SDcols = SDnames
, by = cyl]
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb calculatedColumn
# 1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 2310.0
# 2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 2310.0
# 3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 2120.4
# 4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 2354.0
# 5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 3272.5
# 6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 1900.5
The code above was run after the following code
library(data.table)
mtcars <- as.data.table(mtcars)
returnNames <- "calculatedColumn"
SDnames <- c("mpg","hp")
myfunc <- function(data) {
print(data)
return(data[,1]*data[,2])
}
functionCall <- "myfunc"
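Why eval fails where get works can be checked in plain base R, independent of data.table: evaluating a character string simply returns that string, whereas get looks up the object with that name. The function multiply_first_two and the variable fn_name below are hypothetical stand-ins for myfunc and functionCall.

```r
# Hypothetical function, analogous in shape to myfunc in the question.
multiply_first_two <- function(data) data[[1]] * data[[2]]
fn_name <- "multiply_first_two"

eval(fn_name)               # evaluating a string just returns the string
is.function(eval(fn_name))  # FALSE, hence "attempt to apply non-function"

# get() looks the name up; mode = "function" skips non-function objects.
f <- get(fn_name, mode = "function")
f(data.frame(a = 1:3, b = 4:6))
#> [1]  4 10 18
```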

Indexing by column name to the end of the dataframe - R

I'm wondering if there is a way to select a group of columns by the name of the first column in the group and then all the next columns either a) to the end of the data frame, or b) to another column, also using its name.
a) As an example for the first question, in the mtcars dataset, is there a way to select the columns from drat to the end of the data frame? (Something like mtcars[,'drat':ncol(mtcars)])
b) For the second question, is there a way to select the columns starting at cyl and ending at wt? (Something like mtcars[,'cyl':'wt'])
Many elegant solutions have already been provided, but one can also get the desired result in base R using which:
Ans a:
mtcars[,which(names(mtcars) == "drat"):ncol(mtcars)]
Ans b:
mtcars[,which(names(mtcars) == "cyl"):which(names(mtcars) == "wt")]
# cyl disp hp drat wt
#Mazda RX4 6 160.0 110 3.90 2.620
#Mazda RX4 Wag 6 160.0 110 3.90 2.875
#Datsun 710 4 108.0 93 3.85 2.320
#Hornet 4 Drive 6 258.0 110 3.08 3.215
#Hornet Sportabout 8 360.0 175 3.15 3.440
#......so on
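If this pattern is needed often, the two cases can be wrapped in one small helper. The following is only a sketch; cols_between is a hypothetical name, and it uses match instead of which so that a misspelled column name can be reported as an error.

```r
# Hypothetical helper: select columns from `from` to `to` by name.
# When `to` is omitted, the range runs to the last column.
cols_between <- function(df, from, to = names(df)[ncol(df)]) {
  idx <- match(c(from, to), names(df))
  if (anyNA(idx)) stop("column not found in data frame")
  df[, idx[1]:idx[2], drop = FALSE]
}

names(cols_between(mtcars, "drat"))
#> [1] "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"
names(cols_between(mtcars, "cyl", "wt"))
#> [1] "cyl"  "disp" "hp"   "drat" "wt"
```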
We can do this with select from dplyr
Answer a)
mtcars %>% select(drat:last_col()) # last_col() selects the final column
Answer b)
mtcars %>% select(cyl:wt)
In dplyr, the select function does exactly this (no quotes needed):
mtcars %>%
select(cyl:wt)
If we need to use a quoted string, convert it to a symbol with sym and then evaluate it with the unquote operator (!!):
mtcars %>%
select(!! (rlang::sym("cyl")): !!(rlang::sym("wt")))
This is useful when the column names are stored in objects:
a <- "cyl"
b <- "wt"
mtcars %>%
select(!! (rlang::sym(a)): !!(rlang::sym(b)))
Or another option is
mtcars %>%
select(!! rlang::parse_expr(glue::glue("{a}:{b}")))

Predicting data via regression model and storing in a vector

Apologies for what is probably a very basic question.
I have created a linear model for a massive meteorological dataset using multiple regression. My goal is to use that model to "predict" data during a certain period using predictors 1, 2 and 3. I will then compare those predicted data to the observed data for that period.
My approach thus far has been to create a new vector for the predicted values and loop through the vector, creating predicted values based on the extracted coefficients of the linear model. Then, I will simply subtract the predicted values from the observed values. For some reason, this approach results in the new predicted vector being NULL. Any idea how I could approach this?
A sample is below. "data" refers to the dataset containing the predictors.
coef <- coefficients(multipleRegressionModel)
predictedValues=c()
for(i in 1:length(data$timePeriod)){
predictedValues[i] = append(predictedValues, data$coef[1]+data$predictor1[i]*data$coef[2]+data$predictor2[i]*data$coef[3]+
data$predictor3[i]*data$coef[4])
}
diff=c()
diff=observedValues - predictedValues
It looks like you are making this more difficult than it needs to be. R has a predict() function that does all of this for you. If you had a sample data.frame like so:
set.seed(26)
mydf = data.frame (a=1:20 , b = rnorm(20),
c = 1:20 + runif(20,2,3)*runif(20, 2, 3),
d = 1:20 + rpois(20,5)*runif(1:20)*sin(1:20))
And you wanted to train on some rows, and test on the others
trainRows<-sample(1:20, 16)
mydf.train<-mydf[trainRows,]
mydf.test<-mydf[-trainRows,]
Then fit the model and predict
model<-lm(a~b+c+d, data = mydf.train)
summary(model) #gives info about your model.
mydf.test$pred<-predict(model, newdata = mydf.test) #predict on the held-out rows
MSE<-mean((mydf.test$pred-mydf.test$a)^2) #calculate mean squared error
MSE
#[1] 0.06321
View the predictions with mydf.test$pred
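As for the loop in the question: coef is a separate vector, not a column of data, so data$coef is NULL and the arithmetic built on it produces no values. If the manual calculation is still wanted, it can be done without a loop by indexing the coefficient vector directly. The sketch below uses mtcars with an illustrative model (fit, b and manual are assumed names, not taken from the question) and checks that the result agrees with predict().

```r
# Illustrative model: two predictors from the built-in mtcars data.
fit <- lm(mpg ~ wt + hp, data = mtcars)
b <- coef(fit)  # named vector: "(Intercept)", "wt", "hp"

# Vectorised manual prediction: index the coefficient vector, not the data.
manual <- b["(Intercept)"] + b["wt"] * mtcars$wt + b["hp"] * mtcars$hp

# The manual values match what predict() returns for the same rows.
all.equal(unname(manual), unname(predict(fit, newdata = mtcars)))
#> [1] TRUE

# Differences between observed and predicted values.
diffs <- mtcars$mpg - manual
```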
Here is a simple example using a glm on the mtcars data.
Line<- #setting up the linear model function
function (train_dat, test_dat, variables, y_var, family = "gaussian")
{
fm <- as.formula(paste(y_var, " ~", paste(variables, collapse = "+"))) #formula
glm1 <- glm(fm, data = train_dat, family = family) #run the model
predict(glm1, newdata = test_dat) #return the predictions
}
data(mtcars)
y_var<-'mpg'
x_vars<-setdiff(names(mtcars),y_var)
mtcars[,'linear_prediction']<-Line(mtcars,mtcars,x_vars,y_var)
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb linear_prediction
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 22.59951
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 22.11189
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 26.25064
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 21.23740
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 17.69343
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 20.38304
