Pick a function by user input

I've a list of functions and I'd like to pick one of them by user input, use it for a regression and then display the output of the function summary and plot.
re_show<-function(y){
f1<-x+I(x^2)
f2<-I(x^0.5)+I(x^2)
...
f20<-x+I(x^0.5)+I(x^2)
message("Choose the model")
i <- readLines(n = 1)
summary(lm(y~i))
plot(lm(y~i))
}
Have you got any ideas about how to solve this problem?
Thank you.

Here is the most general answer I can offer. I have changed your function to accept a dependent variable, an independent variable, and a dataset; if you do not like that, you can give the parameters default values or adapt the function to your particular scenario. As it stands, the function works with any dataset you wish (you did not provide sample data). I have also added a switch statement so the user can choose the model from stdin. A simple example with the iris dataset is shown.
re_show <- function(dv, iv, dat) {
  # Define your variables to evaluate
  x <- dat[, iv]
  y <- dat[, dv]
  # choose the function
  message("Choose the model")
  fun <- readLines(n = 1)
  # The switch statement
  i <- switch(fun,
    f1 = {x + I(x^2)},
    f2 = {I(x^0.5) + I(x^2)},
    # add your remaining functions
    f20 = {x + I(x^0.5) + I(x^2)}
  )
  # finish your analysis
  print(summary(lm(y ~ i)))
  plot(lm(y ~ i))
}
data(iris)
re_show("Sepal.Length", "Sepal.Width", iris)
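One caveat with the switch() version: each branch is evaluated as ordinary arithmetic, so x+I(x^2) reaches lm() as a single summed vector and the model gets one predictor rather than two terms. Below is a minimal sketch of one way around that, assuming the candidate models are kept as right-hand-side strings and pasted into a formula. The names models, fit_choice and the choice argument are made up for illustration; choice stands in for the value read with readLines().

```r
# Hypothetical lookup table: model name -> right-hand side of the formula
models <- list(
  f1  = "x + I(x^2)",
  f2  = "I(x^0.5) + I(x^2)",
  f20 = "x + I(x^0.5) + I(x^2)"
)

# `choice` stands in for the string read from stdin with readLines(n = 1)
fit_choice <- function(dv, iv, dat, choice) {
  if (!choice %in% names(models)) stop("unknown model: ", choice)
  # Copy the two columns under fixed names so the stored strings can refer to x and y
  d <- data.frame(y = dat[[dv]], x = dat[[iv]])
  form <- as.formula(paste("y ~", models[[choice]]))
  lm(form, data = d)
}

fit <- fit_choice("Sepal.Length", "Sepal.Width", iris, "f1")
names(coef(fit))  # intercept plus one coefficient per term
```

summary(fit) and plot(fit) can then be called on the result exactly as in the function above.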

Related

What does "invalid type (closure) for variable 'variable1'" mean and how do I fix it?

I am trying to write a function in R, which contains a function from another package. The code works perfectly outside a function.
I am guessing it might have something to do with the package I am using (survey).
A self-contained code example:
#activating the package
library(survey)
#getting the dataset into R
tm <- read.spss("tm.sav", to.data.frame = T, max.value.labels = 5)
# creating svydesign object (it basically contains the weights to adjust the variables (~persgew: also a column variable contained in the tm-dataset))
tm_w <- svydesign(ids=~0, weights = ~persgew, data = tm)
#getting overview of the welle-variable
#this variable is part of the tm-dataset. it is needed to execute the following steps
table(tm$welle)
# data manipulation as in: taking the v12d_gr-variable as well as the welle-variable and the svydesign-object to create a longitudinal variable which is transformed into a data frame that can be passed to ggplot
t <- svytable(~v12d_gr+welle, tm_w)
tt <- round(prop.table(t,2)*100, digits=0)
v12d <- tt[2,]
v12d <- as.data.frame(v12d)
This is the code outside the function, and it works perfectly. Since I have to transform quite a few variables in exactly the same way, I want to write a function to save some time.
The following function is supposed to take the variable to be transformed (v12sd2_gr) as an argument.
#making sure the survey-object is loaded
tm_w <- svydesign(ids=~0, weights = ~persgew, data = data)
#trying to write a function containing the code from above
ltd_zsw <- function(variable1){
t <- svytable(~variable1+welle, tm_w)
tt <- round(prop.table(t,2)*100, digits=0)
var_ltd_zsw <- tt[2,]
var_ltd_zsw <- as.data.frame(var_ltd_zsw)
return(var_ltd_zsw)
}
Calling the function:
#as v12d has been altered already, I am trying to transform another variable v12sd2_gr
v12sd2 <- ltd_zsw(v12sd2_gr)
Console output:
Error in model.frame.default(formula = weights ~ variable1 + welle, data = model.frame(design)) :
invalid type (closure) for variable 'variable1'
Called from: model.frame.default(formula = weights ~ variable1 + welle, data = model.frame(design))
How do I fix it? And what does it mean to dynamically build a formula and reformulating?
PS: I hope it is the appropriate way to answer to the feedback in the comments.
Update: I think I was able to trace the problem back to the argument I am passing (variable1), and I am guessing it has something to do with the fact that I try to use a formula within the function. But when I try to call svytable with as.formula(svytable(~variable1+welle, tm_w)), it still doesn't work.
What to do?

I have found a solution to the problem.
Here is the tested and working function:
ltd_test <- function (var, x, string1="con", string2="pro") {
print (table (var))
x$w12d_gr <- ifelse(as.numeric(var)>2,1,0)
x$w12d_gr <- factor(x$w12d_gr, levels = c(0,1), labels = c(string1,string2))
print (table (x$w12d_gr))
x_w <- svydesign(ids=~0, weights = ~persgew, data = x)
t <- svytable(~w12d_gr+welle, x_w)
tt <- round(prop.table(t,2)*100, digits=0)
w12d <- tt[2,]
w12d <- as.data.frame(w12d)
}
The problem appeared to be caused by the svydesign() function. Its output is an object which is then used by the formula for the svytable() function. That's why it is imperative to first create the x_w object with svydesign() and then use svytable() to create the t object.
In the code snippet I originally posted in the question, the tm_w object was created and stored globally.
Thanks to everyone for the help. I hope this will be of use to someone one day!
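On the "dynamically build a formula" part of the question: it means assembling the formula from column names held as strings instead of typing it out literally, which base R supports through reformulate(). Here is a minimal sketch, using xtabs() and mtcars as stand-ins because the tm data is not available; cross_pct and its arguments are hypothetical names. With the survey package, the same formula object should be usable as the first argument of svytable(), but that part is untested here.

```r
# Hypothetical helper: the column names arrive as strings
cross_pct <- function(var_name, by_name, dat) {
  # Build a one-sided formula such as ~cyl + gear from the two strings
  form <- reformulate(c(var_name, by_name))
  tab <- xtabs(form, data = dat)
  # Column percentages, rounded as in the original code
  round(prop.table(tab, 2) * 100, digits = 0)
}

res <- cross_pct("cyl", "gear", mtcars)
res
```

Note that the function is called with quoted names, e.g. cross_pct("cyl", "gear", mtcars), not with bare variable names.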

Converting a R2jags object into a Stanreg (rstanarm) object

I made a model using R2jags. I like the jags syntax but I find the output produced by R2jags not easy to use. I recently read about the rstanarm package. It has many useful functions and is well supported by the tidybayes and bayesplot packages for easy model diagnostics and visualisation. However, I'm not a fan of the syntax used to write a model in rstanarm. Ideally, I would like to get the best of the two worlds, that is writing the model in R2jags and convert the output into a Stanreg object to use rstanarm functions.
Is that possible? If so, how?

I think the question isn't necessarily whether or not it's possible; I suspect it probably is. The question really is how much time you're prepared to spend doing it. All you'd have to do is try to replicate in structure the object that gets created by rstanarm, to the extent that it's possible with the R2jags output. That would make it so that some post-processing tasks would probably work.
If I might be so bold, I suspect a better use of your time would be to turn the R2jags object into something that could be used with the post-processing functions you want to use. For example, it only takes a small modification to the JAGS output to make all of the mcmc_*() plotting functions from bayesplot work. Here's an example. Below is the example model from the jags() function help.
library(R2jags)
# An example model file is given in:
model.file <- system.file(package="R2jags", "model", "schools.txt")
# data
J <- 8.0
y <- c(28.4,7.9,-2.8,6.8,-0.6,0.6,18.0,12.2)
sd <- c(14.9,10.2,16.3,11.0,9.4,11.4,10.4,17.6)
jags.data <- list("y","sd","J")
jags.params <- c("mu","sigma","theta")
jags.inits <- function(){
list("mu"=rnorm(1),"sigma"=runif(1),"theta"=rnorm(J))
}
jagsfit <- jags(data=jags.data, inits=jags.inits, jags.params,
n.iter=5000, model.file=model.file, n.chains = 2)
Now, what the mcmc_*() plotting functions from bayesplot expect is a list of matrices of MCMC draws, one per chain, where the column names give the parameter names. By default, jags() puts all of the draws into a single matrix. In the above case there are 5000 iterations in total, with 2500 as burn-in (leaving 2500 sampled), and n.thin is set to 2 (jags() has an algorithm for choosing the thinning parameter). In any case, the jagsfit$BUGSoutput$n.keep element gives the number of iterations kept per chain; here it is 1250. So you could use that to make a list of two matrices from the output.
jflist <- list(jagsfit$BUGSoutput$sims.matrix[1:jagsfit$BUGSoutput$n.keep, ],
jagsfit$BUGSoutput$sims.matrix[(jagsfit$BUGSoutput$n.keep+1):(2*jagsfit$BUGSoutput$n.keep), ])
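One thing worth double-checking before relying on the split above: as far as I know, the rows of sims.matrix are documented (in R2WinBUGS, whose output format R2jags follows) as the draws from all chains concatenated and randomly permuted, in which case the first n.keep rows are not simply chain 1. The BUGSoutput$sims.array element keeps iterations, chains and parameters as separate dimensions. Here is a sketch of splitting such an array into one matrix per chain, with a simulated array standing in for jagsfit$BUGSoutput$sims.array (the simulated values and the name chain_list are assumptions for illustration):

```r
set.seed(1)
# Stand-in for jagsfit$BUGSoutput$sims.array: 10 iterations, 2 chains, 3 parameters
sims_array <- array(
  rnorm(10 * 2 * 3),
  dim = c(10, 2, 3),
  dimnames = list(NULL, NULL, c("mu", "sigma", "theta[1]"))
)

# One iterations-by-parameters matrix per chain; the parameter names are kept
# as column names (with a single parameter, add drop = FALSE handling)
chain_list <- lapply(seq_len(dim(sims_array)[2]), function(k) sims_array[, k, ])

length(chain_list)
colnames(chain_list[[1]])
```

A list built this way could be passed to the plotting functions in place of jflist. bayesplot also accepts a 3-D iterations x chains x parameters array directly, so passing sims.array itself may be enough if its third dimension carries the parameter names.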
Now, you'd just have to call some of the plotting functions:
library(bayesplot)
mcmc_trace(jflist, regex_pars="theta")
or
mcmc_areas(jflist, regex_pars="theta")
So, instead of trying to replicate all of the output that rstanarm produces, it might be a better use of your time to try to bend the jags output into a format that would be amenable to the post-processing functions you want to use.
EDIT - added possibility for pp_check() from bayesplot.
The posterior draws of y in this case are in the theta parameters. So, we make an object that has elements y and yrep and give it the class foo:
x <- list(y = y, yrep = jagsfit$BUGSoutput$sims.list$theta)
class(x) <- "foo"
We can then write a pp_check method for objects of class foo. This comes straight out of the help file for bayesplot::pp_check().
pp_check.foo <- function(object, ..., type = c("multiple", "overlaid")) {
y <- object[["y"]]
yrep <- object[["yrep"]]
switch(match.arg(type),
multiple = ppc_hist(y, yrep[1:min(8, nrow(yrep)),, drop = FALSE]),
overlaid = ppc_dens_overlay(y, yrep[1:min(8, nrow(yrep)),, drop = FALSE]))
}
Then, just call the function:
pp_check(x, type="overlaid")

Writing a function in R to plot ROC curve using pROC

I'm trying to write a function to plot ROC curves based on different scoring systems I have to predict an outcome.
I have a dataframe data_all, with columns "score_1" and "Threshold.2000". I generate a ROC curve as desired with the following:
plot.roc(data_all$Threshold.2000, data_all$score_1)
My goal is to generate a ROC curve for a number of different outcomes (e.g. Threshold.1000) and scores (score_1, score_2 etc), but am initially trying to set it up just for different scores. My function is as follows:
roc_plot <- function(dataframe_of_interest, score_of_interest) {
plot.roc(dataframe_of_interest$Threshold.2000, dataframe_of_interest$score_of_interest)}
I get the following error: Error in roc.default(x, predictor, plot = TRUE, ...) : No valid data provided.
I'd be very grateful if someone can spot why my function doesn't work! I'm a python coder and new-ish to R, and haven't had much luck trying a number of different things. Thanks very much.
EDIT:
Here is the same example with mtcars so it's reproducible:
data(mtcars)
plot.roc(mtcars$vs, mtcars$mpg) # --> makes correct graph
roc_plot <- function(dataframe_of_interest, score_of_interest) {
plot.roc(dataframe_of_interest$mpg, dataframe_of_interest$score_of_interest)}
roc_plot(mtcars, vs)
Outcome:
Error in roc.default(x, predictor, plot = TRUE, ...) : No valid data provided.

Here's one solution that works as desired (i.e. lets the user specify different values for score_of_interest):
library(pROC)
data(mtcars)
plot.roc(mtcars$vs, mtcars$mpg) # --> makes correct graph
# expects `score_of_interest` to be a string!!!
roc_plot <- function(dataframe_of_interest, score_of_interest) {
plot.roc(dataframe_of_interest$vs, dataframe_of_interest[, score_of_interest])
}
roc_plot(mtcars, 'mpg')
roc_plot(mtcars, 'cyl')
Note that your error was not resulting from an incorrect column name, it was resulting from an incorrect use of the data.frame class. Notice what happens with a simpler function:
foo <- function(x, col_name) {
head(x$col_name)
}
foo(mtcars, mpg)
## NULL
This returns NULL. So in your original function when you tried to supply plot.roc with dataframe_of_interest$score_of_interest you were actually feeding plot.roc a NULL.
There are several ways to extract a column from a data.frame by the column name when that name is stored in an object (which is what you're doing when you pass it as an argument in a function). Perhaps the easiest way is to remember that a data.frame is like a 2D array-type object and so we can use familiar object[i, j] syntax, but we ask for all rows and we specify the column by name, e.g., mtcars[, 'mpg']. This still works if we assign the string 'mpg' to an object:
x <- 'mpg'
mtcars[, x]
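To make the difference concrete, here is a small check with mtcars (the variable name col_name is just for illustration). The double-bracket form [[ ]] is another way to do the same lookup:

```r
col_name <- "mpg"

# `$` does not evaluate its right-hand side: it looks for a column
# literally called "col_name", which does not exist
is.null(mtcars$col_name)                   # TRUE

# `[` and `[[` evaluate the index, so the string stored in col_name is used
identical(mtcars[, col_name], mtcars$mpg)  # TRUE
identical(mtcars[[col_name]], mtcars$mpg)  # TRUE
```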
So that's how I produced my solution. Going a step further, it's not hard to imagine how you would be able to supply both a score_of_interest and a threshold_of_interest:
roc_plot2 <- function(dataframe_of_interest, threshold_of_interest, score_of_interest) {
plot.roc(dataframe_of_interest[, threshold_of_interest],
dataframe_of_interest[, score_of_interest])
}
roc_plot2(mtcars, 'vs', 'mpg')

Programmatically detect function calls in R formulae, e.g. y ~ x + log(z), and surround them in backticks

Let me explain my goal first because while the title expresses my strategy, I don't think it is likely to be the only way to solve the problem.
I have an R function to which I pass fitted model objects, like those from lm, and the function extracts the model frame, saves that as a data frame, standardizes the variables in the new data frame, then refits the model with the standardized variables to ease the interpretation of the model's coefficients.
Example code without wrapping it in a function:
mod <- lm(mpg ~ wt, data = mtcars)
new_data <- model.frame(mod)
new_data <- data.frame(lapply(new_data, FUN = scale))
standardized_mod <- update(mod, data = new_data)
Now a summary of standardized_mod by virtue of being fitted with standardized data will give standardized coefficients.
This isn't the most efficient way of doing things, I admit, since I could do something like multiplying the estimates and SEs by each variable's standard deviation. But in the context of the function, I'm trying to be more flexible; this gets less straightforward when working with survey package objects and the like. I also use the same logic to fit models with interaction terms for simple slopes analysis. But this is beside the main point of the question; I just want to offer some explanation to avoid getting bogged down with "there are other ways to standardize coefficients" responses. I'm more interested in this general problem with formulae than the specific application.
The solution above falls apart when a function is applied to any of the variables. For example,
mod <- lm(mpg ~ log(wt), data = mtcars)
new_data <- model.frame(mod)
new_data <- data.frame(lapply(new_data, FUN = scale), check.names = FALSE)
standardized_mod <- update(mod, data = new_data)
This will break on update(mod, data = new_data), because lm is going to look for a column called wt to apply log to in new_data, which only has columns called mpg and log(wt).
What I would like to do is manipulate the model formula in such a way that it goes from mpg ~ log(wt) to mpg ~ `log(wt)`. Of course, if it was just log I was worried about, I might be able to get something really hacky going to address it. But I'd like to be able to do the same regardless of the function in the formula, like if it's poly or some such.
Here are some solutions I've considered:
Instead of update, re-fit the model with lm directly and use the . for the RHS of the formula. This would work for some cases, but has big drawbacks, too. This will ignore any interaction terms in the original formula or other arithmetic uses of the formula from the original model. It also won't fix the problem if the function was applied to the LHS of the formula in the original model.
Use some kind of convoluted regex matching to isolate terms that appear to be functions on the basis of being right before (, but as a general rule I'm fearful of using string manipulation since it may fail in confusing ways. I'm not completely ruling this route out, but I haven't wrapped my head around how to do it safely and am not sure how to match terms with functions without accidentally capturing other parts of the formula.
I've tried messing around with the terms object and trying to use that as a way to use update on the formula itself, but haven't had much luck figuring out how to edit the terms object in the right ways.

We can avoid having to re-create the formula like this. mm0 is the model matrix without the intercept column. Scale that, giving mm0_std. Now compute the new standardized lm:
mod <- lm(mpg ~ log(wt) * qsec, data = mtcars)
response <- mod$model[1]
mm0 <- model.matrix(mod)[, -1]
mm0_std <- scale(mm0)
mod_std <- lm(cbind(response, mm0_std))
If you do want the formula this will give it:
formula(mod_std)
## mpg ~ `log(wt)` + qsec + `log(wt):qsec`
## <environment: 0x000000000b1988c8>
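A quick sanity check on this approach, as a sketch: because every model-matrix column is only shifted and rescaled and the intercept stays in the model, the refit should be a reparametrization of the original model with the same fitted values. Note that the interaction column is scaled as a column in its own right, which is not the same thing as the product of the standardized variables. The name std_data is hypothetical, and the formula is built explicitly from the data frame here rather than by passing the data frame straight to lm().

```r
mod <- lm(mpg ~ log(wt) * qsec, data = mtcars)

# Scale every model-matrix column except the intercept
mm0_std <- scale(model.matrix(mod)[, -1])

# Response (unscaled) plus scaled predictors in one data frame
std_data <- cbind(mod$model[1], mm0_std)

# formula() on a data frame uses the first column as the response
mod_std <- lm(formula(std_data), data = std_data)

# Same fit, different coefficient scale
all.equal(unname(fitted(mod)), unname(fitted(mod_std)))
```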

I've thought of another potential solution as well, but I've not extensively tested it and it uses regex, which is in my understanding not the most R way of doing things.
mod <- lm(mpg ~ log(wt) * qsec, data = mtcars)
new_data <- model.frame(mod)
new_data <- data.frame(lapply(new_data, FUN = scale), check.names = FALSE)
We have the usual start, above.
Now I pull the variable names from the terms object.
vars <- as.character(attributes(terms(mod))$variables)
vars <- vars[-1] # gets rid of "list"
And save the full formula as a string.
char_form <- as.character(deparse(formula(mod)))
Now I iterate through the variables and use fixed-string substitution to surround each one in backticks. This gets around the trickier regex I was worried about for detecting which variables had functions applied.
for (var in vars) {
backtick_name <- paste("`", var, "`", sep = "")
char_form <- gsub(var, backtick_name, char_form, fixed = TRUE)
}
If I want to specify a variable not to standardize, like the outcome variable, I can exclude it from the vars vector programmatically. For instance, I can do this:
response <- as.character(formula(mod))[2]
vars <- vars[vars != response]
Of course, we can remove the response by dropping the first item in the list, but the above is for demonstrative purposes.
Now I can refit the model with the new data and new formula.
new_model <- update(mod, formula = as.formula(char_form), data = new_data)
In this narrow case, I don't really need to use update since I have all I need for lm. But if I was starting with a glm object or some other model, other user-supplied arguments like family are preserved.
Note: Weights and offsets can be problematic here, but it's not an intractable problem. I think the most straightforward thing to do is explicitly exclude columns named "(weights)" and "(offset)" from the model frame before scaling, then cbinding it back together afterwards. Then the user can use conditionals or some such to decide when to supply weights = `(weights)` and offset = `(offset)` arguments to update.
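The string substitution can run into trouble when one variable name is a substring of another (say wt and log(wt) in the same formula). A possible alternative, sketched here and not extensively tested, is to do the replacement on the parsed formula instead of on its text: walk the expression and swap each sub-expression that matches a model-frame variable for a single symbol of the same name. The helper backtick_vars is a hypothetical name, and the final model is fitted with lm() directly rather than update() to keep the example short.

```r
# Hypothetical helper: replace every sub-expression whose deparsed text is in
# `vars` by a single symbol with that name (R prints such symbols in backticks)
backtick_vars <- function(expr, vars) {
  if (is.call(expr)) {
    txt <- paste(deparse(expr), collapse = "")
    if (txt %in% vars) return(as.name(txt))
    # Otherwise recurse into the arguments of the call
    for (i in seq_along(expr)[-1]) {
      expr[[i]] <- backtick_vars(expr[[i]], vars)
    }
  }
  expr
}

mod <- lm(mpg ~ log(wt) * qsec, data = mtcars)
new_data <- data.frame(
  lapply(model.frame(mod), function(v) as.numeric(scale(v))),
  check.names = FALSE
)

# Variable expressions as strings: "mpg", "log(wt)", "qsec"
vars <- vapply(
  as.list(attr(terms(mod), "variables"))[-1],
  function(e) paste(deparse(e), collapse = ""),
  character(1)
)

new_form <- backtick_vars(formula(mod), vars)
new_mod <- lm(new_form, data = new_data)
all.vars(new_form)
```

To leave the response unstandardized, it can be dropped from vars and from the scaling step in the same way as described above.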

Pass df column names to nested equation in Graph Printing Function

I need some clarification on the primary post on Passing a data.frame column name to a function
I need to create a function that will take a testSet, trainSet, and colName(aka predictor) as inputs to a function that prints a plot of the dataset with a GAM model trend line.
The issue I run into is:
plot.model = function(predictor, train, test) {
mod = gam(Response ~ s(train[[predictor]], spar = 1), data = train)
...
}
#Function Call
plot.model("Predictor1", 1.0, crime.train, crime.test)
I can't simply pass the predictor as a string into the gam function, but I also can't use a string to index the data frame values as shown in the link above. Somehow, I need to pass the colName key to the gam function. This issue occurs in other similar scenarios regarding plotting.
plot <- ggplot(data = test, mapping = aes(x=predictor, y=ViolentCrimesPerPop))
Again, I can't pass a string value for the column name and I can't pass the column values either.
Does anyone have a generic solution for these situations? I apologize if the answer is buried in the above link, but it's not clear to me if it is.
Note: A working gam function call looks like this:
mod = gam(Response ~ s(Predictor1, spar = 1.0), data = train)
Where the train set is a data frame with column names "Response" & "Predictor1".

Use aes_string instead of aes when you pass a column name as a string. Every aesthetic then has to be given as a string, including the fixed y column:
plot <- ggplot(data = test, mapping = aes_string(x=predictor, y="ViolentCrimesPerPop"))
For the gam function, here is an example adapted from the gam function's documentation. I have used a vector of column names; a single name is even easier. It just uses paste with a collapse argument.
library(mgcv)
set.seed(2) ## simulate some data...
dat <- gamSim(1,n=400,dist="normal",scale=2)
# String manipulate for formula
formula <- as.formula(paste("y~s(", paste(colnames(dat)[2:5], collapse = ")+s("), ")", sep =""))
b <- gam(formula, data=dat)
is the same as
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
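For the single-predictor case in the question, the same idea can be sketched as a small helper that builds Response ~ s(Predictor1) from two strings. The name smooth_formula is hypothetical, the spar argument from the question is left out, and column names that are not syntactically valid R names would need backticks added.

```r
# Hypothetical helper: build `response ~ s(predictor)` from two strings
smooth_formula <- function(response, predictor) {
  as.formula(paste0(response, " ~ s(", predictor, ")"))
}

form <- smooth_formula("Response", "Predictor1")
form

# Inside plot.model() this could then be used along the lines of:
#   mod <- gam(smooth_formula("Response", predictor), data = train)
```

On the plotting side, aes_string() is deprecated in current ggplot2; the documented replacement is to index the .data pronoun inside aes(), e.g. aes(x = .data[[predictor]], y = ViolentCrimesPerPop).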