Writing a function in R to plot a ROC curve using pROC

I'm trying to write a function to plot ROC curves based on different scoring systems I have to predict an outcome.
I have a dataframe data_all, with columns "score_1" and "Threshold.2000". I generate a ROC curve as desired with the following:
plot.roc(data_all$Threshold.2000, data_all$score_1)
My goal is to generate a ROC curve for a number of different outcomes (e.g. Threshold.1000) and scores (score_1, score_2 etc.), but I am initially trying to set it up just for different scores. My function is as follows:
roc_plot <- function(dataframe_of_interest, score_of_interest) {
plot.roc(dataframe_of_interest$Threshold.2000, dataframe_of_interest$score_of_interest)}
I get the following error:
Error in roc.default(x, predictor, plot = TRUE, ...) : No valid data provided.
I'd be very grateful if someone can spot why my function doesn't work! I'm a python coder and new-ish to R, and haven't had much luck trying a number of different things. Thanks very much.
EDIT:
Here is the same example with mtcars so it's reproducible:
data(mtcars)
plot.roc(mtcars$vs, mtcars$mpg) # --> makes correct graph
roc_plot <- function(dataframe_of_interest, score_of_interest) {
plot.roc(dataframe_of_interest$mpg, dataframe_of_interest$score_of_interest)}
roc_plot(mtcars, vs)
Outcome:
Error in roc.default(x, predictor, plot = TRUE, ...) : No valid data provided.

Here's one solution that works as desired (i.e. lets the user specify different values for score_of_interest):
library(pROC)
data(mtcars)
plot.roc(mtcars$vs, mtcars$mpg) # --> makes correct graph
# expects `score_of_interest` to be a string!!!
roc_plot <- function(dataframe_of_interest, score_of_interest) {
plot.roc(dataframe_of_interest$vs, dataframe_of_interest[, score_of_interest])
}
roc_plot(mtcars, 'mpg')
roc_plot(mtcars, 'cyl')
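Note that your error did not come from an incorrect column name; it came from how the $ operator works on a data.frame. $ does not evaluate the name that follows it, so x$col_name looks for a column literally called "col_name" rather than the column whose name is stored in col_name. Notice what happens with a simpler function: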
foo <- function(x, col_name) {
head(x$col_name)
}
foo(mtcars, mpg)
## NULL
This returns NULL. So in your original function when you tried to supply plot.roc with dataframe_of_interest$score_of_interest you were actually feeding plot.roc a NULL.
There are several ways to extract a column from a data.frame by the column name when that name is stored in an object (which is what you're doing when you pass it as an argument in a function). Perhaps the easiest way is to remember that a data.frame is like a 2D array-type object and so we can use familiar object[i, j] syntax, but we ask for all rows and we specify the column by name, e.g., mtcars[, 'mpg']. This still works if we assign the string 'mpg' to an object:
x <- 'mpg'
mtcars[, x]
So that's how I produced my solution. Going a step further, it's not hard to imagine how you would be able to supply both a score_of_interest and a threshold_of_interest:
roc_plot2 <- function(dataframe_of_interest, threshold_of_interest, score_of_interest) {
plot.roc(dataframe_of_interest[, threshold_of_interest],
dataframe_of_interest[, score_of_interest])
}
roc_plot2(mtcars, 'vs', 'mpg')
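The difference between $ and string-based indexing described above can be checked directly. This is a minimal sketch on mtcars; the variable name col_name is only an illustration. It also shows [[ ]], an alternative that the answer above does not use, which always returns a single column as a vector.

```r
data(mtcars)

# Column name stored in a variable, as it would be inside a function
col_name <- "mpg"

# `$` does not evaluate col_name; it looks for a column literally called "col_name"
via_dollar <- mtcars$col_name

# Both of these look the column up by the string stored in col_name
via_bracket <- mtcars[, col_name]
via_double <- mtcars[[col_name]]

is.null(via_dollar)                  # TRUE
identical(via_bracket, mtcars$mpg)   # TRUE
identical(via_double, mtcars$mpg)    # TRUE
```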

Related

Writing a function to produce a Kaplan-Meier curve

I am trying to write a function that spits out a KM survival curve. I am going to use this in a Shiny app, which is why I want to write a function: I can then easily pass in arguments from a dropdown menu (which will be supplied as a string to the strata argument). Here is a simplified version of what I need:
survival_function <- function(data_x, strata_x="1"){
survFormula <- Surv(data_x$time, data_x$status)
my_survfit <- survfit(data=data_x, as.formula(paste("survFormula~", {{strata_x}})))
ggsurvplot(my_survfit, data = data_x, pval=T)
}
survival_function(inputdata, "strata_var")
I get an error:
Error in paste("survFormula1~", { : object 'strata_x' not found
I'm at a loss because
as.formula(paste("~", {{arg}}))
has worked in other functions I've written to produce plots using ggplot to easily change variables to facet by, but this doesn't even seem to recognize strata_x as an argument.
Your function needs a couple of tweaks to get it working with ggsurvplot. It would be best to create the Surv object as a new column in the data frame and use this column in your formula. You also need to make sure you have an actual symbolic formula as the $call$formula member of the survfit object, otherwise ggsurvplot will fail to work due to non-standard evaluation deep within its internals.
library(survival)
library(survminer)
survival_function <- function(data_x, strata_x) {
data_x$s <- Surv(data_x$time, data_x$status)
survFormula <- as.formula(paste("s ~", strata_x))
my_survfit <- survfit(survFormula, data = data_x)
my_survfit$call$formula <- survFormula
ggsurvplot(my_survfit, data = data_x)
}
We can test this on the included lung data set:
survival_function(lung, "sex")
Created on 2022-08-03 by the reprex package (v2.0.1)
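The formula-building step in the answer can be looked at in isolation with base R only. This is a small sketch, not a survival fit: "s" is the column name the answer uses and "sex" stands in for whatever string the dropdown supplies. It also checks that, in plain base R code, {{ }} is just two nested braces and returns its value unchanged, so it presumably adds nothing to paste().

```r
strata_x <- "sex"

# In base R, {{ }} is just two nested braces, so the value comes back unchanged
same_text <- identical(paste("s ~", {{strata_x}}), paste("s ~", strata_x))
same_text                # TRUE

# Turn the text into a real formula object
surv_formula <- as.formula(paste("s ~", strata_x))
class(surv_formula)      # "formula"
all.vars(surv_formula)   # "s" "sex"
```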

What does "invalid type (closure) for variable 'variable1'" mean and how do I fix it?

I am trying to write a function in R, which contains a function from another package. The code works perfectly outside a function.
I am guessing, it might have got to do something with the package I am using (survey).
A self-contained code example:
#activating the packages
library(survey)
library(foreign) # provides read.spss()
#getting the dataset into R
tm <- read.spss("tm.sav", to.data.frame = T, max.value.labels = 5)
# creating svydesign object (it basically contains the weights to adjust the variables (~persgew: also a column variable contained in the tm-dataset))
tm_w <- svydesign(ids=~0, weights = ~persgew, data = tm)
#getting overview of the welle-variable
#this variable is part of the tm-dataset. it is needed to execute the following steps
table(tm$welle)
# data manipulation as in: taking the v12d_gr-variable as well as the welle-variable and the svydesign-object to create a longitudinal variable which is transformed into a data frame that can be passed to ggplot
t <- svytable(~v12d_gr+welle, tm_w)
tt <- round(prop.table(t,2)*100, digits=0)
v12d <- tt[2,]
v12d <- as.data.frame(v12d)
This is the code outside the function, and it works perfectly. Since I have to transform quite a few variables in exactly the same way, I want to create a function to save some time.
The following function is supposed to take a variable that will be transformed as an argument (v12sd2_gr).
#making sure the survey-object is loaded
tm_w <- svydesign(ids=~0, weights = ~persgew, data = data)
#trying to write a function containing the code from above
ltd_zsw <- function(variable1){
t <- svytable(~variable1+welle, tm_w)
tt <- round(prop.table(t,2)*100, digits=0)
var_ltd_zsw <- tt[2,]
var_ltd_zsw <- as.data.frame(var_ltd_zsw)
return(var_ltd_zsw)
}
Calling the function:
#as v12d has been altered already, I am trying to transform another variable v12sd2_gr
v12sd2 <- ltd_zsw(v12sd2_gr)
Console output:
Error in model.frame.default(formula = weights ~ variable1 + welle, data = model.frame(design)) :
invalid type (closure) for variable 'variable1'
Called from: model.frame.default(formula = weights ~ variable1 + welle, data = model.frame(design))
How do I fix it? And what does it mean to dynamically build a formula and reformulating?
PS: I hope it is the appropriate way to answer to the feedback in the comments.
Update: I think I was able to trace the problem back to the argument I am passing (variable1), and I am guessing it has something to do with the fact that I try to use a formula within the function. But when I try to call svytable with as.formula(svytable(~variable1+welle, tm_w)) it still doesn't work.
What to do?
I have found a solution to the problem.
Here is the tested and working function:
ltd_test <- function (var, x, string1="con", string2="pro") {
print (table (var))
x$w12d_gr <- ifelse(as.numeric(var)>2,1,0)
x$w12d_gr <- factor(x$w12d_gr, levels = c(0,1), labels = c(string1,string2))
print (table (x$w12d_gr))
x_w <- svydesign(ids=~0, weights = ~persgew, data = x)
t <- svytable(~w12d_gr+welle, x_w)
tt <- round(prop.table(t,2)*100, digits=0)
w12d <- tt[2,]
w12d <- as.data.frame(w12d)
}
The problem appeared to be caused by svydesign(). It produces an object which is then used by the formula passed to svytable(). That's why it is imperative to first create the x_w object with svydesign() and then call svytable() to create the t object.
In the code snippet I originally posted in the question, the tm_w object had been created and stored globally.
Thanks for the help to everyone. I hope this is gonna be of use to someone one day!
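On the question of what "dynamically building a formula" means: the column name arrives as a string and the formula is assembled from it at run time, for example with reformulate(). The sketch below uses base R's xtabs() on mtcars instead of svytable() on the survey data, so the function name pct_by and the columns "cyl" and "gear" are stand-ins; a formula built this way could presumably be passed to svytable() in the same manner.

```r
data(mtcars)

# pct_by is a hypothetical helper: cross-tabulate one column (given as a
# string) against a grouping column and return column percentages
pct_by <- function(var_name, data, by = "gear") {
  f <- reformulate(c(var_name, by))   # builds e.g. ~cyl + gear
  tab <- xtabs(f, data = data)
  round(prop.table(tab, 2) * 100, digits = 0)
}

built <- reformulate(c("cyl", "gear"))
all.vars(built)   # "cyl" "gear"

res <- pct_by("cyl", mtcars)
res
```

Because the formula only contains column names, the variables are looked up in the data that is passed in, not in the calling environment.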

R: How to fit gamlss in a for loop with a variable (character)

I have a tricky problem. I have a dataframe with more than 1000 variables and want to fit each variable to age using fp smoothing function. I know how to use gamlss() for a specific variable (vari), but that's not practical to repeat this explicitly for more than 1000 times. Moreover, I want to plot the fitting for all 1000 variable in a single figure. What I did is:
variables <- colnames(data)[7:dim(data)[2]]
for(vari in variables) {
print("ROI is:")
print(vari)
model_fem <- gamlss(vari ~ fp(age), family=GG, data=females)
But I got errors:
Error in model.frame.default(formula = vari ~ fp(age), data = females) :
variable lengths differ (found for 'fp(age)')
I think the tricky part comes from fp(). I have tried using as.formula, but it didn't work. Also, females$vari returns NULL, which is why we get this error.
Do you have any solution for this?
Thank you
Character values are very different from formulas. Formulas contain symbols, and you need to properly rebuild them to make them dynamic. There are lots of different ways to do that, but here's one that uses reformulate() to turn characters into formulas and update() to modify a base formula.
library(gamlss)
variables <- colnames(data)[7:dim(data)[2]]
# base formula holding only the right-hand side
form <- ~ fp(age)
for(vari in variables) {
  print("ROI is:")
  # build `<vari> ~ fp(age)` from the column name stored in vari
  form_model <- update(form, reformulate(".", response=vari))
  print(form_model)
  model_fem <- gamlss(form_model, family=GG, data=females)
}
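The same update() plus reformulate() pattern can be tried without gamlss. This is a minimal sketch that swaps in lm() on mtcars purely for illustration; the names base_form, responses and fits are not from the answer.

```r
data(mtcars)

# Right-hand side shared by every model
base_form <- ~ wt

# Response columns given as character strings
responses <- c("mpg", "qsec")

fits <- lapply(responses, function(resp) {
  # reformulate(".", response = resp) gives `<resp> ~ .`;
  # update() then replaces the dot with the right-hand side of base_form
  f <- update(base_form, reformulate(".", response = resp))
  lm(f, data = mtcars)
})
names(fits) <- responses

names(coef(fits[["mpg"]]))   # "(Intercept)" "wt"
```

Keeping the fitted models in a named list also makes it easy to loop over them later, for example to draw all the fits in one figure.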

Pick a function by user input

I've a list of functions and I'd like to pick one of them by user input, use it for a regression and then display the output of the function summary and plot.
re_show<-function(y){
f1<-x+I(x^2)
f2<-I(x^0.5)+I(x^2)
...
f20<-x+I(x^0.5)+I(x^2)
message("Choose the model")
i <- readLines(n = 1)
summary(lm(y~i))
plot(lm(y~i))
}
Have you got any ideas about how to solve this problem?
Thank you.
Well here is the most generalized answer I can provide for your function. I have altered it to accept a dependent variable, an independent variable, and a dataset. If you do not like this option, you can always set the parameters to a default value or modify the function to your particular scenario. This function, as it stands, will allow you to use any dataset you wish (given no provided sample data). I have also added a switch statement to allow the user to choose which model to use from stdin. A simple example is shown with the iris dataset.
re_show<-function(dv, iv, dat){
# Define your variables to evaluate
x <- dat[,iv]
y <- dat[,dv]
# choose the function
message("Choose the model")
fun <- readLines(n = 1)
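# The switch statement: each option is the right-hand side of a model formula,
# kept as text so that `+` adds model terms instead of summing the vectors
rhs <- switch(fun,
f1 = "x + I(x^2)",
f2 = "I(x^0.5) + I(x^2)",
# add your remaining functions
f20 = "x + I(x^0.5) + I(x^2)"
)
# finish your analysis
fit <- lm(as.formula(paste("y ~", rhs)))
print(summary(fit))
plot(fit)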
}
data(iris)
re_show("Sepal.Length", "Sepal.Width", iris)
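For testing, or for use in a script where reading from stdin is awkward, the model choice can also be passed as an argument. The following is a sketch under that assumption: fit_model and its model argument are hypothetical names, and only two of the candidate models are listed. Each option is stored as the text of a formula's right-hand side, because inside a formula `+` adds a term, whereas on numeric vectors it would sum them.

```r
data(iris)

# fit_model is a hypothetical, non-interactive variant of re_show
fit_model <- function(dv, iv, dat, model = "f1") {
  rhs <- switch(model,
    f1 = "x + I(x^2)",
    f2 = "I(x^0.5) + I(x^2)",
    stop("Unknown model: ", model)   # default branch for unmatched names
  )
  # Rename the chosen columns so the formulas can refer to x and y
  model_data <- data.frame(y = dat[[dv]], x = dat[[iv]])
  lm(as.formula(paste("y ~", rhs)), data = model_data)
}

fit <- fit_model("Sepal.Length", "Sepal.Width", iris, model = "f1")
names(coef(fit))   # "(Intercept)" "x" "I(x^2)"
```

summary(fit) and plot(fit) can then be called on the returned object as in the original function.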

Dynamic time-series prediction and rollapply

I am trying to get a rolling prediction of a dynamic timeseries in R (and then work out squared errors of the forecast). I based a lot of this code on this StackOverflow question, but I am very new to R so I am struggling quite a bit. Any help would be much appreciated.
require(zoo)
require(dynlm)
set.seed(12345)
#create variables
x<-rnorm(mean=3,sd=2,100)
y<-rep(NA,100)
y[1]<-x[1]
for(i in 2:100) y[i]=1+x[i-1]+0.5*y[i-1]+rnorm(1,0,0.5)
int<-1:100
dummydata<-data.frame(int=int,x=x,y=y)
zoodata<-as.zoo(dummydata)
prediction<-function(series)
{
mod<-dynlm(formula = y ~ L(y) + L(x), data = series) #get model
nextOb<-nrow(series)+1
#make forecast
predicted<-coef(mod)[1]+coef(mod)[2]*zoodata$y[nextOb-1]+coef(mod)[3]*zoodata$x[nextOb-1]
#strip timeseries information
attributes(predicted)<-NULL
return(predicted)
}
rolling<-rollapply(zoodata,width=40,FUN=prediction,by.column=FALSE)
This returns:
20 21 ..... 80
10.18676 10.18676 10.18676
Which has two problems I was not expecting:
Runs from 20->80, not 40->100 as I would expect (as the width is 40)
The forecasts it gives out are constant: 10.18676
What am I doing wrong? And is there an easier way to do the prediction than to write it all out? Thanks!
The main problem with your function is the data argument to dynlm. If you look in ?dynlm you will see that the data argument must be a data.frame or a zoo object. Unfortunately, as I just learned, rollapply passes each window to your function as a plain array rather than a zoo object. This means that dynlm, after noting that your data argument was not of the right form, searched for x and y in your global environment, where they were defined at the top of your code, so every window fitted the same model. The solution is to convert series into a zoo object. There were a couple of other issues with your code, so I post a corrected version here:
prediction<-function(series) {
mod <- dynlm(formula = y ~ L(y) + L(x), data = as.zoo(series)) # get model
# nextOb <- nrow(series)+1 # This will always be 21. I think you mean:
nextOb <- max(series[,'int'])+1 # To get the first row that follows the window
if (nextOb<=nrow(zoodata)) { # You won't predict the last one
# make forecast
# predicted<-coef(mod)[1]+coef(mod)[2]*zoodata$y[nextOb-1]+coef(mod)[3]*zoodata$x[nextOb-1]
# That would work, but there is a very nice function called predict
predicted=predict(mod,newdata=data.frame(x=zoodata[nextOb,'x'],y=zoodata[nextOb,'y']))
# I'm not sure why you used nextOb-1
attributes(predicted)<-NULL
# I added the square error as well as the prediction.
c(predicted=predicted,square.res=(predicted-zoodata[nextOb,'y'])^2)
}
}
rollapply(zoodata,width=20,FUN=prediction,by.column=F,align='right')
Your second question concerns the numbering of your results, which is controlled by the align argument of rollapply. With a width of 40 on 100 observations, 'left' would give you 1..61, 'center' (the default) would give you 20..80, and 'right' gets you 40..100.
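The index ranges for each alignment follow from simple arithmetic, which can be worked through in base R. This is a sketch for the question's setup (100 observations, width 40); the rule used for the centre label is an assumption that happens to match the 20..80 range reported in the question.

```r
n <- 100      # number of observations in zoodata
width <- 40   # window width used in the question

n_windows <- n - width + 1     # 61 windows in total
start <- seq_len(n_windows)    # first row of each window

left_idx <- start              # label = first row of the window
right_idx <- start + width - 1 # label = last row of the window
# Assumption: the centre label is start + floor((width - 1) / 2)
center_idx <- start + floor((width - 1) / 2)

range(left_idx)     # 1 61
range(center_idx)   # 20 80
range(right_idx)    # 40 100
```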
