Need some help writing a function - r

I'm trying to write a function that wraps a few lines of code and lets me supply a single variable. The code below creates an object using the Surv function (survival package). The second line takes the variable in question, in this case a column named Variable_X, and produces a fit that can then be visualized using ggsurvplot. The output is a Kaplan-Meier survival curve. What I'd like is a function such that I can type f(Variable_X) and get the KM curve for whichever column I choose from the data. I want f(y) to output the KM curve as if I had put y where ~Variable_X currently is. I'm new to R and very new to how functions work. I've tried the code below, but it obviously doesn't work. I'm working through DataCamp and reading posts, but I'm having a hard time with it and would appreciate any help.
surv_object <- Surv(time = KMeier_DF$Followup_Duration, event = KMeier_DF$Death_Indicator)
fitX <- survfit(surv_object ~ Variable_X, data = KMeier_DF)
ggsurvplot(fitX, data = KMeier_DF, pval = TRUE)
f<- function(x) {
dat<-read.csv("T:/datafile.csv")
KMeier_DF < - dat
surv_object <- Surv(time = KMeier_DF$Followup_Duration, event =
KMeier_DF$Death_Indicator)
fitX<-survfit(surv_object ~ x, data = KMeier_DF)
PlotX<- ggsurvplot(fitX, data = KMeier_DF, pval = TRUE)
return(PlotX)
}

The crux of your problem is a common early stumbling block: how to pass variable or data frame column names into a function. I created some example data. In the example below I pass the function four arguments, one of which is your data. You can see two ways I refer to the columns, [[ ]] and [ , ], which you can think of as equivalents of $. With a literal column name they are, but only [[ ]] and [ , ] accept a name stored in a variable, which is what you need inside a function. The print calls are there just to show you the data along the way. If those objects already exist in your global environment, remove them one by one, e.g. rm(surv_object), or clear them all with rm(list = ls()).
duration <- c(1, 3, 4, 3, 3, 4, 2)
di <- c(1, 1, 0, 0, 0, 0, 1)
color <- c(1, 1, 2, 2, 3, 3, 4)
KMdf <- data.frame(duration, di, color)
testfun <- function(df, varb1, varb2, varb3) {
surv_object <- Surv(time = df[[varb1]], event = df[ , varb2])
print(surv_object)
fitX <- survfit(surv_object ~ df[[varb3]], data = df)
print(fitX)
# plotx <- ggsurvplot(fitX, data = df, pval = TRUE) # this gives an error that surv_object is not found
# return(plotx)
}
testfun(KMdf, "duration", "di", "color") # notice the use of quotes here, if not you'll get an error about object not found.
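The difference between $ and [[ ]] inside a function can be seen in a minimal base-R sketch. The helper names by_dollar and by_bracket and the small data frame are made up for illustration only:

```r
# Hypothetical toy data; only the column names matter here
toy <- data.frame(duration = c(1, 3, 4), di = c(1, 1, 0))

# `$` treats what follows it as a literal column name, so this looks for
# a column called "name" and ignores the value stored in the argument
by_dollar <- function(d, name) d$name

# `[[ ]]` evaluates its argument, so the string stored in `name` is used
by_bracket <- function(d, name) d[[name]]

r1 <- by_dollar(toy, "duration")   # no column is literally called "name"
r2 <- by_bracket(toy, "duration")  # the duration column

print(is.null(r1))
print(r2)
```

This is why the column names are passed as quoted strings and looked up with [[ ]] or [ , ].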
And even better, you have an even tougher stumbling block: how R handles variables and where it looks for them. From what I can tell, you're running into that because of a possible bug in ggsurvplot: it seems to look for variables in the global environment rather than inside the function. They closed the issue, but as far as I can tell, it's still there. When you run the ggsurvplot line, you get the same error you would get if the variable did not exist:
Error in eval(inp, data, env) : object 'surv_object' not found.
Hopefully that helps. I'd submit a bug report if I were you.
edit
I was hoping this solution would help, but it doesn't.
testfun <- function(df, varb1, varb2, varb3) {
surv_object <- Surv(time = df[[varb1]], event = df[,varb2])
print(surv_object)
fitX <- survfit(surv_object ~ df[[varb3]], data = df)
print(fitX)
attr(fitX[['strata']], "names") <- c("color = 1", "color = 2", "color = 3", "color = 4")
plotx <- ggsurvplot(fitX, data = df, pval = TRUE) # this gives an error that surv_object is not found
return(plotx)
}
Error in eval(inp, data, env) : object 'surv_object' not found
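One possible workaround, offered only as a sketch and not tested against ggsurvplot here, is to avoid the intermediate surv_object altogether: build the whole formula from the column names and splice it into the survfit call, so the stored call contains the formula itself. The function name km_fit is hypothetical, and the example needs only the survival package:

```r
library(survival)

# Example data from the answer above
KMdf <- data.frame(duration = c(1, 3, 4, 3, 3, 4, 2),
                   di       = c(1, 1, 0, 0, 0, 0, 1),
                   color    = c(1, 1, 2, 2, 3, 3, 4))

# km_fit is a hypothetical helper name
km_fit <- function(df, time_col, event_col, group_col) {
  # Build e.g. Surv(duration, di) ~ color from the column-name strings
  fml <- as.formula(sprintf("Surv(%s, %s) ~ %s", time_col, event_col, group_col))
  # Splice the formula object into the call before evaluating it, so that
  # fit$call holds the formula rather than a local variable name
  eval(bquote(survfit(.(fml), data = df)))
}

fit <- km_fit(KMdf, "duration", "di", "color")
print(fit)
```

Because the formula refers only to columns of the data, code that later re-evaluates it against the data frame no longer needs to find surv_object. Whether that is enough for ggsurvplot(fit, data = KMdf, pval = TRUE) is an assumption you would need to check with your installed survminer version.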

This is homework, right?
First, you need to try to run the code before you provide it as an example. Your example has several fatal errors. ggsurvplot() needs either a library call to survminer or to be summoned as follows: survminer::ggsurvplot().
You have defined a function f, but you never used it. In the function definition, you have a wayward space < -. It never would have worked.
I suggest you start by defining a function that calculates the sum of two numbers, or concatenates two strings. Start here or here. Then, you can return to the Kaplan-Meier stuff.
Second, in another class or two, you will need to know the three parts of a function. You will need to understand the scope of a function. You might as well dig into the basics before you start copy-and-pasting.
Third, before you post another question, please read How to make a great R reproducible example?.
Best of luck.

Related

R - Defining a function which recognises arguments not as objects, but as being part of the call

I'm trying to define a function which returns a graphical object in R. The idea is that I can then call this function with different arguments multiple times using a for loop or lapply, then plot the list of grobs with gridExtra::grid.arrange. However, I have not got that far yet. I'm having trouble getting R to recognise the arguments as being part of the call. I've made some code to show you my problem. I have tried quoting and unquoting the arguments, using unquote() in the function ("Object not found" error within a user defined function, eval() function?), using eval(parse()) (R - how to filter data with a list of arguments to produce multiple data frames and graphs), using !!, etc. However, I can't seem to get it to work. Does anyone know how I should handle this?
library(survminer)
library(survival)
data_km <- data.frame(Duration1 = c(1,2,3,4,5,6,7,8,9,10),
Event1 = c(1,1,0,1,1,0,1,1,1,1),
Duration2 = c(1,1,2,2,3,3,4,4,5,5),
Event2 = c(1,0,1,0,1,1,1,0,1,1),
Duration3 = c(11,12,13,14,15,16,17,18,19,20),
Event3 = c(1,1,0,1,1,0,1,1,0,1),
Area = c(1,1,1,1,1,2,2,2,2,2))
# this is working perfectly
ggsurvplot(survfit(Surv(Duration1, Event1) ~ Area, data = data_km))
ggsurvplot(survfit(Surv(Duration2, Event2) ~ Area, data = data_km))
ggsurvplot(survfit(Surv(Duration3, Event3) ~ Area, data = data_km))
myfun <- function(TimeVar, EventVar){
ggsurvplot(survfit(Surv(eval(parse(text = TimeVar), eval(parse(text = EventVar)) ~ Area, data = data_km))
}
x <- myfun("Duration1", "Event1")
plot(x)
You need to study some tutorials about computing on the language. I like doing it with base R, e.g., using bquote.
myfun <- function(TimeVar, EventVar){
TimeVar <- as.name(TimeVar)
EventVar <- as.name(EventVar)
fit <- eval(bquote(survfit(Surv(.(TimeVar), .(EventVar)) ~ Area, data = data_km)))
ggsurvplot(fit)
}
x <- myfun("Duration1", "Event1")
print(x)
#works
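To see what bquote does on its own, here is a small base-R sketch using the built-in mtcars data. The variable names col, cl and res are arbitrary:

```r
# Turn a string into a symbol, as in the answer above
col <- as.name("mpg")

# bquote() quotes the expression but substitutes anything wrapped in .()
cl <- bquote(mean(.(col)))
print(cl)   # the unevaluated call, with the symbol mpg spliced in

# Evaluate the call, looking up mpg among the columns of mtcars
res <- eval(cl, mtcars)
print(res)
```

The call is assembled first and evaluated afterwards, so the function that finally runs sees a real column name rather than the name of a function argument.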

Write function to plot data, requires passing data.frame column names

I would like to write a function to create plots (in order to create multiple plots without listing the design settings every time). The pirateplot function that I use requires column names and a data frame as input, which causes problems.
My not-working code is:
pirateplot_default <- function(DV,IV,Dataset) {
plot <- pirateplot(formula = DV ~ IV,
data = Dataset,
xlab = "Solution")
return(plot)
}
I have tried "as.name" (saw that here) but it did not work.
Using data[DV] is not an option because the pirateplot function requires a different notation.
I know that there are similar questions here, here, and here, and this probably qualifies as a duplicate for more skilled programmers, but I did not manage to apply the solutions from those questions to my problem, so I'm hoping for help.
Here is an example
pirateplot_default <- function(DV,IV,Dataset) {
tmp=as.formula(paste0(DV,"~",paste0(IV,collapse="+")))
plot <- pirateplot(formula = tmp,
data = Dataset,
xlab = "Solution")
return(plot)
}
pirateplot_default("mpg",c("disp","cyl","hp"),mtcars)
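As an alternative to pasting the formula text together, base R's reformulate() builds the same kind of formula from character vectors. The sketch below uses lm only as a stand-in for any function that takes a formula and a data argument; it is not the pirateplot call itself:

```r
# Build mpg ~ disp + cyl + hp from strings
fml <- reformulate(c("disp", "cyl", "hp"), response = "mpg")
print(fml)

# Any formula interface can then be called with it; lm is just a stand-in
fit <- lm(fml, data = mtcars)
print(names(coef(fit)))
```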

R - apply survfit to a list and plot with corresponding names

I have a list of named dataframes:
library(survival)
library(survminer)
surv.days<- runif(n = 50, min = 0, max = 500)
censor<- sample(c(0,1), 50, replace=TRUE)
survdata<- data.frame(surv.days, censor)
survlist<- list(survdata, survdata)
names(survlist)<- c("name1", "name2")
rm(survdata, censor, surv.days)
I want to run a survfit on each dataframe and then generate several plots (I put only one here for the sake of simplicity), each plot with the corresponding title. I think Map is the way to do it, so:
titles<- names(survlist)
Then I define the function that I want to use to run the survival analysis and plots:
survival.function<- function(survivaldata, datanames){
sfit<- survfit(Surv(surv.days, censor)~1, data=survivaldata)
ggsurvplot(sfit, conf.int=TRUE, risk.table=TRUE,
surv.median.line = "v",
title=datanames,
risk.table.height=.25)
}
And try to apply it:
Map(survival.function, survlist, titles)
But the idea didn't work:
"Error in eval(fit$call$data) : object 'survivaldata' not found "
Is there a way to properly assign the objects to the survival functions?
Thank you!.
In this case the error message seems to be misleading, at least to my reading. It appears to point to an error in the call to survfit, when in fact the error occurs within ggsurvplot, as can be seen from the output of traceback(). I first tried changing the name of the object passed to survival.function. Then there was no error, but also no plot. So I also added a print call inside survival.function. I'll attach the first of the two plots that result.
survival.function<- function(data, datanames){
sfit<- survfit(Surv(surv.days, censor)~1, data=data)
print( ggsurvplot(sfit, conf.int=TRUE, risk.table=TRUE,
surv.median.line = "v",
title=datanames,
risk.table.height=.25) )
}
I wish I could better explain why this hack works. (I'm only guessing at the cause on the basis of the error message and the traceback() results.) I'm using the fact that the name of the "missing" object was "data". It's possible this is a semantic bug in ggsurvplot, and you would be doing the maintainer of the package a favor by sending him a link to this fully documented example. I wonder whether it would be an improvement if the maintainer changed the code so that the list member name were not accessed from the environment with $data but rather with [[data]].

Error plotting Kohonen maps in R?

I was reading through this blog post on R-bloggers and I'm confused by the last section of the code and can't figure it out.
http://www.r-bloggers.com/self-organising-maps-for-customer-segmentation-using-r/
I've attempted to recreate this with my own data. I have 5 variables that follow an exponential distribution with 2755 points.
I am fine with and can plot the map that it generates:
plot(som_model, type="codes")
The section of the code I don't understand is the:
var <- 1
var_unscaled <- aggregate(as.numeric(training[,var]),by=list(som_model$unit.classif),FUN = mean, simplify=TRUE)[,2]
plot(som_model, type = "property", property=var_unscaled, main = names(training)[var], palette.name=coolBlueHotRed)
As I understand it, this section of the code is supposed to plot one of the variables over the map to see what it looks like, but this is where I run into problems. When I run this section of the code I get the warning:
Warning message:
In bgcolors[!is.na(showcolors)] <- bgcol[showcolors[!is.na(showcolors)]] :
number of items to replace is not a multiple of replacement length
and it produces the plot:
Which somehow just doesn't look right...
Now what I think it has come down to is the way the aggregate function has re-ordered the data. The length of var_unscaled is 789 and the length of som_model$data, training[,var] and unit.classif are all of length 2755. I tried plotting the aggregated data, the result was no warning but an unintelligible graph (as expected).
Now I think it has done this because unit.classif has a lot of repeated numbers inside it and that's why it has reduced in size.
The question is, do I worry about the warning? Is it producing an accurate graph? What exactly is the "Property"'s section looking for in the plot command? Is there a different way I could "Aggregate" the data?
I think that you have to create the palette color. If you put the argument
coolBlueHotRed <- function(n, alpha = 1) {rainbow(n, end=4/6, alpha=alpha)[n:1]}
and then try to get a plot, for example
plot(som_model, type = "count", palette.name = coolBlueHotRed)
the result is successful.
This link can help you: http://rgm3.lab.nig.ac.jp/RGM/R_rdfile?f=kohonen/man/plot.kohonen.Rd&d=R_CC
I think that not all of the cells on your map have points inside.
You have a 30 by 30 map and about 2700 points. On average that's about 3 points per cell. With high probability some cells have more than 3 points and some cells are empty.
The code in the post on R-bloggers works well when all of the cells have points inside.
To make it work on your data, try changing this part:
var <- 1
var_unscaled <- aggregate(as.numeric(training[, var]), by = list(som_model$unit.classif), FUN = mean, simplify = TRUE)[, 2]
plot(som_model, type = "property", property = var_unscaled, main = names(training)[var], palette.name = coolBlueHotRed)
with this one:
var <- 1
var_unscaled <- aggregate(as.numeric(data.temp[, data.classes][, var]),
by = list(som_model$unit.classif),
FUN = mean,
simplify = T)
v_u <- rep(0, max(var_unscaled$Group.1))
v_u[var_unscaled$Group.1] <- var_unscaled$x
plot(som_model,
type = "property",
property = v_u,
main = colnames(data.temp[, data.classes])[var],
palette.name = coolBlueHotRed)
Hope it helps.
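The idea behind that change can be shown without kohonen at all. In this sketch the six-unit "map", the assignments and the values are all made up; n_units stands for the total number of map units, which for the real model you would take from the grid size:

```r
n_units      <- 6                      # hypothetical map with 6 units
unit_classif <- c(1, 1, 3, 3, 3, 6)    # units 2, 4 and 5 receive no points
values       <- c(10, 20, 1, 2, 3, 7)  # made-up variable to average per unit

# aggregate() returns one row per non-empty unit only
agg <- aggregate(values, by = list(unit_classif), FUN = mean)
print(agg)

# Spread the means back over all units, leaving empty units as NA
per_unit <- rep(NA_real_, n_units)
per_unit[agg$Group.1] <- agg$x
print(per_unit)
```

The vector passed as property then has one entry per map unit, which is what the warning about replacement length was complaining about. Using NA instead of 0 for empty units is a design choice in this sketch; the answer above fills them with 0.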
Just add these functions to your script:
coolBlueHotRed <- function(n, alpha = 1) {rainbow(n, end=4/6, alpha=alpha)[n:1]}
pretty_palette <- c("#1f77b4","#ff7f0e","#2ca02c", "#d62728","#9467bd","#8c564b","#e377c2")

tryCatch() apparently ignoring a warning

I'm writing a function that uses kmeans to determine bin widths to convert a continuous measurement (a predicted probability) into an integer (one of 3 bins). I've stumbled upon an edge case in which it's possible for my algorithm to (correctly) predict the same probability for a whole set, and I want to handle that situation. I'm using the rattle package's binning() function in the following way:
btsKmeansBin <- function(x, k = 3, default = c(0, 0.3, 0.5, 1)) {
result <- binning(x, bins = k, method = "kmeans", ordered = T)
bins <- attr(result, "breaks")
attr(bins, "names") <- NULL
bins <- bins[order(bins)]
bins[1] <- 0
bins[length(bins)] <- 1
return(bins)
}
Run this function on x <- c(.5,.5,.5,.5,.5,.5), and you'll get an error at the order(bins) step, because bins will be NULL and therefore not a vector.
Obviously, if x has only one distinct value, kmeans shouldn't work. In this case, I'd like to return the default bin divisions. When this happens, binning issues "Warning: the variable is not considered." So I'd like to use tryCatch to handle this warning, but surrounding the result <- ... line with the following code doesn't work the way I expect:
...
tryCatch({
result <- binning(x, bins = k, method = "kmeans", ordered = T)
}, warning = function(w) {
warn(sprintf("%s. Using default values", w))
return(default)
}, error = function(e) {
stop(e)
})
...
The warning gets printed as though I hadn't used tryCatch, and the code progresses past the return statement and throws the error from order again. I have tried a bunch of variations to no avail. What am I missing, here??
If you look in binning I think you'll find that the "warning" you see is not generated via warning() but with cat(), which is why tryCatch isn't picking it up. The author of binning probably deserves a few lashings with a wet noodle for this oversight. ;) (Or it could be on purpose due to the particular way that rattle works, I'm not sure.)
It appears to return NULL when this happens, so you could simply handle it manually. Not ideal, but possibly the only way to go.
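A small base-R sketch of both points, using two stand-in functions rather than binning itself (noisy_cat, noisy_warning and bins_or_default are hypothetical names):

```r
# Stand-in for a function that prints a "warning" with cat() and returns NULL
noisy_cat <- function() {
  cat("Warning: the variable is not considered.\n")
  NULL
}

# Stand-in for a function that signals a real warning condition
noisy_warning <- function() {
  warning("the variable is not considered.")
  NULL
}

# tryCatch only sees conditions, so the cat() message passes straight through
caught_cat  <- tryCatch({ noisy_cat(); "not caught" },
                        warning = function(w) "caught")
caught_warn <- tryCatch({ noisy_warning(); "not caught" },
                        warning = function(w) "caught")

# Manual handling: fall back to the defaults when the result is NULL
bins_or_default <- function(result, default = c(0, 0.3, 0.5, 1)) {
  if (is.null(result)) default else result
}
fallback <- bins_or_default(noisy_cat())
```

In btsKmeansBin the same is.null check could go right after the binning call, returning default before order(bins) is reached.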

Resources