R - Defining a function which recognises arguments not as objects, but as being part of the call

I'm trying to define a function which returns a graphical object in R. The idea is that I can then call this function with different arguments multiple times using a for loop or lapply, and then plot the list of grobs with gridExtra::grid.arrange. However, I haven't got that far yet: I'm having trouble with R recognising the arguments as being part of the call. I've made some code to show the problem. I have tried quoting and unquoting the arguments, using unquote() in the function ("Object not found" error within a user defined function, eval() function?), using eval(parse()) (R - how to filter data with a list of arguments to produce multiple data frames and graphs), using !!, etc., but I can't get it to work. Does anyone know how I should handle this?
library(survminer)
library(survival)
data_km <- data.frame(Duration1 = c(1,2,3,4,5,6,7,8,9,10),
Event1 = c(1,1,0,1,1,0,1,1,1,1),
Duration2 = c(1,1,2,2,3,3,4,4,5,5),
Event2 = c(1,0,1,0,1,1,1,0,1,1),
Duration3 = c(11,12,13,14,15,16,17,18,19,20),
Event3 = c(1,1,0,1,1,0,1,1,0,1),
Area = c(1,1,1,1,1,2,2,2,2,2))
# this is working perfectly
ggsurvplot(survfit(Surv(Duration1, Event1) ~ Area, data = data_km))
ggsurvplot(survfit(Surv(Duration2, Event2) ~ Area, data = data_km))
ggsurvplot(survfit(Surv(Duration3, Event3) ~ Area, data = data_km))
myfun <- function(TimeVar, EventVar){
ggsurvplot(survfit(Surv(eval(parse(text = TimeVar)), eval(parse(text = EventVar))) ~ Area, data = data_km))
}
x <- myfun("Duration1", "Event1")
plot(x)

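You need to study some tutorials about computing on the language. I like doing it with base R, e.g., using bquote: as.name() turns each string into a symbol, bquote() substitutes those symbols into the call wherever .() appears, and eval() runs the finished call, so survfit sees the real column names.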
myfun <- function(TimeVar, EventVar){
TimeVar <- as.name(TimeVar)
EventVar <- as.name(EventVar)
fit <- eval(bquote(survfit(Surv(.(TimeVar), .(EventVar)) ~ Area, data = data_km)))
ggsurvplot(fit)
}
x <- myfun("Duration1", "Event1")
print(x)
#works
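The same bquote pattern can be tried without the survival packages. The following is only a sketch: lm and the built-in mtcars data stand in for survfit and data_km, and fit_by_name is a hypothetical helper name, not something from the answer above.

```r
# Build a model call from column names given as strings, then evaluate it.
# lm() and mtcars are stand-ins so the sketch runs in base R.
fit_by_name <- function(response, predictor) {
  response  <- as.name(response)    # "mpg" becomes the symbol mpg
  predictor <- as.name(predictor)   # "wt" becomes the symbol wt
  model_call <- bquote(lm(.(response) ~ .(predictor), data = mtcars))
  eval(model_call)
}

fit <- fit_by_name("mpg", "wt")
# The stored call contains the real column names, not TimeVar-style placeholders
fit$call
```

Because the symbols are spliced in before evaluation, anything that later inspects the stored call sees ordinary column names.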

Related

Write function to plot data, requires passing data.frame column names

I would like to write a function to create plots (in order to create multiple plots without listing the design settings every time). The pirateplot function that I use requires column names and a data frame as input, which causes problems.
My not-working code is:
pirateplot_default <- function(DV,IV,Dataset) {
plot <- pirateplot(formula = DV ~ IV,
data = Dataset,
xlab = "Solution")
return(plot)
}
I have tried as.name (suggested in another answer) but it did not work.
Using data[DV] is not an option because the pirateplot function requires a different notation (a formula).
I know that there are similar questions, and this probably qualifies as a duplicate for more skilled programmers, but I did not manage to apply the solutions from the other questions to my problem, so I'm hoping for help.
Here is an example that builds the formula from the column-name strings with as.formula:
pirateplot_default <- function(DV,IV,Dataset) {
tmp=as.formula(paste0(DV,"~",paste0(IV,collapse="+")))
plot <- pirateplot(formula = tmp,
data = Dataset,
xlab = "Solution")
return(plot)
}
pirateplot_default("mpg",c("disp","cyl","hp"),mtcars)
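As a possible alternative to pasting the formula together by hand, base R's reformulate() builds the same kind of formula from character vectors. This is a sketch only: build_formula is a hypothetical helper, and lm is used in place of pirateplot so the example runs without extra packages.

```r
# Build "dv ~ iv1 + iv2 + ..." from strings using stats::reformulate()
build_formula <- function(dv, iv) {
  reformulate(termlabels = iv, response = dv)
}

f <- build_formula("mpg", c("disp", "cyl", "hp"))
f

# Any formula-based function can take the result; lm() is a stand-in here
fit <- lm(f, data = mtcars)
```

The resulting formula object could be passed to the formula argument of pirateplot in the same way tmp is passed above.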

Need some help writing a function

I'm trying to write a function that wraps a few lines of code and lets me input a single variable. The code below creates an object using the Surv function (survival package). The second line takes the variable in question, in this case a column named Variable_X, and outputs data that can then be visualized using ggsurvplot. The output is a Kaplan-Meier survival curve. What I'd like is a function such that I can type f(Variable_X) and have the KM curve visualized for whichever column I choose from the data. I want f(y) to output the KM curve as if I had put y where ~Variable_X currently is. I'm new to R and very new to how functions work. I've tried the code below, but it obviously doesn't work. I'm working through DataCamp and reading posts, but I'm having a hard time with it and would appreciate any help.
surv_object <- Surv(time = KMeier_DF$Followup_Duration, event = KMeier_DF$Death_Indicator)
fitX <- survfit(surv_object ~ Variable_X, data = KMeier_DF)
ggsurvplot(fitX, data = KMeier_DF, pval = TRUE)
f<- function(x) {
dat<-read.csv("T:/datafile.csv")
KMeier_DF < - dat
surv_object <- Surv(time = KMeier_DF$Followup_Duration, event =
KMeier_DF$Death_Indicator)
fitX<-survfit(surv_object ~ x, data = KMeier_DF)
PlotX<- ggsurvplot(fitX, data = KMeier_DF, pval = TRUE)
return(PlotX)
}
The crux of your problem is a tough stumbling block to figure out initially: how to pass variable or data frame column names into a function. I created some example data. In the example below I pass the function four arguments, one of which is your data. You can see two ways I select the columns, [[ ]] and [ , ]. With a literal column name they give the same result as $, but $ does not evaluate its argument, so inside a function df$varb1 looks for a column literally called varb1 rather than the column whose name is stored in varb1. The print calls are there just to show you the data along the way. If those objects exist in your global environment, remove them one by one with rm(surv_object), or clear them all with rm(list = ls()).
duration <- c(1, 3, 4, 3, 3, 4, 2)
di <- c(1, 1, 0, 0, 0, 0, 1)
color <- c(1, 1, 2, 2, 3, 3, 4)
KMdf <- data.frame(duration, di, color)
testfun <- function(df, varb1, varb2, varb3) {
surv_object <- Surv(time = df[[varb1]], event = df[ , varb2])
print(surv_object)
fitX <- survfit(surv_object ~ df[[varb3]], data = df)
print(fitX)
# plotx <- ggsurvplot(fitX, data = df, pval = TRUE) # this gives an error that surv_object is not found
# return(plotx)
}
testfun(KMdf, "duration", "di", "color") # notice the use of quotes here, if not you'll get an error about object not found.
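The difference between $ and [[ ]] inside a function can be seen in isolation. A minimal sketch with made-up data; demo_df and both helper functions are hypothetical names used only for illustration:

```r
demo_df <- data.frame(duration = c(1, 3, 4), di = c(1, 1, 0))

# `$` takes the name literally: this looks for a column called "col"
get_with_dollar <- function(df, col) df$col

# `[[` evaluates its argument: this looks up the column named by `col`
get_with_brackets <- function(df, col) df[[col]]

dollar_result  <- get_with_dollar(demo_df, "duration")    # no such column
bracket_result <- get_with_brackets(demo_df, "duration")  # the duration column
```

Here dollar_result is NULL because demo_df has no column named col, while bracket_result holds the values of duration.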
And even better, you have an even tougher stumbling block: how R handles variables and where it looks for them. From what I can tell, you're running into a possible bug in ggsurvplot, which looks for variables in the global environment rather than inside the function. They closed the issue, but as far as I can tell, it's still there. When you try to run the ggsurvplot line, you get the same error you would get if you hadn't supplied a variable at all:
Error in eval(inp, data, env) : object 'surv_object' not found.
Hopefully that helps. I'd submit a bug report if I were you.
edit
I was hoping this solution would help, but it doesn't.
testfun <- function(df, varb1, varb2, varb3) {
surv_object <- Surv(time = df[[varb1]], event = df[,varb2])
print(surv_object)
fitX <- survfit(surv_object ~ df[[varb3]], data = df)
print(fitX)
attr(fitX[['strata']], "names") <- c("color = 1", "color = 2", "color = 3", "color = 4")
plotx <- ggsurvplot(fitX, data = df, pval = TRUE) # this gives an error that surv_object is not found
return(plotx)
}
Error in eval(inp, data, env) : object 'surv_object' not found
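The kind of lookup failure described above can be reproduced in base R. This sketch assumes the error comes from an expression being evaluated in an environment that does not contain the function's local variables; whether that is exactly what ggsurvplot does internally is not confirmed here. All names below are hypothetical.

```r
# Evaluates an expression in the global environment only
eval_in_global <- function(expr) eval(expr, envir = globalenv())

# Evaluates an expression in whatever environment it is given
eval_in_env <- function(expr, env) eval(expr, envir = env)

wrapper <- function() {
  surv_object_demo <- c(1, 0, 1)   # exists only inside wrapper()
  here <- environment()            # the environment of this function call
  expr <- quote(sum(surv_object_demo))
  list(
    # fails: the global environment has no surv_object_demo
    global = tryCatch(eval_in_global(expr),
                      error = function(e) conditionMessage(e)),
    # works: the local environment does
    local = eval_in_env(expr, here)
  )
}

res <- wrapper()
res$global
res$local
```

The first element holds an "object not found" message of the same form as the error above, and the second holds the sum computed from the local variable.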
This is homework, right?
First, you need to try to run the code before you provide it as an example. Your example has several fatal errors. ggsurvplot() needs either a library call to survminer or to be summoned as follows: survminer::ggsurvplot().
You have defined a function f, but you never used it. In the function definition, you have a wayward space < -. It never would have worked.
I suggest you start by defining a function that calculates the sum of two numbers, or concatenates two strings. Then you can return to the Kaplan-Meier stuff.
Second, in another class or two, you will need to know the three parts of a function. You will need to understand the scope of a function. You might as well dig into the basics before you start copy-and-pasting.
Third, before you post another question, please read How to make a great R reproducible example?.
Best of luck.
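For reference, the starter exercises suggested above and the three parts of a function (its formals, body and environment) look roughly like this. A minimal sketch; add_two and join_two are hypothetical names.

```r
# A function that sums two numbers
add_two <- function(a, b) {
  a + b
}

# A function that concatenates two strings
join_two <- function(a, b) paste0(a, b)

# The three parts of a function
formals(add_two)      # the argument list: a and b
body(add_two)         # the code inside the function
environment(add_two)  # where the function looks up names it does not define

add_two(2, 3)
join_two("Kaplan", "Meier")
```

The environment is the part that matters for scope: names not found among the arguments or local variables are looked up there.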

How to use plot function to plot results of your own function?

I'm writing a short R package which contains a function. The function returns a list of vectors. I would like plot() to work on that result by default: draw a plot from some of those vectors, add lines, and accept a new parameter.
As an example, if I use the survival package I can get the following:
library(survival)
data <- survfit(Surv(time, status == 2) ~ 1, data = pbc)
plot(data) # Plots the result of survfit
plot(data, conf.int = "none") # New parameter
In order to try to make a reproducible example:
f <- function(x, y){
b <- x^2
c <- y^2
d <- x+y
return(list(one = b, two = c, three = d))
}
dat <- f(3, 2)
So using plot(dat) I would like to get the same as plot(dat$one, dat$two). I would also like to add one more (new) parameter that could be set to TRUE/FALSE.
Is this possible?
I think you might be looking for classes. You can use the S3 system for this.
For your survival example, data has the class survfit (see class(data)). Then using plot(data) will look for a function called plot.survfit. That is actually a non-exported function in the survival package, at survival:::plot.survfit.
You can easily do the same for your package. For example, have a function that creates an object of class my_class, and then define a plotting method for that class:
f <- function(x, y){
b <- x^2
c <- y^2
d <- x+y
r <- list(one = b, two = c, three = d)
class(r) <- c('my_class', 'list') # this is the important bit; the new class goes first so its methods are found first.
r
}
plot.my_class <- function(x, ...) {
plot(x$one, x$two)
}
Now your code should work:
dat <- f(3, 2)
plot(dat)
You can put anything in plot.my_class you want, including additional arguments, as long as your first argument is x and is the my_class object.
plot now calls plot.my_class, since dat is of class my_class.
You can also add other methods, e.g. for print.
There are many different plotting functions that can be called with plot for different classes, see methods(plot)
Also see Hadley's Advanced R book chapter on S3.
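The extra TRUE/FALSE parameter asked about in the question can be added to the method's signature. The sketch below is one possible shape, not the only one: new_my_class and the add_line argument are hypothetical names, and drawing three against one is just an illustrative choice for the added line.

```r
# Constructor: returns a list tagged with the class "my_class"
new_my_class <- function(x, y) {
  structure(
    list(one = x^2, two = y^2, three = x + y),
    class = c("my_class", "list")
  )
}

# plot() method with one extra logical argument.
# `...` passes any further arguments on to the default plot().
plot.my_class <- function(x, add_line = FALSE, ...) {
  plot(x$one, x$two, ...)
  if (add_line) {
    lines(x$one, x$three, lty = 2)  # optional dashed line
  }
  invisible(x)
}

dat <- new_my_class(1:5, 5:1)
```

Calling plot(dat) would then draw the basic plot, and plot(dat, add_line = TRUE, main = "demo") would add the line and forward main to the underlying plot call.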

Using foreach() in R to speed up loop for ggplot2

I would like to create a PDF file containing hundreds of plots in a certain order.
My strategy was using foreach() and storing each ggplot2 object into the output list, and then printing each ggplot2 object to the output file.
For example, I would like to plot a histogram of prices for every factor "carat" in the diamonds dataset:
library(ggplot2)
library(plyr)
library(foreach) # for parallelization
library(doParallel) # for parallelization
#setup parallel backend to use 4 processors
cl<-makeCluster(4)
registerDoParallel(cl)
# use diamonds dataset
carats.summary <- ddply(diamonds, .(carat), summarise, count = length(carat))
m.list <- foreach(i = 1:length(carats.summary$carat),
.packages = "ggplot2") %dopar% {
jcarat = carats.summary$carat[i]
m <- ggplot(subset(diamonds, carat == jcarat), aes(x = price)) +
geom_histogram()
print(m)
}
With this code, I am hoping to create a list of ggplot2 objects which I can then save into a single pdf file (for example using pdf()) in an ordered manner (for example, in ascending carats).
However, running this results in an error message:
Error in serialize(data, node$con) : error writing to connection
I suspect this is due to the fact that if I tried to append the ggplot2 object to a list, I would get a warning message like this:
lst <- vector(mode = "list")
lst[1] <- m
Warning message:
In lst[1] <- m :
number of items to replace is not a multiple of replacement length
Although this is pure speculation and I could be wrong.
Does anybody have an idea how to use foreach() to save ggplot2 objects onto a list? Or some way to parallelize for loops involving ggplot2?
Thanks in advance.
You shouldn't print the object inside the loop; just create the ggplot object and let it be the value of the loop body. Only print once the graphics device you want is open.
m.list <- foreach(i = 1:length(carats.summary$carat),
.packages = "ggplot2") %dopar% {
jcarat = carats.summary$carat[i]
ggplot(subset(diamonds, carat == jcarat), aes(x = price)) +
geom_histogram()
}
then you can get at them with
m.list[[1]]
etc...
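The warning quoted in the question is a separate matter from the foreach error: it comes from using single brackets to store a list-like object in a list. The sketch below uses a plain list as a stand-in for a plot object, which assumes a ggplot2 version where plot objects are lists under the hood; plot_like is a hypothetical name.

```r
# Stand-in for a plot object: a list with several components
plot_like <- list(data = 1:3, layers = list(), labels = "price")

lst <- vector(mode = "list")

# lst[1] <- plot_like would try to spread three components over one slot,
# which is what triggers "number of items to replace is not a multiple
# of replacement length". Double brackets store the whole object instead.
lst[[1]] <- plot_like

length(lst)        # one element
names(lst[[1]])    # the stored object is intact
```

With foreach this assignment is not needed at all, since the value of each loop body becomes one element of the returned list.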

Exclude Node in semPaths {semPlot}

I'm trying to plot a sem-path with R.
I'm using an .out file produced by Mplus with semPaths {semPlot}.
It seems to work, but I want to remove some latent variables and I don't know how.
I am using the following syntax :
Out from Mplus : https://www.dropbox.com/s/vo3oa5fqp7wydlg/questedMOD2.out?dl=0
outfile1 <- "questedMOD.out"
semPaths(outfile1,what="est", intercepts=FALSE, rotation=4, edge.color="black", sizeMan=5, esize=TRUE, structural="TRUE", layout="tree2", nCharNodes=0, intStyle="multi" )
There may be an easier way to do this (and ignoring if it is sensible to do it) - one way you can do this is by removing nodes from the object prior to plotting.
Using the Mplus example from your question Rotate Edges in semPaths/qgraph
library(qgraph)
library(semPlot)
library(MplusAutomation)
# This downloads an output file from Mplus examples
download.file("http://www.statmodel.com/usersguide/chap5/ex5.8.out",
outfile <- tempfile(fileext = ".out"))
# Unadjusted plot
s <- semPaths(outfile, intercepts = FALSE)
In the above call to semPaths, outfile is of class character, so the line (near the start of code for semPaths)
if (!"semPlotModel" %in% class(object))
object <- do.call(semPlotModel, c(list(object), modelOpts))
returns the object from semPlot:::semPlotModel.mplus.model(outfile). This is of class "semPlotModel".
So the idea is to create this object first, amend it and then pass this object to semPaths.
# Call semPlotModel on your Mplus file
obj <- semPlot:::semPlotModel.mplus.model(outfile)
# obj <- do.call(semPlotModel, list(outfile)) # this is more general / not just for Mplus
# Remove one factor (F1) from obj@Pars - need to check lhs and rhs columns
idx <- apply(obj@Pars[c("lhs", "rhs")], 1, function(i) any(grepl("F1", i)))
obj@Pars <- obj@Pars[!idx, ]
class(obj)
obj is now of class "semPlotModel" and can be passed directly to semPaths
s <- semPaths(obj, intercepts = FALSE)
You can use str(s) to see the structure of this returned object.
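The row-filtering step used above can be checked on a small table before applying it to a real model. This sketch uses a made-up data frame standing in for the Pars slot; the column names lhs and rhs follow the code above, and everything else is hypothetical.

```r
# Stand-in for the parameter table: each row is one path in the model
pars <- data.frame(
  lhs = c("F1", "F1", "F2", "F2", "F1"),
  rhs = c("Y1", "Y2", "Y3", "Y4", "F2"),
  stringsAsFactors = FALSE
)

# TRUE for every row that mentions F1 on either side.
# Note: grepl("F1", ...) would also match names such as "F10";
# a pattern like "^F1$" may be safer in models with many factors.
idx <- apply(pars[c("lhs", "rhs")], 1, function(row) any(grepl("F1", row)))

kept <- pars[!idx, ]
kept
```

Only the rows that do not involve F1 remain, which is what gets written back to the model object before plotting.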
Assuming that you use the following semPaths code to plot your SEM (the %>% pipe comes from magrittr, so load that package first)
semPaths(obj, intercepts = FALSE) %>%
plot()
you can use the following function to remove any node by its label:
remove_nodes_and_edges <- function(semPaths_obj,node_tbrm_vec){
relevent_nodes_index <- semPaths_obj$graphAttributes$Nodes$names %in% node_tbrm_vec
semPaths_obj$graphAttributes$Nodes$width[relevent_nodes_index]=0
semPaths_obj$graphAttributes$Nodes$height[relevent_nodes_index]=0
semPaths_obj$graphAttributes$Nodes$labels[relevent_nodes_index]=""
return(semPaths_obj)
}
and use this function in the plotting pipe in the following way
semPaths(obj, intercepts = FALSE) %>%
remove_nodes_and_edges(c("Y1","Y2","Y3")) %>%
plot()
