Plot multiple ROC curves using a for loop - r

I need to plot a number of different ROC curves on a single plot. To avoid manually creating each ROC curve, I have created a for loop that automates this process. However, for some reason, the code only outputs a single curve, for the last model in the list of names. Can anyone help me figure out why it's not working? Please see below for a reproducible example:
library(pROC)
library(tidyverse)
dat_tst_2 <- data.frame(result = sample(letters[1:2], 100, replace = T))
preds_1 <- data.frame(x = runif(100),
y = runif(100))
preds_2 <- data.frame(x = runif(100),
y = runif(100))
names_preds <- c("preds_1", "preds_2")
output <- list()
for (j in 1:length(names_preds)) {
for (i in names_preds) {
roc_model <- roc(response = dat_tst_2$result,
predictor = eval(as.name(i))[,2],
levels = c("a", "b"),
plot = F)
output[[j]] <- roc_model
}
}
ggroc(output)

First, make sure output has multiple items using str(output). Then try passing each item in output to ggroc instead:
lapply(output, function(out) {
  png()
  print(ggroc(out))
  dev.off()
})
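A likely reason only one curve shows up is the nested loop in the question: for every j, the inner loop runs over all the names, so output[[j]] is overwritten until it holds the model for the last name. The sketch below reproduces that effect in base R. Note that fit_one is a hypothetical stand-in for the roc() call (an assumption, so the example runs without pROC), and the predictions are kept in a named list rather than looked up with eval(as.name(i)).

```r
set.seed(1)
preds <- list(
  preds_1 = data.frame(x = runif(100), y = runif(100)),
  preds_2 = data.frame(x = runif(100), y = runif(100))
)

# Hypothetical stand-in for the roc() call, so the sketch runs without pROC.
fit_one <- function(pred) mean(pred)

# Nested version, as in the question: the inner loop visits every name for
# each j, so slot j always ends up holding the result for the last name.
nested <- list()
for (j in seq_along(preds)) {
  for (i in names(preds)) {
    nested[[j]] <- fit_one(preds[[i]][, 2])
  }
}

# Single loop: one result per name, stored under that name.
single <- list()
for (i in names(preds)) {
  single[[i]] <- fit_one(preds[[i]][, 2])
}

identical(nested[[1]], nested[[2]])  # TRUE: both slots hold the preds_2 result
identical(single[[1]], single[[2]])  # FALSE: one distinct result per model
```

With the real roc() call in place of fit_one, the single-loop list should contain one distinct model per name and can be passed to ggroc() in one go.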

Related

How do I make variable weights dynamic in lmer for loop

I want to be able to input the variable name that I'll be using in the "weights" option in the lmer function. So then I can change the dataset, and cycle through the "weights" and pull the correct variable.
I want to pull the correct column for weights within the for loop.
So for y, the equation would be:
lmer(y~x+(1|study), weights = weight.var)
And y1:
lmer(y1~x+(1|study),weights = weight.var1)
So I named the weighting variables (weight.opt), then want to use them in the formula within the for loop. I can use "as.formula" to get the formula working and connected to the dataset, but I'm not sure how to do something similar with the weights.
x <- rnorm(300,0,1)
y <- x*rnorm(300,2,0.5)
y1 <- x*rnorm(300,0.1,0.1)
study <- rep(c("a","b","c"),each = 100)
weight.var <- rep(c(0.5,2,4),each = 100)
weight.var1 <- rep(c(0.1,.2,.15),each = 100)
library(lme4)
dataset <- data.frame(x,y,y1,study,weight.var,weight.var1)
resp1 <- c("y","y1")
weight.opt <- c("weight.var","weight.var1")
for(i in 1:2){
lmer(as.formula(paste(resp1[i],"~x+(1|study)")),weights = weight.opt[i],data = dataset)
}
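Passing the column itself instead of its name seems to work fine, since weights expects a numeric vector and weight.opt[i] is only a character string: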
res_list <- list()
for(i in 1:2){
res_list[[i]] <- lmer(as.formula(paste(resp1[i],"~x+(1|study)")),
weights = dataset[[weight.opt[i]]],data = dataset)
}
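The same pattern can be tried without lme4. The sketch below uses lm() as a stand-in for lmer() (an assumption made only so the example runs in base R) and shows that dataset[[name]] turns a column name into the numeric vector that the weights argument needs.

```r
set.seed(42)
dataset <- data.frame(
  x = rnorm(30),
  weight.var = rep(c(0.5, 2, 4), each = 10),
  weight.var1 = rep(c(0.1, 0.2, 0.15), each = 10)
)
dataset$y <- 2 * dataset$x + rnorm(30)
weight.opt <- c("weight.var", "weight.var1")

# lm() stands in for lmer() here; the weights argument is handled the same way.
fits <- lapply(weight.opt, function(w) {
  # dataset[[w]] looks the column up by name and returns the numeric vector
  lm(y ~ x, data = dataset, weights = dataset[[w]])
})
names(fits) <- weight.opt

# One column of coefficients per weighting variable
sapply(fits, coef)
```

Storing the fits in a named list keeps each model next to the weighting variable that produced it.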

Returning a list object from a function

I am trying to write a function that returns a series of ggplot scatterplots from a data frame. Below is a reproducible data frame as well as the function I've written
a <- sample(0:20,20,rep=TRUE)
b <- sample(0:300,20,rep=TRUE)
c <- sample(0:3, 20, rep=TRUE)
d <- rep("dog", 20)
df <- data.frame(a,b,c,d)
loopGraph <- function(dataFrame, y_value, x_type){
if(is.numeric(dataFrame[,y_value]) == TRUE && x_type == "numeric"){
dataFrame_number <- dataFrame %>%
dplyr::select_if(is.numeric) %>%
filter(y_value!=0) %>%
dplyr::select(-y_value)
x <- 1
y<-ncol(dataFrame_number)
endList <- list()
for (i in x:y)
{
i
x_value <-colnames(dataFrame_number)[i]
plotDataFrame <- cbind(dataFrame_number[,x_value], dataFrame[,y_value]) %>% as.data.frame()
r2 <- summary(lm(plotDataFrame[,2]~plotDataFrame[,1]))$r.squared
ggplot <- ggplot(plotDataFrame, aes(y=plotDataFrame[,2],x=plotDataFrame[,1])) +
geom_point()+
geom_smooth(method=lm) +
labs(title = paste("Scatterplot of", y_value, "vs.", x_value), subtitle= paste0("R^2=",r2), x = x_value,y=y_value)
endList[[i]] <- ggplot
}
return(endList)
}
else{print("Try again")}
}
loopGraph(df, "a", "numeric")
What I want is to return the object endList so I can look at the multiple scatterplots generated by this function. What happens is that the function prints each scatterplot in the plots window without giving me access to the endList object.
How can I get this function to return the endList object? Is there a better way to go about this? Thanks in advance!
Update
Thanks to @GordonShumway for solving my first issue. Now, when I define plots as plots <- loopGraph(df, "a", "numeric"), I can view all the outputs. However, all the graphs are of the first ggplot feature, even though the labels change. Any intuition as to why this is happening? Or how to fix it? I tried adding dev.set(dev.next()) to no avail.
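One plausible explanation for the identical graphs, offered as an assumption and not a confirmed diagnosis: aes() stores its expressions and evaluates them only when a plot is drawn, so every stored plot looks up plotDataFrame at print time and finds whatever the loop left there, while the labels were computed immediately. The base R sketch below imitates that with plain functions standing in for stored plots (the names deferred_loop and deferred_apply are made up for illustration).

```r
# Stand-in for stored plots: each function reads `current` only when it is
# called, much as a stored plot evaluates its aes() expressions when drawn.
deferred_loop <- list()
for (i in 1:3) {
  current <- i * 10
  deferred_loop[[i]] <- function() current
}
loop_values <- sapply(deferred_loop, function(f) f())

# lapply() calls the function once per element, so each deferred expression
# keeps its own copy of `current`.
deferred_apply <- lapply(1:3, function(i) {
  current <- i * 10
  function() current
})
apply_values <- sapply(deferred_apply, function(f) f())

loop_values   # 30 30 30
apply_values  # 10 20 30
```

If this is what happens in loopGraph, building each plot inside a function called by lapply(), and mapping columns of the data frame passed to ggplot() by name instead of indexing plotDataFrame inside aes(), should give each plot its own data.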

Creating a boxplot loop with ggplot2 for only certain variables

I have a dataset with 99 observations and I need to create boxplots for ones with a specific string in them. However, when I run this code I get 57 of the exact same plots from the original function instead of the loop. I was wondering how to prevent the plots from being overwritten but still create all 57. Here is the code and a picture of the plot.
Thanks!
#starting boxplot function
myboxplot <- function(mydata=ivf_dataset, myexposure =
"ART_CURRENT", myoutcome = "MEG3_DMR_mean")
{bp <- ggplot(ivf_dataset, aes(ART_CURRENT, MEG3_DMR_mean))
bp <- bp + geom_boxplot(aes(group =ART_CURRENT))
}
#pulling out variables needed for plots
outcomes = names(ivf_dataset)[grep("_DMR_", names(ivf_dataset),
ignore.case = T)]
#creating loop for 57 boxplots
allplots <- list()
for (i in seq_along(outcomes))
{
allplots[[i]]<- myboxplot (myexposure = "ART_CURRENT", myoutcome =
outcomes[i])
}
allplots
I recommend reading about standard and non-standard evaluation and how this works with the tidyverse. Here are some links
http://adv-r.had.co.nz/Functions.html#function-arguments
http://adv-r.had.co.nz/Computing-on-the-language.html
I also found this useful
https://rstudio-pubs-static.s3.amazonaws.com/97970_465837f898094848b293e3988a1328c6.html
Also, you need to produce an example so that it is possible to replicate your problem. Here is the data that I created.
df <- data.frame(label = rep(c("a","b","c"), 5),
x = rnorm(15),
y = rnorm(15),
x2 = rnorm(15, 10),
y2 = rnorm(15, 5))
I kept most of your code the same and only changed what needed to be changed.
myboxplot2 <- function(mydata = df, myexposure, myoutcome){
  # as.name() turns the column-name strings into symbols for aes_()
  bp <- ggplot(mydata, aes_(as.name(myexposure), as.name(myoutcome))) +
    geom_boxplot()
  bp  # return the plot object so that it can be stored or printed
}
myboxplot2(myexposure = "label", myoutcome = "y")
Because aes() uses non-standard evaluation, you need to use aes_(). Again, read the links above. (In ggplot2 3.0.0 and later, aes_() is soft-deprecated in favour of tidy evaluation, for example aes(.data[[myexposure]], .data[[myoutcome]]).)
Here I am getting all the columns that start with x. I am assuming that your code gets the columns that you want.
outcomes <- names(df)[grep("^x", names(df), ignore.case = TRUE)]
Here I am looping through in the same way that you did. I am only storing the plot objects; they are printed when the list is printed at the end.
allplots <- list()
for (i in seq_along(outcomes)){
  allplots[[i]] <- myboxplot2(myexposure = "label", myoutcome = outcomes[i])
}
allplots
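To make the difference between standard and non-standard evaluation concrete, here is a small base R sketch. It does not use ggplot2; with() and eval() stand in for what aes() and aes_() do with column names, and the variable names by_string, by_symbol, by_built_symbol and literal are invented for the example.

```r
df <- data.frame(label = rep(c("a", "b", "c"), 5), y = 1:15)
myoutcome <- "y"

# Standard evaluation: the string stored in myoutcome selects the column.
by_string <- df[[myoutcome]]

# Non-standard evaluation: with() treats `y` as code to evaluate inside df.
by_symbol <- with(df, y)

# Bridging the two: as.name() turns the string into a symbol, and eval()
# evaluates that symbol inside df. aes_() accepts such a symbol in place of
# a bare column name.
by_built_symbol <- eval(as.name(myoutcome), df)

# Without that step, the variable is not a column of df, so its own value
# (the string "y") is returned instead of the column.
literal <- with(df, myoutcome)

identical(by_string, by_symbol)        # TRUE
identical(by_built_symbol, by_symbol)  # TRUE
literal                                # "y"
```

This is why the original function, which wrote the column names literally inside aes(), drew the same plot no matter which arguments were passed.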

Extracting input and output variables from nlme result in R

I am trying to automate plots (fitted vs model input variable, fitted vs model output variable) from models and would like to isolate the input and output variable names from the nlme() results.
I managed with something looking like a very dirty solution. Do you have anything more elegant to share?
Thanks!
here is an example:
df <- data.frame(foot = c(18,49.5,36.6,31.55,8.3,17,30.89,13.39,23.04,34.88,35.9,47.8,23.9,31,29.7,25.5,10.8,36,6.3,46.5,9.2,29,5.4,7.5,34.7,16.8,45.5,28,30.50955414,30.2866242,65.9,26.6,12.42038217,81.8,6.8,35.44585987,7,45.8,29,16.7,19.6,46.3,32.9,20.9,40.6,10,21.3,18.6,41.4,6.6),
leg = c(94.3588,760.9818,696.9112,336.64,12.43,69.32,438.9675,31.8159,153.6262,473.116,461.66276,897.7088,131.6944,395.909156,633.1044,179.772,41.3292,457.62,9.072,870.74,18.6438,356.64,5.3486,8.802,452.425561,82.618,839.649888,276.73016,560.63,655.83,2287.6992,234.1807,63,3475.649195,14.098,837.35,10.01,1149.87,615.03,124.35,184.33,1418.66,707.25,123.62,687.87,24.9696,192.416,181.5872,954.158,10.1716),
region=c(rep("a",13), rep("b", 17), rep("c", 20)),
disease = "healthy")
df$g <- "a" #No random effect wanted
m1 <- nlme(leg~a*foot^b,
data = df,
start= c(1,1),
fixed=a+b~1,
groups=~g,
weights=varPower(form=~foot))
I want to do data$output <- data$leg but automatised:
output_var <- eval(parse(text=paste(m1$call$data, as.character(m1$call$model)[2], sep="$")))
df$output <- output_var
I want to do data$input <- data$foot but automatised:
input_var <- eval(parse(text=paste(m1$call$data, gsub('a|b| |\\*|\\/|\\+|\\-|\\^', '', as.character(m1$call$model)[3]), sep="$")))
df$input <- input_var
df$fit_m1 <- fitted.values(m1)
So that I can use generic variables in my ggplot:
ggplot(df)+
geom_point(aes(x=input, y=output, colour = region))+
geom_line(aes(x=input, y=fit_m1))
Here is a solution using broom::augment
library(nlme)
library(ggplot2)
library(broom)
# Get the fitted values along with the input and output
augmented <- augment(m1, data=df)
# Change the names of the dataframe so you have our standard input and output
new_names <- names(augmented)
new_names[c(1, 2)] <- c('input', 'output')
names(augmented) <- new_names
# Then you can plot using your standard names
ggplot(augmented, aes(input)) +
geom_point(aes(y = output, color = region)) +
geom_line(aes(y = .fitted))
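If you would rather pull the names out of the formula than rely on column positions, base R's all.vars() may be a cleaner alternative to the gsub() approach. This is a sketch under two assumptions: the formula is typed in directly here (in practice it would come from m1$call$model, as in the question), and any name on the right-hand side that is also a column of the data is treated as an input, which would misfire if a parameter shared its name with a column.

```r
# Small stand-in for the data in the question
df <- data.frame(foot = c(18, 49.5, 36.6),
                 leg = c(94.36, 760.98, 696.91),
                 region = "a")

# Same formula as in the nlme() call; typed in here instead of taken from
# the fitted model.
model_formula <- leg ~ a * foot^b

# Left-hand side: the output variable
output_name <- all.vars(model_formula[[2]])

# Right-hand side: keep only the names that are columns of df, which drops
# the parameters a and b
input_name <- intersect(all.vars(model_formula[[3]]), names(df))

df$output <- df[[output_name]]
df$input <- df[[input_name]]

output_name  # "leg"
input_name   # "foot"
```

With several input variables, input_name would be a vector, so the last assignment would need to handle each name separately.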

How to extract the p.value and estimate from cor.test() in a data.frame?

In this example, I have temperature values from 50 different sites, and I would like to correlate Site 1 with all 50 sites. I want to extract only the "p.value" and "estimate" components returned by cor.test() into two separate columns of a data.frame.
My attempt works, but I don't understand why.
I would like to know how to simplify my code, because I currently have to run the for loop twice to get my results.
Here is my example:
# Temperature data
data <- matrix(rnorm(500, 10:30, sd=5), nrow = 100, ncol = 50, byrow = TRUE,
dimnames = list(c(paste("Year", 1:100)),
c(paste("Site", 1:50))) )
# Empty data.frame
df <- data.frame(label=paste("Site", 1:50), Estimate="", P.value="")
# Extraction
for (i in 1:50) {
df1 <- cor.test(data[,1], data[,i] )
df[,2:3] <- df1[c("estimate", "p.value")]
}
for (i in 1:50) {
df1 <- cor.test(data[,1], data[,i] )
df[i,2:3] <- df1[c("estimate", "p.value")]
}
df
I would very much appreciate your help :)
I might offer up the following as well (masking the loops):
result <- do.call(rbind,lapply(2:50, function(x) {
cor.result<-cor.test(data[,1],data[,x])
pvalue <- cor.result$p.value
estimate <- cor.result$estimate
return(data.frame(pvalue = pvalue, estimate = estimate))
})
)
First of all, I'm guessing you had a typo in your code: you should have rnorm(5000, ...) if you want unique values. Otherwise you're going to cycle through those 500 numbers 10 times.
Anyway, a simple way of doing this would be:
data <- matrix(rnorm(5000, 10:30, sd=5), nrow = 100, ncol = 50, byrow = TRUE,
dimnames = list(c(paste("Year", 1:100)),
c(paste("Site", 1:50))) )
# Empty data.frame
df <- data.frame(label=paste("Site", 1:50), Estimate="", P.value="")
estimates = numeric(50)
pvalues = numeric(50)
for (i in 1:50){
test <- cor.test(data[,1], data[,i])
estimates[i] = test$estimate
pvalues[i] = test$p.value
}
df$Estimate <- estimates
df$P.value <- pvalues
df
Edit: I believe your issue is in the line df <- data.frame(label=paste("Site", 1:50), Estimate="", P.value=""). In R versions before 4.0, where strings become factors by default, typeof(df$Estimate) shows that the column is stored as an integer, while typeof(test$estimate) shows that cor.test() returns a double, so R doesn't know what you're trying to do with those two values. You can redo your code like this:
df <- data.frame(label=paste("Site", 1:50), Estimate=numeric(50), P.value=numeric(50))
for (i in 1:50){
test <- cor.test(data[,1], data[,i])
df$Estimate[i] = test$estimate
df$P.value[i] = test$p.value
}
to make it a little more concise.
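The column types can be checked directly. The sketch below sets stringsAsFactors explicitly so that it behaves the same on old and new R versions; the data frame names df_chr, df_fac and df_num are made up for the illustration.

```r
# Empty-string placeholders give a character or factor column, not a number.
df_chr <- data.frame(Estimate = "", stringsAsFactors = FALSE)
df_fac <- data.frame(Estimate = "", stringsAsFactors = TRUE)
df_num <- data.frame(Estimate = numeric(1))

class(df_chr$Estimate)   # "character"
class(df_fac$Estimate)   # "factor"
typeof(df_fac$Estimate)  # "integer"
class(df_num$Estimate)   # "numeric"

# Assigning a number into a single row of a factor column fails with a
# warning and stores NA; the numeric column takes the value as is.
suppressWarnings(df_fac[1, "Estimate"] <- 0.5)
df_num[1, "Estimate"] <- 0.5

is.na(df_fac$Estimate[1])  # TRUE
df_num$Estimate[1]         # 0.5
```

This would also explain why the question's first loop seemed necessary: assigning to the whole of columns 2 and 3 replaces them with numeric columns, after which the row-by-row assignment in the second loop works.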
Similar to the answer of colemand77, create a cor function:
cor_fun <- function(x, y, method){
tmp <- cor.test(x, y, method= method)
cbind(r=tmp$estimate, p=tmp$p.value) }
Apply it over the columns of the data. You can transpose the result to get p and r by row:
t(apply(data, 2, cor_fun, data[, 1], "spearman"))
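Another option, sketched here as one possibility among several, is vapply(), which checks that every call returns exactly two numbers and keeps the site names. The simulated data uses a single mean of 20 (an assumption made for simplicity) instead of the 10:30 pattern in the question.

```r
set.seed(123)
data <- matrix(rnorm(5000, mean = 20, sd = 5), nrow = 100, ncol = 50,
               dimnames = list(paste("Year", 1:100), paste("Site", 1:50)))

# One cor.test() per site; each call returns a named vector of length two.
stats <- vapply(
  colnames(data),
  function(site) {
    test <- cor.test(data[, 1], data[, site])
    c(Estimate = unname(test$estimate), P.value = test$p.value)
  },
  FUN.VALUE = c(Estimate = 0, P.value = 0)
)

# stats has one column per site, so transpose it to get one row per site.
df <- data.frame(label = colnames(data), t(stats), row.names = NULL)
head(df)
```

Because the result is built in one step, there is no empty data.frame to pre-allocate and no column type to worry about.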
