I am trying to automatize plots (fitted vs model input variable, fitted vs model output variable) from models and would like to isolate the input and output variable names from the nlme() results.
I managed with something looking like a very dirty solution. Do you have anything more elegant to share?
Thanks!
here is an example:
df <- data.frame(foot = c(18,49.5,36.6,31.55,8.3,17,30.89,13.39,23.04,34.88,35.9,47.8,23.9,31,29.7,25.5,10.8,36,6.3,46.5,9.2,29,5.4,7.5,34.7,16.8,45.5,28,30.50955414,30.2866242,65.9,26.6,12.42038217,81.8,6.8,35.44585987,7,45.8,29,16.7,19.6,46.3,32.9,20.9,40.6,10,21.3,18.6,41.4,6.6),
leg = c(94.3588,760.9818,696.9112,336.64,12.43,69.32,438.9675,31.8159,153.6262,473.116,461.66276,897.7088,131.6944,395.909156,633.1044,179.772,41.3292,457.62,9.072,870.74,18.6438,356.64,5.3486,8.802,452.425561,82.618,839.649888,276.73016,560.63,655.83,2287.6992,234.1807,63,3475.649195,14.098,837.35,10.01,1149.87,615.03,124.35,184.33,1418.66,707.25,123.62,687.87,24.9696,192.416,181.5872,954.158,10.1716),
region=c(rep("a",13), rep("b", 17), rep("c", 20)),
disease = "healthy")
df$g <- "a" #No random effect wanted
m1 <- nlme(leg~a*foot^b,
data = df,
start= c(1,1),
fixed=a+b~1,
groups=~g,
weights=varPower(form=~foot))
I want to do data$output <- data$leg but automatised:
output_var <- eval(parse(text=paste(m1$call$data, as.character(m1$call$model)[2], sep="$")))
df$output <- output_var
I want to do data$input <- data$foot but automatised:
input_var <- eval(parse(text=paste(m1$call$data, gsub('a|b| |\\*|\\/|\\+|\\-|\\^', '', as.character(m1$call$model)[3]), sep="$")))
df$input <- input_var
df$fit_m1 <- fitted.values(m1)
So that I can use generic varaibles in my ggplot:
ggplot(df)+
geom_point(aes(x=input, y=output, colour = region))+
geom_line(aes(x=input, y=fit_m1))
Here is a solution using broom::augment
library(nlme)
library(ggplot)
library(broom)
# Get the fitted values along with the input and output
augmented <- augment(m1, data=df)
# Change the names of the dataframe so you have our standard input and output
new_names <- names(augmented)
new_names[c(1, 2)] <- c('input', 'output')
names(augmented) <- new_names
# Then you can plot using your standard names
ggplot(augmented, aes(input)) +
geom_point(aes(y = output, color = region)) +
geom_line(aes(y = .fitted))
Related
I have a list of models as specified by the following code:
varlist <- list("PRS_Kunkle", "PRS_Kunkle_e07",
"PRS_Kunkle_e06","PRS_Kunkle_e05", "PRS_Kunkle_e04",
"PRS_Kunkle_e03", "PRS_Kunkle_e02", "PRS_Kunkle_e01",
"PRS_Kunkle_e00", "PRS_Jansen", "PRS_deroja_KANSL")
PRS_age_pacc3 <- lapply(varlist, function(x) {
lmer(substitute(z_pacc3_ds ~ i*AgeAtVisit + i*I(AgeAtVisit^2) +
APOE_score + gender + EdYears_Coded_Max20 +
VisNo + famhist + X1 + X2 + X3 + X4 + X5 +
(1 |family/DBID),
list(i=as.name(x))), data = WRAP_all, REML = FALSE)
})
I want to obtain the slope of PRS at different age points in each of the models. How can I write code to achieve this goal? Without loop, the raw code should be:
test_stat1 <- simple_slopes(PRS_age_pacc3[[1]], levels=list(AgeAtVisit=c(55,60,65,70,75,80)))
test_stat2 <- simple_slopes(PRS_age_pacc3[[2]], levels=list(AgeAtVisit=c(55,60,65,70,75,80)))
test_stat3 <- simple_slopes(PRS_age_pacc3[[3]], levels=list(AgeAtVisit=c(55,60,65,70,75,80)))
test_stat4 <- simple_slopes(PRS_age_pacc3[[4]], levels=list(AgeAtVisit=c(55,60,65,70,75,80)))
test_stat5 <- simple_slopes(PRS_age_pacc3[[5]], levels=list(AgeAtVisit=c(55,60,65,70,75,80)))
test_stat6 <- simple_slopes(PRS_age_pacc3[[6]], levels=list(AgeAtVisit=c(55,60,65,70,75,80)))
test_stat7 <- simple_slopes(PRS_age_pacc3[[7]], levels=list(AgeAtVisit=c(55,60,65,70,75,80)))
test_stat8 <- simple_slopes(PRS_age_pacc3[[8]], levels=list(AgeAtVisit=c(55,60,65,70,75,80)))
test_stat9 <- simple_slopes(PRS_age_pacc3[[9]], levels=list(AgeAtVisit=c(55,60,65,70,75,80)))
test_stat10 <- simple_slopes(PRS_age_pacc3[[10]], levels=list(AgeAtVisit=c(55,60,65,70,75,80)))
test_stat11 <- simple_slopes(PRS_age_pacc3[[11]], levels=list(AgeAtVisit=c(55,60,65,70,75,80)))
library(lme4)
library(reghelper)
set.seed(101)
## add an additional factor variable so we can use it for an interaction
sleepstudy$foo <- factor(sample(LETTERS[1:3], size = nrow(sleepstudy),
replace = TRUE))
m1 <- lmer(Reaction ~ Days*foo + I(Days^2)*foo + (1|Subject), data = sleepstudy)
s1 <- simple_slopes(m1, levels=list(Days = c(5, 10, 15)))
Looking at these results, s1 is a data frame with 6 rows (number of levels of foo × number of Days values specified) and 5 columns (Days, foo, estimate, std error, t value).
The simplest way to do this:
res <- list()
for (i in seq_along(varlist)) {
res[[i]] <- simple_slopes(model_list[[i]], ...) ## add appropriate args here
}
res <- do.call("rbind", res) ## collapse elements to a single data frame
## add an identifier column
res_final <- data.frame(model = rep(varlist, each = nrow(res[[1]])), res)
If you want to be fancier, you could replace the for loop with an appropriate lapply. If you want to be even fancier than that:
library(tidyverse)
(model_list
%>% setNames(varlist)
## map_dfr runs the function on each element, collapses results to
## a single data frame. `.id="model"` adds the names of the list elements
## (set in the previous step) as a `model` column
%>% purrr::map_dfr(simple_slopes, ... <extra args here>, .id = "model")
)
By the way, I would be very careful with simple_slopes when you have a quadratic term in the model as well. The slopes calculated will (presumably) apply only in the case where any other continuous variables in the model are zero. You might want to center your variables as in Schielzeth 2010 Methods in Ecology and Evolution ("Simple means to improve ...")
I want to be able to input the variable name that I'll be using in the "weights" option in the lmer function. So then I can change the dataset, and cycle through the "weights" and pull the correct variable.
I want to pull the correct column for weights within the for loop.
So for y, the equation would be:
lmer(y~x+(1|study), weights = weight.var)
And y1:
lmer(y1~x+(1|study),weights = weight.var1)
So I named the weighting variables (weight.opt), then want to use them in the formula within the for loop. I can use "as.formula" to get the formula working and connected to the dataset, but I'm not sure how to do something similar with the weights.
x <- rnorm(300,0,1)
y <- x*rnorm(300,2,0.5)
y1 <- x*rnorm(300,0.1,0.1)
study <- rep(c("a","b","c"),each = 100)
weight.var <- rep(c(0.5,2,4),each = 100)
weight.var1 <- rep(c(0.1,.2,.15),each = 100)
library(lme4)
dataset <- data.frame(x,y,y1,study,weight.var,weight.var1)
resp1 <- c("y","y1")
weight.opt <- c("weight.var","weight.var1")
for(i in 1:2){
lmer(as.formula(paste(resp1[i],"~x+(1|study)")),weights = weight.opt[i],data = dataset)
}
This seems to work fine:
res_list <- list()
for(i in 1:2){
res_list[[i]] <- lmer(as.formula(paste(resp1[i],"~x+(1|study)")),
weights = dataset[[weight.opt[i]]],data = dataset)
}
So I am creating a function which allows me to take a data.frame and get a dataframe of p.values for each variable tested.
# data and labels
my_data <- data.frame(matrix(data = rnorm(10000), nrow = 100, ncol = 10000))
labels <- sample(0:1, 100, replace = TRUE)
# append the labels to the data, then filter
my_data$labels <- labels
sample_1 <- dplyr::filter(.data = my_data, labels == 0)
sample_2 <- dplyr::filter(.data = my_data, labels == 1)
#perform a t-test on each column
p_vals <- data.frame()
for(i in c(1:10000)) {
p_vals <- rbind(p_vals, t.test(x = sample_1[,i], y = sample_2[,i])$p.value)
}
return(p_vals)
This is functional, but I think/hope there would be a more efficient way to do this without the for loop. The data should be in rows because for later functions it will be important to keep track of which variable had which p value.
Instead of splitting the samples you can use the formula interface to t.test, and sapply over the columns of my_data to conduct the tests:
p_vals <- sapply( my_data, function(x) t.test(x ~ labels)$p.value )
This will make a vector of p-values, the order will be the same as the columns of my_data
You can also use the package genefilter:
library(genefilter)
colttests(as.matrix(my_data[,-ncol(my_data)]),factor(my_data$labels))
I have a dataset with 99 observations and I need to create boxplots for ones with a specific string in them. However, when I run this code I get 57 of the exact same plots from the original function instead of the loop. I was wondering how to prevent the plots from being overwritten but still create all 57. Here is the code and a picture of the plot.
Thanks!
Boxplot Format
#starting boxplot function
myboxplot <- function(mydata=ivf_dataset, myexposure =
"ART_CURRENT", myoutcome = "MEG3_DMR_mean")
{bp <- ggplot(ivf_dataset, aes(ART_CURRENT, MEG3_DMR_mean))
bp <- bp + geom_boxplot(aes(group =ART_CURRENT))
}
#pulling out variables needed for plots
outcomes = names(ivf_dataset)[grep("_DMR_", names(ivf_dataset),
ignore.case = T)]
#creating loop for 57 boxplots
allplots <- list()
for (i in seq_along(outcomes))
{
allplots[[i]]<- myboxplot (myexposure = "ART_CURRENT", myoutcome =
outcomes[i])
}
allplots
I recommend reading about standard and non-standard evaluation and how this works with the tidyverse. Here are some links
http://adv-r.had.co.nz/Functions.html#function-arguments
http://adv-r.had.co.nz/Computing-on-the-language.html
I also found this useful
https://rstudio-pubs-static.s3.amazonaws.com/97970_465837f898094848b293e3988a1328c6.html
Also, you need to produce an example so that it is possible to replicate your problem. Here is the data that I created.
df <- data.frame(label = rep(c("a","b","c"), 5),
x = rnorm(15),
y = rnorm(15),
x2 = rnorm(15, 10),
y2 = rnorm(15, 5))
I kept most of your code the same and only changed what needed to be changed.
myboxplot2 <- function(mydata = df, myexposure, myoutcome){
bp <- ggplot(mydata, aes_(as.name(myexposure), as.name(myoutcome))) +
geom_boxplot()
print(bp)
}
myboxplot2(myexposure = "label", myoutcome = "y")
Because aes() uses non-standard evaluation, you need to use aes_(). Again, read the links above.
Here I am getting all the columns that start with x. I am assuming that your code gets the columns that you want.
outcomes <- names(df)[grep("^x", names(df), ignore.case = TRUE)]
Here I am looping through in the same way that you did. I am only storing the plot object though.
allplots <- list()
for (i in seq_along(outcomes)){
allplots[[i]]<- myboxplot2(myexposure = "label", myoutcome = outcomes[i])$plot
}
allplots
I would like to create a function that creates different kinds of plotly plots based on the parameters that are passed into it. If I create the following data
library(plotly)
#### test data
lead <- rep("Fred Smith", 30)
lead <- append(lead, rep("Terry Jones", 30))
lead <- append(lead, rep("Henry Sarduci", 30))
proj_date <- seq(as.Date('2017-11-01'), as.Date('2017-11-30'), by = 'day')
proj_date <- append(proj_date, rep(proj_date, 2))
set.seed(1237)
actHrs <- runif(90, 1, 100)
cummActHrs <- cumsum(actHrs)
forHrs <- runif(90, 1, 100)
cummForHrs <- cumsum(forHrs)
df <- data.frame(Lead = lead, date_seq = proj_date,
cActHrs = cummActHrs,
cForHrs = cummForHrs)
I could plot it using:
plot_ly(data = df, x = ~date_seq, y = ~cActHrs, split = ~Lead)
If I made a makePlot function like the one shown below, how would I make it do something like this:
makePlot <- function(plot_data = df, x_var = date_seq, y_var, split_var) {
plot <- plot_ly(data = df, x = ~x_var, y = ~y_var, split = ~split_var)
return(plot)
}
?
Is there a function I can wrap x_var, y_var, and split_var with so that plotly will recognize them as x, y, and split parameters?
Eventually got around to figuring this out and hope this little follow up takes some of the mystery of these types of tasks. Although this question is focused on plotting, it's important to first build an understanding of how the functions in various R packages (e.g. dplyr and plotly) evaluate expressions and how to manipulate the way those expressions are evaluated. A great reference to build this understanding is Hadley's article on programming in dplyr here or alternatively here.
Once that's under your belt, this turns out to be pretty easy. The trick is to simply pass your variable arguments like you do when you call dplyr functions and be sure to quote those parameters inside your plotting function. For the question above, this function worked for me:
makePlot <- function(plot_data = df, x_var, y_var, split_var,
type_var="scatter",
mode_var="lines+markers") {
quo_x <- enquo(x_var)
quo_y <- enquo(y_var)
quo_split <- enquo(split_var)
# print(c(quo_x, quo_y, quo_split))
plot <- plot_ly(data = plot_data, x = quo_x, y = quo_y, split = quo_split,
type=type_var, mode=mode_var)
return(plot)
}
# using df created in question, pass col's as args like dplyr functions
p1 <- makePlot2(df, date_seq, cActHrs, Lead)
p2 <- makePlot2(df, date_seq, cForHrs, Lead)