I have a dose-response curve and need to fit and plot a 4-parameter logistic regression using the R package drc and ggplot. It needs to have 1 as the upper limit and 0 as the lower limit. It works fine except that the calculated ED50 value differs from the ED50 value read off the graph created by ggplot.
Here is a minimal example:
library(ggplot2)
library(drc)
df <- data.frame(dose = c("2000", "666.67", "222.22",
"74.04", "24.69", "8.23",
"2.74", "0.91", "0.30",
"0.10", "0.03", "0.01"),
response = c("0.569767442", "0.709302326", "0.767441860",
"0.712209302", "0.747093023", "0.723837209",
"0.71802326", "0.7558140", "0.7906977",
"0.7616279", "0.8197674", "1"))
df$dose <- as.numeric(df$dose)
df$response <- as.numeric(df$response)
# calculating the ED 50
logistic <- drm(response~dose,
data = df,
fct = LL.4(fixed = c(NA, 0, 1, NA)))
ED50 <- ED(logistic, 50)[1]
# plotting
ggplot(data = df, aes(x = dose, y = response))+
scale_x_continuous(trans = "log10", limits = c(1E-2, 2E6))+
scale_y_continuous()+
geom_smooth(method = drm,
method.args = list(fct = L.4(fixed = c(NA, 0, 1, NA)),
# "b", "c", "d", "e"
# L.4 and not LL.4 because the x scale is on a log10
se=FALSE,
fullrange = TRUE))+
geom_point(alpha = 0.5)+
geom_point(aes(x = ED50, y = 0.5), color = "red") +
coord_cartesian(xlim = c(1E-2,2E6), ylim = c(0,1))
It gives the following graph (the calculated ED50 value is marked in red).
Obviously, the calculated ED50 does not match the ED50 value suggested by the graph, even though I calculated both using the LL.4 method. The plot uses L.4 on a log axis, which should be identical to LL.4 on a non-log scale as suggested here.
I have no clue what the problem is or what I am missing. Thank you for your help and time in advance!
This doesn't really answer why the two methods give you different ED50s; that question might be better suited to CrossValidated. However, you can circumvent the issue of refitting your model on a log axis by using stat_function() + predict() instead of geom_smooth(). That way, you can be sure that the plotted curve comes from the same model that gave you the ED50 estimate.
library(ggplot2)
library(drc)
df <- data.frame(dose = c("2000", "666.67", "222.22",
"74.04", "24.69", "8.23",
"2.74", "0.91", "0.30",
"0.10", "0.03", "0.01"),
response = c("0.569767442", "0.709302326", "0.767441860",
"0.712209302", "0.747093023", "0.723837209",
"0.71802326", "0.7558140", "0.7906977",
"0.7616279", "0.8197674", "1"))
df$dose <- as.numeric(df$dose)
df$response <- as.numeric(df$response)
# calculating the ED 50
logistic <- drm(response~dose,
data = df,
fct = LL.4(fixed = c(NA, 0, 1, NA)))
ED50 <- ED(logistic, 50)[1]
ggplot(df, aes(dose, response)) +
stat_function(
fun = function(x) predict(logistic, newdata = data.frame(dose = x))
) +
geom_point(alpha = 0.5) +
geom_point(
aes(x = ED50, y = 0.5),
colour = "red"
) +
scale_x_continuous(trans = "log10")
Created on 2022-10-17 by the reprex package (v2.0.0)
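On the question's premise that L.4 on a log axis matches LL.4 on the original scale: the two curves can be made to coincide, but only after the parameters are converted. The sketch below uses plain R functions rather than drc itself. The names `ll4` and `l4` and the parameter values are hypothetical, and the formulas assume the parameterisation described in the drc documentation, with the lower limit fixed at 0 and the upper limit at 1.

```r
# Log-logistic curve in dose x (lower limit 0, upper limit 1);
# assumed to mirror drc's LL.4 parameterisation
ll4 <- function(x, b, e) 1 / (1 + exp(b * (log(x) - log(e))))

# Logistic curve in z = log10(dose); assumed to mirror drc's L.4
l4 <- function(z, b, e) 1 / (1 + exp(b * (z - e)))

dose <- c(0.01, 0.1, 1, 10, 100, 1000)
b <- 0.3   # hypothetical slope
e <- 50    # hypothetical ED50 on the dose scale

y_ll4 <- ll4(dose, b, e)
# Same curve on the log10 axis: the slope is scaled by log(10),
# and the ED50 parameter becomes log10(e)
y_l4 <- l4(log10(dose), b * log(10), log10(e))

all.equal(y_ll4, y_l4)
ll4(e, b, e)  # response at the ED50 is halfway between the limits
```

This only shows that the two forms are equivalent in principle. It does not by itself explain why the two fits in the question disagree, which is why plotting predictions from the single fitted model is the safer route.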
I would like to group a series of lines by 2 factors using group = interaction in ggplot. Here is some sample code:
set.seed(123)
N <- 18
means <- rnorm(N,0,1)
ses <- rexp(N,2)
upper<- means+qnorm(0.975)*ses
lower<- means+qnorm(0.025)*ses
fruit <- rep(c("Apples","Bananas","Pears"), each=6)
size <- rep(rep(c("Small","Medium","Big"), each=2),3)
GMO <- rep(c("Yes","No"), 9)
d<- data.frame(means,upper,lower,fruit,size,GMO)
ggplot(data=d,
aes(x = fruit,y = means, ymin = lower, ymax = upper, col=size,linetype=GMO,group=interaction(GMO, size)))+
geom_hline(aes(fill=size),yintercept =1, linetype=2)+
xlab('labels')+ ylab("Parameter estimates (95% Confidence Interval)")+
geom_pointrange(position=position_dodge(width = 0.6)) +
scale_x_discrete(name="Fruits")+
coord_flip()-> fplot
dev.new()
fplot
Here's a link to the resulting graph: https://i.stack.imgur.com/5YF4F.png
I would like to bring the same coloured lines for each of the three groups closer together. In other words, I would like the lines to cluster not only by the 'Fruit' variable but also by the 'Size' variable for each of the fruits. position_dodge seems to only work for one of the interacting groups.
Thanks for your advice.
As far as I know that is not possible with position_dodge, i.e. it dodges according to the categories of the group aes. And it does not matter whether you map one variable on the group aes or an interaction of two or more. The groups are simply placed equidistant from one another.
One option to achieve your desired result would be to use the "facets that don't look like facets" trick which means faceting by fruit, mapping size on x and afterwards using theme options to get rid of the facet look plus some tweaking of the x scale:
set.seed(123)
N <- 18
means <- rnorm(N, 0, 1)
ses <- rexp(N, 2)
upper <- means + qnorm(0.975) * ses
lower <- means + qnorm(0.025) * ses
fruit <- rep(c("Apples", "Bananas", "Pears"), each = 6)
size <- rep(rep(c("Small", "Medium", "Big"), each = 2), 3)
GMO <- rep(c("Yes", "No"), 9)
d <- data.frame(means, upper, lower, fruit, size, GMO)
library(ggplot2)
ggplot(data = d, aes(x = size, y = means, ymin = lower, ymax = upper, col = size, linetype = GMO, group = GMO)) +
geom_hline(yintercept = 1, linetype = 2) +
xlab("labels") +
ylab("Parameter estimates (95% Confidence Interval)") +
geom_pointrange(position = position_dodge(width = 0.6)) +
scale_x_discrete(name = "Fruits", breaks = "Medium", labels = NULL, expand = c(0, 1)) +
coord_flip() +
facet_grid(fruit ~ ., switch = "y") +
theme(strip.placement = "outside",
strip.background.y = element_blank(),
strip.text.y.left = element_text(angle = 0),
panel.spacing.y = unit(0, "pt"))
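Another way to get uneven spacing, shown here only as a rough sketch, is to skip position_dodge and compute the positions yourself. The offsets and the `xpos` column below are hypothetical choices, not something ggplot2 provides.

```r
# Same factor layout as the question's data:
# GMO varies fastest, then size, then fruit
d <- expand.grid(
  GMO   = c("Yes", "No"),
  size  = c("Small", "Medium", "Big"),
  fruit = c("Apples", "Bananas", "Pears"),
  stringsAsFactors = FALSE
)

# Hypothetical offsets: sizes are spread widely, GMO levels sit close together
size_off <- c(Small = -0.25, Medium = 0, Big = 0.25)
gmo_off  <- c(Yes = -0.05, No = 0.05)

# Numeric position = fruit index + size offset + GMO offset
d$xpos <- unname(
  as.integer(factor(d$fruit)) + size_off[d$size] + gmo_off[d$GMO]
)

head(d)
```

The numeric `xpos` column could then be mapped to x in place of `fruit`, with `scale_x_continuous(breaks = 1:3, labels = c("Apples", "Bananas", "Pears"))` restoring the fruit labels and no position adjustment on the geom.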
Maybe you want to facet_wrap your size variable:
set.seed(123)
N <- 18
means <- rnorm(N,0,1)
ses <- rexp(N,2)
upper<- means+qnorm(0.975)*ses
lower<- means+qnorm(0.025)*ses
fruit <- rep(c("Apples","Bananas","Pears"), each=6)
size <- rep(rep(c("Small","Medium","Big"), each=2),3)
GMO <- rep(c("Yes","No"), 9)
d<- data.frame(means,upper,lower,fruit,size,GMO)
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.1.2
ggplot(data=d,
aes(x = fruit,y = means, ymin = lower, ymax = upper, col=size,linetype=GMO,group=interaction(GMO, size)))+
geom_hline(aes(fill=size),yintercept =1, linetype=2)+
xlab('labels')+ ylab("Parameter estimates (95% Confidence Interval)")+
geom_pointrange(position=position_dodge(width = 0.6)) +
scale_x_discrete(name="Fruits")+
coord_flip() +
facet_wrap(~size)-> fplot
#> Warning: geom_hline(): Ignoring `mapping` because `yintercept` was provided.
fplot
Created on 2022-07-13 by the reprex package (v2.0.1)
I am using the cat_plot function from the 'interactions' package in R (which is a wrapper for ggplot) to plot a 2-way interaction with 2 categorical variables. I can do this easily using the code below (reprex from the "diamonds" dataset)
require(interactions)
data("diamonds")
m <- glm(price ~ cut*color, data = diamonds)
cat_plot(m, pred = cut, modx = color, geom = "bar", colors = "Set1")
This produces the following graph
However, what I would like is to have a faceted graph with each of the cuts presented separately, to make it visually easier to interpret. This can be done for 3-way interactions using the facet.modx = TRUE command, but when I try this with only a 2-way interaction with cat_plot(m, pred = cut, modx = color, geom = "bar", colors = "Set1", facet.modx = TRUE) I get the following error
Error in prep_data(model = model, pred = pred, modx = modx, pred.values = pred.values, :
formal argument "facet.modx" matched by multiple actual arguments
Is there a way to easily facet the graph for 2 way interactions? My real-life dataset is actually a glmer model so I would prefer to stay within the "interactions" package if possible.
EDIT: based on the suggestion from @stefan I tried the following syntax cat_plot(m, pred = cut, modx = color, geom = "bar", colors = "Set1") + facet_wrap(~cut) which produced the graph below. This is almost exactly what I want, except it seems to have kept the other 'cuts' on the x-axis and just removed the bars. Ideally, colours would be on the x-axis instead.
EDIT 2:
I have recreated the problem using data which is more similar to what I am actually working with, with a binary outcome, random effects from glmer etc.
require(lme4)
require(interactions)
set.seed(123)
id <- rep(1:150, each = 4)
condition <- rep(c("a", "b", "c"), each = 4, times = 50)
cat_mod <- rep(c("cat_1", "cat_2", "cat_3", "cat_4"), each = 1, length.out = 600)
control_mod <- rep(c("control_1", "control_2"), each = 4, length.out = 600)
binary_choice <- rbinom(600, 1, 0.5)
simdat <- data.frame(id, condition, cat_mod, binary_choice, control_mod)
m <- glmer(binary_choice ~ condition*cat_mod + control_mod + (1 | id), family=binomial, data = simdat)
cat_plot(m, pred = condition, modx = cat_mod, geom = "bar", colors = "Set1")
I would like to preserve the response scale on the y-axis, and the model accounting for the random intercept, which is why I was trying to avoid using ggplot directly, as the interactions package is already built to accommodate glmms, which is super convenient.
SOLVED
Following the suggestion from @RStam I modified the code slightly so that all y-axes had the same scale, and removed the duplicate facet labels at the bottom.
cat_plot(m, pred = condition, modx = cat_mod, geom = "bar", colors = "Set1") +
scale_x_discrete(labels = c(a = " ", b = " ", c = " ")) +
facet_wrap(condition~., scales= "free_x")
This was the final result
Original Answer
cat_plot(m, pred = cut, modx = color, geom = "bar", colors = "Set1") +
facet_wrap(~cut, scales = "free_x")
Edit 1
Since that still didn't resolve your issue, I've updated my answer. This should resolve the issue you are having.
library(tidyverse)
ggplot(diamonds, aes(x=color,y=price, fill = color)) +
geom_col() + facet_wrap(~cut, scales = "free")
Edit 2
Using your new data and the interactions package I found a rather unpleasant 'hack' using scale_x_discrete() but it should give the desired outcome.
library(interactions)
library(lme4)
set.seed(123)
id <- rep(1:150, each = 4)
condition <- rep(c("a", "b", "c"), each = 4, times = 50)
cat_mod <- rep(c("cat_1", "cat_2", "cat_3", "cat_4"), each =
1, length.out = 600)
control_mod <- rep(c("control_1", "control_2"), each = 4,
length.out = 600)
binary_choice <- rbinom(600, 1, 0.5)
simdat <- data.frame(id, condition, cat_mod, binary_choice,
control_mod)
m <- glmer(binary_choice ~ condition*cat_mod + control_mod +
(1 | id), family=binomial, data = simdat)
cat_plot(m, pred = condition, modx = cat_mod, geom = "bar",
colors = "Set1") + scale_x_discrete() +
facet_wrap(condition~., scales= "free")
I have xy grouped data that I'm plotting using R's ggplot2 geom_violin adding regression trend lines:
Here are the data:
library(dplyr)
library(plotly)
library(ggplot2)
set.seed(1)
df <- data.frame(value = c(rnorm(500,8,1),rnorm(600,6,1.5),rnorm(400,4,0.5),rnorm(500,2,2),rnorm(400,4,1),rnorm(600,7,0.5),rnorm(500,3,1),rnorm(500,3,1),rnorm(500,3,1)),
age = c(rep("d3",500),rep("d8",600),rep("d24",400),rep("d3",500),rep("d8",400),rep("d24",600),rep("d3",500),rep("d8",500),rep("d24",500)),
group = c(rep("A",1500),rep("B",1500),rep("C",1500))) %>%
dplyr::mutate(time = as.integer(age)) %>%
dplyr::arrange(group,time) %>%
dplyr::mutate(group_age=paste0(group,"_",age))
df$group_age <- factor(df$group_age,levels=unique(df$group_age))
And my current plot:
ggplot(df,aes(x=group_age,y=value,fill=age,color=age,alpha=0.5)) +
geom_violin() + geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) +
geom_smooth(data=df,mapping=aes(x=group_age,y=value,group=group),color="black",method='lm',size=1,se=T) + theme_minimal()
My questions are:
How do I get rid of the alpha part of the legend?
I would like the x-axis ticks to be df$group rather than df$group_age, that is, one tick per group, centered on that group and labelled with the group name. Consider a situation where not all groups have all ages: for example, if a certain group has only two of the ages (in which case I'm pretty sure ggplot will show only those two), I'd like the tick to still be centered between those two ages.
One more question:
It would also be nice to have the p-values of each fitted slope plotted on top of each group.
I tried:
library(ggpmisc)
my.formula <- value ~ group_age
ggplot(df,aes(x=group_age,y=value,fill=age,color=age,alpha=0.5)) +
geom_violin() + geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) +
geom_smooth(data=df,mapping=aes(x=group_age,y=value,group=group),color="black",method='lm',size=1,se=T) + theme_minimal() +
stat_poly_eq(formula = my.formula,aes(label=stat(p.value.label)),parse=T)
But I get the same plot as above with the following warning message:
Warning message:
Computation failed in `stat_poly_eq()`:
argument "x" is missing, with no default
geom_smooth() fits a line, while stat_poly_eq() issues an error. A factor is a categorical variable with unordered levels. A trend against a factor is undefined. geom_smooth() may be taking the levels and converting them to "arbitrary" numerical values, but these values are just indexes rather than meaningful values.
To obtain a plot similar to what is described in the question but using code that provides correct linear regression lines and the corresponding p-values I would use the code below. The main change is that the numerical variable time is mapped to x making the fitting of a regression a valid operation. To allow for a linear fit an x-scale with a log10 transformation is used, with breaks and labels at the ages for which data is available.
library(dplyr)
library(ggplot2)
library(ggpmisc)
set.seed(1)
df <-
data.frame(
value = c(
rnorm(500, 8, 1), rnorm(600, 6, 1.5), rnorm(400, 4, 0.5),
rnorm(500, 2, 2), rnorm(400, 4, 1), rnorm(600, 7, 0.5),
rnorm(500, 3, 1), rnorm(500, 3, 1), rnorm(500, 3, 1)
),
age = c(
rep("d3", 500), rep("d8", 600), rep("d24", 400),
rep("d3", 500), rep("d8", 400), rep("d24", 600),
rep("d3", 500), rep("d8", 500), rep("d24", 500)
),
group = c(rep("A", 1500), rep("B", 1500), rep("C", 1500))
) %>%
mutate(time = as.integer(gsub("d", "", age))) %>%
arrange(group, time) %>%
mutate(age = factor(age, levels = c("d3", "d8", "d24")),
group = factor(group))
my_formula = y ~ x
ggplot(df, aes(x = time, y = value)) +
geom_violin(aes(fill = age, color = age), alpha = 0.3) +
geom_boxplot(width = 0.1,
aes(color = age), fill = NA) +
geom_smooth(color = "black", formula = my_formula, method = 'lm') +
stat_poly_eq(aes(label = stat(p.value.label)),
formula = my_formula, parse = TRUE,
npcx = "center", npcy = "bottom") +
scale_x_log10(name = "Age", breaks = c(3, 8, 24)) +
facet_wrap(~group) +
theme_minimal()
Which creates the following figure:
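As a small aside on the conversion step used above, here is a minimal sketch of why the age labels need their prefix stripped before they can serve as a numeric x. It uses only base R; the three-element `age` vector is a stand-in for the full column.

```r
age <- c("d3", "d8", "d24")

# Converting the labels directly yields NA (with a warning),
# so there is nothing numeric to regress against
suppressWarnings(as.integer(age))

# Stripping the "d" prefix first gives usable numbers
time <- as.integer(gsub("d", "", age))
time

# On a log10 axis the three time points are roughly evenly spaced
diff(log10(time))
```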
Here is a solution. The alpha legend issue is easy. Anything you place inside the aes() function gets a legend. That is meant for cases where you want a feature of the data to be used as an aesthetic. Putting alpha outside of aes() removes it from the legend.
I'm not sure the x-axis labelling is what you wanted, but I did it manually so it should be easy to configure.
Regarding the p-values, I ran separate linear regressions and stored each p-value in its own variable, which can then be passed to ggplot using annotate(). For two of the groups the p-value was < .001, so round() rounds it to 0. Therefore, I just wrote p < .001 for those.
Good luck with this!
library(dplyr)
library(ggplot2)
set.seed(1)
df <- data.frame(value = c(rnorm(500,8,1),rnorm(600,6,1.5),rnorm(400,4,0.5),rnorm(500,2,2),rnorm(400,4,1),rnorm(600,7,0.5),rnorm(500,3,1),rnorm(500,3,1),rnorm(500,3,1)),
age = c(rep("d3",500),rep("d8",600),rep("d24",400),rep("d3",500),rep("d8",400),rep("d24",600),rep("d3",500),rep("d8",500),rep("d24",500)),
group = c(rep("A",1500),rep("B",1500),rep("C",1500))) %>%
dplyr::mutate(time = as.integer(gsub("d", "", age))) %>% # strip the "d" so time is numeric, not NA
dplyr::arrange(group,time) %>%
dplyr::mutate(group_age=paste0(group,"_",age))
df$group_age <- factor(df$group_age,levels=unique(df$group_age))
mod1 <- lm(value ~ time,df[df$group == 'A',])
mod1 <- summary(mod1)$coefficients[8] %>% round(2)
mod2 <- lm(value ~ time,df[df$group == 'B',])
mod2 <- summary(mod2)$coefficients[8] %>% round(2)
mod3 <- lm(value ~ time,df[df$group == 'C',])
mod3 <- summary(mod3)$coefficients[8] %>% round(2)
ggplot(df,aes(x=group_age,y=value,fill=age,color=age)) +
geom_violin(alpha=0.5) +
geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) +
geom_smooth(mapping=aes(x=group_age,y=value,group=group),color="black",method='lm',size=1,se=T) +
scale_x_discrete(labels = c('','A','','','B','','','C','')) +
annotate('text',x = 2,y = -1,label = paste('pvalue: <.001')) +
annotate('text',x = 6,y = 10,label = paste('pvalue: <.001')) +
annotate('text',x = 8,y = -1.2,label = paste('pvalue:',mod3))+
theme_minimal()
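The per-group p-values can also be computed in one pass and formatted without hard-coding "<.001". This is a sketch on hypothetical toy data (the `toy` data frame and its effect sizes are made up for illustration), using only base R.

```r
set.seed(1)
# Hypothetical toy data: group A has a real trend, group B has none
toy <- data.frame(
  group = rep(c("A", "B"), each = 60),
  time  = rep(c(3, 8, 24), times = 40)
)
toy$value <- ifelse(toy$group == "A", 0.2 * toy$time, 0) + rnorm(nrow(toy))

# Fit one regression per group and pull out the slope's p-value by name
slope_p <- sapply(split(toy, toy$group), function(g) {
  summary(lm(value ~ time, data = g))$coefficients["time", "Pr(>|t|)"]
})

# format.pval() prints very small values as "<0.001" instead of rounding to 0
labels <- paste("p-value:", format.pval(slope_p, digits = 2, eps = 0.001))
labels
```

Indexing the coefficient matrix by row and column name is a little more robust than a positional index such as `[8]`, and the resulting `labels` vector can be passed to annotate() in the same way as above.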
I am trying to fit a non-linear regression to a set of data. However, when plotted, R returns many different lines where there should only be one.
This problem is only reproducible in one set of data and I can't see any obvious difference between this data and others.
This is the code for my plot:
plot(df$logFC, df$log_pval,
xlim=c(0,11.1), ylim=c(0,11),
xlab = "logFC", ylab = "p_val")
c <- df$logFC
d <- df$log_pval
model = nls(d ~ a*exp(b*c), start = list(a = 2,b = 0.1))
lines(c, predict(model), col = "dodgerblue", lty = 2, lwd = 2)
And here is a sample of my data (df):
logFC log_pval
4.315 2.788
6.724 9.836
2.925 4.136
5.451 10.836
2.345 1.486
4.219 7.618
I have narrowed the problem down to the model, but I'm not sure where to go from there. Any help is greatly appreciated!
1) ggplot method
I tried graphing the data using ggplot2 and I think the output is more what you were expecting...
library(tibble)
library(ggplot2)
library(dplyr)
# Create dataset
df <- tibble::tribble(~logFC, ~log_pval,
4.315, 2.788,
6.724, 9.836,
2.925, 4.136,
5.451, 10.836,
2.345, 1.486,
4.219, 7.618)
# Extract some vectors
c <- df$logFC
d <- df$log_pval
# Your model
model <- nls(d ~ a*exp(b*c), start = list(a = 2,b = 0.1))
# Create second dataset for new plotting
df2 <- tibble(logFC = c, log_pval =predict(model))
# Plot output
ggplot() +
geom_line(data = df2, aes(x = logFC, y = log_pval)) +
geom_point(data = df, aes(x =logFC, y =log_pval)) +
theme_classic()
2) base method
If you want to stick to base try ordering the x variables in the data frame before plotting the lines:
plot(df$logFC, df$log_pval,
xlab = "logFC", ylab = "p_val")
df3 <- tibble(x = df$logFC, y = predict(model)) %>% dplyr::arrange(x)
lines(df3$x, df3$y, col = "dodgerblue", lty = 1, lwd = 1)
It can be achieved with ggplot. More customization can be added to the plot if needed.
library(ggplot2)
ggplot(df) + aes(x = logFC, y = log_pval) + geom_point() +
geom_line(aes(x = c, y = predict(model)))
data
df <- structure(list(logFC = c(4.315, 6.724, 2.925, 5.451, 2.345, 4.219
), log_pval = c(2.788, 9.836, 4.136, 10.836, 1.486, 7.618)), class =
"data.frame", row.names = c(NA, -6L))
c <- df$logFC
d <- df$log_pval
model = nls(d ~ a*exp(b*c), start = list(a = 2,b = 0.1))
Thanks for your help Klink and Ronak,
It turns out the issue was that the data were not ordered by x, so lines() connected the predicted values in the unordered x order, resulting in a zigzag between the predicted points.
Because geom_line() connects observations in order of the x variable, the issue does not appear in ggplot.
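For the base-graphics route, the fix boils down to sorting x and the predictions with the same index before calling lines(). A minimal sketch follows; `pred` is a stand-in for predict(model), and its coefficients are hypothetical rather than fitted values.

```r
# Sample values from the question, in their original (unsorted) order
x <- c(4.315, 6.724, 2.925, 5.451, 2.345, 4.219)

# Stand-in for predict(model): any smooth curve shows the effect
# (these coefficients are hypothetical, not fitted values)
pred <- 2 * exp(0.25 * x)

o <- order(x)  # indices that sort x in increasing order

plot(x, pred, xlab = "logFC", ylab = "predicted")
lines(x, pred, col = "grey", lty = 3)              # unsorted: zigzag
lines(x[o], pred[o], col = "dodgerblue", lwd = 2)  # sorted: one smooth line
```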
I'm trying to re-create a plot like this in ggplot:
This graph takes the residuals from a regression output, and plots them in order (with the X-axis being a rank of residuals).
My best attempt at this was something like the following:
library(ggplot2)
library(modelr)
d <- d %>% add_residuals(mod1, var = "resid")
d$resid_rank <- rank(d$resid)
ggplot(data = d, aes(x = resid_rank, y = resid)) +
geom_bar(stat="identity") +
theme_bw()
However, this yields a completely blank graph. I tried something like this:
ggplot(data = d, aes(x = resid_rank, y = resid)) +
geom_segment(yend = 0, aes(xend=resid)) +
theme_bw()
But this yields the segments that go in the wrong direction. What is the right way to do this, and to color those lines by a third factor?
FAKE DATASET:
library(estimatr)
library(fabricatr)
#simulation
dat <- fabricate(
N = 10000,
y = runif(N, 0, 10),
x = runif(N, 0, 100)
)
#add an outlier
dat <- rbind(dat, c(300, 5))
dat <- rbind(dat, c(500, 3))
dat$y_log <- log(dat$y)
dat$x_log <- log(dat$x)
dat$y_log_s <- scale(log(dat$y))
dat$x_log_s <- scale(log(dat$x))
mod1 <- lm(y_log ~ x_log, data = dat)
I used the built-in dataset from the help page on lm() to create this example. I also just used resid() directly to get the residuals. It's unclear where / why the colored bars would be different, but basically you'd need to add a column to your data.frame that specifies why they are red or blue, then pass that to fill.
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 3.4.4
#example from lm
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
resids <- data.frame(resid = resid(lm.D9))
#why are some bars red and some blue? No clue - so I'll pick randomly
resids$group <- sample(c("group 1", "group 2"), nrow(resids), replace = TRUE)
#rank
resids$rank <- rank(-1 * resids$resid)
ggplot(resids, aes(rank, resid, fill = group)) +
geom_bar(stat = "identity", width = 1) +
geom_hline(yintercept = c(-1,1), colour = "darkgray", linetype = 2) +
geom_hline(yintercept = c(-2,2), colour = "lightgray", linetype = 1) +
theme_bw() +
theme(panel.grid = element_blank()) +
scale_fill_manual(values = c("group 1" = "red", "group 2" = "blue"))
Created on 2019-01-24 by the reprex package (v0.2.1)
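On the geom_segment() attempt in the question: the segments slant because xend is mapped to the residual rather than to the rank. Here is a sketch of the vertical-segment version, using the built-in `cars` data as a stand-in; the `direction` column is a hypothetical colouring factor.

```r
library(ggplot2)

# Stand-in model on a built-in dataset
fit <- lm(dist ~ speed, data = cars)

res <- data.frame(resid = resid(fit))
res$rank <- rank(res$resid, ties.method = "first")
# Hypothetical third factor used for colouring
res$direction <- ifelse(res$resid >= 0, "positive", "negative")

# Each segment runs vertically from (rank, resid) down or up to (rank, 0)
p <- ggplot(res, aes(x = rank, y = resid, colour = direction)) +
  geom_segment(aes(xend = rank, yend = 0)) +
  theme_bw()
p
```

Because x and xend share the same value, every segment is vertical, and any column mapped to colour inside aes() will colour the lines.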