I am trying to fit a non-linear regression to a set of data. However, when plotted, R draws many different lines where there should be only one.
This problem is only reproducible in one set of data, and I can't see any obvious difference between this data and the others.
This is the code for my plot:
plot(df$logFC, df$log_pval,
xlim=c(0,11.1), ylim=c(0,11),
xlab = "logFC", ylab = "p_val")
c <- df$logFC
d <- df$log_pval
model = nls(d ~ a*exp(b*c), start = list(a = 2,b = 0.1))
lines(c, predict(model), col = "dodgerblue", lty = 2, lwd = 2)
And here is a sample of my data (df):
logFC log_pval
4.315 2.788
6.724 9.836
2.925 4.136
5.451 10.836
2.345 1.486
4.219 7.618
I have narrowed the problem down to the model, but I'm not sure where to go from there. Any help is greatly appreciated!
1) ggplot method
I tried graphing the data using ggplot2 and I think the output is more what you were expecting...
library(tibble)
library(ggplot2)
library(dplyr)
# Create dataset
df <- tibble::tribble(~logFC, ~log_pval,
4.315, 2.788,
6.724, 9.836,
2.925, 4.136,
5.451, 10.836,
2.345, 1.486,
4.219, 7.618)
# Extract some vectors
c <- df$logFC
d <- df$log_pval
# Your model
model <- nls(d ~ a*exp(b*c), start = list(a = 2,b = 0.1))
# Create second dataset for new plotting
df2 <- tibble(logFC = c, log_pval =predict(model))
# Plot output
ggplot() +
geom_line(data = df2, aes(x = logFC, y = log_pval)) +
geom_point(data = df, aes(x =logFC, y =log_pval)) +
theme_classic()
2) base method
If you want to stick to base try ordering the x variables in the data frame before plotting the lines:
plot(df$logFC, df$log_pval,
xlab = "logFC", ylab = "p_val")
df3 <- tibble(x = df$logFC, y = predict(model)) %>% dplyr::arrange(x)
lines(df3$x, df3$y, col = "dodgerblue", lty = 1, lwd = 1)
It can be achieved with ggplot. More customization can be added to the plot if needed.
library(ggplot2)
ggplot(df) + aes(x = logFC, y = log_pval) + geom_point() +
geom_line(aes(x = c, y = predict(model)))
data
df <- structure(list(logFC = c(4.315, 6.724, 2.925, 5.451, 2.345, 4.219
), log_pval = c(2.788, 9.836, 4.136, 10.836, 1.486, 7.618)), class =
"data.frame", row.names = c(NA, -6L))
c <- df$logFC
d <- df$log_pval
model = nls(d ~ a*exp(b*c), start = list(a = 2,b = 0.1))
Thanks for your help Klink and Ronak,
It turns out the issue was that the data were not ordered by x. lines() connects points in the order they are given, so the unordered x values were joined to their predicted y values in sequence, resulting in a zigzag across the predicted curve.
geom_line() in ggplot connects observations in order of the x variable, which is why the ggplot versions draw a single curve.
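For anyone who wants to stay in base R, one way to sidestep the ordering issue is to predict on a sorted grid of x values instead of at the observed points. This is only a minimal sketch, assuming the six sample rows posted above; the names `grid` and `fit` are mine and not from the original code.

```r
df <- data.frame(
  logFC    = c(4.315, 6.724, 2.925, 5.451, 2.345, 4.219),
  log_pval = c(2.788, 9.836, 4.136, 10.836, 1.486, 7.618)
)

# Fit with a data argument so that predict() can accept newdata
model <- nls(log_pval ~ a * exp(b * logFC), data = df,
             start = list(a = 2, b = 0.1))

# Evenly spaced, increasing x values across the observed range
# (`grid` and `fit` are illustrative names)
grid <- data.frame(logFC = seq(min(df$logFC), max(df$logFC), length.out = 100))
grid$fit <- predict(model, newdata = grid)

plot(df$logFC, df$log_pval, xlab = "logFC", ylab = "log_pval")
lines(grid$logFC, grid$fit, col = "dodgerblue", lty = 2, lwd = 2)
```

Because the grid is already increasing, lines() draws one curve, and the curve is also smoother than one drawn only through the six fitted points.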
Related
I have a dose-response curve and I need to calculate and plot a 4-parameter logistic regression using the R package drc and ggplot. It needs to have 1 as the upper limit and 0 as the lower limit. It works fine, except that the calculated ED50 value is different from the ED50 value read from the graph created by ggplot.
Here is a minimal example:
library(ggplot2)
library(drc)
df <- data.frame(dose = c("2000", "666.67", "222.22",
"74.04", "24.69", "8.23",
"2.74", "0.91", "0.30",
"0.10", "0.03", "0.01"),
response = c("0.569767442", "0.709302326", "0.767441860",
"0.712209302", "0.747093023", "0.723837209",
"0.71802326", "0.7558140", "0.7906977",
"0.7616279", "0.8197674", "1"))
df$dose <- as.numeric(df$dose)
df$response <- as.numeric(df$response)
# calculating the ED 50
logistic <- drm(response~dose,
data = df,
fct = LL.4(fixed = c(NA, 0, 1, NA)))
ED50 <- ED(logistic, 50)[1]
# plotting
ggplot(data = df, aes(x = dose, y = response))+
scale_x_continuous(trans = "log10", limits = c(1E-2, 2E6))+
scale_y_continuous()+
geom_smooth(method = drm,
method.args = list(fct = L.4(fixed = c(NA, 0, 1, NA)),
# "b", "c", "d", "e"
# L.4 and not LL.4 because the x scale is on a log10
se=FALSE,
fullrange = TRUE))+
geom_point(alpha = 0.5)+
geom_point(aes(x = ED50, y = 0.5), color = "red") +
coord_cartesian(xlim = c(1E-2,2E6), ylim = c(0,1))
It gives the following graph (the calculated ED50 value is marked in red).
Obviously, the calculated ED50 does not match the ED50 value suggested by the graph, even though I calculated both using the LL.4 method. The plot uses an L.4 on a log axis, which should be identical to LL.4 on a non-log scale, as suggested here.
I have no clue what the problem is or what I am missing. Thank you for your help and time in advance!
This doesn't really answer the question of why the two methods give you different ED50s; that might be better suited for CrossValidated. However, you can circumvent the issue of reconstituting your model on a log axis by using stat_function() + predict() instead of geom_smooth(). That way, you can be sure that the plotted curve comes from the same model that gave you the ED50 estimate.
library(ggplot2)
library(drc)
df <- data.frame(dose = c("2000", "666.67", "222.22",
"74.04", "24.69", "8.23",
"2.74", "0.91", "0.30",
"0.10", "0.03", "0.01"),
response = c("0.569767442", "0.709302326", "0.767441860",
"0.712209302", "0.747093023", "0.723837209",
"0.71802326", "0.7558140", "0.7906977",
"0.7616279", "0.8197674", "1"))
df$dose <- as.numeric(df$dose)
df$response <- as.numeric(df$response)
# calculating the ED 50
logistic <- drm(response~dose,
data = df,
fct = LL.4(fixed = c(NA, 0, 1, NA)))
ED50 <- ED(logistic, 50)[1]
ggplot(df, aes(dose, response)) +
stat_function(
fun = function(x) predict(logistic, newdata = data.frame(dose = x))
) +
geom_point(alpha = 0.5) +
geom_point(
aes(x = ED50, y = 0.5),
colour = "red"
) +
scale_x_continuous(trans = "log10")
Created on 2022-10-17 by the reprex package (v2.0.0)
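As a quick sanity check on the approach above, you can ask the fitted model for its prediction at the estimated ED50. With the lower limit fixed at 0 and the upper limit fixed at 1, that prediction should come out at about 0.5. A minimal sketch, assuming the same data as above (entered directly as numbers); `pred_at_ED50` is just an illustrative name.

```r
library(drc)

df <- data.frame(
  dose = c(2000, 666.67, 222.22, 74.04, 24.69, 8.23,
           2.74, 0.91, 0.30, 0.10, 0.03, 0.01),
  response = c(0.569767442, 0.709302326, 0.767441860,
               0.712209302, 0.747093023, 0.723837209,
               0.71802326, 0.7558140, 0.7906977,
               0.7616279, 0.8197674, 1)
)

# Same model as above: lower limit fixed at 0, upper limit fixed at 1
logistic <- drm(response ~ dose, data = df,
                fct = LL.4(fixed = c(NA, 0, 1, NA)))
ED50 <- ED(logistic, 50)[1]

# Prediction of the fitted model at the estimated ED50
pred_at_ED50 <- unname(predict(logistic, newdata = data.frame(dose = ED50)))
print(c(ED50 = ED50, response = pred_at_ED50))
```

If the red point sits on the curve drawn by stat_function(), the plot and the estimate agree by construction, since both come from `logistic`.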
I have a boxplot from the code below and I want to add median values.
boxplot(ndvi_pct_sep ~ edge_direction, data = data_sample,
subset = edge_direction %in% c(64, 4, 1, 16),
ylab = "NDVI2028-2016", xlab = "Forest edge direction",
names = c("north", "south", "east", "west"))
I want to add the median values to the boxplots, any idea how to do it?
It will likely involve using legends. Since I don't have your data I can't make it perfect, but the code below should get you started using the ToothGrowth data that ships with R. I am showing a base R and a ggplot example (I know you said no ggplot, but others may use it).
# Load libraries
library(dplyr); library(ggplot2)
# get median data
mediandata <- ToothGrowth %>% group_by(dose) %>% summarise(median = median(len, na.rm = TRUE))
l <- unname(unlist(mediandata))
tg <- ToothGrowth # for convenience
tg$dose <- as.factor(tg$dose)
### Base R approach
boxplot(len ~ dose, data = tg,
main = "Guinea Pigs' Tooth Growth",
xlab = "Vitamin C dose mg",
ylab = "tooth length", col = "red")
for (i in 1:3){
legend(i-0.65,l[i+3]+5, legend = paste0("Median: ",l[i+3]), bty = "n")
}
### ggplot approach
ggplot(data = tg, aes(dose, len)) +
theme_classic() + theme(legend.position = "none") +
geom_boxplot()+
annotate("text",
x = c(1,2,3),
y = l[4:6]+1, # shift so you can read it
label = l[4:6])
Here's a straightforward solution with text() and without a for loop:
Toy data:
set.seed(12)
df <- data.frame(
var1 = sample(LETTERS[1:4], 100, replace = TRUE),
var2 = rnorm(100)
)
Calculate the medians:
library(dplyr)
med <- df %>%
group_by(var1) %>%
summarise(medians = median(var2)) %>%
pull(medians)
Alternatively, in base R:
bx <- boxplot(df$var2 ~ df$var1)
med <- bx$stats[3,1:4]
Boxplot:
boxplot(df$var2 ~ df$var1)
Annotate boxplots:
text(1:4, med, round(med,3), pos = 3, cex = 0.6)
You can do
b <- boxplot(count ~ spray, data = InsectSprays, col = "lightgray", boxwex=.2)
s <- b$stats
text(1:ncol(s)+.4, s[3,], round(s[3,],1), col="red")
This question is an extension of a previous posting: ggplot and loops
I was using the example above to generate bar graphs in a loop. I modified the above example, to generate a bar graph and corresponding error bars. I somewhat succeeded, however the error bars do not populate according to the individual variable.
I would appreciate the help very much!
My modifications:
library(ggplot2)
library(stringr) # for str_remove()
#make a dummy dataframe
D <- data.frame(
x1 = runif(20),
x2 = rnorm(20),
x1_se = runif(20, 0.01, 0.09),
x2_se = runif(20, -1, 1),
treatment = rep(c("control","test"), each = 10)
)
# for reference later
p_names <- c("treatment","x1","x2")
se_names <- c("treatment","x1_se","x2_se")
trt <- rep(c("control","test"), each = 10)
# subset the standard error into its own dataframe
se <- D[,se_names]
names(se) <- str_remove(names(se), "_se")
plots <- list()
# the loop
for(nm in p_names) {
#trt <- trt
plots[[nm]] <- ggplot(data= D, aes(x = trt, fill = trt)) +
geom_bar(aes_string(y = D[[nm]]), stat="identity", position = "dodge", color = "black") +
geom_errorbar(aes(ymin= D[[nm]] - se[[nm]],
ymax= D[[nm]] + se[[nm]]), position=position_dodge(.9)) + ylab(nm)
}
print(plots[["x1"]])
print(plots[["x2"]])
It doesn't make sense to have one s.e. per observation. If you are trying to plot a mean and s.e. barplot for each column, do the following:
p_names = c("x1","x2")
for(nm in p_names) {
plots[[nm]] <- ggplot(data= D,aes_string(x ="treatment",y=nm,fill="treatment"))+
stat_summary(fun=mean,color = "black",geom="bar") + # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
stat_summary(fun.data=mean_se,geom="errorbar",width=0.2)
}
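If you would rather compute the means and standard errors yourself and hand ggplot a small summary table per variable, something along these lines should work. It is only a sketch under my own assumptions: `summarise_var()` is a hypothetical helper, and the s.e. is taken as sd / sqrt(n) within each treatment.

```r
library(ggplot2)

set.seed(42)
D <- data.frame(
  x1 = runif(20),
  x2 = rnorm(20),
  treatment = rep(c("control", "test"), each = 10)
)

# Hypothetical helper: per-treatment mean and standard error of one column
summarise_var <- function(data, var) {
  means <- tapply(data[[var]], data$treatment, mean)
  ses   <- tapply(data[[var]], data$treatment,
                  function(x) sd(x) / sqrt(length(x)))
  data.frame(treatment = names(means),
             mean = as.numeric(means),
             se = as.numeric(ses))
}

# One plot per variable; each plot carries its own summary data frame
plots <- lapply(setNames(nm = c("x1", "x2")), function(nm) {
  s <- summarise_var(D, nm)
  ggplot(s, aes(x = treatment, y = mean, fill = treatment)) +
    geom_col(colour = "black") +
    geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2) +
    ylab(nm)
})

print(plots[["x1"]])
print(plots[["x2"]])
```

Since every plot is built from its own two-row table, the error bars always belong to the variable being plotted.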
I'm attempting to draw tiles / rectangles to get the following result:
library(tidyverse)
library(plotly)
set.seed(0)
df <- tibble(
a = runif(5),
b = runif(5),
c = runif(5),
d = runif(5),
case_id = 1:5
) %>% tidyr::pivot_longer(cols = -case_id)
plot <- ggplot2::ggplot(
data = df,
mapping = aes(
x = name,
y = value,
group = case_id
)
) + geom_point()
plot_boxes_y <- seq(from = 0, to = 1, by = .2)
plot_boxes_x <- unique(df$name) %>% length()
for (x in 1:plot_boxes_x) {
for (y in plot_boxes_y) {
plot <- plot + geom_rect(
mapping = aes_(
xmin = x - .5,
xmax = x + .5,
ymin = y - .5,
ymax = y + .5
),
color = "red",
fill = NA
)
}
}
plotly::ggplotly(plot)
As you can see, I currently do this by looping through coordinates and drawing each rectangle individually. The problem is that this generates many layers, which makes plotly::ggplotly() really slow on large datasets.
Therefore, I'm looking for a more efficient way. Please note that I cannot use the panel.grid, since I intend to visualize z-data by filling rectangles later on.
My approach was to draw geom_tile() on top of the scatter plot:
# my attempt
df$z <- rep(0, nrow(df))
plot2 <- ggplot2::ggplot(
data = df,
mapping = aes(
x = name,
y = value,
color = z,
group = case_id
)
) + geom_point() + geom_tile()
I assume that this fails because name is a discrete variable? So, how can I efficiently draw tiles in addition to my scatterplot?
Thanks
Here is a solution using the geom_tile option. The key here is creating a data frame to hold the coordinates of the grid and then specifying the aesthetics individually in each of the function calls.
library(ggplot2)
library(tidyr)
set.seed(0)
df <- tibble(
a = runif(5),
b = runif(5),
c = runif(5),
d = runif(5),
case_id = 1:5
) %>% pivot_longer(cols = -case_id)
df$z <- rep(0, nrow(df))
#make data frame for the grid coordinates
grid<-data.frame(x=factor( ordered( 1:4), labels = c("a", "b", "c", "d" )),
y=rep(seq(0, 1, .1), each=4))
#plot using geom_tile & geom_point
plot2 <- ggplot2::ggplot() + geom_tile(data=grid, aes(x=x, y=y), fill=NA, col="red") +
geom_point(data = df,
mapping = aes(
x = name,
y = value,
color = z,
group = case_id))
print(plot2)
If you don't mind them going beyond the axis:
ggplot(df,aes(x=name,y=value)) + geom_point() +
geom_vline(xintercept=seq(0.5,4.5,by=1)) +
geom_hline(yintercept=seq(0,2,by=.2))
Otherwise:
#make a new data frame
GRIDS = rbind(
# the vertical lines
data.frame(x=seq(0.5,4.5,by=1),xend=seq(0.5,4.5,by=1),y=0,yend=2),
# the horizontal lines
data.frame(x=0.5,xend=4.5,y=seq(0,2,by=.2),yend=seq(0,2,by=.2))
)
ggplot(df,aes(x=name,y=value)) + geom_point() +
geom_segment(data=GRIDS,aes(x=x,y=y,xend=xend,yend=yend),col="red")
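Since the rectangles are meant to be filled by z later on, another option is to keep one row per grid cell in a data frame and draw them all with a single geom_rect() layer. This is a rough sketch, not a drop-in replacement: the `cells` data frame and its random `z` column are made up for illustration, and the point layer is added first (as in the question's own code) so that the x scale is set up as discrete before the numeric cell edges are drawn.

```r
library(ggplot2)

set.seed(0)
df <- data.frame(
  name  = rep(c("a", "b", "c", "d"), each = 5),
  value = runif(20)
)

# One row per grid cell: column position x (1..4) and the cell's lower edge
cells <- expand.grid(x = 1:4, ymin = seq(0, 0.8, by = 0.2))
cells$ymax <- cells$ymin + 0.2
cells$z <- runif(nrow(cells))  # hypothetical z values, for illustration only

p <- ggplot(df, aes(x = name, y = value)) +
  geom_point() +  # discrete layer first
  geom_rect(
    data = cells,
    mapping = aes(xmin = x - 0.5, xmax = x + 0.5,
                  ymin = ymin, ymax = ymax, fill = z),
    inherit.aes = FALSE,  # cells has no name/value columns
    colour = "red", alpha = 0.3
  )
print(p)
```

All 20 cells live in one layer, so the number of layers no longer grows with the size of the grid.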
This is my first Stack Overflow post and I am a relatively new R user, so please go gently!
I have a data frame with three columns, a participant identifier, a condition (factor with 2 levels either Placebo or Experimental), and an outcome score.
set.seed(1)
dat <- data.frame(Condition = c(rep("Placebo",10),rep("Experimental",10)),
Outcome = rnorm(20,15,2),
ID = factor(rep(1:10,2)))
I would like to construct a bar plot with two bars showing the mean outcome score for each condition, with the standard deviation as an error bar. I would then like to overlay lines connecting the points for each participant's score in each condition, so that the plot displays the individual responses as well as the group mean. If it is also possible, I would like to include an axis break.
I don't seem to be able to find any advice in other threads, apologies if I am repeating a question.
Many Thanks.
p.s. I realise that presenting data in this way will not be to everyone's taste. It is for a specific requirement!
This ought to work:
library(ggplot2)
library(dplyr)
dat.summ <- dat %>% group_by(Condition) %>%
summarize(mean.outcome = mean(Outcome),
sd.outcome = sd(Outcome))
ggplot(dat.summ, aes(x = Condition, y = mean.outcome)) +
geom_bar(stat = "identity") +
geom_errorbar(aes(ymin = mean.outcome - sd.outcome,
ymax = mean.outcome + sd.outcome),
color = "dodgerblue", width = 0.3) +
geom_point(data = dat, aes(x = Condition, y = Outcome),
color = "firebrick", size = 1.2) +
geom_line(data = dat, aes(x = Condition, y = Outcome, group = ID),
color = "firebrick", size = 1.2, alpha = 0.5) +
scale_y_continuous(limits = c(0, max(dat$Outcome)))
Some people are better with ggplot's stat functions and arguments than I am and might do it differently. I prefer to just transform my data first.
set.seed(1)
dat <- data.frame(Condition = c(rep("Placebo",10),rep("Experimental",10)),
Outcome = rnorm(20,15,2),
ID = factor(rep(1:10,2)))
dat.w <- reshape(dat, direction = 'wide', idvar = 'ID', timevar = 'Condition')
means <- colMeans(dat.w[, 2:3])
sds <- apply(dat.w[, 2:3], 2, sd)
ci.l <- means - sds
ci.u <- means + sds
ci.width <- .25
bp <- barplot(means, ylim = c(0,20))
segments(bp, ci.l, bp, ci.u)
segments(bp - ci.width, ci.u, bp + ci.width, ci.u)
segments(bp - ci.width, ci.l, bp + ci.width, ci.l)
segments(x0 = bp[1], x1 = bp[2], y0 = dat.w[, 2], y1 = dat.w[, 3], col = 1:10)
points(c(rep(bp[1], 10), rep(bp[2], 10)), dat$Outcome, col = 1:10, pch = 19)
Here is a method using the transformations inside ggplot2:
ggplot(dat) +
stat_summary(aes(x=Condition, y=Outcome, group=Condition), fun="mean", geom="bar") + # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
stat_summary(aes(x=Condition, y=Outcome, group=Condition), fun.data="mean_se", geom="errorbar", col="green", width=.8, size=2) +
geom_line(aes(x=Condition, y=Outcome, group=ID), col="red")
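Note that mean_se draws standard-error bars, whereas the question asked for the standard deviation. If you want SD bars inside stat_summary(), you can pass your own summary function to fun.data. A small sketch, assuming ggplot2 3.3.0 or later for the `fun` argument; `mean_sd()` is a helper I made up, not a ggplot2 function.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(Condition = c(rep("Placebo", 10), rep("Experimental", 10)),
                  Outcome = rnorm(20, 15, 2),
                  ID = factor(rep(1:10, 2)))

# Hypothetical helper: fun.data expects a data frame with y, ymin and ymax
mean_sd <- function(x) {
  data.frame(y = mean(x), ymin = mean(x) - sd(x), ymax = mean(x) + sd(x))
}

p <- ggplot(dat, aes(x = Condition, y = Outcome)) +
  stat_summary(fun = mean, geom = "bar", fill = "grey80") +
  stat_summary(fun.data = mean_sd, geom = "errorbar", width = 0.3) +
  geom_line(aes(group = ID), colour = "red") +  # one line per participant
  geom_point(colour = "red")
print(p)
```

The bars and error bars are computed per condition from the raw data, and the lines join each participant's two scores via group = ID.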