Not adjusted (?) trend line, ggplot [R]

Maybe it's an understanding error but with this data and code...
data %>%
  ggplot(aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE, fullrange = TRUE) +
  ggpubr::stat_cor(label.x = 3) +
  scale_y_log10(breaks = scales::trans_breaks("log10", function(x) 10^x),
                labels = scales::trans_format("log10", scales::math_format(10^.x)))
... I'm getting this plot:
I'd expect a different trend line (manually drawn in red), and my guess is that the -Inf values are somehow affecting the fit. Could anyone help me understand the error, please? Thanks.
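One possible explanation, sketched under the assumption that y contains zeros (the original data are not shown): scale_y_log10() transforms y before geom_smooth() fits its model, so the line is a linear fit to log10(y), and any zero becomes -Inf and cannot take part in that fit. The vectors x and y below are made up for illustration only.

```r
# Made-up data for illustration; the zero stands in for a value that
# becomes -Inf once a log10 transformation is applied.
x <- 1:5
y <- c(0, 1, 10, 100, 1000)

d <- data.frame(x = x, log_y = log10(y))  # first log_y is -Inf
keep <- is.finite(d$log_y)                # non-finite values cannot enter a linear fit

# Straight line fitted on the log10 scale, using only the finite values
fit_log <- lm(log_y ~ x, data = d[keep, ])
coef(fit_log)
```

If this is what is happening, the fitted line describes only the finite points on the log scale, which may differ from a line drawn by eye through all the plotted points.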

Related

How to find slopes of multiple regression?

I'm making a plot of several linear regressions and I would like to find the slope of each of them. The problem is that I can't find how to do it in my case.
As you can see in my plot, I'm testing weight as a function of temperature, a quality (my two colors) and a quantity (my facet wrap).
My code for this plot is:
g = ggplot(donnees_tot, aes(x=temperature, y=weight, col = quality))+
geom_point(aes(col=quality), size = 3)+
geom_smooth(method="lm", span = 0.8,aes(col=quality, fill=quality))+
scale_color_manual(values=c("S" = "aquamarine3",
"Y" = "darkgoldenrod3"))+
scale_fill_manual(values=c("S" = "aquamarine3",
"Y" = "darkgoldenrod3"))+
scale_x_continuous(breaks=c(20,25,28), limits=c(20,28))+
annotate("text", x= Inf, y = - Inf, label =eqn, parse = T, hjust=1.1, vjust=-.5)+
facet_wrap(~quantity)
g
Also, if you have any tips on how to write them on my plot, I would be really grateful!
Thank you
By using the ggpmisc package, I've added these lines to my code and it works!
stat_poly_line() +
stat_poly_eq(aes(label = paste(after_stat(eq.label),
after_stat(rr.label), sep = "*\", \"*"))) +
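If the slopes are also needed as numbers, one option is to fit a separate lm per quality and quantity combination, which mirrors what geom_smooth(method = "lm") does per colour and facet. This is only a sketch: donnees_demo is a made-up stand-in for donnees_tot, with weights constructed so that the slopes are known in advance.

```r
# Hypothetical stand-in for donnees_tot: four quality/quantity groups,
# each with an exactly linear weight ~ temperature relationship.
temps <- c(20, 25, 28)
donnees_demo <- data.frame(
  temperature = rep(temps, times = 4),
  quality     = rep(c("S", "Y"), each = 3, times = 2),
  quantity    = rep(c("low", "high"), each = 6),
  weight      = c(1 * temps, 2 * temps, 3 * temps, 4 * temps) + 5
)

# One data frame per quality/quantity combination
groups <- split(donnees_demo, list(donnees_demo$quality, donnees_demo$quantity))

# Fit a simple linear model in each group and keep the temperature slope
slopes <- sapply(groups, function(d) {
  coef(lm(weight ~ temperature, data = d))[["temperature"]]
})
slopes
```

The result is a named vector with one slope per combination (names such as "S.low"), which can be inspected directly or merged back into a data frame for labelling.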

How to remove na from ggplot scatterplot

Is anyone able to help me with my ggplot, please? I have tried multiple ways to remove the NA values from the plot, including na.rm = TRUE, na.rm = FALSE, and placing these in different areas of the code. I have also tried using na.omit, but this removes all data in the dataframe instead of just the NA values.
Birth_Sex_Plot <- ggplot(sarah_data2, aes(x=days_birth_measurement, y=hc_birth, colour= Autism)) +
theme_classic()+ ylab("HC_Birth") + xlab("Days since measurement")
Birth_Sex_Plot + geom_point() + geom_smooth(method = lm, se=FALSE, fullrange = FALSE, na.rm=FALSE)
Any help would be really appreciated. Thank you
Without knowing your data, this should do the job. You can subset your data inside the ggplot so that you remove NA values from your Autism column. You can use the following code:
library(ggplot2)
Birth_Sex_Plot <- ggplot(data=subset(sarah_data2, !is.na(Autism)), aes(x=days_birth_measurement, y=hc_birth, colour= Autism)) +
theme_classic()+ ylab("HC_Birth") + xlab("Days since measurement")
Birth_Sex_Plot + geom_point() + geom_smooth(method = lm, se=FALSE, fullrange = FALSE)
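The difference between the two approaches can be seen on a small made-up data frame (demo and its notes column are hypothetical, not taken from sarah_data2): na.omit() drops every row that has an NA in any column, whereas subsetting on a single column only drops rows where that column is NA.

```r
# Made-up example: 'notes' is NA everywhere, 'Autism' is NA in one row only
demo <- data.frame(
  hc_birth = c(34, 35, 36, 33),
  Autism   = c("yes", NA, "no", "no"),
  notes    = c(NA, NA, NA, NA)
)

# na.omit() removes any row containing at least one NA, in any column
rows_after_na_omit <- nrow(na.omit(demo))

# subset() on one column keeps rows where only that column is non-missing
rows_after_subset <- nrow(subset(demo, !is.na(Autism)))

c(na_omit = rows_after_na_omit, subset = rows_after_subset)
```

Here na.omit() leaves no rows at all because every row has a missing notes value, which would be consistent with the behaviour described in the question if sarah_data2 has a column with many NAs.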

Adding a regression trend line and a shaded standard error area to a ggplot for regression models that geom_smooth does not handle

I have a data.frame with observed success/failure outcomes per two groups along with expected probabilities:
library(dplyr)
observed.probability.df <- data.frame(group = c("A","B"), p = c(0.4,0.6))
expected.probability.df <- data.frame(group = c("A","B"), p = qlogis(c(0.45,0.55)))
observed.data.df <- do.call(rbind,lapply(c("A","B"), function(g)
data.frame(group = g, value = c(rep(0,1000*dplyr::filter(observed.probability.df, group != g)$p),rep(1,1000*dplyr::filter(observed.probability.df, group == g)$p)))
)) %>% dplyr::left_join(expected.probability.df)
observed.probability.df$group <- factor(observed.probability.df$group, levels = c("A","B"))
observed.data.df$group <- factor(observed.data.df$group, levels = c("A","B"))
I'm fitting a logistic regression (binomial glm with a logit link function) to these data with the offset term:
fit <- glm(value ~ group + offset(p), data = observed.data.df, family = binomial(link = 'logit'))
Now, I'd like to plot these data as a bar graph using ggplot2's geom_bar, color-coded by group, and to add to that the trend line and shaded standard error area estimated in fit.
I'd use stat_smooth for that, but I don't think it can handle the offset term in its formula, so it looks like I need to assemble this figure in an alternative way.
To get the bars and the trend line I used:
slope.est <- function(x, ests) plogis(ests[1] + ests[2] * x)
library(ggplot2)
ggplot(observed.probability.df, aes(x = group, y = p, fill = group)) +
geom_bar(stat = 'identity') +
stat_function(fun = slope.est,args=list(ests=coef(fit)),size=2,color="black") +
scale_x_discrete(name = NULL,labels = levels(observed.probability.df$group), breaks = sort(unique(observed.probability.df$group))) +
theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")
So the question is how to add to that the shaded standard error around the trend line?
Using stat_function I am able to shade the entire area from the upper bound of the standard error all the way down to the X-axis:
ggplot(observed.probability.df, aes(x = group, y = p, fill = group)) +
geom_bar(stat = 'identity') +
stat_function(fun = slope.est,args=list(ests=coef(fit)),size=2,color="black") +
stat_function(fun = slope.est,args=list(ests=summary(fit)$coefficients[,1]+summary(fit)$coefficients[,2]),geom='area',fill="gray",alpha=0.25) +
scale_x_discrete(name = NULL,labels = levels(observed.probability.df$group), breaks = sort(unique(observed.probability.df$group))) +
theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")
Which is close but not quite there.
Any idea how to subtract from the shaded area above the area that's below the lower bound of the standard error? Perhaps geom_ribbon is the way to go here, but I don't know how to combine it with the slope.est function
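One way to get both bounds at once, sketched here as an assumption rather than a confirmed solution, is to ask predict() for the fit and its standard error on the link scale and back-transform with plogis(). The data frame demo below is a simplified, made-up version of the data above (1000 observations per group with observed fractions 0.4 and 0.6), and fit_demo and newdata are illustrative names.

```r
# Simplified made-up data: 1000 binary outcomes per group
demo <- data.frame(
  group = factor(rep(c("A", "B"), each = 1000), levels = c("A", "B")),
  value = c(rep(0:1, c(600, 400)), rep(0:1, c(400, 600)))
)
# Expected probabilities on the logit scale, used as the offset
demo$p <- qlogis(ifelse(demo$group == "A", 0.45, 0.55))

fit_demo <- glm(value ~ group + offset(p), data = demo,
                family = binomial(link = "logit"))

# One row per group; the offset column must be supplied for prediction
newdata <- data.frame(
  group = factor(c("A", "B"), levels = c("A", "B")),
  p     = qlogis(c(0.45, 0.55))
)

# Fit and standard error on the link (logit) scale
pred <- predict(fit_demo, newdata = newdata, type = "link", se.fit = TRUE)

# Back-transform the fit and the +/- 1 SE bounds to probabilities
newdata$fit   <- plogis(as.numeric(pred$fit))
newdata$lower <- plogis(as.numeric(pred$fit - pred$se.fit))
newdata$upper <- plogis(as.numeric(pred$fit + pred$se.fit))
newdata
```

The lower and upper columns could then be mapped to ymin and ymax in geom_ribbon() or geom_errorbar(), so that only the band between the two bounds is shaded instead of everything down to the x-axis.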

Plotting standard error bars

I have a long-format dataset with 3 variables. I'm plotting two of the variables and faceting by the third, using ggplot2. I'd like to plot the standard error bars of the observations in each facet too, but I have no idea how. Does anyone know?
Here's a picture of what I've got. I'd like to have the standard error bars on each facet. Thanks!!
Edit: here's some example data and the plot.
data <- data.frame(rep(c("1","2","3","4","5","6","7","8","9","10",
"11","12","13","14","15","16","17","18","19","20",
"21","22","23","24","25","26","27","28","29","30",
"31","32"), 2),
rep(c("a","b","c","d","e","f","g","h","i","j","k","l"), 32),
rnorm(n = 384))
colnames(data) <- c("estado","sector","VA")
ggplot(data, aes(x = estado, y = VA, col = sector)) +
facet_grid(.~sector) +
geom_point()
If all you want is the mean & standard error bar associated with each "estado"-"sector" combination, you can leave ggplot to do all the work, by replacing the geom_point() line with stat_summary():
ggplot(data,
aes(x = estado, y = VA, col = sector)) +
facet_grid(. ~ sector) +
stat_summary(fun.data = mean_se)
See ?mean_se from the ggplot2 package for more details on the function. The default parameter option gives you the mean as well as the range for 1 standard error above & below the mean.
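For reference, the summary that mean_se produces by default can be reproduced by hand in base R. This is a minimal sketch on a made-up vector va; the column names y, ymin and ymax follow the ones mean_se returns.

```r
# Made-up values standing in for the VA observations of one group
va <- c(1, 2, 3, 4, 5)

centre <- mean(va)
se     <- sd(va) / sqrt(length(va))  # standard error of the mean

# Same layout as the mean_se() result: mean, mean - 1 SE, mean + 1 SE
summary_row <- data.frame(y = centre, ymin = centre - se, ymax = centre + se)
summary_row
```

stat_summary(fun.data = mean_se) computes a row like this for every x value within each facet and draws it as a point with a vertical range.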
If you want to show the original points, just add back the geom_point() line. (Though I think the plot would be rather cluttered for the reader, in that case...)
Maybe you could try something like below?
set.seed(1)
library(dplyr)
dat = data.frame(estado = factor(rep(1:32, 2)),
sector = rep(letters[1:12], 32),
VA = rnorm(384))
se = function(x) {
sd(x)/sqrt(length(x))
}
dat_sum = dat %>% group_by(estado, sector) %>%
summarise(mu = mean(VA), se = se(VA))
dat_plot = full_join(dat, dat_sum)
ggplot(dat_plot, aes(estado, y = VA, color = sector)) +
geom_jitter() +
geom_errorbar(aes(estado, y = mu, color = sector,
ymin = mu - se, ymax = mu + se)) +
facet_grid(.~sector)

geom_smooth does not appear on ggplot

I am working on some viscosity experiments and I'm trying to make an Eyring plot with ν vs. θ.
When I create the plot with ggplot2 I can't get my model displayed.
These are the values used:
> theta
[1] 25 30 35 40 45
> nu
[1] 1.448462 1.362730 1.255161 1.167408 1.083005
Here I create the plot with my values from above:
plot <-
ggplot()+
geom_point(mapping = aes(theta, nu), colour = "#0072bd", size = 4, shape = 16)+
theme_bw()+
labs(
x = expression(paste(theta, " ", "[°C]")),
y = expression(paste("ln(", nu, ")", " ", "[mPa*s]")))+
ylim(0, 10)+
xlim(0, 100)
That's what the plot looks like.
Now, I add my model with geom_smooth()
plot +
geom_smooth(
method = "nls",
method.args = list(formula = nu~a*exp(b/theta),
start=list(a=1, b=0.1)))
But nothing happens... Not even an error message and the plot looks just the same as before.
I also tried to put the formula directly as a geom_smooth() argument and the start values as well,
plot +
geom_smooth(
method = "nls",
formula = nu~a*exp(b/theta),
start=list(a=1, b=0.1))
but then I get the
Error:Unknown parameter: start
Can anyone find the mistake I'm making?
Thanks in advance!
Cheers
EDIT
When separating the aesthetics mapping,
plot <-
ggplot()+
aes(theta, nu)+
geom_point(colour = "#0072bd", size = 4, shape = 16)+
theme_bw()+
labs(
x = expression(paste(theta, " ", "[°C]")),
y = expression(paste("ln(", nu, ")", " ", "[mPa*s]")))+
ylim(0, 10)+
xlim(0, 100)
I get the following error (and still nothing changes):
Warning message:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: Computation failed in stat_smooth():
$ operator is invalid for atomic vectors
You have several things going on, many of which were pointed out in the comments.
Once you put your variables in a data.frame for ggplot and define your aesthetics either globally in ggplot or within each geom, the main issue is that the formula in geom_smooth expects you to refer to y and x instead of the variable names. geom_smooth will use the variables you mapped to y and x in aes.
The other complication you will run into is outlined here. Because you don't get standard errors from predict.nls, you need to use se = FALSE in geom_smooth.
Here is what your geom_smooth code might look like:
geom_smooth(method = "nls", se = FALSE,
method.args = list(formula = y~a*exp(b/x), start=list(a=1, b=0.1)))
And here is the full code and plot.
ggplot(df, aes(theta, nu))+
geom_point(colour = "#0072bd", size = 4, shape = 16)+
geom_smooth(method = "nls", se = FALSE,
method.args = list(formula = y~a*exp(b/x), start=list(a=1, b=0.1))) +
theme_bw()+
labs(
x = expression(paste(theta, " ", "[°C]")),
y = expression(paste("ln(", nu, ")", " ", "[mPa*s]")))+
ylim(0, 10) +
xlim(0, 100)
Note that geom_smooth won't fit outside the range of the dataset unless you use fullrange = TRUE instead of the default. This may be pretty questionable if you only have 5 data points.
ggplot(df, aes(theta, nu))+
geom_point(colour = "#0072bd", size = 4, shape = 16)+
geom_smooth(method = "nls", se = FALSE, fullrange = TRUE,
method.args = list(formula = y~a*exp(b/x), start=list(a=1, b=0.1))) +
theme_bw()+
labs(
x = expression(paste(theta, " ", "[°C]")),
y = expression(paste("ln(", nu, ")", " ", "[mPa*s]")))+
ylim(0, 10) +
xlim(0, 100)
I just wrote this answer as @lukeA made the comment.
df<- data.frame(theta = c(25, 30, 35, 40, 45),
nu = c( 1.448462, 1.362730, 1.255161, 1.167408, 1.083005))
myModel <- nls(nu~a*exp(b/theta), data=df, start=list(a=1, b=0.1))
myPredict <- expand.grid(theta = seq(5, 100, by =0.1))
#expand.grid here in case your model has more than one variable
#Caution, extrapolating well beyond the data
myPredict$fit <- predict(myModel, newdata= myPredict)
plot + geom_line(data = myPredict, aes(x= theta, y= fit))
