Show R^2 value of the fit on the plot - r

I am developing a Shiny app that generates scatterplots from uploaded .txt data files.
I am fitting a polynomial to the scatterplot and want the plot to show the R^2 value.
Here is my attempt:
#plot
g <- ggplot(data = df, aes(x = x, y = y)) + theme_bw() +
geom_point(colour = "blue", size = 0.1)+
geom_smooth(formula = y ~ poly(x,input$degree, raw = TRUE), method = "lm", color = "green3", level = 1, size = 0.5)+
stat_poly_eq(formula = y ~ poly(x,input$degree, raw = TRUE),aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), parse = TRUE)
ggplotly(g)
A slider sets the degree of the polynomial; its value is available as input$degree.
I used stat_poly_eq from the ggpmisc package to get the R^2 value.
But when I run this code, the R^2 value does not appear on the plot. According to the examples I have seen, it should appear on the plot like a legend.
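One possible workaround, sketched here under the assumption that the ggpmisc label is lost when the plot is converted with ggplotly: compute R^2 directly from the same polynomial model and add it as plain text. The data and the `degree` variable below are made-up stand-ins for the uploaded file and input$degree.

```r
# Simulated stand-in for the uploaded data (hypothetical values)
set.seed(42)
df <- data.frame(x = seq(0, 10, length.out = 200))
df$y <- 1 + 0.5 * df$x - 0.2 * df$x^2 + rnorm(200, sd = 1)

degree <- 2  # stands in for input$degree

# Fit the same model that geom_smooth(method = "lm") fits with this formula
fit <- lm(y ~ poly(x, degree, raw = TRUE), data = df)
r2 <- summary(fit)$r.squared

# Plain-text label that needs no plotmath parsing
r2_label <- sprintf("R^2 = %.3f", r2)
print(r2_label)
```

The label could then be attached to the plot before the call to ggplotly(g), for example as a title with ggtitle(r2_label).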

Related

How to find slopes of multiple regression?

I'm making a plot of several linear regressions and I would like to find the slope of each one, but I can't work out how to do it in my case.
As you can see on my plot, I'm modelling weight as a function of temperature, with quality as the two colors and quantity as the facets.
My code for this plot is:
g = ggplot(donnees_tot, aes(x=temperature, y=weight, col = quality))+
geom_point(aes(col=quality), size = 3)+
geom_smooth(method="lm", span = 0.8,aes(col=quality, fill=quality))+
scale_color_manual(values=c("S" = "aquamarine3",
"Y" = "darkgoldenrod3"))+
scale_fill_manual(values=c("S" = "aquamarine3",
"Y" = "darkgoldenrod3"))+
scale_x_continuous(breaks=c(20,25,28), limits=c(20,28))+
annotate("text", x= Inf, y = - Inf, label =eqn, parse = T, hjust=1.1, vjust=-.5)+
facet_wrap(~quantity)
g
Also, if you have any tips for writing the slopes on my plot, I would be really grateful!
Thank you
By using the ggpmisc package, I added these lines to my code and it works!
stat_poly_line() +
stat_poly_eq(aes(label = paste(after_stat(eq.label),
after_stat(rr.label), sep = "*\", \"*"))) +
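If the slopes are also needed as numbers rather than only as plot labels, one way is to fit a separate lm for each combination of quality and quantity. This is a minimal sketch on simulated data: the values below are invented, and only the column names are taken from the question.

```r
# Simulated stand-in for donnees_tot (hypothetical values)
set.seed(1)
donnees_tot <- expand.grid(
  temperature = c(20, 25, 28),
  quality = c("S", "Y"),
  quantity = c("low", "high"),
  rep = 1:5
)
donnees_tot$weight <- 10 + 0.8 * donnees_tot$temperature +
  rnorm(nrow(donnees_tot))

# One data frame per quality x quantity combination
groups <- split(donnees_tot, list(donnees_tot$quality, donnees_tot$quantity))

# Fit weight ~ temperature in each group and keep only the slope
slopes <- sapply(groups, function(d) {
  coef(lm(weight ~ temperature, data = d))[["temperature"]]
})
print(slopes)
```

These are the same per-group fits that geom_smooth(method = "lm") draws when color and facets split the data.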

ggplot for linear-log regression model?

How do I plot a log linear model in R?
Currently, I am doing this but am not sure if it's the right/efficient way:
data(food)
model1 <- lm(food_exp~log(income), data = food)
temp_var <- predict(model1, interval="confidence")
new_df <- cbind(food, temp_var)
head(new_df)
ggplot(new_df, aes(x = income, y = food_exp))+
geom_point() +
geom_smooth(aes(y=lwr), color = "red", linetype = "dashed")+
geom_smooth(aes(y=upr), color = "red", linetype = "dashed")+
geom_smooth(aes(y = fit), color = "blue")+
theme_economist()
You can use geom_smooth and put your formula directly in it. It should yield the same line as your fit (which you can check by also plotting the fitted values):
ggplot(new_df, aes(x = income, y = food_exp))+
geom_point() +
geom_point(aes(y=fit), color="red") + #your original fit
geom_smooth(method=lm, formula=y~log(x)) #ggplot fit
If you don't care about extracting the parameters and just want the plot, you can fit the model directly in ggplot2.
Some fake data for plotting:
library(tidyverse)
set.seed(454)
income <- VGAM::rpareto(n = 100, scale = 20, shape = 2)*1000
food_exp <- rnorm(100, income*.3+.1, 3)
food <- data.frame(income, food_exp)
Now within ggplot2, use the geom_smooth function and specify that you want a linear model. Additionally, you can directly transform the income in the aes argument:
ggplot(food, aes(x = log(income), y = food_exp))+
geom_point()+
geom_smooth(method = "lm")+
theme_bw()+
labs(
title = "Log Linear Model Food Expense as a Function of Log(income)",
x = "Log(Income)",
y = "Food Expenses"
)
This works for confidence intervals. To add prediction intervals, you'll need to do what you did earlier: fit the model and generate the prediction intervals yourself.
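A minimal sketch of that last step, assuming a linear-log model like the one in the question. The data are simulated (the values are invented), and the plot is only drawn if ggplot2 is installed.

```r
# Simulated stand-in for the food data (hypothetical values)
set.seed(454)
food <- data.frame(income = runif(100, 5, 35))
food$food_exp <- 50 + 80 * log(food$income) + rnorm(100, sd = 30)

model1 <- lm(food_exp ~ log(income), data = food)

# Predict on an ordered grid so the band is drawn smoothly from left to right
grid <- data.frame(
  income = seq(min(food$income), max(food$income), length.out = 100)
)
pred <- cbind(grid, predict(model1, newdata = grid, interval = "prediction"))
head(pred)  # columns: income, fit, lwr, upr

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(food, aes(x = income, y = food_exp)) +
    # Shaded prediction band built from the lwr/upr columns
    geom_ribbon(data = pred, aes(x = income, ymin = lwr, ymax = upr),
                inherit.aes = FALSE, alpha = 0.2) +
    geom_point() +
    # Fitted curve from the same model
    geom_line(data = pred, aes(y = fit), color = "blue")
  print(p)
}
```

Replacing interval = "prediction" with interval = "confidence" gives the narrower band for the mean response instead.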

Adding a regression trend line and a shaded standard error area to a ggplot for regression models that geom_smooth does not handle

I have a data.frame with observed success/failure outcomes per two groups along with expected probabilities:
library(dplyr)
observed.probability.df <- data.frame(group = c("A","B"), p = c(0.4,0.6))
expected.probability.df <- data.frame(group = c("A","B"), p = qlogis(c(0.45,0.55)))
observed.data.df <- do.call(rbind,lapply(c("A","B"), function(g)
data.frame(group = g, value = c(rep(0,1000*dplyr::filter(observed.probability.df, group != g)$p),rep(1,1000*dplyr::filter(observed.probability.df, group == g)$p)))
)) %>% dplyr::left_join(expected.probability.df)
observed.probability.df$group <- factor(observed.probability.df$group, levels = c("A","B"))
observed.data.df$group <- factor(observed.data.df$group, levels = c("A","B"))
I'm fitting a logistic regression (binomial glm with a logit link function) to these data with the offset term:
fit <- glm(value ~ group + offset(p), data = observed.data.df, family = binomial(link = 'logit'))
Now, I'd like to plot these data as a bar graph using ggplot2's geom_bar, color-coded by group, and to add to that the trend line and shaded standard error area estimated in fit.
I'd use stat_smooth for that, but I don't think it can handle the offset term in its formula, so it looks like I need to assemble this figure another way.
To get the bars and the trend line I used:
slope.est <- function(x, ests) plogis(ests[1] + ests[2] * x)
library(ggplot2)
ggplot(observed.probability.df, aes(x = group, y = p, fill = group)) +
geom_bar(stat = 'identity') +
stat_function(fun = slope.est,args=list(ests=coef(fit)),size=2,color="black") +
scale_x_discrete(name = NULL,labels = levels(observed.probability.df$group), breaks = sort(unique(observed.probability.df$group))) +
theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")
So the question is how to add to that the shaded standard error around the trend line?
Using stat_function I am able to shade the entire area from the upper bound of the standard error all the way down to the X-axis:
ggplot(observed.probability.df, aes(x = group, y = p, fill = group)) +
geom_bar(stat = 'identity') +
stat_function(fun = slope.est,args=list(ests=coef(fit)),size=2,color="black") +
stat_function(fun = slope.est,args=list(ests=summary(fit)$coefficients[,1]+summary(fit)$coefficients[,2]),geom='area',fill="gray",alpha=0.25) +
scale_x_discrete(name = NULL,labels = levels(observed.probability.df$group), breaks = sort(unique(observed.probability.df$group))) +
theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")
Which is close but not quite there.
Any idea how to remove from the shaded area the part that lies below the lower bound of the standard error? Perhaps geom_ribbon is the way to go here, but I don't know how to combine it with the slope.est function.
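One way the geom_ribbon idea could be combined with slope.est is to evaluate the function on a grid of x positions for both the lower and the upper coefficient sets. This is only a sketch on simulated data, and it keeps the question's own approach of shifting each coefficient by one standard error, which ignores the covariance between the coefficients. The data frames `obs`, `band` and `bars` are names introduced here for illustration.

```r
# Simulated stand-in for the question's data (hypothetical values)
set.seed(1)
obs <- data.frame(
  group = factor(rep(c("A", "B"), each = 1000), levels = c("A", "B")),
  value = c(rbinom(1000, 1, 0.4), rbinom(1000, 1, 0.6))
)
obs$p <- qlogis(ifelse(obs$group == "A", 0.45, 0.55))
fit <- glm(value ~ group + offset(p), data = obs,
           family = binomial(link = "logit"))

slope.est <- function(x, ests) plogis(ests[1] + ests[2] * x)

est <- unname(coef(fit))
se <- unname(summary(fit)$coefficients[, "Std. Error"])

# Evaluate the trend and both bounds between bar positions 1 (A) and 2 (B)
band <- data.frame(x = seq(1, 2, length.out = 50))
band$fit <- slope.est(band$x, est)
band$lwr <- slope.est(band$x, est - se)
band$upr <- slope.est(band$x, est + se)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  bars <- data.frame(group = factor(c("A", "B")), p = c(0.4, 0.6))
  p <- ggplot(bars, aes(x = group, y = p, fill = group)) +
    geom_col() +
    # Ribbon only between the lower and upper bound, not down to the axis
    geom_ribbon(data = band, aes(x = x, ymin = lwr, ymax = upr),
                inherit.aes = FALSE, fill = "gray", alpha = 0.5) +
    geom_line(data = band, aes(x = x, y = fit), inherit.aes = FALSE,
              color = "black")
  print(p)
}
```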

Adding regression line to graph

I am trying to add a linear regression model to my plot. I have this data frame:
watershed sqm cfs
3 deerfieldwatershed 1718617392 22703.8851
5 greenwatershed 233458430 1637.4895
6 northwatershed 240348182 3281.9921
8 southwatershed 68031782 867.6428
and my current code is:
ggplot(dischargevsarea, aes(x = sqm, y = cfs, color = watershed)) +
geom_point(aes(color = watershed), size = 2) +
labs(y= "Discharge (cfs)", x = "Area (sq. m)", color = "Watershed") +
scale_color_manual(values = c("#BAC4C1", "#37B795",
"#00898F", "#002245"),
labels = c("Deerfield", "Green", "North",
"South")) +
theme_minimal() +
geom_smooth(method = "lm", se = FALSE)
Which, when it runs, adds a line to the points in the legend, but does not show up on the graph (see image below). I suspect it is drawing a line individually for each point, but I want one regression line for all four points. How would I get the line I want to show up? Thanks.
You're right: because color is mapped in the top-level aes, your points are grouped by watershed. geom_smooth therefore fits a regression line for each category, which in your example means one line per single point. That's why you don't get a single regression line.
To get one regression line for all points, map color only in the aes of geom_point (or use inherit.aes = FALSE in geom_smooth so that it ignores the top-level mapping, and supply the x and y mapping there).
To display the equation on the graph (based on your question in the comments), you can use the stat_poly_eq function from the ggpmisc package (see the SO post "Add regression line equation and R^2 on graph"):
library(ggplot2)
library(ggpmisc)
ggplot(df, aes(x = sqm, y = cfs)) +
labs(y= "Discharge (cfs)", x = "Area (sq. m)", color = "Watershed") +
scale_color_manual(values = c("#BAC4C1", "#37B795",
"#00898F", "#002245"),
labels = c("Deerfield", "Green", "North",
"South")) +
theme_minimal() +
geom_smooth(method = "lm", se = FALSE, formula = y~x)+
stat_poly_eq(formula = y~x, aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
parse = TRUE)+
geom_point(aes(color = watershed))
Data
df <- structure(list(watershed = c("deerfieldwatershed", "greenwatershed",
"northwatershed", "southwatershed"), sqm = c(1718617392L, 233458430L,
240348182L, 68031782L), cfs = c(22703.8851, 1637.4895, 3281.9921,
867.6428)), row.names = c(NA, -4L), class = "data.frame")
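The grouping effect described in this answer can also be checked outside ggplot2. The sketch below uses the four rows from the question and compares a fit per watershed with a single pooled fit; it only illustrates why the grouped lines cannot be drawn and is not part of the original answer.

```r
df <- data.frame(
  watershed = c("deerfieldwatershed", "greenwatershed",
                "northwatershed", "southwatershed"),
  sqm = c(1718617392, 233458430, 240348182, 68031782),
  cfs = c(22703.8851, 1637.4895, 3281.9921, 867.6428)
)

# Grouped by watershed: each fit sees a single point, so the slope is NA
per_group <- sapply(split(df, df$watershed), function(d) {
  coef(lm(cfs ~ sqm, data = d))[["sqm"]]
})
print(per_group)

# Pooled: one fit through all four points gives a usable slope
pooled <- lm(cfs ~ sqm, data = df)
print(coef(pooled))
```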

Add legend using geom_point and geom_smooth from different dataset

I'm struggling to set the correct legend for a geom_point plot with a loess regression when two data sets are used.
I have a data set summarizing activity over a day. On the same graph I plot all the activity recorded per hour and per day, a regression curve smoothed with a loess function, and the mean for each hour across all days.
To be more precise, here is the first code and the graph it returns, without a legend, which is exactly what I expected:
# first graph, which gives what I expected but with no legend
p <- ggplot(dat1, aes(x = Hour, y = value)) +
geom_point(color = "darkgray", size = 1) +
geom_point(data = dat2, mapping = aes(x = Hour, y = mean),
color = 20, size = 3) +
geom_smooth(method = "loess", span = 0.2, color = "red", fill = "blue")
and the graph (in grey are all the data, per hour and per day; the red curve is the loess regression; the blue dots are the means for each hour):
When I tried to set the legend, I failed to produce one that explains both kinds of dots (data in grey, mean in blue) and the loess curve (in red). Below are some examples of what I tried.
# second graph, which gives what I expected plus the legend I wanted for the
# loess curve, but without the legend for the dots
p <- ggplot(dat1, aes(x = Hour, y = value)) +
geom_point(color = "darkgray", size = 1) +
geom_point(data = dat2, mapping = aes(x = Hour, y = mean),
color = "blue", size = 3) +
geom_smooth(method = "loess", span = 0.2, aes(color = "red"), fill = "blue") +
scale_color_identity(name = "legend model", guide = "legend",
labels = "loess regression \n with confidence interval")
I obtained the right legend, but for the curve only.
Another attempt:
# I tried to combine both data sets into a single one as follows, but it did not
# work at all, and I really do not understand how legends work in ggplot2
# compared to base plots
A <- rbind(dat1, dat2)
p <- ggplot(A, aes(x = Heure, y = value, color = variable)) +
geom_point(data = subset(A, variable == "data"), size = 1) +
geom_point(data = subset(A, variable == "Moy"), size = 3) +
geom_smooth(method = "loess", span = 0.2, aes(color = "red"), fill = "blue") +
scale_color_manual(name = "légende",
labels = c("Data", "Moy", "loess regression \n with confidence interval"),
values = c("darkgray", "royalblue", "red"))
It appears that all the legend settings are mixed together in a "weird" way: there is a grey dot covered by a grey line, then the same in blue and in red (for the 3 labels), and all have a background filled in blue:
If you need to label the mean, you might need to be a bit creative, because it's not so easy to add a legend manually in ggplot.
I simulate something that looks like your data below.
library(dplyr)
library(ggplot2)
dat1 = data.frame(
Hour = rep(1:24,each=10),
value = c(rnorm(60,0,1),rnorm(60,2,1),rnorm(60,1,1),rnorm(60,-1,1))
)
# classify this as raw data
dat1$Data = "Raw"
# calculate mean like you did
dat2 <- dat1 %>% group_by(Hour) %>% summarise(value=mean(value))
# classify this as mean
dat2$Data = "Mean"
# combine the data frames
plotdat <- rbind(dat1,dat2)
# add a dummy variable, we'll use it later
plotdat$line = "Loess-Smooth"
We make the basic dot plot first:
ggplot(plotdat, aes(x = Hour, y = value,col=Data,size=Data)) +
geom_point() +
scale_color_manual(values=c("blue","darkgray"))+
scale_size_manual(values=c(3,1),guide="none")
Note that for size we set guide to "none" so that it does not appear in the legend. Now we add the loess smooth. One way to give it a legend entry is to map a linetype; since there's only one group, the legend will have just one entry:
ggplot(plotdat, aes(x = Hour, y = value,col=Data,size=Data)) +
geom_point() +
scale_color_manual(values=c("blue","darkgray"))+
scale_size_manual(values=c(3,1),guide="none")+
# use == so that only the raw data is passed to the smoother
geom_smooth(data=subset(plotdat,Data=="Raw"),
aes(linetype=line),size=1,alpha=0.3,
method = "loess", span = 0.2, color = "red", fill = "blue")
