How to change font size of R^2 on scatterplot? - r

I have created a scatterplot and have included my R^2 value on the figure. However, I want to reduce the text size of the R^2 value but can't work out how to do it. My code is below.
library(ggplot2)
library(ggpubr)  # provides stat_regline_equation()
ggplot(Gully, aes(x = Downstream, y = Depth))+
geom_point(size = 0.5)+
stat_smooth(method= "lm", col = "black", size = 0.5) +
theme_bw()+
theme_classic()+
stat_regline_equation(label.y = -7, aes(label = ..rr.label.., size = 4))+
labs(y = "Decline in waterhole depth (m)", x = "Downstream distance (km)")+
theme(text=element_text(size=8, family = "Arial"))
Any suggestions would be great.
Thank you
Marita

Putting size = 4 inside aes() maps the constant "4" to the size aesthetic, as if it were a variable with a single level, which is probably not what you want.
To set a fixed text size, pass size = 4 as an argument to stat_regline_equation(), outside of aes().
Since your question has no minimal example data, here is an example from the stat_regline_equation() help page.
library(ggplot2)
library(ggpubr)
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x, y, group = c("A", "B"),
y2 = y * c(0.5,2), block = c("a", "a", "b", "b"))
# Fit polynomial regression line and add labels
formula <- y ~ poly(x, 3, raw = TRUE)
ggplot(my.data, aes(x, y2, color = group)) +
geom_point() +
stat_smooth(aes(fill = group, color = group), method = "lm", formula = formula) +
stat_regline_equation(
aes(label = paste(..eq.label.., ..adj.rr.label.., sep = "~~~~")),
formula = formula, size = 8 ## size argument outside of aes()
) +
theme_bw()
Created on 2021-09-22 by the reprex package (v2.0.1)
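Applied to the code in the question, the fix could look like the sketch below. The Gully data frame here is a hypothetical stand-in (the real data are not in the question), size = 3 is an arbitrary choice, and after_stat(rr.label) is the newer spelling of ..rr.label.. in ggplot2 3.3.0 and later.

```r
library(ggplot2)
library(ggpubr)

# Hypothetical stand-in for the asker's Gully data, which is not shown in the question
set.seed(1)
Gully <- data.frame(Downstream = runif(50, 0, 20))
Gully$Depth <- -0.3 * Gully$Downstream + rnorm(50)

p <- ggplot(Gully, aes(x = Downstream, y = Depth)) +
  geom_point(size = 0.5) +
  stat_smooth(method = "lm", formula = y ~ x, col = "black") +
  # label is a mapping (inside aes); size is a fixed setting (outside aes)
  stat_regline_equation(aes(label = after_stat(rr.label)), label.y = -7, size = 3) +
  theme_classic() +
  labs(y = "Decline in waterhole depth (m)", x = "Downstream distance (km)") +
  theme(text = element_text(size = 8))
```

Print p to draw the plot. Note that theme(text = element_text(size = 8)) only affects theme text (axis labels, titles, legend), not labels drawn by a layer, which is why the R^2 label has to be resized in the layer itself. Layer text sizes are in mm; ggplot2 exports the constant .pt, so size = 8 / .pt would roughly match 8 pt theme text.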

Related

How to color the area between two geom_smooth lines?

I have 3 columns in a data frame from which I want to create a visualisation with geom_smooth():
library(ggplot2)
library(ggeasy)  # provides easy_remove_legend()
ggplot(my_data_frame) +
aes(x = fin_enquete,
y = intentions,
colour = candidat) +
geom_point(alpha = 1/6,
shape = "circle",
size = .5) +
geom_smooth(mapping = aes(y = erreur_inf),
size = .5,
span = .42,
se = F) +
geom_smooth(mapping = aes(y = erreur_sup),
size = .5,
span = .42,
se = F) +
geom_smooth(method = "loess",
size = 1.5,
span = .42,
se = F) +
labs(x = "Date de fin d'enquête",
y = "Pourcentage d'intentions de vote") +
theme_minimal() +
theme(text = element_text(family = "DIN Pro")) +
coord_cartesian(expand = F) +
easy_remove_legend()
(Image: the three lines drawn with geom_smooth)
I would like to color the area between the upper and the lower line. I know the geom_ribbon() function but I am not sure I can use it in this situation.
Does anybody have a solution?
Have a nice day!
You could use geom_ribbon() and fit the loess models yourself inside the geom_ribbon() call.
Toy random data:
dat <- data.frame(x=1:100, y=runif(100), y2=runif(100)+1, y3=runif(100)+2)
Now suppose we want a smoothed ribbon between y and y3, with y2 drawn as a line between them:
library(ggplot2)
ggplot(dat, aes(x, y2)) +
geom_ribbon(aes(ymin=predict(loess(y~x)),
ymax=predict(loess(y3~x))), alpha=0.3) +
geom_smooth(se=F)
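For the question's data, where each candidate has its own band, one way to extend this idea is to fit the two loess curves per candidate before plotting and pass them to geom_ribbon() as a separate data frame. This is a sketch: the data below are synthetic and only mimic the question's column names, and span = 0.42 is copied from the question.

```r
library(ggplot2)

# Synthetic stand-in for my_data_frame (the real data are not in the question)
set.seed(42)
my_data_frame <- data.frame(
  fin_enquete = rep(as.Date("2022-01-01") + 0:99, times = 2),
  candidat = rep(c("A", "B"), each = 100)
)
my_data_frame$intentions <- ifelse(my_data_frame$candidat == "A", 25, 15) + rnorm(200)
my_data_frame$erreur_inf <- my_data_frame$intentions - 2
my_data_frame$erreur_sup <- my_data_frame$intentions + 2

# Loess fit of one column against the date, using the question's span
smooth_col <- function(d, col) {
  x <- as.numeric(d$fin_enquete)
  y <- d[[col]]
  predict(loess(y ~ x, span = 0.42))
}

# Smoothed lower and upper bounds, computed separately for each candidate
bands <- do.call(rbind, lapply(split(my_data_frame, my_data_frame$candidat), function(d) {
  data.frame(fin_enquete = d$fin_enquete, candidat = d$candidat,
             lo = smooth_col(d, "erreur_inf"), hi = smooth_col(d, "erreur_sup"))
}))

p <- ggplot(my_data_frame, aes(x = fin_enquete, y = intentions, colour = candidat)) +
  geom_ribbon(data = bands, aes(x = fin_enquete, ymin = lo, ymax = hi, fill = candidat),
              inherit.aes = FALSE, alpha = 0.2) +
  geom_point(alpha = 1/6, size = 0.5) +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.42, se = FALSE)
```

The ribbon layer uses inherit.aes = FALSE because bands has no intentions column, so it must not inherit the y mapping from the ggplot() call.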
Alternatively, you could use lapply() to add several geom_smooth() layers at once, one for each spline df value (here 5, 11 and 13), and draw a geom_ribbon() with stat = "smooth" that shows only the two edges of the confidence band.
Sample code:
library(ggplot2)
ggplot(data = mtcars,
mapping = aes(x = wt,
y = mpg)) +
geom_point(size = 2)+
lapply(c(5,11, 13), function (i) {
geom_smooth(
data = ~ cbind(., facet_plots = i),
method = lm,
se=F,
formula = y ~ splines::bs(x, i)
)
})+
#facet_wrap(vars(facet_plots))
geom_ribbon(
stat = "smooth",
method = "loess",
se = TRUE,
alpha = 0, # or, use fill = NA
colour = "black",
linetype = "dotted")+
theme_minimal()

How do I change the color of the regression lines in ggplot?

I made a visualization of a regression. Currently this is what the graph looks like.
The regression lines are hard to see since they are the same color as the scatter plot dots.
My question is, how do I make the regression lines a different color from the scatter plot dots?
Here is my code:
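from plotnine import ggplot, aes, geom_point, scale_color_manual, geom_smooth, geom_vline, labs, theme_bw
(ggplot(data=df, mapping=aes(x='score', y='relent',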
color='factor(threshold)'))+
geom_point()+
scale_color_manual(values=['darkorange', 'purple'])+
geom_smooth(method='lm',
formula = 'y ~ x+I(x**2)',se=False, )+
geom_vline(xintercept = 766, color = "red", size = 1, linetype = "dashed")+
labs(y = "Yield",
x = "Score")+
theme_bw()
)
One option is to "duplicate" your threshold column with different values; e.g. in the code below I map 0 to 2 and 1 to 3. The duplicated column can then be mapped to the color aesthetic inside geom_smooth(), which lets you set different colors for the regression lines.
My code below uses R and ggplot2, but to the best of my knowledge it can easily be adapted to plotnine:
n <- 1000
df <- data.frame(
relent = c(runif(n, 100, 200), runif(n, 150, 250)),
score = c(runif(n, 764, 766), runif(n, 766, 768)),
threshold = c(rep(0, n), rep(1, n))
)
df$threshold_sm <- c(rep(2, n), rep(3, n))
library(ggplot2)
p <- ggplot(data = df, mapping = aes(x = score, y = relent, color = factor(threshold))) +
scale_color_manual(values = c("darkorange", "purple", "blue", "green")) +
geom_vline(xintercept = 766, color = "red", size = 1, linetype = "dashed") +
labs(
y = "Yield",
x = "Score"
) +
theme_bw()
p +
geom_point() +
geom_smooth(aes(color = factor(threshold_sm)),
method = "lm",
formula = y ~ x + I(x**2), se = FALSE
)
A second option is to make the points semi-transparent, so that the lines stand out more clearly; as a bonus, this also deals with overplotting:
p +
geom_point(alpha = .3) +
geom_smooth(aes(color = factor(threshold)),
method = "lm",
formula = y ~ x + I(x**2), se = FALSE
) +
guides(color = guide_legend(override.aes = list(alpha = 1)))
Compare:
library(ggplot2)
library(magrittr)  # provides %>%
iris %>%
ggplot(aes(Petal.Length, Sepal.Width, color = Species)) +
geom_point() +
geom_smooth(method = "lm", aes(group = Species))
With:
iris %>%
ggplot(aes(Petal.Length, Sepal.Width)) +
geom_point(aes(color = Species)) +
geom_smooth(method = "lm", aes(group = Species))
When aes(color = ...) is specified inside ggplot(), it is inherited by all subsequent geoms. Moving it into geom_point() applies it to the points only.
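If you want to keep the colour mapping on the points but draw every regression line in one fixed colour, you can also set colour as a constant in geom_smooth(): a constant setting overrides the inherited mapping, while group = Species still keeps one fit per species. A small sketch with iris; "black" is an arbitrary choice.

```r
library(ggplot2)

p <- ggplot(iris, aes(Petal.Length, Sepal.Width, colour = Species)) +
  geom_point(alpha = 0.5) +
  # group keeps one fit per species; the fixed colour replaces the inherited mapping
  geom_smooth(aes(group = Species), method = "lm", formula = y ~ x,
              se = FALSE, colour = "black")
```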

Exclude a particular area from geom_smooth fit automatically

I am plotting different plots in my shiny app.
By using geom_smooth(), I am fitting a smoothing curve on a scatterplot.
I am plotting these plots with ggplot() and rendering with ggplotly().
Is there any way I can exclude a particular part of the data from geom_smooth()?
For example:
As can be seen, the fit is distorted by this region, which is not desirable. I have tried plotly_click(), plotly_brush() and plotly_select(), but I don't want the user to intervene when the fit is plotted; that makes the process much slower and less accurate.
Here is my code to plot this:
#plot
g <- ggplot(data = d_f4, aes_string(x = d_f4$x, y = d_f4$y)) + theme_bw() +
geom_point(colour = "blue", size = 0.1)+
geom_smooth(formula = y ~ splines::bs(x, df = 10), method = "lm", color = "green3", level = 1, size = 1)
Unfortunately, I cannot include my dataset in my question because it is quite big.
You can make an extra data.frame without the "outliers" and pass it as the data argument of geom_smooth():
set.seed(8)
test_data <- data.frame(x = 1:100)
test_data$y <- sin(test_data$x / 10) + rnorm(100, sd = 0.1)
test_data[60:65, "y"] <- test_data[60:65, "y"] + 1
data_plot <- test_data[-c(60:65), ]
library(ggplot2)
ggplot(data = test_data, aes(x = x, y = y)) + theme_bw() +
geom_point(colour = "blue", size = 0.1) +
geom_smooth(formula = y ~ splines::bs(x, df = 10), method = "lm", color = "green3", level = 1, size = 1)
ggplot(data = test_data, aes(x = x, y = y)) + theme_bw() +
geom_point(colour = "blue", size = 0.1) +
geom_smooth(data = data_plot, formula = y ~ splines::bs(x, df = 10), method = "lm", color = "green3", level = 1, size = 1)
Created on 2020-11-27 by the reprex package (v0.3.0)
By the way: you don't need aes_string() (which is deprecated) or d_f4$x; you can simply use aes(x = x, y = y).
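If removing points by hand is not an option, one automatic alternative (a different technique from the answer above) is a robust fit that down-weights outlying points instead of dropping them. geom_smooth() accepts a fitting function such as MASS::rlm as method. This sketch reuses the toy data above; whether the down-weighting is strong enough for your real data is an assumption you would need to check.

```r
library(ggplot2)

# Same toy data as above: a sine curve with a bump at x = 60..65
set.seed(8)
test_data <- data.frame(x = 1:100)
test_data$y <- sin(test_data$x / 10) + rnorm(100, sd = 0.1)
test_data[60:65, "y"] <- test_data[60:65, "y"] + 1

# MASS::rlm fits the same spline basis by M-estimation, giving large residuals less weight
p <- ggplot(data = test_data, aes(x = x, y = y)) + theme_bw() +
  geom_point(colour = "blue", size = 0.1) +
  geom_smooth(method = MASS::rlm, formula = y ~ splines::bs(x, df = 10),
              se = FALSE, colour = "green3")
```

Unlike subsetting, this keeps every point in the fit, so nothing has to be chosen by the user; the trade-off is that a large enough bump can still pull the curve somewhat.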

Different color scale for geom_point and geom_smooth on ggplot

I am trying to plot observations and their grouped regression lines with ggplot as follows:
ggplot(df, aes(x = cabpol.e, y = pred.vote_share, color = coalshare)) +
geom_point() +
scale_color_gradient2(midpoint = 50, low="blue", mid="green", high="red") +
geom_smooth(aes(x = cabpol.e, y = pred.vote_share, group=coalshare1, fill = coalshare1), se = FALSE, method='lm') +
scale_fill_manual(values = c(Junior="blue", Medium="green", Senior="red"))
The problem is that the lines from geom_smooth are all the same color. I tried using scale_fill_manual() (so that there aren't two different color scales) and manually setting which color corresponds to each group, but instead all the lines appear blue. How can I make each line a different color?
As requested, here is a set of replicable data with the same problem:
set.seed(1000)
dff <- data.frame(x=rnorm(100, 0, 1),
y=rnorm(100, 1, 2),
z=seq(1, 100, 1),
g=rep(c("A", "B"), 50))
ggplot(dff, aes(x = x, y = y, color = z, group = g, fill = g)) +
geom_point() +
scale_color_gradient2(midpoint = 50, low="blue", high="red") +
geom_smooth(se = FALSE, method='lm')
My solution to this problem would be to create multiple geom_smooth() calls, each time subsetting the data to the desired factor level. This way you can pass a different color to each call of geom_smooth(). As long as you do not have many factor levels, this solution is not terribly inefficient.
library(ggplot2)
library(dplyr)  # filter() below is dplyr's, not stats::filter()
set.seed(1000)
dff <- data.frame(x=rnorm(100, 0, 1),
y=rnorm(100, 1, 2),
z=seq(1, 100, 1),
g=rep(c("A", "B"), 50))
ggplot(dff, aes(x = x, y = y,
color = z,
group = g)) +
geom_point() +
scale_color_gradient2(midpoint = 50, low="blue", high="red") +
geom_smooth(
aes(x = x, y =y),
color = "red",
method = "lm",
data = filter(dff, g == "A"),
se = FALSE
) +
geom_smooth(
aes(x = x, y =y),
color = "blue",
method = "lm",
data = filter(dff, g == "B"),
se = FALSE
)
Group trends between the x and y variables can be plotted by passing different data frames to geom_line() (with the predicted values) and geom_point() (with the raw data). Make sure that color is mapped to the same variable in the ggplot() call, and group geom_line() by that same variable.
p2 <- ggplot(NULL, aes(x = cabpol.e, y = vote_share, color = coalshare)) +
geom_line(data = preds, aes(group = coalshare, color = coalshare), size = 1) +
geom_point(data = df, aes(x = cabpol.e, y = vote_share)) +
scale_color_gradient2(name = "Share of Seats\nin Coalition (%)",
midpoint = 50, low="blue", mid = "green", high="red") +
xlab("Ideological Differences on State/Market") +
ylab("Vote Share (%)") +
ggtitle("Vote Share Won by Coalition Parties in Next Election")
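The code above assumes a preds data frame that the answer doesn't show. A minimal sketch of how it could be built, using the reproducible dff data from the question: fit a model in which the x-y slope depends on the continuous colour variable, then predict along x at a few fixed values of that variable. The interaction model y ~ x * z and the values 25, 50 and 75 are illustrative assumptions, not the asker's model.

```r
library(ggplot2)

set.seed(1000)
dff <- data.frame(x = rnorm(100, 0, 1), y = rnorm(100, 1, 2), z = seq(1, 100, 1))

# Illustrative model: the slope of y on x is allowed to vary with z
mod <- lm(y ~ x * z, data = dff)

# Predicted trend lines at a few chosen z values, on a grid of x values
preds <- expand.grid(x = seq(min(dff$x), max(dff$x), length.out = 50),
                     z = c(25, 50, 75))
preds$y <- predict(mod, newdata = preds)

# Points and lines share one gradient scale because both map colour to z
p <- ggplot(NULL, aes(x = x, y = y, colour = z)) +
  geom_line(data = preds, aes(group = z)) +
  geom_point(data = dff) +
  scale_colour_gradient2(midpoint = 50, low = "blue", mid = "green", high = "red")
```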

How to add a custom legend in ggplot2 in R

I want to plot a data set where the size of the points is proportional to the x variable, with a regression line and a 95% prediction interval. The "sample" code I have written is as follows:
library(ggplot2)
# Create random data and run regression
x <- rnorm(40)
y <- 0.5 * x + rnorm(40)
plot.dta <- data.frame(y, x)
mod <- lm(y ~ x, data = plot.dta)
# Create values for prediction interval
x.new <- data.frame(x = seq(-2.5, 2.5, length = 1000))
pred <- predict(mod, newdata = x.new, interval = "prediction")
pred <- cbind(x.new, pred)
# plot the data w/ regression line and prediction interval
p <- ggplot(pred, aes(x = x, y = upr)) +
geom_line(aes(y = lwr), color = "#666666", linetype = "dashed") +
geom_line(aes(y = upr), color = "#666666", linetype = "dashed") +
geom_line(aes(y = fit)) +
geom_point(data = plot.dta, aes(y = y, size = x))
p
This produces the following plot:
Obviously, the legend is not too helpful here. I would like to have one entry in the legend for the points, say, labeled "data", one grey, dashed line labeled "95% PI" and one entry with a black line labeled "Regression line."
As Hack-R alluded to in the provided link, you can set the breaks and labels for scale_size() to make that legend more meaningful.
You can also construct a legend for all your geom_line() calls by adding linetype to your aes() and using scale_linetype_manual() to set the values, breaks and labels.
ggplot(pred, aes(x = x, y = upr)) +
geom_line(aes(y = lwr, linetype = "dashed"), color = "#666666") +
geom_line(aes(y = upr, linetype = "dashed"), color = "#666666") +
geom_line(aes(y = fit, linetype = "solid")) +
geom_point(data = plot.dta, aes(y = y, size = x)) +
scale_size(breaks = c(-2, -1, 0, 1, 2),  # one label per break
labels = c("Eensy-weensy", "Teeny", "Small", "Medium", "Large")) +
scale_linetype_manual(values = c("dashed" = 2, "solid" = 1), labels = c("95% PI", "Regression Line"))
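A variation that avoids relabelling: map the legend text itself as a constant inside aes(), so each string becomes a legend entry, and map shape to the constant "data" to get the point entry the question asks for. In this sketch the key colours are fixed with override.aes and guides(size = "none") hides the size legend; both are assumptions about the desired look, not requirements.

```r
library(ggplot2)

# Same setup as the question, with a seed for reproducibility
set.seed(123)
x <- rnorm(40)
y <- 0.5 * x + rnorm(40)
plot.dta <- data.frame(y, x)
mod <- lm(y ~ x, data = plot.dta)
x.new <- data.frame(x = seq(-2.5, 2.5, length.out = 1000))
pred <- cbind(x.new, predict(mod, newdata = x.new, interval = "prediction"))

p <- ggplot(pred, aes(x = x)) +
  # Each constant string becomes one entry in the linetype legend
  geom_line(aes(y = lwr, linetype = "95% PI"), colour = "#666666") +
  geom_line(aes(y = upr, linetype = "95% PI"), colour = "#666666") +
  geom_line(aes(y = fit, linetype = "Regression line")) +
  geom_point(data = plot.dta, aes(y = y, size = x, shape = "data")) +
  scale_linetype_manual(name = NULL,
                        values = c("95% PI" = "dashed", "Regression line" = "solid")) +
  scale_shape_manual(name = NULL, values = c(data = 16)) +
  # Key colours follow the legend order ("95% PI", then "Regression line")
  guides(size = "none",
         linetype = guide_legend(override.aes = list(colour = c("#666666", "black"))))
```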
