Can't add a regression line - r

I'm new to R and am trying to draw a scatterplot with an added regression line and ID mapped to colour. I've tried:
ggplot(MeanData, aes(x = MeanDifference, y = d, col = ID)) + geom_jitter()+ geom_smooth(method = "lm", se = FALSE) + theme_classic()
However, no regression line appears when I run it.
Another thing I've tried is ggscatter, which I can get to run with a regression line, but I can't figure out how to map ID to colour in that code.
ggscatter(MeanData, x = "MeanDifference", y = "d", add = "reg.line", conf.int = TRUE, cor.coef = TRUE, cor.method = "pearson", xlab = "Mean Difference (degrees)", ylab = "Effect Size (d)")
Can anyone suggest how to run a scatter plot which includes both a regression line and mapping a variable to colour? Thanks in advance!

The geom_smooth layer inherits the colour aesthetic from the original ggplot() call and tries to fit a separate line for each colour group. With your data that is presumably one group per point, and a line cannot be fitted through a single point, so nothing is drawn. Instead, you need to either (a) specify aes(color = ID) in the geom_jitter layer, not the original ggplot() call, or (b) put aes(group = 1) inside geom_smooth so it knows to group all the points together. Either of these should work:
# a
ggplot(MeanData, aes(x = MeanDifference, y = d)) +
geom_jitter(aes(color = ID)) +
geom_smooth(method = "lm", se = FALSE) +
theme_classic()
# b
ggplot(MeanData, aes(x = MeanDifference, y = d, color = ID)) +
geom_jitter() +
geom_smooth(aes(group = 1), method = "lm", se = FALSE) +
theme_classic()
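To see the grouping effect concretely, here is a minimal sketch on made-up data. The data frame below is a hypothetical stand-in for MeanData, with a few rows per ID so that the per-group fits actually exist. It counts how many separate lines geom_smooth fits when colour is inherited from ggplot() versus when it is mapped only in the point layer.

```r
library(ggplot2)

# Hypothetical stand-in for MeanData: 5 IDs with 6 observations each
set.seed(42)
MeanData <- data.frame(
  ID = factor(rep(paste0("P", 1:5), each = 6)),
  MeanDifference = runif(30, 0, 10)
)
MeanData$d <- 0.3 * MeanData$MeanDifference + rnorm(30, sd = 0.5)

# Colour mapped in ggplot(): geom_smooth inherits it and fits one line per ID
p_inherited <- ggplot(MeanData, aes(x = MeanDifference, y = d, colour = ID)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Colour mapped only in the point layer: geom_smooth fits a single pooled line
p_pooled <- ggplot(MeanData, aes(x = MeanDifference, y = d)) +
  geom_point(aes(colour = ID)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# layer_data() returns the computed data for a layer; layer 2 is the smooth
n_lines_inherited <- length(unique(layer_data(p_inherited, 2)$group))
n_lines_pooled <- length(unique(layer_data(p_pooled, 2)$group))
```

If n_lines_inherited matches the number of IDs and n_lines_pooled is 1, the pooled version is the one that draws a single regression line through all points.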

Related

How to specify all factors the same colour and transparency in ggplot2, R?

I am plotting multiple regression lines separated by a factor count.
I need all of the lines to be the same colour and transparency (e.g. color = red, alpha = 0.5).
I also need to draw a mean for all the lines in the graph (e.g. a black, dashed line).
I tried adding the color + alpha parameters to the geom_line() but it produced a single line. Any suggestions?
#CODE to produce above graph
count <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
xcount <- rnorm(15,3,1)
ycount <- rnorm(15,5,1)
df <- data.frame(count, xcount, ycount)
ggplot(aes(x = xcount, y = ycount, color = factor(count)), data = df) +
geom_smooth(method = "lm", fill = NA) +
theme(panel.background = element_blank()) +
ylab("Y") +
xlab ("X") + theme(legend.position="none")
Move the grouping aesthetic out of the initial ggplot() call and into the first geom_smooth(), which then draws the three red lines. Add a second geom_smooth() without the grouping to fit a line through all values (the average).
ggplot(aes(x = xcount, y = ycount), data = df) +
geom_smooth(aes(group = factor(count)), method = "lm", fill = NA, color = "red") +
geom_smooth(method = "lm", fill = NA, color = "black", linetype = "longdash")
Note that alpha = 0.5 is omitted above because geom_smooth does not apply it to the fitted line. To get transparent red lines as well, use stat_smooth with geom = "line" instead:
ggplot(aes(x = xcount, y = ycount), data = df) +
stat_smooth(aes(group = factor(count)), geom = "line", method = "lm", se = FALSE, color = "red", alpha=0.5) +
stat_smooth(geom = "line", method = "lm", se = FALSE, color = "black", linetype = "longdash")
For the single color and alpha, just pull the dimension you're specifying outside of the aes() call. Anything outside of aes will be applied to every line. The trick is also having the groups specified to avoid getting a single line.
ggplot(aes(x = xcount, y = ycount, group = factor(count)), data = df) +
geom_smooth(method = "lm", fill = NA, color = 'red', alpha = 0.5)
As for an average line, the easiest way to do that is to pre-compute the mean right before your plot. Assuming you place the mean in a variable called mu, you would just need to add the following:
+ geom_hline(yintercept = mu, linetype = 'longdash')
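As a sketch of that suggestion, assuming a data frame shaped like the df in the question and taking mu to be the overall mean of ycount (the answer does not say which mean is meant, so that choice is an assumption):

```r
library(ggplot2)

# Same structure as the question's data
set.seed(1)
df <- data.frame(
  count = rep(1:3, times = 5),
  xcount = rnorm(15, 3, 1),
  ycount = rnorm(15, 5, 1)
)

# Assumption: "the mean" is the overall mean of the y values
mu <- mean(df$ycount)

# One red line per group, plus a dashed horizontal line at the mean
p <- ggplot(df, aes(x = xcount, y = ycount, group = factor(count))) +
  geom_smooth(method = "lm", formula = y ~ x, fill = NA, color = "red", alpha = 0.5) +
  geom_hline(yintercept = mu, linetype = "longdash")
```

A horizontal line at mu marks the average y value, not an average slope. If a pooled regression line is what is wanted, an ungrouped geom_smooth layer would be the closer match.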

How to assign colors to multicolor scatter plot with multicolor fitted lines in ggplot2

Problem
I have some data points stored in data.frame with three variables, x, y, and gender. My goal is to draw several generally fitted lines and also lines specifically fitted for male/female over the scatter plot, with points coloured by gender. It sounds easy but some issues just persist.
What I currently do is to use a new set of x's and predict y's for every model, combine the fitted lines together in a data.frame, and then convert wide to long, with their model name as the third var (from this post: ggplot2: how to add the legend for a line added to a scatter plot? and this: Add legend to ggplot2 line plot I learnt that mapping should be used instead of setting colours/legends separately). However, while I can get a multicolor line plot, the points come without specific colour for gender (already a factor) as I expected from the posts I referenced.
I also know it might be possible to use aes(y = predict(model)), but I ran into other problems with that. I also tried to colour the points directly in aes and assign colours separately for each line, but then the legend is not generated unless I use lty, which gives legend entries that are all the same colour.
I would appreciate any ideas, and I am open to changing the whole approach.
Code
Note that two pairs of lines overlap. So it only appeared to be two lines. I guess adding some jitter in the data might make it look differently.
slrmen<-lm(tc~x+I(x^2),data=data[data['gender']==0,])
slrwomen<-lm(tc~x+I(x^2),data=data[data['gender']==1,])
prdf <- data.frame(x = seq(from = range(data$x)[1],
to = range(data$x)[2], length.out = 100),
gender = as.factor(rep(1,100)))
prdm <- data.frame(x = seq(from = range(data$x)[1],
to = range(data$x)[2], length.out = 100),
gender = as.factor(rep(0,100)))
prdf$fit <- predict(fullmodel, newdata = prdf)
prdm$fit <- predict(fullmodel, newdata = prdm)
rawplotdata<-data.frame(x=prdf$x, fullf=prdf$fit, fullm=prdm$fit,
linf=predict(slrwomen, newdata = prdf),
linm=predict(slrmen, newdata = prdm))
plotdata<-reshape2::melt(rawplotdata,id.vars="x",
measure.vars=c("fullf","fullm","linf","linm"),
variable.name="fitmethod", value.name="y")
plotdata$fitmethod<-as.factor(plotdata$fitmethod)
plt <- ggplot() +
geom_line(data = plotdata, aes(x = x, y = y, group = fitmethod,
colour=fitmethod)) +
scale_colour_manual(name = "Fit Methods",
values = c("fullf" = "lightskyblue",
"linf" = "cornflowerblue",
"fullm"="darkseagreen", "linm" = "olivedrab")) +
geom_point(data = data, aes(x = x, y = y, fill = gender)) +
scale_fill_manual(values=c("blue","green")) ## This does not work as I expected...
show(plt)
Code for another method (omitted two lines), which generates same-colour legend and multi-color plot:
ggplot(data = prdf, aes(x = x, y = fit)) + # prdf and prdm are just data frames containing the x's and fitted values for different models
geom_line(aes(lty="Female"),colour = "chocolate") +
geom_line(data = prdm, aes(x = x, y = fit, lty="Male"), colour = "darkblue") +
geom_point(data = data, aes(x = x, y = y, colour = gender)) +
scale_colour_discrete(name="Gender", breaks=c(0,1),
labels=c("Male","Female"))
This is related to using the colour aesthetic for lines and the fill aesthetics for points in your own (first) example. In the second example, it works because the colour aesthetic is used for lines and points.
By default, geom_point cannot map a variable to fill, because the default point shape (19) doesn't have a fill.
For fill to work on points, you have to specify a fillable shape (one of 21 to 25, e.g. shape = 21) in geom_point(), outside of aes().
Perhaps this small reproducible example helps to illustrate the point:
Simulate data
set.seed(4821)
x1 <- rnorm(100, mean = 5)
set.seed(4821)
x2 <- rnorm(100, mean = 6)
data <- data.frame(x = rep(seq(20,80,length.out = 100),2),
tc = c(x1, x2),
gender = factor(c(rep("Female", 100), rep("Male", 100))))
Fit models
slrmen <-lm(tc~x+I(x^2), data = data[data["gender"]=="Male",])
slrwomen <-lm(tc~x+I(x^2),data = data[data["gender"]=="Female",])
newdat <- data.frame(x = seq(20,80,length.out = 200))
fitted.male <- data.frame(x = newdat,
gender = "Male",
tc = predict(object = slrmen, newdata = newdat))
fitted.female <- data.frame(x = newdat,
gender = "Female",
tc = predict(object = slrwomen, newdata = newdat))
Plot using colour aesthetics
Use the colour aesthetics for both points and lines (specify in ggplot such that it gets inherited throughout). By default, geom_point can map a variable to colour.
library(ggplot2)
ggplot(data, aes(x = x, y = tc, colour = gender)) +
geom_point() +
geom_line(data = fitted.male) +
geom_line(data = fitted.female) +
scale_colour_manual(values = c("tomato","blue")) +
theme_bw()
Plot using colour and fill aesthetics
Use the fill aesthetics for points and the colour aesthetics for lines (specify aesthetics in geom_* to prevent them being inherited). This will reproduce the problem.
ggplot(data, aes(x = x, y = tc)) +
geom_point(aes(fill = gender)) +
geom_line(data = fitted.male, aes(colour = gender)) +
geom_line(data = fitted.female, aes(colour = gender)) +
scale_colour_manual(values = c("tomato","blue")) +
scale_fill_manual(values = c("tomato","blue")) +
theme_bw()
To fix this, change the shape argument in geom_point to a point shape that can be filled (21:25).
ggplot(data, aes(x = x, y = tc)) +
geom_point(aes(fill = gender), shape = 21) +
geom_line(data = fitted.male, aes(colour = gender)) +
geom_line(data = fitted.female, aes(colour = gender)) +
scale_colour_manual(values = c("tomato","blue")) +
scale_fill_manual(values = c("tomato","blue")) +
theme_bw()
Created on 2021-09-19 by the reprex package (v2.0.1)
Note that the scales for colour and fill get merged automatically if the same variable is mapped to both aesthetics.
It seems to me that what you really want to do is use ggplot2::stat_smooth instead of trying to predict yourself.
Borrowing the data from @scrameri:
ggplot(data, aes(x = x, y = tc, color = gender)) +
geom_point() +
stat_smooth(aes(linetype = "X^2"), method = 'lm',formula = y~x + I(x^2)) +
stat_smooth(aes(linetype = "X^3"), method = 'lm',formula = y~x + I(x^2) + I(x^3)) +
scale_color_manual(values = c("darkseagreen","lightskyblue"))

Increasing number of axis tick with autoplot function (time series data)

How can I add more x-axis ticks to a time series plot? I am using the autoplot function because plain ggplot needs a data frame, which can be awkward for single-column time series data.
How can I get more x-axis tick labels with the autoplot function?
library(ggplot2)
library(gridExtra)
library(fpp2)
A <- autoplot(AirPassengers, colour = "#00AFBB", size = 1.1) +
geom_smooth(aes(y = AirPassengers), method = "lm", colour = "#FC4E07", formula = y ~ x + I(x^2), show.legend = TRUE) +
ggtitle("Původní graf časové řady") + scale_x_continuous(breaks = round(seq(min(dat$x), max(dat$x), by = 0.5),1))
A
Here is one option by overriding the current x-axis:
autoplot(AirPassengers, colour = "#00AFBB", size = 1.1) +
geom_smooth(aes(y = AirPassengers), method = "lm", colour = "#FC4E07", formula = y ~ x + I(x^2), show.legend = TRUE) +
ggtitle("Původní graf časové řady") +
scale_x_continuous(breaks = scales::extended_breaks(10))
Here is another option by replacing the current breaks:
A <- autoplot(AirPassengers, colour = "#00AFBB", size = 1.1) +
geom_smooth(aes(y = AirPassengers), method = "lm", colour = "#FC4E07", formula = y ~ x + I(x^2), show.legend = TRUE) +
ggtitle("Původní graf časové řady")
A$scales$scales[[1]]$breaks <- scales::extended_breaks(10)
A
Note that ggplot2 also uses the scales::extended_breaks() function internally to calculate breaks. The 10 passed to that function is the desired number of breaks, but the actual number can differ because the algorithm favours 'pretty' labels.
You could also provide your own function that takes in the limits of the scale and returns breaks, or you can provide pre-defined breaks in a vector.
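A minimal sketch of those two alternatives follows. The helper breaks_every() is a hypothetical name, not part of ggplot2 or scales, and the plot is built from a plain data frame rather than with autoplot so the example needs only ggplot2. The same scale_x_continuous() call should also be addable to the autoplot output, as in the answers above.

```r
library(ggplot2)

# Hypothetical helper: returns a function that places a break every `by` units
# inside whatever limits the scale passes in
breaks_every <- function(by) {
  function(limits) {
    seq(ceiling(limits[1] / by) * by, floor(limits[2] / by) * by, by = by)
  }
}

# AirPassengers as a data frame (time() gives the decimal year of each point)
air <- data.frame(
  year = as.numeric(time(AirPassengers)),
  passengers = as.numeric(AirPassengers)
)

# Option 1: a function of the limits
p_fun <- ggplot(air, aes(x = year, y = passengers)) +
  geom_line(colour = "#00AFBB") +
  scale_x_continuous(breaks = breaks_every(1))

# Option 2: a pre-defined vector of breaks
p_vec <- ggplot(air, aes(x = year, y = passengers)) +
  geom_line(colour = "#00AFBB") +
  scale_x_continuous(breaks = seq(1949, 1961, by = 2))

# The helper on its own, for a given pair of limits
example_breaks <- breaks_every(2)(c(1949, 1960.9))
```

The function form adapts if the data range changes, while the vector form gives exact control over which years are labelled.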

Adding a weighted least squares trendline in ggplot2

I am preparing a plot using ggplot2, and I want to add a trendline that is based on a weighted least squares estimation.
In base graphics this can be done by sending a WLS model to abline:
mod0 <- lm(ds$dMNP~ds$MNP)
mod1 <- lm(ds$dMNP~ds$MNP, weights = ds$Asset)
symbols(ds$dMNP~ds$MNP, circles=ds$r, inches=0.35)
#abline(mod0)
abline(mod1)
In ggplot2 I set the argument weight in geom_smooth but nothing changes:
ggplot(ds, aes(x=MNP, y=dMNP, size=Asset)) +
geom_point(shape=21) +
geom_smooth(method = "lm", weight="Asset", color="black", show.legend = FALSE)
This gives me the same plot as
ggplot(ds, aes(x=MNP, y=dMNP, size=Asset)) +
geom_point(shape=21) +
geom_smooth(method = "lm", color="black", show.legend = FALSE)
I'm late, but for posterity and clarity, here is the full solution:
ggplot(ds, aes(x = MNP, y = dMNP, size = Asset)) +
geom_point(shape = 21) +
geom_smooth(method = "lm", mapping = aes(weight = Asset),
color = "black", show.legend = FALSE)
Map weight inside aes() and don't put the variable name in quotes.
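One way to check that the weights are really used is to compare the line geom_smooth draws with a weighted lm() fit. The sketch below does that on simulated data; ds here is a hypothetical stand-in with the same column names as in the question, and size is mapped only in the point layer to keep the smooth layer simple.

```r
library(ggplot2)

# Hypothetical stand-in for ds: observations with small Asset are noisier
set.seed(7)
ds <- data.frame(MNP = runif(40, 0, 10), Asset = runif(40, 1, 100))
ds$dMNP <- 1 + 0.5 * ds$MNP + rnorm(40, sd = 10 / sqrt(ds$Asset))

p <- ggplot(ds, aes(x = MNP, y = dMNP)) +
  geom_point(aes(size = Asset), shape = 21) +
  geom_smooth(aes(weight = Asset), method = "lm", formula = y ~ x,
              color = "black", show.legend = FALSE)

# Reference fits done directly with lm()
mod_wls <- lm(dMNP ~ MNP, data = ds, weights = Asset)
mod_ols <- lm(dMNP ~ MNP, data = ds)

# Compare the line computed by geom_smooth (layer 2) with both fits
smooth <- layer_data(p, 2)
grid <- data.frame(MNP = smooth$x)
diff_wls <- max(abs(smooth$y - predict(mod_wls, newdata = grid)))
diff_ols <- max(abs(smooth$y - predict(mod_ols, newdata = grid)))
```

If diff_wls is essentially zero while diff_ols is not, the trendline in the plot is the weighted fit.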

Regression line for the entire data set together with regression lines based on groups

I am new to ggplot2 and have problem displaying the regression line for the entire data set together with the regression lines for each group.
So far I can plot regression line based on the group, but I have no success in getting the regression line for the entire data-set on the same plot.
I want all the regression lines with different line style so that they can be easily identified in black and white print.
Here is my code so far:
ggplot(alldata, aes(y = y, x = x, colour= group, shape = group )) +
geom_point(size = 3, alpha = .8) +
geom_smooth(method = "lm", fill = NA , size = 1)
Try moving the colour, shape and linetype aesthetics out of the original call to ggplot() and into the individual layers.
You can then add the overall line as a separate layer with a different colour:
set.seed(1)
library(plyr)
alldata <- ddply(data.frame(group = letters[1:5], x = rnorm(50)), 'group',
mutate, y=runif(1,-1,1) * x +rnorm(10))
ggplot(alldata,aes(y = y, x = x)) +
geom_point(aes(colour = group, shape = group), size = 3, alpha = .8) +
geom_smooth(method = "lm", se = FALSE, size = 1,
aes(linetype = group, group = group)) +
geom_smooth(method = "lm", size = 1, colour = 'black', se = FALSE) +
theme_bw()
