Adding a weighted least squares trendline in ggplot2 - r

I am preparing a plot using ggplot2, and I want to add a trendline that is based on a weighted least squares estimation.
In base graphics this can be done by sending a WLS model to abline:
mod0 <- lm(dMNP ~ MNP, data = ds)                   # ordinary least squares
mod1 <- lm(dMNP ~ MNP, data = ds, weights = Asset)  # weighted least squares
symbols(ds$MNP, ds$dMNP, circles = ds$r, inches = 0.35)
# abline(mod0)
abline(mod1)
In ggplot2 I set the argument weight in geom_smooth, but nothing changes:
ggplot(ds, aes(x = MNP, y = dMNP, size = Asset)) +
  geom_point(shape = 21) +
  geom_smooth(method = "lm", weight = "Asset", color = "black", show.legend = FALSE)
This gives me the same plot as:
ggplot(ds, aes(x = MNP, y = dMNP, size = Asset)) +
  geom_point(shape = 21) +
  geom_smooth(method = "lm", color = "black", show.legend = FALSE)

I'm late, but for posterity and clarity, here is the full solution:
ggplot(ds, aes(x = MNP, y = dMNP, size = Asset)) +
geom_point(shape = 21) +
geom_smooth(method = "lm", mapping = aes(weight = Asset),
color = "black", show.legend = FALSE)
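Map the weight inside aes() and don't put the column name in quotes. Passed directly to geom_smooth() as weight = "Asset", it is not treated as a reference to the Asset column, so the line is fitted without weights.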
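For reference, here is a minimal sketch of what the weighted fit computes, assuming made-up toy data (the question's ds is not shown). Weighted least squares picks the coefficients that minimise sum(w * (y - X b)^2), which has the closed form b = (X'WX)^-1 X'Wy. The manual solution should match lm(..., weights = Asset), which is the model geom_smooth(method = "lm", aes(weight = Asset)) fits for the line.

```r
# Hypothetical toy data standing in for ds (the real data are not shown)
set.seed(1)
ds <- data.frame(MNP = runif(30, 0, 10), Asset = runif(30, 1, 100))
ds$dMNP <- 2 + 0.5 * ds$MNP + rnorm(30, sd = 1)

# Closed-form WLS: b = (X'WX)^-1 X'Wy, with W = diag(weights)
X <- cbind(1, ds$MNP)
W <- diag(ds$Asset)
b_manual <- solve(t(X) %*% W %*% X, t(X) %*% W %*% ds$dMNP)

mod0 <- lm(dMNP ~ MNP, data = ds)                   # ordinary least squares
mod1 <- lm(dMNP ~ MNP, data = ds, weights = Asset)  # weighted least squares

# The manual solution matches the weighted lm(); the OLS coefficients differ
print(cbind(manual = drop(b_manual), wls = coef(mod1), ols = coef(mod0)))
```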

Related

How to color the area between two geom_smooth lines?

I have 3 columns in a data frame from which I want to create a visualisation with geom_smooth():
library(ggplot2)
library(ggeasy)  # provides easy_remove_legend()

ggplot(my_data_frame) +
  aes(x = fin_enquete,
      y = intentions,
      colour = candidat) +
  geom_point(alpha = 1/6,
             shape = "circle",
             size = 0.5) +
  geom_smooth(mapping = aes(y = erreur_inf),  # lower bound
              linewidth = 0.5,                # use `size` in ggplot2 < 3.4.0
              span = 0.42,
              se = FALSE) +
  geom_smooth(mapping = aes(y = erreur_sup),  # upper bound
              linewidth = 0.5,
              span = 0.42,
              se = FALSE) +
  geom_smooth(method = "loess",               # central estimate
              linewidth = 1.5,
              span = 0.42,
              se = FALSE) +
  labs(x = "Date de fin d'enquête",                    # "End date of the poll"
       y = "Pourcentage d'intentions de vote") +       # "Percentage of voting intentions"
  theme_minimal() +
  theme(text = element_text(family = "DIN Pro")) +
  coord_cartesian(expand = FALSE) +
  easy_remove_legend()
(Plot: the three lines drawn with geom_smooth())
I would like to color the area between the upper and the lower line. I know the geom_ribbon() function but I am not sure I can use it in this situation.
Does anybody have a solution?
Have a nice day!
You could use geom_ribbon() and fit the loess models yourself within the geom_ribbon() call.
Toy random data
dat <- data.frame(x=1:100, y=runif(100), y2=runif(100)+1, y3=runif(100)+2)
Now suppose we want a smoothed ribbon between y and y3, with y2 drawn as a line between them:
library(ggplot2)

ggplot(dat, aes(x, y2)) +
  # ymin/ymax are loess fits of y and y3, computed inside aes()
  geom_ribbon(aes(ymin = predict(loess(y ~ x)),
                  ymax = predict(loess(y3 ~ x))), alpha = 0.3) +
  geom_smooth(se = FALSE)
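Applied to the question's data, a minimal sketch of the same idea is to precompute the smoothed bounds per candidate and hand them to geom_ribbon(). The toy data frame and the new column names inf_lisse and sup_lisse are hypothetical; span = 0.42 comes from the question. As long as geom_smooth() is also using loess with the same span, the ribbon edges should follow the two thin smoothed lines.

```r
# Hypothetical toy data with the question's column names (the real data are not shown)
set.seed(1)
n <- 60
my_data_frame <- data.frame(
  fin_enquete = rep(as.Date("2022-01-01") + seq(0, by = 3, length.out = n), 2),
  candidat    = rep(c("A", "B"), each = n),
  intentions  = c(25 + cumsum(rnorm(n, sd = 0.3)), 20 + cumsum(rnorm(n, sd = 0.3)))
)
my_data_frame$erreur_inf <- my_data_frame$intentions - 2
my_data_frame$erreur_sup <- my_data_frame$intentions + 2

# Fit one loess per candidate and keep the fitted lower/upper bounds
# as new columns (hypothetical names inf_lisse / sup_lisse).
add_smoothed_bounds <- function(df, span = 0.42) {
  pieces <- lapply(split(df, df$candidat), function(d) {
    d$x_num <- as.numeric(d$fin_enquete)  # loess needs a numeric predictor
    d$inf_lisse <- predict(loess(erreur_inf ~ x_num, data = d, span = span))
    d$sup_lisse <- predict(loess(erreur_sup ~ x_num, data = d, span = span))
    d$x_num <- NULL
    d
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

smoothed <- add_smoothed_bounds(my_data_frame)

# Plot step (only runs if ggplot2 is installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(smoothed, aes(x = fin_enquete, colour = candidat)) +
    geom_ribbon(aes(ymin = inf_lisse, ymax = sup_lisse, fill = candidat),
                alpha = 0.2, colour = NA) +
    geom_smooth(aes(y = intentions), method = "loess", formula = y ~ x,
                span = 0.42, se = FALSE)
  print(p)
}
```

Precomputing the columns keeps the ribbon data inspectable and makes the per-candidate grouping explicit, instead of relying on predict() being evaluated inside aes().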
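Alternatively, you could use lapply() to add several smooths at once, one for each spline degrees-of-freedom value (here 5, 11 and 13), and add a geom_ribbon() with stat = "smooth" and no fill so that only the two edges of the confidence band are drawn.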
Sample code:
library(ggplot2)

ggplot(data = mtcars,
       mapping = aes(x = wt, y = mpg)) +
  geom_point(size = 2) +
  # one spline smooth per df value passed to splines::bs()
  lapply(c(5, 11, 13), function(i) {
    geom_smooth(
      data = ~ cbind(., facet_plots = i),
      method = lm,
      se = FALSE,
      formula = y ~ splines::bs(x, i)
    )
  }) +
  # facet_wrap(vars(facet_plots)) +
  # loess band drawn as outline only: the two edges of the confidence interval
  geom_ribbon(
    stat = "smooth",
    method = "loess",
    se = TRUE,
    alpha = 0,          # or, use fill = NA
    colour = "black",
    linetype = "dotted") +
  theme_minimal()
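To make "the two edges of the se" concrete, here is a sketch that recomputes them by hand for the same mtcars data with base R. It assumes, as I understand ggplot2's smooth stat to do, that the band is fit ± t × se at a 95% level; treat it as an illustration of the idea rather than a copy of ggplot2's internals.

```r
# Loess fit of mpg on wt (default span = 0.75, degree = 2)
fit  <- loess(mpg ~ wt, data = mtcars)
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 80))
pred <- predict(fit, newdata = grid, se = TRUE)  # list with fit, se.fit, df, ...

level <- 0.95
crit  <- qt(level / 2 + 0.5, pred$df)  # two-sided t critical value

band <- data.frame(
  wt    = grid$wt,
  fit   = pred$fit,
  lower = pred$fit - crit * pred$se.fit,  # lower edge of the band
  upper = pred$fit + crit * pred$se.fit   # upper edge of the band
)
head(band)
```

Plotting band$lower and band$upper as two dotted lines gives the same kind of outline-only band as geom_ribbon(stat = "smooth", alpha = 0) above.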

Can't add a regression line

I'm new to R and trying to make a scatterplot with an added regression line and ID mapped to colour. I've tried:
ggplot(MeanData, aes(x = MeanDifference, y = d, col = ID)) +
  geom_jitter() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_classic()
However, no regression line appears when I run it.
Another thing I've tried is ggscatter() from the ggpubr package, which I can get to run with a regression line, but I can't figure out how to map ID to colour in that code.
ggscatter(MeanData, x = "MeanDifference", y = "d", add = "reg.line", conf.int = TRUE, cor.coef = TRUE, cor.method = "pearson", xlab = "Mean Difference (degrees)", ylab = "Effect Size (d)")
Can anyone suggest how to run a scatter plot which includes both a regression line and mapping a variable to colour? Thanks in advance!
The geom_smooth layer inherits the color aesthetic from the original ggplot() call and tries to fit a separate line for each color; with your data that presumably means one point per ID, so there is nothing to draw a line through. Instead, you need to either (a) specify aes(color = ID) in the geom_jitter layer rather than in the original ggplot call, or (b) put aes(group = 1) inside geom_smooth so that it groups all the points together. Either of these should work:
# a
ggplot(MeanData, aes(x = MeanDifference, y = d)) +
geom_jitter(aes(color = ID)) +
geom_smooth(method = "lm", se = FALSE) +
theme_classic()
# b
ggplot(MeanData, aes(x = MeanDifference, y = d, color = ID)) +
geom_jitter() +
geom_smooth(aes(group = 1), method = "lm", se = FALSE) +
theme_classic()
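Why no line appears can be illustrated with plain lm(), assuming (hypothetically) that MeanData has one row per ID. Mapping colour to ID in ggplot() amounts to one fit per ID, and a single point leaves the slope inestimable; option (a) or (b) amounts to one pooled fit.

```r
# Hypothetical data: one row per ID, as the answer suspects for MeanData
set.seed(42)
MeanData <- data.frame(
  ID = factor(paste0("P", 1:10)),
  MeanDifference = runif(10, 0, 30)
)
MeanData$d <- 0.05 * MeanData$MeanDifference + rnorm(10, sd = 0.2)

# colour = ID in ggplot(): one regression per ID. With a single point per
# group, lm() cannot estimate a slope and reports it as NA.
per_group <- lapply(split(MeanData, MeanData$ID),
                    function(g) coef(lm(d ~ MeanDifference, data = g)))
slopes <- vapply(per_group, function(cf) cf[["MeanDifference"]], numeric(1))
print(slopes)

# group = 1 (or colour only in geom_jitter): one pooled regression
pooled <- lm(d ~ MeanDifference, data = MeanData)
print(coef(pooled))
```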

Increasing number of axis tick with autoplot function (time series data)

How can I add more x-axis ticks to a time series plot made with the autoplot() function? (I am using autoplot() because "basic" ggplot() needs a data frame, and a single-column time series could cause issues.)
library(ggplot2)
library(gridExtra)
library(fpp2)
# ggtitle below is Czech for "Original time series plot"
A <- autoplot(AirPassengers, colour = "#00AFBB", size = 1.1) +
  geom_smooth(aes(y = AirPassengers), method = "lm", colour = "#FC4E07",
              formula = y ~ x + I(x^2), show.legend = TRUE) +
  ggtitle("Původní graf časové řady") +
  # note: `dat` is not defined in this snippet
  scale_x_continuous(breaks = round(seq(min(dat$x), max(dat$x), by = 0.5), 1))
A
Here is one option by overriding the current x-axis:
autoplot(AirPassengers, colour = "#00AFBB", size = 1.1) +
geom_smooth(aes(y = AirPassengers), method = "lm", colour = "#FC4E07", formula = y ~ x + I(x^2), show.legend = TRUE) +
ggtitle("Původní graf časové řady") +
scale_x_continuous(breaks = scales::extended_breaks(10))
Here is another option by replacing the current breaks:
A <- autoplot(AirPassengers, colour = "#00AFBB", size = 1.1) +
geom_smooth(aes(y = AirPassengers), method = "lm", colour = "#FC4E07", formula = y ~ x + I(x^2), show.legend = TRUE) +
ggtitle("Původní graf časové řady")
A$scales$scales[[1]]$breaks <- scales::extended_breaks(10)
A
Note that ggplot2 also uses scales::extended_breaks() internally to calculate breaks. The 10 passed to that function is the desired number of breaks; the final number can differ, because the algorithm prefers 'pretty' labels.
You could also provide your own function that takes in the limits of the scale and returns breaks, or you can provide pre-defined breaks in a vector.
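As a sketch of the "own function" option: breaks_every() below is a hypothetical helper (not part of ggplot2 or scales) that returns a function of the scale limits, producing evenly spaced breaks on multiples of `by`. scale_x_continuous() accepts such a function for `breaks`, and breaks falling outside the limits are dropped from the axis.

```r
# Hypothetical helper: returns a breaks function that ggplot2 can call
# with the scale limits (a length-2 numeric vector).
breaks_every <- function(by) {
  function(limits) {
    lo <- floor(limits[1] / by) * by    # round the lower limit down to a multiple of `by`
    hi <- ceiling(limits[2] / by) * by  # round the upper limit up to a multiple of `by`
    seq(lo, hi, by = by)
  }
}

every_2_years <- breaks_every(2)
every_2_years(range(time(AirPassengers)))
# 1948 1950 1952 1954 1956 1958 1960 1962

# Usage with the plot from the question (autoplot() comes from forecast/fpp2):
# autoplot(AirPassengers) + scale_x_continuous(breaks = every_2_years)
# or with pre-defined breaks in a vector:
# autoplot(AirPassengers) + scale_x_continuous(breaks = seq(1950, 1960, by = 2))
```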

Adding uncertainty bands to a smooth spline in a scatterplot

I have the following scatterplot with a smoothing spline:
a <- rep(1:50, len = 500)
b <- sample(0:5000, 500)
c <- round(seq(0, 600, len = 500))
data_frame <- as.data.frame(cbind(a, b, c))
names(data_frame) <- c("ID", "toxin_level", "days_to_event")
plot(data_frame$days_to_event, data_frame$toxin_level, xlim = c(600, 0),
     xlab = "days before the event", ylab = "Toxin level", type = "p")
abline(v = 0, col = "red")
x <- data_frame$days_to_event
y <- data_frame$toxin_level
fit.sp <- smooth.spline(y ~ x, nknots = 20)
lines(fit.sp, col = "blue")
(Resulting plot: the scatterplot with the smoothing spline drawn in blue.)
I was wondering whether it is possible to add confidence bands to this curve. Ideally I would like them in a transparent blue, but any color, including gray, is OK.
Updated: now using scale_x_reverse() to match your graph more closely.
How about this using ggplot2?
library(ggplot2)
ggplot(data_frame, aes(x = days_to_event, y = toxin_level)) + geom_point() +
geom_vline(xintercept = 0, color = "red") + scale_x_reverse() +
xlab("Days before the event") + ylab("Toxin Level") +
geom_smooth(method = lm, se = TRUE)
(Resulting plot: the points with a fitted regression line and its shaded confidence band.)
Or to match your question a bit more:
ggplot(data_frame, aes(x = days_to_event, y = toxin_level)) + geom_point(shape = 1) +
geom_vline(xintercept = 0, color = "red") + scale_x_reverse() +
xlab("Days before the event") + ylab("Toxin Level") +
geom_smooth(method = lm, se = TRUE, color = "blue", fill = "lightblue") +
theme_bw()
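Note that method = lm in the answer draws a straight line, not a spline. If the band should follow a curve, one option (a swapped-in technique, not the penalised smooth.spline() fit from the question) is a regression spline: fit lm() with a natural cubic spline basis from splines::ns() and ask predict() for a confidence interval. df = 8 is an arbitrary assumption, and set.seed() is added only so the toy data are reproducible. In ggplot2 the same model should be available as geom_smooth(method = "lm", formula = y ~ splines::ns(x, df = 8)).

```r
library(splines)  # ships with R; provides ns()

# Toy data as in the question, seeded for reproducibility
set.seed(123)
data_frame <- data.frame(
  ID            = rep(1:50, length.out = 500),
  toxin_level   = sample(0:5000, 500),
  days_to_event = round(seq(0, 600, length.out = 500))
)

# Regression spline with a natural cubic spline basis (df = 8 is arbitrary)
fit  <- lm(toxin_level ~ ns(days_to_event, df = 8), data = data_frame)
grid <- data.frame(days_to_event = seq(0, 600, length.out = 200))
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

# Base-graphics plot matching the question: reversed x axis, transparent blue band
plot(data_frame$days_to_event, data_frame$toxin_level, xlim = c(600, 0),
     xlab = "days before the event", ylab = "Toxin level")
polygon(c(grid$days_to_event, rev(grid$days_to_event)),
        c(band[, "lwr"], rev(band[, "upr"])),
        col = rgb(0, 0, 1, alpha = 0.2), border = NA)
lines(grid$days_to_event, band[, "fit"], col = "blue")
abline(v = 0, col = "red")
```

A natural spline keeps predict() and its standard errors available through lm(), which is why it gives a confidence band directly; smooth.spline() does not return one.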

Sensible and easy way of placing labels in ggplot with use of geom_dl and geom_point

I'm using the code below to generate a simple chart.
# Data import -------------------------------------------------------------
data(mtcars)
mtcars$model <- rownames(mtcars)
# Graph: Income Broadband -------------------------------------------------
# Libraries
library(ggplot2); library(directlabels)
# Graph definition
ggplot(data = mtcars, aes(x = mpg, y = disp)) +
geom_point(shape = 1, colour = "black", size = 3, fill = "black") +
geom_smooth(method = lm, se = TRUE, fullrange = TRUE) +
geom_dl(aes(label = model), list("smart.grid", cex = 0.5, hjust = -.5)) +
xlab("MPG") +
ylab("DISP") +
theme_bw()
As illustrated below, the labels on the chart are placed far away from the points. I would like to place the labels closer to their points. Naturally, for the sake of readability, the labels should not overlap. In addition, I would like the solution to be easy to reproduce, as I will have to apply it across a number of charts. mlabvpos in Stata, as discussed here, provides some of this functionality. I'm looking for a similar solution in R.
Edit
Following the comments, it appears the problem is not related to the hjust setting. For instance, with the code:
# Graph definition
ggplot(data = mtcars, aes(x = mpg, y = disp)) +
geom_point(shape = 1, colour = "black", size = 3, fill = "black") +
geom_smooth(method = lm, se = TRUE, fullrange = TRUE) +
geom_dl(aes(label = model), list("smart.grid", cex = 0.5, hjust = -.001)) +
xlab("MPG") +
ylab("DISP") +
theme_bw()
The labels are still misplaced:
Along the same lines, running the code without any hjust setting does not place the labels more sensibly either:
