Increasing number of axis tick with autoplot function (time series data) - r

How would we add model x-label ticks to time series plot (I am using autoplot function because "basic" ggplot needs a dataframe and with one columns time series data could have issues)
How to make more x label ticks with autoplot function
library(ggplot2)
library(gridExtra)
library(fpp2)
A <- autoplot(AirPassengers, colour = "#00AFBB", size = 1.1) +
geom_smooth(aes(y = AirPassengers), method = "lm", colour = "#FC4E07", formula = y ~ x + I(x^2), show.legend = TRUE) +
ggtitle("Původní graf časové řady") + scale_x_continuous(breaks = round(seq(min(dat$x), max(dat$x), by = 0.5),1))
A

Here is one option by overriding the current x-axis:
autoplot(AirPassengers, colour = "#00AFBB", size = 1.1) +
geom_smooth(aes(y = AirPassengers), method = "lm", colour = "#FC4E07", formula = y ~ x + I(x^2), show.legend = TRUE) +
ggtitle("Původní graf časové řady") +
scale_x_continuous(breaks = scales::extended_breaks(10))
Here is another option by replacing the current breaks:
A <- autoplot(AirPassengers, colour = "#00AFBB", size = 1.1) +
geom_smooth(aes(y = AirPassengers), method = "lm", colour = "#FC4E07", formula = y ~ x + I(x^2), show.legend = TRUE) +
ggtitle("Původní graf časové řady")
A$scales$scales[[1]]$breaks <- scales::extended_breaks(10)
A
Note that ggplot internally also uses the scales::extended_breaks() function to calculate breaks. The 10 we put into that function is the desired amount of breaks, but some choices are made depending what are 'pretty' labels.
You could also provide your own function that takes in the limits of the scale and returns breaks, or you can provide pre-defined breaks in a vector.

Related

How to color the area between two geom_smooth lines?

I have 3 columns in a data frame from which I want to create a visualisation with geom_smooth() :
ggplot(my_data_frame) +
aes(x = fin_enquete,
y = intentions,
colour = candidat) +
geom_point(alpha = 1/6,
shape = "circle",
size = .5L) +
geom_smooth(mapping = aes(y = erreur_inf),
size = .5L,
span = .42,
se = F) +
geom_smooth(mapping = aes(y = erreur_sup),
size = .5L,
span = .42,
se = F) +
geom_smooth(method = "loess",
size = 1.5L,
span = .42,
se = F) +
labs(x = "Date de fin d'enquête",
y = "Pourcentage d'intentions de vote") +
theme_minimal() +
theme(text = element_text(family = "DIN Pro")) +
coord_cartesian(expand = F) +
easy_remove_legend()
3 lines with geom_smooth
I would like to color the area between the upper and the lower line. I know the geom_ribbon() function but I am not sure I can use it in this situation.
Does anybody have a solution?
Have a nice day!
You could use geom_ribbon and calculate the loess model yourself within the geom_ribbon call?
Toy random data
dat <- data.frame(x=1:100, y=runif(100), y2=runif(100)+1, y3=runif(100)+2)
Now suppose we want a smoothed ribbon between y and y3, with y2 drawn as a line between them:
ggplot( dat , aes(x, y2)) +
geom_ribbon(aes(ymin=predict(loess(y~x)),
ymax=predict(loess(y3~x))), alpha=0.3) +
geom_smooth(se=F)
You could use lapply() smooth to calculate the range of df values such as (5,11,13) to calculate the smooths and plot only the two edges of the se.
Sample code:
library(ggplot2)
ggplot(data = mtcars,
mapping = aes(x = wt,
y = mpg)) +
geom_point(size = 2)+
lapply(c(5,11, 13), function (i) {
geom_smooth(
data = ~ cbind(., facet_plots = i),
method = lm,
se=F,
formula = y ~ splines::bs(x, i)
)
})+
#facet_wrap(vars(facet_plots))
geom_ribbon(
stat = "smooth",
method = "loess",
se = TRUE,
alpha = 0, # or, use fill = NA
colour = "black",
linetype = "dotted")+
theme_minimal()
Plot:

How do I change the color of the regression lines in ggPlot?

I made a visualization of a regression. Currently this is what the graph looks like.
The regression lines are hard to see since they are the same color as the scatter plot dots.
My question is, how do I make the regression lines a different color from the scatter plot dots?
Here is my code:
(ggplot(data=df, mapping=aes(x='score', y='relent',
color='factor(threshold)'))+
geom_point()+
scale_color_manual(values=['darkorange', 'purple'])+
geom_smooth(method='lm',
formula = 'y ~ x+I(x**2)',se=False, )+
geom_vline(xintercept = 766, color = "red", size = 1, linetype = "dashed")+
labs(y = "Yield",
x = "Score")+
theme_bw()
)
One option to achieve your desired result would be to "duplicate" your threshold column with different values, e.g. in the code below I map 0 on 2 and 1 on 3. This duplicated column could then be mapped on the color aes inside geom_smooth and allows to set different colors for the regression lines.
My code below uses R or ggplot2 but TBMK the code could be easily adapted to plotnine:
n <- 1000
df <- data.frame(
relent = c(runif(n, 100, 200), runif(n, 150, 250)),
score = c(runif(n, 764, 766), runif(n, 766, 768)),
threshold = c(rep(0, n), rep(1, n))
)
df$threshold_sm <- c(rep(2, n), rep(3, n))
library(ggplot2)
p <- ggplot(data = df, mapping = aes(x = score, y = relent, color = factor(threshold))) +
scale_color_manual(values = c("darkorange", "purple", "blue", "green")) +
geom_vline(xintercept = 766, color = "red", size = 1, linetype = "dashed") +
labs(
y = "Yield",
x = "Score"
) +
theme_bw()
p +
geom_point() +
geom_smooth(aes(color = factor(threshold_sm)),
method = "lm",
formula = y ~ x + I(x**2), se = FALSE
)
A second option would be to add some transparency to the points so that the lines stand out more clearly and by the way deals with the overplotting of the points:
p +
geom_point(alpha = .3) +
geom_smooth(aes(color = factor(threshold)),
method = "lm",
formula = y ~ x + I(x**2), se = FALSE
) +
guides(color = guide_legend(override.aes = list(alpha = 1)))
Compare:
iris %>%
ggplot(aes(Petal.Length, Sepal.Width, color = Species)) +
geom_point() +
geom_smooth(method = "lm", aes(group = Species))
With:
iris %>%
ggplot(aes(Petal.Length, Sepal.Width)) +
geom_point(aes(color = Species)) +
geom_smooth(method = "lm", aes(group = Species))
When aes(color = ...) is specified inside of ggplot(), it is applied to both of the subsequent geoms. Moving it to geom_point() applies it to the points only.

How to assign colors to multicolor scatter plot with multicolor fitted lines in ggplot2

Problem
I have some data points stored in data.frame with three variables, x, y, and gender. My goal is to draw several generally fitted lines and also lines specifically fitted for male/female over the scatter plot, with points coloured by gender. It sounds easy but some issues just persist.
What I currently do is to use a new set of x's and predict y's for every model, combine the fitted lines together in a data.frame, and then convert wide to long, with their model name as the third var (from this post: ggplot2: how to add the legend for a line added to a scatter plot? and this: Add legend to ggplot2 line plot I learnt that mapping should be used instead of setting colours/legends separately). However, while I can get a multicolor line plot, the points come without specific colour for gender (already a factor) as I expected from the posts I referenced.
I also know it might be possible to use aes=(y=predict(model)), but I met other problems for this. I also tried to colour the points directly in aes, and assign colours separately for each line, but the legend cannot be generated unless I use lty, which makes legend in the same colour.
Would appreciate any idea, and also welcome to change the whole method.
Code
Note that two pairs of lines overlap. So it only appeared to be two lines. I guess adding some jitter in the data might make it look differently.
slrmen<-lm(tc~x+I(x^2),data=data[data['gender']==0,])
slrwomen<-lm(tc~x+I(x^2),data=data[data['gender']==1,])
prdf <- data.frame(x = seq(from = range(data$x)[1],
to = range(data$x)[2], length.out = 100),
gender = as.factor(rep(1,100)))
prdm <- data.frame(x = seq(from = range(data$x)[1],
to = range(data$x)[2], length.out = 100),
gender = as.factor(rep(0,100)))
prdf$fit <- predict(fullmodel, newdata = prdf)
prdm$fit <- predict(fullmodel, newdata = prdm)
rawplotdata<-data.frame(x=prdf$x, fullf=prdf$fit, fullm=prdm$fit,
linf=predict(slrwomen, newdata = prdf),
linm=predict(slrmen, newdata = prdm))
plotdata<-reshape2::melt(rawplotdata,id.vars="x",
measure.vars=c("fullf","fullm","linf","linm"),
variable.name="fitmethod", value.name="y")
plotdata$fitmethod<-as.factor(plotdata$fitmethod)
plt <- ggplot() +
geom_line(data = plotdata, aes(x = x, y = y, group = fitmethod,
colour=fitmethod)) +
scale_colour_manual(name = "Fit Methods",
values = c("fullf" = "lightskyblue",
"linf" = "cornflowerblue",
"fullm"="darkseagreen", "linm" = "olivedrab")) +
geom_point(data = data, aes(x = x, y = y, fill = gender)) +
scale_fill_manual(values=c("blue","green")) ## This does not work as I expected...
show(plt)
Code for another method (omitted two lines), which generates same-colour legend and multi-color plot:
ggplot(data = prdf, aes(x = x, y = fit)) + # prdf and prdm are just data frames containing the x's and fitted values for different models
geom_line(aes(lty="Female"),colour = "chocolate") +
geom_line(data = prdm, aes(x = x, y = fit, lty="Male"), colour = "darkblue") +
geom_point(data = data, aes(x = x, y = y, colour = gender)) +
scale_colour_discrete(name="Gender", breaks=c(0,1),
labels=c("Male","Female"))
This is related to using the colour aesthetic for lines and the fill aesthetics for points in your own (first) example. In the second example, it works because the colour aesthetic is used for lines and points.
By default, geom_point can not map a variable to fill, because the default point shape (19) doesn't have a fill.
For fill to work on points, you have to specify shape = 21:25 in geom_point(), outside of aes().
Perhaps this small reproducible example helps to illustrate the point:
Simulate data
set.seed(4821)
x1 <- rnorm(100, mean = 5)
set.seed(4821)
x2 <- rnorm(100, mean = 6)
data <- data.frame(x = rep(seq(20,80,length.out = 100),2),
tc = c(x1, x2),
gender = factor(c(rep("Female", 100), rep("Male", 100))))
Fit models
slrmen <-lm(tc~x+I(x^2), data = data[data["gender"]=="Male",])
slrwomen <-lm(tc~x+I(x^2),data = data[data["gender"]=="Female",])
newdat <- data.frame(x = seq(20,80,length.out = 200))
fitted.male <- data.frame(x = newdat,
gender = "Male",
tc = predict(object = slrmen, newdata = newdat))
fitted.female <- data.frame(x = newdat,
gender = "Female",
tc = predict(object = slrwomen, newdata = newdat))
Plot using colour aesthetics
Use the colour aesthetics for both points and lines (specify in ggplot such that it gets inherited throughout). By default, geom_point can map a variable to colour.
library(ggplot2)
ggplot(data, aes(x = x, y = tc, colour = gender)) +
geom_point() +
geom_line(data = fitted.male) +
geom_line(data = fitted.female) +
scale_colour_manual(values = c("tomato","blue")) +
theme_bw()
Plot using colour and fill aesthetics
Use the fill aesthetics for points and the colour aesthetics for lines (specify aesthetics in geom_* to prevent them being inherited). This will reproduce the problem.
ggplot(data, aes(x = x, y = tc)) +
geom_point(aes(fill = gender)) +
geom_line(data = fitted.male, aes(colour = gender)) +
geom_line(data = fitted.female, aes(colour = gender)) +
scale_colour_manual(values = c("tomato","blue")) +
scale_fill_manual(values = c("tomato","blue")) +
theme_bw()
To fix this, change the shape argument in geom_point to a point shape that can be filled (21:25).
ggplot(data, aes(x = x, y = tc)) +
geom_point(aes(fill = gender), shape = 21) +
geom_line(data = fitted.male, aes(colour = gender)) +
geom_line(data = fitted.female, aes(colour = gender)) +
scale_colour_manual(values = c("tomato","blue")) +
scale_fill_manual(values = c("tomato","blue")) +
theme_bw()
Created on 2021-09-19 by the reprex package (v2.0.1)
Note that the scales for colour and fill get merged automatically if the same variable is mapped to both aesthetics.
It seems to me that what you really want to do is use ggplot2::stat_smooth instead of trying to predict yourself.
Borrowing the data from #scrameri:
ggplot(data, aes(x = x, y = tc, color = gender)) +
geom_point() +
stat_smooth(aes(linetype = "X^2"), method = 'lm',formula = y~x + I(x^2)) +
stat_smooth(aes(linetype = "X^3"), method = 'lm',formula = y~x + I(x^2) + I(x^3)) +
scale_color_manual(values = c("darkseagreen","lightskyblue"))

Exclude a particular area from geom_smooth fit automatically

I am plotting different plots in my shiny app.
By using geom_smooth(), I am fitting a smoothing curve on a scatterplot.
I am plotting these plots with ggplot() and rendering with ggplotly().
Is there any way, I can exclude a particular data profile from geom_smooth().
For e.g.:
It can be seen in the fit, the fit is getting disturbed and which is not desirable. I have tried plotly_click(), plotly_brush(), plotly_select(). But, I don't want user's interference when plotting this fit, this makes the process much slower and inaccurate.
Here is my code to plot this:
#plot
g <- ggplot(data = d_f4, aes_string(x = d_f4$x, y = d_f4$y)) + theme_bw() +
geom_point(colour = "blue", size = 0.1)+
geom_smooth(formula = y ~ splines::bs(x, df = 10), method = "lm", color = "green3", level = 1, size = 1)
Unfortunately, I can not include my dataset in my question, because the dataset is quite big.
You can make an extra data.frame without the "outliers" and use this as the input for geom_smooth:
set.seed(8)
test_data <- data.frame(x = 1:100)
test_data$y <- sin(test_data$x / 10) + rnorm(100, sd = 0.1)
test_data[60:65, "y"] <- test_data[60:65, "y"] + 1
data_plot <- test_data[-c(60:65), ]
library(ggplot2)
ggplot(data = test_data, aes(x = x, y = y)) + theme_bw() +
geom_point(colour = "blue", size = 0.1) +
geom_smooth(formula = y ~ splines::bs(x, df = 10), method = "lm", color = "green3", level = 1, size = 1)
ggplot(data = test_data, aes(x = x, y = y)) + theme_bw() +
geom_point(colour = "blue", size = 0.1) +
geom_smooth(data = data_plot, formula = y ~ splines::bs(x, df = 10), method = "lm", color = "green3", level = 1, size = 1)
Created on 2020-11-27 by the reprex package (v0.3.0)
BTW: you don't need aes_string (which is deprecated) and d_f4$x, you can just use aes(x = x)

How to add a custom legend in ggplot2 in R

I want to plot a data set where the size of the points are proportional to the x-variable and have a regression line with a 95% prediction interval. The "sample" code I have written is as follows:
# Create random data and run regression
x <- rnorm(40)
y <- 0.5 * x + rnorm(40)
plot.dta <- data.frame(y, x)
mod <- lm(y ~ x, data = plot.dta)
# Create values for prediction interval
x.new <- data.frame(x = seq(-2.5, 2.5, length = 1000))
pred <- predict(mod,, newdata = x.new, interval = "prediction")
pred <- data.frame(cbind(x.new, pred))
# plot the data w/ regression line and prediction interval
p <- ggplot(pred, aes(x = x, y = upr)) +
geom_line(aes(y = lwr), color = "#666666", linetype = "dashed") +
geom_line(aes(y = upr), color = "#666666", linetype = "dashed") +
geom_line(aes(y = fit)) +
geom_point(data = plot.dta, aes(y = y, size = x))
p
This produces the following plot:
Obviously, the legend is not too helpful here. I would like to have one entry in the legend for the points, say, labeled "data", one grey, dashed line labeled "95% PI" and one entry with a black line labeled "Regression line."
As Hack-R alluded in the provided link, you can set the breaks and labels for scale_size() to make that legend more meaningful.
You can also construct a legend for all your geom_line() calls by adding linetype into your aes() and use a scale_linetype_manual() to set the values, breaks and labels.
ggplot(pred, aes(x = x, y = upr)) +
geom_line(aes(y = lwr, linetype = "dashed"), color = "#666666") +
geom_line(aes(y = upr, linetype = "dashed"), color = "#666666") +
geom_line(aes(y = fit, linetype = "solid")) +
geom_point(data = plot.dta, aes(y = y, size = x)) +
scale_size(labels = c("Eensy-weensy", "Teeny", "Small", "Medium", "Large")) +
scale_linetype_manual(values = c("dashed" = 2, "solid" = 1), labels = c("95% PI", "Regression Line"))

Resources