Linear regression line does not show on time series plot in R

I am new to plotting time series. I downloaded some time series data and fitted a linear regression, and I would like to add the fitted line to the time series plot. I want the plot to show the year, so I used index(stk) as the x-axis input.
code:
library(quantmod)
stk <- suppressWarnings(getSymbols("AAPL", auto.assign = FALSE,
src = "yahoo", periodicity = "daily"))
stk <- na.omit(stk)
stk.lm1 <- lm(log(Cl(stk)) ~ c(1:nrow(stk)), data = stk)
plot(index(stk), log(Cl(stk)), type = "l", lwd = 3, las = 1)
abline(coefficients(stk.lm1)[1], coefficients(stk.lm1)[2], col="blue")
I know the problem comes from plotting against index(stk). How can I keep dates on the x-axis of the plot, and can I do the same thing with plot.xts or another package such as ggplot2? Please advise, thank you very much.

It isn't difficult to make the plot you want in either base R or ggplot2. Here is what you want:
plot(index(stk), log(Cl(stk)), type="l", lwd=3, las=1)
lines(x = index(stk.lm1$fitted.values), y = stk.lm1$fitted.values,col = "blue")
For the base R plot I added a line using the fitted values of the linear regression, which I extracted with the $ operator, together with their dates. Note that lm preserves the structure of the data, so the fitted values are an xts object.
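As for why the original abline call draws nothing: the model was fitted against the row position 1:nrow(stk), while the x-axis of the plot is in dates, which R stores as days since 1970-01-01, so the fitted line generally falls far outside the plotted area. The following is a minimal sketch with simulated data, not the AAPL series; the names dates, log_price and day_num are made up for illustration. It shows two ways around the mismatch: regress on the numeric date so that abline works, or draw the fitted values against the dates with lines.

```r
set.seed(42)

# Simulated daily series standing in for log(Cl(stk))
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 500)
log_price <- 4 + 0.002 * seq_along(dates) + rnorm(500, sd = 0.05)

# Fit on the row position, as in the question
fit_idx <- lm(log_price ~ seq_along(dates))

# Fit on the date expressed as a number (days since 1970-01-01)
day_num <- as.numeric(dates)
fit_date <- lm(log_price ~ day_num)

plot(dates, log_price, type = "l", las = 1)

# abline works here because the model and the axis share the same x scale
abline(fit_date, col = "blue", lwd = 2)

# The index-based fit can still be drawn by pairing its fitted values with the dates
lines(dates, fitted(fit_idx), col = "red", lty = 2)
```

Both models describe the same straight line, since the numeric date is just the row position shifted by a constant; only the intercept differs.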
library(ggplot2)
ggplot(stk, aes(x = index(stk), y = as.numeric(log(Cl(stk)))))+geom_line(lwd=1)+
geom_line(aes(x = index(stk.lm1$fitted.values), y = stk.lm1$fitted.values),col = "blue")+
labs(x = "Date", y = "Log Price")
The ggplot2 version is quite similar. First you initiate the plot with ggplot, where you define the data and aesthetics (aes), then you add a line with geom_line. For the extra line I used the same command again and defined the new line in a new aes, the same way I did with the base R function.

Here's a ggplot solution. You shouldn't have to calculate the linear regression coefficients yourself:
# convert stk to data frame & specify your x-axis variable explicitly
stk.df <- as.data.frame(stk)
stk.df$Date <- as.Date(rownames(stk.df))
# plot
ggplot(stk.df,
aes(x = Date, y = log(AAPL.Close))) +
geom_line() +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Year", y = "log(AAPL's closing value)") +
theme_bw()
The geom_smooth line takes care of the regression. Set se = TRUE if you want to include a confidence interval around the line.

Related

R - update boxplot axis range after adding points

I have a boxplot which summarizes ~60000 turbidity data points into quartiles, median, whiskers and sometimes outliers. Often a few outliers are so high up that the whole plot is compressed at the bottom, and I therefore choose to omit the outliers. However, I have also added averages to the plots as points, and I want these to always be plotted. The problem is that the y-axis of the boxplot does not adjust to the added average points, so when averages are far above the box they are simply plotted outside the chart window (see X-point for 2020, but none for 2021 or 2022). Normally with this parameter, the average will be between the whisker end and the most extreme outliers. This is normal, and expected in the data.
I have tried to capture the boxplot y-axis range to compare with the average, and then setting the ylim if needed, but I just don't know how to retrieve these axis ranges.
My code is just
boxplot(...)
points(...)
and works as far as plotting the points. Just not adjusting the y-axis.
Question 1: is it not possible to get the boxplot to redraw with the new points data? I thought this was standard in R plots.
Question 2: if not, how can I dynamically adjust the y-axis range?
Let's try to show a concrete example of the problem with some simulated data:
set.seed(1)
df <- data.frame(y = c(rexp(99), 150), x = rep(c("A", "B"), each = 50))
Here, group "B" has a single outlier at 150, even though most values are a couple of orders of magnitude lower. That means that if we try to draw a boxplot, the boxes get squished at the bottom of the plot:
boxplot(y ~ x, data = df, col = "lightblue")
If we remove outliers, the boxes plot nicely:
boxplot(y ~ x, data = df, col = "lightblue", outline = FALSE)
The problem comes when we want to add a point indicating the mean value for each boxplot, since the mean of "B" lies outside the plot limits. Let's calculate and plot the means:
mean_vals <- sapply(split(df$y, df$x), mean)
mean_vals
#> A B
#> 0.9840417 4.0703334
boxplot(y ~ x, data = df, col = "lightblue", outline = FALSE)
points(1:2, mean_vals, cex = 2, pch = 16, col = "red")
The mean for "B" is missing because it lies above the upper range of the plot.
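To answer the first question directly: base graphics does not rescale an existing plot when points are added later, so the axis limits are fixed once boxplot has drawn. The limits actually in use can be read back with par("usr"), which is one way to check whether a mean will be clipped. This is a small sketch using the same simulated df; the name outside is just illustrative.

```r
set.seed(1)
df <- data.frame(y = c(rexp(99), 150), x = rep(c("A", "B"), each = 50))
mean_vals <- sapply(split(df$y, df$x), mean)

boxplot(y ~ x, data = df, col = "lightblue", outline = FALSE)

# par("usr") returns c(x_left, x_right, y_bottom, y_top) of the current plot
usr <- par("usr")

# TRUE for any mean that falls outside the current y-range
outside <- mean_vals < usr[3] | mean_vals > usr[4]
outside
```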
The secret here is to use boxplot.stats to get the limits of the whiskers. By concatenating our vector of means to this vector of stats and getting its range, we can set our plot limits exactly where they need to be:
y_limits <- range(c(boxplot.stats(df$y)$stats, mean_vals))
Now we apply these limits to a new boxplot and draw it with the points:
boxplot(y ~ x, data = df, outline = FALSE, ylim = y_limits, col = "lightblue")
points(1:2, mean_vals, cex = 2, pch = 16, col = "red")
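One caveat: boxplot.stats(df$y) computes the whisker ends for the pooled data, whereas the plot draws whiskers per group, so the two can differ. A hedged alternative is to ask boxplot itself for the per-group statistics with plot = FALSE and build the limits from those. The name bp is illustrative.

```r
set.seed(1)
df <- data.frame(y = c(rexp(99), 150), x = rep(c("A", "B"), each = 50))
mean_vals <- sapply(split(df$y, df$x), mean)

# Compute the boxplot statistics without drawing anything.
# bp$stats is a 5 x n_groups matrix: lower whisker, Q1, median, Q3, upper whisker.
bp <- boxplot(y ~ x, data = df, plot = FALSE)

# Limits that cover every group's whiskers and every mean
y_limits <- range(bp$stats, mean_vals)

boxplot(y ~ x, data = df, outline = FALSE, ylim = y_limits, col = "lightblue")
points(seq_along(mean_vals), mean_vals, cex = 2, pch = 16, col = "red")
```

Because the limits are derived from the data each time, the same few lines work unchanged when the data or the number of groups changes.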
For comparison, you could do the whole thing in ggplot like this:
library(ggplot2)
ggplot(df, aes(x, y)) +
geom_boxplot(fill = "lightblue", outlier.shape = NA) +
geom_point(size = 3, color = "red", stat = "summary", fun = mean) +
coord_cartesian(ylim = range(c(boxplot.stats(df$y)$stats, mean_vals))) +
theme_classic(base_size = 16)
Created on 2023-02-05 with reprex v2.0.2

Adding a 95% confidence interval to NMDS plot

I am trying to plot an NMDS plot of species community composition data with ellipses which represent 95% confidence intervals. I generated the data for my NMDS plot using metaMDS and successfully have ordinations generated using the basic plot functions in R (see code below). However, I am struggling to get my data to plot successfully using ggplot2 and this is the only way I have seen 95% CIs plotted on NMDS plots. I am hoping someone is able to help me correct my code so the ellipses show 95% CIs, or could point me in the right direction for achieving this using other methods?
My basic code for plotting my NMDS plot:
orditorp(dung.families.mds, display = "sites", labels = F, pch = c(16, 8, 17, 18) [as.numeric(group.variables$Heating)], col = c("green", "blue", "orange", "black") [as.numeric(group.variables$Dungfauna)], cex = 1.3)
ordiellipse(dung.families.mds, groups = group.variables$Dungfauna, draw = "polygon", lty = 1, col = "grey90")
legend("topleft", "stress = 0.1329627", bty = "n", cex = 1)
My ordination:
I realize this question is old, but I found this post useful for plotting confidence ellipses during my work, and maybe it will help you: "Plotting ordiellipse function from vegan package onto NMDS plot created in ggplot2".
Edit: Below I have copied the code from the second part of Didzis Elferts's answer on the link above.
Where "sol" is the metaMDS object:
First, make NMDS data frame with group column.
NMDS = data.frame(MDS1 = sol$points[,1], MDS2 = sol$points[,2], group = MyMeta$amt)
Next, save result of function ordiellipse() as some object.
ord <- ordiellipse(sol, MyMeta$amt, display = "sites", kind = "se", conf = 0.95, label = T)
Data frame df_ell contains the values used to draw the ellipses. It is calculated with the function veganCovEllipse, which is hidden in the vegan package (it is not exported, so it may need to be called as vegan:::veganCovEllipse). This function is applied to each level of the NMDS group column, and it uses the arguments stored in the ord object: the cov, center and scale of each level.
df_ell <- data.frame()
for(g in levels(NMDS$group)){
df_ell <- rbind(df_ell, cbind(as.data.frame(with(NMDS[NMDS$group==g,],
vegan:::veganCovEllipse(ord[[g]]$cov,ord[[g]]$center,ord[[g]]$scale)))
,group=g))
}
Plotting is done the same way as in the previous example. Because the object returned by ordiellipse() is used to calculate the ellipse coordinates, this solution will work with whatever parameters you provide to that function.
ggplot(data = NMDS, aes(MDS1, MDS2)) + geom_point(aes(color = group)) +
geom_path(data=df_ell, aes(x=NMDS1, y=NMDS2,colour=group), size=1, linetype=2)
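For readers curious what such a helper computes, here is a base R sketch of an ellipse built from a covariance matrix, a center and a scale. The function name cov_ellipse and the simulated points are illustrative. The standard-error scaling used here (covariance divided by n, radius sqrt(qchisq(0.95, 2))) is an assumption about what kind = "se" with conf = 0.95 corresponds to, not a copy of vegan's code.

```r
# Hypothetical helper: points on the ellipse defined by cov, center and scale
cov_ellipse <- function(cov, center = c(0, 0), scale = 1, npoints = 100) {
  theta <- seq(0, 2 * pi, length.out = npoints + 1)
  circle <- cbind(cos(theta), sin(theta))
  # chol(cov) maps the unit circle onto the covariance ellipse;
  # sweep() then shifts every point by the center
  sweep(scale * circle %*% chol(cov), 2, center, "+")
}

set.seed(1)
# Simulated 2-D scores for a single group (not real ordination output)
pts <- cbind(x = rnorm(30, 1, 0.5), y = rnorm(30, -1, 0.3))

ctr <- colMeans(pts)
se_cov <- cov(pts) / nrow(pts)    # assumed: covariance of the group mean
r <- sqrt(qchisq(0.95, df = 2))   # assumed: 95% radius for two dimensions
ell <- cov_ellipse(se_cov, ctr, r)

plot(pts, asp = 1, pch = 16)
lines(ell, lty = 2, col = "blue")
```

Every point returned lies at the same Mahalanobis distance from the center, which is what makes the outline an ellipse of constant confidence under these assumptions.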

Confidence interval bands in ggplot2 when using stat_quantile?

I would like to add the median spline and corresponding confidence interval bands to a ggplot2 scatter plot. I am using the 'quantreg'-package, more specifically the rqss function (Additive Quantile Regression Smoothing).
In ggplot2 I am able to add the median spline, but not the confidence interval bands:
fig = ggplot(dd, aes(y = MeanEst, x = N, colour = factor(polarization)))
fig + stat_quantile(quantiles=0.5, formula = y ~ qss(x), method = "rqss") +
geom_point()
The quantreg-package comes with its own plot function; plot.rqss. Where I am able to add the confidence bands (bands=TRUE):
plot(1, type="n", xlab="", ylab="", xlim=c(2, 12), ylim=c(-3, 0)) # empty plot
plotfigs = function(df) {
rqss_model = rqss(df$MeanEst ~ qss(df$N))
plot(rqss_model, bands=TRUE, add=TRUE, rug=FALSE, jit=FALSE)
return(NULL)
}
figures = lapply(split(dd, as.factor(dd$polarization)), plotfigs)
However plot function that comes with the quantreg-package is not very flexible/well suited for my needs. Is it possible to get the confidence bands in a ggplot2 plot? Perhaps by mimicking the method used in the quantreg-package, or simply copying them from the plot?
Data: pastebin.
You almost have it. When you call
plot(rqss_model, bands=TRUE, add=TRUE, rug=FALSE, jit=FALSE)
The function very helpfully returns the plotted data, so all we need to do is grab it. First, a minor tweak to your function so that it returns the data in a sensible way:
plotfigs = function(df) {
rqss_model = rqss(df$MeanEst ~ qss(df$N))
band = plot(rqss_model, bands=TRUE, add=TRUE, rug=FALSE, jit=FALSE)
data.frame(x=band[[1]]$x, low=band[[1]]$blo, high=band[[1]]$bhi,
pol=unique(df$polarization))
}
Next call the function and condense
figures = lapply(split(dd, as.factor(dd$polarization)), plotfigs)
bands = Reduce("rbind", figures)
Then use geom_ribbon to plot
## We inherit y and color, so have to set them to NULL
fig + geom_ribbon(data=bands,
aes(x=x, ymin=low, ymax=high,
y=NULL, color=NULL, group=factor(pol)),
alpha=0.3)
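The shape of the combined data is what matters here: one row per x value, with low, high and a group column. As a rough base R counterpart to geom_ribbon, the sketch below builds such a data frame from made-up band values and shades each group with polygon. The function make_band and its numbers are purely illustrative and do not come from rqss.

```r
# Hypothetical stand-in for plotfigs(): returns x, low, high and the group label
make_band <- function(pol) {
  x <- seq(2, 12, length.out = 50)
  mid <- -1 / x - 0.2 * pol
  data.frame(x = x, low = mid - 0.1, high = mid + 0.1, pol = pol)
}

figures <- lapply(c(0, 1), make_band)

# Stack the per-group data frames into one, as Reduce("rbind", figures) does
bands <- do.call(rbind, figures)

# Empty plot sized to hold every band
plot(NA, xlim = range(bands$x), ylim = range(bands$low, bands$high),
     xlab = "N", ylab = "MeanEst")

# Shade each group: walk along the lower edge, then back along the upper edge
for (b in split(bands, bands$pol)) {
  polygon(c(b$x, rev(b$x)), c(b$low, rev(b$high)),
          col = adjustcolor("grey40", alpha.f = 0.3), border = NA)
}
```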

Fitting smooth through xyplot

This question seems simple but I haven't been able to figure out how to do it. I'm trying to fit a smooth line through a longitudinal dataset, as illustrated in the following code:
library(nlme)
xyplot(conc ~ Time, data = Theoph, groups = Subject, type = c("l", "smooth"))
The output isn't quite what I'm after and there are multiple warnings. I would like to fit a smooth through the entire data. As a bonus, if anyone could also show how to do this using ggplot, that would be great.
To plot the individual Subjects as separate lines and points but plot the overall smooth, use either of the two lattice approaches shown, or the classic graphics and zoo approach at the end. Also note that we need to order the time points to produce the overall smooth, and that the nlme package is not used. Finally, note that the code in the question gives no errors, only warnings.
1) trellis.focus/trellis.unfocus We can use trellis.focus/trellis.unfocus to add an overall smooth:
library(lattice)
xyplot(conc ~ Time, groups = Subject, data = Theoph, type = "o")
trellis.focus("panel", 1, 1)
o <- order(Theoph$Time)
panel.xyplot(Theoph[o, "Time"], Theoph[o, "conc"], type = "smooth", col = "red", lwd = 3)
trellis.unfocus()
2) panel function A second way is to define an appropriate panel function:
library(lattice)
o <- order(Theoph$Time)
xyplot(conc ~ Time, groups = Subject, data = Theoph[o, ], panel =
function(x, y, ..., subscripts, groups) {
for (lev in levels(groups)) {
ok <- groups == lev
panel.xyplot(x[ok], y[ok], type = "o", col = lev)
}
panel.xyplot(x, y, type = "smooth", col = "red", lwd = 3)
})
Either of these gives the following output. Note that the overall smooth is the thick red line.
(continued after chart)
3) zoo/classic graphics Here is a solution using the zoo package and classic graphics.
library(zoo)
Theoph.z <- read.zoo(Theoph[c("Subject", "Time", "conc")],
index = "Time", split = "Subject")
plot(na.approx(Theoph.z), screen = 1, col = 1:nlevels(Theoph$Subject))
o <- order(Theoph$Time)
lo <- loess(conc ~ Time, Theoph[o, ])
lines(fitted(lo) ~ Time, Theoph[o,], lwd = 3, col = "red")
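If zoo is not available, roughly the same picture can be sketched with base graphics alone. Theoph ships with R's datasets package; the names th and s are illustrative, and the smooth is a default loess fit rather than anything tuned.

```r
# Drop the groupedData classes so plain data.frame methods are used
th <- as.data.frame(Theoph)

# Empty frame with the right axes
plot(conc ~ Time, data = th, type = "n", las = 1)

# One line per subject
for (s in split(th, th$Subject)) {
  lines(s$Time, s$conc, type = "o", col = "grey50")
}

# Overall smooth: fit on all rows, then draw in time order so the
# line does not zigzag between subjects
o <- order(th$Time)
lo <- loess(conc ~ Time, data = th)
lines(th$Time[o], fitted(lo)[o], col = "red", lwd = 3)
```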
You can use the latticeExtra package to add a smoother to your first trellis object:
library(nlme)
library(ggplot2)
library(lattice)
library(latticeExtra)
xyplot(conc ~ Time, data = Theoph, groups = Subject, type = "l") +
layer(panel.smoother(..., col = "steelblue"))
And here is the ggplot2 version of the same graph
ggplot(data = Theoph, aes(Time, conc)) +
geom_line(aes(colour = Subject)) +
geom_smooth(col = "steelblue")

How can I add regression lines to a plot that has multiple data series that are colour coded by a factor?

I wish to add regression lines to a plot that has multiple data series that are colour coded by a factor. Using a brewer.pal palette, I created a plot with the data points coloured by factor (plant$ID). Below is an example of the code:
palette(brewer.pal(12,"Paired"))
plot(x=plant$TL, y=plant$d15N, xlab="Total length (mm)", ylab="d15N", col=plant$ID, pch=16)
legend(locator(1), legend=levels(factor(plant$ID)), text.col="black", pch=16, col=c(brewer.pal(12,"Paired")), cex=0.6)
Is there an easy way to add linear regression lines to the graph for each of the different data series (factors)? I also wish to colour the lines according to the factor plant$ID?
I can achieve this by adding each of the data series to the plot separately and then using the abline function (as below), but in cases with multiple data series it can be very time consuming matching up colours.
plot(y=plant$d15N[plant$ID=="Sm"], x=plant$TL[plant$ID=="Sm"], xlab="Total length (mm)", ylab="d15N", col="green", pch=16, xlim=c(50,300), ylim=c(8,15))
points(y=plant$d15N[plant$ID=="Md"], x=plant$TL[plant$ID=="Md"], type="p", pch=16, col="blue")
points(y=plant$d15N[plant$ID=="Lg"], x=plant$TL[plant$ID=="Lg"], type="p", pch=16, col="orange")
abline(lm(plant$d15N[plant$ID=="Sm"]~plant$TL[plant$ID=="Sm"]), col="green")
abline(lm(plant$d15N[plant$ID=="Md"]~plant$TL[plant$ID=="Md"]), col="blue")
abline(lm(plant$d15N[plant$ID=="Lg"]~plant$TL[plant$ID=="Lg"]), col="orange")
legend.text<-c("Sm","Md","Lg")
legend(locator(1), legend=legend.text, col=c("green", "blue", "orange"), pch=16, bty="n", cex=0.7)
There must be a quicker way! Any help would be greatly appreciated.
Or you can use ggplot2 and let it do all the hard work. Unfortunately, your example is not reproducible, so I have to create some data myself:
plant = data.frame(d15N = runif(1000),
TL = runif(1000),
ID = sample(c("Sm","Md","Lg"), size = 1000, replace = TRUE))
plant = within(plant, {
d15N[ID == "Sm"] = d15N[ID == "Sm"] + 0.5
d15N[ID == "Lg"] = d15N[ID == "Lg"] - 0.5
})
> head(plant)
d15N TL ID
1 0.6445164 0.14393597 Sm
2 0.2098778 0.62502205 Lg
3 -0.1599300 0.85331376 Lg
4 -0.3173119 0.60537491 Lg
5 0.8197111 0.01176013 Sm
6 1.0374742 0.68668317 Sm
The trick is to use the geom_smooth geometry which calculates the lm and draws it. Because we use color = ID, ggplot2 knows it needs to do the whole plot for each unique ID in ID.
library(ggplot2)
ggplot(plant, aes(x = TL, y = d15N, color = ID)) +
geom_point() + geom_smooth(method = "lm")
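If you would rather stay with base graphics, as in the question, the repetition can be avoided by fitting one model per level and looping over the results. This is a sketch on simulated data; the colour mapping cols and the numbers used to generate plant are made up for illustration.

```r
set.seed(1)

# Simulated stand-in for the plant data
plant <- data.frame(
  TL = runif(300, 50, 300),
  ID = factor(sample(c("Sm", "Md", "Lg"), size = 300, replace = TRUE))
)
plant$d15N <- 8 + 0.02 * plant$TL + as.integer(plant$ID) + rnorm(300, sd = 0.5)

# One colour per level, looked up by name so points and lines always match
cols <- c(Lg = "orange", Md = "blue", Sm = "green")

plot(d15N ~ TL, data = plant, pch = 16, col = cols[as.character(plant$ID)],
     xlab = "Total length (mm)", ylab = "d15N")

# One regression per level of ID
fits <- lapply(split(plant, plant$ID), function(d) lm(d15N ~ TL, data = d))

# Draw each fitted line in the colour of its level
for (id in names(fits)) {
  abline(fits[[id]], col = cols[[id]], lwd = 2)
}

legend("topleft", legend = names(fits), col = cols[names(fits)],
       pch = 16, lty = 1, bty = "n", cex = 0.7)
```

Looking colours up by level name rather than by position is a small design choice that keeps the legend, points and lines consistent if levels are added or reordered.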
