Plot time series linear regression in ggplot2 - r

My question is about plotting the result of a time series regression from tslm with ggplot2.
I have used the forecast package to decompose an SST (sea surface temperature) time series for the Mediterranean into trend, seasonal and remainder components. Then I used tslm to get the slope of a linear regression on the trend component. But I can't figure out how to plot the tslm fit with ggplot2. Should I plot the SST trend component with geom_smooth(method = "lm")? Would lm give the same result (slope) as tslm?
This is the code used to build and decompose the SST time series:
library(forecast)
# Loop over grid points (one per column) to extract each trend
for (i in 2:length(data)) {
  # Read the variable/column to analyse
  var <- paste("V", i, sep = "")
  ff <- data$fecha
  valor <- data[, i]
  datos2 <- as.data.frame(cbind(data$fecha, valor))
  # Build the time series and decompose it with STL
  datos.ts <- ts(datos2$valor, frequency = 365)
  datos.stl <- stl(datos.ts, s.window = 365)
  # tslm: linear regression of the series on a time trend
  datos.tslm <- tslm(datos.ts ~ trend)
  # Save the STL trend component (second column of time.series)
  output[, i - 1] <- datos.stl$time.series[, 2]
}
# Summarize trends for the whole Mediterranean (mean value to be plotted)
trend <- as.data.frame(rowMeans(output[, 1:length(output)]))
And the code to plot with geom_smooth:
trend.plot <- ggplot(data = trend, aes(x = fecha, y = trend)) +
  geom_point(size = 0.1) +
  geom_smooth(method = 'lm', data = trend[1:12784, ])
EDIT 1
Since the SST data consists of many files, I've uploaded the trend data to Dropbox and made it available in this csv file.

I am trying to understand your question. As a first attempt, I have revised your code as follows (the attached data only contains 2 columns, so I removed the for loop, but generalizing it should not be hard):
library(forecast)
library(ggplot2)
library(zoo)
data <- read.csv('../Downloads/trend_data.csv', header=TRUE)
data$fecha <- as.Date(data$fecha)
i <- 2
# read variable/column to analyse
var<-paste("V",i,sep="")
ff<-data$fecha
valor<-data[,i]
datos2<-as.data.frame(cbind(data$fecha,valor))
#Build time series
datos.ts<-ts(datos2$valor, frequency = 365)
datos.stl <- stl(datos.ts,s.window = 365)
# tslm: linear regression of the series on a time trend
datos.tslm<-tslm(datos.ts ~ trend)
# Save the STL trend component (second column of time.series)
output <-datos.stl$time.series[,2]
# Summarize trends for the whole Mediterranean (mean value to be plotted)
# trend<-as.data.frame(rowMeans(output[,1:length(output)]))
ggplot(data=data, aes(x=fecha, y=trend)) + geom_point(size=0.1) +
  geom_smooth(method='lm', data = data.frame(fecha=data$fecha, trend=as.numeric(output)), aes(x=fecha, y=trend))
Let me know if I have misinterpreted your intention here.
UPDATE: I think what you want may be just a line plot of the trend component stored in output (the STL trend; the tslm fit is not used in the plot)?
ggplot(data=data, aes(x=fecha, y=trend)) + geom_point(size=0.1) +
  geom_line(data = data.frame(fecha=data$fecha, trend=as.numeric(output)), aes(x=fecha, y=trend))
If you want a smoothed version of the trend:
ggplot(data=data, aes(x=fecha, y=trend)) + geom_point(size=0.1, col="red") +
  geom_smooth(data = data.frame(fecha=data$fecha, trend=as.numeric(output)), aes(x=fecha, y=trend),col="blue",size=0.1)
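On the second part of the question (whether lm gives the same slope as tslm): as far as I know, the trend term in tslm is simply the observation index 1, 2, ..., n, so a plain lm on that index should reproduce the slope. Here is a minimal sketch on simulated data; the series y and all of its parameters are made up for illustration and are not the SST data.

```r
library(forecast)

# Simulated daily series (hypothetical values): linear trend + annual cycle + noise
set.seed(42)
n <- 3 * 365
idx <- seq_len(n)
y <- ts(18 + 0.002 * idx + 3 * sin(2 * pi * idx / 365) + rnorm(n, sd = 0.3),
        frequency = 365)

# STL trend component, as in the question
trend_comp <- stl(y, s.window = "periodic")$time.series[, "trend"]

# Regression on tslm's built-in trend term
fit_tslm <- tslm(trend_comp ~ trend)

# Plain lm on the observation index
fit_lm <- lm(as.numeric(trend_comp) ~ seq_along(trend_comp))

# The two slopes should agree
coef(fit_tslm)["trend"]
coef(fit_lm)[2]
```

Both slopes are per observation, i.e. per day for daily data. geom_smooth(method = "lm") on a Date x axis also fits per day, so the drawn line should agree as long as there is exactly one observation per calendar day.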

The data you provided, plotted as a line graph with one dot per day. Does this solve your problem?
library(dplyr)
library(ggplot2)
trend_data <- read.csv2("../trend_data.csv",
                        sep = ",", stringsAsFactors = FALSE)
df <- trend_data %>% mutate(fecha = as.Date(fecha), trend = as.numeric(trend))
ggplot(df, aes(x = fecha, y = trend)) +
  geom_line() +
  geom_point()

Related

Autoplot time series - set fixed months/years to plot

library(fpp)
library(forecast)
ausbeer.train <- window(ausbeer, end=c(1999,4))
ausbeer.test <- window(ausbeer, start=c(2000,1))
autoplot(ausbeer.train, xlab="Rok", ylab="beer") +
autolayer(snaive(ausbeer.train, h=32), PI=FALSE, series="snaive") +
autolayer(meanf(ausbeer.train, h=32), PI=FALSE, series="meanf") +
autolayer(ausbeer.test)
This produces a plot spanning the whole series (image not shown here).
What if I wanted to plot only data from 1995 up to 2008? Can I somehow limit the range on the x axis? I don't want to subset my data (as snaive and meanf and probably other methods will need the entire train data), I only need to limit what I draw on the plot.
If p is the value of the autoplot statement in the question then this will plot only 1995 to the end of the series.
library(ggplot2)
p + xlim(1995, NA)
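Note that xlim() sets scale limits, so observations outside the range are dropped from the drawn layers (ggplot2 warns about removed rows). If you would rather zoom in without discarding anything, coord_cartesian() is an alternative. A small sketch with a made-up data frame; df and its columns are illustrative and are not the ausbeer objects from the question.

```r
library(ggplot2)

# Hypothetical quarterly data on a numeric year axis
df <- data.frame(year = seq(1990, 2010, by = 0.25))
df$value <- sin(df$year)

p <- ggplot(df, aes(year, value)) + geom_line()

# Zoom only: all rows stay in the layer data, the view is restricted
p_zoom <- p + coord_cartesian(xlim = c(1995, 2008))

# Scale limits: rows outside 1995-2008 are censored before drawing
p_clip <- p + xlim(1995, 2008)
```

For a plain line plot the two look much the same; the difference matters when a layer computes something (a smoother, for example) from the data, because the scale-limit version only sees the rows inside the range.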

Moving average on several time series using ggplot

Hi, I am trying to plot several time series with a 12-month moving average.
Here is an example with two time series of flower and seed densities. (I have many more time series to work on...)
library(lubridate) # needed for ymd()
#datasets
taxon <- c(rep("Flower",36),rep("Seeds",36))
density <- c(seq(20, 228, length=36),seq(33, 259, length=36))
year <- rep(c(rep("2000",12),rep("2001",12),rep("2002",12)),2)
ymd <- c(rep(seq(ymd('2000-01-01'),ymd('2002-12-01'), by = 'months'),2))
#dataframe
df <- data.frame(taxon, density, year, ymd)
library(forecast)
#create function that does a Symmetric Weighted Moving Average (2x12) of the monthly log density of flowers and seeds
ma_12 <- function(x) {
  # transform to a time-series object, as required by the ma function
  ts_x <- ts(x, freq = 12, start = c(2000, 1), end = c(2002, 12))
  return(ma(log(ts_x + 1), order = 12, centre = TRUE))
}
#trial of the function
ma_12(df[df$taxon=="Flower",]$density) #works well
library(ggplot2)
#Trying to plot flower and seeds log density as two time series
ggplot(df,aes(x=year,y=density,colour=factor(taxon),group=factor(taxon))) +
stat_summary(fun.y = ma_12, geom = "line") #or geom = "smooth"
#Warning message:
#Computation failed in `stat_summary()`:
#invalid time series parameters specified
The function ma_12 works correctly. The problem comes when I try to plot both time series (Flower and Seeds) with ggplot: I cannot get ggplot to treat the two taxa as separate time series and apply the moving average to each. It seems to have something to do with stat_summary...
Any help would be more than welcome! Thanks in advance.
Note: the following link is quite useful but does not directly help me, because I want to apply a specific function and plot the result by the levels of a grouping variable. So far I can't find a solution. Anyway, thank you for suggesting it.
Multiple time series in one plot
Is this what you need?
f <- ma_12(df[df$taxon == "Flower", ]$density)
s <- ma_12(df[df$taxon == "Seeds", ]$density)
f <- cbind(f, time(f))
s <- cbind(s, time(s))
serie <- data.frame(rbind(f, s),
                    taxon = c(rep("Flower", dim(f)[1]), rep("Seeds", dim(s)[1])))
# Back-transform from the log(x + 1) scale used in ma_12
serie$density <- exp(serie$f) - 1
library(lubridate)
serie$time <- ymd(format(date_decimal(serie$time), "%Y-%m-%d"))
library(ggplot2)
ggplot() +
  geom_point(data = df, aes(x = ymd, y = density, color = taxon, group = taxon)) +
  geom_line(data = serie, aes(x = time, y = density, color = taxon, group = taxon))
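If there are many taxa, computing the moving average per group inside the data frame may scale better than building one object per series. The sketch below is an assumption-laden variant, not the code above: it swaps forecast::ma for base stats::filter with 2x12 weights (which should give the same centred average), and the names ma_2x12, date and ma_log are made up for illustration.

```r
library(ggplot2)

# Same toy data as in the question, built without lubridate
df <- data.frame(
  taxon   = rep(c("Flower", "Seeds"), each = 36),
  density = c(seq(20, 228, length.out = 36), seq(33, 259, length.out = 36)),
  date    = rep(seq(as.Date("2000-01-01"), as.Date("2002-12-01"), by = "month"), 2)
)

# Centred 2x12 moving average of log(x + 1); the first and last 6 values are NA
ma_2x12 <- function(x) {
  w <- c(0.5, rep(1, 11), 0.5) / 12
  as.numeric(stats::filter(log(x + 1), w, sides = 2))
}

# Apply the function within each taxon
df$ma_log <- ave(df$density, df$taxon, FUN = ma_2x12)

# One line per taxon on the log scale
p <- ggplot(df, aes(date, ma_log, colour = taxon)) +
  geom_line(na.rm = TRUE)
```

This assumes the rows of each taxon are already in time order and evenly spaced; if not, sort by date within taxon first.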

r: Blank graph when plotting multiple lines on scatterplot

My goal is to produce a graph showing the differences between regression lines using continuous vs categorical variables. I'm using the "SleepStudy" dataset from Lock5Data, and I want to show the regression lines predicting GPA from ClassYear as either continuous or categorical. The code is below:
library(Lock5Data)
data("SleepStudy")
fit2 <- lm(GPA ~ factor(ClassYear), data = SleepStudy)
fit2_line <- aggregate(fit2$fitted.values ~ SleepStudy$ClassYear, FUN = mean)
colnames(fit2_line) <- c('ClassYear','GPA')
options(repr.plot.width=5, repr.plot.height=5)
library(ggplot2)
ggplot() +
geom_line(data=fit2_line, aes(x=ClassYear, y=GPA)) + # Fit line, ClassYear factor
geom_smooth(data=SleepStudy, method='lm', formula=GPA~ClassYear) + # Fit line, ClassYear continuous
geom_point(data=SleepStudy, aes(x=ClassYear, y=GPA)) # Data points as dots
What is producing the blank graph? What am I missing here?
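geom_smooth() needs x and y aesthetics, which it inherits when you set the data and aes() in ggplot(). Its formula must also be written in terms of y and x rather than the column names. This code works: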
ggplot(data=SleepStudy, aes(y = GPA,x = ClassYear)) +
geom_smooth(data=SleepStudy, method='lm', formula=y~x)+
geom_line(data=fit2_line, aes(x=ClassYear, y=GPA)) +
geom_point(data=SleepStudy, aes(x=ClassYear, y=GPA))
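To see why formula = y ~ x is the right form: inside geom_smooth() the formula refers to the aesthetics, not to the column names. A quick check on the built-in mtcars data (used here only as a stand-in for SleepStudy) that the smooth line matches an ordinary lm fit:

```r
library(ggplot2)

# x and y are mapped once in ggplot() and inherited by both layers
p <- ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# The same regression fitted directly on the columns
fit <- lm(mpg ~ cyl, data = mtcars)

# Points along the smooth line (layer 2) and the lm predictions at the same x
smooth <- layer_data(p, 2)
pred <- unname(predict(fit, data.frame(cyl = smooth$x)))
```

Comparing smooth$y with pred shows that the drawn line is the lm fit of the y aesthetic on the x aesthetic.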

bacterial growth curve (logistic/sigmoid) with multiple explanatory variables in R

Goal: I want to obtain regression (ggplot curves and model parameters) for growth curves with multiple treatments.
I have data for bacterial cultures C={a,b,c,d} growing on nutrient sources N={x,y}.
Their idealized growth curves (measuring turbidity of cell culture every hour) look something like this:
There are 8 different curves to obtain coefficients and curves for. How can I do it in one go for my data frame, feeding the different treatments as different groups for the nonlinear regression?
Thanks!!!
This question is similar to an unanswered question posted here.
(sourcecode for idealized data, sorry it's not elegant as I'm not a computer scientist):
a<-1:20
a[1]<-0.01
for(i in c(1:19)){
a[i+1]<-1.3*a[i]*(1-a[i])
}
b<-1:20
b[1]<-0.01
for(i in c(1:19)){
b[i+1]<-1.4*b[i]*(1-b[i])
}
c<-1:20
c[1]<-0.01
for(i in c(1:19)){
c[i+1]<-1.5*c[i]*(1-c[i])
}
d<-1:20
d[1]<-0.01
for(i in c(1:19)){
d[i+1]<-1.6*d[i]*(1-d[i])
}
sub.data<-cbind(a,b,c,d)
require(reshape2)
library(ggplot2)
data<-melt(sub.data, value.name = "OD600")
data$nutrition<-rep(c("x", "y"), each=5, times=4)
colnames(data)[1:2]<-c("Time", "Culture")
ggplot(data, aes(x = Time, y = OD600, color = Culture, group=nutrition)) +
  theme_bw() + xlab("Time/hr") + ylab("OD600") +
  geom_point() + facet_wrap(~nutrition, scales = "free")
If you are familiar with the group_by function from dplyr (included in tidyverse), you can group your data by Culture and nutrition and fit a model for each group using broom. I think this vignette is getting at exactly what you are trying to accomplish. Here is the code all in one go:
library(tidyverse)
library(broom)
library(mgcv) #For the gam model
data %>%
group_by(Culture, nutrition) %>%
do(fit = gam(OD600 ~ s(Time), data = ., family=gaussian())) %>% # Change this to whatever model you want (e.g., non-linear regression, sigmoid)
#do(fit = lm(OD600 ~ Time, data = .)) %>% # Example using linear regression
augment(fit) %>%
ggplot(aes(x = Time, y = OD600, color = Culture)) + # No need to group by nutrition because that is broken out in the facet_wrap
theme_bw() + xlab("Time/hr") + ylab("OD600") +
geom_point() + facet_wrap(~nutrition, scales = "free") +
geom_line(aes(y = .fitted, group = Culture))
If you don't need it all in one go, break the %>% chain apart to see what each step does. I used a GAM, which overfits here, but you can replace it with whatever model you want, including a sigmoid.
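Since the question asks for logistic curves specifically, here is a hedged sketch of the same per-group idea with nls and the self-starting SSlogis model, using only base R for the fitting. The simulated data, the rate and cap values, and the column name fitted are all assumptions made for illustration; real OD600 data may need different starting values or a different growth model.

```r
library(ggplot2)

# Simulated logistic growth (hypothetical parameters), with a little noise
# because nls can fail on artificial zero-residual data
set.seed(1)
grid <- expand.grid(Time = 1:20, Culture = c("a", "b", "c", "d"),
                    nutrition = c("x", "y"), stringsAsFactors = FALSE)
rate <- c(a = 0.4, b = 0.5, c = 0.6, d = 0.7)
cap  <- c(x = 1.0, y = 0.6)
grid$OD600 <- cap[grid$nutrition] /
  (1 + exp(-rate[grid$Culture] * (grid$Time - 10))) +
  rnorm(nrow(grid), sd = 0.005)

# One logistic fit per Culture x nutrition combination
by_group <- list(grid$Culture, grid$nutrition)
fits <- lapply(split(grid, by_group), function(d) {
  nls(OD600 ~ SSlogis(Time, Asym, xmid, scal), data = d)
})

# Parameter table: one row per group, columns Asym, xmid, scal
params <- t(sapply(fits, coef))

# Put the fitted values back in the original row order and plot
grid$fitted <- unsplit(lapply(fits, function(m) as.numeric(fitted(m))), by_group)
p <- ggplot(grid, aes(Time, OD600, colour = Culture)) +
  geom_point() +
  geom_line(aes(y = fitted)) +
  facet_wrap(~nutrition, scales = "free")
```

In SSlogis, Asym is the plateau, xmid the time of the inflection point and scal the inverse of the growth rate, so params gives all 8 sets of coefficients in one table.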

How to customize title, axis labels, etc. in a plot of a decomposed time series

I'm reasonably familiar with the usual ways of modifying a plot by writing your own x axis labels or a main title, but I've been unable to customize the output when plotting the results of a time series decomposition.
For example,
library(TTR)
t <- ts(co2, frequency=12, start=1, deltat=1/12)
td <- decompose(t)
plot(td)
plot(td, main="Title Doesn't Work") # gets you an error message
gives you a nice, basic plot of the observed time series, trend, etc. With my own data (changes in depth below the water surface), however, I'd like to be able to switch the orientation of the y axes (eg ylim=c(40,0) for 'observed', or ylim=c(18,12) for 'trend'), change 'seasonal' to 'tidal', include the units for the x axis ('Time (days)'), and provide a more descriptive title for the figure.
My impression is that the kind of time series analysis I'm doing is pretty basic and, eventually, I may be better off using another package, perhaps with better graphical control, but I'd like to use ts() and decompose() if I can for now (yeah, cake and consumption). Assuming this doesn't get too horrendous.
Is there a way to do this?
Thanks! Pete
You can modify the plot.decomposed.ts function. That's the plot "method" that gets dispatched when you run plot on an object of class decomposed.ts (which is the class of td).
getAnywhere(plot.decomposed.ts)
function (x, ...)
{
    xx <- x$x
    if (is.null(xx))
        xx <- with(x, if (type == "additive")
            random + trend + seasonal
        else random * trend * seasonal)
    plot(cbind(observed = xx, trend = x$trend, seasonal = x$seasonal, random = x$random),
        main = paste("Decomposition of", x$type, "time series"), ...)
}
Notice in the code above that the function hard-codes the title. So let's modify it so that we can choose our own title:
my_plot.decomposed.ts = function(x, title="", ...) {
  xx <- x$x
  if (is.null(xx))
    xx <- with(x, if (type == "additive")
      random + trend + seasonal
    else random * trend * seasonal)
  plot(cbind(observed = xx, trend = x$trend, seasonal = x$seasonal, random = x$random),
       main=title, ...)
}
my_plot.decomposed.ts(td, "My Title")
Here's a ggplot version of the plot. ggplot requires a data frame, so the first step is to get the decomposed time series into data frame form and then plot it.
library(tidyverse) # Includes the packages ggplot2 and tidyr, which we use below
# Get the time values for the time series (one per observation, in years)
Time = as.numeric(time(co2))
# Convert td to data frame
dat = cbind(Time, with(td, data.frame(Observed=x, Trend=trend, Seasonal=seasonal, Random=random)))
ggplot(gather(dat, component, value, -Time), aes(Time, value)) +
facet_grid(component ~ ., scales="free_y") +
geom_line() +
theme_bw() +
labs(y=expression(CO[2]~(ppm)), x="Year") +
ggtitle(expression(Decomposed~CO[2]~Time~Series)) +
theme(plot.title=element_text(hjust=0.5))
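The other requests in the question (renaming 'seasonal' to 'tidal', flipping the y axes, custom axis label and title) can be handled in the same ggplot approach. A self-contained sketch on co2, where the panel name "Tidal", the title text and the choice to reverse every panel are assumptions made for illustration:

```r
library(ggplot2)

td <- decompose(co2)
n <- length(co2)

# Long data frame with our own panel names, in the order we want them drawn
panels <- c("Observed", "Trend", "Tidal", "Random")
dat <- data.frame(
  Time      = rep(as.numeric(time(co2)), 4),
  component = factor(rep(panels, each = n), levels = panels),
  value     = c(as.numeric(td$x), as.numeric(td$trend),
                as.numeric(td$seasonal), as.numeric(td$random))
)

p <- ggplot(dat, aes(Time, value)) +
  geom_line(na.rm = TRUE) +                       # trend/random have NA ends
  facet_grid(component ~ ., scales = "free_y") +
  scale_y_reverse() +                             # flips the y axis in every panel
  labs(x = "Year", y = NULL, title = "A more descriptive title") +
  theme_bw()
```

For depth data you would put your own series in place of co2 and set the x label to "Time (days)". scale_y_reverse() flips all panels at once; reversing only some of them would need separate plots or per-panel handling.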
