I want to plot a density time series with the following data:
density vector (4,2,5,8,4,6,4)
sampling period vector (unit: seconds) (2,2,2,2,3,2,2)
As you can see, the sampling period is not constant. I only know the starting date and time.
I somehow need to assign the start date and time to the first measurement and then compute the dates and times of the following measurements, but I don't know exactly how to code it.
Try first converting the desired vector to a ts, given an initial start time and the cumulative sum (cumsum) of the sampling periods.
I assumed that you are sampling a continuous process (there are no gaps or dead times between samples).
require (lubridate)
require (tidyr)
require (ggplot2)
require (ggfortify)
require (timetk)
density <- c (4,2,5,8,4,6,4)
seconds <- c (2,2,2,2,3,2,2)
starttime <- 0
time <- starttime + cumsum (seconds) # use the declared start time instead of a hard-coded 0
df <- as.data.frame (cbind (time, seconds, density))
df$time <- as_datetime(df$time)
df$ts <- tk_ts (df, select = density)
autoplot (df$ts, ts.geom = 'bar', fill = 'blue')
Plot the density against the cumulative sum of the seconds added to the start.
dens <- c(4,2,5,8,4,6,4)
secs <- c(2,2,2,2,3,2,2)
st <- as.POSIXct("2000-01-01 00:00:00")
plot(st + cumsum(secs), dens, xlab = "", type = "l")
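Both answers above stamp each value at the end of its sampling period. If the first measurement should sit exactly on the start time, as the question asks, the cumulative sum can be shifted by one position. The following is a minimal base R sketch of both variants; the start time and the names `t_end`, `t_begin` and `obs` are assumptions made for illustration.

```r
dens  <- c(4, 2, 5, 8, 4, 6, 4)
secs  <- c(2, 2, 2, 2, 3, 2, 2)
start <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")  # assumed start time

# Variant A: each value is stamped at the end of its sampling period
t_end <- start + cumsum(secs)

# Variant B: the first value sits exactly on the start time
t_begin <- start + c(0, cumsum(secs)[-length(secs)])

obs <- data.frame(time = t_begin, density = dens)
plot(obs$time, obs$density, type = "s", xlab = "", ylab = "density")
```

Which variant is right depends on whether a period describes the time before or after its measurement, which the question does not state.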
Hi, I am trying desperately to plot several time series with a 12-month moving average.
Here is an example with two time series of flower and seeds densities. (I have much more time series to work on...)
library(lubridate) # provides ymd(), used below
#datasets
taxon <- c(rep("Flower",36),rep("Seeds",36))
density <- c(seq(20, 228, length=36),seq(33, 259, length=36))
year <- rep(c(rep("2000",12),rep("2001",12),rep("2002",12)),2)
ymd <- c(rep(seq(ymd('2000-01-01'),ymd('2002-12-01'), by = 'months'),2))
#dataframe
df <- data.frame(taxon, density, year, ymd)
library(forecast)
#create function that does a Symmetric Weighted Moving Average (2x12) of the monthly log density of flowers and seeds
ma_12 <- function(x) {
ts_x <- ts(x, freq = 12, start = c(2000, 1), end = c(2002, 12)) # transform to time-series object as it is necessary to run the ma function
return(ma(log(ts_x + 1), order = 12, centre = T))
}
#trial of the function
ma_12(df[df$taxon=="Flower",]$density) #works well
library(ggplot2)
#Trying to plot flower and seeds log density as two time series
ggplot(df,aes(x=year,y=density,colour=factor(taxon),group=factor(taxon))) +
stat_summary(fun.y = ma_12, geom = "line") #or geom = "smooth"
#Warning message:
#Computation failed in `stat_summary()`:
#invalid time series parameters specified
Function ma_12 works correctly. The problem comes when I try to plot both time-series (Flower and Seed) using ggplot. I cannot define both taxa as different time series and apply a moving average on them. Seems that it has to do with "stat_summary"...
Any help would be more than welcome! Thanks in advance
Note: The following link is quite useful but does not directly help me, as I want to apply a specific function and plot the result by the levels of a grouping variable. For now, I can't find any solution. Anyway, thank you for suggesting it.
Multiple time series in one plot
Is this what you need?
f <- ma_12(df[df$taxon=="Flower", ]$density)
s <- ma_12(df[df$taxon=="Seeds", ]$density)
f <- cbind(f,time(f))
s <- cbind(s,time(s))
serie <- data.frame(rbind(f,s),
taxon=c(rep("Flower", dim(f)[1]), rep("Seeds", dim(s)[1])))
serie$density <- exp(serie$f) - 1 # undo log(x + 1) from ma_12 to return to the density scale
library(lubridate)
serie$time <- ymd(format(date_decimal(serie$time), "%Y-%m-%d"))
library(ggplot2)
ggplot() + geom_point(data=df, aes(x=ymd, y=density, color=taxon, group=taxon)) +
geom_line(data=serie, aes(x= time, y=density, color=taxon, group=taxon))
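With many taxa, computing the average once per group in the data frame avoids handling each series by hand. The following is a minimal sketch in base R, assuming that a centred 2x12 moving average is a 13-term filter with half weights on the two end points, which should correspond to `ma(..., order = 12, centre = TRUE)`. The names `w`, `ma_2x12`, `df2` and `ma_log` are hypothetical.

```r
# Centred 2x12 moving average: 13 weights, half weight on the two end points
w <- c(0.5, rep(1, 11), 0.5) / 12

# Moving average of log(x + 1), returned as a plain numeric vector
ma_2x12 <- function(x) as.numeric(stats::filter(log(x + 1), w, sides = 2))

taxon   <- rep(c("Flower", "Seeds"), each = 36)
density <- c(seq(20, 228, length.out = 36), seq(33, 259, length.out = 36))
month   <- rep(seq(as.Date("2000-01-01"), by = "month", length.out = 36), 2)
df2 <- data.frame(taxon, density, month)

# ave() applies the function within each taxon and keeps the row order
df2$ma_log <- ave(df2$density, df2$taxon, FUN = ma_2x12)
```

The first and last six months of each taxon come out as NA because the window is incomplete there. The resulting column can be mapped directly to a line layer with colour set to taxon, so no summary statistic is needed inside the plot call.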
I am having issues plotting some time series data; namely, plotting the date (in monthly increments) against a real number (which represents price).
I can plot the data with just plot(months, mydata) with no issue, but it's in a scatter plot format.
However, when I try the same with ts.plot, i.e. ts.plot(months, mydata), I get the following error:
Error in .cbind.ts(list(...), .makeNamesTs(...), dframe = dframe, union = TRUE) : no time series supplied
I tried to bypass this by doing ts.plot(ts(months, mydata)), but with this I get a straight line (which I know isn't correct).
I have made sure that both months and mydata have the same length.
EDIT: What I mean by custom x-axis
I need the data to be in monthly increments (specifically from 03/1998 to 02/2018) - so I ran the following in R:
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
Now that I have obtained the monthly increments, I need the above variable, months, to act as the x-axis for the time series plot (perhaps more accurately, the time index).
With package zoo you can do the following.
library(zoo)
z <- zoo(mydata, order.by = months)
labs <- seq(min(index(z)), max(index(z)), length.out = 10)
plot(z, xaxt = "n")
axis(1, at = labs, labels = format(labs, "%m/%Y"))
Data creation code.
set.seed(1234)
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
n <- length(months)
mydata <- cumsum(rnorm(n))
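The straight line in the question comes from ts(months, mydata): the first argument of ts() is the data and the second is start, so the series being plotted is the dates themselves. Because the data are regular monthly values, a plain ts with start and frequency also works. This is a minimal sketch using the same simulated data; the name `price_ts` is an assumption.

```r
set.seed(1234)
months <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), by = "month")
mydata <- cumsum(rnorm(length(months)))

# frequency = 12 marks the data as monthly; start = c(year, month)
price_ts <- ts(mydata, start = c(1998, 3), frequency = 12)
plot(price_ts, ylab = "price")  # ts.plot(price_ts) draws the same line
```

The x-axis is then labelled in years; the zoo approach above remains the better choice when custom date labels such as "%m/%Y" are needed.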
I have a daily time series of the number of visitors to a website. My series starts on 01/06/2014 and runs until today, 14/10/2015, and I wish to predict the number of visitors in the future. How can I read my series with R? I'm thinking:
series <- ts(visitors, frequency=365, start=c(2014, 6))
If so, after running my time series model arimadata=auto.arima(), I want to predict the number of visitors for the next 60 days. How can I do this?
h=..?
forecast(arimadata,h=..),
What should the value of h be?
Thanks in advance for your help.
The ts specification is wrong; if you are setting this up as daily observations, then you need to specify what day of the year 2014 is June 1st and specify this in start:
## Create a daily Date object - helps my work on dates
inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")
## Create a time series object
set.seed(25)
myts <- ts(rnorm(length(inds)), # random data
start = c(2014, as.numeric(format(inds[1], "%j"))),
frequency = 365)
Note that I specify start as c(2014, as.numeric(format(inds[1], "%j"))). All the complicated bit is doing is working out what day of the year June 1st is:
> as.numeric(format(inds[1], "%j"))
[1] 152
Once you have this, you're effectively there:
## use auto.arima to choose ARIMA terms
fit <- auto.arima(myts)
## forecast for next 60 time points
fore <- forecast(fit, h = 60)
## plot it
plot(fore)
That seems suitable given the random data I supplied...
You'll need to select appropriate arguments for auto.arima() as suits your data.
Note that the x-axis labels refer to 0.5 (half) of a year.
Doing this via zoo
This might be easier to do via a zoo object created using the zoo package:
library(zoo)
## create the zoo object as before
set.seed(25)
myzoo <- zoo(rnorm(length(inds)), inds)
Note you now don't need to specify any start or frequency info; just use inds computed earlier from the daily Date object.
Proceed as before
## use auto.arima to choose ARIMA terms
fit <- auto.arima(myzoo) # fit on the zoo object this time
## forecast for next 60 time points
fore <- forecast(fit, h = 60)
The plot though will cause an issue as the x-axis is in days since the epoch (1970-01-01), so we need to suppress the auto plotting of this axis and then draw our own. This is easy as we have inds
## plot it
plot(fore, xaxt = "n") # no x-axis
Axis(inds, side = 1)
This only produces a couple of labeled ticks; if you want more control, tell R where you want the ticks and labels:
## plot it
plot(fore, xaxt = "n") # no x-axis
Axis(inds, side = 1,
at = seq(inds[1], tail(inds, 1) + 60, by = "3 months"),
format = "%b %Y")
Here we plot every 3 months.
The ts class does not work well for daily time series. I suggest you use the zoo package instead.
library(zoo)
zoo(visitors, seq(from = as.Date("2014-06-01"), to = as.Date("2015-10-14"), by = 1))
Here's how I created a time series when I was given some daily observations with quite a few observations missing. @gavin-simpson's answer was a big help. Hopefully this saves someone some grief.
The original data looked something like this:
library(lubridate)
set.seed(42)
minday = as.Date("2001-01-01")
maxday = as.Date("2005-12-31")
dates <- seq(minday, maxday, "days")
dates <- dates[sample(1:length(dates),length(dates)/4)] # create some holes
df <- data.frame(date=sort(dates), val=sin(seq(from=0, to=2*pi, length=length(dates))))
To create a time-series with this data I created a 'dummy' dataframe with one row per date and merged that with the existing dataframe:
df <- merge(df, data.frame(date=seq(minday, maxday, "days")), all=T)
This dataframe can be cast into a timeseries. Missing dates are NA.
nts <- ts(df$val, frequency=365, start=c(year(minday), as.numeric(format(minday, "%j"))))
plot(nts)
series <- ts(visitors, frequency=365, start=c(2014, 152))
152 is the day of the year for 01-06-2014. Because frequency=365, the second element of start is counted in days of the year, so the series starts at day 152 of 2014.
To forecast 60 days ahead, use h=60.
forecast(arimadata , h=60)
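The forecast values come back indexed by position, so it can help to build the matching calendar dates separately. This is a small base R sketch that checks the day-of-year value and generates the 60 forecast dates; the names `first_obs`, `last_obs` and `future_days` are assumptions, and the dates are taken from the question.

```r
first_obs <- as.Date("2014-06-01")
last_obs  <- as.Date("2015-10-14")
h <- 60  # forecast horizon in days

# Day of the year of the first observation, used as the second element of start
doy <- as.numeric(format(first_obs, "%j"))

# Calendar dates that the h forecast values refer to
future_days <- seq(last_obs + 1, by = "day", length.out = h)
range(future_days)
```

These dates can be paired with the point forecasts in a data frame for reporting or plotting.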
What is the smartest way to manipulate POSIX for use in ggplot axis?
I am trying to create a function for plotting many graphs (one per day) spanning a period of weeks, using POSIX time for the x axis.
To do so, I create an additional integer column DF$Day with the day, that I input into the function. Then, I create a subset using that day, which I plot using ggplot2. I figured how to use scale_x_datetime to format the POSIX x axis. Basically, I have it show the hours & minutes only, omitting the date.
Here is my question: How can I set the limits for each individual graph in hours of the day?
Below is some working, reproducible code to give an idea. It creates the plot for the first day, shows it for 3 seconds and then proceeds to create the one for the second day. But each day's limits are chosen based on the range of the time variable. How can I make the range cover, for instance, the whole day (0h - 24h)?
library(ggplot2)
library(scales) # provides date_format()
DF <- data.frame(matrix(ncol = 0, nrow = 4))
DF$time <- as.POSIXct(c("2010-01-01 02:01:00", "2010-01-01 18:10:00", "2010-01-02 04:20:00", "2010-01-02 13:30:00"))
DF$observation <- c(1,2,1,2)
DF$Day <- c(1,1,2,2)
for (Individual_Day in 1:2) {
Day_subset <- DF[DF$Day == as.integer(Individual_Day),]
print(ggplot( data=Day_subset, aes_string( x="time", y="observation") ) + geom_point() +
scale_x_datetime( breaks=("2 hour"), minor_breaks=("1 hour"), labels=date_format("%H:%M")))
Sys.sleep(3) }
Well, here's one way.
# ...
for (Individual_Day in 1:2) {
Day_subset <- DF[DF$Day == as.integer(Individual_Day),]
lower <- with(Day_subset,as.POSIXct(strftime(min(time),"%Y-%m-%d")))
upper <- with(Day_subset,as.POSIXct(strftime(as.Date(max(time))+1,"%Y-%m-%d"))-1)
limits = c(lower,upper)
print(ggplot( data=Day_subset, aes( x=time, y=observation) ) +
geom_point() +
scale_x_datetime( breaks=("2 hour"),
minor_breaks=("1 hour"),
labels=date_format("%H:%M"),
limits=limits)
)
}
The calculation for lower takes the minimum time in the subset and coerces it to character with only the date part (i.e., it strips away the time part). Converting back to POSIXct generates the beginning of that day.
The calculation for upper is a little more complicated. You have to convert the maximum time to a Date value and add 1 (i.e., 1 day), then convert to character (stripping off the time part), convert back to POSIXct, and subtract 1 (i.e., 1 second). This generates 23:59:59 on the end day.
Huge amount of work for such a small thing. I hope someone else posts a simpler way to do this...
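One possibly simpler route is to truncate the timestamps to the day with trunc() instead of going through character strings. This is a minimal base R sketch, assuming each subset covers a single calendar day and ignoring daylight saving shifts; `day_limits` is a hypothetical helper name.

```r
day_limits <- function(x) {
  lower <- as.POSIXct(trunc(min(x), units = "days"))  # midnight of the first day
  upper <- lower + 24 * 60 * 60 - 1                   # 23:59:59, ignoring DST shifts
  c(lower, upper)
}

x <- as.POSIXct(c("2010-01-01 02:01:00", "2010-01-01 18:10:00"), tz = "UTC")
lims <- day_limits(x)
format(lims, "%Y-%m-%d %H:%M:%S", tz = "UTC")
# "2010-01-01 00:00:00" "2010-01-01 23:59:59"
```

The returned vector can be passed as the limits argument in the loop above. Unlike as.Date(), which defaults to UTC for POSIXct input, trunc() works in the time zone of the data.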
Is it possible to subtract one survdiff object from another one in R, using the survival package?
I want to plot a figure that shows in which intervals one survival curve is higher/lower than the other and by how much.
One possible solution, with survA and survB as survfit objects (summary() returns $surv for survfit results, not for survdiff results):
interval <- 0:2500
# choose a different time interval if you want
# extend = TRUE keeps one value per requested time, even beyond the last observation
sumA <- summary(survA, times = interval, extend = TRUE)
sumB <- summary(survB, times = interval, extend = TRUE)
both <- data.frame(time = interval, A = sumA$surv, B = sumB$surv)
both$diff <- both$B - both$A
# or both$diff <- both$A - both$B
plot(x = both$time, y = both$diff, type = "l")
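For a version that runs end to end, the same idea can be tried on the lung example data that ships with the survival package. This is only a sketch: the split by sex, the grid and the names `fitA`, `fitB`, `sA` and `sB` are assumptions chosen for illustration.

```r
library(survival)  # recommended package shipped with R; lung is its example data

# Two Kaplan-Meier curves, one per group
fitA <- survfit(Surv(time, status) ~ 1, data = subset(lung, sex == 1))
fitB <- survfit(Surv(time, status) ~ 1, data = subset(lung, sex == 2))

grid <- seq(0, 1000, by = 10)
# extend = TRUE returns a value for every grid time, even past the last observation
sA <- summary(fitA, times = grid, extend = TRUE)$surv
sB <- summary(fitB, times = grid, extend = TRUE)$surv

# Positive values mean curve B lies above curve A at that time
plot(grid, sB - sA, type = "s", xlab = "time", ylab = "S_B(t) - S_A(t)")
abline(h = 0, lty = 2)
```

The dashed zero line makes it easy to see in which intervals one curve is above the other and by how much.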