I have the following vector, which contains data for each day of December.
vector1 <- c(1056772, 674172, 695744, 775040, 832036,735124,820668,1790756,1329648,1195276,1267644,986716,926468,828892,826284,749504,650924,822256,3434204,2502916,1262928,1025980,1828580,923372,658824,956916,915776,1081736,869836,898736,829368)
Now I want to create a time series object on a weekly basis, and I used the following code snippet:
weeklyts = ts(vector1,start=c(2016,12,01), frequency=7)
However, the starting and end points are not correct. I always get the following time series:
> weeklyts
Time Series:
Start = c(2017, 5)
End = c(2021, 7)
Frequency = 7
[1] 1056772 674172 695744 775040 832036 735124 820668 1790756 1329648 1195276 1267644 986716 926468 828892 826284 749504
[17] 650924 822256 3434204 2502916 1262928 1025980 1828580 923372 658824 956916 915776 1081736 869836 898736 829368
Does anybody know what I am doing wrong?
To get a time series that starts and ends as you expect, you need to think about what the series represents. You have 31 daily observations from December 2016.
The start argument of ts takes two numbers, not three: the year (or cycle) and the position within it. So use something like c(2016, 1) if you start with month 1 of 2016. See the following example.
ts(1:12, start = c(2016, 1), frequency = 12)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2016 1 2 3 4 5 6 7 8 9 10 11 12
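To see why the call in the question reported Start = c(2017, 5), here is a small sketch. It assumes, as the observed output suggests, that ts() uses only the first two elements of start, so c(2016, 12, 01) is read as cycle 2016, position 12. With frequency = 7 there are only 7 positions per cycle, so position 12 spills over into the next one. The values 1:31 are placeholders for the 31 daily observations.

```r
# Placeholder data: 31 values standing in for the December observations
x <- ts(1:31, start = c(2016, 12), frequency = 7)

# Position 12 of a 7-position cycle rolls over into the next cycle: 12 = 7 + 5
start(x)  # 2017 5

# The 31st observation falls in cycle 2021, position 7
end(x)    # 2021 7
```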
ts and daily data are an awkward combination, because ts cannot handle leap years. That is why you see people using a frequency of 365.25 for daily data with an annual cycle. To get a correct December 2016 series we can do the following:
ts(vector1, start = c(2016, 336), frequency = 366)
Time Series:
Start = c(2016, 336)
End = c(2016, 366)
Frequency = 366
[1] 1056772 674172 695744 775040 832036 735124 820668 1790756 1329648 1195276 1267644 986716 926468 828892 826284 749504
[17] 650924 822256 3434204 2502916 1262928 1025980 1828580 923372 658824 956916 915776 1081736 869836 898736 829368
Note the following things that are going on:
frequency is 366 because 2016 is a leap year.
start is c(2016, 336) because "2016-12-01" is day 336 of the year.
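The two numbers above do not have to be counted by hand. As a minimal sketch in base R, the day of the year can be read off a Date with the "%j" format; the names first_day, last_day, start_pos and days_in_year are made up for this illustration, and 1:31 stands in for vector1.

```r
first_day <- as.Date("2016-12-01")
last_day  <- as.Date("2016-12-31")

# "%j" gives the day of the year as a zero-padded string, e.g. "336"
start_pos    <- as.integer(format(first_day, "%j"))  # 336
# December 31 is the last day of the year, so its day number is the year length
days_in_year <- as.integer(format(last_day, "%j"))   # 366

# Placeholder values standing in for vector1
dec_ts <- ts(1:31, start = c(2016, start_pos), frequency = days_in_year)
end(dec_ts)  # 2016 366
```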
Personally, I use the xts package (together with zoo) to handle daily data, and use the functions in xts to aggregate to a weekly time series. The result can then be used with packages that expect ts objects, such as forecast.
edit: added small xts example
my_df <- data.frame(dates = seq.Date(as.Date("2016-12-01"), as.Date("2017-01-31"), by = "day"),
var1 = rep(1:31, 2))
library(xts)
my_xts <- xts(my_df[, -1], order.by = my_df$dates)
# rollup to weekly. Dates shown are the last day in the weekperiod.
my_xts_weekly <- period.apply(my_xts, endpoints(my_xts, on = "weeks"), colSums)
head(my_xts_weekly)
[,1]
2016-12-04 10
2016-12-11 56
2016-12-18 105
2016-12-25 154
2017-01-01 172
2017-01-08 35
Depending on your needs, you can transform this back into a data.frame or another structure. Read the help for period.apply, since you can supply your own function to apply over each period. The xts (and zoo) vignettes are also worth reading.
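As a cross-check of the weekly sums that does not need xts, the same rollup can be sketched in base R with cut() and tapply(). This is a swapped-in technique, not what the xts code does internally. Note that cut() labels each week by the Monday that starts it, whereas the xts output is labelled by the last day of the week.

```r
dates <- seq(as.Date("2016-12-01"), as.Date("2017-01-31"), by = "day")
var1  <- rep(1:31, 2)

# cut() with "week" groups dates into weeks that start on Monday by default
week_start <- cut(dates, "week")

# Sum the values within each week
weekly <- tapply(var1, week_start, sum)
head(weekly)
# first six sums: 10 56 105 154 172 35 (same as the xts result)
```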
I would like to add a fiscal month column to a dataset in R. In my company the fiscal month ends on the 21st of each month. For example:
12/22/2019 to 1/21/2020 will be Jan-2020
1/22/2020 to 2/21/2020 will be Feb-2020
2/22/2020 to 3/21/2020 will be Mar-2020
etc
Dataset
Desired_output
How would I accomplish this in R? The Date column in my data is in %m/%d/%Y format (e.g. 1/22/2020).
You could extract the day of the month and, if it is 22 or later, add 10 days to the date, then format the result as month-year:
transform(dat, Fiscal_Month = format(Date +
ifelse(as.integer(format(Date, '%d')) >= 22, 10, 0), '%b %Y'))
# Date Fiscal_Month
#1 2020-01-20 Jan 2020
#2 2020-01-21 Jan 2020
#3 2020-01-22 Feb 2020
#4 2020-01-23 Feb 2020
#5 2020-01-24 Feb 2020
This can also be done without ifelse like this :
transform(dat, Fiscal_Month = format(Date + c(0, 10)
[(as.integer(format(Date, '%d')) >= 22) + 1], '%b %Y'))
data
Used this sample data :
dat <- data.frame(Date = seq(as.Date('2020-01-20'), by = '1 day',length.out = 5))
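The sample data only covers one boundary in January, so here is a small check of the add-10-days rule on a few more dates, including the December to January rollover from the question. It is a sketch with made-up test dates, and it formats the result as "%Y-%m" instead of '%b %Y' only so that the output does not depend on the locale.

```r
# Made-up test dates in the question's %m/%d/%Y format
raw   <- c("12/21/2019", "12/22/2019", "1/21/2020", "1/22/2020", "2/22/2020")
dates <- as.Date(raw, format = "%m/%d/%Y")

# Days 22 and later belong to the next fiscal month, so push them past month end
shift  <- ifelse(as.integer(format(dates, "%d")) >= 22, 10, 0)
fiscal <- format(dates + shift, "%Y-%m")
fiscal
# "2019-12" "2020-01" "2020-01" "2020-02" "2020-03"
```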
1) yearmon We perform the following steps:
create test data d which shows both a date in the start of period month (i.e. 22nd or later) and a date in the end of period month (i.e. 21st or earlier)
convert the input d to Date class giving dd
subtract 21 days thereby shifting it to the month that starts the fiscal period
convert that to ym of yearmon class (which represents a year and a month without a day directly and internally represents it as the year plus 0 for Jan, 1/12 for Feb, ..., 11/12 for Dec) and then add 1/12 to get to the month at the end of fiscal period.
format it as shown. (We could omit this step, i.e. the last line of code, if the default format that yearmon uses, e.g. Jan 2020, is ok.)
The whole thing could easily be written in a single line of code but we have broken it up for clarity.
library(zoo)
d <- c("1/22/2020", "1/21/2020") # test data
dd <- as.Date(d, "%m/%d/%Y")
ym <- as.yearmon(dd - 21) + 1/12
format(ym, "%b-%y")
## [1] "Feb-20" "Jan-20"
2) Base R This could also be done using only base R, as follows. We make use of dd from above. cut computes the first of the month that dd - 21 lies in (as a factor, not a Date class object), and as.Date then converts it to a Date. Adding 31 shifts it into the end-of-period month, and formatting this gives the final answer.
format(as.Date(cut(dd - 21, "month")) + 31, "%b-%y")
## [1] "Feb-20" "Jan-20"
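The base R approach relies on "first of the month plus 31 days" always landing in the following month. A quick sketch that checks this for every month of 2020 (a leap year, chosen here only as an example) could look like this; the variable names are made up for the illustration.

```r
# First day of each month in 2020
firsts  <- seq(as.Date("2020-01-01"), by = "month", length.out = 12)
shifted <- firsts + 31

month_before <- as.integer(format(firsts, "%m"))
month_after  <- as.integer(format(shifted, "%m"))

# Each shifted date should fall in the next month, with December wrapping to January
all(month_after == month_before %% 12 + 1)  # TRUE
```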
I have a problem with forecasting in R.
First of all, this is an example of the original dataset (CW_data_noNA):
Loading date Year Built Vessel Type Cargo Size Week
2019-08-22 2011 Medium 30000 34
2019-09-01 2004 Aframax 80000 35
2019-08-30 2005 Panamax 60000 35
2019-09-01 2000 VLCC 270000 35
2019-08-29 2001 VLCC 270000 35
2019-09-03 2003 Suezmax 130000 36
2019-08-26 2002 Medium 30000 34
I have to create a weekly time series (showing the total number of fixed ships and the cargo capacity), and then to use naïve and simple moving average to provide one-week ahead forecast.
Weekly_base <- CW_data_noNA %>% group_by(Week) %>% summarize(Number_of_fix = n(),cargo_capacity = sum(`Cargo Size`))
Weekly_ts <- ts(Weekly_base, start = c(2019, 32), frequency = 52)
demand_training <- window(Weekly_ts, start = c(2019,32), end=c(2019,41))
demand_test <- window(Weekly_ts, start = c(2019,42))
naive(demand_training, h=1)
The problem with the code above is that it gives me a forecast not for the variables (Number_of_fix and cargo_capacity) but for the week itself. This is what the result looks like:
Point Forecast Lo 80 ....
2019.788 42 -23879066 ....
Can someone help me? Thank you.
In the line where you generate your Weekly_ts, you're currently supplying the whole data frame, i.e.
Weekly_ts <- ts(Weekly_base, start = c(2019, 32), frequency = 52)
I guess the help of naive (?naive) is a bit ambiguous(?), as it states that y should be
a numeric vector or time series of class ts
and you definitely supplied an object of class ts. However, in this case you supplied multiple series when it expects just one. Simply select the one you want and it should forecast the correct series:
relevant_variable <- Weekly_base %>%
  select(cargo_capacity)  # change cargo_capacity to Number_of_fix to forecast the other variable
Weekly_ts <- ts(relevant_variable, start = c(2019, 32), frequency = 52)
Or, more directly:
Weekly_ts <- ts(Weekly_base$cargo_capacity, start = c(2019, 32), frequency = 52)
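To make the difference concrete, here is a minimal base R sketch. The weekly_base values are hypothetical, made up purely for illustration, and the forecasts are computed by hand rather than with the forecast package; the moving-average window of 3 weeks is also an assumption.

```r
# Hypothetical weekly summary, made up for illustration only
weekly_base <- data.frame(
  Week           = 32:41,
  Number_of_fix  = c(3, 5, 4, 6, 2, 5, 7, 4, 3, 5),
  cargo_capacity = c(300000, 410000, 270000, 360000, 490000,
                     330000, 420000, 380000, 350000, 410000)
)

# Passing the whole data frame gives a multivariate series, one per column
multi <- ts(weekly_base, start = c(2019, 32), frequency = 52)
is.mts(multi)    # TRUE
colnames(multi)  # "Week" "Number_of_fix" "cargo_capacity"

# Passing a single column gives the univariate series a forecaster expects
single <- ts(weekly_base$cargo_capacity, start = c(2019, 32), frequency = 52)
is.mts(single)   # FALSE

# One-week-ahead forecasts computed by hand
naive_fc <- as.numeric(tail(single, 1))  # naive: repeat the last observed value
sma_fc   <- mean(tail(single, 3))        # simple moving average of the last 3 weeks
```

With these made-up numbers, naive_fc is 410000 and sma_fc is 380000.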
I have a dataset that has daily prices from Jan 1 2009 to Jan 1 2019 and I want to transform it into a time series. When I use monthly data, the ts() function works as expected:
> head(monthlyts)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1999 0.8811 0.8854 0.9251 0.8940 0.8746 0.8521 0.8522 0.8799 0.9143 0.8951 0.9123 0.8862
2000 0.8665 0.8934 0.8900 0.8709 0.8463 0.8185 0.8319 0.8266 0.8677 0.8697 0.8346 0.8575
but when I try it with daily prices it appears completely differently:
> head(dailyts)
Time Series:
Start = 2009
End = 2009.01368925394
Frequency = 365.25
Price
[1,] 0.8990
[2,] 0.8990
[3,] 0.9014
[4,] 0.9004
[5,] 0.9041
[6,] 0.8986
The code I'm using for both is the same so I'm not sure what the issue is.
monthlyts <- ts(mprices['Price'], frequency=12, start=c(2009,1))
dailyts <- ts(dprices['Price'], frequency=365.25, start=c(2009,1))
The data is otherwise the same: both .csv files were downloaded from the same website and cover the same timeframe; one is monthly and the other is daily.
Any ideas on how to get the daily time series properly?
Here's some test data that's representative of the problem
data <- as.data.frame(sample(seq(from=0, to=1, by=0.0001), size = 730, replace = TRUE))
colnames(data) <- 'data'
datats <- ts(data, frequency=365, start=c(2009,1))
head(datats)
It should output two rows of data labelled 2009 and 2010 with 365 columns in each row.
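If the goal is only the layout described above (one row per year, one column per day), one option that does not depend on how ts objects are printed is to reshape the plain vector into a matrix. This is a sketch of that idea, not a statement about what ts() itself can print; the column names d1 to d365 are made up, and it assumes exactly 365 values per year.

```r
set.seed(1)  # only to make the random test data reproducible
values <- sample(seq(from = 0, to = 1, by = 0.0001), size = 730, replace = TRUE)

# Fill row by row: the first 365 values become 2009, the next 365 become 2010
by_year <- matrix(values, nrow = 2, byrow = TRUE,
                  dimnames = list(c("2009", "2010"), paste0("d", 1:365)))

dim(by_year)        # 2 365
by_year[, 1:5]      # first five days of each year
```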
Currently I am working on a river discharge data analysis. I have the daily discharge record from 1935 to now. I want to extract the annual maximum discharge for each hydrological year (which runs from 01/11 to 31/10 of the next year). However, I found that the hydroTSM package can only deal with the calendar year. I tried to use the "zoo" package, but I found it difficult to compute, as years have different numbers of days. Does anyone have an idea? Thanks.
the data looks like:
01-11-1935 663
02-11-1935 596
03-11-1935 450
04-11-1935 381
05-11-1935 354
06-11-1935 312
my code:
mydata<-read.table("discharge")
colnames(mydata) <- c("date","discharge")
library(zoo)
z<-zooreg(mydata[,2],start=as.Date("1935-11-1"))
mydata$date <- as.Date(mydata$date, format = "%d-%m-%Y")  # dates are stored as dd-mm-yyyy
q.month<-daily2monthly(z,FUN=max,na.rm = TRUE,date.fmt = "%Y-%m-%d",out.fmt="numeric")
q.month.plain=coredata(q.month)
z.month<-zooreg(q.month.plain,start=1,frequency=12)
With dates stored in a vector of class Date, you can just use cut() and tapply(), like this:
## Example data
df <- data.frame(date = seq(as.Date("1935-01-01"), length = 100, by = "week"),
flow = (runif(n = 100, min = 0, max = 1000)))
## Use vector of November 1st dates to cut data into hydro-years
breaks <- seq(as.Date("1934-11-01"), length=4, by="year")
df$hydroYear <- cut(df$date, breaks, labels=1935:1937)
## Find the maximum flow in each hydro-year
with(df, tapply(flow, hydroYear, max))
# 1935 1936 1937
# 984.7327 951.0440 727.4210
## Note: whenever using `cut()`, I take care to double-check that
## I've got the cuts exactly right
cut(as.Date(c("1935-10-31", "1935-11-01")), breaks, labels=1935:1937)
# [1] 1935 1936
# Levels: 1935 1936 1937
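The breaks and labels above are typed in by hand for the example data. As a sketch of how they could instead be derived from the date range, under the same convention that a hydro-year is named after the calendar year in which it ends (the names first_year, last_year and hydro are made up here):

```r
dates <- seq(as.Date("1935-01-01"), by = "week", length.out = 100)

# One November 1st before the earliest date, one after the latest date
first_year <- as.integer(format(min(dates), "%Y")) - 1L
last_year  <- as.integer(format(max(dates), "%Y")) + 1L
breaks <- seq(as.Date(paste0(first_year, "-11-01")),
              as.Date(paste0(last_year, "-11-01")), by = "year")

# n breaks define n - 1 intervals, each labelled by the year it ends in
labels <- (first_year + 1L):last_year
hydro  <- cut(dates, breaks, labels = labels)

levels(hydro)  # "1935" "1936" "1937"
```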
Here is a one-liner to do that.
First convert the dates to "yearmon" class. This class represents a year month as the sum of a year as the integer part and a month as the fractional part (Jan = 0, Feb = 1/12, etc.). Add 2/12 to shift November to January and then truncate to give just the years. Aggregate over those. Although the test data we used starts at the beginning of the hydro year this solution works even if the data does not start on the beginning of the hydro year.
# test data
library(zoo)
z <- zooreg(1:1000, as.Date("2000-11-01")) # test input
aggregate(z, as.integer(as.yearmon(time(z)) + 2/12), max)
This gives:
2001 2002 2003
365 730 1000
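The same shift can be sketched in base R with whole-number month counts instead of the fractional yearmon representation. This is an alternative illustration of the idea, not how zoo implements it: count months, add 2 so that November lands on January, then integer-divide by 12 to get the hydro year.

```r
dates  <- seq(as.Date("2000-11-01"), by = "day", length.out = 1000)
values <- 1:1000

year  <- as.integer(format(dates, "%Y"))
month <- as.integer(format(dates, "%m"))

# Months since year 0 (January = 0), shifted by 2, then truncated to whole years
hydro_year <- (year * 12L + (month - 1L) + 2L) %/% 12L

res <- tapply(values, hydro_year, max)
res
# 2001 2002 2003
#  365  730 1000
```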
Try the xts package, which works together with zoo:
require(zoo)
require(xts)
dates = seq(Sys.Date(), by = 'day', length = 365 * 3)
y = cumsum(rnorm(365 * 3))
serie = zoo(y, dates)
# if you need to specify `start` and `end`
# serie = window(serie, start = "2015-06-01")
# xts function
apply.yearly(serie, FUN = max)
I'm trying to load time series in R with the 'zoo' library.
The observations have varying precision. Some have day/month/year, others only month and year, and others only the year:
02/10/1915
1917
07/1917
07/1918
30/08/2018
Subsequently, I need to aggregate the rows by year, and by year and month.
Base R's as.Date function doesn't handle that.
How can I model this data with zoo?
Thanks,
Mulone
We use the test data formed from the index data in the question followed by a number:
# test data
Lines <- "02/10/1915 1
1917 2
07/1917 3
07/1918 4
30/08/2018 5"
yearly aggregation
library(zoo)
to.year <- function(x) as.numeric(sub(".*/", "", as.character(x)))
read.zoo(text = Lines, FUN = to.year, aggregate = mean)
The last line returns:
1915 1917 1918 2018
1.0 2.5 4.0 5.0
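To see what to.year does to each index value, here is the same extraction applied in base R, with tapply() standing in for read.zoo's aggregation as a cross-check. The vectors index and value simply restate the test data.

```r
index <- c("02/10/1915", "1917", "07/1917", "07/1918", "30/08/2018")
value <- 1:5

# Drop everything up to and including the last "/", leaving only the year
year <- as.numeric(sub(".*/", "", index))
year  # 1915 1917 1917 1918 2018

# Average the values that share a year
yearly_mean <- tapply(value, year, mean)
yearly_mean
# 1915 1917 1918 2018
#  1.0  2.5  4.0  5.0
```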
year/month aggregation
Since year/month aggregation of data with no months makes no sense we first drop the year only data and aggregate the rest:
DF <- read.table(text = Lines, as.is = TRUE)
# remove year-only records. DF.ym has at least year and month.
yr <- suppressWarnings(as.numeric(DF[[1]]))
DF.ym <- DF[is.na(yr), ]
# remove day, if present, and convert to yearmon.
to.yearmon <- function(x) as.yearmon( sub("\\d{1,2}/(\\d{1,2}/)", "\\1", x), "%m/%Y" )
read.zoo(DF.ym, FUN = to.yearmon, aggregate = mean)
The last line gives:
Oct 1915 Jul 1917 Jul 1918 Aug 2018
1 3 4 5
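The regular expression in to.yearmon is the least obvious step, so here it is in isolation. It removes a leading day only when two slash-separated groups precede the year, and leaves month/year strings untouched. The final as.Date line is an extra base R illustration (pinning each value to the first of the month), not part of the zoo solution.

```r
index <- c("02/10/1915", "07/1917", "07/1918", "30/08/2018")

# "dd/mm/yyyy" becomes "mm/yyyy"; "mm/yyyy" does not match and is left as is
month_year <- sub("\\d{1,2}/(\\d{1,2}/)", "\\1", index)
month_year  # "10/1915" "07/1917" "07/1918" "08/2018"

# Base R alternative to yearmon: represent each month by its first day
first_of_month <- as.Date(paste0("01/", month_year), format = "%d/%m/%Y")
format(first_of_month, "%Y-%m")  # "1915-10" "1917-07" "1918-07" "2018-08"
```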