I have a question regarding declaring my dataset as a time series. I have weekly historical demand data for a certain product. Every week is labeled as 201401, 201402, up to the current week 201937.
The problem arises once I declare the set as ts <- ts(data, start=2014, frequency=52). Because a year is slightly longer than 52 weeks (365.25 days on average), my set contains a week 201553. So from week 201601 onward, every week is shifted one week later, which causes problems when finding seasonality patterns.
Do I have to delete week 201553 or what is the appropriate continuation?
If you set frequency=365.25/7, things should work out fine.
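A minimal sketch of that suggestion, assuming 298 weekly observations starting in the first week of 2014. The demand values are simulated and the names demand and weekly are hypothetical, not taken from the question.

```r
# Simulated weekly demand (hypothetical data, for illustration only)
set.seed(1)
demand <- rpois(298, lambda = 20)

# Use the average number of weeks per year instead of a fixed 52
weekly <- ts(demand, start = 2014, frequency = 365.25 / 7)

frequency(weekly)   # average weeks per year, about 52.18
head(time(weekly))  # fractional years, starting at 2014
```

With a non-integer frequency, the time index no longer drifts by one week each time a 53-week year occurs.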
References
https://otexts.com/fpp2/ts-objects.html
https://www.rdocumentation.org/packages/stats/versions/3.6.1/topics/ts (interestingly, the trick with frequency=52.17857 is not mentioned here)
http://manishbarnwal.com/blog/2017/05/03/time_series_and_forecasting_using_R/
This is my first question on stackoverflow, sorry if the question is poorly put.
I am currently developing a project where I predict how much a person drinks each day. I currently have data that looks like this:
The menge column represents how much water a person has actually drunk in 30 minutes (so the first value represents the amount from 8:00 until just before 8:30, and so on). This is a 1-day sample from 3 months of data. The day starts at 8 AM and ends at 8 PM.
I am trying to forecast the Time Series for each day. For example, given the first one or two time steps, we would predict the whole day and then we know how much in total the person has drunk until 8 PM.
I am trying to model this data as a Time Series object in R (Google Colab), in order to use Croston's Method for the forecasting. Using the ts() function, what should I set the frequency to knowing that:
The data is half-hourly
The data is from 8:00 till 20:00 each day (Does not span the whole day)
Would I need to make the data span the whole day by adding 0 values? Are there maybe better approaches for this? Thank you in advance.
When using the ts() function, the frequency defines the number of (usually regularly spaced) observations within a given time period. In your example, the observations are every 30 minutes between 8 AM and 8 PM, and the time period is 1 day. Choosing a period of 1 day assumes that the pattern within each day is of most interest; you could also use 1 week instead.
So within each day of your data (8AM-8PM) you have 24 observations (24 half hours). So a suitable frequency for this data would be 24.
You can also pad the data with 0 values, however this isn't necessary and would complicate the model. If you padded the data so that it has observations for all half-hours of the day, the frequency would then be 48.
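As a rough sketch of the frequency = 24 setup, assuming five days of half-hourly readings between 8:00 and 20:00. The values are simulated and the names menge and drink_ts are only illustrative.

```r
# Simulated intake per half hour, mostly zeros (hypothetical data)
set.seed(42)
n_days <- 5
menge <- rpois(24 * n_days, lambda = 0.5) * 100

# 24 half-hour slots per observed day, so one "period" is one day
drink_ts <- ts(menge, frequency = 24)

frequency(drink_ts)  # 24
end(drink_ts)        # last observation: day 5, slot 24
```

Each day then forms one seasonal cycle, and the unobserved night hours never enter the series.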
I am trying to plot a decomposed time series, but running into an error:
Error in decompose(ts_ret) : time series has no or less than 2 periods
I am forcing the time series to a fixed period that is higher than 2.
Why does the ts think the period is less than 2?
Shouldn't the period be set automatically based on the time intervals in the data? (which are daily)
rm(list=ls())
library(jsonlite)
library(xts)
item.id<-18
eve.url<-paste0("http://eve-marketdata.com/api/item_history2.json?char_name=demo&region_ids=10000002&type_ids=",item.id,"&days=100")
eve.data<-data.frame(fromJSON(txt=eve.url))$emd.row
eve.data$date<-as.POSIXct(eve.data$date,format="%Y-%m-%d",tz="EST")
xxx<-xts(as.numeric(eve.data[,"avgPrice"]),eve.data$date)
colnames(xxx)<-"trit"
ts_ret<-ts(xxx,frequency=52) # but I'm setting the period here
plot(decompose(ts_ret))
As @ufelder pointed out, my dataset was too small for a seasonal decomposition because I only had a few months of data (measured hourly), not an entire season's worth (which is 4 months). To fix this I had to change the period of the dataset to once per day by using ts(xxx,frequency=365), so decompose would compare across days, not seasons.
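The error comes from a length check: decompose() needs at least two full periods, that is, at least 2 * frequency observations. The sketch below illustrates this on 100 simulated daily values. The helper ok_for_decompose and the choice of frequency = 7 (a weekly cycle) are assumptions made for illustration, not part of the original fix.

```r
# 100 simulated daily observations (hypothetical data)
set.seed(7)
x <- rnorm(100)

# Mirrors the condition behind the error message:
# the series must cover at least two full periods
ok_for_decompose <- function(x, f) f > 1 && length(na.omit(x)) >= 2 * f

ok_for_decompose(x, 52)  # FALSE: 100 < 2 * 52
ok_for_decompose(x, 7)   # TRUE:  100 >= 2 * 7

# With a period that fits the data, decompose() runs
dec <- decompose(ts(x, frequency = 7))
```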
I have a workspace containing hourly weather measurements for the past year (temperature, CO2 and so on).
What I need to do is split the whole dataset by date (because I have several rows for 2009-01-01 etc.) and, in the next step, summarize the data for each day separately (I'm looking for a summary of every variable for every day).
I searched for a suitable function and found one that is almost right. Splitting by day works quite well, but the summary is really bad.
df <- data.frame(date=rep(seq.POSIXt(as.POSIXct("2009-01-01"), by="day", length.out=31), each=1))
summary(split(df, as.Date(df$date),AM19))
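One possible approach, sketched on simulated data: split the measurement columns by calendar day and apply summary() to each piece, or use aggregate() for a single statistic per day. The data frame weather and its columns temperature and co2 are hypothetical stand-ins for the real workspace.

```r
# Three days of simulated hourly measurements (hypothetical data)
set.seed(3)
hours <- seq(as.POSIXct("2009-01-01 00:00", tz = "UTC"), by = "hour", length.out = 72)
weather <- data.frame(date = hours,
                      temperature = rnorm(72, mean = 5, sd = 3),
                      co2 = rnorm(72, mean = 400, sd = 10))
weather$day <- as.Date(weather$date, tz = "UTC")

# One data frame per day, then one summary table per day
by_day <- split(weather[c("temperature", "co2")], weather$day)
daily_summary <- lapply(by_day, summary)

# Alternative: one row per day with the mean of each variable
daily_mean <- aggregate(cbind(temperature, co2) ~ day, data = weather, FUN = mean)
```

Calling summary() directly on the list returned by split() summarizes the list itself, which is why lapply() is used here.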
I'm looking for a way to determine the week number (week beginning on Monday) over several years. That means I don't want to have 0-53 but if, let's say I have 2 years of dates, I want them to be numbered with 0-106 in R.
I tried strftime(Datum, format ="%W") but then I only get the annual week number and not as a whole.
Given that you did not provide any data, I took the liberty of creating some:
#create data
Datum<-c("2013-03-01", "2014-06-02", "2013-06-01")
# format data to year-month-day with strptime
Datum<-strptime(Datum, "%Y-%m-%d")
You now need to identify the origin year. As I'm sure you are aware, not all years have the same number of weeks (52.29 in a leap year vs. 52.14 in a standard calendar year), but as this is unlikely to matter for only 2 years, we can use the number of weeks returned by the strftime function.
origin.year=as.numeric(min(substring(Datum,1,4)))
# number of weeks in first year (offset for second year)
n.weeks<-52
Now we can create a vector containing the number of weeks to offset each week in Datum (X).
X<-as.numeric(substring(Datum,1,4)!=origin.year)*n.weeks
We can then simply add this vector to the number of weeks returned by strftime when it is applied to Datum
week.vec<-as.numeric(strftime(Datum, "%W"))+X
This will work for 2 years, but if you have more years than this, you will need to modify the offsets to account for this.
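For more than two years, an alternative is to count whole weeks from a fixed Monday instead of adding per-year offsets. The sketch below assumes the count should start at the Monday on or before the earliest date; origin_monday and week_index are illustrative names.

```r
dates <- as.Date(c("2013-03-01", "2014-06-02", "2013-06-01"))

# Monday on or before the earliest date ("%u": 1 = Monday, ..., 7 = Sunday)
origin <- min(dates)
origin_monday <- origin - (as.integer(format(origin, "%u")) - 1)

# Whole weeks elapsed since that Monday, continuing across year boundaries
week_index <- as.integer(dates - origin_monday) %/% 7L
week_index
```

Because the index is derived from elapsed days, years with 53 week numbers need no special handling.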
How can I accurately convert the output of the difftime calls below (the units are days) to years, months and days?
difftime(Sys.time(),"1931-04-10")
difftime(Sys.time(),"2012-04-10")
This does years and days but how could I include months?
yd.conv<-function(days, print=TRUE){
x<-days*0.00273790700698851
x2<-floor(x)
x3<-x-x2
x4<-floor(x3*365.25)
if (print) cat(x2,"years &",x4,"days\n")
invisible(c(x2, x4))
}
yd.conv(difftime(Sys.time(),"1931-04-10"))
yd.conv(difftime(Sys.time(),"2012-04-10"))
I'm not even sure how to define months. Would 4 weeks be considered a month, or the passing of the same day of the month? Under the latter definition, if the initial date was 2012-01-10 and the current date 2012-05-31, then we'd have 0 years, 4 months and 21 days. This works well, but what if the original date was on the 31st of the month and the end date was on Feb 28? Would this be considered a month?
As I wrote this question the question itself evolved so I'd better clarify:
What would be the best (most logical approach) to defining months and then how to find diff time in years, months and days?
If you're doing something like
difftime(Sys.time(), someDate)
This implies that you know what someDate is. In that case, you can convert the difference to a POSIXct object, which lets you extract temporal information directly (package chron offers more methods, too). For instance
as.POSIXct(c(difftime(Sys.time(), someDate, units = "sec")), origin = someDate)
This will return your desired date object. If you have a timezone tz to feed into difftime, you can also pass that directly to the tz parameter in as.POSIXct.
Now that you have your date object, you can run things like months(.) and if you have chron you can do years(.) and days(.) (returns ordered factor).
From here, you could do simple arithmetic on the differences in years, months, and days separately (after converting to appropriate numeric representations). Of course, converting someDate to POSIXct will be required.
EDIT: On second thought, months(.) returns a character representation of the month, so that may not be efficient. At least, it'll require a little processing (not too difficult) to give a numeric representation.
R's omission of these features is not an oversight. A difftime object stores only a duration, not its start date. A 700-day difference can span a different number of years depending on the start date and on whether a leap year falls in the interval. The same goes for months, which last between 28 and 31 days.
For research purposes, we use these units a lot (months and years) and pragmatically, we define a year as 365.25 days and a month as 365.25/12 = 30.4375 days.
To do arithmetic on a given difftime, convert it to numeric with as.numeric(difftime.obj, units = "days"). This drops the units attribute, and stating the units explicitly avoids relying on whichever units difftime picked automatically.
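A short sketch of this convention, using a fixed end date instead of Sys.time() so the result is reproducible. The variable names are illustrative.

```r
# Difference between two fixed dates, explicitly in days
d <- difftime(as.Date("2012-04-10"), as.Date("1931-04-10"), units = "days")

# Strip the difftime class before doing arithmetic
days <- as.numeric(d, units = "days")

# Pragmatic conventions: 1 year = 365.25 days, 1 month = 365.25 / 12 days
years  <- days / 365.25
months <- days / (365.25 / 12)

c(days = days, years = years, months = months)
```

These are average units, so the results are approximations rather than calendar-exact counts.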
You can not simply convert a difftime to month, since the definition of months depends on the absolute time at which the difftime has started.
You'll need to know the start date or the end date to accurately tell the number of months.
You could then, e.g., calculate the number of months in the first year of your timespan and the number of months in the last year of the timespan, and add 12 times the number of full years in between.
Hmm. I think the most sensible would be to look at the various units themselves. So compare the day of the month first, then compare the month of the year, then compare the year. At each point, you can introduce a carry to avoid negative values.
In other words, don't work with the product of difftime, but recode your own difftime.
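One way to sketch such a hand-rolled difference in base R is shown below. It assumes end is not before start, and that adding a month to a day that does not exist in the target month clamps to that month's last day (so Jan 31 plus one month is Feb 28 or 29). The function names add_months and ymd_diff are hypothetical, and other month conventions are equally defensible.

```r
# Add m whole months to a Date, clamping to the last day of the target month
add_months <- function(d, m) {
  lt <- as.POSIXlt(d)
  total <- (lt$year + 1900) * 12 + lt$mon + m   # months since year 0
  first <- as.Date(sprintf("%04d-%02d-01", total %/% 12, total %% 12 + 1))
  next_first <- seq(first, by = "month", length.out = 2)[2]
  days_in_month <- as.integer(format(next_first - 1, "%d"))
  first + (min(lt$mday, days_in_month) - 1)
}

# Calendar difference as years, months and days (assumes end >= start)
ymd_diff <- function(start, end) {
  start <- as.Date(start)
  end <- as.Date(end)
  s <- as.POSIXlt(start)
  e <- as.POSIXlt(end)
  # Compare year and month components first
  m <- (e$year - s$year) * 12 + (e$mon - s$mon)
  # Carry: if the day of the month has not been reached yet, take one month back
  if (add_months(start, m) > end) m <- m - 1
  days <- as.integer(end - add_months(start, m))
  c(years = m %/% 12, months = m %% 12, days = days)
}

ymd_diff("2012-01-10", "2012-05-31")  # 0 years, 4 months, 21 days
ymd_diff("2012-01-31", "2012-02-28")  # 0 years, 0 months, 28 days
```

Under this convention, Jan 31 to Feb 28 in a leap year is not yet a full month, because the clamped target date is Feb 29.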