Getting Date to Add Correctly - r

I have a 3000 x 1000 matrix time series database going back 14 years that is updated every three months. I am forecasting out 9 months using this data still keeping a 3200 x 1100 matrix (mind you these are rough numbers).
During the forecasting process I need the variables Year and Month to be calculated appropriately . I am trying to automate the process so I don't have to mess with the code any more; I can just run the code every three months and upload the projections into our database.
Below is the code I am using right now. As I said above I do not want to have to look at the data or the code just run the code every three months. Right now everything else is working as planed, but I still have to ensure the dates are appropriately annotated. The foo variables are changed for privacy purposes due to the nature of their names.
projection <- rbind(projection, data.frame(foo=forbar, bar=barfoo,
+ Year=2012, Month=1:9,
+ Foo=as.vector(fc$mean)))

I'm not sure exactly where the year/months are coming from, but if you want to refer to the current date for those numbers, here is an option (using the wonderful package, lubridate):
library(lubridate)
today = Sys.Date()
projection <- rbind(projection, data.frame(foo=foobar, bar=barfoo,
year = year(today),
month = sapply(1:9,function(x) month(today+months(x))),
Foo = as.vector(fc$mean)))
I hope this is what you're looking for.

Related

My data does not convert to time series in R

My data contains several measurements in one day. It is stored in CSV-file and looks like this:
enter image description here
The V1 column is factor type, so I'm adding a extra column which is date-time -type: vd$Vdate <- as_datetime(vd$V1) :
enter image description here
Then I'm trying to convert the vd-data into time series: vd.ts<- ts(vd, frequency = 365)
But then the dates are gone:
enter image description here
I just cannot get it what I am doing wrong! Could someone help me, please.
Your dates are gone because you need to build the ts dataframe from your variables (V1, ... V7) disregarding the date field and your ts command will order R to structure the dates.
Also, I noticed that you have what is seems like hourly data, so you need to provide the frequency that is appropriate to your time not 365. Considering what you posted your frequency seems to be a bit odd. I recommend finding a way to establish the frequency correctly. For example, if I have hourly data for 365 days of the year then I have a frequency of 365.25*24 (0.25 for the leap years).
So the following is just as an example, it still won't work properly with what I see (it is limited view of your dataset so I am not sure 100%)
# Build ts data (univariate)
vs.ts <- ts(vd$V1, frequency = 365, start = c(2019, 4)
# check to see if it is structured correctly
print(vd.ts, calendar = T)
Finally my time series is working properly. I used
ts <- zoo(measurements, date_times)
and I found out that the date_times was supposed to be converted with as_datetime() as otherwise they were character type. The measurements are converted into data.frame type.

Understanding the time() and cycle() functions in R

I have time series of the monthly international airplane passengers for many years, what does the time function applied to my data set tell me? What does the cycle function do to my data set? What are these functions useful for?
The syntax of the time statement would be as follows:-
library(tseries)
library(forecast)
data(AirPassengers)
AP <- AirPassengers
time(AP, offset(0))
The offset of 0 indicates that the sampling of this dataset took place at the start of the time series and the time function would create a vector of times, starting the first unit or the first month in this case.
The cycle function on the other hand, will just show the position of the datapoint or the observation in the entire cycle. The syntax would be
cycle(AP)
If you run this in R, you would see that the position assigned to Jan is 1, Feb is 2, March is 3 and so on for all the 12 months.
Use of these functions would be to get an overview of the data before starting to model it.
Here is an additional link for you to browse.
An old question, but perhaps someone is googling this, so an illuminating example could still help some. When trying to make sense Cowpertwait's R book...
Try this to see a visual example of what is going on:
AP
cycle(AP)
layout(matrix(c(1,2,3, 4), 2, 2, byrow = TRUE),widths=c(1,1), heights=c(1,1))
plot(AP)
plot(aggregate(AP))
boxplot(AP)
boxplot(AP ~ cycle(AP))

how I change the frame data into time series?

I have a daily rainfall data for 36 years. I want to analyze the time series, but my data is still in the form of frame data, how I change the frame data into time series. My data is a variable, how to unify the year number with the date and month, so the data is only in one column
You could use a time series package for that, such as fpp i.e. install.packages('fpp'). Since you don't give an example code, I can't really help you properly with it but it's quite easy.
ts(your_data, start =, frequency = ) At start = you put the year or month where you'd start and at frequency = you'd put e.g. 36 since you talk about 36 years.
You might want to check out https://robjhyndman.com/. He has an online (free) book available that walks you through the use of his package as well as providing useful information with respect to time series analysis.
Hope this helps.

Average of Data in accordance with similar time

I have a dataset having solar power generation for 24 hours for many days, now I have to find the average of the power generated in accordance with the time, as for example, Have a glimpse of the datasetI have to find the average of the power generated at time 9:00:00 AM.
Start by stripping out the time from the date-time variable.
Assuming your data is called myData
library(lubridate)
myData$Hour <- hour(strptime(myData$Time, format = "%Y-%m-%d %H:%M:%S"))
Then use ddply from the plyr package, which allows us to apply a function to a subset of data.
myMeans <- ddply(myData[,c("Hour", "IT_solar_generation")], "Hour", numcolwise(mean))
The resulting frame will have one column called Time which will give you the hour, and another with the means at each hour.
NOW, on another side but important note, when you ask a question you should be providing information on the attempts you've made so far to answer the question. This isn't a help desk.

XTS: split FX intraday bar data by trading days

I want to apply a function to 20 trading days worth of hourly FX data (as one example amongst many).
I started off with rollapply(data,width=20*24,FUN=FUN,by=24). That seemed to be working well, I could even assert I always got 480 bars passed in... until I realized that wasn't what I wanted. The start and end time of those 480 bars was drifting over the years, due to changes in daylight savings, and market holidays.
So, what I want is a function that treats a day as from 22:00 to 22:00 of each day we have data for. (21:00 to 21:00 in N.Y. summertime - my data timezone is UTC, and daystart is defined at 5pm ET)
So, I made my own rollapply function with this at its core:
ep=endpoints(data,on=on,k=k)
sp=ep[1:(length(ep)-width)]+1
ep=ep[(width+1):length(ep)]
xx <- lapply(1:length(ep), function(ix) FUN(.subset_xts(data,sp[ix]:ep[ix]),...) )
I then called this with on="days", k=1 and width=20.
This has two problems:
Days is in days, not trading days! So, instead of typically 4 weeks of data, I get just under 3 weeks of data.
The cutoff is midnight UTC. I cannot work out how to change it to use the 22:00 (or 21:00) cutoff.
UPDATE: Problem 1 above is wrong! The XTS endpoints function does work in trading days, not calendar days. The reason I thought otherwise is the timezone issue made it look like a 6-day trading
week: Sun to Fri. Once the timezone problem was fixed (see my
self-answer), using width=20 and on="days" does indeed give me 4
weeks of data.
(The typically there is important: when there is a trading holiday during those 4 weeks I expect to receive 4 weeks 1 day's worth of data, i.e. always exactly 20 trading days.)
I started working on a function to cut the data into weeks, thinking I could then cut them into five 24hr chunks, but this feels like the wrong approach, and surely someone has invented this wheel before me?
Here is how to get the daybreak right:
x2=x
index(x2)=index(x2)+(7*3600)
indexTZ(x2)='America/New_York'
I.e. just setting the timezone puts the daybreak at 17:00; we want it to be at 24:00, so add 7 hours on first.
With help from:
time zones in POSIXct and xts, converting from GMT in R
Here is the full function:
rollapply_chunks.FX.xts=function(data,width,FUN,...,on="days",k=1){
data <- try.xts(data)
x2 <- data
index(x2) <- index(x2)+(7*3600)
indexTZ(x2) <- 'America/New_York'
ep <- endpoints(x2,on=on,k=k) #The end point of each calendar day (when on="days").
#Each entry points to the final bar of the day. ep[1]==0.
if(length(ep)<2){
stop("Cannot divide data up")
}else if(length(ep)==2){ #Can only fit one chunk in.
sp <- 1;ep <- ep[-1]
}else{
sp <- ep[1:(length(ep)-width)]+1
ep <- ep[(width+1):length(ep)]
}
xx <- lapply(1:length(ep), function(ix) FUN(.subset_xts(data,sp[ix]:ep[ix]),...) )
xx <- do.call(rbind,xx) #Join them up as one big matrix/data.frame.
tt <- index(data)[ep] #Implicit align="right". Use sp for align="left"
res <- xts(xx, tt)
return (res)
}
You can see we use the modified index to split up the original data. (If R uses copy-on-write under the covers, then the only extra memory requirement should be for a copy of the index, not of the data.)
(Legal bit: please consider it licensed under MIT, but explicit permission given to use in the GPL-2 XTS package if that is desired.)

Resources