Creating a Time Series with Half Hourly Data in R - r

This is my first time ever asking a question on Stack Overflow and I'm a programming novice so any advice as to how to improve my question asking abilities would be appreciated.
Onto my question: I have two csv files, one containing three columns (date time in dd/mm/yyyy hh:(00 or 30) format, production of a certain product, and demand for said product), and the other containing several columns (decomposition of the date time into year, month, day, hour, and whether it is :00 or :30 represented by 1 or 2 respectively, alongside several columns for independent variables which may affect production/demand of said product).
I've only played around with the first csv file, converting the string into a datetime object but the ts() function won't recognise the datetime objects as my times. I've tried adjusting the frequency parameter but ultimately failed and have no idea how to create a time series using half hourly data. Would appreciate any help.
Thanks in advance!

My suggestion is to apply difftime over all your time data. For instance, as in the following code, use the time of the first record as time_start for every comparison and the time of each other record as time_finish. difftime then returns the intervals as a number of seconds, and you can use these as the time stamps for the values in the other columns.
interval <- as.integer(difftime(strptime(time_finish, "%H:%M"), strptime(time_start, "%H:%M"), units = "secs"))
Second 0 10 15 ....
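A minimal sketch of the idea above, extended to the full dd/mm/yyyy hh:mm strings from the question. The sample values and the names df, times and prod_ts are made up for illustration. Using frequency = 48 (48 half-hours per day) is an assumption about what suits the data, not something the answer states.

```r
# Hypothetical sample in the question's dd/mm/yyyy hh:mm format
df <- data.frame(
  datetime   = c("01/01/2020 00:00", "01/01/2020 00:30",
                 "01/01/2020 01:00", "01/01/2020 01:30"),
  production = c(10, 12, 11, 13)
)

# Parse the strings into date-times (UTC chosen to avoid daylight-saving gaps)
times <- as.POSIXct(df$datetime, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Seconds elapsed since the first record, as in the difftime suggestion
interval <- as.integer(difftime(times, times[1], units = "secs"))

# ts() has no date-time index; with frequency = 48 one cycle is one day
prod_ts <- ts(df$production, frequency = 48)
```

Here interval plays the role of the time stamps, while prod_ts only knows the position of each half-hour within a day. If the real date-times have to stay attached to the values, a class such as xts or zoo may be a better fit.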

Related

What are the consequences of choosing different frequencies for ts objects?

To create a ts-object in R, one has to specify a data frame, a start date and the frequency of the time series.
When searching the internet (e.g. Role of frequency parameter in ts), I get the impression that by choosing the frequency, one can emphasise whatever periodic pattern one believes is the most important in the data. However, I doubt that this is actually true. My impression is that it is solely used to compute the dates of the time series on-the-fly. E.g. when I set the start date “2015-08-01”, R automatically transforms it into a decimal date and I get something like 2015.58. If I now choose a frequency of 365 (or 365.25), R divides one unit by 365 and assigns this fraction to each day as one unit ahead, so the entry 366 days later is exactly 2016.58. However, if I choose frequency=7, the fraction assigned to each day is 1/7th, so the date assigned to the 8th day after my start date corresponds to a decimal number between 2016 and 2017. So the only choice for a data set with 365 entries per year is 365, isn’t it? And it is only used to actually create the time series?
Otherwise, if I choose the xts-class, an xts-object is built from a vector and a matrix where the vector has to be created in advance. So here there is no need to compute the date on-the-fly using a start date and a frequency and that is the reason why no frequency has to be assigned at all.
In both cases I can apply forecasting packages to either ts or xts objects (such as ARIMA, ets, stl, bats etc.) without specifying anything else, so this shows that the frequency is actually not used for anything else. Or am I missing something here?
Thanks in advance for your comments!
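A small sketch that may help with this question, using only base R's stats package. It first shows that the step between time stamps is 1/frequency, and then that decompose() takes the seasonal period from the frequency of the ts object, so the choice is not only cosmetic for seasonal methods. The data and variable names are invented for illustration.

```r
# Same 14 daily values, two different frequency settings
a <- ts(1:14, start = c(2015, 1), frequency = 365)
b <- ts(1:14, start = c(2015, 1), frequency = 7)

# The step between consecutive time stamps is 1/frequency
step_a <- time(a)[2] - time(a)[1]   # about 1/365
step_b <- time(b)[2] - time(b)[1]   # about 1/7

# Hypothetical data with a repeating 7-day pattern plus a little noise
set.seed(1)
x <- rep(c(5, 3, 4, 6, 8, 9, 7), times = 8) + rnorm(56, sd = 0.1)
weekly <- ts(x, frequency = 7)

# decompose() reads the seasonal period from frequency(weekly)
fit <- decompose(weekly)
n_seasonal <- length(fit$figure)    # one seasonal effect per position in the cycle
```

With frequency = 1 the same call to decompose() stops with an error, because there is no period to estimate a seasonal component from.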

Extract data for all days for last 30 days from R data frame

I am totally new to the R environment and I'm stuck at Date operations. The scenario is, I have a daily database of customer activity of a certain Store, and I need to extract the last 30 days of data starting from the current date.
In other words, suppose today is 18-NOV-2014, I need all the data from 18-OCT-2014 till today in a separate data-frame. To extract it, what kind of iteration logic should I write in R?
You don't need an iteration. What you could do is, assuming your data.frame is called X, and the date column, DATE, you could write:
X$DATE=as.Date(X$DATE, format='%d-%B-%Y')
The 'format' argument matches the date format you specified in your question. Then, to get the rows you are interested in, use something like:
X[X$DATE >= Sys.Date() - 30, ]
which returns all the rows dated on or after today minus 30 days.
Does this help at all?
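For a version that can be run as-is, here is a short sketch with made-up data. The column SALES, the sample rows and the fixed today value (standing in for Sys.Date()) are assumptions for illustration. ISO dates are used so the example does not depend on locale-specific month names.

```r
# Hypothetical daily data; DATE is already of class Date
X <- data.frame(
  DATE  = as.Date(c("2014-10-01", "2014-10-20", "2014-11-10", "2014-11-18")),
  SALES = c(5, 7, 3, 9)
)

# Fixed reference date so the result is reproducible; use Sys.Date() in practice
today <- as.Date("2014-11-18")

# Logical row subsetting: keep rows within the 30 days up to and including today
recent <- X[X$DATE >= today - 30 & X$DATE <= today, ]
```

The comma inside the brackets matters: the condition selects rows, and the empty slot after it keeps all columns.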

Can I make a time series work with date objects rather than integers?

I have time series data that I'm trying to analyse in R. It was provided as a CSV from excel, which I subsequently read as a data.frame all. Let's say it has two columns: all$date and all$people, representing the count of people on a particular date. The frequency is hence daily.
Being from Excel, the dates are integers representing the number of days since 1900-01-01.
I could read the data as people = ts(all$people, start=c(all$date[1], 1), frequency=365); but that gives a silly start value of almost 40000 because the data starts in 2006. The start parameter doesn't take a date object, according to ?ts, so I can't just use as.Date():
ts - ...
start: the time of the first observation. Either a single number
or a vector of two integers, which specify a natural time unit and
a (1-based) number of samples into the time unit. See the examples
for the use of the second form.
I could of course set start=1, but it's a bit painful to figure out what season we're in when the plot tells me interesting things are happening around day 2100. (To be clear, setting frequency=365 does tell me what year we're in, but isn't useful for more precise dates.) Is there a useful way of expressing the date in ts in a human-readable form so that I don't have to keep calling as.Date() to understand when the interesting features are happening?
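One possible approach, sketched with base R only: convert the Excel serial numbers to Date once and keep them next to the counts, rather than forcing them into ts. The origin "1899-12-30" is the value given in ?as.Date for Excel on Windows and is an assumption about where this file came from. The data frame visits is a hypothetical stand-in for the question's all.

```r
# Hypothetical stand-in for the question's data frame
visits <- data.frame(
  date   = c(38718, 38719, 38720),   # Excel serial day numbers
  people = c(12, 15, 9)
)

# Convert serial numbers to Date; the origin assumes Excel on Windows
visits$day <- as.Date(visits$date, origin = "1899-12-30")

# Human-readable labels for each observation
labels <- format(visits$day, "%d %b %Y")
```

Plotting visits$people against visits$day then gives an axis labelled with real dates. A class with a date index such as zoo or xts is another option if time-series methods are needed.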

Index xts using string and return only observations at that exact time

I have an xts time series in R and am using the very handy function to subset the time series based on a string, for example
time_series["17/06/2006 12:00:00"]
This will return the nearest observation to that date/time - which is very handy in many situations. However, in this particular situation I only want to return the elements of the time series which are at that exact time. Is there a way to do this in xts using a nice date/time string like this?
In a more general case (I don't have this problem immediately now, but suspect I may run into it soon) - is it possible to extract the closest observation within a certain period of time? For example, the closest observation to the given date/time, assuming it is within 10 minutes of the given date/time - otherwise just discard that observation.
I suspect this more general case may require me writing a function to do this - which I am happy to do - I just wanted to check whether the more specific case (or the general case) was already catered for in xts.
AFAIK, the only way to do this is to use a subset that begins at the time you're interested in, then get the first observation of that.
e.g.
first(time_series["2006-06-17 12:00:00/2006-06-17 12:01"])
or, more generally, to get the 12:00 price every day, you can subset down to 1 minute of each day, then split by days and extract the first observation of each.
do.call(rbind, lapply(split(time_series["T12:00:00/T12:01"],'days'), first))
Here's a thread where Jeff (the xts author) contemplates adding the functionality you want
http://r.789695.n4.nabble.com/Find-first-trade-of-day-in-xts-object-td3598441.html#a3599887
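For the more general case in the question (closest observation, but only if it lies within a tolerance), a hand-written helper is one way to go. The sketch below is base R working on a plain vector of date-times. nearest_within is a hypothetical name, not part of xts, and the assumption is that the vector would come from index(time_series).

```r
# Hypothetical helper: position of the observation closest to `target`,
# or integer(0) if none lies within `tol_secs` seconds of it
nearest_within <- function(times, target, tol_secs = 600) {
  gap <- abs(as.numeric(difftime(times, target, units = "secs")))
  i <- which.min(gap)
  if (length(i) == 1 && gap[i] <= tol_secs) i else integer(0)
}

# Made-up time stamps standing in for index(time_series)
times <- as.POSIXct(c("2006-06-17 11:45:00", "2006-06-17 12:04:00",
                      "2006-06-17 12:30:00"), tz = "UTC")
target <- as.POSIXct("2006-06-17 12:00:00", tz = "UTC")

near  <- nearest_within(times, target)        # closest is 12:04, within 10 minutes
exact <- which(times == target)               # exact-time match only
tight <- nearest_within(times, target, 60)    # nothing within 1 minute
```

The returned position can be used to subset the series by row. An empty result means the observation should be discarded.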

Specific date format conversion problems in R

Basically I want to know why as.Date(200322,format="%Y%W") gives me NA. While we are at it, I would appreciate any advice on a data structure for repeated cross-section (aka pseudo-panel) in R.
I did get aggregate() to (sort of) work, but it is not flexible enough - it misses data on columns when I omit the missed values, for example.
Specifically, I have a survey that is repeated weekly for a couple of years with a bunch of similar questions answers to which I would like to combine, average, condition and plot in both dimensions. Getting the date conversion right should presumably help me towards my goal with zoo package or something similar.
Any input is appreciated.
Update: thanks for the string suggestion, but as you can see in your own example, the %W part doesn't work. It only identifies the year and fills in the current day, whereas I need to set a specific week (and leave the day blank).
Use a string as first argument in as.Date() and select a specific weekday (format %w, value 0-6). There are seven possible dates in each week, therefore strptime needs more information to select a unique date. Otherwise the current day and month are returned.
> as.Date(paste("200947", "0", sep="-"), format="%Y%W-%w")
[1] "2009-11-22"
