Finding duration in seconds between time stamps in R?

I am trying to find the duration in seconds between adjacent time stamps. The first issue is that I am not sure whether each timestamp is being treated as a number. I tried using
TimeStamp <- read.csv("DateStamps.csv", header = TRUE, colClasses = "character")
However, I am not sure it is working, because the empty cells, where there should be an NA, show nothing.
For the differences, I want to find four durations in seconds (gather order - start, walk to car - gather order, handoff - walk to car, return to store - handoff), all between adjacent columns. I am not sure how to write code that computes these specific differences.

I think the NA issue is answered here:
Change the Blank Cells to "NA"
You need to tell R what constitutes an NA; read.csv() has an na.strings argument for exactly this.
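A sketch of both fixes together; the column names and the timestamp format below are assumptions, not taken from the original post:
# Tell read.csv() that empty cells mean NA while keeping the stamps as character:
TimeStamp <- read.csv("DateStamps.csv", header = TRUE,
                      colClasses = "character", na.strings = c("", "NA"))

# Hypothetical column names for the five events:
cols <- c("start", "gather_order", "walk_to_car", "handoff", "return_to_store")
times <- lapply(TimeStamp[cols], as.POSIXct, format = "%Y-%m-%d %H:%M:%S")

# Duration in seconds between each adjacent pair of events (NAs propagate):
durations <- mapply(function(later, earlier)
  as.numeric(difftime(later, earlier, units = "secs")),
  times[-1], times[-length(times)])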

Related

Comparing times within two vectors and finding nearest for each element in R

I am trying to move beyond basic programming to something more sophisticated. Could you help me adjust this code?
There are two vectors of dates and times: one records when activities happen, the other when triggers appear. The aim is to find, for each trigger, the nearest activity date/time after that trigger. The final result is the average of all the differences.
I have this code. It works, but it is very slow on large datasets.
time_activities <- as.POSIXct(c("2008-09-14 22:15:14", "2008-09-15 09:05:14",
                                "2008-09-16 14:05:14", "2008-09-17 12:05:14"),
                              format = "%Y-%m-%d %H:%M:%S")
time_triggers <- as.POSIXct(c("2008-09-15 06:05:14", "2008-09-17 12:05:13"),
                            format = "%Y-%m-%d %H:%M:%S")
result <- numeric(length(time_triggers))  # result must exist before we assign into it by index
for (j in 1:length(time_triggers)) {
  for (i in 1:length(time_activities)) {
    if (time_triggers[j] < time_activities[i]) {
      # first activity after this trigger: record the gap, rounded up to whole minutes
      result[j] <- ceiling(difftime(time_activities[i], time_triggers[j], units = "mins"))
      break
    }
  }
}
print(mean(as.numeric(result)))
Can I somehow get rid of the loop, and do everything with vectors? Maybe you can give me some hint of which function I could use to compare dates at once?
delay <- sapply(time_triggers, function(x) {
  d <- difftime(x, time_activities, units = "mins")  # trigger time minus each activity time
  max(d[d < 0]) })                                   # largest negative gap = nearest later activity
mean(delay[is.finite(delay)])
This should do the trick. As always, the apply family of functions is a good replacement for a for loop.
This gives the average number of minutes that an activity occurred after a trigger.
If you want to see what the activity delay was after each trigger (rather than just the mean of all the triggers), you can just remove the mean() at the beginning. The values will then correspond to each value in time_triggers.
UPDATE:
I updated the code to ignore Inf values as requested. Sadly, this means the code needs two statements rather than one. If you really want, you can make this all one line, but then you will be doing the majority of the computation twice (not very efficient).
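If you want to avoid the implicit loop in sapply() as well, here is a fully vectorized sketch using findInterval(); it is not from the original thread and assumes time_activities is sorted in increasing order (findInterval() requires a sorted vector):
# For each trigger, findInterval() returns the index of the last activity at or
# before it, so the next index is the first activity strictly after the trigger.
idx <- findInterval(time_triggers, time_activities) + 1
ok <- idx <= length(time_activities)  # drop triggers with no later activity
delay <- difftime(time_activities[idx[ok]], time_triggers[ok], units = "mins")
mean(as.numeric(delay))  # apply ceiling() first if you want whole minutes, as in the loop version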

Can I make a time series work with date objects rather than integers?

I have time series data that I'm trying to analyse in R. It was provided as a CSV exported from Excel, which I subsequently read into a data.frame called all. Let's say it has two columns, all$date and all$people, representing the count of people on a particular date. The frequency is hence daily.
Being from Excel, the dates are integers representing the number of days since 1900-01-01.
I could read the data as people = ts(all$people, start=c(all$date[1], 1), frequency=365); but that gives a silly start value of almost 40000 because the data starts in 2006. The start parameter doesn't take a date object, according to ?ts, so I can't just use as.Date():
From ?ts:
start: the time of the first observation. Either a single number or a vector
of two integers, which specify a natural time unit and a (1-based) number of
samples into the time unit. See the examples for the use of the second form.
I could of course set start=1, but it is a bit painful to figure out what season we are in when the plot tells me interesting things are happening around day 2100. (To be clear, setting frequency=365 does tell me what year we are in, but it does not give more precise dates.) Is there a useful way of expressing the date in ts in a human-readable form, so that I don't have to keep calling as.Date() to understand when the interesting features happen?
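One common workaround, sketched below rather than taken from this thread, is to index the series with the zoo package instead of ts, since a zoo object is ordered by real Date values and plots with human-readable axis labels. Note that converting Excel serial day numbers in R conventionally uses origin "1899-12-30", which compensates for Excel's 1900 leap-year quirk:
library(zoo)

# Convert Excel serial day numbers to Date objects:
dates <- as.Date(all$date, origin = "1899-12-30")

# Index the counts by actual dates instead of integer time:
people <- zoo(all$people, order.by = dates)

plot(people)  # x-axis is labelled with real dates
window(people, start = as.Date("2006-06-01"), end = as.Date("2006-12-31"))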

how to do a sum in graphite but exclude cases where not all data is present

We have 4 data series, and once in a while one of the 4 has a null because we missed reading that data point. This makes the graph look like we have awful spikes of lost volume, which is not true; we were just missing the data point.
I am currently doing a basic sumSeries(server*.InboundCount), where the * matches servers 1, 2, 3 and 4.
Is there a way for graphite to NOT sum those points in time, and instead make the sum null there as well, so that the line connects from the last point with data to the next point with data?
NOTE: We also display the graphs server*.InboundCount individually to watch for spikes on individual servers.
Alternatively, is there a function that looks at all the series and, if any of the values is null, returns null for every series at that point in time? The sum would then receive null+null+null+null, which hopefully doesn't produce a spike and instead shows null.
thanks,
Dean
This is an old question, but it still deserves an answer as a point of reference. What you're after, I believe, is the function keepLastValue:
Takes one metric or a wildcard seriesList, and optionally a limit to the number of ‘None’ values to skip over. Continues the line with the last received value when gaps (‘None’ values) appear in your data, rather than breaking your line.
This would make your function
sumSeries(keepLastValue(server*.InboundCount))
This will work OK if you have a single null datapoint here and there. If you have multiple consecutive null data points, you can specify how many values keepLastValue will look back over before breaking your data. For example, the following will look back over up to 10 values before the sumSeries breaks:
sumSeries(keepLastValue(server*.InboundCount, 10))
I'm sure you've since solved your problems, but I hope this helps someone.

Index xts using string and return only observations at that exact time

I have an xts time series in R and am using the very handy feature of subsetting the series with a date/time string, for example
time_series["17/06/2006 12:00:00"]
This will return the nearest observation to that date/time - which is very handy in many situations. However, in this particular situation I only want to return the elements of the time series which are at that exact time. Is there a way to do this in xts using a nice date/time string like this?
In a more general case (I don't have this problem immediately now, but suspect I may run into it soon) - is it possible to extract the closest observation within a certain period of time? For example, the closest observation to the given date/time, assuming it is within 10 minutes of the given date/time - otherwise just discard that observation.
I suspect this more general case may require me writing a function to do this - which I am happy to do - I just wanted to check whether the more specific case (or the general case) was already catered for in xts.
AFAIK, the only way to do this is to use a subset that begins at the time you're interested in, then get the first observation of that.
e.g.
first(time_series["2006-06-17 12:00:00/2006-06-17 12:01"])
or, more generally, to get the 12:00 price every day, you can subset down to 1 minute of each day, then split by days and extract the first observation of each.
do.call(rbind, lapply(split(time_series["T12:00:00/T12:01"],'days'), first))
Here's a thread where Jeff (the xts author) contemplates adding the functionality you want
http://r.789695.n4.nabble.com/Find-first-trade-of-day-in-xts-object-td3598441.html#a3599887
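For the more general case of "the closest observation within a certain period", a small helper function does the job. A sketch, assuming a POSIXct-indexed xts object; nearest_obs and its interface are hypothetical, not part of xts:
library(xts)

# Return the observation in x nearest to target, but only if it lies within
# tol_mins minutes of target; otherwise return an empty xts object.
nearest_obs <- function(x, target, tol_mins = 10) {
  target <- as.POSIXct(target, tz = tzone(x))
  gap <- abs(as.numeric(difftime(index(x), target, units = "mins")))
  i <- which.min(gap)
  if (length(i) && gap[i] <= tol_mins) x[i] else x[0]
}

nearest_obs(time_series, "2006-06-17 12:00:00")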

Specific date format conversion problems in R

Basically I want to know why as.Date(200322,format="%Y%W") gives me NA. While we are at it, I would appreciate any advice on a data structure for repeated cross-section (aka pseudo-panel) in R.
I did get aggregate() to (sort of) work, but it is not flexible enough - for example, it drops data in other columns when I omit missing values.
Specifically, I have a survey that is repeated weekly for a couple of years, with a bunch of similar questions whose answers I would like to combine, average, condition on, and plot in both dimensions. Getting the date conversion right should presumably help me towards that goal with the zoo package or something similar.
Any input is appreciated.
Update: thanks for the string suggestion, but as you can see in your own example below, the %W part doesn't work by itself - it only identifies the year and fills in the current day, whereas I need to set a specific week (and leave the day unspecified).
Use a string as the first argument to as.Date() and select a specific weekday (format %w, value 0-6). There are seven possible dates in each week, so strptime needs more information to pick a unique date; otherwise the current day and month are returned.
> as.Date(paste("200947", "0", sep="-"), format="%Y%W-%w")
[1] "2009-11-22"
