How to change the time unit for survfit results in R

I am trying to create a nice-looking plot of the survfit output in R. The default time unit on the x-axis is a long integer, which I assume is seconds or milliseconds. This persists no matter what unit of time is originally passed to the Surv(time, event) function. I have tried passing the start and stop times as POSIXct dates as well as in milliseconds. Does anyone know how to tell either survfit or plot.survfit that I would like the time result to be in either Days, Hours format or HH:MM format? Thanks for your help.
survival <- survfit(Surv(time=starteventtime,event=endevent,time2=stoptime)~1)
plot(survival)
[Image: the current plot rendered]
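One possible approach, sketched under assumptions: POSIXct values are stored as seconds since 1970, which may be where the long integers on the axis come from. Converting each start/stop pair to a plain numeric duration in the desired unit before calling Surv() puts the axis in that unit. The data below are made up, and the sketch assumes that time since each subject's own start is what should be plotted, rather than calendar time as in the three-argument Surv call above.

```r
library(survival)

# Hypothetical data: start/stop timestamps and an event indicator
start <- as.POSIXct(c("2020-01-01 00:00", "2020-01-01 06:00", "2020-01-02 00:00"), tz = "UTC")
stop  <- as.POSIXct(c("2020-01-03 12:00", "2020-01-02 18:00", "2020-01-05 00:00"), tz = "UTC")
event <- c(1, 0, 1)

# Duration of each interval as a plain number of days
dur_days <- as.numeric(difftime(stop, start, units = "days"))

# Fit on the durations so the time axis is in days
fit <- survfit(Surv(dur_days, event) ~ 1)
plot(fit, xlab = "Days", ylab = "Survival probability")
```

Using units = "hours" instead gives an axis in hours. plot.survfit also has an xscale argument for rescaling the axis labels, but its behavior is best checked in ?plot.survfit for the installed version of the survival package.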

Related

Time/Date conversion from numeric

I have recently come across a time series dataset (in R) that had a numeric time index in the following format:
1.586183e+12 1.586184e+12 1.586185e+12 1.586186e+12 1.586187e+12 1.586188e+12
The data should be in 15-minute intervals. I have tried some of the usual conversions, such as as.POSIXct(), but that doesn't seem to work. I was hoping that someone could point me to the right format conversion.
Many thanks
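A minimal sketch of one likely reading: values around 1.586e+12 are the right magnitude for milliseconds since the Unix epoch (an assumption, since the source of the data is not stated). as.POSIXct() expects seconds, so dividing by 1000 first should give sensible dates. The two values below are copied from the rounded printout above, so they will not fall exactly on 15-minute boundaries.

```r
# Values as printed above (rounded to 7 significant digits by R)
ms <- c(1.586183e+12, 1.586184e+12)

# Assumption: milliseconds since 1970-01-01 UTC; convert to seconds first
times <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
print(times)
```

Both values land on 6 April 2020, which fits the guess that the unit is milliseconds.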

Unexpected behavior of datetick function in Octave plotting

I'm trying to plot a graphic that displays values against moments of time.
For this, I have an array of time instants (in Epoch) and an array of values.
I've already been able to plot the graphic normally using the raw time (as Epoch). The problem is specifically in the conversion of the axis time format.
hold on;
plot(horizontal, pre_X(:,4), 'b-');
xt = get(gca, 'xtick');
set(gca, 'xticklabel', sprintf('%d|', xt));
datetick ("x", "dd/mmm/YY HH:MM");
yt = get(gca, 'ytick');
set(gca, 'yticklabel', sprintf('%d|', yt));
hold off;
The datetick function was supposed to be able to transform these Epoch times into nicely formatted ones, but I am not getting the expected result. Instead, all time instants get labeled as the same (01/JAN/00 00:00) which is weird.
The plot without the
datetick ("x", "dd/mmm/YY HH:MM");
line works fine, but gives the time information in Epoch, which is not what I want.
Any help would be appreciated!
NOTE: If the right function for what I intend turns out not to be "datetick", please let me know! All I need is to get the X axis formatted nicely into readable time.
EDIT: By Epoch Time, I mean Unix Time.
Lacking more info, I'm going to assume that by Epoch time you mean POSIX, or Unix, time.
I would expect that to then be represented as a 32-bit integer, and should be the number of seconds counted from 'zero time' (as described in the Wiki linked) (it may also be a floating point number including fractional seconds using the same scale).
According to the Matlab help for datetick, it expects the axis data to be "serial date numbers, as returned by the datenum function". For compatibility, Octave likely expects the same, although the datetick function reference does not state this explicitly.
The datenum "serial date number" format is another serial representation of time, but it has a different scale and reference than Epoch/POSIX/Unix time. According to the datenum function description, its serial definition is "date/time input as a serial day number, with Jan 1, 0000 defined as day 1".
That's a long way of saying your time is probably in units of seconds, whereas datenum expects units of days.
Now, you can probably address this a couple of ways. You can convert all of your times to the datenum scale before plotting. Going by this example, something like this would work in Matlab:
datetime(1470144960, 'convertfrom','posixtime')
According to bug #47032, datetime has not yet been implemented in Octave, but that bug report does link to a GitHub repository with a datetime class implementation (under the inst/ folder).
To manually convert from Unix to Matlab time, you could follow this example from the MathWorks File Exchange:
unix_epoch = datenum(1970,1,1,0,0,0);
matlab_time = unix_time./86400 + unix_epoch;
(assuming your x-axis data is the unix_time variable)
Once you get your data into the 'datenum' scale, datetick should perform correctly.

What are the consequences of choosing different frequencies for ts objects?

To create a ts-object in R, one has to specify a data frame, a start date and the frequency of the time series.
When searching the internet (e.g. Role of frequency parameter in ts), I get the impression that by choosing the frequency, one can emphasise whatever periodic pattern one believes is the most important in the data. However, I doubt that this is actually true. My impression is that it is solely used to compute the dates of the time series on-the-fly. E.g. when I set the start date “2015-08-01”, R automatically transforms it into a decimal date and I get something like 2015.58. If I now choose a frequency of 365 (or 365.25), R divides one unit by 365 and assigns this fraction to each day as one unit ahead, so the entry 366 days later is exactly 2016.58. However, if I choose frequency=7, the fraction assigned to each day is 1/7th, so the date assigned to the 8th day after my start date corresponds to a decimal number between 2016 and 2017. So the only choice for a data set with 365 entries per year is 365, isn’t it? And it is only used to actually create the time series?
Otherwise, if I choose the xts-class, an xts-object is built from a vector and a matrix where the vector has to be created in advance. So here there is no need to compute the date on-the-fly using a start date and a frequency and that is the reason why no frequency has to be assigned at all.
In both cases I can apply forecasting packages to either ts or xts objects (such as ARIMA, ets, stl, bats, etc.) without specifying anything else, so this suggests that the frequency is not actually used for anything else. Or am I missing something here?
Thanks in advance for your comments!
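The arithmetic described above can be checked with a short sketch. The series here is made up, and a plain numeric start of 2015 stands in for the decimal date. The last lines also show, for what it's worth, that base R's decompose() reads the seasonal period from frequency(), so the value is used for more than just building the time index.

```r
x <- 1:15

# Same data, two different frequencies (hypothetical example values)
daily  <- ts(x, start = 2015, frequency = 365)
weekly <- ts(x, start = 2015, frequency = 7)

# Step between consecutive observations: 1/365 of a time unit
step_daily <- as.numeric(time(daily))[2] - as.numeric(time(daily))[1]

# With frequency = 7 the 8th observation is one full time unit after the start
eighth_weekly <- as.numeric(time(weekly))[8]

# decompose() uses frequency() as the seasonal period:
# four repetitions of a 7-value pattern give a 7-entry seasonal figure
wk  <- ts(rep(c(5, 3, 4, 6, 8, 9, 7), times = 4), frequency = 7)
dec <- decompose(wk)
n_seasonal <- length(dec$figure)
```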

Creating a Time Series with Half Hourly Data in R

This is my first time ever asking a question on Stack Overflow and I'm a programming novice so any advice as to how to improve my question asking abilities would be appreciated.
Onto my question: I have two csv files, one containing three columns (date time in dd/mm/yyyy hh:(00 or 30) format, production of a certain product, and demand for said product), and the other containing several columns (decomposition of the date time into year, month, day, hour, and whether it is :00 or :30 represented by 1 or 2 respectively, alongside several columns for independent variables which may affect production/demand of said product).
I've only played around with the first csv file, converting the string into a datetime object but the ts() function won't recognise the datetime objects as my times. I've tried adjusting the frequency parameter but ultimately failed and have no idea how to create a time series using half hourly data. Would appreciate any help.
Thanks in advance!
My suggestion is to apply "difftime" to all your time data. For instance, as in the following code, you can use your initial time (the time of the first record) as time_start for all comparisons and each of the other times as time_finish. It then returns the time intervals as a number of seconds, and you are ready to use the other column values as the values at those time stamps.
interval <- as.integer(difftime(strptime(time_finish, "%H:%M"), strptime(time_start, "%H:%M"), units = "secs"))
Second 0 10 15 ....
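A self-contained sketch of that idea for half-hourly data follows. The timestamps and production values are invented, and the date format string is an assumption based on the dd/mm/yyyy hh:mm layout described in the question. The last line shows one common convention for half-hourly series: treating a day as the time unit, which gives 48 observations per unit.

```r
# Hypothetical half-hourly records in dd/mm/yyyy hh:mm format
stamps     <- c("01/01/2020 00:00", "01/01/2020 00:30", "01/01/2020 01:00")
production <- c(10, 12, 11)

# Parse the strings, then measure each time against the first record
t <- as.POSIXct(stamps, format = "%d/%m/%Y %H:%M", tz = "UTC")
interval <- as.integer(difftime(t, t[1], units = "secs"))

# Assumption: one day is the natural time unit, so frequency = 48
half_hourly <- ts(production, frequency = 48)
```

ts() never sees the datetime objects here. It only needs the values in time order plus the frequency, which may be why passing datetimes to it did not work.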

Can I make a time series work with date objects rather than integers?

I have time series data that I'm trying to analyse in R. It was provided as a CSV from excel, which I subsequently read as a data.frame all. Let's say it has two columns: all$date and all$people, representing the count of people on a particular date. The frequency is hence daily.
Being from Excel, the dates are integers representing the number of days since 1900-01-01.
I could read the data as people = ts(all$people, start=c(all$date[1], 1), frequency=365); but that gives a silly start value of almost 40000 because the data starts in 2006. The start parameter doesn't take a date object, according to ?ts, so I can't just use as.Date():
ts - ...
start: the time of the first observation. Either a single number
or a vector of two integers, which specify a natural time unit and
a (1-based) number of samples into the time unit. See the examples
for the use of the second form.
I could of course set start=1, but it's a bit painful to figure out what season we're in when the plot tells me interesting things are happening around day 2100. (To be clear, setting frequency=365 does tell me what year we're in, but isn't useful for more precise dates.) Is there a useful way of expressing the date in ts in a human-readable form so that I don't have to keep calling as.Date() to understand when the interesting features are happening?
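One way this might be handled, as a sketch with made-up numbers: convert the Excel serial days to Date once, then either plot against the dates directly or derive a year/day-of-year start for ts(). The origin "1899-12-30" is the value the ?as.Date examples give for Excel on Windows; it is an assumption that the file uses that date system. The vectors excel_days and people stand in for all$date and all$people.

```r
# Hypothetical stand-ins for all$date and all$people
excel_days <- c(38718, 38719, 38720)
people     <- c(12, 15, 11)

# Assumption: Windows Excel 1900 date system
dates <- as.Date(excel_days, origin = "1899-12-30")

# Option 1: plot against real dates, so the axis is labelled with dates
plot(dates, people, type = "l", xlab = "Date", ylab = "People")

# Option 2: keep a ts object, but start it at (year, day of year)
first <- as.POSIXlt(dates[1])
people_ts <- ts(people, start = c(first$year + 1900, first$yday + 1), frequency = 365)
```

Option 2 still labels the axis in decimal years, so option 1 is probably closer to what is asked for when exact dates matter.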
