convert string to time in r

I have an array of time strings, for example 115521.45 which corresponds to 11:55:21.45 in terms of an actual clock.
I have another array of time strings in the standard format (HH:MM:SS.0) and I need to compare the two.
I can't find any way to convert the original time format into something useable.
I've tried using strptime but all it does is add a date (the wrong date) and get rid of time decimal places. I don't care about the date and I need the decimal places:
for example:
t <- strptime(105748.35, '%H%M%OS')
# gives "... 10:57:48" (the date is added and the decimals are dropped)
Using %OSn (n = 1, 2, etc.) instead gives NA.
Alternatively, is there a way to convert a time such as 10:57:48 to 105748?

Set the option that allows fractional seconds to be printed, then paste on whatever date you wish before converting (so that the date part is meaningful).
options(digits.secs=3)
strptime(paste0('2013-01-01 ', 105748.35), '%Y-%m-%d %H%M%OS')
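A quick check that the fractional seconds survive (repeating the option from above; the date is just the placeholder added there, and printing uses your local time zone):
options(digits.secs = 3)
t <- strptime(paste0('2013-01-01 ', 105748.35), '%Y-%m-%d %H%M%OS')
format(t, '%H:%M:%OS2')
# [1] "10:57:48.35"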

Convert UTC ISO string date to Unix timestamp

I have this UTC date in a Google spreadsheet: 2018-10-18T08:55:13Z and would like to convert it to Unix timestamp (1539852913). I tried this formula, but it's unable to recognize the timevalue:
=DATEVALUE(MID(A1;1;10)) + TIMEVALUE(MID(A1;12;8))
If I can get a valid date and time, I can use this formula to convert to Unix timestamp:
=(A1-$C$1)*86400
Does anyone have a solution for this?
Simpler:
=86400*(left(substitute(A1,"T"," "),19))-2209161600
Replacing the T with a space and cutting off the Z leaves something the spreadsheet recognises as a date-time in arithmetic calculations. Multiplying that serial value by 86400 converts days to seconds, and subtracting 2209161600 (25569 × 86400, where 25569 is the serial number of 1970-01-01) shifts the result to the Unix epoch.
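As a quick check with the sample value: 2018-10-18 08:55:13 is serial day 43391 plus 32113 seconds into the day, so 86400 × 43391 + 32113 - 2209161600 = 1539852913, the expected timestamp.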
Assuming your date string has leading zeros for single-digit days and months, pull out each part of the date string and drop it into the DATE formula as follows:
Year
=LEFT(A1,4)
Month
=MID(A1,6,2)
Day
=MID(A1,9,2)
Use the date formula
=DATE(year,month,day)
=DATE(LEFT(A1,4),MID(A1,6,2),MID(A1,9,2))
A similar process can be used for TIME
Hour
=MID(A1,12,2)
Minutes
=MID(A1,15,2)
Seconds
=MID(A1,18,2)
Time
=TIME(Hour,Minutes,Seconds)
=TIME(MID(A1,12,2),MID(A1,15,2),MID(A1,18,2))
1) There are other methods.
2) The formulas will need to be adapted if you do not have a leading 0 for each unit. In that case you would need to use FIND to locate the positions of key characters and measure the distance between them to determine whether a unit has one digit or two.
Since the integer part (left of the decimal) represents the number of days since 1900/01/01 (with that date being 1) and the decimal portion represents the time as a fraction of a day, you get a full date and time by adding the DATE formula to the TIME formula as follows:
=DATE(LEFT(A1,4),MID(A1,6,2),MID(A1,9,2))+TIME(MID(A1,12,2),MID(A1,15,2),MID(A1,18,2))
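If you want the Unix timestamp directly from these pieces rather than via a helper cell, the same day-to-seconds arithmetic as above can be applied to the combined value (a sketch along the lines of the earlier formulas, not tested against the original sheet):
=(DATE(LEFT(A1,4),MID(A1,6,2),MID(A1,9,2))+TIME(MID(A1,12,2),MID(A1,15,2),MID(A1,18,2))-DATE(1970,1,1))*86400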

Changing Time To A Comparable Function In R

I have a dataset in a .csv file, and I have added my own column to the csv that holds the total time taken for a task to be completed. There are two other columns containing the start time and the end time, which is what I calculated the total-time-taken column from. The start and end time columns are in the datetime format 5/7/2018 16:13, while the total-time-taken column is in the format 0:08:20 (H:MM:SS).
I understand that for datetimes it is possible to use functions such as as.Date or as.POSIXlt to change the variable type from a factor to a date. Is there a function I can use to convert my total-time-taken column (currently a factor) so that I can use it to plot scatterplots/plots in general? I tried as.numeric, but the numbers that come out are gibberish and do not correspond to the original times.
If you want to plot the total time taken for each row, then I would suggest just plotting that difference as seconds. Here is a code snippet which shows how you can convert your start or end date into a numerical value:
start <- "5/7/2018 16:13"
start_date <- as.POSIXct(start, format="%d/%m/%Y %H:%M")
as.numeric(start_date)
[1] 1530799980
The above is a UNIX timestamp, which is number of seconds since the epoch (January 1, 1970). But, since you want a difference between start and end times, this detail does not really matter for you, and the difference you get should be valid.
If you want to use minutes, hours, or some other time unit, then you can easily convert.
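In the same spirit, the H:MM:SS strings in the total-time column can be turned into plain seconds before plotting (a minimal sketch; the values are made up):
total <- c("0:08:20", "1:02:05")
as.numeric(as.difftime(total, format = "%H:%M:%S", units = "secs"))
# [1]  500 3725
Dividing by 60 or 3600 then gives minutes or hours.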

Time series (xts) strptime; ONLY month and day

I've been trying to do a time series on my dataframe, and I need to strip times from my csv. This is what I've got:
campbell <-read.csv("campbell.csv")
campbell$date = strptime(campbell$date, "%m/%d")
campbell.ts <- xts(campbell[,-1],order.by=campbell[,1])
First, what I'm trying to do is get xts to keep the dates as "xx/xx", meaning just the month and day. I have no year for my data. When I try that second line of code and look at the date column, it converts it to "2013-xx-xx". These months and days have no year associated with them, and I can't figure out how to get rid of the 2013. (The csv file I'm reading has the dates in the format "9/30", "10/1", etc.)
Secondly, once I try and make a time series (the third line), I am unsure what the "order.by" command is calling on. What am I indexing?
Any help??
Thanks!
For strptime, you need to provide the full date, i.e. day, month and year. If any of these is not provided, the current value is taken from the system's time and appended to the incomplete date. So, if you want to retain the date format as you read it, first make a copy and store it in a temporary variable, then use strptime on campbell$date to convert it into an R-readable date format. Since the year is not a concern for you, you need not bother about it even though it is automatically appended by strptime.
campbell <-read.csv("campbell.csv")
date <- campbell$date
campbell$date <- strptime(campbell$date, "%m/%d")
Secondly, what you are doing with 'the third line' (xts(campbell[,-1], order.by=campbell[,1])) is telling xts to order all the data of campbell except the first column (campbell[,-1]) by the index provided by the time data in the first column of campbell (campbell[,1]). So it only works if the date is in the first column.
After ordering the data as a time series, you can replace the campbell$date column with date again to get back the date format you wanted (although here you first have to order date as well, as shown below).
date <- xts(date, order.by=campbell[,1]) # assuming campbell$date is campbell[,1]
campbell.ts <- xts(campbell[,-1], order.by=campbell[,1])
campbell.ts <- cbind(date, campbell.ts)
Alternatively, you can parse the dates and then format them back down to just month and day:
format(as.Date(campbell$date, "%m/%d/%Y"), "%m/%d")
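If the dates genuinely have no year (like "9/30"), the same idea works with just "%m/%d"; strptime and as.Date silently attach the current year, which you can ignore for display. A small sketch with made-up values:
d <- as.Date(c("9/30", "10/1"), format = "%m/%d")
format(d, "%m/%d")
# [1] "09/30" "10/01"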

cast string directly to IDateTime

I am using the new version of data.table and especially the AWESOME fread function. My files contain dates that are loaded as strings (because I don't know how to do it otherwise) looking like 01APR2008:09:00:00.
I need to sort the data.table on those datetimes, and for the sort to be efficient I want to cast them to the IDateTime format (or anything else I may not know of yet).
> strptime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S")
[1] "2008-04-01 09:00:00"
> IDateTime(strptime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S"))
idate itime
1: 2008-04-01 09:00:00
> IDateTime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S")
Error in charToDate(x) :
character string is not in a standard unambiguous format
It looks like I cannot do DT[ , newType := IDateTime(strptime(oldType, "%d%b%Y:%H:%M:%S"))].
My questions are then:
Is there a way to cast directly to IDateTime from fread, such that I can sort afterward efficiently?
If not, what is the most efficient way to proceed, given that I would like to be able to sort DT by this datetime column?
Unfortunately (for efficiency) strptime produces a POSIXlt type, which is unsupported by data.table and always will be due to its size (40 bytes per date!) and structure. Although as.POSIXct produces the much better POSIXct, it still goes via POSIXlt. More info here:
http://stackoverflow.com/a/12788992/403310
Looking at base functions such as as.Date: it uses strptime too, creating an integer offset from the epoch that is (oddly) stored as double. The IDate (and friends) class in data.table aims to achieve integer epoch offsets stored as, um, integer. Suitable for fast sorting by base::sort.list(method = "radix") (which is really a counting sort). IDate doesn't really aim to be fast at (usually one-off) conversion.
So to convert string dates/times, rightly or wrongly, I tend to roll my own helper function.
If the string date is "2012-12-24" I'd lean towards as.integer(gsub("-", "", col)) and proceed with YYYYMMDD integer dates. Similarly, times can be HHMMSS as an integer. Two columns (date and time separately) can be useful if you generally want to roll = TRUE within a day, but not to the previous day. Grouping by month is simple and fast: by = date %/% 100L. Adding and subtracting days is troublesome, but that is true anyway, because you rarely want to add calendar days; rather weekdays or business days, and that's a lookup into your business-day vector anyway.
In your case the character month would need a conversion to 1:12. There isn't a separator in your dates "01APR2008", so a substring would be one way followed by a match or fmatch on the month name. Are you in control of the file format? If so, numbers are better in an unambiguous format that sorts naturally such as %Y-%m-%d, or %Y%m%d.
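A rough sketch of that roll-your-own approach for the "01APR2008:09:00:00" layout (the variable names are mine, and the fixed positions assume the format never varies):
x <- c("01APR2008:09:00:00", "24DEC2012:23:59:59")
dd <- as.integer(substr(x, 1, 2))
mm <- match(substr(x, 3, 5), toupper(month.abb))          # "APR" -> 4
yy <- as.integer(substr(x, 6, 9))
date_int <- yy * 10000L + mm * 100L + dd                  # YYYYMMDD, e.g. 20080401
time_int <- as.integer(gsub(":", "", substr(x, 11, 18)))  # HHMMSS, e.g. 90000
date_int
# [1] 20080401 20121224
time_int
# [1]  90000 235959
Both columns are plain integers, so they sort and group fast.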
I haven't yet got to how best to do this in fread, so date/times are left as character currently, because I'm not yet sure how to detect the date format or which type to output. It does need to output either integer or double dates though, rather than inefficient character. I suspect my use of YYYYMMDD integers is seen as unconventional, so I'm a little hesitant to make that the default. They have their place, and there are pros and cons of epoch-based dates too. Dates don't always have to be epoch based is all I'm suggesting.
What do you think? Btw, thanks for encouragement on fread; was nice to see.
I don't know how your file is structured, but from your comment you want to use the date field as a key. Why not read it as a time series and format it while reading?
Here I use zoo to do it. (I assume the date column is the first one; otherwise see the index.column argument.)
ff <- function(x) as.POSIXct(strptime(x,"%d%b%Y:%H:%M:%S"))
h <- read.zoo(text = "03APR2008:09:00:00 125
02APR2008:09:30:00 126
05APR2008:09:10:00 127
04APR2008:09:20:00 128
01APR2008:09:00:00 128"
,FUN=ff)
You get your dates parsed into the right format and sorted.
The conversion from POSIXct to IDateTime is then natural:
IDateTime(index(h))
idate itime
1: 2008-04-01 09:00:00
2: 2008-04-02 09:30:00
3: 2008-04-03 09:00:00
4: 2008-04-04 09:20:00
5: 2008-04-05 09:10:00
Sure, you still do two conversions here, but the first happens while reading the data and the second (POSIXct to IDateTime) involves no format handling at all.
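If you would rather stay inside data.table, the same two steps can also be written as an update by reference (a sketch with a made-up column name; IDateTime() returns the idate/itime pair, so two columns are assigned at once):
library(data.table)
DT <- data.table(oldType = c("03APR2008:09:00:00", "01APR2008:09:00:00"))
DT[, c("idate", "itime") := IDateTime(as.POSIXct(oldType, format = "%d%b%Y:%H:%M:%S"))]
setkey(DT, idate, itime)   # keys (and therefore sorts) by date, then time
DT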

Converting time format to numeric with R

In most cases, we convert numeric time to POSIXct format using R. However, if we want to compare two time points, then we would prefer the numeric time format. For example, I have a date format like "2001-03-13 10:31:00",
begin <- "2001-03-13 10:31:00"
Using R, I want to convert this into a numeric (e.g., the Julian time), perhaps something like the number of seconds elapsed between 1970-01-01 00:00:00 and 2001-03-13 10:31:00.
Do you have any suggestions?
The Julian calendar began in 45 BC (709 AUC) as a reform of the Roman calendar by Julius Caesar. It was chosen after consultation with the astronomer Sosigenes of Alexandria and was probably designed to approximate the tropical year (known at least since Hipparchus). see http://en.wikipedia.org/wiki/Julian_calendar
If you just want to remove ":" , " ", and "-" from a character vector then this will suffice:
end <- gsub("[: -]", "" , begin, perl=TRUE)
#> end
#[1] "20010313103100"
You should read the section about 1/4 of the way down in ?regex about character classes. Since the "-" is special in that context as a range operator, it needs to be placed first or last.
After your edit, the answer is clearly what @joran wrote, except that you would first need to convert to a date-time class:
as.numeric(as.POSIXct(begin))
#[1] 984497460
The other point to make is that comparison operators do work for Date and DateTime classed variables, so the conversion may not be necessary at all. This compares 'begin' to a time one second later and correctly reports that begin is earlier:
as.POSIXct(begin) < as.POSIXct(begin) +1
#[1] TRUE
Based on the revised question this should do what you want:
begin <- "2001-03-13 10:31:00"
as.numeric(as.POSIXct(begin))
The result is a unix timestamp, the number of seconds since epoch, assuming the timestamp is in the local time zone.
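Going the other way, the numeric value converts back once you supply an origin (the printed time zone will be whatever your session uses):
x <- as.numeric(as.POSIXct(begin))
as.POSIXct(x, origin = "1970-01-01")
# "2001-03-13 10:31:00", in your local time zone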
Maybe this could also work:
library(lubridate)
...
df <- '24:00:00'
as.numeric(hms(df))
hms() parses the "H:M:S" string into a period object, and as.numeric() then gives the value in seconds. See the full documentation.
I tried this because I had trouble with data that was in this format but ran over 24 hours.
The example from the ?as.POSIXct help gives
as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S"))
so for you it would be
as.numeric(as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S")))
