Conversion from 12 hr to 24 hr in R and combining two tables

The image for Y table
I want to roll join two tables, trial and trial2, using a timestamp as the key. The table 'trial' has a POSIXct timestamp as its key, while 'trial2' has its timestamp stored as character. I tried to convert the 'trial2' timestamp from 12-hour format to 24-hour format (POSIXct) so that I can apply a roll join to them, but everything I have tried so far gives me NULL in the resulting field rolli for trial2.
library(data.table)
library(dplyr)
library(lubridate)
library(readr)
library(hms)
trial <- read_csv("X.csv")
trial2 <- read_csv("Y.csv")
trial2$rolli<- as.POSIXct(trial2$date ,format = '%m/%d/%Y %I:%M:%S %p')
#######OR#########
trial2$rolli<-strptime(trial2$date, "%m/%d/%Y %I:%M:%S %p")
#######OR#########
trial2$rolli<-ymd_hms(trial2$date)
trial<-mutate(trial, rolli=ymd_hms(paste("2018-11-27", Time), tz='Asia/Kolkata'))
trial<-data.table(trial)
trial2<-data.table(trial2)
setkey(trial, rolli)
setkey(trial2, rolli)
try<-trial[trial2, roll = "nearest"]
class(trial$rolli)
#[1] "POSIXct" "POSIXt"
class(trial2$rolli)
#[1] "POSIXct" "POSIXt"

Debugging is always hard, so a general tip: try to reduce it as much as possible.
Looking at it, I'd think that the parsing of the character hits a problem. I'm not sure about lubridate and ymd_hms, but the as.POSIXct and strptime calls should work.
You can check by printing trial2$date and trial2$rolli. If the date looks fine, but rolli consists of all NA's, then that's the problem.
The dates provided as characters are probably not in exactly the right format; these functions can be very picky.
In order to know exactly what is going wrong, I'd need to see a sample of Y.csv, but you can check if you've typed everything exactly right: spaces, or have you switched "\" and "/"? Also, I normally work with 24-hour-notation, so it could be that strptime is picky about a specification being "am" or "AM" or "a.m." or something else.
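The check described above can be sketched as follows. The sample strings are made up for illustration (they are not taken from Y.csv), the UTC time zone is an arbitrary choice, and an English locale is assumed for AM/PM:

```r
# Hypothetical sample input; only the first entry matches the format exactly.
dates <- c("11/27/2018 11:44:04 AM",   # matches the format
           "27/11/2018 11:44:04 AM",   # day and month swapped
           "11-27-2018 11:44:04 AM")   # wrong separator

parsed <- as.POSIXct(dates, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Entries that failed to parse come back as NA; list the offending inputs.
failed <- dates[is.na(parsed)]
print(failed)
```

Looking at the failed inputs directly is usually quicker than guessing which part of the format string is wrong.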
EDIT: I've seen the format you're trying to supply, which has decimals in the seconds, which means %S doesn't do the trick.
Instead, you want %OS (it is in the help for ?strptime, but it's quite hidden). Also, I can't see it clearly in the image, but in your original code, there are 2 spaces between "%Y" and "%I". Are there 2 in your input as well?
Anyway:
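strptime('11/27/2018 11:44:04.479 AM', format='%m/%d/%Y %I:%M:%OS %p')
# Works for me
trial2$rolli <- as.POSIXct(strptime(trial2$date, "%m/%d/%Y %I:%M:%OS %p"))
# Should solve your problem. strptime returns POSIXlt, so wrap it in as.POSIXct before using it as a data.table key.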
Furthermore, when printing trial2$rolli, the fractional part is not shown, but it is stored. You can show it with as.numeric(trial2$rolli) %% 1, although there may be some small rounding differences.
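A minimal sketch of the conversion, assuming input strings shaped like the one above (the second string is invented for illustration) and an English locale for AM/PM. The time zone is taken from the question's own code:

```r
# Example 12-hour timestamps with fractional seconds (second value is hypothetical).
x <- c("11/27/2018 11:44:04.479 AM", "11/27/2018 01:05:09.250 PM")

# %I + %p read the 12-hour clock, %OS reads seconds including the decimals.
rolli <- as.POSIXct(x, format = "%m/%d/%Y %I:%M:%OS %p", tz = "Asia/Kolkata")

# Render in 24-hour notation.
print(format(rolli, "%Y-%m-%d %H:%M:%S"))

# The fractional part is stored even though the default print hides it.
frac <- as.numeric(rolli) %% 1
print(frac)

# Ask for three decimals explicitly when printing.
print(format(rolli, "%H:%M:%OS3"))
```

Because the result is POSIXct rather than POSIXlt, it can be stored as a regular column and used as a join key.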
2nd EDIT:
To fix problems where you have times like 0:00 PM in your input (which is technically wrong, but you might not have control over your input), you can use:
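trial2$date <- sub('\\b0+(:..:..)', '12\\1', trial2$date, perl = TRUE)
It replaces every hour written as 0 or 00 (0:restoftime or 00:restoftime) with 12:restoftime. The word boundary (\b, written '\\b' in an R string) keeps the 0 in hours such as 10 from matching, and the backreference has to be written as '\\1'.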
Only be careful about what your source actually means by something like 0:00:00.000 AM: is this midnight or noon? I don't know how R-functions handle this generally (or even if it's guaranteed to always be the same), and I'm not going to burn my hands on that question. If you look on the internet there are a lot of people who have very strong opinions on what AM/PM means in those circumstances, in all variations.
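As an illustration, here is a variant of that substitution anchored at a word boundary, so that hours such as 10 are left untouched. The input strings are invented, and treating 0:xx PM as 12:xx PM is an assumption about what the source means:

```r
# Hypothetical inputs: two malformed zero-hour values and one valid time.
x <- c("11/27/2018 0:15:30.000 PM",
       "11/27/2018 00:15:30.000 PM",
       "11/27/2018 10:00:00.000 AM")

# Replace an hour of 0 or 00 with 12; \\b prevents a match inside "10".
fixed <- sub("\\b0+(:..:..)", "12\\1", x, perl = TRUE)
print(fixed)

# The repaired strings now parse with the 12-hour format.
parsed <- as.POSIXct(fixed, format = "%m/%d/%Y %I:%M:%OS %p", tz = "UTC")
print(format(parsed, "%H:%M"))
```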

Related

Trimming unwanted characters

I have a very large data set (CSV) with information about bicycle counts from a bike share system. The information I'm working with is the time at which bicycles were taken out of the racks (departure time) and also the total travel time. What I want to do is to add them so I can get the arrival time at the arrival station. The departure time variable is FECHA_HORA_RETIRO and the travel time variable is TIEMPO_USO. The former, which is read by R as factor object, is in the following format: "23/01/2017 19:55:16". On the other hand, TIEMPO_USO is read by R as a character and it's in the following format: "0:17:46".
> head(viajes_ecobici_2017$FECHA_HORA_RETIRO)
[1] 28/01/2017 13:51 17/01/2017 16:24 12/01/2017 16:38 25/01/2017 10:31
> head(viajes_ecobici_2017$TIEMPO_USO)
[1] "1:35:37" "0:11:17" "0:32:51" "0:31:29" "1:31:59" "0:21:43" "0:5:43"
I first used strptime to get everything in the desired format
> viajes_ecobici_2017$FECHA_HORA_RETIRO =format(strptime(viajes_ecobici_2017$FECHA_HORA_RETIRO,format = "%d/%m/%Y %H:%M"),format = "%d/%m/%Y %H:%M:%S")
> viajes_ecobici_2017$TIEMPO_USO = format(strptime(viajes_ecobici_2017$TIEMPO_USO, format="%H:%M:%S"), format="%H:%M:%S")
This works with most observations. However, several observations became NA values after running this code. I went back to the original data to see why this was happening and created a variable with just the observations that became NA. When I looked closer at these observations I saw they have this format "\t\t01/06/2017 00:01". How can I get rid of the "\t\t" while preserving the rest of the information?
Thanks in advance for your help.
trimws() trims white space (including tab characters, \t) from the ends of a character variable:
viajes_ecobici_2017$TIEMPO_USO <- trimws(viajes_ecobici_2017$TIEMPO_USO)
For what it's worth, readr::read_csv() has a built-in trim_ws option (which is TRUE by default).
Assuming that the variable with the problem is TIEMPO_USO, then a simple regex would take care of the tab characters ("\t")
viajes_ecobici_2017$TIEMPO_USO <- gsub("^\\t\\t","", viajes_ecobici_2017$TIEMPO_USO)
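The value quoted in the question ("\t\t01/06/2017 00:01") looks like a departure timestamp, so the same trimming may be needed on FECHA_HORA_RETIRO. A small sketch on a stand-alone vector, assuming the column has first been converted from factor with as.character() and using UTC only for illustration:

```r
# Two sample values from the question, one with leading tab characters.
raw <- c("\t\t01/06/2017 00:01", "28/01/2017 13:51")

# trimws() strips leading and trailing whitespace, including tabs.
clean <- trimws(raw)

# Parse the cleaned strings and render them with seconds.
departure <- as.POSIXct(clean, format = "%d/%m/%Y %H:%M", tz = "UTC")
print(format(departure, "%d/%m/%Y %H:%M:%S"))
```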

Using R for a Date format of 07-JUL-16 06.05.54.000000 AM

I have 2 Date variables in a .csv file with formats of "07-JUL-16 06.05.54.000000 AM". I want to use these in a regression model. Should I be reading these into a data frame as factors or characters? How can I take a difference of the 2 dates in each case?
Read them in as characters (e.g. stringsAsFactors=FALSE or tidyverse functions), then use as.POSIXct, e.g.
as.POSIXct("07-JUL-16 06.05.54.000000 AM",format="%d-%b-%y %I.%M.%OS %p")
## [1] "2016-07-07 06:05:54 EDT"
(I'm assuming that you are intending a day-month-year format rather than a month-day-year format -- but actually I don't have any evidence to support that thought!)
Once you've done this, subtracting the values should just work (give you an object of difftime) -- but be careful with units when converting to numeric!
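A short sketch of the subtraction with explicit units. The second timestamp is invented, UTC is an arbitrary choice, and an English locale is assumed so that "JUL" and AM/PM are recognised:

```r
fmt <- "%d-%b-%y %I.%M.%OS %p"

# First value is from the question; the second is a hypothetical end time.
start <- as.POSIXct("07-JUL-16 06.05.54.000000 AM", format = fmt, tz = "UTC")
end   <- as.POSIXct("08-JUL-16 07.35.54.000000 PM", format = fmt, tz = "UTC")

# Fix the units explicitly instead of letting difftime choose them.
elapsed <- difftime(end, start, units = "hours")
print(elapsed)

# Convert to plain numbers with the units stated, e.g. for a regression.
elapsed_hours <- as.numeric(elapsed, units = "hours")
elapsed_mins  <- as.numeric(elapsed, units = "mins")
```

Stating the units in both calls avoids the situation where some differences are silently reported in minutes and others in days.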
For what it's worth, lubridate::ymd_hms thinks it can guess the format, but guesses wrong (?? assuming I guessed right above: with a two-digit year, and without any year values greater than 31, there's really nothing to distinguish years and days ...)

How can I change character class date variables to POSIXlt class when there are multiple date formats?

I'm struggling with converting character class dates of many different format types (e.g., yyyy/mm/dd; mm/dd/yyyy; yyyy-mm-dd; mm-dd-yyyy; yy-mm-dd; mm-dd-yy; etc.) to POSIXlt class. Ideally, I would like to convert all birth_dates to POSIXlt class with yyyy/mm/dd format (see sample data below). Is there any simple way to do this in R?:
id birth_date start_date age
102 08/09/1993 2013/09/01 20
103 1995-02-21 2013/09/01 18
104 01-15-94 2013/09/01 19
105 88-12-30 2013/09/01 24
Here is what I have been doing thus far. Unfortunately, this doesn't seem to work (I wind up with more NAs than there should be) given all of the different ways in which the original date is formatted:
library(lubridate)
data$birth_date1<-as.Date(data$birth_date,format="%Y-%m-%d") #Convert character class to date class
data$birth_date2<-ymd(swc3$birth_date1) #Convert date class to POSIXlt class using lubridate pkg
That's horrible. Could be worse though. At least there are delimiters in there, like "-" and "/".
Short Answer
Yes, there's an easy way to parse that in R. Apply parse_date_time() separately to each birth date, giving it a decent orders list to choose from, and carefully set the order of the guesses. You'll need to convert the "integer-time" to a useful time when you're done.
See the Long Answer for details.
Long Answer
This is why the lubridate package has parse_date_time(). But there are problems. Let's see:
require(lubridate)
# WRONG! doesn't work as intended.
as.Date(
  parse_date_time(data$birth_date,
                  orders=c("ymd", "mdy", "mdY", "Ymd")
  )
)
[1] "1993-08-09" "1995-02-21" "1994-01-15" "0088-12-30"
That looks great, except for the last one. What's going on?
parse_date_time() is selecting a "best fit" set of orders and formats to use when parsing the dates, and the last element is the odd one out.
To make this work as intended, you'll need to apply parse_date_time() one-by-one to each date, because each date format was apparently selected more-or-less at random. This will be slower, but it will give more useful answers.
# RIGHT. Some conversion of results required.
parsed <- sapply(data[,"birth_date"],
                 parse_date_time,
                 orders=c("ymd", "mdy", "mdY", "Ymd") )
parsed
08/09/1993 1995-02-21 01-15-94 88-12-30
744854400 793324800 758592000 599443200
Ok, those look like Unix-time integers, which are the unclass()'d version of what parse_date_time() produces. And none are negative, so they must all have happened after 1970. This is encouraging. Convert:
# Conversion of results
parsed <- as.POSIXct(parsed, origin="1970-01-01", tz = "GMT")
as.Date(parsed)
08/09/1993 1995-02-21 01-15-94 88-12-30
"1993-08-09" "1995-02-21" "1994-01-15" "1988-12-30"
lubridate and parse_date_time() are very good at what they do.
Since you asked for POSIXlt, not Date types:
as.POSIXlt(parsed)
08/09/1993 1995-02-21
"1993-08-09 10:00:00 AEST" "1995-02-21 11:00:00 AEDT"
01-15-94 88-12-30
"1994-01-15 11:00:00 AEDT" "1988-12-30 11:00:00 AEDT"
Though I personally prefer only having dates when the actual time isn't important; these are assumed to be all happening at midnight UTC, and are converted to my time zone (Eastern Australia).
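For comparison, here is a base-R sketch of the same element-by-element idea. It does not use lubridate. The helper name parse_mixed_date is hypothetical, and it picks a format per string from the position of the four-digit year. Strings with only two-digit parts are tried as mm-dd-yy first and yy-mm-dd second, which is an assumption; a value like 01-02-03 is ambiguous and cannot be resolved this way.

```r
# Hypothetical helper: choose a format per element based on its shape.
parse_mixed_date <- function(x) {
  x <- gsub("/", "-", trimws(x), fixed = TRUE)   # unify the delimiters
  out <- rep(as.Date(NA), length(x))

  four_first <- grepl("^[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}$", x)
  four_last  <- grepl("^[0-9]{1,2}-[0-9]{1,2}-[0-9]{4}$", x)
  two_digit  <- grepl("^[0-9]{1,2}-[0-9]{1,2}-[0-9]{2}$", x)

  out[four_first] <- as.Date(x[four_first], format = "%Y-%m-%d")
  out[four_last]  <- as.Date(x[four_last],  format = "%m-%d-%Y")
  out[two_digit]  <- as.Date(x[two_digit],  format = "%m-%d-%y")

  # Two-digit strings that failed as mm-dd-yy are retried as yy-mm-dd.
  retry <- two_digit & is.na(out)
  out[retry] <- as.Date(x[retry], format = "%y-%m-%d")
  out
}

birth_date <- c("08/09/1993", "1995-02-21", "01-15-94", "88-12-30")
dates <- parse_mixed_date(birth_date)
print(dates)

# Convert to POSIXlt if that class is required.
dates_lt <- as.POSIXlt(dates)
```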

Converting time format to numeric with R

In most cases, we convert numeric time to POSIXct format using R. However, if we want to compare two time points, then we would prefer the numeric time format. For example, I have a date format like "2001-03-13 10:31:00",
begin <- "2001-03-13 10:31:00"
Using R, I want to convert this into a numeric (e.g., the Julian time), perhaps something like the number of seconds elapsed between 1970-01-01 00:00:00 and 2001-03-13 10:31:00.
Do you have any suggestions?
The Julian calendar began in 45 BC (709 AUC) as a reform of the Roman calendar by Julius Caesar. It was chosen after consultation with the astronomer Sosigenes of Alexandria and was probably designed to approximate the tropical year (known at least since Hipparchus). see http://en.wikipedia.org/wiki/Julian_calendar
If you just want to remove ":" , " ", and "-" from a character vector then this will suffice:
end <- gsub("[: -]", "" , begin, perl=TRUE)
#> end
#[1] "20010313103100"
You should read the section about 1/4 of the way down in ?regex about character classes. Since the "-" is special in that context as a range operator, it needs to be placed first or last.
After your edit, the answer is clearly what @joran wrote, except that you would first need to convert to a DateTime class:
as.numeric(as.POSIXct(begin))
#[1] 984497460
The other point to make is that comparison operators do work for Date and DateTime classed variables, so the conversion may not be necessary at all. This compares 'begin' to a time one second later and correctly reports that begin is earlier:
as.POSIXct(begin) < as.POSIXct(begin) +1
#[1] TRUE
Based on the revised question this should do what you want:
begin <- "2001-03-13 10:31:00"
as.numeric(as.POSIXct(begin))
The result is a unix timestamp, the number of seconds since epoch, assuming the timestamp is in the local time zone.
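Since the number depends on the time zone used for parsing, it can help to pass tz explicitly. A small sketch; the two zones are chosen only for illustration:

```r
begin <- "2001-03-13 10:31:00"

# The same wall-clock string read in two different time zones.
secs_utc <- as.numeric(as.POSIXct(begin, tz = "UTC"))
secs_ny  <- as.numeric(as.POSIXct(begin, tz = "America/New_York"))

print(secs_utc)             # 984479460
print(secs_ny - secs_utc)   # 18000, i.e. a five-hour offset

# Converting back requires the origin and, again, a time zone.
back <- as.POSIXct(secs_utc, origin = "1970-01-01", tz = "UTC")
print(format(back, "%Y-%m-%d %H:%M:%S"))
```

The value 984497460 shown earlier matches the America/New_York reading, which suggests it was produced on a machine set to US Eastern time.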
Maybe this could also work:
library(lubridate)
...
df <- '24:00:00'
as.numeric(hms(df))
hms() will convert your data from one time format into another, which lets you convert it into seconds. See the lubridate documentation for details.
I tried this because I had trouble with data that was in that format but ran over 24 hours.
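If lubridate is not available, durations written as H:M:S (including values over 24 hours) can also be converted with base R. This is a sketch of an alternative, not what hms() does internally; the helper name duration_to_seconds is made up:

```r
# Hypothetical helper: split "H:M:S" strings and sum the parts in seconds.
duration_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    p[1] * 3600 + p[2] * 60 + p[3]
  }, numeric(1))
}

secs <- duration_to_seconds(c("24:00:00", "25:30:15", "0:5:43"))
print(secs)   # 86400 91815 343
```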
The example from the ?as.POSIXct help gives
as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S"))
so for you it would be
as.numeric(as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S")))

How to add/subtract time from a POSIXlt time while keeping its class in R?

I am manipulating some POSIXlt DateTime objects. For example I would like to add an hour:
my.lt = as.POSIXlt("2010-01-09 22:00:00")
new.lt = my.lt + 3600
new.lt
# [1] "2010-01-09 23:00:00 EST"
class(new.lt)
# [1] "POSIXct" "POSIXt"
The thing is I want new.lt to be a POSIXlt object. I know I could use as.POSIXlt to convert it back to POSIXlt, but is there a more elegant and efficient way to achieve this?
POSIXct-classed objects are internally a numeric value that allows numeric calculations. POSIXlt-objects are internally lists. Unfortunately for your desires, Ops.POSIXt (which is what is called when you use "+") coerces to POSIXct with this code:
if (inherits(e1, "POSIXlt") || is.character(e1))
e1 <- as.POSIXct(e1)
Fortunately, if you just want to add an hour, there is a handy alternative to adding 3600. Instead, use the list structure and add 1 to the hour element:
> my.lt$hour <- my.lt$hour +1
> my.lt
[1] "2010-01-09 23:00:00"
This approach is very handy when you want to avoid thorny questions about DST changes, at least if you want adding days to give you the same time-of-day.
Edit (adding @sunt's code demonstrating that Ops.POSIXlt is careful with time "overflow"):
my.lt = as.POSIXlt("2010-01-09 23:05:00")
my.lt$hour=my.lt$hour+1
my.lt
# [1] "2010-01-10 00:05:00"
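The field-based approach can be wrapped in a small helper. The name add_hours_lt is hypothetical, and UTC is used here only to keep the example free of DST effects:

```r
# Hypothetical helper: add hours by editing the POSIXlt hour field,
# so the result keeps the POSIXlt class.
add_hours_lt <- function(x, hours) {
  stopifnot(inherits(x, "POSIXlt"))
  x$hour <- x$hour + hours
  x
}

my.lt  <- as.POSIXlt("2010-01-09 23:05:00", tz = "UTC")
new.lt <- add_hours_lt(my.lt, 1)

print(class(new.lt))

# Converting to POSIXct normalises the overflowed hour into the next day.
print(format(as.POSIXct(new.lt), "%Y-%m-%d %H:%M:%S"))
```

Note that the stored hour field itself is left as 24 until the object is converted or formatted, so code that reads new.lt$hour directly should normalise first, for example with as.POSIXlt(as.POSIXct(new.lt)).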
Short answer: No
Long answer:
POSIXct and POSIXlt objects are two specific types of the more general POSIXt class (not in a strictly object oriented inheritance sense, but in a quasi-object oriented implementation sense). Code freely switches between these. When you add to a POSIXlt object, the actual function used is +.POSIXt, not one specifically for POSIXlt. Inside this function, the argument is converted into a POSIXct and then dealt with (added to).
Additionally, POSIXct is the number of seconds from a specific date and time. POSIXlt is a list of date parts (seconds, minutes, hours, day of month, month, year, day of week, day of year, DST info) so adding to that directly doesn't make any sense. Converting it to a number of seconds (POSIXct) and adding to that does make sense.
It may not be significantly more elegant, but
seq.POSIXt( from=Sys.time(), by="1 hour", length.out=2 )[2]
IMHO is more descriptive than
Sys.time()+3600; # 60 minutes * 60 seconds
because the code itself documents that you're going for a "POSIX" "seq"uence incremented "by 1 hour", but it's a matter of taste. Works just fine on POSIXlt, but note that it returns a POSIXct either way. Also works for "days". See help(seq.POSIXt) for details on how it handles months, daylight savings, etc.
?POSIXlt tells you that:
Any conversion that needs to go between the two date-time classes requires a timezone: conversion from "POSIXlt" to "POSIXct" will validate times in the selected timezone.
So I guess that 3600 not being a POSIXlt object, there is an automatic conversion.
I would stick with simple:
new.lt = as.POSIXlt(my.lt + 3600)
class(new.lt)
[1] "POSIXlt" "POSIXt"
It's not that much of a hassle to add as.POSIXlt before your time operation.
