RchivalTag package date - r

I am trying to use the RchivalTag package in R and I am having issues with the date and time. Whenever I try to use read_histos() it gives me the following error:
hist_dat_1<- read_histos(Georgia)
Warning: 598 parsing failures.
row col expected actual
1 -- date like %H:%M:%S %d-%b-%Y 2009-02-23 00:00:00
2 -- date like %H:%M:%S %d-%b-%Y 2009-02-23 06:00:00
3 -- date like %H:%M:%S %d-%b-%Y 2009-02-23 12:00:00
4 -- date like %H:%M:%S %d-%b-%Y 2009-02-23 18:00:00
5 -- date like %H:%M:%S %d-%b-%Y 2009-02-24 00:00:00
... ... ........................... ...................
See problems(...) for more details.
Error in .fact2datetime(as.character(add0$Date), date_format = date_format, :
date concersion failed! Please revise current'date_format': %H:%M:%S %d-%b-%Y to 2009-02-23 00:00:00
Current date/time class:
class(Georgia$Date)
#[1] "POSIXct" "POSIXt"
I have changed the date and time format to:
format(Georgia$Date, format = "%H:%M:%S %d-%b-%Y")
But I am still getting the error message. If anyone is familiar with this package, help would be greatly appreciated

This is a difficult one without reproducible data, but we can actually replicate your error using one of the package's built-in data sets:
library(RchivalTag)
hist_file <- system.file("example_files/67851-12h-Histos.csv",package="RchivalTag")
Georgia <- read.csv(hist_file)
This reads the histogram data in as a data frame, which seems to be the stage you are at. To mimic your data, we will convert the Date column to the standard date-time representation (the way a POSIXct column prints) and store it as character:
Georgia$Date <- strptime(Georgia$Date, "%H:%M:%S %d-%b-%Y")
Georgia$Date <- as.character(Georgia$Date)
Now when we try to read the data we get:
read_histos(Georgia)
#> Warning: 34 parsing failures.
#> row col expected actual
#> 1 -- date like %H:%M:%S %d-%b-%Y 2012-02-01 08:10:00
#> 2 -- date like %H:%M:%S %d-%b-%Y 2012-02-01 12:00:00
#> 3 -- date like %H:%M:%S %d-%b-%Y 2012-02-02 00:00:00
#> 4 -- date like %H:%M:%S %d-%b-%Y 2012-02-02 12:00:00
#> 5 -- date like %H:%M:%S %d-%b-%Y 2012-02-03 00:00:00
#> ... ... ........................... ...................
#> See problems(...) for more details.
#>
#> Error in .fact2datetime(as.character(add0$Date), date_format = date_format, :
#> date concersion failed! Please revise current'date_format': %H:%M:%S %d-%b-%Y to 2012-02-01 08:10:00
To fix this, we can rewrite the Date column in the format that read_histos() expects, using strftime:
Georgia$Date <- strftime(Georgia$Date, "%H:%M:%S %d-%b-%Y")
And now we can read the data without the parse errors.
read_histos(Georgia)
#> DeployID Ptt Type p0_25 p0_50 p0_75 p0_90
#> 1 67851 67851 TAD 2.9 100 100 100
#> 2 67851 67851 TAT 3.0 100 100 100
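In the original question the Date column is already POSIXct, and the result of format() was never assigned back, so the column itself stayed unchanged. Here is a minimal sketch of that case, assuming a small made-up data frame (the name `toy` and its values are illustrative and do not come from the package). Note that %b is locale-dependent, so the month abbreviation depends on your locale settings.

```r
# Hypothetical stand-in for the asker's data: a POSIXct Date column
toy <- data.frame(Date = as.POSIXct(c("2009-02-23 00:00:00",
                                      "2009-02-23 06:00:00"), tz = "UTC"))

# format() returns a new character vector; without assigning it back,
# the column in the data frame is left untouched
unused <- format(toy$Date, format = "%H:%M:%S %d-%b-%Y")
class_before <- class(toy$Date)   # still "POSIXct" "POSIXt"

# Assign the formatted strings back so the column holds the expected layout
toy$Date <- format(toy$Date, format = "%H:%M:%S %d-%b-%Y")
class_after <- class(toy$Date)    # now "character"
toy$Date
```

After the assignment, the column should match the layout named in the error message (%H:%M:%S %d-%b-%Y).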


How to convert a column of UTC timestamps into several different timezones?

I have a dataset with dates stored in the DB as UTC, however, the timezone is actually different.
mydat <- data.frame(
time_stamp=c("2022-08-01 05:00:00 UTC","2022-08-01 17:00:00 UTC","2022-08-02 22:30:00 UTC","2022-08-04 05:00:00 UTC","2022-08-05 02:00:00 UTC"),
timezone=c("America/Chicago","America/New_York","America/Los_Angeles","America/Denver","America/New_York")
)
I want to apply the timezone to the UTC saved timestamps, over the entire column.
I looked into the with_tz function in the lubridate package, but I don't see how to reference the "timezone" column, rather than hardcoding in a value.
Such as if I try
with_tz(mydat$time_stamp, tzone = mydat$timezone)
I get the following error
Error in as.POSIXlt.POSIXct(x, tz) : invalid 'tz' value
However, if I try
mydat$time_stamp2 <- with_tz(mydat$time_stamp,"America/New_York")
that will render a new column without issue. How can I do this just referencing column values?
Welcome to StackOverflow. This is a nice, common, and tricky problem! The following should do what you ask for:
Code
mydat <- data.frame(time_stamp=c("2022-08-01 05:00:00 UTC",
                                 "2022-08-01 17:00:00 UTC",
                                 "2022-08-02 22:30:00 UTC",
                                 "2022-08-04 05:00:00 UTC",
                                 "2022-08-05 02:00:00 UTC"),
                    timezone=c("America/Chicago", "America/New_York",
                               "America/Los_Angeles", "America/Denver",
                               "America/New_York"))
mydat$utc <- anytime::utctime(mydat$time_stamp, tz="UTC")
mydat$format <- ""
for (i in seq_len(nrow(mydat)))
    mydat[i, "format"] <- strftime(mydat[i,"utc"],
                                   "%Y-%m-%d %H:%M:%S",
                                   tz=mydat[i,"timezone"])
Output
> mydat
time_stamp timezone utc format
1 2022-08-01 05:00:00 UTC America/Chicago 2022-08-01 05:00:00 2022-08-01 00:00:00
2 2022-08-01 17:00:00 UTC America/New_York 2022-08-01 17:00:00 2022-08-01 13:00:00
3 2022-08-02 22:30:00 UTC America/Los_Angeles 2022-08-02 22:30:00 2022-08-02 15:30:00
4 2022-08-04 05:00:00 UTC America/Denver 2022-08-04 05:00:00 2022-08-03 23:00:00
5 2022-08-05 02:00:00 UTC America/New_York 2022-08-05 02:00:00 2022-08-04 22:00:00
>
Comment
We first parse your data as UTC. I once wrote a helper function for that in my anytime package (there are alternatives, but this is how I do it...). We then need to format from the given (numeric!) UTC representation to the given timezone. We need a loop for this because the tz argument to strftime() is not vectorized.
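The same row-by-row idea can also be sketched in base R only, as an alternative to the loop and not a replacement for the code above. This sketch assumes that as.POSIXct() with an explicit format is an acceptable stand-in for the anytime helper, and the column name `local` is made up for illustration.

```r
mydat <- data.frame(
  time_stamp = c("2022-08-01 05:00:00 UTC", "2022-08-02 22:30:00 UTC"),
  timezone   = c("America/Chicago", "America/Los_Angeles"),
  stringsAsFactors = FALSE
)

# Parse as UTC; parsing stops at the end of the format, so the
# trailing " UTC" text in the strings is ignored
mydat$utc <- as.POSIXct(mydat$time_stamp,
                        format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Format each instant in its own zone, one row at a time
mydat$local <- vapply(
  seq_len(nrow(mydat)),
  function(i) format(mydat$utc[i], "%Y-%m-%d %H:%M:%S",
                     tz = mydat$timezone[i]),
  character(1)
)
mydat$local
#> [1] "2022-08-01 00:00:00" "2022-08-02 15:30:00"
```

The result is a character column, because a single POSIXct vector cannot carry a different time zone per element.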
Dirk gave a great answer that uses (mostly) base R tooling, if that is a requirement of yours. I wanted to also add an answer that uses the clock package that I developed because it doesn't require working rowwise over your data frame. clock has a function called sys_time_info() that retrieves low level information about a UTC time point in a specific time zone. It is one of the few functions where it makes sense to have a vectorized zone argument (which you need here) and returns an offset from UTC that will be useful here for converting to a "local" time.
As others have mentioned, you won't be able to construct a date-time vector that stores multiple time zones in it, but if you just need to see what the local time would have been in those zones, this can still be useful.
library(clock)
mydat <- data.frame(
time_stamp=c("2022-08-01 05:00:00 UTC","2022-08-01 17:00:00 UTC","2022-08-02 22:30:00 UTC","2022-08-04 05:00:00 UTC","2022-08-05 02:00:00 UTC"),
timezone=c("America/Chicago","America/New_York","America/Los_Angeles","America/Denver","America/New_York")
)
# Parse into a "sys-time" type, which can be thought of as a UTC time point
mydat$time_stamp <- sys_time_parse(mydat$time_stamp, format = "%Y-%m-%d %H:%M:%S")
mydat
#> time_stamp timezone
#> 1 2022-08-01T05:00:00 America/Chicago
#> 2 2022-08-01T17:00:00 America/New_York
#> 3 2022-08-02T22:30:00 America/Los_Angeles
#> 4 2022-08-04T05:00:00 America/Denver
#> 5 2022-08-05T02:00:00 America/New_York
# "Low level" information about DST, the time zone abbreviation,
# and offset from UTC in that zone. This is one of the few functions where
# it makes sense to have a vectorized `zone` argument.
info <- sys_time_info(mydat$time_stamp, mydat$timezone)
info
#> begin end offset dst abbreviation
#> 1 2022-03-13T08:00:00 2022-11-06T07:00:00 -18000 TRUE CDT
#> 2 2022-03-13T07:00:00 2022-11-06T06:00:00 -14400 TRUE EDT
#> 3 2022-03-13T10:00:00 2022-11-06T09:00:00 -25200 TRUE PDT
#> 4 2022-03-13T09:00:00 2022-11-06T08:00:00 -21600 TRUE MDT
#> 5 2022-03-13T07:00:00 2022-11-06T06:00:00 -14400 TRUE EDT
# Add the offset to the sys-time and then convert to a character column
# (these times don't really represent sys-time anymore since they are now localized)
mydat$localized <- as.character(mydat$time_stamp + info$offset)
mydat
#> time_stamp timezone localized
#> 1 2022-08-01T05:00:00 America/Chicago 2022-08-01T00:00:00
#> 2 2022-08-01T17:00:00 America/New_York 2022-08-01T13:00:00
#> 3 2022-08-02T22:30:00 America/Los_Angeles 2022-08-02T15:30:00
#> 4 2022-08-04T05:00:00 America/Denver 2022-08-03T23:00:00
#> 5 2022-08-05T02:00:00 America/New_York 2022-08-04T22:00:00

Using logicals to change date format in R

New R user here - I have many .csv files containing time stamps (date_time) in one column and temperature readings in the other. I am trying to write a function that detects the date_time format, and then changes it to a different format. Because of the way the data was collected, the date/time format is different in some of the .csv files. I want the function to change the date_time for all files to the same format.
Date_time format I want: %m/%d/%y %H:%M:%S
Date_time format I want changed to above: "%y-%m-%d %H:%M:%S"
> head(file1data)
x date_time temperature coupler_d coupler_a host_con stopped EoF
1 1 18-07-10 09:00:00 41.137 Logged
2 2 18-07-10 09:15:00 41.322
3 3 18-07-10 09:30:00 41.554
4 4 18-07-10 09:45:00 41.832
5 5 18-07-10 10:00:00 42.156
6 6 18-07-10 10:15:00 42.755
> head(file2data)
x date_time temperature coupler_d coupler_a host_con stopped EoF
1 1 07/10/18 01:00:00 PM 8.070 Logged
2 2 07/10/18 01:15:00 PM 8.095
3 3 07/10/18 01:30:00 PM 8.120
4 4 07/10/18 01:45:00 PM 8.120
5 5 07/10/18 02:00:00 PM 8.020
6 6 07/10/18 02:15:00 PM 7.795
file2data is in the correct format. file1data is incorrect.
I have tried using logicals to detect and replace the date format e.g.,
file1data %>%
if(str_match_all(date_time,"([0-9][0-9]{2})[-.])")){
format(as.POSIXct(date_time,format="%y-%m-%d %H:%M:%S"),"%m/%d/%y %H:%M:%S")
}else{format(date_time,"%m/%d/%y %H:%M:%S")}
but this has not worked, I get the following errors:
Error in if (.) str_match_all(date_time, "([0-9][0-9]{2})[-.])") else { :
argument is not interpretable as logical
In addition: Warning message:
In if (.) str_match_all(date_time, "([0-9][0-9]{2})[-.])") else { :
the condition has length > 1 and only the first element will be used
Any ideas?
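The errors arise because if() needs a single TRUE or FALSE, while str_match_all() returns a list of matches for the whole column. One possible approach, sketched here in base R under stated assumptions, is to test every element with a vectorized grepl() and convert only the elements that match. The function name `normalize_date_time` is hypothetical, and the sketch assumes the only two layouts are the ones shown above; values that already start with mm/dd/yy are left untouched.

```r
# Hypothetical helper: rewrite "yy-mm-dd HH:MM:SS" values as
# "mm/dd/yy HH:MM:SS" and leave every other value as it is
normalize_date_time <- function(x) {
  x <- as.character(x)
  # TRUE for elements that start with the dash-separated layout
  is_ymd <- grepl("^[0-9]{2}-[0-9]{2}-[0-9]{2} ", x)
  out <- x
  parsed <- as.POSIXct(x[is_ymd], format = "%y-%m-%d %H:%M:%S", tz = "UTC")
  out[is_ymd] <- format(parsed, "%m/%d/%y %H:%M:%S")
  out
}

normalize_date_time(c("18-07-10 09:00:00", "07/10/18 01:00:00 PM"))
#> [1] "07/10/18 09:00:00"    "07/10/18 01:00:00 PM"
```

Applied to a data frame, this would be used as file1data$date_time <- normalize_date_time(file1data$date_time).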

how to parse timestamp in R?

I am using lubridate to parse a timestamp to POSIXlt.
user time
____ ____
1 2017-09-01 00:01:01
1 2017-09-01 00:01:20
1 2017-09-01 00:03:01
library(lubridate)
data[, time:=parse_date_time2(time,orders="YmdHMS",tz="NA")]
But this resulted in
Warning message:
In as.POSIXct.POSIXlt(.mklt(.Call("parse_dt", x, orders, FALSE, :
unknown timezone 'NA'
Any help is appreciated.
The string "NA" is not a valid time zone name. Simply parse without the tz argument, which then defaults to UTC:
> ts <- '2017-09-01 00:01:01'
> lubridate::parse_date_time2(ts,orders="YmdHMS")
[1] "2017-09-01 00:01:01 UTC"
Applied to the code from the question:
data[, time:=parse_date_time2(time,orders="YmdHMS")]
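If a specific zone is needed, tz has to be a name the system recognizes. The following base R sketch (it does not use lubridate, and the variable names are made up for illustration) checks a candidate name against OlsonNames() and then displays the parsed instant in a named zone.

```r
ts <- "2017-09-01 00:01:01"

# "NA" is not among the zone names known to this R installation
na_is_valid <- "NA" %in% OlsonNames()

# Parse as UTC, then display the same instant in a named zone
parsed <- as.POSIXct(ts, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
local_view <- format(parsed, "%Y-%m-%d %H:%M:%S", tz = "America/New_York")
local_view
#> [1] "2017-08-31 20:01:01"
```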

Associate numbers to datetime/timestamp

I have a dataframe df with a certain number of columns. One of them, ts, is timestamps:
1462147403122 1462147412990 1462147388224 1462147415651 1462147397069 1462147392497
...
1463529545634 1463529558639 1463529556798 1463529558788 1463529564627 1463529557370.
I have also at my disposal the corresponding datetime in the datetime column:
"2016-05-02 02:03:23 CEST" "2016-05-02 02:03:32 CEST" "2016-05-02 02:03:08 CEST" "2016-05-02 02:03:35 CEST" "2016-05-02 02:03:17 CEST" "2016-05-02 02:03:12 CEST"
...
"2016-05-18 01:59:05 CEST" "2016-05-18 01:59:18 CEST" "2016-05-18 01:59:16 CEST" "2016-05-18 01:59:18 CEST" "2016-05-18 01:59:24 CEST" "2016-05-18 01:59:17 CEST"
As you can see, my dataframe contains data across several days. Let's say there are 3. I would like to add a column containing the number 1, 2 or 3: 1 if the line belongs to the first day, 2 for the second day, etc.
Thank you very much in advance,
Clement
One way to do this is to keep track of total days elapsed each time the date changes, as demonstrated below.
library(lubridate)  # provides tz() and `tz<-`, used below

# Fake data
dat = data.frame(datetime = c(seq(as.POSIXct("2016-05-02 01:03:11"),
                                  as.POSIXct("2016-05-05 01:03:11"), length.out=6),
                              seq(as.POSIXct("2016-05-09 01:09:11"),
                                  as.POSIXct("2016-05-16 02:03:11"), length.out=4)))
tz(dat$datetime) = "UTC"
Note, if your datetime column is not already in a datetime format, convert it to one using as.POSIXct.
Now, create a new column with the day number, counting the first day in the sequence as day 1.
dat$day = c(1, cumsum(as.numeric(diff(as.Date(dat$datetime, tz="UTC")))) + 1)
dat
datetime day
1 2016-05-02 01:03:11 1
2 2016-05-02 15:27:11 1
3 2016-05-03 05:51:11 2
4 2016-05-03 20:15:11 2
5 2016-05-04 10:39:11 3
6 2016-05-05 01:03:11 4
7 2016-05-09 01:09:11 8
8 2016-05-11 09:27:11 10
9 2016-05-13 17:45:11 12
10 2016-05-16 02:03:11 15
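The day column above counts elapsed calendar days, so gaps in the data produce gaps in the numbering. If consecutive labels (1, 2, 3, ...) for the distinct dates are wanted instead, one possible variant is sketched below in base R. The variable names are made up for illustration and the dates are a subset of the fake data.

```r
datetime <- as.POSIXct(c("2016-05-02 01:03:11", "2016-05-02 15:27:11",
                         "2016-05-03 05:51:11", "2016-05-09 01:09:11"),
                       tz = "UTC")
d <- as.Date(datetime, tz = "UTC")

# Elapsed calendar days since the first date (gaps are counted)
day_elapsed <- as.integer(d - min(d)) + 1L
day_elapsed
#> [1] 1 1 2 8

# Consecutive labels for the distinct dates that occur (gaps are skipped)
day_index <- match(d, sort(unique(d)))
day_index
#> [1] 1 1 2 3
```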
I specified the timezone in the code above to avoid getting tripped up by potential silent shifts between my local timezone and UTC. For example, note the silent shift from my default local time zone ("America/Los_Angeles") to UTC when converting a POSIXct datetime to a date:
# Fake data
datetime = seq(as.POSIXct("2016-05-02 01:03:11"), as.POSIXct("2016-05-05 01:03:11"), length.out=6)
tz(datetime)
[1] ""
date = as.Date(datetime)
tz(date)
[1] "UTC"
data.frame(datetime, date)
datetime date
1 2016-05-02 01:03:11 2016-05-02
2 2016-05-02 15:27:11 2016-05-02
3 2016-05-03 05:51:11 2016-05-03
4 2016-05-03 20:15:11 2016-05-04 # Note day is different due to timezone shift
5 2016-05-04 10:39:11 2016-05-04
6 2016-05-05 01:03:11 2016-05-05
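The shift can be reproduced for a single value by passing tz explicitly to as.Date(). This is a small sketch in base R; the time zone is set explicitly so that the result does not depend on the local machine.

```r
# 20:15 in Los Angeles (PDT, UTC-7) is already 03:15 the next day in UTC
x <- as.POSIXct("2016-05-03 20:15:11", tz = "America/Los_Angeles")

date_utc   <- as.Date(x, tz = "UTC")
date_local <- as.Date(x, tz = "America/Los_Angeles")

format(date_utc)
#> [1] "2016-05-04"
format(date_local)
#> [1] "2016-05-03"
```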

Parsing dates in multiple formats in R using lubridate

I have data with dates in MM/DD/YY HH:MM format and others in plain old MM/DD/YY format. I want to parse all of them into the same format as "2010-12-01 12:12 EST." How should I go about doing that? I tried the following ifelse statement and it gave me a bunch of long integers and told me a large number of my data points failed to parse:
df_prime$date <- ifelse(!is.na(mdy_hm(df$date)), mdy_hm(df$date), mdy(df$date))
df_prime is a duplicate of the data frame df that I initially loaded in
IEN date admission_number KEY_PTF_45 admission_from discharge_to
1 12 3/3/07 18:05 1 252186 OTHER DIRECT
2 12 3/9/07 12:10 1 252186 RETURN TO COMMUNITY- INDEPENDENT
3 12 3/10/07 15:08 2 252382 OUTPATIENT TREATMENT
4 12 3/14/07 10:26 2 252382 RETURN TO COMMUNITY-INDEPENDENT
5 12 4/24/07 19:45 3 254343 OTHER DIRECT
6 12 4/28/07 11:45 3 254343 RETURN TO COMMUNITY-INDEPENDENT
...
1046334 23613488506 2/25/14 NA NA
1046335 23613488506 2/25/14 11:27 NA NA
1046336 23613488506 2/28/14 NA NA
1046337 23613488506 3/4/14 NA NA
1046338 23613488506 3/10/14 11:30 NA NA
1046339 23613488506 3/10/14 12:32 NA NA
Sorry if some of the formatting isn't right, but the date column is the most important one.
EDIT: Below is some code for a portion of my data frame via a dput command:
structure(list(IEN = c(23613488506, 23613488506, 23613488506, 23613488506, 23613488506, 23613488506), date = c("2/25/14", "2/25/14 11:27", "2/28/14", "3/4/14", "3/10/14 11:30", "3/10/14 12:32")), .Names = c("IEN", "date"), row.names = 1046334:1046339, class = "data.frame")
Have you tried the function guess_formats() in the lubridate package?
A reproducible example to build a dataframe like yours could be helpful!
The lubridate package's mdy_hm has a truncated parameter that lets you supply dates that might not have all the bits. For your example:
> mdy_hm(d$date,truncated=2)
[1] "2014-02-25 00:00:00 UTC" "2014-02-25 11:27:00 UTC"
[3] "2014-02-28 00:00:00 UTC" "2014-03-04 00:00:00 UTC"
[5] "2014-03-10 11:30:00 UTC" "2014-03-10 12:32:00 UTC"
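As for the "long integers" in the question: ifelse() drops attributes such as the date-time class, so it returns the underlying number of seconds. The following base R sketch (it does not use lubridate, and the variable names are made up) shows the effect and one way around it, which is to fill the failed parses by assignment so the class is kept.

```r
x <- c("2/25/14", "2/25/14 11:27")

# Two parsing attempts; the first gives NA where the time part is missing
with_time <- as.POSIXct(x, format = "%m/%d/%y %H:%M", tz = "UTC")
date_only <- as.POSIXct(x, format = "%m/%d/%y", tz = "UTC")

# ifelse() strips the POSIXct class and returns plain numbers
stripped <- ifelse(!is.na(with_time), with_time, date_only)
class(stripped)
#> [1] "numeric"

# Filling the gaps by assignment keeps the POSIXct class
combined <- with_time
missing_time <- is.na(combined)
combined[missing_time] <- date_only[missing_time]
format(combined, "%Y-%m-%d %H:%M")
#> [1] "2014-02-25 00:00" "2014-02-25 11:27"
```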
