I am trying to find out if it is possible to save the date time zone into a
file in R.
Let's create a date/time variable set to EST.
x <- as.POSIXct("2015-06-01 12:00:00", tz = "EST")
x
#> [1] "2015-06-01 12:00:00 EST"
lubridate::tz(x)
#> [1] "EST"
tmpfile <- tempfile()
I will save this date information (which is in EST time) using base R,
readr and data.table to see how the information is preserved.
Base R version
The date gets converted to UTC and the time is not changed.
write.csv(data.frame(x = x), tmpfile)
df <- read.csv(tmpfile)
df
#> X x
#> 1 1 2015-06-01 12:00:00
lubridate::tz(df$x)
#> [1] "UTC"
tibble/readr version
It gets converted to UTC and the time is formatted as UTC as described in
the help. Time is now 17:00:00.
… POSIXct values are formatted as ISO8601 with a UTC timezone. Note:
POSIXct objects in local or non-UTC timezones will be converted to UTC
before writing.
readr::write_csv(tibble::tibble(x = x), tmpfile)
df <- readr::read_csv(tmpfile)
#> Rows: 1 Columns: 1
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dttm (1): x
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df
#> # A tibble: 1 × 1
#> x
#> <dttm>
#> 1 2015-06-01 17:00:00
lubridate::tz(df$x)
#> [1] "UTC"
data.table version
The date gets converted to UTC and the time is formatted as UTC as described in
the help. Time is now also 17:00:00.
data.table::fwrite(data.table::data.table(x = x), tmpfile)
df <- data.table::fread(tmpfile)
df
#> x
#> 1: 2015-06-01 17:00:00
lubridate::tz(df$x)
#> [1] "UTC"
Is it possible to preserve the time zone when saving a file? If the user is
not paying attention, the conversion happens silently and can cause problems.
Created on 2022-03-14 by the reprex package (v2.0.1)
Saving to an .RDS is often the best choice to preserve an object exactly as it’s represented in R. In this case, it preserves timezone without conversion:
saveRDS(data.frame(x = x), tmpfile)
df <- readRDS(tmpfile)
df
# x
# 1 2015-06-01 12:00:00
lubridate::tz(df$x)
# [1] "EST"
df$x
# [1] "2015-06-01 12:00:00 EST"
You could simply paste the time zone on to the end of the datetime string.
x <- as.POSIXct("2015-06-01 12:00:00", tz = "EST")
df <- data.frame(x = paste(as.character(x), lubridate::tz(x)))
tmpfile <- tempfile()
write.csv(df, tmpfile, row.names = FALSE)
new_df <- read.csv(tmpfile)
new_df
#> x
#> 1 2015-06-01 12:00:00 EST
as.POSIXct(new_df$x, strsplit(new_df$x, ' ')[[1]][3])
#> [1] "2015-06-01 12:00:00 EST"
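Another option, sketched here on the assumption that a two-column layout is acceptable, is to write the instant in UTC and keep the zone name in its own column. The column names x_utc and x_tz are made up for this illustration, and the sketch assumes one time zone for the whole column, since a POSIXct vector carries a single tzone attribute.

```r
# Sketch: store the instant in UTC plus the zone name, then restore both.
# Column names x_utc and x_tz are illustrative, not a convention.
x <- as.POSIXct("2015-06-01 12:00:00", tz = "EST")

out <- data.frame(
  x_utc = format(x, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"),  # ISO 8601 in UTC
  x_tz  = attr(x, "tzone"),                              # "EST"
  stringsAsFactors = FALSE
)
tmpfile <- tempfile(fileext = ".csv")
write.csv(out, tmpfile, row.names = FALSE)

back <- read.csv(tmpfile, stringsAsFactors = FALSE)
# Parse as UTC first, then switch the display zone back to the stored one.
restored <- as.POSIXct(back$x_utc, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
attr(restored, "tzone") <- back$x_tz[1]
restored
#> [1] "2015-06-01 12:00:00 EST"
```

The file stays readable by other tools because the timestamp itself is unambiguous UTC; the zone column is only needed to restore how R displays it.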
Related
I have a table with date and temperature, and I need to convert the date from this format 1/1/2021 12:00:00 AM to just 1/1/21. Please help. I tried to add a new column with the new date using this code
SFtemps$Date <- as.Date(strptime(SFtemps$Date, "%m/%d/%y %H:%M"))
but it's not working.
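Use %I rather than %H to parse hours on a 12-hour clock (01–12), together with
%p for AM/PM. The input year has four digits, so parse it with %Y; %y (year
without century, 00–99) belongs only in the output format.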
x <- "1/1/2021 12:00:00 AM"
format(strptime(x, "%m/%d/%Y %I:%M:%S %p"), "%m/%d/%y")
[1] "01/01/21"
Note that after you reformat the time object, it'll be a pure character string and lose all attributes of a time object.
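To make that concrete, here is a small sketch that checks the class before and after formatting, and also builds the unpadded 1/1/21 form asked for in the question. It assumes an English or C locale, because %p is locale-dependent; the variable names are only for illustration.

```r
x <- "1/1/2021 12:00:00 AM"
# %p (AM/PM) is locale-dependent; an English or C locale is assumed here.
parsed <- as.POSIXct(x, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
class(parsed)                       # "POSIXct" "POSIXt"

padded <- format(parsed, "%m/%d/%y")
class(padded)                       # "character"

# Build the unpadded form by converting month and day to integers.
short <- paste(as.integer(format(parsed, "%m")),
               as.integer(format(parsed, "%d")),
               format(parsed, "%y"), sep = "/")
short
#> [1] "1/1/21"
```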
Your string is actually a timestamp. Using lubridate, you can parse it and then use the stamp function to format it in whatever manner suits you.
org_str <- "1/1/2021 12:00:00 AM"
library("lubridate")
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
x <- mdy_hms(x = org_str, tz = "UTC")
sf <- stamp("29/01/1997")
#> Multiple formats matched: "%d/%Om/%Y"(1), "%d/%m/%Y"(1)
#> Using: "%d/%Om/%Y"
sf(x)
#> [1] "01/01/2021"
Created on 2022-04-22 by the reprex package (v2.0.1)
Explanation
Contrary to common perception, a date doesn't have a format as such. Only a string representation of the date can have a specific format. Consider the following example. Running this code:
x <- Sys.Date()
unclass(x)
will return a number: the count of days since 1970-01-01. This is because, as note 1 puts it:
Thus, objects of the class Date are internally represented as numbers. More specifically, dates in R are actually integers.
The same is applicable to the POSIX objects
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
xn <- now()
class(xn)
#> [1] "POSIXct" "POSIXt"
unclass(xn)
#> [1] 1650644616
#> attr(,"tzone")
#> [1] ""
Created on 2022-04-22 by the reprex package (v2.0.1)
You can always run:
format(lubridate::now(), "%d-%m")
# [1] "22-04"
However, by doing this you are not formatting the date but creating a string representation of the date / timestamp object. Your string representation would have to be converted to timestamp / date object if you want to use it in common date time operations.
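A minimal base R sketch of that round trip, using a fixed date so the numbers are easy to check. The year has to be supplied again by hand because the "%d-%m" string no longer contains it.

```r
d <- as.Date("1970-01-10")
unclass(d)                          # 9: days since 1970-01-01

s <- format(d, "%d-%m")             # "10-01", a plain character string
class(s)                            # "character"

# The string no longer supports date arithmetic; parse it back first.
# The year is not in the string, so it is supplied explicitly here.
d2 <- as.Date(paste0(s, "-1970"), format = "%d-%m-%Y")
d2 + 7
#> [1] "1970-01-17"
```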
Suggestions
Whenever working with date / date-time objects, I would argue that it's advisable to keep them in the correct class, retaining the relevant details (such as the time zone in the case of timestamps), and to apply formatting functions only when presenting the data in an analysis or visualisation, as shown in the sample.
1 Essentials of dates and times
Your format is a bit off. These are two options that work (lubridate is my favorite):
SFtemps <- list()
SFtemps$Date <- '1/1/2021 12:00:00 AM'
> as.Date(strptime(SFtemps$Date, "%m/%d/%Y %r"))
[1] "2021-01-01"
> as.Date(lubridate::mdy_hms(SFtemps$Date))
[1] "2021-01-01"
This will give you the date. If you want to display it as 1/1/2021, use the format function.
You can try tryFormats. Note that the year needs %Y here: with %y, only the first two digits of 2021 would be read.
as.Date(x, tryFormats = c("%m/%d/%Y"),
optional = FALSE)
It should be an easy issue, but I got stuck on it. I have a data.frame with dates and values:
class(var_data)
[1] "tbl_df" "tbl" "data.frame"
var_data
A tibble: 42 x 2
date Tourists
<dttm> <dbl>
1 2006-03-01 00:00:00 55280.
2 2006-06-01 00:00:00 84392.
3 2006-09-01 00:00:00 132714.
Then I want to copy some dates and values into other data.frame:
var_list_DB$var_last[ii] <- var_data[last,"Tourists"]
var_list_DB$var_date_start[ii] <- var_data[1,"date"]
var_list_DB$var_date_last[ii] <- var_data[last,"date"]
But instead of dates I got numbers:
var_date_start var_date_last var_val_last
951868800 1496275200 10044.3162
And while trying to convert to date format, got an error:
as.Date(var_data[last,"date"], format = "%m/%d/%Y")
Error in as.Date.default(x, ...) :
do not know how to convert 'x' to class “Date”
I recently updated to version 3.5.0; maybe that is the issue.
Convert var_data to a plain data.frame and add an as.character conversion before converting to a date, as in these two examples using as.Date and as.POSIXct:
var_data<-data.frame(var_data)
as.Date(as.character(var_data[,"date"]))
[1] "2006-03-01" "2006-06-01" "2006-09-01"
as.POSIXct(as.character(var_data[,"date"]))
[1] "2006-03-01 CET" "2006-06-01 CEST" "2006-09-01 CEST"
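An alternative that avoids the character round trip, offered as a sketch: one likely cause (which I can't confirm without seeing var_list_DB) is that var_data[1, "date"] on a tibble returns a one-cell tibble rather than a POSIXct value, and that the target columns were not date-time columns to begin with. Extracting with [[ and pre-allocating POSIXct columns keeps the class. The data below is a stand-in built from the rows shown in the question.

```r
# Stand-in for the data in the question (first three rows only).
var_data <- data.frame(
  date = as.POSIXct(c("2006-03-01", "2006-06-01", "2006-09-01"), tz = "UTC"),
  Tourists = c(55280, 84392, 132714)
)
last <- nrow(var_data)

# Pre-allocate the target with date-time columns so the class is kept.
var_list_DB <- data.frame(
  var_date_start = as.POSIXct(NA, tz = "UTC"),
  var_date_last  = as.POSIXct(NA, tz = "UTC"),
  var_last       = NA_real_
)

ii <- 1
# [[ returns the column vector itself, not a one-cell table.
var_list_DB$var_date_start[ii] <- var_data[["date"]][1]
var_list_DB$var_date_last[ii]  <- var_data[["date"]][last]
var_list_DB$var_last[ii]       <- var_data[["Tourists"]][last]

as.Date(var_data[["date"]][last])
#> [1] "2006-09-01"
```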
Date parsing bug
I am having trouble with character to date-time conversions and would appreciate help understanding what is going wrong. To do this, I define a very simple data frame with two rows, which holds an ID, a time zone, a date, and a time for each row. I would like to add a column that contains a (say) POSIXct entry for the combined date-time including the correct time zone. (This is a synthetic example but I want to apply this to a much larger data set.)
First we try combining these features into a unified representation of the date, time and time zone using R’s base facilities.
d <- data.frame(id=c(111, 222),
tzz=c("Europe/Berlin", "US/Eastern"),
d=c("09-Sep-2017", "11-Sep-2017"),
t=c("23:42:13", "22:05:17"),
stringsAsFactors = FALSE)
d$dt <- strptime(paste(d$d, d$t), tz=d$tzz, format="%d-%b-%Y %T")
Error in strptime(paste(d$d, d$t), tz = d$tzz, format = "%d-%b-%Y %T") :
invalid 'tz' value
That approach fails, though it’s not clear to me why. For example, I can do the non-vectorized version of this easily. Also, the time zones I am using seem to be part of the officially supported list.
d$tzz %in% OlsonNames()
[1] TRUE TRUE
dt1 <- strptime(paste(d$d[1], d$t[1]), tz=d$tzz[1], format="%d-%b-%Y %T")
print(dt1)
[1] "2017-09-09 23:42:13 CEST"
print(tz(dt1))
[1] "Europe/Berlin"
dt2 <- strptime(paste(d$d[2], d$t[2]), tz=d$tzz[2], format="%d-%b-%Y %T")
print(dt2)
[1] "2017-09-11 22:05:17 EDT"
print(tz(dt2))
[1] "US/Eastern"
Thinking that perhaps my problem was in misunderstanding how to use strptime, I then tried a similar approach with lubridate:
library(lubridate)
d$dt <- dmy_hms(paste(d$d, d$t), tz=d$tzz)
Error in strptime(.enclose(x), .enclose(fmt), tz) : invalid 'tz' value
but got the same error. Again, a non-vector version works fine.
dt1l <- dmy_hms(paste(d$d[1], d$t[1]), tz=d$tzz[1])
print(dt1l)
[1] "2017-09-09 23:42:13 CEST"
print(tz(dt1l))
[1] "Europe/Berlin"
Trying mutate in tidyverse yields the same problem. (Incidentally, CEST is not among the OlsonNames set.)
Help for how to do this correctly, or at least an explanation of how this is going wrong, would be much appreciated.
Try computing it row by row like this:
library(dplyr)
d %>%
rowwise() %>%
mutate(ct = as.POSIXct(paste(d, t), format = "%d-%b-%Y %H:%M:%S", tz = tzz)) %>%
ungroup
giving:
# A tibble: 2 x 5
id tzz d t ct
<dbl> <chr> <chr> <chr> <dttm>
1 111. Europe/Berlin 09-Sep-2017 23:42:13 2017-09-09 17:42:13
2 222. US/Eastern 11-Sep-2017 22:05:17 2017-09-11 22:05:17
Similar to Gabor's but with data.table using the fact that the ids are unique:
R> dt <- data.table(d)
R> dt[ , ct := as.POSIXct(paste(d, t), "%d-%b-%Y %H:%M:%S", tz=tzz), by=id][]
id tzz d t ct
1: 111 Europe/Berlin 09-Sep-2017 23:42:13 2017-09-09 17:42:13
2: 222 US/Eastern 11-Sep-2017 22:05:17 2017-09-11 22:05:17
R>
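As for why the vectorized call fails: the tz argument has to be a single string, and a POSIXct vector carries only one tzone attribute, which is also why both rows above print in the same zone. A base R sketch of the same row-by-row idea follows; it parses each row in its own zone, stores the instants in UTC, and keeps the local rendering as text. The helper parse_one and the columns utc and local are made-up names, and %b assumes an English or C locale.

```r
d <- data.frame(id  = c(111, 222),
                tzz = c("Europe/Berlin", "US/Eastern"),
                d   = c("09-Sep-2017", "11-Sep-2017"),
                t   = c("23:42:13", "22:05:17"),
                stringsAsFactors = FALSE)

# Hypothetical helper: parse one row in its own zone, return epoch seconds.
# %b (month abbreviation) is locale-dependent; English/C is assumed.
parse_one <- function(date, time, tz) {
  as.numeric(as.POSIXct(paste(date, time),
                        format = "%d-%b-%Y %H:%M:%S", tz = tz))
}
secs <- mapply(parse_one, d$d, d$t, d$tzz, USE.NAMES = FALSE)

# One column, one zone: keep the instants in UTC.
d$utc <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# Per-row local rendering has to be character, one zone per element.
d$local <- mapply(function(s, tz) {
  format(as.POSIXct(s, origin = "1970-01-01", tz = "UTC"),
         tz = tz, usetz = TRUE)
}, secs, d$tzz, USE.NAMES = FALSE)

format(d$utc, "%Y-%m-%d %H:%M:%S")
#> [1] "2017-09-09 21:42:13" "2017-09-12 02:05:17"
```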
I'd like to change the time zone of a POSIXct object in R, using the with_tz() function in the lubridate package.
This example I pulled from the web works for me:
meeting <- ymd_hms("2011-07-01 09:00:00", tz = "Pacific/Auckland")
with_tz(meeting, "America/Chicago")
But this one does not, using a snippet of some data:
atime <- as.POSIXct("2016-11-04 18:04:30",
format="%Y-%m-%d %H:%M:%S",
tz="PST")
atime_utc <- with_tz(atime, "UTC")
str() and tz() show that the new object has a time zone of "UTC", and is a POSIXct object, but the times are identical. There should be 8 hours between them after the time zone conversion.
Another solution using a different function would be fine, too.
The comments above should be well taken, but you can also try force_tz depending on your needs:
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
atime <- as.POSIXct("2016-11-04 18:04:30",
format="%Y-%m-%d %H:%M:%S",
tz="PST")
#> Warning in strptime(x, format, tz = tz): unknown timezone 'PST'
#> Warning in as.POSIXct.POSIXlt(as.POSIXlt(x, tz, ...), tz, ...): unknown
#> timezone 'PST'
tz(atime)
#> [1] "PST"
atime_utc <- with_tz(atime, "UTC")
force_tz(atime, "UTC")
#> [1] "2016-11-04 10:04:30 UTC"
Created on 2019-03-03 by the reprex package (v0.2.1)
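The warnings in that output point at the underlying problem: "PST" is not a recognized time zone name, so the time was never interpreted as Pacific time in the first place. Here is a base R sketch with a region name instead; changing the tzone attribute keeps the instant and only changes how it is displayed, which is the job with_tz() is meant to do. Note that the expected offset on that date is 7 hours, not 8, because daylight saving time was still in effect.

```r
# Use an Olson region name instead of the unrecognized "PST".
atime <- as.POSIXct("2016-11-04 18:04:30", tz = "America/Los_Angeles")

# Same instant, displayed in UTC.
atime_utc <- atime
attr(atime_utc, "tzone") <- "UTC"
format(atime_utc, "%Y-%m-%d %H:%M:%S")
#> [1] "2016-11-05 01:04:30"

# For a fixed UTC-8 offset regardless of daylight saving, the POSIX-style
# name is "Etc/GMT+8" (the sign is inverted relative to the usual notation).
fixed <- as.POSIXct("2016-11-04 18:04:30", tz = "Etc/GMT+8")
attr(fixed, "tzone") <- "UTC"
format(fixed, "%Y-%m-%d %H:%M:%S")
#> [1] "2016-11-05 02:04:30"
```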
Want to change the class for Time to POSIXlt and extract only the hours minutes and seconds
str(df3$Time)
chr [1:2075259] "17:24:00" "17:25:00" "17:26:00" "17:27:00" ...
Used the strptime function
df3$Time <- strptime(df3$Time, format = "%H:%M:%S")
This gives the date/time appended
> str(df3$Time)
POSIXlt[1:2075259], format: "2015-08-07 17:24:00" "2015-08-07 17:25:00" "2015-08-07 17:26:00" ...
I wanted to extract just the time without changing the POSIXlt class, using the strftime function:
df3$Time <- strftime(df3$Time, format = "%H:%M:%S")
but this converts the class back to "char" -
> class(df3$Time)
[1] "character"
How can I just extract the time with class set to POSIX or numeric...
If your data is
a <- "17:24:00"
b <- strptime(a, format = "%H:%M:%S")
you can use lubridate in order to have a result of class integer
library(lubridate)
hour(b)
minute(b)
# > hour(b)
# [1] 17
# > minute(b)
# [1] 24
# > class(minute(b))
# [1] "integer"
and you can combine them using
# character
paste(hour(b),minute(b), sep=":")
# numeric
hour(b) + minute(b)/60
for instance.
I would not advise doing that if you want to do any further operations on your data. However, it might be convenient if you want to plot the results.
A datetime object contains date and time; you cannot extract 'just time'. So you have to think through what you want:
POSIXlt is a Datetime representation (as a list of components)
POSIXct is a different Datetime representation (as a compact numeric)
Neither one omits the Date part. Once you have a valid object, you can choose to display only the time. But you cannot make the Date part disappear from the representation.
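A short base R sketch of the two representations, with a fixed timestamp in UTC chosen only for illustration. It shows that the date is present in both, and that showing only the time is a formatting step that yields a character string.

```r
x <- as.POSIXct("2021-04-10 17:24:00", tz = "UTC")

# POSIXct: a single number (seconds since 1970-01-01 UTC) plus a tzone attribute.
unclass(x)

# POSIXlt: a list of components; the date fields are always there.
lt <- as.POSIXlt(x)
c(lt$hour, lt$min, lt$sec)                  # 17 24 0
c(lt$year + 1900, lt$mon + 1, lt$mday)      # 2021 4 10

# Displaying only the time gives a character string, not a POSIX object.
time_only <- format(x, "%H:%M:%S")
time_only
#> [1] "17:24:00"
```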
A "modern" tidyverse answer to this is to use hms::as_hms()
For example
library(tidyverse)
library(hms)
as_hms(1)
#> 00:00:01
as_hms("12:34:56")
#> 12:34:56
or, with your example data:
x <- as.POSIXlt(c("17:24:00", "17:25:00", "17:26:00", "17:27:00"), format = "%H:%M:%S")
x
#>[1] "2021-04-10 17:24:00 EDT" "2021-04-10 17:25:00 EDT" "2021-04-10 17:26:00 EDT" "2021-04-10 17:27:00 EDT"
as_hms(x)
# 17:24:00
# 17:25:00
# 17:26:00
# 17:27:00
See also docs here:
https://hms.tidyverse.org/reference/hms.html
You can also use the chron package to extract just times of the day:
library(chron)
# current date/time in POSIXt format as an example
timenow <- Sys.time()
# create chron object "times"
onlytime <- times(strftime(timenow,"%H:%M:%S"))
> onlytime
[1] 14:18:00
> onlytime+1/24
[1] 15:18:00
> class(onlytime)
[1] "times"
This is my idiom for getting just the time part from a datetime object. I use floor_date() from lubridate to get midnight of the timestamp's day and take the difference between the timestamp and that midnight. I store the result in data frames as an hms object (from the hms package, not lubridate) because the class prints as hh:mm:ss, which is easy to read, while the underlying value is a numeric count of seconds. Here is my code:
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
# Create timestamps
#
# Get the time part by subtracting the floored date from the timestamp, make
# sure you convert to seconds, and then cast to a time object provided by the
# `hms` package.
# See: https://www.rdocumentation.org/packages/hms/versions/0.4.2/topics/hms
dt <- tibble(dt=c("2019-02-15T13:15:00", "2019-02-19T01:10:33") %>% ymd_hms()) %>%
mutate(timepart = hms::hms(as.numeric(dt - floor_date(dt, "1 day"), units = "secs")))
# Look at result
print(dt)
#> # A tibble: 2 x 2
#> dt timepart
#> <dttm> <time>
#> 1 2019-02-15 13:15:00 13:15
#> 2 2019-02-19 01:10:33 01:10
# `hms` object is really a `difftime` object from documentation, but is made into a `hms`
# object that defaults to always store data in seconds.
dt %>% pluck("timepart") %>% str()
#> 'hms' num [1:2] 13:15:00 01:10:33
#> - attr(*, "units")= chr "secs"
# Pull off just the timepart column
dt %>% pluck("timepart")
#> 13:15:00
#> 01:10:33
# Get numeric part. From documentation, `hms` object always stores in seconds.
dt %>% pluck("timepart") %>% as.numeric()
#> [1] 47700 4233
Created on 2019-02-15 by the reprex package (v0.2.1)
If you want it in POSIX format, the only way would be to leave it as it is, and extract just the "time" part every time you display it. But internally it will always be date + time anyway.
If you want it in numeric, however, you can simply convert it into a number.
For example, to get time as number of seconds passed since the beginning of the day:
df3$Time=df3$Time$sec + df3$Time$min*60 + df3$Time$hour*3600
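Here is the same calculation as a self-contained sketch on a few of the sample values, plus the reverse step back to an hh:mm:ss string. The vector times_chr stands in for df3$Time, and tz = "UTC" is set only to keep daylight saving out of the picture.

```r
times_chr <- c("17:24:00", "17:25:00", "17:26:00")   # stand-in for df3$Time

# strptime() returns POSIXlt, whose hour/min/sec components are accessible.
lt <- strptime(times_chr, format = "%H:%M:%S", tz = "UTC")
secs <- lt$sec + lt$min * 60 + lt$hour * 3600
secs
#> [1] 62640 62700 62760

# Going back from seconds to an hh:mm:ss string for display.
back <- sprintf("%02d:%02d:%02d",
                secs %/% 3600, (secs %% 3600) %/% 60, secs %% 60)
back
#> [1] "17:24:00" "17:25:00" "17:26:00"
```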