I have almost finished my script, but I have a problem with my date format.
I installed the lubridate package and used the as_date function, but it doesn't give me what I want (a date).
"time" is my variable; its description is below.
I am not posting my entire script, since the question is only about this format (and it involves a huge netCDF file that is impractical to share).
Could you help me, please?
class(time)
[1] "array"
head(time)
[1] 3573763200 3573774000 3573784800 3573795600 3573806400 3573817200
tunits
$long_name
[1] "time in seconds (UT)"
$standard_name
[1] "time"
$units
[1] "seconds since 1900-01-01T00:00:00Z"
$axis
[1] "T"
$time_origin
[1] "01-JAN-1900 00:00:00"
$conventions
[1] "relative number of seconds with no decimal part"
#conversion
date = as_date(time,tz="UTC",origin = "1900-01-01")
head(date)
[1] "-5877641-06-23" "-5877641-06-23" "-5877641-06-23" "-5877641-06-23"
[5] "-5877641-06-23" "-5877641-06-23"
Time is in seconds since 01/01/1900. Converting a value in time to an actual date would work as follows, using the seconds methods in lubridate:
lubridate::ymd("1900-01-01") + lubridate::seconds(3573763200)
You can vectorize it:
lubridate::ymd("1900-01-01") + lubridate::seconds(time)
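For comparison, the same conversion can be sketched in base R without lubridate. This assumes the values really are seconds since 1900-01-01 UTC, as the units attribute says; as.vector() is only there to drop the array class that class(time) reports.

```r
# First six values from the question: seconds since 1900-01-01 00:00:00 UTC
time <- array(c(3573763200, 3573774000, 3573784800,
                3573795600, 3573806400, 3573817200))

# as.POSIXct() interprets a number as seconds counted from `origin`
datetime <- as.POSIXct(as.vector(time), origin = "1900-01-01", tz = "UTC")

# Keep only the calendar date if the time of day is not needed
date <- as.Date(datetime)

format(datetime[1:2], "%Y-%m-%d %H:%M:%S")
# "2013-04-01 00:00:00" "2013-04-01 03:00:00"
```

The values are three hours apart, so all six fall on 2013-04-01.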
as_date() calculates the date using the number of days since the origin.
What you are looking for seems to be as_datetime() also from the lubridate package which calculates the date using the number of seconds since the origin. In your example this would be:
time <- c(3573763200,3573774000,3573784800,3573795600,3573806400,3573817200)
date <- as_datetime(time, tz = "UTC", origin = "1900-01-01") %>% date()
This uses a pipe (%>%, from magrittr, re-exported by dplyr) and lubridate's date() function to extract the date from the result of as_datetime().
date <- as_date(time/(24*60*60), tz = "UTC", origin = "1900-01-01")
date
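To see why dividing by 24*60*60 matters, here is the arithmetic for the first value as a small base R check. The original call failed because the raw number was read as a count of days rather than seconds; the variable names below are just for illustration.

```r
secs <- 3573763200             # first value of `time`, in seconds
days <- secs / (24 * 60 * 60)  # seconds -> days
days
# 41363

# A day count is what a Date origin expects
as.Date(days, origin = "1900-01-01")
# "2013-04-01"
```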
I have a vector a = 40208.64507.
In Excel, I can automatically change a to a datetime, 2010/1/30 15:28:54, by clicking the Date type.
I tried some methods, but I cannot get the same result in R as in Excel.
a = 40208.64507
# in Excel, a can change into: 2010/1/30 15:28:54
as.Date(a, origin = "1899-12-30")
lubridate::as_datetime(a, origin = "1899-12-30")
Is there any way to get the same results in R as in Excel?
Here are several ways. chron class is the closest to Excel in terms of internal representations -- they are the same except for origin -- and the simplest so we list that one first. We also show how to use chron as an intermediate step to get POSIXct.
Base R provides an approach which avoids package dependencies and lubridate might be used if you are already using it.
1) Add the appropriate origin using chron to get a chron datetime or convert that to POSIXct. Like Excel, chron works in days and fractions of a day, but chron uses the UNIX Epoch as origin whereas Excel uses the one shown below.
library(chron)
a <- 40208.64507
# chron date/time
ch <- as.chron("1899-12-30") + a; ch
## [1] (01/30/10 15:28:54)
# POSIXct date/time in local time zone
ct <- as.POSIXct(ch); ct
## [1] "2010-01-30 10:28:54 EST"
# POSIXct date/time in UTC
as.POSIXct(format(ct, tz = "UTC"), tz = "UTC")
## [1] "2010-01-30 15:28:54 UTC"
2) Using only base R convert the number to Date class using the indicated origin and then to POSIXct.
# POSIXct with local time zone
ct <- as.POSIXct(as.Date(a, origin = "1899-12-30")); ct
## [1] "2010-01-30 10:28:54 EST"
# POSIXct with UTC time zone
as.POSIXct(format(ct, tz = "UTC"), tz = "UTC")
## [1] "2010-01-30 15:28:54 UTC"
3) Using lubridate it is similar to base R so we can write
library(lubridate)
# local time zone
as_datetime(as_date(a, origin = "1899-12-30"), tz = "")
[1] "2010-01-30 15:28:54 EST"
# UTC time zone
as_datetime(as_date(a, origin = "1899-12-30"))
[1] "2010-01-30 15:28:54 UTC"
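If all you need is a UTC timestamp, a one-step base R variant is to turn the day count into seconds yourself. This is only a sketch: excel_to_posixct is a made-up helper name, and it assumes the workbook uses Excel's default 1900 date system (a workbook set to the 1904 date system would need a different origin).

```r
# Hypothetical helper: Excel serial number (days) -> POSIXct
excel_to_posixct <- function(serial, tz = "UTC") {
  # Excel serials count days; POSIXct counts seconds, hence * 86400.
  # `tz` only controls how the instant is displayed.
  as.POSIXct(serial * 86400, origin = "1899-12-30", tz = tz)
}

a <- 40208.64507
format(excel_to_posixct(a), "%Y-%m-%d %H:%M:%S")
# "2010-01-30 15:28:54"
```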
I am working with an external package that's converting columns of a dataframe with the lubridate date type Date into numeric type. (Confirmed by running as.numeric() on the columns).
I'm wondering if there's a way to convert it back?
For example, if I have the date "01-01-2021", then running as.numeric on it returns -719143. How can I turn that back into "01-01-2021"?
Note that Date class is part of base R, not lubridate.
You probably assumed that the data was year/month/day by mistake. Using base R to eliminate lubridate as a problem we can replicate the question's result like this:
as.numeric(as.Date("01-01-2021", "%Y-%m-%d"))
## [1] -719143
Had we used day/month/year we would have gotten:
as.numeric(as.Date("01-01-2021", "%d-%m-%Y"))
## [1] 18628
or using lubridate
library(lubridate)
as.numeric(dmy("01-01-2021"))
## [1] 18628
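It would be best to fix the mistake that resulted in -719143, but if you don't control that and are faced with an input of
-719143, note that the mis-parse read only the leading "20" of the year (as the day) and dropped the trailing "21", so the most that can be recovered from the number alone is a date in 2020, not as.Date("2021-01-01"):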
# input x is numeric; result is Date class
fixup <- function(x) as.Date(format(.Date(x), "%y-%m-%d"), "%d-%m-%y")
fixup(-719143)
## [1] "2020-01-01"
Note that we can't tell from the question whether 01-01-2021 is supposed to represent day-month-year or month-day-year, so we assumed the first; if it represents the second, it should be obvious at this point how to proceed.
EDIT #2: It looks like the original data is being parsed as Jan 20, year 1, which might happen if the year-month-day columns were jumbled while being parsed:
as.numeric(as.Date("01-01-2021", format = "%Y-%m-%d", origin = "1970-01-01"))
[1] -719143
as.numeric(as.Date("0001-01-20", origin = "1970-01-01"))
[1] -719143
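One way to check that reading, without depending on how a platform prints very early years, is to pull the individual fields out of the mis-parsed date. A small base R sketch (bad and lt are just illustrative names):

```r
# Parse day-month-year text with a year-month-day format, as suspected
bad <- as.Date("01-01-2021", format = "%Y-%m-%d")
as.numeric(bad)
# -719143

# POSIXlt stores years as an offset from 1900 and months as 0-11
lt <- as.POSIXlt(bad)
c(year = lt$year + 1900, month = lt$mon + 1, day = lt$mday)
# year 1, month 1, day 20
```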
Is there a way to share an example of the raw data as you have it? e.g. dput(MY_DATA[1:10, DATE_COL])
EDIT: -719143 days is about 1970 years, which can't be a coincidence, given that many date/time formats use 1970 as a baseline. I wonder if 01-01-2021 is being interpreted as the numeric formula equal to -2021, so that we're looking at perhaps -2021 seconds/days/[?] before year zero, which would be about 1970 years before the epoch...
-719143/(365)
[1] -1970.255
For instance, we can get something close with:
as.numeric(as.Date("0000-01-01", origin = "1970-01-01"))
[1] -719528
Original answer:
R treats a string describing a date as text:
x <- "01-01-2021"
class(x)
[1] "character"
We can convert it to a Date data type using these two equivalent commands:
base_dt <- as.Date(x, "%m-%d-%Y") # base R version
lubridt <- lubridate::mdy(x) # convenience lubridate function
identical(base_dt, lubridt)
[1] TRUE
Under the hood, a Date object in R is a numeric value with a flag telling R it's a date:
> typeof(lubridt) # What general type of data is it?
[1] "double" # --> numeric, stored as a double
> as.numeric(lubridt)
[1] 18628
> class(lubridt) # Does it have any special class attributes?
[1] "Date" # --> yes, it's a Date
> dput(lubridt) # How would we construct it from scratch?
structure(18628, class = "Date") # --> by giving 18628 a Date attribute
In R, a Date is encoded as the number of days since 1970 began:
> as.Date("1970-01-1") + as.numeric(lubridt)
[1] "2021-01-01"
We could convert it back to the original text using:
format(base_dt, "%m-%d-%Y")
[1] "01-01-2021"
identical(x, format(base_dt, "%m-%d-%Y"))
[1] TRUE
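Putting that together for the original question: if a package has turned a Date column into plain numbers, the numbers can be turned back as long as they are genuine day counts since 1970-01-01. A minimal sketch, with a made-up data frame df and column when:

```r
# Simulate a Date column that an external package converted to numeric
df <- data.frame(when = as.numeric(as.Date(c("2021-01-01", "2021-03-15"))))
df$when
# 18628 18701

# Restore the Date class by naming the origin the day counts refer to
df$when <- as.Date(df$when, origin = "1970-01-01")
format(df$when, "%m-%d-%Y")
# "01-01-2021" "03-15-2021"
```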
I am trying to remove the UTC from this date and just keep it in single quotes. This is the function I am using in R:
date.start = as.Date(Sys.Date())
But I am getting this result:
I guess date.start is Sys.time() therefore do:
date.start = as.Date(Sys.time())
Sys.Date()
Sys.time()
Sys.timezone()
as.Date(Sys.time())
Output:
> Sys.Date()
[1] "2021-08-17"
> Sys.time()
[1] "2021-08-17 09:14:33 CEST"
> Sys.timezone()
[1] "Europe/Berlin"
> as.Date(Sys.time())
[1] "2021-08-17"
I think that the timezone 'UTC' is being posited there by your system settings. I believe that generating the system date with lubridate might sidestep the issue within R:
date.start = lubridate::today(tzone = "")
Use sub:
sub(" UTC", "", date)
[1] "2021-08-17" "2020-12-12"
Test data:
date <- c("2021-08-17 UTC", "2020-12-12 UTC")
Try using different time formats when getting data.
format(Sys.time(),"%d-%m-%y")
For a better understanding, you can read the R-bloggers article on date formats in R here:
https://www.r-bloggers.com/2013/08/date-formats-in-r/
I'm not sure why you want to remove it; knowing that would help. Another answer showed you how to convert it to a string.
But you'll want it in date format to do something like seq(Sys.Date(), Sys.Date() + 24, by = 'day').
If the reason you want it in a particular time zone is to join data sets at midnight, you should use lubridate's force_tz, à la force_tz(Sys.Date(), 'America/Chicago'). Be careful here, because the UTC offset will change as needed due to daylight saving time. That's why it's usually better to stick with UTC anyway.
Otherwise, as the other poster mentioned, just convert to a string and format it, à la format(Sys.Date(),"%Y-%m-%d").
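To make the string-versus-Date distinction concrete, here is a small base R sketch. It uses a fixed date instead of Sys.Date() so the result is reproducible, and the variable names are made up.

```r
d <- as.Date("2021-08-17")

# format() returns plain text, so no time zone label is attached
s <- format(d, "%Y-%m-%d")
class(s)
# "character"

# Wrap in single quotes, e.g. for pasting into a query string
quoted <- paste0("'", s, "'")
quoted
# "'2021-08-17'"

# Date arithmetic still needs the Date object, not the string
days <- seq(d, d + 2, by = "day")
class(days)
# "Date"
```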
In the question "average time in a column in hr:min:sec format" the following example is given:
Col_Time = c('03:08:20','03:11:30','03:22:18','03:27:39')
library(chron)
mean(times(Col_Time))
[1] 03:17:27
How can I get hr:min:sec as result for the standard deviation? If I use the R function sd, the result looks like that:
sd(times(Col_Time))
[1] 0.006289466
sd is operating on the number internally representing the time (days for chron::times, seconds for hms and POSIXct, settable for difftime), which is fine. The only problem is that it is dropping the class from the result so it isn't printed nicely. The solution, then, is just to convert back to the time class afterwards:
x <- c('03:08:20','03:11:30','03:22:18','03:27:39')
chron::times(sd(chron::times(x)))
#> [1] 00:09:03
hms::as_hms(sd(hms::as_hms(x)))
#> 00:09:03.409836
as.POSIXct(sd(as.POSIXct(x, format = '%H:%M:%S')),
tz = 'UTC', origin = '1970-01-01')
#> [1] "1970-01-01 00:09:03 UTC"
as.difftime(sd(as.difftime(x, units = 'secs')),
units = 'secs')
#> Time difference of 543.4098 secs
You can use lubridate package. The hms function will convert time from characters to HMS format. Then use seconds to convert to seconds and calculate mean/sd. Finally, use seconds_to_period to get the result in HMS format.
library(lubridate)
Col_Time = c('03:08:20','03:11:30','03:22:18','03:27:39')
#Get the mean
seconds_to_period(mean(seconds(hms(Col_Time))))
# [1] "3H 17M 26.75S"
#Get the sd
seconds_to_period(sd(seconds(hms(Col_Time))))
#[1] "9M 3.40983612739285S"
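If you would rather not depend on any package, the same idea can be sketched in base R: convert each time to seconds by hand, take the standard deviation, and format the result back. Rounding to whole seconds is a choice made here for display, not something the answers above require.

```r
x <- c("03:08:20", "03:11:30", "03:22:18", "03:27:39")

# Split "HH:MM:SS" into a numeric matrix with one row per time
parts <- do.call(rbind, lapply(strsplit(x, ":", fixed = TRUE), as.numeric))

# Weight hours, minutes and seconds to get total seconds
secs <- as.vector(parts %*% c(3600, 60, 1))

# Standard deviation in seconds, rounded for display
s <- as.integer(round(sd(secs)))

sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
# "00:09:03"
```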
The date in my dataset looks like this: 20130501000000, and I'm trying to convert it to a better datetime format in R:
data1$date <- as.Date(data1$date, format = "%Y-%m-%s-%h-%m-%s")
However, I get an error about needing an origin. After I supplied the very first cell under date as the origin, every cell under date was converted to NA. Is this right, or should I try as.POSIXct()?
That is a somewhat degenerate format, but the anytime() and anydate() functions of the anytime package can help you, without requiring any explicit format strings:
R> anytime("20130501000000") ## returns POSIXct
[1] "2013-05-01 CDT"
R> anydate("20130501000000") ## returns Date
[1] "2013-05-01"
R>
Note that we parse from the character representation here -- parsing from numeric would be wrong, as we use a conflicting heuristic to make sense of dates stored as numeric values.
So here your code would just become
data1$date <- anytime::anydate(data1$date)
provided data1$date is character; otherwise wrap as.character() around it.
Lastly, if you actually want Datetime rather than Date (as per your title), don't use anydate() but anytime().
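One caveat if the column is numeric: for a number this large, as.character() may switch to scientific notation depending on the value and R version (treat that as something to check on your own setup). A sketch that sidesteps the question by formatting the digits explicitly and then parsing with base R; raw, txt and stamps are made-up names:

```r
# Numeric timestamps in YYYYMMDDHHMMSS layout
raw <- c(20130501000000, 20130501035001)

# "%.0f" writes every digit with no decimals and no exponent
txt <- sprintf("%.0f", raw)
txt
# "20130501000000" "20130501035001"

stamps <- as.POSIXct(txt, format = "%Y%m%d%H%M%S", tz = "UTC")
format(stamps, "%Y-%m-%d %H:%M:%S")
# "2013-05-01 00:00:00" "2013-05-01 03:50:01"
```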
Before I write my answer, I would like to say that the format argument should describe the format your string is in. Therefore, if you have "20130501000000", you have to use the following (there is no - between the components of the date in your string):
as.Date("20130501000000", format = "%Y%m%d%H%M%S")
# [1] "2013-05-01"
which works just fine, does not produce any error, and will return an object of class Date:
as.Date("20130501000000", format = "%Y%m%d%H%M%S") |> class()
# [1] "Date"
Therefore, I think your issue is with the format rather than with the origin of the date.
Now to my detailed answer:
As far as I know, as.Date() converts the string to a Date, so if you want the time part of the string as well, you have to use as.POSIXct():
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S")
# [1] "2013-05-01 EEST"
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S") |> class()
# [1] "POSIXct" "POSIXt"
Note that the time zone is EEST, which is my local time zone; if you want a different time zone, you have to specify it. For example, to set the time zone to UTC:
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S", tz = "UTC")
# [1] "2013-05-01 UTC"
With as.POSIXct() you can do arithmetic on the result:
times <- c("20130501000000",
"20130501035001") # added 03:50:01 to the first element
class(times)
# [1] "character"
times <- as.POSIXct(times, format = "%Y%m%d%H%M%S", tz = "UTC")
class(times)
# [1] "POSIXct" "POSIXt"
times[2] - times[1]
# Time difference of 3.833611 hours
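Subtraction picks the unit for you (hours here). If you need a specific unit, difftime() lets you ask for it; a short sketch using the same two timestamps, with t1, t2 and elapsed as illustrative names:

```r
t1 <- as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S", tz = "UTC")
t2 <- as.POSIXct("20130501035001", format = "%Y%m%d%H%M%S", tz = "UTC")

# Request seconds explicitly instead of the automatically chosen unit
elapsed <- difftime(t2, t1, units = "secs")
as.numeric(elapsed)
# 13801

# The same difference expressed in minutes, as a plain number
as.numeric(elapsed, units = "mins")
```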