as.Date function not working - r

I have a huge data set I am working with. Some of the dates are in the format 01/01/2010 and others are 1/1/2010.
When I run as.Date(Dates, format="%y/%d/%m") all of the latter dates change the year to 2020. What is going on here?

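Your format string is not correct: %Y (capital) is the four-digit year, %y is the two-digit year, and the codes must appear in the same order as in the string (here day/month/year). Try this: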
d1 <- "01/01/2010"
d2 <- "1/1/2010"
as.Date(d1, format='%d/%m/%Y')
#[1] "2010-01-01"
as.Date(d2, format='%d/%m/%Y')
#[1] "2010-01-01"
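Keeping the answer's day/month order, a small sketch of how a two-digit year code can turn 2010 into 2020: %y reads at most two digits, so it consumes only the "20" of "2010", and as.Date() silently ignores the trailing characters. (The asker's exact %y/%d/%m string may fail in a different way; this only illustrates %y versus %Y.)

```r
x <- "01/01/2010"
# %y is the two-digit year: it reads "20" and the leftover "10" is ignored
wrong <- as.Date(x, format = "%d/%m/%y")
# %Y is the four-digit year
right <- as.Date(x, format = "%d/%m/%Y")
wrong
#[1] "2020-01-01"
right
#[1] "2010-01-01"
```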
When the year is written with either two or four digits, the lubridate package can parse both:
library(lubridate)
d1 <- "1/1/10"
d2 <- "01/01/2010"
parse_date_time(d1, "dmy")
#[1] "2010-01-01 UTC"
parse_date_time(d2, "dmy")
#[1] "2010-01-01 UTC"
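Without lubridate, one base-R sketch is to pick %Y or %y per element by checking whether the string ends in a four-digit year. A fallback-on-NA approach does not work here, because a plain %Y would read "10" as the year 10 rather than fail. It assumes day/month/year order throughout, and parse_dmy_mixed() is a hypothetical name:

```r
# Hypothetical helper: day/month/year strings whose year has 2 or 4 digits
parse_dmy_mixed <- function(x) {
  out <- as.Date(rep(NA_character_, length(x)))
  # A four-digit year at the end of the string selects %Y, otherwise %y
  four_digit <- grepl("/[0-9]{4}$", x)
  out[four_digit] <- as.Date(x[four_digit], format = "%d/%m/%Y")
  out[!four_digit] <- as.Date(x[!four_digit], format = "%d/%m/%y")
  out
}

parse_dmy_mixed(c("01/01/2010", "1/1/2010", "1/1/10"))
#[1] "2010-01-01" "2010-01-01" "2010-01-01"
```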

Related

R - Prevent aggregate function from converting date time timezones to local time?

Is there a way to stop aggregate converting datetimes to the computer's local timezone? For example:
dtUTC <- as.POSIXct(c('2010-01-01 01:01:01', '2015-01-02 07:23:11',
'2016-06-02 05:23:41', '2018-01-08 17:57:43'), tz='UTC')
groups <- c(1,1,2,2)
result <- aggregate(dtUTC, by=list(groups), FUN=min)
The result is converted to my computer's local timezone.
> dtUTC
[1] "2010-01-01 01:01:01 UTC" "2015-01-02 07:23:11 UTC" "2016-06-02 05:23:41 UTC"
[4] "2018-01-08 17:57:43 UTC"
> result$x
[1] "2010-01-01 12:01:01 AEDT" "2016-06-02 15:23:41 AEST"
I can convert it back post hoc, but this is an annoying extra step, especially if I have multiple datetime columns.
attr(result$x, 'tzone') <- 'UTC'
> result$x
[1] "2010-01-01 01:01:01 UTC" "2016-06-02 05:23:41 UTC"
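If you keep the post hoc fix, it can at least be applied to every POSIXct column in one call. A minimal sketch; restore_tz() is a hypothetical helper name, and it assumes all datetime columns should end up in the same zone:

```r
# Hypothetical helper: reset the display time zone of every POSIXct column
restore_tz <- function(df, tz = "UTC") {
  for (nm in names(df)) {
    if (inherits(df[[nm]], "POSIXct")) attr(df[[nm]], "tzone") <- tz
  }
  df
}

dtUTC <- as.POSIXct(c('2010-01-01 01:01:01', '2015-01-02 07:23:11',
                      '2016-06-02 05:23:41', '2018-01-08 17:57:43'), tz = 'UTC')
groups <- c(1, 1, 2, 2)
result <- restore_tz(aggregate(dtUTC, by = list(groups), FUN = min))
result$x
#[1] "2010-01-01 01:01:01 UTC" "2016-06-02 05:23:41 UTC"
```

The tzone attribute only controls how the stored instant is displayed, so resetting it never shifts the underlying times.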
I can't find anything you can do with aggregate to change this behavior, but you can set the session's TZ environment variable so that date-times are created and displayed in UTC (note that this affects every date-time in the session):
Sys.setenv(TZ='UTC') # <- set your TZ here
dtUTC <- as.POSIXct(c('2010-01-01 01:01:01', '2015-01-02 07:23:11',
'2016-06-02 05:23:41', '2018-01-08 17:57:43'))
groups <- c(1,1,2,2)
df <- data.frame(dtUTC, groups)
result <- aggregate(dtUTC ~ groups, df, min)
result$dtUTC
# [1] "2010-01-01 01:01:01 UTC" "2016-06-02 05:23:41 UTC"
You can use the dplyr package to aggregate instead:
library(dplyr)
dtUTC <- as.POSIXct(c('2010-01-01 01:01:01', '2015-01-02 07:23:11',
'2016-06-02 05:23:41', '2018-01-08 17:57:43'), tz='UTC')
groups <- c(1,1,2,2)
b <- data.frame(date = dtUTC, group = groups) %>%
group_by(group) %>%
dplyr::summarise(min = min(date))
b$min
# [1] "2010-01-01 01:01:01 UTC" "2016-06-02 05:23:41 UTC"

Inconsistent dates formatting (lubridate fail)

I have a vector of inconsistent dates, including (mainly) these three formats:
"%d/%m/%y", "%m/%d/%y" and "%d/%m/%Y"
I tried to implement this:
df <- as.data.frame(c("30/12/00","7/31/09","17/09/2008"),col.names = "original_date")
guess_date <- function(x){
require(lubridate)
guess <- guess_formats(x, c("mdy","dmy"))
date <- as.Date(x, guess)[1]
return(date)
}
df$date <- lapply(df$original_date, guess_date)
We can pass the guessed formats to parse_date_time():
library(lubridate)
parse_date_time(df$original_date,
guess_formats(as.character(df$original_date), c("mdy", "dmy", "dmY")))
#[1] "2000-12-30 UTC" "2009-07-31 UTC" "2008-09-17 UTC"
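A base-R alternative, sketched under the assumption that only the three formats from the question occur: four-digit years go straight to %d/%m/%Y, and two-digit years try day-first and then month-first, keeping the first format that yields a valid date. Strings valid in both orders (e.g. "05/06/09") are therefore read day-first; parse_mixed_dates() is a hypothetical name:

```r
# Hypothetical base-R helper: try the candidate formats in a fixed order
parse_mixed_dates <- function(x) {
  out <- as.Date(rep(NA_character_, length(x)))
  # Four-digit years: only %d/%m/%Y applies (%y would misread "2008" as 2020)
  four_digit <- grepl("/[0-9]{4}$", x)
  out[four_digit] <- as.Date(x[four_digit], format = "%d/%m/%Y")
  # Two-digit years: day-first, then month-first for strings still NA
  for (fmt in c("%d/%m/%y", "%m/%d/%y")) {
    todo <- !four_digit & is.na(out)
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

parse_mixed_dates(c("30/12/00", "7/31/09", "17/09/2008"))
#[1] "2000-12-30" "2009-07-31" "2008-09-17"
```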

Convert Date Format from character "11JAN2016:00:00:00.000"

I am looking to convert this character vector to a date format
I have tried various methods though have not been successful so far. Any assistance with this is greatly appreciated!
See ?strptime for format details
as.Date(x, "%d%b%Y:%H:%M:%S")
#[1] "2016-01-11"
Or if you want it in date-time format
as.POSIXct(x, format = "%d%b%Y:%H:%M:%S")
#[1] "2016-01-11 GMT"
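%b matches month abbreviations of the current locale, so in a non-English locale JAN may fail to parse. A sketch that temporarily switches LC_TIME to "C" and keeps the fractional seconds through %OS; the function name is hypothetical and UTC is an assumed default time zone:

```r
# Hypothetical helper for "11JAN2016:00:00:00.000"-style strings
parse_dmy_colon_time <- function(x, tz = "UTC") {
  # %b uses the month names of LC_TIME, so parse in the "C" locale
  old <- Sys.getlocale("LC_TIME")
  Sys.setlocale("LC_TIME", "C")
  on.exit(Sys.setlocale("LC_TIME", old))
  # %OS reads the seconds including the fractional part (".000")
  as.POSIXct(x, format = "%d%b%Y:%H:%M:%OS", tz = tz)
}

res <- parse_dmy_colon_time("11JAN2016:00:00:00.000")
as.Date(res)
#[1] "2016-01-11"
```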
With lubridate
library(lubridate)
dmy_hms(x)
#[1] "2016-01-11 UTC"
and if you want only the date:
as_date(dmy_hms(x))
#[1] "2016-01-11"
data
x <- "11JAN2016:00:00:00.000"
One option is anydate() from the anytime package:
library(anytime)
anydate(data)
#[1] "2016-01-11"
data
data <- "11JAN2016:00:00:00.000"

Convert Date with special format using R

I have several variables that exist in the following format:
/Date(1353020400000+0100)/
I want to convert this format to ddmmyyyy. I found this solution for the same problem using php, but I don't know anything about php, so I'm unable to convert that solution to what I need, which is a solution that I can use in R.
Any suggestions?
Thanks.
If the format is milliseconds since the epoch, then anytime() or as.POSIXct() can help you:
R> anytime(1353020400000/1000)
[1] "2012-11-15 17:00:00 CST"
R> anytime(1353020400.000)
[1] "2012-11-15 17:00:00 CST"
anytime() converts to local time, which is Chicago for me. You would have to deal with the UTC offset separately.
Base R can do it too, but you need the dreaded origin:
R> as.POSIXct(1353020400.000, origin="1970-01-01")
[1] "2012-11-15 17:00:00 CST"
As far as I can tell from the linked question, this is milliseconds since the epoch:
x <- "/Date(1353020400000+0100)/"
spl <- strsplit(x, "[()+]")
as.POSIXct(as.numeric(sapply(spl,`[[`,2)) / 1000, origin="1970-01-01", tz="UTC")
#[1] "2012-11-15 23:00:00 UTC"
If you want to pick up the timezone difference as well, here's an attempt:
x <- "/Date(1353020400000+0100)/"
spl <- strsplit(x, "(?=[+-])|[()]", perl=TRUE)
tzo <- sapply(spl, function(x) paste(x[3:4],collapse="") )
dt <- as.POSIXct(as.numeric(sapply(spl,`[[`,2)) / 1000, origin="1970-01-01", tz="UTC")
as.POSIXct(paste(format(dt), tzo), tz="UTC", format = '%F %T %z')
#[1] "2012-11-15 22:00:00 UTC"
The package lubridate can come to the rescue as follows:
as.Date("1970-01-01") + lubridate::milliseconds(1353020400000)
That is, the number is read as milliseconds since the epoch (1 January 1970, 00:00 UTC).
A parsing function can now be made using regular expressions:
parse.myDate <- function(text) {
num <- as.numeric(stringr::str_extract(text, "(?<=/Date\\()\\d+"))
as.Date("1970-01-01") + lubridate::milliseconds(num)
}
Finally, format the parsed value, e.g. for the x defined above:
format(parse.myDate(x), "%d/%m/%Y %H:%M")
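If you also need the time zone information, you can use this instead (it handles one string at a time and whole-hour offsets only, because as.POSIXct() takes a single tz):
parse.myDate <- function(text) {
parts <- stringr::str_match(text, "^/Date\\((\\d+)([+-])(\\d{4})\\)/$")
# Etc/GMT zone names use inverted signs: UTC+01:00 is "Etc/GMT-1"
sign <- ifelse(parts[, 3] == "+", "-", "+")
hours <- as.integer(substr(parts[, 4], 1, 2))
as.POSIXct(as.numeric(parts[,2])/1000, origin = "1970-01-01", tz = paste0("Etc/GMT", sign, hours))
}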
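The answers above disagree on what the +0100 means. If the strings come from .NET's JSON date serialization (an assumption; the question does not say), Microsoft documents the number as milliseconds since the epoch in UTC and the suffix as the local offset, so the offset changes the wall-clock reading rather than the instant. A base-R sketch under that assumption; parse_json_date() is a hypothetical name:

```r
# Hypothetical helper for strings like "/Date(1353020400000+0100)/"
parse_json_date <- function(x) {
  # Milliseconds since 1970-01-01 00:00, assumed to be UTC-based
  ms <- as.numeric(sub("^/Date\\((-?[0-9]+).*$", "\\1", x))
  # Optional "+hhmm" / "-hhmm" suffix; "" when absent
  off <- sub("^/Date\\(-?[0-9]+([+-][0-9]{4})?\\)/$", "\\1", x)
  utc <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
  sign <- ifelse(substr(off, 1, 1) == "-", -1, 1)
  hh <- as.numeric(substr(off, 2, 3))
  mm <- as.numeric(substr(off, 4, 5))
  off_sec <- ifelse(nchar(off) == 5, sign * (hh * 3600 + mm * 60), 0)
  # utc is the instant; local_clock is the ddmmyyyy wall-clock reading
  list(utc = utc, local_clock = format(utc + off_sec, "%d%m%Y %H:%M"))
}

r <- parse_json_date("/Date(1353020400000+0100)/")
r$utc
#[1] "2012-11-15 23:00:00 UTC"
r$local_clock
#[1] "16112012 00:00"
```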

First day of the month from a POSIXct date time using lubridate

Given a POSIXct date time, how do you extract the first day of the month for aggregation?
library(lubridate)
full.date <- ymd_hms("2013-01-01 00:00:21")
lubridate has a function called floor_date which rounds date-times down. Calling it with unit = "month" does exactly what you want:
library(lubridate)
full.date <- ymd_hms("2013-01-01 00:00:21")
floor_date(full.date, "month")
[1] "2013-01-01 UTC"
I don't see a reason to use lubridate:
full.date <- as.POSIXct("2013-01-11 00:00:21", tz="GMT")
monthStart <- function(x) {
x <- as.POSIXlt(x)
x$mday <- 1
as.Date(x)
}
monthStart(full.date)
#[1] "2013-01-01"
first.of.month <- ymd(format(full.date, "%Y-%m-01"))
first.of.month
[1] "2013-01-01 UTC"
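I have another solution:
first.of.month <- as.Date(full.date) - mday(full.date) + 1
It needs mday() from lubridate or from data.table (handy if you aggregate with data.table). Convert to Date first: subtracting a number from a POSIXct subtracts seconds, not days.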
You can simply use base R's trunc:
d <- as.POSIXct("2013-01-11 00:00:21", tz="UTC")
trunc(d, "month")
#[1] "2013-01-01 UTC"
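For the aggregation itself, a base-R sketch that keys each row by the first day of its month. The data and column names are made up for illustration, and format() works in the time zone stored on the POSIXct (UTC here):

```r
# Toy data, made up for illustration: three readings across two months
stamps <- as.POSIXct(c("2013-01-11 00:00:21", "2013-01-30 12:00:00",
                       "2013-02-03 08:15:00"), tz = "UTC")
value <- c(1, 2, 5)
# First day of each timestamp's month, as a Date
month_start <- as.Date(format(stamps, "%Y-%m-01"))
# Sum the values per month
monthly <- aggregate(value ~ month_start,
                     data = data.frame(value, month_start), FUN = sum)
monthly
```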
