I have a vector of inconsistent dates, including (mainly) these three formats:
"%d/%m/%y", "%m/%d/%y" and "%d/%m/%Y"
I tried to implement this:
df <- data.frame(original_date = c("30/12/00","7/31/09","17/09/2008"))
guess_date <- function(x){
require(lubridate)
guess <- guess_formats(x, c("mdy","dmy"))
date <- as.Date(x, guess)[1]
return(date)
}
df$date <- lapply(df$original_date, guess_date)
We can pass the formats found by guess_formats() to parse_date_time():
library(lubridate)
parse_date_time(df$original_date,
guess_formats(as.character(df$original_date), c("mdy", "dmy", "dmY")))
#[1] "2000-12-30 UTC" "2009-07-31 UTC" "2008-09-17 UTC"
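If lubridate is not available, a base-R sketch of the same idea is to try each candidate format in turn and keep the first one that parses. The helper name parse_mixed_dates is hypothetical. It assumes, as in the question, that four-digit years only appear as day/month/year, and that ambiguous two-digit-year strings such as "01/02/03" should be read day-first. Reorder the formats if your data says otherwise.

```r
# Hypothetical helper: try candidate formats in order, keeping the first success.
# Assumption: four-digit years are always "%d/%m/%Y"; two-digit years are tried
# day-first, then month-first, so ambiguous strings resolve day-first.
parse_mixed_dates <- function(x) {
  x <- as.character(x)
  out <- structure(rep(NA_real_, length(x)), class = "Date")
  four_digit <- grepl("/[0-9]{4}$", x)
  # %y would silently read only the first two digits of "2008", so handle %Y separately
  out[four_digit] <- as.Date(x[four_digit], format = "%d/%m/%Y")
  for (fmt in c("%d/%m/%y", "%m/%d/%y")) {
    todo <- !four_digit & is.na(out)
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

df <- data.frame(original_date = c("30/12/00", "7/31/09", "17/09/2008"))
df$date <- parse_mixed_dates(df$original_date)
df$date
#[1] "2000-12-30" "2009-07-31" "2008-09-17"
```

Unlike the lapply() attempt in the question, this returns a single Date vector, so it can be stored directly as a data frame column.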
I have a variable for the date of medical admission. However, it is not properly formatted. It is a factor formatted as "DDMMYYYY HHMM", like "01012016 1215", which should mean "01-01-2016 12:15". How can I reformat it and get the weekday?
You can use lubridate to parse the date, then weekdays from base R to get the day of week as a character.
library(lubridate)
d <- dmy_hm("01012016 1215")
weekdays(d)
Use as.POSIXct/strptime to convert to date time and then use weekdays.
df$date <- as.POSIXct(df$date, format = '%d%m%Y %H%M', tz = 'UTC')
df$weekday <- weekdays(df$date)
For example,
string <- '01012016 1215'
date <- as.POSIXct(string, format = '%d%m%Y %H%M', tz = 'UTC')
date
#[1] "2016-01-01 12:15:00 UTC"
weekdays(date)
#[1] "Friday"
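Because the question's column is a factor, here is a sketch on a small made-up data frame (adm and its column names are hypothetical) that converts it via as.character() first. weekdays() returns names in the current locale; format(x, "%u") gives a locale-independent weekday number (1 = Monday, 7 = Sunday), which can be easier to group on.

```r
# Hypothetical admissions data; the time column is a factor, as in the question
adm <- data.frame(admission = factor(c("01012016 1215", "02012016 0930")))

# Convert factor -> character -> POSIXct using the DDMMYYYY HHMM layout
adm$admission_time <- as.POSIXct(as.character(adm$admission),
                                 format = "%d%m%Y %H%M", tz = "UTC")

adm$weekday <- weekdays(adm$admission_time)                      # locale-dependent names
adm$weekday_num <- as.integer(format(adm$admission_time, "%u"))  # 1 = Monday ... 7 = Sunday
adm
```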
I have a time in Excel that, when imported into R, comes in as a character and looks something like 0.59658.
I am trying to convert it to POSIXct, but the result is NA.
teste <- as.POSIXct(test, format = "%H:%M")
I've also tried teste <- as.POSIXct(test, format = "%H:%M:%S")
For other columns it works fine, but not for this one.
UPDATE:
I found a solution, but a second problem comes up with the rest of what I need:
teste <- as.POSIXct(teste*24*60*60,"%H%M", origin="1970-01-01")
teste <- format(as.POSIXct(teste, format = "%Y-%m-%d %H:%M:%S"), format="%H:%M")
Now I want to paste it onto a date vector (POSIXct, with values like 2013-01-06) using this command:
teste<-as.POSIXct(paste(date, teste), format="%Y-%m-%d %H:%M:%S")
And the NAs are back.
I'm not sure exactly what you want, but does this function do it?
df <- data.frame(number = c(0.59658, 0.59658, 0.59658, 0.59658, 0.59658), dates = c("2013-01-06", "2013-01-06", "2013-01-06", "2013-01-07", "2013-01-07"))
testing <- function(number, dates){
# origin and tz are passed by name: the second positional argument of as.POSIXct() is tz
teste <- as.POSIXct(number*24*60*60, origin="1970-01-01", tz="UTC")
teste <- format(teste, format="%H:%M")
return(as.POSIXct(paste0(dates," ",teste)))
}
Calling testing(df$number, df$dates) gives:
"2013-01-06 14:19:00 EST" "2013-01-06 14:19:00 EST" "2013-01-06 14:19:00 EST" "2013-01-07 14:19:00 EST" "2013-01-07 14:19:00 EST"
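The NAs in the question come from a format mismatch: format(..., "%H:%M") produces a time with no seconds, but the final as.POSIXct() call asks for "%H:%M:%S", so strptime() cannot match and returns NA. A minimal base-R sketch using the question's values (treating the Excel fraction as UTC is an assumption):

```r
frac <- 0.59658  # Excel time: fraction of a 24-hour day

# Seconds since midnight -> "HH:MM" string (origin and tz passed by name)
hm <- format(as.POSIXct(frac * 24 * 60 * 60, origin = "1970-01-01", tz = "UTC"), "%H:%M")
hm
#[1] "14:19"

# Format asks for seconds that the string does not have -> NA
bad <- as.POSIXct(paste("2013-01-06", hm), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Format matching the string -> a valid date-time
good <- as.POSIXct(paste("2013-01-06", hm), format = "%Y-%m-%d %H:%M", tz = "UTC")

# Alternative without the string round-trip: add the rounded seconds to midnight
alt <- as.POSIXct("2013-01-06", tz = "UTC") + round(frac * 24 * 60 * 60)
format(alt, "%H:%M:%S")
#[1] "14:19:05"
```

The alternative keeps the seconds (14:19:04.5 rounds to 14:19:05) instead of truncating to whole minutes as the string version does.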
I have several variables that exist in the following format:
/Date(1353020400000+0100)/
I want to convert this format to ddmmyyyy. I found a solution to the same problem in PHP, but I don't know PHP, so I can't translate it into what I need: a solution I can use in R.
Any suggestions?
Thanks.
If the format is milliseconds since the epoch then anytime() or as.POSIXct() can help you:
R> library(anytime)
R> anytime(1353020400000/1000)
[1] "2012-11-15 17:00:00 CST"
R> anytime(1353020400.000)
[1] "2012-11-15 17:00:00 CST"
R>
anytime() converts to local time, which is Chicago for me. You would have to deal with the UTC offset separately.
Base R can do it too, but you need the dreaded origin:
R> as.POSIXct(1353020400.000, origin="1970-01-01")
[1] "2012-11-15 17:00:00 CST"
R>
As far as I can tell from the linked question, this is milliseconds since the epoch:
x <- "/Date(1353020400000+0100)/"
spl <- strsplit(x, "[()+]")
as.POSIXct(as.numeric(sapply(spl,`[[`,2)) / 1000, origin="1970-01-01", tz="UTC")
#[1] "2012-11-15 23:00:00 UTC"
If you want to pick up the timezone difference as well, here's an attempt:
x <- "/Date(1353020400000+0100)/"
spl <- strsplit(x, "(?=[+-])|[()]", perl=TRUE)
tzo <- sapply(spl, function(x) paste(x[3:4],collapse="") )
dt <- as.POSIXct(as.numeric(sapply(spl,`[[`,2)) / 1000, origin="1970-01-01", tz="UTC")
as.POSIXct(paste(format(dt), tzo), tz="UTC", format = '%F %T %z')
#[1] "2012-11-15 22:00:00 UTC"
The package lubridate can come to the rescue as follows:
as.Date("1970-01-01") + lubridate::milliseconds(1353020400000)
That is, the number of milliseconds since the epoch (1 January 1970, UTC).
A parsing function can now be made using regular expressions:
parse.myDate <- function(text) {
num <- as.numeric(stringr::str_extract(text, "(?<=/Date\\()\\d+"))
as.Date("1970-01-01") + lubridate::milliseconds(num)
}
Finally, format the result (theDate, the value returned by parse.myDate()) with
format(theDate, "%d/%m/%Y %H:%M")
If you also need the time zone information, you can use this instead:
parse.myDate <- function(text) {
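parts <- stringr::str_match(text, "^/Date\\((\\d+)([+-])(\\d{2})(\\d{2})\\)/$")
# "Etc/GMT" zone names use inverted (POSIX) signs: UTC+01:00 is "Etc/GMT-1".
# Minutes of the offset are ignored, and tz must be a single string,
# so this handles one value at a time.
sign <- ifelse(parts[,3] == "+", "-", "+")
as.POSIXct(as.numeric(parts[,2])/1000, origin = "1970-01-01", tz = paste0("Etc/GMT", sign, as.integer(parts[,4])))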
}
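The question asks for ddmmyyyy, and the answers above disagree on whether the +0100 suffix should shift the result. Here is a base-R sketch (the helper name ms_date_to_ddmmyyyy is hypothetical). It assumes the millisecond count is UTC and that the suffix records the local offset at which the date should be read, so it shifts the UTC instant by that offset and formats the resulting wall-clock date. Drop the shift if you want the UTC date instead.

```r
# Hypothetical helper. Assumes: digits = milliseconds since 1970-01-01 UTC,
# optional "+HHMM"/"-HHMM" suffix = offset of the local time to report.
ms_date_to_ddmmyyyy <- function(x) {
  ms  <- as.numeric(sub("^/Date\\(([0-9]+).*$", "\\1", x))
  utc <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")

  # Pull out the offset, defaulting to "+0000" when there is none
  has_off <- grepl("^/Date\\([0-9]+[+-][0-9]{4}\\)/$", x)
  off <- ifelse(has_off, sub("^/Date\\([0-9]+([+-][0-9]{4})\\)/$", "\\1", x), "+0000")
  sgn  <- ifelse(substr(off, 1, 1) == "-", -1, 1)
  secs <- sgn * (as.integer(substr(off, 2, 3)) * 3600 + as.integer(substr(off, 4, 5)) * 60)

  # Shift to local wall-clock time, then format that clock reading as ddmmyyyy
  format(utc + secs, "%d%m%Y", tz = "UTC")
}

ms_date_to_ddmmyyyy(c("/Date(1353020400000+0100)/", "/Date(1353020400000-0500)/", "/Date(0)/"))
#[1] "16112012" "15112012" "01011970"
```

The same instant (23:00 UTC on 15 November 2012) falls on 16 November at +0100 but on 15 November at -0500, which is why the offset matters for a date-only output.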
I have a huge data set I am working with. Some of the dates are in the format 01/01/2010 and others are 1/1/2010.
When I run as.Date(Dates, format="%y/%d/%m") all of the latter dates change the year to 2020. What is going on here?
Your format string is not correct: use %Y for a four-digit year, and list day, month and year in the order they appear in the strings. Try this:
d1 <- "01/01/2010"
d2 <- "1/1/2010"
as.Date(d1, format='%d/%m/%Y')
#[1] "2010-01-01"
as.Date(d2, format='%d/%m/%Y')
#[1] "2010-01-01"
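To see where the 2020 comes from: %y is a two-digit year, and strptime() reads only as much of each string as the format needs, ignoring any trailing characters. Assuming the format actually run was month/day/year with %y, "1/1/2010" is read with year "20" and the trailing "10" is dropped:

```r
x <- c("01/01/2010", "1/1/2010")

# %y consumes only "20" of "2010"; the trailing "10" is ignored
as.Date(x, format = "%m/%d/%y")
#[1] "2020-01-01" "2020-01-01"

# %Y reads the full four-digit year; %m and %d accept "1" as well as "01"
as.Date(x, format = "%m/%d/%Y")
#[1] "2010-01-01" "2010-01-01"
```

So the leading zeros are not the problem; the two-digit year code is.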
For dates whose years have different numbers of digits (two or four), the lubridate package can be used:
library(lubridate)
d1 <- "1/1/10"
d2 <- "01/01/2010"
parse_date_time(d1, "dmy")
#[1] "2010-01-01 UTC"
parse_date_time(d2, "dmy")
#[1] "2010-01-01 UTC"
Given a POSIXct date time, how do you extract the first day of the month for aggregation?
library(lubridate)
full.date <- ymd_hms("2013-01-01 00:00:21")
lubridate has a function called floor_date which rounds date-times down. Calling it with unit = "month" does exactly what you want:
library(lubridate)
full.date <- ymd_hms("2013-01-01 00:00:21")
floor_date(full.date, "month")
[1] "2013-01-01 UTC"
I don't see a reason to use lubridate:
full.date <- as.POSIXct("2013-01-11 00:00:21", tz="GMT")
monthStart <- function(x) {
x <- as.POSIXlt(x)
x$mday <- 1
as.Date(x)
}
monthStart(full.date)
#[1] "2013-01-01"
first.of.month <- ymd(format(full.date, "%Y-%m-01"))
first.of.month
[1] "2013-01-01"
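I have another solution:
first.of.month <- as.Date(full.date) - mday(full.date) + 1
It needs mday() from either lubridate or data.table (convenient if you aggregate with data.table). Convert to Date first, because arithmetic on a POSIXct is in seconds, not days.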
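A base-R sketch of the same day arithmetic without lubridate or data.table, using format(x, "%d") in place of mday(). The first step shows that subtracting a number from a POSIXct moves it by that many seconds:

```r
full.date <- as.POSIXct("2013-01-11 00:00:21", tz = "UTC")

# On POSIXct, 10 means 10 seconds
full.date - 10
#[1] "2013-01-11 00:00:11 UTC"

# On Date, arithmetic is in days
d <- as.Date(full.date)
first.of.month <- d - as.integer(format(d, "%d")) + 1
first.of.month
#[1] "2013-01-01"
```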
You can simply use base R's trunc:
d <- as.POSIXct("2013-01-11 00:00:21", tz="UTC")
trunc(d, "month")
#[1] "2013-01-01 UTC"
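Since the question is about aggregation, here is a sketch of that last step with a hypothetical events data frame: build a first-of-month Date key (the same format(x, "%Y-%m-01") trick used above, which avoids storing the POSIXlt that trunc() returns) and sum per month with tapply().

```r
# Hypothetical event data with POSIXct time stamps
events <- data.frame(
  time  = as.POSIXct(c("2013-01-01 00:00:21", "2013-01-11 08:30:00",
                       "2013-02-03 12:00:00"), tz = "UTC"),
  value = c(1, 2, 5)
)

# First day of each event's month, as a Date key
events$month <- as.Date(format(events$time, "%Y-%m-01"))

# Sum the values per month; names are the month keys
monthly <- tapply(events$value, events$month, sum)
monthly
```

aggregate(value ~ month, data = events, FUN = sum) is an alternative if you prefer a data frame result.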