I'm trying to use as.Date in R.
I'm using the command:
as.Date("65-05-14", "%y-%m-%d")
I get:
"2065-05-14"
Is there any way to get it to show 1965 instead? Or do I need to recode everything into long format, e.g. add 1900 as a numeric?
Thanks!
I didn't see this simple solution in the linked questions, so I'm adding it here too.
In base R you can use the POSIXlt class, which exposes a year component (stored as years since 1900). You can then simply subtract 100 years.
Let's say this is your dates vector:
(Date <- c("65-05-14", "15-05-14", "25-05-14", "34-05-14"))
## [1] "65-05-14" "15-05-14" "25-05-14" "34-05-14"
You can simply do
Date <- as.POSIXlt(Date, format = "%y-%m-%d")
Date$year <- Date$year - 100L
Date # Alternatively, you could also do `as.Date(Date)`
## [1] "1965-05-14 IDT" "1915-05-14 IDT" "1925-05-14 IDT" "1934-05-14 IDT"
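Subtracting 100 from every element assumes that all of the dates belong to the 1900s. If the vector mixes centuries, one option is to shift only the dates that land in the future. The following is a minimal sketch of that idea, not part of the original answer; the names `dates_chr` and `cutoff`, and the cutoff value itself, are assumptions chosen for illustration.

```r
# Hypothetical input mixing centuries; not from the original question
dates_chr <- c("65-05-14", "15-05-14", "99-05-14")

# On input, %y maps 00-68 to 20xx and 69-99 to 19xx
parsed <- as.Date(dates_chr, format = "%y-%m-%d")

# Assumed cutoff: anything later than this is treated as a 1900s date
cutoff <- as.Date("2025-01-01")

lt <- as.POSIXlt(parsed)
in_future <- parsed > cutoff
lt$year[in_future] <- lt$year[in_future] - 100L  # year is stored as years since 1900

fixed <- as.Date(lt)
format(fixed)
## [1] "1965-05-14" "2015-05-14" "1999-05-14"
```

Whether a fixed cutoff is appropriate depends on the data; for birth dates, for example, today's date may be a more natural choice.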
As the title suggests, I am trying to use either lubridate or anytime (or similar) to convert a time from 24-hour to 12-hour format. To make life easier, I don't need the whole time converted.
What I mean is I have a column of dates in this format:
2021-02-15 16:30:33
I can use inbound$Hour <- hour(inbound$Timestamp) to grab just the hour from the Timestamp, which is great, except that it is still in 24-hour time (this creates an integer column for the hour number).
I have tried several mutates, such as inbound <- inbound %>% mutate(Hour = ifelse(Hour > 12, sum(Hour - 12), Hour))
This technically works, but I get some really wonky values (I get a -294 in several rows, for example).
Is there an easier way to get the time converted to 12-hour format?
Per the recommendation below, I tried to use base format() as follows:
inbound$Time <- format(inbound$Timestamp, "%H:%M:%S")
inbound$Time <- format(inbound$Time, "%I:%M:%S")
On the second format() call I get an error:
Error in format.default(inbound$Time, "%I:%M:%S") :
invalid 'trim' argument
I did notice that the first format() call produces a column of class character; I am not sure whether that is causing issues with the second format() call.
I then also tried:
inbound$time <- format(strptime(inbound$Timestamp, "%H:%M:%S"), "%I:%M %p")
This runs without error, but it creates a full column of NAs.
Final edit: I made the mistake of misreading and misapplying the solution, and that caused the errors. Using either inbound$Time <- format(inbound$Time, "%I:%M:%S") or as.numeric(format(inbound$Timestamp, "%I")) from the comments worked and solved the issue I was having.
To be clear: from 2021-02-15 16:30:33 you want just 04:30:33 as a result?
No need for lubridate or anytime. Assuming that is a POSIXct:
a <- as.POSIXct("2021-02-15 16:30:33")
a
# [1] "2021-02-15 16:30:33 UTC"
b <- format(a, "%H:%M:%S")
b
#[1] "16:30:33"
c <- format(a, "%I:%M:%S")
c
#[1] "04:30:33"
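If only the hour number is needed (as in the Hour column from the question), the conversion can also be done arithmetically. The sketch below is an illustration rather than part of the original answer; `ts`, `hour24`, and `hour12` are made-up names. Note that sum() inside ifelse() collapses the whole column into a single number, which is the likely source of values such as -294; the vectorised expression below avoids that.

```r
# Example timestamps covering afternoon, midnight, and noon
ts <- as.POSIXct(c("2021-02-15 16:30:33",
                   "2021-02-15 00:05:00",
                   "2021-02-15 12:00:00"), tz = "UTC")

# Hour on the 24-hour clock as an integer
hour24 <- as.integer(format(ts, "%H"))

# Map 0..23 onto 12, 1, ..., 11, 12, 1, ..., 11 element by element
hour12 <- (hour24 + 11L) %% 12L + 1L

hour12
## [1]  4 12 12

# Same result straight from the %I conversion
as.integer(format(ts, "%I"))
## [1]  4 12 12
```

The modulo form also handles midnight and noon, which a plain `Hour - 12` does not.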
I have a sample date column that is part of a large data set. The dates below are in multiple formats.
I need to convert the values below to Date format. Please help me with a solution.
22-04-2015
4/8/2015
18-04-2015
5/7/2015
26-05-2015
6/12/2015
24-06-2015
23-06-2015
Try lubridate. The function guess_formats() guesses the formats present in your data from the orders you supply (you could add others if needed), and you can then use as.Date() with those formats to get the dates in the proper class. Here is the code:
library(lubridate)
#Dates
vecdate <- c('22-04-2015', '4/8/2015','18-04-2015','5/7/2015','26-05-2015',
'6/12/2015','24-06-2015','23-06-2015')
#Formats
formats <- guess_formats(vecdate, c("dmY"))
dates <- as.Date(vecdate, format=formats)
Output:
dates
[1] "2015-04-22" "2015-08-04" "2015-04-18" "2015-07-05" "2015-05-26" "2015-12-06" "2015-06-24"
[8] "2015-06-23" "2015-04-22" "2015-08-04" "2015-04-18" "2015-07-05" "2015-05-26" "2015-12-06"
[15] "2015-06-24" "2015-06-23"
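The output above contains 16 values for 8 inputs, which suggests that the vector of guessed formats was recycled against the input. As a cross-check, here is a minimal base R sketch that normalises the separators first and then applies a single format. It assumes every value is day-first (as the "dmY" order above does); `vecdate_norm` is a name made up for this example.

```r
vecdate <- c("22-04-2015", "4/8/2015", "18-04-2015", "5/7/2015",
             "26-05-2015", "6/12/2015", "24-06-2015", "23-06-2015")

# Replace "/" with "-" so that one format string covers every element
vecdate_norm <- gsub("/", "-", vecdate, fixed = TRUE)

# Assumption: all values are day-month-year
dates <- as.Date(vecdate_norm, format = "%d-%m-%Y")

length(dates)  # one result per input
## [1] 8
format(dates)
## [1] "2015-04-22" "2015-08-04" "2015-04-18" "2015-07-05" "2015-05-26"
## [6] "2015-12-06" "2015-06-24" "2015-06-23"
```

If the slash-separated values are actually month-first, they would need their own format string instead.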
Have you tried to use R package anytime for date conversion?
> library(anytime)
> datemix <- c("22-04-2015", "4/8/2015", "18-04-2015", "5/7/2015")
> anydate(datemix)
[1] NA "2015-04-08" NA "2015-05-07"
Notice that the first and third dates came back as NA. This is because the anytime package, as currently defined, does not include the format "%d-%m-%Y". It is easy to add this format with the addFormats() command, as shown below:
> addFormats("%d-%m-%Y")
> datemix <- c("22-04-2015", "4/8/2015", "18-04-2015", "5/7/2015")
> anydate(datemix)
[1] "2015-04-22" "2015-04-08" "2015-04-18" "2015-05-07"
The output dates are converted to the ISO format YYYY-MM-DD.
You can explore all the formats in anytime using the getFormats() command. Here is the link to the anytime manual on CRAN: https://cran.r-project.org/web/packages/anytime/anytime.pdf
I have a problem with the as.Date function.
I have a list of normal dates in Excel, but when I import it into R, the dates become numbers, like 33584. I understand that the number counts days since a specific date. I want my dates in the form "dd-mm-yy".
The original data is:
[image: how the "date" variable looks in R]
I've tried:
as.date <- function(x, origin = getOption(date.origin)){
origin <- ifelse(is.null(origin), "1900-01-01", origin)
as.Date(date, origin)
}
and also simply
as.Date(43324, origin = "1900-01-01")
but neither of them works. It shows the error: do not know how to convert '.' to class “Date”
Thank you guys!
The janitor package has a pair of functions designed to deal with reading Excel dates in R. See the following links for usage examples:
https://www.rdocumentation.org/packages/janitor/versions/2.0.1/topics/excel_numeric_to_date
https://www.rdocumentation.org/packages/janitor/versions/2.0.1/topics/convert_to_date
janitor::excel_numeric_to_date(43324)
[1] "2018-08-12"
I've come across Excel sheets read in with readxl::read_xls() where date columns come in as strings like "43488" (especially when there is a cell somewhere else that has a non-date value). I use
xldate <- function(x) {
  # Excel serial dates (1900 date system) count days from 1899-12-30
  xn <- as.numeric(x)
  as.Date(xn, origin = "1899-12-30")
}
d <- data.frame(date=c("43488"))
d$actual_date <- xldate(d$date)
print(d$actual_date)
# [1] "2019-01-23"
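The choice of origin matters here. Below is a short sketch (the variable names are illustrative) comparing the two origins on the serial number from the question, under the assumption that the workbook uses Excel's default 1900 date system.

```r
serial <- 43324

# Excel's 1900 date system numbers 1900-01-01 as day 1 and also counts a
# non-existent 1900-02-29, so the effective day zero for modern dates is 1899-12-30
with_1900_origin <- as.Date(serial, origin = "1900-01-01")
with_excel_origin <- as.Date(serial, origin = "1899-12-30")

format(with_1900_origin)   # two days too late
## [1] "2018-08-14"
format(with_excel_origin)  # matches janitor::excel_numeric_to_date(43324)
## [1] "2018-08-12"

# The result can then be displayed as dd-mm-yy
format(with_excel_origin, "%d-%m-%y")
## [1] "12-08-18"
```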
Dates are notoriously annoying. I would highly recommend the lubridate package for dealing with them. https://lubridate.tidyverse.org/
Use as_date() from lubridate to read numeric dates if you need to.
You can use format() to put it in dd-mm-yy.
library(lubridate)
# Excel serials count days from 1899-12-30, not from lubridate::origin (1970-01-01)
date_vector <- as_date(c(33584, 33585), origin = as.Date("1899-12-30"))
formatted_date_vector <- format(date_vector, "%d-%m-%y")
I have a monthly data file where dates are stored in Stata's %tm format, like 2000m1. How can I convert them to dates?
I could do something like manipulate the strings into 2000-01-01 but I would like to avoid this if possible.
as.Date('2000m1') (unsurprisingly) returns NA.
1) yearmon Using the zoo package, this converts it to a "yearmon" class object which may make more sense than converting it to a "Date" given that you have no day of the month. Such objects are internally represented as a year + 0 for Jan, year + 1/12 for Feb, etc. so they sort properly.
library(zoo)
as.yearmon('2000m1', '%Ym%m')
## [1] "Jan 2000"
If you really want "Date" class then the following give the start and end of month respectively:
as.Date(as.yearmon('2000m1', '%Ym%m'))
## [1] "2000-01-01"
as.Date(as.yearmon('2000m1', '%Ym%m'), frac = 1)
## [1] "2000-01-31"
2) paste This does not use any packages and while it does use paste it's a fairly minimal use of string manipulation:
as.Date(paste("2000m1", 1), "%Ym%m %d")
## [1] "2000-01-01"
Note: Be sure not to use any solution that returns a POSIXct object rather than a "yearmon" or "Date" object since then you have introduced the possibility of future potential errors based on time zones into your code which can be completely avoided by using an appropriate class. See the R Help Desk article in R News 4/1.
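To illustrate the kind of time zone error the note warns about, here is a small sketch; the time zone and the variable name `p` are arbitrary choices for the example. as.Date() on a POSIXct value uses UTC by default, so a local midnight can fall on the previous calendar day:

```r
# Midnight on 2000-01-01 in Tokyo is still 1999-12-31 in UTC
p <- as.POSIXct("2000-01-01 00:00:00", tz = "Asia/Tokyo")

format(as.Date(p))                     # default tz = "UTC"
## [1] "1999-12-31"
format(as.Date(p, tz = "Asia/Tokyo"))  # time zone stated explicitly
## [1] "2000-01-01"
```

A "yearmon" or "Date" object carries no time of day, so this shift cannot occur.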
This can be done very easily with the amazing lubridate package:
data <- c("2001m1","2010m3","2015m12","2009m8")
library(lubridate)
parse_date_time(data, orders = "%Y%m")
[1] "2001-01-01 UTC" "2010-03-01 UTC" "2015-12-01 UTC" "2009-08-01 UTC"
Sometimes I am given data sets that have two different date formats but common variables, and that have to be joined into one data frame. Over the years, I've tried various solutions to get around this workflow hassle. Now that I've been using lubridate, it seems like many of these problems are easily solved. However, I am encountering some behaviour that seems weird to me, though I imagine there is a good explanation that is beyond me. Say I am given a data set with different date formats that I join into one data frame. This data frame looks like this:
library(lubridate)
library(dplyr)
df<-data.frame(Lab=c("A","B"),DATE=c("12/15/15","12/15/2013")); df
I want to convert this data to a date format with lubridate. However the following does not format consistently:
df %>%
mutate(mdy(DATE))
...but rather creates a 0015 date. If I filter just for Lab "A":
df %>%
filter(Lab=="A") %>%
mutate(mdy(DATE))
... or even group_by Lab:
df %>%
group_by(Lab) %>%
mutate(mdy(DATE))
Then I get the desired year format. Is this the correct behaviour of the lubridate family of date formatting functions? Is there a better way to accomplish what I am doing? I am sure that multiple date formats in one column is a relatively common (and annoying) occurrence.
Thanks in advance.
parse_date_time() from the lubridate package can parse multiple date formats in one go.
Syntax:
df$date = parse_date_time(df$date, c(format1, format2, format3))
You need to specify all the possible format types.
Since lubridate has some difficulty recognising certain format types correctly, you need to define a custom format-selection function.
The help page includes the illustration below. You can adapt it to suit your requirements.
## ** how to use `select_formats` argument **
## By default %Y has precedence:
parse_date_time(c("27-09-13", "27-09-2013"), "dmy")
## [1] "13-09-27 UTC" "2013-09-27 UTC"
## to give priority to %y format, define your own select_format function:
my_select <- function(trained){
n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%y", names(trained))*1.5
names(trained[ which.max(n_fmts) ])
}
parse_date_time(c("27-09-13", "27-09-2013"), "dmy", select_formats = my_select)
## [1] "2013-09-27 UTC" "2013-09-27 UTC"
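As an alternative that avoids guessing altogether, the format can be chosen per element from the width of the year field. This is a base R sketch, not lubridate's own mechanism; it assumes month/day/year input as in the question's data frame, and `fmt` and `DATE_parsed` are names made up for the example.

```r
df <- data.frame(Lab = c("A", "B"),
                 DATE = c("12/15/15", "12/15/2013"),
                 stringsAsFactors = FALSE)

# Pick %Y when the string ends in a four-digit year, %y otherwise
fmt <- ifelse(grepl("/[0-9]{4}$", df$DATE), "%m/%d/%Y", "%m/%d/%y")

# as.Date() applies the format vector element by element
df$DATE_parsed <- as.Date(df$DATE, format = fmt)

format(df$DATE_parsed)
## [1] "2015-12-15" "2013-12-15"
```

Because each string gets its own format, the result does not depend on which other rows happen to be in the column.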