Parsing a string containing day of the week into date in R

I have a set of dates represented as strings that have the following format:
dates_strings = c("Monday 27 March 2017", "Friday 24 March 2017" , "Wednesday 22 March 2017", "Monday 20 March 2017" , "Wednesday 15 March 2017")
My aim is to parse these strings into date format. I have tried anytime() and something like as.Date(dates_strings, format = "%A% %d %m %Y"). I wonder whether there is a lubridate-type solution similar to dmy() that would consider the day of the week as well.

You need to use "%B" for the month name rather than "%m", which expects a month number. The stray "%" after "%A" in your format also has to go:
dates_strings = c("Monday 27 March 2017", "Friday 24 March 2017" , "Wednesday 22 March 2017", "Monday 20 March 2017" , "Wednesday 15 March 2017")
as.Date(dates_strings, format = "%A %d %B %Y")
[1] "2017-03-27" "2017-03-24" "2017-03-22" "2017-03-20" "2017-03-15"

We can also use "%a", which on input matches both abbreviated and full weekday names:
as.Date(dates_strings, "%a %d %B %Y")
#[1] "2017-03-27" "2017-03-24" "2017-03-22" "2017-03-20" "2017-03-15"
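A small sanity check, as a sketch: as far as I can tell, base R matches the weekday name in the string but does not validate it against the calendar, so it can be worth comparing the stated weekday with the one implied by the parsed date. The example assumes English day and month names, which is why it sets LC_TIME to "C"; the names stated and consistent are illustrative only.

```r
# Assumption: day and month names are English, so use the C locale for LC_TIME.
invisible(Sys.setlocale("LC_TIME", "C"))

dates_strings <- c("Monday 27 March 2017", "Friday 24 March 2017",
                   "Wednesday 22 March 2017")

# Parse; %A consumes the weekday name, %B the full month name.
parsed <- as.Date(dates_strings, format = "%A %d %B %Y")

# Weekday as written in the string (everything before the first space).
stated <- sub(" .*$", "", dates_strings)

# TRUE where the written weekday agrees with the parsed date.
consistent <- weekdays(parsed) == stated
print(data.frame(parsed, stated, consistent))
```

If any entry of consistent is FALSE, the weekday in the source text disagrees with the date, which usually points to a typo in the data rather than a parsing problem.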


Converting irregular dates in R and the Tidyverse

I have a series of dates as follows
25 September 2019
27 April 2020
1994
28 February 2021
1986
Now I want to convert the 1994 and 1986 to:
01 January 1994
01 January 1986
Other full dates should be left as they are.
Any help is appreciated especially using the tidyverse way.
A regex solution that identifies the year-only values using the anchors ^ (string start) and $ (string end), and uses the backreference \\1 to reinsert the captured year:
library(dplyr)
df %>%
  mutate(dates = sub("^(\\d{4})$", "01 January \\1", dates))
dates
1 25 September 2019
2 27 April 2020
3 01 January 1994
4 28 February 2021
5 01 January 1986
base R:
df$dates <- sub("^(\\d{4})$", "01 January \\1", df$dates)
Data:
df <- data.frame(
dates = c("25 September 2019",
"27 April 2020",
"1994",
"28 February 2021",
"1986")
)
Given some vector d of dates and years:
> d
[1] "25 September 2019" "27 April 2020" "1994"
[4] "28 February 2021" "1986"
Replace any entries that are exactly 4 characters long with the same 4 characters with "01 January " pasted in front:
> d[nchar(d)==4] = paste0("01 January ",d[nchar(d)==4])
Giving:
> d
[1] "25 September 2019" "27 April 2020" "01 January 1994"
[4] "28 February 2021" "01 January 1986"
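Once every entry has the same "day month year" shape, converting to real dates is one more step. The sketch below uses the regex variant, which is a little stricter than the nchar() test because it only touches strings made of exactly four digits. It assumes English month names (hence the C locale); d_full and as_dates are illustrative names.

```r
# Assumption: month names are English, so use the C locale for LC_TIME.
invisible(Sys.setlocale("LC_TIME", "C"))

d <- c("25 September 2019", "27 April 2020", "1994",
       "28 February 2021", "1986")

# Pad year-only entries so every string reads "day month year".
d_full <- sub("^(\\d{4})$", "01 January \\1", d)

# Convert the now-uniform strings to Date.
as_dates <- as.Date(d_full, format = "%d %B %Y")
print(as_dates)
```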

As.Date returns error when applied to column

I have a dataset with about 20000 observations. I need to convert one of the columns to a different date format.
head(df$created_at)
[1] Tue Mar 31 13:42:58 +0000 2020 Sat Mar 14 05:15:56 +0000 2020
[3] Sun Apr 05 14:02:10 +0000 2020 Tue Mar 24 09:06:12 +0000 2020
[5] Tue Apr 28 01:14:28 +0000 2020 Thu Oct 24 18:47:10 +0000 2019
I can apply as.Date to an individual row:
as.Date(df$created_at[1], format = '%a %b %d %H:%M:%S %z %Y')
[1] "2020-03-31"
But when I try to use as.Date on the entire column, I get:
df$dates = as.Date(df$created_at, format = '%a %b %d %H:%M:%S %z %Y')
Error in strptime(x, format, tz = "GMT") : input string is too long
What am I doing wrong? Is there another command I'm missing here?
(Too long for a comment.)
It works fine for the data you've shown us, so there must be something wrong later in your column. You could locate the problem by trying the command on subsets of your data, e.g. tmp <- as.Date(df[1:(round(nrow(df)/2)), "created_at"], ...), and then bisect: if the problem doesn't occur in the first half of the data set, try rows 1:(round(0.75*nrow(df))), and so on.
You could also try plotting nchar(df$created_at) to see if anything pops out.
df <- data.frame(created_at=c(
"Tue Mar 31 13:42:58 +0000 2020 Sat Mar 14 05:15:56 +0000 2020",
"Sun Apr 05 14:02:10 +0000 2020 Tue Mar 24 09:06:12 +0000 2020",
"Tue Apr 28 01:14:28 +0000 2020 Thu Oct 24 18:47:10 +0000 2019"))
df$dates = as.Date(df$created_at, format = '%a %b %d %H:%M:%S %z %Y')
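As an alternative to bisecting by hand, here is a rough sketch that tries each element on its own and reports the positions that either throw an error or come back as NA. The created_at vector is made-up stand-in data with one deliberately bad entry, and parses_ok and bad_rows are illustrative names. The as.character() call is there because the column in the question prints without quotes, which suggests it may be a factor.

```r
# Assumption: day and month abbreviations are English (C locale).
invisible(Sys.setlocale("LC_TIME", "C"))

fmt <- "%a %b %d %H:%M:%S %z %Y"

# Hypothetical stand-in for df$created_at, with a malformed entry in position 2.
created_at <- c("Tue Mar 31 13:42:58 +0000 2020",
                "n/a",
                "Sat Mar 14 05:15:56 +0000 2020")

# Try each element separately; treat an error the same as an NA result.
parses_ok <- vapply(as.character(created_at), function(x) {
  res <- tryCatch(as.Date(x, format = fmt), error = function(e) as.Date(NA))
  !is.na(res)
}, logical(1), USE.NAMES = FALSE)

bad_rows <- which(!parses_ok)
print(bad_rows)
print(created_at[bad_rows])
```

This is slower than a single vectorised call, but for about 20000 rows it should still finish quickly, and it points straight at the offending values.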
Absent issues with your data as alluded to by Ben, here is a solution using parse_date_time from the lubridate package which parses the date variable into POSIXct date-time.
library(tibble)
df <- tibble(date = c("Tue Mar 31 13:42:58 +0000 2020",
"Sun Apr 05 14:02:10 +0000 2020",
"Tue Apr 28 01:14:28 +0000 2020",
"Sat Mar 14 05:15:56 +0000 2020",
"Tue Mar 24 09:06:12 +0000 2020",
"Thu Oct 24 18:47:10 +0000 2019"))
library(lubridate)
df$date <- parse_date_time(df$date, "%a %b %d %H:%M:%S %z %Y")
date
<dttm>
1 2020-03-31 13:42:58
2 2020-04-05 14:02:10
3 2020-04-28 01:14:28
4 2020-03-14 05:15:56
5 2020-03-24 09:06:12
6 2019-10-24 18:47:10
Created on 2020-11-13 by the reprex package (v0.3.0)

How to display date in month and year format

I have a date field called date.timestamp in a database, which has date values like "Fri Nov 27 20:17:01 IST 2015". There are a lot of records with this date.timestamp field. I need to display it as Nov 2015 for all those records in the database. How can I do that?
Assuming IST is Indian Standard Time, which is 5:30 hours ahead of Coordinated Universal Time, one would replace IST with +0530 and use the %z format of strptime.
vec <- "Fri Nov 27 20:17:01 IST 2015"
format(strptime(sub("IST", "+0530", vec), "%a %b %d %H:%M:%S %z %Y"), "%b %Y")
# [1] "Nov 2015"
vec <- c("Fri Nov 27 20:17:01 IST 2015","Mon Nov 30 20:17:01 IST 2015")
format(strptime(sub("IST", "+0530", vec), "%a %b %d %H:%M:%S %z %Y"), "%b %Y")
# [1] "Nov 2015" "Nov 2015"
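One caveat, as I understand it: when %z is used, strptime shifts the time into the session's time zone, so a timestamp just after midnight on the first of a month could end up displayed under the previous month on a machine that is not running on Indian time. If only the month and year as written are needed, a possible sketch is to drop the zone label and parse the wall-clock time as is. This assumes every value carries the literal label IST and that names are English; no_zone and parsed are illustrative names.

```r
# Assumption: day and month abbreviations are English (C locale).
invisible(Sys.setlocale("LC_TIME", "C"))

vec <- c("Fri Nov 27 20:17:01 IST 2015", "Tue Dec 01 02:00:00 IST 2015")

# Remove the zone label so the wall-clock time is kept exactly as written.
no_zone <- sub(" IST ", " ", vec, fixed = TRUE)

# tz = "UTC" only stops R from applying a local offset; nothing is shifted.
parsed <- strptime(no_zone, "%a %b %d %H:%M:%S %Y", tz = "UTC")
month_year <- format(parsed, "%b %Y")
print(month_year)
```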

How to list all months between two dates in R

I'm trying to create a function that get all the months between two months into a list:
date1<- 201305
date2<- 201511
months <- function(date1,date2){}
And I want it return a list like this:
201305
201306
201307
...
201509
201510
201511
First, we need to create dates. What you supply is not yet a date as it misses a day--so we add one:
R> d1 <- as.Date(paste0("201305","01"), "%Y%m%d")
R> d2 <- as.Date(paste0("201511","01"), "%Y%m%d")
Given two dates, getting a sequence of dates is trivial: a call to seq(). Equally trivial to format in the way you want:
R> dat <- format(seq(d1,d2,by="month"), "%Y%m")
We check the beginning and end:
R> head(dat)
[1] "201305" "201306" "201307" "201308" "201309" "201310"
R> tail(dat)
[1] "201506" "201507" "201508" "201509" "201510" "201511"
R>
Now, as a function:
datseq <- function(t1, t2) {
  format(seq(as.Date(paste0(t1,"01"), "%Y%m%d"),
             as.Date(paste0(t2,"01"), "%Y%m%d"),by="month"),
         "%Y%m")
}
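For completeness, a short usage sketch with the numeric inputs from the question. paste0() coerces the numbers to character, so no explicit conversion is needed on the way in; the as.integer() step on the way out is my own addition for anyone who wants numbers back rather than strings.

```r
# Same function as above, repeated so this snippet runs on its own.
datseq <- function(t1, t2) {
  format(seq(as.Date(paste0(t1, "01"), "%Y%m%d"),
             as.Date(paste0(t2, "01"), "%Y%m%d"), by = "month"),
         "%Y%m")
}

date1 <- 201305
date2 <- 201511

months_chr <- datseq(date1, date2)   # character vector "201305", "201306", ...
months_num <- as.integer(months_chr) # numeric yyyymm values, if preferred

print(length(months_num))
print(head(months_num, 3))
```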
This can be done using the yearmon class in the zoo package:
library(zoo)
ym1 <- as.yearmon(as.character(date1), "%Y%m") # convert to yearmon
ym2 <- as.yearmon(as.character(date2), "%Y%m") # ditto
s <- seq(ym1, ym2, 1/12) # create yearmon sequence
as.numeric(format(s, "%Y%m")) # convert to numeric yyyymm
giving:
[1] 201305 201306 201307 201308 201309 201310 201311 201312 201401 201402
[11] 201403 201404 201405 201406 201407 201408 201409 201410 201411 201412
[21] 201501 201502 201503 201504 201505 201506 201507 201508 201509 201510
[31] 201511
or you might prefer to use s which is a yearmon class variable which looks like this but sorts correctly and can be used in plotting:
> s
[1] "May 2013" "Jun 2013" "Jul 2013" "Aug 2013" "Sep 2013" "Oct 2013"
[7] "Nov 2013" "Dec 2013" "Jan 2014" "Feb 2014" "Mar 2014" "Apr 2014"
[13] "May 2014" "Jun 2014" "Jul 2014" "Aug 2014" "Sep 2014" "Oct 2014"
[19] "Nov 2014" "Dec 2014" "Jan 2015" "Feb 2015" "Mar 2015" "Apr 2015"
[25] "May 2015" "Jun 2015" "Jul 2015" "Aug 2015" "Sep 2015" "Oct 2015"
[31] "Nov 2015"
For example this works:
plot(seq(31) ~ s)
I know this is an old post, but in case someone is having a similar issue to mine, I hope this can be helpful. If you're looking for the months between 2 dates, let's say
d1 = as.Date("2022-01-25")
d2 = as.Date("2022-02-13")
seq(as.Date(d1), as.Date(d2), by = "month")
[1] "2022-01-25"
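It does not give back the desired output of both January and February. This is because seq() steps forward from the first date one month at a time, so the next value would be 2022-02-25, which is already past d2. To get both months, the day of the first date must be less than or equal to the day of the second date.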
What I opted to do was mutate the first date as so:
d1 = paste0(substr(d1, 1,7), "-01")
seq(as.Date(d1), as.Date(d2), by = "month")
[1] "2022-01-01" "2022-02-01"
This way you will always get the year months that you desire but not necessarily the day portion.
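The same idea can be written without substr(), as a sketch: format() can put the literal "01" in the day position, and a second format() call drops the day again if only the year and month are wanted. month_start and months_between are illustrative names.

```r
d1 <- as.Date("2022-01-25")
d2 <- as.Date("2022-02-13")

# Move the first date back to the first day of its month.
month_start <- as.Date(format(d1, "%Y-%m-01"))

# Step month by month, then keep only the year-month part.
months_between <- format(seq(month_start, d2, by = "month"), "%Y-%m")
print(months_between)
```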

how to convert a string like "Sat Mar 17 11:27:57 +0000 2012" into date in R

The question is like the title:
The date I got is in a format like "Sat Mar 17 11:27:57 +0000 2012".
How can I convert it into R's date type?
You just need to specify the correct format (as documented in strptime):
fmt <- "%a %b %d %H:%M:%S %z %Y"
# POSIXct
as.POSIXct("Sat Mar 17 11:27:57 +0000 2012", format=fmt, tz="UTC")
[1] "2012-03-17 11:27:57 UTC"
# POSIXlt
strptime("Sat Mar 17 11:27:57 +0000 2012", format=fmt, tz="UTC")
[1] "2012-03-17 11:27:57 UTC"
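With a +0000 offset the parsed time equals the time as written. To illustrate what %z does with a non-zero offset, here is a sketch using a made-up variant of the string with +0100: the offset is applied and the result is expressed in the zone given by tz. It assumes English day and month abbreviations, hence the C locale.

```r
# Assumption: day and month abbreviations are English (C locale).
invisible(Sys.setlocale("LC_TIME", "C"))

fmt <- "%a %b %d %H:%M:%S %z %Y"

# Hypothetical input: same instant format as the question, one hour ahead of UTC.
x <- as.POSIXct("Sat Mar 17 11:27:57 +0100 2012", format = fmt, tz = "UTC")
print(x)

# Calendar date of that instant in UTC, if only the date is needed.
day_only <- as.Date(x, tz = "UTC")
print(day_only)
```

If only the date matters, it is worth deciding up front which time zone the date should be taken in, since an instant close to midnight falls on different days in different zones.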
