I have week-date data in the form yyyy-ww, where ww is the week number in two digits. The data span 2007-01 to 2010-30. The week-counting convention is ISO 8601, which, as you can see in Wikipedia's "Week number" article, occasionally has 53 weeks in a year. For example, 2009 had 53 weeks under this system; see the week numbers in this ISO 8601 calendar. (See other years; as per the Wikipedia article, 53rd weeks are fairly rare.)
Basically, I want to read the week dates in, convert them to Date objects and save these in a separate column of a data.frame. As a test, I converted the Date objects back to yyyy-ww format with format([Date-object], format = "%Y-%W"), and this threw up an error at 2009-53: that week fails to be interpreted as a date by R. This is very odd, as some years that do not have a 53rd week (in the ISO 8601 standard) convert fine, such as 2007-53, whereas other years that also lack a 53rd week (in the ISO 8601 standard) fail, such as 2008-53.
The following minimal example demonstrates the issue.
Minimal example:
dates <- c("2009-50", "2009-51", "2009-52", "2009-53", "2010-01", "2010-02")
as.Date(x = paste(dates, 1), format = "%Y-%W %w")
# [1] "2009-12-14" "2009-12-21" "2009-12-28" NA "2010-01-04"
# [6] "2010-01-11"
other.dates <- c("2007-53", "2008-53", "2009-53", "2010-53")
as.Date(x = paste(other.dates, 1), format = "%Y-%W %w")
# [1] "2007-12-31" NA NA NA
The question is, how do I get R to accept week numbers in ISO 8601 format?
Note: This question summarises a problem I have been struggling with for a few hours. I have searched and found various helpful posts such as this, but none solved the problem.
The package ISOweek handles ISO 8601 week numbering, converting to and from Date objects in R. See the ISOweek documentation for more. Continuing the example dates above, we first need to modify the formatting a bit: the strings must be in the form yyyy-Www-d rather than yyyy-ww, e.g. 2009-W53-1. The final digit is the day of the week (1 = Monday through 7 = Sunday) used to pick a date within the week; here it is Monday. The week number must have two digits.
library(ISOweek)
dates <- c("2009-50", "2009-51", "2009-52", "2009-53", "2010-01", "2010-02")
other.dates <- c("2007-53", "2008-53", "2009-53", "2010-53")
dates <- sub("(\\d{4}-)(\\d{2})", "\\1W\\2-1", dates)
other.dates <- sub("(\\d{4}-)(\\d{2})", "\\1W\\2-1", other.dates)
## Check:
dates
# [1] "2009-W50-1" "2009-W51-1" "2009-W52-1" "2009-W53-1" "2010-W01-1"
# [6] "2010-W02-1"
(iso.date <- ISOweek2date(dates)) # deal correctly
# [1] "2009-12-07" "2009-12-14" "2009-12-21" "2009-12-28" "2010-01-04"
# [6] "2010-01-11"
(iso.other.date <- ISOweek2date(other.dates)) # also deals with this
# [1] "2007-12-31" "2008-12-29" "2009-12-28" "2011-01-03"
## Check that back-conversion works:
all(date2ISOweek(iso.date) == dates)
# [1] TRUE
## This does not work for the others, since the 53rd week of
## e.g. 2008 is back-converted to the first week of 2009, in
## line with the ISO 8601 standard.
date2ISOweek(iso.other.date) == other.dates
# [1] FALSE FALSE TRUE FALSE
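If installing ISOweek is not an option, the same conversion can be sketched in base R using the ISO 8601 rule that 4 January always falls in week 1. The helper name iso_week_to_date() is hypothetical (not part of ISOweek or any other package); it assumes year, week and weekday (1 = Monday) are already split out as numbers, and it does not check whether week 53 exists in the given year.

```r
# Hypothetical helper (not from ISOweek): date of ISO 8601 week `week`,
# weekday `wday` (1 = Monday ... 7 = Sunday) in ISO week-based year `year`.
iso_week_to_date <- function(year, week, wday = 1) {
  # 4 January is always in ISO week 1
  jan4 <- as.Date(sprintf("%04d-01-04", year))
  # Step back from 4 January to the Monday of its week (%u: 1 = Monday)
  week1_monday <- jan4 - (as.integer(format(jan4, "%u")) - 1)
  week1_monday + 7 * (week - 1) + (wday - 1)
}

# Split the "yyyy-ww" strings into year and week numbers
dates <- c("2009-50", "2009-51", "2009-52", "2009-53", "2010-01", "2010-02")
yr <- as.integer(substr(dates, 1, 4))
wk <- as.integer(substr(dates, 6, 7))
iso_week_to_date(yr, wk)
# [1] "2009-12-07" "2009-12-14" "2009-12-21" "2009-12-28" "2010-01-04"
# [6] "2010-01-11"
```

Like ISOweek2date(), this maps a non-existent week such as 2010-W53 to the first week of the following year (2011-01-03) rather than returning NA.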
Related
I have a vector of date strings in the form month_name-2_digit_year i.e.
a = rbind("April-21", "March-21", "February-21", "January-21")
I'm trying to convert that vector into a vector of date objects. I'm aware this question is very similar to this: Convert non-standard date format to date in R posted some years ago, but unfortunately, it has not answered my question.
I have tried the following as.Date() calls, but each one just returns a vector of NA:
b = as.Date(a, format = "%B-%y")
b = as.Date(a, format = "%B%y")
b = as.Date(a, "%B-%y")
b = as.Date(a, "%B%y")
I also attempted it using the convertToDate function from the openxlsx package:
b = convertToDate(a, format = "%B-%y")
I have also tried all the above but using a single character string rather than a vector, but that produced the same issue.
I'm a little lost as to why this isn't working, as this format worked in the reverse direction earlier in my script (that is, I already had a Date object in dd-mm-yyyy format and converted it to month_name-yy using %B-%y). Is there another way to go from string to date when the string is in a non-standard date format (anything other than dd-mm-yyyy, or mm-dd-yy if you're in the US)?
For the record, my R locales are all UK and English.
Thanks in advance.
A Date must have all three of day, month and year. Convert to yearmon class which requires only month and year and then to Date as in (1) and (2) below or add the day as in (3).
(1) and (3) give first of month and (2) gives the end of the month.
(3) uses only functions from base R.
Also consider not converting to Date at all but just use yearmon objects instead since they directly represent a year and month which is what the input represents.
library(zoo)
# test input
a <- c("April-21", "March-21", "February-21", "January-21")
# 1
as.Date(as.yearmon(a, "%B-%y"))
## [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
# 2
as.Date(as.yearmon(a, "%B-%y"), frac = 1)
## [1] "2021-04-30" "2021-03-31" "2021-02-28" "2021-01-31"
# 3
as.Date(paste(1, a), "%d %B-%y")
## [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
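Base R can also produce the end of the month, as in (2), without zoo. This is a sketch that assumes an English LC_TIME locale so that %B matches the month names: from the 1st of each month, step 31 days forward (which always lands in the following month), snap back to the 1st of that month, and subtract one day.

```r
a <- c("April-21", "March-21", "February-21", "January-21")
# First of each month, as in (3)
first <- as.Date(paste(1, a), "%d %B-%y")
# 1st + 31 days is always in the next month; take its 1st, then go back a day
last <- as.Date(format(first + 31, "%Y-%m-01")) - 1
last
## [1] "2021-04-30" "2021-03-31" "2021-02-28" "2021-01-31"
```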
In addition to zoo, which @G. Grothendieck mentioned, you can also use clock or lubridate.
clock supports a variable precision calendar type called year_month_day. In this case you'd want "month" precision, then you can set the day to whatever you'd like and convert back to Date.
library(clock)
x <- c("April-21", "March-21", "February-21", "January-21")
ymd <- year_month_day_parse(x, format = "%B-%y", precision = "month")
ymd
#> <year_month_day<month>[4]>
#> [1] "2021-04" "2021-03" "2021-02" "2021-01"
# First of month
as.Date(set_day(ymd, 1))
#> [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
# End of month
as.Date(set_day(ymd, "last"))
#> [1] "2021-04-30" "2021-03-31" "2021-02-28" "2021-01-31"
The simplest solution may be to use lubridate::my(), which parses strings in the order of "month then year". That assumes that you want the first day of the month, which may or may not be correct for you.
library(lubridate)
x <- c("April-21", "March-21", "February-21", "January-21")
# Assumes first of month
my(x)
#> [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
I have some dates in a dataframe, and when I use as.Date() to convert them into dates, the years convert into 2020, which isn't really valid because the file only has data up to 2018.
What I have so far:
> fechadeinsc1[2]
[1] "2020-08-15"
> class(fechadeinsc1)
[1] "Date"
> fechainsc[2]
[1] "2017/99/99"
> class(fechainsc)
[1] "character"
As you can see, fechadeinsc1 was converted into a Date, while fechainsc is the original column, whose elements are character strings. Shouldn't fechadeinsc1 keep the same year, even though the days and months aren't valid?
Another example:
> fechadenac1[2]
[1] "2020-12-31"
> class(fechadenac1)
[1] "Date"
> fechanac[2]
[1] "12/31/2016"
> class(fechanac)
[1] "character"
Again, the year changes.
My code:
fechanac <- dat$fecha_nac
fechainsc <- dat$fecha_insc
fechadeinsc1 <- as.Date(fechainsc,tryFormats =c("%d/%m/%y","%m/%d/%y","%y","%d%m%y","%m%d%y"))
fechadenac1 <- as.Date(fechanac,tryFormats =c("%d/%m/%y","%m/%d/%y","%y","%d%m%y","%m%d%y"))
"dat" is the original dataframe, which contains information about newborns registered in Ecuador in 2016 and 2017; if anyone wants the original .csv file, please contact me.
Based on the strptime documentation (which ?as.Date refers to), you should use upper-case %Y for 4-digit years:
%y Year without century (00--99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 -- that is the behaviour specified by the 2004 and 2008 POSIX standards, but they do also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’.
%Y Year with century. [...]
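A sketch of the difference on the example strings (the explicit formats below are assumptions about how each column is laid out). With %y, only the first two digits of "2016" are read as the year, giving 2020, and the leftover "16" is silently ignored; with %Y the whole year is read. "2017/99/99" has no valid month or day, so no Date can be built from it, although the year can still be kept separately.

```r
fechanac <- "12/31/2016"
# Two-digit %y: "20" -> 2020; the trailing "16" is ignored
as.Date(fechanac, format = "%m/%d/%y")
# [1] "2020-12-31"
# Four-digit %Y reads the whole year
as.Date(fechanac, format = "%m/%d/%Y")
# [1] "2016-12-31"

fechainsc <- "2017/99/99"
# Month 99 / day 99 cannot form a Date, so the result is NA ...
as.Date(fechainsc, format = "%Y/%m/%d")
# [1] NA
# ... but the year itself can still be extracted
as.integer(substr(fechainsc, 1, 4))
# [1] 2017
```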
I am trying to add 52 weeks to a date variable in YYYYWW format. My initial date is 201616 (year 2016, week 16), and after adding 52 weeks the expected output is 201715.
I tried a couple of things with no luck; here is what I tried so far:
date <- as.Date(as.character(201616), "%Y%W")
seq(date, by = "1 week", length.out = 52)
I would greatly appreciate your input. Many Thanks for your time!
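The problem is that week 16 of 2016 has 7 days, so a year and week number alone do not pin down a single date. You need to specify a day of the week to get a date that days can be added to. In the code below, %u reads the weekday as a number (1 = Monday), so the appended 1 selects the Monday of week 16. You can then add 52 weeks (52 * 7 days) to this date.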
date1 <- as.Date("201616 1", format = "%Y%U %u")
format(date1+(52*7), "%Y%U")
[1] "201716"
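Note that exactly 52 weeks after week 16 is week 16 of the following year; a 52-week window that starts in week 16 ends 51 weeks later, which is where the expected 201715 comes from. A sketch using the same "append a weekday" idea, but with %W (Monday-based weeks, as in the question) instead of %U:

```r
# Monday (%u = 1) of week 16 of 2016, Monday-based week numbering (%W)
start <- as.Date(paste("201616", 1), format = "%Y%W %u")
start
# [1] "2016-04-18"
# Last week of a 52-week window starting in week 16: 51 weeks later
format(start + 7 * 51, "%Y%W")
# [1] "201715"
# Exactly 52 weeks later is week 16 again
format(start + 7 * 52, "%Y%W")
# [1] "201716"
```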
I'm not sure that as.Date can take %Y%W and generate a unique value. It appears to be filling in date with the current month and day. If instead we specify a date in the 16th week:
date <- as.Date("2016-04-23")
and format that in your style
format(date, "%Y%W")
[1] "201616"
we can generate a sequence of 52 values from this
newdate_seq <- seq(date, by = "1 week", length.out = 52)
and change those to your format too
format(newdate_seq, "%Y%W")
[1] "201616" "201617" "201618" "201619" "201620" "201621" "201622" "201623" "201624" "201625" "201626" "201627"
[13] "201628" "201629" "201630" "201631" "201632" "201633" "201634" "201635" "201636" "201637" "201638" "201639"
[25] "201640" "201641" "201642" "201643" "201644" "201645" "201646" "201647" "201648" "201649" "201650" "201651"
[37] "201652" "201701" "201702" "201703" "201704" "201705" "201706" "201707" "201708" "201709" "201710" "201711"
[49] "201712" "201713" "201714" "201715"
which ends where you expect.
FYI, for next time, try highlighting what caused you to think there was "no luck" -- what errors did you produce, what results did you produce and how did they differ from what you expect to produce? Simply printing the date variable showed me that it wasn't doing what you expected.
I have dates encoded in a weekly time format (European convention: weeks 01 through 52/53, e.g. "2016-48") and would like to standardize them to a POSIX date:
require(magrittr)
(x <- as.POSIXct("2016-12-01") %>% format("%Y-%V"))
# [1] "2016-48"
as.POSIXct(x, format = "%Y-%V")
# [1] "2016-01-11 CET"
I expected the last statement to return "2016-12-01" again. What am I missing here?
Edit
Thanks to Dirk, I was able to piece it together:
y <- sprintf("%s-1", x)
While I still don't get why this doesn't work
(as.POSIXct(y, format = "%Y-%V-%u"))
# [1] "2016-01-11 CET"
this does
(as.POSIXct(y, format = "%Y-%U-%u"))
# [1] "2016-11-28 CET"
Edit 2
Oh my, I think using %V is a very bad idea in general:
as.POSIXct("2016-01-01") %>% format("%Y-%V")
# [1] "2016-53"
Should this be considered to be on a "serious bug" level that requires further action?!
Sticking to either %U or %W seems to be the right way to go
as.POSIXct("2016-01-01") %>% format("%Y-%U")
# [1] "2016-00"
Edit 3
Nope, not quite finished/still puzzled: the approach doesn't work for the very first week
(x <- as.POSIXct("2016-01-01") %>% format("%Y-%W"))
# [1] "2016-00"
as.POSIXct(sprintf("%s-1", x), format = "%Y-%W-%u")
# [1] NA
It does for week 01 as defined in the underlying convention when using %U or %W (so "week 2", actually)
as.POSIXct("2016-01-1", format = "%Y-%W-%u")
# [1] "2016-01-04 CET"
As I have to deal a lot with reporting by ISO weeks, I created the ISOweek package some years ago.
The package includes the function ISOweek2date() which returns the date of a given weekdate (year, week of the year, day of week according to ISO 8601). It's the inverse function to date2ISOweek().
With ISOweek, your examples become:
library(ISOweek)
# define dates to convert
dates <- as.Date(c("2016-12-01", "2016-01-01"))
# convert to full ISO 8601 week-based date yyyy-Www-d
(week_dates <- date2ISOweek(dates))
[1] "2016-W48-4" "2015-W53-5"
# convert back to class Date
ISOweek2date(week_dates)
[1] "2016-12-01" "2016-01-01"
Note that ISOweek2date() requires a full ISO week-based date in the format yyyy-Www-d, including the day of the week (1 to 7, Monday to Sunday).
So, if you only have year and ISO week number you have to create a character string with a day of the week specified.
A typical phrase in many reports is, e.g., "reporting week 31 ending 2017-08-06":
yr <- 2017
wk <- 31
ISOweek2date(sprintf("%4i-W%02i-%1i", yr, wk, 7))
[1] "2017-08-06"
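For the opposite direction (Date to ISO week string), base R's format() does support the ISO 8601 specifiers on output, provided %V (ISO week) is paired with %G (ISO week-based year). Pairing %V with the calendar year %Y is what yields "2016-53" for 1 January 2016. On input, strptime accepts but ignores %G and %V, which is why ISOweek2date() is needed for that direction. A sketch:

```r
dates <- as.Date(c("2016-12-01", "2016-01-01"))
# %G = ISO week-based year, %V = ISO week, %u = weekday (1 = Monday)
format(dates, "%G-W%V-%u")
# [1] "2016-W48-4" "2015-W53-5"
# Mixing the calendar year %Y with the ISO week %V goes wrong at year ends
format(dates, "%Y-W%V-%u")
# [1] "2016-W48-4" "2016-W53-5"
```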
Addendum
Please see this answer for another use case and more background information on the ISOweek package.
I'm looking for a function that takes a year, a week number and a weekday, and returns a date. For example,
I would like to input the following three values:
2015
Monday
23
And get the desired output:
"2015-06-08"
After searching the web, I found equivalent questions for other languages, but not for R:
How to Get date from week number and year
Any help on that would be great!
Using strptime:
strptime("2015Monday23", "%Y%A%U")
# [1] "2015-06-08"
Or more generally
strptime(paste0(2015, "Monday", 23), "%Y%A%U")
There are two caveats here:
The result depends on the current locale.
In my locale "German_Germany.1252" (call Sys.getlocale("LC_TIME") to check your locale), strptime("2015Monday23", "%Y%A%U") returns NA.
The result depends on the convention for numbering the weeks of a year.
There are 3 conventions R is aware of: US, UK, and ISO 8601. See this answer for a detailed discussion. So, the convention to be used for conversion has to be specified.
Non-English locales
If you are in a non-English locale, you can deal with English weekday names (or month names, likewise) by temporarily changing the current locale:
Sys.setlocale("LC_TIME", "US")
#> [1] "English_United States.1252"
strptime("2015Monday23", "%Y%A%U")
#> [1] "2015-06-08 CEST"
Sys.setlocale("LC_TIME")
#> [1] "German_Germany.1252"
The lubridate package offers a more convenient way:
lubridate::parse_date_time("2015Monday23", "YAU", locale = "US")
#> [1] "2015-06-08 UTC"
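Another way around the locale dependence, sketched in base R: map the English weekday name to its number yourself and parse with %u, so strptime never has to match a name. The lookup vector weekdays_en is an assumption (English names, Monday first to match %u), and %U keeps the US week convention used above.

```r
# Assumed lookup table: English weekday names in %u order (1 = Monday)
weekdays_en <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")
year <- 2015
week <- 23
wday <- match("Monday", weekdays_en)  # 1
as.Date(sprintf("%d %02d %d", year, week, wday), format = "%Y %U %u")
#> [1] "2015-06-08"
```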
Week of the year in different conventions
As the weekday is given by its name, the US and UK conventions return the same result:
lubridate::parse_date_time("2015Monday23", "YAU", locale = "US")
#> [1] "2015-06-08 UTC"
lubridate::parse_date_time("2015Monday23", "YAW", locale = "UK")
#> [1] "2015-06-08 UTC"
Unfortunately, the format specifiers for the ISO 8601 convention (%G, %V) are not accepted on input. So we reverse the process and format the resulting date as week of the year in the different conventions, which shows a different result for ISO 8601.
format(as.Date("2015-06-08 UTC"), "%Y-%W-%u") # UK convention
#> [1] "2015-23-1"
format(as.Date("2015-06-08 UTC"), "%Y-%U-%w") # US convention
#> [1] "2015-23-1"
format(as.Date("2015-06-08 UTC"), "%G-%V-%u") # ISO 8601
#> [1] "2015-24-1"
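Read the other way round, if 23 is meant as an ISO 8601 week, its Monday is one week earlier, 2015-06-01. Formatting that date confirms it, and also shows that the same day is only week 22 under the US convention:

```r
format(as.Date("2015-06-01"), "%G-%V-%u") # ISO 8601
#> [1] "2015-23-1"
format(as.Date("2015-06-01"), "%Y-%U-%w") # US convention
#> [1] "2015-22-1"
```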