I'm running into trouble converting a year + week + weekday string containing week number 53 with the as.Date() function in R.
The code works for the first example (week 52) but returns NA for the second example (week 53).
a <- "2017521"
as.Date(a, '%Y%W%u')
# [1] "2017-12-25"
b <- "2017531"
as.Date(b, '%Y%W%u')
# [1] NA
You're getting NA for b <- "2017531" because you're referencing a date that does not exist.
This comes down to the format you used and to how %W numbers the weeks of the year:
%W is the week of the year (00-53), with Monday as the first day of the week
%u is the day of the week (1-7), with Monday as 1
b <- "2017531"
as.Date(b, '%Y%W%u')
# [1] NA
Week 53 day 1 would be the Monday of week 53, but 2017 has no such Monday: week 52 already starts on Monday 25 December, so the next Monday falls in 2018. The only week-53 day that maps back into 2017 is the Sunday:
c <- "2017537"
as.Date(c, '%Y%W%u')
# [1] "2017-12-31"
You can further confirm this by checking the Saturday of week 52:
d <- "2017526"
as.Date(d, '%Y%W%u')
# [1] "2017-12-30"
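A quick way to see which %W week numbers a year actually uses is to format its last day: week numbers never decrease within a year, so the %W value of 31 December is the year's highest. A minimal sketch; last_W_week() is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: highest %W (Monday-based) week number used in each year.
# 31 December is always in the year's last %W week, so its week number is the maximum.
last_W_week <- function(year) {
  as.integer(format(as.Date(paste0(year, "-12-31")), "%W"))
}

last_W_week(c(2015, 2017, 2018))
# [1] 52 52 53

# 2018 ends on a Monday, so week 53 day 1 exists in that year and parses:
as.Date("2018531", format = "%Y%W%u")
# [1] "2018-12-31"
```

So a "week 53" input is only meaningful for years where last_W_week() returns 53.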
Related
Why does the following code return NA?
as.Date(paste(2015,53,1),"%Y %W %u")
The year 2015 had 53 weeks, so the date should be fine.
From ?strptime:
%W
Week of the year as decimal number (00–53) using Monday as the first
day of week (and typically with the first Monday of the year as day 1
of week 1). The UK convention.
%W therefore ranges from 00 to 53, and week 1 is the week that starts on the first Monday of the year. 2015 started on a Thursday, so 1 to 4 January belong to week 0. 2015 does touch 53 weeks, but they are numbered 0 to 52, not 1 to 53.
> as.Date(paste(2015,0,1),"%Y %W %u")
[1] "2015-01-05"
> as.Date(paste(2015,52,1),"%Y %W %u")
[1] "2015-12-28"
Accordingly, the last week of 2015 is week 52:
strftime(as.Date("2015-12-31"), "%W")
# [1] "52"
The remaining days of that week (1 to 3 January 2016) are counted as week 0 of 2016:
strftime(as.Date("2016-01-01"), "%W")
# [1] "00"
By contrast, 1 January 2018 is in week 1, because 2018 starts on a Monday:
strftime(as.Date("2018-01-01"), "%W")
# [1] "01"
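To see why week 53 of 2015 is rejected, it can help to compute the Monday of a %W week by hand: week 1 starts on the first Monday of the year, and each later week starts 7 days after the previous one. A minimal sketch; monday_of_W_week() is a hypothetical helper, and it deliberately does not reject out-of-range weeks, so you can see where they land.

```r
# Hypothetical helper: the Monday that starts %W week `week` of `year`.
# A week whose Monday falls outside `year` (e.g. week 53 of 2015) has no Monday in that year.
monday_of_W_week <- function(year, week) {
  jan1 <- as.Date(paste0(year, "-01-01"))
  # days from 1 January to the first Monday (0 if 1 January is itself a Monday)
  to_first_monday <- (8 - as.integer(format(jan1, "%u"))) %% 7
  jan1 + to_first_monday + (week - 1) * 7
}

monday_of_W_week(2015, c(52, 53))
# [1] "2015-12-28" "2016-01-04"
```

Week 52 agrees with the as.Date() result above, and the "Monday of week 53" is already in 2016.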
I want to know how to tell which part of a date string is the month and which part is the day when parsing dates.
The problem is that 01-06-2017 can be 1 June or 6 January. How do I parse it correctly? In India we mostly write dates as day-month-year, while in the West it is mostly month-day-year. When I have mixed data, how do I infer which part is the month and which is the day?
The data is not clean: it sometimes has dates in mdy format and sometimes in dmy, and when a number is 12 or less it is difficult to know whether it is a day or a month.
11/1/11 can be 11 Jan 2011 or 1 November 2011
Example
I am using the lubridate package and I have dates in these formats:
library(lubridate)
fundates2=c("1Apr2017","12-30-2017","1/6/17")
fun3=dmy(fundates2)
## Warning: 1 failed to parse.
fun3
## [1] "2017-04-01" NA "2017-06-01"
fun4=mdy(fundates2)
## Warning: 1 failed to parse.
fun4
## [1] NA "2017-12-30" "2017-01-06"
Well, you have to know from your context which one is correct.
To check which part of the parsed date is the day, you can simply add 1 day to it:
In fun3:
fun3 + 1
[1] "2017-04-02" NA "2017-06-02"
You can see that the month is 06.
In fun4:
fun4 + 1
[1] NA "2017-12-31" "2017-01-07"
You can see the month is 01
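If the context doesn't settle it, you can at least find the entries that are genuinely ambiguous by parsing every string both ways and comparing: an entry is ambiguous only when both orders give a valid date and the two dates differ. A base-R sketch, assuming purely numeric dates with two-digit years and a single separator; flag_ambiguous() is a hypothetical helper, and the entries it flags still need a decision from outside the data.

```r
# Hypothetical helper: parse numeric dates as both d/m/y and m/d/y and flag the
# entries where the order matters. Assumes two-digit years (%y) and one separator.
flag_ambiguous <- function(x, sep = "/") {
  as_dmy <- as.Date(x, format = paste("%d", "%m", "%y", sep = sep))
  as_mdy <- as.Date(x, format = paste("%m", "%d", "%y", sep = sep))
  data.frame(
    input = x,
    dmy = as_dmy,
    mdy = as_mdy,
    # ambiguous only if both readings are valid dates and they disagree
    ambiguous = !is.na(as_dmy) & !is.na(as_mdy) & as_dmy != as_mdy
  )
}

flag_ambiguous(c("11/1/11", "12/30/17", "1/1/17"))
```

Here "11/1/11" is flagged, "12/30/17" can only be month-first, and "1/1/17" reads the same either way.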
I have dates in year month day format that I want to convert to year month week format like so:
date dateweek
2015-02-18 -> 2015-02-8
2015-02-19 -> 2015-02-8
2015-02-20 -> ....
2015-02-21
2015-02-22
2015-02-23
2015-02-24 ...
2015-02-25 -> 2015-02-9
2015-02-26 -> 2015-02-9
2015-02-27 -> 2015-02-9
I tried
data$dateweek <- week(as.POSIXlt(data$date))
but that returns only weeks without the corresponding year and month.
I also tried:
data$dateweek <- as.POSIXct('2015-02-18')
data$dateweek <- format(data$dateweek, '%Y-%m-%U')
# data$dateweek <- format(as.POSIXct(data$date), '%Y-%m-%U')
but the corresponding columns look strange:
date dateweek
2015-01-01 2015-01-00
2015-01-02 2015-01-00
2015-01-03 2015-01-00
2015-01-04 2015-01-01
2015-01-05 2015-01-01
2015-01-06 2015-01-01
2015-01-07 2015-01-01
2015-01-08 2015-01-01
2015-01-09 2015-01-01
2015-01-10 2015-01-01
2015-01-11 2015-01-02
You need to use the '%Y-%m-%V' format to get the week number you expect:
mydate <- as.POSIXct('2015-02-18')
> format(mydate, '%Y-%m-%V')
[1] "2015-02-08"
From the strptime documentation (?strptime):
%V
Week of the year as decimal number (00–53) as defined in ISO 8601. If the week (starting on Monday) containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise, it is the last week of the previous year, and the next week is week 1. (Accepted but ignored on input.)
There is also %U (the US convention):
%U
Week of the year as decimal number (00–53) using Sunday as the first day 1 of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.
It really depends on which one you want to use for your case.
mydate <- as.POSIXct('2015-02-18')
> format(mydate, '%Y-%m-%U')
[1] "2015-02-07"
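For the output in your example (week 8 for 2015-02-18), apply %V to the whole column; substitute %U if you want the US convention instead:
data$dateweek <- format(as.POSIXct(data$date), '%Y-%m-%V')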
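One caveat worth checking with %V: it is an ISO 8601 week number, while %Y and %m are the calendar year and month, so around New Year the combination can produce labels such as "2016-01-53". If you need an unambiguous ISO label, pairing %V with %G (the ISO week-based year, also documented in ?strptime) avoids that. A small sketch on three sample dates:

```r
x <- as.Date(c("2015-02-18", "2015-12-31", "2016-01-01"))

# Calendar year and month combined with the ISO 8601 week number
format(x, "%Y-%m-%V")
# [1] "2015-02-08" "2015-12-53" "2016-01-53"

# ISO week-based year with the ISO week number
format(x, "%G-W%V")
# [1] "2015-W08" "2015-W53" "2015-W53"
```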
Fisheries data is often collected by statistical weeks that start January 1st every year. The second week starts on the following Sunday each year.
So in 2013, Jan. 1 to Jan. 5 was week 1 and Jan. 6 to Jan. 12 was week 2. I am trying to calculate the statistical week for a given date over a number of years. My data is just dates in d-m-y format (e.g. 16-6-1990) and I want R code that outputs the statistical week.
An example would be:
> d <- as.Date(c("01-01-2013","06-01-2013","01-01-2006","08-01-2006"),"%d-%m-%Y")
And the desired result would be:
> statweek(d)
[1] 1 2 1 2
Try this:
> d <- as.Date("01-01-2013", "%d-%m-%Y") + 0:7 # first 8 days of 2013
> d
[1] "2013-01-01" "2013-01-02" "2013-01-03" "2013-01-04" "2013-01-05"
[6] "2013-01-06" "2013-01-07" "2013-01-08"
>
> ufmt <- function(x) as.numeric(format(as.Date(x), "%U"))
> ufmt(d) - ufmt(cut(d, "year")) + 1
[1] 1 1 1 1 1 2 2 2
Note: %U defines the first Sunday in the year as the start of week 1, which means that if the year does not start on a Sunday we must add 1 to the week so that the first week is week 1 rather than week 0. ufmt(cut(d, "year")) equals one if d's year starts on Sunday and zero otherwise, so the formula above reduces to ufmt(d) if d's year starts on Sunday and ufmt(d)+1 if not.
UPDATE: corrected so that January 1 is in week 1 even if the year starts on a Sunday, e.g. 2006.
Here is the statweek function. The main argument can be a character vector of dates (what you get after reading a data.frame, for example). You can specify the format of the dates with the format argument (default: format="%d-%m-%Y").
d1 <- c("01-01-2013","06-01-2013","01-01-2006","08-01-2006") # format="%d-%m-%Y"
d2 <- c("01/01/2013","06/01/2013","01/01/2006","08/01/2006") # format="%d/%m/%Y"
statweek = function(dates, format="%d-%m-%Y", ...) {
# convert to Date
dates = as.Date(dates, format=format, ...)
# get correction for the first week of the year (0 if 1-Jan not a Sunday)
firstweek = 1 - as.numeric(format(as.Date(cut(dates, "year")), "%U"))
output = as.numeric(format(dates, "%U")) + firstweek
return(output)
}
And the examples:
statweek(d1)
[1] 1 2 1 2
statweek(d1, format="%d-%m-%Y")
[1] 1 2 1 2
statweek(d2, format="%d/%m/%Y")
[1] 1 2 1 2
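The same numbering can also be computed arithmetically from the POSIXlt fields $yday (0-based day of year) and $wday (0 = Sunday), without going through %U: shifting the day of year by the weekday of 1 January lines every Sunday up on a multiple of 7. A sketch; statweek2 is a hypothetical name, and it is meant to agree with statweek above.

```r
# Hypothetical alternative to statweek(): week 1 runs from 1 January to the first
# Saturday, and every later week starts on a Sunday.
statweek2 <- function(dates, format = "%d-%m-%Y") {
  lt <- as.POSIXlt(as.Date(dates, format = format))
  # weekday (0 = Sunday) of 1 January in each date's own year
  jan1_wday <- (lt$wday - lt$yday) %% 7
  # whole weeks since the Sunday on or before 1 January, counted from 1
  (lt$yday + jan1_wday) %/% 7 + 1
}

statweek2(c("01-01-2013", "06-01-2013", "01-01-2006", "08-01-2006", "31-12-2013"))
# [1]  1  2  1  2 53
```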
Is there a good way to convert a year + week number to a date in R? I have tried the following:
> as.POSIXct("2008 41", format="%Y %U")
[1] "2008-02-21 EST"
> as.POSIXct("2008 42", format="%Y %U")
[1] "2008-02-21 EST"
According to ?strftime:
%Y Year with century. Note that whereas there was no zero in the
original Gregorian calendar, ISO 8601:2004 defines it to be valid
(interpreted as 1BC): see http://en.wikipedia.org/wiki/0_(year). Note
that the standard also says that years before 1582 in its calendar
should only be used with agreement of the parties involved.
%U Week of the year as decimal number (00–53) using Sunday as the
first day 1 of the week (and typically with the first Sunday of the
year as day 1 of week 1). The US convention.
This is kinda like another question you may have seen before. :)
The key issue is: what day should a week number specify? Is it the first day of the week? The last? That's ambiguous. I don't know if week one is the first day of the year or the 7th day of the year, or possibly the first Sunday or Monday of the year (which is a frequent interpretation). (And it's worse than that: these generally appear to be 0-indexed, rather than 1-indexed.) So, an enumerated day of the week needs to be specified.
For instance, try this:
as.POSIXlt("2008 42 1", format = "%Y %U %u")
The %u indicator specifies the day of the week.
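Because paste() and as.Date() are vectorised, the same idea converts many year/week pairs at once. Here day 1 (Monday) is picked as the representative day of each Sunday-based %U week, which is an arbitrary choice you may want to change; formatting the result back with %U is a cheap sanity check.

```r
# Year/week pairs to convert; Monday (%u = 1) is picked as the representative day
years <- c(2008, 2008)
weeks <- c(41, 42)
mondays <- as.Date(paste(years, weeks, 1), format = "%Y %U %u")
mondays
# [1] "2008-10-13" "2008-10-20"

# Round-trip check: the parsed dates should report the same %U week numbers
format(mondays, "%U")
# [1] "41" "42"
```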
Additional note: See ?strptime for the various options for format conversion. It's important to be careful about the enumeration of weeks, as these can be split across the end of the year, and day 1 is ambiguous: is it specified based on a Sunday or Monday, or from the first day of the year? This should all be specified and tested on the different systems where the R code will run. I'm not certain that Windows and POSIX systems sing the same tune on some of these conversions, hence I'd test and test again.
Day-of-week == zero in the POSIXlt system (see ?DateTimeClasses) is Sunday. Not exactly Biblical and not in agreement with R's convention of indexing from 1 either, but that's what it is. Week zero is the first (partial) week in the year; week one (but day of week zero) starts with the first Sunday. Most of the other POSIXlt fields ($mon, $yday, $wday) are 0-based as well, although $mday starts at 1. It's kind of interesting to see what assigning to the list elements of a POSIXlt object does: the only way you can actually change a POSIXlt date is to alter the $year, $mon or $mday elements. The others seem to be epiphenomena.
today <- as.POSIXlt(Sys.Date())
today # Tuesday
#[1] "2012-02-21 UTC"
today$wday <- 0 # attempt to make it Sunday
today
# [1] "2012-02-21 UTC" The attempt fails
today$mday <- 19
today
#[1] "2012-02-19 UTC" Success
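If the goal is simply to move a date to the Sunday of its week, plain Date arithmetic avoids assigning to $wday altogether: since Sunday is 0, subtracting the weekday number lands on the Sunday on or before the date. A minimal sketch using the date from the example above:

```r
# Shift a date back to the Sunday on or before it by subtracting its $wday (0 = Sunday)
today <- as.Date("2012-02-21")              # a Tuesday, as in the example above
sunday <- today - as.POSIXlt(today)$wday
sunday
# [1] "2012-02-19"
```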
I did not come up with this myself (it's taken from a blog post by Forester), but nevertheless I thought I'd add this to the answer list because it's the first implementation of the ISO 8601 week number convention that I've seen in R.
No doubt, week numbers are a very ambiguous topic, but I prefer an ISO standard over the current implementation of week numbers via format(..., "%U") because it seems that this is what most people agreed on, at least in Germany (calendars etc.).
I've put the actual function definition at the bottom so you can focus on the output first. Also, I just stumbled across the ISOweek package, which may be worth a try.
Approach Comparison
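# Assumed inputs, reconstructed from the result table below: the 31 days of
# January 2012 and R's Sunday-based week numbers from format(..., "%U")
posix <- seq(as.POSIXct("2012-01-01", tz="Europe/Berlin"), by="day", length.out=31)
weeknum <- as.numeric(format(posix, "%U"))
x.days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
x.names <- sapply(seq_along(posix), function(x) {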
x.day <- as.POSIXlt(posix[x], tz="Europe/Berlin")$wday
if (x.day == 0) {
x.day <- 7
}
out <- x.days[x.day]
})
data.frame(
posix,
name=x.names,
week.r=weeknum,
week.iso=ISOweek(as.character(posix), tzone="Europe/Berlin")$weeknum
)
# Result
posix name week.r week.iso
1 2012-01-01 Sun 1 4480458
2 2012-01-02 Mon 1 1
3 2012-01-03 Tue 1 1
4 2012-01-04 Wed 1 1
5 2012-01-05 Thu 1 1
6 2012-01-06 Fri 1 1
7 2012-01-07 Sat 1 1
8 2012-01-08 Sun 2 1
9 2012-01-09 Mon 2 2
10 2012-01-10 Tue 2 2
11 2012-01-11 Wed 2 2
12 2012-01-12 Thu 2 2
13 2012-01-13 Fri 2 2
14 2012-01-14 Sat 2 2
15 2012-01-15 Sun 3 2
16 2012-01-16 Mon 3 3
17 2012-01-17 Tue 3 3
18 2012-01-18 Wed 3 3
19 2012-01-19 Thu 3 3
20 2012-01-20 Fri 3 3
21 2012-01-21 Sat 3 3
22 2012-01-22 Sun 4 3
23 2012-01-23 Mon 4 4
24 2012-01-24 Tue 4 4
25 2012-01-25 Wed 4 4
26 2012-01-26 Thu 4 4
27 2012-01-27 Fri 4 4
28 2012-01-28 Sat 4 4
29 2012-01-29 Sun 5 4
30 2012-01-30 Mon 5 5
31 2012-01-31 Tue 5 5
Function Def
It's taken directly from the blog post; I've just changed a couple of minor things. The function is still kind of sketchy (e.g. the week number of the first date is far off), but I find it a nice start!
ISOweek <- function(
date,
format="%Y-%m-%d",
tzone="UTC",
return.val="weekofyear"
){
##converts dates into "dayofyear" or "weekofyear", the latter providing the ISO-8601 week
##date should be a vector of class Date or a vector of formatted character strings
##format refers to the date form used if a vector of
## character strings is supplied
##convert date to POSIXt format
if(class(date)[1]%in%c("Date","character")){
date=as.POSIXlt(date,format=format, tz=tzone)
}
if (!inherits(date, "POSIXt")) {
stop("Date is of wrong format.") # break is only valid inside a loop; stop() signals the error
}else if(inherits(date, "POSIXct")){ # class(date)[2] is "POSIXt" for POSIXct, so test with inherits()
date=as.POSIXlt(date, tz=tzone)
}
print(date)
if(return.val=="dayofyear"){
##add 1 because POSIXt is base zero
return(date$yday+1)
}else if(return.val=="weekofyear"){
##Based on the ISO8601 weekdate system,
## Monday is the first day of the week
## W01 is the week with 4 Jan in it.
year=1900+date$year
jan4=strptime(paste(year,1,4,sep="-"),format="%Y-%m-%d", tz=tzone) ##same time zone as date
wday=jan4$wday
wday[wday==0]=7 ##convert to base 1, where Monday == 1, Sunday==7
##calculate the date of the first week of the year
weekstart=jan4-(wday-1)*86400
weeknum=ceiling(as.numeric((difftime(date,weekstart,units="days")+0.1)/7))
#########################################################################
##calculate week for days of the year occurring in the next year's week 1.
#########################################################################
mday=date$mday
wday=date$wday
wday[wday==0]=7
year=ifelse(weeknum==53 & mday-wday>=28,year+1,year)
weeknum=ifelse(weeknum==53 & mday-wday>=28,1,weeknum)
################################################################
##calculate week for days of the year occurring prior to week 1.
################################################################
##first calculate the number of weeks in the previous year
year.shift=year-1
jan4.shift=strptime(paste(year.shift,1,4,sep="-"),format="%Y-%m-%d", tz=tzone)
wday=jan4.shift$wday
wday[wday==0]=7 ##convert to base 1, where Monday == 1, Sunday==7
weekstart=jan4.shift-(wday-1)*86400
weeknum.shift=ceiling(as.numeric((difftime(date,weekstart,units="days")+0.1)/7)) ##explicit units, as in the weeknum calculation above
##update year and week
year=ifelse(weeknum==0,year.shift,year)
weeknum=ifelse(weeknum==0,weeknum.shift,weeknum)
return(list("year"=year,"weeknum"=weeknum))
}else{
stop("Unknown return.val") ##break is not valid outside a loop; stop() signals the error
}
}
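Depending on your R version and platform, base R's format() can also produce ISO 8601 week numbers directly via %V, with %G giving the matching ISO week-based year; both are documented in ?strptime as output-only specifiers. This sketch reproduces the week.iso column for January 2012, including 1 January 2012 (the value the blog function gets far off), which belongs to week 52 of ISO year 2011.

```r
# ISO 8601 week numbers from base R's format(); %V and %G are output-only specifiers
days <- seq(as.Date("2012-01-01"), as.Date("2012-01-31"), by = "day")
week.iso <- as.integer(format(days, "%V"))  # ISO week: Monday-based, week 1 contains 4 January
iso.year <- as.integer(format(days, "%G"))  # ISO week-based year

week.iso[1:3]
# [1] 52  1  1
iso.year[1:2]
# [1] 2011 2012
```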