I've got a Dataset that looks like this:
Year  Week  Cases
2010  1     2
2010  4     3
2010  5     5
2010  6     1
I would like to convert the Year-Week columns into a single timestamp column (dd/mm/yyyy). Day of the week could be the first or the last one.
Is there a simple way to solve this?
Best,
Daniel
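The weeks() function in lubridate and the str_c() function in stringr can do this. Subtract 1 from the week number so that week 1 starts on 1 January:
library(dplyr)
library(tibble)
library(lubridate)
library(stringr)
df <- tribble(~year, ~week, 2010, 1, 2010, 4, 2010, 5, 2010, 6)
df_tbl <- df %>%
  mutate(beg = ymd(str_c(year, "-01-01")),
         date_var = beg + weeks(week - 1))
df_tbl$date_var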
If you count week 1 as starting on 1st January, you could do:
as.Date(paste(df$Year, 1, 1, sep = '-')) + 7 * (df$Week - 1)
#> [1] "2010-01-01" "2010-01-22" "2010-01-29" "2010-02-05"
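If you count week 1 as starting on the first Monday of the year, you could use this little function. (This is close to ISO 8601 but not identical: ISO 8601 defines week 1 as the week containing the first Thursday of the year. The two conventions agree for 2010.)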
year_week <- function(year, week) {
  as.Date(unlist(Map(function(y, w) {
    d <- which(lubridate::wday(as.Date(paste(y, 1, 1:7, sep = '-'))) == 2)
    as.Date(paste(y, 1, d, sep = '-')) + (w - 1) * 7
  }, y = year, w = week)),
  origin = '1970-01-01')
}
This will give you the date of the nth Monday in the year, so that we have:
year_week(df$Year, df$Week)
#> [1] "2010-01-04" "2010-01-25" "2010-02-01" "2010-02-08"
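For comparison, here is a minimal base R sketch of the ISO 8601 convention, where week 1 is the week that contains 4 January. The name iso_week_start is made up for illustration and is not part of any package.

```r
# Hypothetical helper: Monday of ISO 8601 week `week` in ISO year `year`.
iso_week_start <- function(year, week) {
  jan4 <- as.Date(sprintf("%d-01-04", as.integer(year)))
  # POSIXlt wday is 0 = Sunday ... 6 = Saturday; convert to days since Monday
  days_since_monday <- (as.POSIXlt(jan4)$wday + 6) %% 7
  week1_monday <- jan4 - days_since_monday
  week1_monday + 7 * (as.integer(week) - 1)
}

dates <- iso_week_start(c(2010, 2010, 2010, 2010), c(1, 4, 5, 6))
dates
# dd/mm/yyyy text, as asked for in the question
format(dates, "%d/%m/%Y")
```

For 2010 this should agree with year_week() above, because 4 January 2010 is itself a Monday. In years where 1 January falls on a Tuesday, Wednesday or Thursday, the two conventions differ by a week.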
I have three variables: Year, Month, and Day. How can I merge them into one variable ("Date") so that the variable is represented as such:
yyyy-mm-dd
Thanks in advance and best regards!
How do you merge three variables into one variable?
Consider two methods:
Old school
With dplyr, lubridate, and data frames
And consider the data types. You can have:
Number or character
Date or POSIXct final type
Old School Method
The old school method is straightforward. I assume you are using vectors or lists and don't know data frames yet. Let's take your data, force it to a standardized, unambiguous format, and concatenate the data.
> y <- 2012:2015
> y
[1] 2012 2013 2014 2015
> m <- 1:4
> m
[1] 1 2 3 4
> d <- 10:13
> d
[1] 10 11 12 13
Use as.numeric if you want to be safe and convert everything to the same format before concatenation. If you get any NA values you will need to handle them with the is.na function and provide a default value.
Use paste with the sep separator value set to your delimiter, in this case, the hyphen.
> paste(y,m,d, sep = '-')
[1] "2012-1-10" "2013-2-11" "2014-3-12" "2015-4-13"
Dataframe / Dplyr / Lubridate Way
> df <- data.frame(year = y, mon = m, day = d)
> df
year mon day
1 2012 1 10
2 2013 2 11
3 2014 3 12
4 2015 4 13
Below I do the following:
Take the df object
Create a new variable named Date
Concatenate the numeric columns year, mon, and day with a - separator
Convert the resulting string into a Date with ymd()
> library(dplyr)
> library(lubridate)
> df %>%
    mutate(Date = ymd(
      paste(year, mon, day, sep = '-')
    )
  )
year mon day Date
1 2012 1 10 2012-01-10
2 2013 2 11 2013-02-11
3 2014 3 12 2014-03-12
4 2015 4 13 2015-04-13
Below we create year-month-day character strings, yyyy-mm-dd character strings (the same, except that one-digit months and days are zero-padded to two digits) and Date class objects. The last prints as yyyy-mm-dd and can be manipulated in ways that character strings can't: for example, adding one to a Date object gives the next day.
First we set up some sample input:
year <- c(2017, 2015, 2014)
month <- c(3, 1, 10)
day <- c(15, 9, 25)
Convert to a year-month-day character string. This is not quite yyyy-mm-dd, since one-digit months and days are not zero-padded to two digits:
paste(year, month, day, sep = "-")
## [1] "2017-3-15" "2015-1-9" "2014-10-25"
Convert to Date class. It prints on the console as yyyy-mm-dd. Two alternatives:
as.Date(paste(year, month, day, sep = "-"))
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
as.Date(ISOdate(year, month, day))
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
Convert to a yyyy-mm-dd character string. In this case one-digit months and days are zero-padded to two characters. Two alternatives:
as.character(as.Date(paste(year, month, day, sep = "-")))
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
sprintf("%d-%02d-%02d", year, month, day)
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
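As a small illustration of why the Date class is worth having, here is a sketch of a few operations that work on Date objects but not on the character versions. It repeats the sample input so that it runs on its own.

```r
year <- c(2017, 2015, 2014)
month <- c(3, 1, 10)
day <- c(15, 9, 25)

dates <- as.Date(ISOdate(year, month, day))

dates + 1                    # the following day for each element
diff(sort(dates))            # gaps between consecutive dates, in days
format(dates, "%d/%m/%Y")    # back to text, in a different layout
```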
I'm currently writing a script in the R Programming Language and I've hit a snag.
I have time series data organized in a way where there are 30 days in each month for 12 months in 1 year. However, I need the data organized in a proper 365 days in a year calendar, as in 30 days in a month, 31 days in a month, etc.
Is there a simple way for R to recognize there are 30 days in a month and to operate within that parameter? At the moment I have my script converting the number of days from the source in UNIX time and it counts up.
For example:
startingdate <- "20060101"
endingdate <- "20121230"
date <- seq(from = as.Date(startingdate, "%Y%m%d"), to = as.Date(endingdate, "%Y%m%d"), by = "days")
This would generate an array of dates with each month having 29 days/30 days/31 days etc. However, my data is currently organized as 30 days per month, regardless of 29 days or 31 days present.
Thanks.
The first 4 solutions are basically variations on the same theme using expand.grid. (3) uses magrittr and the others use no packages. The last two work by creating a long sequence of numbers and then picking out the ones that have month and day in range.
1) apply This gives a series of yyyymmdd numbers such that there are 30 days in each month. Note that the line defining yrs in this case is the same as yrs <- 2006:2012 so if the years are handy we could shorten that line. Omit as.numeric in the line defining s if you want character string output instead. Also, s and d are the same because we have whole years so we could omit the line defining d and use s as the answer in this case and also in general if we are always dealing with whole years.
startingdate <- "20060101"
endingdate <- "20121230"
yrs <- seq(as.numeric(substr(startingdate, 1, 4)), as.numeric(substr(endingdate, 1, 4)))
g <- expand.grid(yrs, sprintf("%02d", 1:12), sprintf("%02d", 1:30))
s <- sort(as.numeric(apply(g, 1, paste, collapse = "")))
d <- s[ s >= as.numeric(startingdate) & s <= as.numeric(endingdate) ] # optional if whole years
Run some checks.
head(d)
## [1] 20060101 20060102 20060103 20060104 20060105 20060106
tail(d)
## [1] 20121225 20121226 20121227 20121228 20121229 20121230
length(d) == length(2006:2012) * 12 * 30
## [1] TRUE
2) no apply An alternative variation would be this. In this and the following solutions we are using yrs as calculated in (1) so we omit it to avoid redundancy. Also, in this and the following solutions, the corresponding line to the one setting d is omitted, again, to avoid redundancy -- if you don't have whole years then add the line defining d in (1) replacing s in that line with s2.
g2 <- expand.grid(yr = yrs, mon = sprintf("%02d", 1:12), day = sprintf("%02d", 1:30))
s2 <- with(g2, sort(as.numeric(paste0(yr, mon, day))))
3) magrittr This could also be written using magrittr like this:
library(magrittr)
expand.grid(yr = yrs, mon = sprintf("%02d", 1:12), day = sprintf("%02d", 1:30)) %>%
with(paste0(yr, mon, day)) %>%
as.numeric %>%
sort -> s3
4) do.call Another variation.
g4 <- expand.grid(yrs, 1:12, 1:30)
s4 <- sort(as.numeric(do.call("sprintf", c("%d%02d%02d", g4))))
5) subset sequence Create a sequence of numbers from the starting date to the ending date and if each number is of the form yyyymmdd pick out those for which mm and dd are in range.
seq5 <- seq(as.numeric(startingdate), as.numeric(endingdate))
d5 <- seq5[ seq5 %/% 100 %% 100 %in% 1:12 & seq5 %% 100 %in% 1:30]
6) grep Using seq5 from (5)
d6 <- as.numeric(grep("(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|30)$", seq5, value = TRUE))
Here's an alternative:
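start_date <- as.Date(startingdate, "%Y%m%d")  # assumes the yyyymmdd strings from the question
end_date <- as.Date(endingdate, "%Y%m%d")
date <- unclass(start_date):unclass(end_date) %% 30L
month <- rep(1:12, each = 30, length.out = NN <- length(date))
year <- rep(1:(NN %/% 360 + 1), each = 360, length.out = NN)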
(of course, we can easily adjust by adding constants to taste if you want a specific day to be 0, or a specific month, etc.)
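The same modular arithmetic can generate the whole 30-days-per-month calendar directly from a day counter. This is only a sketch: day360_to_ymd and d7 are hypothetical names, and it assumes whole years starting on 1 January.

```r
# Hypothetical helper: map a 0-based day index onto a 360-day calendar
# (12 months of 30 days each) and return yyyymmdd numbers.
day360_to_ymd <- function(n, first_year) {
  yr  <- first_year + n %/% 360
  mon <- (n %% 360) %/% 30 + 1
  dd  <- n %% 30 + 1
  yr * 10000 + mon * 100 + dd
}

n_days <- (2012 - 2006 + 1) * 360
d7 <- day360_to_ymd(seq_len(n_days) - 1, first_year = 2006)
head(d7, 3)
tail(d7, 3)
```

The result should match s from solution (1), since both enumerate every year, month and day combination in ascending order.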
I have a column of strings in my data set formatted as year week (e.g. '201401' is equivalent to 7th April 2014, or the first fiscal week of the year)
I am trying to convert these to proper dates so I can manipulate them later; however, I always receive the same date for a given year, specifically the 14th of April.
e.g.
test_set <- c('201401', '201402', '201403')
as.Date(test_set, '%Y%U')
gives me:
[1] "2014-04-14" "2014-04-14" "2014-04-14"
Try something like this:
> test_set <- c('201401', '201402', '201403')
>
> extractDate <- function(dateString, fiscalStart = as.Date("2014-04-01")) {
+ week <- substr(dateString, 5, 6)
+ currentDate <- fiscalStart + 7 * as.numeric(week) - 1
+ currentDate
+ }
>
> extractDate(test_set)
[1] "2014-04-07" "2014-04-14" "2014-04-21"
Basically, I'm extracting the weeks from the start of the year, converting it to days and then adding that number of days to the start of the fiscal year (less 1 day to make things line up).
Not 100% sure what is your desired output but this may work
as.Date(paste0(substr(test_set, 1, 4), "-04-07")) +
(as.numeric(substr(test_set, 5, 6)) - 1) * 7
# [1] "2014-04-07" "2014-04-14" "2014-04-21"
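Both answers above hard-code 2014. If the data spans several fiscal years, one option is a lookup of the first day of fiscal week 1 per year. This is a sketch only: fiscal_week1 and fiscal_week_to_date are hypothetical names, and the 2015 start date is an assumed value for illustration, not something taken from the question.

```r
# Hypothetical lookup: first day of fiscal week 1 for each year.
# The 2014 entry comes from the question; the 2015 entry is an assumption.
fiscal_week1 <- c("2014" = "2014-04-07", "2015" = "2015-04-06")

fiscal_week_to_date <- function(x, week1 = fiscal_week1) {
  yr <- substr(x, 1, 4)                # first four characters: fiscal year
  wk <- as.integer(substr(x, 5, 6))    # last two characters: week number
  as.Date(unname(week1[yr])) + 7 * (wk - 1)
}

fiscal_week_to_date(c("201401", "201403", "201502"))
```

Years missing from the lookup come back as NA rather than raising an error, so it is worth checking the result with anyNA().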
I have this number, 20101213, which represents the date 13 December 2010. I want to extract the year, month and day from that number, so that I end up with three variables containing those values.
What I have tried:
value = 20101213
as.numeric(strsplit(as.character(value), "")[[1]])
The result is [1] 2 0 1 0 1 2 1 3
but I don't know how to continue. Could you help me, please?
You probably want to get this into a date-time format anyways for future computing, so how about:
(x <- strptime(20101213, "%Y%m%d"))
# [1] "2010-12-13 EST"
This will enable you to do computations that you wouldn't have been able to with just the year, month number, and day number, such as grabbing the day of the week (0=Sunday, 1=Monday, ...) or day of the year:
x$wday
# [1] 1
x$yday
# [1] 346
Further, you could easily extract the year, month number, and day of month number:
c(x$year+1900, x$mon+1, x$mday)
# [1] 2010 12 13
Edit: As pointed out by @thelatemail, an alternative that doesn't involve remembering offsets is:
as.numeric(c(format(x, "%Y"), format(x, "%m"), format(x, "%d")))
# [1] 2010 12 13
year <- as.numeric(substr(as.character(value),start = 1,stop = 4))
month <- as.numeric(substr(as.character(value),start = 5,stop = 6))
day <- as.numeric(substr(as.character(value),start = 7,stop = 8))
If you don't want to deal with the string representation, you could also use integer arithmetic with the modulo operator, like this:
# using mod
year = floor(value/10000)
month = floor((value %% 10000)/100)
day = value %% 100
Which will then extract the relevant parts of the number as expected.
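The same arithmetic works on a whole vector at once. Here is a sketch that wraps it in a function returning a data frame; split_ymd is a hypothetical name and it assumes every input is a valid 8-digit yyyymmdd number.

```r
# Hypothetical helper: split yyyymmdd numbers into year, month and day columns.
split_ymd <- function(value) {
  value <- as.integer(value)
  data.frame(
    year  = value %/% 10000L,
    month = (value %/% 100L) %% 100L,
    day   = value %% 100L
  )
}

parts <- split_ymd(c(20101213, 19990105))
parts

# If a real Date is needed later, parse the number as text
as.Date(as.character(20101213), format = "%Y%m%d")
```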
I would like a function that counts how many times a specific weekday occurs in a month.
For example, Nov '13 should return 5 Fridays, while Dec '13 would return 4 Fridays.
Is there an elegant function that would return this?
library(lubridate)
num_days <- function(date){
x <- as.Date(date)
start = floor_date(x, "month")
count = days_in_month(x)
d = wday(start)
sol = ifelse(d > 4, 5, 4) # rough estimate: if the month starts on Thursday or later (wday > 4), assume it has 5 Fridays
sol
}
num_days("2013-08-01")
num_days(today())
What would be a better way to do this?
1) Here d is the input, a Date class object, e.g. d <- Sys.Date(). The result gives the number of Fridays in the year/month that contains d. Replace 5 with 1 to get the number of Mondays:
first <- as.Date(cut(d, "month"))
last <- as.Date(cut(first + 31, "month")) - 1
sum(format(seq(first, last, "day"), "%w") == 5)
2) Alternately replace the last line with the following line. Here, the first term is the number of Fridays from the Epoch to the next Friday on or after the first of the next month and the second term is the number of Fridays from the Epoch to the next Friday on or after the first of d's month. Again, we replace all 5's with 1's to get the count of Mondays.
ceiling(as.numeric(last + 1 - 5 + 4) / 7) - ceiling(as.numeric(first - 5 + 4) / 7)
The second solution is slightly longer (although it has the same number of lines) but it has the advantage of being vectorized, i.e. d could be a vector of dates.
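To make the vectorized form concrete, here is a sketch that wraps the arithmetic from (2) in a function and generalizes the weekday. count_weekday is a hypothetical name. It builds the first of the month with format() rather than cut(), which is my substitution, and it relies on 1970-01-01 (day 0) being a Thursday.

```r
# Hypothetical helper: how many times weekday `w` (0 = Sunday ... 6 = Saturday)
# occurs in the month containing each element of the Date vector `d`.
count_weekday <- function(d, w = 5) {
  first <- as.Date(format(d, "%Y-%m-01"))
  # first + 31 always lands in the next month; snap back to its first day
  next_first <- as.Date(format(first + 31, "%Y-%m-01"))
  ceiling(as.numeric(next_first - w + 4) / 7) -
    ceiling(as.numeric(first - w + 4) / 7)
}

count_weekday(as.Date(c("2013-11-15", "2013-12-15")))         # Fridays
count_weekday(as.Date(c("2013-11-15", "2013-12-15")), w = 1)  # Mondays
```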
UPDATE: Added second solution.
There are a number of ways to do it. Here is one:
countFridays <- function(y, m) {
fr <- as.Date(paste(y, m, "01", sep="-"))
to <- fr + 31
dt <- seq(fr, to, by="1 day")
df <- data.frame(date=dt, mon=as.POSIXlt(dt)$mon, wday=as.POSIXlt(dt)$wday)
df <- subset(df, df$wday==5 & df$mon==df[1,"mon"])
return(nrow(df))
}
It creates the first day of the month and a day in the next month.
It then creates a data frame of dates with the month index (on a 0 to 11 range, but we only use this for comparison) and the weekday.
We then subset to rows that are a) in the same month and b) on a Friday. That is your result set, and
we return the number of rows as your answer.
Note that this only uses base R code.
Without using lubridate -
#arguments to pass to function:
whichweekday <- 5
whichmonth <- 11
whichyear <- 2013
#function code:
firstday <- as.Date(paste('01',whichmonth,whichyear,sep="-"),'%d-%m-%Y')
lastday <- seq(firstday, length.out = 2, by = "1 month")[2] - 1 # first of next month minus one day; also works for December
sum(
strftime(
seq.Date(
from = firstday,
to = lastday,
by = "day"),
'%w'
) == whichweekday)