I have a vector of dates of the form BW01.68, BW02.68, ... , BW26.10. BW stands for "bi-week", so for example, "BW01.68" represents the first bi-week of the year 1968, and "BW26.10" represents the 26th (and final) bi-week of the year 2010. Using R, how could I convert this vector into actual dates, say, of the form 01-01-1968, 01-15-1968, ... , 12-16-2010? Is there a way for R to know exactly which dates correspond to each bi-week? Thanks for any help!
An alternative solution.
biwks <- c("BW01.68", "BW02.68", "BW26.10")
bw <- substr(biwks,3,4)
yr <- substr(biwks,6,7)
yr <- paste0(ifelse(as.numeric(yr) > 15,"19","20"),yr)
# the %j in the date format is the number of days into the year
as.Date(paste(((as.numeric(bw)-1) * 14) + 1,yr,sep="-"),format="%j-%Y")
#[1] "1968-01-01" "1968-01-15" "2010-12-17"
Though I will note that a 'bi-week' seems a strange measure and I can't be sure that just using 14 day blocks is what is intended in your work.
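If the target is the mm-dd-yyyy form asked for in the question, the same steps can be wrapped in a small function. This is only a sketch: the name biweek_to_date and the cutoff argument are made up for illustration, and it keeps the assumption that bi-week 1 starts on 1 January and every bi-week is exactly 14 days long.

```r
# Hypothetical helper (not from any package): bi-week label -> start date
biweek_to_date <- function(biwks, cutoff = 15) {
  bw <- as.numeric(substr(biwks, 3, 4))              # bi-week number
  yy <- as.numeric(substr(biwks, 6, 7))              # two-digit year
  yr <- ifelse(yy > cutoff, 1900 + yy, 2000 + yy)    # same cutoff rule as above
  as.Date(paste0(yr, "-01-01")) + (bw - 1) * 14      # assumed 14-day blocks
}

starts <- biweek_to_date(c("BW01.68", "BW02.68", "BW26.10"))
formatted <- format(starts, "%m-%d-%Y")
formatted
## [1] "01-01-1968" "01-15-1968" "12-17-2010"
```

Note that under the 14-day assumption the last bi-week of 2010 starts on 12-17, not 12-16 as in the question, so it is worth checking which convention your data uses.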
You can make this code a lot shorter. I have spaced out each step to help understanding but you could finish it off in one (long) line of code.
bw <- c('BW01.68', 'BW02.68','BW26.10','BW22.13')
# the gsub calls ensure that BW01.1 is parsed the same as BW01.01, BW1.01, or BW1.1
#isolating year no
yearno <- as.numeric(
gsub(
x = bw,
pattern = "BW.*\\.",
replacement = ""
)
)
#isolating and converting bw to no of days
dayno <- 14 * as.numeric(
gsub(
x = bw,
pattern = "BW|\\.[[:digit:]]{1,2}",
replacement = ""
)
)
#cutoff year chosen as 15
yearno <- yearno + 1900
yearno[yearno < 1915] <- yearno[yearno < 1915] + 100
# identifying dates
dates <- as.Date(paste0('01/01/',yearno),"%d/%m/%Y") + dayno
# specifically identifying the Monday of that week
mondaydates <- dates - as.numeric(strftime(dates,'%w')) + 1
Output -
> bw
[1] "BW01.68" "BW02.68" "BW26.10" "BW22.13"
> dates
[1] "1968-01-15" "1968-01-29" "2010-12-31" "2013-11-05"
> mondaydates
[1] "1968-01-15" "1968-01-29" "2010-12-27" "2013-11-04"
PS: Just be careful that you're aligned with how bw is measured in your data and whether you're translating it correctly. You should be able to manipulate this to get it to work, for instance you might encounter a bw 27.
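One way to spot such cases is to flag any bi-week whose 14-day block would run past 31 December. The sketch below is an illustration only: it assumes blocks that start on 1 January (the convention of the first answer, not the one used in the code above) and the same year cutoff of 15, and the variable names are made up.

```r
# Illustrative check: which bi-week blocks spill into the following year?
bw <- c("BW26.10", "BW27.10")
bwno <- as.numeric(substr(bw, 3, 4))
yy <- as.numeric(substr(bw, 6, 7))
year <- ifelse(yy > 15, 1900 + yy, 2000 + yy)      # assumed cutoff of 15

block_start <- as.Date(paste0(year, "-01-01")) + (bwno - 1) * 14
block_end <- block_start + 13                      # last day of the 14-day block
spills_over <- as.numeric(format(block_end, "%Y")) > year

data.frame(bw, block_start, block_end, spills_over)
```

Here BW26.10 ends on 2010-12-30 and is not flagged, while BW27.10 starts on 2010-12-31 and is.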
I'm hoping to retrieve the month number from a fiscal year that starts in November (i.e. the first day of the fiscal year is November 1st). The following code provides my desired output, borrowing the week_start syntax of lubridate::wday, where year_start is analogous to week_start:
library('lubridate')
dateToRetrieve = ymd('2017-11-05')
#output: [1] "2017-11-05"
monthFromDate = month(dateToRetrieve, year_start=11)
#output: [1] 1
Since this functionality doesn't yet exist, I'm looking for an alternative solution that provides the same output. Adding period(10, units="month") to each date does not work because the length of different months leads to issues translating between months (e.g. March 31st minus a month = February 31st, which doesn't make sense).
I checked a somewhat similar question on the lubridate github here, but didn't see any solutions. Does anyone have an idea that will provide my desired functionality?
Many thanks,
1) lubridate Below x can be a character vector or a Date vector:
x <- "2017-11-05" # test data
(month(x) - 11) %% 12 + 1
## [1] 1
2) Base R To do this with only base R first calculate the month number giving mx as shown and then perform the same computation:
mx <- as.POSIXlt(x)$mon + 1
(mx - 11) %% 12 + 1
## [1] 1
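The modular arithmetic above generalises to any starting month. A minimal base R sketch, where fiscal_month and its year_start argument are hypothetical names chosen to mirror the syntax wished for in the question:

```r
# Hypothetical helper: month number within a fiscal year starting in `year_start`
fiscal_month <- function(x, year_start = 11) {
  mon <- as.POSIXlt(as.Date(x))$mon + 1    # calendar month, 1..12
  (mon - year_start) %% 12 + 1             # shift so that year_start maps to 1
}

fm <- fiscal_month(c("2017-11-05", "2017-01-15", "2017-10-31"))
fm
## [1]  1  3 12
```

With year_start = 1 the function simply returns the calendar month.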
It is not a pretty way... but you could create a vector of month names starting at November, get the full month name of the date object, then match the two to get the position in the vector.
suppressPackageStartupMessages(library('lubridate'))
x <- format(ISOdate(2004,1:12,1),"%B")[c(11,12,1:10)]
match(as.character(month(ymd('2017-11-05'), label = TRUE, abbr = FALSE)), x)
#> [1] 1
match(as.character(month(ymd('2017-01-15'), label = TRUE, abbr = FALSE)), x)
#> [1] 3
match(as.character(month(ymd('2017-05-01'), label = TRUE, abbr = FALSE)), x)
#> [1] 7
I'm trying to set up a new variable that incorporates the difference (in number of days) between a known date and the end of a given year. Dummy data below:
> Date.event <- as.POSIXct(c("12/2/2000","8/2/2001"), format = "%d/%m/%Y", tz = "Europe/London")
> Year = c(2000,2001)
> Dates.test <- data.frame(Date.event,Year)
> Dates.test
Date.event Year
1 2000-02-12 2000
2 2001-02-08 2001
I've tried applying a function to achieve this, but it returns an error
> Time.dif.fun <- function(x) {
+ as.numeric(as.POSIXct(sprintf('31/12/%s', s= x['Year']),format = "%d/%m/%Y", tz = "Europe/London") - x['Date.event'])
+ }
> Dates.test$Time.dif <- apply(
+ Dates.test, 1, Time.dif.fun
+ )
Error in unclass(e1) - e2 : non-numeric argument to binary operator
It seems that apply() does not like as.POSIXct(), as testing a version of the function that only derives the end of year date, it is returned as a numeric in the form '978220800' (e.g. for end of year 2000). Is there any way around this? For the real data the function is a bit more complex, including conditional instances using different variables and sometimes referring to previous rows, which would be very hard to do without apply.
Here are some alternatives:
1) Your code works with these changes. We factored out s, not because it is necessary, but only because the following line gets very hard to read without that due to its length. Note that if x is a data frame then so is x["Year"] but x[["Year"]] is a vector as is x$Year. Since the operations are all vectorized we do not need apply.
Although we have not made this change, it would be a bit easier to define s as s <- paste0(x$Year, "-12-31") in which case we could omit the format argument in the following line owing to the use of the default format.
Time.dif.fun <- function(x) {
s <- sprintf('31/12/%s', x[['Year']])
as.numeric(as.POSIXct(s, format = "%d/%m/%Y", tz = "Europe/London") -x[['Date.event']])
}
Time.dif.fun(Dates.test)
## [1] 323 326
2) Convert to POSIXlt, set the year, month and day to the end of the year and subtract. Note that the year component uses years since 1900 and the mon component uses Jan = 0, Feb = 1, ..., Dec = 11. See ?as.POSIXlt for details on these and other components:
lt <- as.POSIXlt(Dates.test$Date.event)
lt$year <- Dates.test$Year - 1900
lt$mon <- 11
lt$mday <- 31
as.numeric(lt - Dates.test$Date.event)
## [1] 323 326
3) Another possibility is:
with(Dates.test, as.numeric(as.Date(paste0(Year, "-12-31")) - as.Date(Date.event)))
## [1] 323 326
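As for why the original apply() call fails: apply() works on a matrix, so the data frame is first converted with as.matrix(), and because Date.event is not numeric every cell becomes a character string. The subtraction inside the function then sees text rather than a date-time. A small sketch that shows this (the variable row_classes is just for illustration):

```r
Dates.test <- data.frame(
  Date.event = as.POSIXct(c("12/2/2000", "8/2/2001"),
                          format = "%d/%m/%Y", tz = "Europe/London"),
  Year = c(2000, 2001)
)

# each row reaches the function as a character vector, not as POSIXct + numeric
row_classes <- unname(apply(Dates.test, 1, class))
row_classes
## [1] "character" "character"
```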
You could use the difftime function:
Dates.test$diff_days <- difftime(as.POSIXct(paste0(Dates.test[,2],"-12-31"), format = "%Y-%m-%d", tz = "Europe/London"), Dates.test[,1], units = "days")
You can use ISOdate to build the end of year date, and the difftime(... units='days') to get the days til end of year.
From ?difftime:
Limited arithmetic is available on "difftime" objects: they can be
added or subtracted, and multiplied or divided by a numeric vector.
If you want to do more than the limited arithmetic, just coerce with as.numeric(), but you will have to stick with whatever units you specified.
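A short sketch of what that looks like in practice, using the first dummy date from the question (the variable names are illustrative):

```r
d <- difftime(as.Date("2000-12-31"), as.Date("2000-02-12"), units = "days")

doubled <- d * 2          # limited arithmetic: the result is still a difftime
plain <- as.numeric(d)    # coerced to a bare number, in the units chosen above

class(doubled)
## [1] "difftime"
plain
## [1] 323
```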
By convention, you may wish to use the beginning of the next year (midnight on new year's eve) as your endpoint for that year. For example:
Dates.test <- data.frame(
Date.event = as.POSIXct(c("12/2/2000","8/2/2001"),
format = "%d/%m/%Y", tz = "Europe/London")
)
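# base R stand-in for data.table::year(): extract the year of a date
year <- function(x) as.POSIXlt(x)$year + 1900L
# note: ISOdate() defaults to 12:00 GMT, hence the 12:00:00 and the .5 in the output below
Dates.test$Date.end <- ISOdate(year(Dates.test$Date.event)+1,1,1)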
# if you don't want class 'difftime', wrap it in as.numeric(), as in:
Dates.test$Date.diff <- as.numeric(
difftime(Dates.test$Date.end,
Dates.test$Date.event,
units='days')
)
Dates.test
# Date.event Date.end Date.diff
# 1 2000-02-12 2001-01-01 12:00:00 324.5
# 2 2001-02-08 2002-01-01 12:00:00 327.5
The apply() family are basically a clean way of doing for loops, and you should strive for more efficient, vectorized solutions.
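If the endpoint should be exactly midnight at the start of the next year rather than noon, one option is ISOdatetime() with explicit zeros for the time and the same time zone as the data. This is a sketch under that assumption, not a correction of the answer's own output:

```r
Date.event <- as.POSIXct(c("12/2/2000", "8/2/2001"),
                         format = "%d/%m/%Y", tz = "Europe/London")

# year of each event, then midnight on 1 January of the following year
yr <- as.POSIXlt(Date.event)$year + 1900
Date.end <- ISOdatetime(yr + 1, 1, 1, 0, 0, 0, tz = "Europe/London")

days_left <- as.numeric(difftime(Date.end, Date.event, units = "days"))
days_left
## [1] 324 327
```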
I'm currently writing a script in the R Programming Language and I've hit a snag.
I have time series data organized in a way where there are 30 days in each month for 12 months in 1 year. However, I need the data organized in a proper 365 days in a year calendar, as in 30 days in a month, 31 days in a month, etc.
Is there a simple way for R to recognize there are 30 days in a month and to operate within that parameter? At the moment I have my script converting the number of days from the source in UNIX time and it counts up.
For example:
startingdate <- "20060101"
endingdate <- "20121230"
date <- seq(from = as.Date(startingdate, "%Y%m%d"), to = as.Date(endingdate, "%Y%m%d"), by = "days")
This would generate an array of dates with each month having 29 days/30 days/31 days etc. However, my data is currently organized as 30 days per month, regardless of 29 days or 31 days present.
Thanks.
The first 4 solutions are basically variations of the same theme using expand.grid. (3) uses magrittr and the others use no packages. The last two work by creating a long sequence of numbers and then picking out the ones that have month and day in range.
1) apply This gives a series of yyyymmdd numbers such that there are 30 days in each month. Note that the line defining yrs in this case is the same as yrs <- 2006:2012 so if the years are handy we could shorten that line. Omit as.numeric in the line defining s if you want character string output instead. Also, s and d are the same because we have whole years so we could omit the line defining d and use s as the answer in this case and also in general if we are always dealing with whole years.
startingdate <- "20060101"
endingdate <- "20121230"
yrs <- seq(as.numeric(substr(startingdate, 1, 4)), as.numeric(substr(endingdate, 1, 4)))
g <- expand.grid(yrs, sprintf("%02d", 1:12), sprintf("%02d", 1:30))
s <- sort(as.numeric(apply(g, 1, paste, collapse = "")))
d <- s[ s >= as.numeric(startingdate) & s <= as.numeric(endingdate) ] # optional if whole years
Run some checks.
head(d)
## [1] 20060101 20060102 20060103 20060104 20060105 20060106
tail(d)
## [1] 20121225 20121226 20121227 20121228 20121229 20121230
length(d) == length(2006:2012) * 12 * 30
## [1] TRUE
2) no apply An alternative variation would be this. In this and the following solutions we are using yrs as calculated in (1) so we omit it to avoid redundancy. Also, in this and the following solutions, the corresponding line to the one setting d is omitted, again, to avoid redundancy -- if you don't have whole years then add the line defining d in (1) replacing s in that line with s2.
g2 <- expand.grid(yr = yrs, mon = sprintf("%02d", 1:12), day = sprintf("%02d", 1:30))
s2 <- with(g2, sort(as.numeric(paste0(yr, mon, day))))
3) magrittr This could also be written using magrittr like this:
library(magrittr)
expand.grid(yr = yrs, mon = sprintf("%02d", 1:12), day = sprintf("%02d", 1:30)) %>%
with(paste0(yr, mon, day)) %>%
as.numeric %>%
sort -> s3
4) do.call Another variation.
g4 <- expand.grid(yrs, 1:12, 1:30)
s4 <- sort(as.numeric(do.call("sprintf", c("%d%02d%02d", g4))))
5) subset sequence Create a sequence of numbers from the starting date to the ending date and if each number is of the form yyyymmdd pick out those for which mm and dd are in range.
seq5 <- seq(as.numeric(startingdate), as.numeric(endingdate))
d5 <- seq5[ seq5 %/% 100 %% 100 %in% 1:12 & seq5 %% 100 %in% 1:30]
6) grep Using seq5 from (5)
d6 <- as.numeric(grep("(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|30)$", seq5, value = TRUE))
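As a sanity check, the grid approach and the subset-sequence approach should produce the same values. The sketch below is self-contained and, as an assumption of mine, builds the yyyymmdd numbers arithmetically instead of with sprintf; the names s_grid and s_seq are illustrative.

```r
yrs <- 2006:2012

# grid approach, with yyyymmdd built arithmetically (a variation on (2))
g <- expand.grid(yr = yrs, mon = 1:12, day = 1:30)
s_grid <- sort(with(g, yr * 10000 + mon * 100 + day))

# subset-sequence approach, as in (5)
seq5 <- seq(20060101, 20121230)
s_seq <- seq5[seq5 %/% 100 %% 100 %in% 1:12 & seq5 %% 100 %in% 1:30]

all(s_grid == s_seq)
## [1] TRUE
length(s_grid) == length(yrs) * 12 * 30
## [1] TRUE
```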
Here's an alternative:
# assumes startingdate and endingdate have already been converted with as.Date()
date <- unclass(startingdate):unclass(endingdate) %% 30L
month <- rep(1:12, each = 30, length.out = NN <- length(date))
year <- rep(1:(NN %/% 360 + 1), each = 360, length.out = NN)
(of course, we can easily adjust by adding constants to taste if you want a specific day to be 0, or a specific month, etc.)
This works for me in R:
# Setting up the first inner while-loop controller, the start of the next water year
NextH2OYear <- as.POSIXlt(firstDate)
NextH2OYear$year <- NextH2OYear$year + 1
NextH2OYear<-as.Date(NextH2OYear)
But this doesn't:
# Setting up the first inner while-loop controller, the start of the next water month
NextH2OMonth <- as.POSIXlt(firstDate)
NextH2OMonth$mon <- NextH2OMonth$mon + 1
NextH2OMonth <- as.Date(NextH2OMonth)
I get this error:
Error in as.Date.POSIXlt(NextH2OMonth) :
zero length component in non-empty POSIXlt structure
Any ideas why? I need to systematically add one year (for one loop) and one month (for another loop) and am comparing the resulting changed variables to values with a class of Date, which is why they are being converted back using as.Date().
Thanks,
Tom
Edit:
Below is the entire section of code. I am using RStudio (version 0.97.306). The code below represents a function that is passed an array of two columns (Date (Class=Date) and Discharge Data (Class=Numeric)) that are used to calculate the monthly averages. So, firstDate and lastDate are class Date and determined from the passed array. This code is adapted from successful code that calculates the yearly averages - there may be one or two things I still need to change over, but I am prevented from error checking later parts due to the early errors I get in my use of POSIXlt. Here is the code:
MonthlyAvgDischarge<-function(values){
#determining the number of values - i.e. the number of rows
dataCount <- nrow(values)
# Determining first and last dates
firstDate <- (values[1,1])
lastDate <- (values[dataCount,1])
# Setting up vectors for results
WaterMonths <- numeric(0)
class(WaterMonths) <- "Date"
numDays <- numeric(0)
MonthlyAvg <- numeric(0)
# while loop variables
loopDate1 <- firstDate
loopDate2 <- firstDate
# Setting up the first inner while-loop controller, the start of the next water month
NextH2OMonth <- as.POSIXlt(firstDate)
NextH2OMonth$mon <- NextH2OMonth$mon + 1
NextH2OMonth <- as.Date(NextH2OMonth)
# Variables used in the loops
dayCounter <- 0
dischargeTotal <- 0
dischargeCounter <- 1
resultsCounter <- 1
loopCounter <- 0
skipcount <- 0
# Outer while-loop, controls the progression from one year to another
while(loopDate1 <= lastDate)
{
# Inner while-loop controls adding up the discharge for each water year
# and keeps track of day count
while(loopDate2 < NextH2OMonth)
{
if(is.na(values[resultsCounter,2]))
{
# Skip this date
loopDate2 <- loopDate2 + 1
# Skip this value
resultsCounter <- resultsCounter + 1
#Skipped counter
skipcount<-skipcount+1
} else{
# Adding up discharge
dischargeTotal <- dischargeTotal + values[resultsCounter,2]
}
# Adding a day
loopDate2 <- loopDate2 + 1
#Keeping track of days
dayCounter <- dayCounter + 1
# Keeping track of Dicharge position
resultsCounter <- resultsCounter + 1
}
# Adding the results/water years/number of days into the vectors
WaterMonths <- c(WaterMonths, as.Date(loopDate2, format="%mm/%Y"))
numDays <- c(numDays, dayCounter)
MonthlyAvg <- c(MonthlyAvg, round((dischargeTotal/dayCounter), digits=0))
# Resetting the left hand side variables of the while-loops
loopDate1 <- NextH2OMonth
loopDate2 <- NextH2OMonth
# Resetting the right hand side variable of the inner while-loop
# moving it one year forward in time to the next water year
NextH2OMonth <- as.POSIXlt(NextH2OMonth)
NextH2OMonth$year <- NextH2OMonth$Month + 1
NextH2OMonth<-as.Date(NextH2OMonth)
# Resettting vraiables that need to be reset
dayCounter <- 0
dischargeTotal <- 0
loopCounter <- loopCounter + 1
}
WaterMonths <- format(WaterMonthss, format="%mm/%Y")
# Uncomment the line below and return AvgAnnualDailyAvg if you want the water years also
# AvgAnnDailyAvg <- data.frame(WaterYears, numDays, YearlyDailyAvg)
return((MonthlyAvg))
}
The same error occurs in regular R. When doing it line by line, it's not a problem; when running it as a script, it is.
Plain R
seq(Sys.Date(), length = 2, by = "month")[2]
seq(Sys.Date(), length = 2, by = "year")[2]
Note that this works with POSIXlt too, e.g.
seq(as.POSIXlt(Sys.Date()), length = 2, by = "month")[2]
mondate.
library(mondate)
now <- mondate(Sys.Date())
now + 1 # date in one month
now + 12 # date in 12 months
mondate is a bit smarter about things like mondate("2013-01-31") + 1, which gives the last day of February, whereas seq(as.Date("2013-01-31"), length = 2, by = "month")[2] gives March 3rd.
yearmon If you don't really need the day part then yearmon may be preferable:
library(zoo)
now.ym <- as.yearmon(Sys.Date())
now.ym + 1/12 # add one month
now.ym + 1 # add one year
ADDED comment on POSIXlt and section on yearmon.
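The month-end rollover mentioned above can be reproduced, and avoided, in base R alone. The helper below is a hypothetical sketch (add_months is not a base R or mondate function): it steps from the first of the month and then clamps to the last day of the target month.

```r
# Hypothetical helper: add n months, clamping to the last day of the target month
add_months <- function(d, n) {
  lt <- as.POSIXlt(d)
  day <- lt$mday                      # remember the original day of month
  lt$mday <- 1L
  lt$mon <- lt$mon + as.integer(n)
  first <- as.Date(lt)                # first day of the target month
  nxt <- as.POSIXlt(first)
  nxt$mon <- nxt$mon + 1L
  last <- as.Date(nxt) - 1            # last day of the target month
  out <- first + (day - 1L)
  over <- out > last
  out[over] <- last[over]             # clamp dates that ran past month end
  out
}

rolled <- seq(as.Date("2013-01-31"), length.out = 2, by = "month")[2]
clamped <- add_months(as.Date("2013-01-31"), 1)
rolled
## [1] "2013-03-03"
clamped
## [1] "2013-02-28"
```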
Here is how you can add 1 month to a date in R, using package lubridate:
library(lubridate)
x <- as.POSIXlt("2010-01-31 01:00:00")
month(x) <- month(x) + 1
>x
[1] "2010-03-03 01:00:00 PST"
(note that it processed the addition correctly, as 31st of Feb doesn't exist).
Can you perhaps provide a reproducible example? What's in firstDate, and what version of R are you using? I do this kind of manipulation of POSIXlt dates quite often and it seems to work:
Sys.Date()
# [1] "2013-02-13"
date = as.POSIXlt(Sys.Date())
date$mon = date$mon + 1
as.Date(date)
# [1] "2013-03-13"
I struggle mightily with dates in R and could do this pretty easily in SPSS, but I would love to stay within R for my project.
I have a date column in my data frame and want to remove the year completely in order to leave the month and day. Here is a peek at my original data.
> head(ds$date)
[1] "2003-10-09" "2003-10-11" "2003-10-13" "2003-10-15" "2003-10-18" "2003-10-20"
> class((ds$date))
[1] "Date"
I "want" it to be.
> head(ds$date)
[1] "10-09" "10-11" "10-13" "10-15" "10-18" "10-20"
> class((ds$date))
[1] "Date"
If possible, I would love to set the first date to be October 1st instead of January 1st.
Any help you can provide will be greatly appreciated.
EDIT: I felt like I should add some context. I want to plot an NHL player's performance over the course of a season which starts in October and ends in April. To add to this, I would like to facet the plots by each season which is a separate column in my data frame. Because I want to compare cumulative performance over the course of the season, I believe that I need to remove the year portion, but maybe I don't; as I indicated, I struggle with dates in R. What I am looking to accomplish is a plot that compares cumulative performance over relative dates by season and have the x-axis start in October and end in April.
> d = as.Date("2003-10-09", format="%Y-%m-%d")
> format(d, "%m-%d")
[1] "10-09"
Is this what you are looking for?
library(ggplot2)
library(reshape2) # provides melt()
## make up data for two seasons a and b
a = as.Date("2010/10/1")
b = as.Date("2011/10/1")
a.date <- seq(a, by='1 week', length=28)
b.date <- seq(b, by='1 week', length=28)
## make up some score data
a.score <- abs(trunc(rnorm(28, mean = 10, sd = 5)))
b.score <- abs(trunc(rnorm(28, mean = 10, sd = 5)))
## create a data frame
df <- data.frame(a.date, b.date, a.score, b.score)
df
## Since I am using ggplot2, I had better create a "long-format" data frame
df.molt <- melt(df, measure.vars = c("a.score", "b.score"))
levels(df.molt$variable) <- c("First season", "Second season")
df.molt
Then, I am using ggplot2 for plotting the data:
## plot it
ggplot(aes(y = value, x = a.date), data = df.molt) + geom_point() +
geom_line() + facet_wrap(~variable, ncol = 1) +
scale_x_date("Date", date_labels = "%m-%d")
If you want to modify the x-axis (e.g., display format), then you'll probably be interested in scale_x_date.
You have to remember Date is a numeric format, representing the number of days passed since the "origin" of the internal date counting :
> str(Date)
Class 'Date' num [1:10] 14245 14360 14475 14590 14705 ...
This is the same as in Excel, if you want a reference. Hence the solution with format is perfectly valid.
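A quick way to see the numeric representation for yourself (the variable names here are only for illustration):

```r
origin_days <- as.numeric(as.Date("1970-01-01"))   # R's Date origin
d2009 <- as.numeric(as.Date("2009-01-01"))

origin_days
## [1] 0
d2009
## [1] 14245

# formatting only changes the display, not the stored number
format(as.Date("2009-01-01"), "%m-%d")
## [1] "01-01"
```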
Now if you want to set the first date of a year as October 1st, you can construct some year index like this :
redefine.year <- function(x,start="10-1"){
year <- as.numeric(strftime(x,"%Y"))
yearstart <- as.Date(paste(year,start,sep="-"))
year + (x >= yearstart) - min(year) + 1
}
Testing code :
Start <- as.Date("2009-1-1")
Stop <- as.Date("2011-11-1")
Date <- seq(Start,Stop,length.out=10)
data.frame( Date=as.character(Date),
year=redefine.year(Date))
gives
Date year
1 2009-01-01 1
2 2009-04-25 1
3 2009-08-18 1
4 2009-12-11 2
5 2010-04-05 2
6 2010-07-29 2
7 2010-11-21 3
8 2011-03-16 3
9 2011-07-09 3
10 2011-11-01 4
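For plotting several seasons on a common x-axis, it may also help to have the day within the season rather than only the season index. The sketch below is an assumption-laden illustration: season_day and start_month are hypothetical names, and the season is taken to start on the first day of start_month.

```r
# Hypothetical helper: day number within a season that starts in `start_month`
season_day <- function(x, start_month = 10) {
  lt <- as.POSIXlt(x)
  # dates before the start month belong to the season that began the year before
  start_year <- lt$year + 1900L - (lt$mon + 1L < start_month)
  season_start <- as.Date(sprintf("%d-%02d-01",
                                  as.integer(start_year),
                                  as.integer(start_month)))
  as.numeric(x - season_start) + 1
}

sd <- season_day(as.Date(c("2003-10-01", "2003-10-09", "2004-01-01")))
sd
## [1]  1  9 93
```

Plotting against this value puts 1 October at the left edge of every facet, whatever the calendar year.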