How to generate matched-pairs based on dates? - r

I have a dataset that includes dates and count of reports. I am tasked with generating matched-pairs using these guidelines:
Reports will need to be matched to the week immediately prior to or following. (For example: Jan 23, 2000 will be matched with Jan 16, 2000 and Jan 30, 2000)
Holidays must not be included in the final matched-pairs generation.
I have been able to identify the holidays within the dataset but am still stuck on how to generate the matched pairs. Any advice would be much appreciated!
Example of the Data

I am making assumptions, as I could not ask for clarification.
Assumptions I made
a> You wanted a formula-based approach.
b> You wanted each date matched to the closest corresponding date in the previous week. For example, a Monday event should be matched to an event on the Monday of the previous week. Since the data set you gave shows multiple reports through the week, it was not clear which pattern in the previous week you wanted to match.
Solution based on assumptions
1> You can mathematically convert each date into a code for the week of the year it falls in, then match the codes against one another. For example, 1/1/2003 would be 1.1, and a date such as 14/1/2003 would be 2.1.
You can then pattern match: if the part after the decimal point is the same (1.1 against 2.1), it is a match; if not, loop until you see an entry in the range 2.[0-9]. You can add an if statement to check whether the match falls on a holiday; if it does, continue the loop.
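The matching idea can also be sketched without the week codes, by looking up the dates exactly 7 days before and after each report date. This is only a minimal sketch in Python (the question is about R). It assumes a control date must itself appear in the dataset and that holidays are removed before pairing. The function name `matched_pairs` and the sample holiday are hypothetical.

```python
from datetime import date, timedelta

def matched_pairs(report_dates, holidays):
    """Pair each non-holiday date with the dates exactly one week before and after.

    Assumption: a control date only counts if it is present in the data
    and is not a holiday.
    """
    available = set(report_dates) - set(holidays)
    pairs = []
    for d in sorted(available):
        for offset in (-7, 7):
            control = d + timedelta(days=offset)
            if control in available:
                pairs.append((d, control))
    return pairs

dates = [date(2000, 1, 16), date(2000, 1, 23), date(2000, 1, 30)]
holidays = {date(2000, 1, 30)}  # hypothetical holiday, for illustration only
pairs = matched_pairs(dates, holidays)
```

With the sample input, Jan 23 is paired only with Jan 16, because Jan 30 is treated as a holiday and dropped.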

Related

Is it possible to print the official calendar week range and not just the available dates in my dataset?

I have a data frame which sums values over a weekly basis.
I am able to calculate the official calendar week that the summed values fall into in each case, but I want to know if I can add the calendar week range also for reference as some values only contain a couple of days' worth of dates for a particular week.
I currently use paste(min(created), max(created), sep = ' - '), but this only gives me the range of the values present in created, not the official calendar week range. It can sometimes be misleading because the dataset may contain an incomplete week's worth of values, as mentioned above.
Can an official calendar week range be achieved?
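One way to get a full week range is to derive it from any date in the week rather than from the dates that happen to be present. The sketch below is in Python, not R, and assumes that "official calendar week" means an ISO 8601 week running Monday to Sunday. The helper names are hypothetical.

```python
from datetime import date, timedelta

def iso_week_range(d):
    """Return (monday, sunday) of the Monday-to-Sunday week containing d."""
    monday = d - timedelta(days=d.weekday())  # weekday(): Monday == 0
    sunday = monday + timedelta(days=6)
    return monday, sunday

def week_label(d):
    """Format the full week range, independent of which days have data."""
    monday, sunday = iso_week_range(d)
    return f"{monday.isoformat()} - {sunday.isoformat()}"

label = week_label(date(2012, 12, 7))
```

Any date within the same week produces the same label, so it can replace the min/max range even when only a couple of days are present.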

What are the consequences of choosing different frequencies for ts objects?

To create a ts-object in R, one has to specify a data frame, a start date and the frequency of the time series.
When searching the internet (e.g. Role of frequency parameter in ts), I get the impression that by choosing the frequency, one can emphasise whatever periodic pattern one believes is the most important in the data. However, I doubt that this is actually true. My impression is that it is solely used to compute the dates of the time series on-the-fly. E.g. when I set the start date “2015-08-01”, R automatically transforms it into a decimal date and I get something like 2015.58. If I now choose a frequency of 365 (or 365.25), R divides one unit by 365 and assigns this fraction to each day as one unit ahead, so the entry 366 days later is exactly 2016.58. However, if I choose frequency=7, the fraction assigned to each day is 1/7th, so the date assigned to the 8th day after my start date corresponds to a decimal number between 2016 and 2017. So the only choice for a data set with 365 entries per year is 365, isn’t it? And it is only used to actually create the time series?
Otherwise, if I choose the xts-class, an xts-object is built from a vector and a matrix where the vector has to be created in advance. So here there is no need to compute the date on-the-fly using a start date and a frequency and that is the reason why no frequency has to be assigned at all.
In both cases I can apply forecasting functions to either ts or xts objects (such as ARIMA, ets, stl, bats, etc.) without specifying anything else, which suggests that the frequency is not actually used for anything else. Or am I missing something here?
Thanks in advance for your comments!
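The date arithmetic described in the question can be written out explicitly. This is a small Python sketch of that arithmetic only, assuming observation k (counting from 0) gets the time value start + k / frequency. It makes no claim about what forecasting functions do with the frequency. The function name `ts_times` is hypothetical.

```python
import math

def ts_times(start, frequency, n):
    """Time values for n observations: start + k / frequency for k = 0..n-1."""
    return [start + k / frequency for k in range(n)]

daily = ts_times(2015.58, 365, 366)   # frequency = 365: one unit spans 365 steps
weekly = ts_times(2015.58, 7, 8)      # frequency = 7: one unit spans 7 steps
```

In both cases the last value is one full unit after the start. With frequency 365 that takes 365 steps, while with frequency 7 it takes only 7, so the unit no longer corresponds to a calendar year.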

flexibly naming subsetted objects in R

I'm somewhat new to R, so I apologize in advance if the answer to this question is obvious. I have a very long data frame (only one variable) from which I want to create multiple objects, one per subset of the data frame. The code below scrapes the data, formats it as data frame 'aa', and names the variable 'whatever':
aa<-data.frame(readLines("ftp://ftp.cmegroup.com/pub/settle/stlint"))
aa<-data.frame(aa[-1:-3,])
colnames(aa)<-"whatever"
I am looking to subset each section under a heading beginning with 'ZE' and ending with the last data row before the next 'ZE' or before the 'TOTAL'. So basically I want 36 objects (length(grep("ZE",aa$whatever[1:nrow(aa)]))=36), each starting with its respective 'ZE' title followed by (roughly) 70 rows of data, and each identified by its title. For instance, I would want the first dataset (headed by row ZE MAR15 EURODOLLAR OPTIONS CALL) to be named some variant of 'March 2015 Calls', as I just need to denote the month, the year, and whether the data is for calls or puts.
I can code this up in batch through a loop, but here's my problem: right now the first 'ZE' month is Mar15, i.e. March 2015, and the last 'ZE' month is Dec18, i.e. December 2018. This will change as time goes on, though, and I'm hoping to name the subsets automatically from their first line, without tweaking the script when the contract months change. So is it possible to flexibly name each of these subsets based on the content of the header?
Thanks
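A common way to handle names that change over time is to keep the subsets in one keyed collection and derive each key from its header line. The sketch below is in Python rather than R. It assumes every header looks like the one quoted in the question (ZE, a three-letter month plus two-digit year, then CALL or PUT somewhere later on the line) and that a line starting with TOTAL ends the current section. The function names and sample rows are hypothetical.

```python
import re

MONTHS = {"JAN": "January", "FEB": "February", "MAR": "March", "APR": "April",
          "MAY": "May", "JUN": "June", "JUL": "July", "AUG": "August",
          "SEP": "September", "OCT": "October", "NOV": "November", "DEC": "December"}

# Assumed header shape, e.g. "ZE MAR15 EURODOLLAR OPTIONS CALL"
HEADER = re.compile(r"^ZE\s+([A-Z]{3})(\d{2})\b.*\b(CALL|PUT)S?\b")

def section_name(line):
    """Turn a header line into a name like 'March 2015 Calls', else None."""
    m = HEADER.match(line.strip())
    if m is None or m.group(1) not in MONTHS:
        return None
    mon, yy, kind = m.groups()
    return f"{MONTHS[mon]} 20{yy} {kind.capitalize()}s"

def split_sections(lines):
    """Group lines under the most recent header; a TOTAL line closes a section."""
    sections, current = {}, None
    for line in lines:
        if line.strip().startswith("TOTAL"):
            current = None
            continue
        name = section_name(line)
        if name is not None:
            current = name
            sections[current] = [line]
        elif current is not None:
            sections[current].append(line)
    return sections

sample = ["ZE MAR15 EURODOLLAR OPTIONS CALL", "row a", "row b",
          "ZE MAR15 EURODOLLAR OPTIONS PUT", "row c", "TOTAL", "trailing text"]
sections = split_sections(sample)
```

Because the keys are computed from the headers, nothing in the script needs to change when the contract months roll forward.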

PL/SQL group by week

In MySQL I am using
week(date,3)
to get correct values of week grouping.
How do I translate this to PL/SQL? I've tried the following, but what about the 3 from the week function in MySQL?
TRUNC (created_dt, 'IW')
Oracle uses the NLS settings of the database to determine how the week number should be calculated, and therefore there is no need (according to Oracle) for the '3' part of the MySQL function. I can imagine that it would still be useful to have this option, but this is once again a sign that Oracle does not fully understand the needs of working outside the USA.
Based on your MySQL statement above, it returns the week number of the specified date, e.g. 2012-12-07. The 3 passed as the second argument selects mode 3: weeks start on Monday, are numbered 1-53, and week 1 is the first week with four or more days in the new year. This is the ISO 8601 convention, the same numbering that Oracle's IW format element uses.
If you look at this article, it says there are 8 ways the MySQL WEEK() function can behave. So you gotta let us know what results you are trying to achieve by looking for a MySQL WEEK() equivalent in PL/SQL.
In the most straightforward sense, MySQL WEEK(date[,mode]) returns the week number for a given date.
From re-reading your question, the only thing I gathered is that you want to achieve the first feature within PL/SQL, and so you are looking for an equivalent function.
And with Oracle it gets slightly RAMEN...
W: Week of month (1-5), where week 1 starts on the first day of the month and ends on the seventh. The same logic applies to how the year starts. E.g. 2012 started on a Sunday, so Oracle thinks that weeks run Sunday to Saturday.
IW: Week of year (1-52 or 1-53) based on the ISO standard. Hence the weeks do not necessarily start on Sunday.
WW: Week of year (1-53), where week 1 starts on the first day of the year and continues to the seventh day of the year.
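The difference between the three format elements is easier to see when they are computed side by side. The sketch below mirrors the definitions above in Python. It is an illustration of those definitions only, not Oracle code, and the function name is hypothetical.

```python
from datetime import date

def oracle_style_weeks(d):
    """Week numbers for d under the W, IW and WW definitions described above."""
    return {
        "W": (d.day - 1) // 7 + 1,                     # week of month, starts on day 1
        "IW": d.isocalendar()[1],                      # ISO 8601 week of year
        "WW": (d.timetuple().tm_yday - 1) // 7 + 1,    # week of year, starts on Jan 1
    }

dec7 = oracle_style_weeks(date(2012, 12, 7))
jan1 = oracle_style_weeks(date(2012, 1, 1))
```

For 2012-12-07, IW and WW happen to agree (49). For 2012-01-01, a Sunday, WW gives 1 while IW gives 52, because under the ISO rule that day still belongs to the last week of 2011.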
By default, the Oracle NLS settings are set to the following:
NLS_CALENDAR : Gregorian
NLS_DATE_LANGUAGE: AMERICAN
NLS_TERRITORY: AMERICA
So you can tell Oracle which calendar convention to follow at the query level....
You could do something like this:
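-- Generate the days of the year containing 2012-12-07, keep the Monday of ISO week 3
select dt,
to_number(to_char(dt,'IW')) AS week_number,
to_number(to_char(dt,'D')) AS dayNumber_in_week
from (select trunc(date '2012-12-07','YY')+level-1 AS dt
from dual connect by level <= 365)
where to_number(to_char(dt,'IW')) = 3
and to_char(dt,'DY','NLS_DATE_LANGUAGE=AMERICAN') = 'MON';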

Interval of one month back not working on the 31st?

Essentially, I have a query that is responsible for fetching all records (with specific filters) within the last month. I'm using Oracle's interval keyword and all was working great until today (December 31st, 2009). The code I'm using is
select (sysdate - interval '1' month) from dual
and the error I get is
ORA-01839: date not valid for month specified
How can I use the interval keyword to be compatible with any date? Or if anyone has a better way of approaching the issue, I'm all ears.
Thank you.
try
select add_months(sysdate,-1) from dual
Being pedantic...
The requirements are not specified unambiguously. What does the business mean by "within the last month"? Most people would take that to mean "within the current calendar month", in which case I'd use:
TRUNC(SYSDATE,'MM')
Otherwise, perhaps they want an arbitrary period of 1 month prior to the current date, but then how do you define that? As you've found, INTERVAL '1' MONTH simply subtracts one from the month portion of the date, e.g. 15-JAN-2009 - INTERVAL '1' MONTH returns 15-DEC-2008. For some dates, this results in an invalid date because not all months have the same number of days.
ADD_MONTHS resolves this by returning the last day in the month, e.g. ADD_MONTHS(31-DEC-2009,-1) returns 30-NOV-2009.
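The clamping behaviour can be sketched in a few lines. This is a Python approximation of the rule described above, not Oracle's implementation. It assumes that a date on the last day of its month, or a day number that does not exist in the target month, maps to the last day of the target month. The function name `add_months` is reused here only for readability.

```python
import calendar
from datetime import date

def add_months(d, n):
    """Shift d by n months, clamping to the last day of the target month."""
    total = d.year * 12 + (d.month - 1) + n
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_target = calendar.monthrange(year, month)[1]
    last_source = calendar.monthrange(d.year, d.month)[1]
    # Assumed rule: month-end stays month-end; overflowing days are clamped.
    if d.day == last_source or d.day > last_target:
        return date(year, month, last_target)
    return date(year, month, d.day)

one_month_back = add_months(date(2009, 12, 31), -1)
```

With 31 December 2009 as input, the result is 30 November 2009, where plain month subtraction would produce the invalid date 31 November.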
Another possibility is that the business actually wants to use an average month period - e.g. 365/12 which is approximately 30.4. They might want you to use SYSDATE-30, although of course twelve iterations of this will only cover 360 days of the year.
