VBA macro to exclude periods of time from a duration calculation in Excel

I have an Excel table which contains thousands of incident tickets. Each ticket typically spans a few hours or a few days, and I usually calculate the total duration by subtracting the opening date and time from the closing date and time.
However, I would like to exclude out-of-office hours (night time), weekends and holidays from that duration.
I have therefore created two additional reference tables containing the non-working periods (e.g. every day after 7pm until 7am the next morning, all day Saturday and Sunday, and a list of public holidays).
Now I need some sort of VBA macro that would automatically calculate each ticket's "real" duration by removing from the total ticket time any time that falls within those non-working periods.
I have had a look around this website and other forums, but I could not find what I am looking for. If someone can help me achieve this, I would be extremely grateful.
Best regards,
Alex

You can use the NETWORKDAYS function to calculate the number of working days in the interval. You actually seem to be perfectly set up for it: it takes a start date, an end date and a reference to a range of holidays, and by default it excludes weekends as well as the listed holidays.
For the intraday time you will need a little additional arithmetic. Assuming that tickets are only opened and closed during business hours, it would look like this:
first_day_hrs = dayend - ticketstart
last_day_hrs = ticketend - daystart
inbetween_hrs = (NETWORKDAYS(ticketstart, ticketend, rng_holidays) - 2) * (dayend - daystart)
total_hrs = first_day_hrs + inbetween_hrs + last_day_hrs
Of course the names should in reality refer to Excel cells; I recommend using lists and/or defined names. If ticketstart and ticketend are full date-time values, MOD(ticketstart, 1) and MOD(ticketend, 1) give just the time-of-day part for the first- and last-day terms, while daystart and dayend are fixed times such as 07:00 and 19:00.

Related

Double for-loop in R takes a very long time

I'm new to R and I just made a double for-loop which takes a very long time to run. Can someone help me change this code to make it run faster?
I have a data set with two columns: a column with the amounts paid (amount) and a column with the day the amount is paid (days). It is possible that several amounts are paid on the same day. With the code, I want to calculate the sum of all amounts paid on each day, so, for example:
day 1: $100
day 2: $150
etc.
I made the following code. As already said, it works but takes a very long time to run.
sum_day <- rep(0, 204)   # one slot for each day 0..203
for (i in 1:7555) {
  for (k in 0:203) {
    if (data$days[i] == k) {
      sum_day[k + 1] <- sum_day[k + 1] + data$bedrag[i]
    }
  }
}
Can someone please help me?
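For what it's worth, a vectorized approach avoids the double loop entirely. A minimal sketch, assuming the columns are called days and bedrag as in the loop above:
# Group the amounts by day and sum each group; days with no payments come back as NA.
sum_day <- tapply(data$bedrag, factor(data$days, levels = 0:203), sum)
sum_day[is.na(sum_day)] <- 0
The result is a vector of length 204 whose names are the day numbers 0 to 203.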

R - Cluster x number of events within y time period

I have a dataset that has 59k entries recorded over 63 years, and I need to identify clusters of events, the criterion being:
6 or more events within 6 hours
Each event has a unique ID, a time (HH:MM:SS) and a date (DD:MM:YY). The output would ideally have a cluster ID, the events that took place within each cluster, and the start and finish time and date.
Thinking about the problem in R, we would need to look at every date/time and count the number of events in the following 6 hours; if the number is 6 or greater, save the event IDs, and if not, move on to the next date and perform the same task. I have taken a data extract that just contains EventID, Date, Time and Year.
https://dl.dropboxusercontent.com/u/16400709/StackOverflow/DataStack.csv
If I come up with anything in the meantime I will post below.
Update: having taken a break to think about the problem, I have a new approach.
Add 6 hours to the date/time of each event, then count the number of events that fall between the start and end times; if there are 6 or more, take the event IDs and assign them a cluster ID. Then move on to the next event and repeat, 59k times, as a loop.
Don't use clustering. It's the wrong tool. And the wrong term. You are not looking for abstract "clusters", but something much simpler and much better defined. In particular, your data is one-dimensional, which makes things a lot easier than the multivariate case that is omnipresent in clustering.
Instead, sort your data and use a sliding window.
If your data is sorted, and time[x+5] - time[x] < 6 hours, then these events satisfy your condition.
Sorting is O(n log n), but highly optimized. The remainder is O(n) in a single pass over your data. This will beat every single clustering algorithm, because they don't exploit your data characteristics.
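A minimal sketch of that sliding-window idea, assuming the events are in a data frame called events with a POSIXct column datetime built from the Date and Time fields (the names are illustrative):
# Sort by time, then check each window of 6 consecutive events:
# if the 6th event is less than 6 hours after the 1st, the whole window is part of a cluster.
events <- events[order(events$datetime), ]
n <- nrow(events)
w <- 6
in_cluster <- rep(FALSE, n)
for (x in 1:(n - w + 1)) {
  if (difftime(events$datetime[x + w - 1], events$datetime[x], units = "hours") < 6) {
    in_cluster[x:(x + w - 1)] <- TRUE
  }
}
# Consecutive flagged events form one cluster; give each run its own ID.
runs <- rle(in_cluster)
cluster_id <- rep(NA_integer_, n)
cluster_id[in_cluster] <- rep(cumsum(runs$values)[runs$values], runs$lengths[runs$values])
Overlapping windows simply merge into one run of flagged events, so a single cluster can contain more than six events.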

Dates in SQLite3, with a twist (inaccurate dates)

I am working on genealogical software that stores its data in SQLite3 format. Everything works fine, except for one minor detail: the accuracy of the birth and death dates (etc.) is not always available to the exact day. So I have the following accuracies:
exact (YYYY-MM-DD)
month (YYYY-MM)
year (YYYY)
year (YYYY+/-5)
year (YYYY+/-10)
year (YYYY+/-50)
decade
century
Now, assuming I store everything in a single column, I end up with a problem. Since SQLite3 has the Julian Day function, I was thinking of encoding the accuracy in the fractional part of the REAL Julian Day (I don't need the hours anyway). That is fine, but it complicates the way SELECTs work; in fact, it means that work I could otherwise offload to SQLite3 has to be implemented in application code.
What would be a reasonable method to store the inaccurate dates and be able to query them quickly?
Note: if it matters to anyone answering, the language used is Python, but I am asking in general.
When doing queries on those date values, the most common operation probably is to check whether a date might match another date.
For this, you always need the start and the end of the interval, so it would make sense to store these two values in the DB.
(Call them Start/End or Min/Max or Earliest/Latest or whatever makes sense.)
For example, to find people who might have been born one century ago:
... WHERE '1913-04-16' BETWEEN BirthDateMin AND BirthDateMax
Inequality comparisons can be done with one of the interval boundaries.
For example, to find people who might have been born more than one century ago:
... WHERE BirthDateMin < '1913-04-16'
Just because you're storing date information doesn't mean that the built-in date type is the right one for you. Your data requirements (date inaccuracy) mean that it's probably more accurate, and better in the long term, to do some custom date handling and avoid the built-in date data types.
Use two columns. One column is the approximate date, as accurate as possible, in SQLite format. The second column is the accuracy of the date in days. If the date is absolutely accurate, the second column is zero. If only the month is known, the date would be mid month and the second column 15 days. Etc. Date comparisons can be done by comparing against the date +/- the accuracy column.

How to download intraday stock market data with R

All,
I'm looking to download stock data either from Yahoo or Google on 15 - 60 minute intervals for as much history as I can get. I've come up with a crude solution as follows:
library(RCurl)
# 1000 days of 15-minute (i = 900 seconds) OHLCV bars for AAPL from Google Finance
tmp <- getURL('https://www.google.com/finance/getprices?i=900&p=1000d&f=d,o,h,l,c,v&df=cpct&q=AAPL')
tmp <- strsplit(tmp, '\n')
tmp <- tmp[[1]]
tmp <- tmp[-c(1:8)]                 # drop the header lines
tmp <- strsplit(tmp, ',')
tmp <- do.call('rbind', tmp)
tmp <- apply(tmp, 2, as.numeric)    # non-numeric fields become NA here
tmp <- tmp[!apply(tmp, 1, function(x) any(is.na(x))), ]   # drop rows containing NAs (including the 'a'-prefixed timestamp rows)
Given the amount of data I'm looking to import, I worry that this could be computationally expensive. I also don't, for the life of me, understand how the time stamps are coded by Yahoo and Google.
So my question is twofold--what's a simple, elegant way to quickly ingest data for a series of stocks into R, and how do I interpret the time stamping on the Google/Yahoo files that I would be using?
I will try to answer the timestamp question first. Please note this is my interpretation and I could be wrong.
Using the link in your example https://www.google.com/finance/getprices?i=900&p=1000d&f=d,o,h,l,c,v&df=cpct&q=AAPL I get the following data:
EXCHANGE%3DNASDAQ
MARKET_OPEN_MINUTE=570
MARKET_CLOSE_MINUTE=960
INTERVAL=900
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=-300
a1357828200,528.5999,528.62,528.14,528.55,129259
1,522.63,528.72,522,528.6499,2054578
2,523.11,523.69,520.75,522.77,1422586
3,520.48,523.11,519.6501,523.09,1130409
4,518.28,520.579,517.86,520.34,1215466
5,518.8501,519.48,517.33,517.94,832100
6,518.685,520.22,518.63,518.85,565411
7,516.55,519.2,516.55,518.64,617281
...
...
Note the first value of the first column, a1357828200; my intuition was that this has something to do with POSIXct. Hence a quick check:
> as.POSIXct(1357828200, origin = '1970-01-01', tz='EST')
[1] "2013-01-10 14:30:00 EST"
So my intuition seems to be correct. But the time seems to be off. Now we have one more piece of info in the data: TIMEZONE_OFFSET=-300. So if we offset our timestamps by this amount, we should get:
as.POSIXct(1357828200-300*60, origin = '1970-01-01', tz='EST')
[1] "2013-01-10 09:30:00 EST"
Note that I didn't know which day's data you had requested, but a quick check on Google Finance reveals that those were indeed the price levels on 10th Jan 2013.
The remaining values in the first column appear to be offsets from the first row's value, in units of the INTERVAL (900 seconds here).
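Putting those two observations together, a short helper can reconstruct the full timestamp column. This is an illustrative sketch (the function name, and the assumption that bare values are offsets in units of INTERVAL, are mine rather than anything documented by the feed):
# ts_col is the raw first column as character, e.g. c("a1357828200", "1", "2", ...)
decode_timestamps <- function(ts_col, interval = 900) {
  is_base <- grepl("^a", ts_col)                          # rows carrying a full Unix timestamp
  base_val <- as.numeric(sub("^a", "", ts_col[is_base]))  # the 'a'-prefixed epoch seconds
  last_base <- cumsum(is_base)                            # index of the most recent base row
  offset <- ifelse(is_base, 0, suppressWarnings(as.numeric(ts_col)))
  as.POSIXct(base_val[last_base] + offset * interval, origin = "1970-01-01", tz = "UTC")
}
# The result is in UTC; use format(x, tz = "America/New_York", usetz = TRUE) to print Eastern time.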
So downloading and standardizing the data ended up being much more of a bear than I figured it would be: about 150 lines of code. The problem is that while Google provides the past 50 trading days of data for all exchange-traded stocks, the time stamps within the days are not standardized: an index of '1', for example, could refer to either the first or the second time increment on the first trading day in the data set. Even worse, stocks that trade at low volumes only have entries where a transaction is recorded. For a high-volume stock like AAPL that's no problem, but for low-volume small caps it means that your series will be missing much if not the majority of the data. This was problematic because I need all the stock series to lie neatly on top of each other for the analysis I'm doing.
Fortunately, there is still a general structure to the data. Using this link:
https://www.google.com/finance/getprices?i=1800&p=1000d&f=d,o,h,l,c,v&df=cpct&q=AAPL
and changing the stock ticker at the end will give you the past 50 trading days in half-hourly increments. POSIX time stamps, very helpfully decoded by @geektrader, appear in the timestamp column at three-week intervals. Though the timestamp indexes don't invariably correspond in a convenient 1:1 manner (I almost suspect this was intentional on Google's part), there is a pattern. For example, for the half-hourly series that I looked at, the first trading day of every three-week increment uniformly has timestamp indexes running in the 1:15 neighborhood. This could be 1:13, 1:14 or 2:15; it all depends on the stock. I'm not sure what the 14th and 15th entries are: I suspect they are either daily summaries or after-hours trading info. The point is that there's no consistent pattern you can bank on. The first stamp in a trading day, sadly, does not always contain the opening data, and the same goes for the last entry and the closing data. I found that the only way to know what actually represents trading data is to compare the numbers to the series on Google Finance.
After days of futilely trying to figure out how to pry a 1:1 mapping pattern from the data, I settled on a "ballpark" strategy. I scraped AAPL's data (a very high-volume stock) and set its timestamp indexes within each trading day as the reference values for the entire market. All days had a minimum of 13 increments, corresponding to the 6.5-hour trading day, but some had 14 or 15; where this was the case I just truncated to the first 13 indexes. From there I used a while loop to progress through the downloaded data of each stock ticker and compare its timestamp indexes within a given trading day to the AAPL timestamps. I kept the overlap, gap-filled the missing data, and cut out the non-overlapping portions.
It sounds like a simple fix, but for low-volume stocks with sparse transaction data there were literally dozens of special cases that I had to bake in, and lots of data to interpolate. I got some pretty bizarre results for a few of these that I know are incorrect. For high-volume, mid- and large-cap stocks, however, the solution worked brilliantly: for the most part the series synced up very neatly with the AAPL data and matched their Google Finance profiles.
There's no way around the fact that this method introduces some error, and I still need to fine-tune it for small caps with sparse data. That said, shifting a series by half an hour or gap-filling a single time increment introduces a very minor amount of error relative to the overall movement of the market and the stock. I am confident that the data set I have is "good enough" to let me get relevant answers to the questions I have. Getting this stuff commercially costs literally thousands of dollars.
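A rough sketch of the alignment step described above (the column and function names are illustrative, not from the actual script): build a day/slot grid from the AAPL reference series, left-join each ticker onto it, and gap-fill the missing slots.
# reference: AAPL bars with columns day and slot (1:13 within each trading day)
# ticker_df: one stock's bars with columns day, slot and close
align_to_reference <- function(reference, ticker_df) {
  grid <- unique(reference[, c("day", "slot")])            # the 13-slot-per-day template
  merged <- merge(grid, ticker_df, by = c("day", "slot"),
                  all.x = TRUE, sort = TRUE)                # keeps the overlap, leaves gaps as NA
  for (i in seq_len(nrow(merged))[-1]) {                    # simple gap fill: carry the last value forward
    if (is.na(merged$close[i])) merged$close[i] <- merged$close[i - 1]
  }
  merged
}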
Thoughts or suggestions?
Why not load the data from Quandl? E.g.
library(Quandl)
Quandl('YAHOO/AAPL')
Update: sorry, I have just realized that Quandl only fetches daily data - but I leave my answer here, as Quandl is really easy to query in similar cases
For the timezone offset, try:
as.POSIXct(1357828200, origin = '1970-01-01', tz=Sys.timezone(location = TRUE))
(The tz will automatically adjust according to your location)

Interval of one month back not working on the 31st?

Essentially, I have a query that is responsible for fetching all records (with specific filters) within the last month. I'm using Oracle's interval keyword and all was working great until today (December 31st, 2009). The code I'm using is
select (sysdate - interval '1' month) from dual
and the error I get is
ORA-01839: date not valid for month specified
How can I use the interval keyword to be compatible with any date? Or if anyone has a better way of approaching the issue, I'm all ears.
Thank you.
try
select add_months(sysdate,-1) from dual
Being pedantic...
The requirements are not specified entirely unambiguously. What does the business mean by "within the last month"? Most people would take that to mean "within the current calendar month", in which case I'd compare against the start of the current month:
TRUNC(SYSDATE,'MM')
Otherwise, perhaps they want an arbitrary period of 1 month prior to the current date - but then how do you define that? As you've found, INTERVAL '1' MONTH simply subtracts one from the month portion of the date - e.g. 15-JAN-2009 - INTERVAL '1' MONTH returns 15-DEC-2008. For some dates this results in an invalid date, because not all months have the same number of days.
ADD_MONTHS resolves this by returning the last day in the month, e.g. ADD_MONTHS(31-DEC-2009,-1) returns 30-NOV-2009.
Another possibility is that the business actually wants to use an average month period - e.g. 365/12 which is approximately 30.4. They might want you to use SYSDATE-30, although of course twelve iterations of this will only cover 360 days of the year.
