I have a function in R that, given n, returns a vector of the last n weekdays. My solution works fine, but it feels inelegant, and I was wondering whether there are any easy ways to improve it.
WeekdayList <- function(n) {
  Today <- as.Date(Sys.time())
  days <- c(Today)
  i <- 1
  while (length(days) < n) {
    NewDay <- as.Date(Today - i)
    if (!weekdays(NewDay) %in% c("Saturday", "Sunday")) {
      days <- c(days, NewDay)
    }
    i <- i + 1
  }
  days
}
WeekdayList(30)
WeekdayList(2)
Exclusion of holidays would be a nice feature too.
Vectorizing code is essential in R. Here is a vectorized version:
WeekdayList2 <- function(n) {
  Today <- as.Date(Sys.time())
  dayz <- seq(Today, Today - 2 * n, "-1 days")
  dayz <- dayz[!(weekdays(dayz) %in% c("Saturday", "Sunday"))]
  dayz <- dayz[seq_len(n)]
  return(dayz)
}
identical(WeekdayList2(1000), WeekdayList(1000))
system.time(WeekdayList2(10000))
system.time(WeekdayList(10000))
[1] TRUE
user system elapsed
0 0 0
user system elapsed
4.90 0.00 4.91
As you can see, even though my function creates a vector almost twice the size it needs to be (and then deletes the weekends), it is much faster than the while loop. My computer cannot even run your function with n = 100000 (not that you'd care about that many days back anyway), but WeekdayList2 runs it almost instantly. Note that the identical() check only holds when today is a weekday, because WeekdayList always includes today.
Since holidays depend on where you are, you'll probably need to supply your own list of holiday dates and add another criterion that filters those dates out.
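A minimal base-R sketch of that idea, assuming the holidays are supplied as a Date vector. The function name weekday_list and its holidays and today arguments are hypothetical. It tests the day of the week with as.POSIXlt()$wday rather than weekdays(), because weekdays() returns day names in the current locale, so comparing against "Saturday" and "Sunday" only works in English locales.

```r
# Hypothetical helper: the last n weekdays up to and including `today`,
# skipping Saturdays, Sundays and any dates listed in `holidays`.
weekday_list <- function(n, holidays = as.Date(character(0)), today = Sys.Date()) {
  # Look back far enough that at least n working days remain after filtering:
  # 2n days plus a week, plus two extra days per supplied holiday.
  span <- 2 * n + 7 + 2 * length(holidays)
  days <- seq(today, by = "-1 day", length.out = span)
  wday <- as.POSIXlt(days)$wday          # 0 = Sunday, 6 = Saturday, locale-independent
  days <- days[!(wday %in% c(0, 6)) & !(days %in% holidays)]
  days[seq_len(n)]
}

# Three working days back from Monday 2024-01-08, treating Friday 2024-01-05 as a holiday
weekday_list(3, holidays = as.Date("2024-01-05"), today = as.Date("2024-01-08"))
```

Passing today explicitly also makes the function reproducible in tests, instead of depending on the current date.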
I added a holiday calculation to Rguy's code.
library(timeDate)  # provides holidayNYSE()
WeekdayList3 <- function(n) {
  Today <- as.Date(Sys.time())
  # look back 2n days plus a 10-day margin so weekends and holidays cannot empty a short window
  dayz <- rev(seq(Today - 2 * n - 10, Today, "days"))
  years <- as.numeric(unique(format(dayz, '%Y')))
  holidays <- as.Date(holidayNYSE(years))
  dayz <- dayz[!(weekdays(dayz) %in% c("Saturday", "Sunday"))]
  dayz <- dayz[!(dayz %in% holidays)]
  dayz <- dayz[seq_len(n)]
  return(dayz)
}
WeekdayList3(100)
Is there a simple R idiom for getting a sequence of all days in a given year? I can do the following which does ok, except for leap years:
dtt <- as.Date( paste( as.character(year), "-1-1", sep="") ) + seq( 0,364 )
I could, obviously, add a line to filter out any values in (year + 1) but I'm guessing there's a much shorter way to do this.
What about this:
R> length(seq( as.Date("2004-01-01"), as.Date("2004-12-31"), by="+1 day"))
[1] 366
R> length(seq( as.Date("2005-01-01"), as.Date("2005-12-31"), by="+1 day"))
[1] 365
R>
This uses nuttin' but base R to compute correctly on dates to give you your vector. If you want higher-level operators, look e.g. at lubridate or even my more rudimentary RcppBDT, which wraps parts of the Boost Date_Time library.
Using Dirk's guidance I've settled on this:
getDays <- function(year){
seq(as.Date(paste(year, "-01-01", sep="")), as.Date(paste(year, "-12-31", sep="")), by="+1 day")
}
I'd be interested to know whether it would be faster to swap the order of the sequencing and the as.Date casting:
# My function getDays
getDays_1 <- function(year) {
  d1 <- as.Date(paste(year, '-01-01', sep = ''))
  d2 <- as.Date(paste(year, '-12-31', sep = ''))
  as.Date(d1:d2, origin = '1970-01-01')
}
# other getDays
getDays_2 <- function(year) {
  seq(as.Date(paste(year, '-01-01', sep = '')),
      as.Date(paste(year, '-12-31', sep = '')),
      by = '+1 day')
}
test_getDays_1 <- function(n = 10000) {
  for (i in 1:n) {
    getDays_1(2000)
  }
}
test_getDays_2 <- function(n = 10000) {
  for (i in 1:n) {
    getDays_2(2000)
  }
}
system.time(test_getDays_1())
#   user  system elapsed
#   4.80    0.00    4.81
system.time(test_getDays_2())
#   user  system elapsed
#   4.52    0.00    4.53
I guess not... it appears that sequencing Date objects is slightly faster than converting a vector of integers to Dates.
I needed something similar, but for a vector of dates I wanted to know the number of days in each date's year. I came up with the following function, which returns a vector of the same length as the input dates.
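library(lubridate)  # provides year()

days_in_year <- function(dates) {
  years <- year(dates)
  # count the days of every year from the earliest to the latest input year
  days <- table(year(seq(as.Date(paste0(min(years), '-01-01')),
                         as.Date(paste0(max(years), '-12-31')),
                         by = '+1 day')))
  # look up the count for each input date's year
  as.vector(days[as.character(years)])
}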
It works similarly to Dirk's solution, but it uses lubridate::year() twice: once on the input dates and once on the generated sequence. Using table() does the same as length(), but for all unique years at once. It might use more memory than strictly necessary if the dates are not in consecutive years.
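If you would rather avoid both the lubridate dependency and the intermediate day sequence, a base-R sketch can read the day-of-year of 31 December, which is 365 or 366. The name days_in_year_base is hypothetical. Because it never builds a sequence between the earliest and latest year, memory use does not depend on how far apart the years are.

```r
# Hypothetical base-R variant: number of days in the year of each input date.
days_in_year_base <- function(dates) {
  years <- format(as.Date(dates), "%Y")
  last_day <- as.Date(paste0(years, "-12-31"))  # 31 December of each date's year
  as.integer(format(last_day, "%j"))            # day-of-year of 31 December: 365 or 366
}

days_in_year_base(as.Date(c("2004-03-01", "2005-07-15", "2000-01-01", "1900-06-30")))
# [1] 366 365 366 365
```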
Suppose you have 50 dollars and a fixed monthly interest rate of 5%. At the end of each year after the first, you add another 50 dollars. What is the total amount of money at the end of a three-year period?
I understand that in R it can simply be calculated as
((50 * 1.05^12 + 50) * 1.05^12 + 50) * 1.05^12  # = 540.64
Is there a way to write a function or loop for this, so that I can calculate it for a larger number of years (for example 10 or 15 years) without typing it out manually?
You can write a simple loop, which may be easier to understand:
get_calc_year_loop <- function(year) {
  ans <- 0
  for (i in seq_len(year)) {
    ans <- (ans + 50) * 1.05^12
  }
  return(ans)
}
get_calc_year_loop(3)
#[1] 540.6386
However, you can also do this without a loop, using Reduce:
get_calc_year <- function(year) {
  Reduce(function(x, y) (x + 50) * 1.05^12, seq_len(year), init = 0)
}
get_calc_year(3)
#[1] 540.6386
We can also do this with reduce() from purrr:
library(purrr)
get_calc_year <- function(year) {
  reduce(seq_len(year), ~ (.x + 50) * 1.05^12, .init = 0)
}
get_calc_year(3)
#[1] 540.6386
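Because the same 50 dollars is added every year and grows by the yearly factor g = 1.05^12, the loop sums a geometric series: 50 * (g + g^2 + ... + g^n) = 50 * g * (g^n - 1) / (g - 1). A sketch of that closed form, which needs no loop and is vectorized over year; the name get_calc_year_closed and the deposit and rate arguments are hypothetical.

```r
# Hypothetical closed-form version: sum of a geometric series, no loop needed.
get_calc_year_closed <- function(year, deposit = 50, rate = 0.05) {
  g <- (1 + rate)^12                     # growth factor over one year
  deposit * g * (g^year - 1) / (g - 1)   # deposit * (g + g^2 + ... + g^year)
}

get_calc_year_closed(3)              # about 540.6386, same as the loop
get_calc_year_closed(c(3, 10, 15))   # several horizons at once
```

Up to floating-point rounding it gives the same values as the loop and Reduce versions; it simply skips the iteration.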
I'm new to R and currently poking at it with a stick until it does what I need. Unfortunately, I've hit a wall with some performance issues.
My problem is that I need a CCI (Commodity Channel Index) indicator calculated on one-minute bars but refreshed every second for the current minute of the iteration.
My implementation works but is incredibly slow: for 4 days of EUR/USD forex data in one-second bars, applying the indicator takes almost 15 minutes.
I have read about preallocation and slow rbind operations, and I already reduced my rbind calls by refactoring the loops. This didn't improve performance, so I assume I'm losing the time elsewhere.
Since I don't know anyone who is proficient in R, I'm posting my code here in the hope of some help.
What I basically do is loop over the second data, aggregate the seconds into minute bars and calculate the CCI; once I have done that for periode minutes, I refresh the last minute bar every second.
addCCIToData <- function(bars, periode) {
  # bars is OHLC based on second periods
  # periode is number of periods for cci calculation
  require(xts)
  require(quantmod)
  bars <- as.xts(bars)
  bars$CCI <- 0
  x <- 1
  ## scope is time of observation == periode
  scope <- list()
  for (i in 1:periode) {
    scope[[i]] <- 1 # save time by preallocating?
  }
  y <- nrow(bars)
  lastminute <- 0
  createdBarCount <- 1
  enoughData <- FALSE
  zeit1 <- as.POSIXlt(time(bars[x]))
  while (x < y) {
    zeit <- as.POSIXlt(time(bars[x]))
    if (zeit$min != lastminute) {
      zeit2 <- zeit
      lastminute <- zeit$min
      zeit1 <- as.POSIXlt(time(bars[x])) # reset zeit1 because of new 1 minute bar
      createdBarCount <- createdBarCount + 1
      if (createdBarCount > periode && enoughData == FALSE) {
        enoughData = TRUE
        i = 2
        dataPeriodeMinus1 = scope[[1]]
        while (i <= periode - 1) {
          dataPeriodeMinus1 = rbind(dataPeriodeMinus1, scope[[i]])
          i = i + 1
        }
        createdBarCount <- periode
      }
      else if (enoughData == TRUE) {
        newScope <- list()
        for (i in 1:(periode - 1)) { # without parentheses, 1:periode-1 is 0:(periode-1)
          newScope[i] <- scope[i + 1]
        }
        scope = newScope
        i = 2
        dataPeriodeMinus1 = scope[[1]]
        while (i <= periode - 1) {
          dataPeriodeMinus1 = rbind(dataPeriodeMinus1, scope[[i]])
          i = i + 1
        }
        createdBarCount <- periode
      }
    }
    a <- as.character(zeit1)
    b <- as.character(as.POSIXlt(time(bars[x])))
    c <- paste(a, b, sep = "::")
    scope[createdBarCount] <- list(to.minutes(OHLC(bars[c]), 1, "CCI")) # merge the seconds to minutes
    if (enoughData == TRUE) {
      data = rbind(dataPeriodeMinus1, scope[[periode]])
      # i = 2
      # while(i <= periode) { #improve!, we need only the last bar to be binded here
      #   data = rbind(data, scope[[i]]) #internet says this is slow
      #   i = i + 1
      # }
      # bars[[x,5]] = SMA(data$CCI.Close, periode)[[periode]][[1]]
      bars[[x, 5]] = CCI(data[, c("CCI.High", "CCI.Close", "CCI.Low")], periode, SMA)[[periode]]
    }
    x <- x + 1
  }
  bars
}
Edit: fixed code.
Edit 2: The test data can be obtained here: Testdata
It can be loaded with load("path/to/file"). Then just call addCCIToData(bars_seconds["2015-01-05 00:00:00::2015-01-05 02:00:00"], 14) after sourcing the function above. I really do think that the continuous merging of seconds into minute bars is the time-consuming part. How can I optimize that?
Edit 3: The CCI calculation also seems to take some time.
For the complete test data set I need:
357 s without the CCI calculation
902 s with the CCI calculation
Thank you very much!
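One way to avoid re-aggregating the seconds with to.minutes() on every iteration is to compute the running (partial) minute bar for all seconds in a single vectorized pass with ave() and cummax()/cummin(). A minimal base-R sketch, assuming the second bars are in a plain data frame sorted by time, with a POSIXct column time and columns Open, High, Low and Close; these column names and the function name running_minute_bars are hypothetical. The CCI itself would still have to be computed from these partial bars together with the preceding completed minutes.

```r
# Hypothetical helper: for every second, the OHLC bar of its minute "so far".
# `df` is assumed to hold one row per second, sorted by time, with a POSIXct
# column `time` and numeric columns Open, High, Low, Close.
running_minute_bars <- function(df) {
  minute <- format(df$time, "%Y-%m-%d %H:%M")  # minute each second belongs to
  data.frame(
    time   = df$time,
    minute = minute,
    # first Open of the minute, repeated for every second in it
    Open   = ave(df$Open, minute, FUN = function(v) rep(v[1], length(v))),
    High   = ave(df$High, minute, FUN = cummax),  # highest High so far in the minute
    Low    = ave(df$Low,  minute, FUN = cummin),  # lowest Low so far in the minute
    Close  = df$Close,                            # latest Close is the bar's Close
    stringsAsFactors = FALSE
  )
}
```

Each row then holds the same bar that to.minutes() would build from the seconds of the current minute up to that point, without subsetting the xts object once per second.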
I am looking for a way to convert cron information into a list of timestamps, using R.
Is there an easy way to do this?
Given the crontab, a start date and an end date, I would like to obtain the list of trigger timestamps between these two dates.
I have not found any package that specifically deals with CRON info, but maybe somebody has already had this problem?
Thanks
I assume you mean the crontab(5) format. I know of no library parsing this information, but the following snippet should get you started:
# Split the single string s at every match of the regular expression p
splitre <- function(p, s) {
  s <- as.character(s)
  stopifnot(length(s) == 1)
  m <- gregexpr(p, s)[[1]]
  if (m[1] == -1) return(s)
  return(substring(s, c(1, m + attr(m, "match.length")), c(m - 1, nchar(s))))
}

# Expand one crontab field (e.g. "*", "1-10/2", "1-2,5") into a one-column data frame of values
ranges <- function(desc, lo, hi, name) {
  res <- integer(0)
  for (range in splitre(",", desc)) {
    m <- regexec("^(?:\\*|(?:(\\d+)(?:-(\\d+))?))(?:/(\\d+))?$", range)
    m <- regmatches(range, m)[[1]]
    m[m == ""] <- NA
    m[1] <- NA
    m <- as.integer(m)
    if (is.na(m[2])) r <- lo:hi
    else if (is.na(m[3])) r <- m[2]
    else r <- m[2]:m[3]
    if (!is.na(m[4])) {
      stopifnot(m[4] > 0)
      r <- r[rep(c(TRUE, rep(FALSE, m[4] - 1)), length.out = length(r))]
    }
    res <- c(res, r)
  }
  res <- data.frame(res)
  names(res) <- name
  return(res)
}

# One row per (minute, hour, day.of.month, month, day.of.week) combination of each crontab line
ct2df <- function(lines) {
  res <- data.frame()
  for (line in lines) {
    if (regexpr("^ *(#|$)", line) == 1) next  # skip comments and blank lines
    parts <- splitre(" +", line)
    stopifnot(length(parts) > 5)
    j <- ranges(parts[1], 0, 59, "minute")
    j <- merge(j, ranges(parts[2], 0, 23, "hour"))
    j <- merge(j, ranges(parts[3], 1, 31, "day.of.month"))
    j <- merge(j, ranges(parts[4], 1, 12, "month"))
    j <- merge(j, ranges(parts[5], 0, 6, "day.of.week"))
    res <- rbind(res, j)
  }
  return(res)
}
print(ct2df("* 1-2,5 1-10/2 */3 1 command"))
This is not perfect: it won't handle names for months or days of the week, and it won't handle the special case of day of month vs. day of week, which requires treating * as more than a simple range:
Note: The day of a command's execution can be specified by two fields - day of month, and day of week. If both fields are restricted (ie, aren't *), the command will be run when either field matches the current time. For example, 30 4 1,15 * 5 would cause a command to be run at 4:30 am on the 1st and 15th of each month, plus every Friday.
The resulting data frame can be turned into a list of timestamps, but I haven't written code for that. Perhaps someone else will, building on this post. The simple but slow method would be to iterate over possible timestamps minute by minute, and for every timestamp check whether a row of the computed data frame matches it. Faster solutions would iterate day by day, use each day to work out the day of week and day of month, and then take the times from all rows matching that day.
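A minimal sketch of the day-by-day approach, assuming a data frame with the columns minute, hour, day.of.month, month and day.of.week as produced by ct2df(). The function name cron_timestamps and its from, to and tz arguments are hypothetical. Like ct2df(), it does not implement the "either field matches" rule: a row fires only on days where month, day of month and day of week all match, which is correct whenever at least one of the two day fields is *.

```r
# Hypothetical helper: expand a cron data frame (columns minute, hour,
# day.of.month, month, day.of.week, e.g. from ct2df()) into trigger
# timestamps between two dates, inclusive, iterating day by day.
# Simplification: month, day of month AND day of week must all match.
cron_timestamps <- function(schedule, from, to, tz = "UTC") {
  days <- seq(as.Date(from), as.Date(to), by = "day")
  out <- list(numeric(0))
  for (k in seq_along(days)) {
    lt <- as.POSIXlt(days[k])                    # calendar fields of this day
    keep <- schedule$month == lt$mon + 1 &       # POSIXlt months run 0-11
            schedule$day.of.month == lt$mday &
            schedule$day.of.week == lt$wday      # 0 = Sunday in both conventions
    hit <- schedule[keep, , drop = FALSE]
    if (nrow(hit) == 0) next
    # ISOdatetime() builds the wall-clock trigger times of this day in `tz`
    out[[length(out) + 1]] <- as.numeric(
      ISOdatetime(lt$year + 1900, lt$mon + 1, lt$mday, hit$hour, hit$minute, 0, tz = tz))
  }
  .POSIXct(sort(unique(unlist(out))), tz = tz)
}

sched <- data.frame(minute = c(30, 0), hour = c(4, 12),
                    day.of.month = 1, month = 1, day.of.week = 1)
cron_timestamps(sched, "2023-12-31", "2024-01-02")
# [1] "2024-01-01 04:30:00 UTC" "2024-01-01 12:00:00 UTC"
```

In practice schedule would be the output of ct2df(); the day loop touches each calendar day once instead of every minute, and unique() removes duplicates when several crontab lines produce the same time.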