Backtest over specific dates in quantstrat (R)

How do I backtest over specific dates, for example 2008::2010, in quantstrat?
I want to load symbols from 2001::2017, but I only want to backtest over a subset of dates (rather than reload the symbols every time for specific date ranges).

There is no built-in way to do this in quantstrat. In fact, there is a comment at the beginning of the apply* functions that says:
#TODO add Date subsetting
(patches welcome)
There are a number of possible ways to do this with the existing code though.
Probably the simplest way is to load all your market data into an environment, and then subset your market data into the .GlobalEnv before each call to applyStrategy.
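A minimal sketch of this environment approach, assuming the xts package (which quantstrat builds on) and a simulated price series in place of real getSymbols() output. The symbol name SPY, the environment data_env and the helper load_window() are hypothetical, and the applyStrategy() call itself is left out. The full 2001-2017 history is loaded once, and only the 2008::2010 window is copied into .GlobalEnv.

```r
library(xts)

# Hypothetical daily closes for one symbol, 2001-2017 (stands in for getSymbols() output)
set.seed(42)
dates <- seq(as.Date("2001-01-01"), as.Date("2017-12-31"), by = "day")
SPY_full <- xts(100 + cumsum(rnorm(length(dates))), order.by = dates)
colnames(SPY_full) <- "Close"

# Keep the full history in its own environment, loaded only once
data_env <- new.env()
assign("SPY", SPY_full, envir = data_env)

# Hypothetical helper: copy only the requested window of each symbol into .GlobalEnv
load_window <- function(symbols, window, from_env) {
  for (s in symbols) {
    full <- get(s, envir = from_env)
    assign(s, full[window], envir = .GlobalEnv)
  }
  invisible(symbols)
}

load_window("SPY", "2008::2010", data_env)
# applyStrategy() would now see only the 2008-2010 rows of SPY
```

Calling load_window() again with another window string (for example "2011::2013") replaces the .GlobalEnv copy without touching data_env, so nothing has to be downloaded again.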
Another option: indicators and signals should use vectorized functions, so they should take (at most) seconds to apply to the entire series. You can therefore run applyIndicators and applySignals manually over the entire series, and then call applyRules with just the subset you want.
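The sketch below, again assuming xts and simulated prices, illustrates one benefit of computing indicators over the full history first: a 50-day moving average (computed here with stats::filter() as a stand-in for whatever applyIndicators() would add to mktdata) has no warm-up NAs inside the 2008::2010 window, whereas computing it on the window alone leaves 49. The object names are hypothetical, and the quantstrat calls themselves are not shown.

```r
library(xts)

# Hypothetical daily closes, 2001-2017
set.seed(1)
dates <- seq(as.Date("2001-01-01"), as.Date("2017-12-31"), by = "day")
prices <- xts(100 + cumsum(rnorm(length(dates))), order.by = dates)

# Vectorised 50-day trailing moving average over the WHOLE history
# (a stand-in for an indicator column that applyIndicators() would add)
sma50 <- stats::filter(as.numeric(coredata(prices)), rep(1 / 50, 50), sides = 1)
mktdata_full <- merge(prices, xts(as.numeric(sma50), order.by = index(prices)))
colnames(mktdata_full) <- c("Close", "SMA50")

# Only the backtest window is handed on to the rule-evaluation step
mktdata_window <- mktdata_full["2008::2010"]

# For comparison: the same indicator computed on the window alone
sma50_window_only <- stats::filter(as.numeric(coredata(prices["2008::2010"])),
                                   rep(1 / 50, 50), sides = 1)
```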
You could also add a signal function that does understand subsets. This signal function would be last in the strategy specification, and would filter all your other signals to 0 outside of your preferred date range.
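A rough sketch of the date-masking step such a filter would perform, assuming the signals sit in an xts object with one 0/1 column per signal. filterSignalsByDate() is a hypothetical helper, not part of quantstrat, and it does not reproduce quantstrat's actual signal-function interface (how a signal function receives mktdata and names its output column).

```r
library(xts)

# Hypothetical helper: set every signal column to 0 outside [from, to]
filterSignalsByDate <- function(signals, from, to) {
  d <- as.Date(index(signals))
  outside <- d < as.Date(from) | d > as.Date(to)
  signals[which(outside), ] <- 0
  signals
}

# Made-up signals spanning the turn of the year
dates <- seq(as.Date("2007-12-30"), by = "day", length.out = 5)
sigs <- xts(cbind(long = c(1, 1, 0, 1, 1), short = c(0, 1, 1, 0, 1)), order.by = dates)

# Keep signals only on 2008-01-01 and 2008-01-02
filtered <- filterSignalsByDate(sigs, "2008-01-01", "2008-01-02")
filtered
```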

Related

How to check difference for data stored in same column in R?

I have a dataset with a column called Person and a column Time. The combination of these columns indicates when an employee completed a task. A person can complete multiple tasks on one day. I want to know the time between two consecutive tasks by the same person, and I want to store it in another column. I will certainly have to add a new column, but can this be done in one step? Or should I first make a column that stores the time of the next task completed by the same person? Any tips on how to do this?
I would tackle this using the dplyr package (though I am sure an equivalent solution exists using the data.table package).
Create a tibble (data frame) with the two columns, Person and Time.
Group the data by Person, and sort the data by Time. This will keep your data grouped by individual people, with each person's tasks in time order.
Then I would use the dplyr mutate command to create a 'TimeSinceLastTask' column. The expression for that column needs the dplyr lead (or lag) function to look up the following (or previous) value of Time, so that the two times can be subtracted.
If you are using times, I'd strongly recommend the use of lubridate to do your time difference calculations (makes it less messy).
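A minimal sketch of those steps, assuming dplyr is installed and using a small made-up task log. It uses base difftime() for the time difference to keep dependencies down; lubridate is an alternative. dplyr::lag() is called with its namespace because stats also has a lag().

```r
library(dplyr)

# Hypothetical task log: who finished a task, and when
tasks <- tibble(
  Person = c("A", "B", "A", "A", "B"),
  Time = as.POSIXct(c("2017-05-01 09:00", "2017-05-01 09:30", "2017-05-01 10:15",
                      "2017-05-01 13:00", "2017-05-01 11:00"), tz = "UTC")
)

result <- tasks %>%
  arrange(Person, Time) %>%   # each person's tasks in time order
  group_by(Person) %>%        # lag() below then looks back only within each person
  mutate(TimeSinceLastTask = as.numeric(
    difftime(Time, dplyr::lag(Time), units = "mins")
  )) %>%
  ungroup()

result
```

Each person's first task gets NA, because there is no earlier task to subtract. Using dplyr::lead() instead, with the subtraction reversed, would give the time until the next task.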
I hope that makes sense. I'm not near an R terminal, so I can't safely create a reprex that works for you (i.e. I could guess, but my blind coding never works first time!)
Hope that helps.
Andrew

What is the difference between a zoo object and a ts object in R?

I want to know the differences between using the ts() and zoo() functions.
A zoo object has the time values (possibly irregular) in an index attribute, which the print.zoo method displays like row names at the console, and the values in a matrix or atomic vector. That places constraints on the values that can be used: generally numeric, but necessarily all of a single mode (i.e. not a list with multiple modes like a data frame might hold). With the zoo package loaded, you can get a list of the functions that have zoo methods:
library(zoo)
methods(class="zoo")
The yearmon class is provided to allow monthly date indices. You can see its range of methods:
methods(class="yearmon")
The xts class is an important extension of zoo, but it needs an additional package (xts). There are many worked examples of zoo and xts functions on SO.
A ts object has values of a single mode with attributes that always imply regular observations, and those attributes support a recurring cycle such as years and months. Rather than storing the index item by item or row by row, the index is calculated on the fly from the 'start', 'end' and 'frequency' values stored as attributes and accessible with functions of those names. The list of functions for ts objects is noticeably shorter (and most people find them more difficult to work with):
methods(class="ts")
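A short sketch contrasting the two, assuming the zoo package and made-up values. The ts object keeps only start, end and frequency (its tsp attribute) and generates time() on demand. The zoo object stores an explicit, here irregular, Date index. The last line builds a yearmon-indexed zoo series from the ts time values.

```r
library(zoo)

# ts: regular series; the time index is derived from start and frequency
monthly_ts <- ts(c(10, 12, 11, 13), start = c(2017, 1), frequency = 12)
tsp(monthly_ts)    # start, end and frequency, stored as one attribute
time(monthly_ts)   # index generated on the fly

# zoo: the (possibly irregular) index is stored explicitly
irregular_zoo <- zoo(c(10, 12, 11),
                     order.by = as.Date(c("2017-01-03", "2017-01-10", "2017-02-28")))
index(irregular_zoo)

# yearmon gives zoo a monthly index; here it is built from the ts time values
monthly_zoo <- zoo(as.numeric(monthly_ts),
                   order.by = as.yearmon(as.numeric(time(monthly_ts))))
```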
There was also an its-package for irregular time series, but it was distinctly less popular than the zoo-package and has apparently been abandoned.

How to automate a process by pulling elements from a data frame in R -looping with a string?

I am trying to automate a process instead of computing PPCC values individually for a large number of test cases. The details of my functions do not matter (though for reference I'm using lmomco); my issue is either putting this into a loop or somehow using plyr or apply to repeat it over and over. I do not know how to automate the string. For example, I have sorted data by the "M" parameter:
testx.100cv1<-by(x.cv1$first_year,x.cv1$M,sort)
I then apply a function here:
testexp<-lapply(testx.100cv1,parexp)
Now I want to do something to each "M", where in the example below, M = 1.02. Right now, I am manually changing this value and then recomputing for every M (and I have a lot of them). I'm looking for a way to write this M value into a loop so it reads it automatically.
exp<-quaexp(plotpos,testexp$'1.02')
PPCCexp<-cor(exp,testx.100cv1$'1.02')
I want to compute PPCC values for many distributions, so without automating, this will take over my life for a week.
Thanks!
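One way to remove the hard-coded '1.02' is to compute the PPCC inside a function and apply it to every group at once, so that the names of the result are the M values. This sketch assumes simulated data in place of x.cv1 and uses split() plus lapply() in place of the question's by() call. It also replaces the lmomco steps with base-R stand-ins (an exponential fitted by its mean, qexp() quantiles and ppoints() plotting positions), so it is not the lmomco calculation; ppcc_exp() is a hypothetical name.

```r
# Hypothetical data standing in for x.cv1 (three values of M, 20 observations each)
set.seed(1)
x.cv1 <- data.frame(
  first_year = rexp(60, rate = 0.5),
  M = rep(c(1.02, 1.05, 1.10), each = 20)
)

# Sorted first_year values per M; the list names are the M values ("1.02", ...)
testx.100cv1 <- lapply(split(x.cv1$first_year, x.cv1$M), sort)

# Base-R stand-in for the lmomco steps (parexp/quaexp are NOT used here):
# fit an exponential by matching the mean, then correlate theoretical quantiles with the data
ppcc_exp <- function(sorted_x) {
  plotpos <- ppoints(length(sorted_x))                     # plotting positions
  theoretical <- qexp(plotpos, rate = 1 / mean(sorted_x))  # fitted quantiles
  cor(theoretical, sorted_x)
}

# One PPCC per M, with no hard-coded '1.02'
PPCCexp <- sapply(testx.100cv1, ppcc_exp)
PPCCexp
```

To use the real lmomco fit, the body of ppcc_exp() could be replaced with the question's own parexp()/quaexp() calls. In an explicit loop over names(testx.100cv1), each group is then addressed as testx.100cv1[[m]] rather than testx.100cv1$'1.02'.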

Index xts using string and return only observations at that exact time

I have an xts time series in R and am using the very handy function to subset the time series based on a string, for example
time_series["17/06/2006 12:00:00"]
This will return the nearest observation to that date/time - which is very handy in many situations. However, in this particular situation I only want to return the elements of the time series which are at that exact time. Is there a way to do this in xts using a nice date/time string like this?
In a more general case (I don't have this problem immediately now, but suspect I may run into it soon) - is it possible to extract the closest observation within a certain period of time? For example, the closest observation to the given date/time, assuming it is within 10 minutes of the given date/time - otherwise just discard that observation.
I suspect this more general case may require me writing a function to do this - which I am happy to do - I just wanted to check whether the more specific case (or the general case) was already catered for in xts.
AFAIK, the only way to do this is to use a subset that begins at the time you're interested in, then get the first observation of that.
e.g.
first(time_series["2006-06-17 12:00:00/2006-06-17 12:01"])
or, more generally, to get the 12:00 price every day, you can subset down to 1 minute of each day, then split by days and extract the first observation of each.
do.call(rbind, lapply(split(time_series["T12:00:00/T12:01"],'days'), first))
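A small runnable version of both snippets, assuming the xts package and a made-up series of one-minute observations around noon on two days, with values 1 to 10. xts::first() is called with its namespace in case another loaded package, such as dplyr, masks first().

```r
library(xts)

# Hypothetical one-minute observations from 11:58 to 12:02 on two consecutive days
stamps <- as.POSIXct(
  paste(rep(c("2006-06-17", "2006-06-18"), each = 5),
        rep(c("11:58", "11:59", "12:00", "12:01", "12:02"), times = 2)),
  tz = "UTC"
)
time_series <- xts(seq_along(stamps), order.by = stamps)

# The observation at 12:00 on one day: the first one in a short window starting then
noon_first <- xts::first(time_series["2006-06-17 12:00:00/2006-06-17 12:01"])

# The 12:00 observation on every day: time-of-day window, split by day, first of each
noon_daily <- do.call(rbind, lapply(split(time_series["T12:00:00/T12:01"], "days"),
                                    xts::first))
noon_daily
```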
Here's a thread where Jeff (the xts author) contemplates adding the functionality you want
http://r.789695.n4.nabble.com/Find-first-trade-of-day-in-xts-object-td3598441.html#a3599887

Operating with time intervals like 08:00-08:15

I would like to import a time-series where the first field indicates a period:
08:00-08:15
08:15-08:30
08:30-08:45
Does R have any features to do this neatly?
Thanks!
Update:
The most promising solution I found, as suggested by Godeke, was the chron package and using substring() to extract the start of the interval.
I'm still working on related issues, so I'll update with the solution when I get there.
CRAN has an actively updated package called "chron" that handles dates and times. You might want to check that and some of the other packages listed here: http://cran.r-project.org/web/views/TimeSeries.html
xts and zoo handle irregular time series data on top of that. I'm not familiar with these packages, but a quick look over indicates you should be able to use them fairly easily by splitting on the hyphen and loading into the structures they provide.
So you're given a character vector like c("08:00-08:15", "08:15-08:30") and you want to convert it to an internal R data type for consistency? Check out the help files for POSIXt and strftime.
How about a function like this:
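importTimes <- function(t) {
  # split each "HH:MM-HH:MM" string on the hyphen into start and end
  t <- strsplit(t, "-")
  # the inputs have no seconds, so parse with "%H:%M" ("%H:%M:%S" would give NA)
  lapply(t, strptime, format = "%H:%M")
}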
This will take a character vector like the one you described and return a list of the same length, each element of which is a POSIXlt vector of length 2 giving the start and end times (on today's date). If that is an issue, you could paste a fixed date such as "1970-01-01" onto each time inside the function (adding %Y-%m-%d to the format) to standardize the date.
Does that help at all?
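A base-R sketch along the lines of the answers above: split each period on the hyphen, then pin both halves to a fixed date (1970-01-01) so the result does not depend on today's date. Using POSIXct in UTC is an assumption chosen to avoid daylight-saving surprises, and the helper to_time() is hypothetical.

```r
# Hypothetical input in the question's format
periods <- c("08:00-08:15", "08:15-08:30", "08:30-08:45")

# Split on the hyphen: column 1 = interval start, column 2 = interval end
parts <- do.call(rbind, strsplit(periods, "-", fixed = TRUE))

# Pin every time to the same fixed date so results do not depend on today's date
to_time <- function(hhmm) {
  as.POSIXct(paste("1970-01-01", hhmm), format = "%Y-%m-%d %H:%M", tz = "UTC")
}
interval_start <- to_time(parts[, 1])
interval_end <- to_time(parts[, 2])

data.frame(start = interval_start, end = interval_end,
           minutes = as.numeric(difftime(interval_end, interval_start, units = "mins")))
```

Keeping the start times as POSIXct makes them usable directly as an index for zoo or xts, as suggested earlier.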
