Use of adjusted vs. unadjusted prices for stock strategy backtesting? - quantmod

This is more of a methodological (rather than a programming) issue, yet it feels like SO is the right place for it. Following the ups and downs after Yahoo changed its defaults for fetching daily data in May 2017 (discussed on https://github.com/joshuaulrich/quantmod/issues/174, http://blog.fosstrading.com/2017/06/yahoo-finance-alternatives.html and also on SO in "Why Open, High, Low prices are wrong when using quantmod?"), I am probably not the only one who is not 100% certain which data to use in a backtesting procedure, and whether quantmod's getSymbols.yahoo and adjustOHLC still provide the relevant data for quality backtesting.
Quantmod 0.4.11 also includes AlphaVantage as an (adjusted) stock data provider, but I am not familiar with their reliability.
How should the (stock and index) data obtained from getSymbols calls be prepared? Which data should be used: (split- and dividend-) adjusted or unadjusted? Which transformations do you use? The adjustOHLC function also contains a bug, as its output is not split-adjusted (easily seen on AAPL by calling
getSymbols("AAPL")
chart_Series(adjustOHLC(AAPL))
and observing a jump in 2014.

You should always use adjusted prices. When a data provider doesn't offer a separate adjusted-price series, its close prices are usually already adjusted. There is no point in running backtests on raw close prices. I once made the mistake of downloading close prices instead of adjusted ones, and at the end of the backtest my strategy told me that among all S&P constituents MasterCard was the worst performer. After looking at the moving-average chart it was obvious why.
Because of a split on January 22, 2014, my data had a single return below -90%! In conclusion, raw close data for backtesting might give you utterly false results.
How to deal with splits
Divide every price before the split by the split ratio. For example, MasterCard had a 10-for-1 split, so you should divide every price before the January 22, 2014 ex-date by 10. It's very easy to find splits in the data: just look for returns around or below -50%.
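A minimal sketch of that adjustment in R, using the MasterCard example above (it assumes the series holds raw, unadjusted closes):
library(quantmod)
getSymbols("MA", from = "2013-06-01", to = "2014-06-01")
split.date <- as.Date("2014-01-22")   # ex-date of the 10-for-1 split
ma.close <- Cl(MA)
# divide every pre-split price by the split ratio
ma.close[index(ma.close) < split.date] <- ma.close[index(ma.close) < split.date] / 10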
Dividends
Subtract the dividend amount from every price before the dividend (ex-dividend) day. To find dividend days you need a dividend calendar; it's impossible to work them out from the prices alone.
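In practice, quantmod and TTR already ship helpers for both adjustments; a hedged sketch (it assumes Yahoo still serves the split and dividend series, and quantmod >= 0.4-9 for the split.adjust argument):
library(quantmod)
getSymbols("MA")
splits <- getSplits("MA")
divs <- getDividends("MA", split.adjust = FALSE)
ratios <- adjRatios(splits, divs, Cl(MA))   # TTR::adjRatios, the same helper adjustOHLC uses
ma.adjusted <- Cl(MA) * ratios[, "Split"] * ratios[, "Div"]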

Related

Difference between use = "txns" and use = "trades" in tradeStrats()?

I'm backtesting a trading strategy using the quantstrat package. When generating trade statistics with the tradeStats() function, what does the use = argument change when set to "txns" instead of "trades"?
The tradeStats function is in blotter; you can read the source here: https://github.com/braverock/blotter/blob/master/R/tradeStats.R. When using "txns", the PnL is derived from the portfolio object, which is typically marked daily to the close price. When using "trades", the PnL is based on the round-trip trade (at least one buy txn and one sell txn, for example), which could be longer or shorter than the period used in the txn PnL calculation. For intraday and other high-frequency strategies you probably want "trades", and for lower-frequency strategies that hold over periods typically spanning more than one day you probably want "txns", so that you get the analysis based on a daily mark-to-market of your portfolio. HTH.
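For example, both views can be computed on the same portfolio and compared side by side (the portfolio name here is illustrative):
txn.stats <- tradeStats(Portfolios = "myPortfolio", use = "txns")
trade.stats <- tradeStats(Portfolios = "myPortfolio", use = "trades")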

End equity (and other trading stats) update issue in portfolio - blotter

I have multiple symbols in a portfolio, but when running a blotter trading strategy the end equity only updates for the last symbol that has been run. When looking into how the equity updates with each transaction, it seems that when a new symbol is introduced the equity goes back to its original value set at the beginning (1 mil).
This is how the portfolio is getting updated for a symbol month by month:
updatePortf(portfolioName,Symbols=symbolName, Dates=currentDate)
updateAcct(accountName,Dates=currentDate)
updateEndEq(accountName, currentDate)
Why is this happening?
Hope my question makes sense and thank you in advance
This is a good question. If you look at applyStrategy you'll see that each symbol in the loop is run in isolation, independently. You may want to check out applyStrategy.rebalancing, which does a nested loop of the form:
for(i in 2:length(pindex)){
    # the proper endpoints for each symbol will vary, so we need to get
    # them separately, and subset each one
    for (symbol in symbols){
        # sret <- ret[[portfolio]]
This means it loops over a section of timestamps, and then over each symbol, which is what you want when you need some interaction between symbols in terms of equity (applyStrategy simply does an outer loop over symbols with an inner loop by timestamp, so you'll never get interactions).
When I first started using quantstrat I had the same frustration. My solution was to modify applyStrategy.rebalancing to become a (slower) double loop: for each timestamp, an inner loop across each symbol.
Yes, this means you can't directly compute portfolio PL accurately in quantstrat, so things like opening positions sized as a proportion of current portfolio equity can't be done directly. (But you can modify the code to do it if you want.)
Why does quantstrat behave this way by default? The authors will give you good reasons. In short, my view (after some brief discussions with the authors) is that if a signal has predictive power and gives you an edge in a strategy, it'll work regardless of how you combine it with other symbols later. quantstrat is about identifying whether signals are good or not in relation to the mktdata you pass to it.
Logically, if a signal is good on a per-symbol level, then it will likely do OK on a portfolio level too (if not better, with smoother portfolio PL). quantstrat's current approach gives you a reasonable approximation of what portfolio PL will look like, but not in a true "compounding return" sense. To do that, you'd want to scale your positions according to the current portfolio PL (which isn't possible in applyStrategy, as noted above). Running the strategy per symbol also makes simulations much, much faster. Note that you can still introduce interactions with other symbols in applyStrategy by adding extra columns to a symbol's data that relate to other symbols (e.g. in pairs trading), as sketched below.
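A minimal sketch of that column-merging idea (the symbols and the column name are illustrative):
library(quantmod)
getSymbols(c("AAPL", "SPY"))
spy.close <- Cl(SPY)
colnames(spy.close) <- "SPY.Close"
# the extra column travels with AAPL's mktdata, so indicators and signals can reference it
AAPL <- merge(AAPL, spy.close)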
At the end of the day, backtest results are always simplifications of trading in reality, so there isn't a big motivation to chase "super" accurate backtest results that project profit or trading revenue precisely.

Extracting data from one data set using another in R

I am trying to extract data from one data set (containing water quality data: chlorophyll, dissolved oxygen, temp, etc.) using information from another data set that contains tidal information (low-tide times).
Background: it has recently come to my attention that, due to hydrodynamics, it is best to only look at WQ data points measured at low tide, whereas I had previously just taken the daily average.
Is there a way I can extract specific WQ data based on whether it aligns with the date/time of the tidal data? One caveat: the times might not match up exactly; the WQ data was measured every 15 minutes, so I need the closest point(s) to the low-tide time.
It is difficult to give exact code without knowing the frequency of your tidal data. However, you can take a look at the following links, which show how to match the timestamps in both of your datasets by rounding them to the nearest hour/half hour/quarter hour (as the case may be):
rounding times to the nearest hour in R
Rounding time to nearest quarter hour
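Alternatively, a rolling join handles the nearest-timestamp matching directly; a sketch, assuming data frames wq and tides that each have a POSIXct column named datetime:
library(data.table)
wq <- as.data.table(wq)
tides <- as.data.table(tides)
setkey(wq, datetime)
setkey(tides, datetime)
nearest <- wq[tides, roll = "nearest"]   # for each low-tide time, the closest WQ observation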
Hope this helps.

Having difficulty using R programming to implement a trading strategy using multiple securities

I am currently attempting to implement a trading idea that I have been playing around with. It consists of 50+ securities and a strategy very similar to this one (the package I am currently using is quantmod):
http://www.r-bloggers.com/backtesting-a-simple-stock-trading-strategy/
For those who aren't interested in clicking: it is a strategy that looks at the past X days (in his case 200) and enters a position depending on the peak reached in the stock. I understand how to apply this strategy to my idea, but I cannot grasp how to aggregate my data into one summary.
Is there a way I can consolidate the summary for all the positions I have entered into one larger portfolio summary and chart that against the S&P 500?
Any advice on where I can find resources, or pointers to the relevant information, would be appreciated. I have looked at the portfolio analysis packages for R and I do not believe they will be much help to me.
Thank you in advance.
Edit: in the link, at the bottom, there are 3 indexes: FTSE, N225, DJIA. Could I combine those 3 summaries to show the same output as below, but combined?
FTSE:
Me Index
Cumulative Return 3.56248582 3.8404476
Annual Return 0.05667121 0.0589431
Annualized Sharpe Ratio 0.45907768 0.3298633
Win % 0.53216374 0.5239884
Annualized Volatility 0.12344579 0.1786895
Maximum Drawdown -0.39653398 -0.5256991
Max Length Drawdown 1633.00000 2960.0000
Could I get that same output but for the 3 securities' data combined? Is there an effective way of doing that? Thank you so much. Happy holidays.
It's a little unclear to me what you mean by "combine" in this case. If you want a single column representing the combined returns from all three exchanges as if they were a single unified market, that's really tricky, because the exchanges trade in different currencies (British pounds, U.S. dollars, and Japanese yen). The underlying analysis would have to be modified substantially to take fluctuating daily foreign exchange rates into account.
I suspect that this is NOT what you want. Rather, you are simply asking how to take three sequential two-column outputs and turn them into a single parallel six-column output.
If that is indeed what you want, then you need to rewrite the testStrategy() function shown near the bottom of the link. As currently written, that function takes three inputs: an index name myStock (with allowed values of FTSE, DJIA, or N225) and two integer values, nHold and nHigh. You would need to change it so that it instead accepts five inputs, e.g. myStockA, myStockB and myStockC, plus the two integer values already mentioned. Then each of the lines currently referring to myStock would have to be replicated three times. Finally, the two cbind() lines at the bottom would have to be modified so that, instead of merging the data into only two columns, you include all six.
For a good intro tutorial on how to write and modify your own R functions, please see this. To understand how to use the cbind() function, which you will have to call with six rather than two inputs, please see this.
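If you'd rather keep testStrategy() untouched, the gluing step alone might look like this sketch (it assumes, as in the blog post, that testStrategy() returns a two-column xts of strategy and index returns; the nHold/nHigh values are illustrative):
res.ftse <- testStrategy("FTSE", nHold = 1, nHigh = 200)
res.djia <- testStrategy("DJIA", nHold = 1, nHigh = 200)
res.n225 <- testStrategy("N225", nHold = 1, nHigh = 200)
combined <- cbind(res.ftse, res.djia, res.n225)   # six columns, one strategy/index pair per market
colnames(combined) <- c("FTSE.Me", "FTSE.Index", "DJIA.Me", "DJIA.Index", "N225.Me", "N225.Index")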

How to download intraday stock market data with R

All,
I'm looking to download stock data, either from Yahoo or Google, at 15-60 minute intervals for as much history as I can get. I've come up with a crude solution as follows:
library(RCurl)
# 15-minute (i=900) bars for the past 1000 days from Google Finance
tmp <- getURL('https://www.google.com/finance/getprices?i=900&p=1000d&f=d,o,h,l,c,v&df=cpct&q=AAPL')
tmp <- strsplit(tmp, '\n')
tmp <- tmp[[1]]
tmp <- tmp[-c(1:8)]      # drop the header block
tmp <- strsplit(tmp, ',')
tmp <- do.call('rbind', tmp)
tmp <- apply(tmp, 2, as.numeric)
tmp <- tmp[!apply(tmp, 1, function(x) any(is.na(x))), ]   # drop rows containing NAs
Given the amount of data I'm looking to import, I worry that this could be computationally expensive. I also don't, for the life of me, understand how the time stamps are coded in Yahoo and Google.
So my question is twofold: what's a simple, elegant way to quickly ingest data for a series of stocks into R, and how do I interpret the time stamps in the Google/Yahoo files that I would be using?
I will try to answer the timestamp question first. Please note this is my interpretation and I could be wrong.
Using the link in your example, https://www.google.com/finance/getprices?i=900&p=1000d&f=d,o,h,l,c,v&df=cpct&q=AAPL, I get the following data:
EXCHANGE%3DNASDAQ
MARKET_OPEN_MINUTE=570
MARKET_CLOSE_MINUTE=960
INTERVAL=900
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=-300
a1357828200,528.5999,528.62,528.14,528.55,129259
1,522.63,528.72,522,528.6499,2054578
2,523.11,523.69,520.75,522.77,1422586
3,520.48,523.11,519.6501,523.09,1130409
4,518.28,520.579,517.86,520.34,1215466
5,518.8501,519.48,517.33,517.94,832100
6,518.685,520.22,518.63,518.85,565411
7,516.55,519.2,516.55,518.64,617281
...
...
Note the first value of the first column, a1357828200. My intuition was that this has something to do with POSIXct, hence a quick check:
> as.POSIXct(1357828200, origin = '1970-01-01', tz='EST')
[1] "2013-01-10 14:30:00 EST"
So my intuition seems to be correct, but the time seems to be off. Now, there is one more piece of info in the data: TIMEZONE_OFFSET=-300. So if we offset our timestamps by this amount, we should get:
as.POSIXct(1357828200-300*60, origin = '1970-01-01', tz='EST')
[1] "2013-01-10 09:30:00 EST"
Note that I didn't know which day's data you had requested, but a quick check on Google Finance reveals those were indeed the price levels on 10 Jan 2013.
The remaining values in the first column appear to be offsets from the first row's timestamp, counted in units of INTERVAL (900 seconds here).
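If that reading is right, full timestamps can be rebuilt from the offsets; a sketch using the base and interval values from the response above:
base <- 1357828200                     # the "a"-prefixed Unix timestamp
interval <- 900                        # the INTERVAL header value
offsets <- c(0, 1, 2, 3, 4, 5, 6, 7)   # first-column values, with 0 standing in for the base row
as.POSIXct(base + offsets * interval, origin = '1970-01-01', tz = 'EST')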
So downloading and standardizing the data ended up being much more of a bear than I figured it would: about 150 lines of code. The problem is that while Google provides the past 50 trading days of data for all exchange-traded stocks, the time stamps within the days are not standardized: an index of '1', for example, could refer to either the first or the second time increment on the first trading day in the data set. Even worse, stocks that trade at low volumes only have entries where a transaction is recorded. For a high-volume stock like AAPL that's no problem, but for low-volume small caps it means that your series will be missing much, if not most, of the data. This was problematic because I need all the stock series to lie neatly on top of each other for the analysis I'm doing.
Fortunately, there is still a general structure to the data. Using this link:
https://www.google.com/finance/getprices?i=1800&p=1000d&f=d,o,h,l,c,v&df=cpct&q=AAPL
and changing the stock ticker at the end will give you the past 50 trading days at half-hourly increments. POSIX time stamps, very helpfully decoded by @geektrader, appear in the timestamp column at three-week intervals. Though the timestamp indexes don't invariably correspond in a convenient 1:1 manner (I almost suspect this was intentional on Google's part), there is a pattern. For example, for the half-hourly series that I looked at, the first trading day of every three-week increment uniformly has timestamp indexes running in the 1:15 neighborhood. This could be 1:13, 1:14, or 2:15; it all depends on the stock. I'm not sure what the 14th and 15th entries are: I suspect they are either daily summaries or after-hours trading info. The point is that there's no consistent pattern you can bank on. The first stamp in a trading day, sadly, does not always contain the opening data, and the same goes for the last entry and the closing data. I found that the only way to know what actually represents the trading data is to compare the numbers to the series on Google Finance.
After days of futilely trying to figure out how to pry a 1:1 mapping pattern from the data, I settled on a "ballpark" strategy. I scraped AAPL's data (a very high-volume stock) and set its timestamp indexes within each trading day as the reference values for the entire market. All days had a minimum of 13 increments, corresponding to the 6.5-hour trading day, but some had 14 or 15. Where this was the case I just truncated by taking the first 13 indexes. From there I used a while loop to progress through the downloaded data of each stock ticker and compare its timestamp indexes within a given trading day to the AAPL timestamps. I kept the overlap, gap-filled the missing data, and cut out the non-overlapping portions.
Sounds like a simple fix, but for low-volume stocks with sparse transaction data there were literally dozens of special cases that I had to bake in, and lots of data to interpolate. I got some pretty bizarre results for a few of these that I know are incorrect. For high-volume, mid- and large-cap stocks, however, the solution worked brilliantly: for the most part the series synced up very neatly with the AAPL data and matched their Google Finance profiles perfectly.
There's no way around the fact that this method introduces some error, and I still need to fine-tune it for sparse small caps. That said, shifting a series by a half hour or gap-filling a single time increment introduces a very minor amount of error relative to the overall movement of the market and the stock. I am confident that the data set I have is "good enough" to let me get relevant answers to some questions that I have. Getting this stuff commercially costs literally thousands of dollars.
Thoughts or suggestions?
Why not load the data from Quandl? E.g.
library(Quandl)
Quandl('YAHOO/AAPL')
Update: sorry, I have just realized that Quandl only fetches daily data, but I'll leave my answer here, as Quandl is really easy to query in similar cases.
For the timezone offset, try:
as.POSIXct(1357828200, origin = '1970-01-01', tz=Sys.timezone(location = TRUE))
(The tz will automatically adjust according to your location)
