Rblpapi BDH to get historical fundamental data - r

My objective is to get fundamental data from Bloomberg via Rblpapi. Say you wanted to compare QoQ and YoY revenue per share for AMD stock - in last reporting period (date:12/26/15) to 1yr before (date:12/27/14).
# To get data for last reporting period you could
last_report_dt = bdp ("AMD US Equity", "MOST_RECENT_PERIOD_END_DT")
rev_yrly_cur = bdh("AMD US Equity","REVENUE_PER_SH",last_report_dt,last_report_dt, opt=c("periodicitySelection"="YEARLY"))
rev_qtrly_cur = bdh("AMD US Equity","REVENUE_PER_SH",last_report_dt,last_report_dt, opt=c("periodicitySelection"="QUARTERLY"))
Question is how to get the reporting date for the year before (12/27/2014) programmatically (I have many tickers) so I can get revenue for that period and compare.
Any suggestions or workarounds welcome?

Try something along the lines of:
bdp("AMD US Equity","REVENUE_PER_SH", override_fields = "EQY_FUND_RELATIVE_PERIOD", override_values = "-1FY")
This requests the value for the previous fiscal year. Other values you can pass for the override include "-1FQ" and "-1CQ", meaning the previous fiscal quarter and the previous calendar quarter, respectively.
Also, if you want to test easily you can use Excel API or FLDS on the Bloomberg Terminal. The formula to test this with Excel API is:
=BDP($E8,F$7,"DX243=-3FQ")

In Rblpapi, the overrides argument is the solution:
bdp("AMD US Equity","REVENUE_PER_SH",overrides=c("EQY_FUND_RELATIVE_PERIOD"="-1FQ"))
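For many tickers, the same override can be applied to a vector of securities and the two periods compared directly. What follows is only a sketch under stated assumptions, not a tested Bloomberg recipe: `pct_change` and `fetch_rev_per_share` are hypothetical helper names, the second ticker is a placeholder, and the period strings in the usage comments simply extend the "-1FQ" pattern from the answers above, so check them against FLDS before relying on them.

```r
# Percentage change between two values (pure helper, no Bloomberg needed).
pct_change <- function(current, previous) {
  (current - previous) / previous * 100
}

# Hypothetical wrapper: revenue per share for a relative fiscal period.
# Needs the Rblpapi package and a live session opened with blpConnect().
fetch_rev_per_share <- function(tickers, period) {
  res <- Rblpapi::bdp(tickers, "REVENUE_PER_SH",
                      overrides = c("EQY_FUND_RELATIVE_PERIOD" = period))
  # Name the values by ticker so two periods can be matched safely.
  setNames(res$REVENUE_PER_SH, rownames(res))
}

# Usage on a Bloomberg terminal (not run here; period strings are assumptions):
# Rblpapi::blpConnect()
# tickers  <- c("AMD US Equity", "INTC US Equity")
# recent   <- fetch_rev_per_share(tickers, "-1FQ")
# year_ago <- fetch_rev_per_share(tickers, "-5FQ")
# pct_change(recent[tickers], year_ago[tickers])

# Toy numbers only, to show the arithmetic:
toy_change <- pct_change(110, 100)
```

Indexing both results by `tickers` before comparing avoids relying on the row order that the terminal happens to return.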

Related

R crypto2: How to retrieve global market cap historical data?

I would like to retrieve the historical global market cap from Coinmarketcap.com with the help of the crypto2 R package. Unfortunately, I haven't found a corresponding function in the package reference.
I know that there is an alternative package called coinmarketcapr that provides an explicit function to do exactly that, but it requires an official API key and a costly Coinmarketcap subscription in order to retrieve all historical data. In contrast, the crypto2 package scrapes the data directly and doesn't go through the API.
I also know that it is possible to call the crypto_history function in crypto2 for all coins and aggregate the results, but scraping every coin just to get the global market cap takes hours of computing time.
This is the code that I am using so far:
library(crypto2)
crypto_history(
coin_list = NULL,
convert = "USD",
start_date = as.Date("2020-01-01"),
end_date = as.Date(Sys.Date()),
interval = "daily",
sleep = 60,
finalWait = TRUE
)
The code above returns a tibble/data frame with all coins, including some fundamental data such as OHLC prices and the market cap, which needs to be aggregated across all individual coins to obtain the overall market cap.
Any help would be highly appreciated!
Of course, I will use the data for educational/non-commercial uses only.
Thank you very much in advance for your help!
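The aggregation step described in the question can be sketched on a toy data frame. This is only an illustration: the column names `timestamp`, `symbol` and `market_cap` are assumptions about the layout of the crypto_history() result, and the values are made up. It does not avoid the long scrape; it only shows the summing once the data is in hand.

```r
# Toy stand-in for the tibble returned by crypto_history(); the column
# names are assumed, and real timestamps may need as.Date() first.
coins <- data.frame(
  timestamp  = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02")),
  symbol     = c("AAA", "BBB", "AAA", "BBB"),
  market_cap = c(100, 50, 110, NA)
)

# Sum market cap across coins for each day. The formula interface drops
# rows with a missing market cap before summing.
global_cap <- aggregate(market_cap ~ timestamp, data = coins, FUN = sum)
global_cap
```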

Difference between use = "txns" and use = "trades" in tradeStats()?

I'm backtesting a trading strategy with the quantstrat package. When generating trade statistics with the tradeStats() function, what does the use argument change when it is set to "txns" instead of "trades"?
The tradeStats function is in blotter; you can read the source here: https://github.com/braverock/blotter/blob/master/R/tradeStats.R. When using "txns", the PnL is derived from the portfolio object, which is typically marked daily to the close price. When using "trades", the PnL is based on the round-trip trade (at least one buy txn and one sell txn, for example), which could be longer or shorter than the period used in the txn PL calculation. For intraday and other high-frequency strategies you probably want to use "trades", and for lower-frequency strategies that trade over periods typically spanning more than 1 day you probably want to use "txns", so that you get the analysis based on a daily mark-to-market of your portfolio. HTH.
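To make the distinction concrete, here is a toy calculation in base R. It is not blotter's implementation, and all prices and variable names are made up for illustration; it only contrasts a per-period mark-to-market view with a single round-trip view of the same position.

```r
# Toy example: buy 100 shares at 10, mark at two daily closes,
# then sell at 12.5 on the third day.
qty      <- 100
entry_px <- 10
exit_px  <- 12.5
close_px <- c(11, 12)   # closes on days 1 and 2

# Per-period view: P&L from marking the position at each close,
# with the final period marked at the exit price.
marks      <- c(entry_px, close_px, exit_px)
period_pnl <- diff(marks) * qty

# Round-trip view: one number for the whole trade.
trade_pnl <- (exit_px - entry_px) * qty

# Same total P&L, but three observations in one view and one in the other,
# so per-observation statistics (counts, averages, win rates) differ.
c(periods = length(period_pnl), trades = length(trade_pnl))
```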

Query in back-testing strategy in R- Indian trader perspective

There is a documentation for backtesting in R in GitHub(https://timtrice.github.io/backtesting-strategies/).
I have a query in two lines of code mentioned in this document (https://timtrice.github.io/backtesting-strategies/using-quantstrat.html#settings-and-variables).
First line
Sys.setenv(TZ = "UTC")
Second line
currency('USD')
As you can see, the first line sets the system time to the US and the second line sets the trading currency to the US dollar. I am an Indian trader and my job is to do backtesting with equity data for Indian companies. I use the quantstrat and quantmod packages along with their dependencies. The data is downloaded from Yahoo Finance through R.
What arguments should an Indian trader pass to these two functions (Sys.setenv and currency)? The currency of the Indian market is INR (Indian rupee) and Indian time is GMT+5:30.
I tried passing "GMT+5:30" to Sys.setenv and it returned an error. When I passed "GMT" there was no error, but Indian time is GMT+5:30.
I found the answer. To determine the time zone, type OlsonNames() in R. You will get a comprehensive list of time zone names; choose the one that matches your location. So for me (an Indian trader), it would be Sys.setenv(TZ = "Asia/Kolkata"). For the currency, set it as currency("INR"). I thank Ilya Kipnis for helping me arrive at the solution.
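A quick way to confirm the setting took effect is to format a known UTC instant in the new zone. This is a minimal sketch; `tz_name`, `utc_midnight` and `ist_clock` are illustrative names, and the currency call is left as a comment because it needs the FinancialInstrument package that quantstrat depends on.

```r
# Check that the zone name is one R knows about, then set it for the session.
tz_name <- "Asia/Kolkata"
stopifnot(tz_name %in% OlsonNames())
Sys.setenv(TZ = tz_name)

# Midnight UTC should read 05:30 on an Indian clock (UTC+5:30).
utc_midnight <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
ist_clock    <- format(utc_midnight, "%Y-%m-%d %H:%M", tz = tz_name)
ist_clock

# With FinancialInstrument loaded, the currency is registered the same way
# as in the tutorial, only with a different code:
# currency("INR")
```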

Use of adjusted vs. unadjusted prices for stock strategy backtesting?

This is more of a methodological (rather than a programming) issue, yet it feels SO is the right place for it. Following the ups and downs after Yahoo changed its defaults in May 2017 for fetching daily data (discussed on https://github.com/joshuaulrich/quantmod/issues/174, http://blog.fosstrading.com/2017/06/yahoo-finance-alternatives.html and also on SO Why Open,High,Low prices are wrong when using quantmod?) I am probably not the only one not 100% certain which data to use in a backtesting procedure and whether quantmod getSymbols.yahoo and adjustOHLC still provide the relevant data for quality backtesting.
Quantmod 0.4.11 also includes AlphaVantage as (adjusted stock) data provider, but I am not familiar with their reliability.
How should the (stock and index) data obtained from getSymbols calls be prepared? Which data should be used: adjusted (for splits and dividends) or unadjusted? Which transformations do you use? The adjustOHLC function also appears to contain a bug, as its output is not split adjusted. This is easily seen on AAPL by calling
getSymbols("AAPL")
chart_Series(adjustOHLC(AAPL))
and observing a jump in 2014.
You should always use adjusted prices. When a data provider does not offer a separate adjusted series, its close prices are usually already adjusted. There is no point in running backtests on raw close prices. I once made the mistake of downloading close prices instead of adjusted prices, and at the end of backtesting my strategy told me that among all S&P constituents Mastercard was the worst performer. After looking at the MA chart it was obvious why.
Because of a split on January 22, 2014, my data contained a single return of roughly -90%! In conclusion, raw close data for backtesting might give you utterly false results.
How to deal with splits
Divide every price before a split by the split ratio. For example, Mastercard had a 1:10 split ratio, so you should divide every price before 21.01.2014 by 10. It's very easy to find splits in the data: you just have to look for returns around or below -50%.
Dividends
Subtract the dividend amount from every price before the dividend day. To find dividend days you need a dividend calendar; you cannot reliably spot them from the price series yourself.
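The two adjustments described above can be sketched in base R on a toy price vector. The function names `adjust_split` and `adjust_dividend` and all prices are hypothetical; this follows the answer's description literally (divide by the ratio, subtract the dividend) and is not what any particular data provider or quantmod does internally.

```r
# Toy close prices with a 10-for-1 split taking effect at the third bar.
close_px <- c(800, 820, 82, 83)

# Flag candidate splits: bars whose simple return is below -50%.
ret       <- diff(close_px) / head(close_px, -1)
split_bar <- which(ret < -0.5) + 1   # index of the first post-split bar

# Divide every price before the split by the split ratio.
adjust_split <- function(px, split_bar, ratio) {
  before     <- seq_len(split_bar - 1)
  px[before] <- px[before] / ratio
  px
}

# Subtract a cash dividend from every price before the ex-dividend bar.
adjust_dividend <- function(px, ex_bar, dividend) {
  before     <- seq_len(ex_bar - 1)
  px[before] <- px[before] - dividend
  px
}

adj <- adjust_split(close_px, split_bar, ratio = 10)
adj
```

After the split adjustment the series no longer contains the artificial -90% return, which is the effect the Mastercard anecdote warns about.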

How to download intraday stock market data with R

All,
I'm looking to download stock data either from Yahoo or Google on 15 - 60 minute intervals for as much history as I can get. I've come up with a crude solution as follows:
library(RCurl)
tmp <- getURL('https://www.google.com/finance/getprices?i=900&p=1000d&f=d,o,h,l,c,v&df=cpct&q=AAPL')
tmp <- strsplit(tmp,'\n')
tmp <- tmp[[1]]
tmp <- tmp[-c(1:8)]
tmp <- strsplit(tmp,',')
tmp <- do.call('rbind',tmp)
tmp <- apply(tmp,2,as.numeric)
# drop every row that contains an NA (use ! to negate the logical row mask)
tmp <- tmp[!apply(tmp,1,function(x) any(is.na(x))),]
Given the amount of data I'm looking to import, I worry that this could be computationally expensive. I also cannot, for the life of me, understand how the time stamps are coded in Yahoo and Google.
So my question is twofold--what's a simple, elegant way to quickly ingest data for a series of stocks into R, and how do I interpret the time stamping on the Google/Yahoo files that I would be using?
I will try to answer timestamp question first. Please note this is my interpretation and I could be wrong.
Using the link in your example https://www.google.com/finance/getprices?i=900&p=1000d&f=d,o,h,l,c,v&df=cpct&q=AAPL I get following data :
EXCHANGE%3DNASDAQ
MARKET_OPEN_MINUTE=570
MARKET_CLOSE_MINUTE=960
INTERVAL=900
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=-300
a1357828200,528.5999,528.62,528.14,528.55,129259
1,522.63,528.72,522,528.6499,2054578
2,523.11,523.69,520.75,522.77,1422586
3,520.48,523.11,519.6501,523.09,1130409
4,518.28,520.579,517.86,520.34,1215466
5,518.8501,519.48,517.33,517.94,832100
6,518.685,520.22,518.63,518.85,565411
7,516.55,519.2,516.55,518.64,617281
...
...
Note the first value of first column a1357828200, my intuition was that this has something to do with POSIXct. Hence a quick check :
> as.POSIXct(1357828200, origin = '1970-01-01', tz='EST')
[1] "2013-01-10 14:30:00 EST"
So my intuition seems to be correct, but the time seems to be off. There is one more piece of information in the data: TIMEZONE_OFFSET=-300. So if we offset our timestamps by this amount we should get:
as.POSIXct(1357828200-300*60, origin = '1970-01-01', tz='EST')
[1] "2013-01-10 09:30:00 EST"
Note that I didn't know which day's data you had requested, but a quick check on Google Finance reveals those were indeed the price levels on 10th Jan 2013.
Remaining values from first column seem to be some sort of offset from first row value.
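Putting that interpretation into code, here is a minimal sketch that decodes the first column of the sample above. It rests on assumptions that go beyond what was verified here: that the "a"-prefixed value is a Unix epoch in seconds, that the plain integers count INTERVAL-sized steps from the most recent "a" row, and that TIMEZONE_OFFSET is in minutes. `decode_ts` is a hypothetical helper name, and the data is assumed to start with an "a" row.

```r
# First-column values from the sample above: an "a"-prefixed epoch,
# followed by small integers.
raw_ts      <- c("a1357828200", "1", "2", "3")
interval_s  <- 900    # INTERVAL header, in seconds
tz_offset_m <- -300   # TIMEZONE_OFFSET header, assumed to be minutes

decode_ts <- function(raw, interval_s, tz_offset_m) {
  is_anchor <- startsWith(raw, "a")
  value     <- as.numeric(sub("^a", "", raw))
  # Carry the most recent anchor epoch forward to the rows that follow it.
  anchor    <- value[is_anchor][cumsum(is_anchor)]
  # Assumption: non-anchor rows count INTERVAL-sized steps from the anchor.
  epoch     <- ifelse(is_anchor, value, anchor + value * interval_s)
  # Shift by the header offset so the clock reads exchange-local time;
  # the "UTC" label on the result is nominal.
  as.POSIXct(epoch + tz_offset_m * 60, origin = "1970-01-01", tz = "UTC")
}

local_time <- decode_ts(raw_ts, interval_s, tz_offset_m)
format(local_time, "%Y-%m-%d %H:%M")
```

Under these assumptions the first row lands on the 09:30 market open on 10 January 2013, in line with the manual check above, and the following rows step forward 15 minutes at a time.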
So downloading and standardizing the data ended up being much more of a bear than I figured it would: about 150 lines of code. The problem is that while Google provides the past 50 trading days of data for all exchange-traded stocks, the time stamps within the days are not standardized: an index of '1', for example, could refer to either the first or the second time increment on the first trading day in the data set. Even worse, stocks that trade at low volumes only have entries where a transaction is recorded. For a high-volume stock like AAPL that's no problem, but for low-volume small caps it means that your series will be missing much if not the majority of the data. This was problematic because I need all the stock series to lie neatly on top of each other for the analysis I'm doing.
Fortunately, there is still a general structure to the data. Using this link:
https://www.google.com/finance/getprices?i=1800&p=1000d&f=d,o,h,l,c,v&df=cpct&q=AAPL
and changing the stock ticker at the end will give you the past 50 trading days in half-hourly increments. POSIX time stamps, very helpfully decoded by @geektrader, appear in the timestamp column at 3-week intervals. Though the timestamp indexes don't invariably correspond in a convenient 1:1 manner (I almost suspect this was intentional on Google's part), there is a pattern. For example, for the half-hourly series that I looked at, the first trading day of every three-week increment uniformly has timestamp indexes running in the 1:15 neighborhood. This could be 1:13, 1:14, 2:15; it all depends on the stock. I'm not sure what the 14th and 15th entries are: I suspect they are either daily summaries or after-hours trading info. The point is that there's no consistent pattern you can bank on.
The first stamp in a trading day, sadly, does not always contain the opening data. Same thing for the last entry and the closing data. I found that the only way to know what actually represents the trading data is to compare the numbers to the series on Google Finance. After days of futilely trying to figure out how to pry a 1:1 mapping pattern from the data, I settled on a "ballpark" strategy.
I scraped AAPL's data (a very high-volume stock) and set its timestamp indexes within each trading day as the reference values for the entire market. All days had a minimum of 13 increments, corresponding to the 6.5-hour trading day, but some had 14 or 15. Where this was the case I just truncated by taking the first 13 indexes. From there I used a while loop to progress through the downloaded data of each stock ticker and compare its timestamp indexes within a given trading day to the AAPL timestamps. I kept the overlap, gap-filled the missing data, and cut out the non-overlapping portions.
Sounds like a simple fix, but for low-volume stocks with sparse transaction data there were literally dozens of special cases that I had to bake in and lots of data to interpolate. I got some pretty bizarre results for some of these that I know are incorrect. For high-volume, mid- and large-cap stocks, however, the solution worked brilliantly: for the most part the series synced up very neatly with the AAPL data and matched their Google Finance profiles perfectly.
There's no way around the fact that this method introduces some error, and I still need to fine-tune the method for sparse small-caps. That said, shifting a series by a half hour or gap-filling a single time increment introduces a very minor amount of error relative to the overall movement of the market and the stock. I am confident that this data set is "good enough" to allow me to get relevant answers to some questions that I have. Getting this stuff commercially costs literally thousands of dollars.
Thoughts or suggestions?
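For readers who want to try the "ballpark" approach, the align-and-fill step for a single trading day can be sketched as follows. This is not the 150-line solution described above, only an illustration with made-up values: the gap-filling rule used here (last observation carried forward) is an assumption, and `locf` is a hypothetical helper.

```r
# Reference bar indexes for one trading day (13 half-hour bars),
# taken from a liquid stock as described above.
ref_idx <- 1:13

# A thinly traded stock with bars only where a trade occurred (toy values).
obs_idx <- c(1, 2, 5, 14)
obs_px  <- c(10.0, 10.1, 10.3, 10.4)

# Keep the overlap with the reference grid: unmatched slots become NA,
# and index 14, which is not in the reference, is dropped.
aligned <- obs_px[match(ref_idx, obs_idx)]

# Last observation carried forward; leading gaps stay NA.
locf <- function(x) {
  seen <- cumsum(!is.na(x))
  c(NA, x[!is.na(x)])[seen + 1]
}

filled <- locf(aligned)
filled
```

Using `match` against a shared reference grid replaces the explicit while loop, and every series comes out with the same length, so they stack neatly into a matrix.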
Why not load the data from Quandl? E.g.
library(Quandl)
Quandl('YAHOO/AAPL')
Update: sorry, I have just realized that only daily data is fetched with Quandl - but I leave my answer here as Quandl is really easy to query in similar cases
For the timezone offset, try:
as.POSIXct(1357828200, origin = '1970-01-01', tz=Sys.timezone(location = TRUE))
(The tz will automatically adjust according to your location)
