Scraping Options Data from Google Finance using R

I recently saw a function that scrapes options data from Google Finance.
It works fine, but I am trying to use it to fetch 20 different stocks (some of which have no options data) and save the results as a list.
I have modified the getOptionQuote function to fit my needs.
The functions below are adapted from that site:
# installs RCurl and jsonlite packages. will prompt you to select mirror for download
install.packages("RCurl");library("RCurl")
install.packages("jsonlite");library("jsonlite")
# ******************************************************************
getOC2 <- function(symbol){
output = list()
url = paste('http://www.google.com/finance/option_chain?q=', symbol, '&output=json', sep = "")
x = getURL(url)
fix = fixJSON(x)
json = fromJSON(fix)
numExp = dim(json$expirations)[1]
if(length(numExp)!= 0){
for(i in 1:numExp){
# download each expirations data
y = json$expirations[i,]$y
m = json$expirations[i,]$m
d = json$expirations[i,]$d
expName = paste(y, m, d, sep = "_")
if (i > 1){
url = paste('http://www.google.com/finance/option_chain?q=', symbol, '&output=json&expy=', y, '&expm=', m, '&expd=', d, sep = "")
json = fromJSON(fixJSON(getURL(url)))
}
output[[paste(expName, "calls", sep = "_")]] = json$calls
output[[paste(expName, "puts", sep = "_")]] = json$puts
}}
assign(paste(symbol),do.call(rbind,output))
}
fixJSON <- function(json_str){
stuff = c('cid','cp','s','cs','vol','expiry','underlying_id','underlying_price',
'p','c','oi','e','b','strike','a','name','puts','calls','expirations',
'y','m','d')
for(i in 1:length(stuff)){
replacement1 = paste(',"', stuff[i], '":', sep = "")
replacement2 = paste('\\{"', stuff[i], '":', sep = "")
regex1 = paste(',', stuff[i], ':', sep = "")
regex2 = paste('\\{', stuff[i], ':', sep = "")
json_str = gsub(regex1, replacement1, json_str)
json_str = gsub(regex2, replacement2, json_str)
}
return(json_str)
}
I have basically added the if(length(numExp)!= 0){ check to accommodate stocks that do not have option chains.
Here is what I use to call the 20 stocks:
LIST <- c("AAMC", "AAU", "ACU", "ACY", "ADGE", "ADK", "AE", "AIII", "AINC",
"AIRI", "AKG", "ALN", "ALTV", "AMCO", "AMPE", "AMS", "APP", "APT",
"APTS", "ASM")
library("plyr")
# USE PARALLEL COMPUTING FOR FASTER PROCESS
registerDoParallel(detectCores())
system.time(
OC <- llply(.data=as.list(LIST), .fun=function(x) {
tmp <- try(getOC2(x))
if (!inherits(tmp, 'try-error')) tmp
}, .parallel = TRUE, .paropts=list(c(.packages=(all.available=T))))
)
# user system elapsed
# 0.008 0.001 0.097
I get no errors, but OC should have a length of 20; instead it has a length of 1 and contains: list(NULL)
SESSION INFO:
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringr_1.0.0 plyr_1.8.3 doMC_1.3.3 doParallel_1.0.8 iterators_1.0.7 foreach_1.4.2 pbapply_1.1-1 jsonlite_0.9.17
[9] RCurl_1.95-4.7 bitops_1.0-6
loaded via a namespace (and not attached):
[1] Rcpp_0.12.0 lattice_0.20-33 codetools_0.2-14 zoo_1.7-12
[5] grid_3.2.2 magrittr_1.5 PerformanceAnalytics_1.4.3541 stringi_0.5-5
[9] PortfolioAnalytics_1.0.3636 xts_0.9-7 blotter_0.8.19 tools_3.2.2
[13] compiler_3.2.2 quantstrat_0.8.2
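One way to narrow this down is to run the same wrapper sequentially, with no parallel backend, and confirm that it returns one element per symbol. The sketch below assumes a hypothetical stand-in, fake_getOC2, in place of the real getOC2 so that it runs offline; the symbols are made up as well. It does not diagnose the parallel call itself.

```r
# Hypothetical stand-in for getOC2(): returns a small data frame for some
# symbols, NULL for a symbol without an option chain, and an error otherwise.
fake_getOC2 <- function(symbol) {
  if (symbol == "BAD") stop("download failed")
  if (symbol == "NOOPT") return(NULL)
  data.frame(symbol = symbol, strike = c(10, 15), stringsAsFactors = FALSE)
}

symbols <- c("AAA", "NOOPT", "BAD")

# Sequential version of the wrapper from the question: failures become NULL.
OC <- lapply(symbols, function(x) {
  tmp <- try(fake_getOC2(x), silent = TRUE)
  if (!inherits(tmp, "try-error")) tmp
})
names(OC) <- symbols

length(OC)           # one element per symbol
sapply(OC, is.null)  # which symbols returned nothing
```

If the sequential version returns the expected number of elements, the remaining difference lies in the .parallel and .paropts arguments.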

Related

Obtaining an error when running exact code from a blog

I am following a tutorial here. A few days ago I was able to run this code without error, including on my own data set (although the error did appear intermittently). Now, however, every time I run the code I get the same error:
Error in solve.QP(Dmat, dvec, Amat, bvec = b0, meq = 2) :
constraints are inconsistent, no solution!
I understand that the solver cannot solve the equations, but I am confused as to why it worked previously and now does not. The author of the article has this code working.
library(tseries)
library(data.table)
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))
df_table <- melt(df)[, .(er = mean(value),
sd = sd(value)), by = variable]
er_vals <- seq(from = min(df_table$er), to = max(df_table$er), length.out = 1000)
# find an optimal portfolio for each possible expected return
# (note that the values are explicitly set between the minimum and maximum of the expected returns per asset)
sd_vals <- sapply(er_vals, function(er) {
op <- portfolio.optim(as.matrix(df), er)
return(op$ps)
})
SessionInfo:
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 LC_MONETARY=Spanish_Spain.1252
[4] LC_NUMERIC=C LC_TIME=Spanish_Spain.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] lpSolve_5.6.13.1 data.table_1.12.0 tseries_0.10-46 rugarch_1.4-0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 MASS_7.3-51.1 mclust_5.4.2
[4] lattice_0.20-38 quadprog_1.5-5 Rsolnp_1.16
[7] TTR_0.23-4 tools_3.5.3 xts_0.11-2
[10] SkewHyperbolic_0.4-0 GeneralizedHyperbolic_0.8-4 quantmod_0.4-13.1
[13] spd_2.0-1 grid_3.5.3 KernSmooth_2.23-15
[16] yaml_2.2.0 numDeriv_2016.8-1 Matrix_1.2-15
[19] nloptr_1.2.1 DistributionUtils_0.6-0 ks_1.11.3
[22] curl_3.3 compiler_3.5.3 expm_0.999-3
[25] truncnorm_1.0-8 mvtnorm_1.0-8 zoo_1.8-4

tseries::portfolio.optim disallows short selling by default; see the argument shorts. If shorts = FALSE, asset weights may not go below 0. And since the weights must sum to 1, no individual asset weight can exceed 1 either. There is no leverage.
(Possibly, in an earlier version of tseries the default could have been shorts = TRUE. This would explain why it previously worked for you.)
Under these constraints, your target return (pm) cannot exceed the highest mean return of any of the input assets.
Solution 1: Allow short selling, but remember that this gives a different efficient frontier. (For reference, see any lecture or book discussing Markowitz optimization. There is a mathematical solution to the problem without the short-selling restriction.)
op <- portfolio.optim(as.matrix(df), er, shorts = TRUE)
Solution 2: Limit the target returns between the worst and the best asset's return.
er_vals <- seq(from = min(colMeans(df)), to = max(colMeans(df)), length.out = 1000)
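The bound behind Solution 2 can be checked numerically: with non-negative weights that sum to 1, the portfolio's expected return is a weighted average of the asset means, so it cannot leave the interval between the smallest and the largest mean. The sketch below uses made-up mean returns (mu is an illustrative assumption, not the tutorial's data) and base R only.

```r
set.seed(1)

# Hypothetical mean returns for four assets (illustrative numbers only).
mu <- c(0.02, 0.05, 0.08, 0.11)

# Draw random long-only weight vectors: non-negative and summing to 1.
w <- matrix(runif(4 * 1000), ncol = 4)
w <- w / rowSums(w)

# Expected return of each random portfolio.
port_ret <- as.vector(w %*% mu)

range(port_ret)                                  # stays inside range(mu)
all(port_ret >= min(mu) & port_ret <= max(mu))   # no portfolio escapes it
```

Any target return outside range(mu) therefore has no feasible long-only portfolio, which is what solve.QP reports as inconsistent constraints.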
Here's the full script that gives both solutions and plots the obtained efficient frontiers.
library(tseries)
library(data.table)
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))
df_table <- melt(df)[, .(er = mean(value),
sd = sd(value)), by = variable]
# er_vals <- seq(from = min(df_table$er), to = max(df_table$er), length.out = 1000)
er_vals1 <- seq(from = 0, to = 0.15, length.out = 1000)
er_vals2 <- seq(from = min(colMeans(df)), to = max(colMeans(df)), length.out = 1000)
# find an optimal portfolio for each possible expected return
# (note that the values are explicitly set between the minimum and maximum of the expected returns per asset)
sd_vals1 <- sapply(er_vals1, function(er) {
op <- portfolio.optim(as.matrix(df), er, shorts = TRUE)
return(op$ps)
})
sd_vals2 <- sapply(er_vals2, function(er) {
op <- portfolio.optim(as.matrix(df), er, shorts = FALSE)
return(op$ps)
})
plot(x = sd_vals1, y = er_vals1, type = "l", col = "red",
xlab = "sd", ylab = "er",
main = "red: allowing short-selling;\nblue: disallowing short-selling")
lines(x = sd_vals2, y = er_vals2, type = "l", col = "blue")

indirect indexing/subscripting inside %dopar%

I'm not understanding how to do indirect subscripting in %dopar% or in llply( .parallel = TRUE). My actual use-case is a list of formulas, then generating a list of glmer results in a first foreach %dopar%, then calling PBmodcomp on specific pairs of results in a separate foreach %dopar%. My toy example, using numeric indices rather than names of objects in the lists, works fine for %do% but not %dopar%, and fine for alply without .parallel = TRUE but not with .parallel = TRUE. [My real example with glmer and indexing lists by names rather than by integers works with %do% but not %dopar%.]
library(doParallel)
library(foreach)
library(plyr)
cl <- makePSOCKcluster(2) # tiny for toy example
registerDoParallel(cl)
mB <- c(1,2,1,3,4,10)
MO <- c("Full", "noYS", "noYZ", "noYSZS", "noS", "noZ",
"noY", "justS", "justZ", "noSZ", "noYSZ")
# Works
testouts <- foreach(i = 1:length(mB)) %do% {
# mB[i]
MO[mB[i]]
}
testouts
# all NA
testouts2 <- foreach(i = 1:length(mB)) %dopar% {
# mB[i]
MO[mB[i]]
}
testouts2
# Works
testouts3 <- alply(mB, 1, .fun = function(i) { MO[mB[i]]} )
testouts3
# fails "$ operator is invalid for atomic vectors"
testouts4 <- alply(mB, 1, .fun = function(i) { MO[mB[i]]},
.parallel = TRUE,
.paropts = list(.export=ls(.GlobalEnv)))
testouts4
stopCluster(cl)
I've tried various combinations of double brackets like MO[mB[[i]]], to no avail. mB[i] instead of MO[mB[i]] works in all 4 and returns a list of the numbers. I've tried .export(c("MO", "mB")) but just get the message that those objects are already exported.
I assume that there's something I misunderstand about evaluation of expressions like MO[mB[i]] in different environments, but there may be other things I misunderstand, too.
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] plyr_1.8.4 doParallel_1.0.13 iterators_1.0.9 foreach_1.5.0
loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1 listenv_0.7.0 Rcpp_0.12.17
[5] codetools_0.2-15 digest_0.6.15 globals_0.12.1 future_1.8.1
[9] fortunes_1.5-5

The problem appears to be with version 1.5.0 of foreach on R-Forge. Version 1.4.4 from CRAN works fine for both foreach %dopar% and llply(.parallel = TRUE). For anyone finding this post when searching for %dopar% with lists, here's the code, where mList is a named list of formulas and tList is a named list of pairs of model names to be compared.
tList <- list(Z1 = c("Full", "noYZ"),
Z2 = c("noYS", "noYSZS"),
S1 = c("Full", "noYS"),
S2 = c("noYZ", "noYSZS"),
A1 = c("noYSZS", "noY"),
A2 = c("noSZ", "noYSZ")
)
cl <- makePSOCKcluster(params$nCores) # value from YAML params:
registerDoParallel(cl)
# first run the models
modouts <- foreach(imod = 1:length(mList),
.packages = "lme4") %dopar% {
glmer(as.formula(mList[[imod]]),
data = dsn,
family = poisson,
control = glmerControl(optimizer = "bobyqa",
optCtrl = list(maxfun = 100000),
check.conv.singular = "warning")
)
}
names(modouts) <- names(mList)
####
# now run the parametric bootstrap tests
nSim <- 500
testouts <- foreach(i = seq_along(tList),
.packages = "pbkrtest") %dopar% {
PBmodcomp(modouts[[tList[[i]][1]]],
modouts[[tList[[i]][2]]],
nsim = nSim)
}
names(testouts) <- names(tList)
stopCluster(cl)

Error in rep(" ", len) : invalid 'times' argument

library(OneR)
library(RWeka)
loan_train <- read.csv("loan_train.csv")
loan_test <- read.csv("loan_test.csv")
loan_train <- optbin(loan_train, method = "logreg", na.omit = TRUE)
loan_test <- optbin(loan_test, method = "logreg", na.omit = TRUE)
#Task 1
loan_1R <- OneR(bad_loans ~ ., data = loan_train)
loan_1R
loan_JRip <- JRip(bad_loans ~ ., data = loan_train)
loan_JRip
I need some help with my code. Everything runs, but every time I print loan_1R I get an error. I tried traceback() but have no idea what its output means. My CSV file is at the link below.
https://drive.google.com/file/d/1139FUSXUc_fdzgtKAleo5bGAtjcVGoRC/view?usp=sharing
Error in rep(" ", len) : invalid 'times' argument
In addition: Warning message:
In max(nchar(names(model$rules))) :
no non-missing arguments to max; returning -Inf
> traceback()
3: cat("If ", model$feature, " = ", names(model$rules[iter]), rep(" ",
len), " then ", model$target, " = ", model$rules[[iter]],
"\n", sep = "")
2: print.OneR(x)
1: function (x, ...)
UseMethod("print")(x)
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_Singapore.1252 LC_CTYPE=English_Singapore.1252
[3] LC_MONETARY=English_Singapore.1252 LC_NUMERIC=C
[5] LC_TIME=English_Singapore.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RWeka_0.4-37 OneR_2.2
loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1 grid_3.4.1 rJava_0.9-9 RWekajars_3.9.2-1

After hours of testing I found what triggers the problem, but I have no idea why. I think it has something to do with the RWeka package: placing library(RWeka) after the OneR code makes it run. So the error appears only once library(RWeka) has been run. Is there any workaround?
library(OneR)
loan_train <- read.csv("loan_train.csv")
loan_test <- read.csv("loan_test.csv")
loan_train <- optbin(loan_train, method = "logreg", na.omit = TRUE)
loan_test <- optbin(loan_test, method = "logreg", na.omit = TRUE)
#Task 1
loan_1R <- OneR(bad_loans ~ ., data = loan_train)
loan_1R
library(RWeka)
loan_JRip <- JRip(bad_loans ~ ., data = loan_train)
loan_JRip
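A possible explanation, offered as an assumption rather than something confirmed in this thread: RWeka also exports a function named OneR, so attaching RWeka after OneR masks OneR::OneR, and the object returned by the masking function is then handed to a print method that does not fit it. If that is the cause, calling the function with an explicit namespace keeps both packages attached. The base-R sketch below only illustrates masking and the pkg::fun form with stats::sd; it does not use either package.

```r
# Defining a function named sd in the global environment masks stats::sd,
# much as attaching a package masks same-named functions attached earlier.
sd <- function(x) "masked"

masked_result <- sd(c(1, 2, 3))           # calls the masking definition
explicit_result <- stats::sd(c(1, 2, 3))  # the namespace prefix bypasses it

rm(sd)  # remove the masking definition again

# Analogous call for the question (assumes the OneR package is installed):
# loan_1R <- OneR::OneR(bad_loans ~ ., data = loan_train)
```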

cannot INSTALL shinyapps

I encountered the following error when trying to install shinyapps on Ubuntu 13.04. Can anyone help? Thanks.
Call:
require(devtools)
devtools::install_github('rstudio/shinyapps')
Error:
Installing github repo(s) rstudio/shinyapps/master from hadley
Installing rstudio/shinyapps.zip from https://github.com/hadley/rstudio/shinyapps/archive/master.zip
Error in writeBin(content(request), bundle) :
can only write vector objects
Session Info:
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=zh_CN.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=zh_CN.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=zh_CN.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] R.oo_1.15.8 R.methodsS3_1.5.2 XML_3.98-1.1 devtools_1.4.1
loaded via a namespace (and not attached):
[1] digest_0.6.3 evaluate_0.5.1 httr_0.2 memoise_0.1 parallel_3.0.2 RCurl_1.95-4.1
[7] stringr_0.6.2 tools_3.0.2 whisker_0.3-2
traceback() prints the following:
8: stop("can only write vector objects")
7: writeBin(content(request), bundle)
6: (function (url, name = NULL, subdir = NULL, config = list(),
before_install = NULL, ...)
{
if (is.null(name)) {
name <- basename(url)
}
message("Downloading ", name, " from ", url)
bundle <- file.path(tempdir(), name)
request <- GET(url, config)
stop_for_status(request)
writeBin(content(request), bundle)
on.exit(unlink(bundle), add = TRUE)
install_local_single(bundle, subdir = subdir, before_install = before_install,
...)
})(dots[[1L]][[1L]], dots[[2L]][[1L]], subdir = NULL, config = list(),
before_install = function (bundle, pkg_path)
{
desc <- file.path(pkg_path, "DESCRIPTION")
if (!ends_with_newline(desc))
cat("\n", sep = "", file = desc, append = TRUE)
append_field <- function(name, value) {
if (!is.null(value)) {
cat("Github", name, ":", value, "\n", sep = "",
file = desc, append = TRUE)
}
}
append_field("Repo", repo)
append_field("Username", username)
append_field("Ref", ref)
append_field("SHA1", github_extract_sha1(bundle))
append_field("Pull", pull)
append_field("Subdir", subdir)
append_field("Branch", branch)
append_field("AuthUser", auth_user)
})
5: mapply(install_url_single, url, name, MoreArgs = list(subdir = subdir,
config = config, before_install = before_install, ...))
4: install_url(url, name = paste(repo, ".zip", sep = ""), subdir = subdir,
config = auth, before_install = github_before_install, ...)
3: FUN("shinyapps"[[1L]], ...)
2: vapply(repo, install_github_single, FUN.VALUE = logical(1), username,
ref, pull, subdir, branch, auth_user, password, ...)
1: install_github(repo = "shinyapps", username = "rstudio")

Please have a look at the parameter list of install_github. Your call should be
install_github( repo = "shinyapps", username="rstudio" )
At least for devtools prior to version 1.4.1: https://github.com/hadley/devtools/blob/master/NEWS.md

I think you have another option. Download the package from https://github.com/rstudio/shinyapps/archive/master.zip to your local computer, then install it by calling install_local(). For example:
install_local("~/Downloads/shinyapps-master.zip")
Here shinyapps-master.zip is the downloaded shinyapps package and ~/Downloads/ is the directory it was saved to.

The shinyapps package has been replaced by rsconnect, which can be installed from CRAN in the usual way.
https://github.com/rstudio/shinyapps

addOBV throwing error

I am trying to plot a graph with price and a few technical indicators such as ADX, RSI, and OBV. I cannot figure out why addOBV gives an error and why addADX does not show all of its lines in the chart.
Here is my code:
tmp <- read.csv(paste("ProcessedQuotes/",Nifty[x,],".csv", sep=""),
as.is=TRUE, header=TRUE, row.names=NULL)
tmp$Date<-as.Date(tmp$Date)
ydat = xts(tmp[,-1],tmp$Date)
lineChart(ydat, TA=NULL, name=paste(Nifty[x,]," Technical Graph"))
plot(addSMA(10))
plot(addEMA(10))
plot(addRSI())
plot(addADX())
plot(addOBV())
Error for addOBV is:
Error in try.xts(c(2038282, 1181844, -1114409, 1387404, 3522045, 4951254, :
Error in as.xts.double(x, ..., .RECLASS = TRUE) :
order.by must be either 'names()' or otherwise specified
Below you can see DIn is not shown fully in the graphs.
> class(ydat)
[1] "xts" "zoo"
> head(ydat)
Open High Low Close Volume Trades Sma20 Sma50 DIp DIn DX ADX aroonUp aroonDn oscillator macd signal RSI14

I don't know why that patch doesn't work for you, but you can just create a new function (or you could mask the one from quantmod). Let's just make a new, patched version called addOBV2, which is the code for addOBV except for the one patched line. (x <- as.matrix(lchob@xdata) is replaced with x <- try.xts(lchob@xdata, error=FALSE)).
addOBV2 <- function (..., on = NA, legend = "auto")
{
stopifnot("package:TTR" %in% search() || require("TTR", quietly = TRUE))
lchob <- quantmod:::get.current.chob()
x <- try.xts(lchob@xdata, error=FALSE)
#x <- as.matrix(lchob@xdata)
x <- OBV(price = Cl(x), volume = Vo(x))
yrange <- NULL
chobTA <- new("chobTA")
if (NCOL(x) == 1) {
chobTA@TA.values <- x[lchob@xsubset]
}
else chobTA@TA.values <- x[lchob@xsubset, ]
chobTA@name <- "chartTA"
if (any(is.na(on))) {
chobTA@new <- TRUE
}
else {
chobTA@new <- FALSE
chobTA@on <- on
}
chobTA@call <- match.call()
legend.name <- gsub("^.*[(]", " On Balance Volume (", deparse(match.call()))#,
#extended = TRUE)
gpars <- c(list(...), list(col=4))[unique(names(c(list(col=4), list(...))))]
chobTA@params <- list(xrange = lchob@xrange, yrange = yrange,
colors = lchob@colors, color.vol = lchob@color.vol, multi.col = lchob@multi.col,
spacing = lchob@spacing, width = lchob@width, bp = lchob@bp,
x.labels = lchob@x.labels, time.scale = lchob@time.scale,
isLogical = is.logical(x), legend = legend, legend.name = legend.name,
pars = list(gpars))
if (is.null(sys.call(-1))) {
TA <- lchob@passed.args$TA
lchob@passed.args$TA <- c(TA, chobTA)
lchob@windows <- lchob@windows + ifelse(chobTA@new, 1,
0)
chartSeries.chob <- quantmod:::chartSeries.chob
do.call("chartSeries.chob", list(lchob))
invisible(chobTA)
}
else {
return(chobTA)
}
}
Now it works.
# reproduce your data
ydat <- getSymbols("ZEEL.NS", src="yahoo", from="2012-09-11",
to="2013-01-18", auto.assign=FALSE)
lineChart(ydat, TA=NULL, name=paste("ZEEL Technical Graph"))
plot(addSMA(10))
plot(addEMA(10))
plot(addRSI())
plot(addADX())
plot(addOBV2())

This code reproduces the error:
library(quantmod)
getSymbols("AAPL")
lineChart(AAPL, 'last 6 months')
addOBV()
Session Info:
sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] quantmod_0.3-17 TTR_0.21-1 xts_0.9-1 zoo_1.7-9 Defaults_1.1-1 rgeos_0.2-11
[7] sp_1.0-5 sos_1.3-5 brew_1.0-6
loaded via a namespace (and not attached):
[1] grid_2.15.0 lattice_0.20-6 tools_2.15.0
Googling around, the error seems to be related to the fact that addOBV converts the data into a matrix, which causes problems with TTR::OBV. A patch has been posted on RForge.
