Obtaining an error when running exact code from a blog - r

I am following a tutorial here. A few days ago I was able to run this code without error and run it on my own data set (it was always a little hit and miss with obtaining this error) - however now I try to run the code and I always obtain the same error.
Error in solve.QP(Dmat, dvec, Amat, bvec = b0, meq = 2) :
constraints are inconsistent, no solution!
I get that the solver cannot solve the equations but I am a little confused as to why it worked previously and now it does not... The author of the article has this code working...
library(tseries)
library(data.table)
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))
df_table <- melt(df)[, .(er = mean(value),
sd = sd(value)), by = variable]
er_vals <- seq(from = min(df_table$er), to = max(df_table$er), length.out = 1000)
# find an optimal portfolio for each possible expected return
# (note that the values are explicitly set between the minimum and maximum of the expected returns per asset)
sd_vals <- sapply(er_vals, function(er) {
op <- portfolio.optim(as.matrix(df), er)
return(op$ps)
})
SessionInfo:
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 LC_MONETARY=Spanish_Spain.1252
[4] LC_NUMERIC=C LC_TIME=Spanish_Spain.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] lpSolve_5.6.13.1 data.table_1.12.0 tseries_0.10-46 rugarch_1.4-0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 MASS_7.3-51.1 mclust_5.4.2
[4] lattice_0.20-38 quadprog_1.5-5 Rsolnp_1.16
[7] TTR_0.23-4 tools_3.5.3 xts_0.11-2
[10] SkewHyperbolic_0.4-0 GeneralizedHyperbolic_0.8-4 quantmod_0.4-13.1
[13] spd_2.0-1 grid_3.5.3 KernSmooth_2.23-15
[16] yaml_2.2.0 numDeriv_2016.8-1 Matrix_1.2-15
[19] nloptr_1.2.1 DistributionUtils_0.6-0 ks_1.11.3
[22] curl_3.3 compiler_3.5.3 expm_0.999-3
[25] truncnorm_1.0-8 mvtnorm_1.0-8 zoo_1.8-4

tseries::portfolio.optim disallows short selling by default; see the shorts argument. With shorts = FALSE, asset weights may not go below 0. Because the weights must also sum to 1, no individual asset weight can exceed 1, so there is no leverage.
(Possibly, in an earlier version of tseries the default was shorts = TRUE. This would explain why it previously worked for you.)
Your target return (pm) cannot exceed the highest return of any of the input assets.
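To illustrate why a long-only target outside that range is infeasible, here is a minimal base-R sketch. The three asset means are made-up numbers and feasible() is a hypothetical helper, not part of tseries. With non-negative weights that sum to 1, the portfolio's expected return is a weighted average of the asset means, so it can never leave the interval between the worst and the best asset.

```r
# Hypothetical mean returns for three assets (illustrative numbers only)
asset_means <- c(a = 0.02, b = 0.05, c = 0.11)

set.seed(1)
# 1000 random long-only weight vectors, each rescaled to sum to 1
w <- matrix(runif(3000), ncol = 3)
w <- w / rowSums(w)

# Expected return of each random portfolio: a weighted average of asset means
port_means <- as.vector(w %*% asset_means)
range(port_means)  # always inside [min(asset_means), max(asset_means)]

# Hypothetical helper: is a target return reachable without short selling?
feasible <- function(target, means) target >= min(means) & target <= max(means)
feasible(c(0.01, 0.05, 0.12), asset_means)
```

Targets for which feasible() returns FALSE are the ones where a long-only quadratic program has no solution.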
Solution 1: Allow short selling, but remember that that's a different efficient frontier. (For reference, see any lecture or book discussing Markowitz optimization. There's a mathematical solution to the problem without short-selling restriction.)
op <- portfolio.optim(as.matrix(df), er, shorts = T)
Solution 2: Limit the target returns between the worst and the best asset's return.
er_vals <- seq(from = min(colMeans(df)), to = max(colMeans(df)), length.out = 1000)
Here's a plot of the obtained efficient frontiers.
Here's the full script that gives both solutions.
library(tseries)
library(data.table)
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))
df_table <- melt(df)[, .(er = mean(value),
sd = sd(value)), by = variable]
# er_vals <- seq(from = min(df_table$er), to = max(df_table$er), length.out = 1000)
er_vals1 <- seq(from = 0, to = 0.15, length.out = 1000)
er_vals2 <- seq(from = min(colMeans(df)), to = max(colMeans(df)), length.out = 1000)
# find an optimal portfolio for each possible expected return
# (note that the values are explicitly set between the minimum and maximum of the expected returns per asset)
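sd_vals1 <- sapply(er_vals1, function(er) {
  # shorts = TRUE: weights may be negative (short selling allowed)
  op <- portfolio.optim(as.matrix(df), er, shorts = TRUE)
  return(op$ps)
})
sd_vals2 <- sapply(er_vals2, function(er) {
  # shorts = FALSE: long-only, so er must stay within the assets' mean returns
  op <- portfolio.optim(as.matrix(df), er, shorts = FALSE)
  return(op$ps)
})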
plot(x = sd_vals1, y = er_vals1, type = "l", col = "red",
xlab = "sd", ylab = "er",
main = "red: allowing short-selling;\nblue: disallowing short-selling")
lines(x = sd_vals2, y = er_vals2, type = "l", col = "blue")

Related

Can't use SharpeRatio in PortfolioAnalytics to optimize a portfolio

I am trying to use SharpeRatio as an objective function to optimize my portfolio, but I get the following error:
objective name SharpeRatio generated an error or warning: Error in t(w) %*% M3 : requires numeric/complex matrix/vector arguments
I've searched and it seems that the issue is related to the weights, but I can't find a way to solve it.
The next code replicates the error:
library(PortfolioAnalytics)
data(edhec)
asset_names <- colnames(edhec)
port_spec <- portfolio.spec(asset_names)
port_spec <- add.constraint(portfolio = port_spec, type = "weight_sum", min_sum = 0.99, max_sum = 1.01)
port_spec <- add.constraint(portfolio = port_spec, type = "long_only")
port_spec <- add.objective(portfolio = port_spec, type = "return", name = "SharpeRatio", FUN = "StdDev")
opt_DE <- optimize.portfolio(R = edhec, portfolio = port_spec, optimize_method = "DEoptim", search_size=5000, trace = TRUE, traceDE = 0)
As requested, sessionInfo():
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] FactoMineR_1.39 nFactors_2.3.3 lattice_0.20-35
[4] boot_1.3-20 psych_1.7.8 MASS_7.3-47
[7] PortfolioAnalytics_1.0.3636 PerformanceAnalytics_1.4.3541 foreach_1.4.4
[10] xts_0.10-1 zoo_1.8-0
loaded via a namespace (and not attached):
[1] cluster_2.0.6 leaps_3.0 mnormt_1.5-5 scatterplot3d_0.3-40
[5] quadprog_1.5-5 ROI_0.3-0 TTR_0.23-2 tools_3.4.3
[9] quantmod_0.4-12 parallel_3.4.3 grid_3.4.3 nlme_3.1-131
[13] registry_0.5 iterators_1.0.9 yaml_2.1.16 GenSA_1.1.7
[17] codetools_0.2-15 curl_3.1 slam_0.1-42 ROI.plugin.quadprog_0.2-5
[21] compiler_3.4.3 flashClust_1.01-2 DEoptim_2.2-4 foreign_0.8-69
I would recommend checking out the PortfolioAnalytics Demo files. One of them in particular
Demo Max Sharpe Ratio:
https://github.com/R-Finance/PortfolioAnalytics/blob/master/demo/demo_max_Sharpe.R
will be particularly useful to reference. After reading through some of the code and comments, you will see a few things. First, you specified conflicting arguments: type = "return", name = "SharpeRatio", FUN = "StdDev".
"return" is a type of objective, "StdDev" is the name of a "risk" objective, and the Sharpe ratio is what you are trying to maximize.
If you use the "ROI" optimization method, you need to pass maxSR=TRUE to optimize.portfolio to say that you want to maximize the Sharpe ratio. If you want to use the "DEoptim" method instead, you need to relax your leverage constraints.
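As a rough illustration of what combining a "mean" return objective with a "StdDev" risk objective amounts to, here is a base-R sketch. It uses simulated returns rather than edhec, and a crude random search over long-only weights; sharpe() and the fund names are hypothetical, and none of this is how PortfolioAnalytics works internally.

```r
set.seed(42)
# Simulated monthly returns for three hypothetical funds (not the edhec data)
R <- cbind(f1 = rnorm(120, 0.006, 0.02),
           f2 = rnorm(120, 0.004, 0.01),
           f3 = rnorm(120, 0.008, 0.04))

# Sharpe ratio with a zero risk-free rate: mean return per unit of StdDev
sharpe <- function(w, R) {
  p <- as.vector(R %*% w)
  mean(p) / sd(p)
}

w_equal <- rep(1 / 3, 3)

# Crude random search: long-only candidate weights, each rescaled to sum to 1
cand <- matrix(runif(3 * 2000), ncol = 3)
cand <- rbind(w_equal, cand / rowSums(cand))
scores <- apply(cand, 1, sharpe, R = R)

best <- cand[which.max(scores), ]
round(best, 3)
max(scores)
```

The equal-weight portfolio is included among the candidates, so the best score found is never worse than the equal-weight score.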
Examples of each can be found below. They are directly taken from the referenced demo file above.
library(PortfolioAnalytics)
# Examples of solving optimization problems to maximize mean return per unit StdDev
data(edhec)
R <- edhec[, 1:8]
funds <- colnames(R)
# Construct initial portfolio
init.portf <- portfolio.spec(assets=funds)
init.portf <- add.constraint(portfolio=init.portf, type="full_investment")
init.portf <- add.constraint(portfolio=init.portf, type="long_only")
init.portf <- add.objective(portfolio=init.portf, type="return", name="mean")
init.portf <- add.objective(portfolio=init.portf, type="risk", name="StdDev")
init.portf
# The default action if "mean" and "StdDev" are specified as objectives with
# optimize_method="ROI" is to maximize quadratic utility. If we want to maximize
# Sharpe Ratio, we need to pass in maxSR=TRUE to optimize.portfolio.
maxSR.lo.ROI <- optimize.portfolio(R=R, portfolio=init.portf,
optimize_method="ROI",
maxSR=TRUE, trace=TRUE)
maxSR.lo.ROI
# Although the maximum Sharpe Ratio objective can be solved quickly and accurately
# with optimize_method="ROI", it is also possible to solve this optimization
# problem using other solvers such as random portfolios or DEoptim. These
# solvers have the added flexibility of using different methods to calculate
# the Sharpe Ratio (e.g. we could specify annualized measures of risk and return).
# For random portfolios and DEoptim, the leverage constraints should be
# relaxed slightly.
init.portf$constraints[[1]]$min_sum=0.99
init.portf$constraints[[1]]$max_sum=1.01
# Use DEoptim
maxSR.lo.DE <- optimize.portfolio(R=R, portfolio=init.portf,
optimize_method="DEoptim",
search_size=2000,
trace=TRUE)
Hopefully this helps; typically I find that many of the more complex packages in R will have demo files to help get you started.

Receiving an unexpected error when plotting by group

Sorry for the massive data dump but I can't reproduce this on the subsets of the data I've tried. Copy-pasted the dput of the data (165 obs., not crazy) to this Gist.
I'm trying to plot the data in DT by sport, according to:
1. Create an empty plot with limits that accommodate all the data
2. Plot the column gini as a scatterplot, with colors varying by sport
3. Plot the column five_yr_ma as a line, with color matching that in 2.
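For comparison, the three steps can be sketched in base R with an explicit loop over groups. The data here are made up, and the ma column is a trailing 3-season mean standing in for five_yr_ma. This is only a sketch of the intended result, not the data.table approach used in the question.

```r
# Toy data standing in for the real table (values are made up)
toy <- data.frame(
  sport  = rep(c("NHL", "NBA"), each = 5),
  season = rep(2001:2005, times = 2),
  gini   = c(0.10, 0.12, 0.11, 0.13, 0.12, 0.20, 0.22, 0.21, 0.19, 0.23)
)
# Trailing 3-season mean per sport (NA for the first two seasons of each sport)
toy$ma <- ave(toy$gini, toy$sport, FUN = function(g) {
  as.numeric(stats::filter(g, rep(1 / 3, 3), sides = 1))
})
cols <- c(NHL = "black", NBA = "red")

pdf(NULL)  # null device so the sketch runs anywhere; drop this to draw on screen
# Step 1: empty plot with limits covering all the data
plot(NA, xlim = range(toy$season), ylim = range(toy$gini),
     xlab = "Season", ylab = "Gini")
for (s in names(cols)) {
  d <- toy[toy$sport == s, ]
  points(d$season, d$gini, col = cols[[s]])        # step 2: points per sport
  lines(d$season, d$ma, col = cols[[s]], lwd = 3)  # step 3: matching line
}
invisible(dev.off())
```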
This should be simple and I've done things like it before. Here's what should work:
#empty plot with proper axes
DT[ , plot(
NA, ylim = range(gini), xlim = range(season),
xlab = "Season", ylab = "Gini",
main = "Comparison of Gini Coefficient Across Sports"
)]
#pick colors for each sport
cols <- c(NHL="black", NBA="red")
DT[ , by = sport, {
#add points to current plot
points(season, gini, col = cols[.BY$sport])
#add lines to current plot
lines(season, five_yr_ma, col = cols[.BY$sport], lwd = 3)
}]
But this gives me output/error:
# Empty data.table (0 rows) of 1 col: sport
Error: x and y lengths differ in plot.xy()
This is strange. If we skip the grouping and just do it manually, it works perfectly fine:
all_sports[sport == "NBA", {
points(season, gini, col = "red")
lines(season, five_yr_ma, col = "red", lwd = 3)
}]
all_sports[sport == "NHL", {
points(season, gini, col = "black")
lines(season, five_yr_ma, col = "black", lwd = 3)
}]
Moreover, even in the context of grouping, it's unclear why plot.xy has received arguments of different length -- if we make the following adjustment to force R to record the inputs just before they're sent, there doesn't appear to be any issue:
all_sports[ , {
cat("\n\nPlotting for sport: ", .BY$sport)
points(x1 <- season, y1 <- gini, col = cols[.BY$sport])
lines(x2 <- season, y2 <- five_yr_ma, col = cols[.BY$sport], lwd = 3)
cat("\npoints/season: ",length(x1),
"\npoints/gini: ", length(y1),
"\nlines/season: ", length(x2),
"\nlines/five_yr_ma: ", length(y2))},
by = sport]
Has output:
# Plotting for sport: NHL
# points/season: 98
# points/gini: 98
# lines/season: 98
# lines/five_yr_ma: 98
# Plotting for sport: NBA
# points/season: 67
# points/gini: 67
# lines/season: 67
# lines/five_yr_ma: 67
What could be going on??
Since it appears like this is not common across machines, here's my sessionInfo():
R version 3.2.4 (2016-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.9.7
loaded via a namespace (and not attached):
[1] rsconnect_0.4.1.11 tools_3.2.4
Indeed, as @Arun points out, this seems to be a resurfacing of the (as yet unsolved) issue that caused the error in this question:
Values of the wrong group are used when using plot() within a data.table() in RStudio
As @Arun discovered there, RStudio's native graphics device seems to get tripped up by the changing pointers used for the different subgroups created when evaluating j when by is present. That suggests the workaround of simply copying all of .SD each time, like:
points(copy(season), copy(gini),
col = cols[.BY$sport])
lines(copy(season), copy(five_yr_ma),
col = cols[.BY$sport], lwd = 3)
Or
x <- copy(.SD)
# note: the argument is col, not cols
with(x, {points(season, gini, col = cols[.BY$sport]);
  lines(copy(season), copy(five_yr_ma),
        col = cols[.BY$sport], lwd = 3)})
Both of which worked for me (since the subgroups are so small, there's no computational efficiency concern at play here -- we can copy away without affecting performance noticeably).
This is #1524 at the data.table GitHub page and I've filed this bug report at RStudio Support; will update this if a fix is pushed.

Error in grid.Call(L_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : polygon edge not found (new)

I know that the title of this question is a duplicate of this Question and this Question but the solutions over there don't work for me and the error message is (slightly) different:
Error in grid.Call(L_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
polygon edge not found
(note the missing part about the missing font)
I tried all suggestions that I found (updating / reinstalling all loaded graphic packages, ggplot2, GGally, and scales, reinitialising the Fonts on Mac OSX by starting in safe mode, moving the Fonts from /Fonts/ (Disabled) back into /Fonts...) but none of it resolved the problem.
The error seems to occur when I plot a ggplot graph with
scale_y_continuous(label=scientific_10)
where scientific_10 is defined as
scientific_10 <- function(x) {
parse(text = gsub("e", " %*% 10^", scientific_format()(x)))
}
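To see what such a labeller builds, here is a base-R sketch of the same string manipulation. to_sci() is a hypothetical stand-in for scales::scientific_format(), so the exact mantissa formatting may differ from what scales produces.

```r
# Base-R stand-in for scales::scientific_format(): one-decimal mantissa
to_sci <- function(x) formatC(x, format = "e", digits = 1)

breaks <- c(5e5, 1e6, 1.5e6)

# Swap the "e" for plotmath multiplication by a power of ten
label_text <- gsub("e", " %*% 10^", to_sci(breaks))
label_text

# parse() turns the strings into an expression vector that grid renders as math
labels <- parse(text = label_text)
length(labels)
```

The result is an expression vector like the one visible in the traceback, which is what grid measures when it computes text bounds.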
Therefore I suspect that the scales library has something to do with it.
The most puzzling part is that the error only occurs every so often, maybe every 3rd or 5th time I try to plot the same graph...
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gridExtra_2.0.0 scales_0.3.0 broom_0.4.0 tidyr_0.3.1 ggplot2_1.0.1 GGally_0.5.0 dplyr_0.4.3
loaded via a namespace (and not attached):
[1] Rcpp_0.11.5 magrittr_1.5 MASS_7.3-43 mnormt_1.5-1 munsell_0.4.2 colorspace_1.2-6 lattice_0.20-33 R6_2.0.1
[9] stringr_0.6.2 plyr_1.8.1 tools_3.2.2 parallel_3.2.2 grid_3.2.2 gtable_0.1.2 nlme_3.1-121 psych_1.5.8
[17] DBI_0.3.1 htmltools_0.2.6 lazyeval_0.1.10 yaml_2.1.13 assertthat_0.1 digest_0.6.8 reshape2_1.4.1 rmarkdown_0.8.1
[25] labeling_0.3 reshape_0.8.5 proto_0.3-10
traceback()
35: grid.Call(L_textBounds, as.graphicsAnnot(x$label), x$x, x$y,
resolveHJust(x$just, x$hjust), resolveVJust(x$just, x$vjust),
x$rot, 0)
34: widthDetails.text(x)
33: widthDetails(x)
32: (function (x)
{
widthDetails(x)
})(list(label = expression(5 %*% 10^+5, 7.5 %*% 10^+5, 1 %*%
10^+6, 1.25 %*% 10^+6, 1.5 %*% 10^+6), x = 1, y = c(0.0777214770341215,
0.291044141334423, 0.504366805634725, 0.717689469935027, 0.931012134235329
), just = "centre", hjust = 1, vjust = 0.5, rot = 0, check.overlap = FALSE,
name = "axis.text.y.text.8056", gp = list(fontsize = 9.6,
col = "black", fontfamily = "", lineheight = 0.9, font = 1L),
vp = NULL))
31: grid.Call.graphics(L_setviewport, vp, TRUE)
30: push.vp.viewport(X[[i]], ...)
I solved it by installing the library extrafont, installing a set of specific fonts and forcing ggplot to use only these fonts:
require(extrafont)
# need only do this once!
font_import(pattern="[A/a]rial", prompt=FALSE)
require(ggplot2)
# extending the help file example
df <- data.frame(gp = factor(rep(letters[1:3], each = 10)), y = rnorm(30))
ds <- plyr::ddply(df, "gp", plyr::summarise, mean = mean(y), sd = sd(y))
plotobj <- ggplot(df, aes(gp, y)) +
geom_point() +
geom_point(data = ds, aes(y = mean), colour = 'red', size = 3) +
theme(text=element_text(size=16, family="Arial"))
print(plotobj)
I experienced the same issue when trying to plot ggplot/grid output to the plot window in RStudio. However, plotting to an external graphics device seems to work fine.
The external device of choice depends on your system, but the script below, paraphrased from this blog, works for most systems:
a = switch(tolower(Sys.info()["sysname"]),
"darwin" = "quartz",
"linux" = "x11",
"windows" = "windows")
options("device" = a)
graphics.off()
rm(a)
and to switch back to using the RStudio plot window:
options("device"="RStudioGD")
graphics.off()
Note that by switching, you lose any existing plots.
A lot of solutions for this particular error direct you to look under the hood of your computer but this error can also be caused by a scripting error in which R expects to match elements from two data structures but cannot.
For me the error was caused by calling a fairly complex graphing function (see below) that read an ordered character vector as well as a matrix whose row names were supposed to each match a value in the ordered character vector. The problem was that some of my values contained dashes in them and R's read.table() function translated those dashes to periods (Ex: "HLA-DOA" became "HLA.DOA").
I was using the ComplexHeatmap package with a call like this:
oncoPrint(mat,
get_type = function(x) strsplit(x, ";")[[1]],
alter_fun_list = alter_fun_list,
col = col,
row_order = my_order,
column_title = "OncoPrint",
heatmap_legend_param = list(title = "Alternations", at = c("AMP", "HOMDEL", "MUT"), labels = c("Amplification", "Deep deletion", "Mutation"))
)
In this call:
mat was a matrix that had dashes swapped out for periods
my_order was a character vector containing the same values as the row names of mat, except the dashes remained
every other argument is essential to the call but irrelevant to this post
To help R find this elusive "polygon edge", I just edited my character vector with:
row_order <- gsub("\\.", "-", row_order)
If you've tried re-installing packages, restarting your computer and re-enabling fonts - maybe check and see if you've got some faulty character matching going on in your call.
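A quick way to check for this kind of mismatch is sketched below in base R. The inline CSV and the wanted vector are made up for illustration, and the example uses column names, which is where read.table() applies make.names() by default (check.names = TRUE).

```r
# Tiny inline table whose header contains a dash (made-up data)
csv <- "HLA-DOA,TP53\n1,0\n0,1"

# Default behaviour: names are passed through make.names(), "-" becomes "."
default_names <- names(read.table(text = csv, sep = ",", header = TRUE))

# check.names = FALSE keeps the names exactly as written in the file
kept_names <- names(read.table(text = csv, sep = ",", header = TRUE,
                               check.names = FALSE))

wanted <- c("HLA-DOA", "TP53")  # the ordering vector you plan to match against
setdiff(wanted, default_names)  # anything listed here will fail to match
setdiff(wanted, kept_names)     # empty when the names line up
```

Running a setdiff() like this before the plotting call shows which labels have no counterpart.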
I got this error after trying to set the font in the theme.
The line I had added was:
p <- p + theme(text = element_text(family = "宋体"))
When I removed that setting, the plot worked again.
Actually, I have the same problem on my Mac and couldn't find a reliable fix. Since it only happens on roughly every 5th or 10th execution, I decided to wrap the whole ggplot command in a tryCatch call and execute it until it doesn't fail...
The code looks like this:
repeat {
  error_appeared <- FALSE  # reset before every attempt
  tryCatch({ # we put everything into a tryCatch block, because sometimes we get an error
    gscat <-
      ggplot() # my ggplot command, which sometimes fails
    ggsave('file.pdf', gscat, width=8, height=8)
    plot(gscat)
  },
  error=function(e) {
    print('redo the ratioscatterplot.')
    # `<<-` is needed here: a plain `<-` would only set a copy local to the handler
    error_appeared <<- TRUE
  }
  )
  if(!error_appeared){
    break
  }
}
Actually, I figured out that only the drawing/plotting of the figure gives problems! Saving always works.
Maybe this helps someone, since I couldn't find a solution which actually solves the whole thing!
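A variant of the retry idea that avoids the flag variable and caps the number of attempts could look like the sketch below. retry() and flaky() are hypothetical names; flaky() just simulates a call that fails twice before succeeding.

```r
# Hypothetical helper: call f() until it succeeds or max_attempts is reached
retry <- function(f, max_attempts = 5) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = f()),
      error = function(e) list(ok = FALSE, value = conditionMessage(e))
    )
    if (result$ok) {
      return(list(value = result$value, attempts = attempt))
    }
  }
  stop("all ", max_attempts, " attempts failed; last error: ", result$value)
}

# Simulated flaky call: fails on the first two calls, then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("polygon edge not found")
  "plotted"
}

out <- retry(flaky)
out$attempts
```

Using the value returned by tryCatch() sidesteps scoping problems with assignments inside the handler, and the cap prevents an endless loop if the error turns out to be permanent.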
Additional:
If somebody wants to play with the problem on a "reproducible example" the code below throws an average of 2 errors out of 20 within the loop.
library(scales)
library(ggplot2)
df <- data.frame(
log2.Ratio.H.L.normalized.rev = c(2.53861265542646, 0.402176424979483, 0.438931541934545, 0.639695233399582, 0.230203013366421,
2.88223218956399, 1.23051046036618, 2.56554843533357, 0.265436896049098,
1.32866415755805, -0.92108963514092, 0.0976107966264223, -0.43048946484291,
-0.558665259531966, 4.13183638727079, 0.904580434921318, -0.0733780789564803,
-0.621932351219966, 1.48594198341242, -0.365611185917855, 1.21088754922081,
-2.3717583289898, 2.95160644380282, 3.71446534016249),
Intensity = c(5951600000, 2.4433e+10, 1.1659e+10, 2273600000, 6.852e+10, 9.8746e+10, 5701600000,
1758500000, 987180000, 3.4167e+11, 1.5718e+10, 6.8888e+10, 5.5936e+10,
8702900000, 1093500000, 4426200000, 1.3681e+11, 7.773e+09, 5860400000,
1.2861e+12, 2017900000, 2061300000, 240520000, 1382700000),
my_label = c("RPL18",
"hCG_2024613", "NOL7", "PRPF4B", "HIST1H2BC", "XRCC1", "C9orf30",
"CABIN1", "MGC3731", "XRCC6", "RPL23", "RPL27", "RPL17", "RPL32",
"XPC", "RPL15", "GNL3", "RPL29", "JOSD3", "PARP1", "DNAPTP6",
"ORC2L", "NCL", "TARDBP"))
unlink("figures", recursive=TRUE)
if(!dir.exists('figures')) dir.create('figures')
for(i in 1:20) {
  repeat {
    error_appeared <- FALSE  # reset before every attempt
    tryCatch({ # we put everything into a tryCatch block, because sometimes we get an error
      gscat <-
        ggplot(df, aes_string("log2.Ratio.H.L.normalized.rev", 'Intensity')) +
        geom_point(data=df[abs(df[["log2.Ratio.H.L.normalized.rev"]]) < 1,],
                   color='black', alpha=.3, na.rm=TRUE) +
        scale_y_log10(labels = scales::trans_format("log10", scales::math_format()))
      ggsave(file.path('figures', paste0('intensity_scatter_', i, '.pdf')),
             gscat, width=8, height=8)
      plot(gscat)
    },
    error=function(e) {
      # print(e)
      print(sprintf('%s redo the ratioscatterplot.', i))
      # `<<-` is needed here: a plain `<-` would only set a copy local to the handler
      error_appeared <<- TRUE
    }
    )
    if(!error_appeared){
      break
    }
  }
}

Missing object error when using step() within a user-defined function

5 days and still no answer
As can be seen by Simon's comment, this is a reproducible and very strange issue. It seems that the issue only arises when a stepwise regression with very high predictive power is wrapped in a function.
I have been struggling with this for a while and any help would be much appreciated. I am trying to write a function that runs several stepwise regressions and outputs all of them to a list. However, R is having trouble reading the dataset that I specify in my function arguments. I found several similar errors on various boards (here, here, and here), however none of them seemed to ever get resolved. It all comes down to some weird issues with calling step() in a user-defined function. I am using the following script to test my code. Run the whole thing several times until an error arises (trust me, it will):
test.df <- data.frame(a = sample(0:1, 100, rep = T),
b = as.factor(sample(0:5, 100, rep = T)),
c = runif(100, 0, 100),
d = rnorm(100, 50, 50))
test.df$b[10:100] <- test.df$a[10:100] #making sure that at least one of the variables has some predictive power
stepModel <- function(modeling.formula, dataset, outfile = NULL) {
if (is.null(outfile) == FALSE){
sink(file = outfile,
append = TRUE, type = "output")
print("")
print("Models run at:")
print(Sys.time())
}
model.initial <- glm(modeling.formula,
family = binomial,
data = dataset)
model.stepwise1 <- step(model.initial, direction = "backward")
model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
sink()
return(output)
}
blah <- stepModel(a~., dataset = test.df)
This returns the following error message (if the error does not show up right away, keep re-running the test.df script as well as the call for stepModel(), it will show up eventually):
Error in is.data.frame(data) : object 'dataset' not found
I have determined that everything runs fine up until model.stepwise2 starts to get built. Somehow, the temporary object 'dataset' works just fine for the first stepwise regression, but fails to be recognized by the second. I found this by commenting out part of the function as can be seen below. This code will run fine, proving that the object 'dataset' was originally being recognized:
stepModel1 <- function(modeling.formula, dataset, outfile = NULL) {
if (is.null(outfile) == FALSE){
sink(file = outfile,
append = TRUE, type = "output")
print("")
print("Models run at:")
print(Sys.time())
}
model.initial <- glm(modeling.formula,
family = binomial,
data = dataset)
model.stepwise1 <- step(model.initial, direction = "backward")
# model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
# sink()
# output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
return(model.stepwise1)
}
blah1 <- stepModel1(a~., dataset = test.df)
EDIT - before anyone asks, all the summary() functions were there because the full function (I edited it so that you could focus on the error) has another piece that defines a file to which you can output the stepwise trace. I just got rid of them.
EDIT 2 - session info
sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] tcltk stats graphics grDevices utils datasets methods base
other attached packages:
[1] sqldf_0.4-6.4 RSQLite.extfuns_0.0.1 RSQLite_0.11.3 chron_2.3-43
[5] gsubfn_0.6-5 proto_0.3-10 DBI_0.2-6 ggplot2_0.9.3.1
[9] caret_5.15-61 reshape2_1.2.2 lattice_0.20-6 foreach_1.4.0
[13] cluster_1.14.2 plyr_1.8
loaded via a namespace (and not attached):
[1] codetools_0.2-8 colorspace_1.2-1 dichromat_2.0-0 digest_0.6.2 grid_2.15.1
[6] gtable_0.1.2 iterators_1.0.6 labeling_0.1 MASS_7.3-18 munsell_0.4
[11] RColorBrewer_1.0-5 scales_0.2.3 stringr_0.6.2 tools_2.15
EDIT 3 - this performs all the same operations as the function, just without using a function. This will run fine every time, even when the algorithm doesn't converge:
modeling.formula <- a~.
dataset <- test.df
outfile <- NULL
if (is.null(outfile) == FALSE){
sink(file = outfile,
append = TRUE, type = "output")
print("")
print("Models run at:")
print(Sys.time())
}
model.initial <- glm(modeling.formula,
family = binomial,
data = dataset)
model.stepwise1 <- step(model.initial, direction = "backward")
model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
Using do.call to refer to the data set in the calling environment works for me. See https://stackoverflow.com/a/7668846/210673 for the original suggestion. Here's a version that works (with sink code removed).
stepModel2 <- function(modeling.formula, dataset) {
model.initial <- do.call("glm", list(modeling.formula,
family = "binomial",
data = as.name(dataset)))
model.stepwise1 <- step(model.initial, direction = "backward")
model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
}
blah <- stepModel2(a~., dataset = "test.df")
It fails for me consistently with set.seed(6) with the original code. The reason it fails is that the dataset variable is not present within the step function, and although it's not needed in making model.stepwise1, it is needed for model.stepwise2 when model.stepwise1 keeps a linear term. So that's the case when your version fails. Calling the dataset from the global environment as I do here fixes this issue.
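The difference between the two versions shows up in the call stored on the fitted model, which is what step() and update() re-evaluate later. Here is a minimal sketch using lm() and the built-in mtcars data instead of the glm above; the function names are hypothetical.

```r
# Fitting inside a function: the stored call refers to the local name `dataset`
fit_in_function <- function(dataset) lm(mpg ~ wt, data = dataset)
m1 <- fit_in_function(mtcars)
m1$call$data   # the symbol `dataset`, which only exists inside the function

# do.call() with as.name(): the stored call refers to the global object
fit_by_name <- function(dataset_name) {
  do.call("lm", list(mpg ~ wt, data = as.name(dataset_name)))
}
m2 <- fit_by_name("mtcars")
m2$call$data   # the symbol `mtcars`, which can be found from anywhere

# Refitting works for m2 because its data can be located again
m2_bigger <- update(m2, . ~ . + hp)
names(coef(m2_bigger))
```

Refitting m1 the same way from outside the function would have to look up dataset again, which is where an "object 'dataset' not found" error can come from.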

addOBV throwing error

I am trying to plot a chart with price and a few technical indicators such as ADX, RSI, and OBV. I cannot figure out why addOBV gives an error and why the addADX lines are not fully shown in the chart.
Here is my code:
tmp <- read.csv(paste("ProcessedQuotes/",Nifty[x,],".csv", sep=""),
as.is=TRUE, header=TRUE, row.names=NULL)
tmp$Date<-as.Date(tmp$Date)
ydat = xts(tmp[,-1],tmp$Date)
lineChart(ydat, TA=NULL, name=paste(Nifty[x,]," Technical Graph"))
plot(addSMA(10))
plot(addEMA(10))
plot(addRSI())
plot(addADX())
plot(addOBV())
Error for addOBV is:
Error in try.xts(c(2038282, 1181844, -1114409, 1387404, 3522045, 4951254, :
Error in as.xts.double(x, ..., .RECLASS = TRUE) :
order.by must be either 'names()' or otherwise specified
Below you can see DIn is not shown fully in the graphs.
> class(ydat)
[1] "xts" "zoo"
> head(ydat)
Open High Low Close Volume Trades Sma20 Sma50 DIp DIn DX ADX aroonUp aroonDn oscillator macd signal RSI14
I don't know why that patch doesn't work for you, but you can just create a new function (or you could mask the one from quantmod). Let's just make a new, patched version called addOBV2 which is the code for addOBV except for the one patched line. (x <- as.matrix(lchob@xdata) is replaced with x <- try.xts(lchob@xdata, error=FALSE)).
addOBV2 <- function (..., on = NA, legend = "auto")
{
    stopifnot("package:TTR" %in% search() || require("TTR", quietly = TRUE))
    lchob <- quantmod:::get.current.chob()
    x <- try.xts(lchob@xdata, error=FALSE)
    #x <- as.matrix(lchob@xdata)
    x <- OBV(price = Cl(x), volume = Vo(x))
    yrange <- NULL
    chobTA <- new("chobTA")
    if (NCOL(x) == 1) {
        chobTA@TA.values <- x[lchob@xsubset]
    }
    else chobTA@TA.values <- x[lchob@xsubset, ]
    chobTA@name <- "chartTA"
    if (any(is.na(on))) {
        chobTA@new <- TRUE
    }
    else {
        chobTA@new <- FALSE
        chobTA@on <- on
    }
    chobTA@call <- match.call()
    legend.name <- gsub("^.*[(]", " On Balance Volume (", deparse(match.call()))#,
    #extended = TRUE)
    gpars <- c(list(...), list(col=4))[unique(names(c(list(col=4), list(...))))]
    chobTA@params <- list(xrange = lchob@xrange, yrange = yrange,
        colors = lchob@colors, color.vol = lchob@color.vol, multi.col = lchob@multi.col,
        spacing = lchob@spacing, width = lchob@width, bp = lchob@bp,
        x.labels = lchob@x.labels, time.scale = lchob@time.scale,
        isLogical = is.logical(x), legend = legend, legend.name = legend.name,
        pars = list(gpars))
    if (is.null(sys.call(-1))) {
        TA <- lchob@passed.args$TA
        lchob@passed.args$TA <- c(TA, chobTA)
        lchob@windows <- lchob@windows + ifelse(chobTA@new, 1,
            0)
        chartSeries.chob <- quantmod:::chartSeries.chob
        do.call("chartSeries.chob", list(lchob))
        invisible(chobTA)
    }
    else {
        return(chobTA)
    }
}
Now it works.
# reproduce your data
ydat <- getSymbols("ZEEL.NS", src="yahoo", from="2012-09-11",
to="2013-01-18", auto.assign=FALSE)
lineChart(ydat, TA=NULL, name=paste("ZEEL Technical Graph"))
plot(addSMA(10))
plot(addEMA(10))
plot(addRSI())
plot(addADX())
plot(addOBV2())
This code reproduces the error:
library(quantmod)
getSymbols("AAPL")
lineChart(AAPL, 'last 6 months')
addOBV()
Session Info:
sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] quantmod_0.3-17 TTR_0.21-1 xts_0.9-1 zoo_1.7-9 Defaults_1.1-1 rgeos_0.2-11
[7] sp_1.0-5 sos_1.3-5 brew_1.0-6
loaded via a namespace (and not attached):
[1] grid_2.15.0 lattice_0.20-6 tools_2.15.0
Googling around, the error seems to be related to the fact that addOBV converts the data into a matrix, which causes problems with TTR::OBV. A patch has been posted on RForge.
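For readers unfamiliar with the indicator itself, here is a base-R sketch of the textbook On Balance Volume calculation on made-up prices and volumes. obv_sketch() is a hypothetical helper and is not TTR::OBV, which may treat the first bar and unchanged closes differently.

```r
# Textbook OBV: add the volume on up days, subtract it on down days,
# leave the running total unchanged when the close does not move
obv_sketch <- function(close, volume) {
  direction <- c(1, sign(diff(close)))  # first bar counted as positive (assumption)
  cumsum(direction * volume)
}

close  <- c(10, 11, 10.5, 10.5, 12)   # made-up closing prices
volume <- c(100, 200, 150, 50, 300)   # made-up volumes
obv_sketch(close, volume)
```

Because the result is a plain numeric vector with no time index, it also hints at why charting code needs the data to stay an xts object rather than a bare matrix.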

Resources