I am currently working with R for a university course, so I am still quite inexperienced.
We use R for exploratory data analysis. In one assignment we are supposed to apply different regression models to the data, generate the same plots for each, and additionally play a bit with the parameters for learning purposes. To avoid copy-pasting the same code 10-20 times, I wrote a function that takes the regression function and its parameters via an ellipsis (...). Inside this function I call the passed function with the ellipsis as its arguments.
library("astsa")
data_glob <- globtemp
plot.data.and.reg <- function(data, reg.func, ...) {
  model <- reg.func(...)
  par(mfrow = c(1, 2))
  plot(data)
  abline(model, col = "orange", lwd = 3)
  qqnorm(data)
}
This works for the simple lm function, but unfortunately not for the ksmooth function.
When I pass this function I get the error message: "numeric y must be supplied. For density estimation use density()".
plot.data.and.reg(
  data_glob,
  lm,
  list(
    formula = as.formula("data_glob ~ time(data_glob)"),
    data = data_glob
  )
)
plot.data.and.reg(
  data_glob,
  ksmooth,
  list(
    x = as.numeric(time(data_glob)),
    y = as.numeric(data_glob),
    kernel = "box",
    bandwidth = 0.25
  )
)
Thereupon I looked at the source code of ksmooth. It shows that this error message occurs because the missing(y) check is triggered. Apparently the parameters I pass via the ellipsis do not get "unpacked".
For simplicity, I wrote a dummy function to test whether I could add this "unpacking" myself.
test.wrapper <- function(func, ...) {
  func(...)
}

test <- function(x, y) {
  match.call()
  if (missing(y))
    print("Unfortunately I was right")
  print(x)
  print(y)
}
test.wrapper(test, list(x = 10, y = 20))
Unfortunately I have not found a solution yet.
From Python I know that, as with kwargs, a dictionary can be unpacked with the ** operator. Is there an equivalent in R? Or how can I make sure in R that the parameters from the ellipsis are used correctly?
Since it worked with the lm function without errors, I also looked at its source code. Unfortunately, with my little experience in R, I can't see exactly where the essential difference is.
Overall, I would attribute the error to the fact that the ksmooth function is not yet designed for use with an ellipsis, but I am not sure. How would I need to adjust the ksmooth code to make it work with ...?
(For my Uni task, I will resort to the copy-paste (anti) pattern if in doubt. After searching for so long, I would still be interested in the solution and it may be useful in the future).
Thanks a lot for your help!
The closest equivalent of the */** splat in Python is the do.call function.
However, you don’t need this here. The actual issue is that you’re passing the extra arguments as a list rather than individually. Once you flatten the list, it works¹:
plot.data.and.reg(
  data_glob,
  ksmooth,
  x = as.numeric(time(data_glob)),
  y = as.numeric(data_glob),
  kernel = "box",
  bandwidth = 0.25
)
I’m actually surprised that it works with a list for lm; that’s not intentional, it’s essentially an accident caused by how lm is currently implemented.
¹ I say it "works" because there's no error and it plots something, but with your example data there's no visible regression line (abline is inappropriate for the output of ksmooth), and the smoothing parameters do nothing: the result is identical to the unsmoothed input.
To get this to work, use lines instead of abline. And as for the smoothing, for your example data a bandwidth of 10 works fine.
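If you do want to keep bundling the arguments in a list, do.call gives you the Python-style unpacking mentioned above. A minimal sketch under those assumptions (plot.data.and.reg2 is just a name for this variant; it draws the fit with lines() and uses the larger bandwidth discussed above):
plot.data.and.reg2 <- function(data, reg.func, args) {
  # do.call "unpacks" the list into individual arguments, like ** in Python
  model <- do.call(reg.func, args)
  par(mfrow = c(1, 2))
  plot(data)
  lines(model, col = "orange", lwd = 3)  # lines() instead of abline() for ksmooth output
  qqnorm(data)
}

plot.data.and.reg2(
  data_glob,
  ksmooth,
  list(
    x = as.numeric(time(data_glob)),
    y = as.numeric(data_glob),
    kernel = "box",
    bandwidth = 10
  )
)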
I made a model using R2jags. I like the jags syntax but I find the output produced by R2jags not easy to use. I recently read about the rstanarm package. It has many useful functions and is well supported by the tidybayes and bayesplot packages for easy model diagnostics and visualisation. However, I'm not a fan of the syntax used to write a model in rstanarm. Ideally, I would like to get the best of the two worlds, that is writing the model in R2jags and convert the output into a Stanreg object to use rstanarm functions.
Is that possible? If so, how?
I think the question isn't necessarily whether or not it's possible - I suspect it probably is. The question really is how much time you're prepared to spend doing it. All you'd have to do is try to replicate the structure of the object that gets created by rstanarm, to the extent that that's possible with the R2jags output. That would make it so that some post-processing tasks would probably work.
If I might be so bold, I suspect a better use of your time would be to turn the R2jags object into something that could be used with the post-processing functions you want to use. For example, it only takes a small modification to the JAGS output to make all of the mcmc_*() plotting functions from bayesplot work. Here's an example. Below is the example model from the jags() function help.
library("R2jags")

# An example model file is given in:
model.file <- system.file(package = "R2jags", "model", "schools.txt")

# data
J <- 8.0
y <- c(28.4, 7.9, -2.8, 6.8, -0.6, 0.6, 18.0, 12.2)
sd <- c(14.9, 10.2, 16.3, 11.0, 9.4, 11.4, 10.4, 17.6)

jags.data <- list("y", "sd", "J")
jags.params <- c("mu", "sigma", "theta")
jags.inits <- function() {
  list("mu" = rnorm(1), "sigma" = runif(1), "theta" = rnorm(J))
}

jagsfit <- jags(data = jags.data, inits = jags.inits, jags.params,
                n.iter = 5000, model.file = model.file, n.chains = 2)
Now, what the mcmc_*() plotting functions from bayesplot expect is a list of matrices of MCMC draws in which the column names give the names of the parameters. By default, jags() puts all of the draws into a single matrix. In the above case, there are 5000 iterations in total, with 2500 as burn-in (leaving 2500 sampled), and n.thin is set to 2 (jags() has an algorithm for identifying the thinning parameter). In any case, the jagsfit$BUGSoutput$n.keep element identifies how many iterations are kept per chain; here it's 1250. So you can use that to make a list of two matrices from the output.
jflist <- list(jagsfit$BUGSoutput$sims.matrix[1:jagsfit$BUGSoutput$n.keep, ],
jagsfit$BUGSoutput$sims.matrix[(jagsfit$BUGSoutput$n.keep+1):(2*jagsfit$BUGSoutput$n.keep), ])
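If you would rather not hard-code the two chains, the same split can be written generically; this is just a sketch that keeps the assumption above that sims.matrix stacks the kept draws chain by chain:
n.keep <- jagsfit$BUGSoutput$n.keep
n.chains <- jagsfit$BUGSoutput$n.chains
# one matrix of n.keep draws per chain
jflist <- lapply(seq_len(n.chains), function(i) {
  jagsfit$BUGSoutput$sims.matrix[((i - 1) * n.keep + 1):(i * n.keep), , drop = FALSE]
})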
Now, you'd just have to call some of the plotting functions:
library("bayesplot")

mcmc_trace(jflist, regex_pars = "theta")
or
mcmc_areas(jflist, regex_pars="theta")
So, instead of trying to replicate all of the output that rstanarm produces, it might be a better use of your time to try to bend the jags output into a format that would be amenable to the post-processing functions you want to use.
EDIT - added possibility for pp_check() from bayesplot.
The posterior draws of y in this case are in the theta parameters. So we make an object that has elements y and yrep and give it class foo:
x <- list(y = y, yrep = jagsfit$BUGSoutput$sims.list$theta)
class(x) <- "foo"
We can then write a pp_check method for objects of class foo. This comes straight out of the help file for bayesplot::pp_check().
pp_check.foo <- function(object, ..., type = c("multiple", "overlaid")) {
  y <- object[["y"]]
  yrep <- object[["yrep"]]
  switch(match.arg(type),
         multiple = ppc_hist(y, yrep[1:min(8, nrow(yrep)), , drop = FALSE]),
         overlaid = ppc_dens_overlay(y, yrep[1:min(8, nrow(yrep)), , drop = FALSE]))
}
Then, just call the function:
pp_check(x, type="overlaid")
I am very new to R, and this is my first time encountering the eval() function. I am trying to use the med and boot.med functions from the mma package to conduct mediation analysis. med and boot.med take in models such as linear models, along with data frames that specify mediators and predictors, and then estimate the mediation effect of each mediator.
The author of the package gives the flexible option of specifying one's own custom.function. From the source code of med, it can be seen that the custom.function is passed to eval(). So I tried to insert the gbmt function as the custom function. However, R kept giving me the error message: Error during wrapup: Number of trees to be used in prediction must be provided. I have been searching online for days and tried many ways of specifying the number-of-trees parameter n.trees, but nothing works (I believe others have raised similar issues: post 1, post 2).
The following lines are part of the source code of the med function:
cf1 = gsub("responseY", "y[,j]", custom.function[j])
cf1 = gsub("dataset123", "x2", cf1)
cf1 = gsub("weights123", "w", cf1)
full.model[[j]] <- eval(parse(text = cf1))
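For intuition, those gsub() calls simply rewrite the custom.function string into ordinary R code before eval(parse(...)) runs it; a toy illustration outside of med() (y, j, x2 and w are internal objects of med, so the final string only makes sense inside that function):
cf <- 'glm(responseY ~ ., data = dataset123, weights = weights123)'
cf1 <- gsub("responseY", "y[,j]", cf)
cf1 <- gsub("dataset123", "x2", cf1)
cf1 <- gsub("weights123", "w", cf1)
cf1
# [1] "glm(y[,j] ~ ., data = x2, weights = w)"
# med() would then run eval(parse(text = cf1)) with its own y, j, x2 and w in scope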
One custom function example the author gives in the package documentation is as follows:
temp1 <- med(data = data.bin, n = 2,
             custom.function = 'glm(responseY ~ ., data = dataset123, family = "quasibinomial", weights = weights123)')
Here glm is the custom function. This example code works and you can replicate it easily (if you have mma installed and loaded). However, when I try to use the gbmt function on a survival object, I get errors; here is what my code looks like:
temp1 <- med(data = data.surv, n = 2, type = "link",
             custom.function = 'gbmt(responseY ~ .,
                                      data = dataset123,
                                      distribution = dist,
                                      train_params = start_stop,
                                      cv_folds = 10,
                                      keep_gbm_data = TRUE,
                                      )')
Does anyone have any idea how the number-of-trees argument n.trees can be added somewhere in the above code?
Many thanks in advance!
Update: in order to replicate the example code, please install mma and try the following:
library("mma")
data("weight_behavior") ##binary x #binary y
x=weight_behavior[,c(2,4:14)]
pred=weight_behavior[,3]
y=weight_behavior[,15]
data.bin<-data.org(x,y,pred=pred,contmed=c(7:9,11:12),binmed=c(6,10), binref=c(1,1),catmed=5,catref=1,predref="M",alpha=0.4,alpha2=0.4)
temp1<-med(data=data.bin,n=2) #or use self-defined final function
temp1<-med(data=data.bin,n=2, custom.function = 'glm(responseY~.,data=dataset123,family="quasibinomial",
weights=weights123)')
I changed the custom.function to gbmt and used a survival object as responseY and the error occurs. When I use the gbmt function on my data outside the med function, there is no error.
I am trying to use the Nomad technique for blackbox optimisation from the crs package (C implementation), which is called via the snomadr function. The method works for straight numerical optimisation, but errors when categorical features are included. However, categorical optimisation is not very well documented in the help, so I am struggling to see where I am going wrong. Reproducible code below:
library(crs)
library(randomForest)
Illustrating this on randomForest & the iris dataset.
Creating the randomForest model (leaving the last row out as starting points for the optimizer)
rfIris <- randomForest(x=iris[-150,-c(1)], y=unlist(iris[-150,1]))
The objective function (the function we want to optimize)
objFn <- function(x0, model) {
  preds <- predict(object = model, newdata = x0)
  as.numeric(preds)
}
Test to see if the objective function works (should return ~6.37)
objOut <- objFn(x0=unlist(iris[150,-c(1)]),model = rfIris)
Creating initial conditions, options list, and upper/lower bounds for Nomad
x0 <- iris[150, -c(1)]
x0 <- unlist(x0)

options <- list("MAX_BB_EVAL" = 10000,
                "MIN_MESH_SIZE" = 0.001,
                "INITIAL_MESH_SIZE" = 1,
                "MIN_POLL_SIZE" = 0.001,
                "NEIGHBORS_EXE" = c(1, 2, 3),
                "EXTENDED_POLL_ENABLED" = 'yes',
                "EXTENDED_POLL_TRIGGER" = 'r0.01',
                "VNS_SEARCH" = '1')

up <- c(10, 10, 10, 10)
low <- c(0, 0, 0, 0)
Calling the optimizer
opt <- snomadr(eval.f = objFn, n = 4, bbin = c(0, 0, 0, 2), bbout = 0, x0 = x0,
               model = rfIris, opts = options, ub = up, lb = low)
and I get an error about the NEIGHBORS_EXE parameter in the options list. It seems as if I need to supply NEIGHBORS_EXE with a file corresponding to a set of 'extended poll' coordinates; however, it is not clear what exactly these are.
The method works by setting "EXTENDED_POLL_ENABLED" = 'no' in the options list, as it then ignores the categorical variables and defaults to numerical optimisation, but this is not what I want.
I also managed to pull up some additional information for NEIGHBORS_EXE using
snomadr(information=list("help"="-h NEIGHBORS_EXE"))
and again, I do not understand what the 'neighbours.exe' is meant to be.
Any help would be much appreciated!
This is the response from Zhenghua, who coded the R interface:
The issue is that he did not configure the parameter “NEIGHBORS_EXE” properly. He needs to prepare an executable file for defining the neighbors, put the executable file in the folder where R is called, and then set the parameter “NEIGHBORS_EXE” to the executable file name.
You can contact us at nomad#gerad.ca if you wish to continue the discussion.
Regarding the neighbours_exe parameter, you can refer to section 7.1 of the Nomad user guide:
https://www.gerad.ca/nomad/Downloads/user_guide.pdf
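In other words, the options list would end up looking something like this; the file name neighbors.exe is purely hypothetical, and the executable itself has to implement the neighbour-definition interface described in section 7.1 of the guide and sit in the directory from which R is called:
options <- list("MAX_BB_EVAL" = 10000,
                "EXTENDED_POLL_ENABLED" = 'yes',
                "EXTENDED_POLL_TRIGGER" = 'r0.01',
                "NEIGHBORS_EXE" = "neighbors.exe")  # name of the neighbour-defining executable, not a numeric vector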
I am using the randomForest package (v. 4.6-7) in R 2.15.2. I cannot find the source code for the partialPlot function and am trying to figure out exactly what it does (the help file seems to be incomplete). It is supposed to take the name of a variable x.var as an argument:
library(randomForest)
data(iris)
rf <- randomForest(Species ~., data=iris)
x1 <- "Sepal.Length"
partialPlot(x=rf, pred.data=iris, x.var=x1)
# Error in `[.data.frame`(pred.data, , xname) : undefined columns selected
partialPlot(x=rf, pred.data=iris, x.var=as.character(x1))
# works!
typeof(x1)
# [1] "character"
x1 == as.character(x1)
# TRUE
# Now if I try to wrap it in a function...
f <- function(w) {
  partialPlot(x = rf, pred.data = iris, x.var = as.character(w))
}
f(x1)
# Error in as.character(w) : 'w' is missing
Questions:
1) Where can I find the source code for partialPlot?
2) How is it possible to write a function which takes a string x1 as an argument where x1 == as.character(x1), but the function throws an error when as.character is not applied to x1?
3) Why does it fail when I wrap it inside a function? Is partialPlot messing with environments somehow?
Tips/ things to try that might be helpful for solving such questions by myself in future would also be very welcome!
The source code for partialPlot() is found by entering
randomForest:::partialPlot.randomForest
into the console. I found this by first running
methods(partialPlot)
because entering partialPlot only tells me that it uses a method. From the methods call we see that there is one method, and the asterisk next to it tells us that it is a non-exported function. To view the source code of a non-exported function, we use the triple-colon operator :::. So it goes
package:::generic.method
Where package is the package, generic is the generic function (here it's partialPlot), and method is the method (here it's the randomForest method).
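As a side note, getAnywhere() is another way to pull up a non-exported method when you do not know (or do not want to type) the namespace:
getAnywhere("partialPlot.randomForest")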
Now, as for the other questions, the function can be written with do.call(), and then you can pass w straight through.
f <- function(w) {
  do.call("partialPlot", list(x = rf, pred.data = iris, x.var = w))
}
f(x1)
This works on my machine. It's not so much environments as it is evaluation. Many plotting functions use some non-standard evaluation, which can be handled most of the time with this do.call() construct.
But note that outside the function you can also use eval() on x1.
partialPlot(x = rf, pred.data = iris, x.var = eval(x1))
I don't really see a reason to use as.character() inside the function. If you leave a comment, we can go from there if you need more info. I'm not familiar enough with this package yet to go any further.
I'm relatively new to R and I would appreciate it if you could take a look at the following code. I'm trying to estimate the shape parameter of the Fréchet distribution (or inverse Weibull) using mmedist (I also tried fitdist, which calls mmedist), but I get the following error:
Error in mmedist(data, distname, start = start, fix.arg = fix.arg, ...) :
the empirical moment function must be defined.
The code that I use is below:
require(actuar)
library(fitdistrplus)
library(MASS)

# values
n <- 100
scale <- 1
shape <- 3

# simulate a sample
data_fre <- rinvweibull(n, shape, scale)
memp <- minvweibull(c(1, 2), shape = 3, rate = 1, scale = 1)

# estimating the parameters
para_lm <- mmedist(data_fre, "invweibull", start = c(shape = 3, scale = 1),
                   order = c(1, 2), memp = "memp")
Please note that I tried changing the code many times to see whether my mistake was in the syntax, but I always get the same error.
I'm aware of the example in the documentation. I've tried that as well, but with no luck. Please note that in order for the method to work, the order of the moment must be smaller than the shape parameter (i.e. shape).
The example is the following:
require(actuar)

# simulate a sample
x4 <- rpareto(1000, 6, 2)

# empirical raw moment
memp <- function(x, order)
  ifelse(order == 1, mean(x), sum(x^order)/length(x))

# fit
mmedist(x4, "pareto", order = c(1, 2), memp = "memp",
        start = c(shape = 10, scale = 10), lower = 1, upper = Inf)
Thank you in advance for any help.
You will need to make non-trivial changes to the source of mmedist -- I recommend that you copy out the code, and make your own function foo_mmedist.
The first change you need to make is on line 94 of mmedist:
if (!exists("memp", mode = "function"))
That line checks whether "memp" is a function that exists, as opposed to whether the argument that you actually passed is a function. Change it to:
if (!exists(as.character(expression(memp)), mode = "function"))
The second change, as I have already noted, relates to the fact that the optim routine actually calls funobj, which calls DIFF2, which in turn calls (see line 112) the user-supplied memp function (minvweibull in your case) with two arguments: obs, which resolves to data, and order. Since minvweibull does not take data as its first argument, this fails.
This is expected, as the help page tells you:
memp A function implementing empirical moments, raw or centered but
has to be consistent with distr argument. This function must have
two arguments : as a first one the numeric vector of the data and as a
second the order of the moment returned by the function.
How can you fix this? Pass the function moment from the moments package. Here is complete code (assuming that you have made the change above, and created a new function called foo_mmedist):
library(moments)  # provides moment()

# values
n <- 100
scale <- 1
shape <- 3

# simulate a sample
data_fre <- rinvweibull(n, shape, scale)

# estimating the parameters
para_lm <- foo_mmedist(data_fre, "invweibull",
                       start = c(shape = 5, scale = 2), order = c(1, 2), memp = moment)
You can check that optimization has occurred as expected:
> para_lm$estimate
shape scale
2.490816 1.004128
Note, however, that this actually reduces to a crude way of doing overdetermined method of moments, and I am not sure that it is theoretically appropriate.
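As an aside, the raw empirical moment that moments::moment computes here could equally be written by hand with the documented two-argument signature and passed in the same way (memp_raw is a name introduced purely for illustration):
memp_raw <- function(x, order) sum(x^order) / length(x)  # raw empirical moment of the given order
para_lm2 <- foo_mmedist(data_fre, "invweibull",
                        start = c(shape = 5, scale = 2), order = c(1, 2), memp = memp_raw)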