two functions from different packages not working together - r

So I am doing work on ruin probability. I am performing monte-carlo simulations through R. The package that works perfectly for this is 'ruin'. You simply specify inputs for different ruin models. I am trying to simulate a SparreAndersen model where there are two random variable generators from different distributions. Typical distributions such as exponential or gamma work fine, using rgamma or rexp.
The problem that I am having is that I require a mixed exponential function as one of the r.v. generators. I found a great package that achieves this, called 'gendist' where I use the rmixt function to generate r.v.s from a mixed exponential function.
Running rmixt with all parameters works fine, giving a correct and accurate numerical output.
Running rexp with all parameters works fine, giving a correct and accurate numerical output.
Running a SparreAndersen() function works fine when using any function that is NOT the rmixt function.
The problem: SparreAndersen does not like to use rmixt from 'gendist' package as a r.v. generator
When I try to run this function, I get this as an error
I have no idea why this isn't working???
See my entire code below:
library(ruin)
library(gendist)
rmixt(1, phi=1, spec1="exp", arg1=list(rate=3), spec2="exp",
arg2=list(rate=7) )
rexp(1, rate=1)
modelSA <- SparreAndersen(initial_capital = 1,
premium_rate = 5,
claim_interarrival_generator = rmixt,
claim_interarrival_parameters = list(1, phi=1, spec1="exp", arg1=list(rate=3), spec2="exp", arg2=list(rate=7)),
claim_size_generator = rexp,
claim_size_parameters = list(rate=1))
edit: I have found a working solution, not involving the 'gendist' package. Let me know if this is the best way to go about doing this.
My new working code:
# Function which generates multiple exponential
rmixedexp <- function(...) {
choice <- c(...)
return(rexp(n=1, sample(choice, 1, replace=TRUE)))
}
modelSA <- SparreAndersen(initial_capital = 5,
premium_rate = 5,
claim_interarrival_generator = rmixedexp,
claim_interarrival_parameters = list(1, 3, 7),
claim_size_generator = rexp,
claim_size_parameters = list(rate=1))
I had to make my own function which takes any number of parameters, for example, 2. Each parameter is the rate of the exponential, chosen randomly (probability half).

Related

ksmooth function doesn't work with parameters via ellipsis

I am currently working with R due to a course at university, so I am still quite inexperienced.
We use R for exploratory data analysis. In a data analysis we are supposed to apply different regression models to the data and generate the same plots for each. Additionally, we are supposed to play a bit with the parameters for learning purposes. To avoid unattractive 10-20 times copy-pasting I wrote a function that shows the regression function and the parameters for it as an ellipsis (...). In this function I call the passed function with the ellipsis as parameter.
library("astsa")
data_glob <- globtemp
plot.data.and.reg <- function(data, reg.func, ...){
model <- reg.func(...)
par(mfrow = c(1, 2))
plot(data)
abline(model, col = "orange", lwd = 3)
qqnorm(data)
}
This works for the simple lm function, but unfortunately not for the ksmooth function.
When I pass this function I get the error message: "numeric y must be supplied. For density estimation use density()".
plot.data.and.reg(
data_g,
lm,
list(
formula = as.formula("data_glob ~ time(data_glob)"),
data = data_glob
)
)
plot.data.and.reg(
data_glob,
ksmooth,
list(
x = as.numeric(time(data_glob)),
y = as.numeric(data_glob),
kernel = "box",
bandwidth = 0.25
)
)
Thereupon I looked at the source code of ksmooth. It shows that this error message occurs because the check "missing(y)" fails. Apparently a problem occurs because I passed the parameters as an ellipsis and it doesn't seem to "unpack".
For simplicity, I wrote a dummy function to test if I can add this "unpack" myself.
test.wrapper <- function(func, ...){
func(...)
}
test <- function(x, y){
match.call()
if(missing(y))
print("Leider hatte ich Recht")
print(x)
print(y)
}
test.wrapper(test, list(x = 10, y = 20))
Unfortunately I have not found a solution yet.
From Python I know it so that as with kwargs a dictionary can be unpacked with the ** operator. Is there an equivalent in R? Or how to make sure in R that the parameters from the ellipsis are used correctly?
Since it worked with the lm function without errors I also looked again in their source code . Unfortunately, with my little experience in R, I can't see exactly where the essential difference is.
Overall, I would attribute the error to the fact that the ksmooth function is not yet designed for use with an ellipsis, but I am not sure. How would I need to adjust the ksmooth code to make it work with ...?
(For my Uni task, I will resort to the copy-paste (anti) pattern if in doubt. After searching for so long, I would still be interested in the solution and it may be useful in the future).
Thanks a lot for your help!
The closest equivalent of the */** splat in Python is the do.call function.
However, you don’t need this here. The actual issue is that you’re passing the extra arguments as a list rather than individually. Once you flatten the list, it works1:
plot.data.and.reg(
data_glob,
ksmooth,
x = as.numeric(time(data_glob)),
y = as.numeric(data_glob),
kernel = "box",
bandwidth = 0.25
)
I’m actually surprised that it works with a list for lm; that’s not intentional, it’s essentially an accident caused by how lm is currently implemented.
1 I say it “works” because there’s no error and it plots something, but with your example data there’s no visible regression line (abline is inappropriate for the output of ksmooth), and the smoothing parameters do nothing — the result is identical to the unsmoothed input.
To get this to work, use lines instead of abline. And as for the smoothing, for your example data a bandwidth of 10 works fine.

Converting a R2jags object into a Stanreg (rstanarm) object

I made a model using R2jags. I like the jags syntax but I find the output produced by R2jags not easy to use. I recently read about the rstanarm package. It has many useful functions and is well supported by the tidybayes and bayesplot packages for easy model diagnostics and visualisation. However, I'm not a fan of the syntax used to write a model in rstanarm. Ideally, I would like to get the best of the two worlds, that is writing the model in R2jags and convert the output into a Stanreg object to use rstanarm functions.
Is that possible? If so, how?
I think then question isn't necessarily whether or not it's possible - I suspect it probably is. The question really is how much time you're prepared to spend doing it. All you'd have to do is try to replicate in structure the object that gets created by rstanarm, to the extent that it's possible with the R2jags output. That would make it so that some post-processing tasks would probably work.
If I might be so bold, I suspect a better use of your time would be to turn the R2jags object into something that could be used with the post-processing functions you want to use. For example, it only takes a small modification to the JAGS output to make all of the mcmc_*() plotting functions from bayesplot work. Here's an example. Below is the example model from the jags() function help.
# An example model file is given in:
model.file <- system.file(package="R2jags", "model", "schools.txt")
# data
J <- 8.0
y <- c(28.4,7.9,-2.8,6.8,-0.6,0.6,18.0,12.2)
sd <- c(14.9,10.2,16.3,11.0,9.4,11.4,10.4,17.6)
jags.data <- list("y","sd","J")
jags.params <- c("mu","sigma","theta")
jags.inits <- function(){
list("mu"=rnorm(1),"sigma"=runif(1),"theta"=rnorm(J))
}
jagsfit <- jags(data=jags.data, inits=jags.inits, jags.params,
n.iter=5000, model.file=model.file, n.chains = 2)
Now, what the mcmc_*() plotting functions from bayesplot expect is a list of matrices of MCMC draws where the column names give the name of the parameter. By default, jags() puts all of them into a single matrix. In the above case, there are 5000 iterations in total, with 2500 as burnin (leaving 2500 sampled) and the n.thin is set to 2 in this case (jags() has an algorithm for identifying the thinning parameter), but in any case, the jagsfit$BUGSoutput$n.keep element identifies how many iterations are kept. In this case, it's 1250. So you could use that to make a list of two matrices from the output.
jflist <- list(jagsfit$BUGSoutput$sims.matrix[1:jagsfit$BUGSoutput$n.keep, ],
jagsfit$BUGSoutput$sims.matrix[(jagsfit$BUGSoutput$n.keep+1):(2*jagsfit$BUGSoutput$n.keep), ])
Now, you'd just have to call some of the plotting functions:
mcmc_trace(jflist, regex_pars="theta")
or
mcmc_areas(jflist, regex_pars="theta")
So, instead of trying to replicate all of the output that rstanarm produces, it might be a better use of your time to try to bend the jags output into a format that would be amenable to the post-processing functions you want to use.
EDIT - added possibility for pp_check() from bayesplot.
The posterior draws of y in this case are in the theta parameters. So, we make an object that has elements y and yrep and make it of class foo
x <- list(y = y, yrep = jagsfit$BUGSoutput$sims.list$theta)
class(x) <- "foo"
We can then write a pp_check method for objects of class foo. This come straight out of the help file for bayesplot::pp_check().
pp_check.foo <- function(object, ..., type = c("multiple", "overlaid")) {
y <- object[["y"]]
yrep <- object[["yrep"]]
switch(match.arg(type),
multiple = ppc_hist(y, yrep[1:min(8, nrow(yrep)),, drop = FALSE]),
overlaid = ppc_dens_overlay(y, yrep[1:min(8, nrow(yrep)),, drop = FALSE]))
}
Then, just call the function:
pp_check(x, type="overlaid")

Fitting Step functions

AIM: The aim here was to find a suitable fit, using step functions, which uses age to describe wage, in the Wage dataset in the library ISLR.
PLAN:
To find a suitable fit, I'll try multiple fits, which will have different cut points. I'll use the glm() function (of the boot library) for the fitting purpose. In order to check which fit is the best, I'll use the cv.glm() function to perform cross-validation over the fitted model.
PROBLEM:
In order to do so, I did the following:
all.cvs = rep(NA, 10)
for (i in 2:10) {
lm.fit = glm(wage~cut(Wage$age,i), data=Wage)
all.cvs[i] = cv.glm(Wage, lm.fit, K=10)$delta[2]
}
But this gives an error:
Error in model.frame.default(formula = wage ~ cut(Wage$age, i), data =
list( : variable lengths differ (found for 'cut(Wage$age, i)')
Whereas, when I run the code given below, it runs.(It can be found here)
all.cvs = rep(NA, 10)
for (i in 2:10) {
Wage$age.cut = cut(Wage$age, i)
lm.fit = glm(wage~age.cut, data=Wage)
all.cvs[i] = cv.glm(Wage, lm.fit, K=10)$delta[2]
}
Hypotheses and Results:
Well, it might be possible that cut() and glm() might not work together. But this works:
glm(wage~cut(age,4),data=Wage)
Question:
So, basically we're using the cut() function, saving it's results in a variable, then using that variable in the glm() function. But we can't put the cut function inside the glm() function. And that too, only if the code is in a loop.
So, why is the first version of the code not working?
This is confusing. Any help appreciated.

r Nomad categorical optimisation (snomadr)

I am trying to use the Nomad technique for blackbox optimisation from the crs package (C implementation), which is called via the snomadr function. The method works when trying straight numerical optimisation, but errors when categorical features are included. However the help for categorical optimisation is not very well documented, so I am struggling to see where I am going wrong. Reproducible code below:
library(crs)
library(randomForest)
Illustrating this on randomForest & the iris dataset.
Creating the randomForest model (leaving the last row out as starting points for the optimizer)
rfIris <- randomForest(x=iris[-150,-c(1)], y=unlist(iris[-150,1]))
The objective function (functions we want to optimize)
objFn <- function(x0,model){
preds <- predict(object = model, newdata = x0)
as.numeric(preds)
}
Test to see if the objective function works (should return ~6.37)
objOut <- objFn(x0=unlist(iris[150,-c(1)]),model = rfIris)
Creating initial conditions, options list, and upper/lower bounds for Nomad
x0 <- iris[150,-c(1)]
x0 <- unlist(x0)
options <- list("MAX_BB_EVAL"=10000,
"MIN_MESH_SIZE"=0.001,
"INITIAL_MESH_SIZE"=1,
"MIN_POLL_SIZE"=0.001,
"NEIGHBORS_EXE" = c(1,2,3),
"EXTENDED_POLL_ENABLED" = 'yes',
"EXTENDED_POLL_TRIGGER" = 'r0.01',
"VNS_SEARCH" = '1')
up <- c(10,10,10,10)
low <- c(0,0,0,0)
Calling the optimizer
opt <- snomadr(eval.f = objFn, n = 4, bbin = c(0,0,0,2), bbout = 0, x0= x0 ,model = rfIris, opts=options,
ub = up, lb = low)
and I get an error about the NEIGHBORS_EXE parameter in the options list. It seems as if I need to supply NEIGHBORS_EXE a file corresponding to a set of 'extended poll' coordinates, however is it not clear what these exactly are.
The method works by setting "EXTENDED_POLL_ENABLED" = 'no' in the options list, as it then ignores the categorical variables and defaults to numerical optimisation, but this is not what I want.
I also managed to pull up some additional information for NEIGHBORS_EXE using
snomadr(information=list("help"="-h NEIGHBORS_EXE"))
and again, do not understand what the 'neighbours.exe' is meant to be.
Any help would be much appreciated!
This is the response from Zhenghua who coded the R interface:
The issue is that he did not configure the parameter “NEIGHBORS_EXE” properly. He need to prepare an Executable file for defining the neighbors, put the executable file in the folder where R is called, and then set the parameter “NEIGHBORS_EXE” to the executable file name.
You can contact us at nomad#gerad.ca if you wish to continue the discussion.
About the neighbours_exe parameter you can refer to the section 7.1 of user guide of Nomad
https://www.gerad.ca/nomad/Downloads/user_guide.pdf

Estimate parameters of Frechet distribution using mmedist or fitdist(with mme) error

I'm relatively new in R and I would appreciated if you could take a look at the following code. I'm trying to estimate the shape parameter of the Frechet distribution (or inverse weibull) using mmedist (I tried also the fitdist that calls for mmedist) but it seems that I get the following error :
Error in mmedist(data, distname, start = start, fix.arg = fix.arg, ...) :
the empirical moment function must be defined.
The code that I use is the below:
require(actuar)
library(fitdistrplus)
library(MASS)
#values
n=100
scale = 1
shape=3
# simulate a sample
data_fre = rinvweibull(n, shape, scale)
memp=minvweibull(c(1,2), shape=3, rate=1, scale=1)
# estimating the parameters
para_lm = mmedist(data_fre,"invweibull",start=c(shape=3,scale=1),order=c(1,2),memp = "memp")
Please note that I tried many times en-changing the code in order to see if my mistake was in syntax but I always get the same error.
I'm aware of the paradigm in the documentation. I've tried that as well but with no luck. Please note that in order for the method to work the order of the moment must be smaller than the shape parameter (i.e. shape).
The example is the following:
require(actuar)
#simulate a sample
x4 <- rpareto(1000, 6, 2)
#empirical raw moment
memp <- function(x, order)
ifelse(order == 1, mean(x), sum(x^order)/length(x))
#fit
mmedist(x4, "pareto", order=c(1, 2), memp="memp",
start=c(shape=10, scale=10), lower=1, upper=Inf)
Thank you in advance for any help.
You will need to make non-trivial changes to the source of mmedist -- I recommend that you copy out the code, and make your own function foo_mmedist.
The first change you need to make is on line 94 of mmedist:
if (!exists("memp", mode = "function"))
That line checks whether "memp" is a function that exists, as opposed to whether the argument that you have actually passed exists as a function.
if (!exists(as.character(expression(memp)), mode = "function"))
The second, as I have already noted, relates to the fact that the optim routine actually calls funobj which calls DIFF2, which calls (see line 112) the user-supplied memp function, minvweibull in your case with two arguments -- obs, which resolves to data and order, but since minvweibull does not take data as the first argument, this fails.
This is expected, as the help page tells you:
memp A function implementing empirical moments, raw or centered but
has to be consistent with distr argument. This function must have
two arguments : as a first one the numeric vector of the data and as a
second the order of the moment returned by the function.
How can you fix this? Pass the function moment from the moments package. Here is complete code (assuming that you have made the change above, and created a new function called foo_mmedist):
# values
n = 100
scale = 1
shape = 3
# simulate a sample
data_fre = rinvweibull(n, shape, scale)
# estimating the parameters
para_lm = foo_mmedist(data_fre, "invweibull",
start= c(shape=5,scale=2), order=c(1, 2), memp = moment)
You can check that optimization has occurred as expected:
> para_lm$estimate
shape scale
2.490816 1.004128
Note however, that this actually reduces to a crude way of doing overdetermined method of moments, and am not sure that this is theoretically appropriate.

Resources