Choosing the best combination of input values to a model in R

I'm trying to optimize a model that I've written in R.
I can run the model with the following:
mod <- run_model(data,md,s1,s2,s3)
where md, s1, s2, and s3 are numerical values that are used to define specific parameters in the model. The question I have is: is it possible in R to select the best combination of md, s1, s2, and s3 to drive the model? Specifically, I know that these variables should each take one of the following values:
md <- c(1, 0.75, 0.5, 1.5, 0.3, 2.5)
s1 <- c(0.6,0.8)
s2 <- c(0.3,0.4,0.6)
s3 <- c(0.17336, 0.18246, 0.1921, 0.22624, 0.28704, 0.33518,
0.5534, 0.7442, 1.019, 1.5122)
but I would like to know how to reduce the model error by selecting the best combination of these values.
So, if the error is defined by
err = observed - mod
how can I select the best possible combination of these input parameters to get the lowest err?
I was thinking that this might be possible in a loop (i.e., 4 nested loops), but that idea does not sound very efficient. I was wondering if someone else had some suggestions as to what I should do? Note that I cannot use optim here because I don't want to be told what the input values should be, but to select the best values from the vectors provided. Any advice would be appreciated.

This would be easier to answer with a specific run_model function. But since you want to test a discrete set of parameter values for an arbitrary function with no assumptions about its form, you really need to test each combination to find the optimum.
You can create a data.frame of all possible input parameters with
pp <- expand.grid(md=md, s1=s1, s2=s2, s3=s3)
It would also be best if run_model were vectorized over all its parameters. If it isn't in its current form, you can use Vectorize() to help. I'm going to assume that run_model returns the overall error for a given parameter combination. Then you can do
# make all parameters vectorized (except for the first "data" parameter)
V_run_model <- Vectorize(run_model, vectorize.args = names(formals(run_model))[-1])
# get error values
err <- with(pp, V_run_model(data, md, s1, s2, s3))
# find best parameters (minimal error)
pp[which.min(err), ]
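If run_model instead returns predictions (as err = observed - mod in the question suggests), or it cannot easily be vectorized, a plain row-wise search over the grid still needs only 6 * 2 * 3 * 10 = 360 model runs. A minimal sketch, assuming observed holds the reference values and using the sum of squared errors as the single-number summary:
# exhaustive search: evaluate run_model once per row of pp
pp <- expand.grid(md = md, s1 = s1, s2 = s2, s3 = s3)
err <- vapply(seq_len(nrow(pp)), function(i) {
  pred <- run_model(data, pp$md[i], pp$s1[i], pp$s2[i], pp$s3[i])
  sum((observed - pred)^2)   # collapse the error to a single number
}, numeric(1))
pp[which.min(err), ]         # combination with the lowest error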

Related

Bootstrap-t Method for Comparing Trimmed Means in R

I am confused by the different robust methods for comparing independent means. I found a good explanation in statistical textbooks, for example yuen() in the case of equal sample sizes. My samples are rather unequal, so I would like to try a bootstrap-t method (from Wilcox's book Introduction to Robust Estimation and Hypothesis Testing, p. 163). It says yuenbt() would be a possible solution.
But all textbooks say I can use vectors here:
yuenbt(x,y,tr=0.2,alpha=0.05,nboot=599,side=F)
If I check the local description it says:
yuenbt(formula, data, tr = 0.2, nboot = 599)
What's wrong with my trial:
x <- c(1,2,3)
y <- c(5,6,12,30,2,2,3,65)
yuenbt(x,y)
Why can't I use the yuenbt function with my two vectors? Thank you very much.
Looking at the help for yuenbt (for those wondering, yuenbt is from the WRS2 package), it takes a formula and a data frame as arguments. My impression is that it expects data in long format.
With your example data, we can achieve that like so:
library(WRS2)
x <- c(1,2,3)
y <- c(5,6,12,30,2,2,3,65)
dat <- data.frame(value=c(x,y),group=rep(c("x","y"), c(length(x),length(y))))
We can then use the function:
yuenbt(value~group, data=dat)
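If you want to keep the settings from the call quoted in the question, the trimming proportion and the number of bootstrap samples can be passed through in the same way (a small sketch; the values shown are simply the defaults from the help page):
yuenbt(value ~ group, data = dat, tr = 0.2, nboot = 599)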

How to get specific fitted value of corresponding x in ycinterextra package in R

I am new to R so please keep that in mind :)
I am currently using the ‘ycinterextra’ package and interpolating a yield curve with several methods. For example,
maturity<- c(1,2,5,10)
yield<- c(0.39,0.61,1.66,2.58)
t<-seq(from=min(maturity), to=max(maturity), by=0.01)
yc <- ycinter(yM = yield, matsin = maturity, matsout = t, method="SW",typeres="rates")
fitted(yc)
I know how to get fitted(yc), but I don't know how to get the single value for a specific maturity, for example if I am interested in the 4-year or 1.5-year yield. What I need is just one value that corresponds to a specific t (any).
Thanks in advance!
Not sure if I understood correctly, and this is a very old question, but here is what I think you should do: simply match your value in t and use it to index the fitted values.
as.numeric(fitted(yc))[match(4.5,t)]
[1] 1.460163
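If the maturity of interest does not fall exactly on the grid in t, match() returns NA, so a small helper that picks the closest grid point may be more convenient (yield_at is just an illustrative name, built on the objects defined above):
# look up the fitted yield at the grid point closest to a requested maturity
yield_at <- function(m, maturities = t, fit = as.numeric(fitted(yc))) {
  fit[which.min(abs(maturities - m))]
}
yield_at(4)    # 4-year yield
yield_at(1.5)  # 1.5-year yield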

How to have sigma of something while plotting?

I am using the plotFun function of the mosaic package in R. I am trying to produce a 3D plot. Following is my code snippet:
plotFun(2*l*(w/log(beta*kspill*lambda^2+(1+(w/x-w)/10)*(lambda^2*5+lambda^1*5+1))+(w/x-w)/log((1+((w/x-w)/10)^1)*(lambda^2*5+lambda^1*5+1))) ~ x & lambda ,
x.lim=range(0.8,1), lambda.lim=range(1,3),
l=60,w=40,kspill=10,f=1,beta=0.5,surface=TRUE)
This is working fine. Now suppose I want to fix lambda and introduce a new variable t such that if t=2 we get lambda^2*5+lambda^1*5+1 as in the above case. If t=3 we get lambda^3*5+lambda^2*5+lambda^1*5+1 and so on. So now I have t.lim=range(1,3) for a fixed lambda:
plotFun(2*l*(w/log(beta*kspill*lambda^2+(1+(w/x-w)/10)*("depends on t"))+(w/x-w)/log((1+((w/x-w)/10)^1)*("depends on t"))) ~ x & lambda ,
x.lim=range(0.8,1), t.lim=range(0.5,1),
l=60,w=40,kspill=10,f=1,beta=0.5,lambda=1,surface=TRUE)
What should I write in the "depends on t" part above? I guess we can't put a for loop there to calculate 5*(lambda^0 + lambda^1 + ... + lambda^t). How do I go about doing this?
You can define your "depends on t" with
depends_on_t <- makeFun(5 * sum(lambda^(1:round(t))) + 1 ~ t + lambda, lambda = 1)
But you still have some issues to resolve:
1) Your plotFun() command is creating a plot using x and lambda, but I'm guessing you meant x and t.
2) t can only be an integer if you are going to use it in a sum of the type you are suggesting. But you are creating a plot that assumes continuous variables for the axes. I inserted round(t) as one way to move from real values to integer values before computing the sum. That makes the function work for non-integer values, but it might not be what you really want.
Finally, some additional suggestions:
3) x & lambda should be replaced with x + lambda. The use of & here dates back to the very early days of the mosaic package, and although it is still supported (I think, we don't really test for it anymore), we prefer x + lambda.
4) I recommend separating out the definition of your function from the plotFun() command (see the sketch after this list). You can use mosaic::makeFun() as illustrated above, or the usual function() to define your function and whatever default values you want for its arguments. Then you can do sanity checks on the function, or use it in multiple plots, rather than including all of the function's definition in each plot.
5) Using spaces and returns would make your code much more readable. (As would simplifying your example to a minimal example that demonstrates the issue you are asking about.)
6) I think you might want
depends_on_t <- makeFun(5 * sum(lambda^(0:round(t))) ~ t + lambda, lambda = 1)
rather than the formula as you describe it, but without more context, I can't really know.
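For suggestion 4, here is a rough, untested sketch of what separating the function from the plotting call could look like; fn and geom_term are illustrative names, and the body simply copies the expression from the question with the t-dependent sum substituted in:
library(mosaic)

# 5*(lambda^1 + ... + lambda^round(t)) + 1, vectorised over t so it can be
# evaluated on a plotting grid
geom_term <- Vectorize(function(t, lambda = 1) 5 * sum(lambda^(1:round(t))) + 1)

fn <- function(x, t, l = 60, w = 40, kspill = 10, beta = 0.5, lambda = 1) {
  s <- geom_term(t, lambda)
  2 * l * (w / log(beta * kspill * lambda^2 + (1 + (w/x - w)/10) * s) +
             (w/x - w) / log((1 + ((w/x - w)/10)^1) * s))
}

plotFun(fn(x, t) ~ x + t,
        x.lim = range(0.8, 1), t.lim = range(1, 3), surface = TRUE)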

How can we specify a custom lambda sequence to glmnet

I am new to the glmnet package in R, and wanted to specify a custom lambda sequence, based on the suggestion in a published research paper, to the cv.glmnet function. The documentation suggests that we can supply a decreasing sequence of lambdas as a parameter. However, in the documentation there are no examples of how to do this.
I would be very grateful if someone could suggest how to go about doing this. Do I pass a vector of 100-odd values (the default for nlambda) to the function? What restrictions are there on the min and max values of this vector, if any? Also, are there things to keep in mind regarding nvars, nobs, etc. while specifying the vector?
Thanks in advance.
You can define a grid like this:
grid <- 10^seq(10, -2, length = 100)                 # get a decreasing lambda sequence
ridge_mod <- glmnet(x, y, alpha = 0, lambda = grid)  # ridge fit over the custom grid
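The question asks about cv.glmnet specifically; the same kind of grid can be supplied there as well. A minimal sketch with simulated placeholder data (x and y here are stand-ins, not from the question):
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100)    # placeholder predictor matrix
y <- rnorm(100)                             # placeholder response
grid <- 10^seq(10, -2, length = 100)        # decreasing lambda sequence

cv_fit <- cv.glmnet(x, y, alpha = 0, lambda = grid)
cv_fit$lambda.min                           # lambda with the lowest CV error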
This is fairly easy, though it's not well explained in the original documentation ;)
In the following I've used the Cox family, but you can change it based on your needs:
my_cvglmnet_fit <- cv.glmnet(x=regression_data, y=glmnet_response, family="cox", maxit = 100000)
Then you can plot the fitted object created by cv.glmnet, and in the plot you can easily see where the lambda is at its minimum. One of the dotted vertical lines marks the minimum lambda and the other marks the 1se value.
plot(my_cvglmnet_fit)
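The two lambdas marked by the dotted vertical lines can also be read straight from the fitted object:
my_cvglmnet_fit$lambda.min   # lambda giving the minimum cross-validated error
my_cvglmnet_fit$lambda.1se   # largest lambda within one standard error of that minimum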
The following lines help you see the non-zero coefficients and their corresponding values:
coef(my_cvglmnet_fit, s = "lambda.min")[which(coef(my_cvglmnet_fit, s = "lambda.min") != 0)] # the non zero coefficients
colnames(regression_data)[which(coef(my_cvglmnet_fit, s = "lambda.min") != 0)] # The features that are selected
Here are some links that may help:
http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html
http://blog.revolutionanalytics.com/2013/05/hastie-glmnet.html

Using outer() with predict()

I am trying to use the outer function with predict in some classification code in R. For ease, we will assume in this post that we have two vectors named alpha and beta each containing ONLY 0 and 1. I am looking for a simple yet efficient way to pass all combinations of alpha and beta to predict.
I have constructed the code below to mimic the lda function from the MASS library, so rather than "lda", I am using "classifier". It is important to note that the prediction method within predict depends on an (alpha, beta) pair.
Of course, I could use a nested for loop to do this, but I am trying to avoid this method.
Here is what I would like to do ideally:
alpha <- seq(0, 1)
beta <- seq(0, 1)
classifier.out <- classifier(training.data, labels)
outer(X=alpha, Y=beta, FUN="predict", classifier.out, validation.data)
This is a problem because alpha and beta are not the first two parameters in predict.
So, in order to get around this, I changed the last line to
outer(X=alpha, Y=beta, FUN="predict", object=classifier.out, data=validation.data)
Note that my validation data has 40 observations, and also that there are 4 possible pairs of alpha and beta. I get an error, though, saying:
dims [product 4] do not match the length of object [40]
I have tried a few other things, some of which work but are far from simple. Any suggestions?
The problem is that outer expects its function to be vectorized (i.e., it will call predict ONCE with vectors containing all the argument combinations it wants evaluated). Therefore, when predict is called that one time and returns a result of length 40, outer complains because that doesn't match the 4 combinations it expected.
One way to fix this is to use Vectorize. Untested code:
outer(X=alpha, Y=beta, FUN=Vectorize(predict, vectorize.args=c("alpha", "beta")), object=classifier.out, data=validation.data)
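As a toy illustration of the vectorization requirement (larger is a made-up scalar function, unrelated to predict):
# larger() returns a single value per call, so it is not vectorized in the
# way outer() expects its FUN to be
larger <- function(a, b) if (a > b) a else b
## outer(1:3, 1:3, FUN = larger)            # fails: the condition has length > 1
outer(1:3, 1:3, FUN = Vectorize(larger))    # works: returns a 3 x 3 matrix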
I figured out one decent way to do this. Here it is:
pairs <- expand.grid(alpha, beta)
names(pairs) <- c("alpha", "beta")
mapply(predict, pairs$alpha, pairs$beta,
MoreArgs=list(object=classifier.out, data=validation.data))
Anyone have something simpler and more efficient? I am very eager to know because I spent a little too long on this problem. :(
