Using outer() with predict() - r

I am trying to use the outer function with predict in some classification code in R. For ease, we will assume in this post that we have two vectors named alpha and beta each containing ONLY 0 and 1. I am looking for a simple yet efficient way to pass all combinations of alpha and beta to predict.
I have constructed the code below to mimic the lda function from the MASS library, so rather than "lda", I am using "classifier". It is important to note that the prediction method within predict depends on an (alpha, beta) pair.
Of course, I could use a nested for loop to do this, but I am trying to avoid this method.
Here is what I would like to do ideally:
alpha <- seq(0, 1)
beta <- seq(0, 1)
classifier.out <- classifier(training.data, labels)
outer(X=alpha, Y=beta, FUN="predict", classifier.out, validation.data)
This is a problem because alpha and beta are not the first two parameters in predict.
So, in order to get around this, I changed the last line to
outer(X=alpha, Y=beta, FUN="predict", object=classifier.out, data=validation.data)
Note that my validation data has 40 observations, and also that there are 4 possible pairs of alpha and beta. I get an error though saying
dims [product 4] do not match the length of object [40]
I have tried a few other things, some of which work but are far from simple. Any suggestions?

The problem is that outer expects its function to be vectorized (i.e., it calls predict ONCE, passing vectors that hold every (alpha, beta) combination) and to return one value per combination. Here that single call to predict returns a result of length 40 (one prediction per validation observation), so outer complains because it doesn't match the expected length of 4.
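A minimal sketch of that behaviour, using two made-up functions that are not from the question: f counts how often outer calls it, and g stands in for a predict call that returns 40 values regardless of how many pairs it receives.

```r
alpha <- 0:1
beta  <- 0:1

# f is a hypothetical stand-in that counts how many times outer() calls it.
n_calls <- 0
f <- function(a, b) {
  n_calls <<- n_calls + 1
  a + b                      # vectorized: one result per (a, b) pair
}
res <- outer(alpha, beta, f)
n_calls                      # f ran a single time, on vectors of length 4

# g is a hypothetical stand-in for predict(): it returns 40 values, so
# outer() cannot reshape the result into a 2 x 2 matrix.
g <- function(a, b) rep(0, 40)
msg <- tryCatch(outer(alpha, beta, g), error = function(e) conditionMessage(e))
msg
```

The captured message should be the same kind of dims/length complaint as in the question.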
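One way to fix this is to use Vectorize. Because the formals of predict itself are only object and ..., Vectorize cannot be told to vectorize over "alpha" and "beta" directly, so wrap the call in a small function first. SIMPLIFY=FALSE keeps each length-40 prediction as one list element, which gives outer the 4 results it expects. Untested code, assuming (as in the question) that alpha and beta are the remaining positional arguments of your predict method:
outer(X=alpha, Y=beta, FUN=Vectorize(function(alpha, beta) predict(object=classifier.out, data=validation.data, alpha, beta), SIMPLIFY=FALSE))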
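A small self-contained sketch of the Vectorize route. toy_predict is a made-up function, not the asker's predict method: it returns one value per observation for a given (alpha, beta) pair. SIMPLIFY = FALSE is used so that each cell of the result keeps its whole prediction vector.

```r
alpha <- 0:1
beta  <- 0:1

# Hypothetical stand-in for a predict method: n "predictions" per pair.
toy_predict <- function(alpha, beta, n = 40) rep(alpha + 2 * beta, n)

# Vectorize() loops over the pairs; SIMPLIFY = FALSE returns a list with one
# element per pair, which outer() can reshape into a 2 x 2 list-matrix.
res <- outer(alpha, beta,
             Vectorize(toy_predict, vectorize.args = c("alpha", "beta"),
                       SIMPLIFY = FALSE))

dim(res)             # one cell per (alpha, beta) pair
length(res[[2, 1]])  # each cell holds the full prediction vector
```

Individual prediction vectors are then pulled out with res[[i, j]], where i indexes alpha and j indexes beta.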

I figured out one decent way to do this. Here it is:
pairs <- expand.grid(alpha, beta)
names(pairs) <- c("alpha", "beta")
mapply(predict, pairs$alpha, pairs$beta,
MoreArgs=list(object=classifier.out, data=validation.data))
Anyone have something simpler and more efficient? I am very eager to know because I spent a little too long on this problem. :(

Related

Low-pass filtering of a matrix

I'm trying to write a low-pass filter in R, to clean a "dirty" data matrix.
I did a google search, came up with a dazzling range of packages. Some apply to 1D signals (time series mostly, e.g. How do I run a high pass or low pass filter on data points in R? ); some apply to images. However I'm trying to filter a plain R data matrix. The image filters are the closest equivalent, but I'm a bit reluctant to go this way as they typically involve (i) installation of more or less complex/heavy solutions (imageMagick...), and/or (ii) conversion from matrix to image.
Here is sample data:
r<-(0:360)/360*(2*pi)
x<-cos(r)
y<-sin(r)
z<-outer(x,y,"*")
noise<-0.3*matrix(runif(length(x)*length(y)),nrow=length(x))
zz<-z+noise
image(zz)
What I'm looking for is a filter that will return a "cleaned" matrix (i.e. something close to z, in this case).
I'm aware this is a rather open-ended question, and I'm also happy with pointers ("have you looked at package so-and-so"), although of course I'd value sample code from users with experience on signal processing !
Thanks.
One option may be to use a non-linear prediction method and take the fitted values from the model.
For example, a polynomial regression fitted to a single noisy column gives a smooth curve that approximates the original data.
By following the same logic, you can do the same thing for all columns of the zz matrix:
predictions <- matrix(NA_real_, nrow = nrow(zz), ncol = ncol(zz))
for(i in 1:ncol(zz)) {
  # fitted values of a degree-2 polynomial in the row index, for column i
  predictions[, i] <- fitted(lm(zz[,i]~poly(1:nrow(zz),2,raw=TRUE)))
}
Then you can plot the predictions,
par(mfrow=c(1,3))
image(z,main="Original")
image(zz,main="Noisy")
image(predictions,main="Predicted")
Note that I used a polynomial regression of degree 2; you can change the degree for a better fit across the columns. Or you could use some other powerful non-linear prediction method (SVM, ANN, etc.) to get a more accurate model.
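As a rough sketch of how the degree could be compared, the code below rebuilds data like the question's and measures the distance to the clean matrix z for two degrees. smooth_columns and rmse are hypothetical helpers introduced here. It uses orthogonal polynomials (the default of poly) rather than raw=TRUE; this gives the same fitted values and is assumed to be numerically safer at higher degrees.

```r
set.seed(1)                                   # only to make the sketch repeatable
r  <- (0:360) / 360 * (2 * pi)
z  <- outer(cos(r), sin(r), "*")
zz <- z + 0.3 * matrix(runif(length(z)), nrow = nrow(z))

# Hypothetical helper: fit a polynomial of the given degree to every column
# of m and return the matrix of fitted values.
smooth_columns <- function(m, degree) {
  idx <- seq_len(nrow(m))
  apply(m, 2, function(col) fitted(lm(col ~ poly(idx, degree))))
}

# Root-mean-square distance between two matrices.
rmse <- function(a, b) sqrt(mean((a - b)^2))

rmse_deg2 <- rmse(smooth_columns(zz, 2), z)
rmse_deg4 <- rmse(smooth_columns(zz, 4), z)
c(degree2 = rmse_deg2, degree4 = rmse_deg4)
```

Because the uniform noise in this example is not centred on zero, the fitted surface keeps a constant offset from z whatever the degree; only the shape improves.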

How to have sigma of something while plotting?

I am using plotFun function of mosaic package in R. I am trying to plot a 3D plot. Following is my code snippet:
plotFun(2*l*(w/log(beta*kspill*lambda^2+(1+(w/x-w)/10)*(lambda^2*5+lambda^1*5+1))+(w/x-w)/log((1+((w/x-w)/10)^1)*(lambda^2*5+lambda^1*5+1))) ~ x & lambda ,
x.lim=range(0.8,1), lambda.lim=range(1,3),
l=60,w=40,kspill=10,f=1,beta=0.5,surface=TRUE)
This is working fine. Now suppose I want to fix lambda and introduce a new variable t such that if t=2 we get lambda^2*5+lambda^1*5+1 as in the above case. If t=3 we get lambda^3*5+lambda^2*5+lambda^1*5+1 and so on. So now I have t.lim=range(1,3) for a fixed lambda :
plotFun(2*l*(w/log(beta*kspill*lambda^2+(1+(w/x-w)/10)*("depends on t"))+(w/x-w)/log((1+((w/x-w)/10)^1)*("depends on t"))) ~ x & lambda ,
x.lim=range(0.8,1), t.lim=range(0.5,1),
l=60,w=40,kspill=10,f=1,beta=0.5,lambda=1,surface=TRUE)
What should I write in the "depends on t" part above? I guess we can't put a for loop there to calculate 5 * {summation i=0 to i=t} lambda^i. How should I go about doing this?
You can define your "depends on t" with
depends_on_t <- makeFun(5 * sum(lambda^(1:round(t))) + 1 ~ t + lambda, lambda = 1)
But you still have some issues to resolve:
1) Your plotFun() command is creating a plot using x and lambda, but I'm guessing you meant x and t.
2) t can only be an integer if you are going to use it in a sum of the type you are suggesting. But you are creating a plot that assumes continuous variables for the axes. I inserted round(t) as one way to move from real values to integer values before computing the sum. This makes the function work for non-integer values, but it might not be what you really want.
Finally, some additional suggestions:
3) x & lambda should be replaced with x + lambda. The use of & here dates back to the very early days of the mosaic package, and although it is still supported (I think, we don't really test for it anymore), we prefer x + lambda.
4) I recommend separating out the definition of your function from the plotFun() command. You can use mosaic::makeFun() as illustrated above, or the usual function() to define your function and whatever default values you want for its arguments. Then you can do sanity checks on the function, or use it in multiple plots rather than including all of the function's definition in each plot.
5) Using spaces and returns would make your code much more readable. (As would simplifying your example to a minimal example that demonstrates the issue you are asking about.)
6) I think you might want
depends_on_t <- makeFun(5 * sum(lambda^(0:round(t))) ~ t + lambda, lambda = 1)
rather than the formula as you describe it, but without more context, I can't really know.
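For the "usual function()" route mentioned in point 4, here is a minimal base R sketch. It is an assumption about what is wanted rather than a drop-in replacement: it computes 5 * (lambda^1 + ... + lambda^t) + 1 as in the question's examples, rounds t as above, and uses mapply so that it also accepts vectors of t values. seq_len() is used so that a t that rounds to 0 gives an empty sum.

```r
# Hypothetical plain-R version of depends_on_t, vectorized over t and lambda.
depends_on_t <- function(t, lambda = 1) {
  mapply(function(k, lam) 5 * sum(lam^seq_len(k)) + 1,
         round(t), lambda)
}

depends_on_t(2, lambda = 2)      # 5 * (2 + 4) + 1
depends_on_t(c(1, 2, 3))         # lambda = 1: 5 * t + 1 for each t
```

Checking a few values by hand like this is the kind of sanity check that separating the function from the plotFun() call makes easy.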

choosing the best combination of input values to a model in r

I'm trying to optimize a model that I've written in R.
I can run the model with the following:
mod <- run_model(data,md,s1,s2,s3)
where md,s1,s2,s3 are numerical values that are used to define specific parameters in the model. The question that I have is: is it possible in R to select the best combination of md, s1, s2 and s3 to drive the model. Specifically, i know that these variables should be one of the following
md <- c(1, 0.75, 0.5, 1.5, 0.3, 2.5)
s1 <- c(0.6,0.8)
s2 <- c(0.3,0.4,0.6)
s3 <- c(0.17336, 0.18246, 0.1921, 0.22624, 0.28704, 0.33518,
0.5534, 0.7442, 1.019, 1.5122)
but I would like to know how to reduce the model error by selecting the best combination of these values.
So, if the error is defined by
err = observed - mod
how can I select the best possible combination of these input parameters to get the lowest err?
I was thinking that this might be possible in a loop (i.e. 4 different loops) but that idea does not sound very efficient. I was wondering if someone else had some suggestions as to what I should do? Note that I cannot use optim here because I don't want to be told what the input value should be, but to select the best value from the vectors provided. Any advice would be appreciated.
This would be easier to answer with a specific run_model function. But since you want to test a discrete set of parameter values for an arbitrary function with no assumptions about its form, you really need to test each combination to find the minimum error.
You can create a data.frame of all possible input parameters with
pp <- expand.grid(md=md, s1=s1, s2=s2, s3=s3)
It would also be best if run_model were vectorized over all its parameters. If it's not in its current form, you can use Vectorize() to help. I'm going to assume that run_model returns the overall error for a given parameter combination. Then you can do
# make all parameters vectorized (except for the first "data" parameter)
V_run_model <- Vectorize(run_model, vectorize.args = names(formals(run_model))[-1])
# get error values
err <- with(pp, V_run_model(data, md, s1, s2, s3))
# find best parameters (minimal error)
pp[which.min(err), ]
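A self-contained sketch of the same grid search. The run_model below is a made-up stand-in whose error is smallest at a known combination, and data is a placeholder. It calls mapply directly, which is what Vectorize() does internally.

```r
# Candidate values from the question.
md <- c(1, 0.75, 0.5, 1.5, 0.3, 2.5)
s1 <- c(0.6, 0.8)
s2 <- c(0.3, 0.4, 0.6)
s3 <- c(0.17336, 0.18246, 0.1921, 0.22624, 0.28704, 0.33518,
        0.5534, 0.7442, 1.019, 1.5122)

# Hypothetical stand-in for run_model(): returns a single error value that is
# smallest at md = 0.5, s1 = 0.8, s2 = 0.4, s3 = 1.019.
run_model <- function(data, md, s1, s2, s3) {
  mean(data) * ((md - 0.5)^2 + (s1 - 0.8)^2 + (s2 - 0.4)^2 + (s3 - 1.019)^2)
}
data <- c(1, 2, 3)                            # placeholder data

# Every combination of the candidate values, one per row.
pp <- expand.grid(md = md, s1 = s1, s2 = s2, s3 = s3)

# Evaluate the model once per row and keep the error next to the parameters.
pp$err <- mapply(run_model, md = pp$md, s1 = pp$s1, s2 = pp$s2, s3 = pp$s3,
                 MoreArgs = list(data = data))

best <- pp[which.min(pp$err), ]
best
```

Storing the error as a column of pp also makes it easy to inspect the runners-up, e.g. with head(pp[order(pp$err), ]).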

How can we specify a custom lambda sequence to glmnet

I am new to the glmnet package in R, and wanted to pass a lambda sequence, based on the suggestion in a published research paper, to the cv.glmnet function. The documentation suggests that we can supply a decreasing sequence of lambdas as a parameter. However, in the documentation there are no examples of how to do this.
I would be very grateful if someone could suggest how to go about doing this. Do I pass a vector of 100 odd values (default value for nlambda) to the function? What restrictions should there be on the min and max value of this vector, if any? Also, are there things to keep in mind regarding nvars, nobs etc. while specifying the vector?
Thanks in advance.
You can define a grid like this :
grid=10^seq(10,-2,length=100) ##get lambda sequence
ridge_mod=glmnet(x,y,alpha=0,lambda=grid)
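A hedged sketch of the same idea with made-up data. x and y below are random placeholders, and the glmnet calls only run if the package is installed. The glmnet documentation asks for a decreasing sequence and advises against supplying a single lambda value; the same lambda argument can be given to cv.glmnet.

```r
# Log-spaced grid from 1e10 down to 1e-2, as in the answer above.
grid <- 10^seq(10, -2, length.out = 100)
all(diff(grid) < 0)                           # the sequence is decreasing

if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(1)
  x <- matrix(rnorm(100 * 5), nrow = 100)     # made-up predictors
  y <- rnorm(100)                             # made-up response

  # Ridge fit and cross-validation over the user-supplied lambda grid.
  ridge_mod <- glmnet::glmnet(x, y, alpha = 0, lambda = grid)
  cv_mod    <- glmnet::cv.glmnet(x, y, alpha = 0, lambda = grid)
  cv_mod$lambda.min                           # chosen from the supplied values
}
```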
This is fairly easy though it's not well explained in the original documentation ;)
In the following I've used cox family but you can change it based on your need
my_cvglmnet_fit <- cv.glmnet(x=regression_data, y=glmnet_response, family="cox", maxit = 100000)
Then you can plot the fitted object created by cv.glmnet, and in the plot you can easily see where the cross-validated error is lowest. One of the dotted vertical lines marks lambda.min (the lambda with minimum error) and the other marks lambda.1se.
plot(my_cvglmnet_fit)
The following lines help you see the non-zero coefficients and their corresponding values:
coef(my_cvglmnet_fit, s = "lambda.min")[which(coef(my_cvglmnet_fit, s = "lambda.min") != 0)] # the non zero coefficients
colnames(regression_data)[which(coef(my_cvglmnet_fit, s = "lambda.min") != 0)] # The features that are selected
here are some links that may help:
http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html
http://blog.revolutionanalytics.com/2013/05/hastie-glmnet.html

Calculate many AUCs in R

I am fairly new to R. I am using the ROCR package in R to calculate AUC, which I can do for one predictor just fine. What I am looking to do is perform many AUC calculations for 100 different variables.
What I have done so far is the following:
varlist <- names(mydata)[2:101]
formlist <- lapply(varlist, function(x) paste0("prediction(",x,"mydata$V1))
However then the formulas are in text format, and the as.formula is giving me an error. Any help appreciated! Thanks in advance!
The function inside your lapply looks like it is just outputting a statement like prediction(varmydata$V1). I am guessing you actually want to run that command. If so, you probably want something like
lapply(varlist,function(x) prediction(mydata[x]))
but it is hard to tell without a reproducible situation. Also, it looks like your code has a missing quote.
If I understand you correctly, you want to use the first column of mydata as predictions, and all other variables as labels, one after the other.
Is this the correct way to treat mydata? This way is rather uncommon. It is more common to have the same true labels for several different predictions (e.g. iterated cross validation, comparison of different classifiers).
However, to answer your original question:
predictions and labels need to have the same shape for ROCR::prediction, e.g.
either as matrix
prediction (matrix (rep (mydata$V1, 10), 10), mydata [, -1])
or as lists:
prediction (mydata [rep (1, ncol (mydata) - 1)], mydata [-1])
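For the more common layout (one column of true labels, many predictor columns), here is a sketch that loops over the predictors with sapply. The data frame is made up, V1 is assumed to hold 0/1 labels, and auc_rank is a hypothetical helper that computes the AUC from ranks (the Mann-Whitney formulation) instead of calling ROCR, so the main part runs in base R. The ROCR equivalent is shown in the guarded block.

```r
# Made-up data: V1 holds the true 0/1 labels, the other columns are scores.
mydata <- data.frame(V1 = c(0, 0, 1, 1),
                     a  = c(0.1, 0.2, 0.8, 0.9),
                     b  = c(0.9, 0.8, 0.2, 0.1),
                     c  = c(0.1, 0.8, 0.2, 0.9))

# Hypothetical helper: AUC as the rank-sum (Mann-Whitney) statistic.
auc_rank <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

varlist <- names(mydata)[-1]
aucs <- sapply(varlist, function(v) auc_rank(mydata[[v]], mydata$V1))
aucs

# The same loop with ROCR, if it is installed.
if (requireNamespace("ROCR", quietly = TRUE)) {
  aucs_rocr <- sapply(varlist, function(v) {
    pred <- ROCR::prediction(mydata[[v]], mydata$V1)
    ROCR::performance(pred, "auc")@y.values[[1]]
  })
}
```

Using column names with mydata[[v]] avoids building the calls as text, which is where the original attempt ran into trouble.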
