Prediction on a test set for Gaussian Process Regression in R

The mlegp package documentation explains how to fit a Gaussian process, but its example R code only uses the predict method to reconstruct the original function output. Can someone help me understand how to predict on a test set using GPR?

The function predict.gp (which gets called when you use predict on an mlegp object) takes a newData argument; see ?predict.gp:
Usage:
## S3 method for class 'gp'
predict(object, newData = object$X, se.fit = FALSE, ...)
Arguments:
object: an object of class ‘gp’
newData: an optional data frame or matrix with rows corresponding to
inputs for which to predict. If omitted, the design matrix
‘X’ of ‘object’ is used.
...
Consider the simple model
library(mlegp)
x = -5:5
y = sin(x) + rnorm(length(x),sd = 0.1)
fit = mlegp(x, y)
Then
predict(fit)
and
predict(fit, newData = fit$X)
give the same result. To predict on a test set, pass your test inputs as newData.
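As a minimal sketch of predicting on held-out inputs, assuming the mlegp package is installed: the names x_test, y_test and rmse are illustrative and not part of the package, and the test inputs are passed as a one-column matrix because newData expects rows that correspond to inputs.

```r
library(mlegp)

set.seed(1)
# Training data: a noisy sine curve on a coarse grid
x <- matrix(seq(-5, 5, by = 1), ncol = 1)
y <- sin(x[, 1]) + rnorm(nrow(x), sd = 0.1)
fit <- mlegp(x, y)

# Hypothetical test set: a finer grid the model has not seen
x_test <- matrix(seq(-5, 5, by = 0.25), ncol = 1)
y_test <- sin(x_test[, 1])

# se.fit = TRUE returns a list with predictions and standard errors
pred <- predict(fit, newData = x_test, se.fit = TRUE)

# Simple test-set error measure (root mean squared error)
rmse <- sqrt(mean((as.numeric(pred$fit) - y_test)^2))
rmse
```

With se.fit = FALSE (the default), predict returns only the predictions, so pred$fit would not exist in that case.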

Related

How can I load a library in R to call it from Excel with bert-toolkit?

BERT (bert-toolkit) is a very nice tool for calling R functions from Excel. See: https://bert-toolkit.com/
I have used bert-toolkit to call a fitted neural net (avNNet, fitted with caret) through a wrapper function in R from Excel VBA. This runs perfectly. This is the code that loads the model and defines the wrapper function in bert-toolkit:
load("D:/my_model_avNNet.rda")
neuraln <- function(x1,x2,x3){
xx <- data.frame(x1,x2,x3)
z <- predict(my_model_avNNET, xx)
z
}
I then tried to do the same with a fitted GAM (fitted with the mgcv package), but without success. If I call the fitted GAM from Excel VBA, it gives Error 2015. If I call it from a cell, it gives #VALUE!. At the same time, the correct result of the calculation is shown in the BERT console!
This is the code that loads the model and defines the wrapper function in bert-toolkit:
library(mgcv)
load("D:/gam_y_model.rda")
testfunction <- function(k1,k2){
z <- predict(gam_y, data.frame(x = k1, x2 = k2))
print (z)
}
The difference between the avNNet model (caret) and the GAM (mgcv) is that the avNNet model does NOT need the caret library to be loaded to generate a prediction, while the GAM DOES need the mgcv library to be loaded.
Loading the mgcv library in the script that loads the GAM and defines the wrapper function, as in the code above, does not seem to be sufficient: the correct result is shown in the BERT console, but it is not returned to Excel.
I wonder how this is possible and how it can be solved. Perhaps there are two instances of R running in bert-toolkit.
How can I load the mgcv library so that it can be used by the GAM within the function called from Excel?
Here is some example code that fits the GAM with mgcv and saves the model (after running it, the model can be loaded in bert-toolkit with the code above):
library(mgcv)
# construct some sample data:
x <- seq(0, pi * 2, 0.1)
x2 <- seq(0, pi * 20, 1)
sin_x <- sin(x)
tan_x2 <- tan(x2)
y <- sin_x + rnorm(n = length(x), mean = 0, sd = sd(sin_x / 2))
Sample_data <- data.frame(y,x,x2)
# fit gam:
gam_y <- gam(y ~ s(x) + s(x2), method = "REML")
# Make predictions with the fitted model:
x_new <- seq(0, max(x), length.out = 100)
x2_new <- seq(0, max(x2), length.out = 100)
y_pred <- predict(gam_y, data.frame(x = x_new, x2 = x2_new))
# save model, to load it later in bert-toolkit:
setwd("D:/")
save(gam_y, file = "gam_y_model.rda")
One of R's signature features is method dispatch: users call a generic function such as predict, and R internally runs a different method, such as predict.lm, predict.glm, or predict.gam, depending on the class of the model object passed in. Calling predict on an avNNet model is therefore not the same as calling predict on a gam model. Just as the method changes with the input, so can the type of the output.
According to the MSDN documentation on the Excel #VALUE! error, which VBA exposes as Error 2015:
#VALUE is Excel's way of saying, "There's something wrong with the way your formula is typed. Or, there's something wrong with the cells you are referencing."
Without seeing the actual results it is hard to be sure, but Excel may be unable to translate the value that R returns from the gam model into an Excel range or VBA type, especially since, as you describe, R itself raises no error.
For example, per docs, the return value of the standard predict.lm is:
predict.lm produces a vector of predictions or a matrix of predictions...
However, per docs, the return value of predict.gam is a bit more nuanced:
If type=="lpmatrix" then a matrix is returned which will give a vector of linear predictor values (minus any offest) at the supplied covariate values, when applied to the model coefficient vector. Otherwise, if se.fit is TRUE then a 2 item list is returned with items (both arrays) fit and se.fit containing predictions and associated standard error estimates, otherwise an array of predictions is returned. The dimensions of the returned arrays depends on whether type is "terms" or not: if it is then the array is 2 dimensional with each term in the linear predictor separate, otherwise the array is 1 dimensional and contains the linear predictor/predicted values (or corresponding s.e.s). The linear predictor returned termwise will not include the offset or the intercept.
Altogether, consider adjusting the arguments of your predict call so that it returns a plain numeric vector that Excel can interpret easily, rather than a matrix, array, or other higher-dimensional R type that Excel cannot render:
# Option 1: predictions on the response scale
testfunction <- function(k1, k2) {
  z <- mgcv::predict.gam(gam_y, data.frame(x = k1, x2 = k2), type = "response")
  return(z)
}
# Option 2: linear predictor matrix (returns a matrix, not a vector)
testfunction <- function(k1, k2) {
  z <- mgcv::predict.gam(gam_y, data.frame(x = k1, x2 = k2), type = "lpmatrix")
  return(z)
}
# Option 3: link scale with standard errors; se.fit = TRUE makes predict return a list
testfunction <- function(k1, k2) {
  z <- mgcv::predict.gam(gam_y, data.frame(x = k1, x2 = k2), type = "link", se.fit = TRUE)
  return(z$fit) # NOTICE fit ELEMENT USED
}
...
Further diagnostics:
Check the object returned by predict.gam with str(obj) and class(obj)/typeof(obj) to see its dimensions and underlying elements, and compare it with what predict returns for the caret model;
Check whether numeric precision is an issue, given Excel's limit of 15 significant digits;
Check the amount of data returned (does it exceed Excel's sheet row limit of 2^20 = 1,048,576 or the cell limit of 32,767 characters?).
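As a rough sketch of that diagnostic, the wrapper below strips the names and dimension attributes from the predict.gam result so that only a bare numeric vector is returned. The toy data frame dat is an assumption made for illustration (it is not the asker's data), and whether this resolves the BERT behaviour has not been verified here; it only shows how to inspect and simplify the returned object.

```r
library(mgcv)

set.seed(42)
# Toy data, assumed for illustration only
n <- 200
dat <- data.frame(x = runif(n, 0, 2 * pi), x2 = runif(n, 0, 1))
dat$y <- sin(dat$x) + dat$x2 + rnorm(n, sd = 0.2)
gam_y <- gam(y ~ s(x) + s(x2), data = dat, method = "REML")

# Inspect what predict returns before simplifying it
raw <- predict(gam_y, newdata = data.frame(x = 1, x2 = 0.5), type = "response")
str(raw)

# Wrapper that returns a bare numeric vector and does not print
testfunction <- function(k1, k2) {
  z <- predict(gam_y, newdata = data.frame(x = k1, x2 = k2), type = "response")
  as.numeric(z)  # drops names and dim attributes
}

res <- testfunction(1, 0.5)
str(res)
```

Note that the wrapper ends with the value itself rather than a print call, so the function's return value is the prediction and nothing is written to the console as a side effect.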

Getting Error Bootstrapping to test predictive model

rsq <- function(formula, Data1, indices) {
d <- Data1[indices,] # allows boot to select sample
fit <- lm(formula, Data1=d)
return(summary(fit)$r.square)
}
results = boot(data = Data1, statistic = rsq, R = 500)
When I execute the code, I get the following error:
Error in Data1[indices,] : incorrect number of dimensions
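Background info: I am creating a predictive model using linear regression. I would like to test my predictive model and, after some research, decided to use the bootstrap method.
Credit goes to @Rui Barradas; check the comments for the original post.
If you read the help page for boot::boot, you will see that the statistic function it calls must take the data as its first argument, then the indices, then any other arguments. With formula first, boot passes the data as formula and the index vector as Data1, so Data1[indices,] fails with "incorrect number of dimensions". So change the order of the arguments in your function definition to rsq <- function(Data1, indices, formula), and pass the formula to boot as an extra argument. Note also that lm expects data = d, not Data1 = d.
Another problem I had was that I had not defined the function.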
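A minimal sketch of the corrected call, following the argument order described above. The built-in mtcars data set and the formula mpg ~ wt + disp stand in for the asker's Data1 and model, which are not shown in the question.

```r
library(boot)

# Statistic function: data first, then indices, then extra arguments
rsq <- function(data, indices, formula) {
  d <- data[indices, ]          # rows selected by boot for this resample
  fit <- lm(formula, data = d)  # note: the argument is 'data'
  summary(fit)$r.squared
}

set.seed(1)
# Extra arguments such as 'formula' are passed through to the statistic
results <- boot(data = mtcars, statistic = rsq, R = 500,
                formula = mpg ~ wt + disp)

results
boot.ci(results, type = "perc")  # percentile confidence interval for R-squared
```

results$t0 holds the R-squared on the original data, and results$t holds one value per bootstrap resample.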

Error converting rxGlm to GLM

I'm having a problem converting rxGlm models to normal glm models. Every time I try to convert my models I get the same error:
Error in qr.lm(object) : lm object does not have a proper 'qr' component.
Rank zero or should not have used lm(.., qr=FALSE).
Here's a simple example:
cols <- colnames(iris)
vars <- cols[!cols %in% "Sepal.Length"]
form1 <- as.formula(paste("Sepal.Length ~", paste(vars, collapse = "+")))
rx_version <- rxGlm(formula = form1,
data = iris,
family = gaussian(link = 'log'),
computeAIC = TRUE)
# here is the equivalent model with base R
R_version <- glm(formula = form1,
data = iris,
family = gaussian(link = 'log'))
summary(as.glm(rx_version)) #this always gives the above error
I can't seem to find a way to specify this "qr" component (I'm assuming it is related to matrix decomposition) in the rxGlm call.
Anyone else dealt with this?
rxGlm objects don't have a qr component, and converting to a glm object won't create one. This is intentional, as computing the QR decomposition of the model matrix requires the full dataset to be in memory which would defeat the purpose of using the rx* functions.
as.glm is really meant more for supporting model import/export via PMML. Most of the things that you'd want to do can be done with the rxGlm object, without converting. For example, rxGlm computes the coefficient standard errors as part of the fit, without requiring a qr component afterwards.

Plot in SVM model (e1071 Package) using DocumentTermMatrix

I am trying to create a plot for a model built with SVM in the e1071 package.
My code to build the model, predict, and build the confusion matrix is:
ptm <- proc.time()
svm.classifier = svm(x = train.set.list[[0.999]][["0_0.1"]],
y = train.factor.list[[0.999]][["0_0.1"]],
kernel ="linear")
pred = predict(svm.classifier, test.set.list[[0.999]][["0_0.1"]], decision.values = TRUE)
time[["svm"]] = proc.time() - ptm
confmatrix = confusionMatrix(pred,test.factor.list[[0.999]][["0_0.1"]])
confmatrix
train.set.list and test.set.list contain the training and test sets for several conditions. train.factor.list and test.factor.list hold the true labels for each set. The training and test sets are both DocumentTermMatrix objects.
Then I tried to plot my data with
plot(svm.classifier, train.set.list[[0.999]][["0_0.1"]])
but I got the message:
"Error in plot.svm(svm.classifier, train.set.list[[0.999]][["0_0.1"]]) :
missing formula."
What am I doing wrong? The confusion matrix looks good to me, even without using the formula parameter in the svm function.
Without code we can run, it's hard to say exactly what the problem is. My guess, given
?plot.svm
which says
formula: formula selecting the visualized two dimensions. Only needed if more than two input variables are used.
is that your data has more than two predictors. You should specify the formula in your plot call:
plot(svm.classifier, train.set.list[[0.999]][["0_0.1"]], predictor1 ~ predictor2)
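As an illustrative sketch on a data set with more than two predictors, the example below uses the built-in iris data rather than the asker's DocumentTermMatrix, and fits the model through the formula interface with a data frame. The formula selects the two plotted dimensions, and slice fixes the remaining predictors at constant values (the values 3 and 4 are arbitrary choices).

```r
library(e1071)

# Four predictors, so plot() needs a formula naming the two to display
svm.classifier <- svm(Species ~ ., data = iris, kernel = "linear")

# Plot Petal.Width against Petal.Length; hold the other two predictors fixed
plot(svm.classifier, iris, Petal.Width ~ Petal.Length,
     slice = list(Sepal.Width = 3, Sepal.Length = 4))
```

For a document-term matrix, the same idea would presumably require converting it to a data frame first and picking two term columns for the formula; that part is an assumption and is not shown here.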

predict() for glm.fit does not work. why?

I've built a GLM in R using the glm.fit() function:
m <- glm.fit(x = as.matrix(df[,x.id]), y = df[,y.id], family = gaussian())
Afterwards, I tried to make some predictions, using (I am not sure that I chose s correctly):
predict.glm(m, x, s = 0.005)
And got an error:
Error in terms.default(object) : no terms component nor attribute
Here https://stat.ethz.ch/pipermail/r-help/2004-September/058242.html I found some sort of solution to a problem:
predict.glm.fit<-function(glmfit, newmatrix){
newmatrix<-cbind(1,newmatrix)
coef <- rbind(1, as.matrix(glmfit$coef))
eta <- as.matrix(newmatrix) %*% as.matrix(coef)
exp(eta)/(1 + exp(eta))
}
But I cannot figure out whether it is possible to use glm.fit and then predict. Why is it possible or not? And how should one choose s correctly?
N.B. The problem can be avoided by using the glm() function. But glm() asks for a formula, which is not convenient in some cases. Still, if someone wants to use glm.fit and make predictions afterwards, here is a solution: https://stat.ethz.ch/pipermail/r-help/2004-September/058242.html
You should be using glm, not glm.fit. glm.fit is the workhorse of glm, but it returns a plain list without the terms component that predict.glm needs, which is why you get the error above. glm returns an object of class c("glm", "lm"), for which there is a predict.glm method. Then you only have to apply predict to the object returned by glm (possibly with some new data specified and the type of prediction that you want), and the generic predict function will select the correct method. As for s, predict.glm has no such argument (s is the penalty parameter in glmnet's predict method), so there is nothing to choose.
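A small sketch of that approach when the predictors start out as a matrix. The simulated X, y and the column names x1 and x2 are assumptions for illustration. The formula y ~ . uses every other column of the data frame, which avoids writing the formula out by hand.

```r
set.seed(1)
# Simulated design matrix and response (illustrative only)
X <- matrix(rnorm(200), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
y <- 1 + 2 * X[, "x1"] - X[, "x2"] + rnorm(100, sd = 0.1)

# Put response and predictors in one data frame; 'y ~ .' means "all other columns"
train <- data.frame(y = y, X)
m <- glm(y ~ ., data = train, family = gaussian())

# New data must use the same column names as the training predictors
X_new <- data.frame(x1 = c(0, 1), x2 = c(0, 1))
p <- predict(m, newdata = X_new, type = "response")
p
```

Because m has class c("glm", "lm"), the call to predict dispatches to predict.glm automatically.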

Resources