predict() for glm.fit() does not work. Why?

I've built a GLM in R using the glm.fit() function:
m <- glm.fit(x = as.matrix(df[,x.id]), y = df[,y.id], family = gaussian())
Afterwards, I tried to make some predictions (I am not sure that I chose s correctly) using:
predict.glm(m, x, s = 0.005)
And got an error:
Error in terms.default(object) : no terms component nor attribute
At https://stat.ethz.ch/pipermail/r-help/2004-September/058242.html I found some sort of solution to the problem:
predict.glm.fit <- function(glmfit, newmatrix) {
  newmatrix <- cbind(1, newmatrix)                 # prepend an intercept column
  coef <- rbind(1, as.matrix(glmfit$coef))         # prepend a matching coefficient of 1
  eta <- as.matrix(newmatrix) %*% as.matrix(coef)  # linear predictor
  exp(eta) / (1 + exp(eta))                        # inverse logit (binomial link)
}
But I cannot figure out whether it is possible to use glm.fit and then predict afterwards. Why is it possible or not? And how should one choose s correctly?
N.B. The problem can be avoided by using the glm() function. But glm() asks for a formula, which is not always convenient. Still, if someone wants to use glm.fit and make predictions afterwards, here is one solution: https://stat.ethz.ch/pipermail/r-help/2004-September/058242.html

You should be using glm, not glm.fit. glm.fit is the workhorse of glm, but glm returns an object of class c("glm", "lm"), for which there is a predict.glm method. You then only have to apply predict to the object returned by glm (possibly with some new data specified and the type of prediction that you want), and the generic predict function will select the correct method.
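For example (a minimal sketch reusing the df, x.id and y.id objects from the question; reformulate() just builds the formula from the column names, so the formula interface need not be typed out by hand):

m2 <- glm(reformulate(names(df)[x.id], response = names(df)[y.id]),
          data = df, family = gaussian())
predict(m2, newdata = df[1:5, ], type = "response")  # predictions for the first few rows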

Related

What is the meaning of bf() in the brms package when doing cumulative regression analysis?

I was trying to run a Bayesian multilevel cumulative model on ordinal data and was reading the documentation of brms online. My model looks something like
model <- brm(bf(y ~ Condition + (Condition | item) + (Condition | subject)),
             data = df,
             family = cumulative(link = "probit", threshold = "flexible"),
             chains = 4, cores = 4, iter = 2000, prior = prior)
I saw that some documentation examples do not have the bf() function when specifying the formula, but some do. Could someone explain to me what bf() is doing here? Thanks!
The bf() function is just to specify a formula, and using it for simple models inside the brm() function is not something you need to do. You could remove it in your example.
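For instance, your model could be written with the bare formula, which should be equivalent for this simple case:

model <- brm(y ~ Condition + (Condition | item) + (Condition | subject),
             data = df,
             family = cumulative(link = "probit", threshold = "flexible"),
             chains = 4, cores = 4, iter = 2000, prior = prior)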
However, you can use the bf() function to save a formula as an object to pass to the brm() function, like this:
model_formula <- bf(y ~ Condition + (Condition | item) + (Condition | subject))
model <- brm(model_formula,
             data = df,
             family = cumulative(link = "probit", threshold = "flexible"),
             chains = 4, cores = 4, iter = 2000, prior = prior)
For more advanced formulas, you may need to use the bf() function to combine different parts of your model. For example, a model that also predicts sigma, like this one, would not run if you didn't wrap the formulas in bf():
model <- brm(bf(y ~ x + (1 + x | random_effect), sigma ~ x), ...)
Here are some links to pages describing more complex models that all use the bf() function to specify the formulas:
https://cran.r-project.org/web/packages/brms/vignettes/brms_distreg.html
https://paul-buerkner.github.io/brms/reference/mixture.html

Getting an error when bootstrapping to test a predictive model

rsq <- function(formula, Data1, indices) {
  d <- Data1[indices, ]  # allows boot to select sample
  fit <- lm(formula, Data1 = d)
  return(summary(fit)$r.square)
}

results = boot(data = Data1, statistic = rsq, R = 500)
When I execute the code, I get the following error:
Error in Data1[indices,] : incorrect number of dimensions
Background info: I am creating a predictive model using linear regression. I would like to test my predictive model, and through some research I decided to use the bootstrapping method.
Credit goes to @Rui Barradas; check the comments for the original post.
If you read the help page for the function boot::boot, you will see that the statistic function it calls has data as its first argument, then indices, then the others. So change the order of your function definition to rsq <- function(Data1, indices, formula).
Another problem I had was that I hadn't defined the function before calling boot.
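A minimal sketch of the corrected setup (the dataset and formula here, mtcars and mpg ~ wt, are just placeholders; boot passes extra named arguments on to the statistic function):

library(boot)

rsq <- function(Data1, indices, formula) {
  d <- Data1[indices, ]          # boot selects the bootstrap sample
  fit <- lm(formula, data = d)   # note: data = d, not Data1 = d
  summary(fit)$r.square
}

results <- boot(data = mtcars, statistic = rsq, R = 500, formula = mpg ~ wt)
results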

Fitting Step functions

AIM: The aim here is to find a suitable fit, using step functions, that describes wage in terms of age in the Wage dataset from the ISLR library.
PLAN:
To find a suitable fit, I'll try multiple fits with different cut points. I'll use the glm() function for the fitting. To check which fit is best, I'll use the cv.glm() function (from the boot library) to perform cross-validation on the fitted model.
PROBLEM:
In order to do so, I did the following:
all.cvs = rep(NA, 10)
for (i in 2:10) {
  lm.fit = glm(wage ~ cut(Wage$age, i), data = Wage)
  all.cvs[i] = cv.glm(Wage, lm.fit, K = 10)$delta[2]
}
But this gives an error:
Error in model.frame.default(formula = wage ~ cut(Wage$age, i), data = list( :
  variable lengths differ (found for 'cut(Wage$age, i)')
Whereas, when I run the code given below, it runs. (It can be found here.)
all.cvs = rep(NA, 10)
for (i in 2:10) {
  Wage$age.cut = cut(Wage$age, i)
  lm.fit = glm(wage ~ age.cut, data = Wage)
  all.cvs[i] = cv.glm(Wage, lm.fit, K = 10)$delta[2]
}
Hypotheses and Results:
Well, it might be that cut() and glm() do not work together. But this works:
glm(wage ~ cut(age, 4), data = Wage)
Question:
So, basically, we're using the cut() function, saving its result in a variable, and then using that variable in the glm() function. But we can't put the cut() call inside the glm() call directly, and it only fails when the code is in a loop.
So, why is the first version of the code not working?
This is confusing. Any help appreciated.

Error converting rxGlm to GLM

I'm having a problem converting rxGlm models to normal glm models. Every time I try to convert my models, I get the same error:
Error in qr.lm(object) : lm object does not have a proper 'qr' component.
Rank zero or should not have used lm(.., qr=FALSE).
Here's a simple example:
cols <- colnames(iris)
vars <- cols[!cols %in% "Sepal.Length"]
form1 <- as.formula(paste("Sepal.Length ~", paste(vars, collapse = "+")))
rx_version <- rxGlm(formula = form1,
                    data = iris,
                    family = gaussian(link = 'log'),
                    computeAIC = TRUE)
# here is the equivalent model with base R
R_version <- glm(formula = form1,
                 data = iris,
                 family = gaussian(link = 'log'))
summary(as.glm(rx_version)) #this always gives the above error
I can't seem to find this "qr" component (I'm assuming it relates to the QR matrix decomposition) to specify in the rxGlm call.
Anyone else dealt with this?
rxGlm objects don't have a qr component, and converting to a glm object won't create one. This is intentional, as computing the QR decomposition of the model matrix requires the full dataset to be in memory which would defeat the purpose of using the rx* functions.
as.glm is really meant more for supporting model import/export via PMML. Most of the things that you'd want to do can be done with the rxGlm object directly, without converting. E.g. rxGlm computes the coefficient standard errors as part of the fit, so a qr component isn't needed afterwards.
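For example (a sketch, assuming RevoScaleR is loaded, as it must be for the rxGlm call above, and rx_version is the fit from the question):

summary(rx_version)                          # coefficients with standard errors
rx_version$coefficients                      # raw coefficient vector
preds <- rxPredict(rx_version, data = iris)  # score new data without converting to glm
head(preds)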

Predict function from the caret package gives an error

I am doing just a regular logistic regression using the caret package in R. I have a binomial response variable coded 1 or 0, called SALE_FLAG, and 140 predictor variables that I transformed to dummy variables using the dummyVars function in R.
data <- dummyVars(~ ., data = data_2, fullRank = TRUE, sep = "_", levelsOnly = FALSE)
dummies <- predict(data, data_2)
model_data <- as.data.frame(dummies)
This gives me a data frame to work with. All of the variables are numeric. Next I split into training and testing:
trainIndex <- createDataPartition(model_data$SALE_FLAG, p = .80, list = FALSE)
train <- model_data[ trainIndex, ]
test <- model_data[-trainIndex, ]
Time to train my model using the train function:
model <- train(SALE_FLAG ~ ., data = train, method = "glm")
Everything runs nicely and I get a model. But when I run the predict function, it does not give me what I need:
predict(model, newdata = test, type = "prob")
and I get an ERROR:
Error in dimnames(out)[[2]] <- modelFit$obsLevels :
length of 'dimnames' [2] not equal to array extent
On the other hand, when I replace "prob" with "raw" for type inside the predict function, I get predictions, but I need probabilities so I can code them into a binary variable given my threshold.
Not sure why this happens. I did the same thing without using the caret package and it worked as it should:
model2 <- glm(SALE_FLAG ~ ., family = binomial(logit), data = train)
predict(model2, newdata = test, type = "response")
I spent some time looking at this, but I am not sure what is going on and it seems very weird to me. I have tried many variations of the train function, for instance not using the formula interface and passing x and y instead. I also tried method = 'bayesglm' to check, and it gave me the same error. I hope someone can help me out. I don't strictly need to use the train function to get what I need, but caret is a good package with lots of tools and I would like to be able to figure this out.
Show us str(train) and str(test). I suspect the outcome variable is numeric, which makes train think that you are doing regression. That should also be apparent from printing model. Make it a factor if you want to do classification.
Max
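A minimal sketch of that fix, assuming SALE_FLAG is currently a numeric 0/1 column in model_data (the level labels are arbitrary placeholders; using labels that are valid R names avoids a common caret complaint when computing class probabilities):

model_data$SALE_FLAG <- factor(model_data$SALE_FLAG,
                               levels = c(0, 1),
                               labels = c("no_sale", "sale"))

trainIndex <- createDataPartition(model_data$SALE_FLAG, p = .80, list = FALSE)
train <- model_data[ trainIndex, ]
test <- model_data[-trainIndex, ]

model <- train(SALE_FLAG ~ ., data = train, method = "glm")
predict(model, newdata = test, type = "prob")  # now returns class probabilities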
