Can't complete regression loop (invalid term in model formula) - r

I have applied the below code, and it was working fine until I got an error message that I don't know how to solve.
respvars <- names(QBB_clean[1653:2592])
predvars <- c("bmi", "Age", "sex", "lpa2c", "smoking", "CholesterolTotal")
results <- list()
for (v in respvars) {
  form <- reformulate(predvars, response = v)
  results[[v]] <- lm(form, data = QBB_clean)
}
Error message:
Error in terms.formula(formula, data = data) :
invalid term in model formula

The error message "invalid term in model formula" indicates that something is wrong with the way the formula is being constructed.
There are several possible reasons. First, one of the variables in the formula may not be present in the dataset, or may have a different name. A response name that is not syntactically valid in R (for example, one containing spaces or special characters) can also break the formula. To diagnose the issue, print the variable names used in the formula and compare them against names(QBB_clean).
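A quick way to check is to confirm that every response name is present in the data and to backtick-quote names before building the formula. In this sketch the toy data frame and its column names (including the non-syntactic `HbA1c (%)`) are made up for illustration, standing in for QBB_clean:

```r
# Toy stand-in for QBB_clean; column names are hypothetical:
QBB_clean <- data.frame(check.names = FALSE,
                        `HbA1c (%)` = rnorm(10),
                        bmi = rnorm(10),
                        Age = rnorm(10))

respvars <- "HbA1c (%)"        # a non-syntactic name breaks a plain formula
predvars <- c("bmi", "Age")

# Any response names missing from the data? (empty result = all present)
setdiff(respvars, names(QBB_clean))

# Backtick-quote names so non-syntactic ones still parse:
form <- reformulate(sprintf("`%s`", predvars),
                    response = sprintf("`%s`", respvars))
fit <- lm(form, data = QBB_clean)
```

The same sprintf() wrapping can be applied inside the original loop over respvars.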

Related

tryCatch() function to save warning message AND error message in r

My previous question was:
I am simulating multilevel data and fitting it to various multilevel models.
I want to write a function that saves any error (such as "failed to converge" or "singular fit") that occurs.
For example, my model is
lmer(y ~ x1 + x2 + (1 | pid), data = sim_data)
and there are many conditions, so various datasets will be fitted to this model.
How can I save the error or warning message as a whole in a data frame or list?
(e.g. first dataset -> no error, second dataset -> convergence error, etc.)
Some people gave me good answers to it.
However, those answers covered the error message but not the warning message.
for (i in 1:10) {
  catch$error[i] <- tryCatch(
    lmer(y ~ x1 + x2 + (1 | pid), data = sim_data),
    error = function(e) e$message,
    warning = function(w) w$message
  )
}
With this code, can I extract both the warning and the error message?
It also seems to generate the following error:
Error in results$MLM0warning1[i] <- tryCatch(model_null(ind_sim_data), :
incompatible types (from S4 to character) in subassignment type fix
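The S4-to-character subassignment error happens because, when the fit succeeds, tryCatch returns the fitted model itself (an S4 object), and that cannot be assigned into a character vector. One fix is to make tryCatch always return a character string, including on success. This sketch uses a hypothetical stand-in function, fit_once, in place of the real lmer call:

```r
# Stand-in for a model fit; substitute the real lmer() call here:
fit_once <- function(i) {
  if (i == 2) warning("singular fit")
  if (i == 3) stop("failed to converge")
  "ok"
}

msgs <- character(10)
for (i in 1:10) {
  msgs[i] <- tryCatch(
    { fit_once(i); "no error" },  # success -> a string, not an S4 object
    error   = function(e) paste("error:", conditionMessage(e)),
    warning = function(w) paste("warning:", conditionMessage(w))
  )
}
msgs[1:3]
```

Note that a tryCatch warning handler abandons the computation, so the fitted model is lost when a warning fires; if you need both the model and its warning, withCallingHandlers() is the usual tool.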

R: DALEX explain Fails to Read In Target Variable Data

I'm running a simple lm model in R and I am trying to analyze the results using the DALEX package explain object.
My model is as follows: lm_model <- lm (DV ~ x + z, data = datax)
If it matters, x and z are factors and DV is numeric. The lm runs with no errors, and everything looks fine via summary(lm_model).
When I try to create the explain object in DALEX like so:
lm_exp <- DALEX::explain(lm_model, label = "lm", data = datax, y = datax$DV)
It gives me the following:
Preparation of a new explainer is initiated
-> model label : lm
-> data : 15375 rows 49 cols
-> data : tibble converted into a data.frame
-> target variable : 15375 values
Error in if (is_y_in_data(data, y)) { :
missing value where TRUE/FALSE needed
Before the lm is run, datax is filtered for values between .2 and 1 using the subset command. Looking at summary(datax$DV) and sum(is.na(datax$DV)), everything looks fine, and I also checked for blanks/errors using a filter in Excel. For those reasons I do not believe there are any blanks in the DV column of datax, so I am unsure why I am receiving:
Error in if (is_y_in_data(data, y)) { :
missing value where TRUE/FALSE needed
I have scoured the internet for this error when using DALEX explain, but I have not found any results. Thanks for any help that can be provided.
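One plausible cause, which is an assumption since the question doesn't show str(datax), is that y is being passed as a one-column tibble rather than a plain numeric vector, so DALEX's internal comparison against the columns of data yields NA, and an R if() fails on NA. A minimal base-R illustration of that failure mode, followed by the usual workaround (the explain() call shown in comments is a hypothetical fix, not tested against this data):

```r
# if() needs a single TRUE/FALSE; an NA reproduces the same class of error:
res <- tryCatch(if (NA) "yes", error = function(e) conditionMessage(e))
res

# With tibbles, datax[, "DV"] is still a tibble; [[ ]] always gives a vector.
# Hypothetical fix: pass y as a plain vector and drop it from data:
# lm_exp <- DALEX::explain(lm_model, label = "lm",
#                          data = datax[setdiff(names(datax), "DV")],
#                          y    = datax[["DV"]])
```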

invalid type (list) message in applying gmm method

The moment condition function is simply exp(-g/r) - 1, where g is a known series of daily returns on an AAA-class bond index, and r is the riskiness measure to be derived through GMM. My code is as follows:
View(Source)
library(gmm)
data(Source)
x <- Source[1:5200,"AAA"]
m <- function(r,x)
{m.1 <- exp(-x[,"AAA"]/r)-1}
summary(gmm(m,x,t0=1,method="BFGS",control=1e-12))
which in turn yields the following error message:
Error in model.frame.default(formula = gmat ~ 1, drop.unused.levels = TRUE) :
invalid type (list) for variable 'gmat'
Could anyone help me figure out what went wrong?
Thanks a lot!
For those kind people who would like to replicate the results, please find attached the source data as mentioned above.
The correct r is 1.590, which can be found via Goal Seek in Excel with target function (average(exp(-g/r) - 1))^2 and target value 0 (tolerance: 1e-12):
https://docs.google.com/spreadsheets/d/1AnTErQd2jm9ttKDZa7On3DLzEZUWaz5Km3nKaB7K18o/edit?usp=sharing
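A likely culprit, offered as a sketch rather than a verified fix, is that gmm() expects the moment function to return a matrix (one column per moment condition), and that x must keep its matrix shape so that x[, "AAA"] works inside the function. The commented gmm:: call assumes Source is loaded as in the question:

```r
# gmm() expects the moment function to return a matrix:
m <- function(r, x) {
  cbind(exp(-x[, "AAA"] / r) - 1)   # cbind() turns the vector into a matrix
}

# Keep x as a one-column matrix so x[, "AAA"] works inside m:
# x <- as.matrix(Source[1:5200, "AAA", drop = FALSE])
# Note that control= for the optimizer must be a list, not a bare number:
# summary(gmm::gmm(m, x, t0 = 1, method = "BFGS",
#                  control = list(reltol = 1e-12)))

# Sanity check on a tiny made-up return matrix:
g <- matrix(c(0.01, -0.02, 0.005), ncol = 1, dimnames = list(NULL, "AAA"))
is.matrix(m(1.59, g))
```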

lm function throws an error in terms.formula() in R

I am trying to run linear modelling on the training data frame, but it is not giving me the output.
It gives me an error saying
Error in terms.formula(formula, data = data) :
'.' in formula and no 'data' argument
Code
n <- ncol(training)
input <- as.data.frame(training[,-n])
fit <- lm(training[,n] ~.,data = training[,-n])
There's no need to remove the column from the data to perform this operation, and it's best to use names.
Say that your last column is called response. Then run this:
lm(response ~ ., data=training)
It's hard to say that this is the formula that you need. If you provide a reproducible example, that will become clear.
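For a concrete, reproducible illustration with a built-in dataset (mtcars here is only a stand-in for the asker's training data, with mpg playing the role of the response):

```r
training <- mtcars                    # response: mpg; predictors: the rest
fit <- lm(mpg ~ ., data = training)   # '.' means all other columns
length(coef(fit))                     # intercept + 10 predictors
```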

Predicting with lm object in R - black box paradigm

I have a function that returns an lm object. I want to produce predicted values based on some new data. The new data is a data.frame in the exact format as the data passed to the lm function, except that the response has been removed (since we're predicting, not training). I would expect the following to work, but I get an error:
predict( model , newdata )
"Error in eval(expr, envir, enclos) : object 'ModelResponse' not found"
In my case, ModelResponse was the name of the response column in the data I originally trained on. So, just for kicks, I tried to insert an NA response:
newdata$ModelResponse = NA
predict( model , newdata )
Error in terms.default(object, data = data) : no terms component nor attribute
Highly frustrating! R's notion of models/regression doesn't match mine: 1. I train a model with some data and get a model object. 2. I can score new data from any environment/function/frame/etc. so long as I input data into the model object that "looks like" the data I trained on (i.e. same column names). This is a standard black-box paradigm.
So here are my questions:
1. What concept(s) am I missing here?
2. How do I get my scenario to work?
3. How can I get model object to be portable? str(model) shows me that the model object saved the original data it trained on! So the model object is massive. I want my model to be portable to any function/environment/etc. and only contain the data it needs to score.
In the absence of str() on either the model or the data offered to the model, here's my guess regarding this error message:
predict( model , newdata )
"Error in eval(expr, envir, enclos) : object 'ModelResponse' not found"
I guess that you made a model object named "model", that your outcome variable (the left-hand side of the formula in the original call to lm) was named "ModelResponse", and that you then named a column in newdata by the same name. But what you should have done was leave out the "ModelResponse" column (because that is what you are predicting) and put in "Model_Predictor1", "Model_Predictor2", etc., i.e. all the names on the right-hand side of the formula given to lm().
The coef() function will allow you to extract the information needed to make the model portable.
mod.coef <- coef(model)
mod.coef
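As a sketch of what "portable" can look like with a plain lm fit: the coefficient vector plus model.matrix() reproduces predict() for point predictions without carrying the training data around (mtcars and the wt/hp formula are just illustrative):

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
b   <- coef(fit)                          # all that's needed for point predictions

newdata <- data.frame(wt = 3, hp = 110)   # predictors only, no response column
X <- model.matrix(~ wt + hp, data = newdata)
manual <- drop(X %*% b)

all.equal(unname(manual), unname(predict(fit, newdata)))
```

The coefficient vector b is tiny compared to the full lm object, so it travels easily between functions and environments; the trade-off is that you lose standard errors and intervals, which need more of the fit.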
Since you expressed interest in the rms/Hmisc package combination's Function(), here it is using the help example from ols, comparing the output of an extracted function with the rms Predict method. Note the capitals: these are designed to work with the package equivalents of lm, glm(..., family = "binomial"), and coxph, which in rms become ols, lrm, and cph.
> set.seed(1)
> x1 <- runif(200)
> x2 <- sample(0:3, 200, TRUE)
> distance <- (x1 + x2/3 + rnorm(200))^2
> d <- datadist(x1,x2)
> options(datadist="d") # No d -> no summary, plot without giving all details
>
>
> f <- ols(sqrt(distance) ~ rcs(x1,4) + scored(x2), x=TRUE)
>
> Function(f)
function(x1 = 0.50549065,x2 = 1) {0.50497361+1.0737604* x1-
0.79398383*pmax(x1-0.083887788,0)^3+ 1.4392827*pmax(x1-0.38792825,0)^3-
0.38627901*pmax(x1-0.65115162,0)^3-0.25901986*pmax(x1-0.92736774,0)^3+
0.06374433*x2+ 0.60885222*(x2==2)+0.38971577*(x2==3) }
<environment: 0x11b4568e8>
> ols.fun <- Function(f)
> pred1 <- Predict(f, x1=1, x2=3)
> pred1
x1 x2 yhat lower upper
1 1 3 1.862754 1.386107 2.339401
Response variable (y): sqrt(distance)
Limits are 0.95 confidence limits
# The "yhat" is the same as one produces with the extracted function
> ols.fun(x1=1, x2=3)
[1] 1.862754
(I have learned through experience that the restricted cubic-spline fit functions coming from rms need to have spaces and carriage returns added to improve readability. )
Thinking long-term, you should probably take a look at the caret package. Many or most modeling functions work with data frames and matrices, others have a preference, and there may be other variations of their expectations. It's important to quickly get your head around each, but if you want a single wrapper that will simplify life for you, making the intricacies into a "black box", then caret is as close as you can get.
As a disclaimer: I do not use caret, as I don't think modeling should be a black box. I've sent more than a few emails to maintainers of modeling packages after looking into their code and seeing something amiss. Wrapping that in another layer would not serve my interests. So, in the very long run, avoid caret and develop an enjoyment for dissecting what goes into and out of the different modeling functions. :)
