lm function throws an error in terms.formula() in R

I am trying to fit a linear model to the training data frame, but instead of output I get this error:
Error in terms.formula(formula, data = data) :
'.' in formula and no 'data' argument
Code:
n <- ncol(training)
input <- as.data.frame(training[,-n])
fit <- lm(training[,n] ~.,data = training[,-n])

There's no need to remove the response column from the data to fit the model, and it's best to refer to columns by name rather than by position.
Say that your last column is called response. Then run this:
lm(response ~ ., data=training)
It's hard to say whether this is the formula that you need. If you provide a reproducible example, that will become clear.
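A minimal sketch of the suggestion above. It assumes (the question does not say so) that training is a matrix rather than a data frame, which is one common way to end up with this error; the column names x1, x2 and response are made up for illustration.

```r
set.seed(1)

# Hypothetical training data stored as a matrix; the last column is the response
training <- cbind(x1 = rnorm(20), x2 = rnorm(20), response = rnorm(20))

# lm() needs a data frame in `data` for '.' to expand to "all other columns"
training_df <- as.data.frame(training)

# Refer to the response by name; '.' picks up every remaining column
fit <- lm(response ~ ., data = training_df)
coef_names <- names(coef(fit))
print(coef_names)
```

If the response name is only known at run time, the formula can be built from the column name instead, for example with as.formula(paste(name, "~ .")).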

Related

Can't complete regression loop (invalid term in model formula)

I have applied the below code, and it was working fine until I got an error message that I don't know how to solve.
respvars <- names(QBB_clean[1653:2592])
predvars <- c("bmi", "Age", "sex", "lpa2c", "smoking", "CholesterolTotal")
results <- list()
for (v in respvars) {
  form <- reformulate(predvars, response = v)
  results[[v]] <- lm(form, data = QBB_clean)
}
Error message:
Error in terms.formula(formula, data = data) :
invalid term in model formula
The error message "invalid term in model formula" indicates that something is wrong with the way the formula is being constructed.
There are several possible reasons. First, one of the variables in the formula may not be present in the dataset, or it may have a different name there. To check this, print the variable names used in the formula and compare them to the column names of the dataset.
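The name check described above might look like the following sketch. The data frame dat is a hypothetical stand-in for QBB_clean, and the column names "HDL-C" and "LDL" are invented. The second half goes beyond the answer: it assumes that column names which are not syntactically valid R names could also be involved, and passes the response as a symbol so that it is not parsed as an expression.

```r
set.seed(1)

# Hypothetical stand-in for QBB_clean, with one non-syntactic column name
dat <- data.frame(bmi = rnorm(10), Age = rnorm(10), check.names = FALSE)
dat[["HDL-C"]] <- rnorm(10)

predvars <- c("bmi", "Age")
respvars <- c("HDL-C", "LDL")  # "LDL" is deliberately absent from dat

# Variables used in the formulas that do not exist in the data
missing_vars <- setdiff(c(predvars, respvars), names(dat))
print(missing_vars)

# Existing response names that are not syntactically valid R names
present_resp <- intersect(respvars, names(dat))
bad_names <- present_resp[make.names(present_resp) != present_resp]
print(bad_names)

# Passing the response as a symbol keeps "HDL-C" from being read as HDL - C
results <- list()
for (v in present_resp) {
  form <- reformulate(predvars, response = as.name(v))
  results[[v]] <- lm(form, data = dat)
}
```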

Error in is.data.frame(data) : object 'test_data' not found

I am new to R programming and am trying to create a logistic regression model for the first time.
While creating the model, I get the error below:
m<-glm(ad~.,data=test_data,family='binomial')
Error in terms.formula(formula, data = data) :
'.' in formula and no 'data' argument
Code:
college<- read.csv(file.choose(),header=T)
head(college)
set.seed(2020)
split_data<- sample.split(college_final$admit,SplitRatio=3/4)
split_data
train_data<- subset(split_data,split==T)
train_data
test_data<-subset(split_data,split==F)
test_data
model<-glm(admit~.,data=test_data,family='binomial')
model
summary(model)
I tried searching the R community for this error, but found nothing about it.
I don't have enough rep to leave a comment, but I tried to reproduce the problem and got an error when creating test_data, so I think the issue is with the subsetting. (I wonder if you got an error there too?)
In this case, we want test_data to be a data frame instead of a vector. Try str(test_data) to see whether it reports a data.frame.
If not, try replacing
train_data<- subset(split_data,split==T)
test_data<-subset(split_data,split==F)
with
train_data <- subset(college, split_data == T)
test_data <- subset(college, split_data == F)
And run glm again.
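A self-contained sketch of that fix, using only base R. The college data frame and its columns gre and gpa are invented here, and the logical vector in_train merely plays the role of the output of sample.split() from the question.

```r
set.seed(2020)

# Hypothetical stand-in for the college data
college <- data.frame(
  admit = rbinom(40, 1, 0.5),
  gre   = rnorm(40, mean = 580, sd = 100),
  gpa   = rnorm(40, mean = 3.4, sd = 0.4)
)

# Logical vector marking the training rows (TRUE for 30 of the 40 rows)
in_train <- seq_len(nrow(college)) %in% sample(nrow(college), 30)

# Subset the data frame itself, using the logical vector as the row filter
train_data <- subset(college, in_train)
test_data  <- subset(college, !in_train)

# Both pieces should be data frames, not vectors
str(train_data)

# '.' now expands to the remaining columns of the data frame
model <- glm(admit ~ ., data = train_data, family = "binomial")
```

The model is fitted on train_data here; the held-out test_data would then typically be passed to predict() as newdata.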

Error in terms.formula(formula, data = data) : '.' in formula and no 'data' argument

I created a GUI for regression analysis.
svalue(tbl[2,1]) : accept a .csv input file
svalue(tbl[4,1]) : provide a dependent variable
selected_var=read.csv(svalue(svalue(tbl[2,1]))
sv=selected_var
sv_regression=data.frame(sv)
glm1<<-glm(svalue(tbl[4,1]) ~ . ,data = sv_regression,family = poisson)
reg<<-summary.glm(glm1)$coefficients
reg_result <<-gtable(reg)
add(frame1,reg_result,expand=TRUE)
When I run this code, I get an error:
Error in terms.formula(formula, data = data) :
'.' in formula and no 'data' argument
glm() and other modelling functions need a formula in this structure:
glm(var1 ~ ., data = sv_regression, family = poisson)
Where var1 should be the name of the response variable you are trying to predict. Without knowing what tbl and svalue are I can't see exactly what's going wrong (I suspect at least three things), but you need to structure your data in a way that you know in advance the name of the variable to be on the left side of the formula in your statistical model.
For example, given that you depend on the user choosing things in the GUI, you could rename the column in sv_regression that is to be the response variable as y (or something more distinctive that has less chance of conflicting with an existing name) before you call glm. Then, when you call glm, you know that it will be glm(y ~ ., ...)
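A small sketch of the renaming approach. The data and the variable response_name are hypothetical; in the GUI, response_name would hold whatever string svalue(tbl[4,1]) returns. The second variant, which builds the formula from the string, is an alternative that the answer does not mention.

```r
set.seed(42)

# Hypothetical data; in the GUI this would come from the selected .csv file
sv_regression <- data.frame(
  counts = rpois(30, lambda = 3),
  x1 = rnorm(30),
  x2 = rnorm(30)
)

# Hypothetical: the response name chosen by the user, as a character string
response_name <- "counts"

# Option 1: rename the chosen column to y, so the formula is known in advance
dat <- sv_regression
names(dat)[names(dat) == response_name] <- "y"
glm1 <- glm(y ~ ., data = dat, family = poisson)

# Option 2: build the formula from the string instead of renaming
form <- as.formula(paste(response_name, "~ ."))
glm2 <- glm(form, data = sv_regression, family = poisson)

# Both fits should give the same coefficients
same_fit <- isTRUE(all.equal(unname(coef(glm1)), unname(coef(glm2))))
```

The original call fails because svalue(tbl[4,1]) on the left-hand side is taken as an expression to evaluate, not as the name of a column.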

Subsetting data breaks GLM

I have a GLM Logit regression that works correctly, but when I add a subset argument to the GLM command, I get the following error:
invalid type (list) for variable '(weights)'.
So, the following command works:
glm(formula = A ~ B + C,family = "binomial",data = Data)
But the following command yields the error:
glm(formula = A ~ B + C,family = "binomial",data = Data,subset(Data,D<10))
(I realize that it may be difficult to answer this without seeing my data, but any general help on what may be causing my problem would be greatly appreciated)
Try subset=D<10 instead (you don't need to specify Data again, it is implicitly used as the environment for the subset argument). Because you haven't named the argument, R is interpreting it as the weights argument (which is the next argument after data).
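As an illustration with made-up data (the column names A, B, C and D follow the question, the values are random), naming the argument gives a fit that uses only the rows where D < 10:

```r
set.seed(1)

# Hypothetical data with the same column names as in the question
Data <- data.frame(
  A = rbinom(100, 1, 0.5),
  B = rnorm(100),
  C = rnorm(100),
  D = runif(100, min = 0, max = 20)
)

# subset is passed by name and evaluated inside Data
fit <- glm(A ~ B + C, family = "binomial", data = Data, subset = D < 10)

# The number of observations used equals the number of rows with D < 10
n_used <- nobs(fit)
n_expected <- sum(Data$D < 10)
```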

Predicting with lm object in R - black box paradigm

I have a function that returns an lm object. I want to produce predicted values based on some new data. The new data is a data.frame in the exact format as the data passed to the lm function, except that the response has been removed (since we're predicting, not training). I would expect to execute the following, but get an error:
predict( model , newdata )
"Error in eval(expr, envir, enclos) : object 'ModelResponse' not found"
In my case, ModelResponse was the name of the response column in the data I originally trained on. So just for kicks, I tried to insert an NA response:
newdata$ModelResponse = NA
predict( model , newdata )
Error in terms.default(object, data = data) : no terms component nor attribute
Highly frustrating! R's notion of models/regression doesn't match mine: 1. I train a model with some data and get a model object. 2. I can score new data from any environment/function/frame/etc. so long as I input data into the model object that "looks like" the data I trained on (i.e. same column names). This is a standard black-box paradigm.
So here are my questions:
1. What concept(s) am I missing here?
2. How do I get my scenario to work?
3. How can I get model object to be portable? str(model) shows me that the model object saved the original data it trained on! So the model object is massive. I want my model to be portable to any function/environment/etc. and only contain the data it needs to score.
In the absence of str() on either the model or the data offered to the model, here's my guess regarding this error message:
predict( model , newdata )
"Error in eval(expr, envir, enclos) : object 'ModelResponse' not found"
I guess that you made a model object named "model", that your outcome variable (the left-hand side of the formula in the original call to lm) was named "ModelResponse", and that you then named a column in newdata by the same name. What you should have done is leave out the "ModelResponse" column (because that is what you are predicting) and put in "Model_Predictor1", "Model_Predictor2", etc., i.e. all the names on the right-hand side of the formula given to lm()
The coef() function will allow you to extract the information needed to make the model portable.
mod.coef <- coef(model)
mod.coef
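To make this concrete, here is a sketch with invented data and predictor names p1 and p2. It shows predict() working on new data that has no response column, and the same predictions reproduced from the extracted coefficients alone. The manual step assumes a plain linear formula with numeric predictors; factors, transformations or splines would need a proper design matrix.

```r
set.seed(1)

# Hypothetical training data; ModelResponse is the outcome
train <- data.frame(ModelResponse = rnorm(50), p1 = rnorm(50), p2 = rnorm(50))
model <- lm(ModelResponse ~ p1 + p2, data = train)

# New data: only the right-hand-side variables, no response column
newdata <- data.frame(p1 = c(0, 1), p2 = c(1, -1))
pred <- predict(model, newdata)

# Portable part of the model: just the coefficient vector
mod.coef <- coef(model)

# Rebuild the predictions by hand: intercept column plus the predictors
X <- cbind(1, as.matrix(newdata[, c("p1", "p2")]))
manual <- drop(X %*% mod.coef)

same_pred <- isTRUE(all.equal(unname(pred), unname(manual)))
```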
Since you expressed interest in the rms/Hmisc package combo Function, here it is using the help-example from ols and comparing the output with an extracted function and the rms Predict method. Note the capitals, since these are designed to work with the package equivalents of lm and glm(..., family="binomial") and coxph, which in rms become ols, lrm, and cph.
> set.seed(1)
> x1 <- runif(200)
> x2 <- sample(0:3, 200, TRUE)
> distance <- (x1 + x2/3 + rnorm(200))^2
> d <- datadist(x1,x2)
> options(datadist="d") # No d -> no summary, plot without giving all details
>
>
> f <- ols(sqrt(distance) ~ rcs(x1,4) + scored(x2), x=TRUE)
>
> Function(f)
function(x1 = 0.50549065,x2 = 1) {0.50497361+1.0737604* x1-
0.79398383*pmax(x1-0.083887788,0)^3+ 1.4392827*pmax(x1-0.38792825,0)^3-
0.38627901*pmax(x1-0.65115162,0)^3-0.25901986*pmax(x1-0.92736774,0)^3+
0.06374433*x2+ 0.60885222*(x2==2)+0.38971577*(x2==3) }
<environment: 0x11b4568e8>
> ols.fun <- Function(f)
> pred1 <- Predict(f, x1=1, x2=3)
> pred1
x1 x2 yhat lower upper
1 1 3 1.862754 1.386107 2.339401
Response variable (y): sqrt(distance)
Limits are 0.95 confidence limits
# The "yhat" is the same as one produces with the extracted function
> ols.fun(x1=1, x2=3)
[1] 1.862754
(I have learned through experience that the restricted cubic-spline fit functions coming from rms need to have spaces and carriage returns added to improve readability. )
Thinking long-term, you should probably take a look at the caret package. Many or most modeling functions work with data frames and matrices, others have a preference, and there may be other variations of their expectations. It's important to quickly get your head around each, but if you want a single wrapper that will simplify life for you, making the intricacies into a "black box", then caret is as close as you can get.
As a disclaimer: I do not use caret, as I don't think modeling should be a black box. I've sent more than a few emails to maintainers of modeling packages after looking into their code and seeing something amiss. Wrapping that in another layer would not serve my interests. So, in the very long run, avoid caret and develop an enjoyment for dissecting what's going into and out of the different modeling functions. :)
