Subsetting data breaks GLM - r

I have a GLM Logit regression that works correctly, but when I add a subset argument to the GLM command, I get the following error:
invalid type (list) for variable '(weights)'.
So, the following command works:
glm(formula = A ~ B + C,family = "binomial",data = Data)
But the following command yields the error:
glm(formula = A ~ B + C,family = "binomial",data = Data,subset(Data,D<10))
(I realize that it may be difficult to answer this without seeing my data, but any general help on what may be causing my problem would be greatly appreciated)

Try subset=D<10 instead (you don't need to specify Data again, it is implicitly used as the environment for the subset argument). Because you haven't named the argument, R is interpreting it as the weights argument (which is the next argument after data).
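A minimal sketch of the difference, using simulated data as a stand-in for the asker's Data (the column names A, B, C and D come from the question; the values are made up for illustration):

```r
set.seed(1)
n <- 200
Data <- data.frame(B = rnorm(n), C = rnorm(n), D = runif(n, 0, 20))
Data$A <- rbinom(n, 1, plogis(0.5 * Data$B - 0.5 * Data$C))

# Named subset argument: D is looked up inside Data
fit_sub <- glm(A ~ B + C, family = binomial, data = Data, subset = D < 10)

# Unnamed argument: matched to weights, the first formal not already
# supplied by name, so glm fails
bad <- tryCatch(
  glm(formula = A ~ B + C, family = binomial, data = Data, subset(Data, D < 10)),
  error = function(e) e
)

nobs(fit_sub) == sum(Data$D < 10)  # the fit uses only the rows with D < 10
inherits(bad, "error")             # the unnamed version raises an error
```

Naming every argument after the formula avoids this kind of positional mismatch.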

Related

Error in as.data.frame.default(data, optional = TRUE) when knitting a r markdown file

When I try to knit my Rmd file I get the following: Line 20 Error in as.data.frame.default(data, optional = TRUE): cannot coerce class '"function"' to a data frame. I can run the code in my script with no issues, and in fact it worked the first time I tried it, but then I had to change a few options (echo, eval and error) and I started getting the error. Here is the line in question:
model <- glm(GGPA_3.4 ~ Age + Gender + UGPA, df, family = "binomial")
summary(model)
Thanks for the help.
model <- glm(GGPA_3.4 ~ Age + Gender + UGPA, family = "binomial", data = df)
summary(model)
I wonder whether the order of the arguments caused the error. The glm() documentation says of the family argument: "For glm this can be a character string naming a family function, a family function or the result of a call to a family function." You could also try family = binomial to see whether that fixes the error.
I think I figured it out. In the first chunk I had the option "eval = FALSE" and that's when I started having issues. Once I removed it everything was back to normal.
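One likely explanation, assuming the first chunk was the one that created df: with eval = FALSE that chunk is never run during knitting, so the name df resolves to the F-distribution density function stats::df, which glm cannot treat as data. A small sketch that passes that function explicitly (the formula and the mtcars variables are stand-ins, not the asker's data):

```r
# stats::df is a function (the F density), not a data frame
is.function(stats::df)

# Passing it as data makes glm fail, much as it would when knitting
# without the chunk that defines the real df
res <- tryCatch(
  glm(am ~ wt, family = binomial, data = stats::df),
  error = function(e) e
)
inherits(res, "error")
```

Choosing a name other than df for the data frame would make a missing object fail with a clearer "object not found" message.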

error r: invalid subscript type "closure" in a simple regression

Unfortunately I am a beginner in R. I'd like to run a simple linear regression model in R with the command lm, but every time I try, the following error occurs:
Error in xj[i] : invalid subscript type 'closure'
The regression model is just as follows:
REG1 <- lm(flowpercent~ret+tna+fundage+number_shr_cl,data = reg, na.omit)
#-flowpercent is a calculated variable:
reg$flowpercent <- reg$flow_dollar/lag(reg$tna, n=1)
#-fundage is also calculated:
reg$fundage <- as.numeric(difftime(ref_date,reg$InceptionDate, units = "days")/365.25)
ret, tna and number_shr_cl are variables from a database.
Hopefully someone can help me solve my problem.
Many thanks in advance.
Your third argument is na.omit. You probably saw someone writing something like na.action = na.omit. However, if you look up the help for lm by typing ?lm, you will see:
Usage:
lm(formula, data, subset, weights, na.action, ... # etc
which tells you that the third argument to lm is subset. So, you are passing the object called na.omit to the subset argument, which lm tries to use to subset your data. Unfortunately, na.omit is an R function (aka a "closure"). Not surprisingly, R does not know how to use this function to subset your data. Hence the error.
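To illustrate, here is a small sketch with the built-in airquality data (a stand-in, since the asker's reg data frame is not available), comparing the named and the positional form:

```r
# Named: rows with missing values are dropped before fitting
fit <- lm(Ozone ~ Solar.R + Wind, data = airquality, na.action = na.omit)

# Positional: na.omit lands in the subset argument and lm fails
bad <- tryCatch(
  lm(Ozone ~ Solar.R + Wind, data = airquality, na.omit),
  error = function(e) e
)

complete <- complete.cases(airquality[, c("Ozone", "Solar.R", "Wind")])
nobs(fit) == sum(complete)  # only complete rows were used
inherits(bad, "error")
```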

Error in terms.formula(formula, data = data) : '.' in formula and no 'data' argument

I created a GUI for regression analysis.
svalue(tbl[2,1]) : accept a .csv input file
svalue(tbl[4,1]) : provide a dependent variable
selected_var=read.csv(svalue(svalue(tbl[2,1]))
sv=selected_var
sv_regression=data.frame(sv)
glm1<<-glm(svalue(tbl[4,1]) ~ . ,data = sv_regression,family = poisson)
reg<<-summary.glm(glm1)$coefficients
reg_result <<-gtable(reg)
add(frame1,reg_result,expand=TRUE)
When I run this code, I get an error:
Error in terms.formula(formula, data = data) :
'.' in formula and no 'data' argument
glm() and other modelling functions need a formula in this structure:
glm(var1 ~ ., data = sv_regression, family = poisson)
Where var1 should be the name of the response variable you are trying to predict. Without knowing what tbl and svalue are I can't see exactly what's going wrong (I suspect at least three things), but you need to structure your data in a way that you know in advance the name of the variable to be on the left side of the formula in your statistical model.
For example, since you depend on the user's choices in the GUI, you could rename the column in sv_regression that is to be the response variable to y (or something more distinctive that is less likely to clash with an existing name) before you call glm. Then you know the call will be glm(y ~ ., ...).
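A rough sketch of the renaming idea, assuming the GUI hands back the response name as a character string. Here response_name is a hypothetical stand-in for svalue(tbl[4,1]) and mtcars stands in for the uploaded CSV. The second option, building the formula from the string, is an alternative that is not part of the answer above:

```r
response_name <- "carb"   # hypothetical user choice from the GUI
sv_regression <- mtcars   # stand-in for the data read from the CSV

# Option 1: rename the chosen column to a fixed name, then use y ~ .
renamed <- sv_regression
names(renamed)[names(renamed) == response_name] <- "y"
glm1 <- glm(y ~ ., data = renamed, family = poisson)

# Option 2: build the formula from the string instead of renaming
f <- as.formula(paste(response_name, "~ ."))
glm2 <- glm(f, data = sv_regression, family = poisson)

# Both approaches fit the same model
all.equal(unname(coef(glm1)), unname(coef(glm2)))
```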

Panel regression error in R

I am running an unbalanced panel regression.
The dependent variable is Gross.
The independent variables are DEX, GRW, Debt and Life.
Time is Year
Grouping is Country
I have successfully executed the following commands:
tino=read.delim("clipboard")
tino
summary(tino)
Dep<- with(tino, cbind(Gross, index=c("Country, Year"))
Ind<- tino[ , c('DEX', 'GRW' , 'Debt', 'Life')]
install.packages("plm")
library('plm')
pandata<-plm.data(tino)
tino
summary(pandata)
summary(Dep)
summary(Ind)
However, when I run the command below to get results, I get an error.
pooling<- plm(Dep~Ind, data = pandata, model= "pooling")
gives error below
Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data,: invalid type (list) for variable 'Ind'
Please help.
Thanks
Without access to your data, it is impossible to confirm that this will work, but I am going to try to point out several issues in your code that are likely contributing to the error.
This line is fine:
tino=read.delim("clipboard")
Here is where you start to make errors:
Dep<- with(tino, cbind(Gross, index=c("Country, Year"))
Ind<- tino[ , c('DEX', 'GRW' , 'Debt', 'Life')]
with() is typically used to create new vectors out of a data.frame. All it does is allow you to drop the $ notation for referencing variables in a data.frame and nothing else. From the read of your code, you may be thinking that with() is actually modifying the tino object, which it is not.
Further, when you want to construct a data.frame for use in a regression model, you want all of the right-hand and left-hand side variables in one data.frame or matrix rather than separating them. This is because most modelling functions operate using a "formula" and data argument, which are passed to model.frame() to preprocess the data before modelling.
This means you presumably want to do something like the following, skipping all of the above:
pandata <- plm.data(tino, index = c("Country", "Year"))
pooling <- plm(Gross ~ DEX + GRW + Debt + Life, data = pandata, model = "pooling")
summary(pooling)
If you have a lot of right-hand side variables, you can subset your data.frame, with something like:
pandata2 <- plm.data(tino[ , c('Gross', 'DEX', 'GRW' , 'Debt', 'Life')], index = c("Country", "Year"))
pooling2 <- plm(Gross ~ ., data = pandata2, model = "pooling")
using the . notation as a shorthand for "all other columns in the data."
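The "invalid type (list)" message itself is not specific to plm. As a sketch, the same error can be reproduced with lm from base R by putting a whole data.frame on the right-hand side of a formula (mtcars is a stand-in for the asker's data):

```r
Dep <- mtcars$mpg
Ind <- mtcars[, c("wt", "hp")]  # a data.frame, which is a list internally

# A data.frame is not a valid model variable, so model.frame() rejects it
err <- tryCatch(lm(Dep ~ Ind), error = function(e) conditionMessage(e))
err

# Naming the columns and passing the data.frame via data works
fit <- lm(mpg ~ wt + hp, data = mtcars)
names(coef(fit))
```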

lm function throws an error in terms.formula() in R

I am trying to run linear modelling on the training data frame, but it is not giving me the output.
It gives me an error saying
Error in terms.formula(formula, data = data) :
'.' in formula and no 'data' argument
Code
n <- ncol(training)
input <- as.data.frame(training[,-n])
fit <- lm(training[,n] ~.,data = training[,-n])
There's no need to remove the column from the data to perform this operation, and it's best to use names.
Say that your last column is called response. Then run this:
lm(response ~ ., data=training)
It's hard to say whether this is the formula that you need. If you provide a reproducible example, that will become clear.
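If the response really has to be picked by position rather than typed by hand, one possible approach (an addition to the answer above, with mtcars standing in for training) is to look up the name of the last column and build the formula with reformulate():

```r
training <- mtcars              # stand-in for the asker's training data
n <- ncol(training)
response <- names(training)[n]  # name of the last column

# reformulate(".", response = ...) builds the formula <response> ~ .
fit <- lm(reformulate(".", response = response), data = training)

length(coef(fit)) == n  # intercept plus the n - 1 remaining columns
```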
