Error: Coefficients: (4 not defined because of singularities) in R

I am getting an error in my code that I can't figure out.
I have a data frame "a" with:
  row.names        GM  variance    stddev  skewness correltomarket    DEratio
1       MMM 0.9785122 0.9998918 0.9999459 -1.049053       2.932738 0.07252799
Now I need to fit a linear model to this data frame with the following code:
riskmodel <- lm(formula = (a$GM) ~ (a$variance) + (a$skewness) +
                  (a$correltomarket) + (a$DEratio), data = a)
When I run this code, I get the following summary for riskmodel:
Call:
lm(formula = (a$GM) ~ (a$variance) + (a$skewness) + (a$correltomarket) +
    (a$DEratio), data = a)
Residuals:
ALL 1 residuals are 0: no residual degrees of freedom!
Coefficients: (4 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.9785 NA NA NA
a$variance NA NA NA NA
a$skewness NA NA NA NA
a$correltomarket NA NA NA NA
a$DEratio NA NA NA NA
Residual standard error: NaN on 0 degrees of freedom
I don't understand why this happens, and I would be really grateful to anyone who helps me with this. I have no idea what's going wrong.

You only have a single observation in your data.frame, and you can't fit a model with five parameters (an intercept plus four slopes) to a single observation. You would need at least six observations to estimate all the parameters and still have a residual degree of freedom for the variance.
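For illustration, a minimal sketch with invented numbers (not MMM's actual figures): with six or more rows, the same formula fits without singularities.
set.seed(1)
# Ten made-up observations, one column per predictor in the question
a <- data.frame(GM = rnorm(10), variance = rnorm(10), skewness = rnorm(10),
                correltomarket = rnorm(10), DEratio = rnorm(10))
riskmodel <- lm(GM ~ variance + skewness + correltomarket + DEratio, data = a)
summary(riskmodel)  # 10 observations, 5 parameters: SEs and t values are now defined
Note also that once you pass data = a, you can drop the a$ prefixes inside the formula, as above.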

Related

Missing values for inverse Mills ratio, sigma and rho for Heckman model output in R

I am using R and the selection() function (from the sampleSelection package) to run 2 separate Heckman models.
Unadjusted model: for the sampling probability, the independent and dependent variables are binary; for the second equation, my independent variable is binary and my dependent variable is continuous. For the unadjusted model I get:
Estimate Std. Error t value Pr(>|t|)
invMillsRatio NA NA NA NA
sigma NA NA NA NA
rho NA NA NA NA
Adjusted model: when I run the same model but add binary and continuous independent variables to adjust for some things, I get:
Estimate Std. Error t value Pr(>|t|)
invMillsRatio 506.807 29.125 17.4 <0.0000000000000002 ***
sigma 504.052 NA NA NA
rho 1.005 NA NA NA
Everything works fine, except that I get a lot of NAs for the standard errors. Can anyone explain why this is the case? What does this mean, and is there any way to obtain the standard errors?
Thanks for any help!
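For reference, a minimal sketch of the kind of call described above, using selection() from the sampleSelection package; the data frame df and all variable names here are placeholders, not the poster's actual data.
library(sampleSelection)
# Selection equation: binary indicator of being sampled (probit).
# Outcome equation: continuous response, observed only when selected.
fit <- selection(selection = sampled ~ z1 + z2,  # placeholder variables
                 outcome   = y ~ x1,
                 data = df, method = "2step")
summary(fit)  # reports invMillsRatio, sigma and rho, as in the output above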

PSCL Returning NAs for zeroinfl negbin model

I am trying to run a zero-inflated negative binomial count model on data containing the number of campaign visits by a politician, by county. (Log-likelihood tests indicate that negative binomial is correct; a Vuong test suggests zero inflation, though that could be thrown off by the fact that my zero-inflated model is clearly not converging.) I am using the pscl package in R. The problem is that when I run the model, summary() gives me the following:
Call:
zeroinfl(formula = Sanders_Adjacent_Clinton_Visit ~ Relative_Divisiveness + Obama_General_Percent_12 +
Percent_Over_65 + Percent_F + Percent_White + Percent_HS + Per_Capita_Income +
Poverty_Rate + MRP_Ideology_Mean + Swing_State, data = Unity_Data, dist = "negbin")
Pearson residuals:
Min 1Q Median 3Q Max
-0.96406 -0.24339 -0.11744 -0.03183 16.21356
Count model coefficients (negbin with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.216e+01 NA NA NA
Relative_Divisiveness -3.831e-01 NA NA NA
Obama_General_Percent_12 1.904e+00 NA NA NA
Percent_Over_65 -4.848e-02 NA NA NA
Percent_F 1.737e-01 NA NA NA
Percent_White 2.980e+00 NA NA NA
Percent_HS -3.563e-02 NA NA NA
Per_Capita_Income 7.413e-05 NA NA NA
Poverty_Rate -2.273e-02 NA NA NA
MRP_Ideology_Mean -8.316e-01 NA NA NA
Swing_State 1.580e+00 NA NA NA
Log(theta) 9.595e+00 NA NA NA
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.024e+02 NA NA NA
Relative_Divisiveness -3.265e+00 NA NA NA
Obama_General_Percent_12 -2.300e+01 NA NA NA
Percent_Over_65 -7.768e-02 NA NA NA
Percent_F 2.873e+00 NA NA NA
Percent_White 5.156e+00 NA NA NA
Percent_HS -5.097e-01 NA NA NA
Per_Capita_Income 2.831e-04 NA NA NA
Poverty_Rate 1.391e-02 NA NA NA
MRP_Ideology_Mean -2.569e+00 NA NA NA
Swing_State 5.075e-01 NA NA NA
Theta = 14696.9932
Number of iterations in BFGS optimization: 94
Log-likelihood: -596.5 on 23 Df
Obviously, all of those NA's are less then helpful to me. Any advice would be greatly appreciated! I'm pretty novice at R, StackOverflow, and Statistics, but trying to learn. I'm trying to provide everything needed for the minimal reproducible example, but I don't see anywhere to share my actual data... so if that's something you need in order to answer the question, let me know where I can put it!
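Not a confirmed answer, but a sketch of two things commonly tried when zeroinfl() converges to a point where the Hessian cannot be inverted (which is what NA standard errors plus a huge Theta suggest): rescale regressors that live on very different scales, and let the optimizer start from EM-estimated values. The rescaling assumes Per_Capita_Income is in dollars and dwarfs the other regressors.
library(pscl)
# Assumption: Per_Capita_Income is orders of magnitude larger than the
# other predictors; putting it in thousands helps the BFGS optimizer.
Unity_Data$Per_Capita_Income_k <- Unity_Data$Per_Capita_Income / 1000
m <- zeroinfl(Sanders_Adjacent_Clinton_Visit ~ Relative_Divisiveness +
                Obama_General_Percent_12 + Percent_Over_65 + Percent_F +
                Percent_White + Percent_HS + Per_Capita_Income_k +
                Poverty_Rate + MRP_Ideology_Mean + Swing_State,
              data = Unity_Data, dist = "negbin",
              control = zeroinfl.control(EM = TRUE))  # EM starting values
summary(m)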

Fitting a linear regression model in R with confounding variables

I have a dataset called datamoth where survival is the response variable and treatment is a variable that can be considered both categorical and quantitative. The dataset looks as follows:
survival <- c(17,22,26,20,11,14,37,26,24,11,11,16,8,5,12,3,5,4,14,8,4,6,3,3,10,13,5,7,3,3)
treatment <- c(3,3,3,3,3,3,6,6,6,6,6,6,9,9,9,9,9,9,12,12,12,12,12,12,21,21,21,21,21,21)
days <- c(3,3,3,3,3,3,6,6,6,6,6,6,9,9,9,9,9,9,12,12,12,12,12,12,21,21,21,21,21,21)
datamoth <- data.frame(survival, treatment, days)
So, I can fit a linear regression model considering treatment as categorical, like this:
lmod<-lm(survival ~ factor(treatment), datamoth)
My question is how to fit a linear regression model with treatment as a categorical variable while also adjusting for treatment (here, days) as a quantitative confounding variable.
I have figured out something like this:
model <- lm(survival ~ factor(treatment) + factor(treatment)*days, data = datamoth)
summary(model)
Call:
lm(formula = survival ~ factor(treatment) + factor(treatment) *
days, data = datamoth)
Residuals:
Min 1Q Median 3Q Max
-9.833 -3.333 -1.167 3.167 16.167
Coefficients: (5 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.333 2.435 7.530 6.96e-08 ***
factor(treatment)6 2.500 3.443 0.726 0.47454
factor(treatment)9 -12.167 3.443 -3.534 0.00162 **
factor(treatment)12 -12.000 3.443 -3.485 0.00183 **
factor(treatment)21 -11.500 3.443 -3.340 0.00263 **
days NA NA NA NA
factor(treatment)6:days NA NA NA NA
factor(treatment)9:days NA NA NA NA
factor(treatment)12:days NA NA NA NA
factor(treatment)21:days NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.964 on 25 degrees of freedom
Multiple R-squared: 0.5869, Adjusted R-squared: 0.5208
F-statistic: 8.879 on 4 and 25 DF, p-value: 0.0001324
But obviously this does not work, because the two variables are perfectly collinear: treatment and days are identical, so days and all of its interactions are dropped.
Does anyone know how to fix it? Any help will be appreciated.
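A quick check on the vectors defined above shows where the singularity comes from:
all(treatment == days)
# [1] TRUE
# days is element-for-element identical to treatment, so once
# factor(treatment) is in the model, days and every
# factor(treatment):days interaction carries no new information
# and lm() reports them as NA.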

Keep getting NaNs when carrying out TukeyHSD in R

I have this small data frame that I want to carry out a TukeyHSD test on.
'data.frame': 4 obs. of 4 variables:
$ Species : Factor w/ 4 levels "Anthoxanthum",..: 1 1 1 1
$ Harvest : Factor w/ 4 levels "b","c","d","e": 1 2 3 4
$ Total : num 0.2449 0.1248 0.0722 0.1025
I perform an analysis of variance with aov:
anthox1 <- aov(Total ~ Harvest, data=anthox)
anthox.tukey <- TukeyHSD(anthox1, "Harvest", conf.level = 0.95)
but when I run TukeyHSD() I get this warning:
Warning message:
In qtukey(conf.level, length(means), x$df.residual) : NaNs produced
Can anyone help me fix the problem and explain why this is happening? Everything seems correctly written (code and data), but for some reason it does not work.
Since you have exactly one observation per group, you get a perfect fit:
Total <- c(0.2449, 0.1248, 0.0722, 0.1025)
Harvest <- c("b","c","d","e")
anthox1 <- aov(Total ~ Harvest)
summary.lm(anthox1)
#Call:
# aov(formula = Total ~ Harvest)
#
#Residuals:
# ALL 4 residuals are 0: no residual degrees of freedom!
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.2449 NA NA NA
#Harvestc -0.1201 NA NA NA
#Harvestd -0.1727 NA NA NA
#Harveste -0.1424 NA NA NA
#
#Residual standard error: NaN on 0 degrees of freedom
#Multiple R-squared: 1, Adjusted R-squared: NaN
#F-statistic: NaN on 3 and 0 DF, p-value: NA
This means you have no residual degrees of freedom left, so there is nothing for a Tukey test (or any other inference) to work with.
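For contrast, a minimal sketch with an invented second replicate per group (the last four Total values below are made up): with two observations per level, aov() has residual degrees of freedom and TukeyHSD() runs without warnings.
Total <- c(0.2449, 0.1248, 0.0722, 0.1025,   # original values
           0.2210, 0.1330, 0.0815, 0.0990)   # invented second replicate
Harvest <- factor(rep(c("b", "c", "d", "e"), times = 2))
anthox1 <- aov(Total ~ Harvest)
TukeyHSD(anthox1, "Harvest", conf.level = 0.95)  # now has 4 residual df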

Regression summary in R returns a bunch of NAs

I am trying to run an uncomplicated regression in R and am receiving a long list of coefficient values, with NAs for the standard errors and t values. I've never experienced this before.
Result:
summary(model)
Call:
lm(formula = fed$SPX.Index ~ fed$Fed.Treasuries...MM., data = fed)
Residuals:
ALL 311 residuals are 0: no residual degrees of freedom!
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1258.84 NA NA NA
fed$Fed.Treasuries...MM. 1,016,102 0.94 NA NA NA
fed$Fed.Treasuries...MM. 1,030,985 17.72 NA NA NA
fed$Fed.Treasuries...MM. 1,062,061 27.12 NA NA NA
fed$Fed.Treasuries...MM. 917,451 -52.77 NA NA NA
fed$Fed.Treasuries...MM. 949,612 -30.56 NA NA NA
fed$Fed.Treasuries...MM. 967,553 -23.61 NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 310 and 0 DF, p-value: NA
head(fed)
X Fed.Treasuries...MM. Reserve.Repurchases Agency.Debt.Held Treasuries.Maturing.in.5.10.years SPX.Index
1 10/1/2008 476,621 93,063 14,500 93,362 1161.06
2 10/8/2008 476,579 77,349 14,105 93,353 984.94
3 10/15/2008 476,555 107,819 14,105 94,336 907.84
4 10/22/2008 476,512 95,987 14,105 94,327 896.78
5 10/29/2008 476,469 94,655 13,620 94,317 930.09
6 11/5/2008 476,456 96,663 13,235 94,312 952.77
You have commas in the numbers in your CSV file, so R reads those columns as character data (and lm() treats them as factors). Your model then has as many levels as rows, and so is degenerate.
Illustration. Take this CSV file:
1, "1,234", "2,345,565"
2, "2,345", "3,234,543"
3, "3,234", "3,987,766"
Read it in, then fit the first column (plain numbers) against the third column (comma-formatted numbers):
> fed = read.csv("commas.csv", header=FALSE)
> summary(lm(V1~V3, fed))
Call:
lm(formula = V1 ~ V3, data = fed)
Residuals:
ALL 3 residuals are 0: no residual degrees of freedom!
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1 NA NA NA
V3 3,234,543 1 NA NA NA
V3 3,987,766 2 NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 2 and 0 DF, p-value: NA
Note this is exactly what you are getting, just with different column names, so this is almost certainly what is happening.
Fix. Convert the column:
> fed$V3 = as.numeric(gsub(",","", fed$V3))
> summary(lm(V1~V3, fed))
Call:
lm(formula = V1 ~ V3, data = fed)
Residuals:
1 2 3
0.02522 -0.05499 0.02977
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.875e+00 1.890e-01 -9.922 0.0639 .
V3 1.215e-06 5.799e-08 20.952 0.0304 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.06742 on 1 degrees of freedom
Multiple R-squared: 0.9977, Adjusted R-squared: 0.9955
F-statistic: 439 on 1 and 1 DF, p-value: 0.03036
Repeat over columns as necessary.
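To clean several comma-formatted columns in one pass (a sketch; the column names are taken from the head(fed) output above, adjust as needed):
num_cols <- c("Fed.Treasuries...MM.", "Reserve.Repurchases",
              "Agency.Debt.Held", "Treasuries.Maturing.in.5.10.years")
# Strip the commas, then convert each affected column to numeric
fed[num_cols] <- lapply(fed[num_cols], function(x) as.numeric(gsub(",", "", x)))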
