"Non conformable arguments" error with pgmm (plm library) - r

I am unsuccessfully trying to do the Arellano and Bond (1991) estimation using pgmm from the plm package. To see whether the problem was in my data, I instead used the data supplied in the plm library, but got the same problem when calling summary:
Error in t(y) %*% x : non-conformable arguments
The coefficients of the model can be obtained though.
My own data has T=3, N=290. As I understand it, T=3 is the minimum, but it should be sufficient. When using the Arellano and Bond data, I get the same error with T=4.
data("EmplUK", package = "plm")
library(sqldf)
UK<-sqldf("select * from EmplUK where year in ('1982','1981',
'1980','1979')")
z1 <- pgmm(log(emp) ~ lag(log(emp), 1) + log(wage) +
log(capital) + log(output) | lag(log(emp), 2),
data = UK, effect = "twoways", model = "twosteps")
summary(z1)
The way I understand the estimation method and the R formula, the left-hand term is the difference of the dependent variable, and the first right-hand term is its lagged difference, which is instrumented by the level of the dependent variable at (t-2).
I have verified that the subset I use is a balanced panel with T=4. When I include more years, everything works, so it must be the length of the panel that causes trouble.
Any help would be much appreciated.

A similar question is asked here. It is suggested that the error has to do with mtest, a serial correlation test performed by the pgmm summary method. Running the function separately seems to confirm this:
mtest(z1, order = 2)
Error in t(y) %*% x : non-conformable arguments
T=3 is enough to estimate the model, but it only leaves you with an estimate for the last period. A second-order mtest requires the residuals to span at least 3 periods, i.e. T=5 for your model.
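For instance, keeping the same model but widening the sub-panel to five years (a sketch using base subset instead of sqldf; the exact year range is just an illustration) gives mtest enough residual periods:
library(plm)
data("EmplUK", package = "plm")
UK5 <- subset(EmplUK, year %in% 1978:1982)  # T = 5
z2 <- pgmm(log(emp) ~ lag(log(emp), 1) + log(wage) +
           log(capital) + log(output) | lag(log(emp), 2),
           data = UK5, effect = "twoways", model = "twosteps")
mtest(z2, order = 2)  # now conformable; summary(z2) also runs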

Related

lmer failing with na.pass

When I run a lmer model with lme4 using na.pass as the na.action, I get the following error:
R: NA/NaN/Inf in foreign function call (arg 1)
I run the model like this:
model1 <- lme4::lmer(agg_dv_singing ~ GMS.Musical.Training +
JAJ.ability + MDT.ability + MPT.ability + PDCT.ability +
PIAT.ability + agg_dv_long_note + demographics.age +
aggiv_entropy + aggiv_interval_complexity +
aggiv_rhythmic_complexity + aggiv_tonal_complexity +
log.freq + length + (1|p_id),
data = dat, na.action = na.pass)
summary(dat) indicates that there are no Inf or NaN values, although yes, there are many NA values.
Running na.pass outside of lmer on the same data set does not give an error:
na.pass(dat)
So what could be going wrong within lmer?
Comments to a previous question of yours attempted to explain that, in general, mixed-model machinery cannot handle estimation when there are missing values in the predictors; it just doesn't work that way. If you want to fit mixed models with missing data, you need to do some form of imputation, i.e. filling in values for the missing predictors (see e.g. the mice package, which is more or less the state of the art, at least as far as the R ecosystem is concerned). Here is what the four standard na.* actions do in the context of mixed models:
na.fail(): fail immediately if there are missing values in the data (predictors or response). This is frustrating, but alerts you immediately to the fact that you have missing data, and lets you decide what to do about it.
na.omit(): drop non-complete cases from the data before fitting.
na.exclude(): like na.omit(), but keep track of the locations of the excluded cases. When using predict() or residuals() (or any function that produces results per observation), reconstitute a complete data set with NA values for the non-complete cases in the original data set. (I usually find this setting to be the most useful default.)
na.pass(): do not remove NA values, but attempt to continue with the fitting procedure. As you found out, this usually doesn't work at all! It will just pass the NA values down through the code until something goes wrong. Typically one of two things happens at this point:
if the entire estimation procedure is written using R functions that can handle and propagate missing values, then you'll usually get a fitted model object with NA/NaN for all coefficients, likelihoods, etc. etc. (because the missing values contaminate the entire fitting procedure);
if some step of the estimation procedure can't handle NA/NaN values (as in this case), you get an inscrutable error from the first point in the procedure that fails.
If you look at the source code of na.pass() (by typing na.pass at the R prompt), you'll see that all it does is return its argument unchanged. To be honest, I'm not really sure why na.pass even exists, except for completeness ... (or compatibility with S)
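A quick way to see the difference between these actions at the prompt, on toy data (just for illustration):
df <- data.frame(x = c(1, NA, 3), y = c(2, 4, 6))
na.fail(df)     # Error: missing values in object
na.omit(df)     # drops row 2
na.exclude(df)  # drops row 2, but records its position for predict()/residuals()
na.pass(df)     # returns df unchanged, NA and all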
Your NA value was not in a variable that is used in a random-effects term; if it had been, you would have gotten a more interpretable error message:
library(lme4)
ss <- sleepstudy
ss[1,"Days"] <- NA
lmer(Reaction ~ Days + (Days|Subject), ss, na.action=na.pass)
Error in lme4::lFormula(formula = Reaction ~ Days + (Days | Subject), :
NA in Z (random-effects model matrix): please use "na.action='na.omit'" or "na.action='na.exclude'"
If I fit a model with (1|Subject), so that the NA value only affects the fixed effects
lmer(Reaction ~ Days + (1|Subject), ss, na.action=na.pass)
then we get your error message:
Error in qr.default(X, tol = tol, LAPACK = FALSE) :
NA/NaN/Inf in foreign function call (arg 1)
traceback() tells me that this happens in the internal chkRank.drop.cols() function, where R is trying to figure out if any of your fixed-effect columns are collinear. There should probably be a check for missing values there ...
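In the meantime, the practical fix is one of the NA-dropping actions; continuing the sleepstudy example above:
lmer(Reaction ~ Days + (1|Subject), ss, na.action = na.exclude)  # fits on the 179 complete rows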

How to apply time transformation to a factor variable in cph( ) function

Hi, I'm currently trying to fit a Cox regression model, but when I run the cox.zph() function, my p-value is consistently smaller than 0.05, with the Delay variable being the major issue. It is a binary factor variable (0 or 1) describing whether a patient had early (0) or delayed (1) treatment. From what I read on how to deal with this issue, I tried to apply the tt() function to the Delay variable as such:
cox.death <- cph(Surv(diff, event)~tt(Delay)+rcs(age,3)+female+rcs(hba1c,3)+rcs(bmi,3)+rcs(egfr,3)+rcs(sbp,3),
data=cox.model, x = T, y = T)
But then I get this error message:
Error in tt(Delay) : could not find function "tt"
I'm not quite familiar with the rms package but is there any solution to this problem? Thanks!
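For what it's worth: tt() is not a standalone function you can call; it is interpreted only by survival::coxph via its tt argument, and rms::cph does not support it, which is why R reports that the function cannot be found. A minimal sketch of the coxph route (the log-time transform and the simplified covariate list are assumptions for illustration; the rcs() splines would need an equivalent such as survival::pspline()):
library(survival)
fit <- coxph(Surv(diff, event) ~ Delay + tt(Delay) + age + female + hba1c + bmi + egfr + sbp,
             data = cox.model,
             # time-varying effect of Delay; log(t) is chosen purely as an example
             tt = function(x, t, ...) as.numeric(as.character(x)) * log(t))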

Heteroskedasticity random effects regression with the plm package (correction & how to report)

I have already checked a couple of topics and found some help regarding heteroskedasticity in panel regressions, but unfortunately some questions have remained unanswered.
Consider the following example (some repeated measures, data already in long format):
Panelregr <- plm(V1 ~ V2 + V3 + V4, data = XY, model = "random")
Then I checked for heteroskedasticity:
B.P.Test <- bptest(V1 ~ V2 + V3 + V4, data = XY, studentize = F)
The test was highly significant --> heteroskedasticity
Then I read (link: https://www.princeton.edu/~otorres/Panel101R.pdf) about using a robust covariance matrix to account for heteroskedasticity. For the example above I used the code
coeftest(Panelregr, vcovHC)
summary(Panelregr, vcov = vcovHC)
and got the results. But I could also use
coeftest(Panelregr, vcovHC(Panelregr, type = "HC3"))
or the other types HC0 - HC4
Now some questions came up:
1) Which of these five estimator types do I get when I use coeftest(Panelregr, vcovHC) without specifying a particular HC type? Is it HC0?
2) How do I know which HC... type fits my data? (I read some information, for example https://cran.r-project.org/web/packages/sandwich/vignettes/sandwich.pdf, page 4, but I'm still not sure how to decide.)
3) How do I describe the results when using one of these estimators? Example: "In order to account for heteroskedasticity, a robust covariance matrix was used. In detail, we used the HC... estimator as ... In the following table, the results of the HC... estimator are shown."
4) When I correct for heteroskedasticity, the results don't include values like R-squared. Is it correct to report the corrected values (e.g. coeftest(Panelregr, vcovHC)) and to report values like R-squared from the "original" panel regression (Panelregr <- plm(V1 ~ V2 + V3 + V4, data = XY, model = "random"))?
1) The default one (see ?vcovHC); for plm::vcovHC that is HC0, as it is the first value listed for the type argument.
3) HC0, HC1, ... are scaling factors for the variance-covariance matrix; it is good to mention that. You also want to mention the estimator itself, i.e. what is given by the method argument. A typical choice is the estimator by Arellano (1987), which is the default for plm::vcovHC.
4) The R^2 is not affected by using a heteroskedasticity-consistent variance-covariance matrix; however, the F-statistic is. summary(Panelregr, vcov = vcovHC) gives you what you need.
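For instance, a sketch making both choices explicit ("arellano" is in fact plm::vcovHC's default method; HC3 scaling is shown purely for illustration):
library(lmtest)
coeftest(Panelregr, vcovHC(Panelregr, method = "arellano", type = "HC3"))
summary(Panelregr, vcov = function(x) vcovHC(x, method = "arellano", type = "HC3"))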

Standard error in glm output

I am using R's glm to model Poisson data binned by year, so I have x[i] counts with T[i] exposure in each year i. The glm output with poisson family and log link produces model coefficients a, b for y = a + bx.
What I need is the standard error of (a + bx), not the standard error of a or the standard error of b. The documentation describing the solution I am trying to implement says this should be calculated by the software, because it is not straightforward to calculate from the parameters a and b. Perhaps SAS does the calculation, but I am not recognizing it in R.
I am working through section 7.2.4.5 of the Handbook of Parameter Estimation (NUREG/CR-6823, a public document) and looking at eq. 7.2. I am also not a statistician, so I am finding this very hard to follow.
The game here is to find the 90 percent simultaneous confidence interval on the model output, not the confidence interval at each year i.
Adding this here so I can show some code. The first answer below appears to get me pretty close. A statistician here put together the following function to construct the confidence bounds; it appears to work.
# trend line simultaneous confidence intervals
# according to HOPE 7.2.4.5
# (model, data and n were taken from the global environment in the original;
# they are made explicit arguments here)
HOPE <- function(model, data, n) {
  t   <- data$T                                   # exposure in each year
  mle <- predict(model, newdata = data.frame(x = data$x), type = "response")
  se  <- predict(model, newdata = data.frame(x = data$x),
                 type = "link", se.fit = TRUE)$se.fit
  chi <- qchisq(0.90, df = n - 1)                 # simultaneous 90% multiplier
  # note: mle is on the response scale while se is on the link scale
  upper <- (mle + chi * se) / t
  lower <- (mle - chi * se) / t
  data.frame(mle, t, upper, lower)
}
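A usage sketch under the same naming assumptions (the count column and formula are hypothetical; the data frame must carry columns x and T as above):
model  <- glm(count ~ x, offset = log(T), family = poisson, data = data)
bounds <- HOPE(model, data, n = nrow(data))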
I think you need to provide the argument se.fit=TRUE when you create the prediction from the model:
hotmod<-glm(...)
predz<-predict(hotmod, ..., se.fit=TRUE)
Then you should be able to find the estimated standard errors using:
predz$se.fit
Now if you want to do it by hand in R, it should not be as hard as you suggest:
covmat <- vcov(hotmod)   # variance-covariance matrix of the estimates (a, b)
coeffs <- coef(hotmod)   # the estimates a and b themselves
Then the standard error of the linear predictor a + bx at a given x is:
x0 <- c(1, x)  # design vector (intercept, x) at the point of interest
sqrt(t(x0) %*% covmat %*% x0)
The %*% operator performs matrix multiplication in R.
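As a quick check (a sketch, assuming a single predictor named x as above), the hand calculation agrees with predict's link-scale standard error at any chosen point, say x = 5:
x0 <- c(1, 5)  # design vector (intercept, x) at x = 5
sqrt(t(x0) %*% covmat %*% x0)
predict(hotmod, newdata = data.frame(x = 5), type = "link", se.fit = TRUE)$se.fit  # same value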

Error message: Error in fn(x, ...) : Downdated VtV is not positive definite

I'm trying to use the lmer function to create a minimum adequate model. My model is Mated ~ Size * Attempts * Status + (random factor).
as.logical(Mated)
as.numeric(Size)
as.factor(Attempts)
as.factor(Status)
(These have all worked on previous models)
So after all that I try running my model:
Model1<-lmer(Mated ~ Size*Status*Attempts + (1|FemaleID),data=mydata)
And it runs without fault. It's only when I try to apply this update that it goes wrong:
Model2<-update(Model1, REML=FALSE)
Here is the error message supplied:
Error in fn(x, ...) : Downdated VtV is not positive definite
If I make a third model without the interaction and do an ANOVA between that and model one, then it says the two are significantly different.
Model3 <- update(Model1, ~ . - Size:Status:Attempts)
anova(Model1,Model3)
What am I doing wrong? Is the three way interaction really significant or have I made some mistake?
Thank you
If Mated is binary, then you should probably be using glmer with a logit or probit link function instead, something like:
model <- glmer(Mated ~ Size * Status * Attempts + (1|FemaleID),
data = mydata, family = binomial)
It would help if you could let us know what your data looks like (head(mydata) might be fine, or see here for how to make a reproducible example).
Also, I would avoid making Mated logical (see this question and answer for how it can make your life more difficult). Instead, as.factor(Mated) will explicitly make your response variable discrete.
After that, you can compare your full and reduced models with anova().
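In code, that comparison might look like this (a sketch following the model above; update() drops just the three-way interaction):
model_full    <- glmer(Mated ~ Size * Status * Attempts + (1|FemaleID),
                       data = mydata, family = binomial)
model_reduced <- update(model_full, . ~ . - Size:Status:Attempts)
anova(model_full, model_reduced)  # likelihood-ratio test of the interaction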
