R: Clustered standard errors in fractional probit model

I need to estimate a fractional model (the response takes values between 0 and 1) in R. I also want to cluster the standard errors. I have found several examples on SO and elsewhere, and I built this function based on my findings:
require(sandwich)
require(lmtest)
clrobustse <- function(fit, cl){
  M <- length(unique(cl))
  N <- length(cl)
  K <- fit$rank
  dfc <- (M/(M - 1))*((N - 1)/(N - K))
  uj <- apply(estfun(fit), 2, function(x) tapply(x, cl, sum))
  vcovCL <- dfc*sandwich(fit, meat = crossprod(uj)/N)
  coeftest(fit, vcovCL)
}
I estimate my model like this:
fit <- glm(dep ~ exp1 + exp2 + exp3, data = df, family = quasibinomial("probit"))
clrobustse(fit, df$cluster)
Everything works fine and I get the results. However, I suspect that something is not right as the non-clustered version:
coeftest(fit)
gives the exact same standard errors. I checked what Stata reports, and it displays different clustered standard errors. I suspect that I have misspecified the function clrobustse, but I just don't know how. Any ideas about what could be going wrong here?
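One way to cross-check a hand-rolled function like clrobustse is to compare it with the clustered covariance estimator that ships with sandwich. The sketch below assumes a sandwich version that provides vcovCL (2.4-0 or later) and uses simulated data; the data frame, seed, and effect sizes are made up for illustration and are not taken from the question.

```r
library(sandwich)
library(lmtest)

set.seed(42)
n_clusters  <- 40
per_cluster <- 25
cluster <- rep(seq_len(n_clusters), each = per_cluster)

# Regressor and shock vary only between clusters, so errors are
# strongly correlated within a cluster (simulated, hypothetical data)
exp1  <- rep(rnorm(n_clusters), each = per_cluster)
shock <- rep(rnorm(n_clusters, sd = 0.7), each = per_cluster)
dep   <- pnorm(0.5 * exp1 + shock + rnorm(n_clusters * per_cluster, sd = 0.5))
df    <- data.frame(dep, exp1, cluster)

fit <- glm(dep ~ exp1, data = df, family = quasibinomial("probit"))

# Cluster-robust covariance matrix, clustering on df$cluster
vc_cluster <- vcovCL(fit, cluster = df$cluster)

coeftest(fit)                      # model-based standard errors
coeftest(fit, vcov. = vc_cluster)  # clustered standard errors
```

With data like these the two tables should differ noticeably, so a function that returns identical standard errors for both is worth checking against this output.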

Related

stargazer: odds ratio for hazards model with wrong significance stars

I have been using the following function to output logistic regression tables with stargazer as odds ratio with the right significance stars and standard errors:
stargazer2 <- function(model, odd.ratio = F, ...) {
  if(!("list" %in% class(model))) model <- list(model)
  if (odd.ratio) {
    coefOR2 <- lapply(model, function(x) exp(coef(x)))
    seOR2 <- lapply(model, function(x) exp(coef(x)) * summary(x)$coef[, 2])
    p2 <- lapply(model, function(x) summary(x)$coefficients[, 4])
    stargazer(model, coef = coefOR2, se = seOR2, p = p2, ...)
  } else {
    stargazer(model, ...)
  }
}
However, this no longer works when I'm using a hazards model analysis, and the significance stars move around in the output. Reproducible example below:
library(survival)
library(stargazer)
data("diabetic")
juvenile <- 1*(diabetic$age < 20)
fit <- coxph(Surv(time, status) ~ trt + juvenile, cluster = id,
             data = diabetic)
stargazer2(fit, odd.ratio = T, type = "text")
stargazer2(fit, odd.ratio = F, type = "text")
The citation for that code brings up a blocked webpage. I don't think this line is correct:
seOR2 <- lapply(model, function(x) exp(coef(x)) * summary(x)$coef[, 2])
The reported effect measures are different. The "odd.ratio = T" version reports the exponentiated coefficient estimates, while the "odd.ratio = F" version displays the unexponentiated ones. Quite frankly, the code you have copied is highly suspect, both for the line that calculates seOR2 and for its use of the sloppy and dangerous shortcuts T and F for logical values. The p-values from the ordinary stargazer call are correct, while those from the "new improved" version are not. You might want to contact the author of that code to advise them to check their statistical logic.
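A likely reason the stars move is that stargazer2 picks columns by position. In a glm summary, columns 2 and 4 of the coefficient table are the standard error and the p-value, but the coefficient table of a coxph summary is laid out differently: column 2 is exp(coef), and with cluster = id there is an extra "robust se" column. A minimal sketch that selects columns by name instead, using only survival; the names hr, se_hr, and p are illustrative:

```r
library(survival)

d <- survival::diabetic
d$juvenile <- as.numeric(d$age < 20)
fit <- coxph(Surv(time, status) ~ trt + juvenile, cluster = id, data = d)

tab <- summary(fit)$coefficients
colnames(tab)  # inspect the layout before indexing into it

# Use the robust standard error when the model was fitted with clusters
se_col <- if ("robust se" %in% colnames(tab)) "robust se" else "se(coef)"

hr    <- exp(coef(fit))       # hazard ratios
se_hr <- hr * tab[, se_col]   # delta-method standard error of exp(beta)
p     <- tab[, "Pr(>|z|)"]    # p-values are unchanged by exponentiation
```

These vectors could then be handed to stargazer as coef = list(hr), se = list(se_hr), p = list(p), in the same way stargazer2 passes its lists.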

rjags model negative binomial likelihood and gamma prior

I read in my data. I make the model string. I hand it JAGS. I get "Error in node y[1] - Node inconsistent with parents".
Y=read.table("data.txt",header=T)
Y=Y$Y
model_string <- "model{
  # Likelihood:
  for( i in 1 : N ) {
    y[i] ~ dnegbin( l , r )
  }
  # Prior:
  r ~ dgamma(1,1)
  l ~ dgamma(.1,.1)
}"
model <- jags.model(textConnection(model_string),
                    data = list(y=Y,N=200))
First of all, I have no clue if my model is right. I cannot find even basic documentation for JAGS. I'm actually ashamed to admit it, because this should be as simple as an internet search, but I cannot find any document to tell me 1) how a JAGS model is set up or 2) what kinds of functions/distributions/parameters are available in JAGS. I only got this far because I found someone doing a similar model. If anyone knows of a JAGS wiki or documentation, that would be great.
Edit: If someone could even just tell me what the parameters for dnegbin are that would be a huge help. When I plug in random numbers for l and r in dnegbin(l,r) it 'works' as in it draws numbers for l and r, but I have no clue if it means anything.
You can find some info about dnegbin in the JAGS user manual.
The first parameter of dnegbin must be between 0 and 1. You can assign it, e.g., a uniform prior:
library(rjags)
model_string <- "model{
  # Likelihood:
  for( i in 1 : N ) {
    y[i] ~ dnegbin( l , r )
  }
  # Prior:
  r ~ dgamma(1,1)
  l ~ dunif(0,1)
}"
y <- rpois(200, 10)
model <- jags.model(textConnection(model_string),
                    data = list(y=y, N=length(y)))
You also have to be sure that the values of y are non-negative integers.
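For orientation, as far as I can tell from the JAGS user manual, dnegbin(p, r) uses the same parameterization as R's dnbinom(x, size = r, prob = p): p is a success probability and r a size parameter, so the mean is r(1 - p)/p. A quick base-R check of that relationship, with arbitrary illustrative values for p and r:

```r
p <- 0.25   # success probability, must lie in (0, 1]
r <- 3      # size parameter, must be positive

set.seed(1)
draws <- rnbinom(1e5, size = r, prob = p)

sim_mean    <- mean(draws)        # simulated mean
theory_mean <- r * (1 - p) / p    # theoretical mean, 9 for these values

# A "probability" above 1, which a gamma prior can easily produce,
# does not give a valid density
bad <- suppressWarnings(dnbinom(2, size = r, prob = 1.5))
is.nan(bad)
```

This is why a dgamma prior on the first parameter can trigger "Node inconsistent with parents": the prior can propose values outside the unit interval.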

R: Matrix multiplication error - related to GLM

I've been trying to build some custom code for logistic regression (I cannot use glm() for this purpose; happy to explain why).
Below is the initial R code to provide the data set I'm working with:
## Load the datasets
library(titanic)  # assumed source of titanic_train / titanic_test
library(dplyr)
data("titanic_train")
data("titanic_test")
## Combining Training and Testing dataset
## (titanic_test has no Survived column, so add a placeholder before rbind)
titanic_test$Survived <- 2
complete_data <- rbind(titanic_train, titanic_test)
complete_data$Embarked[complete_data$Embarked == ""] <- "S"
complete_data$Age[is.na(complete_data$Age)] <- median(complete_data$Age, na.rm = TRUE)
complete_data <- as.data.frame(complete_data)
titanic_data <- select(complete_data, -c(Cabin, PassengerId, Ticket, Name))
titanic_data <- titanic_data[!titanic_data$Survived == "2", ]
titanic_model <- model <- glm(Survived ~ ., family = binomial(link = 'logit'), data = titanic_data)
y <- titanic_data$Survived
x <- as.data.frame(cbind(rep(1, dim(titanic_data)[1]), titanic_data[, -2]))
x <- as.matrix(as.numeric(x))
beta <- as.numeric(rep(0, dim(x)[2]))
beta <- as.matrix(beta)
The issue I'm having here is I would like to compute the matrix product of beta (a px1 matrix) and x (a n x p matrix)
I have tried the following -
beta * x
x %*% beta
However, the above give the following errors:
Error in FUN(left, right) : non-numeric argument to binary operator
Error in x %*% beta : requires numeric/complex matrix/vector arguments
I'd imagine this is due to the fact I've got non-numeric fields in the data matrix x.
As a bit of a background, calculating the linear predictor will allow me to progress with my custom code for fitting a Logistic regression model.
I would appreciate some help - thank you!
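One way to get a purely numeric design matrix is model.matrix(), which expands factor and character columns into dummy variables and adds the intercept column. A minimal sketch on a small made-up data frame (the toy columns only mimic the Titanic variables; they are not the real data):

```r
# Hypothetical toy data standing in for titanic_data
toy <- data.frame(
  Survived = c(0, 1, 1, 0, 1, 0),
  Pclass   = c(3, 1, 2, 3, 1, 2),
  Sex      = c("male", "female", "female", "male", "female", "male"),
  Age      = c(22, 38, 26, 35, 54, 2),
  stringsAsFactors = TRUE
)

y <- toy$Survived
x <- model.matrix(Survived ~ ., data = toy)  # n x p numeric matrix with intercept

beta <- matrix(0, nrow = ncol(x), ncol = 1)  # p x 1 starting values

eta <- x %*% beta   # linear predictor, n x 1
mu  <- plogis(eta)  # fitted probabilities under the logit link
```

Note that `x %*% beta` is the matrix product; `beta * x` is elementwise multiplication and is not what the linear predictor needs.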

How to extract p-values from lmekin objects in coxme package

I want to be able to view the p-values for the lmekin object produced by the coxme package.
eg.
model= lmekin(formula = height ~ score + sex + age + (1 | IID), data = phenotype_df, varlist = kinship_matrix)
I tried:
summary(model)
coef(summary(model))
summary(model$coefficient$fixed)
fixef(model) / sqrt(diag(vcov(model))) # (calculates Z-scores but not p-values)
But these did not work. So how do I view the p-values for this linear mixed model?
It took me ages of searching to figure this out, but I noticed a lot of other similar questions without proper answers, so I wanted to answer this.
You use:
library(coxme)
print(model)
Note it is important to load the coxme package beforehand or it will not work.
I've also noticed a lot of posts about how to extract the p-value from lmekin objects, or how to extract the p-value from coxme objects in general. I wrote this function, which is based on the coxme:::print.coxme function code (to view code type coxme:::print.coxme directly into R). print calculates p-values on the fly - this function allows the extraction of p-values and saves them to an object.
Note that mod is your model, eg. mod <- lmekin(y~x+a+b)
Use print(mod) to double check that the tables match.
extract_coxme_table <- function (mod){
  beta <- mod$coefficients$fixed
  nvar <- length(beta)
  nfrail <- nrow(mod$var) - nvar
  se <- sqrt(diag(mod$var)[nfrail + 1:nvar])
  z <- round(beta/se, 2)
  p <- signif(1 - pchisq((beta/se)^2, 1), 2)
  table <- data.frame(cbind(beta, se, z, p))
  return(table)
}
I arrived at this topic because I was looking for the same thing for a plain coxme object. IcedCoffee's function works with a small adjustment:
extract_coxme_table <- function (mod){
  beta <- mod$coefficients #$fixed is not needed
  nvar <- length(beta)
  nfrail <- nrow(mod$var) - nvar
  se <- sqrt(diag(mod$var)[nfrail + 1:nvar])
  z <- round(beta/se, 2)
  p <- signif(1 - pchisq((beta/se)^2, 1), 2)
  table <- data.frame(cbind(beta, se, z, p))
  return(table)
}
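For what it's worth, the p-value line in both versions is a Wald test: comparing the squared z statistic with a chi-squared distribution on one degree of freedom gives the same number as the two-sided normal tail probability. A small base-R check with made-up estimates and standard errors (not output from any real model):

```r
# Hypothetical fixed-effect estimates and standard errors
beta <- c(score = 0.80, sex = -0.15, age = 0.02)
se   <- c(score = 0.25, sex =  0.10, age = 0.02)

z <- beta / se

p_chisq  <- pchisq(z^2, df = 1, lower.tail = FALSE)  # chi-squared form
p_normal <- 2 * pnorm(-abs(z))                       # two-sided normal form

all.equal(p_chisq, p_normal)
```

Using lower.tail = FALSE instead of 1 - pchisq(...) avoids losing precision when the p-value is very small.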

R Harmonic Prediction Failing - newdata structure

I am forecasting a time series using harmonic regression created as such:
(Packages used: tseries, forecast, TSA, plyr)
airp <- AirPassengers
TIME <- 1:length(airp)
SIN <- COS <- matrix(nrow = length(TIME), ncol = 6,0)
for (i in 1:6){
  SIN[,i] <- sin(2*pi*i*TIME/12)
  COS[,i] <- cos(2*pi*i*TIME/12)
}
SIN <- SIN[,-6]
decomp.seasonal <- decompose(airp)$seasonal
seasonalfit <- lm(airp ~ SIN + COS)
The fitting works just fine. The problem occurs when forecasting.
TIME.NEW <- seq(length(TIME)+1, length(TIME)+12, by=1)
SINNEW <- COSNEW <- matrix(nrow=length(TIME.NEW), ncol = 6, 0)
for (i in 1:6) {
  SINNEW[,i] <- sin(2*pi*i*TIME.NEW/12)
  COSNEW[,i] <- cos(2*pi*i*TIME.NEW/12)
}
SINNEW <- SINNEW[,-6]
prediction.harmonic.dataframe <- data.frame(TIME = TIME.NEW, SIN = SINNEW, COS = COSNEW)
seasonal.predictions <- predict(seasonalfit, newdata = prediction.harmonic.dataframe)
This causes the warning:
Warning message:
'newdata' had 12 rows but variables found have 144 rows
I went through and found that the names were SIN.1, SIN.2, et cetera, instead of SIN1 and SIN2... So I manually changed those and it still didn't work. I also manually removed the SIN.6 because it, for some reason, was still there.
Help?
Edit: I have gone through the similar posts as well, and the answers in those questions did not fix my problem.
Trying to predict with a data.frame after fitting an lm model with variables not inside a data.frame (especially matrices) is not fun. It's better if you always fit your model from data in a data.frame.
For example if you did
seasonalfit <- lm(airp ~ ., data.frame(airp=airp,SIN=SIN,COS=COS))
Then your predict would work.
Alternatively you can try to cram matrices into data.frames but this is generally a bad idea. You would do
prediction.harmonic.dataframe <- data.frame(TIME = TIME.NEW,
SIN = I(SINNEW), COS = I(COSNEW))
The I() (or AsIs function) will keep them as matrices.
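Putting the first suggestion together end to end, here is a minimal sketch in base R. The helper make_harmonics and its column names are my own, for illustration; the point is only that the training and forecast data frames are built by the same code, so their column names match.

```r
# Build sine/cosine regressors as ordinary named data.frame columns
# (make_harmonics is a hypothetical helper, not from any package)
make_harmonics <- function(time, period = 12) {
  sin_terms <- sapply(1:5, function(k) sin(2 * pi * k * time / period))
  cos_terms <- sapply(1:6, function(k) cos(2 * pi * k * time / period))
  colnames(sin_terms) <- paste0("SIN", 1:5)
  colnames(cos_terms) <- paste0("COS", 1:6)
  data.frame(sin_terms, cos_terms)
}

n <- length(AirPassengers)

train <- make_harmonics(1:n)
train$airp <- as.numeric(AirPassengers)
seasonalfit <- lm(airp ~ ., data = train)

# Forecast frame for the next 12 months, built the same way
new_data <- make_harmonics((n + 1):(n + 12))
seasonal.predictions <- predict(seasonalfit, newdata = new_data)
length(seasonal.predictions)
```

Because predict() now finds every regressor by name in newdata, it no longer falls back to the 144-row objects in the workspace.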
