I want to analyze when the claims of a protest are directed at the state, based on action- and country-level characteristics, using glmer. I would therefore like to obtain p-values for both the fixed and the random effects. My model looks like this:
targets <- glmer(state ~ ENV + HLH + HRI + LAB + SMO + Capital +
(1 + rile + parties + rep + rep2 + gdppc + election| Country),
data = df, family = binomial)
The output only gives me the Variance & Std.Dev. of the random effects, as well as the correlations among them, which makes sense for most multilevel analyses but not for my purposes. Is there any way I can get something like the estimates and the p-values for the random effects?
If this cannot be done with R, is there any other statistical software that would give such an output?
UPDATE: Following the suggestions here, I have moved this question to Cross Validated: https://stats.stackexchange.com/questions/381208/r-how-to-get-estimates-and-p-values-for-random-effects-in-glmer
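library(lme4)
library(lattice)
# Proportion of incident cases per period, one panel per herd,
# panels ordered by their maximum proportion
xyplot(incidence/size ~ period|herd, cbpp, type=c('g','p','l'),
layout=c(3,5), index.cond = function(x,y)max(y))
# Binomial GLMM on the cbpp data (shipped with lme4) with a random intercept for each herd
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial)
summary(gm1)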
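A sketch of two common workarounds, using the cbpp example above (it assumes lme4 is installed). ranef() returns the conditional modes of the herd effects, which is the closest thing to per-group "estimates", and a likelihood-ratio test against a model without the random effect gives a p-value for the variance component. Halving the chi-square p-value is a commonly used correction because the null value (variance = 0) lies on the boundary; treat the result as approximate rather than exact.

```r
library(lme4)

# Random-intercept model from the example above
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)

# Same fixed effects, without the herd random effect
gm0 <- glm(cbind(incidence, size - incidence) ~ period,
           data = cbpp, family = binomial)

# Conditional modes of the herd intercepts; their conditional variances
# are stored in the "postVar" attribute (a 1 x 1 x n_herds array here)
re <- ranef(gm1, condVar = TRUE)
herd_modes <- re$herd[, "(Intercept)"]
herd_sd <- sqrt(attr(re$herd, "postVar")[1, 1, ])

# Likelihood-ratio statistic for the herd variance component
lrt <- as.numeric(2 * (logLik(gm1) - logLik(gm0)))

# Naive chi-square(1) p-value is conservative on the boundary; halve it
p_naive <- pchisq(lrt, df = 1, lower.tail = FALSE)
p_boundary <- p_naive / 2

data.frame(herd = rownames(re$herd), mode = herd_modes, cond_sd = herd_sd)
c(LRT = lrt, p = p_boundary)
```

For the country model in the question, the same idea applies to a single random slope: fit the model with and without that slope and compare the two glmer fits with anova(), keeping in mind that dropping a slope also drops its correlations, so the degrees of freedom of the test change accordingly.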
I would like to calculate marginal effects for this logistic model with clustered standard errors which I computed with miceadds::glm.cluster.
fullmodel3 <- miceadds::glm.cluster(data = SDdataset17,
formula = stigmatisation_dummy_num ~ gender + age +
agesquared + education_new + publicsector +
retired + socialisation_sd + selfplacement_num +
years_membership + voteshare,
cluster = "voteshare", family = "binomial")
Given that I am not using glm(), most functions I have seen around do not work.
Any suggestions?
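One option is to compute average marginal effects (AMEs) by hand and obtain their standard errors with the delta method from the cluster-robust covariance matrix. The sketch below uses simulated data with a hypothetical cluster id `cl`, builds a CR0 cluster-robust covariance for a logit fit by hand, and returns the AME of `age`. If, as the miceadds documentation describes, the glm.cluster result stores the underlying glm in `$glm_res` and the clustered covariance in `$vcov`, those could stand in for `fit` and `V` below; check with str() on your own object before relying on that.

```r
set.seed(1)
n <- 500
# Hypothetical simulated data; cl is an illustrative cluster id
d <- data.frame(age = rnorm(n, 50, 10),
                gender = rbinom(n, 1, 0.5),
                cl = sample(1:25, n, replace = TRUE))
d$y <- rbinom(n, 1, plogis(-2 + 0.04 * d$age + 0.5 * d$gender))

fit <- glm(y ~ age + gender, data = d, family = binomial)
X <- model.matrix(fit)

# CR0 cluster-robust covariance: bread %*% meat %*% bread
scores <- X * (fit$y - fitted(fit))      # logit score contributions x_i (y_i - p_i)
meat <- crossprod(rowsum(scores, d$cl))  # sum scores within clusters, then cross-product
bread <- vcov(fit)                       # (X'WX)^{-1} for a binomial glm
V <- bread %*% meat %*% bread

# AME of a continuous regressor in a logit: mean of beta_age * p_i * (1 - p_i)
ame_age <- function(beta) {
  p <- plogis(drop(X %*% beta))
  mean(beta[["age"]] * p * (1 - p))
}

# Central-difference gradient of the AME with respect to the coefficients
num_grad <- function(f, beta, eps = 1e-6) {
  vapply(seq_along(beta), function(j) {
    h <- replace(numeric(length(beta)), j, eps)
    (f(beta + h) - f(beta - h)) / (2 * eps)
  }, numeric(1))
}

b <- coef(fit)
ame <- ame_age(b)
g <- num_grad(ame_age, b)
ame_se <- sqrt(drop(t(g) %*% V %*% g))   # delta-method standard error
c(AME = ame, SE = ame_se)
```

For a dummy such as gender, a discrete change (average predicted probability with the dummy set to 1 minus the same with it set to 0) is usually preferred to the derivative; the same delta-method step applies to that function of the coefficients.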
I am running a gam model on a large dataset with many variables. My response variable is the level of "recruitment" by a herd every fall/autumn, calculated as the fawn:female ratio every fall/autumn over a 60-year period.
My problem is that in many years and study sites only 1 to 10 females are recorded, which means the ratio is not robust. For example, if one female and one fawn are seen, recruitment is 100%, but if one more female is seen, it drops to 50%!
I need to tell the model that years/study sites with smaller sample sizes should be weighted less than those with larger sample sizes, as these small samples are no doubt affecting the results.
Above is a table of the females observed every year and a histogram of the same.
My model is as follows:
gamFIN <- gam(Fw.FratioFall
~ s(year)
+ s(percentage_woody_coverage)
+ s(kmRoads.km2)
+ s(WELLS_ACTIVEinsideD)
+ s(d3)
+ s(WT_DEER_springsurveys)
+ s(BadlandsCoyote.1000_mi)
+ s(Average_mintemp_winter, BadlandsCoyote.1000_mi)
+ s(BadlandsCoyote.1000_mi, WELLS_ACTIVEinsideD)
+ s(BadlandsCoyote.1000_mi, d3)
+ s(YEAR, bs = "re") + s(StudyArea, bs = "re"), method = "REML", select = T, data = mydata)
How might I tell the model to weight my response variable by the sample sizes it is based on?
Do not model the ratio as your outcome. Instead, model the fawn counts as the outcome and include the female counts via an offset() term, using logged values, on the RHS of the formula. You should offset with the log of the female count, so the formula would look like this:
Fawns
~ s(year)
+ all_those_smooth_terms
+ offset( lnFemale_counts)
A Poisson gam uses a log link, which is the reason for logging the female counts.
Edit (Gavin is correct: the default family for gam is Gaussian with an identity link, not a log link, so the Poisson family has to be specified explicitly):
gamFIN <- gam(FawnFall ~ s(year) + s(percentage_woody_coverage) + s(kmRoads.km2) +
s(WELLS_ACTIVEinsideD) + s(d3) + s(WT_DEER_springsurveys) +
s(BadlandsCoyote.1000_mi) + s(Average_mintemp_winter, BadlandsCoyote.1000_mi) +
s(BadlandsCoyote.1000_mi, WELLS_ACTIVEinsideD) + s(BadlandsCoyote.1000_mi, d3) +
s(YEAR, bs = "re") + s(StudyArea, bs = "re") + offset(log(FemaleFall)),
family = poisson, method = "REML", select = TRUE, data = mydata)
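A small simulated check of the offset approach (requires mgcv; the data and the names `females`, `fawns` and `woody` are hypothetical). Because the Poisson variance grows with the expected count, site-years with more females carry more information in the likelihood, which is the down-weighting of small samples the question asks for. Offsets written in the gam formula are included by predict.gam, so predicting with `females = 1` returns the fitted fawn:female ratio.

```r
library(mgcv)
set.seed(42)

# Hypothetical site-year data: 60 years x 5 study areas
dat <- data.frame(
  year = rep(1961:2020, each = 5),
  StudyArea = factor(rep(paste0("site", 1:5), times = 60)),
  woody = runif(300, 0, 40)
)
dat$females <- rpois(300, 6) + 1              # many small female counts
true_ratio <- exp(-0.4 + 0.01 * dat$woody)    # fawns per female
dat$fawns <- rpois(300, dat$females * true_ratio)

# log E[fawns] = log(females) + smooth terms
m <- gam(fawns ~ s(year) + s(woody) + s(StudyArea, bs = "re") +
           offset(log(females)),
         family = poisson, method = "REML", data = dat)

# Fitted fawn:female ratio along the woody-cover gradient
nd <- data.frame(year = 1990,
                 StudyArea = factor("site1", levels = levels(dat$StudyArea)),
                 woody = c(0, 20, 40), females = 1)
ratio_hat <- predict(m, newdata = nd, type = "response")
ratio_hat
```

If fawn counts show more spread than a Poisson allows, family = quasipoisson or nb() in mgcv keeps the same offset structure.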
I'm working with a model from the Prestige dataset in the car package in R.
library(car)
library(carData)
data = na.omit(Prestige)
prestige = data$prestige
income = data$income
education = data$education
type = data$type
I'm trying to fit the model lm(prestige ~ income + education + type + income:type + education:type). For a class, I'm starting with the full model and working down to a smaller one using plain backward selection. According to its p-value, one of the least useful covariates is education:typeprof. How do I delete just that covariate from the model without removing all the education:type interactions? More generally, how do you exclude individual interactions involving factors? I saw an answer that used the update function to specify which interaction to exclude, but it didn't work in my case. Maybe I implemented it incorrectly.
fit4 = lm(prestige ~ income + education + type + income:type + education:type)
newfit = update(fit4, . ~ . - education:typeprof)
Unfortunately this didn't work for me.
There is a way to drop a single interaction term. Suppose you have the linear model
fullmodel = lm(prestige ~ income + education + type + income:type + education:type - 1)
You can call model.matrix on fullmodel, which gives you the X matrix for your linear model. From there you can specify which column you'd like to drop and refit your model.
X = model.matrix(fullmodel)
drop_col = which(colnames(X) == 'education:typeprof')
X1 = X[, -drop_col]
newfit = lm(prestige ~ X1 - 1)
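update() works on formula terms, and `education:typeprof` is a column of the model matrix rather than a term (the term is `education:type`), so it cannot be removed that way. Below is a base-R sketch of the model-matrix approach on simulated data (a hypothetical stand-in for Prestige, so it runs without car/carData), together with an equivalent formula that keeps only the wc interaction through an explicit indicator. The `I(education * (type == "wc"))` term assumes `bc` is the reference level, as it is in Prestige.

```r
set.seed(7)
n <- 90
# Simulated stand-in for the Prestige data
d <- data.frame(
  income = runif(n, 1000, 25000),
  education = runif(n, 6, 16),
  type = factor(sample(c("bc", "prof", "wc"), n, replace = TRUE))
)
d$prestige <- 10 + 0.001 * d$income + 2 * d$education + rnorm(n, sd = 5)

full <- lm(prestige ~ income + education + type + income:type + education:type,
           data = d)

# Drop one column of the design matrix and refit
X <- model.matrix(full)
X1 <- X[, colnames(X) != "education:typeprof"]
reduced <- lm(prestige ~ X1 - 1, data = d)   # X1 already contains the intercept

# Same model via the formula: bc and prof share the education slope, wc is shifted
reduced2 <- lm(prestige ~ income + education + type + income:type +
                 I(education * (type == "wc")), data = d)

anova(reduced, full)   # F-test for the single dropped column
```

The formula version is usually easier to read and to predict from, since predict() on the model-matrix fit needs a matrix built the same way.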
Because this is such a long question, I've broken it into two parts: the first is the basic question and the second details what I've attempted so far.
Question - Short
How do you fit an individual frailty survival model in R? In particular, I am trying to re-create the coefficient estimates and SEs in the table below, which were obtained by fitting a semi-parametric frailty model to this dataset. The model takes the form:
h_i(t) = z_i h_0(t) \exp(\beta^\top X_i)
where z_i is the unknown frailty term for patient i, X_i is the vector of explanatory variables (disease, gender, bmi and age), \beta is the corresponding vector of coefficients and h_0(t) is the baseline hazard function. (I have included code below to set the factor reference levels.)
Question - Long
I am attempting to follow and re-create the frailty-model example in the textbook Modelling Survival Data in Medical Research. In particular, I am focusing on the semi-parametric model, for which the textbook provides parameter and variance estimates for the standard Cox model, the lognormal frailty model and the gamma frailty model, as shown in the table above.
I am able to recreate the no frailty model estimates using
library(dplyr)
library(survival)
dat <- read.table(
"./Survival of patients registered for a lung transplant.dat",
header = TRUE
) %>%
as_tibble() %>%
# The first level listed becomes the reference category
mutate( disease = factor(disease, levels = c(3,1,2,4))) %>%
mutate( gender = factor(gender, levels = c(2,1)))
mod_cox <- coxph( Surv(time, status) ~ age + gender + bmi + disease ,data = dat)
mod_cox
however I am really struggling to find a package that can reliably re-create the results of the second 2 columns. Searching online I found this table which attempts to summarise the available packages:
source
Below I have posted my current findings, as well as the code I've used, in case it helps someone identify whether I have simply specified the functions incorrectly:
frailtyEM - Seems to work best for gamma, but doesn't offer log-normal models
frailtyEM::emfrail(
Surv(time, status) ~ age + gender + bmi + disease + cluster(patient),
data = dat ,
distribution = frailtyEM::emfrail_dist(dist = "gamma")
)
survival - Gives warnings for the gamma model, and from everything I've read, its frailty functionality is considered deprecated, with the recommendation to use coxme instead.
coxph(
Surv(time, status) ~ age + gender + bmi + disease + frailty.gamma(patient),
data = dat
)
coxph(
Surv(time, status) ~ age + gender + bmi + disease + frailty.gaussian(patient),
data = dat
)
coxme - Seems to work, but gives different estimates from those in the table and doesn't support the gamma distribution
coxme::coxme(
Surv(time, status) ~ age + gender + bmi + disease + (1|patient),
data = dat
)
frailtySurv - I couldn't get it to work properly: it always seemed to fix the variance parameter at 1 and return coefficient estimates as if a no-frailty model had been fitted. Additionally, the documentation doesn't state which strings are supported for the frailty argument, so I couldn't work out how to get it to fit a log-normal model.
frailtySurv::fitfrail(
Surv(time, status) ~ age + gender + bmi + disease + cluster(patient),
dat = dat,
frailty = "gamma"
)
frailtyHL - Produces "did not converge" warning messages; it still produced coefficient estimates, but they were different from the textbook's
mod_n <- frailtyHL::frailtyHL(
Surv(time, status) ~ age + gender + bmi + disease + (1|patient),
data = dat,
RandDist = "Normal"
)
mod_g <- frailtyHL::frailtyHL(
Surv(time, status) ~ age + gender + bmi + disease + (1|patient),
data = dat,
RandDist = "Gamma"
)
frailtypack - I simply don't understand the implementation (or at least it's very different from what is taught in the textbook). The function requires the specification of knots and a smoother, which seem to greatly affect the resulting estimates.
parfm - Only fits parametric models; that said, every time I tried to use it to fit a Weibull proportional hazards model it just errored.
phmm - Have not yet tried
I fully appreciate, given the large number of packages I've tried unsuccessfully, that the problem is most likely my not properly understanding the implementations and misusing the packages. Any help or examples on how to successfully re-create the above estimates would be greatly appreciated.
Regarding
I am really struggling to find a package that can reliably re-create the results of the second 2 columns.
See the Survival Analysis CRAN task view under Random Effect Models, or search R Site Search for, e.g., "survival frailty".
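This does not reproduce the textbook table, but it sketches how an individual (one row per patient) frailty can be fitted with coxph from survival, on simulated data where the true gamma frailty variance is known (0.5 here; the data, `id` and `x` are hypothetical). The "gaussian" frailty puts a normal random effect on the log-hazard scale, i.e. a lognormal frailty. Individual frailty is only weakly identified, so estimates from packages using different methods (EM, penalised likelihood, h-likelihood) can legitimately differ. The estimated frailty variance is read from `fit$history[[1]]$theta`.

```r
library(survival)
set.seed(11)
n <- 400
theta <- 0.5                                          # true frailty variance
z <- rgamma(n, shape = 1 / theta, rate = 1 / theta)   # mean 1, variance theta
x <- rnorm(n)
# h_i(t) = z_i * 0.1 * exp(0.7 * x_i): exponential baseline hazard 0.1
t_event <- rexp(n, rate = 0.1 * z * exp(0.7 * x))
t_cens <- runif(n, 0, 30)
sim <- data.frame(id = seq_len(n), x = x,
                  time = pmin(t_event, t_cens),
                  status = as.integer(t_event <= t_cens))

fit_cox <- coxph(Surv(time, status) ~ x, data = sim)
fit_gam <- coxph(Surv(time, status) ~ x + frailty(id, distribution = "gamma"),
                 data = sim)
fit_ln  <- coxph(Surv(time, status) ~ x + frailty(id, distribution = "gaussian"),
                 data = sim)

# Fixed effect under each model, then the estimated frailty variances
c(cox = coef(fit_cox)[["x"]], gamma = coef(fit_gam)[["x"]],
  lognormal = coef(fit_ln)[["x"]])
c(gamma_var = fit_gam$history[[1]]$theta,
  lognormal_var = fit_ln$history[[1]]$theta)
```

Running the same simulation with several seeds is a cheap way to see how stable each package's individual-frailty estimates are before comparing them with the textbook.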
I'm trying to include time fixed effects (dummies for years generated with model.matrix) into a PPML regression in R.
Without time fixed effect the regression is:
library(gravity)
my_model <- PPML(y="v", dist="dist",
x=c("land","contig","comlang_ethno",
"smctry","tech","exrate"),
vce_robust=T, data=database)
I've tried adding fe=c("year") to the PPML call, but it doesn't work.
I'd appreciate any help on this.
I would comment on the previous answer but don't have enough reputation. The gravity model in your PPML command specifies v = exp(b0 + b1 log(dist) + b2 land + b3 contig + b4 comlang_ethno + b5 smctry + b6 tech + b7 exrate + TimeFE), i.e. v = dist^b1 × exp(b0 + b2 land + ... + b7 exrate + TimeFE).
The formula inside glm should have as its RHS the terms inside the exponential, because it represents the linear predictor, which the link function (the natural log, the Poisson default) relates to the mean. So, in sum, your command should be
glm(v ~ log(dist) + land + contig + comlang_ethno + smctry + tech + exrate + factor(year),
family = 'quasipoisson', data = database)
and, in particular, you need distance in logs on the RHS (unlike in the previous answer).
Just make sure that year is a factor; then you can use the plain-and-simple glm function:
model.PPML <- glm(y ~ dist + year, family = "quasipoisson")
which gives you the results with year as dummies/fixed effects. The robust SEs are then calculated with
lmtest::coeftest(model.PPML, vcov = sandwich::vcovHC(model.PPML, type = "HC1"))
The PPML function does nothing more; it just isn't very flexible.
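A base-R sketch of the glm route on simulated gravity-style data (the contents of `database` are made up, and only log(dist), land and the year factor are included). The HC1 covariance is computed by hand to show what the robust standard errors are; for a quasipoisson fit it should agree with sandwich::vcovHC(fit, type = "HC1") up to numerical error, because the dispersion cancels in the sandwich.

```r
set.seed(3)
n <- 400
# Simulated stand-in for the question's `database` (hypothetical values)
database <- data.frame(
  dist = runif(n, 100, 10000),
  land = rbinom(n, 1, 0.2),
  year = factor(sample(2000:2004, n, replace = TRUE))
)
mu <- exp(8 - 0.8 * log(database$dist) + 0.3 * database$land +
          0.1 * as.integer(database$year))
database$v <- rpois(n, mu)

# PPML with year fixed effects = Poisson-family glm with year dummies
fit <- glm(v ~ log(dist) + land + year, family = quasipoisson, data = database)

# HC1 heteroskedasticity-robust covariance, computed by hand
X <- model.matrix(fit)
k <- ncol(X)
u <- X * (fit$y - fitted(fit))       # score contributions x_i (y_i - mu_i), log link
bread <- summary(fit)$cov.unscaled   # (X'WX)^{-1}
V_hc1 <- n / (n - k) * bread %*% crossprod(u) %*% bread
robust_se <- sqrt(diag(V_hc1))

z <- coef(fit) / robust_se
cbind(estimate = coef(fit), robust_se = robust_se, p_value = 2 * pnorm(-abs(z)))
```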
As an alternative to PPML and glm, you can also solve your problem with the function femlm (from the package FENmlm), which deals with fixed-effect estimation for maximum likelihood models.
The two main advantages of function femlm are:
you can add as many fixed effects as you want, and they are handled separately, which makes computation much faster than with glm (especially when the fixed effects contain many categories)
standard errors can be clustered with intuitive commands
Here's an example regarding your problem (with just two variables and the year fixed-effects):
library(FENmlm)
# (default family is Poisson, 'pipe' separates variables from fixed-effects)
res = femlm(v ~ log(dist) + land | year, base)
summary(res, se = "cluster")
This code estimates the coefficients of variables log(dist) and land with year fixed-effects; then it displays the coefficients table with clustered standard-errors (w.r.t. year) for the two variables.
Going beyond your initial question, now assume you have a more complex case with three fixed-effects: country_i, country_j and year. You'd write:
res = femlm(v ~ log(dist) + land | country_i + country_j + year, base)
You can then easily play around with clustered standard-errors:
# Cluster w.r.t. country_i (default is first cluster encountered):
summary(res, se = "cluster")
summary(res, se = "cluster", cluster = "year") # cluster w.r.t. year cluster
# Two-way clustering:
summary(res, se = "twoway") # two-way clustering w.r.t. country_i & country_j
# two way clustering w.r.t. country_i & year:
summary(res, se = "twoway", cluster = c("country_i", "year"))
For more information on the package, the vignette can be found at https://cran.r-project.org/web/packages/FENmlm/vignettes/FENmlm.html.