Firth's penalised logistic regression - infinite chi-squared values - R

I am analysing a household budget survey dataset (sample size over 200,000) with the aim of determining whether households that spend more on alcohol also spend more on other discretionary items such as restaurants and entertainment.
Given the large number of households reporting zero expenditure on each of these items, the errors in my linear regression model were non-normally distributed, so I switched to a logistic regression.
When I ran the logistic regression I came across quasi-complete separation. Based on the literature, Firth's penalised logistic regression seemed the most appropriate remedy:
Regression <- logistf(restaurant_spender ~ alc_spender + income_quintiles + education_hh, data = alcohol, weights = weight, firth = TRUE)
Where:
restaurant_spender is binary (=1 if they spend anything on restaurants and 0 otherwise)
alc_spender is the same as above but for alcohol
income_quintiles is a categorical variable separating households into one of five income quintiles
education_hh is a categorical variable indicating the highest level of education for the household head.
And to get the odds ratios:
exp(coef(Regression))
This produces the odds ratios I would expect, and my confidence intervals make sense. However, my chi-squared values are all infinite.
I have cross-tabulated all of my independent variables against my dependent variable and there are no cells with 0 counts (in fact, the counts are evenly distributed).
My questions are:
1) Am I doing anything obviously wrong in running a Firth’s penalised logistic regression in R?
2) Are infinite chi-squared values implausible?
3) Is there some other way in R to diagnose why I am getting quasi-complete separation, apart from cross-tabulating the independent and dependent variables?
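On question 3, one base-R way to probe for separation beyond cross-tabulation is to fit the unpenalised glm() and inspect boundary fitted probabilities and exploding coefficients. A sketch on simulated data standing in for the survey variables (the detectseparation package on CRAN also offers a formal check):

```r
# Simulated stand-in for the survey data: quasi-complete separation arises
# when some predictor value (almost) perfectly determines the outcome.
set.seed(1)
n <- 500
x <- rbinom(n, 1, 0.5)
y <- ifelse(x == 1, 1L, rbinom(n, 1, 0.4))   # whenever x == 1, y is always 1

fit <- glm(y ~ x, family = binomial)
# glm() typically warns: "fitted probabilities numerically 0 or 1 occurred"

# Symptoms of (quasi-)separation: fitted probabilities at the boundary and
# an inflated coefficient / standard error on the offending predictor.
any(fitted(fit) > 1 - 1e-6)          # TRUE points towards separation
summary(fit)$coefficients            # look for |estimate| and SE blowing up
```

Running this on each predictor (and on pairs of predictors) can show which variable, or which combination of categories, is driving the separation.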
Any help would be greatly appreciated.

Related

Justifying need for zero-inflated model in GLMMs

I am using GLMMs in R to examine the influence of continuous predictor variables (x) on biological count variables (y). My response variables (n = 5) each have a high number of zeros, so I have tested the fit of various distributions (genpois, poisson, nbinom1, nbinom2, zip, zinb1, zinb2) and selected the best-fitting one according to the lowest AIC/logLik value.
According to this selection criterion, the three response variables with the highest number of zeros are best fitted by the zero-inflated negative binomial (zinb2) distribution. Compared to the regular NB distribution (non-zero-inflated), the delta AIC is between 30 and 150.
My question is: must I use the ZI models for these variables, given this dAIC? I have received advice from a statistician that if the dAIC between the ZI and non-ZI models is small enough, I should use the non-ZI model even if it is a marginally worse fit, since ZI models involve much more complicated modelling and interpretation. The distribution matters in this case because the ZINB and NB models select different combinations of top candidate models when testing my predictors.
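One way to gauge what a dAIC of that size means is to convert the AIC values into Akaike weights. A sketch with made-up AIC numbers (replace with your own); even the smallest reported gap of 30 already puts essentially all the weight on the zero-inflated model:

```r
# Hypothetical AIC values for the non-ZI and ZI fits (replace with your own):
aic <- c(nb = 1530, zinb = 1500)                 # delta AIC = 30 in favour of ZINB

delta <- aic - min(aic)                          # AIC difference to the best model
aic_w <- exp(-delta / 2) / sum(exp(-delta / 2))  # Akaike weights
aic_w                                            # weight on "nb" is roughly 3e-07
```

By this yardstick a dAIC of 30-150 is not "small enough" in the sense your statistician described; the usual rule of thumb treats only dAIC below about 2 as models with comparable support.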
Thank you for any clarification!

How to specify icc_pre_subject and var_ratio in study_parameters function (powerlmm package)?

I am trying to conduct a power analysis for studies in which I will use a linear mixed model. I ran a pilot study to estimate the effect sizes of the fixed effects and the random-effects variances, which are required inputs for an R function, study_parameters().
First, I built an lmer model using the data from the pilot study. The reaction time to the stimuli is the dependent variable, and the experimental condition (with 2 levels), the trial number (from 0 to 159, coded as numeric) and the interaction between condition and trial number are fixed effects. The experimental condition is a between-subjects factor, but the trial number is within-subjects - all participants go through trials 0 to 159. For random effects, I set a random intercept and slope for participants, and a random intercept for the beauty rating of each item (as a control factor). Together, the model looks like:
lmer(ReactionTime ~ Condition * TrialNumber + (1 + TrialNumber | Subject) + (1 | BeautyRating))
For the power analysis I want to use the function study_parameters() in the powerlmm package. In this function, we have to specify icc_pre_subject and var_ratio as the parameters carrying the random-effect variance information. What I want to do is set these parameters based on the results of the pilot study.
From the tutorial, the two variables are defined as follows:
icc_pre_subject: the proportion of the total baseline variance that is between-subjects (there is a typo in this sentence in the tutorial). icc_pre_subject would be the two-level ICC if there were no random slopes.
icc_pre_subject = var(subject_intercepts)/(var(subject_intercepts) + var(within-subject_error))
var_ratio: the ratio of total random slope variance over the level-1 residual variance.
var_ratio = var(subject_slopes)/var(within-subject_error)
Here, I am not sure what var(within-subject_error) means, or how to specify it.
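If the lmer() fit from the pilot study reports the variance components, the two powerlmm parameters can be computed by hand. A sketch with hypothetical variance components standing in for the pilot estimates (read the real numbers off VarCorr(model), or the "Random effects" block of summary(model); var(within-subject_error) is the "Residual" variance line there):

```r
# Hypothetical variance components, standing in for the pilot lmer() fit
# (read the real values off VarCorr(model)):
var_subj_intercept <- 2500   # Subject random-intercept variance
var_subj_slope     <- 4      # Subject random-slope variance (TrialNumber)
var_residual       <- 9000   # "Residual" variance = var(within-subject_error)

# icc_pre_subject: share of the baseline variance that is between subjects
icc_pre_subject <- var_subj_intercept / (var_subj_intercept + var_residual)

# var_ratio: random-slope variance relative to the level-1 residual variance
var_ratio <- var_subj_slope / var_residual

c(icc_pre_subject = icc_pre_subject, var_ratio = var_ratio)
```

Note that the crossed random intercept for beauty rating has no direct counterpart in these two formulas, which only involve the subject and residual terms.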
These are the random-effects results from the model fitted to the pilot study data.
My question:
Which numbers should I use to specify icc_pre_subject and var_ratio in study_parameters()?

how to calculate heritability from a half-sib design

I'm trying to measure heritability of a trait, flowering time (FT), for a set of data collected from a half-sib design. The data includes FT for each mother plant and 2 half siblings from that mother plant for ~150 different maternal lines (ML). Paternity is unknown.
I've tried:
Estimating heritability with a regression of the maternal FT and the mean sibling FT, and doubling the slope. This worked fine, and produced an estimate of 0.14.
Running an ANOVA and using the between ML variation to estimate additive genetic variance. Got the idea from slide 25 of this powerpoint and from this thread on within and between variance calculation
fit <- lm(FT ~ ML, data = her)
anova(fit)
her is the dataset, which, in this case, only includes the half sib FT values (I excluded the mother FT values for this attempt at heritability)
From the ANOVA output I have used the "ML" term mean square as the between-ML variation, which should also equal 1/4 of the additive genetic variance because the coefficient of relatedness between half-sibs is 0.25. This value turned out to be 0.098. By multiplying it by 4 I could get the additive genetic variance.
I have used the "residuals" mean square as all variability save for that accounted for by the "ML" term - that is, all of the variance minus 1/4 of the additive genetic variance. This turned out to be 1.342.
I then attempted to calculate heritability as Va/Vp = (4*0.098)/(1.342 + 0.098) ≈ 0.27.
This is quite different from my slope estimate, and I'm not sure if my reasoning is correct.
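One standard route from a one-way ANOVA to a heritability estimate first converts the mean squares into variance components - the between-line component is (MS_ML - MS_resid)/k for k offspring per line, not the mean square itself. A sketch with hypothetical mean squares (not the values above):

```r
# One-way ANOVA variance components for a half-sib design (hypothetical
# mean squares; k = number of half-sib offspring measured per maternal line).
ms_ml    <- 1.60   # "ML" mean square (between maternal lines)
ms_resid <- 1.00   # residual mean square (within lines)
k        <- 10

var_ml <- (ms_ml - ms_resid) / k   # between-line variance component
Va     <- 4 * var_ml               # half-sibs share 1/4 of the additive variance
Vp     <- var_ml + ms_resid        # total phenotypic variance
h2     <- Va / Vp
h2
```

With only k = 2 offspring per line the mean-square difference is a very noisy basis for the components, which may partly explain the gap between the ANOVA and parent-offspring regression estimates.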
I've tried the sommer and heritability R packages but haven't had success using either for a half-sib design, and haven't found a half-sib example for either package.
Any suggestions?

interactions in logistic regression in R

I am struggling to interpret the results of a binomial logistic regression. The experiment has 4 conditions; in each condition all participants receive a different version of the treatment.
DVs (1 per condition) = DE01, DE02, DE03, DE04, all binary (1 = participant takes a specific decision, 0 = doesn't)
Predictors: FTFinal (continuous, a freedom threat scale)
SRFinal (continuous, situational reactance scale)
TRFinal (continuous, trait reactance scale)
SVO_Type (binary, egoists = 1, altruists = 0)
After running the binomial (logit) models, I ended up with the following (see output). Initially I tested 2 models per condition, and condition 2 (DE02 as the DV) got my attention. In model (3) there are two variables which are significant predictors of DE02 (taking a decision or not) - FTFinal and SVO_Type. In context, the values for model (3) would mean that, all else equal, being an egoist (SVO_Type = 1) decreases the (log-)likelihood of taking a decision in comparison to being an altruist. Also, higher scores on FTFinal (freedom threat) increase the likelihood of taking the decision. So far so good. Removing SVO_Type from the regression (model 4) made the FTFinal coefficient non-significant. Removing FTFinal from the model does not change the significance of SVO_Type.
So I figured: OK - mediation, perhaps, or moderation.
I tried all models both in R and SPSS, and entering an interaction term SVO_Type*FTFinal makes all variables in model (2) non-significant. I followed the mediation procedure for logistic regression at http://www.nrhpsych.com/mediation/logmed.html, but found no mediation. To sum up: predicting DE02 from SVO_Type only is not significant. Predicting DE02 from FTFinal only is not significant. Putting those two in the regression together makes them significant predictors.
code and summaries here
Including an interaction between these both in a model makes all coefficients insignificant.
So I am at a total loss. As far as I know, to test moderation you need an interaction term. This term is between a categorical variable (SVO_Type) and a continuous one (FTFinal); perhaps that is where it goes wrong?
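On the moderation side, a likelihood-ratio test compares the models with and without the interaction as wholes, which is more informative than the individual coefficient p-values - those often all turn non-significant because an uncentred interaction term is highly collinear with its main effects. A sketch with simulated stand-ins for DE02, FTFinal and SVO_Type:

```r
# Simulated stand-ins for the real variables (DE02, FTFinal, SVO_Type).
set.seed(42)
n <- 200
FTFinal  <- rnorm(n)
SVO_Type <- rbinom(n, 1, 0.5)
DE02     <- rbinom(n, 1, plogis(0.8 * FTFinal - 1.0 * SVO_Type))

m_main <- glm(DE02 ~ FTFinal + SVO_Type, family = binomial)
m_int  <- glm(DE02 ~ FTFinal * SVO_Type, family = binomial)

# Likelihood-ratio test: does the interaction improve fit as a whole?
anova(m_main, m_int, test = "Chisq")

# Centring the continuous predictor reduces the collinearity that makes
# every coefficient look non-significant once the interaction enters.
m_int_c <- glm(DE02 ~ scale(FTFinal, scale = FALSE) * SVO_Type,
               family = binomial)
```

If the likelihood-ratio test is non-significant, there is little evidence of moderation regardless of what the individual coefficients do.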
And to test mediation outside SPSS, I tried the mediate package, only to discover that the function has a "treatment" argument, which is meant to be the treatment variable (experimental vs control). I don't have one - all participants are subjected to different versions of the same treatment.
Any help will be appreciated.

R: Cox regression power?

I am trying to find out whether I have enough power to fit a Cox proportional hazards model in R.
My predictor is intelligence and my outcome is cancer. My sample size is 22,000 and I predict 600 cancer cases over 25 years.
Furthermore, I would like to know whether I have enough power to test gender interactions, if half of my sample is female and 200 of the 600 cancer cases are.
I predict a hazard ratio of at least 1.3.
The power functions I have found in R all require a binary predictor, but intelligence is continuous (metric), and I could not find any function that handles that.
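For a continuous predictor, an approximate answer comes from the Hsieh & Lavori (2000) formula, under which Cox-model power depends essentially on the number of events, the log hazard ratio per unit of the covariate, and the covariate's SD. A minimal sketch (cox_power is a made-up helper name; it assumes the hazard ratio of 1.3 is per 1 SD of intelligence and makes no adjustment for correlated covariates):

```r
# Hsieh & Lavori (2000) approximation: power of a Cox model with a
# continuous covariate, driven by the number of events d, the log hazard
# ratio per unit of the covariate, and the covariate SD.
cox_power <- function(d, hr, sd_x = 1, alpha = 0.05) {
  beta <- log(hr)
  pnorm(sqrt(d) * abs(beta) * sd_x - qnorm(1 - alpha / 2))
}

cox_power(d = 600, hr = 1.3)   # all cancer cases: essentially 1
cox_power(d = 200, hr = 1.3)   # female cases only, rough subgroup check: ~0.96
```

Note this gives main-effect power within the female subgroup; a formal gender-interaction test has lower power, roughly requiring four times as many events for the same effect size with a balanced binary moderator.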