Cox proportional hazards model and test for trend - r

Suppose I have a Cox proportional hazards model with a categorical variable as the predictor of the event (plus other covariates such as age). Say the categorical variable I am interested in is tumor size, with categories 0-10, 10-20, and 20-30, and I see a trend towards a higher HR of death with increasing tumor size. How can I compute a test for this trend in R and get a p-value?

There is a bit of danger in accepting your incomplete specification, since you could have a size classification that will mess this up, for example: 0-10, 10-20, 20-30, 30-80, 80-120, 120-240.
Unless you have carefully constructed your factor to have correctly ordered ascending levels, what I am about to suggest as a shortcut will fail.
library(survival)
survmdl <- coxph(Surv(time, event) ~ as.numeric(fact), data = dat)
You will get a "test of trend": a single coefficient, interpreted as the per-category increase in log(hazard) for rising size. So post your actual factor levels if you want a better-considered reply.
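To make the shortcut concrete, here is a minimal sketch with simulated data. The variable names (`size_cat`, `age`, `dat`) are placeholders, and the factor is explicitly built with correctly ordered ascending levels, which is the precondition the answer warns about:

```r
# Trend test sketch: ordered size categories treated as numeric scores.
library(survival)

set.seed(1)
dat <- data.frame(
  time     = rexp(300),
  event    = rbinom(300, 1, 0.7),
  size_cat = factor(sample(c("0-10", "10-20", "20-30"), 300, replace = TRUE),
                    levels = c("0-10", "10-20", "20-30"), ordered = TRUE),
  age      = rnorm(300, 60, 10)
)

# as.numeric() maps the ordered levels to equally spaced scores 1, 2, 3:
fit <- coxph(Surv(time, event) ~ as.numeric(size_cat) + age, data = dat)
summary(fit)
# The coefficient for as.numeric(size_cat) is the per-category increase in
# log(HR); its Wald p-value is the test of trend.
```

Note that the equal-spacing assumption is exactly why unevenly spaced bins like 30-80, 80-120 would make this misleading.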

Related

Mixed effect model or multiple regressions comparison in nested setup

I have a response Y that is a percentage ranging between 0-1. My data is nested by taxonomy or evolutionary relationship, say phylum/genus/family/species, and I have one continuous covariate temp and one categorical covariate fac with levels fac1 & fac2.
I am interested in estimating:
is there a difference in Y between fac1 and fac2 (intercept), and how much variance is explained by that?
does each level of fac respond differently with regard to temp (linearly, so slope)?
is there a difference in Y for each level of my taxonomy, and how much variance is explained by those (see varcomp)?
does each level of my taxonomy respond differently with regard to temp (linearly, so slope)?
A brute-force idea would be to split my data at the lowest taxonomic level, here species, and do a linear beta regression for each species i as betareg(Y(i) ~ temp). Then extract the slope and intercept for each species, group them at a higher taxonomic level per fac, and compare the distribution of slopes (or intercepts), say via Kullback-Leibler divergence, to a distribution obtained by bootstrapping my Y values. Or compare the distribution of slopes (or intercepts) just between taxonomic levels or my factor fac, respectively. Or just compare mean slopes and intercepts between taxonomy levels or my factor levels.
Not sure if this is a good idea. I am also not sure how to answer the question of how much variance is explained by my taxonomy level, as in nested random mixed-effect models.
Another option may be just those mixed models, but how can I include all the aspects I want to test in one model?
Say I could use the "gamlss" package to do:
library(gamlss)
model <- gamlss(Y ~ temp*fac + re(random = ~1 | phylum/genus/family/species), family = BE)
But here I see no way to incorporate a random slope. Or can I do:
model <- gamlss(Y ~ re(random = ~temp*fac | phylum/genus/family/species), family = BE)
but the internal call to lme has some trouble with that, and I guess this is not the right notation anyway.
Is there any way to achieve what I want to test, not necessarily with gamlss but with any other package that supports nested structures and beta regression?
Thanks!
In glmmTMB, if you have no exact 0 or 1 values in your response, something like this should work:
library(glmmTMB)
glmmTMB(Y ~ temp*fac + (1 + temp | phylum/genus/family/species),
        data = ...,
        family = beta_family())
If you have zero values, you will need to do something about them. For example: you can add a zero-inflation term in glmmTMB; brms can handle zero-one-inflated Beta responses; or you can "squeeze" the 0/1 values in a little bit (see the appendix of Smithson and Verkuilen's paper on Beta regression). If you have only a few 0/1 values, it won't matter very much what you do. If you have a lot, you'll need to spend some serious time thinking about what they mean, which will influence how you handle them. Do they represent censoring (i.e. values that aren't exactly 0/1 but are too close to the borders to measure the difference)? Are they a qualitatively different response? etc.
As I said in my comment, computing variance components for GLMMs is pretty tricky - there's not necessarily an easy decomposition, e.g. see here. However, you can compute the variances of intercept and slope at each taxonomic level and compare them (and you can use the standard deviations to compare with the magnitudes of the fixed effects ...)
The model given here might be pretty demanding, depending on the size of your phylogeny - for example, you might not have enough replication at the phylum level (in which case you could fit the model ~ temp*(fac + phylum) + (1 + temp | phylum:(genus/family/species)), i.e. pull out the phylum effects as fixed effects).
This is assuming that you're willing to assume that the effects of fac, and its interaction with temp, do not vary across the phylogeny ...
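For reference, the "squeeze" mentioned above can be sketched as below. `Y` and `dat` are placeholder names standing in for the questioner's data, so this is an illustration of the transformation rather than a ready-to-run analysis:

```r
# Smithson & Verkuilen (2006) transformation: maps exact 0 and 1 values
# into the open interval (0, 1) so a Beta family can be used.
squeeze <- function(y, n = length(y)) (y * (n - 1) + 0.5) / n

dat$Y_sq <- squeeze(dat$Y)  # dat$Y is the raw proportion in [0, 1]

library(glmmTMB)
fit <- glmmTMB(Y_sq ~ temp * fac + (1 + temp | phylum/genus/family/species),
               data = dat, family = beta_family())
```

With many observations the squeeze barely moves interior values, but it does shrink everything slightly toward 0.5, which is one reason it is only recommended when 0/1 values are rare.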

Does the sorting order matter when interpreting beta estimates in a regression model?

Seems like a very basic question but I just wanted to confirm. I'm running a multivariable linear regression model adjusted for different types of covariates (some numeric, some categorical, etc.). A sample of the model is shown below:
fit <- ols(outcome ~ exposure + age + zbmi + income + sex + ethnicity)
Both the "outcome" and "exposure" are continuous numerical variables.
My question is, if say I run the model and the beta estimate, 95% CI, and p-value looks something like the one below:
B = -0.20 // 95%CI: [-0.50, -0.001] // p = 0.04
Would it be appropriate to interpret this as: "For every 1-unit increase in the exposure, there is a 0.20 decrease in the outcome"?
What I want to know is how did it determine the order of "per 1 unit increase"? Is that just the default order of how R sorts continuous variables when running it in a regression model? Also, since both my outcome and exposure are continuous variables, does this mean that it automatically sorted these variables in ascending order (by default?) when I ran the model?
Just a bit confused on whether this sorting order matters before I run any regression model using continuous variables. Any tips / help would be appreciated!
Under OLS, there is no ordering or sorting of the predictors. The right-hand side of the equation is summed before subtracting it from the left-hand side, and the square of this difference is minimized. So with this technique, the predictors do not have to be sorted in any way.
For the interpretation of your betas, the predictors are assumed to be independent, so it doesn't matter in which order you enter them.
Side note: in reality, you may get some dependence among the predictors, which will be reflected in slightly larger standard errors.
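You can check the order-invariance directly. A small sketch using the built-in mtcars data (the questioner's `ols` call from the rms package behaves the same way; plain `lm` is used here for a self-contained example):

```r
# Same model, predictors listed in two different orders:
fit1 <- lm(mpg ~ wt + hp + disp, data = mtcars)
fit2 <- lm(mpg ~ disp + hp + wt, data = mtcars)

# The estimate for each predictor is identical either way:
all.equal(coef(fit1)["wt"], coef(fit2)["wt"])
all.equal(coef(fit1)["hp"], coef(fit2)["hp"])
```

Only the display order of the coefficient table changes; the estimates, standard errors, and p-values do not.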

Continuous variable as a random effect? (lme function in R)

The hormone levels are inflated by the sample mass, even after correcting hormone levels by sample mass (its a common problem for endocrinologists).
I'm trying to determine if treatment affect hormone levels, ''correcting'' for sample mass.
lme(hormone_levels ~ treatment,
    random = list(~ 1 | individual, ~ 1 | sample_mass),
    na.action = na.omit, method = "ML", data = dados)
However, the reviewer said I can't use a continuous variable as a random effect.
What is the alternative?
Welcome to Stack Overflow. This question is probably more appropriate for Cross Validated, as it is more about statistics than coding. I'm going to answer anyway.
The reviewer is correct: you can't have a continuous predictor as a random grouping factor. See some discussion about that here: https://stats.stackexchange.com/questions/105698/how-do-i-enter-a-continuous-variable-as-a-random-effect-in-a-linear-mixed-effect
To directly answer your question, the alternative is to add the predictor sample mass to the model as a fixed-effect, where it will be a covariate. This means the model will take into account both how hormone levels vary by size and the treatment. This is what user63230 has suggested, and I think it is the correct way to move forward. If you have a specific prediction about how the treatment may vary by the sample mass, you could include an interaction. If you expect the treatment to affect different individuals differently, you can fit a random slope for treatment on individual.
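The suggested fix can be sketched as below. The variable names (`hormone_levels`, `treatment`, `sample_mass`, `individual`, `dados`) follow the question and are placeholders for the actual data:

```r
library(nlme)

# Sample mass enters as a fixed-effect covariate; only the true grouping
# factor (individual) is a random effect.
fit <- lme(hormone_levels ~ treatment + sample_mass,
           random = ~ 1 | individual,
           data = dados, na.action = na.omit, method = "ML")

# If treatment is expected to affect individuals differently,
# add a random slope for treatment on individual:
fit_rs <- lme(hormone_levels ~ treatment + sample_mass,
              random = ~ treatment | individual,
              data = dados, na.action = na.omit, method = "ML")
```

An interaction term (`treatment * sample_mass`) could replace the additive form if there is a specific prediction that the treatment effect varies with sample mass.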

how to change R-code to combine cox proportional hazards model with BDLIMs?

I read the paper "Bayesian distributed lag interaction models to identify perinatal windows of vulnerability in children's health" and tried the R code from the GitHub website. I also read the paper "Prenatal fine particulate exposure and early childhood asthma: Effect of maternal stress and fetal sex", where the authors applied BDLIMs to assess effect modification and reported ORs. I have been looking for how to convert the y-axis to OR in BDLIMs for a long time, but I have not found it. I also want to change the label of the y-axis to OR and the label of the x-axis to gestational weeks. However, I want to use the Cox proportional hazards model to calculate HRs in my dataset, and we also want to apply BDLIMs to examine effect modification by sex. Therefore, could I ask how to change the R code to combine the Cox proportional hazards model with BDLIMs?
My data:
The outcome is a disease indicator (with and without autism, coded 1/0) and the exposure is weekly atmospheric mercury. I want to investigate the association between exposure and disease with a Cox proportional hazards model combined with BDLIMs.
R code:
fit <- bdlim(Y = data$autism, X = lag, Z = data$ID_SEX, G = data$ID_SEX,
             inter.model = "all", niter = 100, nthin = 5,
             basis.opts = list(df = 7, type = "bs"), seed = 1234)
I would like the y-axis to show the HR, but the actual output is the estimated effect.
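One generic approach, independent of the BDLIM internals: for logistic- or Cox-type models the estimated effect is on the log scale, so exponentiating it gives an OR or HR. A hedged sketch, where `est`, `lo`, and `hi` are hypothetical vectors you would first need to extract from the fitted object (check the package's accessors for the posterior summaries):

```r
# est, lo, hi: weekly effect estimates and interval bounds on the log scale
# (hypothetical names -- not actual bdlim output slots).
weeks <- seq_along(est)

plot(weeks, exp(est), type = "l",
     xlab = "Gestational week", ylab = "HR")  # relabelled axes
lines(weeks, exp(lo), lty = 2)
lines(weeks, exp(hi), lty = 2)
abline(h = 1, lty = 3)  # null value on the ratio scale
```

This only relabels and rescales the plot; actually replacing the BDLIM likelihood with a Cox partial likelihood is a modelling change that the package would have to support.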

visreg plot binomial distribution

I'm studying the effect of different predictors (dummy, categorical, and continuous variables) on the presence of birds, obtained from at-sea bird counts. To do that I used the glmmadmb function with a binomial family.
I've plotted the relationship between the response variable and the predictors in order to assess the model fit and the marginal effect of each predictor. To draw the graphs I used the visreg function, specifying the transformation of the vertical axis:
visreg(modelo.bn7, type="conditional", scale="response", ylab= "Bird Presence")
The output graphs showed very wide confidence bands when I used the original scale of the response variable (covering the whole vertical axis). For the graphs without the transformation, the confidence bands were narrower, but they had the same extent across the different levels of the dummy variables. Does anyone know how the confidence bands are calculated for binomial distributions? Could this reflect a problem with the estimated coefficients or the model fit?
The confidence bands are computed from the standard errors of the fitted values: Wald intervals on the link (logit) scale, back-transformed to the response scale. For a detailed explanation you can ask on stats.stackexchange.com. If the bands are very wide (and the interpretation of 'wide' is subjective and mostly based on what your goal is), it shows that your estimates may not be very precise. Wide intervals are usually due to a small or insufficient number of observations used for building the model. If the number of observations is large, that may indicate a poor fit.
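A minimal sketch of how such bands arise for a binomial GLM, using built-in data (plain `glm` rather than glmmadmb, so the mechanics are visible):

```r
# Fit a simple binomial GLM on built-in data:
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Predictions and standard errors are produced on the logit (link) scale:
nd <- data.frame(wt = seq(1.5, 5.5, length.out = 50))
pr <- predict(fit, newdata = nd, se.fit = TRUE)

# Back-transform the Wald interval with the inverse link (plogis):
upper <- plogis(pr$fit + 1.96 * pr$se.fit)
lower <- plogis(pr$fit - 1.96 * pr$se.fit)
```

Because the inverse logit compresses the interval near 0 and 1 and stretches it near 0.5, bands that look modest on the link scale can span almost the whole vertical axis on the response scale. That pattern reflects large standard errors (few observations, or near-separation), not necessarily a coding error.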
