I have a dataset that looks like:
Treatment Surface ex.time excision antib.time antibiotic inf.time infection
1 0 15 12 0 12 0 12 0
2 0 20 9 0 9 0 9 0
3 0 15 13 0 13 0 7 1
4 0 20 11 1 29 0 29 0
5 0 70 28 1 31 0 4 1
6 0 20 11 0 11 0 8 1
The variables represented in the dataset are as follows:
Observation number
Treatment
0-routine bathing 1-Body cleansing
Surface
Percentage of total surface area burned
ex.time
Time to excision or on study time
Excision
indicator: 1=yes 0=no
Antib.time
Time to prophylactic antibiotic treatment or on study time
antibiotic
indicator: 1=yes 0=no
inf.time
Time to Staphylococcus aureus infection or on study time
infection
indicator: 1=yes 0=no
I want to model the time until infection as a function of treatment, surface, time until antibiotic treatment and time until excision. According to other posts, this dataset must be transformed from wide to long format, but I am not sure how to do that. Once the data is in the right format, I would use this formula:
coxph(Surv(start, stop, event) ~ m, data=times)
So far I have only run an ordinary Cox regression, but I guess this is not correct because the time dependency is not accounted for:
coxph(formula = Surv(inf.time, infection) ~ Treatment + Surface +
ex.time + antib.time, data = BurnData)
n= 154, number of events= 48
coef exp(coef) se(coef) z Pr(>|z|)
Treatment -0.453748 0.635243 0.300805 -1.508 0.131
Surface 0.006932 1.006956 0.007333 0.945 0.345
ex.time 0.013503 1.013595 0.018841 0.717 0.474
antib.time 0.009546 1.009592 0.009560 0.999 0.318
exp(coef) exp(-coef) lower .95 upper .95
Treatment 0.6352 1.5742 0.3523 1.145
Surface 1.0070 0.9931 0.9926 1.022
ex.time 1.0136 0.9866 0.9768 1.052
antib.time 1.0096 0.9905 0.9909 1.029
Concordance= 0.576 (se = 0.046 )
Rsquare= 0.041 (max possible= 0.942 )
Likelihood ratio test= 6.5 on 4 df, p=0.1648
Wald test = 6.55 on 4 df, p=0.1618
Score (logrank) test = 6.71 on 4 df, p=0.1519
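One possible way to get from the wide layout to the (start, stop, event) layout is tmerge() from the survival package. The sketch below is only an illustration built from the six rows shown above. The `id` column and the `ex.at` helper are names introduced here, and excision is treated as a 0/1 covariate that switches on at `ex.time` for subjects with `excision == 1`. The antibiotic indicator could be added the same way on the full data.

```r
library(survival)

# The six example rows from the question; `id` is an assumed subject identifier
burn <- data.frame(
  id         = 1:6,
  Treatment  = 0,
  Surface    = c(15, 20, 15, 20, 70, 20),
  ex.time    = c(12, 9, 13, 11, 28, 11),
  excision   = c(0, 0, 0, 1, 1, 0),
  antib.time = c(12, 9, 13, 29, 31, 11),
  antibiotic = c(0, 0, 0, 0, 0, 0),
  inf.time   = c(12, 9, 7, 29, 4, 8),
  infection  = c(0, 0, 1, 0, 1, 1)
)

# Time of excision, NA for subjects who were never excised (hypothetical helper column)
burn$ex.at <- ifelse(burn$excision == 1, burn$ex.time, NA)

# First call sets each subject's follow-up range and the infection event
long <- tmerge(data1 = burn[, c("id", "Treatment", "Surface")],
               data2 = burn, id = id,
               infect = event(inf.time, infection))

# Second call adds a time-dependent 0/1 covariate that becomes 1 after excision
long <- tmerge(long, burn, id = id, excised = tdc(ex.at))

print(long)

# On the full data the model could then be fitted along these lines:
# coxph(Surv(tstart, tstop, infect) ~ Treatment + Surface + excised, data = long)
```

Each subject then contributes one row per interval over which the covariates are constant, which is the layout Surv(start, stop, event) expects.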
I have a multinomial logit model with two individual specific variables (first and age).
I would like to conduct the hmftest to check if the IIA holds.
My dataset looks like this:
head(df)
mode choice first age
1 both 1 0 24
2 pre 1 1 23
3 both 1 2 53
4 post 1 3 43
5 no 1 1 55
6 both 1 2 63
I adjusted it for the mlogit to:
mode choice first age idx
1 TRUE 1 0 24 1:both
2 FALSE 1 0 24 1:no
3 FALSE 1 0 24 1:post
4 FALSE 1 0 24 1:pre
5 FALSE 1 1 23 2:both
6 FALSE 1 1 23 2:no
7 FALSE 1 1 23 2:post
8 TRUE 1 1 23 2:pre
9 TRUE 1 2 53 3:both
10 FALSE 1 2 53 3:no
~~~ indexes ~~~~
id1 id2
1 1 both
2 1 no
3 1 post
4 1 pre
5 2 both
6 2 no
7 2 post
8 2 pre
9 3 both
10 3 no
indexes: 1, 2
My original (full) model runs as follows:
full <- mlogit(mode ~ 0 | first + age, data = df_mlogit, reflevel = "no")
leading to the following result:
Call:
mlogit(formula = mode ~ 0 | first + age, data = df_mlogit, reflevel = "no",
method = "nr")
Frequencies of alternatives:choice
no both post pre
0.2 0.4 0.2 0.2
nr method
18 iterations, 0h:0m:0s
g'(-H)^-1g = 8.11E-07
gradient close to zero
Coefficients :
Estimate Std. Error z-value Pr(>|z|)
(Intercept):both 2.0077e+01 1.0441e+04 0.0019 0.9985
(Intercept):post -4.1283e-01 1.4771e+04 0.0000 1.0000
(Intercept):pre 5.3346e-01 1.4690e+04 0.0000 1.0000
first1:both -4.0237e+01 1.1059e+04 -0.0036 0.9971
first1:post -8.9168e-01 1.4771e+04 -0.0001 1.0000
first1:pre -6.6805e-01 1.4690e+04 0.0000 1.0000
first2:both -1.9674e+01 1.0441e+04 -0.0019 0.9985
first2:post -1.8975e+01 1.5683e+04 -0.0012 0.9990
first2:pre -1.8889e+01 1.5601e+04 -0.0012 0.9990
first3:both -2.1185e+01 1.1896e+04 -0.0018 0.9986
first3:post 1.9200e+01 1.5315e+04 0.0013 0.9990
first3:pre 1.9218e+01 1.5237e+04 0.0013 0.9990
age:both 2.1898e-02 2.9396e-02 0.7449 0.4563
age:post 9.3377e-03 2.3157e-02 0.4032 0.6868
age:pre -1.2338e-02 2.2812e-02 -0.5408 0.5886
Log-Likelihood: -61.044
McFadden R^2: 0.54178
Likelihood ratio test : chisq = 144.35 (p.value = < 2.22e-16)
To test for IIA, I exclude one alternative from the model (here "pre") and run the model as follows:
part <- mlogit(mode ~ 0 | first + age, data = df_mlogit, reflevel = "no",
alt.subset = c("no", "post", "both"))
leading to
Call:
mlogit(formula = mode ~ 0 | first + age, data = df_mlogit, alt.subset = c("no",
"post", "both"), reflevel = "no", method = "nr")
Frequencies of alternatives:choice
no both post
0.25 0.50 0.25
nr method
18 iterations, 0h:0m:0s
g'(-H)^-1g = 6.88E-07
gradient close to zero
Coefficients :
Estimate Std. Error z-value Pr(>|z|)
(Intercept):both 1.9136e+01 6.5223e+03 0.0029 0.9977
(Intercept):post -9.2040e-01 9.2734e+03 -0.0001 0.9999
first1:both -3.9410e+01 7.5835e+03 -0.0052 0.9959
first1:post -9.3119e-01 9.2734e+03 -0.0001 0.9999
first2:both -1.8733e+01 6.5223e+03 -0.0029 0.9977
first2:post -1.8094e+01 9.8569e+03 -0.0018 0.9985
first3:both -2.0191e+01 1.1049e+04 -0.0018 0.9985
first3:post 2.0119e+01 1.1188e+04 0.0018 0.9986
age:both 2.1898e-02 2.9396e-02 0.7449 0.4563
age:post 1.9879e-02 2.7872e-02 0.7132 0.4757
Log-Likelihood: -27.325
McFadden R^2: 0.67149
Likelihood ratio test : chisq = 111.71 (p.value = < 2.22e-16)
However, when I try to conduct the hmftest, the following error occurs:
> hmftest(full, part)
Error in solve.default(diff.var) :
system is computationally singular: reciprocal condition number = 4.34252e-21
Does anyone have an idea where the problem might be?
I believe the issue here could be that the hmftest checks if the probability ratio of two alternatives depends only on the characteristics of these alternatives.
Since there are only individual-level variables here, the test won't work in this case.
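A separate thing that may be worth checking: the standard errors of the intercepts and of the `first` coefficients in both fits are around 1e4, which is a typical symptom of (quasi-)complete separation, and such fits can leave hmftest with a nearly singular covariance difference to invert. This is a guess from the printed output, not a confirmed diagnosis. A quick cross-tabulation shows whether some levels of `first` never occur with some alternatives. The six rows below are just the head(df) output from the question, stored under the assumed name `df_head`.

```r
# The six rows printed by head(df) in the question
df_head <- data.frame(
  mode  = factor(c("both", "pre", "both", "post", "no", "both"),
                 levels = c("no", "both", "post", "pre")),
  first = factor(c(0, 1, 2, 3, 1, 2)),
  age   = c(24, 23, 53, 43, 55, 63)
)

# Cross-tabulate the categorical predictor against the chosen alternative
tab <- table(first = df_head$first, mode = df_head$mode)
print(tab)

# Count empty cells; many zeros in the full data would point towards separation
zero_cells <- sum(tab == 0)
print(zero_cells)
```

On the full dataset the same table() call would use df$first and df$mode.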
I have some data (df):
inter out time int
0 1 21 0
0 0 32 0
0 1 44 0
0 0 59 0
0 1 88 0
0 1 111 0
0 0 54 0
1 0 63 63
1 1 73 73
1 1 83 83
1 0 93 93
1 1 52 52
1 0 33 33
1 1 10 10
And I run a glm model:
m <- glm(out ~ inter + time + int, data = df, family = binomial(link = "logit"))
The model coefficients are:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.00916 1.82200 -0.554 0.580
inter 2.00906 2.64959 0.758 0.448
time 0.02293 0.03010 0.762 0.446
int -0.03502 0.04215 -0.831 0.406
I want to get the marginal effects, which as I understand it are the predicted probabilities at certain levels of a predictor while holding the other variables constant; in this case, 0 vs. 1 for my binary predictor, 'inter'. If this is incorrect, please let me know. According to [https://rdrr.io/cran/ggeffects/man/ggpredict.html], "All remaining covariates that are not specified in terms are held constant (see 'Details')". The means of time and int are 58.29 and 29.07, respectively, so the formula to get the predicted probabilities is:
Level 0:
sum = -1.00916 + (.02293 * 58.29) + (-.03502 * 29.07)
sume = exp(sum)
sumee <- sume/(1+sume)
sumee = 0.33
Level 1:
sum = -1.00916 + 2.00906 + (.02293 * 58.29) + (-.03502 * 29.07)
sume = exp(sum)
sumee <- sume/(1+sume)
sumee = 0.79
The predicted probability holding other variables constant is 0.79 for level 1 compared to 0.33 for level 0, which is exactly what these ggpredict statements produce:
ggpredict(m, terms = c("inter", "time [mean]"))
ggpredict(m, terms = c("inter"))
However, when I specify 'int' at the mean with "int [mean]", it produces different results:
ggpredict(m, terms = c("inter", "time [mean]", "int [mean]"))
ggpredict(m, terms = c("inter", "int [mean]"))
It says level 0 has a predicted probability of 0.19, compared to 0.64 for level 1. Why? Shouldn't all four commands produce the same results, since R automatically holds the covariates at their means? Using other functions for 'int', such as min and max, e.g. ggpredict(m, terms = c("inter", "time [mean]", "int [min]")), produces results that match the formula.
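The hand calculation can be cross-checked without ggeffects by calling predict() on a grid where inter is 0 or 1 and time and int are fixed at their sample means. This only reproduces the arithmetic above with the data from the question; it does not by itself explain what ggpredict does with "int [mean]". The `newdata` name is introduced here.

```r
# Data as listed in the question
df <- data.frame(
  inter = rep(c(0, 1), each = 7),
  out   = c(1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1),
  time  = c(21, 32, 44, 59, 88, 111, 54, 63, 73, 83, 93, 52, 33, 10),
  int   = c(0, 0, 0, 0, 0, 0, 0, 63, 73, 83, 93, 52, 33, 10)
)

m <- glm(out ~ inter + time + int, data = df, family = binomial(link = "logit"))

# Grid: inter at 0 and 1, the other covariates held at their sample means
newdata <- data.frame(inter = c(0, 1),
                      time  = mean(df$time),
                      int   = mean(df$int))

# Predicted probabilities on the response scale
p <- predict(m, newdata = newdata, type = "response")
print(round(p, 2))
```

Comparing `p` with each ggpredict call shows which of them actually hold time and int at 58.29 and 29.07.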
I'm trying to perform logistic regression with generalized estimating equations (GEE). I use Ideo_Dich (which consists of 0 and 1) as my response variable with:
sample2$Ideo_Dich <- ordered(factor(sample2$Ideo_Dich))
library (multgee)
nomLORgee(Ideo_Dich~ square+round, data=sample2,
id= Politician_ID,repeated=Country_ID)
but I get the following error message:
The response variable should have more than 2 categories
My sample dataset is like:
Politician_ID Country_ID Ideo_Dich round square
<int> <int> <ord> <dbl> <dbl>
1 3917 1 0 0.374 -0.486
2 3921 1 0 0.682 -0.580
3 3931 1 0 0.463 -0.801
4 3932 1 0 0.00806 -0.296
5 3935 1 0 -0.250 -0.485
6 3936 1 0 0.814 -0.684
7 3937 1 0 -0.0876 -0.421
8 3942 1 0 0.630 -0.738
9 3944 1 0 0.0779 -0.499
10 3945 1 0 0.549 -1.30
As I'm new to regression methodologies, I would like some guidance on this.
GEEs are population-averaged models. You need to specify only a single ID in the model, so you can choose either politician or country as the ID, together with a correlation structure.
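# A numeric 0/1 response is the safe choice for a binomial geeglm fit
sample2$Ideo_Dich <- as.numeric(as.character(sample2$Ideo_Dich))
# geeglm assumes that rows belonging to the same cluster are contiguous
sample2 <- sample2[order(sample2$Politician_ID), ]
library(geepack)
fit <- geeglm(Ideo_Dich ~ square + round, family = binomial, data = sample2, id = Politician_ID, corstr = "exchangeable")
summary(fit)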
The issue is that your response variable is binary, not multi-category. This particular function (nomLORgee) is for multinomial GEE regression with more than two response categories. The gee or geepack packages with a binomial GEE should therefore suffice. The code above is a good example of this.
I'm trying to compare the prevalence of a specific lesion (binary) at the symptomatic side to the asymptomatic side within a group of patients.
I've already performed a McNemar test to compare the prevalence at the symptomatic versus asymptomatic side within patients.
However, I'm asked to perform also a conditional logistic regression. I'm not sure if my syntax is correct with respect to the stratification:
summary(clogit(ds$symp ~ ds$asymp, strata(ds$ID), data=ds, method = "exact"))
Does R compare both sides of the patient (symptomatic vs asymptomatic) within the patient(s)? Or do I have to duplicate manually the patient ID (one ID for the symptomatic side AND one ID for the asymptomatic side)?
Thanks,
An example:
ID symp asymp
1 0 0
2 1 0
3 0 0
4 0 0
5 1 0
6 1 1
7 0 0
8 0 0
9 0 1
10 0 0
As an example: patient 2 has a lesion at the symptomatic side only, patient 9 only at the asymptomatic side, and patient 6 at both sides.
An exact McNemar test shows:
test <- table(df$symp, df$asymp)
compare <- exact2x2(test, paired = TRUE, alternative = "two.sided", tsmethod = "central")
print(compare)
Exact McNemar test (with central confidence intervals)
data: test
b = 1, c = 2, p-value = 1
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.00847498 9.60452988
sample estimates:
odds ratio
0.5
However, a conditional logistic regression model:
> summary(clogit(df$symp ~ df$asymp, strata(df$ID), data=df, method = "exact"))
Call:
coxph(formula = Surv(rep(1, 10L), df$symp) ~ df$asymp, data = df,
method = "exact")
n= 10, number of events= 3
coef exp(coef) se(coef) z Pr(>|z|)
df$asymp 0.973 2.646 1.524 0.638 0.523
exp(coef) exp(-coef) lower .95 upper .95
df$asymp 2.646 0.378 0.1334 52.46
Rsquare= 0.039 (max possible= 0.616 )
Likelihood ratio test= 0.4 on 1 df, p=0.528
Wald test = 0.41 on 1 df, p=0.5232
Score (logrank) test = 0.43 on 1 df, p=0.5127
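For comparison, here is a sketch of the long layout that conditional logistic regression usually expects: one row per patient and side, with the lesion as outcome, the side as predictor, and the patient ID placed inside the formula via strata(). The column names `side` and `lesion` and the object names `wide` and `long` are made up here, and the ten rows are the example data above.

```r
library(survival)

# Example data from the question, one row per patient
wide <- data.frame(
  ID    = 1:10,
  symp  = c(0, 1, 0, 0, 1, 1, 0, 0, 0, 0),
  asymp = c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0)
)

# Long format: two rows per patient, one for each side
long <- data.frame(
  ID     = rep(wide$ID, times = 2),
  side   = rep(c(1, 0), each = nrow(wide)),  # 1 = symptomatic, 0 = asymptomatic
  lesion = c(wide$symp, wide$asymp)
)

# strata(ID) goes inside the formula so that sides are compared within patients
fit <- clogit(lesion ~ side + strata(ID), data = long, method = "exact")
print(summary(fit))
```

In this layout only the discordant patients (2 and 5 with a lesion on the symptomatic side only, 9 on the asymptomatic side only) contribute to the estimate, which is the same information McNemar's test uses.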
I am using the survey package to analyse a longitudinal database. The data looks like
personid spellid long.w Dur rc sex 1 10 age
1 1 278 6.4702295519 0 0 47 20 16
2 1 203 2.8175129012 1 1 126 87 62
3 1 398 6.1956669321 0 0 180 6 37
4 1 139 7.2791061847 1 0 104 192 20
7 1 10 3.6617503439 1 0 18 24 25
8 1 3 2.265464682 0 1 168 136 40
9 1 134 6.3180994022 0 1 116 194 35
10 1 272 6.9167936912 0 0 39 119 45
11 1 296 5.354798213 1 1 193 161 62
After the variable SEX I have 10 bootstrap weights, then the variable Age.
The longitudinal weight is given in the column long.w
I am using the following code.
data.1 <- read.table("Panel.csv", sep = ",",header=T)
library(survey)
library(survival)
#### Unweighted model
mod.1 <- summary(coxph(Surv(Dur, rc) ~ age + sex, data.1))
mod.1
coxph(formula = Surv(Dur, rc) ~ age + sex, data = data.1)
n= 36, number of events= 14
coef exp(coef) se(coef) z Pr(>|z|)
age -4.992e-06 1.000e+00 2.291e-02 0.000 1.000
sex 5.277e-01 1.695e+00 5.750e-01 0.918 0.359
exp(coef) exp(-coef) lower .95 upper .95
age 1.000 1.00 0.9561 1.046
sex 1.695 0.59 0.5492 5.232
Concordance= 0.651 (se = 0.095 )
Rsquare= 0.024 (max possible= 0.858 )
### --- Weights
weights <- data.1[,7:16]*data.1$long.w
panel <-svrepdesign(data=data.1,
weights=data.1[,3],
type="BRR",
repweights=weights,
combined.weights=TRUE
)
#### Weighted model
mod.1.w <- svycoxph(Surv(Dur,rc)~ age+ sex ,design=panel)
summary(mod.1.w)
Balanced Repeated Replicates with 10 replicates.
Call:
svycoxph.svyrep.design(formula = Surv(Dur, rc) ~ age + sex, design = panel)
n= 36, number of events= 14
coef exp(coef) se(coef) z Pr(>|z|)
age 0.0198 1.0200 0.0131 1.512 0.131
sex 1.0681 2.9098 0.2336 4.572 4.84e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
age 1.02 0.9804 0.9941 1.047
sex 2.91 0.3437 1.8407 4.600
Concordance= 0.75 (se = 0.677 )
Rsquare= NA (max possible= NA )
Likelihood ratio test= NA on 2 df, p=NA
Wald test = 28.69 on 2 df, p=5.875e-07
Score (logrank) test = NA on 2 df, p=NA
### ----
> panel.2 <-svrepdesign(data=data.1,
+ weights=data.1[,3],
+ type="BRR",
+ repweights=data.1[,7:16],
+ combined.weights=FALSE,
+ )
Warning message:
In svrepdesign.default(data = data.1, weights = data.1[, 3], type = "BRR", :
Data look like combined weights: mean replication weight is 101.291666666667 and mean sampling weight is 203.944444444444
mod.2.w <- svycoxph(Surv(Dur,rc)~ age+ sex ,design=panel.2)
> summary(mod.2.w)
Call: svrepdesign.default(data = data.1, weights = data.1[, 3], type = "BRR",
repweights = data.1[, 7:16], combined.weights = FALSE, )
Balanced Repeated Replicates with 10 replicates.
Call:
svycoxph.svyrep.design(formula = Surv(Dur, rc) ~ age + sex, design = panel.2)
n= 36, number of events= 14
coef exp(coef) se(coef) z Pr(>|z|)
age 0.0198 1.0200 0.0131 1.512 0.131
sex 1.0681 2.9098 0.2336 4.572 4.84e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
age 1.02 0.9804 0.9941 1.047
sex 2.91 0.3437 1.8407 4.600
Concordance= 0.75 (se = 0.677 )
Rsquare= NA (max possible= NA )
Likelihood ratio test= NA on 2 df, p=NA
Wald test = 28.69 on 2 df, p=5.875e-07
Score (logrank) test = NA on 2 df, p=NA
The sum of the longitudinal weights is 7,342. The total number of events should be around 2,357, with 4,985 censored observations, for a "population" of 7,342 individuals.
Do models mod.1.w and mod.2.w take the longitudinal weights into consideration? If they do, why does the summary report only n= 36, number of events= 14?
The design works well for other statistics. For example, the mean of Dur in data.1 is around 4.9 when the sampling design is ignored, and 5.31 with svymean(~Dur, panel.2).
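As far as I understand it, the `n= 36, number of events= 14` line simply counts rows and events in the sample, while the weights enter the estimates. One way to confirm that the design carries the longitudinal weights is to look at weighted totals directly. The toy data below uses the nine rows shown above and only the two replicate-weight columns visible there (renamed `rw1` and `rw10` here, both assumed names), so the standard errors are not meaningful.

```r
library(survey)

# Nine rows from the printed data; rw1 and rw10 are the two visible replicate-weight columns
toy <- data.frame(
  long.w = c(278, 203, 398, 139, 10, 3, 134, 272, 296),
  rc     = c(0, 1, 0, 1, 1, 0, 0, 0, 1),
  rw1    = c(47, 126, 180, 104, 18, 168, 116, 39, 193),
  rw10   = c(20, 87, 6, 192, 24, 136, 194, 119, 161)
)

d <- svrepdesign(data = toy,
                 weights = toy$long.w,
                 repweights = toy[, c("rw1", "rw10")],
                 type = "BRR",
                 combined.weights = TRUE)

# Sum of the sampling weights: the estimated population size
pop <- sum(weights(d, type = "sampling"))
print(pop)

# Weighted total of the event indicator: the estimated number of events
ev <- svytotal(~rc, d)
print(ev)
```

On the real design, sum(weights(panel, type = "sampling")) and svytotal(~rc, panel) should come out near 7,342 and 2,357 if the longitudinal weights are being used.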