I am learning to work with apply family functions and R loops.
I am working with a basic data table that has a y (outcome variable) column and an x (predictor variable) column, with 100 rows.
I have already used the lm() function to run a regression for the data.
Model.1<-lm(y~x, data = data)
Coefficients:
(Intercept) x
13.87 4.89
summary(Model.1)
Residuals:
Min 1Q Median 3Q Max
-4.1770 -1.7005 -0.0011 1.5625 6.4893
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.87039 0.95625 14.51 <2e-16 ***
x 4.88956 0.09339 52.35 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.195 on 98 degrees of freedom
Multiple R-squared: 0.9655, Adjusted R-squared: 0.9651
F-statistic: 2741 on 1 and 98 DF, p-value: < 2.2e-16
anova(Model.1)
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
x 1 13202 13202.5 2740.9 < 2.2e-16 ***
Residuals 98 472 4.8
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
attributes(Model.1)
$names
[1] "coefficients" "residuals" "effects" "rank" "fitted.values" "assign" "qr" "df.residual"
[9] "xlevels" "call" "terms" "model"
$class
[1] "lm"
I now want to randomly sample 100 observations from my "y" and "x" table. This is the function I created to draw the random sample with replacement:
draw_100<-function(){
random_100=sample(data, 100, replace = TRUE)
}
Running random_100 gives me this output:
random_100
x x.1 y
1 8.112187 8.112187 53.69602
2 8.403589 8.403589 53.79438
3 9.541786 9.541786 58.48542
4 8.989281 8.989281 57.08601
5 6.965905 6.965905 46.62331
6 10.167800 10.167800 63.91487
7 10.683152 10.683152 65.84915
8 10.703093 10.703093 66.24738
9 8.337231 8.337231 51.87687
10 13.106177 13.106177 75.94588
11 10.726036 10.726036 65.19384
12 8.601641 8.601641 51.95095
13 10.338696 10.338696 62.92599
14 5.771682 5.771682 42.14190
15 6.161545 6.161545 46.36998
16 9.874543 9.874543 63.67148
17 8.540996 8.540996 58.85341
18 9.866002 9.866002 63.26319
19 8.622546 8.622546 57.05820
20 9.539929 9.539929 64.76654
21 9.498090 9.498090 61.38521
22 8.206142 8.206142 53.43508
23 8.245825 8.245825 58.29646
24 12.192542 12.192542 76.17440
25 6.955028 6.955028 49.73094
26 10.237639 10.237639 65.71210
27 10.927818 10.927818 67.18048
28 8.536011 8.536011 52.97402
29 9.574403 9.574403 60.53908
30 9.507752 9.507752 58.40020
31 5.838214 5.838214 41.93612
32 10.702791 10.702791 64.54986
33 6.704084 6.704084 46.88057
34 12.914798 12.914798 78.99422
35 16.607947 16.607947 96.60247
36 8.334241 8.334241 55.32263
37 12.287914 12.287914 71.46411
38 11.214098 11.214098 68.53254
39 7.722161 7.722161 50.81632
40 14.065276 14.065276 80.31033
41 10.402173 10.402173 64.36506
42 10.984727 10.984727 64.25032
43 8.491214 8.491214 58.36475
44 9.120864 9.120864 61.24240
45 10.251654 10.251654 60.56177
46 4.497277 4.497277 33.20243
47 11.384417 11.384417 68.61502
48 14.033980 14.033980 83.95417
49 9.909422 9.909422 62.27733
50 8.692219 8.692219 55.73567
51 12.864750 12.864750 79.08818
52 9.886267 9.886267 65.87693
53 10.457541 10.457541 61.36505
54 13.395296 13.395296 76.01832
55 10.343134 10.343134 60.84247
56 10.233329 10.233329 65.12074
57 10.756491 10.756491 70.05930
58 9.287774 9.287774 57.65071
59 11.704419 11.704419 72.65211
60 13.075236 13.075236 77.87956
61 12.066161 12.066161 69.34647
62 10.044714 10.044714 65.80648
63 13.331926 13.331926 80.72634
64 10.816099 10.816099 67.11356
65 10.377846 10.377846 63.14035
66 11.824583 11.824583 67.51041
67 7.114326 7.114326 51.80456
68 9.752344 9.752344 59.36107
69 10.869720 10.869720 67.97186
70 10.366262 10.366262 66.28012
71 10.656127 10.656127 67.86625
72 6.246312 6.246312 45.95457
73 8.003875 8.003875 49.29802
74 11.541176 11.541176 67.89918
75 11.799510 11.799510 73.15802
76 9.787112 9.787112 62.90187
77 13.187445 13.187445 80.26162
78 13.019787 13.019787 75.69156
79 3.854378 3.854378 35.82556
80 11.724234 11.724234 71.79034
81 6.953864 6.953864 45.72355
82 12.822231 12.822231 76.93698
83 9.285428 9.285428 59.61610
84 10.259240 10.259240 62.37958
85 10.613086 10.613086 63.91694
86 8.547155 8.547155 54.72216
87 15.069100 15.069100 86.23767
88 7.816772 7.816772 51.41676
89 13.854272 13.854272 88.10100
90 9.495968 9.495968 61.61393
91 9.881453 9.881453 65.24259
92 7.475875 7.475875 50.80777
93 13.286219 13.286219 81.15708
94 9.703433 9.703433 60.75532
95 5.415999 5.415999 42.55981
96 12.997555 12.997555 78.12987
97 11.893787 11.893787 68.97691
98 5.228217 5.228217 37.38417
99 8.392504 8.392504 54.81151
100 8.077527 8.077527 51.47045
I am hitting a road block: how do I take this new random sample of 100 values, fit a regression model to it, and extract the coefficient and standard error?
I thought I might need to use the sapply() function, but I truly believe I am overthinking this, because when I ran the regression model on the R object storing the random sample, the result was identical to Model.1. I am off somewhere; one possible approach is sketched below, after the Model.2 output.
Model.2<-lm(y~x, data = random_100)
Call:
lm(formula = y ~ x, data = random_100)
Coefficients:
(Intercept) x
13.87 4.89
The intercept and slope were identical to Model.1:
Call:
lm(formula = y ~ x, data = random_100)
Residuals:
Min 1Q Median 3Q Max
-4.1770 -1.7005 -0.0011 1.5625 6.4893
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.87039 0.95625 14.51 <2e-16 ***
x 4.88956 0.09339 52.35 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.195 on 98 degrees of freedom
Multiple R-squared: 0.9655, Adjusted R-squared: 0.9651
F-statistic: 2741 on 1 and 98 DF, p-value: < 2.2e-16
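One way past the road block, sketched under the assumption that the original data frame is called data with columns y and x (as in Model.1): sample() applied directly to a data frame samples its columns rather than its rows, so the observations need to be resampled by row index, and coef(summary(...)) then returns the estimate and standard error for each term of the refitted model.
draw_100 <- function() {
  rows <- sample(nrow(data), 100, replace = TRUE)  # resample row indices with replacement
  data[rows, ]                                     # return the resampled observations
}
one_fit <- lm(y ~ x, data = draw_100())
coef(summary(one_fit))                             # Estimate, Std. Error, t value, Pr(>|t|)
# Repeating the draw-and-fit, e.g. 1000 times, and keeping the slope's estimate
# and standard error from each refit:
boot_x <- replicate(1000, coef(summary(lm(y ~ x, data = draw_100())))["x", 1:2])
t(boot_x)                                          # one row per draw: Estimate, Std. Error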
Related
I want to look at the coefficient estimates for all of my different sales teams. I have 20 teams listed in a "Teams" column and about 24 observations for each. However, when I run my regression, I only see 15 of the 20 teams in my model summary. I want to see all of them; any thoughts?
Here is my code and output:
library(magrittr)  # provides the %>% pipe (also loaded by dplyr/tidyverse)
(log_teams <- lm(Worked ~ Team + Activity + Presented + Confirmed + Jobs_Filled + Converted, data = df)) %>%
  summary()
Output:
Call:
lm(formula = Worked ~ Team + Activity + Presented + Confirmed +
Jobs_Filled + Converted, data = WBY)
Residuals:
Min 1Q Median 3Q Max
-4.4035 -1.0048 0.0000 0.8774 5.1677
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.609486 1.869903 1.396 0.1699
TeamCRW 0.110828 1.908735 0.058 0.9540
TeamEMW -1.068797 2.767863 -0.386 0.7013
TeamGSW -0.424508 2.795353 -0.152 0.8800
TeamNS2 -1.234508 2.388392 -0.517 0.6078
TeamNUW -1.458735 2.083549 -0.700 0.4875
TeamOBW 3.224057 2.103054 1.533 0.1324
TeamORT -0.432185 1.884824 -0.229 0.8197
TeamPC1 4.338479 2.115219 2.051 0.0462 *
TeamPC2 -1.002268 2.227166 -0.450 0.6549
TeamPDW 2.560784 2.791501 0.917 0.3640
TeamPLW 1.381216 2.151150 0.642 0.5242
TeamPYW -1.074374 2.799772 -0.384 0.7030
TeamSB2 -0.646769 2.288132 -0.283 0.7788
TeamSYW 2.252061 1.833820 1.228 0.2259
TeamWMO 0.857452 2.302522 0.372 0.7114
Activity -0.000627 0.002906 -0.216 0.8302
Presented 0.162181 0.331876 0.489 0.6275
Confirmed -0.242462 0.317139 -0.765 0.4486
Jobs_Filled -0.025657 0.016099 -1.594 0.1182
Converted 0.006213 0.002610 2.381 0.0217 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.247 on 44 degrees of freedom
(427 observations deleted due to missingness)
Multiple R-squared: 0.5217, Adjusted R-squared: 0.3043
F-statistic: 2.4 on 20 and 44 DF, p-value: 0.007786
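A small diagnostic sketch may help here (assuming the data frame is df, as in the code above). Two things can hide teams from the summary: one team is always absorbed into the (Intercept) as the reference level of the Team factor, and any team whose rows are all dropped because of missing values in the model variables (note the "427 observations deleted due to missingness") gets no coefficient at all.
vars <- c("Worked", "Team", "Activity", "Presented", "Confirmed", "Jobs_Filled", "Converted")
used <- df[complete.cases(df[, vars]), ]           # rows that actually enter the model
table(droplevels(factor(used$Team)))               # teams with usable observations
setdiff(unique(df$Team), unique(used$Team))        # teams lost entirely to missingness
levels(factor(used$Team))[1]                       # the reference level folded into (Intercept)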
I came across a strange behavior of my R console, although it is probably me who made it do something strange rather than a bug.
When I capture the output of a model (say, a regression model as in the example below), save it to a data frame to make it amendable, and then subset the rows I am interested in, I get different results depending on the window width of my console in RStudio.
This would be a minimal working example:
> x <- c(1, 1, 0.5, 0.5, 0, 0)
> y <- c(0, 0, 0.5, 0.5, 1, 1)
>
> model <- lm(y ~ x) # Basic regression model
> output <- summary(model) # Saving the summary
> output
Call:
lm(formula = y ~ x)
Residuals:
1 2 3 4 5 6
2.725e-16 -2.477e-16 -2.484e-17 -2.484e-17 1.242e-17 1.242e-17
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.000e+00 1.195e-16 8.367e+15 <2e-16 ***
x -1.000e+00 1.852e-16 -5.401e+15 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.852e-16 on 4 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 2.917e+31 on 1 and 4 DF, p-value: < 2.2e-16
> op2 <- capture.output(output) # Capturing the summary to make it amendable
> op2 <- data.frame(op2) # Saving the amendable summary as a data frame
> op2
op2
1
2 Call:
3 lm(formula = y ~ x)
4
5 Residuals:
6 1 2 3 4 5 6
7 2.725e-16 -2.477e-16 -2.484e-17 -2.484e-17 1.242e-17 1.242e-17
8
9 Coefficients:
10 Estimate Std. Error t value Pr(>|t|)
11 (Intercept) 1.000e+00 1.195e-16 8.367e+15 <2e-16 ***
12 x -1.000e+00 1.852e-16 -5.401e+15 <2e-16 ***
13 ---
14 Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
15
16 Residual standard error: 1.852e-16 on 4 degrees of freedom
17 Multiple R-squared: 1,\tAdjusted R-squared: 1
18 F-statistic: 2.917e+31 on 1 and 4 DF, p-value: < 2.2e-16
19
> op3 <- op2[9:14,] # I'm only interested in rows 9 to 14 of the summary, so I subset them
> op3 # And printing them works just fine
[1] "Coefficients:"
[2] " Estimate Std. Error t value Pr(>|t|) "
[3] "(Intercept) 1.000e+00 1.195e-16 8.367e+15 <2e-16 ***"
[4] "x -1.000e+00 1.852e-16 -5.401e+15 <2e-16 ***"
[5] "---"
[6] "Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1"
Now, if I reduce the width of my console (which I have set on the top right of my RStudio) and run the code again from op2 onwards, I will get a different result because the row numbers of op2 have changed since they seem dependent on the width of the console.
Like this:
> op2 <- capture.output(output)
> op2 <- data.frame(op2)
> op2
op2
1
2 Call:
3 lm(formula = y ~ x)
4
5 Residuals:
6 1 2 3
7 2.725e-16 -2.477e-16 -2.484e-17
8 4 5 6
9 -2.484e-17 1.242e-17 1.242e-17
10
11 Coefficients:
12 Estimate
13 (Intercept) 1.000e+00
14 x -1.000e+00
15 Std. Error
16 (Intercept) 1.195e-16
17 x 1.852e-16
18 t value Pr(>|t|)
19 (Intercept) 8.367e+15 <2e-16
20 x -5.401e+15 <2e-16
21
22 (Intercept) ***
23 x ***
24 ---
25 Signif. codes:
26 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’
27 0.05 ‘.’ 0.1 ‘ ’ 1
28
29 Residual standard error: 1.852e-16 on 4 degrees of freedom
30 Multiple R-squared: 1,\tAdjusted R-squared: 1
31 F-statistic: 2.917e+31 on 1 and 4 DF, p-value: < 2.2e-16
32
> op3 <- op2[9:14,]
> op3
[1] "-2.484e-17 1.242e-17 1.242e-17 "
[2] ""
[3] "Coefficients:"
[4] " Estimate"
[5] "(Intercept) 1.000e+00"
[6] "x -1.000e+00"
Any idea on (i) why the row numbers of op2 are dependent on the width of my console and (ii) how to avoid this?
Many thanks in advance.
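Not a definitive answer, but two work-arounds can be sketched. print() and capture.output() wrap their lines at getOption("width"), and RStudio keeps that option in step with the width of the console pane, which is why the captured row numbers move around. Either pin the width before capturing, or skip text capture and take the coefficient table straight from the summary object:
old <- options(width = 120)                  # any fixed value wide enough for summary.lm output
op2 <- data.frame(op2 = capture.output(output))
options(old)                                 # restore the previous width setting
op2[9:14, ]                                  # the row numbers now refer to a stable layout
# Width-independent alternative: the coefficient table is stored in the summary object
coef(output)                                 # matrix of Estimate, Std. Error, t value, Pr(>|t|)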
I am trying to see if there is a relationship between the number of bat calls and the time of the pup-rearing season. The pup variable has three categories: "Pre", "Middle", and "Post". When I ask for the summary, it only includes the p-values for Pre and Post pup production. I created a sample data set below. With the sample data set I just get an error; with my actual data set I get the output I described above.
SAMPLE DATA SET:
Calls<- c("55","60","180","160","110","50")
Pup<-c("Pre","Middle","Post","Post","Middle","Pre")
q<-data.frame(Calls, Pup)
q
q1<-lm(Calls~Pup, data=q)
summary(q1)
OUTPUT AND ERROR MESSAGE FROM SAMPLE:
> Calls Pup
1 55 Pre
2 60 Middle
3 180 Post
4 160 Post
5 110 Middle
6 50 Pre
Error in as.character.factor(x) : malformed factor
In addition: Warning message:
In Ops.factor(r, 2) : ‘^’ not meaningful for factors
ACTUAL INPUT FOR MY ANALYSIS:
> pupint <- lm(Calls ~ Pup, data = park2)
summary(pupint)
THIS IS THE OUTPUT I GET FROM MY ACTUAL DATA SET:
Residuals:
Min 1Q Median 3Q Max
-66.40 -37.63 -26.02 -5.39 299.93
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 66.54 35.82 1.858 0.0734 .
PupPost -51.98 48.50 -1.072 0.2927
PupPre -26.47 39.86 -0.664 0.5118
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 80.1 on 29 degrees of freedom
Multiple R-squared: 0.03822, Adjusted R-squared: -0.02811
F-statistic: 0.5762 on 2 and 29 DF, p-value: 0.5683
Overall, I am just wondering why the above output isn't showing "Middle". Sorry my sample data set didn't work out the same, but maybe that error message will help in understanding the problem.
For R to correctly understand a dummy variable, you have to indicate that Pup is a qualitative (categorical) variable by using factor:
> Pup <- factor(Pup)
> q<-data.frame(Calls, Pup)
> q1<-lm(Calls~Pup, data=q)
> summary(q1)
Call:
lm(formula = Calls ~ Pup, data = q)
Residuals:
1 2 3 4 5 6
2.5 -25.0 10.0 -10.0 25.0 -2.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 85.00 15.61 5.444 0.0122 *
PupPost 85.00 22.08 3.850 0.0309 *
PupPre -32.50 22.08 -1.472 0.2374
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 22.08 on 3 degrees of freedom
Multiple R-squared: 0.9097, Adjusted R-squared: 0.8494
F-statistic: 15.1 on 2 and 3 DF, p-value: 0.02716
If you want R to show all categories of the dummy variable, then you must remove the intercept from the regression; otherwise you will fall into the dummy variable trap.
summary(lm(Calls~Pup-1, data=q))
Call:
lm(formula = Calls ~ Pup - 1, data = q)
Residuals:
1 2 3 4 5 6
2.5 -25.0 10.0 -10.0 25.0 -2.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
PupMiddle 85.00 15.61 5.444 0.01217 *
PupPost 170.00 15.61 10.889 0.00166 **
PupPre 52.50 15.61 3.363 0.04365 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 22.08 on 3 degrees of freedom
Multiple R-squared: 0.9815, Adjusted R-squared: 0.9631
F-statistic: 53.17 on 3 and 3 DF, p-value: 0.004234
If you include a categorical variable like Pup in a regression, R includes a dummy variable for each level of that variable except one by default. You can get a coefficient for PupMiddle if you instead omit the intercept coefficient, like this:
q1<-lm(Calls~Pup - 1, data=q)
In this experiment, four different diets were tried on animals. Then researchers measured their effects on blood coagulation time.
## Data :
coag diet
1 62 A
2 60 A
3 63 A
4 59 A
5 63 B
6 67 B
7 71 B
8 64 B
9 65 B
10 66 B
11 68 C
12 66 C
13 71 C
14 67 C
15 68 C
16 68 C
17 56 D
18 62 D
19 60 D
20 61 D
21 63 D
22 64 D
23 63 D
24 59 D
I am trying to fit a linear model for coag~diet by using the function lm in R
Results should look like the following:
> modelSummary$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.100000e+01 1.183216 5.155441e+01 9.547815e-23
dietB 5.000000e+00 1.527525 3.273268e+00 3.802505e-03
dietC 7.000000e+00 1.527525 4.582576e+00 1.805132e-04
dietD -1.071287e-14 1.449138 -7.392579e-15 1.000000e+00
My code thus far, which does not give results that look like the above:
coagulation$x1 <- 1*(coagulation$diet=="B")
coagulation$x2 <- 1*(coagulation$diet=="C")
coagulation$x3 <- 1*(coagulation$diet=="D")
modelSummary <- lm(coag~1+x1+x2+x3, data=coagulation)
"diet" is a character variable and is treated as a factor. So you may leave out the dummy coding and just do:
summary(lm(coag ~ diet, data=coagulation))$coefficients
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 6.100000e+01 1.183216 5.155441e+01 9.547815e-23
# dietB 5.000000e+00 1.527525 3.273268e+00 3.802505e-03
# dietC 7.000000e+00 1.527525 4.582576e+00 1.805132e-04
# dietD 2.991428e-15 1.449138 2.064281e-15 1.000000e+00
Even if "diet" were a numeric variable and you wanted R to treat it as categorical rather than continuous, no dummy coding would be needed; you would just add it as + factor(diet) in the formula.
As you can see, the 1 + is also redundant, since lm calculates the (Intercept) by default. To omit the intercept, you may use 0 + (or - 1).
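For instance (a small sketch of that no-intercept form), with a single factor the coefficients then become the per-diet group means:
summary(lm(coag ~ 0 + diet, data = coagulation))$coefficients
# one row per diet (A, B, C, D); each Estimate is that diet's mean coagulation time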
That presentation is a property of summary(modelSummary) (class summary.lm), not modelSummary (class lm).
summary(modelSummary)$coefficients
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 6.100000e+01 1.183216 5.155441e+01 9.547815e-23
# x1 5.000000e+00 1.527525 3.273268e+00 3.802505e-03
# x2 7.000000e+00 1.527525 4.582576e+00 1.805132e-04
# x3 2.991428e-15 1.449138 2.064281e-15 1.000000e+00
You may also consider coding diet in this manner
coagulation$diet <- factor(coagulation$diet)
modelSummary<-lm(coag~diet,coagulation)
summary(modelSummary)
Call:
lm(formula = coag ~ diet, data = coagulation)
Residuals:
Min 1Q Median 3Q Max
-5.00 -1.25 0.00 1.25 5.00
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.100e+01 1.183e+00 51.554 < 2e-16 ***
dietB 5.000e+00 1.528e+00 3.273 0.003803 **
dietC 7.000e+00 1.528e+00 4.583 0.000181 ***
dietD 2.991e-15 1.449e+00 0.000 1.000000
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I am using the survey package to analyse a longitudinal database. The data looks like
personid spellid long.w Dur rc sex 1 10 age
1 1 278 6.4702295519 0 0 47 20 16
2 1 203 2.8175129012 1 1 126 87 62
3 1 398 6.1956669321 0 0 180 6 37
4 1 139 7.2791061847 1 0 104 192 20
7 1 10 3.6617503439 1 0 18 24 25
8 1 3 2.265464682 0 1 168 136 40
9 1 134 6.3180994022 0 1 116 194 35
10 1 272 6.9167936912 0 0 39 119 45
11 1 296 5.354798213 1 1 193 161 62
After the variable sex I have 10 bootstrap weights, then the variable age.
The longitudinal weight is given in the column long.w.
I am using the following code.
data.1 <- read.table("Panel.csv", sep = ",",header=T)
library(survey)
library(survival)
#### Unweighted model
mod.1 <- summary(coxph(Surv(Dur, rc) ~ age + sex, data.1))
mod.1
coxph(formula = Surv(Dur, rc) ~ age + sex, data = data.1)
n= 36, number of events= 14
coef exp(coef) se(coef) z Pr(>|z|)
age -4.992e-06 1.000e+00 2.291e-02 0.000 1.000
sex 5.277e-01 1.695e+00 5.750e-01 0.918 0.359
exp(coef) exp(-coef) lower .95 upper .95
age 1.000 1.00 0.9561 1.046
sex 1.695 0.59 0.5492 5.232
Concordance= 0.651 (se = 0.095 )
Rsquare= 0.024 (max possible= 0.858 )
### --- Weights
weights <- data.1[,7:16]*data.1$long.w
panel <-svrepdesign(data=data.1,
weights=data.1[,3],
type="BRR",
repweights=weights,
combined.weights=TRUE
)
#### Weighted model
mod.1.w <- svycoxph(Surv(Dur,rc)~ age+ sex ,design=panel)
summary(mod.1.w)
Balanced Repeated Replicates with 10 replicates.
Call:
svycoxph.svyrep.design(formula = Surv(Dur, rc) ~ age + sex, design = panel)
n= 36, number of events= 14
coef exp(coef) se(coef) z Pr(>|z|)
age 0.0198 1.0200 0.0131 1.512 0.131
sex 1.0681 2.9098 0.2336 4.572 4.84e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
age 1.02 0.9804 0.9941 1.047
sex 2.91 0.3437 1.8407 4.600
Concordance= 0.75 (se = 0.677 )
Rsquare= NA (max possible= NA )
Likelihood ratio test= NA on 2 df, p=NA
Wald test = 28.69 on 2 df, p=5.875e-07
Score (logrank) test = NA on 2 df, p=NA
### ----
> panel.2 <-svrepdesign(data=data.1,
+ weights=data.1[,3],
+ type="BRR",
+ repweights=data.1[,7:16],
+ combined.weights=FALSE,
+ )
Warning message:
In svrepdesign.default(data = data.1, weights = data.1[, 3], type = "BRR", :
Data look like combined weights: mean replication weight is 101.291666666667 and mean sampling weight is 203.944444444444
mod.2.w <- svycoxph(Surv(Dur,rc)~ age+ sex ,design=panel.2)
> summary(mod.2.w)
Call: svrepdesign.default(data = data.1, weights = data.1[, 3], type = "BRR",
repweights = data.1[, 7:16], combined.weights = FALSE, )
Balanced Repeated Replicates with 10 replicates.
Call:
svycoxph.svyrep.design(formula = Surv(Dur, rc) ~ age + sex, design = panel.2)
n= 36, number of events= 14
coef exp(coef) se(coef) z Pr(>|z|)
age 0.0198 1.0200 0.0131 1.512 0.131
sex 1.0681 2.9098 0.2336 4.572 4.84e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
age 1.02 0.9804 0.9941 1.047
sex 2.91 0.3437 1.8407 4.600
Concordance= 0.75 (se = 0.677 )
Rsquare= NA (max possible= NA )
Likelihood ratio test= NA on 2 df, p=NA
Wald test = 28.69 on 2 df, p=5.875e-07
Score (logrank) test = NA on 2 df, p=NA
The sum of the longitudinal weights is 7,342. The total number of events should be around 2,357 and the censored observations around 4,985, for a "population" of 7,342 individuals.
Do models mod.1.w and mod.2.w take the longitudinal weights into consideration? If they do, why does the summary report only n= 36, number of events= 14?
The design works well for other statistics: for example, the mean of Dur in data.1 is around 4.9 without considering the sampling design, and 5.31 when I use svymean(~Dur, panel.2).
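One way to check whether the longitudinal weights are actually reaching the design (a sketch, reusing the panel.2 object defined above): pull the weighted totals out of the design and compare them with the roughly 2,357 events and 4,985 censored spells expected. The n= 36, number of events= 14 line would then simply be the unweighted count of sample rows rather than the weighted population.
svytotal(~rc, panel.2)              # weighted total of the 0/1 event indicator, i.e. weighted number of events
svytotal(~factor(rc), panel.2)      # weighted counts of censored (rc = 0) and event (rc = 1) spells
sum(data.1$long.w)                  # total of the longitudinal weights, stated above to be about 7,342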