I am using the rms package to fit a Cox model:
rmsfit2 <- cph(Surv(time,stop, Lstate) ~ gender+momedu+sibs+L1.Spostureaway_re+
L1.Sgazeaw_re + L1.Sgazepic_re+ frailty(id), x=TRUE, y=TRUE, surv = TRUE, data=data2step)
I had no problem when using the validate() function validate(rmsfit2, method = "boot", B = 100)
which returned:
index.orig training test optimism index.corrected n
Dxy 0.2711 0.2698 0.2671 0.0027 0.2684 100
R2 0.0582 0.0577 0.0562 0.0015 0.0568 100
Slope 1.0000 1.0000 0.9785 0.0215 0.9785 100
D 0.0122 0.0120 0.0117 0.0003 0.0119 100
U -0.0001 -0.0001 0.0001 -0.0001 0.0001 100
Q 0.0122 0.0121 0.0116 0.0004 0.0118 100
g 0.3162 0.3154 0.3066 0.0088 0.3075 100
However, I couldn't get calibrate() to work. Calling calibrate(rmsfit2, B = 20) returned the error message: Error in reliability[, "index.corrected"] : subscript out of bounds.
I don't know the best way to reproduce this error with the sample data shipped with the survival or rms packages, but does anyone have any insight into this problem and how to make it work? Thank you!
How can I get the parts of the lrm() output separately?
I use lrm() to build a logistic model, and get the result as follows:
n <- 1000 # define sample size
y <- rep(0:1, 500)
age <- rnorm(n, 50, 10)
sex <- factor(sample(c('female','male'), n,TRUE))
f <- lrm(y ~ age + sex, x=TRUE, y=TRUE)
f
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 1000 LR chi2 1.50 R2 0.002 C 0.520
0 500 d.f. 2 g 0.088 Dxy 0.040
1 500 Pr(> chi2) 0.4714 gr 1.092 gamma 0.040
max |deriv| 2e-13 gp 0.022 tau-a 0.020
Brier 0.250
Coef S.E. Wald Z Pr(>|Z|)
Intercept 0.2206 0.3370 0.65 0.5127
age -0.0030 0.0065 -0.46 0.6485
sex=male -0.1455 0.1266 -1.15 0.2504
How can I get each part of the result above as a separate data.frame? For example:
mydf$df1
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 1000 LR chi2 1.50 R2 0.002 C 0.520
0 500 d.f. 2 g 0.088 Dxy 0.040
1 500 Pr(> chi2) 0.4714 gr 1.092 gamma 0.040
max |deriv| 2e-13 gp 0.022 tau-a 0.020
Brier 0.250
mydf$df2
Coef S.E. Wald Z Pr(>|Z|)
Intercept 0.2206 0.3370 0.65 0.5127
age -0.0030 0.0065 -0.46 0.6485
sex=male -0.1455 0.1266 -1.15 0.2504
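Try:
# capture the printed output as a character vector, one element per printed line
res <- capture.output(print(f))
# append each captured line to res.csv
invisible(lapply(res, function(x) write.table(data.frame(x), 'res.csv', append = TRUE, sep = ',')))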
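If structured data frames are wanted rather than captured text, the pieces can also be pulled from the fit object itself. The following is only a sketch: it assumes the rms package is installed and relies on the fit storing its summary statistics in f$stats. The names df1, df2 and mydf simply echo the question, and set.seed() is added only to make the simulated data repeatable.

```r
library(rms)

set.seed(1)  # not in the original post; only for repeatability
n   <- 1000
y   <- rep(0:1, 500)
age <- rnorm(n, 50, 10)
sex <- factor(sample(c("female", "male"), n, TRUE))
f   <- lrm(y ~ age + sex, x = TRUE, y = TRUE)

# Model-level statistics: f$stats is a named numeric vector
df1 <- data.frame(statistic = names(f$stats), value = unname(f$stats))

# Coefficient table rebuilt from the estimates and their covariance matrix
est <- coef(f)
se  <- sqrt(diag(vcov(f)))
z   <- est / se
df2 <- data.frame(Coef = est, SE = se, Wald.Z = z, p = 2 * pnorm(-abs(z)))

mydf <- list(df1 = df1, df2 = df2)
mydf$df2
```

The coefficient table is recomputed here with Wald z statistics, so it should agree with the printed output up to rounding.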
I am using semPaths (semPlot package) to draw my structural equation models. After some trial and error, I have a pretty good script to show what I want. Except, I haven’t been able to figure out how to include the p-value/significance levels of the estimates/regression coefficients in the figure.
Can I include significance levels, for example as a p-value in the edge labels below the estimate, or as a dashed line for non-significant paths? If so, how?
I am also interested in including the R-square, but not as critically as the significance level.
This is the script I am using so far:
semPaths(fitmod.bac.class2,
what = "std",
whatLabels = "std",
style="ram",
edge.label.cex = 1.3,
layout = 'tree',
intercepts=FALSE,
residuals=FALSE,
nodeLabels = c("Negati-\nvicutes","cand_class\n_MB_A2_108", "CO2", "Bacilli","Ignavi-\nbacteria","C/N", "pH","Water\ncontent"),
sizeMan=7 )
Example of one of the SemPath outputs
In this example the following are not significant:
Ignavibacteria -> First_C_CO2_ugC_gC_day, p = 0.096
pH -> Ignavibacteria, p = 0.151
cand_class_MB_A2_108 <-> Bacilli correlation, p = 0.054
I am an R user and not really a coder, so I might just be missing a crucial point in the arguments.
I am testing a lot of different models at the moment, and would really like not to have to draw them all up by hand.
update:
Using semPlotModel: Am I right in understanding that semPlotModel doesn’t include the significance levels from the sem function (see my script and output below)? I am specifically looking to include the P(>|z|) for regressions and covariance.
Am I just missing it, or is it not included? If it is not included, my solution is simply to customize the edge labels.
{model.NA.UP.bac.class2 <- '
#LATENT VARIABLES
#REGRESSIONS
#soil organic carbon quality
c_Negativicutes ~ CN
#microorganisms
First_C_CO2_ugC_gC_day ~ c_Bacilli
First_C_CO2_ugC_gC_day ~ c_Ignavibacteria
First_C_CO2_ugC_gC_day ~ c_cand_class_MB_A2_108
First_C_CO2_ugC_gC_day ~ c_Negativicutes
#pH
c_Bacilli ~pH
c_Ignavibacteria ~pH
c_cand_class_MB_A2_108~pH
c_Negativicutes ~pH
#COVARIANCE
initial_water ~~ CN
c_cand_class_MB_A2_108 ~~ c_Bacilli
'
fitmod.bac.class2 <- sem(model.NA.UP.bac.class2, data=datapNA.UP.log, missing="ml", meanstructure=TRUE, fixed.x=FALSE, std.lv=FALSE, std.ov=FALSE)
summary(fitmod.bac.class2, standardized=TRUE, fit.measures=TRUE, rsq=TRUE)
out <- capture.output(summary(fitmod.bac.class2, standardized=TRUE, fit.measures=TRUE, rsq=TRUE))
}
Output:
lavaan 0.6-5 ended normally after 188 iterations
Estimator ML
Optimization method NLMINB
Number of free parameters 28
Number of observations 30
Number of missing patterns 1
Model Test User Model:
Test statistic 17.816
Degrees of freedom 16
P-value (Chi-square) 0.335
Model Test Baseline Model:
Test statistic 101.570
Degrees of freedom 28
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.975
Tucker-Lewis Index (TLI) 0.957
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) 472.465
Loglikelihood unrestricted model (H1) 481.373
Akaike (AIC) -888.930
Bayesian (BIC) -849.697
Sample-size adjusted Bayesian (BIC) -936.875
Root Mean Square Error of Approximation:
RMSEA 0.062
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.185
P-value RMSEA <= 0.05 0.414
Standardized Root Mean Square Residual:
SRMR 0.107
Parameter Estimates:
Information Observed
Observed information based on Hessian
Standard errors Standard
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
c_Negativicutes ~
CN 0.419 0.143 2.939 0.003 0.419 0.416
c_cand_class_MB_A2_108 ~
CN -0.433 0.160 -2.707 0.007 -0.433 -0.394
First_C_CO2_ugC_gC_day ~
c_Bacilli 0.525 0.128 4.092 0.000 0.525 0.496
c_Ignavibacter 0.207 0.124 1.667 0.096 0.207 0.195
c_c__MB_A2_108 0.310 0.125 2.475 0.013 0.310 0.301
c_Negativicuts 0.304 0.137 2.220 0.026 0.304 0.271
c_Bacilli ~
pH 0.624 0.135 4.604 0.000 0.624 0.643
c_Ignavibacteria ~
pH 0.245 0.171 1.436 0.151 0.245 0.254
c_cand_class_MB_A2_108 ~
pH 0.393 0.151 2.597 0.009 0.393 0.394
c_Negativicutes ~
pH 0.435 0.129 3.361 0.001 0.435 0.476
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
CN ~~
initial_water 0.001 0.000 2.679 0.007 0.001 0.561
.c_cand_class_MB_A2_108 ~~
.c_Bacilli -0.000 0.000 -1.923 0.054 -0.000 -0.388
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.c_Negativicuts 0.145 0.198 0.734 0.463 0.145 3.826
.c_c__MB_A2_108 1.038 0.226 4.594 0.000 1.038 25.076
.Frs_C_CO2_C_C_ -0.346 0.233 -1.485 0.137 -0.346 -8.115
.c_Bacilli 0.376 0.135 2.778 0.005 0.376 9.340
.c_Ignavibacter 0.754 0.170 4.424 0.000 0.754 18.796
CN 0.998 0.007 145.158 0.000 0.998 26.502
pH 0.998 0.008 131.642 0.000 0.998 24.034
initial_water 0.998 0.008 125.994 0.000 0.998 23.003
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.c_Negativicuts 0.001 0.000 3.873 0.000 0.001 0.600
.c_c__MB_A2_108 0.001 0.000 3.833 0.000 0.001 0.689
.Frs_C_CO2_C_C_ 0.001 0.000 3.873 0.000 0.001 0.408
.c_Bacilli 0.001 0.000 3.873 0.000 0.001 0.586
.c_Ignavibacter 0.002 0.000 3.873 0.000 0.002 0.936
CN 0.001 0.000 3.873 0.000 0.001 1.000
initial_water 0.002 0.000 3.873 0.000 0.002 1.000
pH 0.002 0.000 3.873 0.000 0.002 1.000
R-Square:
Estimate
c_Negativicuts 0.400
c_c__MB_A2_108 0.311
Frs_C_CO2_C_C_ 0.592
c_Bacilli 0.414
c_Ignavibacter 0.064
Warning message:
In lav_model_hessian(lavmodel = lavmodel, lavsamplestats = lavsamplestats, :
lavaan WARNING: Hessian is not fully symmetric. Max diff = 5.15131396241486e-05
This example is taken from ?semPaths since we don't have your object.
library('semPlot')
modFile <- tempfile(fileext = '.OUT')
download.file('http://sachaepskamp.com/files/mi1.OUT', modFile)
Use semPlotModel to get the object without plotting, so you can inspect what will be plotted. I just dug around, without reading the docs, until I found what it seems to be using.
After you run semPlotModel, the object has a slot x@Pars, which contains the edges, the nodes, and the std values that are used for the edge labels in your case. semPaths also has an argument for custom edge labels (edgeLabels), so you can take the data you need from x@Pars and add your p-values:
x <- semPlotModel(modFile)
x@Pars
# label lhs edge rhs est std group fixed par
# 1 lambda[11]^{(y)} perfIQ -> pc 1.000 0.6219648 Group 1 TRUE 0
# 2 lambda[21]^{(y)} perfIQ -> pa 0.923 0.5664888 Group 1 FALSE 1
# 3 lambda[31]^{(y)} perfIQ -> oa 1.098 0.6550159 Group 1 FALSE 2
# 4 lambda[41]^{(y)} perfIQ -> ma 0.784 0.4609990 Group 1 FALSE 3
# 5 theta[11]^{(epsilon)} pc <-> pc 5.088 0.6131598 Group 1 FALSE 5
# 10 theta[22]^{(epsilon)} pa <-> pa 5.787 0.6790905 Group 1 FALSE 6
# 15 theta[33]^{(epsilon)} oa <-> oa 5.150 0.5709541 Group 1 FALSE 7
# 20 theta[44]^{(epsilon)} ma <-> ma 7.311 0.7874800 Group 1 FALSE 8
# 21 psi[11] perfIQ <-> perfIQ 3.210 1.0000000 Group 1 FALSE 4
# 22 tau[1]^{(y)} int pc 10.500 NA Group 1 FALSE 9
# 23 tau[2]^{(y)} int pa 10.374 NA Group 1 FALSE 10
# 24 tau[3]^{(y)} int oa 10.663 NA Group 1 FALSE 11
# 25 tau[4]^{(y)} int ma 10.371 NA Group 1 FALSE 12
# 11 lambda[11]^{(y)} perfIQ -> pc 1.000 0.6515609 Group 2 TRUE 0
# 27 lambda[21]^{(y)} perfIQ -> pa 0.923 0.5876948 Group 2 FALSE 1
# 31 lambda[31]^{(y)} perfIQ -> oa 1.098 0.6981974 Group 2 FALSE 2
# 41 lambda[41]^{(y)} perfIQ -> ma 0.784 0.4621919 Group 2 FALSE 3
# 51 theta[11]^{(epsilon)} pc <-> pc 5.006 0.5754684 Group 2 FALSE 14
# 101 theta[22]^{(epsilon)} pa <-> pa 5.963 0.6546148 Group 2 FALSE 15
# 151 theta[33]^{(epsilon)} oa <-> oa 4.681 0.5125204 Group 2 FALSE 16
# 201 theta[44]^{(epsilon)} ma <-> ma 8.356 0.7863786 Group 2 FALSE 17
# 211 psi[11] perfIQ <-> perfIQ 3.693 1.0000000 Group 2 FALSE 13
# 221 tau[1]^{(y)} int pc 10.500 NA Group 2 FALSE 9
# 231 tau[2]^{(y)} int pa 10.374 NA Group 2 FALSE 10
# 241 tau[3]^{(y)} int oa 10.663 NA Group 2 FALSE 11
# 251 tau[4]^{(y)} int ma 10.371 NA Group 2 FALSE 12
# 26 alpha[1] int perfIQ -2.469 NA Group 2 FALSE 18
As you can see, there are more edge labels than are plotted, and I have no idea how it chooses which to use, so I am just taking the first four from each group (since four edges are shown and the stds match those). Maybe there is an option to plot all of them or to select the ones you need; I haven't read the docs.
## take first four stds from each group, generate some p-values
l <- sapply(split(x@Pars$std, x@Pars$group), function(x) head(x, 4))
set.seed(1)
l <- sprintf('%.3f, p=%s', l, format.pval(runif(length(l)), digits = 2))
l
# [1] "0.622, p=0.27" "0.566, p=0.37" "0.655, p=0.57" "0.461, p=0.91" "0.652, p=0.20" "0.588, p=0.90" "0.698, p=0.94" "0.462, p=0.66"
Then you can plot the object with your new labels, edgeLabels = l
layout(1:2)
semPaths(
x,
edgeLabels = l,
ask = FALSE, title = FALSE,
what = 'std',
whatLabels = 'std',
style = 'ram',
edge.label.cex = 1.3,
layout = 'tree',
intercepts = FALSE,
residuals = FALSE,
sizeMan = 7
)
With help from @rawr, I have worked it out. If anybody else needs to include estimates and p-values from lavaan in their semPaths plot, here is how it can be done.
#extracting the parameters from the sem model and selecting the rows relevant for the semPaths plot (here, I need 12 estimates and p-values); %>% needs magrittr or dplyr loaded
table2<-parameterEstimates(fitmod.bac.class2,standardized=TRUE) %>% head(12)
#turning the chosen parameters into text
b<-gettextf('%.3f \n p=%.3f', table2$std.all, digits=table2$pvalue)
I can honestly say that I do not understand how the last bit of script works. This is copied from rawr's answer before a lot of trial and error until it worked. There might (quite possibly) be a nicer way to write it, but it works :)
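For anyone puzzled by that line: as far as I can tell it works because gettextf() hands every extra argument on to sprintf(), and sprintf() ignores argument names, so digits = table2$pvalue simply becomes the second value filled into the format string. A small sketch with made-up numbers (std_all and pvalue are placeholder vectors, not the real model output):

```r
# Made-up values standing in for table2$std.all and table2$pvalue
std_all <- c(0.416, 0.195)
pvalue  <- c(0.003, 0.096)

# Form used above: 'digits' is not a formal argument of gettextf(),
# so it is passed through ... to sprintf() as the second value
a <- gettextf('%.3f \n p=%.3f', std_all, digits = pvalue)

# More direct equivalent without the misleading argument name
b <- sprintf('%.3f \n p=%.3f', std_all, pvalue)

cat(b, sep = "\n---\n")
```

The first %.3f takes the standardized estimate, the \n starts a new line in the label, and the second %.3f takes the p-value.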
#putting that list into edgeLabels in sempaths
semPaths(fitmod.bac.class2,
what = "std",
edgeLabels = b,
style="ram",
edge.label.cex = 1,
layout = 'tree',
intercepts=FALSE,
residuals=FALSE,
nodeLabels = c("Negati-\nvicutes","cand_class\n_MB_A2_108", "CO2", "Bacilli","Ignavi-\nbacteria","C/N", "pH","Water\ncontent"),
sizeMan=7
)
Just a small but relevant improvement to the answer above.
The code above requires inspecting the parameter table to count how many rows to keep, as specified in %>% head(12).
Instead, we can drop from the extracted parameter table the rows where lhs and rhs are equal (the variances) and keep only the rows where they differ.
#extracting the parameters from the sem model and selecting the rows relevant for the semPaths plot
table2<-parameterEstimates(fitmod.bac.class2,standardized=TRUE)%>%as.data.frame()
table2<-table2[table2$lhs!=table2$rhs,]
If the model syntax also contains extra lines such as defined parameters (':='), these appear in the parameter table as well and should be removed too.
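One way to handle all of these cases at once is to filter on the op column. The sketch below uses a hand-made data frame (pe) that only imitates a few rows of a parameterEstimates() table; the values are copied from the summary output in the question, but the table itself is illustrative.

```r
# Hand-made stand-in for a few rows of parameterEstimates(fit, standardized = TRUE)
pe <- data.frame(
  lhs     = c("c_Negativicutes", "c_Bacilli", "CN", "CN", "pH"),
  op      = c("~", "~", "~~", "~~", "~1"),
  rhs     = c("CN", "pH", "initial_water", "CN", ""),
  std.all = c(0.416, 0.643, 0.561, 1.000, 24.034),
  pvalue  = c(0.003, 0.000, 0.007, 0.000, 0.000),
  stringsAsFactors = FALSE
)

# Keep regressions and covariances between two different variables;
# this drops variances (lhs == rhs), intercepts ("~1") and defined parameters (":=")
keep  <- pe$op %in% c("~", "~~") & pe$lhs != pe$rhs
edges <- pe[keep, ]

# Build the edge labels from the remaining rows
b <- sprintf("%.3f\np=%.3f", edges$std.all, edges$pvalue)
b
```

It is still worth checking that the order of the labels matches the order in which semPaths draws the edges before trusting the figure.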
The rest stays the same:
#turning the chosen parameters into text
b<-gettextf('%.3f \n p=%.3f', table2$std.all, digits=table2$pvalue)
#putting that list into edgeLabels in sempaths
semPaths(fitmod.bac.class2,
what = "std",
edgeLabels = b,
style="ram",
edge.label.cex = 1,
layout = 'tree',
intercepts=FALSE,
residuals=FALSE,
nodeLabels = c("Negati-\nvicutes","cand_class\n_MB_A2_108", "CO2", "Bacilli","Ignavi-\nbacteria","C/N", "pH","Water\ncontent"),
sizeMan=7
)
I have a correlation matrix in Excel, as follows:
dfA <- read.table(text=
"beta1 beta2 beta3 beta4 beta5 beta6 X X2 X3
beta1 1.0000 -0.2515 -0.2157 0.7209 -0.7205 0.4679 0.1025 -0.3606 -0.0356
beta2 -0.2515 1.0000 0.9831 0.1629 -0.1654 -0.5595 -0.0316 0.0946 0.0829
beta3 -0.2157 0.9831 1.0000 0.1529 -0.1559 -0.4976 -0.0266 0.0383 0.0738
beta4 0.7209 0.1629 0.1529 1.0000 -1.0000 -0.2753 0.0837 -0.1445 0.0080
beta5 0.4679 -0.5595 -0.4976 -0.2753 1.0000 0.2757 0.0354 -0.3149 -0.0596
beta6 -0.7205 -0.1654 -0.1559 -1.0000 0.2757 1.0000 -0.0837 0.1451 -0.0081
X 0.1025 -0.0316 -0.0266 0.0837 -0.0837 0.0354 1.0000 0.0278 -0.0875
X2 -0.3606 0.0946 0.0383 -0.1445 0.1451 -0.3149 0.0278 1.0000 0.2047
X3 -0.0356 0.0829 0.0738 0.0080 -0.0081 -0.0596 -0.0875 0.2047 1.0000",
header=TRUE)
I have only the correlation matrix, not the original data from which it was computed, so I tried to convert it to a matrix in R with this code:
B <- as.matrix(dfA)
But when I try to draw a correlation plot with the following code:
library(corrplot)
corrplot(B, method="circle")
I receive the error:
Error in corrplot(B, method = "circle") : The matrix is not in [-1, 1]!
Kindly help me with this problem.
corrplot() Solution
Update to my first post using ggplot based on user20650's comments above. user20650 shows that the likely source of error was rounding mistakes leading to some numbers being out of the permissible [-1,1] range and that rounding solves this issue. I was able to produce a plot using corrplot() as well.
At this point, running corrplot() yields the following plot:
corMat<-as.matrix(dfA)
library('corrplot')
corrplot(corMat, method='circle')
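If the error still appears, a quick base R check can show whether any entries fall outside [-1, 1], and rounding or clamping can repair tiny overshoots. This is a sketch on a made-up 2 x 2 matrix (B, B_rounded and B_clamped are illustrative names), not on the data from the question:

```r
# Made-up matrix with a tiny overshoot beyond 1
B <- matrix(c(1, 1 + 1e-9, 1 + 1e-9, 1), nrow = 2,
            dimnames = list(c("a", "b"), c("a", "b")))

range(B)                               # shows whether anything lies outside [-1, 1]
which(abs(B) > 1, arr.ind = TRUE)      # locates the offending cells

# Option 1: round to the precision of the source table
B_rounded <- round(B, 4)

# Option 2: clamp values back into [-1, 1]
B_clamped <- B
B_clamped[B_clamped >  1] <-  1
B_clamped[B_clamped < -1] <- -1

all(abs(B_rounded) <= 1)
all(abs(B_clamped) <= 1)
```

Either repaired matrix can then be passed to corrplot() in place of the original.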
ggplot() Solution
You can also do this in ggplot2 with a few additional steps. I personally think it looks much better.
1) I get rid of the redundant information in the lower triangle of the matrix.
corMat[lower.tri(corMat)]<-NA
> print(corMat)
beta1 beta2 beta3 beta4 beta5 beta6 X X2 X3
beta1 1 -0.2515 -0.2157 0.7209 0.4679 -0.7205 0.1025 -0.3606 -0.0356
beta2 NA 1.0000 0.9831 0.1629 -0.5595 -0.1654 -0.0316 0.0946 0.0829
beta3 NA NA 1.0000 0.1529 -0.4976 -0.1559 -0.0266 0.0383 0.0738
beta4 NA NA NA 1.0000 -0.2753 -1.0000 0.0837 -0.1445 0.0080
beta5 NA NA NA NA 1.0000 0.2757 -0.0837 0.1451 -0.0081
beta6 NA NA NA NA NA 1.0000 0.0354 -0.3149 -0.0596
X NA NA NA NA NA NA 1.0000 0.0278 -0.0875
X2 NA NA NA NA NA NA NA 1.0000 0.2047
X3 NA NA NA NA NA NA NA NA 1.0000
2) Then I use reshape2::melt() to transform the matrix into long form and create a formatted version of values that only show up to two decimal places. This will be useful for the plot.
library(reshape2)
m<-melt(corMat)
m<-data.frame(m[!is.na(m[,3]),]) # get rid of the NA matrix entries
m$value_lab<-sprintf('%.2f',m$value)
Here's what the data looks like:
> head(m)
Var1 Var2 value value_lab
1 beta1 beta1 1.0000 1.00
10 beta1 beta2 -0.2515 -0.25
11 beta2 beta2 1.0000 1.00
19 beta1 beta3 -0.2157 -0.22
20 beta2 beta3 0.9831 0.98
21 beta3 beta3 1.0000 1.00
3) Finally, I feed this data into ggplot2 - primarily relying on geom_tile() to print the matrix and geom_text() to print the labels over each tile. You can dress this up more if you want.
library(ggplot2)
ggplot(m, aes(Var2, Var1, fill = value, label=value_lab),color='blue') +
geom_tile() +
geom_text() +
xlab('')+
ylab('')+
theme_minimal()
I want to perform variance ratio tests (Lo-MacKinlay, Chow-Denning), but I have some problems running the commands.
I have a price Index for 1957 to 2007. Do I need to perform the variance ratio tests on the level series or on the series of returns?
How do you set kvec? It is a vector of the lags for which you want to run the test, right?
So here is my output:
> rcorr
[1] 0.0000 -0.1077 0.4103 -0.0347 0.1136 0.0286 0.0104 0.0104 0.1915
[10] -0.0025 0.0665 0.2127 0.0116 -0.1288 0.1640 0.3089 0.2098 -0.1071
[19] -0.2079 -0.1082 0.0022 0.1419 0.0641 -0.0082 -0.1163 -0.1731 0.0260
[28] 0.0468 0.0882 0.2640 0.3946 0.2094 0.2754 0.0623 -0.3696 -0.1095
[37] -0.1463 0.0118 0.0152 -0.0103 0.0223 0.0379 0.0580 -0.0091 -0.0510
[46] 0.0765 0.0984 0.1250 0.0519 0.1623 0.2552
> kvec<--c(2,5,10)
> Lo.Mac(rcorr,kvec)
Error in y[index] : only 0's may be mixed with negative subscripts
Why do I get this error?
It is the same error as in your other question I just answered:
kvec<--c(2,5,10)
is the same as
kvec <- -c(2,5,10)
ie
kvec <- -1 * c(2,5,10)
Remove the second dash.
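A short base R sketch of what happens (kvec_bad, kvec_good and y are illustrative names, and the indexing line only imitates the kind of subsetting that fails inside the test function):

```r
kvec_bad<--c(2,5,10)        # parsed as kvec_bad <- -c(2, 5, 10)
kvec_bad                    # all three values are negative

kvec_good <- c(2, 5, 10)    # the intended assignment

# Mixing negative and positive subscripts is not allowed,
# which is the kind of error that surfaces when kvec is negative
y   <- 1:20
res <- tryCatch(y[c(kvec_bad[1], 3)], error = function(e) e)
inherits(res, "error")
```

Putting spaces around <- makes this kind of slip much easier to spot.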