While fitting a smooth coefficient kernel regression with the help of npscoef {np} in R, I cannot output the standard errors for the regression estimates.
The help page states that if errors = TRUE, asymptotic standard errors should be computed and returned in the resulting smoothcoefficient object.
Based on the example provided by the authors of the np package:
library("np")
data(wage1)
NP.Ydata <- wage1$lwage
NP.Xdata <- wage1[c("educ", "tenure", "exper", "expersq")]
NP.Zdata <- wage1[c("female", "married")]
NP.bw.scoef <- npscoefbw(xdat=NP.Xdata, ydat=NP.Ydata, zdat=NP.Zdata)
NP.scoef <- npscoef(NP.bw.scoef,
betas = TRUE,
residuals = TRUE,
errors = TRUE)
The coefficients are returned by coef(NP.scoef), since betas = TRUE was set:
> str(coef(NP.scoef))
num [1:526, 1:5] 0.146 0.504 0.196 0.415 0.415 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] "Intercept" "educ" "tenure" "exper" ...
But shouldn't the standard errors for the estimates also be returned when errors = TRUE?
I see only one column vector, not 5 for the intercept + 4 explanatory variables:
> str(se(NP.scoef))
num [1:526] 0.015 0.0155 0.0155 0.0268 0.0128 ...
I am confused and would appreciate a clarification.
I am trying to run a linear model in R that does not specify an intercept. The reason is to eventually calculate the sums of squares reduced when an intercept is added. However, I am receiving different results when specifying this model using built-in factor contrasts versus explicitly stating the contrast values (i.e., -.5 and .5).
More specifically, using contrasts() results in a model with 2 terms (no intercept) while explicitly stating the contrast values via a column vector results in the correct model (no intercept and 1 term specifying the contrast).
group <- rep(c("c", "t"), each = 5)
group_cont <- rep(c(-.5, .5), each = 5)
var1 <- runif(10)
var2 <- runif(10)
test_data <- data.frame(
group = factor(group),
group_cont = group_cont,
y = var1,
x = var2
)
contrasts(test_data$group) <- cbind(grp = c(-.5, .5))
summary(lm(y ~ 1 + group, data = test_data)) # full model
summary(lm(y ~ 0 + group, data = test_data)) # weird results
summary(lm(y ~ 0 + group_cont, data = test_data)) # expected
Is there a way to specify a linear model without an intercept, but still use contrasts() to specify the contrast?
lm() asks for a data frame and column names as inputs. When you use contrasts(), you are assigning an attribute to the column in your data frame, which you can inspect directly using the contrasts() function or attr(). However, you are not changing the data type itself. Using your example above:
> str(test_data)
'data.frame': 10 obs. of 4 variables:
$ group : Factor w/ 2 levels "c","t": 1 1 1 1 1 2 2 2 2 2 #### still a factor ####
..- attr(*, "contrasts")= num [1:2, 1] -0.5 0.5 #### NOTE The contrast attribute ####
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr "c" "t"
.. .. ..$ : chr "grp"
$ group_cont: num -0.5 -0.5 -0.5 -0.5 -0.5 0.5 0.5 0.5 0.5 0.5
$ y : num 0.161 0.518 0.417 0.335 0.301 ...
$ x : num 0.34 0.729 0.766 0.629 0.191 ...
> attr(test_data$group, "contrasts")
grp
c -0.5
t 0.5
So an attribute was added, but the type is still a factor, and lm() treats it like a factor, giving you a coefficient for each level. Moreover, passing contrasts() or calling attr() inside lm() will throw an error. Depending on what you want the end result to look like, you may need to explore a different package such as contrast. There is also a contrasts argument in lm(), but I am not 100% sure it is what you are really looking for; see ?lm for more on that.
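One possible workaround (a sketch, not part of the original answer): model.matrix() does honor the contrasts attribute, so you can build the design matrix yourself, pull out the contrast column, and fit the no-intercept model on that.
# model.matrix() applies the "grp" contrast set via contrasts() above
mm <- model.matrix(~ group, data = test_data)  # columns: (Intercept), groupgrp
test_data$grp <- mm[, "groupgrp"]              # the -.5/.5 contrast column
summary(lm(y ~ 0 + grp, data = test_data))     # should match the group_cont fit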
I have a time series of 540 observations which I resample 999 times using the following code:
library(boot)
boot.mean = function(x,i){boot.mean = mean(x[i])}
z1 = boot(x1, boot.mean, R=999)
z1
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = x1, statistic = boot.mean, R = 999)
Bootstrap Statistics :
original bias std. error
t1* -0.009381397 -5.903801e-05 0.002524366
Trying to export the results gives me the following error:
write.csv(z1, "z1.csv")
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
cannot coerce class ""boot"" to a data.frame
How can I export the results to a .csv file?
I am expecting to obtain a file with 540 observations, 999 times; the goal is to apply the approx_entropy function from the pracma package to obtain 999 values for approximate entropy and plot the distribution in LaTeX.
First, please make sure that your example is reproducible. You can do so by generating a small x1 object, or by generating a random x1 vector:
> x1 <- rnorm(540)
Now, from your question:
I am expecting to obtain a file with 540 observations 999 times
However, this is not what you will get. You are generating 999 repetitions of the mean of the resampled data. That means that every bootstrap replicate is actually a single number.
From Heroka's comment:
Hint: look at str(z1).
The function str shows you the actual data inside the z1 object, without the pretty formatting.
> str(z1)
List of 11
$ t0 : num 0.0899
$ t : num [1:999, 1] 0.1068 0.1071 0.0827 0.1413 0.0914 ...
$ R : num 999
$ data : num [1:540] 1.02 1.27 1.82 -2.92 0.68 ...
(... lots of irrelevant stuff here ...)
- attr(*, "class")= chr "boot"
So your original data is stored as z1$data, and the data that you have bootstrapped, which is the mean of each resampling, is stored in z1$t. Notice how it tells you the dimension of each slot: z1$t is 999 x 1.
Now, what you probably want to do is replace the boot.mean function with a boot.identity function, which simply returns the resampled data. It goes like this:
> boot.identity = function(x,i){x[i]}
> z1 = boot(x1, boot.identity, R=999)
> str(z1)
List of 11
$ t0 : num [1:540] 1.02 1.27 1.82 -2.92 0.68 ...
$ t : num [1:999, 1:540] -0.851 -0.434 -2.138 0.935 -0.493 ...
$ R : num 999
$ data : num [1:540] 1.02 1.27 1.82 -2.92 0.68 ...
(... etc etc etc ...)
And you can save this data with write.csv(z1$t, "z1.csv").
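If the eventual goal is the approximate entropy distribution, the last step might look like this (a sketch, assuming the pracma package is installed and using approx_entropy's default settings; running it on 999 series of length 540 can take a while):
library(pracma)
# one approximate-entropy value per bootstrap replicate (each row of z1$t)
apen <- apply(z1$t, 1, approx_entropy)
write.csv(data.frame(apen = apen), "apen.csv", row.names = FALSE)
hist(apen)  # quick look at the distribution before plotting it in LaTeX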
I have the following code, resulting in a table-like output:
lvs <- c("normal", "abnormal")
truth <- factor(rep(lvs, times = c(86, 258)),
levels = rev(lvs))
pred <- factor(
c(
rep(lvs, times = c(54, 32)),
rep(lvs, times = c(27, 231))),
levels = rev(lvs))
xtab <- table(pred, truth)
library(caret)
confusionMatrix(xtab)
confusionMatrix(pred, truth)
confusionMatrix(xtab, prevalence = 0.25)
I would like to export the part of the output below as a .csv table:
Accuracy : 0.8285
95% CI : (0.7844, 0.8668)
No Information Rate : 0.75
P-Value [Acc > NIR] : 0.0003097
Kappa : 0.5336
Mcnemar's Test P-Value : 0.6025370
Sensitivity : 0.8953
Specificity : 0.6279
Pos Pred Value : 0.8783
Neg Pred Value : 0.6667
Prevalence : 0.7500
Detection Rate : 0.6715
Detection Prevalence : 0.7645
Balanced Accuracy : 0.7616
Attempting to write it as a .csv table results in the error message:
write.csv(confusionMatrix(xtab),file="file.csv")
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
cannot coerce class ""confusionMatrix"" to a data.frame
Doing the whole work manually, for obvious reasons, is impractical and prone to human errors.
Any suggestions on how to export it as a .csv?
Using the caret package:
results <- confusionMatrix(pred, truth)
as.table(results) gives
Reference
Prediction X1 X0
X1 36 29
X0 218 727
as.matrix(results,what="overall") gives
Accuracy 7.554455e-01
Kappa 1.372895e-01
AccuracyLower 7.277208e-01
AccuracyUpper 7.816725e-01
AccuracyNull 7.485149e-01
AccuracyPValue 3.203599e-01
McnemarPValue 5.608817e-33
and
as.matrix(results, what = "classes") gives
Sensitivity 0.8953488
Specificity 0.6279070
Pos Pred Value 0.8783270
Neg Pred Value 0.6666667
Precision 0.8783270
Recall 0.8953488
F1 0.8867562
Prevalence 0.7500000
Detection Rate 0.6715116
Detection Prevalence 0.7645349
Balanced Accuracy 0.7616279
Using these together with the write.csv command, you can export the entire confusionMatrix info.
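For example, a minimal sketch that stacks both pieces into a single file (the file name is arbitrary):
overall <- as.matrix(results, what = "overall")
classes <- as.matrix(results, what = "classes")
# both are one-column matrices with the statistic names as row names
write.csv(rbind(overall, classes), "confusion_stats.csv")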
OK, so if you inspect the output of confusionMatrix(xtab, prevalence = 0.25), it's a list:
cm <- confusionMatrix(pred, truth)
str(cm)
List of 5
$ positive: chr "abnormal"
$ table : 'table' int [1:2, 1:2] 231 27 32 54
..- attr(*, "dimnames")=List of 2
.. ..$ Prediction: chr [1:2] "abnormal" "normal"
.. ..$ Reference : chr [1:2] "abnormal" "normal"
$ overall : Named num [1:7] 0.828 0.534 0.784 0.867 0.75 ...
..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
$ byClass : Named num [1:8] 0.895 0.628 0.878 0.667 0.75 ...
..- attr(*, "names")= chr [1:8] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
$ dots : list()
- attr(*, "class")= chr "confusionMatrix"
From here on, you select the appropriate objects that you want to create a .csv from and make a data.frame that will have a column for each variable. In your case, this will be:
tocsv <- data.frame(cbind(t(cm$overall),t(cm$byClass)))
# You can then use
write.csv(tocsv,file="file.csv")
I found that capture.output works best for me.
It simply dumps your printed output into a .csv file
(you can also write it as .txt):
capture.output(
confusionMatrix(xtab, prevalence = 0.25),
file = "F:/Home Office/result.csv")
The absolute easiest solution is to simply write the object out using readr::write_rds. You can export and re-import it while keeping the confusionMatrix structure intact.
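A minimal sketch of that round trip (object and file names are arbitrary; note this produces an .rds file readable only from R, not a .csv):
readr::write_rds(cm, "cm.rds")    # cm is a confusionMatrix object
cm2 <- readr::read_rds("cm.rds")
class(cm2)                        # still "confusionMatrix"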
If A is a caret::confusionMatrix object, then:
broom::tidy(A) %>% writexl::write_xlsx("mymatrix.xlsx")
Optionally, replace writexl::write_xlsx() with write.csv().
To also include the table on a separate sheet:
broom::tidy(A) %>% list(as.data.frame(A$table)) %>% writexl::write_xlsx("mymatrix.xlsx")
I am trying to use a Cox model to predict the probability of failure after time 3 (the time variable is named stop).
library(survival)
bladder1 <- bladder[bladder$enum < 5, ]
coxmodel = coxph(Surv(stop, event) ~ (rx + size + number) +
cluster(id), bladder1)
range(predict(coxmodel, bladder1, type = "lp"))
range(predict(coxmodel, bladder1, type = "risk"))
range(predict(coxmodel, bladder1, type = "terms"))
range(predict(coxmodel, bladder1, type = "expected"))
However, none of the outputs of the predict function are in the 0-1 range. Is there a function for this, or how can I use the lp prediction and the baseline hazard function to calculate a probability?
Please read the help page for predict.coxph. None of those are supposed to be probabilities. The linear predictor for a specific set of covariates is the log-hazard-ratio relative to a hypothetical (and very possibly non-existent) case with the mean of all the predictor values. The 'expected' type comes closest to a probability, since it is a predicted number of events, but it would require specifying the time and then dividing by the number at risk at the beginning of observation.
In the case of the example offered on that help page for predict, you can see that the sum of predicted events is close to the actual number:
> sum(predict(fit,type="expected"), na.rm=TRUE)
[1] 163
> sum(lung$status==2)
[1] 165
I suspect you may want to be working instead with the survfit function, since the probability of an event is 1 minus the probability of survival.
?survfit.coxph
The code for a similar question appears here: Adding column of predicted Hazard Ratio to dataframe after Cox Regression in R
Since you suggested using the bladder1 dataset, this would be the code for a specification of time = 5:
summary(survfit(coxmodel), time=5)
#------------------
Call: survfit(formula = coxmodel)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
5 302 26 0.928 0.0141 0.901 0.956
That returns a list with the survival prediction as an element named $surv:
> str(summary(survfit(coxmodel), time=5))
List of 14
$ n : int 340
$ time : num 5
$ n.risk : num 302
$ n.event : num 26
$ conf.int: num 0.95
$ type : chr "right"
$ table : Named num [1:7] 340 340 340 112 NA 51 NA
..- attr(*, "names")= chr [1:7] "records" "n.max" "n.start" "events" ...
$ n.censor: num 19
$ surv : num 0.928
$ std.err : num 0.0141
$ lower : num 0.901
$ upper : num 0.956
$ cumhaz : num 0.0744
$ call : language survfit(formula = coxmodel)
- attr(*, "class")= chr "summary.survfit"
> summary(survfit(coxmodel), time=5)$surv
[1] 0.9282944
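Tying this back to the question (a sketch building on the above): the probability of failure by a given time is 1 minus the predicted survival at that time, so for the time = 3 originally asked about:
sf <- summary(survfit(coxmodel), time = 3)
1 - sf$surv  # estimated probability of an event by stop = 3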
I am trying to figure out how to plot the profile likelihood curve of a GLM parameter with 95% pLCIs (profile likelihood confidence intervals) on the same plot. The example I have been trying with is below. The plots I am getting are not the likelihood curves that I was expecting: the y-axis of the plots is tau, and I would like that axis to be the likelihood, so that I have a curve that peaks at the parameter estimate. I am not sure where to find those likelihood values; I may just be misinterpreting the theory behind this. Thanks for any help you can give.
Max
clotting <- data.frame(
u = c(5,10,15,20,30,40,60,80,100),
lot1 = c(118,58,42,35,27,25,21,19,18),
lot2 = c(69,35,26,21,18,16,13,12,12))
glm2<-glm(lot2 ~ log(u), data=clotting, family=Gamma)
prof<-profile(glm2)
plot(prof)
Regenerate your example:
clotting <- data.frame(
u = c(5,10,15,20,30,40,60,80,100),
lot1 = c(118,58,42,35,27,25,21,19,18),
lot2 = c(69,35,26,21,18,16,13,12,12))
glm2 <- glm(lot2 ~ log(u), data=clotting, family=Gamma)
The profile.glm function actually lives in the MASS package:
library(MASS)
prof<-profile(glm2)
To figure out what profile.glm and plot.profile are doing, see ?profile.glm and ?plot.profile. However, to dig into the profile object it may also be useful to examine the code of MASS:::profile.glm and MASS:::plot.profile ... basically, what these tell you is that profile returns the signed square root of the difference between the deviance and the minimum deviance, scaled by the dispersion parameter. This is done so that a perfectly quadratic profile will appear as a straight line (it is much easier to detect deviations from a straight line than from a parabola by eye).
The other thing that may be useful to know is how the profile is stored. Basically, it's a list of data frames (one for each parameter profiled), except that the individual data frames are a little bit weird (containing one vector component and one matrix component).
> str(prof)
List of 2
$ (Intercept):'data.frame': 12 obs. of 3 variables:
..$ tau : num [1:12] -3.557 -2.836 -2.12 -1.409 -0.702 ...
..$ par.vals: num [1:12, 1:2] -0.0286 -0.0276 -0.0267 -0.0258 -0.0248 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "(Intercept)" "log(u)"
..$ dev : num [1:12] 0.00622 0.00753 0.00883 0.01012 0.0114 ...
$ log(u) :'data.frame': 12 obs. of 2 variables:
..$ tau : num [1:12] -3.516 -2.811 -2.106 -1.403 -0.701 ...
..$ par.vals: num [1:12, 1:2] -0.0195 -0.0204 -0.0213 -0.0222 -0.023 ...
.. ..- attr(*, "dimnames")=List of 2
It also contains attributes summary and original.fit that you can use to recover the dispersion and minimum deviance:
disp <- attr(prof,"summary")$dispersion
mindev <- attr(prof,"original.fit")$deviance
Now reverse the transformation for parameter 1:
dev1 <- prof[[1]]$tau^2
dev2 <- dev1*disp+mindev
Plot:
plot(prof[[1]][, 1], dev2, type = "b")  # first column of prof[[1]] is tau
(This is the plot of the deviance. You can multiply by 0.5 to get the negative log-likelihood, or -0.5 to get the log-likelihood ...)
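As an illustration (a sketch building on the code above): put the parameter values, rather than tau, on the x-axis and the profile log-likelihood on the y-axis, so the curve peaks at the estimate as the question asked:
plot(prof[[1]]$par.vals[, "(Intercept)"], -0.5 * dev2, type = "b",
     xlab = "(Intercept)", ylab = "profile log-likelihood")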
edit: some more general functions to transform the profile into a useful format for lattice/ggplot plotting ...
tmpf <- function(x,n) {
data.frame(par=n,tau=x$tau,
deviance=x$tau^2*disp+mindev,
x$par.vals,check.names=FALSE)
}
pp <- do.call(rbind,mapply(tmpf,prof,names(prof),SIMPLIFY=FALSE))
library(reshape2)
pp2 <- melt(pp,id.var=1:3)
pp3 <- subset(pp2,par==variable,select=-variable)
Now plot it with lattice:
library(lattice)
xyplot(deviance~value|par,type="b",data=pp3,
scales=list(x=list(relation="free")))
Or with ggplot2:
library(ggplot2)
ggplot(pp3,aes(value,deviance))+geom_line()+geom_point()+
facet_wrap(~par,scale="free_x")
FYI, for fun, I whipped the above together into a single function using purrr::imap_dfr, as I couldn't find a package that implements it.
get_profile_glm <- function(aglm){
  prof <- MASS:::profile.glm(aglm)
  disp <- attr(prof, "summary")$dispersion
  # reverse the transformation: deviance = z^2 * dispersion + minimum deviance
  # (the signed-root column is named "z" here; in other MASS versions it is
  # "tau", as in the str() output further up this page - adjust if needed)
  purrr::imap_dfr(prof, .f = ~data.frame(par = .y,
                                         deviance = .x$z^2 * disp + aglm$deviance,
                                         values = as.data.frame(.x$par.vals)[[.y]],
                                         stringsAsFactors = FALSE))
}
Works great!
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
print(d.AD <- data.frame(treatment, outcome, counts))
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
ggplot(get_profile_glm(glm.D93), aes(x = values, y = deviance)) +
  geom_point() +
  geom_line() +
  facet_wrap(~par, scale = "free_x")