How to extract the baseline hazard function h0(t) from a glmnet object in R?

Extract the baseline hazard function h0(t) from glmnet object
I want to compute the hazard function at time t, h(t, X) = h0(t) exp[Σ βi*Xi]. How can I extract the baseline hazard function h0(t) from a glmnet object in R?
As far as I know, the basehaz() function in the survival package extracts the baseline hazard from coxph objects only.
I also found the function glmnet.basesurv(time, event, lp, times.eval = NULL, centered = FALSE), but calling it fails with this error:
Error: could not find function "glmnet.basesurv"
Below is my code, which uses glmnet to fit the Cox model and obtain the coefficients of the selected variables. Is it possible to get the baseline hazard function h0(t) from this glmnet object?
Code
library(survival)
library(glmnet)
# Split data into training data and testing data
set.seed(101)
train_ratio <- 2/3
train_idx <- sample.int(nrow(x), floor(train_ratio * nrow(x)), replace = FALSE)
x.train <- x[train_idx, ]
x.test <- x[-train_idx, ]
y.train <- y[train_idx, ]
y.test <- y[-train_idx, ]
surv_obj <- Surv(y.train[, 1], y.train[, 2])
# Fit the Cox model with the elastic net penalty
my_alpha <- 0.5
fit <- glmnet(x = x.train, y = surv_obj, family = "cox", alpha = my_alpha)
plot(fit, xvar = "lambda", main = "cox model coefficient paths(glmnet.fit)\n\n") # plot the coefficient paths
fit
# cross validation to find out best lambda
cv_fit = cv.glmnet(x = x.train,y = surv_obj , family = "cox",nfolds = 10,alpha = my_alpha)
tencrossfit <- cv_fit$glmnet.fit
plot(cv_fit, main="Cross-validated Deviance(10 folds cv.glmnet.fit)\n\n")
plot(tencrossfit, main="cox model coefficient paths(10 folds cv.glmnet.fit)\n\n")
max(cv_fit$cvm)
summary(cv_fit$cvm)
cv_fit$lambda.min
cv_fit$lambda.1se
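coef.min = coef(cv_fit, s = "lambda.1se") # despite the name, these are the coefficients at lambda.1se
pred_min_value2 <- predict(cv_fit, s = cv_fit$lambda.min, newx = x.test, type = "link") # linear predictor on the test set at lambda.min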
I really appreciate any help you can provide.

The glmnet.basesurv function is part of the hdnom package (which is available on CRAN), not glmnet itself. So install that, and then call it.

I had a similar question. After installing hdnom with install.packages("hdnom"), I checked the function list with library(help = "hdnom")
and saw that the function is actually glmnet_survcurve(). I got it working by calling hdnom:::glmnet_survcurve(). Here is an example:
library(survival)
library(glmnet)
S <- Surv(data$survtimed, data$outcome)
X_glm <- model.matrix(S ~ ., data[, c("factor1", "factor2")])
cox_model <- glmnet(X_glm, S, family = "cox", alpha = 1, lambda = 0.2)
times <- c(1, 2) # evaluate survival and linear predictors at times 1 and 2
predictions <- hdnom:::glmnet_survcurve(cox_model, S[, 1], S[, 2], X_glm, survtime = times)
predictions$p[, 1] # survival probability at time 1
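If you would rather not rely on hdnom, the baseline hazard can also be computed by hand from the linear predictor. The following is a minimal sketch of the Breslow estimator in base R. It is not hdnom's or glmnet's own code. The function name breslow_basehaz and the toy data are made up for illustration. With a real glmnet fit, lp would presumably come from predict(fit, newx = x.train, s = ..., type = "link") on the training rows.

```r
# Breslow estimate of the baseline hazard for a Cox model.
# time, event: observed times and 0/1 event indicators (training data)
# lp: linear predictor (sum of beta_i * x_i) for the same rows
breslow_basehaz <- function(time, event, lp) {
  event_times <- sort(unique(time[event == 1]))
  risk <- exp(lp)
  # Hazard increment at each event time:
  # number of events at that time / sum of exp(lp) over subjects still at risk
  increments <- vapply(event_times, function(tt) {
    sum(event == 1 & time == tt) / sum(risk[time >= tt])
  }, numeric(1))
  data.frame(time = event_times, hazard = increments, cumhaz = cumsum(increments))
}

# Toy data, made up for illustration only
time  <- c(2, 4, 4, 5, 7, 8)
event <- c(1, 1, 0, 1, 0, 1)
lp    <- c(0.3, -0.2, 0.1, 0.0, -0.4, 0.5)
bh <- breslow_basehaz(time, event, lp)

# Survival curve for a new subject with linear predictor 0.2:
# S(t | x) = exp(-H0(t) * exp(lp_new))
surv_new <- exp(-bh$cumhaz * exp(0.2))
```

The hazard column holds the jump of the baseline hazard at each event time and cumhaz the cumulative baseline hazard H0(t). When all linear predictors are zero, this reduces to the Nelson-Aalen estimator.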

Related

Is it possible to adapt standard prediction interval code for dlm in R with other distribution?

Using the dlm package in R, I fit a dynamic linear model to a time series of 20 observations. I then use the dlmForecast function to predict future values (which I can validate against the actual data for that period).
I use the following code to create a prediction interval:
ciTheory <- (outer(sapply(fut1$Q, FUN=function(x) sqrt(diag(x))), qnorm(c(0.05,0.95))) +
as.vector(t(fut1$f)))
However, my data do not follow a normal distribution, and I wondered whether it would be possible to
replace qnorm with the quantile function of another distribution. I have tried qt, but I have been unable to apply qgamma.
Does anyone know how to go about this?
Below is a reproduction of my code:
library(dlm)
data <- c(20.68502, 17.28549, 12.18363, 13.53479, 15.38779, 16.14770, 20.17536, 43.39321, 42.91027, 49.41402, 59.22262, 55.42043)
mod.build <- function(par) {
dlmModPoly(1, dV = exp(par[1]), dW = exp(par[2]))
}
# Maximum likelihood estimates of the variance parameters
# (a2 is not defined in this snippet; it is presumably the observed series)
mle <- dlmMLE(a2, rep(0, 2), mod.build)
if (mle$convergence == 0) print("converged") else print("did not converge")
# (a1, v and w are not defined in this snippet either)
mod1 <- dlmModPoly(dV = v, dW = c(0, w))
mod1Filt <- dlmFilter(a1, mod1)
fut1 <- dlmForecast(mod1Filt, n = 7)
Cheers
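One way to think about swapping out qnorm is sketched below in base R. It is a sketch under stated assumptions, not something the dlm package provides. The vectors f and q_var are hypothetical stand-ins for as.vector(fut1$f) and the forecast variances taken from fut1$Q, and both helper functions are made up for illustration. qt can replace qnorm directly because the t distribution is a location-scale family. qgamma cannot, because the gamma is not symmetric around a location, so the sketch instead matches a gamma distribution to each forecast mean and variance (moment matching), which requires positive forecasts.

```r
# Hypothetical stand-ins for as.vector(fut1$f) and the forecast variances in fut1$Q
f     <- c(50, 52, 54)
q_var <- c(4, 9, 16)

# Location-scale interval: mean + sd * quantile of a standardised distribution.
# Works for qnorm and for qt (pass df = ...). With qt, sqrt(q_var) acts as a
# scale parameter, so the interval is wider than a variance-matched one.
pred_interval <- function(f, q_var, qfun = qnorm, probs = c(0.05, 0.95), ...) {
  ci <- outer(sqrt(q_var), qfun(probs, ...)) + as.vector(f)
  colnames(ci) <- paste0(100 * probs, "%")
  ci
}

# Gamma interval by moment matching: mean = shape / rate, variance = shape / rate^2
gamma_interval <- function(f, q_var, probs = c(0.05, 0.95)) {
  shape <- f^2 / q_var
  rate  <- f / q_var
  ci <- matrix(qgamma(rep(probs, each = length(f)), shape = shape, rate = rate),
               nrow = length(f))
  colnames(ci) <- paste0(100 * probs, "%")
  ci
}

ci_norm  <- pred_interval(f, q_var)
ci_t     <- pred_interval(f, q_var, qfun = qt, df = 5)
ci_gamma <- gamma_interval(f, q_var)
```

Note that this only changes the shape assumed for the forecast distribution. The model fitted by dlm is still Gaussian, so the forecast means and variances themselves are unchanged.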

ARFIMA model and accuracy function

I am forecasting with data sets from the fpp2 and forecast packages. My intention is to automate forecasting across several time series, so I wrapped the forecasting step in a function. You can see the code below:
# CODE
library(fpp2)
library(dplyr)
library(forecast)
df<-qauselec
# Forecasting function
fct_fun <- function(Z, hrz = forecast_horizon) {
timeseries <- msts(Z, start = 1956, seasonal.periods = 4)
forecast <- arfima(timeseries)
}
acc_list <- lapply(X = df, fct_fun)
The next step is to check the accuracy of the model, which I am trying to do with this line of code:
accurancy_arfima <- lapply(acc_list, accuracy)
Until now, the accuracy function has worked perfectly with other models such as snaive and ets, but it does not work properly with arfima.
Can anybody help me resolve this problem with the accuracy function?
According to the R documentation, accuracy() returns a range of summary measures of forecast accuracy. If x is provided, the function measures test set forecast accuracy based on x - f. If x is not provided, the function only produces training set accuracy measures of the forecasts based on f["x"] - fitted(f).
The usage is:
accuracy(f, x, test = NULL, d = NULL, D = NULL, ...)
So:
accuracy(acc_list[[1]]$fitted, df)
If you want to evaluate the accuracy separately, this will work:
a <- c()
for (i in 1:4) {
b <- accuracy(df[i], acc_list[[1]]$fitted[i])
a <- rbind(a,b)
}
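The training-set measures mentioned in the documentation excerpt can also be computed by hand from the observed series and the fitted values, which may help when accuracy() does not accept a particular model object. The sketch below uses base R only. The vectors x and fitted are made-up numbers for illustration, and the formulas are the standard definitions of these measures, not code taken from the forecast package.

```r
# Hypothetical observed series and fitted values, for illustration only
x      <- c(10, 12, 11, 13, 15)
fitted <- c(9.5, 12.5, 11, 12, 16)

e <- x - fitted  # errors: observed minus fitted

measures <- c(
  ME   = mean(e),                  # mean error
  RMSE = sqrt(mean(e^2)),          # root mean squared error
  MAE  = mean(abs(e)),             # mean absolute error
  MPE  = mean(100 * e / x),        # mean percentage error
  MAPE = mean(abs(100 * e / x))    # mean absolute percentage error
)
round(measures, 3)
```

With a fitted arfima model, x would be the original series and fitted the model's fitted values, assuming both have the same length.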

How to predict in kknn function? library(kknn)

I am trying to use kknn inside a loop to build a leave-one-out cross-validation for a model, and to compare that with train.kknn.
I have split the data into two parts: training (80% of the data) and test (20% of the data). Within the training data, I exclude one point per loop iteration to create the LOOCV manually.
I think something goes wrong in predict(knn.fit, data.test). I have looked through the kknn package documentation and online for how to predict with kknn, but all the examples use "summary(model)" and "table(validation...)" rather than prediction on separate test data. The code predict(model, dataset) works with train.kknn, so I thought I could use similar arguments with kknn.
I am not sure whether kknn has such a prediction function. If it does, what arguments should I give it?
I look forward to your suggestions. Thank you.
library(kknn)
for (i in 1:nrow(data.train)) {
train.data <- data.train[-i,]
validation.data <- data.train[i,]
knn.fit <- kknn(as.factor(R1)~., train.data, validation.data, k = 40,
kernel = "rectangular", scale = TRUE)
# train.data + validation.data is the 80% data I split.
}
pred.knn <- predict(knn.fit, data.test) # data.test is 20% data.
Here is the error message:
Error in switch(type, raw = object$fit, prob = object$prob,
stop("invalid type for prediction")) : EXPR must be a length 1
vector
What I actually want is to compare train.kknn with kknn + loop on the results of the leave-one-out CV. I have two more questions:
1) In kknn: is it possible to use another data set as test data to see the knn.fit predictions?
2) In train.kknn: I split the data, use 80% of it for training, and intend to use the remaining 20% for prediction. Is this correct, common practice?
Or should I just pass the original (whole) data set to train.kknn, and build a loop that uses data[-i,] for training and data[i,] for validation in kknn, so that the two are counterparts?
I find that if I use the training data in train.kknn and then predict on the test data set, the best k and kernel are selected and used directly to generate the predicted values for the test data set.
In contrast, if I use kknn and loop over different k values, the model generates predictions for the test data set each time k changes. In the end, with kknn + loop, the best k is the one with the best actual prediction accuracy on the test data. In short, the best k selected by train.kknn may not work best on the test data.
Thank you.
For objects returned by kknn, predict gives the predicted value or the predicted probabilities of R1 for the single row contained in validation.data:
predict(knn.fit)
predict(knn.fit, type="prob")
The predict command also works on objects returned by train.kknn.
For example:
train.kknn.fit <- train.kknn(as.factor(R1)~., data.train, ks = 10,
kernel = "rectangular", scale = TRUE)
class(train.kknn.fit)
# [1] "train.kknn" "kknn"
pred.train.kknn <- predict(train.kknn.fit, data.test)
table(pred.train.kknn, as.factor(data.test$R1))
The train.kknn command implements a leave-one-out method very close to the loop developed by @vcai01. See the following example:
set.seed(43210)
n <- 500
data.train <- data.frame(R1=rbinom(n,1,0.5), matrix(rnorm(n*10), ncol=10))
library(kknn)
pred.kknn <- array(0, nrow(data.train))
for (i in 1:nrow(data.train)) {
train.data <- data.train[-i,]
validation.data <- data.train[i,]
knn.fit <- kknn(as.factor(R1)~., train.data, validation.data, k = 40,
kernel = "rectangular", scale = TRUE)
pred.kknn[i] <- predict(knn.fit)
}
knn.fit <- train.kknn(as.factor(R1)~., data.train, ks = 40,
kernel = "rectangular", scale = TRUE)
pred.train.kknn <- predict(knn.fit, data.train)
table(pred.train.kknn, pred.kknn)
#                pred.kknn
# pred.train.kknn   1   2
#               0 374  14
#               1   9 103
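On the first follow-up question (using another data set as test data with kknn): the error message in the question suggests that predict() on a kknn object treats its second argument as type rather than as new data. A minimal sketch, assuming the kknn package is installed, is to pass the test set as the third argument of kknn() itself and then call predict() without new data. Here iris stands in for the asker's data and Species plays the role of R1; the value k = 7 is arbitrary.

```r
library(kknn)

set.seed(1)
# iris stands in for the asker's data; Species plays the role of R1
idx        <- sample(nrow(iris), 120)
data.train <- iris[idx, ]
data.test  <- iris[-idx, ]

# kknn() fits and predicts in one call: the third argument is the data to predict
knn.fit <- kknn(Species ~ ., data.train, data.test, k = 7,
                kernel = "rectangular", scale = TRUE)

pred.test <- predict(knn.fit)                 # predicted classes for data.test
prob.test <- predict(knn.fit, type = "prob")  # class probabilities for data.test
table(pred.test, data.test$Species)
```

In the loop from the question, knn.fit only ever holds the prediction for the single left-out row, which is why a separate kknn() call with data.test is needed for the 20% hold-out.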

How to solve "impacts()" neighbors length error after running spdep::lagsarlm (Spatial Autoregressive Regression model)?

I have 9,150 polygons in my dataset. I was trying to run a spatial autoregressive model (SAR) in spdep to test spatial dependence of my outcome variable. After running the model, I wanted to examine the direct/indirect impacts, but encountered an error that seems to have something to do with the length of neighbors in the weights matrix not being equal to n.
I tried running the very same equation as SLX model (Spatial Lag X), and impacts() worked fine, even though there were some polygons in my set that had no neighbors. I Googled and looked at spdep documentation, but couldn't find a clue on how to solve this error.
# Defining queen contiguity neighbors for polyset and storing the matrix as list
q.nbrs <- poly2nb(polyset)
listweights <- nb2listw(q.nbrs, zero.policy = TRUE)
# Defining the model
model.equation <- TIME ~ A + B + C
# Run SAR model
reg <- lagsarlm(model.equation, data = polyset, listw = listweights, zero.policy = TRUE)
# Run impacts() to show direct/indirect impacts
impacts(reg, listw = listweights, zero.policy = TRUE)
Error in intImpacts(rho = rho, beta = beta, P = P, n = n, mu = mu, Sigma = Sigma, :
length(listweights$neighbours) == n is not TRUE
I know that this is a question from 2019, but maybe this can help people dealing with the same problem. In my case, the problem was the type of the dataset: data = polyset should be of class "SpatialPolygonsDataFrame", which can be achieved by converting your data:
polyset_spatial_sf <- sf::as_Spatial(polyset, IDs = polyset$ID)
Then rerun your code.

The right way to use R reliability plot

For classification problems, I have been using balanced accuracy, sensitivity and specificity to evaluate models. Recently, I read that calibration can capture aspects of model quality that accuracy and AUC cannot. So I want to give it a try, and a reliability plot is a way to visualize calibration.
I am using the reliability.plot() function from the R verification package. However, the resulting plot looks weird.
Maybe that is because I am passing the wrong variables to the function, but I am not sure how to fix it. Here is my code:
Train the Model and Get Predicted Probabilities
library(verification)
library(mlr)
svm_learner <- makeLearner("classif.ksvm", predict.type = "prob")
svm_param <- makeParamSet(
makeDiscreteParam("C", values = 2^c(-8,-4,-2,0)), #cost parameters
makeDiscreteParam("sigma", values = 2^c(-8,-4,0,4)) #RBF Kernel Parameter
)
ctrl <- makeTuneControlRandom()
cv_svm <- makeResampleDesc("CV",iters = 5L)
svm_tune <- tuneParams(svm_learner, task = train_task, resampling = cv_svm, par.set = svm_param, control = ctrl,measures = acc)
svm_tune$x
svm_tune$y
t.svm <- setHyperPars(svm_learner, par.vals = svm_tune$x) # learner with the tuned hyperparameters
svm_model <- mlr::train(t.svm, train_task) # train the tuned learner, not the untuned svm_learner
svmpredict <- predict(svm_model, test_task)
svmpredict
I am trying to calculate the observed and forecasted frequencies and pass them to the function:
library(data.table)
xy <- data.table(Truth=svmpredict$data$truth, Response=svmpredict$data$response)
summary(xy$Truth)
summary(xy$Response)
xy[, ObservedFreq := ifelse(Truth==0, 1806/(1806+48), 48/(1806+48))]
xy[, ForecastedFreq := ifelse(Truth==0, 1807/(1807+47), 47/(1807+47))]
reliability.plot(svmpredict$data$prob.1, xy$ObservedFreq, xy$ForecastedFreq, positive="1")
I guess the problem may be caused by the variables I pass to the function, but what else could the observed and forecasted frequencies be? Do you know how to draw a correct reliability plot?
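For reference, a reliability curve is usually built per bin of predicted probability rather than per observation: within each bin, the observed frequency is the share of cases that actually belong to the positive class. The sketch below computes these quantities in base R without calling the verification package. The vectors prob and truth are simulated stand-ins for svmpredict$data$prob.1 and svmpredict$data$truth (which would need converting from a factor to 0/1 first), and the ten equal-width bins are an arbitrary choice.

```r
set.seed(42)
# Simulated stand-ins for the predicted probabilities and the 0/1 truth
prob  <- runif(2000)
truth <- rbinom(2000, 1, prob)  # well calibrated by construction

# Group the predicted probabilities into ten equal-width bins
breaks <- seq(0, 1, by = 0.1)
bin    <- cut(prob, breaks = breaks, include.lowest = TRUE)

forecast_mean <- tapply(prob, bin, mean)                  # mean predicted probability per bin
observed_freq <- tapply(truth, bin, mean)                 # observed frequency of class 1 per bin
bin_share     <- as.vector(table(bin)) / length(prob)     # share of forecasts falling in each bin

# Reliability curve: points near the diagonal indicate good calibration
plot(forecast_mean, observed_freq, type = "b", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Forecast probability", ylab = "Observed relative frequency")
abline(0, 1, lty = 2)  # perfect calibration
```

If I read the verification documentation correctly, reliability.plot() expects per-bin inputs of this kind (forecast values, observed frequency per bin, and the share of forecasts per bin) rather than one value per observation, which may explain the odd plot.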
