Rolling Exponential Smoothing - r

REWRITING ORIGINAL QUESTION
I want a function that takes a vector and a pre-defined exponential smoothing model (in this example, simple exponential smoothing with alpha = 0.5) and does one-step-ahead forecasting on the input vector. The code below, though clunky, does what I want. Is there a better way to do this, or a built-in function / package function that does this?
The second part of this question is: if I fit an ETS or ARIMA model with, say, the {fable} package, is there a function that can take the fitted ETS/ARIMA model and do the same one-step-ahead forecasting on a vector, as in the first part?
# Ref: # https://stats.stackexchange.com/questions/44984/how-do-you-use-simple-exponential-smoothing-in-r
# NOT fully tested, just in this case. Gives one-step-ahead forecasts for an exponential smoothing model
ses <- function(x, alpha, beta = FALSE, gamma = FALSE) {
  ## Populate this vector with return values
  result_vec <- numeric(length(x))
  result_vec[1] <- NA
  for (i in 2:length(x)) {
    mod <- HoltWinters(x[seq(i)], alpha = alpha, beta = beta, gamma = gamma)
    result_vec[i] <- predict(mod, n.ahead = 1)
  }
  result_vec
}
new_wt <- c(1, 0, 0, 0, 0, 1, 1, 1)
ses(new_wt, 0.5)
#> [1] NA 0.5000000 0.2500000 0.1250000 0.0625000 0.5312500 0.7656250
#> [8] 0.8828125
Created on 2020-10-22 by the reprex package (v0.3.0)

This is the question and post I was looking for that solved it for me:
https://stats.stackexchange.com/questions/265115/one-step-ahead-forecast
https://robjhyndman.com/hyndsight/out-of-sample-one-step-forecasts/
I was then able to use the forecast::ets() model builder to construct the filter I wanted, and then apply the technique described there: re-call ets() with the new data I want one-step-ahead forecasts for (passing the original model), and then call fitted() on that new fit to get the one-step-ahead forecasts. It matches what I want a filter to do closely enough (with some off-by-one differences in the output that I can deal with easily enough).
It mostly worked with the {fable} package's ETS() call as well. However, I don't know how to specify the starting values to make sure the results match exactly. It is pretty close overall and, over time, should roughly match (see the refit() sketch after the reprex below).
Below is sample code with what I had and what the ultimate solution was in the end.
# Ref: # https://stats.stackexchange.com/questions/44984/how-do-you-use-simple-exponential-smoothing-in-r
# Hand-rolled solution
# NOT fully tested, just in this case. Gives one-step-ahead forecasts for an exponential smoothing model
ses <- function(x, alpha, beta = FALSE, gamma = FALSE) {
  ## Populate this vector with return values
  result_vec <- numeric(length(x))
  result_vec[1] <- NA
  for (i in 2:length(x)) {
    ## One method
    mod <- HoltWinters(x[seq(i)], alpha = alpha, beta = beta, gamma = gamma)
    result_vec[i] <- predict(mod, n.ahead = 1)
    ## A similar method
    # result_vec[i] <- forecast::ses(x[seq(i)], alpha=alpha, beta=beta, gamma=gamma, h=1)$mean
  }
  result_vec
}
new_wt <- c(1, 0, 0, 0, 0, 1, 1, 1)
ses(new_wt, 0.5)
#> [1] NA 0.5000000 0.2500000 0.1250000 0.0625000 0.5312500 0.7656250
#> [8] 0.8828125
## From: https://robjhyndman.com/hyndsight/out-of-sample-one-step-forecasts/
## I can do one-step-ahead forecasts with the forecast package
library(forecast)
#> Registered S3 method overwritten by 'quantmod':
#> method from
#> as.zoo.data.frame zoo
#>
#> Attaching package: 'forecast'
#> The following object is masked _by_ '.GlobalEnv':
#>
#> ses
## Values to do one-step-ahead forecasts on
new_wt <- c(1, 0, 0, 0, 0, 1, 1, 1)
## Fit a model with hand-picked parameters, in this case, and walk the forecast
## ahead a step at a time.
initial_value <- 0
fit <- forecast::ets(initial_value, model="ANN", alpha=0.5, use.initial.values = T)
fit2 <- forecast::ets(new_wt, model=fit, use.initial.values = T)
fitted(fit2) %>% as.vector()
#> [1] 0.0000000 0.5000000 0.2500000 0.1250000 0.0625000 0.0312500 0.5156250
#> [8] 0.7578125
## With fable, I can't seem to make it work:
library(fable)
#> Loading required package: fabletools
#>
#> Attaching package: 'fabletools'
#> The following objects are masked from 'package:forecast':
#>
#> accuracy, forecast
library(tibble)
new_wt_ts <-
  tibble(value = new_wt, idx = seq(length(new_wt))) %>%
  as_tsibble(index = idx)
myfable <-
  model(new_wt_ts, ets = ETS(value ~ error("A") + trend("N", alpha = 0.5) + season("N")))
## This is close to the others, and goes in the right direction, not sure why it doesn't match?
fitted(myfable, new_wt_ts)
#> # A tsibble: 8 x 3 [1]
#> # Key: .model [1]
#> .model idx .fitted
#> <chr> <int> <dbl>
#> 1 ets 1 0.531
#> 2 ets 2 0.765
#> 3 ets 3 0.383
#> 4 ets 4 0.191
#> 5 ets 5 0.0957
#> 6 ets 6 0.0478
#> 7 ets 7 0.524
#> 8 ets 8 0.762
Created on 2020-10-22 by the reprex package (v0.3.0)
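For the {fable} side, one option I have not verified against the output above is fabletools::refit(), which (if I read the docs right) applies an already-estimated model to new data; with reestimate = FALSE the smoothing parameters are kept, and fitted() on the result should then give the one-step-ahead values, analogous to the forecast::ets() trick. A minimal sketch, where another_wt is a hypothetical new vector:
## Sketch only: apply the fitted fable model to new data without re-estimating
## the parameters, then pull the one-step-ahead fitted values.
another_wt <- c(0, 1, 1, 0, 1, 0, 0, 1)
another_wt_ts <-
  tibble(value = another_wt, idx = seq_along(another_wt)) %>%
  as_tsibble(index = idx)
refitted <- refit(myfable, another_wt_ts, reestimate = FALSE)
fitted(refitted)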


Using tidymodels in R, my BART workflow changes after I fit it once. Why?

I have been trying to train a BART model using the tidymodels framework but I am running into some problems.
I can declare the model, the recipe, and the workflow alright, but once I fit the workflow, two unwanted things happen:
The original model object (bart_mod below), initially correctly stored, becomes "call: NULL", even though I don't touch the model object directly (I assign nothing to the same object name).
I am not able to retrieve any information about the fitted model. The bart_fit object appears to contain nothing, and there seems to be no tidy method associated with it. All this is true even though I am able to predict values using the fitted model! (See the last line of code in the reprex.)
This may very well come from a misunderstanding of how all this works on my end, I am fairly new to tidymodels.
I would appreciate any help! Thank you.
library(tidyverse)
library(tidymodels)
set.seed(2022)
# Parameters --------------------------------------------------------------
n <- 5000
coef_x_var_1 <- 1
coef_x_var_2 <- 2
coef_x_var_3 <- 3
gen_y_1 <- function(data = dataset) {
  return(data$y_0 +
           data$x_var_1 * coef_x_var_1 +
           data$x_var_2 * coef_x_var_2 +
           data$x_var_3 * coef_x_var_3 +
           rnorm(n = nrow(data), mean = 0, sd = 3)
  )
}
# Data generation ---------------------------------------------------------
dataset <- matrix(NA, nrow = n, ncol = 3)
# Generate the unit-level moderators
dataset[,1] <- rnorm(mean = rnorm(n = 1), n = n)
dataset[,2] <- rnorm(mean = rnorm(n = 1), n = n)
dataset[,3] <- rnorm(mean = rnorm(n = 1), n = n)
# Change into dataframe
colnames(dataset) <- c("x_var_1", "x_var_2", "x_var_3")
dataset <- as_tibble(dataset)
# Make sure the variable format is numeric (except for the identifiers)
dataset$x_var_1 <- as.numeric(dataset$x_var_1)
dataset$x_var_2 <- as.numeric(dataset$x_var_2)
dataset$x_var_3 <- as.numeric(dataset$x_var_3)
# Generate the untreated potential outcomes
P0_coefs <- rdunif(n = 6, 1, 15)
dataset$y_0 <-
  dataset$x_var_1 * P0_coefs[4] +
  dataset$x_var_2 * P0_coefs[5] +
  dataset$x_var_3 * P0_coefs[6] +
  rnorm(n = nrow(dataset), mean = 0, sd = 3)
dataset$y_1 <- gen_y_1(data = dataset)
# Create a variable to indicate treatment
treatment_group <- sample(1:nrow(dataset), size = nrow(dataset)/2)
# Indicate which potential outcome you observe
obs_dataset <- dataset |>
  mutate(treated = ifelse(row_number() %in% treatment_group, 1, 0),
         obs_y = ifelse(treated, y_1, y_0))
y1_obs_dataset <- obs_dataset |> filter(treated == 1)
y0_obs_dataset <- obs_dataset |> filter(treated == 0)
# Analysis ----------------------------------------------------------------
covariates <- c("x_var_1", "x_var_2", "x_var_3")
bart_formula <- as.formula(paste0("obs_y ~ ", paste(covariates, collapse = " + ")))
# Create the workflow
bart_mod <- bart() |>
  set_engine("dbarts") |>
  set_mode("regression")
bart_recipe <- recipe(bart_formula, data = obs_dataset) |>
  step_zv(all_predictors())
bart_workflow <-
  workflow() |>
  add_model(bart_mod) |>
  add_recipe(bart_recipe)
# The workflow first looks right
bart_workflow
#> ══ Workflow ════════════════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: bart()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 1 Recipe Step
#>
#> • step_zv()
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#> BART Model Specification (regression)
#>
#> Computational engine: dbarts
# Once I fit it though, the model part becomes call: NULL
bart_fit <- bart_workflow |>
  fit(y1_obs_dataset)
# Nothing is stored in the fit
bart_fit
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: bart()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 1 Recipe Step
#>
#> • step_zv()
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#>
#> Call:
#> `NULL`()
# The content of this object has changed!
bart_workflow
#> ══ Workflow ════════════════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: bart()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 1 Recipe Step
#>
#> • step_zv()
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#>
#> Call:
#> NULL
bart_fit |>
  extract_fit_parsnip()
#> parsnip model object
#>
#>
#> Call:
#> `NULL`()
# And yet, I am able to run a prediction using the fit!
predict(bart_fit, y0_obs_dataset)
#> # A tibble: 2,500 × 1
#> .pred
#> <dbl>
#> 1 -4.67
#> 2 -6.23
#> 3 6.35
#> 4 10.7
#> 5 4.90
#> 6 -13.8
#> 7 4.70
#> 8 19.6
#> 9 -0.907
#> 10 5.38
#> # … with 2,490 more rows
Created on 2022-12-24 with reprex v2.0.2
First stripping Martin's code down to a smaller script:
library(tidyverse)
library(tidymodels)
set.seed(2022)
obs_dataset <- structure(list(x_var_1 = c(-0.273203786163623, 0.0026566250757164,
-0.544359413888551, 0.569128408034224, -2.00048700105319, -0.159113741655834
), obs_y = c(-8.14952415680873, 1.91364235165124, -7.68391811408719,
-9.01497463720505, -18.5017189874949, -13.505685812581)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
bart_formula <- as.formula("obs_y ~ x_var_1")
# Create the workflow
bart_mod <- bart() |>
  set_engine("dbarts") |>
  set_mode("regression")
bart_recipe <- recipe(bart_formula, data = obs_dataset)
bart_workflow <-
  workflow() |>
  add_model(bart_mod) |>
  add_recipe(bart_recipe)
The workflow at first looks right:
bart_workflow
> ══ Workflow ════════════════════════════════════════════════════════════════
> Preprocessor: Recipe
> Model: bart()
>
> ── Preprocessor ──────────────────────────────────────────────────────────────
> 0 Recipe Steps
>
> ── Model ─────────────────────────────────────────────────────────────────────
> BART Model Specification (regression)
>
> Computational engine: dbarts
but this changes after fitting:
bart_fit <- bart_workflow |>
  fit(obs_dataset)
bart_fit
The workflow now displays NULL for the call, as does the model object.
bart_workflow
bart_mod
══ Workflow [trained] ══════════════════════════════════════════════════════
Preprocessor: Recipe
Model: bart()
── Preprocessor ─────────────────────────────────
0 Recipe Steps
── Model ────────────────────────────────────────────────
Call:
`NULL`()
All these display values:
required_pkgs(bart_mod)
print_model_spec(bart_mod)
bart_mod[["engine"]]
bart_mod[["mode"]]
extract_recipe(bart_fit)
extract_preprocessor(bart_fit)
extract_mold(bart_fit)
bart_fit[["fit"]][["fit"]][["spec"]][["engine"]]
bart_fit[["fit"]][["fit"]][["spec"]][["mode"]]
These display NULL:
print(bart_mod)
print(bart_workflow)
print(bart_fit)
extract_fit_engine(bart_fit)
extract_fit_parsnip(bart_fit)
extract_model(bart_fit)
So, it seems that the model data is still in the objects and is usable, but neither the print calls nor the extract functions display it.
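As a quick check (my own sketch, not part of the original answer), the underlying dbarts fit can be reached by indexing into the trained workflow directly, assuming the usual workflows/parsnip nesting, which suggests the problem is confined to the print and extract methods rather than to the fit itself:
## Sketch: reach the raw engine fit inside the trained workflow.
## bart_fit$fit$fit is the parsnip model_fit; its $fit element is the engine fit.
engine_fit <- bart_fit$fit$fit$fit
class(engine_fit)              # should identify the dbarts fit
str(engine_fit, max.level = 1) # inspect its components without the print method
predict(bart_fit, obs_dataset) # predictions still work through the workflow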

Is there a way to run this script?

I have a large dataset from which I intend to generate a 10% sample to run my machine learning model 20 times. To test how this would work, I decided to try it on the iris dataset. First, I split the dataset into training and testing sets and then used a while loop to try a simple loop, but it doesn't seem to work, as I got an error message. Is there something I missed?
### partitioning dataset
part <- sample(1:150, size = 100, replace = F)
training <- iris[part,]
testing <- iris[-part,]
## using a loop
n <- 1
while (n < 6) {
  Train(n) <- training[sample(1:100, 0.3*nrow(training), replace = F),]
  fit <- randomForest(Species~., data = Train(n))
  pred <- predict(fit, testing)
  confusionMatrix(pred, testing$Species))
  n <- n+1
}
The error message I got is
Error: unexpected '}' in "}"
Here is the loop corrected and tested.
suppressPackageStartupMessages({
  library(randomForest)
  library(caret)
})
set.seed(2022)
part <- sample(1:150, size = 100, replace = FALSE)
training <- iris[part,]
testing <- iris[-part,]
## using a loop
result <- vector("list", 6L)
n <- 1L
while(n < 6L) {
  Train <- training[sample(1:100, 0.3*nrow(training), replace = FALSE), ]
  fit <- randomForest(Species ~ ., data = Train)
  pred <- predict(fit, testing)
  result[[n]] <- confusionMatrix(pred, testing$Species)
  n <- n + 1L
}
## see the first result
result[[1]]
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction setosa versicolor virginica
#> setosa 16 0 0
#> versicolor 0 11 1
#> virginica 0 3 19
#>
#> Overall Statistics
#>
#> Accuracy : 0.92
#> 95% CI : (0.8077, 0.9778)
#> No Information Rate : 0.4
#> P-Value [Acc > NIR] : 1.565e-14
#>
#> Kappa : 0.8778
#>
#> Mcnemar's Test P-Value : NA
#>
#> Statistics by Class:
#>
#> Class: setosa Class: versicolor Class: virginica
#> Sensitivity 1.00 0.7857 0.9500
#> Specificity 1.00 0.9722 0.9000
#> Pos Pred Value 1.00 0.9167 0.8636
#> Neg Pred Value 1.00 0.9211 0.9643
#> Prevalence 0.32 0.2800 0.4000
#> Detection Rate 0.32 0.2200 0.3800
#> Detection Prevalence 0.32 0.2400 0.4400
#> Balanced Accuracy 1.00 0.8790 0.9250
Created on 2022-05-11 by the reprex package (v2.0.1)
There's nothing to gain from a while loop versus a for loop here: you are manually incrementing n, and that is exactly what for loops are meant for.
The equivalent for loop is the following.
result <- vector("list", 6L)
for(n in 1:6) {
Train <- training[sample(1:100, 0.3*nrow(training), replace = FALSE), ]
fit <- randomForest(Species ~ ., data = Train)
pred <- predict(fit, testing)
result[[n]] <- confusionMatrix(pred, testing$Species)
}
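As a further aside (my own sketch, not part of the original answer), the same repetition can be written without any explicit counter by using lapply(), which also scales naturally to the 20 repetitions mentioned in the question:
## Sketch: run the sample/fit/evaluate cycle 20 times and collect the
## confusion matrices in a list; assumes the same training/testing objects.
result <- lapply(seq_len(20), function(i) {
  Train <- training[sample(1:100, 0.3 * nrow(training), replace = FALSE), ]
  fit <- randomForest(Species ~ ., data = Train)
  confusionMatrix(predict(fit, testing), testing$Species)
})
result[[1]]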

Graphics object is recognized as 'environment' type rather than 'list', thus not compatible with ggplot although it should be

I can't pass a graphics object to ggplot, and one symptom of the problem is that it's recognized as the environment type rather than list, but I'm not sure this is what's causing the issue.
I use the gamlj package to generate a linear model of some data. The function gamljGLM returns a model and a plot, which is supposed to be compatible with ggplot functions. In other words, the plot generated by gamljGLM should return TRUE for is.ggplot(), but this isn't the case for me. Furthermore, when testing the plot object for its type, I get that typeof() returns environment, although in principle it should return list.
Steps for reproducing the issue
Installing gamlj package (done successfully on R versions either 3.6.3 or 4.0.2)
## first, install 'devtools' package
install.packages("devtools")
## second, install 'gamlj'
devtools::install_github("gamlj/gamlj")
Load some data
library(tidyverse)
day_1 <- rnorm(1000, mean = 77, sd = 18)
day_2 <- rnorm(1000, mean = 74, sd = 19)
day_3 <- rnorm(1000, mean = 80, sd = 5)
day_4 <- rnorm(1000, mean = 76, sd = 18)
df <-
  cbind(day_1, day_2, day_3, day_4) %>%
  as_tibble() %>%
  gather(., key = day, value = mood, day_1:day_4) %>%
  mutate_at(vars(day), factor)
> df
## # A tibble: 4,000 x 2
## day mood
## <fct> <dbl>
## 1 day_1 88.2
## 2 day_1 66.7
## 3 day_1 67.0
## 4 day_1 93.8
## 5 day_1 70.6
## 6 day_1 97.9
## 7 day_1 81.9
## 8 day_1 91.2
## 9 day_1 69.4
## 10 day_1 48.4
## # … with 3,990 more rows
Model the data using gamlj::gamljGLM
p <- gamljGLM(df,
              formula = formula("mood ~ day"),
              plotError = "ci",
              plotHAxis = "day")
> p
## GENERAL LINEAR MODEL
## Model Info
## ─────────────────────────────────────────────
## Info
## ─────────────────────────────────────────────
## Estimate Linear model fit by OLS
## Call mood ~ 1 + day
## R-squared 0.01558849
## Adj. R-squared 0.01484944
## ─────────────────────────────────────────────
## MODEL RESULTS
## ANOVA Omnibus tests
## ──────────────────────────────────────────────────────────────────────────
## SS df F p η²p
## ──────────────────────────────────────────────────────────────────────────
## Model 16406.19 3 21.09267 < .0000001 0.0155885
## day 16406.19 3 21.09267 < .0000001 0.0155885
## Residuals 1036048.90 3996
## Total 1052455.08 3999
## ──────────────────────────────────────────────────────────────────────────
## Fixed Effects Parameter Estimates
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────
##   Names         Effect          Estimate     SE          Lower       Upper        β             df     t             p
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────
##   (Intercept)   (Intercept)     76.9554387   0.2545935   76.456293   77.4545841    0.00000000   3996   302.2678295   < .0000001
##   day1          day_2 - day_1   -2.1270930   0.7200993   -3.538889   -0.7152967   -0.13111742   3996    -2.9538885    0.0031563
##   day2          day_3 - day_1    3.3874303   0.7200993    1.975634    4.7992266    0.20880663   3996     4.7041156    0.0000026
##   day3          day_4 - day_1   -0.6973011   0.7200993   -2.109097    0.7144952   -0.04298275   3996    -0.9683401    0.3329331
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────
Upon running p, along with the model output above, the plot is automatically drawn too. In addition, the plot can be called directly via p$descPlot$plot.
Passing the plot object to ggplot fails
p$descPlot$plot + xlab("blah_x_label")
Error in p$descPlot$plot + xlab("blah_x_label") :
non-numeric argument to binary operator
So I check this out:
> is.ggplot(p)
## [1] FALSE
And furthermore:
> typeof(p)
## [1] "environment" ## in principle, this should be 'list'
To sum up:
This plot should be a ggplot object, and a friend has successfully applied ggplot functions to p$descPlot$plot, because the function is designed to produce a ggplot object. But for me it fails. I'm aware that this is a specific case with a specific package. I've contacted the package maintainer and even opened an issue on its GitHub project, but so far haven't received a response. I'm posting the question here too because I hope this issue, although specific, comes down to a more general principle of ggplot objects that could be resolved without familiarity with this specific package/function. Any ideas?
In newer versions of gamlj (> 2.1.1) this should be solved.
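A quick way to verify after updating (my own sketch, assuming the same objects and the same p$descPlot$plot accessor from the question still apply):
## Sketch: reinstall a version > 2.1.1, regenerate p as above, then confirm the
## plot now behaves as a ggplot object.
devtools::install_github("gamlj/gamlj")
plt <- p$descPlot$plot
ggplot2::is.ggplot(plt)              # should now be TRUE
plt + ggplot2::xlab("blah_x_label")  # and ggplot layers should add cleanly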

Calculating the Standard Error and Standard Deviation of a Discrete Time Markov Chain

I have a matrix of the counts of transitions from one state to another and I would like to calculate the Maximum Likelihood Estimates, standard errors and standard deviations. The "markovchain" package has an example but the data is a sequence. My data is obtained from a balanced panel dataset of 155 companies so the example code they provide doesn't work for me.
This is the example I followed:
data(rain)
rain$rain[1:10]
[1] "6+" "1-5" "1-5" "1-5" "1-5" "1-5" "1-5" "6+" "6+" "6+"
#obtaining the empirical transition matrix
createSequenceMatrix(stringchar = rain$rain)
0 1-5 6+
0 362 126 60
1-5 136 90 68
6+ 50 79 124
#fitting the DTMC by MLE
alofiMcFitMle <- markovchainFit(data = rain$rain, method = "mle", name = "Alofi")
alofiMcFitMle
$estimate
Alofi
A 3 - dimensional discrete Markov Chain defined by the following states:
0, 1-5, 6+
The transition matrix (by rows) is defined as follows:
0 1-5 6+
0 0.6605839 0.2299270 0.1094891
1-5 0.4625850 0.3061224 0.2312925
6+ 0.1976285 0.3122530 0.4901186
$standardError
0 1-5 6+
0 0.03471952 0.02048353 0.01413498
1-5 0.03966634 0.03226814 0.02804834
6+ 0.02794888 0.03513120 0.04401395
$confidenceInterval
$confidenceInterval$confidenceLevel
[1] 0.95
$confidenceInterval$lowerEndpointMatrix
0 1-5 6+
0 0.6034754 0.1962346 0.08623909
1-5 0.3973397 0.2530461 0.18515711
6+ 0.1516566 0.2544673 0.41772208
$confidenceInterval$upperEndpointMatrix
0 1-5 6+
0 0.7176925 0.2636194 0.1327390
1-5 0.5278304 0.3591988 0.2774279
6+ 0.2436003 0.3700387 0.5625151
$logLikelihood
[1] -1040.419
Because I already have a matrix of count data I can't use the above code. I just want to take my 6x6 matrix of transition counts and determine the maximum likelihood estimators, standard errors (confidence interval) and standard deviation. Does anyone have an example I could follow?
I'm not sure how you could do this with the markovchain package, but you could write a function that produces similar results. The function below generates the estimates of the transition probabilities through MLE, but the standard errors and the lower and upper confidence bounds via a parametric bootstrap. You can either use the standard errors and estimates to make normal-theory confidence intervals (ci = "normal") or use the relevant quantiles of the parametric bootstrap distributions (ci = "quantile"). The parallel flag identifies whether the levels of the matrix should be considered ordered or not. If ordered, the model uses an ordered logit to calculate probabilities. If unordered (i.e., parallel = FALSE), then a multinomial logit is used to calculate probabilities.
If I understand correctly, you're starting with something that looks like the output from createSequenceMatrix(stringchar = rain$rain), without the underlying sequence of values that generated it. Here's how it would work.
Generate the data.
library(markovchain)
#> Package: markovchain
#> Version: 0.9.0
#> Date: 2022-07-01
#> BugReport: https://github.com/spedygiorgio/markovchain/issues
data(rain)
mat <- createSequenceMatrix(stringchar = rain$rain)
Make the functions to estimate the models and print the results.
mcmat_est <- function(mat, parallel = FALSE, ci = c("normal", "quantile"),
                      conf_level = .95, R = 2500) {
  ci <- match.arg(ci)
  if (!is.null(rownames(mat))) {
    vals <- rownames(mat)
  } else {
    vals <- 1:nrow(mat)
  }
  ## expand each row of counts back into a vector of observed destination states
  state_vals <- lapply(vals, \(i) data.frame(vals = rep(vals, mat[i, ])))
  if (parallel) {
    ## ordered states: intercept-only ordered logit per origin state
    state_vals <- lapply(state_vals, \(x) {x$vals <- factor(x$vals, levels = vals, ordered = TRUE); x})
    mods <- lapply(state_vals, \(x) MASS::polr(vals ~ 1, data = x, Hess = TRUE))
    draws <- lapply(mods, \(m) cbind(-Inf, MASS::mvrnorm(R, m$zeta, vcov(m)), Inf))
    qs <- lapply(draws, \(d) plogis(d))
    probs <- lapply(qs, \(x) x[, -1] - x[, -ncol(x)])
    ests <- lapply(mods, \(m) plogis(c(-Inf, m$zeta, Inf)))
    ests <- lapply(ests, \(e) e[-1] - e[-length(e)])
  } else {
    ## unordered states: intercept-only multinomial logit per origin state
    state_vals <- lapply(state_vals, \(x) {x$vals <- factor(x$vals, levels = vals); x})
    mods <- lapply(state_vals, \(x) nnet::multinom(vals ~ 1, data = x, Hess = TRUE, trace = FALSE))
    ## parametric bootstrap draws of the coefficients, mapped to probabilities
    draws <- lapply(mods, \(m) MASS::mvrnorm(R, c(0, coef(m)),
                                             cbind(0, rbind(0, vcov(m)))))
    probs <- lapply(draws, \(d) prop.table(exp(d), 1))
    ests <- lapply(mods, \(m) exp(c(0, coef(m))))
    ests <- lapply(ests, \(e) e / sum(e))
  }
  logLik <- sum(sapply(mods, logLik))
  nobs <- sum(sapply(mods, \(x) attr(logLik(x), "nobs")))
  df <- sum(sapply(mods, \(x) attr(logLik(x), "df")))
  attr(logLik, "nobs") <- nobs
  attr(logLik, "df") <- df
  class(logLik) <- "logLik"
  est <- do.call(rbind, ests)
  ## bootstrap standard errors and confidence bounds
  se <- do.call(rbind, lapply(probs, \(p) apply(p, 2, sd)))
  crit_z <- 1 - (1 - conf_level) / 2
  if (ci == "normal") {
    lwr <- est - qnorm(crit_z) * se
    upr <- est + qnorm(crit_z) * se
  } else {
    lwr <- do.call(rbind, lapply(probs, \(p) apply(p, 2, quantile, 1 - crit_z)))
    upr <- do.call(rbind, lapply(probs, \(p) apply(p, 2, quantile, crit_z)))
  }
  rownames(est) <- rownames(se) <- rownames(lwr) <- rownames(upr) <- rownames(mat)
  colnames(est) <- colnames(se) <- colnames(lwr) <- colnames(upr) <- colnames(mat)
  res <- list(estimate = est, se = se, lower = lwr, upper = upr, logLik = logLik)
  class(res) <- "mcest"
  res
}
print.mcest <- function(x, ..., digits = 3) {
  cat("\nEstimated Transition Probabilities\n")
  print(x$estimate, digits = digits)
  cat("\nStandard Errors of Transition Probabilities\n")
  print(x$se, digits = digits)
  cat("\nLower Confidence Bounds\n")
  print(x$lower, digits = digits)
  cat("\nUpper Confidence Bounds\n")
  print(x$upper, digits = digits)
  cat('\n')
  print(x$logLik)
}
Estimate the models using the function I generated:
out <- mcmat_est(mat)
out
#>
#> Estimated Transition Probabilities
#> 0 1-5 6+
#> 0 0.661 0.230 0.109
#> 1-5 0.463 0.306 0.231
#> 6+ 0.198 0.312 0.490
#>
#> Standard Errors of Transition Probabilities
#> 0 1-5 6+
#> 0 0.0203 0.0181 0.0135
#> 1-5 0.0281 0.0269 0.0248
#> 6+ 0.0245 0.0289 0.0309
#>
#> Lower Confidence Bounds
#> 0 1-5 6+
#> 0 0.621 0.194 0.083
#> 1-5 0.407 0.253 0.183
#> 6+ 0.150 0.256 0.430
#>
#> Upper Confidence Bounds
#> 0 1-5 6+
#> 0 0.700 0.265 0.136
#> 1-5 0.518 0.359 0.280
#> 6+ 0.246 0.369 0.551
#>
#> 'log Lik.' -1040.419 (df=6)
Compare to the markovchainFit() output.
fit <- markovchainFit(data = rain$rain, method = "mle", name = "Alofi")
fit
#> $estimate
#> Alofi
#> A 3 - dimensional discrete Markov Chain defined by the following states:
#> 0, 1-5, 6+
#> The transition matrix (by rows) is defined as follows:
#> 0 1-5 6+
#> 0 0.6605839 0.2299270 0.1094891
#> 1-5 0.4625850 0.3061224 0.2312925
#> 6+ 0.1976285 0.3122530 0.4901186
#>
#>
#> $standardError
#> 0 1-5 6+
#> 0 0.03471952 0.02048353 0.01413498
#> 1-5 0.03966634 0.03226814 0.02804834
#> 6+ 0.02794888 0.03513120 0.04401395
#>
#> $confidenceLevel
#> [1] 0.95
#>
#> $lowerEndpointMatrix
#> 0 1-5 6+
#> 0 0.5925349 0.1897800 0.0817850
#> 1-5 0.3848404 0.2428780 0.1763188
#> 6+ 0.1428496 0.2433971 0.4038528
#>
#> $upperEndpointMatrix
#> 0 1-5 6+
#> 0 0.7286330 0.2700740 0.1371931
#> 1-5 0.5403296 0.3693669 0.2862663
#> 6+ 0.2524073 0.3811089 0.5763843
#>
#> $logLikelihood
#> [1] -1040.419
Created on 2022-12-15 by the reprex package (v2.0.1)
The log-likelihoods for the two models are the same, which provides some confidence that the underlying estimation process is similar. The standard errors are different - the original function doesn't use a parametric bootstrap. I'm not sure what they do since the function to estimate the model is written in C++ and I haven't investigated the source code. If this difference is unacceptable, perhaps someone else will be able to get closer.
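For what it's worth (my own sketch, not part of the original answer), the MLE from a count matrix is also available in closed form, and a multinomial approximation of the standard errors can be computed directly; it appears to reproduce the bootstrap SEs above, while markovchainFit() seems to report sqrt(p / n_i), which would account for the difference.
## Sketch: closed-form MLE and multinomial-approximation standard errors from
## the count matrix alone. n_i is the row total of transitions out of state i.
p_hat <- prop.table(mat, 1)                        # p_ij = n_ij / n_i
se_hat <- sqrt(p_hat * (1 - p_hat) / rowSums(mat)) # roughly the bootstrap SEs above
round(p_hat, 4)
round(se_hat, 4)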

Log-rank test with time-dependent variable [duplicate]

Background: at half-year follow-up times over 4 years, patients may switch to a different medication group. To account for this, I've converted the survival data into counting-process form. I want to compare survival curves for medication groups A, B, and C. I am using an extended Cox model but want to do pairwise comparisons of each hazard function or do stratified log-rank tests. pairwise_survdiff throws an error, I think because of the form of my data.
Example data:
x <- data.frame(tstart = rep(seq(0, 18, 6), 3), tstop = rep(seq(6, 24, 6), 3),
                rx = rep(c("A", "B", "C"), 4), death = c(rep(0, 11), 1))
x
Problem:
When using survdiff in the survival package,
survdiff(Surv(tstart,tstop,death) ~ rx, data = x)
I get the error:
Error in survdiff(Surv(tstart, tstop, death) ~ rx, data = x) :
Right censored data only
I think this stems from the counting process form, since I can't find an example online that compares survival curves for time-varying covariates.
Question: is there a quick fix to this problem? Or is there an alternative package/function with the same versatility for comparing survival curves, namely using different methods? How can I implement stratified log-rank tests using survdiff on counting-process-form data?
NOTE: this was marked as a known issue in the survminer package (see the GitHub issue here), but updating survminer did not solve my issue, and using a single time interval, tstop - tstart, wouldn't be correct, since that would leave, e.g., multiple entries at 6 months rather than covering the actual interval at risk.
So, here is an example of fitting the model and making the multiple comparisons using the multcomp package. Note that this implicitly assumes that the administration of treatments A-C is random. Depending on the assumptions about the process, it might be better to fit a multistate model with transitions between treatments and the outcome.
library(purrr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(survival)
library(multcomp)
#> Loading required package: mvtnorm
#> Loading required package: TH.data
#> Loading required package: MASS
#>
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#>
#> select
#>
#> Attaching package: 'TH.data'
#> The following object is masked from 'package:MASS':
#>
#> geyser
# simulate survival data
set.seed(123)
n <- 200
df <- data.frame(
  id = rep(1:n, each = 8),
  start = rep(seq(0, 42, by = 6), times = 8),
  stop = rep(seq(6, 48, by = 6), times = 8),
  rx = sample(LETTERS[1:3], n * 8, replace = T))
df$hazard <- exp(-3.5 - 1 * (df$rx == "A") + .5 * (df$rx == "B") +
                   .5 * (df$rx == "C"))
df_surv <- data.frame(id = 1:n)
df_surv$time <- split(df, f = df$id) %>%
  map_dbl(~msm::rpexp(n = 1, rate = .x$hazard, t = .x$start))
df <- df %>% left_join(df_surv)
#> Joining, by = "id"
df <- df %>%
  mutate(status = 1L * (time <= stop)) %>%
  filter(start <= time)
df %>% head()
#> id start stop rx hazard time status
#> 1 1 0 6 A 0.01110900 13.78217 0
#> 2 1 6 12 C 0.04978707 13.78217 0
#> 3 1 12 18 B 0.04978707 13.78217 1
#> 4 2 0 6 B 0.04978707 22.37251 0
#> 5 2 6 12 B 0.04978707 22.37251 0
#> 6 2 12 18 C 0.04978707 22.37251 0
# fit the model
model <- coxph(Surv(start, stop, status)~rx, data = df)
# define pairwise comparison
glht_rx <- multcomp::glht(model, linfct=multcomp::mcp(rx="Tukey"))
glht_rx
#>
#> General Linear Hypotheses
#>
#> Multiple Comparisons of Means: Tukey Contrasts
#>
#>
#> Linear Hypotheses:
#> Estimate
#> B - A == 0 1.68722
#> C - A == 0 1.60902
#> C - B == 0 -0.07819
# perform multiple comparisons
# (adjusts for multiple comparisons + takes into account correlation of coefficients -> more power than e.g. bonferroni)
smry_rx <- summary(glht_rx)
smry_rx # -> B and C different to A, but not from each other
#>
#> Simultaneous Tests for General Linear Hypotheses
#>
#> Multiple Comparisons of Means: Tukey Contrasts
#>
#>
#> Fit: coxph(formula = Surv(start, stop, status) ~ rx, data = df)
#>
#> Linear Hypotheses:
#> Estimate Std. Error z value Pr(>|z|)
#> B - A == 0 1.68722 0.28315 5.959 <1e-05 ***
#> C - A == 0 1.60902 0.28405 5.665 <1e-05 ***
#> C - B == 0 -0.07819 0.16509 -0.474 0.88
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> (Adjusted p values reported -- single-step method)
# confidence intervals
plot(smry_rx)
Created on 2019-04-01 by the reprex package (v0.2.1)
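One small addition (my own note, not part of the original answer): with counting-process data the overall group comparison that a log-rank test would give can be read off the Cox model's score test, which is its (start, stop] analogue, before moving on to the pairwise contrasts above.
## Sketch: the score test from the Cox fit as the counting-process analogue of
## an overall log-rank comparison of rx groups (uses `model` fitted above).
summary(model)$sctest   # chi-square statistic, degrees of freedom, p-value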
