Adjusting ODE model output using a Rogan-Gladen estimator in R - r

I have made an ODE model in R using the package deSolve. Currently the output of the model gives me the "observed" prevalence of a disease (i.e. the prevalence not accounting for diagnostic imperfection).
However, I want to adjust the model to output the "true" prevalence, using a simple adjustment formula called the Rogan-Gladen estimator (http://influentialpoints.com/Training/estimating_true_prevalence.htm):
True prevalence =
(Apparent prev. + (Specificity-1)) / (Specificity + (Sensitivity-1))
As you will see in the code below, I have attempted to adjust only one of the differential equations (diggP).
Running the model without adjustment gives an expected output (a proportion between 0 and 1). However, attempting to adjust the model using the RG-estimator gives a spurious output (a proportion less than 0).
Any advice on what might be going wrong here would be very much appreciated.
# Load required packages
library(tidyverse)
library(broom)
library(deSolve)
# Set time (age) for function
time = 1:80
# Defining exponential decay of lambda over age
y1 = 0.003 + (0.15 - 0.003) * exp(-0.05 * time) %>% jitter(10)
df <- data.frame(t = time, y = y1)
fit <- nls(y ~ SSasymp(time, yf, y0, log_alpha), data = df)
fit
# Values of lambda over ages 1-80 years
data <- as.matrix(0.003 + (0.15 - 0.003) * exp(-0.05 * time))
lambda<-as.vector(data[,1])
t<-as.vector(seq(1, 80, by=1))
foi<-cbind(t, lambda)
foi[,1]
# Making lambda varying by time useable in the ODE model
input <- approxfun(x = foi[,1], y = foi[,2], method = "constant", rule = 2)
# Model
ab <- function(time, state, parms) {
with(as.list(c(state, parms)), {
# lambda, changing by time
import<-input(time)
# Derivatives
# RG estimator:
#True prevalence = (apparent prev + (sp-1)) / (sp + (se-1))
diggP<- (((import * iggN) - iggR * iggP) + (sp_igg-1)) / (sp_igg + (se_igg-1))
diggN<- (-import*iggN) + iggR*iggP
dtgerpP<- (0.5*import)*tgerpN -tgerpR*tgerpP
dtgerpN<- (0.5*-import)*tgerpN + tgerpR*tgerpP
# Return results
return(list(c(diggP, diggN, dtgerpP, dtgerpN)))
})
}
# Initial values
yini <- c(iggP=0, iggN=1,
tgerpP=0, tgerpN=1)
# Parameters
pars <- c(iggR = 0, tgerpR = (1/8)/12,
se_igg = 0.95, sp_igg = 0.92)
# Solve model
results<- ode(y=yini, times=time, func=ab, parms = pars)
# Plot results
plot(results, xlab="Time (years)", ylab="Proportion")

Related

Stata vs. R: Delta Method provides different results for relative risk SE's from logit model

I've been trying to estimate the conditional mean treatment effect of covariates in a logit regression (using relative-risk) along with their standard errors for inference purposes. The delta method is necessary to calculated the standard errors for these treatment effects. I've been trying to recreate these results using the Stata user written command, adjrr, to calculate the relative risk with standard errors and confidence intervals in R.
The adjrr command in Stata calculates the adjusted relative-risk (the conditional mean treatment effect of interest for my project) and it's SE's using the delta method. The deltamethod command in R should create the same results, however this is not the case.
How can I replicate the results from Stata in R?
I used the following self generated data: (https://migariane.github.io/DeltaMethodEpiTutorial.nb.html).
R code below:
generateData <- function(n, seed){
set.seed(seed)
age <- rnorm(n, 65, 5)
age65p <- ifelse(age>=65, T, F)
cmbd <- rbinom(n, size=1, prob = plogis(1 - 0.05 * age))
Y <- rbinom(n, size=1, prob = plogis(1 - 0.02* age - 0.02 * cmbd))
data.frame(Y, cmbd, age, age65p)
}
# Describing the data
data <- generateData(n = 1000, seed = 777)
str(data)
logfit <- glm(Y ~ age65p + cmbd, data = data, family = binomial)
summary(logfit)
p1 <- predict(logfit, newdata = data.frame(age65p = T, cmbd = 0), type="response")
p0 <- predict(logfit, newdata = data.frame(age65p = F, cmbd = 0), type="response")
rr <- p1 / p0
rr
0.8123348 #result
library(msm)
se_rr_delta <- deltamethod( ~(1 + exp(-x1)) / (1 + exp(-x1 -x2)), coef(logfit), vcov(logfit))
se_rr_delta
0.6314798 #result
Stata Code (using same data):
logit Y i.age65p i.cmbd
adjrr age65p
//results below
R1 = 0.3685 (0.0218) 95% CI (0.3259, 0.4112)
R0 = 0.4524 (0.0222) 95% CI (0.4090, 0.4958)
ARR = 0.8146 (0.0626) 95% CI (0.7006, 0.9471)
ARD = -0.0839 (0.0311) 95% CI (-0.1449, -0.0229)
p-value (R0 = R1): 0.0071
p-value (ln(R1/R0) = 0): 0.0077

Hierarchical Dirichlet regression (jags)... overfitting

Good Morning, please I need community help in order to understand some problems that occurred writing this model.
I aim at modeling causes of death proportion using as predictors "log_GDP" (Gross domestic product in log scale), and "log_h" (hospital beds per 1,000 people on log scale)
y: 3 columns that are observed proportions of deaths over the years.
x1: "log_GDP" (Gross domestic product in log scale)
x2: "log_h" (hospital beds per 1,000 people in log scale)
As you can see from the estimation result in the last plot, I got a high noise level. Where I worked using just one covariate i.e. log_GDP, I obtained smoothed results
Here the model specification:
Here simulated data:
library(reshape2)
library(tidyverse)
library(ggplot2)
library(runjags)
CIRC <- c(0.3685287, 0.3675516, 0.3567829, 0.3517274, 0.3448940, 0.3391031, 0.3320184, 0.3268640,
0.3227445, 0.3156360, 0.3138515,0.3084506, 0.3053657, 0.3061224, 0.3051044)
NEOP <- c(0.3602199, 0.3567355, 0.3599409, 0.3591258, 0.3544591, 0.3566269, 0.3510974, 0.3536156,
0.3532980, 0.3460948, 0.3476183, 0.3475634, 0.3426035, 0.3352433, 0.3266048)
OTHER <-c(0.2712514, 0.2757129, 0.2832762, 0.2891468, 0.3006468, 0.3042701, 0.3168842, 0.3195204,
0.3239575, 0.3382691, 0.3385302, 0.3439860, 0.3520308, 0.3586342, 0.3682908)
log_h <- c(1.280934, 1.249902, 1.244155, 1.220830, 1.202972, 1.181727, 1.163151, 1.156881, 1.144223,
1.141033, 1.124930, 1.115142, 1.088562, 1.075002, 1.061257)
log_GDP <- c(29.89597, 29.95853, 29.99016, 30.02312, 30.06973, 30.13358, 30.19878, 30.25675, 30.30184,
30.31974, 30.30164, 30.33854, 30.37460, 30.41585, 30.45150)
D <- data.frame(CIRC=CIRC, NEOP=NEOP, OTHER=OTHER,
log_h=log_h, log_GDP=log_GDP)
cause.y <- as.matrix((data.frame(D[,1],D[,2],D[,3])))
cause.y <- cause.y/rowSums(cause.y)
mat.x<- D$log_GDP
mat.x2 <- D$log_h
n <- 15
Jags Model
dirlichet.model = "
model {
#setup priors for each species
for(j in 1:N.spp){
m0[j] ~ dnorm(0, 1.0E-3) #intercept prior
m1[j] ~ dnorm(0, 1.0E-3) # mat.x prior
m2[j] ~ dnorm(0, 1.0E-3)
}
#implement dirlichet
for(i in 1:N){
y[i,1:N.spp] ~ ddirch(a0[i,1:N.spp])
for(j in 1:N.spp){
log(a0[i,j]) <- m0[j] + m1[j] * mat.x[i]+ m2[j] * mat.x2[i] # m0 = intercept; m1= coeff log_GDP; m2= coeff log_h
}
}} #close model loop.
"
jags.data <- list(y = cause.y,mat.x= mat.x,mat.x2= mat.x2, N = nrow(cause.y), N.spp = ncol(cause.y))
jags.out <- run.jags(dirlichet.model,
data=jags.data,
adapt = 5000,
burnin = 5000,
sample = 10000,
n.chains=3,
monitor=c('m0','m1','m2'))
out <- summary(jags.out)
head(out)
Gather coefficient and I make estimation of proportions
coeff <- out[c(1,2,3,4,5,6,7,8,9),4]
coef1 <- out[c(1,4,7),4] #coeff (interc and slope) caus 1
coef2 <- out[c(2,5,8),4] #coeff (interc and slope) caus 2
coef3 <- out[c(3,6,9),4] #coeff (interc and slope) caus 3
pred <- as.matrix(cbind(exp(coef1[1]+coef1[2]*mat.x+coef1[3]*mat.x2),
exp(coef2[1]+coef2[2]*mat.x+coef2[3]*mat.x2),
exp(coef3[1]+coef3[2]*mat.x+coef3[3]*mat.x2)))
pred <- pred / rowSums(pred)
Predicted and Obs. values DB
Obs <- data.frame(Circ=cause.y[,1],
Neop=cause.y[,2],
Other=cause.y[,3],
log_GDP=mat.x,
log_h=mat.x2)
Obs$model <- "Obs"
Pred <- data.frame(Circ=pred[,1],
Neop=pred[,2],
Other=pred[,3],
log_GDP=mat.x,
log_h=mat.x2)
Pred$model <- "Pred"
tot60<-as.data.frame(rbind(Obs,Pred))
tot <- melt(tot60,id=c("log_GDP","log_h","model"))
tot$variable <- as.factor(tot$variable)
Plot
tot %>%filter(model=="Obs") %>% ggplot(aes(log_GDP,value))+geom_point()+
geom_line(data = tot %>%
filter(model=="Pred"))+facet_wrap(.~variable,scales = "free")
The problem for the non-smoothness is that you are calculating Pr(y=m|X) = f(x1, x2) - that is the predicted probability is a function of x1 and x2. Then you are plotting Pr(y=m|X) as a function of a single x variable - log of GDP. That result will almost certainly not be smooth. The log_GDP and log_h variables are highly negatively correlated which is why the result is not much more variable than it is.
In my run of the model, the average coefficient for log_GDP is actually positive for NEOP and Other, suggesting that the result you see in the plot is quite misleading. If you were to plot these in two dimensions, you would see that the result is again, smooth.
mx1 <- seq(min(mat.x), max(mat.x), length=25)
mx2 <- seq(min(mat.x2), max(mat.x2), length=25)
eg <- expand.grid(mx1 = mx1, mx2 = mx2)
pred <- as.matrix(cbind(exp(coef1[1]+coef1[2]*eg$mx1 + coef1[3]*eg$mx2),
exp(coef2[1]+coef2[2]*eg$mx1 + coef2[3]*eg$mx2),
exp(coef3[1]+coef3[2]*eg$mx1 + coef3[3]*eg$mx2)))
pred <- pred / rowSums(pred)
Pred <- data.frame(Circ=pred[,1],
Neop=pred[,2],
Other=pred[,3],
log_GDP=mx1,
log_h=mx2)
lattice::wireframe(Neop ~ log_GDP + log_h, data=Pred, drape=TRUE)
A couple of other things to watch out for.
Usually in hierarchical Bayesian models, your the parameters of your coefficients would themselves be distributions with hyperparameters. This enables shrinkage of the coefficients toward the global mean which is a hallmark of hierarhical models.
Not sure if this is what your data really look like or not, but the correlation between the two independent variables is going to make it difficult for the model to converge. You could try using a multivariate normal distribution for the coefficients - that might help.

Writing own logic regression function in R: how to handle ordinal data

I am trying to write my own logistic regression function in R. The goal is to get a function that gives the same results as the glm() function where family = binomial. I have some problems with ordinal data. I tried to factorize it before (as usually done in the case where the glm() function is used). However, the results of my function and of the glm() function are not the same. Does anyone know how to solve the problem with the ordinal data? Thanks in advance :) You can find my code below.
manual_logistic_regression = function(X,y,threshold = 1e-10, max_iter = 100)
#A function to find logistic regression coefficients
#Takes three inputs:
{
#A function to return p, given X and beta
#We'll need this function in the iterative section
calc_p = function(X,beta)
{
beta = as.vector(beta)
return(exp(X%*%beta) / (1+ exp(X%*%beta)))
}
#### setup bit ####
#initial guess for beta
beta = rep(0,ncol(X))
#initial value bigger than threshold so that we can enter our while loop
diff = 10000 # has to be bigger than threshold
#counter to ensure we're not stuck in an infinite loop
iter_count = 0
#### iterative bit ####
while(diff > threshold ) #tests for convergence
{
#calculate probabilities using current estimate of beta
p = as.vector(calc_p(X,beta)) #p is changing due to the new beta in each iteration
#calculate matrix of weights W
W = diag(p*(1-p)) #varianz von Y, weil Y 1 oder 0 ist
#calculate the change in beta
beta_change = solve(t(X)%*%W%*%X) %*% t(X)%*%(y - p)
#update beta
beta = beta + beta_change
#calculate how much we changed beta by in this iteration
#if this is less than threshold, we'll break the while loop
diff = sum(beta_change^2)
#see if we've hit the maximum number of iterations
iter_count = iter_count + 1
if(iter_count > max_iter) {
stop("Not converging.")
}
}
#make it pretty
coef = c("(Intercept)" = beta[1], x1 = beta[2], x2 = beta[3], x3 = beta[4], x4 = beta[5])
return(coef)
}
#I used the following data sample. I want to regress admit on great, gpa and rank.
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
## view the first few rows of the data
head(mydata)
#
#institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest.
mydata$rank <- factor(mydata$rank)
mylogit <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")
summary(mylogit)
rank<- factor(mydata$rank)
rank
class(rank) # make sure that it is really a factor
manual_logistic_regression(cbind(1,mydata$gre, mydata$gpa, rank),mydata$admit)

Simulating a mixed linear model and evaluating it with lmerTest in R

I am trying to understand how to use mixed linear models to analyse my data by simulating a model, but I can't reproduce the input parameters. What am I missing?
I want to start simulating a model with a random intercept for each subject. Here is the formula of what I want to simulate and reproduce:
If beta1 (<11) is small I find gamma00 as the intercept in fixed section, but I am completedly unaable to retrieve the slope (beta1). Also, the linear effect is not significant. Where is my conceptual mistake?
library(lmerTest)
# Generating data set
# General values and variables
numObj <- 20
numSub <- 100
e <- rnorm(numObj * numSub, mean = 0, sd = 0.1)
x <- scale(runif(numObj * numSub, min = -100, max = 100))
y <- c()
index <- 1
# Coefficients
gamma00 <- 18
gamma01 <- 0.5
beta1 <- -100
w <- runif(numSub, min = -3, max = 3)
uo <- rnorm(numSub, mean = 0, sd = 0.1)
meanBeta0 <- mean(gamma00 + gamma01*w + uo) # I should be able to retrieve that parameter.
for(j in 1:numSub){
for(i in 1:numObj){
y[index] <- gamma00 + gamma01*w[j]+ uo[j] + beta1*x[i] + e[index]
index <- index + 1
}
}
dataFrame2 <- data.frame(y = y, x = x, subNo = factor(rep(1:numSub, each = numObj)), objNum = factor(rep(1:numObj, numSub)))
model2 <- lmer(y ~ x +
(1 | subNo), data = dataFrame2)
summary(model2)
anova(model2)
No conceptual mistake here, just a mixed up index value: you should be using index rather than i to index x in your data generation loop.
Basically due to the mix-up you were using the first subject's x values for generating data for all the subjects, but using the individual x values in the model.

How to estimate the Kalman Filter with 'KFAS' R package, with an AR(1) transition equation?

I am using 'KFAS' package from R to estimate a state-space model with the Kalman filter. My measurement and transition equations are:
y_t = Z_t * x_t + \eps_t (measurement)
x_t = T_t * x_{t-1} + R_t * \eta_t (transition),
with \eps_t ~ N(0,H_t) and \eta_t ~ N(0,Q_t).
So, I want to estimate the variances H_t and Q_t, but also T_t, the AR(1) coefficient. My code is as follows:
library(KFAS)
set.seed(100)
eps <- rt(200, 4, 1)
meas <- as.matrix((arima.sim(n=200, list(ar=0.6), innov = rnorm(200)*sqrt(0.5)) + eps),
ncol=1)
Zt <- 1
Ht <- matrix(NA)
Tt <- matrix(NA)
Rt <- 1
Qt <- matrix(NA)
ss_model <- SSModel(meas ~ -1 + SSMcustom(Z = Zt, T = Tt, R = Rt,
Q = Qt), H = Ht)
fit <- fitSSM(ss_model, inits = c(0,0.6,0), method = 'L-BFGS-B')
But it returns: "Error in is.SSModel(do.call(updatefn, args = c(list(inits, model), update_args)),: System matrices (excluding Z) contain NA or infinite values, covariance matrices contain values larger than 1e+07"
The NA definitions for the variances works well, as documented in the package's paper. However, it seems this cannot be done for the AR coefficients. Does anyone know how can I do this?
Note that I am aware of the SSMarima function, which eases the definition of the transition equation as ARIMA models. Although I am able to estimate the AR(1) coef. and Q_t this way, I still cannot estimate the \eps_t variance (H_t). Moreover, I am migrating my Kalman filter codes from EViews to R, so I need to learn SSMcustom for other models that are more complicated.
Thanks!
It seems that you are missing something in your example, as your error message comes from the function fitSSM. If you want to use fitSSM for estimating general state space models, you need to provide your own model updating function. The default behaviour can only handle NA's in covariance matrices H and Q. The main goal of fitSSM is just to get started with simple stuff. For complex models and/or large data, I would recommend using your self-written objective function (with help of logLik method) and your favourite numerical optimization routines manually for maximum performance. Something like this:
library(KFAS)
set.seed(100)
eps <- rt(200, 4, 1)
meas <- as.matrix((arima.sim(n=200, list(ar=0.6), innov = rnorm(200)*sqrt(0.5)) + eps),
ncol=1)
Zt <- 1
Ht <- matrix(NA)
Tt <- matrix(NA)
Rt <- 1
Qt <- matrix(NA)
ss_model <- SSModel(meas ~ -1 + SSMcustom(Z = Zt, T = Tt, R = Rt,
Q = Qt), H = Ht)
objf <- function(pars, model, estimate = TRUE) {
model$H[1] <- pars[1]
model$T[1] <- pars[2]
model$Q[1] <- pars[3]
if (estimate) {
-logLik(model)
} else {
model
}
}
opt <- optim(c(1, 0.5, 1), objf, method = "L-BFGS-B",
lower = c(0, -0.99, 0), upper = c(100, 0.99, 100), model = ss_model)
ss_model_opt <- objf(opt$par, ss_model, estimate = FALSE)
Same with fitSSM:
updatefn <- function(pars, model) {
model$H[1] <- pars[1]
model$T[1] <- pars[2]
model$Q[1] <- pars[3]
model
}
fit <- fitSSM(ss_model, c(1, 0.5, 1), updatefn, method = "L-BFGS-B",
lower = c(0, -0.99, 0), upper = c(100, 0.99, 100))
identical(ss_model_opt, fit$model)

Resources