JM is an R package for fitting joint models of longitudinal and survival data. I can get it to run with the package's example data, but I get an error with my own data. Any idea what the issue with JMfit1 or JMfit2 is?
My Data:
https://1drv.ms/u/s!AkG9wyz5G1c1gR4Vs_xohO--4Rb5
install.packages('JM')
require(JM)
?jointModel
# Example from vignette
# linear mixed model fit (random intercepts + random slopes)
fitLME <- lme(log(serBilir) ~ drug * year, random = ~ year | id, data = pbc2)
summary(fitLME)
# survival regression fit
fitSURV <- survreg(Surv(years, status2) ~ drug, data = pbc2.id, x = TRUE)
summary(fitSURV)
# joint model fit, under the (default) Weibull model
fitJOINT <- jointModel(fitLME, fitSURV, timeVar = "year")
fitJOINT
summary(fitJOINT)
# we can also include an interaction term of log(serBilir) with drug
fitJOINT <- jointModel(fitLME, fitSURV, timeVar = "year",
                       interFact = list(value = ~ drug, data = pbc2.id))
fitJOINT
summary(fitJOINT)
# With my data:
data = readRDS('data.list.d1.dk.RDS')
d1 = data$d1
dk = data$dk
dim(d1); names(d1)
dim(dk); names(dk)
slct.cov = c('ID','Yi','Ai','zi.1','zi.2','zi.3','xi_A','di')
fmla.fix = as.formula('Yi ~ Ai*(zi.1+zi.2+zi.3)')
fmla.rnd = as.formula(' ~ Ai|ID')
fit.Yi = lme(fixed= fmla.fix, random=reStruct(fmla.rnd),
method="ML", data = dk[,slct.cov] )
surv.model = survreg(Surv(xi_A, di) ~ zi.1+zi.2+zi.3, data = d1[,slct.cov], x = TRUE)
JMfit1 = jointModel(lmeObject = fit.Yi, survObject = surv.model, timeVar = 'Ai')
# Error in if (t1 || t2) { : missing value where TRUE/FALSE needed
dForm <- list(fixed = ~ 1 + zi.1 + zi.2 + zi.3, indFixed = c(2,6,7,8), random = ~ 1, indRandom = 2)
JMfit2 = jointModel(lmeObject = fit.Yi, survObject = surv.model, timeVar = 'Ai',
derivForm = dForm, parameterization = c("both")) #"both", "value", "slope"
# method = "weibull-PH-aGH",
# "weibull-PH-aGH", "weibull-PH-GH", "weibull-AFT-aGH","weibull-AFT-GH",
# "piecewise-PH-aGH", "piecewise-PH-GH", "Cox-PH-aGH", "Cox-PH-GH",
# "spline-PH-aGH", "spline-PH-GH", # "ch-Laplace"
# interFact = NULL, lag = 0, scaleWB = NULL,
# CompRisk = FALSE, init = NULL, control = list())
It looks to me like your fmla.fix model needs some investigation in its own right.
Simplifying the interactions to Yi ~ Ai + zi.1 + zi.3 + Ai*zi.2, or even Yi ~ Ai + zi.1 + zi.2 + zi.3, seems to give valid JMfit1 output.
I suspect you'll get a different error for JMfit2 (do you?), so that may be a subsequent question.
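For example, a minimal sketch of that simplification, reusing the objects from the question (the simpler fixed-effects formula is only for diagnosis, not a recommended final model):
fit.Yi.simple = lme(fixed = Yi ~ Ai + zi.1 + zi.2 + zi.3,
                    random = ~ Ai | ID,
                    method = "ML", data = dk[, slct.cov])
JMfit1.simple = jointModel(lmeObject = fit.Yi.simple, survObject = surv.model,
                           timeVar = 'Ai')
summary(JMfit1.simple)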
I would like to produce a posterior distribution for the difference between two groups. I can do this in JAGS in R, but am hoping to move into this century and replicate it with Stan, using brms.
My questions are:
Have I correctly produced the posterior distribution for the effect size?
If not, should I use one of the other functions suggested in the answer here?
How can I account for a prior effect size that could be negative? I imagine this requires using a non-beta prior, but am not sure what would be preferable.
The code below specifies a control and an intervention group, each with an event rate. It generates beta distributions for the control event rate and for the group difference (step 1), turns them into prior specifications (step 2), builds a binomial model (step 3), and extracts the posteriors (step 4).
I am reasonably confident that what I've done in the above steps is correct. I am a bit uncertain about the values generated by as_draws_df, namely b_Intercept_p vs. b_groupint_p. Further reading on these forums suggested subtracting b_groupint_p from b_Intercept_p, which does produce a plot that could plausibly be the posterior interval (labelled new_p when plotting the alt.graph object), but I am pretty uncertain on this point and would appreciate any clarification.
# Setup
## Packages
library(tidyverse)
library(ProbBayes)
library(rstan)     # provides rstan_options(), used below
library(brms)
library(tidybayes)
## Options
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
# ***
# Setup simulation
# ***
# Trial data
con.n = 150
con.event = 50
int.n = 150
int.event = 30
# Distributions
## x1 is the expected control event rate at probability p.x1
## x2 is the control event rate at probability p.x2
x1 = 0.4
p.x1 = 0.5
x2 = 0.6
p.x2 = 0.9
## As above, but for the difference between intervention and control group
y1 = 0.05
p.y1 = 0.5
y2 = 0.1
p.y2 = 0.9
# Model setup
## For brm()
n.chains = 3
n.iter = 4000
n.warmup = 500
## For the rbeta() prior simulation; chosen so the number of prior draws matches the post-warmup iterations
n.draws = n.iter - n.warmup
# ***
# Simulation starts here
# ***
# Define Functions
## Logit (log-odds) transformation
fun_log.trans = function(x) {
log.trans = log(x / (1 - x))
log.trans
}
## Inverse logit transformation
fun_invlog.trans = function(x) {
invlog.trans = exp(x) / (1 + exp(x))
invlog.trans
}
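As an aside, these hand-rolled transforms should match base R's qlogis() and plogis(); a quick sanity check:
stopifnot(all.equal(fun_log.trans(0.3), qlogis(0.3)),
          all.equal(fun_invlog.trans(1.2), plogis(1.2)))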
# Run things
# 0. Put data into dataframe
data = data.frame(group = c("int" , "con"),
n = c(int.n , con.n),
event = c(int.event, con.event))
# 1. Generate prior distributions
## Beta prior for the control group event rate
beta0.val = beta.select(list(x = x1, p = p.x1),
list(x = x2, p = p.x2))
p0.sim = rbeta(n.draws, beta0.val[1], beta0.val[2])
### Logit transform it
theta0.sim = fun_log.trans(p0.sim)
## Prior distribution for the difference in logit-p for each group
beta1.val = beta.select(list(x = y1, p = p.y1),
list(x = y2, p = p.y2))
p1.sim = rbeta(n.draws, beta1.val[1], beta1.val[2])
### Logit transform
theta1.sim = fun_log.trans(p1.sim)
## 2. Create a prior specification to store the priors
priors = get_prior(family = binomial,
event | trials(n) ~ group,
data = data)
### Get characteristics of the normal distribution for the priors generated in step 1
theta0.val = c(mean(theta0.sim), sd(theta0.sim))
theta1.val = c(mean(theta1.sim), sd(theta1.sim))
### Insert these characteristics into the prior specification generated at the start of step 2
priors$prior[3] = paste("normal(", theta0.val[1], ", ", theta0.val[2], ")", sep = "")
priors$prior[2] = paste("normal(", theta1.val[1], ", ", theta1.val[2], ")", sep = "")
# 3. Generate model with stan
model = brm(family = binomial,
event | trials(n) ~ group,
data = data,
prior = priors,
iter = n.iter,
warmup = n.warmup,
chains = n.chains,
refresh = 0)
## Plot model
plot(model)
print(model)
# 4. Generate posteriors
posteriors = as_draws_df(model)
## Apply the inverse logit transform to the theta draws to get p again
posteriors = posteriors %>%
mutate(across(starts_with("b_"), .fns = fun_invlog.trans, .names = "{.col}_p")) %>%
rename_with(~paste0(., "_theta"), .cols = starts_with("b_") & !ends_with("_p"))
## 95% posterior interval estimate
quantile(posteriors$b_Intercept_p, c(0.025, 0.975))
# 5. Plot posterior densities
## Take the posterior interval data and bind it with the priors
## Ideally, n.draws = number of iterations - warmup
graph = posteriors %>%
select(ends_with("_p")) %>%
cbind(p0.sim, p1.sim) %>%
pivot_longer(cols = everything(),
names_to = "distribution",
values_to = "value")
alt.graph = posteriors %>%
mutate(new_p = b_Intercept_p - b_groupint_p) %>%
select(new_p) %>%
cbind(p0.sim, p1.sim)
quantile(alt.graph$new_p, c(0.025, 0.975))
alt.graph = alt.graph %>%
pivot_longer(cols = everything(),
names_to = "distribution",
values_to = "value")
## Plot it
alt.graph %>%
ggplot(aes(x = value)) +
geom_density(aes(fill = distribution),
alpha = 0.5) +
theme_light()
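For what it's worth, here is a minimal sketch of how the draws could be mapped back to group-level probabilities, assuming brms's default treatment (dummy) coding with "con" as the reference level; it illustrates what b_Intercept and b_groupint represent rather than giving a definitive answer:
## b_Intercept is the log-odds of an event in the reference ("con") group and
## b_groupint is the log-odds difference for the intervention group, so:
draws = as_draws_df(model)
p_con = plogis(draws$b_Intercept)
p_int = plogis(draws$b_Intercept + draws$b_groupint)
risk_diff = p_int - p_con
quantile(risk_diff, c(0.025, 0.5, 0.975))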
I have a mixed model with a non-linear term, so I would like to use nlme() instead of lme() from the R package nlme. However, switching to nlme(), even without adding anything to the model, causes RStudio and R to crash.
I have found that even generated data, which can easily be fitted using lme, causes this behaviour (on my computer).
Let's start by loading the libraries and setting up a data.frame with the grouping id and spatial coordinate x.
library(nlme)
nid <- 300
nx <- 10
data <- expand.grid(
x = seq(nx),
id = seq(nid)
)
Now, let's add correlated error and uncorrelated error as separate columns, as well as a random intercept value per id. The output of arima.sim requires a normalisation step.
data$ec <- c(
replicate(
nid,
as.numeric(
arima.sim(
model = list(
order = c(1, 0, 0),
ar = 0.5
),
n = nx
)
)
)
)
data$ec <- data$ec / sd(data$ec)
data$eu <- rnorm(nid * nx)
data$random <- rep(rnorm(nid), each = nx)
Now, we can create 3 dependent variables, for 3 models. The first is a mixed model with uncorrelated (regular) error. The second includes an exponential (AR1) correlation structure. The third combines both. I am adding an intercept of 1, an sd of the random effect of 2 and an sd of the total residual error of 3.
data$y1 <- 1 + 2 * data$random + 3 * data$eu
data$y2 <- 1 + 2 * data$random + 3 * data$ec
data$y3 <- 1 + 2 * data$random + sqrt(8) * data$ec + sqrt(1) * data$eu
All of the following lme models fit without problem, giving the expected result.
l1 <- lme(
fixed = y1 ~ 1,
random = ~ 1 | id,
data = data,
method = "ML"
)
l2 <- lme(
fixed = y2 ~ 1,
random = ~ 1 | id,
correlation = corExp(
form = ~ x | id
),
data = data,
method = "ML"
)
l3 <- lme(
fixed = y3 ~ 1,
random = ~ 1 | id,
correlation = corExp(
form = ~ x | id,
nugget = TRUE
),
data = data,
method = "ML"
)
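As a quick check that these recover the simulated values (intercept 1, random-intercept sd 2, residual sd 3), one can inspect, e.g. for l1:
fixef(l1)    # fixed intercept, should be close to 1
VarCorr(l1)  # random-intercept sd close to 2, residual sd close to 3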
As far as I know, the following nlme code specifies exactly the same models as above. The first runs without issues. But the ones with a correlation structure crash R / RStudio. No warning or error message is provided. Fiddling with the arguments and with nlmeControl does not help, though I do think nlmeControl could be the place to search for a solution.
nlme(
model = y1 ~ b0,
fixed = b0 ~ 1,
random = b0 ~ 1,
group = ~ id,
data = data,
start = list(
fixed = fixed.effects(l1),
random = setNames(random.effects(l1), "b0")
),
method = "ML"
)
nlme(
model = y2 ~ b0,
fixed = b0 ~ 1,
random = b0 ~ 1,
group = ~ id,
correlation = corExp(
form = ~ x
),
data = data,
start = list(
fixed = fixed.effects(l2),
random = setNames(random.effects(l2), "b0")
),
method = "ML"
)
nlme(
model = y3 ~ b0,
fixed = b0 ~ 1,
random = b0 ~ 1,
group = ~ id,
correlation = corExp(
form = ~ x,
nugget = TRUE
),
data = data,
start = list(
fixed = fixed.effects(l3),
random = setNames(random.effects(l3), "b0")
),
method = "ML"
)
Has anyone experienced this before? Does my example code give the same problem on your computer? What are good strategies to change nlmeControl to attempt to remedy this?
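For reference, a minimal sketch of how nlmeControl() settings could be passed, purely as a starting point for experimentation (the specific values below are assumptions, not a known fix):
ctrl <- nlmeControl(maxIter = 200,    # more outer iterations
                    msMaxIter = 200,  # more optimizer iterations
                    pnlsMaxIter = 20, # more penalized NLS iterations
                    opt = "nlminb")   # alternative: "nlm"
nlme(
  model = y2 ~ b0,
  fixed = b0 ~ 1,
  random = b0 ~ 1,
  groups = ~ id,
  correlation = corExp(form = ~ x),
  data = data,
  start = list(
    fixed = fixed.effects(l2),
    random = setNames(random.effects(l2), "b0")
  ),
  control = ctrl,
  method = "ML"
)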
setwd("C:/Users/sevvalayse.yurtekin/Desktop/hw3")
data = read.table('DSE501_fall2020_HW3.csv', header= T, sep=',')
attach
data
getOption("max.print")
rs<-rowSums(data[,2:76], na.rm = TRUE)
data<-cbind(data,rs)
data
p1<-ggplot()+
geom_line(aes(y = rs, x=year), data=data)+
scale_x_continuous(breaks = seq(2004,2019,2))
p1
model = lm(rs ~ year, data = data)
model
summary(model)
residuals(model)
predict(model)
#model.fit = lm(year~rs)
#summary(model.fit)
new.year<-data.frame(
year = c(2021,2022,2023)
)
predict(model, newdata = new.year, interval = 'confidence')
data2 = read.table('TUIK_nufus_2019.csv', header = T, sep=",")
data2
total = data2$Total
mydata<-data[-c(1,2,3),]
model2 = lm(mydata~total)
model2
Hello, I get this error: Error in model.frame.default(formula = mydata ~ total, drop.unused.levels = TRUE) : invalid type (list) for variable 'mydata'.
How can I fix it? I want to run a regression using the two datasets.
The line that's causing the issue is model2 = lm(mydata ~ total). mydata is not a vector, which is what your dependent variable should be in the lm() call. When you create mydata you do not select a column: mydata <- data[-c(1,2,3), <enter column name of dependent variable>]
Otherwise you can fit your model with the following syntax (provided your dependent and independent variables are in the same data frame). Here I just used y as a placeholder variable name: lm(y ~ total, data = mydata)
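For example, assuming the row sums rs computed above are the intended dependent variable and that, after dropping rows 1-3, they line up row for row with data2 (both assumptions on my part), the two fixes would look like this:
mydata <- data[-c(1, 2, 3), "rs"]   # select a single column, giving a vector
model2 <- lm(mydata ~ total)
# or keep both variables in one data frame with explicit column names
df <- data.frame(rs = data[-c(1, 2, 3), "rs"], total = data2$Total)
model2 <- lm(rs ~ total, data = df)
summary(model2)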
Is there any way to specify the full model once and then just drop regressors one after the other, producing a nice stargazer table, without having to write every regression line again and again?
library(plm)
library(stargazer)
data <- datasets::airquality
# Treating Month and Day as crosssectional and time fixed effects
re1 <- plm(data = data, Ozone~Solar.R+Wind+Temp,
index=c("Month", "Day"), model="within", effect="twoways") # full model
# this is the only regression line I actually want to write.
# The other regressors should be automatically dropped one by one in subsequent regressions.
re2 <- plm(data = data, Ozone~Wind+Temp,
index=c("Month", "Day"), model="within", effect="twoways") # first regressor dropped
re3 <- plm(data = data, Ozone~Solar.R+Temp,
index=c("Month", "Day"), model="within", effect="twoways") # second regressor dropped
re4 <- plm(data = data, Ozone~Solar.R+Wind,
index=c("Month", "Day"), model="within", effect="twoways") # third regressor dropped
stargazer(re1, re2, re3, re4,
type = "html",
title = "Dropped after one another",
out="HopeThisWorks.html")
I have looked into the step() function, but that isn't quite what I need, as I am not aiming to drop regressors according to significance.
re1 <- plm(data = data, Ozone~Solar.R+Wind+Temp,
index=c("Month", "Day"), model="within", effect="twoways") # full model
A <- lapply(paste0(". ~ . - ", c("Solar.R", "Wind", "Temp")), function(x) update(re1, as.formula(x)))
[[1]]
Model Formula: Ozone ~ Wind + Temp
Coefficients:
Wind Temp
-2.6933 2.3735
[[2]]
Model Formula: Ozone ~ Solar.R + Temp
Coefficients:
Solar.R Temp
0.040986 2.782978
[[3]]
Model Formula: Ozone ~ Solar.R + Wind
Coefficients:
Solar.R Wind
0.096607 -4.841992
Now, to be able to access these in the global environment, use:
list2env(setNames(A, paste0("re", seq_along(A) + 1)), .GlobalEnv)
stargazer(re1, re2, re3, re4,
type = "html",
title = "Dropped after one another",
out="HopeThisWorks.html")
This is a more flexible alternative:
bene <- function(data = datasets::airquality,
                 depvar = "Ozone",
                 covariates = c("Wind", "Temp", "Solar.R")) {
  # Fit the full model (id = NA) or the model with covariate 'id' dropped
  funny <- function(id) {
    covs <- ifelse(is.na(id),
                   paste0(covariates, collapse = " + "),
                   paste0(covariates[-id], collapse = " + "))
    model <- eval(parse(text = paste0("plm(data = data, ", depvar, " ~ ", covs,
                                      ", index = c('Month', 'Day'), model = 'within', effect = 'twoways')")))
    return(model)
  }
  # Fit the models once, then reuse them for the stargazer table and the printed summaries
  models <- purrr::map(c(NA, seq_along(covariates)), funny)
  xx <- capture.output(stargazer::stargazer(models,
                                            type = "html",
                                            out = "results/model.html",
                                            star.cutoffs = c(0.05, 0.01, 0.001),
                                            title = paste0(depvar)))
  purrr::map(models, function(x) print(summary(x)))
}
bene(data = datasets::airquality,
depvar = "Ozone",
covariates = c("Wind","Temp","Solar.R"))
You could use (an adaptation of) this function, which so far only works stepwise (adding one covariate at a time); it shows you the stargazer output and also saves the regression table as an HTML file.
stepwise_model <- function(data = "datasets::airquality",
depvar = "Ozone",
covariates = c("Wind","Temp","Solar.R")){
data_df <- eval(parse(text = data))
models <- c()
for(q in 1:length(covariates)){
label <- paste0("mod_",depvar,"_",q)
models[q] <- label
cov <- paste0(covariates[1:q],collapse = " + ")
eval(parse(text = paste0(label," <<- plm::plm(",depvar," ~ ",cov,",data = data_df,index=c('Month', 'Day'), model='within', effect='twoways')")))
eval(parse(text = paste0("print(summary(",label,"))")))
}
modellist <- eval(parse(text = paste0("list(",paste0(models,collapse = ","),")")))
xx <- capture.output(stargazer::stargazer(modellist ,
type = "html",
out = paste0("results/paper/models/mod_",depvar,".html"),
star.cutoffs = c(0.05,0.01,0.001),
title = paste0(depvar)))
}
stepwise_model()
I've been trying to fit multiple GAMs using the package mgcv within a function, and to crudely select the most appropriate model through model-selection procedures. But my function runs the first model and then doesn't seem to recognise the input data dat again.
I get the error
Error in is.data.frame(data) : object 'dat' not found.
I think this is a scoping problem. I've looked here and here for help but cannot figure it out.
Code and data are as follows (hopefully reproducible):
https://github.com/cwaldock1/Help/blob/master/test_gam.csv
library(mgcv)
library(MuMIn)  # for dredge() and get.models()
# Function to fit multiple models
best.mod <- function(dat) {
# Set up control structure
ctrl <- list(niterEM = 0, msVerbose = TRUE, optimMethod="L-BFGS-B")
# AR(1)
m1 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
data = dat, correlation = corARMA(form = ~ 1|Year, p = 1),
control = ctrl)), subset=1)[[1]]
# AR(2)
m2 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
data = dat, correlation = corARMA(form = ~ 1|Year, p = 2),
control = ctrl)), subset=1)[[1]]
# AR(3)
m3 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
data = dat, correlation = corARMA(form = ~ 1|Year, p = 3),
control = ctrl)), subset = 1)[[1]]
### Select best model to work with based on unselective AIC criteria
if(AIC(m2$lme) > AIC(m1$lme)){mod = m1}else{mod = m2}
if(AIC(mod$lme) > AIC(m3$lme)){mod = m3}else{mod = mod}
return(mod$gam)
}
mod2 <- best.mod(dat = test_gam)
Any help would be greatly appreciated.
Thanks,
Conor
get.models evaluates the call in the model's formula environment, which for gamm is (always?) .GlobalEnv, while it should be the function's environment (i.e. sys.frames(sys.nframe())).
So, instead of
get.models(ms, 1)
use
eval(getCall(ms, 1))
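For example, applied to the first model inside the question's best.mod() function (a sketch reusing the question's objects; ms1 is just a name for the dredge result):
ms1 <- dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
                   data = dat, correlation = corARMA(form = ~ 1 | Year, p = 1),
                   control = ctrl))
m1 <- eval(getCall(ms1, 1))  # evaluated where 'dat' is visible, so it is found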