Package msm: segmentation fault when introducing covariates in R

While using the package msm, I am currently getting the error:
* caught segfault * address 0x7f875be5ff48, cause 'memory not mapped'
when I introduce a covariate to my model. Previously, I had resolved a similar error by converting my response variable from a factor to a numeric variable. This, however, does not resolve my current issue.
The data: https://www.dropbox.com/s/wx6s4liofaxur0v/data_msm.txt?dl=0
library(msm)
# dat.long holds the data read in from the linked data_msm.txt file
# number of transitions between states
# 1: healthy; 2: ill; 3: dead; 4: censor
statetable.msm(state_2, id, data = dat.long)
#setting initial values
q <- rbind(c(0, 0.25, 0.25), c(0.25, 0, 0.25), c(0, 0, 0))
crudeinits <- crudeinits.msm(state_2 ~ time, subject=id, data=dat.long, qmatrix=q, censor = 4, censor.states = c(1,2))
#running model without covariates
(fm1.msm <- msm(state_2 ~ time, subject = id, qmatrix = crudeinits, data = dat.long, death = 3, censor = 4, censor.states = c(1,2)))
#running model with covariates
(fm2.msm <- msm(state_2 ~ time, subject = id, qmatrix = crudeinits, data = dat.long, covariates = ~ gender, death = 3, censor = 4, censor.states = c(1,2)))
Alternatively, I can run the models with covariates if I set the state values dead and censor (3 & 4) to missing.
#set death and censor to missing
dat.long$state_2[dat.long$state_2 %in% c(3,4)] <- NA
statetable.msm(state_2, id, data=dat.long)
#setting initial values
q <- rbind(c(0, 0.5), c(0.5, 0))
crudeinits <- crudeinits.msm(state_2 ~ time, subject=id, data=dat.long, qmatrix=q)
#running models with covariates
(fm3.msm <- msm(state_2 ~ time, subject = id, qmatrix = crudeinits, data = dat.long, covariates = ~ gender))
(fm4.msm <- msm(state_2 ~ time, subject = id, qmatrix = crudeinits, data = dat.long, covariates = ~ covar))
Thanks for your help

In version 1.5 of msm, there's an error in the R code that detects and drops NAs in the data. This is triggered when there are covariates, and the state or time variable contains NAs. Those NAs can then be passed through to the C code that computes the likelihood, causing a crash. I'll fix this for the next version. In the meantime you can work around it by dropping NAs from the data before calling msm.
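A minimal sketch of that workaround, using a toy stand-in for dat.long (the column names state_2, time, and gender come from the question; the values here are made up):

```r
# Toy stand-in for dat.long; the real data come from the linked file
dat.long <- data.frame(
  id      = c(1, 1, 1, 2, 2),
  time    = c(0, 1, NA, 0, 1),
  state_2 = c(1, 2, 2, 1, NA),
  gender  = c("F", "F", "F", "M", "M")
)

# Drop rows where the state or time variable is NA, so no NAs can be
# passed through to the C code that computes the likelihood
dat.complete <- dat.long[complete.cases(dat.long[, c("state_2", "time")]), ]
nrow(dat.complete)  # 3 rows remain; the two NA rows are gone
```

After this, the msm() call with covariates = ~ gender can be run on dat.complete exactly as before.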

Related

Alter fixed and random effects using fixef and VarCorr in package simr with a ZIP glmmTMB

I am trying to alter the fixed and random effects of a zero-inflated Poisson model using R's glmmTMB function. I want to input the altered fixed effects into the powerSim function. Here is the data:
library(dplyr)
library(glmmTMB)

sample <- as.data.frame(cbind(1:50, (rep(1:10, each = 5))))
# randomize interventions by clinic
ru1 <- cbind(rbinom(10, 1, 0.5), 1:10)
ru2 <- cbind(rbinom(10, 1, 0.5), 1:10)
# merge randomization id with original sample
sample <- merge(sample, ru1, by = "V2") %>% merge(ru2)
# add days
sample <- as.data.frame(cbind(sample, scale(rep(-546:546, each = 50))))
# order by clinic and prescriber
sample <- sample[order(sample$V2, sample$V1.x), ]
# simulate ZIP distribution for days supply
set.seed(789)
sample <- cbind(sample, ifelse(rbinom(54650, 1, p = 0.5) > 0, 0, rpois(54650, 5)))
# rename variables
sample <- rename(sample, pres = V1.x, clinic = V2, aj = V1.y, def = V1,
                 days = `scale(rep(-546:546, each = 50))`,
                 dayssply = `ifelse(rbinom(54650, 1, p = 0.5) > 0, 0, rpois(54650, 5))`)
# days truncated
sample$days_ <- ifelse(0 > sample$days, 0, sample$days)
# model
m1 <- glmmTMB(dayssply ~ aj*days_ + (1|clinic/pres), zi = ~ aj*days_,
              data = sample, family = poisson)
After a lot of trial and error, I finally figured out how to specify the conditional fixed effect using the fixef function:
fixef(m1)$cond[["aj"]]
But when I try to change it to the desired fixed effect for the power analysis, I get the error that "cond is not the name of a fixed effect." Not sure if this is a syntax related issue, or if fixef doesn't work for zero-inflated models.
I would also like to alter the variances for the random effects using VarCorr.

Problem with Over- and Under-Sampling with ROSE in R

I have a dataset to classify between won cases (14399) and lost cases (8677). The dataset has 912 predictor variables.
I am trying to oversample the lost cases in order to reach almost the same number as the won cases (so that there are 14399 cases in each class).
TARGET is the column with lost (0) and won (1) cases:
table(dat_train$TARGET)
0 1
8677 14399
Now I am trying to balance them using ROSE ovun.sample
dat_train_bal <- ovun.sample(dat_train$TARGET~., data = dat_train, p=0.5, seed = 1, method = "over")
I get this error:
Error in parse(text = x, keep.source = FALSE) :
<text>:1:17538: unexpected symbol
1: PPER_409030143+BP_RESPPER_9639064007+BP_RESPPER_7459058285+BP_RESPPER_9339059882+BP_RESPPER_9339058664+BP_RESPPER_5209073603+BP_RESPPER_5209061378+CRM_CURRPH_Initiation+Quotation+CRM_CURRPH_Ne
Can anyone help?
Thanks :-)
Reproducing your code with a mock example, I found an error in your formula: dat_train$TARGET~. needs to be corrected to TARGET~.
dframe <- tibble::tibble(
  val = sample(c("a", "b"), size = 100, replace = TRUE, prob = c(.1, .9)),
  xvar = rnorm(100)
)
# Use oversampling
dframe_os <- ROSE::ovun.sample(formula = val ~ ., data = dframe, p=0.5, seed = 1, method = "over")
table(dframe_os$data$val)

Implementing multinomial-Poisson transformation with multilevel models

I know variations of this question have been asked before but I haven't yet seen an answer on how to implement the multinomial Poisson transformation with multilevel models.
I decided to make a fake dataset and follow the method outlined here, also consulting the notes the poster mentions as well as the Baker paper on MP transformation.
In order to check if I'm doing the coding correctly, I decided to create a binary outcome variable as a first step; because glmer can handle binary response variables, this will let me check I'm correctly recasting the logit regression as multiple Poissons.
The context of this problem is running multilevel regressions with survey data where the outcome variable is response to a question and the possible predictors are demographic variables. As I mentioned above, I wanted to see if I could properly code the binary outcome variable as a Poisson regression before moving on to multi-level outcome variables.
library(dplyr)
library(lme4)
key <- expand.grid(sex = c('Male', 'Female'),
                   age = c('18-34', '35-64', '45-64'))
set.seed(256)
probs <- runif(nrow(key))
# Make a fake dataset with 1000 responses
n <- 1000
df <- data.frame(sex = sample(c('Male', 'Female'), n, replace = TRUE),
                 age = sample(c('18-34', '35-64', '45-64'), n, replace = TRUE),
                 obs = seq_len(n), stringsAsFactors = FALSE)
age <- model.matrix(~ age, data = df)[, -1]
sex <- model.matrix(~ sex, data = df)[, -1]
beta_age <- matrix(c(0, 1), nrow = 2, ncol = 1)
beta_sex <- matrix(1, nrow = 1, ncol = 1)
# Create class probabilities as a function of age and sex
probs <- plogis(
  -0.5 +
  age %*% beta_age +
  sex %*% beta_sex +
  rnorm(n)
)
id <- ifelse(probs > 0.5, 1, 0)
df$y1 <- id
df$y2 <- 1 - df$y1
# First run the regular hierarchical logit, just with a varying intercept for age
glm_out <- glmer(y1 ~ (1|age), family = 'binomial', data = df)
summary(glm_out)
#Next, two Poisson regressions
glm_1 <- glmer(y1 ~ (1|obs) + (1|age), data = df, family = 'poisson')
glm_2 <- glmer(y2 ~ (1|obs) + (1|age), data = df, family = 'poisson')
coef(glm_1)$age - coef(glm_2)$age
coef(glm_out)$age
The outputs for the last two lines are:
> coef(glm_1)$age - coef(glm_2)$age
(Intercept)
18-34 0.14718933
35-64 0.03718271
45-64 1.67755129
> coef(glm_out)$age
(Intercept)
18-34 0.13517758
35-64 0.02190587
45-64 1.70852847
These estimates seem close but they are not exactly the same. I'm thinking I've specified an equation wrong with the intercept.
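For what it's worth, in the fixed-effects (non-multilevel) case the multinomial-Poisson equivalence is exact and easy to verify with plain glm: a binary logit and a Poisson regression with one nuisance intercept per observation recover the same intercept and slope. A self-contained sketch with simulated data (not the poster's setup):

```r
set.seed(42)
n <- 100
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 0.8 * x))

# Standard binary logit
logit_fit <- glm(y ~ x, family = binomial)

# Multinomial-Poisson transformation: one row per (observation, category),
# with a nuisance intercept for every observation
stacked <- data.frame(
  obs   = factor(rep(seq_len(n), times = 2)),
  count = c(1 - y, y),             # category counts: y = 0 rows, then y = 1 rows
  z     = rep(c(0, 1), each = n),  # indicator for the y = 1 category
  x     = rep(x, times = 2)
)
pois_fit <- glm(count ~ obs + z + z:x, family = poisson, data = stacked)

# The z and z:x coefficients reproduce the logit intercept and slope
cbind(logit   = coef(logit_fit),
      poisson = coef(pois_fit)[c("z", "z:x")])
```

With random effects the agreement is only approximate, since integrating over the `(1|obs)` and `(1|age)` terms is not the same as profiling out fixed nuisance intercepts, which is consistent with the small discrepancies shown above.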

Error in contrasts when running multiple Poisson regression models in R

I have data which looks like this:
library(dplyr)

df <- data.frame(
  time = rep(c("2010", "2011", "2012", "2013", "2014"), 4),
  age = rep(c("40-44", "45-49", "50-54", "55-59", "60-64"), 4),
  weight = rep(c(0.38, 0.23, 0.19, 0.12, 0.08), 4),
  ethgp = rep(c(rep("M", 5), rep("NM", 5)), 2),
  gender = c(rep("M", 10), rep("F", 10)),
  pop = round(runif(10, min = 10000, max = 99999), digits = 0),
  count = round(runif(10, min = 1000, max = 9999), digits = 0)
)
df <- df %>%
  mutate(rate = count / pop,
         asr_rate = (rate * weight) * 100000,
         asr_round = round(asr_rate, digits = 0))
First, I remove all zero values from the dataframe
df <- df[apply(df != 0, 1, all), ]
Then I run the following code, to run multiple Poisson regression models, for each sub-group within this data (age, gender, and year); comparing ethnic groups (M / NM). I want to generate rate ratios, and CIs, comparing M with NM, for all sub-groups.
Poisson_test <- df %>%
  group_by(time, gender, age) %>%
  do({model = glm(asr_round ~ relevel(ethgp, ref = 2), family = "poisson", data = .);
      data.frame(nlRR_MNM = coef(model)[[2]], SE_MNM = summary(model)$coefficients[, 2][2])})
This code works fine for the sample above.
When I run this code on my actual dataset, however, I get the following error message:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
Because I have only one explanatory variable, ethgp, I assume this is the source of the error?
I tested whether there are levels in my data (not in the sample data):
str(M_NM_NZ$ethgp)
R responds: Factor w/ 2 levels "M","NM": 1 1 1 1 1 1 1 1 1 1 ...
I checked if there were NA values in the ethgp
sum(is.na(M_NM_NZ$ethgp))
R responds [1] 0
Are there other reasons I might be getting this error message?
I have seen this question: Error in contrasts when defining a linear model in R. But in that example, the explanatory variable was not in the correct format or had NA values, which is not the case in my data.
I don't understand the underlying problem which causes this error when a factor does have more than one level.
In this instance I fixed the issue by converting the ethgp variable into a numeric variable.
df <- df %>%
  mutate(ethnum = ifelse(ethgp == "M", 1, 0))
And then running the regressions using ethnum as the explanatory variable.
Poisson <- df %>%
  group_by(time, gender, age) %>%
  do({model = glm(asr_round ~ ethnum, family = "poisson", data = .);
      data.frame(nlRR_MNM = coef(model)[[2]], nlUCI = confint(model)[2, 2], nlLCI = confint(model)[2, 1])})
Poisson <- mutate(Poisson,
                  RR_MNM = round(exp(nlRR_MNM), digits = 3),
                  UCI = round(exp(nlUCI), digits = 3),
                  LCI = round(exp(nlLCI), digits = 3))
This code also computes the upper and lower 95% confidence intervals for each rate ratio.
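A likely underlying cause, sketched here with toy data (hypothetical, not the actual M_NM_NZ dataset): even though ethgp has two defined levels overall, some group_by subgroup may contain only one observed level. glm() drops unused factor levels when it builds the model frame, leaving a one-level factor inside that subgroup and raising exactly this contrasts error. The numeric recoding sidesteps it because numeric predictors need no contrasts. Such subgroups can be detected and dropped up front:

```r
# Toy grouped data: the 2011 subgroup only ever observes one ethgp level
df <- data.frame(
  time  = rep(c("2010", "2011"), each = 4),
  ethgp = factor(c("M", "NM", "M", "NM", "M", "M", "M", "M"),
                 levels = c("M", "NM")),
  y     = c(5, 7, 6, 8, 4, 5, 6, 7)
)

# glm() drops unused factor levels within each fitted subset, so a
# subgroup with a single observed level triggers the contrasts error
levels_per_group <- tapply(df$ethgp, df$time, function(x) length(unique(x)))
levels_per_group
# 2010 2011
#    2    1

# Keep only subgroups where both ethnic groups are actually observed
ok_groups <- names(levels_per_group)[levels_per_group >= 2]
df_ok <- df[df$time %in% ok_groups, ]
```

Running the per-group regressions on such a filtered dataset avoids the error without recoding the factor.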

Meaning of "trait" in MCMCglmm

Like in this post, I'm struggling with the notation of MCMCglmm, especially what is meant by trait. My code is the following:
library("MCMCglmm")
set.seed(123)
y <- sample(letters[1:3], size = 100, replace = TRUE)
x <- rnorm(100)
id <- rep(1:10, each = 10)
dat <- data.frame(y, x, id)
mod <- MCMCglmm(fixed = y ~ x, random = ~ us(x):id,
                data = dat,
                family = "categorical")
This gives me the error message "For error structures involving catgeorical data with more than 2 categories pleasue use trait:units or variance.function(trait):units." (sic). If I generated dichotomous data with letters[1:2] instead, everything would work fine. So what is meant by this error message in general, and by "trait" in particular?
Edit 2016-09-29:
From the linked question I copied rcov = ~ us(trait):units into my call of MCMCglmm. And from https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q3/004006.html I took (and slightly modified) the prior
prior <- list(R = list(V = diag(2), fix = 1), G = list(G1 = list(V = diag(2), nu = 1, alpha.mu = c(0, 0), alpha.V = diag(2) * 100)))
Now my model actually gives results:
MCMCglmm(fixed = y ~ 1 + x, random = ~ us(1 + x):id,
         rcov = ~ us(trait):units, prior = prior, data = dat,
         family = "categorical")
But still I've got a lack of understanding what is meant by trait (and what by units and the notation of the prior, and what is us() compared to idh() and ...).
Edit 2016-11-17:
I think trait is synonymous with "target variable" or "response" in general, or y in this case. In the formula for random there is nothing on the left side of ~ "because the response is known from the fixed effect specification." So the rationale behind requiring trait:units for rcov could be that the fixed formula already defines what trait is (y in this case).
units is the response variable value, and trait is the response variable name, which corresponds to the categories. By specifying rcov = ~us(trait):units, you are allowing the residual variance to be heterogeneous across "traits" (response categories) so that all elements of the residual variance-covariance matrix will be estimated.
In Section 5.1 of Hadfield's MCMCglmm Course Notes (vignette("CourseNotes", "MCMCglmm")) you can read an explanation for the reserved variables trait and units.