I am trying to estimate the model below. The model is fit with an R package called brms, but I am doing all the data manipulation in Python and using rpy2 to bridge the two languages. I can load the brms package with rpy2, but I can't figure out the syntax to estimate the model. Below is a simple example of what I would like to do; the code works natively in R, and I tried to follow the documentation on rpy2's website, but I can't get it to work. How do I translate it to rpy2?
library(brms)
data("kidney", package = "brms")
head(kidney, n = 3)
fit1 <- brm(time | cens(censored) ~ age + sex + disease,
data = kidney, family = weibull, inits = "0")
summary(fit1)
plot(fit1)
fit2 <- brm(time | cens(censored) ~ age + sex + disease + (1|patient),
data = kidney, family = weibull(), inits = "0",
prior = set_prior("cauchy(0,2)", class = "sd"))
summary(fit2)
plot(fit2)
In Python, every non-built-in attribute or object must be qualified with a namespace. Fortunately, in R everything is also an object that lives in a namespace, just implicitly! Many new useRs may not realize that the core libraries base, stats, and utils are loaded with every session, so everyday functions such as read.csv, data.frame, and lapply actually belong to packages and can be called in Python-like style with the double-colon operator: utils::read.csv(), base::lapply(), stats::lm(). To find the package a function belongs to, open its help page with ? (e.g., ?lapply) and look in the upper-left corner.
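For example, in a plain R session you can check which package a function belongs to and then call it fully qualified (the CSV path below is only a placeholder):
?read.csv                                            # help page header reads "read.csv {utils}"
dat <- utils::read.csv("myfile.csv")                 # placeholder file path
squares <- base::lapply(1:3, function(i) i^2)
fit <- stats::lm(mpg ~ wt, data = datasets::mtcars)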
Therefore, you can largely retain your R syntax, adjusting only for Python's rules, such as translating dotted names and dropping the <- assignment operator. However, rpy2 does not render graphs interactively, so you need to save plots to disk as images and explicitly print any console output. Another challenge can be loading built-in datasets; the code below includes a working mtcars load from R's built-in datasets package, which should translate to the kidney data.
from rpy2.robjects.packages import importr, data
# IMPORT R PACKAGES
base = importr('base')
utils = importr('utils')
datasets = importr('datasets')
stats = importr('stats', robject_translations={'as.formula': 'as_formula'})
graphics = importr('graphics')
grDevices = importr('grDevices')
brms = importr('brms')
# LOADING DATA
# WORKING EXAMPLE: mtcars = data(datasets).fetch('mtcars')['mtcars']
kidney_df = data(brms).fetch('kidney')['kidney']
print(utils.head(kidney_df, n = 3))
# MODELING
formula1 = stats.as_formula("time | cens(censored) ~ age + sex + disease")
fit1 = brms.brm(formula1, data = kidney_df, family = "weibull", inits = "0")
print(stats.summary(fit1))
formula2 = stats.as_formula("time | cens(censored) ~ age + sex + disease + (1|patient)")
# 'class' is a reserved word in Python, so pass that argument via keyword unpacking
fit2 = brms.brm(formula2, data = kidney_df, family = "weibull", inits = "0",
                prior = brms.set_prior("cauchy(0,2)", **{'class': 'sd'}))
print(stats.summary(fit2))
# GRAPHING
grDevices.png('/path/to/plot1.png')
graphics.plot(fit1)
grDevices.dev_off()
grDevices.png('/path/to/plot2.png')
graphics.plot(fit2)
grDevices.dev_off()
I'm trying out a package for double machine learning (https://rdrr.io/github/yixinsun1216/crossfit/), and when I run its main function dml() on the example data frame, I get the error "Error: No tidy method for objects of class dgCMatrix". Looking through the source (https://rdrr.io/github/yixinsun1216/crossfit/src/R/dml.R), I can't find anything wrong with how tidy() is used. Does anyone have any idea what could be going wrong here?
R version 4.2.1
I have already tried installing broom.mixed (broomExtra doesn't seem to be available for my R version), but the same problem occurs. Code used below:
install.packages("remotes")
remotes::install_github("yixinsun1216/crossfit", force = TRUE)
library("remotes")
library("crossfit")
library("broom.mixed")
library("broom")
# Effect of temperature and precipitation on corn yield in the presence of
# time and locational effects
data(corn_yield)
library(magrittr)
dml_yield <- "logcornyield ~ lower + higher + prec_lo + prec_hi | year + fips" %>%
as.formula() %>%
dml(corn_yield, "linear", n = 5, ml = "lasso", poly_degree = 3, score = "finite")
I am using the package lqmm to run a linear quantile mixed model on an imputed object of class mira from the package mice. I tried to make a reproducible example:
library(lqmm)
library(mice)
summary(airquality)
imputed<-mice(airquality,m=5)
summary(imputed)
fit1<-lqmm(Ozone~Solar.R+Wind+Temp+Day,random=~1,
tau=0.5, group= Month, data=airquality,na.action=na.omit)
fit1
summary(fit1)
fit2<-with(imputed, lqmm(Ozone~Solar.R+Wind+Temp+Day,random=~1,
tau=0.5, group= Month, na.action=na.omit))
"Error in lqmm(Ozone ~ Solar.R + Wind + Temp + Day, random = ~1, tau = 0.5, :
`data' must be a data frame"
Yes, it is possible to get lqmm() to work with mice. Looking at the code for lqmm(), it turns out to be a picky function: it requires that the data argument be supplied, and although it appears to check whether the data exists in another environment, that check doesn't work in this context. Fortunately, all we have to do is capture the data that mice supplies and hand it to lqmm().
fit2 <- with(imputed,
lqmm(Ozone ~ Solar.R + Wind + Temp + Day,
data = data.frame(mget(ls())),
random = ~1, tau = 0.5, group = Month, na.action = na.omit))
The explanation is that ls() gets the names of the variables available, mget() gets those variables as a list, and data.frame() converts them into a data frame.
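As a minimal sketch of that trick outside of mice (the vectors and their values below are made up), you can watch the variables in an environment being bundled back into a data frame:
demo <- function() {
  Ozone <- c(41, 36, 12)       # made-up values
  Wind  <- c(7.4, 8.0, 12.6)
  # ls() lists the variable names in this environment, mget() fetches them as a
  # named list, and data.frame() binds them into the data frame lqmm() insists on
  data.frame(mget(ls()))
}
demo()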
The next problem you're going to hit is that mice::pool() requires tidy() and glance() methods to properly pool the multiple imputations, and it looks like neither broom nor broom.mixed defines them for lqmm. I threw together a very quick and dirty implementation, which you could use if you can't find anything else.
To get pool(fit2) to run you'll need to create the function tidy.lqmm() as below. Then pool() will assume the sample size is infinite and perform the calculations accordingly. You can also create the glance.lqmm() function before running pool(fit2), which will tell pool() the residual degrees of freedom. Afterwards you can use summary(pooled) to find the p-values.
tidy.lqmm <- function(x, conf.int = FALSE, conf.level = 0.95, ...) {
broom:::as_tidy_tibble(data.frame(
estimate = coef(x),
std.error = sqrt(
diag(summary(x, covariance = TRUE,
R = 50)$Cov[names(coef(x)),
names(coef(x))]))))
}
glance.lqmm <- function(x, ...) {
broom:::as_glance_tibble(
logLik = as.numeric(stats::logLik(x)),
df.residual = summary(x, R = 2)$rdf,
nobs = stats::nobs(x),
na_types = "rii")
}
Note: lqmm uses bootstrapping to estimate the standard error. By default it uses R = 50 bootstrapping replicates, which I've copied in the tidy.lqmm() function. You can change that line to increase the number of replicates if you like.
WARNING: Use these functions and the results with caution. I know just enough to be dangerous. To me it looks like these functions work to give sensible results, but there are probably intricacies that I'm not aware of. If you can find a more authoritative source for similar functions that work, or someone who is familiar with lqmm or pooling mixed models, I'd trust them more than me.
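For completeness, a short usage sketch once both helper functions above are defined, continuing with the fit2 object created from with(imputed, ...) earlier (the name pooled is mine):
# pool() finds tidy.lqmm() and glance.lqmm() via S3 dispatch
pooled <- pool(fit2)
summary(pooled, conf.int = TRUE)   # pooled estimates, std. errors and p-values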
I read this article (https://journal.r-project.org/archive/2021/RJ-2021-073/RJ-2021-073.pdf) about multiple imputation and propensity score matching - here is the code from this article:
# code from "MatchThem:: Matching and Weighting after Multiple Imputation", Pishgar et al, The R Journal Vol. XX/YY, AAAA 20ZZ:
library(MatchThem)
data('osteoarthritis')
summary(osteoarthritis)
library(mice)
imputed.datasets <- mice(osteoarthritis, m = 5)
matched.datasets <- matchthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
datasets = imputed.datasets,
approach = 'within',
method = 'nearest',
caliper = 0.05,
ratio = 2)
weighted.datasets <- weightthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
datasets = imputed.datasets,
approach = 'across',
method = 'ps',
estimand = 'ATM')
library(cobalt)
bal.tab(matched.datasets, stats = c('m', 'ks'),
imp.fun = 'max')
bal.tab(weighted.datasets, stats = c('m', 'ks'),
imp.fun = 'max')
library(survey)
matched.models <- with(matched.datasets,
svyglm(KOA ~ OSP, family = quasibinomial()),
cluster = TRUE)
weighted.models <- with(weighted.datasets,
svyglm(KOA ~ OSP, family = quasibinomial()))
matched.results <- pool(matched.models)
summary(matched.results, conf.int = TRUE)
As far as I understand, the author first uses multiple imputation with mice (m = 5) and then continues with the matching procedure in MatchThem. In the end, MatchThem returns a "mimids" object called matched.datasets, which contains the 5 different imputed datasets.
There is the complete() function, which can extract one of the datasets, e.g.
newdataset <- complete(matched.datasets, 2) # extracts the second dataset.
So newdataset is a data frame without NAs (because imputed) and can be used for any further tests.
Now, I would like to use a dataset as a data frame (like after using complete()), but this dataset should be some kind of "mean" of all the datasets - because how could I decide which of the 5 datasets to use for my further analyses? Is there a way of doing something like this:
meanofdatasets <- complete(matched.datasets, meanofall5datasets) # extracts a dataset which contains something like the mean values of all datasets
In my data, for which I want to use this method, I would like to use an imputed and matched dataset of my original roughly 500 rows to run further tests, e.g. Cox regression, Kaplan-Meier plots or competing-risk analyses, as well as simple descriptive statistics with plots about the matched population. But on which of the 5 datasets do I have to run my tests? For those tests I need a real data frame, don't I?
Thank you for any help!
Here is a valuable source (from the creator of the mice package, Stef van Buuren) explaining why you should NOT average the multiple datasets but instead POOL the estimates from each imputed dataset, for instance when doing your Cox regression (see section 5.1, Workflow).
Quick steps for Cox regression (a code sketch follows this list):
You can easily do the multiple imputation with mice() and the matching with matchthem(), which will give you a mimids-class object.
Then run your Cox regression through the with() function on your mimids object.
Finally, pool your estimates through pool(), which will give you a mimira object.
A mimira object is easily handled with the gtsummary package (tbl_regression()), which gives you a fine and readily publishable table.
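A minimal sketch of those steps under explicit assumptions: the data frame my_data and the variables treatment, age, sex, bmi, time, and event are hypothetical stand-ins (substitute your own), and I'm assuming with() accepts svycoxph() the same way it accepts svyglm() in the article's example above.
library(mice)
library(MatchThem)
library(survival)
library(survey)
# 1) multiple imputation, then matching within each imputed dataset (mimids object)
imputed.datasets <- mice(my_data, m = 5)
matched.datasets <- matchthem(treatment ~ age + sex + bmi,
                              datasets = imputed.datasets,
                              approach = 'within', method = 'nearest')
# 2) Cox regression fitted to every matched dataset via with()
#    (Surv(time, event) stands in for your own survival variables)
cox.models <- with(matched.datasets, svycoxph(Surv(time, event) ~ treatment))
# 3) pool the estimates across the imputations
cox.pooled <- pool(cox.models)
summary(cox.pooled, conf.int = TRUE)
As noted above, the pooled result can then be turned into a readily publishable table with gtsummary::tbl_regression().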
Link to data (1170 obs, 9 variables, .Rd file)
Simply read it in using readRDS(file).
I'm trying to set up a GLMM using the glmmPQL function from the MASS package, including a random-effects part and accounting for spatial autocorrelation. However, R (version 3.3.1) crashes upon execution.
library(MASS)   # glmmPQL
library(nlme)   # corSpatial
# setup model formula
fo <- hail ~ prec_nov_apr + t_min_nov_apr + srad_nov_apr + age
# setup corSpatial object
correl = corSpatial(value = c(10000, 0.1), form = ~ry + rx, nugget = TRUE,
fixed = FALSE, type = "exponential")
correl = Initialize(correl, data = d)
# fit model
fit5 <- glmmPQL(fo, random = ~1 | date, data = d,
                correlation = correl, family = binomial)
What I tried so far:
reduce the number of observations
play with corSpatial parameters (range and nugget)
reduce number of fixed predictors
execute code on Windows, Linux (Debian) and Mac R installations
While I get no error message on my local pc (RStudio just crashes), running the script on a server returns the following error message:
R: malloc.c:3540: _int_malloc: Assertion `(fwd->size & 0x4) == 0' failed. Aborted
I'd use the INLA package to model this, as it allows you to use spatially correlated random effects. The required code is a bit too long to place here, so I've put it in a document at http://rpubs.com/INBOstats/spde
I am trying to fit a multi-state model using the R package R2BayesX. How can I do so correctly? There is no example in the manual. Here is my attempt.
activity is 1/0, i.e. the states
time is the time variable
PatientId is the random effect I want
f <- activity ~ sx(time,bs="baseline")+sx(PatientId, bs="re")
b <- bayesx(f, family = "multistate", method = "MCMC", data=df)
Note: created new output directory
Warning message:
In run.bayesx(file.path(res$bayesx.prg$file.dir, prg.name = res$bayesx.prg$prg.name), :
an error occurred during runtime of BayesX, please check the BayesX
logfile!
I'm not sure exactly what kind of model you want to specify, but I tried to put together an artificial, nonsensical data set that makes the error above reproducible:
set.seed(1)
df <- data.frame(
activity = rbinom(1000, prob = 0.5, size = 1),
time = rep(1:50, 20),
id = rep(1:20, each = 50)
)
Possibly, you could provide an improved example. With this data, I can run your code:
library("R2BayesX")
f <- activity ~ sx(time, bs = "baseline") + sx(id, bs = "re")
b <- bayesx(f, family = "multistate", method = "MCMC", data = df)
This leads to the warning above and you can inspect BayesX's logfile via:
bayesx_logfile(b)
which tells you (among other information):
ERROR: family multistate is not allowed for method regress
So here only REML estimation appears to be supported, but:
b <- bayesx(f, family = "multistate", method = "REML", data = df)
also results in an error, the logfile says:
ERROR: Variable state has to be specified as a global option!
So the state has to be provided in a different way. I guess that you tried to do so by the binary response but it seems that the response should be the time variable (as in survival models) and then an additional state indicator needs to be provided somehow. I couldn't find an example for this in the BayesX manuals, though. I recommend that you contact the BayesX mailing list and/or the R2BayesX package maintainer with a more specific question and a reproducible example.