Considering autocorrelation in a linear quantile mixed model (LQMM) in R

(I am using R and the lqmm package)
I was wondering how to account for autocorrelation in a linear quantile mixed model (LQMM).
I have a data frame that looks like this:
df1 <- data.frame(Time = seq(as.POSIXct("2017-11-13 00:00:00", tz = "UTC"),
                             as.POSIXct("2017-11-13 00:01:59", tz = "UTC"), "sec"),
                  HeartRate = rnorm(120, mean = 60, sd = 10),
                  Treatment = rep("TreatmentA", 120),
                  AnimalID = rep("ID01", 120),
                  Experiment = rep("Exp01", 120))
df2 <- data.frame(Time = seq(as.POSIXct("2017-08-11 00:00:00", tz = "UTC"),
                             as.POSIXct("2017-08-11 00:01:59", tz = "UTC"), "sec"),
                  HeartRate = rnorm(120, mean = 62, sd = 14),
                  Treatment = rep("TreatmentB", 120),
                  AnimalID = rep("ID02", 120),
                  Experiment = rep("Exp02", 120))
df <- rbind(df1, df2)
head(df)
With:
Heart rate (HeartRate) is measured every second on some animals (AnimalID). These measurements are taken during experiments (Experiment) with different possible treatments (Treatment). Each animal (AnimalID) was observed over multiple experiments with different treatments. I wish to look at the effect of the variable Treatment on the 90th percentile of the heart rates, while including Experiment as a random effect and accounting for the autocorrelation (as heart rates are taken every second). (If there is a way to include AnimalID as a random effect as well, it would be even better.)
Model for now:
library(lqmm)
model <- lqmm(fixed = HeartRate ~ Treatment, random = ~ 1, group = Experiment, data = df, tau = 0.9)
Thank you very much in advance for your help.
Let me know if you need more information.

For resources on thinking about this type of problem, you might look at chapters 17 and 19 of Koenker et al. (2018), Handbook of Quantile Regression (CRC Press). Neither chapter has ready-to-use R code to work from, but they discuss different approaches to the kind of data you're working with. lqmm does use nlme machinery, so there may be a way to customize the covariance matrices for the random effects, but I suspect it would be easiest either to ask the package author for help or to do a deep dive into the package code to figure out how to do that.
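As a rough first check (a sketch of my own, not something documented in lqmm), you can at least quantify how much serial correlation is left once the random effect is in the model by looking at the ACF of the residuals from the current fit. This assumes predict.lqmm's level argument works as described in the package documentation (0 = population level, 1 = adds the predicted random effects):

# residuals from the lqmm fit in the question, including the predicted random effects
res <- df$HeartRate - predict(model, level = 1)
# serial correlation within one experiment, at the 1-second sampling step
acf(res[df$Experiment == "Exp01"])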

Another resource is the quantile regression model for mixed effects accounting for autocorrelation in 'Quantile regression for mixed models with an application to examine blood pressure trends in China' by Smith et al. (2015). They model a bivariate response with a copula, but you could do the simplified version with a univariate response. I think their model, at this point, only incorporates a lag-1 correlation structure within subjects/clusters. The code for that model does not seem to be available online either, though.

Related

Extracting linear term from a polynomial predictor in a GLM

I am relatively new to both R and Stack Overflow, so please bear with me. I am currently using GLMs to model ecological count data under a negative binomial distribution in brms. Here is my general model structure, which I have chosen based on fit, convergence, low LOOIC when compared to other models, etc.:
My goal is to characterize population trends of study organisms over the study period. I have created marginal effects plots by using the model to predict on a new dataset where all covariates are constant except year (shaded areas are 80% and 95% credible intervals for posterior predicted means).
I am now hoping to extract trend magnitudes that I can report and compare across species (i.e. say that a certain species declined or increased by x% (+/- y%) per year). Because I use poly() in the model, my understanding is that R uses orthogonal polynomials, and the resulting polynomial coefficients are not easily interpretable. I have tried generating raw polynomials (setting raw=TRUE in poly()), which I thought would produce the same fit and have directly interpretable coefficients. However, the resulting models don't really run (after 5 hours neither chain gets through even a single iteration, whereas the same model with raw=FALSE only takes a few minutes to run). Very simplified versions of the model (e.g. count ~ poly(year, 2, raw=TRUE)) do run, but take several orders of magnitude longer than with raw=FALSE, and the resulting model also predicts different counts than the model with orthogonal polynomials. My questions are (1) what is going on here? and (2) more broadly, how can I feasibly extract the linear term of the quartic polynomial describing the response to year, or otherwise get at a value corresponding to the population trend?
I feel like this should be relatively simple and I apologize if I'm overlooking something obvious. Please let me know if there is further code that I should share for more clarity; I didn't want to make the initial post too long, but I'm happy to show specific predictions from different models or anything else. Thank you for any help.
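To make the orthogonal-vs-raw point concrete, here is a minimal sketch with glm standing in for the brms model (the data and the quadratic in year are invented for illustration): the two parameterisations describe the same fitted curve, so a per-year trend is easier to read off predictions than off coefficients.

set.seed(1)
d <- data.frame(yr = -10:10)                          # centred "year", made-up data
d$count <- rpois(nrow(d), exp(2 + 0.05 * d$yr - 0.02 * d$yr^2))

m_orth <- glm(count ~ poly(yr, 2),             family = poisson, data = d)
m_raw  <- glm(count ~ poly(yr, 2, raw = TRUE), family = poisson, data = d)
all.equal(fitted(m_orth), fitted(m_raw))              # TRUE: same fit, different coefficients

p <- predict(m_orth, newdata = d, type = "response")
diff(p) / head(p, -1)                                 # proportional change from one year to the next

The same finite-difference idea should carry over to the brms fit by predicting on a grid of years (e.g. with posterior_epred()) and summarising the year-to-year changes across posterior draws.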

Type of regression for large dataset, nonlinear, skewed in R

I'm researching moth biomass in different biotopes, and I want to find a model that estimates the biomass. I have measured the length and width of the forewing, abdomen and thorax of 37088 specimens, and I have weighed them individually (dried).
First, I wanted to do a simple linear regression of each variable on biomass. The problem is, none of the assumptions are met. The data are not linear, biomass (and some variables) does not follow a normal distribution, there is heteroskedasticity, and there are a lot of outliers. I have tried to transform my data using log, x^2, 1/x, and Box-Cox, but none of them actually helped. I have also tried Theil-Sen regression (not possible because of too much data) and Siegel regression (biomass is not a vector). Is there some other form of non-parametric or median-based regression that I can try? Because I am really out of ideas.
Here is a frequency histogram for biomass:
Frequency histogram dry biomass
So what I actually want to do is to build a model that accurately estimates the dry biomass, based on the measurements I performed. I have a power function (Rogers et al.) that is general for all insects, but there is a significant difference between this estimate and what I actually weighed. Therefore, I just want to build a model with all significant variables. I am not very familiar with power functions, but maybe it is possible to build one myself? Can anyone recommend a method? Thanks in advance.
To fit a power function, you could perhaps try nlsLM from the minpack.lm package
library(minpack.lm)
m <- nlsLM(y ~ a * x^b, data = your.data.here, start = list(a = 1, b = 1))
Then see if it performs satisfactorily.
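Another hedged option, since a power function y = a*x^b is linear on the log scale: fit it with lm on logged variables, which often also tames the skew and heteroskedasticity you describe (y and x below are placeholders for your biomass and measurement columns):

m_log <- lm(log(y) ~ log(x), data = your.data.here)
# back-transformed: y is approximately exp(coef(m_log)[1]) * x^coef(m_log)[2]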

Survival Curves For Cox PH Models. Checking My Understanding About Plotting Them

I'm using the book Applied Survival Analysis Using R by Moore to try and model some time-to-event data. The issue I'm running into is plotting the estimated survival curves from the Cox model. Because of this I'm wondering whether my understanding of the model is wrong. My data are simple: a time column t, an event indicator column (1 for event, 0 for censored) i, and a predictor column with 6 factor levels p.
I believe I can plot estimated survival curves for a Cox model as follows below. But I don't understand how to use survfit and base plotting, nor functions from survminer, to achieve the same end. Here is some generic code clarifying my question. I'll use the pharmacoSmoking data set to demonstrate my issue.
library(survival)
library(asaur)
t <- pharmacoSmoking$longestNoSmoke
i <- pharmacoSmoking$relapse
p <- pharmacoSmoking$levelSmoking
data <- data.frame(t = t, i = i, p = p)  # data.frame() keeps p as a factor (cbind() would coerce it to numeric)
model <- coxph(Surv(t, i) ~ p, data = data)
As I understand it, with the following code snippets, modeled after book examples, a baseline (cumulative) hazard at my reference factor level for p may be given from
base<-basehaz(model, centered=F)
An estimate of the survival curve is given by
s<-exp(-base$hazard)
t<-base$time
plot(s~t, typ = "l")
The survival curve associated with a different factor level may then be given by
beta_n<-model$coefficients #only one coef in this case
s_n <- s^(exp(beta_n))
lines(s_n~t)
where beta_n is the coefficient for the nth factor level from the Cox model. The code above gives what I think are estimated survival curves for heavy vs light smokers in the pharmacoSmoking dataset.
Since that's a fair bit of code, I was looking to packages for a one-liner solution. I had a hard time with the documentation for survival (there weren't many examples in the docs) and also tried survminer. For the latter I've tried:
library(survminer)
ggadjustedcurves(model, variable = "p", data = data)
This gives me something different from my prior code, although it is similar. Is the method I used earlier incorrect? Or is there a different methodology that accounts for the difference? The survminer code doesn't work on my real data (I get a 'cannot allocate vector of size ...' error, and my data is ~1m rows), which seems weird considering I can make plots using what I did before with no problem. This is the primary reason I am wondering whether I understand how to plot survival curves for my model.
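For the one-liner side of the question, a hedged sketch using survfit from the survival package: called on a coxph fit with newdata, it returns the model-based curve for each covariate pattern supplied, which is the same conditional quantity as the basehaz()/exp(-H) construction above (whether it matches ggadjustedcurves depends on how that function averages over covariates).

lev <- sort(unique(data$p))
sf <- survfit(model, newdata = data.frame(p = lev))
plot(sf, col = seq_along(lev), xlab = "time", ylab = "estimated S(t)")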

What does "survivalsvm" predict?

I'm trying to develop a predictive model using cancer survival data and used the R package survivalsvm, which implements survival support vector machines. After running the following code I got some results but am finding it difficult to interpret them. I know that in Cox regression the model predicts the cumulative hazard function, but is it the same in survivalsvm? I ran both Cox and survivalsvm models and the results are quite different:
library(survivalsvm)
smodel_svm <- survivalsvm(Surv(time, outcome) ~ radius.mean + tumor.size, data = training_set, gamma.mu = 1)
pred_test_svm <- predict(smodel_svm, test_set)
summary(pred_test_svm)
The difference might be because you're using the default parameters, i.e. type = "regression", which uses the regression approach described in this paper.
In summary, the authors (Van Belle et al.) propose a different approach (MODEL 2 and MODEL 3) that essentially uses a Cox model but with both regression and ranking constraints.
Note however that the authors concluded:
Comparison of model 2 with the cox model revealed no significant differences in performance. The advantage of model 2 above the cox model lies in the easy extension towards non-linear models without the need to check non-linearities in the variables before modelling.
From the function's documentation (focusing on the parameter type):
The following denotations are used for the models implemented:
'regression' referring to the regression approach, named SVCR in Van Belle et al. (2011b),
'vanbelle1' according to the first version of survival support vector machines based on ranking constraints, named RANKSVMC by Van Belle et al. (2011b),
'vanbelle2' according to the second version of survival support vector machines based on ranking constraints, as presented in model 1 by Van Belle et al. (2011b), and
'hybrid' combining the regression and ranking constraints in one model.
Predictions from survivalsvm have to be interpreted as ranks. The idea in survivalsvm is to predict ranks among individuals, to estimate which patient, for example, should be handled earlier than others. See also Fouodo et al. (2018) for more details about package usage in R.
If you use the regression approach, the prediction is a predicted survival time. If you use vanbelle1 or vanbelle2, it gives you the rank. The hybrid method also returns the rank of each observation. From the ranks you can then group individuals into high-risk and low-risk clusters; the rank is sometimes referred to as a prognostic index.
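If the rank interpretation is what you are after, here is a hedged sketch of refitting with one of the ranking formulations; the exact arguments (in particular diff.meth, which the documentation says is needed for the ranking-based types) are taken from my reading of ?survivalsvm and should be checked there:

library(survivalsvm)
# vanbelle2: ranking-constraint formulation; predictions are ranks (a prognostic index)
smodel_rank <- survivalsvm(Surv(time, outcome) ~ radius.mean + tumor.size,
                           data = training_set, type = "vanbelle2",
                           diff.meth = "makediff3", gamma.mu = 1,
                           opt.meth = "quadprog", kernel = "lin_kernel")
pred_rank <- predict(smodel_rank, test_set)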

How to check and control for autocorrelation in a mixed effect model of longitudinal data?

I have behavioral data for many groups of birds over 10 days of observation. I wanted to investigate whether there is a temporal pattern in some behaviors (e.g. does mate competition increase over time?), and I was told that I had to account for the autocorrelation of the data, since behavior is unlikely to be independent between days.
However I was wondering about two things:
Since I'm not interested in the differences in y among days but the trend of y over days, do I still need to correct for autocorrelation?
If yes, how do I control for the autocorrelation so that I'm left out only with the signal (and noise of course)?
For the second question, keep in mind I will be analyzing the effect of time on behavior using mixed models in R (since there are random effects such as pseudo-replication), but I have not found any straightforward method of correcting for autocorrelation in the data when modeling the responses.
(1) Yes, you should check for/account for autocorrelation.
The first example here shows how to estimate trends in a mixed model while accounting for autocorrelation.
You can fit these models with lme from the nlme package. Here's a mixed model without autocorrelation included:
library(nlme)
cmod_lme <- lme(GS.NEE ~ cYear,
                data = mc2, method = "REML",
                random = ~ 1 + cYear | Site)
and you can explore the autocorrelation with plot(ACF(cmod_lme)).
(2) Add correlation to the model, something like this:
cmod_lme_acor <- update(cmod_lme,
                        correlation = corAR1(form = ~ cYear | Site))
As @JeffreyGirard notes, to check the ACF after updating the model to include the correlation argument, you will need to use plot(ACF(cmod_lme_acor, resType = "normalized")).
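If it helps, the two fits are nested (same fixed effects, differing only in the AR(1) correlation structure), so a direct comparison is a reasonable way to judge whether the correlation term is worth keeping:

anova(cmod_lme, cmod_lme_acor)  # compares the two REML fits (AIC/BIC and likelihood ratio)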
