I am trying to fit a generalized linear model to a microbiome dataset (16S HTS).
Given that microbiome data are compositional, I am wondering whether there are any resources suited to this kind of analysis.
I am using the R package mvabund (doi: 10.1111/j.2041-210X.2012.00190.x), but its examples and documentation do not address compositional data.
It seems inappropriate to treat the count data as abundance data. Would converting the dataset to relative abundances or applying a clr transformation solve the problem?
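For reference, the clr transformation mentioned above can be sketched in base R. This is only an illustration under stated assumptions: the count table is a made-up toy example, and the pseudocount of 0.5 (needed because the log of zero is undefined) is an arbitrary choice, not a recommendation. Note that clr values are real-valued and can be negative, so they are no longer counts.

```r
# Hypothetical toy count table: 3 samples (rows) x 4 taxa (columns).
counts <- matrix(c(10,  0,  5, 85,
                   20, 30,  0, 50,
                    1,  9, 40, 50),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("sample", 1:3), paste0("taxon", 1:4)))

# Centered log-ratio transform, applied per sample (row).
# The pseudocount is an assumption made here to handle zero counts.
clr_transform <- function(x, pseudocount = 0.5) {
  log_x <- log(x + pseudocount)
  # Subtract each sample's mean log value (the log of its geometric mean).
  sweep(log_x, 1, rowMeans(log_x), "-")
}

clr_counts <- clr_transform(counts)
rowSums(clr_counts)  # each row sums to zero up to floating-point error
```

Whether the transformed values are a suitable input for a particular model remains the open question asked above.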
Related
I am performing a spatial analysis of student grades by city of origin using R. I have several covariates, such as poverty, education and socio-cultural indices. So far I have fitted univariate models: linear regression, weighted linear regression and CAR (conditional autoregressive).
Now I am reading "Hierarchical Modeling and Analysis for Spatial Data" by Banerjee, Carlin and Gelfand. I am interested in applying multivariate models, in particular an MCAR (Multivariate Conditional Autoregressive) model.
However, I have not found any R (or Python) code that implements it. The closest I have found is the "spatialreg" package, which includes univariate CAR and SAR models.
Do you know of any library that implements MCAR models? Thanks in advance.
I have found the "CARBayes" package. It works perfectly for fitting an MCAR model.
I used mice to create five imputed data sets, saved as the object "allImputations" in the code below. I then needed to run linear and dichotomous regression analyses across the imputed data sets (see below for a successful example):
SIStep2a<-with(allImputations, lm(Y1~X1+X2+X3))
SIStep2a<-as.mira(SIStep2a)
summary(pool(SIStep2a))
pool.r.squared(SIStep2a, adjusted = FALSE)
The code above provides all the information I need for linear models, but I run into problems when I use glm to fit a logit regression on the same data sets.
treat.Step1a<-with(allImputations, glm(Y2~X1+X2+X3, family=binomial))
treat.Step1a<-as.mira(treat.Step1a)
summary(pool(treat.Step1a))
In this instance, I need a pooled pseudo R2 or other pooled model fit index (similar to the pool.r.squared function). However, I cannot find a way to produce either pooled model fit indices OR the fit indices for each analysis of the five imputed data sets.
Essentially, is there a pool.r.squared analog for glm analyses across multiply imputed datasets from mice? Or is there a longhand way to calculate this via the info in the saved object "treat.Step1a" above? Or is there a way to isolate fit indices for each of the five analyses completed for each imputed data set?
Update
I was able to download a package directly from GitHub (glmice), which was no longer available via CRAN. However, the command mcf() would not execute successfully in my current RStudio version.
I ultimately ran each step of the model (i.e. as I added each block of variables) on all five imputed data sets and averaged McFadden's R2 across them to very roughly assess the improvement in pseudo R2.
Is this an acceptable middle-ground approach?
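The averaging step described above could look roughly like the sketch below. It is a sketch under assumptions, not a formal pooling rule: the five data frames are simulated stand-ins for the imputed data sets, and mcfadden_r2 is a hypothetical helper name. It relies on the fact that for an ungrouped 0/1 outcome the residual and null deviances equal -2 times the corresponding log-likelihoods, so McFadden's R2 is 1 - deviance / null deviance. With mice, the per-imputation fits should be available as the list treat.Step1a$analyses, which could replace fits here.

```r
set.seed(1)

# Hypothetical stand-ins for five completed (imputed) data sets.
completed <- lapply(1:5, function(i) {
  X1 <- rnorm(200); X2 <- rnorm(200); X3 <- rnorm(200)
  Y2 <- rbinom(200, 1, plogis(0.5 * X1 - 0.8 * X2))
  data.frame(Y2, X1, X2, X3)
})

# One logit model per data set (analogous to the with() call above).
fits <- lapply(completed, function(d) {
  glm(Y2 ~ X1 + X2 + X3, family = binomial, data = d)
})

# McFadden's pseudo R2 for a binary (0/1) logit model.
mcfadden_r2 <- function(fit) {
  1 - fit$deviance / fit$null.deviance
}

r2_each <- sapply(fits, mcfadden_r2)  # one value per imputed data set
r2_mean <- mean(r2_each)              # the rough average described above
```

Inspecting r2_each alongside r2_mean shows how much the index varies between imputations, which a single average hides.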
Does anyone have experience with obtaining latent trait scores from a repeated-measures design? Currently, my data are in long format (baseline data stacked on post-treatment data, similar to the way you would structure the dataset for lmer). With the dataset in long format, I used the mixedmirt function from the mirt package and regressed the latent trait on Time; however, I am unsure whether mirt recognized that my design is repeated. It seemed to model the correlation and covariance in the latent abilities. Does anyone have experience with mirt, or can anyone recommend another R package for my analysis?
I have a dataset containing information about weather, air pollution and health outcomes. I want to regress cardiac deaths (CVD) on temperature (T) and lagged temperature (T1). I have previously used the glm function in R with the following script:
# For mean daily temperature and temperature lags separately.
modelT <- glm(cvd ~ T, data = datapoisson, family = poisson(link = "log"), na.action = na.omit)
I get the effect estimates and standard errors, which I converted to risk ratios.
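The conversion from a log-link Poisson coefficient to a risk ratio can be sketched as follows. The data are simulated and the column name temp is a hypothetical replacement for T; the 1.96 multiplier gives an approximate Wald 95% interval.

```r
set.seed(42)

# Hypothetical daily data: temperature and cardiac death counts.
n <- 365
datapoisson <- data.frame(temp = rnorm(n, mean = 20, sd = 5))
datapoisson$cvd <- rpois(n, lambda = exp(1 + 0.02 * datapoisson$temp))

model_temp <- glm(cvd ~ temp, data = datapoisson,
                  family = poisson(link = "log"))

# Coefficient and standard error on the log scale.
est <- coef(summary(model_temp))["temp", "Estimate"]
se  <- coef(summary(model_temp))["temp", "Std. Error"]

# Risk ratio per 1-unit increase in temperature, with a Wald 95% interval.
rr <- exp(est + c(RR = 0, lower = -1.96, upper = 1.96) * se)
rr
```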
Now I want to use a dynamic linear model or a distributed linear model to check the predictor-outcome and lagged-predictor-outcome associations. However, I can't find a script for running such a model in R.
I installed the DLM package in R, but I still can't figure out how to build a model with it.
I would appreciate it if someone could help.
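As a possible starting point that needs no extra package, the current and lagged temperature can be entered together in one Poisson glm. This is a minimal sketch on simulated data, assuming a single one-day lag and hypothetical variable names (temp, temp_lag1); it is a plain regression with a lagged term, not a state-space dynamic linear model.

```r
set.seed(7)

# Hypothetical daily series of temperature and cardiac death counts.
n <- 200
temp <- rnorm(n, mean = 20, sd = 5)
cvd  <- rpois(n, lambda = exp(1 + 0.02 * temp))

# Lag-1 temperature: the previous day's value, NA for the first day.
temp_lag1 <- c(NA, head(temp, -1))

daily <- data.frame(cvd, temp, temp_lag1)

# Same-day and previous-day temperature in one model;
# the first day is dropped because its lagged value is missing.
model_lag <- glm(cvd ~ temp + temp_lag1, data = daily,
                 family = poisson(link = "log"), na.action = na.omit)

coef(summary(model_lag))
```

Further lags can be added as extra columns in the same way, although neighbouring lags of temperature are often strongly correlated, which makes the individual coefficients harder to interpret.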
Could you try least squares multiple regression to predict the outcome? I used that method when I tried to 'predict' which factors influenced power in a floating offshore wind turbine. It is good for correlating multiple parameters.
The answers at the link below fit a plane to a set of points, but it seems like a similar idea.
https://math.stackexchange.com/questions/99299/best-fitting-plane-given-a-set-of-points
I have two temporal processes. I would like to see whether one temporal process (X_{t,2}) can be used to better forecast the other process (X_{t,1}). I have multiple sources providing temporal data on X_{t,2} (e.g. 3 time series measuring X_{t,2}). All time series require a seasonal component.
I found MARSS' notation to be pretty natural to fit this type of model and the code looks like this:
Z=factor(c("R","S","S","S")) # observation matrix
B=matrix(list(1,0,"beta",1),2,2) #evolution matrix
A="zero" #demeaned
R=matrix(list(0),4,4); diag(R)=c("r","s","s","s")
Q="diagonal and unequal"
U="zero"
TT = ncol(data) # number of time steps; assumes data is an n x T matrix, as MARSS expects
period = 12
per.1st = 1 # Now create factors for seasons
c.in = diag(period)
for(i in 2:(ceiling(TT/period))) {c.in = cbind(c.in,diag(period))}
c.in = c.in[,(1:TT)+(per.1st-1)]
rownames(c.in) = month.abb
C = "unconstrained" #2 x 12 matrix
dlmfit = MARSS(data, model=list(Z=Z,B=B,Q=Q,C=C, c=c.in,R=R,A=A,U=U))
I got a beta estimate implying that the second temporal process is useful in forecasting the first process, but to my dismay, MARSS gives me an error when I use MARSSsimulate to forecast because one of the matrices (related to seasonality) is time-varying.
Does anyone know a way around this issue with the MARSS package? If not, any tips on fitting an analogous model using, say, the dlm package?
I was able to represent my state-space model in a form suitable for the dlm package, but I encountered some problems using dlm too. First, the ML estimates are very unstable. I bypassed this issue by constructing the dlm model from the MARSS estimates. However, dlmFilter is not working properly. I think the issue is that dlmFilter is not designed to deal with models that have multiple sources for one time series plus additional seasonal components. Still, dlmForecast gives me the forecasts that I need!
In summary, for my multivariate time series model (with multiple sources providing data for one of the temporal processes), the MARSS library gave me reasonable parameter estimates and allowed me to obtain filtered and smoothed values of the states, but forecasts were not possible. On the other hand, dlm gave fishy estimates for my model and dlmFilter didn't work, but I was able to use dlmForecast to forecast values using the model I fitted in MARSS and re-expressed in a form appropriate for dlm.