Conditional logistic regression analysis with mgcv gam models - r

I am running a GAM through the mgcv package with family = cox.ph() and have my data grouped by strata (strata = id). The data correspond to one used location for an individual animal and 20 random locations associated with that individual that were available for use.
require(mgcv)
require(survival)
require(smoothHR)
gam1 <- gam(time1 ~ s(DWL) + strata(id), family = cox.ph(), method = "REML", data = dataset, weights = event1)
The model runs without problems, but I am unsure how to plot the relationship between the response and the x-variable. DWL is a continuous variable. I have used the following to graph predictions:
x = seq(0,120) #extent of DWL values
plot(gam1, residuals = TRUE, trans = function(x) exp(x) / (1 + exp(x)), shade = TRUE)
I am a bit confused about the use of the trans argument in the plot syntax. With cox.ph() as the family argument, is the logit link the proper way to evaluate the predicted y-response to the x-variable DWL?
Thank you,
P Farrell

Related

Linear Mixed-Effects Models for a big spatial auto-correlated dataset

I am working with a big dataset (55965 points) and trying to fit an LME that accounts for spatial correlation, but R returns this error:
Error: 'sumLenSq := sum(table(groups)^2)' = 3.13208e+09 is too large.
Too large or no groups in your correlation structure?
I cannot subset the data since I need all the points. My questions are:
Is there some setting I can change in the function?
If not, is there another package with a similar function that can handle such a big dataset?
Here is a reproducible example:
require(nlme)
my.data<- matrix(data = 0, nrow = 55965, ncol = 3)
my.data<- as.data.frame(my.data)
dummy <- rep(1, 55965)
my.data$dummy<- dummy
my.data$V1<- seq(780, 56744)
my.data$V2<- seq_len(55965)
my.data$X<- seq(49.708, 56013.708)
my.data$Y<-seq(-12.74094, -55977.7409)
null.model <- lme(fixed = V1~ V2, data = my.data, random = ~ 1 | dummy, method = "ML")
spatial_model <- update(null.model, correlation = corGaus(1, form = ~ X + Y), method = "ML")
Since you have assigned a grouping factor with only one level, there are no groups in the data, which is what the error message reports. If you just want to account for spatial autocorrelation, with no other random effects, use gls from the same package.
Edit: A further note on two different approaches to modelling spatial autocorrelation: corGaus (and the other corSpatial-type functions in nlme) implements a spatial correlation model for the regression residuals, which is different from, say, a spatial random effect added to the model based on county/district/grid identity.
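As a rough illustration of the answer above, the sketch below first reproduces the number in the error message (a single grouping level puts all 55965 rows in one group, so the correlation structure spans 55965^2 pairs) and then fits gls with a corGaus structure. The simulated data frame sim, the sample size n_small and the object names are assumptions made for this example; they are not the asker's data.

```r
library(nlme)

# One grouping level: every row is in the same group, so the correlation
# structure has to cover all n^2 pairs of observations.
groups <- rep(1, 55965)
sum_len_sq <- sum(table(groups)^2)
sum_len_sq  # the 3.13208e+09 reported in the lme() error

# Small simulated data set (hypothetical) with spatially structured noise.
set.seed(1)
n_small <- 150
sim <- data.frame(X = runif(n_small, 0, 10), Y = runif(n_small, 0, 10))
sim$V2 <- rnorm(n_small)
sim$V1 <- 2 + 0.5 * sim$V2 + sin(sim$X / 2) * cos(sim$Y / 2) +
  rnorm(n_small, sd = 0.5)

# gls() takes no random effect; the Gaussian spatial correlation is
# applied directly to the residuals.
fit_gls <- gls(V1 ~ V2, data = sim,
               correlation = corGaus(form = ~ X + Y),
               method = "ML")
coef(fit_gls)
```

The sketch only shows the form of the call on a small sample. Without a grouping factor the correlation matrix is still dense, so whether the full 55965-point fit is feasible depends on available memory.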

Prediction interval for ACP model in R

I'm trying to teach myself a bit about modeling time series for 'counts' data. I found a pretty simple model, the Autoregressive Conditional Poisson model (ACP) (Heinen 2003), that has an accompanying R package {acp}. I'm having trouble finding information about how to construct n-step-ahead prediction intervals for predictions made from an ACP model. Inconveniently, forecast doesn't work with these ACP objects. Any thoughts on how to construct these?
Additionally, when using predict() with an ACP model, you have to include an argument, newydata, that is a data frame of the values you want to predict...? Maybe I'm misinterpreting this, but it seems like you need to already have y when predicting yhat. Why?
Below I copy/pasted the example code from the {acp} package.
library(acp)
data(polio)
trend=(1:168/168)
cos12=cos((2*pi*(1:168))/12)
sin12=sin((2*pi*(1:168))/12)
cos6=cos((2*pi*(1:168))/6)
sin6=sin((2*pi*(1:168))/6)
#Autoregressive Conditional Poisson Model with explanatory covariates
polio_data<-data.frame(polio, trend , cos12, sin12, cos6, sin6)
mod1 <- acp(polio~-1+trend+cos12+sin12+cos6+sin6,data=polio_data, p = 1 ,q = 2)
summary(mod1)
#Static out-of-sample fit example
train<-data.frame(polio_data[c(1: 119),])
mod1t <- acp(polio~-1+trend+cos12+sin12+cos6+sin6,data=train, p = 1 ,q = 2)
xpolio_data<-data.frame(trend , cos12, sin12, cos6, sin6)
test<-xpolio_data[c(120:nrow(xpolio_data)),]
yfor<-polio_data[120:nrow(polio_data),1]
predict(mod1t,yfor,test)
#Autoregressive Conditional Poisson Model without explanatory covariates
polio_data<-data.frame(polio)
mod2 <- acp(polio~-1,data=polio_data, p = 3 ,q = 1)
summary(mod2)
It is the second argument to predict(), the vector of observed y values, that confuses me.
Thanks!

Running a GLM with Poisson distribution with combined columns In R

Is it possible to run a GLM with a Poisson distribution when the response is built from combined columns in R?
I am looking at the effects of species, cage density and the day that eggs are laid on how many eggs were laid and how many hatched, so I have linked the hatched and unhatched columns. My data are count data. The code works with family = binomial, but I want to test whether Poisson gives a better model.
My code is as follows:
attach(EggV)
density <- as.factor(Density)
day <- as.factor(Day)
Y <- cbind (Hatched, Unhatched)
model.pois <- glm(Y ~ Species + density + day, data = EggV, family = poisson)
But when I run the code, it gives me an error:
Error in x[good, , drop = FALSE] : (subscript) logical subscript too long
If I run the same code with only the variables "Hatched" or "Unhatched" it works but this is not sufficient for my data analysis.

R: varying-coefficient GAMM models in mgcv - extracting 'by' variable coefficients?

I am creating a varying-coefficient GAMM using 'mgcv' in R with a continuous 'by' variable, specified through the by argument of s(). However, I am having difficulty locating the parameter estimate of the effect of the 'by' variable. In this example we determine the spatially dependent effect of temperature t on sole eggs (i.e. how the linear effect of temperature on sole eggs changes across space):
require(mgcv)
require(gamair)
data(sole)
b = gam(eggs ~ s(la,lo) + s(la,lo, by = t), data = sole)
We can then plot the predicted effects of s(la,lo, by = t) against the predictor t:
pred <- predict(b, type = "terms", se.fit =T)
by.variable.prediction <- pred[[1]][,2]
plot(x= sole$t, y = by.variable.prediction)
However, I can't find a listing/function with the parameter estimates of the 'by' variable t for each sampling location. summary(), coef(), and predict() do not give you the parameter estimates.
Any help would be appreciated!
The by term contributes f(la, lo) * t to the linear predictor, so the coefficient of t at a given latitude and longitude is the value of that term when t is equal to 1. One way to get the parameter estimate for t at each location is therefore to construct your own data frame with a range of latitude/longitude combinations and t = 1, run predict.gam on that with type = "terms" (rather than running predict.gam on the data used to fit the model, as you have done), and keep only the column for the by smooth. Note that type = "response" would also add the intercept and the s(la,lo) main effect to the returned value. So:
preddf <- expand.grid(list(la = seq(min(sole$la), max(sole$la), length.out = 100),
lo = seq(min(sole$lo), max(sole$lo), length.out = 100),
t = 1))
preddf$parameter <- predict(b, preddf, type = "terms")[, "s(la,lo):t"]
And then if you want to visualize this coefficient over space, you could graph it with ggplot2.
library(ggplot2)
ggplot(preddf) +
geom_tile(aes(x=lo, y=la, fill=parameter))
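One way to check what predict.gam returns for a by term is to simulate data with a known spatially varying slope and compare it with the prediction at t = 1. The sketch below is only an illustration under stated assumptions: the simulated data frame d, the function beta_true and the column name coef_t are hypothetical and not part of the sole example.

```r
library(mgcv)

set.seed(42)
n <- 500
d <- data.frame(la = runif(n), lo = runif(n), t = rnorm(n))

# Hypothetical "true" coefficient of t, varying linearly over space.
beta_true <- function(la, lo) 1 + la - lo

d$y <- sin(2 * pi * d$la) + beta_true(d$la, d$lo) * d$t + rnorm(n, sd = 0.2)

b_sim <- gam(y ~ s(la, lo) + s(la, lo, by = t), data = d, method = "REML")

# Prediction grid with t fixed at 1, so the by term equals the coefficient.
grid <- expand.grid(la = seq(0.05, 0.95, length.out = 20),
                    lo = seq(0.05, 0.95, length.out = 20),
                    t = 1)

# type = "terms" returns one column per model term; the by smooth is
# labelled "s(la,lo):t".
term_pred <- predict(b_sim, newdata = grid, type = "terms")
grid$coef_t <- term_pred[, "s(la,lo):t"]

# Agreement between the estimated and the simulated coefficient surface.
cor(grid$coef_t, beta_true(grid$la, grid$lo))
```

Because a smooth with a numeric by variable is not centred by mgcv, the extracted column should approximate the whole coefficient surface, including its average level, rather than only deviations from a mean.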

Plot each predictor variable from multivariate GLM versus response (other predictors held constant)

I can plot one predictor variable (from a multivariate logistic, binomial GLM) versus the predicted response. I do it like this:
library(ggplot2)
m3 <- mtcars # example with mtcars
model = glm(vs~cyl+mpg+wt+disp+drat,family=binomial, data=m3)
newdata <- m3
newdata$cyl <- mean(m3$cyl)
newdata$mpg <- mean(m3$mpg)
newdata$wt <- mean(m3$wt)
newdata$disp <- mean(m3$disp)
newdata$drat <- m3$drat
newdata$vs <- predict(model, newdata = newdata, type = "response")
ggplot(newdata, aes(x = drat, y = vs)) + geom_line()
The above plots drat against vs with all other predictors held constant. However, I would like to do this for each of the predictor variables, and repeating the process each time seems tedious. Is there a smarter way to do this? I'd like to visualize the response to each of the different predictors and eventually, perhaps, at different constants.
Check the response.plot2 function in the biomod2 package. It was developed to create response curves for species distribution models, but it essentially does what you need: it generates a multi-panel plot with the response for each variable used in your model. It also outputs the data in a structure that you can then plot in whichever way you like.
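If you prefer not to depend on biomod2, the same idea can be sketched in base R by looping over the predictors. This is a minimal illustration, not what response.plot2 does internally: each predictor in turn is varied over its observed range while the others are held at their means. The names predictors, means and curves, and the choice of 50 grid points, are assumptions made for the example.

```r
m3 <- mtcars
model <- glm(vs ~ cyl + mpg + wt + disp + drat, family = binomial, data = m3)

# Predictor names taken from the model formula (the first variable is the response).
predictors <- all.vars(formula(model))[-1]

# One-row data frame holding the mean of every predictor.
means <- as.data.frame(lapply(m3[predictors], mean))

# For each predictor, vary it over its range and keep the others at their means.
curves <- do.call(rbind, lapply(predictors, function(v) {
  nd <- means[rep(1, 50), , drop = FALSE]
  nd[[v]] <- seq(min(m3[[v]]), max(m3[[v]]), length.out = 50)
  data.frame(predictor = v,
             value = nd[[v]],
             prob = as.numeric(predict(model, newdata = nd, type = "response")))
}))

head(curves)
```

The long-format result can be plotted in one call, for example with ggplot(curves, aes(value, prob)) + geom_line() + facet_wrap(~ predictor, scales = "free_x"). Holding the other predictors at different constants only requires replacing means with another one-row data frame.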
