I fit the following glm model using the survey package:
design <- svydesign(ids=~name, data=training.data)
significant.model <- svyglm(Win~x+ y + start+ speed+ vx0 + vy0 + ay + az + length+ rate+ height+ hand+ zone+ count, design=design, family=quasibinomial())
I have a set of test data that I excluded from the model fitting process so that I would be able to see how the model predicts the outcomes for the test data and examine the difference.
Typically, I would use makeFun in the mosaic package, but this does not support objects of type svyglm. Is there another function or method that I can use to create a function for the model?
There are a lot of categorical variables with multiple levels, so writing a user-defined function is not ideal in this situation.
I'm not sure what difficulty you were experiencing since your example is not reproducible. But since an svyglm object is a glm object, makeFun() will create a wrapper around predict() just as it would do for any glm object. This has not been tested extensively, but it seems to work in the following example:
library(survey)
library(mosaic)
example(svyglm)
f <- makeFun(api.reg)
f(enroll = 500)
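If makeFun() is not available, calling predict() directly on held-out rows is another route. The following is a minimal sketch, assuming the api example data shipped with the survey package rather than the asker's data; the test.data rows are made up for illustration.

```r
library(survey)

# Example data shipped with the survey package (not the asker's data)
data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# Survey-weighted logistic regression, same family as in the question
fit <- svyglm(sch.wide ~ ell + meals + mobility,
              design = dstrat, family = quasibinomial())

# Hypothetical "test" rows; in practice this would be the held-out data
test.data <- data.frame(ell = c(10, 50),
                        meals = c(20, 80),
                        mobility = c(10, 20))

# Predicted probabilities and their standard errors
pred <- predict(fit, newdata = test.data, type = "response")
pred.prob <- as.numeric(coef(pred))
pred.se <- as.numeric(SE(pred))
```

Factor predictors are handled by predict() as long as the new data uses the same levels as the training data, so no hand-written function is needed for the categorical variables.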
I am fitting a mixed-effects model in which, because of observed heteroscedasticity, I needed to include a term to accommodate it. With the lme function from the nlme package this was easy to do; see the code below:
library(nlme)
library(lme4)
Model1 <- lme(log(Var1)~log(Var2)+log(Var3)+
(Var4)+(Var5),
random = ~1|Var6, Data1, method="REML",
weights = varIdent(form=~1|Var7))
#Var6: It is a factor with several levels.
#Var7: It is a Dummy variable.
However, I need to refit the model above with the lme4 package, that is, with the lmer function. lme4 is known to have some limitations, one of which is modeling heteroscedasticity. My motivation for refitting is that I want to use a specific package that, for mixed models, only accepts fits produced by lmer. How can I resolve this? Below is the model fitted with lmer; however, it does not include the term that models the observed heteroscedasticity.
Model2 <- lmer(log(Var1)~log(Var2)+log(Var3)+
(Var4)+(Var5)+(1|Var6),
Data1, REML=T)
The choice of the random effect (Var6) and the inclusion of a term for heterogeneity across the levels of Var7 were carefully analyzed; I have left out the full procedure to keep this post short and to the point.
This is hackable. You need to add an observation-level random effect that is only applied to the group with the larger residual variance (you need to know this in advance!), via (0+dummy(Var7,"1")|obs); this has the effect of multiplying each observation-level random effect value by 1 if the observation is in group "1" of Var7, 0 otherwise. You also need to use lmerControl() to override a few checks that lmer does to try to make sure you are not adding redundant random effects.
Data1$obs <- factor(seq(nrow(Data1)))
Model2 <- lmer(log(Var1)~log(Var2)+log(Var3)+
(Var4)+(Var5) + (1|Var6) +
(0+dummy(Var7,"1")|obs),
Data1, REML=TRUE,
control=lmerControl(check.nobs.vs.nlev="ignore",
check.nobs.vs.nRE="ignore"))
all.equal(REMLcrit(Model2), c(-2*logLik(Model1))) ## TRUE
all.equal(fixef(Model1), fixef(Model2), tolerance=1e-7)
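To see why the trick only works when group "1" has the larger residual variance, it may help to write out the implied variances. This is a sketch in my own notation (not from the packages): sigma is the lmer residual standard deviation, tau is the standard deviation of the observation-level effect b_i, d_i is the 0/1 dummy for Var7 = "1", and delta is the varIdent ratio, assuming "0" is the reference stratum in lme.

```latex
\operatorname{Var}(\varepsilon_i + d_i b_i) =
\begin{cases}
  \sigma^2            & \text{if } d_i = 0 \quad (\text{Var7} = 0) \\
  \sigma^2 + \tau^2   & \text{if } d_i = 1 \quad (\text{Var7} = 1)
\end{cases}
\qquad\Longrightarrow\qquad
\delta = \sqrt{\frac{\sigma^2 + \tau^2}{\sigma^2}} \;\ge\; 1
```

Since tau^2 cannot be negative, the lmer formulation can only represent delta >= 1. If the other group is the noisier one, the dummy has to be built on that level instead.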
If you want to use this model with hnp you need to work around the fact that hnp doesn't pass the lmerControl option properly.
library(hnp)
d <- function(obj) resid(obj, type="pearson")
s <- function(n, obj) simulate(obj)[[1]]
f <- function(y.) refit(Model2, y.)
hnp(Model2, newclass=TRUE, diagfun=d, simfun=s, fitfun=f)
You might also be interested in the DHARMa package, which does similar simulation-based diagnostics.
I'm doing some regression using the geepack package and want to use multiple imputation to deal with missing values. The pool() command in mi doesn't work for my GEE, so I have to export (is that right?) so that I can use the data in geepack.
The complete() function produces each iteration, but not the pooled estimates.
Is there a way to produce a data frame with the pooled estimates?
The complete function in the mi package produces a list of m data.frames. You can call gee on each element of that list for the data argument and then use Rubin's rules to obtain pooled estimates.
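For reference, Rubin's rules themselves are short enough to apply by hand. Below is a minimal base-R sketch; pool_rubin is a hypothetical helper name, and the toy estimates and variances are made up so that the arithmetic is easy to follow.

```r
# Pool m sets of estimates with Rubin's rules (hypothetical helper).
# est_list:  list of named coefficient vectors, one per imputed data set
# vcov_list: list of matching variance-covariance matrices
pool_rubin <- function(est_list, vcov_list) {
  m <- length(est_list)
  Q <- do.call(rbind, est_list)                 # m x p matrix of estimates
  U <- do.call(rbind, lapply(vcov_list, diag))  # m x p within-imputation variances
  q_bar <- colMeans(Q)                          # pooled point estimates
  u_bar <- colMeans(U)                          # average within-imputation variance
  b <- apply(Q, 2, var)                         # between-imputation variance
  t_var <- u_bar + (1 + 1 / m) * b              # total variance
  data.frame(estimate = q_bar, se = sqrt(t_var))
}

# Toy input standing in for three fitted models
est_list <- list(c(a = 1.0, b = 2.0), c(a = 1.2, b = 1.8), c(a = 0.8, b = 2.2))
vcov_list <- rep(list(diag(c(0.04, 0.09))), 3)

pooled <- pool_rubin(est_list, vcov_list)
pooled
```

With real fits, est_list would typically come from something like lapply(fits, coef), and vcov_list from whatever the model class offers for extracting the covariance matrix of the coefficients.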
There are a couple of packages that implement Rubin's rules in R (e.g., mi, mice, mitools, and mitml). The problem is that these implementations require that the functions for fitting statistical models have working methods for coef() and vcov() defined.
The geeglm() function, however, does not define vcov(), and standard implementations will not work. To remedy that situation, it is easiest to just define the missing method for the GEE. Below is an example using the mitml package and one of the example data sets provided with geepack.
library(geepack)
library(mitml)
# example data
data(dietox)
# example imputation
fml <- Feed + Weight ~ 1 + Time + (1|Pig)
imp <- panImpute(data=dietox, formula=fml, n.burn=5000, n.iter=500)
implist <- mitmlComplete(imp, "all")
# fit GEE
fit <- with(implist, geeglm(Weight ~ 1 + Time + Feed, id=Pig))
# define missing vcov() function for geeglm-objects
vcov.geeglm <- function(object, ...) summary(object)$cov.scaled
# combine estimates using Rubin's rules
testEstimates(fit)
I was wondering if it is possible to predict with the plm function from the plm package in R for a new dataset of predictor variables. I have created a model object using:
model <- plm(formula, data, index, model = 'pooling')
Now I'm hoping to predict a dependent variable from a new dataset which has not been used in the estimation of the model. I can do it through using the coefficients from the model object like this:
col_idx <- c(...)
df <- cbind(rep(1, nrow(df)), df[(1:ncol(df))[-col_idx]])
fitted_values <- as.matrix(df) %*% as.matrix(model$coefficients)
That is, I first define in col_idx the index columns used in the model together with the columns dropped due to collinearity, and then construct a data matrix to multiply by the model coefficients. However, dropping columns by hand makes errors much more likely.
I guess a function designed for this would make the code a lot more readable. I have also found the pmodel.response() function, but I can only get it to work for the dataset that was used to estimate the model object.
Any help would be appreciated!
I wrote a function (predict.out.plm) to do out of sample predictions after estimating First Differences or Fixed Effects models with plm.
The function is posted here:
https://stackoverflow.com/a/44185441/2409896
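For the pooling case in the question, the manual column bookkeeping can be avoided by building the design matrix from the model formula and selecting columns by coefficient name. This is a minimal sketch on made-up data. It uses lm() as a stand-in for the pooled plm fit (an assumption: a pooling model is pooled OLS, so the coefficients should coincide) and assumes the coefficient names match the design-matrix column names, including "(Intercept)".

```r
# Toy training data with a perfectly collinear column (x3 = 2 * x1)
train <- data.frame(y  = c(1.2, 2.1, 2.9, 4.2, 5.1, 5.8),
                    x1 = c(1, 2, 3, 4, 5, 6),
                    x2 = c(0.5, 0.1, 0.9, 0.3, 0.7, 0.2))
train$x3 <- 2 * train$x1

fml <- y ~ x1 + x2 + x3
fit <- lm(fml, data = train)   # stand-in for plm(..., model = "pooling")

# Keep only the coefficients that were actually estimated
beta <- coef(fit)
beta <- beta[!is.na(beta)]

# New data that was not used for estimation
newdata <- data.frame(x1 = c(7, 8), x2 = c(0.4, 0.6))
newdata$x3 <- 2 * newdata$x1

# Design matrix from the right-hand side of the formula,
# with columns matched to the coefficients by name
X <- model.matrix(delete.response(terms(fml)), data = newdata)
fitted_new <- drop(X[, names(beta), drop = FALSE] %*% beta)
fitted_new
```

Matching on names(beta) means a column dropped for collinearity is skipped automatically, and factor variables are expanded into dummies by model.matrix() rather than by hand.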
I have a data frame that I am using for machine learning using svm with the package e1071 with R. The formula I have is :
fidSVM = formula(y ~ a + b + c + d)
that I plug into :
fitsvm = svm(fidSVM, data = crossdata, type="C-classification", kernel="polynomial", degree=incdegree, cost=0.5, shrinking=TRUE,
scale=TRUE, gamma=1, coef0=0, cross = 10)
Then, I want to predict. To test the predict() function, I simply reuse the initial data frame:
predict(fitsvm, crossdata)
factor(0)
Levels: 0 1
The classification works pretty well (I checked that), but the predict function does not work properly. As mentioned in other posts, I was careful to use a data frame to deal with my data. I have used predict svm in the past and the predict function without problems. Does anyone have an idea on what may cause the problem here?
P.S.: I do have NaN's in my data, only factors and numerical values.
Thank you for your help!
I have a stationary time series to which I want to fit a linear model with an autoregressive term to correct for serial correlation, i.e. using the formula A_t = c1*B_t + c2*C_t + u_t, where u_t = r*u_{t-1} + e_t
(u_t is an AR(1) term to correct for serial correlation in the error terms)
Does anyone know what to use in R to model this?
Thanks
Karl
The GLMMarp package will fit these models. If you just want a linear model with Gaussian errors, you can do it with the arima() function where the covariates are specified via the xreg argument.
There are several ways to do this in R. Here are two examples using the "Seatbelts" time series dataset in the datasets package that comes with R.
The arima() function comes in package:stats, which is included with R. The function takes an argument of the form order=c(p, d, q), where you can specify the order of the autoregressive, integrated, and moving average components. In your question, you suggest that you want an AR(1) model to correct for first-order autocorrelation in the errors and that's it. We can do that with the following command:
arima(Seatbelts[,"drivers"], order=c(1,0,0),
xreg=Seatbelts[,c("kms", "PetrolPrice", "law")])
The value for order specifies that we want an AR(1) model. The xreg component should be a series of other Xs we want to add as part of a regression. The output looks a little bit like the output of summary.lm() turned on its side.
Another approach, which may be closer to the way you have fit regression models before, is to use gls() in the nlme package. The following code turns the Seatbelts time series object into a data frame and then adds a new column (t) that is just a counter over the sorted time series:
Seatbelts.df <- data.frame(Seatbelts)
Seatbelts.df$t <- 1:(dim(Seatbelts.df)[1])
The two lines above are only getting the data in shape. Since the arima() function is designed for time series, it can read time series objects more easily. To fit the model with nlme you would then run:
library(nlme)
m <- gls(drivers ~ kms + PetrolPrice + law,
data=Seatbelts.df,
correlation=corARMA(p=1, q=0, form=~t))
summary(m)
The line that begins with "correlation" is the way you pass in the ARMA correlation structure to GLS. The results won't be exactly the same because arima() uses maximum likelihood to estimate models and gls() uses restricted maximum likelihood by default. If you add method="ML" to the call to gls(), the estimates should closely match those from the arima() call above.
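To check how close the two approaches are, the fits can be put side by side. This is a sketch under the same setup as above, with gls() switched to method="ML"; the object names fit.arima, fit.gls, phi.arima and phi.gls are my own.

```r
library(nlme)

# Same data preparation as above
Seatbelts.df <- data.frame(Seatbelts)
Seatbelts.df$t <- seq_len(nrow(Seatbelts.df))

# Regression with AR(1) errors via arima()
fit.arima <- arima(Seatbelts[, "drivers"], order = c(1, 0, 0),
                   xreg = Seatbelts[, c("kms", "PetrolPrice", "law")])

# Same model via gls(), estimated by maximum likelihood instead of REML
fit.gls <- gls(drivers ~ kms + PetrolPrice + law,
               data = Seatbelts.df,
               correlation = corARMA(p = 1, q = 0, form = ~t),
               method = "ML")

# AR(1) coefficient from each fit
phi.arima <- unname(coef(fit.arima)["ar1"])
phi.gls <- unname(coef(fit.gls$modelStruct$corStruct, unconstrained = FALSE))

# Regression coefficients side by side
cbind(arima = coef(fit.arima)[c("intercept", "kms", "PetrolPrice", "law")],
      gls = coef(fit.gls))
```

Small differences in the last digits are to be expected, since the two functions use different optimizers and convergence criteria.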
What is your link function?
The way you describe it sounds like a basic linear regression with autocorrelated errors. In that case, one option is to use lm to get a consistent estimate of your coefficients and use Newey-West HAC standard errors.
I'm not sure of the best answer for GLM more generally.
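The lm-plus-HAC option mentioned above could look roughly like this. It is a sketch that assumes the sandwich and lmtest packages are installed and borrows R's built-in Seatbelts data purely for illustration; fit.ols and hac are my own names.

```r
library(sandwich)  # NeweyWest()
library(lmtest)    # coeftest()

# Built-in example data, used here only for illustration
Seatbelts.df <- data.frame(Seatbelts)

# Ordinary least squares fit; coefficients are consistent under
# autocorrelated errors, but the default standard errors are not
fit.ols <- lm(drivers ~ kms + PetrolPrice + law, data = Seatbelts.df)

# Coefficient table with Newey-West HAC standard errors
hac <- coeftest(fit.ols, vcov. = NeweyWest(fit.ols))
hac
```

The point estimates are the same as in summary(fit.ols); only the standard errors, test statistics and p-values change.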