svyglm - how to fit a logistic regression model across all variables?

In R, to include all variables in a glm you can simply use a . in the formula, as shown in "How to succinctly write a formula with many variables from a data frame?"
For example:
y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)
However, I am struggling to do this with svydesign. I have many explanatory variables plus an ID and a weight variable, so first I create my survey design:
des <- svydesign(ids = ~id, weights = ~wt, data = df)
Then I try creating my binomial model using weights:
binom <- svyglm(y ~ ., design = des, family = "binomial")
But I get the error:
Error in svyglm.survey.design(y ~ ., design = des, family = "binomial") :
all variables must be in design = argument
What am I doing wrong?

You typically wouldn't want to do this, because "all the variables" would include design metadata such as weights, cluster indicators, stratum indicators, etc.
You can use colnames() to extract all the variable names from a design object and then reformulate(), probably after subsetting the names, e.g. with the api example in the package:
> all_the_names <- colnames(dclus1)
> all_the_actual_variables <- all_the_names[c(2, 11:37)]
> reformulate(all_the_actual_variables,"y")
y ~ stype + pcttest + api00 + api99 + target + growth + sch.wide +
comp.imp + both + awards + meals + ell + yr.rnd + mobility +
acs.k3 + acs.46 + acs.core + pct.resp + not.hsg + hsg + some.col +
col.grad + grad.sch + avg.ed + full + emer + enroll + api.stu
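The constructed formula can then be passed straight to svyglm. Below is a minimal sketch in terms of the asker's design object des (the names id, wt, and y are placeholders from the question; the subsetting should be adapted to your own variables):
# Pull all variable names out of the design, then drop the design metadata
# (id, weight) and the outcome itself before building the formula
all_the_names <- colnames(des)
predictors <- setdiff(all_the_names, c("id", "wt", "y"))
f <- reformulate(predictors, response = "y")
# Fit the weighted logistic regression with the constructed formula;
# quasibinomial() avoids the spurious "non-integer #successes" warnings
# that family = "binomial" tends to produce with sampling weights
binom <- svyglm(f, design = des, family = quasibinomial())
summary(binom)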

Related

lme error in R: "attempt to apply non-function"

I'm conducting an lme analysis on my dataset with the following code:
M1 <- lme(VT ~ visit + sx + agevis + c_bmi + gpa + qa + BP + MH + ethn, data = Cleaned_data4t300919, random = ~ 1 + visit |id, corAR1(),method = "ML", na.action = na.omit(Cleaned_data4t300919))
and I get the following error message:
Error in model.frame.default(formula = ~visit + sx + agevis + c_bmi +
: attempt to apply non-function
I am not sure what I am doing wrong or how to get the model to run. I would really appreciate an answer. Thank you.
I am trying to run a linear mixed effect model with VT as my dependent variable, visit as my time variable, with a 1st order autoregressive correlation, ML estimator on data with some missing observations.
I have tried changing the code in the following ways but got the same error message
library(nlme)
?lme
fm2 <- lme(VT ~ visit + sx + agevis + c_bmi + gpa + qa + BP + MH + ethn, data = Cleaned_data4t300919, random = ~ 1|id, corAR1(),method = "ML", na.action = na.pass(Cleaned_data4t300919))
fm2 <- lme(VT ~ visit + sx + agevis + c_bmi + gpa + qa + BP + sfnMH + ethn, data = Cleaned_data4t300919, random = ~ 1 + visit |cenid, corAR1(),method = "ML", na.action = na.omit(Cleaned_data4t300919))
fm2 <- lme(VT ~ visit + sx + agevis , data = Cleaned_data4t300919, random = ~ 1 + visit |id, corAR1(),method = "ML", na.action = na.omit(Cleaned_data4t300919))
fm2 <- lme(VT~visit + sx + agevis + c_bmi + gpa + qa + BP + MH + ethn, data = Cleaned_data4t300919, na.action = na.exclude(Cleaned_data4t300919))
fm2 <- lme(formula= sfnVT ~ visit + sx + agevis , data = Cleaned_data4t300919, random = ~ 1 + visit |cenid, corAR1(),method = "ML", na.action = na.omit(Cleaned_data4t300919))
I would like to obtain the estimates from the model and plot them using ggplot.
na.action = na.omit(Cleaned_data4t300919)
and similar attempts are, I think, the problem.
From ?lme:
na.action: a function that indicates what should happen when the data
contain 'NA's
You are providing data, not a function: na.omit(dataset) returns a data.frame with the NA-containing rows removed, rather than something that can be applied to the data= you specified. Just:
na.action=na.omit
or similar na.* functions will be sufficient.
A sure way to identify these kinds of issues is to use ?debug: run debug(lme), then step through the function line by line to see exactly what the error is in response to.
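For example, the first call from the question should run once na.action is given the function itself (a sketch using the asker's object and variable names, untested without the data):
library(nlme)
M1 <- lme(VT ~ visit + sx + agevis + c_bmi + gpa + qa + BP + MH + ethn,
          data = Cleaned_data4t300919,
          random = ~ 1 + visit | id,
          correlation = corAR1(),  # name the argument rather than relying on position
          method = "ML",
          na.action = na.omit)     # pass the function, not na.omit(<data frame>)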

How to use predict function with my pooled results from mice()?

Hi I just started using R as part of a module in school. I have a data set with missing data and I have used mice() to impute the missing data. I'm now trying to use the predict function with my pooled results. However, I observed the following error:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "c('mipo', 'data.frame')"
I have included my entire code below and I'd greatly appreciate it if y'all can help a novice out. Thanks!
```{r}
library(magrittr)
library(dplyr)
train = read.csv("Train_Data.csv", na.strings=c("","NA"))
test = read.csv("Test_Data.csv", na.strings=c("","NA"))
cols <- c("naCardiac", "naFoodNutrition", "naGenitourinary", "naGastrointestinal", "naMusculoskeletal", "naNeurological", "naPeripheralVascular", "naPain", "naRespiratory", "naSkin")
train %<>%
mutate_each_(funs(factor(.)),cols)
test %<>%
mutate_each_(funs(factor(.)),cols)
str(train)
str(test)
```
```{r}
library(mice)
md.pattern(train)
```
```{r}
miTrain = mice(train, m = 5, maxit = 50, meth = "pmm")
```
```{r}
model = with(miTrain, lm(LOS ~ Age + Gender + Race + Temperature + RespirationRate + HeartRate + SystolicBP + DiastolicBP + MeanArterialBP + CVP + Braden + SpO2 + FiO2 + PO2_POCT + Haemoglobin + NumWBC + Haematocrit + NumPlatelets + ProthrombinTime + SerumAlbumin + SerumChloride + SerumPotassium + SerumSodium + SerumLactate + TotalBilirubin + ArterialpH + ArterialpO2 + ArterialpCO2 + ArterialSaO2 + Creatinine + Urea + GCS + naCardiac + GCS + naCardiac + naFoodNutrition + naGenitourinary + naGastrointestinal + naMusculoskeletal + naNeurological + naPeripheralVascular + naPain + naRespiratory + naSkin))
model
summary(model)
```
```{r}
modelResults = pool(model)
modelResults
```
```{r}
pred = predict(modelResults, newdata = test)
PredTest = data.frame(test$PatientID, modelResults)
str(PredTest)
summary(PredTest)
```
One slightly hacky way to achieve this may be to take one of the fitted models stored in fit and replace its stored coefficients with the final pooled estimates. I haven't done detailed testing, but it seems to be working on this simple example:
library(mice)
imp <- mice(nhanes, maxit = 2, m = 2)
fit <- with(data = imp, exp = lm(bmi ~ hyp + chl))
pooled <- pool(fit)
# Copy one of the fitted lm models fit to
# one of the imputed datasets
pooled_lm = fit$analyses[[1]]
# Replace the fitted coefficients with the pooled
# estimates (need to check they are replaced in
# the correct order)
pooled_lm$coefficients = summary(pooled)$estimate
# Predict - predictions seem to match the
# pooled coefficients rather than the original
# lm that was copied
predict(fit$analyses[[1]], newdata = nhanes)
predict(pooled_lm, newdata = nhanes)
As far as I know, predict() for a linear regression should only depend on the coefficients, so you shouldn't have to replace other stored values in the fitted model (but you would have to if applying methods other than predict()).
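As a quick check of that claim (a small sketch continuing the example above, not part of the original answer), the predictions from pooled_lm can be reproduced by multiplying a model matrix by the pooled coefficients:
# Use a completed dataset so there are no NAs in the predictors
newdat <- complete(imp, 1)
X <- model.matrix(~ hyp + chl, data = newdat)
manual <- drop(X %*% pooled_lm$coefficients)
all.equal(unname(predict(pooled_lm, newdata = newdat)), unname(manual))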

R Predict using multiple models

I am new to R and trying to predict outcomes on a dataset using 4 different GLMs. I have tried running it as one large model, and while I do get results, the model doesn't converge properly and I end up with NAs. I therefore have four models:
model_team <- glm(mydata$OUT ~ TEAM + OPPONENT, family = "binomial",data = mydata )
model_conf <- glm(mydata$OUT ~ TCONF + OCONF, family = "binomial",data = mydata)
model_tstats <- glm(mydata$OUT ~ TPace + TORtg + TFTr + T3PAr + TTS. + TTRB. + TAST. + TSTL. + TBLK. + TeFG. + TTOV. + TORB. + TFT.FGA, family = "binomial",data = mydata)
model_ostats<- glm(mydata$OUT ~ OPace + OORtg + OFTr + O3PAr + OTS. + OTRB. + OAST. + OSTL. + OBLK. + OeFG. + OTOV. + OORB. + OFT.FGA, family = "binomial",data = mydata)
I then want to predict the outcomes using a different data set using the four models
predict(model_team, model_conf, model_tstats, model_ostats, fix, level = 0.95, type = "probs")
Is there a way to use all four models without joining them into one large model?
I don't really understand why you are trying to do what you are doing, and I don't have any example data representative of the data you are working with. However, below is an example of how you could combine multiple GLMs into one using the resulting coefficients. Note that this will not work well if there is multicollinearity between the variables in your dataset.
# I used the iris dataset for my example
head(iris)
# Run several models
model1 <- glm(data = iris, Sepal.Length ~ Sepal.Width)
model2 <- glm(data = iris, Sepal.Length ~ Petal.Length)
model3 <- glm(data = iris, Sepal.Length ~ Petal.Width)
# Get combined intercept (average of the three intercepts)
intercept <- mean(c(
  coef(model1)['(Intercept)'],
  coef(model2)['(Intercept)'],
  coef(model3)['(Intercept)']))
# Extract the slope coefficients into a column matrix
coefs <- as.matrix(
  c(coef(model1)[2],
    coef(model2)[2],
    coef(model3)[2]))
# Get the feature values for the predictions
ds <- as.matrix(iris[,c('Sepal.Width', 'Petal.Length', 'Petal.Width')])
# Linear algebra: Matrix-multiply values with coefficients
prediction <- ds %*% coefs + intercept
# Let's look at the results
plot(iris$Petal.Length, prediction)
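If the goal is simply to combine predictions rather than coefficients, another option (a sketch, not part of the answer above) is to predict with each model separately and average the results; for the binomial models in the question you would add type = "response" so that probabilities, rather than log-odds, are averaged:
# Predict with each model separately, then average the predictions row-wise
preds <- cbind(predict(model1, newdata = iris),
               predict(model2, newdata = iris),
               predict(model3, newdata = iris))
avg_prediction <- rowMeans(preds)
plot(iris$Petal.Length, avg_prediction)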

In R, is there a parsimonious or efficient way to get a regression prediction holding all covariates at their means?

I'm wondering if there is essentially a faster way of getting predictions from a regression model for certain values of the covariates without manually specifying the formulation. For example, if I wanted to get a prediction for a given dependent variable at means of the covariates, I can do something like this:
r_3 <- glm(ins ~ retire + age + hstatusg + qhhinc2 + educyear + married + hisp,
           family = binomial, data = dat)
meanRetire <- mean(dat$retire)
meanAge <- mean(dat$age)
meanHStatusG <- mean(dat$hstatusg)
meanQhhinc2 <- mean(dat$qhhinc2)
meanEducyear <- mean(dat$educyear)
meanMarried <- mean(dat$married)
meanHisp <- mean(dat$hisp)
ins_predict <- coef(r_3)[1] + coef(r_3)[2] * meanRetire + coef(r_3)[3] * meanAge +
  coef(r_3)[4] * meanHStatusG + coef(r_3)[5] * meanQhhinc2 +
  coef(r_3)[6] * meanEducyear + coef(r_3)[7] * meanMarried +
  coef(r_3)[8] * meanHisp
Oh... There is a predict function:
fit <- glm(ins ~ retire + age + hstatusg + qhhinc2 + educyear + married + hisp,
family = binomial, data = dat)
newdat <- lapply(dat, mean) ## column means
lppred <- predict(fit, newdata = newdat) ## prediction of linear predictor
To get predicted response, use:
predict(fit, newdata = newdat, type = "response")
or (more efficiently from lppred):
binomial()$linkinv(lppred)
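If you also want a standard error for that prediction at the means, predict() can return one alongside the fitted value:
predict(fit, newdata = newdat, type = "response", se.fit = TRUE)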

Holding the coefficients of a linear model constant while exchanging predictors for their sample means?

I've been trying to look at the explanatory power of individual variables in a model by holding other variables constant at their sample mean.
However, I am unable to do something like:
Temperature = alpha + Beta1*RFGG + Beta2*RFSOx + Beta3*RFSolar
where Beta1=Beta2=Beta3 -- something like
Temperature = alpha + Beta1*(RFGG + RFSolar + RFSOx)
I want to do this so I can compare the difference in explanatory power (R^2/size of residuals) when one independent variable is not held at the sample mean while the rest are.
Temperature = alpha + Beta1*(RFGG + meanRFSolar + meanRFSOx)
or
Temperature = alpha + Beta1*RFGG + Beta1*meanRFSolar + Beta1*meanRFSOx
However, the lm function seems to estimate its own coefficients so I don't know how I can hold anything constant.
Here's some ugly code I tried throwing together that I know reeks of wrongness:
# fixing a new clean matrix for my data
dat = cbind(dat[,1:2],dat[,4:6]) # contains 162 rows of: Date, Temp, RFGG, RFSolar, RFSOx
# make a bunch of sample mean independent variables to use
meandat = dat[,3:5]
meandat$RFGG = mean(dat$RFGG)
meandat$RFSolar = mean(dat$RFSolar)
meandat$RFSOx = mean(dat$RFSOx)
RFTotal = dat$RFGG + dat$RFSOx + dat$RFSolar
B = coef(lm(dat$Temp ~ 1 + RFTotal)) # trying to save the coefficients to use them...
B1 = c(rep(B[1],length = length(dat[,1])))
B2 = c(rep(B[2],length = length(dat[,1])))
summary(lm(dat$Temp ~ B1 + B2*dat$RFGG:meandat$RFSOx:meandat$RFSolar)) # failure
summary(lm(dat$Temp ~ B1 + B2*RFTotal))
Thanks for taking a look to whoever sees this and please ask me any questions.
Thank you both of you; it was a combination of eliminating the intercept with -1 and using the offset() function.
a = lm(Temp ~ I(RFGG + RFSOx + RFSolar), data = dat)
beta1hat = rep(coef(a)[1], length = length(dat[,1]))
beta2hat = rep(coef(a)[2], length = length(dat[,1]))
# the *_bar objects are the sample means of each forcing, held fixed in the offsets below
RFGG_bar = mean(dat$RFGG); RFSOx_bar = mean(dat$RFSOx); RFSolar_bar = mean(dat$RFSolar)
b = lm(Temp ~ -1 + offset(beta1hat) + offset(beta2hat*(RFGG + RFSOx_bar + RFSolar_bar)), data = dat)
c = lm(Temp ~ -1 + offset(beta1hat) + offset(beta2hat*(RFGG_bar + RFSOx + RFSolar_bar)), data = dat)
d = lm(Temp ~ -1 + offset(beta1hat) + offset(beta2hat*(RFGG_bar + RFSOx_bar + RFSolar)), data = dat)
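With those three fits in hand, the comparison of explanatory power the question asks about can be read off the residuals, e.g. (a sketch assuming b, c, and d as above):
# Residual sum of squares when only the named forcing is allowed to vary,
# with the other two held at their sample means
sapply(list(RFGG_free = b, RFSOx_free = c, RFSolar_free = d),
       function(m) sum(residuals(m)^2))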
