R survey package: svyby + svymean: one vs many variables

Let's assume a data set mydata with the variables foo1..foo20, which are factors with the labels "Easy" and "Difficult". Now consider this code:
library(survey)
svd <- svydesign(ids = ~ 1, weights = ~ weight, data = mydata)
svyby(~ foo1, by = ~ group, svd, svymean)$foo1Difficult
svyby(~ foo1 + foo2 + foo3 + ... + foo20, by = ~ group, svd, svymean)$foo1Difficult
Are the results supposed to be identical? Is there a reason why the results could differ? Why does it make a difference whether I iterate over each variable or use all variables at once?

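As @AnthonyDamico pointed out, the difference was caused by NAs. When several variables are passed at once, a row with a missing value in any of them can be dropped from the whole computation, so foo1 may be averaged over fewer rows than when it is passed alone.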
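The mechanism can be reproduced without the survey package. The sketch below is only an illustration: it assumes that rows with a missing value in any of the requested variables are dropped together (listwise), which is what svymean does when na.rm = TRUE. The toy data and the helper share_difficult are made up for the example.

```r
# Toy data: foo1 is complete, foo2 has two missing values (hypothetical values)
mydata <- data.frame(
  weight = c(1, 2, 1, 2, 1, 1),
  foo1 = factor(c("Easy", "Difficult", "Difficult", "Easy", "Difficult", "Easy")),
  foo2 = factor(c("Easy", NA, "Difficult", "Easy", NA, "Easy"))
)

# Weighted share of "Difficult" answers for foo1 (hypothetical helper)
share_difficult <- function(d) weighted.mean(d$foo1 == "Difficult", d$weight)

# One variable at a time: only rows where foo1 itself is missing are dropped
only_foo1 <- mydata[!is.na(mydata$foo1), ]

# Several variables at once: rows with an NA in foo1 OR foo2 are dropped
listwise <- mydata[complete.cases(mydata[, c("foo1", "foo2")]), ]

share_one <- share_difficult(only_foo1)   # 0.5
share_many <- share_difficult(listwise)   # 0.2
```

The estimate for foo1 changes even though foo1 has no missing values, because the NAs in foo2 remove rows from the joint computation.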

Related

Loop mixed linear model on longitudinal data, assessing the effect of groups on a continuous y variable

EDITED:
I'm trying to assess the effect of variables (e.g. presence of severe trauma) on a continuous variable (here energy expenditure (= REE) in calories) over time (Day). The data frame is called my_data. Amongst the variables
Next, I would like to display the results of the mixed linear model for each assessed variable in one large file.
General concept:
REE ~ Time*predictor + (1 + Time | Case identifier)
(1) Start by creating the lmer model:
library(tidyverse)
library(ggpmisc)
library(sjPlot)
library(lme4)
mixed.modelloop <- function(x) {
  lmer(REE ~ Day*(x) + (1 + Day | Studynumber),
       data=my_data,
       REML=FALSE,
       na.action=na.omit,
       control = lmerControl(check.nobs.vs.nRE = "ignore"))
}
(2) Then create the predictors (x):
cols <- c(colnames(my_data))
(3) Then map the function over the predictors with purrr:
output <- purrr::map(cols, ~ mixed.modelloop(.x) %>% tab_model)
(4) Generate the file, which should include all the separate univariate mixed-model analyses:
pdf(file="mixed linear models.pdf" )
output
dev.off()
Unfortunately, after step (3) I currently get the following error message:
Error in model.frame.default(data = my_data, na.action = na.omit, drop.unused.levels = TRUE, :
variable lengths differ (found for 'x')
Any idea on how to adapt the function to resolve this issue?
Thanks!
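Formulas have special evaluation rules: you can't insert a string into one and expect it to work. Inside the formula, x is taken as the literal symbol x rather than as the column name it holds, hence the "variable lengths differ" error.
This should work, although you haven't given a reproducible example to test with: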
mixed.modelloop <- function(x) {
  # build the formula from strings: REE ~ Day*<x> + (1 + Day | Studynumber)
  form <- reformulate(c(sprintf("Day*%s", x), "(1 + Day | Studynumber)"),
                      response = "REE")
  lmer(form,
       data=my_data,
       REML=FALSE,
       na.action=na.omit,
       control = lmerControl(check.nobs.vs.nRE = "ignore"))
}
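To check what is being passed to lmer, the formula can be built and inspected on its own, before any model is fitted. This is a minimal sketch using only base R; the helper build_formula and the predictor names Trauma and Sepsis are hypothetical stand-ins for columns of my_data.

```r
# Build the model formula for one predictor name given as a string
build_formula <- function(x) {
  reformulate(c(sprintf("Day*%s", x), "(1 + Day | Studynumber)"),
              response = "REE")
}

form <- build_formula("Trauma")
print(form)

# Assumed column names; in practice this would be colnames(my_data)
all_cols <- c("Studynumber", "Day", "REE", "Trauma", "Sepsis")

# The response, time and grouping columns should not be looped over as predictors
predictors <- setdiff(all_cols, c("Studynumber", "Day", "REE"))
forms <- lapply(predictors, build_formula)
```

Note that colnames(my_data) also contains REE, Day and Studynumber, so it is probably worth excluding them before mapping over the columns, as the setdiff() call does here.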

How can I plot the difference between marginal effects for a categorical variable?

I'm trying to plot the difference between marginal effects for a dependent categorical variable. I have tried using emmeans, but I can't get what I want.
I'll try to follow the example in this vignette for emmeans for ordinal models.
I run this model with interaction:
library(ordinal)
library(emmeans)
library(sjPlot)  # provides plot_model()
wine.clm <- clm(rating ~ temp * contact, data = wine)
Then I plot the model:
plot_model(wine.clm, type = "pred",
           terms = c("contact", "temp"))
This isn't bad. But to simplify, I want to plot the difference between the predicted probabilities for warm and cold. This, I hope, should also highlight the interaction.
I've tried with emmeans, which gives me a difference, but only on the log-odds scale of the latent variable.
emmeans(wine.clm, list(pairwise ~ contact|temp))
Instead, I would like to plot the difference in probabilities for each rating category.
One option is to use the marginaleffects package (disclaimer: I am the author).
See the Contrasts vignette for different ways to compare predicted probabilities. Here’s one possibility:
library(ordinal)
library(ggplot2)
library(marginaleffects)
wine.clm <- clm(rating ~ temp * contact, data = wine)
cmp <- comparisons(wine.clm,
                   variables = "temp",
                   newdata = datagrid(contact = unique))
# Recent marginaleffects releases name the estimate column `estimate`;
# older releases called it `comparison`.
ggplot(cmp, aes(x = contact, y = estimate, ymin = conf.low, ymax = conf.high)) +
  geom_pointrange() +
  facet_wrap(~group) +
  labs(x = "Contact", y = "P(Y|Warm) - P(Y|Cold)")

Stepwise regression in r with mixed models: numbers of rows changing [duplicate]

I want to run a stepwise regression in R to choose the best-fitting model. My code is below:
library(MASS)  # provides stepAIC()
full.modelfixed <- glm(died_ed ~ age_1 + gender + race + insurance + injury + ais + blunt_pen +
                         comorbid + iss + min_dist + pop_dens_new + age_mdn + male_pct +
                         pop_wht_pct + pop_blk_pct + unemp_pct + pov_100x_npct +
                         urban_pct, data = trauma, family = binomial(link = 'logit'), na.action = na.exclude)
reduced.modelfixed <- stepAIC(full.modelfixed, direction = "backward")
I get an error message that says:
Error in stepAIC(full.modelfixed, direction = "backward") :
number of rows in use has changed: remove missing values?
Almost every variable in the data has some missing values, so I cannot delete all missing values (data = na.omit(data))
Any idea on how to fix this?
Thanks!!
This should probably be in a stats forum (stats.stackexchange) but briefly there are a number of considerations.
The main one is that when comparing two models, they need to be fitted on the same dataset (i.e. you need to be able to nest the models within each other).
For example:
glm1 <- glm(Dependent ~ indep1 + indep2 + indep3, family = binomial, data = data)
glm2 <- glm(Dependent ~ indep1 + indep2, family = binomial, data = data)
Now imagine that we are missing values of indep3 but not of indep1 or indep2.
When we run glm1, we are running it on a smaller dataset: the rows for which we have the dependent variable and all three independent ones (i.e. any rows where indep3 is missing are excluded).
When we run glm2, the rows missing a value for indep3 are included, because those rows do contain Dependent, indep1 and indep2, which are the variables in the model.
We can no longer directly compare the models, as they are fitted on different datasets. This is what stepAIC detects when it reports that the number of rows in use has changed.
I think broadly you can either
1) Limit to data which is complete
2) If appropriate consider multiple imputation
Hope that helps.
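Option 1 can be sketched as follows. This is an illustration on simulated data, not the trauma dataset from the question; the data frame dat and its columns are made up, and base R's step() is used in place of MASS::stepAIC() to keep the example free of extra packages.

```r
set.seed(1)
n <- 200
dat <- data.frame(indep1 = rnorm(n), indep2 = rnorm(n), indep3 = rnorm(n))
dat$Dependent <- rbinom(n, 1, plogis(0.5 * dat$indep1))
dat$indep3[1:40] <- NA  # make indep3 missing in 40 rows

# The two models silently use different numbers of rows
glm1 <- glm(Dependent ~ indep1 + indep2 + indep3, family = binomial, data = dat)
glm2 <- glm(Dependent ~ indep1 + indep2, family = binomial, data = dat)
c(full = nobs(glm1), reduced = nobs(glm2))

# Restrict to rows that are complete for every variable in the full model,
# so that every candidate model is fitted on the same rows
vars <- c("Dependent", "indep1", "indep2", "indep3")
complete <- dat[complete.cases(dat[, vars]), ]

full <- glm(Dependent ~ indep1 + indep2 + indep3, family = binomial, data = complete)
reduced <- step(full, direction = "backward", trace = 0)
```

Only the variables that appear in the full model are passed to complete.cases(), so rows are not discarded because of missing values in unrelated columns.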
You can use the mice package to impute the missing values; models fitted on the imputed data then all use the same rows, so this error does not occur.

How to run fixed-effects logit model with clustered standard errors and survey weights in R?

I am using Afrobarometer survey data using 2 rounds of data for 10 countries. My DV is a binary 0-1 variable. I need to use logistic regression, fixed-effects, clustered standard errors (at country), and weighted survey data. A variable for the weights already exists in the dataframe.
I've been looking at help files for the following packages and functions: clogit, glm, pglm, glm2, zelig, bife, etc. Typical problems include: can't add weights, can't do fixed effects, can't do either, etc.
#Glm
t3c1.fixed <- glm(formula = ethnic ~ elec_prox +
elec_comp + round + country, data=afb,
weights = afb$survey_weight,
index c("country", "round"),
family=binomial(link='logit'))
#clogit
t3c1.fixed2 <- clogit(formula = ethnic ~ elec_prox +
elec_comp + round + country, data=afb,
weights = afb$survey_weight,
method=c("within"))
#bife attempt
library(bife)
t3c1.fixed3 <- bife(ethnic ~ elec_prox + elec_comp + round +
country, model = logit,data=afb,
weights = afb$survey_weight,
bias_corr = "ana")
I either get error messages or the code doesn't include one of the conditions I need to include, so I can't use them. In Stata it appears this process is very simple, but in R it seems rather tedious. Any help would be appreciated!
I would check out the survey package, which provides everything you are asking for. The first step is to create the survey design object and specify the survey weights; then you are off to the races.
library(survey)
# the argument is `weights`; `wts` and `your_data` are placeholders for your weight column and data frame
my_survey <- svydesign(ids = ~1, strata = ~country, weights = ~wts, data = your_data)
# Then fit the survey-weighted GLM with svyglm(), passing the design via `design`.
# quasibinomial() avoids the warning about non-integer numbers of successes.
svy_fit <- svyglm(ethnic ~ elec_prox + elec_comp + round + country,
                  design = my_survey, family = quasibinomial())
Or at least I would go down this path given you are using survey data.

Loop multiple 'multiple linear regressions' in R

I have a database where I want to do several multiple regressions. They all look like this:
fit <- lm(Variable1 ~ Age + Speed + Gender + Mass, data=Data)
The only variable that changes is Variable1. Now I want to loop, or use something from the apply family, to put several variables in place of Variable1. These variables are columns in my data file. Can someone help me solve this problem? Many thanks!
What I tried so far:
When I extract one of the column names with the names() function, I do get the name of the column:
varname = as.name(names(Data[14]))
But when I fill this in (and I used the attach() function):
fit <- lm(Varname ~ Age + Speed + Gender + Mass, data=Data)
I get the following error:
Error in model.frame.default(formula = Varname ~ Age + Speed + Gender + : object is not a matrix
I suppose that the lm() function does not recognize Varname as Variable1.
You can use lapply to loop over your variables.
fit <- lapply(Data[,c(...)], function(x) lm(x ~ Age + Speed + Gender + Mass, data = Data))
This gives you a list of your results.
The c(...) should contain your variable names as strings. Alternatively, you can choose the variables by their position in Data, like Data[,1:5].
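As a runnable illustration of this approach, here is a small sketch on the built-in mtcars data; the choice of mpg and hp as responses and wt and cyl as predictors is arbitrary and only serves the example.

```r
# Response columns to loop over (arbitrary choice for the example)
responses <- c("mpg", "hp")

# Each column is passed to the function as the vector y and used as the response
fits <- lapply(mtcars[, responses], function(y) lm(y ~ wt + cyl, data = mtcars))

names(fits)                     # the list keeps the column names
coef_table <- sapply(fits, coef)  # one column of coefficients per response
coef_table
```

One drawback of passing the column itself is that the printed call of every model reads y ~ wt + cyl, so the response can only be identified from the list names.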
The problem in your case is that lm() reads the names in the formula literally: it looks for a column (or object) called Varname rather than using the column name stored in that variable. To use the stored name, build the formula from the string, for example with paste(), so that the value of the variable is what ends up in the formula.
# generate some data
set.seed(123)
Data <- data.frame(x = rnorm(30), y = rnorm(30),
                   Age = sample(0:90, 30), Speed = rnorm(30, 60, 10),
                   Gender = sample(c("W", "M"), 30, replace = TRUE), Mass = rnorm(30))
varnames <- names(Data)[1:2]
# fit regressions for multiple dependent variables
fit <- lapply(varnames,
              FUN = function(x) lm(formula(paste(x, "~Age+Speed+Gender+Mass")), data = Data))
names(fit) <- varnames
fit
$x
Call:
lm(formula = formula(paste(x, "~Age+Speed+Gender+Mass")), data = Data)
Coefficients:
(Intercept)          Age        Speed      GenderW         Mass
   0.135423     0.010013    -0.010413     0.023480     0.006939
$y
Call:
lm(formula = formula(paste(x, "~Age+Speed+Gender+Mass")), data = Data)
Coefficients:
(Intercept)          Age        Speed      GenderW         Mass
   2.232269    -0.008035    -0.027147    -0.044456    -0.023895
