Holding the coefficients of a linear model constant while exchanging predictors for their sample means? - r

I've been trying to look at the explanatory power of individual variables in a model by holding other variables constant at their sample mean.
However, I am unable to do something like:
Temperature = alpha + Beta1*RFGG + Beta2*RFSOx + Beta3*RFSolar
where Beta1=Beta2=Beta3 -- something like
Temperature = alpha + Beta1*(RFGG + RFSolar + RFSOx)
I want to do this so I can compare the difference in explanatory power (R^2/size of residuals) when one independent variable is not held at the sample mean while the rest are.
Temperature = alpha + Beta1*(RFGG + meanRFSolar + meanRFSOx)
or
Temperature = alpha + Beta1*RFGG + Beta1*meanRFSolar + Beta1*meanRFSOx
However, the lm function seems to estimate its own coefficients so I don't know how I can hold anything constant.
Here's some ugly code I tried throwing together that I know reeks of wrongness:
# fixing a new clean matrix for my data
dat = cbind(dat[,1:2],dat[,4:6]) # contains 162 rows of: Date, Temp, RFGG, RFSolar, RFSOx
# make a bunch of sample mean independent variables to use
meandat = dat[,3:5]
meandat$RFGG = mean(dat$RFGG)
meandat$RFSolar = mean(dat$RFSolar)
meandat$RFSOx = mean(dat$RFSOx)
RFTot = dat$RFGG + dat$RFSOx + dat$RFSolar
B = coef(lm(dat$Temp ~ 1 + RFTot)) # trying to save the coefficients to use them...
B1 = c(rep(B[1],length = length(dat[,1])))
B2 = c(rep(B[2],length = length(dat[,1])))
summary(lm(dat$Temp ~ B1 + B2*dat$RFGG:meandat$RFSOx:meandat$RFSolar)) # failure
summary(lm(dat$Temp ~ B1 + B2*RFTot))
Thanks for taking a look to whoever sees this and please ask me any questions.

Thank you both. It was a combination of eliminating the intercept (with -1) and using the offset function.
# sample means of each forcing (assumed definitions; they were not shown in the original post)
RFGG_bar = mean(dat$RFGG)
RFSOx_bar = mean(dat$RFSOx)
RFSolar_bar = mean(dat$RFSolar)
a = lm(Temp ~ I(RFGG + RFSOx + RFSolar),data = dat)
beta1hat = rep(coef(a)[1],length=length(dat[,1]))
beta2hat = rep(coef(a)[2],length=length(dat[,1]))
b = lm(Temp ~ -1 + offset(beta1hat) + offset(beta2hat*(RFGG + RFSOx_bar + RFSolar_bar)),data = dat)
c = lm(Temp ~ -1 + offset(beta1hat) + offset(beta2hat*(RFGG_bar + RFSOx + RFSolar_bar)),data = dat)
d = lm(Temp ~ -1 + offset(beta1hat) + offset(beta2hat*(RFGG_bar + RFSOx_bar + RFSolar)),data = dat)
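The offset approach can be sketched end to end on simulated data. The data frame sim, its coefficients, and the helper r2_fixed are made up for illustration; only the idea of fixing coefficients through offset() comes from the post. Because a model with no estimated coefficients may not get a useful R^2 from summary(), the sketch computes it by hand from the residuals.

```r
# Simulated stand-in for the real data (all values here are made up).
set.seed(1)
n <- 162
sim <- data.frame(
  RFGG    = rnorm(n, mean = 1.5, sd = 0.5),
  RFSOx   = rnorm(n, mean = -0.5, sd = 0.2),
  RFSolar = rnorm(n, mean = 0.1, sd = 0.05)
)
sim$Temp <- 0.2 + 0.4 * (sim$RFGG + sim$RFSOx + sim$RFSolar) + rnorm(n, sd = 0.05)

# Step 1: estimate one shared slope for the summed forcings.
full <- lm(Temp ~ I(RFGG + RFSOx + RFSolar), data = sim)
alpha_hat <- coef(full)[[1]]
beta_hat  <- coef(full)[[2]]

# Step 2: hold the coefficients fixed via offset() and swap two predictors
# for their sample means; nothing is re-estimated in this model.
gg_only <- lm(
  Temp ~ -1 + offset(alpha_hat + beta_hat * (RFGG + mean(RFSOx) + mean(RFSolar))),
  data = sim
)

# Hypothetical helper: R^2 from the residuals of a fixed-coefficient model.
r2_fixed <- function(model, y) {
  1 - sum(residuals(model)^2) / sum((y - mean(y))^2)
}

r2_full    <- summary(full)$r.squared
r2_gg_only <- r2_fixed(gg_only, sim$Temp)
```

Repeating step 2 with RFSOx or RFSolar left free instead of RFGG gives the other two comparisons.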

Related

svyglm - how to code for a logistic regression model across all variables?

In R, to include all variables in a GLM you can simply use a dot (.), as shown in "How to succinctly write a formula with many variables from a data frame?"
For example:
y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)
However, I am struggling to do this with svydesign. I have many explanatory variables plus an ID and a weight variable, so I first create my survey design:
des <- svydesign(ids=~id, weights=~wt, data = df)
Then I try creating my binomial model using weights:
binom <- svyglm(y~.,design = des, family="binomial")
But I get the error:
Error in svyglm.survey.design(y ~ ., design = des, family = "binomial") :
all variables must be in design = argument
What am I doing wrong?
You typically wouldn't want to do this, because "all the variables" would include design metadata such as weights, cluster indicators, stratum indicators, etc.
You can use colnames to extract all the variable names from a design object and then reformulate to build the formula, probably after subsetting the names, e.g. with the api example in the package:
> all_the_names <- colnames(dclus1)
> all_the_actual_variables <- all_the_names[c(2, 11:37)]
> reformulate(all_the_actual_variables,"y")
y ~ stype + pcttest + api00 + api99 + target + growth + sch.wide +
comp.imp + both + awards + meals + ell + yr.rnd + mobility +
acs.k3 + acs.46 + acs.core + pct.resp + not.hsg + hsg + some.col +
col.grad + grad.sch + avg.ed + full + emer + enroll + api.stu
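Subsetting by name rather than by position may be less error-prone. Below is a minimal sketch using only base R; the toy data frame and its columns x1 and x2 are made up, while id, wt and y mirror the names in the question.

```r
# Toy data frame standing in for the survey data (all values are made up).
df <- data.frame(
  id = 1:6,
  wt = c(1.2, 0.8, 1.0, 1.5, 0.9, 1.1),
  y  = c(0, 1, 0, 1, 1, 0),
  x1 = c(4, -1, 3, 2, 0, 5),
  x2 = c(3, 9, 8, 1, 7, 2)
)

# Drop the response and the design metadata, keep everything else.
predictors <- setdiff(names(df), c("y", "id", "wt"))
f <- reformulate(predictors, response = "y")
print(f)
# y ~ x1 + x2
```

The resulting formula f could then be passed as the first argument to svyglm() together with design = des.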

Fitting two coefplot in one graph using par(mfrow()) method

I'm trying to arrange two coefplot objects into one graph via the par(mfrow(,)) method, but it didn't work out. What did I do wrong? Or is it that coefplot just doesn't work this way? What would be an alternative method?
I've referenced this earlier thread, but I tend to think that mine is a quite different issue.
# load the data
dat <- readRDS(url("https://www.dropbox.com/s/88h7hmiroalx3de/act.rds?dl=1"))
# fit two models
library(lme4)
action1.fit <- glmer(act1 ~ os + education + marital + nat6 + nat5 + nat4 + nat3 + nat2 + nat1 +
(1 | region_id), data = dat, family = binomial, control = glmerControl(optimizer = "bobyqa"),
nAGQ = 10)
action2.fit <- glmer(act2 ~ os + education + marital + nat6 + nat5 + nat4 + nat3 + nat2 + nat1 +
(1 | region_id), data = dat, family = binomial, control = glmerControl(optimizer = "bobyqa"),
nAGQ = 10)
# plot the two models individually
library(coefplot)
# construct coefplot objects
coefplot:::buildModelCI(action1.fit)
coefplot:::buildModelCI(action2.fit)
coefplot(action2.fit, coefficients=c("nat1", "nat2", "nat3", "nat4", "nat5", "nat6"),
intercept = FALSE, color = "brown3")
# arrange two plots in one graph
par(mfrow=c(1,2))
coefplot(action1.fit, coefficients=c("nat1", "nat2", "nat3", "nat4", "nat5", "nat6"),
intercept = FALSE, color = "brown3")
coefplot(action2.fit, coefficients=c("nat1", "nat2", "nat3", "nat4", "nat5", "nat6"),
intercept = FALSE, color = "brown3")
# didn't work ???

lmer poly() function on interactions

My goal is to run a quadratic function using several IVs across time within subject. I have come across some code and am a little confused. Below is a reproducible example of what I am trying to run. Following the code will be my questions.
library(lme4)
set.seed(1234)
obs <- 1:200
IV1<- rnorm(length(obs), mean = 1, sd = mean(obs^2) / 4)
IV2<- rnorm(length(obs), mean = 1.5, sd = mean(obs^2) / 8)
IV3<- rnorm(length(obs), mean = .5, sd = mean(obs^3) / 4)
y <- (obs + obs^2 + obs^3) + rnorm(length(obs), mean = 0, sd = mean(obs^2) / 4)
my.data <- data.frame(obs,IV1,IV2,IV3,y,
DV = y/10000,
time= c(1,2,3,4,5),
Subj= rep(letters[1:20], each =5),
group= rep(letters[1:2], each =5))
my.data$group.factor<-as.factor(my.data$group)
my.data$dx <- as.numeric(ifelse(my.data$group == "a", 1, 0))
Polylmer<- lmer(DV ~ poly(time*IV1, 2) + poly(time*IV2, 2) + poly(time*IV3, 2) + poly(time*dx, 2) + (1|Subj), data = my.data)
My questions are as follows:
In a non-poly() lmer statement, time*IV2 would give coefficients for the time and IV2 interaction as well as the lower-order coefficients for time and IV2. Am I correct that using the poly() statement does not put the lower terms into the model?
I have been taught that if you include the higher terms you should include the lower terms also. Is this still correct with the poly() function in R?
If so, would it make sense to use either of the following?
Polylmer2<- lmer(DV ~ poly(time, 2)*poly(IV1, 2) + poly(time, 2)*poly(IV2, 2) + poly(time, 2)*poly(IV3, 2) + poly(time*dx, 2) + (1|Subj), data = my.data)
Polylmer3<- lmer(DV ~ time + IV1 + IV2 + IV3 + dx + poly(time, 2) + poly(IV1, 2) + poly(IV2, 2) + poly(IV3, 2) + poly(time*IV1, 2) + poly(time*IV2, 2) + poly(time*IV3, 2) + poly(time*dx, 2) + (1|Subj), data = my.data)
I would assume that the two above are equivalent; however, I am wrong, as the second gives me an error:
Error: Dropping columns failed to produce full column rank design matrix
What columns did I drop?
Thank you for helping. I am very new to R so I am trying my best to understand what these functions do rather than just follow a recipe.
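A small base-R check, not part of the original post, may help with the first two questions. It uses lm() on made-up data as a stand-in for lmer(). It suggests that poly(x, 2) already carries the linear term, that adding x again makes the design matrix rank deficient, and that poly(a * b, 2) is a polynomial in the single product variable because formula operators are not expanded inside a function call. Whether this is the exact cause of the error in Polylmer3 is an assumption.

```r
set.seed(42)
x <- runif(50)
y <- 1 + 2 * x - 3 * x^2 + rnorm(50, sd = 0.1)

# poly(x, 2) returns two columns: degree 1 (linear) and degree 2 (quadratic).
ncol(poly(x, 2))

# Orthogonal and raw polynomials span the same space, so the fitted values
# match those of an explicit x + x^2 model.
fit_poly <- lm(y ~ poly(x, 2))
fit_raw  <- lm(y ~ x + I(x^2))
all.equal(fitted(fit_poly), fitted(fit_raw))

# Adding x on top of poly(x, 2) duplicates the linear term: the design
# matrix loses full column rank and lm() reports an NA coefficient.
fit_dup <- lm(y ~ x + poly(x, 2))
coef(fit_dup)

# Inside poly(), a * b is plain multiplication, not a formula interaction,
# so poly(a * b, 2) equals poly() of the precomputed product.
a <- runif(50)
b <- runif(50)
prod_ab <- a * b
same_columns <- identical(as.vector(poly(a * b, 2)), as.vector(poly(prod_ab, 2)))
same_columns
```

If separate lower-order terms for time and each IV are wanted, they have to be written into the formula outside poly(), without repeating terms that poly() already supplies.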

'lme' error in R: "attempt to apply non-function"

I'm conducting an lme analysis on my dataset with the following code
M1 <- lme(VT ~ visit + sx + agevis + c_bmi + gpa + qa + BP + MH + ethn, data = Cleaned_data4t300919, random = ~ 1 + visit |id, corAR1(),method = "ML", na.action = na.omit(Cleaned_data4t300919))
and I get the following error message:
Error in model.frame.default(formula = ~visit + sx + agevis + c_bmi +
: attempt to apply non-function
I am not sure what I am doing wrong or how to get the model to run. I really appreciate an answer. Thank you.
I am trying to run a linear mixed effect model with VT as my dependent variable, visit as my time variable, with a 1st order autoregressive correlation, ML estimator on data with some missing observations.
I have tried changing the code in the following ways but got the same error message
library(nlme)
?lme
fm2 <- lme(VT ~ visit + sx + agevis + c_bmi + gpa + qa + BP + MH + ethn, data = Cleaned_data4t300919, random = ~ 1|id, corAR1(),method = "ML", na.action = na.pass(Cleaned_data4t300919))
fm2 <- lme(VT ~ visit + sx + agevis + c_bmi + gpa + qa + BP + sfnMH + ethn, data = Cleaned_data4t300919, random = ~ 1 + visit |cenid, corAR1(),method = "ML", na.action = na.omit(Cleaned_data4t300919))
fm2 <- lme(VT ~ visit + sx + agevis , data = Cleaned_data4t300919, random = ~ 1 + visit |id, corAR1(),method = "ML", na.action = na.omit(Cleaned_data4t300919))
fm2 <- lme(VT~visit + sx + agevis + c_bmi + gpa + qa + BP + MH + ethn, data = Cleaned_data4t300919, na.action = na.exclude(Cleaned_data4t300919))
fm2 <- lme(formula= sfnVT ~ visit + sx + agevis , data = Cleaned_data4t300919, random = ~ 1 + visit |cenid, corAR1(),method = "ML", na.action = na.omit(Cleaned_data4t300919))
I would like to obtain the estimates from the model and plot them using ggplot.
na.action = na.omit(Cleaned_data4t300919)
and similar attempts are the problem I think.
From ?lme:
na.action: a function that indicates what should happen when the data
contain 'NA's
You are providing data, not a function, since na.omit(dataset) returns a data.frame with NA containing rows removed, rather than something that can be applied to the data= specified. Just:
na.action=na.omit
or similar na.* functions will be sufficient.
A reliable way to pin down this kind of issue is debug (see ?debug): call debug(lme), then step through the function line by line to see exactly which call triggers the error.
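The difference between passing the function and passing its result can be seen in a few lines of base R. This sketch uses lm() on a made-up data frame d as a stand-in for lme(); the na.action argument is handled the same way when the model frame is built.

```r
# Toy data with one missing value (made up for illustration).
d <- data.frame(y = c(1.0, 2.1, NA, 4.2, 5.1), x = 1:5)

is.function(na.omit)        # TRUE: this is what na.action expects
is.data.frame(na.omit(d))   # TRUE: calling it returns data, not a function

# Passing the function itself works; rows with NA are dropped before fitting.
fit_ok <- lm(y ~ x, data = d, na.action = na.omit)
nobs(fit_ok)                # 4

# Passing the result of the call fails while the model frame is built.
fit_bad <- tryCatch(
  lm(y ~ x, data = d, na.action = na.omit(d)),
  error = function(e) e
)
inherits(fit_bad, "error")
```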

neural network in R

Hi, I am trying to use the neuralnet function in R to predict an integer outcome (meaning) from the rest of the variables.
Here is the code that I have used:
library("neuralnet")
# use 2/3 of the data for training the neural network and the rest for testing
ind<-sample(1:nrow(Data),6463,replace=FALSE)
Train<-Data[ind,]
Test<-Data[-ind,]
m <- model.matrix(
~meaning +
firstLevelAFFIRM + firstLevelDAT.PRSN + firstLevelMODE +
firstLevelO.DEF + firstLevelO.INDIV + firstLevelS.AGE.INDIV +
secondLevelV.BIN + secondLevelWord1 + secondLevelWord2 +
secondLevelWord3 + secondLevelWord4 + thirdLevelP.TYPE,
data = Train[,-1]) #(the first column is ID , i am not going to use it)
PredictorVariables <- paste("m[," , 3:ncol(m),"]" ,sep="")
Formula <- formula(paste("meaning ~ ", paste(PredictorVariables, collapse=" + ")))
net <- neuralnet(Formula,data=m, hidden=3, threshold=0.05)
m.test <- model.matrix(
~meaning +
firstLevelAFFIRM + firstLevelDAT.PRSN + firstLevelMODE +
firstLevelO.DEF + firstLevelO.INDIV + firstLevelS.AGE.INDIV +
secondLevelV.BIN + secondLevelWord1 + secondLevelWord2 +
secondLevelWord3 + secondLevelWord4 + thirdLevelP.TYPE,
data = Test[,-1])
net.results <- compute(net, m.test[,-c(1,2)]) # (the first column is the intercept and the second is the outcome that I am trying to predict)
output<-cbind(round(net.results$net.result),Test$meaning)
mean(round(net.results$net.result)!=Test$meaning)
The misclassification rate that I got was around 0.01, which is great, but my question is: why is the output (net.results$net.result) not an integer?
I assume that your output is linear. Try setting linear.output = FALSE.
net <- neuralnet(Formula, data = m, hidden = 3, threshold = 0.05, linear.output = FALSE)
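What the flag changes can be illustrated without the package. With linear.output = TRUE the output neuron returns its weighted sum as is; with FALSE the activation function (logistic by default in neuralnet) is applied to it. The pre-activation values z below are made up, and this is only a sketch of the two output types, not of neuralnet's internals.

```r
# Hypothetical pre-activation values of the output neuron (made up).
z <- c(-3.2, -0.4, 0.1, 2.7)

linear_out   <- z          # identity output: any real number
logistic_out <- plogis(z)  # logistic output: strictly between 0 and 1

range(logistic_out)
round(logistic_out)        # 0 0 1 1
```

Either way the network returns continuous values, so a rounding or thresholding step, like the round(...) call in the question, is still what turns them into class labels. A logistic output is only a sensible fit when the outcome is coded as 0/1.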
