lmer poly() function on interactions - r

My goal is to fit a quadratic function of time using several IVs, within subject. I have come across some code and am a little confused. Below is a reproducible example of what I am trying to run; my questions follow the code.
set.seed(1234)
obs <- 1:200
IV1 <- rnorm(length(obs), mean = 1, sd = mean(obs^2) / 4)
IV2 <- rnorm(length(obs), mean = 1.5, sd = mean(obs^2) / 8)
IV3 <- rnorm(length(obs), mean = .5, sd = mean(obs^3) / 4)
y <- (obs + obs^2 + obs^3) + rnorm(length(obs), mean = 0, sd = mean(obs^2) / 4)
my.data <- data.frame(obs, IV1, IV2, IV3, y,
                      DV = y/10000,
                      time = c(1, 2, 3, 4, 5),
                      Subj = rep(letters[1:20], each = 5),
                      group = rep(letters[1:2], each = 5))
my.data$group.factor <- as.factor(my.data$group)
my.data$dx <- as.numeric(ifelse(my.data$group == "a", 1, 0))
library(lme4)  # lmer() is in the lme4 package
Polylmer <- lmer(DV ~ poly(time*IV1, 2) + poly(time*IV2, 2) + poly(time*IV3, 2) + poly(time*dx, 2) + (1|Subj), data = my.data)
My questions are as follows:
In a non-poly() lmer statement, time*IV2 would give a coefficient for the time by IV2 interaction as well as the lower-order coefficients for time and IV2. Am I correct that using the poly() statement does not put the lower-order terms into the model?
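A quick check I tried (hopefully I am reading model.matrix() right; this uses the my.data object built above):
colnames(model.matrix(~ time*IV1, data = my.data))
# "(Intercept)" "time" "IV1" "time:IV1"
colnames(model.matrix(~ poly(time*IV1, 2), data = my.data))
# "(Intercept)" "poly(time * IV1, 2)1" "poly(time * IV1, 2)2"; no main effects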
I have been taught that if you include the higher-order terms you should also include the lower-order terms. Is this still correct with the poly() function in R?
If so, would it make sense to use either of the following?
Polylmer2 <- lmer(DV ~ poly(time, 2)*poly(IV1, 2) + poly(time, 2)*poly(IV2, 2) + poly(time, 2)*poly(IV3, 2) + poly(time*dx, 2) + (1|Subj), data = my.data)
Polylmer3 <- lmer(DV ~ time + IV1 + IV2 + IV3 + dx + poly(time, 2) + poly(IV1, 2) + poly(IV2, 2) + poly(IV3, 2) + poly(time*IV1, 2) + poly(time*IV2, 2) + poly(time*IV3, 2) + poly(time*dx, 2) + (1|Subj), data = my.data)
I would have assumed that the two above are equivalent; however, I must be wrong, as the second gives me an error:
Error: Dropping columns failed to produce full column rank design matrix
What columns did I drop?
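Trying to understand the error myself, I reproduced it in miniature (if I am using model.matrix() and qr() correctly):
X <- model.matrix(~ time + poly(time, 2), data = my.data)
ncol(X)     # 4 columns
qr(X)$rank  # 3; the degree-1 column of poly(time, 2) is just a rescaled time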
Thank you for helping. I am very new to R, so I am trying my best to understand what these functions do rather than just follow a recipe.

Related

svyglm - how to code for a logistic regression model across all variables?

In R, to include all variables in a glm you can simply use a ., as shown in How to succinctly write a formula with many variables from a data frame? For example:
y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)
However, I am struggling to do this with svydesign. I have many explanatory variables plus an ID and a weight variable, so first I create my survey design:
library(survey)
des <- svydesign(ids = ~id, weights = ~wt, data = df)
Then I try creating my binomial model using weights:
binom <- svyglm(y ~ ., design = des, family = "binomial")
But I get the error:
Error in svyglm.survey.design(y ~ ., design = des, family = "binomial") :
all variables must be in design = argument
What am I doing wrong?
You typically wouldn't want to do this, because "all the variables" would include design metadata such as weights, cluster indicators, stratum indicators, etc.
You can use colnames() to extract all the variable names from a design object and then reformulate(), probably after subsetting the names, e.g. with the api example in the package:
> all_the_names <- colnames(dclus1)
> all_the_actual_variables <- all_the_names[c(2, 11:37)]
> reformulate(all_the_actual_variables,"y")
y ~ stype + pcttest + api00 + api99 + target + growth + sch.wide +
comp.imp + both + awards + meals + ell + yr.rnd + mobility +
acs.k3 + acs.46 + acs.core + pct.resp + not.hsg + hsg + some.col +
col.grad + grad.sch + avg.ed + full + emer + enroll + api.stu
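If it helps, the whole workflow end to end looks like this (a sketch: y below is a stand-in for your actual outcome variable, which is not in the api data, so the final fit is left commented out):
library(survey)
data(api)
dclus1 <- svydesign(ids = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
all_the_names <- colnames(dclus1)
all_the_actual_variables <- all_the_names[c(2, 11:37)]
f <- reformulate(all_the_actual_variables, "y")
# binom <- svyglm(f, design = dclus1, family = "binomial")  # once y exists in your design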

How to fit non-linear function to data in ggplot2 using maximum likelihood model in R?

The data set (x.test, y.test) comes from an exponential function. I'm trying to fit a custom non-linear function; the code is below. The regular points plot just fine, but I'm unable to get the fit line to work. Any suggestions?
x.test <- runif(50, 2, 8)
y.test <- 0.5^(x.test)
df <- data.frame(x.test, y.test)
library(ggpmisc)
my.formula <- y ~ lambda/(1 + aii*x)
ggplot(data = df, aes(x = x.test, y = y.test)) +
  geom_point(shape = 21, fill = "white", color = "red", size = 3) +
  stat_smooth(method = "nls", formula = y.test ~ lambda/(1 + aii*x.test),
              method.args = list(start = c(lambda = 1000, aii = -816.39)),
              se = F, color = "red") +
  geom_smooth(method = "lm", formula = my.formula, col = "red") +
  stat_poly_eq(formula = my.formula,
               aes(label = stringr::str_wrap(paste(..eq.label.., ..rr.label.., sep = "~~~"))),
               parse = TRUE, size = 2.5, col = "red") +
  stat_function(fun = function(x.test) { y.test ~ lambda/(1 + aii*x.test) }, color = "blue")
A few things:
you need to use y and x as the variable names in the formula argument to geom_smooth, regardless of what the names are in your data set
you need better starting values (see below)
there's a GLM trick you can use to fit this model; doesn't always work (can be numerically unstable), but it doesn't need starting values and will work more often than nls()
I don't think lm() and stat_poly_eq() are going to work as expected (or maybe at all) with a nonlinear formula ...
simulate data
(same as your code but using set.seed() - probably not important here but good practice)
set.seed(101)
x.test <- runif(50,2,8)
y.test <- 0.5^(x.test)
df <- data.frame(x.test, y.test)
attempt nls fit with your starting values
It's usually a good idea to troubleshoot by fitting any smoothing terms outside of ggplot2, so you have fewer layers to dig through to find the problems:
nls(y.test ~ lambda/(1+ aii*x.test),
start = list(lambda=1000,aii=-816.39),
data = df)
Error in nls(y.test ~ lambda/(1 + aii * x.test), start = list(lambda = 1000, :
singular gradient
OK, still doesn't work. Let's use glm() to get better starting values: we use an inverse-link GLM:
1/y = b0 + b1*x
y = 1/(b0 + b1*x)
  = (1/b0)/(1 + (b1/b0)*x)
so lambda = 1/b0 and aii = b1/b0. In code:
g1 <- glm(y.test ~ x.test, family = gaussian(link = "inverse"))
s0 <- with(as.list(coef(g1)), list(lambda = 1/`(Intercept)`, aii = x.test/`(Intercept)`))
This gives lambda = -0.09, aii = -0.638 (with a little bit more work we could probably also figure out how to eyeball these by looking at the starting point and scale of the curve).
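Before going back to ggplot, it's worth confirming (in the same troubleshoot-outside-ggplot2 spirit as above) that nls() now converges with these starting values; exact estimates will depend on the seed:
nls(y.test ~ lambda/(1 + aii*x.test), start = s0, data = df)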
ggplot(data = df, aes(x=x.test,y=y.test)) +
geom_point(shape=21, fill="white", color="red", size=3) +
stat_smooth(method="nls",
formula = y ~ lambda/ (1 + aii*x),
method.args=list(start=s0),
se=FALSE,color="red") +
stat_smooth(method = "glm",
formula = y ~ x,
method.args = list(family = gaussian(link = "inverse")),
color = "blue", linetype = 2)

Getting Error in parse(text = paste("~", paste(nVal, collapse = "/"))) : <text>:2:0: unexpected end of input when running nlme package in R

I'm attempting to use the nlme package to fit the Generalized beta of the 2nd kind distribution to simulated health cost data.
Running the following code on a test dataset:
# Package installation (if necessary)
install.packages("withr", dependencies = T)
library(withr)
with_makevars(c(PKG_CFLAGS ="-std=gnu99"),
install.packages("cubature"), assignment="+=")
install.packages("GB2", dependencies = T)
install.packages("nlme", dependencies = T)
# load packages
library(cubature)
library(GB2)
library(nlme)
# Binary independent variables
age <- rbinom(n=1000, size=1, prob=.3)
sex <- rbinom(n=1000, size=1, prob=.5)
trmt <- rbinom(n=1000, size=1, prob=.5)
# GB2 parameter equations
shape1 <- exp(rnorm(n=1000, mean=.1 + age/100 - sex/10 + trmt/10, sd=.3))
scale <- exp(rnorm(n=1000, mean=7 + age/50 + sex - trmt, sd=.5))
shape2 <- exp(rnorm(n=1000, mean=1.5 + age/100 + sex/10 - trmt/10, sd=.3))
shape3 <- exp(rnorm(n=1000, mean=.5 + age/100 - sex/10 - trmt/10, sd=.3))
# Outcome
y <- rgb2(1000, shape1, scale, shape2, shape3)
# Create test dataset
df <- data.frame(cbind(y,age,sex,trmt,shape1,scale,shape2,shape3))
# Fit GB2 distribution to data
gb2_fit <- nlme(y ~ scale*beta(shape2 + 1/shape1, shape3 - 1/shape1)/beta(shape2, shape3),
# data = list(y=df_gb2_test[,1]),
data = df,
fixed = list(shape1 ~ age + sex + trmt,
scale ~ age + sex + trmt,
shape2 ~ age + sex + trmt,
shape3 ~ age + sex + trmt),
start = list(fixed = c(shape1 = 1.00, scale = 100, shape2 = 1.00, shape3 = 1.00)))
I get the error:
Error in parse(text = paste("~", paste(nVal, collapse = "/"))) :
<text>:2:0: unexpected end of input
1: ~
^
Any ideas what I'm doing wrong? I seem to be using the tilde operator correctly.
I think nlme doesn't do what you think it does. It fits nonlinear mixed-effects models; i.e., the response is assumed to be Gaussian, and there is assumed to be a random effect (perhaps you're confusing it with SAS PROC NLMIXED, which is more general?).
library(bbmle)
## we need a version of the density function that takes a 'log' argument
dgb2B <- function(..., log = FALSE) {
  r <- GB2::dgb2(...)
  if (!log) r else log(r)
}
## don't include shape1, scale, shape2, shape3 in the data; that confuses things
df2 <- df[,c("y","age","sex", "trmt")]
## fit homogeneous model
m1 <- mle2(y ~ dgb2B(shape1, scale, shape2, shape3),
           method = "Nelder-Mead",
           trace = TRUE,
           data = df2,
           start = list(shape1 = 1.00, scale = 100, shape2 = 1.00, shape3 = 1.00))
## allow parameters to vary by group
mle2(y ~ dgb2B(shape1, scale, shape2, shape3),
     ## parameters need to be in the same order!
     parameters = list(shape1 ~ age + sex + trmt,
                       scale ~ age + sex + trmt,
                       shape2 ~ age + sex + trmt,
                       shape3 ~ age + sex + trmt),
     method = "Nelder-Mead",
     control = list(maxit = 10000,
                    ## set parameter scales equal to magnitude
                    ## of starting values; each top-level parameter
                    ## has 4 associated values (intercept + 3 cov effects)
                    parscale = rep(abs(coef(m1)), each = 4)),
     trace = TRUE,
     data = df2,
     start = as.list(coef(m1)))
For what it's worth, for this example you could achieve the same goal by fitting eight separate models, one to each of the age × sex × treatment groups (but I can appreciate that your real application may be more complicated; i.e., you might only want a subset of the parameters to vary across groups, or might want to allow parameters to vary according to a continuous covariate).
If you are going to try much harder problems you might want to fit the parameters on the log scale.
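For instance, a minimal sketch of the log-scale version of the homogeneous fit (my illustration, not part of the original answer; the start values are guesses):
## same dgb2B and df2 as above; optimize unconstrained log-parameters
m1_log <- mle2(y ~ dgb2B(exp(logshape1), exp(logscale), exp(logshape2), exp(logshape3)),
               method = "Nelder-Mead",
               data = df2,
               start = list(logshape1 = 0, logscale = log(100),
                            logshape2 = 0, logshape3 = 0))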
There is also an error happening earlier on:
y <- rgb2(1, shape1, scale, shape2, shape3)
Error in rgb2(1, shape1, scale, shape2, shape3) :
  could not find function "rgb2"
You may need to load the required package for this:
https://www.rdocumentation.org/packages/gamlss.dist/versions/5.3-2/topics/GB2
It appears to be in library(gamlss.dist).

Holding the coefficients of a linear model constant while exchanging predictors for their sample means?

I've been trying to look at the explanatory power of individual variables in a model by holding other variables constant at their sample mean.
However, I am unable to do something like:
Temperature = alpha + Beta1*RFGG + Beta2*RFSOx + Beta3*RFSolar
where Beta1 = Beta2 = Beta3, i.e. something like
Temperature = alpha + Beta1*(RFGG + RFSolar + RFSOx)
I want to do this so I can compare the difference in explanatory power (R^2 / size of residuals) when one independent variable is left free while the rest are held at their sample means, e.g.
Temperature = alpha + Beta1*(RFGG + meanRFSolar + meanRFSOx)
or
Temperature = alpha + Beta1*RFGG + Beta1*meanRFSolar + Beta1*meanRFSOx
However, lm() estimates its own coefficients, so I don't know how to hold anything constant.
Here's some ugly code I tried throwing together that I know reeks of wrongness:
# fixing a new clean matrix for my data
dat = cbind(dat[,1:2],dat[,4:6]) # contains 162 rows of: Date, Temp, RFGG, RFSolar, RFSOx
# make a bunch of sample mean independent variables to use
meandat = dat[,3:5]
meandat$RFGG = mean(dat$RFGG)
meandat$RFSolar = mean(dat$RFSolar)
meandat$RFSOx = mean(dat$RFSOx)
RFTotal = dat$RFGG + dat$RFSOx + dat$RFSolar
B = coef(lm(dat$Temp ~ 1 + RFTotal))  # trying to save the coefficients to use them...
B1 = rep(B[1], length = length(dat[,1]))
B2 = rep(B[2], length = length(dat[,1]))
summary(lm(dat$Temp ~ B1 + B2*dat$RFGG:meandat$RFSOx:meandat$RFSolar))  # failure
summary(lm(dat$Temp ~ B1 + B2*RFTotal))
Thanks for taking a look to whoever sees this and please ask me any questions.
Thank you both; it was a combination of eliminating the intercept with -1 and the offset() function.
a = lm(Temp ~ I(RFGG + RFSOx + RFSolar), data = dat)
beta1hat = rep(coef(a)[1], length = length(dat[,1]))  # intercept
beta2hat = rep(coef(a)[2], length = length(dat[,1]))  # common slope
# sample means stored alongside the data (assumed setup for the _bar variables)
dat$RFGG_bar = mean(dat$RFGG)
dat$RFSOx_bar = mean(dat$RFSOx)
dat$RFSolar_bar = mean(dat$RFSolar)
b = lm(Temp ~ -1 + offset(beta1hat) + offset(beta2hat*(RFGG + RFSOx_bar + RFSolar_bar)), data = dat)
c = lm(Temp ~ -1 + offset(beta1hat) + offset(beta2hat*(RFGG_bar + RFSOx + RFSolar_bar)), data = dat)
d = lm(Temp ~ -1 + offset(beta1hat) + offset(beta2hat*(RFGG_bar + RFSOx_bar + RFSolar)), data = dat)
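To compare the size of the residuals across the three versions (my own follow-up; these offset-only models estimate no free coefficients, so residual sums of squares are the natural comparison):
sapply(list(b = b, c = c, d = d), function(m) sum(resid(m)^2))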

neural network in R

Hi, I am trying to use the neuralnet function in R to predict an integer outcome (meaning) using the rest of the variables.
Here is the code I have used:
library("neuralnet")
# use 2/3 of the data for neural network training and the rest for testing
ind <- sample(1:nrow(Data), 6463, replace = FALSE)
Train <- Data[ind,]
Test <- Data[-ind,]
m <- model.matrix(
~meaning +
firstLevelAFFIRM + firstLevelDAT.PRSN + firstLevelMODE +
firstLevelO.DEF + firstLevelO.INDIV + firstLevelS.AGE.INDIV +
secondLevelV.BIN + secondLevelWord1 + secondLevelWord2 +
secondLevelWord3 + secondLevelWord4 + thirdLevelP.TYPE,
data = Train[,-1]) # (the first column is ID, I am not going to use it)
PredictorVariables <- paste("m[," , 3:ncol(m),"]" ,sep="")
Formula <- formula(paste("meaning ~ ", paste(PredictorVariables, collapse=" + ")))
net <- neuralnet(Formula,data=m, hidden=3, threshold=0.05)
m.test <- model.matrix(
~meaning +
firstLevelAFFIRM + firstLevelDAT.PRSN + firstLevelMODE +
firstLevelO.DEF + firstLevelO.INDIV + firstLevelS.AGE.INDIV +
secondLevelV.BIN + secondLevelWord1 + secondLevelWord2 +
secondLevelWord3 + secondLevelWord4 + thirdLevelP.TYPE,
data = Test[,-1])
net.results <- compute(net, m.test[,-c(1,2)]) # (first column is ID and the second is the outcome I am trying to predict)
output<-cbind(round(net.results$net.result),Test$meaning)
mean(round(net.results$net.result)!=Test$meaning)
The misclassification rate I got was around 0.01, which is great, but my question is: why is the outcome I got (net.results$net.result) not an integer?
I assume that your output is linear. Try setting linear.output = FALSE.
net <- neuralnet(Formula, data = m, hidden = 3, threshold = 0.05, linear.output = FALSE)
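Note that even then the outputs are continuous values in (0, 1) (assuming the default logistic activation), so you still round them to get integer classes:
round(compute(net, m.test[,-c(1,2)])$net.result)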
