Delta Method in GLM.jl (Julia)

Is hypothesis testing of linear and non-linear functions of the coefficients of a GLM supported in Julia's GLM.jl?
I am looking for a Julia equivalent of the marginaleffects package in R, which provides the deltamethod() function, or of the nlcom post-estimation command in Stata.
Thanks!
Sample R code:
library(marginaleffects)
eq <- lm(y ~ x1 + x2, data = data)
deltamethod(eq, "x1 / x2 = 1")
Sample Stata code:
reg y x1 x2
nlcom _b[x1]/_b[x2]

The Julia package TargetedLearning uses the delta method; see:
https://lendle.github.io/TargetedLearning.jl/user-guide/influencecurves/#the-delta-method
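For what it's worth, the delta method itself needs only the coefficient vector and its covariance matrix, and GLM.jl exposes both through coef() and vcov(), so the test can be hand-rolled even without a dedicated package. A minimal sketch in R for the ratio hypothesis above, using made-up toy data (the same linear algebra ports directly to Julia):
set.seed(1)
data <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
data$y <- 1 + 2 * data$x1 + 2 * data$x2 + rnorm(100)
eq <- lm(y ~ x1 + x2, data = data)
b <- coef(eq)  # (Intercept), x1, x2
V <- vcov(eq)
g <- unname(b["x1"] / b["x2"])  # g(b) = b_x1 / b_x2
# gradient of g with respect to (Intercept, x1, x2)
grad <- c(0, 1 / b["x2"], -b["x1"] / b["x2"]^2)
se <- sqrt(drop(t(grad) %*% V %*% grad))
z <- (g - 1) / se  # Wald test of H0: x1 / x2 = 1
p <- 2 * pnorm(-abs(z))
c(estimate = g, se = se, z = z, p.value = p)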

Related

Syntax for glmer function for use with glmulti?

Using glmer, I can run a logistic regression mixed model just fine. But when I try to do the same using glmulti, I get errors (described below). I think the problem is with the function I am specifying for use in glmulti. I want a function that specifies a logistic regression model for data containing continuous fixed covariates and categorical random effects, using a logit link. The response variable is a binary 0/1.
Sample data:
library(lme4)
library(rJava)
library(glmulti)
set.seed(666)
x1 = rnorm(1000)  # some continuous variables
x2 = rnorm(1000)
x3 = rnorm(1000)
r1 = rep(c("red", "blue"), times = 500)  # categorical random effects
r2 = rep(c("big", "small"), times = 500)
z = 1 + 2*x1 + 3*x2 + 2*x3
pr = 1/(1 + exp(-z))
y = rbinom(1000, 1, pr)  # Bernoulli response variable
df = data.frame(y = y, x1 = x1, x2 = x2, x3 = x3, r1 = r1, r2 = r2)
A single glmer logistic regression works just fine:
model1 <- glmer(y ~ x1 + x2 + x3 + (1|r1) + (1|r2), data = df, family = "binomial")
But errors occur when I try to use the same model structure through glmulti:
# create a function - I think this is where my problem is
glmer.glmulti <- function(formula, data, family = binomial(link = "logit"), random = "", ...) {
  glmer(paste(deparse(formula), random), data = data, ...)
}
# run glmulti models
glmulti.logregmixed <-
glmulti(formula(glmer(y~x1+x2+x3+(1|r1)+(1|r2), data=df), fixed.only=TRUE), #error w/o fixed.only=TRUE
data=df,
level = 2,
method = "g",
crit = "aicc",
confsetsize = 128,
plotty = F, report = F,
fitfunc = glmer.glmulti,
family = binomial(link ="logit"),
random="+(1|r1)","+(1|r2)", # possibly this line is incorrect?
intercept=TRUE)
#Errors returned:
singular fit
Error in glmulti(formula(glmer(y ~ x1 + x2 + x3 + (1 | r1) + (1 | r2), :
Improper call of glmulti.
In addition: Warning message:
In glmer(y ~ x1 + x2 + x3 + (1 | r1) + (1 | r2), data = df) :
calling glmer() with family=gaussian (identity link) as a shortcut to lmer() is deprecated; please call lmer() directly
I've tried various changes to the function, and to the formula and fitfunc portions of the glmulti call. I've tried substituting lmer for glmer, and I guess I don't understand the error. I'm also afraid that calling lmer may change the model structure: during one of my attempts, summary() of the model reported "Linear mixed model fit by REML ['lmerMod']". I need the glmulti models to be the same as what I'm obtaining with model1 using glmer (i.e. summary(model1) gives "Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']").
Many similar questions remain unanswered. Thanks in advance!
Credit:
sample data set created with help from here:
https://stats.stackexchange.com/questions/46523/how-to-simulate-artificial-data-for-logistic-regression
glmulti code adapted from here:
Model selection using glmulti
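One plausible fix, sketched below (not verified against glmulti's internals): the wrapper above never forwards its family argument to glmer(), so glmer() falls back to the default gaussian family, which is exactly what the deprecation warning reports; in addition, random should be passed as a single string rather than two separate arguments. The hardcoded binomial family and the single random string are assumptions of this sketch:
glmer.glmulti <- function(formula, data, random = "", ...) {
  # build the full formula and forward the family explicitly, so glmer()
  # is not silently called with the default gaussian family
  glmer(as.formula(paste(deparse(formula), random)),
        data = data, family = binomial(link = "logit"), ...)
}
glmulti.logregmixed <- glmulti(
  y ~ x1 + x2 + x3, data = df,
  level = 2, method = "g", crit = "aicc", confsetsize = 128,
  plotty = FALSE, report = FALSE,
  fitfunc = glmer.glmulti,
  random = "+(1|r1)+(1|r2)",  # one string, not two separate arguments
  intercept = TRUE)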

SparkR MLlib & spark.ml: least squares and glm optimization

Would anyone be able to explain how to specify optimization methods in the SparkR operation glm? When I try to fit an OLS model with glm, I can only specify "normal" or "auto" as the solver type. SparkR isn't able to interpret the solver specification "l-bfgs", which leads me to believe that when I specify "auto", SparkR simply assumes "normal" and then estimates the model coefficients analytically, using the least-squares normal equations.
Is fitting GLMs with stochastic gradient descent or L-BFGS not available in SparkR, or am I writing the following call incorrectly?
m <- SparkR::glm(y ~ x1 + x2 + x3, data = df, solver = "l-bfgs")
There's plenty of documentation in Spark about using iterative methods to fit GLMs, e.g. LogisticRegressionWithLBFGS and LinearRegressionWithSGD (discussed here), but I haven't been able to find any such documentation for the R API. Is this simply not available in SparkR (i.e. are SparkR users constrained to solving analytically and, therefore, constrained in the size of our data), or am I missing something essential here? If it isn't currently available in SparkR, is it supposed to come out with SparkR 2.0.0?
Below, I create a toy data set and fit three models, each with a different solver specification:
x1 <- rnorm(n=200, mean=10, sd=2)
x2 <- rnorm(n=200, mean=17, sd=3)
x3 <- rnorm(n=200, mean=8, sd=1)
y <- 1 + .2 * x1 + .4 * x2 + .5 * x3 + rnorm(n=200, mean=0, sd=.1)
dat <- cbind.data.frame(y, x1, x2, x3)
df <- as.DataFrame(sqlContext, dat)
m1 <- SparkR::glm(y ~ x1 + x2 + x3, data = df, solver = "normal")
m2 <- SparkR::glm(y ~ x1 + x2 + x3, data = df, solver = "auto")
m3 <- SparkR::glm(y ~ x1 + x2 + x3, data = df, solver = "l-bfgs")
The first and second models produce the same parameter estimates (supporting my assumption that SparkR solves the normal equations when fitting both models and, consequently, that the two models are equivalent). SparkR is able to fit the third model, but when I try to print a summary of the GLM, I receive an error.
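As a local sanity check on that assumption (plain base R, not a SparkR API), the closed-form least-squares solution can be computed from the same toy data via the normal equations:
X <- model.matrix(y ~ x1 + x2 + x3, data = dat)  # design matrix with intercept
beta_hat <- solve(crossprod(X), crossprod(X, dat$y))  # normal equations
t(beta_hat)  # should match the coefficients reported by m1 and m2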
For reference, I am doing this through AWS and have tried different versions of EMR, including the most recent (in case that makes a difference). Also, I am using Spark 1.6.1 (R API).
Spark 1.6.2 API documentation is here
solver: The solver algorithm used for optimization; this can be "l-bfgs", "normal", or "auto". "l-bfgs" denotes limited-memory BFGS, a limited-memory quasi-Newton optimization method. "normal" denotes using the normal equations as an analytical solution to the linear regression problem. The default value is "auto", which means that the solver algorithm is selected automatically.
To me, this looks worthy of a bug report on the Apache Spark JIRA site.

Replacing intercept with dummy variables in ARIMAX models in R

I am attempting to fit an ARIMAX model to daily consumption data in R. When I perform an OLS regression with lm(), I can include a dummy variable for each unit and remove the constant term (intercept) to avoid a less-than-full-rank design matrix.
lm1 <- lm(y ~ -1 + x1 + x2 + x3, data = dat)
I have not found a way to do this with arima(), which forces me to use the constant term and exclude one of the dummy variables.
with(dat, arima(y, xreg = cbind(x1, x2)))
Is there a specific reason why arima() doesn't allow this, and is there a way to bypass it?
See the documentation for the argument include.mean in ?arima; it seems you want the following: arima(y, xreg = cbind(x1, x2), include.mean = FALSE).
Also be aware of the definition of the model fitted by arima(), as pointed out by @RichardHardy.
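A minimal sketch on simulated data (variable names here are illustrative, not from the question): with include.mean = FALSE, the full set of unit dummies can enter xreg without making the regression rank-deficient:
set.seed(1)
n <- 120
d1 <- rep(c(1, 0), length.out = n)  # two unit dummies with d1 + d2 == 1
d2 <- 1 - d1
y <- as.numeric(arima.sim(list(ar = 0.5), n)) + 2 * d1 + 5 * d2
# no constant term, so the dummies do not collide with an intercept
fit <- arima(y, order = c(1, 0, 0), xreg = cbind(d1, d2), include.mean = FALSE)
fit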

Two random terms with nlme

I am fitting a mixed model with the nlme package in R. My situation is the following.
The mixed model is:
lme(MY ~ DFC + DFC2, random = ~ DFC | Animal, data = my_data)
where Animal is the random effect.
However, if I write the model like this, I only obtain a random intercept and a random slope for DFC (by Animal), but not for DFC2.
I would also like a random slope (by Animal) for DFC2!
Could you please help me?
Thank you very much,
If you use the lme4 library:
library(lme4)
fit <- lmer(y ~ x1 + x2 + (1 + x1 + x2 | group), data = test.df)
coef(fit)
Use coef() on your fitted object to see the slopes.
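Staying within nlme as in the question, both slopes can be listed on the left of the | in the random formula. A sketch on simulated data, since the original my_data is not available (variable names follow the post; DFC is centered here to keep the random-effects covariance well conditioned):
library(nlme)
set.seed(42)
n_id <- 12  # animals
d <- data.frame(Animal = factor(rep(seq_len(n_id), each = 15)),
                DFC = rep(seq(-2, 2, length.out = 15), times = n_id))
d$DFC2 <- d$DFC^2
i <- as.integer(d$Animal)
b0 <- rnorm(n_id, 0, 1.0)  # per-animal random intercepts
b1 <- rnorm(n_id, 0, 0.5)  # per-animal random slopes for DFC
b2 <- rnorm(n_id, 0, 0.3)  # per-animal random slopes for DFC2
d$MY <- 10 + b0[i] + (2 + b1[i]) * d$DFC + (-1 + b2[i]) * d$DFC2 +
  rnorm(nrow(d), 0, 0.3)
# random intercept plus random slopes for both DFC and DFC2, by Animal
fit <- lme(MY ~ DFC + DFC2, random = ~ DFC + DFC2 | Animal, data = d)
summary(fit)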

Mixture of Linear Regression Models using flexmix

I have a data set with response variable ADA, and independent variables LEV, ROA, and ROAL. The data is called dt. I used the following code to get coefficients for latent classes.
m1 <- stepFlexmix(ADA ~ LEV + ROA + ROAL, data = dt, control = list(verbose = 0),
                  k = 1:5, nrep = 10)
m1 <- getModel(m1, "BIC")
All was fine until I read the following from http://rss.acs.unt.edu/Rdoc/library/flexmix/html/flexmix.html:
model: object of class FLXM or a list of FLXM objects. Default is the object returned by calling FLXMRglm().
which I think says that the default model is a generalized linear model, while I am interested in a linear model. How can I use a linear model rather than a GLM? I searched for quite a while but couldn't find anything except this example from
http://www.inside-r.org/packages/cran/flexmix/docs/flexmix, which I couldn't make sense of:
data("NPreg", package = "flexmix")
## mixture of two linear regression models. Note that control parameters
## can be specified as named list and abbreviated if unique.
ex1 <- flexmix(yn~x+I(x^2), data=NPreg, k=2,
control=list(verb=5, iter=100))
ex1
summary(ex1)
plot(ex1)
## now we fit a model with one Gaussian response and one Poisson
## response. Note that the formulas inside the call to FLXMRglm are
## relative to the overall model formula.
ex2 <- flexmix(yn~x, data=NPreg, k=2,
model=list(FLXMRglm(yn~.+I(x^2)),
FLXMRglm(yp~., family="poisson")))
plot(ex2)
Can someone please let me know how to use linear regression instead of a GLM? Or am I already using an LM and just got confused by the "default model" line? Please explain. Thanks.
I did a numerical experiment to check whether
m1 <- stepFlexmix(ADA ~ LEV + ROA + ROAL, data = dt, control = list(verbose = 0))
produces results from linear regression. I ran the following code and found that the estimated parameters are indeed those of a linear regression; the experiment helped allay my reservations.
x1 <- 1:200
x2 <- x1 * x1
x3 <- x1 * x2
e1 <- rnorm(200, 0, 1)
e2 <- rnorm(200, 0, 1)
y1 <- 5 + 12*x1 + 20*x2 + 30*x3 + e1
y2 <- 18 + 5*x1 + 10*x2 + 15*x3 + e2
y <- c(y1, y2)
x11 <- c(x1, x1)
x22 <- c(x2, x2)
x33 <- c(x3, x3)
d <- data.frame(y, x11, x22, x33)
m <- stepFlexmix(y ~ x11 + x22 + x33, data = d, control = list(verbose = 0),
                 k = 1:5, nrep = 10)
m <- getModel(m, "BIC")
parameters(m)
plotEll(m, data = d)
m.refit <- refit(m)
summary(m.refit)
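For completeness: FLXMRglm() defaults to family = "gaussian", and a Gaussian GLM with the identity link is exactly ordinary least-squares regression, so the default driver already fits a linear model within each component. Making that explicit (a sketch reusing d from the experiment above):
m_lm <- stepFlexmix(y ~ x11 + x22 + x33, data = d,
                    model = FLXMRglm(family = "gaussian"),  # explicit LM driver
                    control = list(verbose = 0), k = 1:5, nrep = 10)
m_lm <- getModel(m_lm, "BIC")
parameters(m_lm)  # should reproduce the estimates from m above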
