Linear model with Julia, error

using RDatasets
using DataFrames
using GLM
using Gadfly  # provides plot, Geom and Guide
housing = dataset("Ecdat", "Housing")
plot(housing, x="LotSize", y="Price", Geom.point)
log_housing = DataFrame(LotSize=log(housing[:,2]), Price=log(housing[:,1]))
plot(log_housing, x="LotSize", y="Price",
Geom.point,Guide.xlabel("LotSize(log)"), Guide.ylabel("Price(log)"))
lm = fit(LinearModel, Price ~ LotSize, log_housing)
#UndefVarError: Price not defined
I am trying to fit a linear model in Julia, but I can't work out why it throws this error.
The code above is what I ran.

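To estimate a linear model you can use the lm function (your code would actually overwrite this name). The formula also has to be wrapped in the @formula macro; otherwise Julia tries to evaluate Price as an ordinary variable, which is what raises the UndefVarError. So it is better to write:
julia> lm_model = lm(@formula(Price ~ LotSize), log_housing)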
StatsModels.DataFrameRegressionModel{GLM.LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
Formula: Price ~ 1 + LotSize
Coefficients:
             Estimate  Std.Error  t value  Pr(>|t|)
(Intercept)   6.46853   0.276741   23.374    <1e-83
LotSize      0.542179  0.0326501  16.6057    <1e-49
As a side note, applying the log function directly to a vector is deprecated; you should use the broadcasted form log. instead:
log_housing = DataFrame(LotSize=log.(housing[:,2]), Price=log.(housing[:,1]))

Related

Calculate proportion of random effect variance from zero-inflation component of glmmTMB model

I fitted a zero-inflation Model in glmmTMB onto my data. It is defined in the following way:
MT.total.glmm.zi <- glmmTMB(MT_total ~ Factor1 * Factor2 + Factor3
+ (1|RANEF1)
+ (1|RANEF2)
+ (1|RANEF3)
+ (1|RANEF4)
+ (1|RANEF5),
family = "poisson",
zi=~Factor1 * Factor2 + Factor3
+ (1|RANEF1)
+ (1|RANEF2)
+ (1|RANEF3)
+ (1|RANEF4)
+ (1|RANEF5),
data=df.MT.total)
Now, for the random effects (RANEF1-5) fitted as random intercepts I would like to report the proportion of variance they explain. For the conditional models, there is the function "get_variance" from the "insight" package providing me with the much needed information:
get_variance(MT.total.glmm.zi.complex, component = "all") %>%
print(.)
$var.fixed
[1] 0.02833294
$var.random
[1] 1.029546
$var.residual
[1] 0.4704095
$var.distribution
[1] 0.4704095
$var.dispersion
[1] 0
$var.intercept
RANEF1 RANEF2 RANEF3 RANEF4 RANEF5
0.2862753 0.1710377 0.0486532 0.1655541 0.3580260
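For the conditional component, the proportions can be computed directly from the numbers printed above. This is a minimal sketch, assuming that "proportion of variance explained" means each random-intercept variance divided by the total variance (fixed + random + residual); that denominator is an assumption and not something get_variance reports itself. The variable names are illustrative.

```r
# Variance components copied from the get_variance() output above
var_fixed     <- 0.02833294
var_residual  <- 0.4704095
var_intercept <- c(RANEF1 = 0.2862753, RANEF2 = 0.1710377,
                   RANEF3 = 0.0486532, RANEF4 = 0.1655541,
                   RANEF5 = 0.3580260)

# The per-term variances add up to $var.random (up to rounding)
var_random <- sum(var_intercept)

# Assumption: total variance = fixed + random + residual
var_total <- var_fixed + var_random + var_residual

# Share of the total variance attributed to each random intercept
prop_random <- var_intercept / var_total
print(round(prop_random, 3))
```

The same arithmetic would apply to the zero-inflation component once its fixed-effect and distribution-specific variances are available.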
Unfortunately, I could not find a downstream wrapper for get_variance to gain the same information for the zero-inflation component of my model. I "only" found downstream wrappers for the Anova, emmeans and effects package in the documentation of the glmmTMB package from Ben Bolker. Unfortunately due to the nature of my data, the models are zero-inflation models and not hurdle/zero-truncated poisson models. Otherwise I could have just modelled a separate binomial model on the binary version of "MT_total".
There is the VarCorr function which allows printing the variances of the individual random effects:
print(VarCorr(MT.total.glmm.zi.complex), comp = "Variance")
Conditional model:
Groups Name Std.Dev.
RANEF1 (Intercept) 0.286275
RANEF2 (Intercept) 0.171038
RANEF3 (Intercept) 0.048653
RANEF4 (Intercept) 0.165554
RANEF5 (Intercept) 0.358026
Zero-inflation model:
Groups Name Std.Dev.
RANEF1 (Intercept) 1.14835
RANEF2 (Intercept) 0.85102
RANEF3 (Intercept) 0.11784
RANEF4 (Intercept) 0.14599
RANEF5 (Intercept) 0.85835
But I don't really know how I could calculate the remaining variance components of the zero-inflation component of my model (var.fixed, var.residual, var.distribution)
So my two questions are:
Is there a function or downstream wrapper I overlooked which would allow me to use get_variance onto the zero-inflation component of my model?
Or
Could someone give me a hint or guide me in the direction of how I can calculate the remaining variance components of my model in order to calculate the proportion of variance explained by my random effects manually?

retrieve formula used by predict function in exponential equation in R

I can't figure out how to reconstruct either the results or the formula used by the predict function of a linear model. I get the same results when using this data in ggplot with geom_smooth(method='lm', formula = y ~ exp(x)).
Here's some sample data
x=c(1,10,100,1000,10000,100000,1000000,3000000)
y=c(1,1,10,15,20,30,40,60)
I would like to use an exponential function so (ignore for the moment that I log the x value, because exp() fails for very large values):
model = lm( y ~ exp(log10(x)))
mypred = predict(model)
plot(log(x),mypred)
I have tried
lm_coef <- coef(model)
plot(log10(x),lm_coef[1]*exp(-lm_coef[2]*x))
However, this gives me a decreasing exponential instead of an increasing one. My goal is to extract the equation of the exponential function so I can reuse the coefficients in another context. What equation is predict() using, and is there a way to see it?
I did something along the lines of:
Df<-data.frame(x=c(1,10,100,1000,10000,100000,1000000,3000000),
y=c(1,1,10,15,20,30,40,60))
model<-lm(data = Df, formula = y~log(x))
predict(model)
plot(log(Df$x),predict(model))
summary(model)
The relevant output you get is:
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -6.0700     4.7262  -1.284 0.246386
log(x)        3.5651     0.5035   7.081 0.000398 ***
---
Your equation is therefore y = 3.5651*log(x) - 6.0700, where log is the natural logarithm.
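To confirm which equation predict() uses, you can rebuild the fitted values by hand from coef(). The sketch below does this for the log model above and for the y ~ exp(log10(x)) model from the question. In both cases the prediction is the intercept plus the slope times the transformed x. The names manual and manual_q are illustrative only.

```r
Df <- data.frame(x = c(1, 10, 100, 1000, 10000, 100000, 1000000, 3000000),
                 y = c(1, 1, 10, 15, 20, 30, 40, 60))

# Model from the answer: y = b0 + b1 * log(x), natural log
model <- lm(y ~ log(x), data = Df)
b <- coef(model)  # named vector: "(Intercept)" and "log(x)"
manual <- b[["(Intercept)"]] + b[["log(x)"]] * log(Df$x)
print(all.equal(unname(predict(model)), manual))

# Model from the question: y = b0 + b1 * exp(log10(x))
model_q <- lm(y ~ exp(log10(x)), data = Df)
bq <- coef(model_q)
manual_q <- bq[[1]] + bq[[2]] * exp(log10(Df$x))
print(all.equal(unname(predict(model_q)), manual_q))
```

Because lm() is linear in its coefficients, the transformation is applied to x first and the coefficients are never placed inside exp(). That is why lm_coef[1]*exp(-lm_coef[2]*x) does not reproduce the fit.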

Driscoll and Kraay standard errors in FE regression: reproducing Stata xtscc output in R

I am trying to replicate the results of the Stata command xtscc in R with the package plm, but I am having trouble getting the same standard errors.
I am using a dataset from the package plm also in Stata for replication purposes.
# code to obtain dataset
library(foreign)  # provides write.dta
library(plm)
library(lmtest)
library(car)
library(tidyverse)
data("Produc", package="plm")
write.dta(Produc,"test.dta")
My aim is to estimate a two-way fixed effects panel model with Driscoll and Kraay standard errors. The routine in Stata is the following:
use "test.dta", clear // to import data
** I declare the panel
xtset state year
* create the dummies for the time fixed effects
quietly tab year, gen(yeardum)
* run a two way fixed effect regression model with Driscoll and Kraay standard errors
xi: xtscc gsp pcap emp unemp yeardum*,fe
* results are the following
       |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
  pcap |  -.1769881    .265713    -0.67    0.515    -.7402745   .3862983
   emp |   40.61522   2.238392    18.14    0.000     35.87004    45.3604
 unemp |   23.59849   85.10647     0.28    0.785    -156.8192   204.0161
In R I use the following routine:
# I declare the panel
Produc <- pdata.frame(Produc, index = c("state","year"), drop.index = FALSE)
# run a two way fixed effect model
femodel <- plm(gsp~pcap+emp+unemp, data=Produc,effect = "twoway",
index = c("state","year"), model="within")
# compute Driscoll and Kraay standard errors using vcovSCC
coeftest(femodel, vcovSCC(femodel))
pcap   -0.17699   0.25476  -0.6947   0.4874
emp    40.61522   2.14610  18.9252   <2e-16 ***
unemp  23.59849  81.59730   0.2892   0.7725
While the point estimates are the same as in Stata, the standard errors are different.
To check whether I am using the "wrong" small-sample adjustment for the standard errors, I also tried running coeftest with all available adjustments, but none yields the same values as xtscc.
library(purrr)
results <- map(c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"),~coeftest(femodel, vcovSCC(femodel,type = .x)))
walk(results,print)
# none of the estimated standard errors is the same as xtscc
Does anyone know how I can replicate the results of Stata in R?
Since plm version 2.4, its function within_intercept(., return.model = TRUE) can return the full model of a within model with the intercept as in Stata. With this, it is possible to exactly replicate the result of Stata's user contributed command xtscc.
The way xtscc seems to work is by estimating the twoway FE model as a one-way FE model + dummies for the time dimension. So let's replicate that with plm:
data("Produc", package="plm")
Produc <- pdata.frame(Produc, index = c("state","year"), drop.index = FALSE)
femodel <- plm(gsp ~ pcap + emp + unemp + factor(year), data = Produc, model="within")
femodelint <- within_intercept(femodel, return.model = TRUE)
lmtest::coeftest(femodelint, vcov. = function(x) vcovSCC(x, type = "sss"))
#                Estimate  Std. Error t value              Pr(>|t|)
# (Intercept) -6547.68816  3427.47163 -1.9104             0.0564466 .
# pcap           -0.17699     0.26571 -0.6661             0.5055481
# emp            40.61522     2.23839 18.1448 < 0.00000000000000022 ***
# unemp          23.59849    85.10647  0.2773             0.7816356
# [...]
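The claim that a two-way FE model is the same as a one-way FE model plus time dummies can be checked on the slope coefficients. This is a minimal sketch, assuming plm is installed; the object names fe_twoways, fe_dummies and shared are illustrative. It compares only the point estimates, not the standard errors.

```r
library(plm)
data("Produc", package = "plm")

# Two-way within estimator: individual and time effects swept out
fe_twoways <- plm(gsp ~ pcap + emp + unemp, data = Produc,
                  index = c("state", "year"),
                  model = "within", effect = "twoways")

# One-way (individual) within estimator with explicit year dummies
fe_dummies <- plm(gsp ~ pcap + emp + unemp + factor(year), data = Produc,
                  index = c("state", "year"), model = "within")

# Compare the coefficients of the regressors both models share
shared <- c("pcap", "emp", "unemp")
print(all.equal(coef(fe_twoways)[shared], coef(fe_dummies)[shared]))
```

The two specifications agree on the slopes because Produc is a balanced panel. The degrees of freedom differ, though, which is one reason the standard errors depend on which formulation is passed to vcovSCC.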

GLMMadaptive for semi-continuous data

I am dealing with a data set that is very hard to work with: fish larval density. The data are semicontinuous, with 90% zeros and a right-skewed distribution that includes a few very large values. I would like, for example, to make predictions relating environmental features to larval density. I am trying to use a two-part model (GLMMadaptive for semicontinuous data) with family = hurdle.lognormal().
But the command summary does not work with models fitted with mixed_model(), family = hurdle.lognormal(). So, I don't know how to get standard errors, p-values and confidence intervals for my predictors.
Another question is related to Goodness of Fit for the residuals. How can I look for it?
Also, I tried to fit a null model, without fixed effects, to test overall model significance, but I couldn't fit it, because it gives me the following message:
Error in .subset2(x, i, exact = exact) : subscript out of bounds
Nullmodel <- mixed_model(fixed = Dprochilodus ~ 1, random = ~ 1|periodo, data = OeL_final, family = hurdle.lognormal(), max_coef_value = 30)
mymodel <- mixed_model(fixed = Dprochilodus ~ ponto+Dif_his.y+temp, random = ~ 1 | periodo, data = OeL_final, family = hurdle.lognormal(), n_phis = 1, zi_fixed = ~ ponto, max_coef_value = 30)
The results of my model are:
Call: mixed_model(fixed = logDprochilodus ~ ponto + Dif_his.y + temp,
random = ~1 | periodo, data = OeL_final, family = hurdle.lognormal(),
zi_fixed = ~ponto, n_phis = 1, max_coef_value = 30)
Model:
 family: hurdle log-normal
 link: identity
Random effects covariance matrix:
                StdDev
(Intercept) 0.05366623
Fixed effects:
  (Intercept)       pontoIR      pontoITA      pontoJEQ       pontoTB     Dif_his.y          temp
 3.781147e-01 -1.161167e-09  3.660306e-01 -1.273341e+00 -5.834588e-01  1.374241e+00 -4.010771e-02
Zero-part coefficients:
(Intercept)     pontoIR    pontoITA    pontoJEQ     pontoTB
  1.4522523  21.3761790   3.3013379   1.1504374   0.2031707
Residual std. dev.:
1.240212
log-Lik: -216.3266
Has anyone worked with this kind of model? I would really appreciate any help!
The summary() method should work with family = hurdle.lognormal(). For example, you can call summary() in the example posted here.
To check the goodness of fit you could use the simulated scaled residuals provided by the DHARMa package; for an example check here.
If you are working in the RStudio console, you may need to wrap the call in print(), i.e. print(summary(mymodel)).

How to add a random intercept and random slope term to a GAMM model in R

I am trying to specify both a random intercept and random slope term in a GAMM model with one fixed effect.
I have successfully fitted a model with a random intercept using the code below with the mgcv library, but I cannot work out the syntax for a random slope within the gamm() function:
M1 = gamm(dur ~ s(dep, bs="ts", k = 4), random= list(fInd = ~1), data= df)
If I was using both a random intercept and slope within a linear mixed-effects model I would write it in the following way:
M2 = lme(dur ~ dep, random=~1 + dep|fInd, data=df)
The gamm() supporting documentation states that the random terms need to be given in the list form as in lme() but I cannot find any interpretable examples that include both slope and intercept terms. Any advice / solutions would be much appreciated.
The gamm4 function in the gamm4 package contains a way to do this. You specify the random intercept and slope in the same way that you do in the lmer style. In your case:
M1 = gamm4(dur~s(dep,bs="ts",k=4), random = ~(1+dep|fInd), data=df)
Here is the gamm4 documentation:
https://cran.r-project.org/web/packages/gamm4/gamm4.pdf
Here is the gamm() syntax to enter correlated random intercept and slope effects, using the sleepstudy dataset.
library(nlme)
library(mgcv)
data(sleepstudy,package='lme4')
# Model via lme()
fm1 <- lme(Reaction ~ Days, random= ~1+Days|Subject, data=sleepstudy, method='REML')
# Model via gamm()
fm1.gamm <- gamm(Reaction ~ Days, random= list(Subject=~1+Days), data=sleepstudy, method='REML')
VarCorr(fm1)
VarCorr(fm1.gamm$lme)
# Both are identical
# Subject = pdLogChol(1 + Days)
#             Variance StdDev    Corr
# (Intercept) 612.0795 24.740241 (Intr)
# Days         35.0713  5.922103 0.066
# Residual    654.9424 25.591843
The syntax to enter uncorrelated random intercept and slope effects is the same for lme() and gamm().
# Model via lme()
fm2 <- lme(Reaction ~ Days, random= list(Subject=~1, Subject=~0+Days), data=sleepstudy, method='REML')
# Model via gamm()
fm2.gamm <- gamm(Reaction ~ Days, random= list(Subject=~1, Subject=~0+Days), data=sleepstudy, method='REML')
VarCorr(fm2)
VarCorr(fm2.gamm$lme)
# Both are identical
#             Variance StdDev
# Subject =   pdLogChol(1)
# (Intercept) 627.5690 25.051328
# Subject =   pdLogChol(0 + Days)
# Days         35.8582  5.988172
# Residual    653.5838 25.565285
This answer also shows how to enter multiple random effects into lme().
