I have the (sample) dataset below:
round<-c( 0.125150, 0.045800, -0.955299, -0.232007, 0.120880, -0.041525, 0.290473, -0.648752, 0.113264, -0.403685)
square<-c(-0.634753, 0.000492, -0.178591, -0.202462, -0.592054, -0.583173, -0.632375, -0.176673, -0.680557, -0.062127)
ideo<-c(0,1,0,1,0,1,0,0,1,1)
ex<-data.frame(round,square,ideo)
When I ran the GEE regression in SPSS, I got the following table as a result.
I used the packages gee and geepack in R to run the same analysis and got these results:
#gee
summary(gee(ideo ~ square + round, data = ex, id = ideo,
            corstr = "independence"))
Coefficients:
Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept) 1.0541 0.4099 2.572 0.1328 7.937
square 1.1811 0.8321 1.419 0.4095 2.884
round 0.7072 0.5670 1.247 0.1593 4.439
#geepack
summary(geeglm(ideo ~ square + round, data = ex, id = ideo,
               corstr = "independence"))
Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) 1.054 0.133 63.00 2.1e-15 ***
square 1.181 0.410 8.32 0.0039 **
round 0.707 0.159 19.70 9.0e-06 ***
---
I would like to recreate exactly the SPSS table (not the numeric results, since I use a subset of the original dataset), but I do not know how to obtain all of these quantities.
A tiny bit of tidyverse magic can get the same results, more or less.
Get the information from coef(summary(geeglm())) and compute the necessary columns:
library("tidyverse")
library("geepack")
coef(summary(geeglm(ideo ~ square + round, data = ex, id = ideo,
                    corstr = "independence"))) %>%
  mutate(lowerWald = Estimate - 1.96 * Std.err,  # lower Wald CI
         upperWald = Estimate + 1.96 * Std.err,  # upper Wald CI
         df = 1,
         ExpBeta = exp(Estimate)) %>%            # transformed estimate
  mutate(lWald = exp(lowerWald),                 # lower transformed limit
         uWald = exp(upperWald))                 # upper transformed limit
This produces the following (with the data you provided). The order and the names of the columns could be modified to suit your needs:
Estimate Std.err Wald Pr(>|W|) lowerWald upperWald df ExpBeta lWald uWald
1 1.0541 0.1328 62.997 2.109e-15 0.7938 1.314 1 2.869 2.212 3.723
2 1.1811 0.4095 8.318 3.925e-03 0.3784 1.984 1 3.258 1.460 7.270
3 0.7072 0.1593 19.704 9.042e-06 0.3949 1.019 1 2.028 1.484 2.772
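If the goal is a table laid out like SPSS's parameter-estimates output, the same pipeline can keep the parameter names and rename the columns. A sketch, assuming SPSS-style headers such as B, Std. Error, Wald Chi-Square, df, Sig., and Exp(B) with its confidence limits (adjust the names to match your actual table):
library("tibble")

coef(summary(geeglm(ideo ~ square + round, data = ex, id = ideo,
                    corstr = "independence"))) %>%
  rownames_to_column("Parameter") %>%                  # keep the parameter names
  mutate(lowerWald = Estimate - 1.96 * Std.err,
         upperWald = Estimate + 1.96 * Std.err,
         df = 1,
         ExpBeta = exp(Estimate),
         lWald = exp(lowerWald),
         uWald = exp(upperWald)) %>%
  select(Parameter,
         B = Estimate, `Std. Error` = Std.err,
         `Lower Wald` = lowerWald, `Upper Wald` = upperWald,
         `Wald Chi-Square` = Wald, df, Sig. = `Pr(>|W|)`,
         `Exp(B)` = ExpBeta, `Lower Exp(B)` = lWald, `Upper Exp(B)` = uWald)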
I am trying to do a restriction test for a GARCH model (fitted with ugarchfit from the 'rugarch' package) using the following hypotheses:
H0: alpha1 + beta1 = 1
H1: alpha1 + beta1 ≠ 1
So I am trying to follow the advice from
https://stats.stackexchange.com/questions/151573/testing-the-sum-of-garch1-1-parameters/151578?noredirect=1#comment629951_151578
1. Specify the unrestricted model using ugarchspec with option variance.model = list(model = "sGARCH") and estimate it using ugarchfit. Obtain the log-likelihood from the slot fit, sub-slot likelihood.
2. Specify the restricted model using ugarchspec with option variance.model = list(model = "iGARCH") and estimate it using ugarchfit. Obtain the log-likelihood as above.
3. Calculate LR = 2*(log-likelihood of unrestricted model − log-likelihood of restricted model) and obtain the p-value as pchisq(q = LR, df = 1, lower.tail = FALSE), as sketched below.
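In R, step 3 boils down to a couple of lines; a minimal sketch, assuming fit_u and fit_r hold the ugarchfit results for the sGARCH and iGARCH specifications respectively:
# fit_u: unrestricted sGARCH(1,1) fit; fit_r: restricted iGARCH(1,1) fit
LR <- 2 * (likelihood(fit_u) - likelihood(fit_r))   # likelihood-ratio statistic
pval <- pchisq(q = LR, df = 1, lower.tail = FALSE)  # upper-tail chi-square p-value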
I am using the following 'sGARCH' and 'iGARCH' specifications from the 'rugarch' package.
(A) sGARCH (unrestricted model):
speccR <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                     mean.model = list(armaOrder = c(0, 0), include.mean = TRUE,
                                       archm = TRUE, archpow = 1))
ugarchfit(speccR, data = data.matrix(P), fit.control = list(scale = 1))
And the following is this sGARCH output:
*---------------------------------*
* GARCH Model Fit *
*---------------------------------*
Conditional Variance Dynamics
-----------------------------------
GARCH Model : sGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : norm
Optimal Parameters
------------------------------------
Estimate Std. Error t value Pr(>|t|)
mu -0.000355 0.001004 -0.35377 0.723508
archm 0.096364 0.039646 2.43059 0.015074
omega 0.000049 0.000010 4.91096 0.000001
alpha1 0.289964 0.021866 13.26117 0.000000
beta1 0.709036 0.023200 30.56156 0.000000
Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu -0.000355 0.001580 -0.22482 0.822122
archm 0.096364 0.056352 1.71002 0.087262
omega 0.000049 0.000051 0.96346 0.335316
alpha1 0.289964 0.078078 3.71375 0.000204
beta1 0.709036 0.111629 6.35173 0.000000
LogLikelihood : 5411.828
Information Criteria
------------------------------------
Akaike -3.9180
Bayes -3.9073
Shibata -3.9180
Hannan-Quinn -3.9141
Weighted Ljung-Box Test on Standardized Residuals
------------------------------------
statistic p-value
Lag[1] 233.2 0
Lag[2*(p+q)+(p+q)-1][2] 239.1 0
Lag[4*(p+q)+(p+q)-1][5] 247.4 0
d.o.f=0
H0 : No serial correlation
Weighted Ljung-Box Test on Standardized Squared Residuals
------------------------------------
statistic p-value
Lag[1] 4.695 0.03025
Lag[2*(p+q)+(p+q)-1][5] 5.941 0.09286
Lag[4*(p+q)+(p+q)-1][9] 7.865 0.13694
d.o.f=2
Weighted ARCH LM Tests
------------------------------------
Statistic Shape Scale P-Value
ARCH Lag[3] 0.556 0.500 2.000 0.4559
ARCH Lag[5] 1.911 1.440 1.667 0.4914
ARCH Lag[7] 3.532 2.315 1.543 0.4190
Nyblom stability test
------------------------------------
Joint Statistic: 5.5144
Individual Statistics:
mu 0.5318
archm 0.4451
omega 1.3455
alpha1 4.1443
beta1 2.2202
Asymptotic Critical Values (10% 5% 1%)
Joint Statistic: 1.28 1.47 1.88
Individual Statistic: 0.35 0.47 0.75
Sign Bias Test
------------------------------------
t-value prob sig
Sign Bias 0.2384 0.8116
Negative Sign Bias 1.1799 0.2381
Positive Sign Bias 1.1992 0.2305
Joint Effect 2.9540 0.3988
Adjusted Pearson Goodness-of-Fit Test:
------------------------------------
group statistic p-value(g-1)
1 20 272.1 9.968e-47
2 30 296.9 3.281e-46
3 40 313.3 1.529e-44
4 50 337.4 1.091e-44
Elapsed time : 0.4910491
(B) iGARCH (restricted model):
speccRR <- ugarchspec(variance.model = list(model = "iGARCH", garchOrder = c(1, 1)),
                      mean.model = list(armaOrder = c(0, 0), include.mean = TRUE,
                                        archm = TRUE, archpow = 1))
ugarchfit(speccRR, data = data.matrix(P), fit.control = list(scale = 1))
However, in the output beta1 has NA for its standard error, t-value, and p-value.
The following is the iGARCH output:
*---------------------------------*
* GARCH Model Fit *
*---------------------------------*
Conditional Variance Dynamics
-----------------------------------
GARCH Model : iGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : norm
Optimal Parameters
------------------------------------
Estimate Std. Error t value Pr(>|t|)
mu -0.000355 0.001001 -0.35485 0.722700
archm 0.096303 0.039514 2.43718 0.014802
omega 0.000049 0.000008 6.42826 0.000000
alpha1 0.290304 0.021314 13.62022 0.000000
beta1 0.709696 NA NA NA
Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu -0.000355 0.001488 -0.2386 0.811415
archm 0.096303 0.054471 1.7680 0.077066
omega 0.000049 0.000032 1.5133 0.130215
alpha1 0.290304 0.091133 3.1855 0.001445
beta1 0.709696 NA NA NA
LogLikelihood : 5412.268
Information Criteria
------------------------------------
Akaike -3.9190
Bayes -3.9105
Shibata -3.9190
Hannan-Quinn -3.9159
Weighted Ljung-Box Test on Standardized Residuals
------------------------------------
statistic p-value
Lag[1] 233.2 0
Lag[2*(p+q)+(p+q)-1][2] 239.1 0
Lag[4*(p+q)+(p+q)-1][5] 247.5 0
d.o.f=0
H0 : No serial correlation
Weighted Ljung-Box Test on Standardized Squared Residuals
------------------------------------
statistic p-value
Lag[1] 4.674 0.03063
Lag[2*(p+q)+(p+q)-1][5] 5.926 0.09364
Lag[4*(p+q)+(p+q)-1][9] 7.860 0.13725
d.o.f=2
Weighted ARCH LM Tests
------------------------------------
Statistic Shape Scale P-Value
ARCH Lag[3] 0.5613 0.500 2.000 0.4538
ARCH Lag[5] 1.9248 1.440 1.667 0.4881
ARCH Lag[7] 3.5532 2.315 1.543 0.4156
Nyblom stability test
------------------------------------
Joint Statistic: 1.8138
Individual Statistics:
mu 0.5301
archm 0.4444
omega 1.3355
alpha1 0.4610
Asymptotic Critical Values (10% 5% 1%)
Joint Statistic: 1.07 1.24 1.6
Individual Statistic: 0.35 0.47 0.75
Sign Bias Test
------------------------------------
t-value prob sig
Sign Bias 0.2252 0.8218
Negative Sign Bias 1.1672 0.2432
Positive Sign Bias 1.1966 0.2316
Joint Effect 2.9091 0.4059
Adjusted Pearson Goodness-of-Fit Test:
------------------------------------
group statistic p-value(g-1)
1 20 273.4 5.443e-47
2 30 300.4 6.873e-47
3 40 313.7 1.312e-44
4 50 337.0 1.275e-44
Elapsed time : 0.365
If I calculate the log-likelihood difference to derive the chi-square value as suggested, I get a negative value:
2*(5411.828 - 5412.268) = -0.88
The log-likelihood of the restricted model (iGARCH), 5412.268, is higher than the log-likelihood of the unrestricted model (sGARCH), 5411.828, which should not happen.
The data I use are as follows, in time-series form (I am only posting the first 100 observations due to space limits):
Time P
1 0.454213593
2 0.10713195
3 -0.106819399
4 -0.101610699
5 -0.094327846
6 -0.037176107
7 -0.101550977
8 -0.016309894
9 -0.041889484
10 0.103384357
11 -0.011746377
12 0.063304432
13 0.059539249
14 -0.049946177
15 -0.023251656
16 0.013989353
17 -0.002815588
18 -0.009678745
19 -0.011139779
20 0.031592303
21 -0.02348106
22 -0.007206591
23 0.077422089
24 0.064632768
25 -0.003396734
26 -0.025524166
27 -0.026632474
28 0.014614485
29 -0.012380888
30 -0.007463018
31 0.022759969
32 0.038667465
33 -0.028619484
34 -0.021995984
35 -0.006162809
36 -0.031187399
37 0.022455611
38 0.011419264
39 -0.005700445
40 -0.010106343
41 -0.004310162
42 0.00513715
43 -0.00498106
44 -0.021382251
45 -0.000694252
46 -0.033326085
47 0.002596086
48 0.011008057
49 -0.004754233
50 0.008969559
51 -0.00354088
52 -0.007213115
53 -0.003101495
54 0.005016228
55 -0.010762641
56 0.030770993
57 -0.015636325
58 0.000875417
59 0.03975863
60 -0.050207219
61 0.011308261
62 -0.021453315
63 -0.003309127
64 0.025687191
65 0.009467306
66 0.005519485
67 -0.011473758
68 0.00223934
69 -0.000913651
70 -0.003055385
71 0.000974694
72 0.000288611
73 -0.002432251
74 -0.0016975
75 -0.001565034
76 0.003332848
77 -0.008007295
78 -0.003086435
79 -0.00160435
80 0.005825885
81 0.020078093
82 0.018055453
83 0.181098137
84 0.102698818
85 0.128786594
86 -0.013587077
87 -0.038429879
88 0.043637258
89 0.042741709
90 0.016384872
91 0.000216317
92 0.009275681
93 -0.008595197
94 -0.016323335
95 -0.024083247
96 0.035922206
97 0.034863621
98 0.032401779
99 0.126333922
100 0.054751935
In order to perform the restriction test of my H0 against H1, how can I fix this problem?
There seems to be a problem with the estimation procedure... Since one model is a restricted version of the other, the iGARCH fit should indeed have a likelihood no higher than that of the sGARCH fit.
Using the subset of your data (speca and speca2 being the sGARCH and iGARCH specifications above),
fit1 <- ugarchfit(speca, data = data.matrix(P))
likelihood(fit1)
# [1] 161.7373
fit2 <- ugarchfit(speca2, data = data.matrix(P))
likelihood(fit2)
# [1] 165.333
As I said in my deleted post, those numbers looked suspicious, as if they were negative log-likelihoods. However, recovering the log-likelihood from the residuals gives
-sum(log(2 * pi * sigma(fit1)^2)) / 2 - sum(residuals(fit1, standardize = TRUE)^2) / 2
# [1] 161.7373
-sum(log(2 * pi * sigma(fit2)^2)) / 2 - sum(residuals(fit2, standardize = TRUE)^2) / 2
# [1] 165.333
This means my suspicion was wrong (it must then be that the density values are > 1). For this reason, I think there is no way to use the current output to construct a test. The iGARCH restriction fits miraculously well.
However, some experimenting showed that using
fit.control = list(scale = 1)
changes things. In particular,
fit1 <- ugarchfit(speca, data = data.matrix(P), fit.control = list(scale = 1))
likelihood(fit1)
# [1] 161.7373
-sum(log(2 * pi * sigma(fit1)^2)) / 2 - sum(residuals(fit1, standardize = TRUE)^2) / 2
# [1] 161.7373
fit2 <- ugarchfit(speca2, data = data.matrix(P), fit.control = list(scale = 1))
likelihood(fit2)
# [1] 19.5233
-sum(log(2 * pi * sigma(fit2)^2)) / 2 - sum(residuals(fit2, standardize = TRUE)^2) / 2
# [1] 19.5233
That would somewhat make sense given
(page 25) "scaling sometimes facilitates the estimation process"
(page 46) Q: My model does not converge, what can I do?
"...Additionally, in the fit.control list of the fitting routines, the option to perform scaling of the data prior to fitting usually helps, although it is not available under some setups..."
However, it is again suspicious that the likelihood of the first model remains the same. We then have
fit1 <- ugarchfit(speca, data = data.matrix(P), fit.control = list(scale = 0), solver.control = list(trace = TRUE))
# Iter: 1 fn: -161.7373  Pars: -0.0454619 0.0085993 0.0002706 0.0593231 0.6898473
# Iter: 2 fn: -161.7373  Pars: -0.0454619 0.0085993 0.0002706 0.0593231 0.6898473
# solnp--> Completed in 2 iterations
coef(fit1)
# mu mxreg1 omega alpha1 beta1
# -0.0454619274 0.0085992743 0.0002706018 0.0593231138 0.6898472858
fit1 <- ugarchfit(speca, data = data.matrix(P), fit.control = list(scale = 1), solver.control = list(trace = TRUE))
# Iter: 1 fn: 114.8143 Pars: -0.72230 0.13663 0.06830 0.05930 0.68988
# Iter: 2 fn: 114.8143 Pars: -0.72228 0.13662 0.06830 0.05931 0.68986
# solnp--> Completed in 2 iterations
coef(fit1)
# mu mxreg1 omega alpha1 beta1
# -0.045463099 0.008599494 0.000270610 0.059310622 0.689858216
and
fit2 <- ugarchfit(speca2, data = data.matrix(P), fit.control = list(scale = 0), solver.control = list(trace = TRUE))
# Iter: 1 fn: -165.3330 Pars: 0.0292439 -0.0051098 0.0002221 0.7495846
# Iter: 2 fn: -165.3330 Pars: 0.0292434 -0.0051097 0.0002221 0.7495853
# solnp--> Completed in 2 iterations
coef(fit2)
# mu mxreg1 omega alpha1 beta1
# 0.0292434276 -0.0051096984 0.0002221457 0.7495853224 0.2504146776
fit2 <- ugarchfit(speca2, data = data.matrix(P), fit.control = list(scale = 1), solver.control = list(trace = TRUE))
# Iter: 1 fn: 111.2185 Pars: 0.46462 -0.08118 0.05607 0.74959
# Iter: 2 fn: 111.2185 Pars: 0.46458 -0.08118 0.05607 0.74959
# solnp--> Completed in 2 iterations
coef(fit2)
# mu mxreg1 omega alpha1 beta1
# 0.46458110 -0.08117626 0.05607215 0.74959242 0.25040758
This makes things even stranger, as there are multiple inconsistencies...
This is the answer I received from the package author, Alexios Galanos:
The problem is that there is a restriction on the stationarity of the GARCH model which may interfere with solver convergence for models that are on the border of stationarity. Here is the solution:
library(rugarch)
library(xts)
dat <- read.table("data.txt", header = TRUE, stringsAsFactors = FALSE)
dat <- xts(dat[, 2], as.Date(strptime(dat[, 1], "%d/%m/%Y")))
spec1 <- ugarchspec(mean.model = list(armaOrder = c(0, 0), archm = TRUE, archpow = 1),
                    variance.model = list(model = "iGARCH"))
spec2 <- ugarchspec(mean.model = list(armaOrder = c(0, 0), archm = TRUE, archpow = 1),
                    variance.model = list(model = "sGARCH"))
mod1 <- ugarchfit(spec1, dat, solver = "solnp")
mod2 <- ugarchfit(spec2, dat)
persistence(mod2)
# [1] 0.999
# at the limit of the internal constraint
mod2 <- ugarchfit(spec2, dat, solver = "solnp", fit.control = list(stationarity = 0))
likelihood(mod2)
# [1] 5428.871
likelihood(mod1)
# [1] 5412.268
persistence(mod2)
# [1] 1.08693
# above the limit
Here is one solution to change the constraint:
.garchconbounds2 <- function() {
  return(list(LB = 1e-12, UB = 0.99999999999))
}
assignInNamespace(x = ".garchconbounds", value = .garchconbounds2, ns = "rugarch")
mod2 <- ugarchfit(spec2, dat, solver = "solnp")
likelihood(mod2)
# [1] 5412.268
Now the value is the same as for the constrained model (they are both effectively integrated), but the constrained model has one fewer parameter to estimate.
I don't need fit.control = list(scale = 1) at all here; it is probably better to drop the scaling.
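Putting the author's fix together with the LR recipe from the top of the question, the complete test might look like the sketch below; this is an illustration assuming dat holds your series as above and that relaxing the internal stationarity bound is acceptable for your data:
library(rugarch)

# relax the internal stationarity bound (the package author's workaround)
.garchconbounds2 <- function() {
  return(list(LB = 1e-12, UB = 0.99999999999))
}
assignInNamespace(x = ".garchconbounds", value = .garchconbounds2, ns = "rugarch")

spec_u <- ugarchspec(mean.model = list(armaOrder = c(0, 0), archm = TRUE, archpow = 1),
                     variance.model = list(model = "sGARCH"))  # unrestricted
spec_r <- ugarchspec(mean.model = list(armaOrder = c(0, 0), archm = TRUE, archpow = 1),
                     variance.model = list(model = "iGARCH"))  # restricted

mod_u <- ugarchfit(spec_u, dat, solver = "solnp")
mod_r <- ugarchfit(spec_r, dat, solver = "solnp")

# with both fits properly converged, the LR statistic should be non-negative
LR <- 2 * (likelihood(mod_u) - likelihood(mod_r))
pval <- pchisq(q = LR, df = 1, lower.tail = FALSE)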
I have a data frame with 5 variables: Lot / Wafer / Serial_number / Voltage / Amplification. The data frame contains 1020 subsets grouped by Serial_number, each with a certain number of measurement points (Amplification against Voltage).
I fit the data with
summary(fit2.lme <- lmer(log(log(Amplification)) ~ poly(Voltage, 3) + (poly(Voltage, 1) | Serial_number),
                         data = APD))
which yields:
Linear mixed model fit by REML ['lmerMod']
Formula: log(log(Amplification)) ~ poly(Voltage, 3) + (poly(Voltage, 1) | Serial_number)
Data: APD
REML criterion at convergence: 35286.1
Scaled residuals:
Min 1Q Median 3Q Max
-20.7724 -0.2438 -0.1297 0.2434 13.2663
Random effects:
Groups Name Variance Std.Dev. Corr
Serial_number (Intercept) 1.439e-02 0.1199
poly(Voltage, 1) 2.042e+03 45.1908 -0.76
Residual 8.701e-02 0.2950
Number of obs: 76219, groups: Serial_number, 1020
Fixed effects:
Estimate Std. Error t value
(Intercept) 5.944e-02 3.914e-03 15.2
poly(Voltage, 3)1 5.862e+02 1.449e+00 404.5
poly(Voltage, 3)2 -1.714e+02 3.086e-01 -555.4
poly(Voltage, 3)3 4.627e+01 3.067e-01 150.8
Correlation of Fixed Effects:
(Intr) p(V,3)1 p(V,3)2
ply(Vlt,3)1 -0.713
ply(Vlt,3)2 0.015 -0.004
ply(Vlt,3)3 0.004 0.012 0.018
and when I add a higher-order polynomial in the random effects I get a warning:
> summary(fit3.lme <- lmer(log(log(Amplification)) ~ poly(Voltage, 3) + (poly(Voltage, 2) | Serial_number),
+ data = APD))
Linear mixed model fit by REML ['lmerMod']
Formula: log(log(Amplification)) ~ poly(Voltage, 3) + (poly(Voltage, 2) | Serial_number)
Data: APD
REML criterion at convergence: 16285.9
Scaled residuals:
Min 1Q Median 3Q Max
-20.5042 -0.2393 -0.0697 0.3165 13.9634
Random effects:
Groups Name Variance Std.Dev. Corr
Serial_number (Intercept) 1.584e-02 0.1259
poly(Voltage, 2)1 1.777e+03 42.1536 -0.67
poly(Voltage, 2)2 1.579e+03 39.7365 0.87 -0.95
Residual 6.679e-02 0.2584
Number of obs: 76219, groups: Serial_number, 1020
Fixed effects:
Estimate Std. Error t value
(Intercept) 5.858e-02 4.062e-03 14.4
poly(Voltage, 3)1 5.938e+02 1.351e+00 439.5
poly(Voltage, 3)2 -1.744e+02 1.276e+00 -136.7
poly(Voltage, 3)3 5.719e+01 2.842e-01 201.2
Correlation of Fixed Effects:
(Intr) p(V,3)1 p(V,3)2
ply(Vlt,3)1 -0.641
ply(Vlt,3)2 0.825 -0.906
ply(Vlt,3)3 -0.001 0.030 -0.004
convergence code: 1
Model failed to converge with max|grad| = 2.22294 (tol = 0.002, component 1)
Model is nearly unidentifiable: large eigenvalue ratio
- Rescale variables?
Warning messages:
1: In optwrap(optimizer, devfun, getStart(start, rho$lower, rho$pp), :
convergence code 1 from bobyqa: bobyqa -- maximum number of function evaluations exceeded
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 2.22294 (tol = 0.002, component 1)
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: large eigenvalue ratio
- Rescale variables?
The data are as follows (I can provide the complete data if desired). They comprise 77479 observations of 6 variables:
'data.frame': 77479 obs. of 6 variables:
$ Serial_number: num 9.12e+08 9.12e+08 9.12e+08 9.12e+08 9.12e+08 ...
$ Lot : int 9 9 9 9 9 9 9 9 9 9 ...
$ Wafer : int 912 912 912 912 912 912 912 912 912 912 ...
$ Amplification: num 1 1 1.01 1.01 1.01 ...
$ Voltage : num 25 30 34.9 44.9 49.9 ...
and the data itself looks like:
Serial_number Lot Wafer Amplification Voltage
1 912009913 9 912 1.00252 24.9681
2 912009913 9 912 1.00452 29.9591
3 912009913 9 912 1.00537 34.9494
(...)
73 912009913 9 912 918.112 375.9850
74 912009913 9 912 1083.74 377.9990
75 912009897 9 912 1.00324 19.9895
76 912009897 9 912 1.00449 29.9777
(...)
What do these warnings mean?
According to the anova, the fit3.lme model describes the data better:
> anova(fit3.lme, fit2.lme)
refitting model(s) with ML (instead of REML)
Data: APD
Models:
fit2.lme: log(log(Amplification)) ~ poly(Voltage, 3) + (poly(Voltage, 1) |
fit2.lme: Serial_number)
fit3.lme: log(log(Amplification)) ~ poly(Voltage, 3) + (poly(Voltage, 2) |
fit3.lme: Serial_number)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
fit2.lme 8 35294 35368 -17638.9 35278
fit3.lme 11 16264 16366 -8121.1 16242 19036 3 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Warning message:
In optwrap(optimizer, devfun, x@theta, lower = x@lower, calc.derivs = TRUE, :
convergence code 1 from bobyqa: bobyqa -- maximum number of function evaluations exceeded
Therefore I would like to use that model, but I am stuck on the warning.
Update:
1. Center and scale predictors:
> ss.CS <- transform(APD, Voltage = scale(Voltage))
> fit31.lme <- update(fit3.lme, data = ss.CS)
Error in poly(dots[[i]], degree, raw = raw, simple = raw) :
'degree' must be less than number of unique points
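One thing to try here, offered as an assumption rather than a verified diagnosis: scale() returns a one-column matrix, which can trip up poly() when update() re-evaluates the formula; stripping it back to a plain vector is cheap to test:
# sketch: strip the matrix structure that scale() adds before refitting
ss.CS <- transform(APD, Voltage = as.numeric(scale(Voltage)))
fit31.lme <- update(fit3.lme, data = ss.CS)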
Also for the other variable (I don't know for which variable rescaling makes sense):
> ss.CS<- transform(APD, Amplitude=scale(Amplitude))
Error in scale(Amplitude) : object 'Amplitude' not found
> ss.CS<- transform(APD, Amplification=scale(Amplification))
> fit31.lme<- update(fit3.lme, data=ss.CS)
Warning messages:
1: In log(Amplification) : NaNs produced
2: In log(log(Amplification)) : NaNs produced
3: In log(Amplification) : NaNs produced
4: In log(log(Amplification)) : NaNs produced
5: In log(Amplification) : NaNs produced
6: In log(log(Amplification)) : NaNs produced
7: Some predictor variables are on very different scales: consider rescaling
2. Check singularity:
> diag.vals <- getME(fit3.lme, "theta")[getME(fit3.lme, "lower") == 0]
> any(diag.vals <- 1e-6)
[1] TRUE
Warning message:
In any(diag.vals <- 1e-06) : coercing argument of type 'double' to logical
(Note: diag.vals <- 1e-6 assigns instead of comparing, which explains the coercion warning; the intended check is any(diag.vals < 1e-6).)
3. Compute the gradient and Hessian with Richardson extrapolation:
> devfun<- update(fit3.lme, devFunOnly=TRUE)
> if(isLMM(fit3.lme)){
+ pars<- getME(fit3.lme, "theta")
+ } else {
+ pars<- getME(fit3.lme, c("theta", "fixef"))
+ }
> if (require("numDeriv")) {
+ cat("hess:\n"); print(hess <- hessian(devfun, unlist(pars)))
+ cat("grad:\n"); print(grad <- grad(devfun, unlist(pars)))
+ cat("scaled gradient:\n")
+ print(scgrad <- solve(chol(hess), grad))
+ }
Loading required package: numDeriv
hess:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 39137.840764 -56.189442277 -1.348127e+02 3.789427141 25.456612941 -3.806942811
[2,] -56.189442 0.508077776 6.283795e-01 -0.068882737 -0.056159369 0.003228274
[3,] -134.812704 0.628379462 1.061584e+00 -0.079620905 -0.152816413 0.007457255
[4,] 3.789427 -0.068882737 -7.962090e-02 0.516054976 0.534346634 0.001513168
[5,] 25.456613 -0.056159369 -1.528164e-01 0.534346634 0.901191745 -0.002344407
[6,] -3.806943 0.003228274 7.457255e-03 0.001513168 -0.002344407 0.179283416
grad:
[1] -22.9114985 2.2229416 -0.2959238 0.6790044 -0.2343368 -0.4020556
scaled gradient:
[1] -0.1123624 4.4764140 -0.8777938 1.3980054 -0.4223921 -0.9508207
> fit3.lme@optinfo$derivs
$gradient
[1] -22.9118920 2.2229424 -0.2959264 0.6790037 -0.2343360 -0.4020605
$Hessian
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 39137.915527 -56.20745850 -134.87176514 3.74780273 25.47540283 -3.79016113
[2,] -56.207458 0.44262695 0.61462402 -0.04736328 -0.06585693 0.02130127
[3,] -134.871765 0.61462402 1.04296875 -0.10467529 -0.23223877 0.05438232
[4,] 3.747803 -0.04736328 -0.10467529 0.52026367 0.50909424 -0.02130127
[5,] 25.475403 -0.06585693 -0.23223877 0.50909424 0.68481445 -0.02044678
[6,] -3.790161 0.02130127 0.05438232 -0.02130127 -0.02044678 0.07617188
4. Restart the fit from the original value (or a slightly perturbed value):
> fit3.lme.restart <- update(fit3.lme, start=pars)
> summary(fit3.lme.restart)
Linear mixed model fit by REML ['lmerMod']
Formula: log(log(Amplification)) ~ poly(Voltage, 3) + (poly(Voltage, 2) | Serial_number)
Data: APD
REML criterion at convergence: 16250.3
Scaled residuals:
Min 1Q Median 3Q Max
-20.4868 -0.2404 -0.0697 0.3166 13.9464
Random effects:
Groups Name Variance Std.Dev. Corr
Serial_number (Intercept) 1.823e-02 0.1350
poly(Voltage, 2)1 2.124e+03 46.0903 -0.77
poly(Voltage, 2)2 1.937e+03 44.0164 0.90 -0.96
Residual 6.668e-02 0.2582
Number of obs: 76219, groups: Serial_number, 1020
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.05823 0.00434 13.4
poly(Voltage, 3)1 593.83396 1.47201 403.4
poly(Voltage, 3)2 -174.61257 1.40711 -124.1
poly(Voltage, 3)3 57.15901 0.28427 201.1
Correlation of Fixed Effects:
(Intr) p(V,3)1 p(V,3)2
ply(Vlt,3)1 -0.735
ply(Vlt,3)2 0.868 -0.927
ply(Vlt,3)3 -0.001 0.028 -0.003
5. Try all available optimizers:
> source(system.file("utils", "allFit.R", package="lme4"))
Loading required package: optimx
Loading required package: dfoptim
> fit3.lme.all <- allFit(fit3.lme)
bobyqa : [OK]
Nelder_Mead : [OK]
nlminbw : [OK]
nmkbw : [OK]
optimx.L-BFGS-B : [OK]
nloptwrap.NLOPT_LN_NELDERMEAD : [OK]
nloptwrap.NLOPT_LN_BOBYQA : [OK]
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
5: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
6: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
7: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
8: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
ss <- summary(fit3.lme.all)
> ss$ fixef ## extract fixed effects
(Intercept) poly(Voltage, 3)1 poly(Voltage, 3)2 poly(Voltage, 3)3
bobyqa 0.05822789 593.8340 -174.6126 57.15901
Nelder_Mead 0.05822787 593.8340 -174.6126 57.15902
nlminbw 0.05822787 593.8340 -174.6126 57.15902
nmkbw 0.05841966 593.7804 -174.4999 57.17107
optimx.L-BFGS-B 0.05822845 593.8336 -174.6116 57.16183
nloptwrap.NLOPT_LN_NELDERMEAD 0.05823870 593.8330 -174.6076 57.16039
nloptwrap.NLOPT_LN_BOBYQA 0.05823870 593.8330 -174.6076 57.16039
> ss$ llik ## log-likelihoods
bobyqa Nelder_Mead nlminbw nmkbw optimx.L-BFGS-B
-8125.125 -8125.125 -8125.125 -8129.827 -8125.204
nloptwrap.NLOPT_LN_NELDERMEAD nloptwrap.NLOPT_LN_BOBYQA
-8125.137
> ss$ sdcor ## SDs and correlations
Serial_number.(Intercept) Serial_number.poly(Voltage, 2)1.(Intercept) Serial_number.poly(Voltage, 2)2.(Intercept)
bobyqa 0.1350049 46.09013 44.01631
Nelder_Mead 0.1350064 46.09104 44.01705
nlminbw 0.1350065 46.09106 44.01707
nmkbw 0.1347214 46.11336 43.81219
optimx.L-BFGS-B 0.1356576 46.32849 44.27171
nloptwrap.NLOPT_LN_NELDERMEAD 0.1347638 45.97995 43.91054
nloptwrap.NLOPT_LN_BOBYQA 0.1347638 45.97995 43.91054
Serial_number.poly(Voltage, 2)1 Serial_number.poly(Voltage, 2)2.poly(Voltage, 2)1 Serial_number.poly(Voltage, 2)2
bobyqa -0.7665898 0.9042387 -0.9608662
Nelder_Mead -0.7665981 0.9042424 -0.9608680
nlminbw -0.7665980 0.9042425 -0.9608680
nmkbw -0.7658163 0.9076551 -0.9649999
optimx.L-BFGS-B -0.7713801 0.9067725 -0.9617129
nloptwrap.NLOPT_LN_NELDERMEAD -0.7645748 0.9034336 -0.9606020
nloptwrap.NLOPT_LN_BOBYQA -0.7645748 0.9034336 -0.9606020
sigma
bobyqa 0.2582156
Nelder_Mead 0.2582156
nlminbw 0.2582156
nmkbw 0.2584714
optimx.L-BFGS-B 0.2582244
nloptwrap.NLOPT_LN_NELDERMEAD 0.2582207
nloptwrap.NLOPT_LN_BOBYQA 0.2582207
> ss$ theta ## Cholesky factors
Serial_number.(Intercept) Serial_number.poly(Voltage, 2)1.(Intercept) Serial_number.poly(Voltage, 2)2.(Intercept)
bobyqa 0.5228377 -136.8323 154.1396
Nelder_Mead 0.5228438 -136.8364 154.1428
nlminbw 0.5228439 -136.8365 154.1429
nmkbw 0.5212237 -136.6278 153.8521
optimx.L-BFGS-B 0.5253478 -138.3947 155.4631
nloptwrap.NLOPT_LN_NELDERMEAD 0.5218936 -136.1436 153.6293
nloptwrap.NLOPT_LN_BOBYQA 0.5218936 -136.1436 153.6293
Serial_number.poly(Voltage, 2)1 Serial_number.poly(Voltage, 2)2.poly(Voltage, 2)1 Serial_number.poly(Voltage, 2)2
bobyqa 114.6181 -71.06063 1.578418e+01
Nelder_Mead 114.6186 -71.06067 1.578354e+01
nlminbw 114.6187 -71.06067 1.578351e+01
nmkbw 114.7270 -71.14411 3.440466e-42
optimx.L-BFGS-B 114.1731 -70.65227 1.527854e+01
nloptwrap.NLOPT_LN_NELDERMEAD 114.7688 -71.19817 1.568481e+01
nloptwrap.NLOPT_LN_BOBYQA 114.7688 -71.19817 1.568481e+01
> ss$ which.OK ## which fits worked
bobyqa Nelder_Mead nlminbw nmkbw optimx.L-BFGS-B
TRUE TRUE TRUE TRUE TRUE
nloptwrap.NLOPT_LN_NELDERMEAD nloptwrap.NLOPT_LN_BOBYQA
TRUE TRUE
Due to a user's comment I add the following:
> bam(log(log(Amplification)) ~ s(Voltage) + s(Serial_number, bs="re") + s(Voltage, Serial_number, bs="re"), data=APD, discrete = TRUE)
Family: gaussian
Link function: identity
Formula:
log(log(Amplification)) ~ s(Voltage) + s(Serial_number, bs = "re") +
s(Voltage, Serial_number, bs = "re")
Estimated degrees of freedom:
9 993 987 total = 1990.18
fREML score: -226.8182
> summary(bam(log(log(Amplification)) ~ s(Voltage) + s(Serial_number, bs="re") + s(Voltage, Serial_number, bs="re"), data=APD, discrete = TRUE))
Family: gaussian
Link function: identity
Formula:
log(log(Amplification)) ~ s(Voltage) + s(Serial_number, bs = "re") +
s(Voltage, Serial_number, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.11500 0.01896 6.066 1.31e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Voltage) 8.998 9 89229 <2e-16 ***
s(Serial_number) 993.441 1019 55241 <2e-16 ***
s(Voltage,Serial_number) 986.741 1019 36278 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.989 Deviance explained = 99%
fREML = -226.82 Scale est. = 0.051396 n = 76219
On https://uploadfiles.io/n7h9z you can download the R script and the data.
Update (some plots concerning the GAM model):
Here are all measurement data points transformed double-logarithmically:
The physical behaviour of the device is at least exponential, and almost double-exponential (as I found in a book). After the double-logarithmic transformation the data behave almost linearly. A lower-degree polynomial already described the data quite well, but a polynomial of degree three did better. I guess the plot also shows why.
Some additional plots (I'm not used to GAMs, so I just add them):
The convergence warnings disappeared when I removed all data points < 2; I stumbled on this by coincidence.
Probably this is somehow connected to the fact that, for each subset, all data points in the range from 0 to about 50 are almost exactly the same (with values of about 1).
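For reference, the trimming that made the warnings disappear could be written as below; a minimal sketch, assuming "< 2" refers to the Amplification values on the flat plateau:
library(lme4)

# drop the nearly constant plateau (Amplification around 1) before fitting
APD_trim <- subset(APD, Amplification >= 2)

fit3.trim <- lmer(log(log(Amplification)) ~ poly(Voltage, 3) + (poly(Voltage, 2) | Serial_number),
                  data = APD_trim)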
I have a beta regression model (using package 'betareg') and plots, but for reporting results I need R-squared and Beta. I am only aware of the lm.beta function for finding Beta from an lm fit, and of summary(lm(DV ~ IV, data = mydata))$r.squared for finding R-squared from lm fits. How do I find these values for a beta regression model?
There is a broad range of extractor functions for objects of class betareg, see Table 1 in vignette("betareg", package = "betareg").
As a simple example consider the ReadingSkills case study (Section 5.1):
library("betareg")
data("ReadingSkills", package = "betareg")
m <- betareg(accuracy ~ iq * dyslexia | iq + dyslexia, data = ReadingSkills)
The usual summary has the information you look for:
summary(m)
## Call:
## betareg(formula = accuracy ~ iq * dyslexia | iq + dyslexia, data = ReadingSkills)
##
## Standardized weighted residuals 2:
## Min 1Q Median 3Q Max
## -2.3900 -0.6416 0.1572 0.8524 1.6446
##
## Coefficients (mean model with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.1232 0.1428 7.864 3.73e-15 ***
## iq 0.4864 0.1331 3.653 0.000259 ***
## dyslexia -0.7416 0.1428 -5.195 2.04e-07 ***
## iq:dyslexia -0.5813 0.1327 -4.381 1.18e-05 ***
##
## Phi coefficients (precision model with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.3044 0.2227 14.835 < 2e-16 ***
## iq 1.2291 0.2672 4.600 4.23e-06 ***
## dyslexia 1.7466 0.2623 6.658 2.77e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Type of estimator: ML (maximum likelihood)
## Log-likelihood: 65.9 on 7 Df
## Pseudo R-squared: 0.5756
## Number of iterations: 25 (BFGS) + 1 (Fisher scoring)
To extract specific parts such as the Pseudo R-squared you can access the elements of the summary():
summary(m)$pseudo.r.squared
## 0.5756258
Or there are dedicated methods:
coef(m)
## (Intercept) iq dyslexia iq:dyslexia
## 1.1232250 0.4863696 -0.7416450 -0.5812569
## (phi)_(Intercept) (phi)_iq (phi)_dyslexia
## 3.3044312 1.2290731 1.7465642
coef(m, model = "mean")
## (Intercept) iq dyslexia iq:dyslexia
## 1.1232250 0.4863696 -0.7416450 -0.5812569
coef(m, model = "precision")
## (Intercept) iq dyslexia
## 3.304431 1.229073 1.746564
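For the "Beta" part of the question there is no direct lm.beta analogue in betareg, but a common workaround is to standardize the numeric regressors before fitting, so the mean-model coefficients are per standard deviation of each regressor. A sketch of that idea (in ReadingSkills the iq variable is already on a z-score scale, so the refit mainly illustrates the mechanics):
# standardize numeric regressors, then refit
ReadingSkills_z <- transform(ReadingSkills, iq = as.numeric(scale(iq)))
m_z <- betareg(accuracy ~ iq * dyslexia | iq + dyslexia, data = ReadingSkills_z)
coef(m_z, model = "mean")  # "Beta"-like coefficients on the logit scale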
I would like to fit a mixed-effects model that accounts for unequal variances across different geos. Specifically, I would like to predict response as a function of a fixed effect X, with geo as the random effect.
Here are what the data look like:
X response geo
1 4 5.521461 other
2 4 5.164786 other
3 4 5.164786 other
4 6 3.401197 other
5 5 4.867534 other
6 4 5.010635 other
Unique values for the geo column:
[1] "other" "Atlanta-Sandy Springs-Marietta, GA" "Chicago-Naperville-Joliet, IL-IN-WI" "Dallas-Fort Worth-Arlington, TX"
[5] "Houston-Sugar Land-Baytown, TX" "Los Angeles-Long Beach-Santa Ana, CA" "Miami-Fort Lauderdale-Pompano Beach, FL" "Phoenix-Mesa-Glendale, AZ"
Here is the model that I've attempted:
> lme0 <- lme(response ~ factor(predictor) , random = ~1|factor(geo), data = HC_hired)
> summary(lme0)
Linear mixed-effects model fit by REML
Data: HC_hired
AIC BIC logLik
54770.69 54836.3 -27377.34
Random effects:
Formula: ~1 | factor(geo)
(Intercept) Residual
StdDev: 0.08689381 0.66802
Fixed effects: response ~ factor(predictor)
Value Std.Error DF t-value p-value
(Intercept) 4.255531 0.04410213 26918 96.49264 0.0000
factor(predictor)2 0.022986 0.03336742 26918 0.68889 0.4909
factor(predictor)3 0.166341 0.03221410 26918 5.16361 0.0000
factor(predictor)4 0.299172 0.03194177 26918 9.36618 0.0000
factor(predictor)5 0.378645 0.03249053 26918 11.65402 0.0000
factor(predictor)6 0.472583 0.03664732 26918 12.89543 0.0000
Correlation:
(Intr) fct()2 fct()3 fct()4 fct()5
factor(predictor)2 -0.660
factor(predictor)3 -0.683 0.903
factor(predictor)4 -0.689 0.912 0.945
factor(predictor)5 -0.679 0.897 0.930 0.940
factor(predictor)6 -0.603 0.795 0.824 0.832 0.819
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-4.7047458 -0.3424262 0.1883132 0.7045260 2.1949313
Number of Observations: 26931
Number of Groups: 8
My issue is that the output does not specify a random effect for each level of geo. What is the correct model specification to do this? I've tried many permutations of the formula without luck. Any comments on the overall process are also welcome. Many thanks in advance!
RESPONSE TO COMMENT (coercing geo to factor does not change output):
HC_hired$geo <- as.factor(HC_hired$geo)
lme0 <- lme(response ~ factor(predictor) , random = ~1|factor(geo), data = HC_hired)
summary(lme0)
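If the goal is literally a separate residual variance per geo (on top of the per-geo random intercepts, which the model above already estimates and which can be inspected with ranef), nlme can model that through its weights argument. A sketch, assuming the nlme package and your HC_hired data:
library(nlme)

ranef(lme0)  # the fitted random intercept for each level of geo

# allow a different residual variance in each geo stratum
lme1 <- lme(response ~ factor(predictor),
            random = ~ 1 | geo,
            weights = varIdent(form = ~ 1 | geo),
            data = HC_hired)
summary(lme1)  # the variance-function section lists one parameter per geo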