Feature Selection in caret rfe + svm with ROC - r

I have been trying to apply recursive feature selection using the caret package. What I need is for rfe to use the AUC as the performance measure. After googling for a month I still cannot get the process working. Here is the code I have used:
library(caret)
library(doMC)
registerDoMC(cores = 4)
data(mdrr)
subsets <- c(1:10)
ctrl <- rfeControl(functions=caretFuncs,
method = "cv",
repeats =5, number = 10,
returnResamp="final", verbose = TRUE)
trainctrl <- trainControl(classProbs= TRUE)
caretFuncs$summary <- twoClassSummary
set.seed(326)
rf.profileROC.Radial <- rfe(mdrrDescr, mdrrClass, sizes=subsets,
rfeControl=ctrl,
method="svmRadial",
metric="ROC",
trControl=trainctrl)
When executing this script I get the following results:
Recursive feature selection
Outer resampling method: Cross-Validation (10 fold)
Resampling performance over subset size:
Variables Accuracy Kappa AccuracySD KappaSD Selected
1 0.7501 0.4796 0.04324 0.09491
2 0.7671 0.5168 0.05274 0.11037
3 0.7671 0.5167 0.04294 0.09043
4 0.7728 0.5289 0.04439 0.09290
5 0.8012 0.5856 0.04144 0.08798
6 0.8049 0.5926 0.02871 0.06133
7 0.8049 0.5925 0.03458 0.07450
8 0.8124 0.6090 0.03444 0.07361
9 0.8181 0.6204 0.03135 0.06758 *
10 0.8069 0.5971 0.04234 0.09166
342 0.8106 0.6042 0.04701 0.10326
The top 5 variables (out of 9):
nC, X3v, Sp, X2v, X1v
The process always uses Accuracy as the performance measure. Another problem is that when I try to get predictions from the resulting model using:
predictions <- predict(rf.profileROC.Radial$fit,mdrrDescr)
I get the following message
In predictionFunction(method, modelFit, tempX, custom = models[[i]]$control$custom$prediction) :
kernlab class prediction calculations failed; returning NAs
which makes it impossible to get any predictions from the model.
Here is the information obtained through sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8
[4] LC_COLLATE=es_ES.UTF-8 LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=es_ES.UTF-8
[7] LC_PAPER=es_ES.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid parallel splines stats graphics grDevices utils datasets methods base
other attached packages:
[1] e1071_1.6-2 class_7.3-9 pROC_1.6.0.1 doMC_1.3.2 iterators_1.0.6 foreach_1.4.1
[7] caret_6.0-21 ggplot2_0.9.3.1 lattice_0.20-24 kernlab_0.9-19
loaded via a namespace (and not attached):
[1] car_2.0-19 codetools_0.2-8 colorspace_1.2-4 compiler_3.0.2 dichromat_2.0-0
[6] digest_0.6.4 gtable_0.1.2 labeling_0.2 MASS_7.3-29 munsell_0.4.2
[11] nnet_7.3-7 plyr_1.8 proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.10.6
[16] reshape2_1.2.2 scales_0.2.3 stringr_0.6.2 tools_3.0.2

One problem is a minor typo: the argument should be 'trControl=', not 'trainControl='. Also, you change caretFuncs after you have attached it to rfe's control function. Lastly, you need to tell trainControl to calculate the ROC curves (via summaryFunction = twoClassSummary).
This code works:
caretFuncs$summary <- twoClassSummary
ctrl <- rfeControl(functions=caretFuncs,
method = "cv",
repeats =5, number = 10,
returnResamp="final", verbose = TRUE)
trainctrl <- trainControl(classProbs= TRUE,
summaryFunction = twoClassSummary)
rf.profileROC.Radial <- rfe(mdrrDescr, mdrrClass,
sizes=subsets,
rfeControl=ctrl,
method="svmRadial",
## I also added this line to
## avoid a warning:
metric = "ROC",
trControl = trainctrl)
> rf.profileROC.Radial
Recursive feature selection
Outer resampling method: Cross-Validated (10 fold)
Resampling performance over subset size:
Variables ROC Sens Spec ROCSD SensSD SpecSD Selected
1 0.7805 0.8356 0.6304 0.08139 0.10347 0.10093
2 0.8340 0.8491 0.6609 0.06955 0.10564 0.09787
3 0.8412 0.8491 0.6565 0.07222 0.10564 0.09039
4 0.8465 0.8491 0.6609 0.06581 0.09584 0.10207
5 0.8502 0.8624 0.6652 0.05844 0.08536 0.09404
6 0.8684 0.8923 0.7043 0.06222 0.06893 0.09999
7 0.8642 0.8691 0.6913 0.05655 0.10837 0.06626
8 0.8697 0.8823 0.7043 0.05411 0.08276 0.07333
9 0.8792 0.8753 0.7348 0.05414 0.08933 0.07232 *
10 0.8622 0.8826 0.6696 0.07457 0.08810 0.16550
342 0.8650 0.8926 0.6870 0.07392 0.08140 0.17367
The top 5 variables (out of 9):
nC, X3v, Sp, X2v, X1v
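The ordering issue with caretFuncs follows from R's copy semantics: a list passed to a function is captured by value, so later changes to the original do not reach the copy. A minimal base-R sketch, where `funcs` and `make_control` are hypothetical stand-ins (not caret code) for caretFuncs and rfeControl():

```r
# Hypothetical stand-ins for caretFuncs and rfeControl(); not caret code.
funcs <- list(summary = function(x) "Accuracy")
make_control <- function(functions) list(functions = functions)

# Wrong order: the control object captures a copy of funcs first ...
ctrl_early <- make_control(funcs)
# ... so this later change never reaches ctrl_early.
funcs$summary <- function(x) "ROC"

# Right order: modify the function list first, then build the control object.
ctrl_late <- make_control(funcs)

ctrl_early$functions$summary(NULL)  # still "Accuracy"
ctrl_late$functions$summary(NULL)   # "ROC"
```

This is why the working code sets caretFuncs$summary before calling rfeControl().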
For the prediction problems, you should use rf.profileROC.Radial instead of the fit component:
> predict(rf.profileROC.Radial, head(mdrrDescr))
pred Active Inactive
1 Inactive 0.4392768 0.5607232
2 Active 0.6553482 0.3446518
3 Active 0.6387261 0.3612739
4 Inactive 0.3060582 0.6939418
5 Active 0.6661557 0.3338443
6 Active 0.7513180 0.2486820
Max

Related

Mismatching results for singular fit with different R/lme4 versions

I am trying to match the estimate of random effects from R version 3.5.3 (lme4 1.1-18-1) to R version 4.1.1 (lme4 1.1-27.1). However, there is a small difference of random effects between these two versions when there is singular fit. I'm fine with singularity warnings, but it is puzzling that different versions of R/lme4 produce slightly different results.
The following scripts are from R version 3.5.3 (lme4 1.1-18-1) and R version 4.1.1 (lme4 1.1-27.1) with the dataset Arabidopsis from lme4.
> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] minqa_1.2.4 MASS_7.3-51.1 compiler_3.5.3 Matrix_1.2-15
[5] tools_3.5.3 Rcpp_1.0.1 splines_3.5.3 nlme_3.1-137
[9] grid_3.5.3 nloptr_1.2.1 lme4_1.1-18-1 lattice_0.20-38
> library(lme4)
Loading required package: Matrix
> options(digits = 15)
> ##########
> #Example1#
> ##########
> fit1 <- lmer(total.fruits~(1|reg)+(1|reg:popu),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
> VarCorr(fit1)
Groups Name Std.Dev.
reg:popu (Intercept) 7.744768797534
reg (Intercept) 10.629179104291
Residual 39.028818969641
> ##########
> #Example2#
> ##########
> fit2 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
> fit2@theta
[1] 0.150979711638631 0.000000000000000 0.189968995915902
[4] 0.260818869156072
> VarCorr(fit2)
Groups Name Std.Dev.
reg:popu:amd:status (Intercept) 5.841181759473
reg:popu:amd (Intercept) 0.000000000000
reg:popu (Intercept) 7.349619506926
reg (Intercept) 10.090696322743
Residual 38.688521100461
> ##########
> #Example3#
> ##########
> devfun353 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"),devFunOnly = T)
> save.image('myEnvironment353.Rdata')
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] minqa_1.2.4 MASS_7.3-54 compiler_4.1.1 minque_2.0.0 Matrix_1.3-4
[6] tools_4.1.1 Rcpp_1.0.7 tinytex_0.34 splines_4.1.1 nlme_3.1-152
[11] grid_4.1.1 xfun_0.27 nloptr_1.2.2.2 boot_1.3-28 lme4_1.1-27.1
[16] ADDutil_2.2.1.9005 lattice_0.20-44
> library(lme4)
Loading required package: Matrix
Warning message:
package ‘lme4’ was built under R version 4.1.2
> options(digits = 15)
> ##########
> #Example1#
> ##########
> fit1 <- lmer(total.fruits~(1|reg)+(1|reg:popu),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
> VarCorr(fit1)
Groups Name Std.Dev.
reg:popu (Intercept) 7.744768797534
reg (Intercept) 10.629179104291
Residual 39.028818969641
> ##########
> #Example2#
> ##########
> fit2 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
boundary (singular) fit: see ?isSingular
> fit2@theta
[1] 0.150979743348540 0.000000000000000 0.189969036985684 0.260818797487214
> VarCorr(fit2)
Groups Name Std.Dev.
reg:popu:amd:status (Intercept) 5.841182965248
reg:popu:amd (Intercept) 0.000000000000
reg:popu (Intercept) 7.349621069388
reg (Intercept) 10.090693513643
Residual 38.688520961140
> ##########
> #Example3#
> ##########
> devfun411 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"),devFunOnly = T)
> load('myEnvironment353.Rdata')
> devfun353 <- lme4:::mkdevfun(environment(devfun353))
> minqa::bobyqa(c(1,1,1,1),devfun353,0,control = list(iprint=2))
npt = 6 , n = 4
rhobeg = 0.2 , rhoend = 2e-07
start par. = 1 1 1 1 fn = 6443.44054431489
rho: 0.020 eval: 11 fn: 6393.61 par: 0.00000 0.621363 0.744867 0.823498
rho: 0.0020 eval: 38 fn: 6361.97 par:0.156855 0.00000 0.190090 0.234676
rho: 0.00020 eval: 49 fn: 6361.94 par:0.150719 0.00000 0.190593 0.249106
rho: 2.0e-05 eval: 67 fn: 6361.94 par:0.150988 0.00000 0.189943 0.260821
rho: 2.0e-06 eval: 74 fn: 6361.94 par:0.150980 0.00000 0.189965 0.260811
rho: 2.0e-07 eval: 82 fn: 6361.94 par:0.150980 0.00000 0.189969 0.260819
At return
eval: 90 fn: 6361.9381 par: 0.150980 0.00000 0.189969 0.260819
parameter estimates: 0.150979722854965, 0, 0.189968942342717, 0.260818725554898
objective: 6361.93810274656
number of function evaluations: 90
> minqa::bobyqa(c(1,1,1,1),devfun411,0,control = list(iprint=2))
npt = 6 , n = 4
rhobeg = 0.2 , rhoend = 2e-07
start par. = 1 1 1 1 fn = 6443.44054431489
rho: 0.020 eval: 11 fn: 6393.61 par: 0.00000 0.621363 0.744867 0.823498
rho: 0.0020 eval: 38 fn: 6361.97 par:0.156855 0.00000 0.190090 0.234676
rho: 0.00020 eval: 49 fn: 6361.94 par:0.150719 0.00000 0.190593 0.249106
rho: 2.0e-05 eval: 67 fn: 6361.94 par:0.150988 0.00000 0.189943 0.260821
rho: 2.0e-06 eval: 74 fn: 6361.94 par:0.150980 0.00000 0.189965 0.260811
rho: 2.0e-07 eval: 82 fn: 6361.94 par:0.150980 0.00000 0.189969 0.260819
At return
eval: 90 fn: 6361.9381 par: 0.150980 0.00000 0.189969 0.260819
parameter estimates: 0.150979722854965, 0, 0.189968942342717, 0.260818725554898
objective: 6361.93810274656
number of function evaluations: 90
When the model is simpler, there is no singularity warning and the results match (see example 1 in both scripts). When the model is relatively complex, there is a singularity warning and the results are slightly off (see example 2 in both scripts). The difference is <1e-5 in this case, but I have observed <1e-4 before. Can anyone shed some light on why the results are slightly different? And is it even possible to match the results to at least 1e-8?
Not sure if this is useful, but I also extracted devfun from 3.5.3 and ran it in 4.1.1. The results match (see example 3). In addition, when I read the iteration history from BOBYQA, the $\theta$ of the term that leads to the singularity warning oscillates between 0 and small numbers (around 1e-7 to 1e-9).
This post discusses similar topics. It also shows that the singularity warning leads to slightly different estimates. There is no obvious change in the lme4 NEWS that would cause the difference. This FAQ and ?isSingular give a great explanation of the singularity warning but do not address the mismatch directly.
TL;DR: Sometimes when there is a singularity warning (which I am OK with), the random effects are slightly different under different R/lme4 versions. Why is this happening and how can I address it?
This is a hard problem to solve in general, and even a fairly hard problem to solve in specific cases.
I think the difference arose between version 1.1.27.1 and 1.1.28, probably from this NEWS item:
construction of interacting factors (e.g. when f1:f2 or f1/f2 occur in random effects terms) is now more efficient for partially crossed designs (doesn't try to create all combinations of f1 and f2) (GH #635 and #636)
My guess is that this changes the ordering of the components in the Z matrix, which in turn means that results of various linear algebra operations are not identical (e.g. floating point arithmetic is not associative, so while binary addition is commutative (a + b == b + a), left-to-right evaluation of a sum may not be the same as right-to-left evaluation ((a+b) + c != a + (b+c)) ...)
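The non-associativity mentioned above is easy to see in plain R. This is only a minimal illustration with arbitrary values, not the actual computation inside lme4:

```r
# The same three numbers, summed with different grouping
left  <- (0.1 + 0.2) + 0.3
right <- 0.1 + (0.2 + 0.3)

left == right                   # FALSE: the two groupings round differently
isTRUE(all.equal(left, right))  # TRUE: equal up to a small tolerance
abs(left - right)               # on the order of 1e-16
```

A reordering of columns in a large matrix product changes the order of many such sums, so tiny discrepancies like this one can accumulate and then be amplified by the optimizer.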
My attempt at reproducing the problem uses the same version of R ("under development 2022-02-25 r81818") and compares only lme4 package versions 1.1.18.1 and 1.1.28.9000 (development); any upstream packages such as Rcpp, RcppEigen, Matrix use the same versions. (I had to backport a few changes from the development version of lme4 to 1.1.18.1 to get it to install under the most recent version of R, but I don't think any of those modifications would affect numerical results.)
I did the comparison by installing different versions of the lme4 package before running the code in a fresh R session. My results differed between versions 1.1.18.1 and 1.1.28 less than yours did (both fits were singular, and the relative differences in the theta estimates were of the order of 2e-7 — still greater than your desired 1e-8 tolerance but much smaller than 1e-4 ...)
The results from 1.1.18.1 and 1.1.27.1 were identical.
Q1: Why are your results more different between versions than mine?
in general/anecdotally, numerical results on Windows are slightly more unstable/differ more from other platforms
there are more differences between your two test platforms than among mine: R version, upstream packages (Matrix/Rcpp/RcppEigen/minqa), possibly the compiler versions and settings used to build everything [all of which could make a difference]
Q2: how should one deal with this kind of problem?
as a minor frame challenge, why (other than not understanding what's going on, which is a perfectly legitimate reason to be concerned) does this worry you? The differences in the results are way smaller than the magnitude of statistical uncertainty, and differences this large are also likely to occur across different platforms (OS/compiler version/etc.) even for otherwise identical environments (versions of R, lme4, and other packages).
you could revert to version 1.1.27.1 for now ...
I do take the differences between 1.1.27.1 and 1.1.28 as a bug, of sorts: at the very least it's an undocumented change in the package. If it were sufficiently high-priority I could investigate the code changes described above and see if there is a way to fix the problems they addressed without breaking backward compatibility (in theory this should be possible, but it could be annoyingly difficult ...)
## R CMD INSTALL ~/R/misc/lme4
library(lme4)
packageVersion("lme4")
## 1.1.18.1
fit2 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
dput(getME(fit2, "theta"))
t1 <- c(`reg:popu:amd:status.(Intercept)` = 0.150979711638631, `reg:popu:amd.(Intercept)` = 0,
`reg:popu.(Intercept)` = 0.189968995915902, `reg.(Intercept)` = 0.260818869156072
)
Run under 1.1.28.9000 (fresh R session, re-run package-loading/lmer code above)
## R CMD INSTALL ~/R/pkgs/lme4git/lme4
packageVersion("lme4")
## [1] ‘1.1.28.9000’
dput(getME(fit2, "theta"))
t2 <- c(`reg:popu:amd:status.(Intercept)` = 0.15097974334854, `reg:popu:amd.(Intercept)` = 0,
`reg:popu.(Intercept)` = 0.189969036985684, `reg.(Intercept)` = 0.260818797487214
)
(t1-t2)/((t1+t2)/2)
## reg:popu:amd:status.(Intercept) reg:popu:amd.(Intercept)
## -2.100276e-07 NaN
## reg:popu.(Intercept) reg.(Intercept)
## -2.161920e-07 2.747841e-07
The second element is NaN because both versions give singular fits (0/0 == NaN).
Run under 1.1.27.1 (fresh R session, re-run package-loading/lmer code above)
## remotes::install_version("lme4", "1.1-27.1")
t3 <- c(`reg:popu:amd:status.(Intercept)` = 0.150979711638631, `reg:popu:amd.(Intercept)` = 0,
`reg:popu.(Intercept)` = 0.189968995915902, `reg.(Intercept)` = 0.260818869156072)
identical(t1, t3) ## TRUE
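If the goal is to check that two fits agree closely enough rather than exactly, one option is to compare with an explicit tolerance. The sketch below reuses the theta values printed above; the names `theta_old` and `theta_new` and the tolerance values are assumptions chosen for illustration:

```r
# theta from lme4 1.1.18.1 and from 1.1.28.9000, copied from the output above
theta_old <- c(0.150979711638631, 0, 0.189968995915902, 0.260818869156072)
theta_new <- c(0.15097974334854, 0, 0.189969036985684, 0.260818797487214)

# all.equal() compares the mean relative difference against the tolerance
isTRUE(all.equal(theta_old, theta_new, tolerance = 1e-6))  # TRUE
isTRUE(all.equal(theta_old, theta_new, tolerance = 1e-8))  # FALSE

# Largest absolute difference; this avoids 0/0 for the singular component
max(abs(theta_old - theta_new))
```

Using the absolute difference sidesteps the NaN that the element-wise relative difference produces for the component estimated as exactly zero.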

R using lmer gives: Error in diag(vcov(object, use.hessian = use.hessian))

There is a strange behaviour when I use lmer: when I save a fit from lmer into an object, let's say fit0, I can look at the summary (output not shown):
>summary(fit0)
If I save the objects using save.image(), close the session and reopen it again, summary gives me:
>summary(fit0)
Error in diag(vcov(object, use.hessian = use.hessian))
error in evaluating the argument 'x' in selecting a method for function 'diag': Error in object@pp$unsc() : object 'merPredDunsc' not found
If I run the model again, I get the expected summary, but I lose it again if I close the session.
What happens? How can I avoid this Error?
Thanks for help.
Environment and version:
Windows 7
R version 3.1.2 (2014-10-31)
GNU Emacs 24.3.1 (i386-mingw-nt6.1.7601)/ESS
Here is a minimal example:
# j: cluster
# i[j]: i in cluster j
# yi[j] = zi[j] + N(0,1)
# zi[j] = b0j + b1*xi[j]
# b0j = g0 + u0j, u0j ~ N(0,sd0)
# b1 = const
library(lme4)
# Number of clusters (level 2)
N <- 20
# intercept
g0 <- 1
sd0 <- 2
# slope
b1 <- 3
# Number of observations (level 1) for cluster j
nj <- 10
# Vector of clusters indices 1,1...n1,2,2,2,....n2,...N,N,....nN
j <- c(sapply(1:N, function(x) rep(x, nj)))
# Vector of random variable
uj <- c(sapply(1:N, function(x)rep(rnorm(1,0,sd0), nj)))
# Vector of fixed variable
x1 <- rep(runif(nj),N)
# linear combination
z <- g0 + uj + b1*x1
# add error
y <- z + rnorm(N*nj,0,1)
# Put all together
d0 <- data.frame(j, y=y, z=z,x1=x1, uj=uj)
head(d0)
# mixed model
fit0 <- lmer(y ~ x1 + (1|j), data = d0)
vcov(fit0)
summary(fit0)
save.image()
After restarting and loading library lme4:
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lme4_1.1-7 Rcpp_0.11.0 Matrix_1.1-2-2
loaded via a namespace (and not attached):
[1] compiler_3.1.2 grid_3.1.2 lattice_0.20-29 MASS_7.3-35
[5] minqa_1.2.3 nlme_3.1-118 nloptr_1.0.4 splines_3.1.2
[9] tools_3.1.2
>

how to apply long only add.distribution parameterset in quantStrat - simpleError in param.combo[[param.label]]

I am applying a similar add.distribution rule as in the luxor demo, but my strategy has only a long position.
The whole strategy works, but when applying a parameter set I get the following error:
TakeProfitLONG 47 0.047
TakeProfitLONG 47 0.047 result of evaluating expression:
simpleError in param.combo[[param.label]]: subscript out of bounds
got results for task 47 numValues: 47, numResults: 47, stopped: FALSE
returning status FALSE evaluation # 48: $param.combo
I am trying to run a distribution on a simple takeProfit rule (I get the same result with stopLoss or trailingStop):
.use.takeProfit = TRUE
.takeprofit <- 2.0/100 # actual
.TakeProfit = seq(0.1, 4.8, length.out=48)/100 # parameter set for optimization
## take-profit
add.rule(strategy.st, name = 'ruleSignal',
arguments=list(sigcol='signal.gt.zero' , sigval=TRUE,
replace=FALSE,
orderside='long',
ordertype='limit',
tmult=TRUE,
threshold=quote(.takeprofit),
TxnFees=.txnfees,
orderqty='all',
orderset='ocolong'
),
type='chain',
parent='EnterLONG',
label='TakeProfitLONG',
enabled=.use.takeProfit
)
I am adding the distribution as follows:
add.distribution(strategy.st,
paramset.label = 'TakeProfit',
component.type = 'chain',
component.label = 'TakeProfitLONG',
variable = list(threshold = .TakeProfit),
label = 'TakeProfitLONG'
)
and apply the set:
results <- apply.paramset(strategy.st, paramset.label='TakeProfit', portfolio.st=portfolio.st, account.st=account.st, nsamples=.nsamples, verbose=TRUE)
From my limited debugging it seems that the parameter set is a simple vector, and that the following call in apply.paramset fails:
results <- fe %dopar% { ... }
I am too new to R, as I have only been looking into this for 4 weeks, but possibly a call to:
install.param.combo <- function(strategy, param.combo, paramset.label)
might cause the error?
I apologize as I am too new to this, but has anyone encountered this, or could anyone explain how to apply a distribution to only one item in a long-only strategy?
Many thanks in advance!
EDIT 1: SessionInfo()
R version 3.1.2 (2014-10-31)
Platform: i486-pc-linux-gnu (32-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lattice_0.20-29 iterators_1.0.7 downloader_0.3
[4] quantstrat_0.9.1665 foreach_1.4.2 blotter_0.9.1644
[7] PerformanceAnalytics_1.4.3574 FinancialInstrument_1.2.0 quantmod_0.4-3
[11] TTR_0.22-0.1 xts_0.9-7 zoo_1.7-12
loaded via a namespace (and not attached):
[1] codetools_0.2-9 compiler_3.1.2 digest_0.6.7 grid_3.1.2 tools_3.1.2
This is the same bug as # 5776. It was fixed for "signal" component types, but not for "chain". It should now be fixed as of revision 1669 on R-Forge.

Post hoc tests with ezANOVA output

I tried to use TukeyHSD(my_anova$aov) but it gives an error:
Error in UseMethod("TukeyHSD") :
no applicable method for 'TukeyHSD' applied to an object of class "c('aovlist', 'listof')"
Google says that there is no way to run post hoc tests on an 'aovlist'. But maybe you have an idea of how to do post hoc tests with ezANOVA output.
Example:
require(ez)
data(ANT)
my_anova = ezANOVA(data = ANT[ANT$error==0,], dv = rt, wid = subnum, within = cue, return_aov = TRUE)
Trying multcomp:
require(multcomp)
glht(my_anova$aov, linfct = mcp(cue = "Tukey"))
Error in model.matrix.aovlist(model) :
‘glht’ does not support objects of class ‘aovlist’
Error in factor_contrasts(model) :
no ‘model.matrix’ method for ‘model’ found!
Trying lme:
require(nlme)
lme_velocity = lme(rt ~ cue, data=ANT[ANT$error==0,], random = ~1|subnum)
Error in .Call("La_chol", as.matrix(x), PACKAGE = "base") :
Incorrect number of arguments (1), expecting 2 for 'La_chol'
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=Russian_Russia.1251 LC_CTYPE=Russian_Russia.1251 LC_MONETARY=Russian_Russia.1251 LC_NUMERIC=C LC_TIME=Russian_Russia.1251
attached base packages:
[1] splines stats graphics grDevices utils datasets methods base
other attached packages:
[1] nlme_3.1-108 multcomp_1.2-15 survival_2.37-2 mvtnorm_0.9-9994 ez_4.1-1 stringr_0.6.2 scales_0.2.3 reshape2_1.2.2 plyr_1.8 memoise_0.1
[11] mgcv_1.7-22 lme4_0.999999-0 Matrix_1.0-10 lattice_0.20-13 ggplot2_0.9.3 car_2.0-15 nnet_7.3-5 MASS_7.3-23
loaded via a namespace (and not attached):
[1] colorspace_1.2-1 dichromat_2.0-0 digest_0.6.2 grid_2.15.0 gtable_0.1.2 labeling_0.1 munsell_0.4 proto_0.3-10 RColorBrewer_1.0-5
[10] stats4_2.15.0 tools_2.15.0
It's not that it's ezANOVA output but that it's a repeated measures ANOVA. The class 'aovlist' is typically for that. TukeyHSD is for independent designs. See this question and related links there.
You don't give any reproducible code, but my guess is that you need to use the package multcomp:
require(multcomp)
glht(my_anova$aov, linfct = mcp(cue= "Tukey"))
(this does not work with a repeated measures aov; see @John's answer for why)
===Update===
Your code works for me (R 2.15.2, nlme 3.1-105, multcomp 1.2-15):
> data(ANT)
> lme_velocity = lme(rt ~ cue, data=ANT[ANT$error==0,], random = ~1|subnum)
> glht(lme_velocity, linfct = mcp(cue= "Tukey"))
General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Linear Hypotheses:
Estimate
Center - None == 0 -41.872
Double - None == 0 -47.897
Spatial - None == 0 -86.040
Double - Center == 0 -6.026
Spatial - Center == 0 -44.169
Spatial - Double == 0 -38.143

F-test with plm-model

I want to run an F-test on a plm model and test, for
model <- plm(y ~ a + b)
if
# a = b
and
# a = 0 and b = 0
I tried linearHypothesis like this
linearHypothesis(ur.model, c("a", "b")) # to test for a = 0 and b = 0
but got the error
Error in constants(lhs, cnames_symb) :
The hypothesis "sgp1" is not well formed: contains bad coefficient/variable names.
Calls: linearHypothesis ... makeHypothesis -> rbind -> Recall -> makeHypothesis -> constants
In addition: Warning message:
In constants(lhs, cnames_symb) : NAs introduced by coercion
Execution halted
My example above uses slightly simplified code, in case the problem is easy. In case the problem is in the details, here is the actual code.
model3 <- formula(balance.agr ~ sgp1 + sgp2 + cp + eu + election + gdpchange.imf + ue.ameco)
ur.model<-plm(model3, data=panel.l.fullsample, index=c("country","year"), model="within", effect="twoways")
linearHypothesis(ur.model, c("sgp1", "sgp2"), vcov.=vcovHC(plmmodel1, method="arellano", type = "HC1", clustering="group"))
I can't reproduce your error with one of the inbuilt data sets, even after quite a bit of fiddling.
Does this work for you?
require(plm)
require(car)
data(Grunfeld)
form <- formula(inv ~ value + capital)
re <- plm(form, data = Grunfeld, model = "within", effect = "twoways")
linearHypothesis(re, c("value", "capital"),
vcov. = vcovHC(re, method="arellano", type = "HC1"))
Note also, that you seem to have an error in the more complex code you showed. You are using linearHypothesis() on the object ur.model, yet call vcovHC() on object plmmodel1. Not sure if that is the problem or not, but check that in case.
Is it possible to provide the data? Finally, edit your Question to include output from sessionInfo(). Mine is (from quite a busy R instance):
> sessionInfo()
R version 2.11.1 Patched (2010-08-25 r52803)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
[3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
[5] LC_MONETARY=C LC_MESSAGES=en_GB.utf8
[7] LC_PAPER=en_GB.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] splines grid stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] car_2.0-2 nnet_7.3-1 plm_1.2-6 Formula_1.0-0
[5] kinship_1.1.0-23 lattice_0.19-11 nlme_3.1-96 survival_2.35-8
[9] mgcv_1.6-2 chron_2.3-37 MASS_7.3-7 vegan_1.17-4
[13] lmtest_0.9-27 sandwich_2.2-6 zoo_1.6-4 moments_0.11
[17] ggplot2_0.8.8 proto_0.3-8 reshape_0.8.3 plyr_1.2.1
loaded via a namespace (and not attached):
[1] Matrix_0.999375-44 tools_2.11.1
Could it be because you are "mixing" models? You have a variance specification that starts out:
, ...vcov.=vcovHC(plmmodel1,
... and yet you are working with ur.model.
