I have a 2-way repeated measures design (3 x 2), and I would like to figure out how to calculate effect sizes (partial eta squared).
I have a matrix of data (called a), like so (repeated measures):
A.a A.b B.a B.b C.a C.b
1 514.0479 483.4246 541.1342 516.4149 595.5404 588.8000
2 569.0741 550.0809 569.7574 599.1509 621.4725 656.8136
3 738.2037 660.3058 812.2970 735.8543 767.0683 738.7920
4 627.1101 638.1338 641.2478 682.7028 694.3569 761.6241
5 599.3417 637.2846 599.4951 632.5684 626.4102 677.2634
6 655.1394 600.9598 729.3096 669.4189 728.8995 716.4605
idata =
Caps Lower
A a
A b
B a
B b
C a
C b
I know how to do a repeated measures ANOVA with the car package (Type III SS is standard in my field, although I know it involves a logical inconsistency; if somebody wants to explain that to me like I'm five, I would love to understand it):
summary(Anova(lm(a ~ 1),
              idata = idata, type = 3,
              idesign = ~Caps*Lower),
        multivariate = FALSE)
I think what I want to do is take this part of the summary print out:
Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
SS num Df Error SS den Df F Pr(>F)
(Intercept) 14920141 1 153687 5 485.4072 3.577e-06 ***
Caps 33782 2 8770 10 19.2589 0.000372 ***
Lower 195 1 13887 5 0.0703 0.801451
Caps:Lower 2481 2 907 10 13.6740 0.001376 **
And use it to calculate partial eta squared. So, if I'm not mistaken, I need to take the SS from the first column and divide it by (itself + the Error SS for that row) for each effect. Is this the correct way to go about it? If so, how do I do it? I can't figure out how to reference values from the summary printout.
Partial eta squared can be calculated with the etasq function in the heplots package:
library(car)
mod <- Anova(lm(a ~ 1),
idata = idata,
type = 3,
idesign = ~Caps*Lower)
mod
library(heplots)
etasq(mod, anova = TRUE)
Since you are asking about the calculations:
From ?etasq: 'For univariate linear models, classical η^2 = SSH / SST and partial η^2 = SSH / (SSH + SSE). These are identical in one-way designs.'
If you wish to inspect the code behind the η^2 calculations for a model of this class, you can use getS3method(f = "etasq", class = "Anova.mlm").
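Since you also asked how to reference those values programmatically: a minimal sketch, assuming a reasonably recent version of car in which the summary object exposes the univariate table as $univariate.tests (I index the columns by position, matching the printed header above, because the column names vary across versions):
s <- summary(mod, multivariate = FALSE)
tab <- s$univariate.tests                  # the matrix behind the printout above
peta2 <- tab[, 1] / (tab[, 1] + tab[, 3])  # SS / (SS + Error SS), row by row
peta2                                      # note this includes the (Intercept) row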
I have this data frame, to which I applied the multinom function:
df = data.frame(x = c('a','a','b','b','c','c','d','d','d','e','e','f','f',
'f','f','g','g','g','h','h','h','h','i','i','j','j'),
y = c(1,2,1,3,1,2,1,4,5,1,2,2,3,4,5,1,1,2,1,2,2,3,2,2,3,4) )
df$y = factor(df$y,ordered = TRUE)
nnet::multinom(y~x, data = df)
When checking the output, I have all the variables with their coefficients (meaning everything is fine):
Coefficients:
(Intercept) xb xc xd xe xf
2 -6.045294e-05 -31.83512 3.800915e-05 -36.67053 3.800915e-05 25.00515
3 -1.613311e+01 16.13310 -1.725649e+01 -21.06832 -1.725649e+01 41.13825
4 -1.692352e+01 -14.71119 -1.428100e+01 16.92351 -1.428100e+01 41.92865
5 -2.129358e+01 -10.49359 -1.002518e+01 21.29353 -1.002518e+01 46.29867
xg xh xi xj
2 -0.6931261 0.6932295 40.499799 -25.311410
3 -24.0387863 16.1332150 -8.876562 45.191730
4 -20.2673490 -16.0884760 -6.394423 45.982129
5 -15.1755064 -11.8589447 -4.563793 -6.953942
But my original data frame (I will share only the output), which is coded the same way as the dependent and independent variables in the df data frame (i.e., as ordinal factors), gives me this output when it comes to interpretation, even though the analysis itself runs fine:
Coefficients:
(Intercept) FIES_R.L FIES_R.Q FIES_R.C FIES_R^4 FIES_R^5
2 -0.09594409 -1.303256 0.03325169 -0.1753022 -0.46026668 -0.282463422
3 -0.18587599 -1.469957 0.42005569 -0.2977628 0.00508412 0.003068678
4 -0.58189239 -2.875183 0.33128994 -0.6787992 0.11145099 0.239368520
5 -2.68727952 -10.178604 -5.12515249 -5.8454920 -3.13775961 -1.820629143
FIES_R^6 FIES_R^7 FIES_R^8
2 -0.2179067 -0.1000471 -0.1489342
3 0.1915476 -0.5483707 -0.2565626
4 0.2585801 0.3821566 -0.2679774
5 -0.5562958 -0.6335412 -0.7205215
I don't want FIES_R.L, FIES_R.Q and FIES_R.C. I want them as FIES_R_1, FIES_R_2, FIES_R_3, FIES_R_4, FIES_R_5, FIES_R_6, FIES_R_7, FIES_R_8.
Why do I have such an output, given that both data frames contain ordinal categorical variables, and that the x variable and the FIES variable each have many categories? Thanks.
I just figured it out: it is because the independent variable is an ordinal factor. FIES in my dataset is an ordinal factor, so R uses polynomial contrasts (.L, .Q, .C, ^4, ...) for it. When I used the argument ordered = FALSE, the problem was solved.
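A minimal sketch of that fix (mydata and y are assumed names, since only the output was shared); with an unordered factor, R falls back to treatment contrasts, so the coefficients are named after the factor levels instead of .L/.Q/.C:
mydata$FIES_R <- factor(mydata$FIES_R, ordered = FALSE)  # drops the polynomial contrasts
nnet::multinom(y ~ FIES_R, data = mydata)                # coefficients now named by level, e.g. FIES_R2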
You can change the coefnames "by hand":
mod = nnet::multinom(y ~ x, data = df)
## for the FIES model above: keep the intercept, rename the 8 contrast coefficients
mod$vcoefnames = c("(Intercept)", paste0(substr(mod$vcoefnames[-1], 1, 6), "_", 1:8))
I ran a CV Lasso with the cv.gamlr function in R. I can get the coefficients for the lambdas that correspond to the "1se" or "min" criterion.
set.seed(123)
lasso <- cv.gamlr(x = X, y = Y, family = 'binomial')
coef(lasso, select = "1se")
coef(lasso, select = "min")
But what if I want to obtain the coefficients for a specific lambda, stored in the lasso$gamlr$lambda vector? Is it possible to obtain them?
For example, to get the coefficients for the first lambda in the model... Something like this:
lambda_100<- lasso$gamlr$lambda[100]
coef(lasso,select = lambda_100)
Of course, this throws the following error:
Error in match.arg(select) : 'arg' must be NULL or a character vector
Thanks :)
The coefficients are stored under lasso$gamlr$beta. In your example, you can access them like this:
library(gamlr)
x = matrix(runif(500),ncol=5)
y = rnorm(100)
cvfit <- cv.gamlr(x, y, gamma=1)
dim(cvfit$gamlr$beta)
[1] 5 100
length(cvfit$gamlr$lambda)
[1] 100
cvfit$gamlr$lambda[100]
seg100
0.00125315
cvfit$gamlr$beta[, 100, drop = FALSE]
5 x 1 sparse Matrix of class "dgCMatrix"
seg100
1 0.12960060
2 -0.16406246
3 -0.46566731
4 0.08197053
5 -0.54170494
Or if you prefer it in a vector:
cvfit$gamlr$beta[,100]
1 2 3 4 5
0.12960060 -0.16406246 -0.46566731 0.08197053 -0.54170494
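And if you want the coefficients for a particular lambda value rather than a position in the path, a small sketch: look up the segment whose lambda is closest to your target (the target here is a made-up value):
target <- 0.01                                      # hypothetical lambda value
seg <- which.min(abs(cvfit$gamlr$lambda - target))  # closest segment in the path
cvfit$gamlr$beta[, seg]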
Background:
McElreath (2016), in his Statistical Rethinking book (pages 158-159), uses an index variable instead of dummy coding for a 3-category variable called "clade" to predict "kcal.per.g" (linear regression).
Question: I was wondering if we could apply the same approach in "rstanarm"? I have provided data and R code for a possible demonstration below.
library("rethinking") # A github package not on CRAN
data(milk)
d <- milk
d$clade_id <- coerce_index(d$clade) # Index variable maker
#[1] 4 4 4 4 4 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 1 1 1 1 1 1 1 1 1 # index variable
# Model Specification:
fit1 <- map(
alist(
kcal.per.g ~ dnorm( mu , sigma ) ,
mu <- a[clade_id] ,
a[clade_id] ~ dnorm( 0.6 , 10 ) ,
sigma ~ dunif( 0 , 10 )
) ,
data = d )
The most analogous way to do this using the rstanarm package is with
library(rstanarm)
fit1 <- stan_glmer(kcal.per.g ~ 1 + (1 | clade_id), data = milk,
prior_intercept = normal(0.6, 1, autoscale = FALSE),
prior_aux = exponential(rate = 1/5, autoscale = FALSE),
prior_covariance = decov(shape = 10, scale = 1))
However, this is not exactly the same for the following reasons:
Bounded uniform priors on sigma are not implemented because they are not a good idea, so I have used an exponential distribution with an expectation of 5 instead.
Fixing the standard deviation on a is not implemented either, so I have used a gamma distribution with an expectation of 10.
Hierarchical models in rstanarm (and lme4) are parameterized with deviations from common parameters, so rather than using an expectation of 0.6 for a, I have used an expectation of 0.6 for the global intercept, and the prior on a is normal with an expectation of zero. This means you need to call coef(fit1) rather than ranef(fit1) to see the "intercepts" as they are parameterized in the original model.
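A quick sketch of that last point, using fit1 as defined above:
coef(fit1)$clade_id   # global intercept + group deviations, comparable to a[clade_id]
ranef(fit1)$clade_id  # the group deviations alone, centered on zero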
I have the following data frame:
lm mean_resids sd_resids resid_1 resid_2 resid_3 intercept beta
1 0.000000e+00 6.2806844 -3.6261548 7.2523096 -3.6261548 103.62615 24.989340
2 -2.960595e-16 8.7515899 -5.0527328 10.1054656 -5.0527328 141.96786 -1.047323
3 -2.960595e-16 5.9138984 -3.4143908 6.8287817 -3.4143908 206.29046 -26.448694
4 3.700743e-17 0.5110845 0.2950748 -0.5901495 0.2950748 240.89801 -35.806642
5 7.401487e-16 6.6260504 3.8255520 -7.6511040 3.8255520 187.03479 -23.444762
6 5.921189e-16 8.7217431 5.0355007 -10.0710014 5.0355007 41.43239 3.138396
7 0.000000e+00 5.5269434 3.1909823 -6.3819645 3.1909823 -119.90628 27.817845
8 -1.480297e-16 1.0204260 -0.5891432 1.1782864 -0.5891432 -180.33773 35.623363
9 -5.921189e-16 6.9488186 -4.0119023 8.0238046 -4.0119023 -64.72245 21.820226
10 -8.881784e-16 8.6621512 -5.0010953 10.0021906 -5.0010953 191.65339 -5.218767
Each row represents an estimated linear model with window length 3. I used rollapply on a separate data frame with the function lm(y ~ t) to extract the coefficients and intercepts into a new data frame, which I have combined with the residuals from the same models and their corresponding means and standard deviations.
Since the window length is 3, there are 3 residuals per model, shown in resid_1, resid_2 and resid_3. Their mean and sd are included accordingly.
I am seeking to predict the next observation, in essence at time k+1, where k is the index of the last observation in each window, using the intercept and beta.
Recall that lm1 takes observations 1,2,3 to estimate the intercept and the beta, and lm2 takes 2,3,4, lm3 takes 3,4,5, etc. The function for the prediction should be:
predict_lm1 = intercept_lm1 + beta_lm1*(k+1)
Where k+1 = 4. For lm2:
predict_lm2 = intercept_lm2 + beta_lm2*(k+1)
Where k+1 = 5.
Clearly, k increases by 1 every time I move down one row in the dataset. This is because the explanatory variable is time, t, which is a sequence increasing by one per observation.
Should I use a for loop, or an apply function here?
How can I make a function that iterates down the rows and calculates the predictions accordingly with the information found in that row?
Thanks.
EDIT:
I managed to find a possible solution by writing the following:
n=nrow(dataset)
for(i in n){
predictions = dataset$Intercept + dataset$beta*(k+1)
}
However, k does not increase by 1 per iteration. Thus, k+1 is always = 4.
How can I make sure k increases by 1 accordingly?
EDIT 2
I managed to add 1 to k by writing the following:
n=nrow(dataset)
for(i in n){
x = 0
x[i] = k + 1
preds = dataset$`(Intercept)` + dataset$t*(x[i])
}
However, the first prediction is overestimated. It should be 203, whereas it is estimated as 228, implying that it sets the explanatory variable as 1 too high.
Yet, the second prediction is correct. I am not sure what I am doing wrong. Any advice?
EDIT 3
I managed to find a solution as follows:
n=nrow(dataset)
for(i in n){
x = k + 1
preds = dataset$`(Intercept)` + dataset$t*(x)
x = x + 1
}
Your loop is not iterating:
dataset <- read.table(text="lm meanresids sdresids resid1 resid2 resid3 intercept beta
1 0.000000e+00 6.2806844 -3.6261548 7.2523096 -3.6261548 103.62615 24.989340
2 -2.960595e-16 8.7515899 -5.0527328 10.1054656 -5.0527328 141.96786 -1.047323
3 -2.960595e-16 5.9138984 -3.4143908 6.8287817 -3.4143908 206.29046 -26.448694
4 3.700743e-17 0.5110845 0.2950748 -0.5901495 0.2950748 240.89801 -35.806642
5 7.401487e-16 6.6260504 3.8255520 -7.6511040 3.8255520 187.03479 -23.444762
6 5.921189e-16 8.7217431 5.0355007 -10.0710014 5.0355007 41.43239 3.138396
7 0.000000e+00 5.5269434 3.1909823 -6.3819645 3.1909823 -119.90628 27.817845
8 -1.480297e-16 1.0204260 -0.5891432 1.1782864 -0.5891432 -180.33773 35.623363
9 -5.921189e-16 6.9488186 -4.0119023 8.0238046 -4.0119023 -64.72245 21.820226
10 -8.881784e-16 8.6621512 -5.0010953 10.0021906 -5.0010953 191.65339 -5.218767", header=T)
n <- nrow(dataset)
predictions <- data.frame()
for(i in 1:n){
k <- i + 2  ## per the question, row i fits observations i..i+2, so it predicts t = i + 3 (= k + 1)
predictions <- rbind(predictions, dataset$intercept[i] + dataset$beta[i]*(k+1))
}
predictions
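For what it's worth, the loop is not really needed; a vectorized sketch under the same assumption (window length 3, so row i predicts t = i + 3):
dataset$preds <- dataset$intercept + dataset$beta * (seq_len(nrow(dataset)) + 3)
dataset$preds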
I am currently learning R. I have no previous knowledge of Stata.
I want to reanalyze a study that was done in Stata (xtpcse: linear regression with panel-corrected standard errors). I could not find the model or any more detailed code in Stata, nor any other hint on how to rewrite this in R. I have the plm package for econometrics installed for R. That's as far as I got.
The first lines of the .do file from Stata are copied below. (I just saw that it's pretty unreadable; here is a link to the txt file into which I copied the .do content: http://dl.dropbox.com/u/4004629/This%20was%20in%20the%20.do%20file.txt)
I have no idea how to go about this in a better way. I tried googling Stata and R comparisons and the like, but it did not work.
All data for the study I want to replicate are here:
https://umdrive.memphis.edu/rblanton/public/ISQ_data
---STATA---
Group variable: c_code Number of obs = 265
Time variable: year Number of groups = 27
Panels: correlated (unbalanced) Obs per group: min = 3
Autocorrelation: common AR(1) avg = 9.814815
Sigma computed by pairwise selection max = 14
Estimated covariances = 378 R-squared = 0.8604
Estimated autocorrelations = 1 Wald chi2(11) = 8321.15
Estimated coefficients = 15 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| Panel-corrected
food | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lag_food | .8449038 .062589 13.50 0.000 .7222316 .967576
ciri | -.010843 .0222419 -0.49 0.626 -.0544364 .0327504
human_cap | .0398406 .0142954 2.79 0.005 .0118222 .0678591
worker_rts | -.1132705 .0917999 -1.23 0.217 -.2931951 .066654
polity_4 | .0113995 .014002 0.81 0.416 -.0160439 .0388429
market_size | .0322474 .0696538 0.46 0.643 -.1042716 .1687665
income | .0382918 .0979499 0.39 0.696 -.1536865 .2302701
econ_growth | .0145589 .0105009 1.39 0.166 -.0060224 .0351402
log_trade | -.3062828 .1039597 -2.95 0.003 -.5100401 -.1025256
fix_dollar | -.0351874 .1129316 -0.31 0.755 -.2565293 .1861545
fixed_xr | -.4941214 .2059608 -2.40 0.016 -.897797 -.0904457
xr_fluct | .0019044 .0106668 0.18 0.858 -.0190021 .0228109
lab_growth | .0396278 .0277936 1.43 0.154 -.0148466 .0941022
english | -.1594438 .1963916 -0.81 0.417 -.5443641 .2254766
_cons | .4179213 1.656229 0.25 0.801 -2.828227 3.66407
-------------+----------------------------------------------------------------
rho | .0819359
------------------------------------------------------------------------------
. xtpcse fab_metal lag_fab_metal ciri human_cap worker_rts polity_4 market
> income econ_growth log_trade fix_dollar fixed_xr xr_fluct lab_growth
> english, pairwise corr(ar1)
Update:
I just tried Vincent's code. I tried the pcse2 and vcovBK code, and they both worked (even though I'm not sure what to do with the correlation matrix that comes out of vcovBK).
However, I still have trouble reproducing the estimates of the regression coefficients in the paper I'm reanalyzing. I'm following their recipe as well as I can; the only step I'm missing, I think, is the part where Stata applies "Autocorrelation: common AR(1)". The paper I'm analyzing says: "OLS regression using panel corrected standard errors (Beck/Katz '95), control for first order correlation within each panel (corr AR1 option in Stata)."
How do I control for first order correlation within each panel in R?
Here is what I did so far on my data:
## run lm
res.lm <- lm(total_FDI ~ ciri + human_cap + worker_rts + polity_4 + lag_total + market_size + income + econ_growth + log_trade + fixed_xr + fix_dollar + xr_fluct + english + lab_growth, data=D)
## run pcse
res.pcse <- pcse2(res.lm,groupN="c_code",groupT="year",pairwise=TRUE)
As Ramnath mentioned, the pcse package will do what Stata's xtpcse does. Alternatively, you could use the vcovBK() function from the plm package. If you opt for the latter option, make sure you use the cluster='time' option, which is what the Beck & Katz (1995) article suggests and what the Stata command implements.
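One caveat worth adding: both pcse and vcovBK() only correct the standard errors; neither applies the AR(1) (Prais-Winsten-type) transformation that Stata's corr(ar1) option performs, so that step would have to happen before the lm() fit. For the vcovBK() route itself, a minimal sketch with an abbreviated formula and the variable names from the question (the index and cluster choices are the parts to adapt):
library(plm)
library(lmtest)
pm <- plm(total_FDI ~ ciri + human_cap + worker_rts,  # abbreviated formula
          data = D, index = c("c_code", "year"), model = "pooling")
coeftest(pm, vcov = vcovBK(pm, cluster = "time"))     # Beck-Katz PCSEs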
The pcse package works well, but there are some issues that make a lot of intuitive user inputs unacceptable, especially if your dataset is unbalanced. You might want to try this rewrite of the function that I coded a while ago. Just load the pcse package, load the pcse2 function, and use it by following the instructions in the pcse documentation. IMHO, the function pasted below is cleaner, more flexible, and more robust than the one provided by the pcse folks. Simple benchmarks also suggest that my version may be 5 to 10 times faster than theirs, which may matter for big datasets.
Good luck!
library(Matrix)
pcse2 <- function(object, groupN, groupT, pairwise=TRUE){
## Extract basic model info
groupT <- tail(as.character((match.call()$groupT)), 1)
groupN <- tail(as.character((match.call()$groupN)), 1)
dat <- eval(parse(text=object$call$data))
## Sanity checks
if(!"lm" %in% class(object)){stop("Formula object must be of class 'lm'.")}
if(!groupT %in% colnames(dat)){stop(paste(groupT, 'was not found in data', object$call$data))}
if(!groupN %in% colnames(dat)){stop(paste(groupN, 'was not found in data', object$call$data))}
if(anyDuplicated(paste(dat[,groupN], dat[,groupT]))>0){stop(paste('There are duplicate groupN-groupT observations in', object$call$data))}
if(length(dat[is.na(dat[,groupT]),groupT])>0){stop('There are missing time indices in the data.')}
if(length(dat[is.na(dat[,groupN]),groupN])>0){stop('There are missing unit indices in the data.')}
## Expand model frame to include groupT, groupN, resid columns.
f <- as.formula(object$call$formula)
f.expanded <- update.formula(f, paste(". ~ .", groupN, groupT, sep=" + "))
dat.pcse <- model.frame(f.expanded, dat)
dat.pcse$e <- resid(object)
## Extract basic model info (part II)
N <- length(unique(dat.pcse[,groupN]))
T <- length(unique(dat.pcse[,groupT]))
nobs <- nrow(dat.pcse)
is.balanced <- length(resid(object)) == N * T
## If balanced dataset, calculate as in Beck & Katz (1995)
if(is.balanced){
dat.pcse <- dat.pcse[order(dat.pcse[,groupN], dat.pcse[,groupT]),]
X <- model.matrix(f, dat.pcse)
E <- t(matrix(dat.pcse$e, N, T, byrow=TRUE))
Omega <- kronecker((crossprod(E) / T), Matrix(diag(1, T)) )
## If unbalanced and pairwise, calculate as in Franzese (1996)
}else if(pairwise==TRUE){
## Rectangularize
rectangle <- expand.grid(unique(dat.pcse[,groupN]), unique(dat.pcse[,groupT]))
names(rectangle) <- c(groupN, groupT)
rectangle <- merge(rectangle, dat.pcse, all.x=TRUE)
rectangle <- rectangle[order(rectangle[,groupN], rectangle[,groupT]),]
valid <- ifelse(is.na(rectangle$e),0,1)
rectangle[is.na(rectangle)] <- 0
X <- model.matrix(f, rectangle)
X[valid==0,1] <- 0
## Calculate pcse
E <- crossprod(t(matrix(rectangle$e, N, T, byrow=TRUE)))
V <- crossprod(t(matrix(valid, N, T, byrow=TRUE)))
if (length(V[V==0]) > 0){stop("Error! A CS-unit exists without any obs or without any obs in a common period with another CS-unit. You must remove that unit from the data passed to pcse().")}
Omega <- kronecker(E/V, Matrix(diag(1, T)))
## If unbalanced and casewise, calculate based on the largest rectangular subset of data
}else{
## Rectangularize
rectangle <- expand.grid(unique(dat.pcse[,groupN]), unique(dat.pcse[,groupT]))
names(rectangle) <- c(groupN, groupT)
rectangle <- merge(rectangle, dat.pcse, all.x=TRUE)
rectangle <- rectangle[order(rectangle[,groupN], rectangle[,groupT]),]
valid <- ifelse(is.na(rectangle$e),0,1)
rectangle[is.na(rectangle)] <- 0
X <- model.matrix(f, rectangle)
X[valid==0,1] <- 0
## Keep only years for which we have the max number of observations
large.panels <- by(dat.pcse, dat.pcse[,groupT], nrow) # How many valid observations per year?
if(max(large.panels) < N){warning('There is no time period during which all units are observed. Consider using pairwise estimation.')}
T.balanced <- names(large.panels[large.panels==max(large.panels)]) # Which years have max(valid observations)?
T.casewise <- length(T.balanced)
dat.balanced <- dat.pcse[dat.pcse[,groupT] %in% T.balanced,] # Extract biggest rectangular subset
dat.balanced <- dat.balanced[order(dat.balanced[,groupN], dat.balanced[,groupT]),]
e <- dat.balanced$e
## Calculate pcse as in Beck & Katz (1995)
E <- t(matrix(dat.balanced$e, N, T.casewise, byrow=TRUE))
Omega <- kronecker((crossprod(E) / T.casewise), Matrix(diag(1, T)))
}
## Finish evaluation, clean and output
salami <- t(X) %*% Omega %*% X
bread <- solve(crossprod(X))
sandwich <- bread %*% salami %*% bread
colnames(sandwich) <- names(coef(object))
row.names(sandwich) <- names(coef(object))
pcse <- sqrt(diag(sandwich))
b <- coef(object)
tstats <- b/pcse
df <- nobs - ncol(X)
pval <- 2*pt(abs(tstats), df, lower.tail=FALSE)
res <- list(vcov=sandwich, pcse=pcse, b=b, tstats=tstats, df=df, pval=pval, pairwise=pairwise,
nobs=nobs, nmiss=(N*T)-nobs, call=match.call())
class(res) <- "pcse"
return(res)
}
Look at the pcse package, which implements panel-corrected standard errors. You certainly have to look at the documentation in Stata to figure out the assumptions made, and cross-check that with pcse.
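For completeness, a minimal sketch of the stock pcse call (arguments per the pcse documentation; res.lm, c_code and year come from the question above):
library(pcse)
res.pcse <- pcse(res.lm, groupN = D$c_code, groupT = D$year, pairwise = TRUE)
summary(res.pcse)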