I am trying to compute the Bayes Factor (BF) for one of the fixed effects with the BayesFactor package in R.
The data has the following structure:
rating is the dependent variable
cond is the independent variable with 3 levels ("A", "B", "C")
C1 is a contrast code derived from cond that opposes "A" (coded -0.50) to "B" and "C" (both coded +0.25)
C2 is a contrast code derived from cond that opposes "B" (coded -0.50) to "C" (coded +0.50), with "A" coded 0
judge and face are random factors such that face is crossed with judge but nested within cond (and thus also nested within C1 and C2)
library(data.table)
DT <- fread("http://matschmitz.github.io/dataLMM.csv")
DT[, judge := factor(judge)]
DT[, face := factor(face)]
# > DT
# judge face cond C1 C2 rating
# 1: 66 13 A -0.50 0.0 1
# 2: 20 13 A -0.50 0.0 4
# 3: 22 13 A -0.50 0.0 7
# 4: 69 13 A -0.50 0.0 1
# 5: 7 13 A -0.50 0.0 3
# ---
# 4616: 45 62 C 0.25 0.5 2
# 4617: 30 62 C 0.25 0.5 6
# 4618: 18 62 C 0.25 0.5 4
# 4619: 40 62 C 0.25 0.5 3
# 4620: 65 62 C 0.25 0.5 1
Ideally I would like to test the "full" model as in:
library(lmerTest)
lmer(rating ~ C1 + C2 + (1 + C1 + C2|judge) + (1|face), data = DT)
and compute the BF for C1.
I managed to compute the BF for C1 but with random intercepts only:
library(BayesFactor)
BF1 <- lmBF(rating ~ C1 + C2 + judge + face, whichRandom = c("judge", "face"), data = DT)
BF0 <- lmBF(rating ~ C2 + judge + face, whichRandom = c("judge", "face"), data = DT)
BF10 <- BF1 / BF0
# > BF10
# Bayes factor analysis
# --------------
# [1] C1 + C2 + judge + face : 0.4319222 ±15.49%
#
# Against denominator:
# rating ~ C2 + judge + face
# ---
# Bayes factor type: BFlinearModel, JZS
I tried the following to include the random slopes, without success:
BF1 <- lmBF(rating ~ C1 + C2 + judge + face + C1:judge + C2:judge,
whichRandom = c("judge", "face", "C1:judge", "C2:judge"), data = DT)
# Some NAs were removed from sampling results: 10000 in total.
I would also need to include (if possible) the correlation between the random intercepts and slopes for judge.
Please feel free to use any other package (e.g., rstan, bridgesampling) in your answer.
Some additional questions:
Do I need to perform any transformation on the BF10, or can I interpret it as is?
What are the default priors?
The covariates have to be factors. In your case, not just judge and face: C1 and C2 need to be factors as well.
DT$C1 = factor(DT$C1)
DT$C2 = factor(DT$C2)
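Since the question allows other packages: one way to get a BF for C1 with the full random-effects structure (correlated random intercepts and slopes for judge) is bridge sampling via brms. This is only a sketch, not a tested solution; the iteration settings are placeholders, and the default priors should be replaced with informative ones via set_prior(), since bridge-sampling BFs are sensitive to them.
library(brms)
# Full model: correlated random intercepts and slopes for judge
fit1 <- brm(rating ~ C1 + C2 + (1 + C1 + C2 | judge) + (1 | face),
            data = DT, save_pars = save_pars(all = TRUE),
            iter = 10000, warmup = 2000, chains = 4)
# Null model: same random-effects structure, but C1 dropped as a fixed effect
fit0 <- brm(rating ~ C2 + (1 + C1 + C2 | judge) + (1 | face),
            data = DT, save_pars = save_pars(all = TRUE),
            iter = 10000, warmup = 2000, chains = 4)
# BF10 for C1 via bridge sampling (wraps the bridgesampling package)
bayes_factor(fit1, fit0)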
I have a dataset called df that looks like this:
head(df)
  ratio  P  T  H   S p1 p2 PM10 CO2 B  G Month Year
1  0.50 89 -7 98 133  0 40   50  30 3 20     1 2019
2  0.50 55  4 43  43 30 30   40  32 1 15     1 2019
3  0.85 75  4 63  43 30 30   42  32 1 18     1 2019
I would like to do a principal component analysis to reduce the number of variables for a regression analysis. I used this code:
library(factoextra)
df.pca <- prcomp(df, scale = TRUE)
But I got this error message and was not able to continue:
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
What am I doing wrong?
prcomp() will assume that every column in the object you are passing to it should be used in the analysis. You'll need to drop any non-numeric columns, as well as any numeric columns that should not be used in the PCA.
library(factoextra)
# Example data
df <- data.frame(
x = letters,
y1 = rbinom(26,1,0.5),
y2 = rnorm(26),
y3 = 1:26,
id = 1:26
)
# Reproduce your error
prcomp(df)
#> Error in colMeans(x, na.rm = TRUE): 'x' must be numeric
# Remove all non-numeric columns
df_nums <- df[sapply(df, is.numeric)]
# Conduct PCA - works but ID column is in there!
prcomp(df_nums, scale = TRUE)
#> Standard deviations (1, .., p=4):
#> [1] 1.445005e+00 1.039765e+00 9.115092e-01 1.333315e-16
#>
#> Rotation (n x k) = (4 x 4):
#> PC1 PC2 PC3 PC4
#> y1 0.27215111 -0.5512026 -0.7887391 0.000000e+00
#> y2 0.07384194 -0.8052981 0.5882536 4.715914e-16
#> y3 -0.67841033 -0.1543868 -0.1261909 -7.071068e-01
#> id -0.67841033 -0.1543868 -0.1261909 7.071068e-01
# Remove ID
df_nums$id <- NULL
# Conduct PCA without ID - success!
prcomp(df_nums, scale = TRUE)
#> Standard deviations (1, .., p=3):
#> [1] 1.1253120 0.9854030 0.8733006
#>
#> Rotation (n x k) = (3 x 3):
#> PC1 PC2 PC3
#> y1 -0.6856024 0.05340108 -0.7260149
#> y2 -0.4219813 -0.84181344 0.3365738
#> y3 0.5931957 -0.53712052 -0.5996836
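If the end goal is to use the components as predictors in a regression, the scores are stored in the x element of the prcomp object. A minimal sketch continuing from df_nums above, with a placeholder outcome for illustration only:
pca <- prcomp(df_nums, scale = TRUE)
scores <- as.data.frame(pca$x[, 1:2])  # keep the first two components
outcome <- rnorm(nrow(scores))         # placeholder outcome, illustration only
lm(outcome ~ PC1 + PC2, data = scores)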
I am trying to write a function that returns regression coefficients and standard errors, since I need to run a large number of regressions.
The data could look like this
library(tidyverse)
library(fixest)
library(broom)
data <- tibble(
  Date = c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
           "2020-02-01", "2020-02-01", "2020-02-01", "2020-02-01"),
  Card = c(1, 2, 3, 4, 1, 2, 3, 4),
  A = rnorm(8),
  B = rnorm(8),
  C = rnorm(8)
)
My current code is as follows:
estimation_fun <- function(col1, col2, df) {
  regression <- feols(df[[col1]] ~ df[[col2]] | Card + Date, df)
  est <- tidy(regression)$estimate
  se <- tidy(regression)$std.error
  output <- list(est, se)
  return(output)
}
estimation_fun("A", "B", data)
However, it does not work. I guess it is related to column name in feols because I can make it work for lm().
The feols() function needs a formula object. You can create one using paste0()/sprintf():
estimation_fun <- function(col1, col2, df) {
  regression <- feols(as.formula(sprintf('%s ~ %s | Card + Date', col1, col2)), df)
  est <- tidy(regression)$estimate
  se <- tidy(regression)$std.error
  output <- list(est, se)
  return(output)
}
estimation_fun("A","B",data)
#[[1]]
#[1] -0.1173276
#attr(,"type")
#[1] "Clustered (Card)"
#[[2]]
#[1] 1.083011
#attr(,"type")
#[1] "Clustered (Card)"
To apply this to every pair of variables you may do:
cols <- names(data)[-(1:2)]
do.call(rbind, combn(cols, 2, function(x) {
data.frame(cols = paste0(x, collapse = '-'),
t(estimation_fun(x[1],x[2],data)))
}, simplify = FALSE))
#  cols         X1        X2
#1 A-B -0.1173276 1.083011
#2 A-C -0.1117691 0.5648162
#3 B-C -0.3771884 0.1656587
Ronak's right: only formulas made of variable names can be used.
Since fixest 0.10.0, you can use the dot square bracket operator to do just that. See the help page for formula manipulation in xpd.
Just change one line in your code to make it work:
estimation_fun <- function(lhs, rhs, df) {
# lhs must be of length 1 (otherwise => not what you'd want)
# rhs can be a vector of variables
regression <- feols(.[lhs] ~ .[rhs] | Card + Date, df)
# etc...
}
# Example of how ".[]" works:
lhs = "A"
rhs = c("B", "C")
feols(.[lhs] ~ .[rhs], data)
#> OLS estimation, Dep. Var.: A
#> Observations: 8
#> Standard-errors: IID
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.375548 0.428293 0.876849 0.42069
#> B -0.670476 0.394592 -1.699164 0.15004
#> C 0.177647 0.537452 0.330536 0.75440
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.737925 Adj. R2: 0.183702
By the way, I recommend using the built-in multiple estimation facility (see the dedicated help page), since estimation speed will be substantially improved.
Update
All combinations can be estimated with one line of code:
# All combinations at once
est_all = feols(c(A, B, C) ~ sw(A, B, C) | Card + Date, data)
Extraction of coefs/SEs can be done with another line:
# Coef + SE // see doc for summary.fixest_multi
coef_se_all = summary(est_all, type = "se_long")
coef_se_all
#> lhs rhs type A B C
#> 1 A A coef 1.0000000 NA NA
#> 2 A A se NaN NA NA
#> 3 A B coef NA 0.8204932 NA
#> 4 A B se NA 1.1102853 NA
#> 5 A C coef NA NA -0.7889534
#> 6 A C se NA NA 0.3260451
#> 7 B A coef 0.2456443 NA NA
#> 8 B A se 0.2314143 NA NA
#> 9 B B coef NA 1.0000000 NA
#> 10 B B se NA NaN NA
#> 11 B C coef NA NA -0.1977089
#> 12 B C se NA NA 0.3335988
#> 13 C A coef -0.4696954 NA NA
#> 14 C A se 0.3858851 NA NA
#> 15 C B coef NA -0.3931512 NA
#> 16 C B se NA 0.8584968 NA
#> 17 C C coef NA NA 1.0000000
#> 18 C C se NA NA NaN
Note: this requires fixest 0.10.1 or higher.
I am trying to calculate the within-groups sum of squares and cross-products matrix (W) in a one-way MANOVA.
I have a treatment matrix tm:
n x1 x2
1 6 7
1 5 9
1 8 6
...
2 3 3
2 1 6
2 2 3
...
3 2 3
3 2 3
3 5 1
...
I also have each individual observation in its own variable, for example:
x111 = x[1,1]
x112 = x[2,1]
...
which are then combined into vectors:
# creating vectors
t11 = c(x111, x111_2) # 6,7
t12 = c(x112, x112_2) # 5,9
t13 = c(x113, x113_2) # 8,6
t14 = c(x114, x114_2) # 4,9
t15 = c(x115, x115_2) # 7,9
t21 = c(x211, x211_2) # 3,3
t22 = c(x212, x212_2) # 1,6
t23 = c(x213, x213_2) # 2,3
t31 = c(x311, x311_2) # 2,3
t32 = c(x312, x312_2) # 5,1
t33 = c(x313, x313_2) # 3,1
t34 = c(x314, x314_2) # 2,3
> dput(t11)
c(6, 7)
I am trying to calculate W (the within-groups sum of squares and cross-products matrix).
The means are
> x1 # treatment 1
[1] 6 8
> x2 # treatment 2
[1] 2 4
> x3 # treatment 3
[1] 3 2
> x # overall mean
X1 X2
[1,] 4 5
The code I have is:
W = (t(t11)-t(x1))*(t11-x1)
+(t(t12)-t(x1))%*%(t12-x1)
+(t(t13)-t(x1))%*%(t13-x1)
+(t(t14)-t(x1))%*%(t14-x1)
+(t(t15)-t(x1))%*%(t15-x1)
+(t(t21)-t(x2))%*%(t21-x2)
+(t(t22)-t(x2))%*%(t22-x2)
+(t(t23)-t(x2))%*%(t23-x2)
+(t(t31)-t(x3))%*%(t31-x3)
+(t(t32)-t(x3))%*%(t32-x3)
+(t(t33)-t(x3))%*%(t33-x3)
+(t(t34)-t(x3))%*%(t34-x3)
The result I get is:
Error in (t(t11) - t(x1)) * (t11 - x1) + (t(t12) - t(x1)) %*% :
non-conformable arrays
When I isolated each statement, I got this:
> (t(t11)-t(x1))%*%(t11-x1)
[,1]
[1,] 1
> (t(t12)-t(x1))%*%(t12-x1)
[,1]
[1,] 2
Why do these statements evaluate to 1x1 matrices? When I work through these 1x2 and 2x1 operations (subtraction and multiplication) by hand, I get a 2x2 result for both.
It can be confusing when working with vectors in R and you want to do matrix multiplication. A vector in R (say, x = c(1,2)) is printed as though it were a row vector, but in %*% R treats it as a row or column vector, whichever makes the product conformable. Here (t11 - x1) is a plain vector, so multiplying a 1x2 matrix by it yields a 1x1 result.
With that in mind, to get the 2x2 matrix you want, do
t11 = c(6, 7)
x1 = c(6, 8)
(t11 - x1) %*% t(t11 - x1)
No need for too many transposes.
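To build the full W matrix without typing out all twelve terms, the same outer-product idea can be applied group by group. A sketch, collecting the twelve observations listed above (t11..t15, t21..t23, t31..t34) into a data frame:
tm <- data.frame(
  n  = rep(1:3, times = c(5, 3, 4)),
  x1 = c(6, 5, 8, 4, 7,  3, 1, 2,  2, 5, 3, 2),
  x2 = c(7, 9, 6, 9, 9,  3, 6, 3,  3, 1, 1, 3)
)
X <- as.matrix(tm[, c("x1", "x2")])
W <- matrix(0, 2, 2)
for (g in unique(tm$n)) {
  Xg <- X[tm$n == g, , drop = FALSE]
  d  <- sweep(Xg, 2, colMeans(Xg))  # deviations from the group mean
  W  <- W + t(d) %*% d              # add this group's outer-product sum
}
W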
I am having a seemingly simple but very frustrating problem. When you run a model with an interaction term in R, R names the generated parameter "var1:var2" etc. Unfortunately, this naming convention prevents me from calculating predicted values and CIs where newdata is required, because ":" is not a character that can be included in a column header, and the names in the original data frame must exactly match those in newdata. Has anyone else had this problem?
Here is a sample of my code:
wemedist2.exp = glm(survive/trials ~ sitedist + type + sitedist*type + roaddist, family = binomial(logexp(wemedata$expos)), data=wemedata)
summary(wemedist2.exp)
wemepredict3 = with(wemedata, data.frame(sitedist=mean(sitedist),roaddist=mean(roaddist), type=factor(1:2)))
wemepredict3 = cbind(wemepredict3, predict(wemedist2.exp, newdata = wemepredict3, type = "link", se = TRUE))
This produces a table with predicted values for each of the variables at the specified levels, but not the interaction.
For your newdata data frame, you shouldn't include columns for the interactions. The product of the interactive variables will be calculated for you (and multiplied by the estimated coefficient) when calling predict.
For example:
Create some dummy data:
set.seed(1)
n <- 10000
X <- data.frame(x1=runif(n), x2=runif(n))
X$x1x2 <- X$x1 * X$x2
head(X)
# x1 x2 x1x2
# 1 0.2655087 0.06471249 0.017181728
# 2 0.3721239 0.67661240 0.251783646
# 3 0.5728534 0.73537169 0.421260147
# 4 0.9082078 0.11129967 0.101083225
# 5 0.2016819 0.04665462 0.009409393
# 6 0.8983897 0.13091031 0.117608474
b <- runif(4)
y <- b[1] + c(as.matrix(X) %*% b[-1]) + rnorm(n, sd=0.1)
Fit the model and compare the estimated vs. true coefficients:
M <- lm(y ~ x1 * x2, X)
summary(M)
# Call:
# lm(formula = y ~ x1 * x2, data = X)
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.43208 -0.06743 -0.00170 0.06601 0.37197
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.202040 0.003906 51.72 <2e-16 ***
# x1 0.128237 0.006809 18.83 <2e-16 ***
# x2 0.156942 0.006763 23.21 <2e-16 ***
# x1:x2 0.292582 0.011773 24.85 <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.09906 on 9996 degrees of freedom
# Multiple R-squared: 0.5997, Adjusted R-squared: 0.5996
# F-statistic: 4992 on 3 and 9996 DF, p-value: < 2.2e-16
b
# [1] 0.2106027 0.1147864 0.1453641 0.3099322
Create example data to predict to, and do prediction. Note that we only create x1 and x2, and do not create x1:x2:
X.predict <- data.frame(x1=runif(10), x2=runif(10))
head(X.predict)
# x1 x2
# 1 0.26037592 0.7652155
# 2 0.73988333 0.3352932
# 3 0.02650689 0.9788743
# 4 0.84083874 0.1446228
# 5 0.85052685 0.7674547
# 6 0.13568509 0.9612156
predict(M, newdata=X.predict)
# 1 2 3 4 5 6 7
# 0.4138194 0.4221251 0.3666572 0.3681432 0.6225354 0.4084543 0.4711018
# 8 9 10
# 0.7092744 0.3401867 0.2320834
Or...
An alternative approach is to include the interactions in your model-fitting data by calculating the product of the interactive terms, and then include this in your new data as well. We've done the first step in point 1 above, where we created a column called x1x2.
Then we would fit the model with: lm(y ~ x1 + x2 + x1x2, X)
And predict to the following data, where the interaction column must be the product of the new x1 and x2:
X.predict <- data.frame(x1=runif(10), x2=runif(10))
X.predict$x1x2 <- X.predict$x1 * X.predict$x2
If you have categorical variables involved in interactions...
When you have interactions involving categorical variables, the model estimates coefficients describing the effect of belonging to each level relative to belonging to a reference level. So, for instance, if we have one continuous predictor (x1) and one categorical predictor (x2, with levels a, b, and c), then the model y ~ x1 * x2 will estimate six coefficients (see the short demonstration after this list), describing:
the intercept (i.e. the predicted y when x1 is zero and the observation belongs to the reference level of x2);
the effect of varying x1 when the observation belongs to the reference level of x2 (i.e. the slope, for the reference level of x2);
the effect of belonging to the second level (i.e. the change in intercept due to belonging to the second level, relative to belonging to the reference level);
the effect of belonging to the third level (i.e. the change in intercept due to belonging to the third level, relative to belonging to the reference level);
the change in the effect of x1 (i.e. change in slope) due to belonging to the second level, relative to belonging to the reference level; and
the change in the effect of x1 (i.e. change in slope) due to belonging to the third level, relative to belonging to the reference level.
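To see those six coefficients by name, here is a quick illustrative fit on throwaway data (separate from the example that follows):
set.seed(1)
d6 <- data.frame(x1 = runif(100),
                 x2 = sample(letters[1:3], 100, replace = TRUE),
                 y  = rnorm(100))
names(coef(lm(y ~ x1 * x2, d6)))
# [1] "(Intercept)" "x1" "x2b" "x2c" "x1:x2b" "x1:x2c"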
If you want to fit and predict the model with/to pre-calculated data describing the interaction, you can create a dataframe that includes columns: x1; x2b (binary, indicating whether the observation belongs to level b); x2c (binary, indicating whether the observation belongs to level c); x1x2b (the product of x1 and x2b); and x1x2c (the product of x1 and x2c).
A quick way to do this is with model.matrix:
set.seed(1)
n <- 1000
d <- data.frame(x1=runif(n), x2=sample(letters[1:3], n, replace=TRUE))
head(d)
# x1 x2
# 1 0.2655087 b
# 2 0.3721239 c
# 3 0.5728534 b
# 4 0.9082078 c
# 5 0.2016819 a
# 6 0.8983897 a
X <- model.matrix(~x1*x2, d)
head(X)
# (Intercept) x1 x2b x2c x1:x2b x1:x2c
# 1 1 0.2655087 1 0 0.2655087 0.0000000
# 2 1 0.3721239 0 1 0.0000000 0.3721239
# 3 1 0.5728534 1 0 0.5728534 0.0000000
# 4 1 0.9082078 0 1 0.0000000 0.9082078
# 5 1 0.2016819 0 0 0.0000000 0.0000000
# 6 1 0.8983897 0 0 0.0000000 0.0000000
b <- rnorm(6) # coefficients
y <- X %*% b + rnorm(n, sd=0.1)
You can rename the columns of X to whatever you want, as long as you use consistent naming when predicting the model to new data later.
Now fit the model. Here I tell lm not to calculate an intercept (with -1), since the variable (Intercept) already exists in X and will have a coefficient calculated for it. We could have also done this by fitting to data as.data.frame(X[, -1]):
(M <- lm(y ~ . - 1, as.data.frame(X)))
# Call:
# lm(formula = y ~ . - 1, data = as.data.frame(X))
#
# Coefficients:
# `(Intercept)` x1 x2b x2c `x1:x2b` `x1:x2c`
# 1.14389 1.09168 -0.88879 0.20405 0.09085 -1.63769
Create some new data to predict to, and carry out the prediction:
d.predict <- expand.grid(x1=seq(0, 1, 0.1), x2=letters[1:3])
X.predict <- model.matrix(~x1*x2, d.predict)
y.predict <- predict(M, as.data.frame(X.predict))
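As a quick sanity check (an illustrative addition), the predictions should reproduce the new design matrix multiplied by the fitted coefficients:
# predict() should match X.predict %*% coef(M), up to numerical tolerance
all.equal(unname(y.predict), c(X.predict %*% coef(M)))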
I am using npcdens from the np package to construct a conditional density of y on covariates x. However, I need the derivative of the log of this density with respect to y. Is there some way in R to get this?
bw <- npcdensbw(formula=y ~ x1+x2+x3)
fhat <- npcdens(bws=bw,gradients=TRUE)
grad.fhat <- gradients(npcdens(bws=bw,gradients=TRUE))
which returns the gradient with respect to x1, x2 and x3
Can we use this example dataset?
dta = data.frame(expand.grid(x1=1:5,x2=2:6,x3=5:10))
dta$y = with(dta,x1+2*x2 + 3*x3^2)
head(dta)
x1 x2 x3 y
1 1 2 5 80
2 2 2 5 81
3 3 2 5 82
4 4 2 5 83
5 5 2 5 84
6 1 3 5 82
y is the value of the "density". Estimate a conditional bandwidth object:
bw <- npcdensbw(formula = y ~ x1+x2+x3,data=dta)
and look at the gradients
head(gradients(npcdens(bws=bw,gradients=TRUE)))
[,1] [,2] [,3]
[1,] -2.024422e-15 -2.048994e-50 -1.227563e-294
[2,] -1.444541e-15 -1.994174e-50 -1.604693e-294
[3,] -1.017979e-31 -1.201719e-50 -1.743784e-294
[4,] 1.444541e-15 -6.753912e-64 -1.604693e-294
[5,] 2.024422e-15 1.201719e-50 -1.227563e-294
[6,] -2.024422e-15 -3.250713e-50 -1.227563e-294
What do you mean by "derivative with respect to y"? This is a function g(x1, x2, x3), so you can only take derivatives w.r.t. those 3 dimensions. Concerning the "log of y" part of your question, could this be it?
bw <- npcdensbw(formula = log(y) ~ x1 + x2 + x3,data=dta)
I've never used this package, so these are the thoughts of a non-practitioner. I guess you looked at the examples in help(npcdensbw)?
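If the goal really is d log f(y|x)/dy, one workaround is a finite-difference approximation, re-evaluating the fitted conditional density at shifted values of y. This is an untested sketch from the same non-practitioner standpoint, assuming npcdens accepts evaluation data through its documented exdat/eydat arguments:
library(np)
h <- 1e-2  # step size; tune to the scale of y
f_lo <- fitted(npcdens(bws = bw, exdat = dta[, c("x1", "x2", "x3")],
                       eydat = data.frame(y = dta$y - h)))
f_hi <- fitted(npcdens(bws = bw, exdat = dta[, c("x1", "x2", "x3")],
                       eydat = data.frame(y = dta$y + h)))
dlogf_dy <- (log(f_hi) - log(f_lo)) / (2 * h)  # central difference of the log-density
head(dlogf_dy)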