How do you code covaried errors for an SEM in R?

I am re-analyzing data from a published paper using their correlation matrix and reconstructing their SEM. However, I do not know how to code the errors. Specifically, I am attempting to code the errors on the "Empathy" and "Teaching Self-Efficacy" variables in the SEM (pictured in the original post). How do you code the errors?
Below is my code:
library(MVN)
library(lavaan)
library(haven)
library(semPlot)
library(lavaanPlot)
Goroshit_Hen_cor_mat <- matrix(c(1, 0.11, -0.12, -0.1, -0.09, 0.03,
0.11, 1, .3, -0.01, -0.05, 0.06,
-0.12, 0.3, 1, 0.21, 0.23, 0.24,
-0.1, -0.01, 0.21, 1, 0.56, 0.53,
-0.09, -0.05, 0.23, 0.56, 1, 0.38,
0.03, 0.06, 0.24, 0.53, 0.38, 1),
nrow = 6, ncol = 6)
var_names = c("Gender", "Degree", "Years_Teaching", "Emotional_SE", "Empathy", "Teaching_SE")
Goroshit_Hen_cor_mat
var_sds <- c(0.47, 0.42, 10.19, 0.48, 0.43, 0.55)
Goroshit_Hen_cov <- cor2cov(Goroshit_Hen_cor_mat, var_sds, names = var_names)
Goroshit_Hen_cov
Goroshit_Hen_SEM <-"Empathy ~ Emotional_SE
Empathy ~ Gender
Empathy ~ Degree
Empathy ~ Years_Teaching
Teaching_SE ~ Emotional_SE
Teaching_SE ~ Gender
Teaching_SE ~ Degree
Teaching_SE ~ Years_Teaching
Emotional_SE ~~ Gender
Emotional_SE ~~ Degree
Emotional_SE ~~ Years_Teaching
Gender ~~ Degree
Gender ~~ Years_Teaching
Degree ~~ Years_Teaching
Empathy ~~ Teaching_SE
Empathy ~~ Empathy
Teaching_SE ~~ Teaching_SE
"
Goroshit_Hen_SEM_fit <- sem(Goroshit_Hen_SEM, sample.cov = Goroshit_Hen_cov, sample.nobs = 273)
semPaths(Goroshit_Hen_SEM_fit, "path", whatLabels = "est", edge.label.cex = 1, intercepts = FALSE, residuals = TRUE, curve = 1, curvature = 2, nCharNodes = 8, sizeMan = 6, sizeMan2 = 3, optimizeLatRes = T, rotation = 3, edge.color = "#000000")
summary(Goroshit_Hen_SEM_fit)

All double-headed arrows are specified with the ~~ operator, as explained on the ?model.syntax help page and in the lavaan tutorial. So you can add this line to your existing syntax:
Empathy ~~ Teaching_SE
But if you merely want to free that parameter, this is not necessary when using the sem() or cfa() wrappers, which call lavaan() with the setting auto.cov.y = TRUE (see ?lavOptions for a description). You should already see the parameter estimate in your summary() output.
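For instance, once the model is fitted, a quick way to confirm that the residual covariance is being estimated is to filter the parameter table for ~~ rows (a minimal check using the fit object from the code above):
# The Empathy ~~ Teaching_SE row is the residual covariance between the two outcomes
pe <- parameterEstimates(Goroshit_Hen_SEM_fit)
subset(pe, op == "~~" & lhs == "Empathy" & rhs == "Teaching_SE")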

Related

How to fit exponential model in R and print correct y=ab^(x) equation

I'm trying to fit an exponential model to this dataset.
y <- c(0.04, 0.04, 0.03, 0.03, 0.04, 0.03, 0.02, 0.03, 0.03, 0.02, 0.08, 0.04, 0.04, 0.07, 0.04, 0.05, 0.12, 0.05, 0.13, 0.11, 0.11, 0.33, 0.03, 0.08)
x <- c(3.75, 4.25, 1.77, 4.24, 2.99, 3.82, 1.85, 3.17, 2.64, 2.10, 4.23, 3.81, 3.55, 3.73, 3.85, 4.31, 4.35, 3.80, 7.26, 5.91, 8.15, 8.56, 7.49, 8.12)
df <- data.frame(x, y)
library(ggplot2)
library(ggpmisc)  # provides stat_poly_eq()
ggplot(data = df, aes(x=x,y=y)) +
geom_point(size = 3) +
stat_smooth(method = "lm", formula = y ~ exp(x))+
stat_poly_eq(label.x=0.1, label.y=0.85,
aes(x=x,y=y,label = paste(..eq.label..)), formula = y ~ exp(x),
parse = TRUE, size = 3.5)+
stat_poly_eq(label.x=0.1, label.y=0.8,
aes(x=x,y=y,label = paste(..rr.label..)), formula = y ~ exp(x),
parse = TRUE, size = 3.5)+
theme_classic()
I'd also need to plot it, and so far I have been able to fit a proper smooth together with a correct R², I think. However, I can't seem to print the correct exponential function on the plot, at least by using the stat_poly_eq() function.
It only seems to be able to print the equation in a linear form, although I specify the formula = y ~ exp(x) argument.
Does anyone know how I could get the right exponential function on the plot?
Thank you!
Here is a solution:
define a format string, eq_fmt, to make the plot code easier to read;
use the coefficient names b_0 and b_1 as shown below (note this will not write the equation as a*b^x; the base is e, the base of natural logarithms);
and set output.type = "numeric".
library(ggplot2)
library(ggpmisc)
#> Loading required package: ggpp
#>
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#>
#> annotate
y <- c(0.04, 0.04, 0.03, 0.03, 0.04, 0.03, 0.02, 0.03, 0.03, 0.02, 0.08, 0.04, 0.04, 0.07, 0.04, 0.05, 0.12, 0.05, 0.13, 0.11, 0.11, 0.33, 0.03, 0.08)
x <- c(3.75, 4.25, 1.77, 4.24, 2.99, 3.82, 1.85, 3.17, 2.64, 2.10, 4.23, 3.81, 3.55, 3.73, 3.85, 4.31, 4.35, 3.80, 7.26, 5.91, 8.15, 8.56, 7.49, 8.12)
df <- data.frame(x, y)
eq_fmt <- "`y`~`=`~%.3g~italic(e)^{%.3g~`x`}"
ggplot(data = df, aes(x=x,y=y)) +
geom_point(size = 3) +
stat_smooth(method = "lm", formula = y ~ exp(x))+
stat_poly_eq(mapping = aes(x = x, y = y,
label = sprintf(eq_fmt,
after_stat(b_0),
after_stat(b_1))),
label.x = 0.1, label.y = 0.85,
formula = y ~ exp(x),
output.type = "numeric",
parse = TRUE
) +
stat_poly_eq(label.x=0.1, label.y=0.8,
aes(x=x,y=y,label = paste(..rr.label..)), formula = y ~ exp(x),
parse = TRUE, size = 3.5)+
theme_classic()
Created on 2022-09-22 with reprex v2.0.2
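As a side note (not part of the answer above): if what you ultimately want is an equation literally of the form y = a*b^x, one common alternative is to fit a log-linear model and back-transform the coefficients. A minimal sketch with the same data, assuming all y values are positive:
# Fit log(y) = log(a) + log(b) * x, then back-transform to get a and b
log_fit <- lm(log(y) ~ x, data = df)
a <- exp(coef(log_fit)[[1]])
b <- exp(coef(log_fit)[[2]])
c(a = a, b = b)  # y is approximately a * b^x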

How to estimate an SUR model in R with factors to be projected out and clustered standard errors?

I want to estimate an SUR (Seemingly Unrelated Regressions) model.
I tried using systemfit and its Zelig wrapper, but I am not able to work out how to specify factors to be projected out (i.e., add fixed effects) and how to cluster the standard errors, as we do in felm().
Also, if I simply add the fixed-effect variables to my regression equations, I get the following error:
Error in LU.dgC(a) : cs_lu(A) failed: near-singular A (or out of memory)
Thank you so much for your help!
I am adding a data sample from my data:
Y_var1 <- c(0.45, 0.40, 0.30, 0.40, 0.15, 0.35, 0.50, 0.55, 0.10, 0.15, 0.30, 0.10)
Y_var2 <- c(0.40, 0.25, 0.45, 0.30, 0.35, 0.25, 0.15, 0.25, 0.35, 0.30, 0.20, 0.15)
X_var1 <- c(0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
X_var2 <- c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0)
X_var3 <- c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1)
X_var4 <- c(0.18, 0.18, 0.18, 0.20, 0.20, 0.20, 0.22, 0.22, 0.22, 0.24, 0.24, 0.24)
X_var5 <- c(0.08, 0.08, 0.08, 0.06, 0.06, 0.06, 0.04, 0.04, 0.04, 0.02, 0.02, 0.02)
X_var6 <- c(-0.25, -0.25, -0.25, 1.30, 1.30, 1.30, 1.80, 1.80, 1.80, 2.25, 2.25, 2.25)
X_var7 <- c(1000, 1000, 1000, 1500, 1500, 1500, 2000, 2000, 2000, 2500, 2500, 2500)
X_var8 <- c('ABC', 'ABC', 'ABC', 'MNO', 'MNO', 'MNO', 'DEF', 'DEF', 'DEF', 'XYZ', 'XYZ', 'XYZ')
X_var9 <- c(2000, 2010, 2020, 2000, 2010, 2020, 2000, 2010, 2020, 2000, 2010, 2020)
sample_data <- data.frame(Y_var1, Y_var2, X_var1, X_var2, X_var3, X_var4, X_var5, X_var6, X_var7, X_var8, X_var9)
library(systemfit)
formula <- list(mu1 = Y_var1 ~ X_var1*X_var3 + X_var2*X_var3 + X_var4 + X_var5 + X_var6 + log(X_var7),
mu2 = Y_var2 ~ X_var1*X_var3 + X_var2*X_var3 + X_var4 + X_var5 + X_var6 + log(X_var7))
fitsur <- systemfit(formula = formula, data=sample_data, method = "SUR")
fitols <- systemfit(formula = formula, data=sample_data, method = "OLS")
(Since this is only a small sample dataset, the two regressions above will give the error mentioned above, but they work fine on my actual data.)
However, what I am interested in is estimating the above formula using SUR, with X_var8 and X_var9 fixed effects and standard errors clustered at X_var8 level.
If we use felm(), the specification would be:
felm(Y_var1 ~ X_var1*X_var3 + X_var2*X_var3 + X_var4 + X_var5 + X_var6 + log(X_var7) | X_var8 + X_var9 | 0 | X_var8, data = sample_data)
However, as my standard errors are correlated across equations, I need to use SUR.
Any help would be much appreciated. Thank You!
I think I now see how to implement fixed effects correctly in the SUR model:
we need to transform X_var8 to numeric first with one-hot encoding, and I also create new variables based on your interaction terms above.
library(data.table)
library(mltools)
sample_data2 <- as.data.frame(one_hot(as.data.table(sample_data)))
sample_data2$X_var13 <- sample_data2$X_var1 * sample_data2$X_var3
sample_data2$X_var23 <- sample_data2$X_var2 * sample_data2$X_var3
Check closely the values of sample_data2$X_var13 and sample_data2$X_var23:
sample_data2$X_var13
[1] 0 0 0 0 0 0 0 0 0 0 0 0
sample_data2$X_var23
[1] 0 0 0 0 0 1 0 0 0 0 0 0
Since all values of sample_data2$X_var13 are 0 for this sample data, it will also trigger the Error in LU.dgC(a) : cs_lu(A) failed: near-singular A (or out of memory) error, because the column carries no information. We can discard it here, but feel free to keep it with your real data.
Make the formula with the fixed effects added:
formula <- list(mu1 = Y_var1 ~ X_var23 + X_var4 + X_var5 + X_var6 + log(X_var7) + X_var8_ABC + X_var8_DEF + X_var8_MNO + X_var8_XYZ + X_var9,
mu2 = Y_var2 ~ X_var23 + X_var4 + X_var5 + X_var6 + log(X_var7) + X_var8_ABC + X_var8_DEF + X_var8_MNO + X_var8_XYZ + X_var9)
Fit the SUR model and produce the summary:
fitsur <- systemfit(formula = formula, data=sample_data2, method = "SUR")
summary(fitsur)
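As a side note, the same dummy (one-hot) encoding can be produced in base R with model.matrix(), avoiding the extra packages; a minimal sketch using the sample data above (the names dummies and sample_data_fe are just illustrative):
# Dummy columns for X_var8, dropping the intercept so every level gets a column
dummies <- model.matrix(~ X_var8 - 1, data = sample_data)
sample_data_fe <- cbind(sample_data, dummies)
head(sample_data_fe)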

Efficient way to calculate average MAPE and MSE in R

I have real data and predicted data, and I want to calculate the overall MAPE and MSE. The data are time series, with each column representing data for a different week. I predict a value for each of the 52 weeks for each of the items, as shown below. What would be the best way to calculate the overall error in R?
real = matrix(
c("item1", "item2", "item3", "item4", .5, .7, 0.40, 0.6, 0.3, 0.29, 0.7, 0.09, 0.42, 0.032, 0.3, 0.37),
nrow=4,
ncol=4)
colnames(real) <- c("item", "week1", "week2", "week3")
predicted = matrix(
c("item1", "item2", "item3", "item4", .55, .67, 0.40, 0.69, 0.13, 0.9, 0.47, 0.19, 0.22, 0.033, 0.4, 0.37),
nrow=4,
ncol=4)
colnames(predicted) <- c("item", "week1", "week2", "week3")
How do you get the predicted values in the first place? The model you use to get the predicted values is probably based on minimising some function of prediction errors (usually MSE). Therefore, if you calculate your predicted values, the residuals and some metrics on MSE and MAPE have been calculated somewhere along the line in fitting the model. You can probably retrieve them directly.
If the predicted values happened to be thrown into your lap and you have nothing to do with fitting the model, then you calculate MSE and MAPE as per below:
You have only one record per week for every item. So for every item, you can only calculate one prediction error per week. Depending on your application, you can choose to calculate the MSE and MAPE per item or per week.
This is what your data looks like:
real <- matrix(
c(.5, .7, 0.40, 0.6, 0.3, 0.29, 0.7, 0.09, 0.42, 0.032, 0.3, 0.37),
nrow = 4, ncol = 3)
colnames(real) <- c("week1", "week2", "week3")
predicted <- matrix(
c(.55, .67, 0.40, 0.69, 0.13, 0.9, 0.47, 0.19, 0.22, 0.033, 0.4, 0.37),
nrow = 4, ncol = 3)
colnames(predicted) <- c("week1", "week2", "week3")
Calculate the (percentage/squared) errors for every entry:
pred_error <- real - predicted
pct_error <- pred_error/real
squared_error <- pred_error^2
Calculate MSE, MAPE:
# For per-item prediction errors
apply(squared_error, MARGIN = 1, mean) # MSE
apply(abs(pct_error), MARGIN = 1, mean) # MAPE
# For per-week prediction errors
apply(squared_error, MARGIN = 2, mean) # MSE
apply(abs(pct_error), MARGIN = 2, mean) # MAPE
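If instead you want a single overall figure pooled across all items and weeks, simply average over the whole error matrix:
# Overall prediction errors, pooled over items and weeks
mean(squared_error)  # overall MSE
mean(abs(pct_error)) # overall MAPE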

Ratio of polynomials approximation

I am trying to fit a polynomial to my dataset (the full dataset is at the end of the post).
The theory predicts that the curve is a ratio of two quadratic polynomials, of the form y = (a + b*x + c*x^2) / (A + B*x + C*x^2), with specific coefficients given in the original post; for x between 0 and 1, that theoretical curve has a single minimum.
When I try to make a linear model in R by doing:
mod <- lm(y ~ poly(x, 2, raw=TRUE)/poly(x, 2))
The curve I get is much different from what I would expect. Have you got any idea how to fit a curve to this data so that it is similar to the one the theory predicts? Also, it should have only one minimum.
Full dataset:
Vector of x values:
x <- c(0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12,
0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22, 0.23, 0.24, 0.25,
0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38,
0.39, 0.40, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50, 0.51,
0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61, 0.62, 0.63, 0.64,
0.65, 0.66, 0.67, 0.68, 0.69, 0.70, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77,
0.78, 0.79, 0.80, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.90,
0.91, 0.92, 0.93, 0.94, 0.95)
Vector of y values:
y <- c(4.104, 4.444, 4.432, 4.334, 4.285, 4.058, 3.901, 4.382,
4.258, 4.158, 3.688, 3.826, 3.724, 3.867, 3.811, 3.550, 3.736, 3.591,
3.566, 3.566, 3.518, 3.581, 3.505, 3.454, 3.529, 3.444, 3.501, 3.493,
3.362, 3.504, 3.365, 3.348, 3.371, 3.389, 3.506, 3.310, 3.578, 3.497,
3.302, 3.530, 3.593, 3.630, 3.420, 3.467, 3.656, 3.644, 3.715, 3.698,
3.807, 3.836, 3.826, 4.017, 3.942, 4.208, 3.959, 3.856, 4.157, 4.312,
4.349, 4.286, 4.483, 4.599, 4.395, 4.811, 4.887, 4.885, 5.286, 5.422,
5.527, 5.467, 5.749, 5.980, 6.242, 6.314, 6.587, 6.790, 7.183, 7.450,
7.487, 8.566, 7.946, 9.078, 9.308, 10.267, 10.738, 11.922, 12.178, 13.243,
15.627, 16.308, 19.246, 22.022, 25.223, 29.752)
Use nls to fit a nonlinear model. Note that the model formula is not uniquely defined as displayed in the question, since multiplying all the coefficients by any number still gives the same predictions; to avoid this we need to fix one coefficient. A first try used the coefficients shown in the question as starting values (except fixing one), but that failed, so a fit without the C term was tried and the resulting coefficients were fed into a second fit with C = 1 as its starting value.
st <- list(a = 43, b = -14, c = 25, B = 18)
fm <- nls(y ~ (a + b * x + c * x^2) / (9 + B * x), start = st)
fm2 <- nls(y ~ (a + b * x + c * x^2) / (9 + B * x + C * x^2), start = c(coef(fm), C = 1))
plot(y ~ x)
lines(fitted(fm2) ~ x, col = "red")
Note: Here is an example of using nls2 to get starting values with random search. We assume that the coefficients each lie between -50 and 50.
library(nls2)
set.seed(123) # for reproducibility
v <- c(a = 50, b = 50, c = 50, B = 50, C = 50)
st0 <- as.data.frame(rbind(-v, v))
fm0 <- nls2(y ~ (a + b * x + c * x^2) / (9 + B * x + C * x^2), start = st0,
alg = "random", control = list(maxiter = 1000))
fm3 <- nls(y ~ (a + b * x + c * x^2) / (9 + B * x + C * x^2), start = coef(fm0))
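As a small usage example (not part of the original answer), the refined fit can be inspected and overlaid on the data in the same way as before:
coef(fm3)  # coefficients of the refined fit
plot(y ~ x)
lines(x, fitted(fm3), col = "blue")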
Since you already have a theoretical prediction, you don't seem to need a new model, and it's really only a plotting task:
png(); plot(y ~ x)
lines(x, mod, col = "blue")  # 'mod' here is assumed to hold the theoretical predictions evaluated at x
dev.off()
You cannot expect lm to produce a good approximation to a nonlinear problem: the denominator involving x in that theoretical expression makes the model inherently nonlinear.

Text to expression in a function for variance estimation of derived parameters via the Delta Method

I have written a function to perform matrix multiplication on each row of the data set pd.matrix. The function my.var.function performs as intended. However, now I want to generalize the function to handle matrices of variable sizes instead of just the example matrix with five columns.
To generalize the function I imagine that I will need to replace x[1], x[2], x[3], x[4], x[5] in the apply statement with something like x[1]:x[ncol(pd.matrix)]. I imagine I similarly will need to replace the two instances of (x1, x2, x3, x4, x5) within the function.
I have tried making these changes with eval(parse(text= followed by paste0 to create the desired series of x1, x2, x3, x4, x5 or x[1], x[2], x[3], x[4], x[5] for this example. However, I have been unable to get eval(parse(text= to work after trying numerous permutations.
How can I generalize the function and apply statement to handle a pd.matrix of n columns rather than five columns?
pd.matrix <- matrix(c(0.10, 0.20, 0.30, 0.40, 0.50,
0.11, 0.21, 0.31, 0.41, 0.51,
0.12, 0.22, 0.32, 0.42, 0.52,
0.13, 0.23, 0.33, 0.43, 0.53,
0.14, 0.24, 0.34, 0.44, 0.54), nrow = 5, byrow = TRUE)
vcv.mat = matrix(c(0.01, 0.0020, 0.0030, 0.0040, 0.0050,
0.0020, 0.02, 0.0031, 0.0041, 0.0051,
0.0030, 0.0031, 0.03, 0.0042, 0.0052,
0.0040, 0.0041, 0.0042, 0.04, 0.0053,
0.0050, 0.0051, 0.0052, 0.0053, 0.05), nrow = 5, byrow = TRUE)
my.var.function <- function(x1, x2, x3, x4, x5) {
my.pd <- matrix(c(x1, x2, x3, x4, x5), nrow = 1)
my.mat = my.pd %*% vcv.mat
my.var = my.mat %*% t(my.pd)
return(my.var = my.var)
}
apply(pd.matrix, 1, function(x) my.var.function(x[1], x[2], x[3], x[4], x[5]))
# [1] 0.0303160 0.0319642 0.0336588 0.0353998 0.0371872
The solution turned out to be very simple. Not sure why I did not see this solution before.
pd.matrix <- matrix(c(0.10, 0.20, 0.30, 0.40, 0.50,
0.11, 0.21, 0.31, 0.41, 0.51,
0.12, 0.22, 0.32, 0.42, 0.52,
0.13, 0.23, 0.33, 0.43, 0.53,
0.14, 0.24, 0.34, 0.44, 0.54), nrow = 5, byrow = TRUE)
vcv.mat = matrix(c(0.01, 0.0020, 0.0030, 0.0040, 0.0050,
0.0020, 0.02, 0.0031, 0.0041, 0.0051,
0.0030, 0.0031, 0.03, 0.0042, 0.0052,
0.0040, 0.0041, 0.0042, 0.04, 0.0053,
0.0050, 0.0051, 0.0052, 0.0053, 0.05), nrow = 5, byrow = TRUE)
my.var.function <- function(x) {
my.pd <- matrix(x, nrow = 1)
my.mat = my.pd %*% vcv.mat
my.var = my.mat %*% t(my.pd)
return(my.var)
}
apply(pd.matrix, 1, my.var.function)
# [1] 0.0303160 0.0319642 0.0336588 0.0353998 0.0371872
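As a side note, the row-by-row computation can also be written as a single matrix expression: the per-row variances are the diagonal of pd.matrix %*% vcv.mat %*% t(pd.matrix). A minimal sketch using the objects above:
# Same values as the apply() call, computed in one step
diag(pd.matrix %*% vcv.mat %*% t(pd.matrix))
# should reproduce: 0.0303160 0.0319642 0.0336588 0.0353998 0.0371872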
