Mathematical functions in lavaan's lavTestWald function - r

I'm using the lavaan package in R and want to use lavaan::lavTestWald to test the fit of a model under linear constraints. This test is part of a loop, so there are a large number of models where I want to test these constraints.
As part of the test, I want to set the absolute value of two quantities to be equal. Is there a way to do this? I know about R's abs() function but haven't been able to figure out how to incorporate abs() into lavaan::lavTestWald.
Here's a reproducible example.
HS.model <- ' visual =~ x1 + a*x2 + b*x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9'
fit <- cfa(HS.model, data = HolzingerSwineford1939)
lavTestWald(fit, "abs(a) == abs(b)") # I want something like this

I don't know why, but the reason that your code is not working as expected seems related to abs(). This worked for me:
lavTestWald(fit, "sqrt(a^2) == sqrt(b^2)")
Note that you could also define new parameters in your model statement and test those, but I don't know if that makes any difference in your situation.
HS.model <- ' visual =~ x1 + a*x2 + b*x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
# new1 := abs(a) - abs(b)
new2 := sqrt(a^2) - sqrt(b^2)
'
fit <- cfa(HS.model, data = HolzingerSwineford1939)
lavTestWald(fit, "new2 == 0")

Related

Error Message in Francis Huangs Article on Multilevel Confirmatory Factor Analysis

I am running a Mulitlevel Confirmatory Factorial Analysis in R based on the paper of Francis Huang. Iam working with the syntax from the article and the example data set used in the article.
Here is the Article: http://web.missouri.edu/~huangf/data/mcfa/MCFAinRHUANG.pdf
Watch out! The webadress for downloading the data and the R function is obsolete in the paper. In the syntax below the webadresses are correct.
load lavaan package
Library(lavaan)
That’s an r function
source('http://web.missouri.edu/huangf/data/mcfa/mcfa.R')
dataset containing only the grouping variable and the variables of interest.
raw <- read.csv("http://web.missouri.edu/huangf/data/mcfa/raw.csv")
the function is applied to the dataset specifying that sid (school id) is the clustering variable
x <- mcfa.input("sid", raw)
STEP 1
one factor model and two factor model with results
onefactor <- 'f1 =~ x1 + x2 + x3 + x4 + x5 + x6'
twofactor <- 'f1 =~ x1 + x2 + x4; f2= ~x3 + x5 + x6'
results1 <- cfa(onefactor, sample.cov = x$pw.cov, sample.nobs = x$n - x$G)
summary(results1, fit.measures = T, standardized = T)
results2 <- cfa(twofactor, sample.cov = x$pw.cov, sample.nobs = x$n - x$G)
summary(results2, fit.measures = T, standardized = T)
STEP 2
A null model is specified where both the SPW and SB matrices are used in a
multigroup setup using the factor structure defined at step 1 on both matrices with all equality constraints set to be equal. We do not really have two groups but the multigroup setup will be used to analyze both the 'within group' and the 'between group' matrices simultaneously. In lavaan, multiple input covariance matrices and the sample sizes for each are stored in a list object:
combined.cov <- list(within = x$pw.cov, between = x$b.cov)
combined.n <- list(within = x$n - x$G, between = x$G)
The first object in the list refers to group one and the second object refers to group two and we create two new objects (i.e., combined.cov and combined.n) that contain the two covariance matrices (i.e., SPW and SB) and the sample size for each (n - G and G, respectively).
Next, a model imposing the equality constraints must be specified. In this step, the model specification expands quite a bit. In lavaan, the equality constrains are imposed for the par-
ticular variable by indicating c(a,a)*variable where c() is the concatenate function, a is
a label assigned by the user to indicate that loading a for group one is set to be equal for
loading a in group two. The same label names instruct lavaan to use the same estimates be-
tween groups or in other words, specify equality constraints. To specify equal factor loadings
for both factors for the within and between models, we indicate: f1 =~ x1 + c(a,a)*x2 +
c(b,b)*x4; f2 =~ x3 + c(c,c)*x5 + c(d,d)*x6. The loadings for x1 and x3 are auto matically set to 1 so do not need to be specified.
nullmodel <- '
+ f1 =~ x1 + c(a,a)*x2 + c(b,b)*x4
+ f2 =~ x3 + c(c,c)*x5 + c(d,d)*x6
+ x1 ~~ c(e,e)*x1
+ x2 ~~ c(f,f)*x2
+ x3 ~~ c(g,g)*x3
+ x4 ~~ c(h,h)*x4
+ x5 ~~ c(i,i)*x5
+ x6 ~~ c(j,j)*x6
+ f1 ~~ c(k,k)*f1
+ f2 ~~ c(l,l)*f2
+ f1 ~~ c(m,m)*f2
+ '
results3 <- cfa(nullmodel, sample.cov = combined.cov,
+ sample.nobs = combined.n)
Thats the error message
Error: Unexpected '=' in:
"results3 <- cfa(nullmodel, sample.cov = combined.cov, + sample.nobs ="
With my data I am receiving the same error message.
Any ideas? Help is appreciated.
Best regards Konstantin

Model fit for Lavaan CFA. What is the problem with this code?

I have implemented a confirmatory factor analysis using the lavaan package in R. The code I wrote produces the following error:
lavaan 0.6-7 did NOT end normally after 2044 iterations
Could someone kindly give me some help, please?
##PACKAGES
library(lavaan)
library(semTools)
library(semPlot)
library(readxl)
library(dplyr)
##DATASET
dataset_ita <- read_excel("dataset_ita.xlsx", col_types = "numeric")
dataset_eng <- read_excel("dataset_eng.xlsx", col_types = "numeric")
dataset <- rbind(dataset_ita, dataset_eng)
##CONFIRMATORY FACTOR ANALYSIS
model <- '
WinePurchaseBehaviour =~ X5+X6+X7+X8+X9+X10+X11
WineConsumptionBehaviour =~ X12+X13+X14+X15+X16+X17+X18+X19+X20+X21+X22+X23+X24+X25
WineClubInterest =~ X26+X27+X28+X29+X30+X31+X64+X65+X66+X67
WineInvolvement =~ WinePurchaseBehaviour+WineConsumptionBehaviour+WineClubInterest
GeneralInvolvement =~ X32+X33+X34+X35+X36+X37+X38+X39+X40
FeatureInvolvement =~ X44+X46+X47+X48+X49
RitualInvolvement =~ X41+X42+X43+X45+X50
AppExperience =~ X51+X52+X53+X54+X55+X56+X57+X58
SensoryExperience =~ X59+X60+X61+X62+X63
ProductInvolvement =~ GeneralInvolvement+FeatureInvolvement+AppExperience+SensoryExperience
Purchase =~ X68+X69+X70+X71+X72+X73
Purchase =~ WineInvolvement+ProductInvolvement'
analysis <- cfa(model, data = dataset, se = "robust.sem")
summary(analysis, fit.measures=TRUE)
The datasets can be found here
I took a quick look at the data and it seems that RitualInvolvement factor just doesn't work - it doesn't work on its own even. The FeatureInolvement factor worked better using X47 as the coefficient set to 1, so use that first in the list. The model below, without estimating the RitualInvolvement factor estimates and converges for me.
model <- '
WinePurchaseBehaviour =~ X5+X6+X7+X8+X9+X10+X11
WineConsumptionBehaviour =~ X12+X13+X14+X15+X16+X17+X18+X19+X20+X21+X22+X23+X24+X25
WineClubInterest =~ X26+X27+X28+X29+X30+X31+X64+X65+X66+X67
WineInvolvement =~ WinePurchaseBehaviour+WineConsumptionBehaviour+WineClubInterest
GeneralInvolvement =~ X32+X33+X34+X35+X36+X37+X38+X39+X40
FeatureInvolvement =~ X47+X46+X44+X48+X49
# RitualInvolvement =~ X41+X42+X43+X45+X50
AppExperience =~ X51+X52+X53+X54+X55+X56+X57+X58
SensoryExperience =~ X59+X60+X61+X62+X63
ProductInvolvement =~ GeneralInvolvement+FeatureInvolvement+AppExperience+SensoryExperience
Purchase =~ X68+X69+X70+X71+X72+X73
Purchase =~ WineInvolvement+ProductInvolvement'
analysis <- cfa(model, data = dataset, se = "robust.sem")

How to summarize the elements in a complex object?

I have a very complex S4 object (output from a lavaan model) which has slots within slots within slots and variables ($) at the deepest level of each of the deepest slots. How do I extract and store the object.size(and potentially other functions like length and dim and the object name) of every element within this object so that I can compare it to another object of the same class?
I have tried storing the output from str(obj) and unclass(obj) and then manipulating the output to extract the information I want, but it's turning out to be very tedious. Looping over names is equally difficult. Is there a way to "flatten" the object into a list? Is there a recursive function anyone can think of to repeatedly dig into each slot?
Edit
Here's an example, using the lavaan package I referenced above, though ideally the solution shouldn't be dependent on the specific object class and could work across classes:
library(lavaan)
model <- '
# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'
fit <- sem(model, data=PoliticalDemocracy)
The object fit contains many slots and objects inside. I can, of course, extract information from a particular element like object.size(fit#Data#X[[1]]), but I'm looking for a generalized solution. The challenge is that I want to extract the same information about each element, regardless of its "depth".
Thanks!
seems like purrr package might be of help here, especially functions like flatten, transpose, map / at_depth - combined with char vectors as input you can easily extract stuff from deeply nested lists. for example, you could write down the "extractor" functions that you need separatly, then store them all in a list and use invoke (also from purrr) with your object as sole arg or invoke_map on many such objects.
Edit
Here is some code to help you extract object.size(fit#Data#X[[1]]) from one or many lavaan objects. Since the slots you are interested in, in practice are most probably at different depths, my guess is that there is no easy general solution.
The idea is that once you know the exact elements you are interested in - it is fairly straightforward to code up some helper functions to manipulate single/multiple such objects. The functions that I mentioned above provide friendly shortcuts for achieving this.
Let me know if i can be of further help.
library("lavaan")
library("tidyverse")
model <- '
# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'
# say you have 3 different models
fit1 <- sem(model, data=PoliticalDemocracy)
fit2 <- sem(model, data=PoliticalDemocracy)
fit3 <- sem(model, data=PoliticalDemocracy)
# S4 objects have function slot() function for accessing so
object.size(fit#Data#X[[1]])
# becomes
slot(slot(fit, "Data"), "X")[[1]]
# since fit is an S4 object - you have to wrap it up
# in a list to manipulate it easier.
# above code becomes :
list(fit1) %>%
map(~ slot(., "Data")) %>%
map(~ slot(., "X")) %>%
flatten %>%
map(object.size)
# wrap up the above code in a helper function ...
extr_obj_size <- function(lavaan_fit) {
list(lavaan_fit) %>%
map(~ slot(., "Data")) %>%
map(~ slot(., "X")) %>%
map(object.size)
}
extr_obj_size(fit1)
# which you can further wrap up in a function that operates
# on a vector of such objects
extr_multiple_obj_size <- function(vec_lavaan_fits) {
vec_lavaan_fits %>%
map(extr_obj_size) %>%
flatten
}
c(fit,fit2,fit3) %>% extr_multiple_obj_size
Edit2
I don't know how helpful would the following code be in general but i jumbled up something that, given that you know the name of the slot you are interested in - would check at depth 1 and 2 and return you the corresponding value.
fit <- sem(model, data=PoliticalDemocracy)
slot_of_interest <- "eq.constraints"
# slot names at depth 1
depth1 <- names(getSlots("lavaan"))
# slot names at depth 2
depth2 <- depth1 %>% map(~ slotNames(slot(fit,.)))
# helper fun to check if a slot name of interest is inside a slot
in_slot <- function(x) slot_of_interest %in% x
# so if its at depth 1 - then just map slot()-function with that name
if (slot_of_interest %in% depth1) {
list(fit1) %>% map(~slot(., slot_of_interest))
} else {
# otherwise you would need to detect at which index at depth2 does this name appear
index1 <- depth2 %>% detect_index(in_slot)
# and map first the slot-name at that index - then the corresponding slot of interest
list(fit1) %>% map(~ slot(., depth1[index1])) %>% map(~ slot(., slot_of_interest))
}

Using a variable in update() in R to update formula

I am trying to add term to a model formula in R. This is straightforward to do using update() if I enter the variable name directly into the update function. However it does not work if the variable name is in a variable.
myFormula <- as.formula(y ~ x1 + x2 + x3)
addTerm <- 'x4'
#Works: x4 is added
update(myFormula, ~ . + x4)
Output: y ~ x1 + x2 + x3 + x4
#Does not work: "+ addTerm" is added instead of x4 being removed
update(myFormula, ~ . + addTerm)
Output: y ~ x1 + x2 + x3 + addTerm
Adding x4 via the variable can be done in a slightly more complex way.
formulaString <- deparse(myFormula)
newFormula <- as.formula(paste(formulaString, "+", addTerm))
update(newFormula, ~.)
Output: y ~ x1 + x2 + x3 + x4
Is there a way to get update() to do this directly without needing these extra steps? I've tried paste, parse, and the other usual functions and they don't work.
For example, if paste0 is used the output is
update(myFormula, ~ . + paste0(addTerm))
Output: y ~ x1 + x2 + x3 + paste0(addTerm)
Does anybody have any recommendations on how to use a variable in update()?
Thanks
You can probably just do:
update(myFormula, paste("~ . +",addTerm))

Remove perfectly multicollinear variables from data frame

I have a data frame with variables, of which some contain the same information
x1 = runif(1000)
x2 = runif(1000)
x3 = x1 + x2
x4 = runif(1000)
x5 = runif(1000)*0.00000001 +x4
x6 = x5 + x3
x = data.frame(x1, x2, x3, x4, x5, x6)
In a next step I want to rid myself of all variables which are perfectly multicollinear, e.g. column x3 and x6 (there might be also other combinations).
In Stata this is fairly easy: _rmcoll varlist
How is this efficiently done in R?
EDIT:
Note that the ultimate goal is to compute the Mahalanobis distance between observations. For this I need to drop redunant variables. And as far as I can foresee, for this application it would not matter whether I drop x1, x2 or x3
I don't know of a built-in convenience function, but QR decomposition will do it.
We need the data frame to be a matrix:
X <- as.matrix(x)
Use a slightly lower than default tolerance to keep the slightly-non-multicollinear column:
qr.X <- qr(X, tol=1e-9, LAPACK = FALSE)
(rnkX <- qr.X$rank) ## 4 (number of non-collinear columns)
(keep <- qr.X$pivot[seq_len(rnkX)])
## 1 2 4 5
X2 <- X[,keep]
This strictly answers your question; you might also be able to use singular value decomposition (svd()) to implement Mahalanobis distances directly on this type of data ...
For completeness I post the quick-and-dirty solution I was using until now. I actually think it does not perform that badly compared to other methods.
x1 = runif(1000)
x2 = runif(1000)
x3 = x1 + x2
x4 = runif(1000)
x5 = runif(1000)*0.00000001 +x4
x6 = x5 + x3
x = data.frame(x1, x2, x3, x4, x5, x6)
const = rep(1,1000)
a<-lm(const ~ ., data=x)
names(a$coefficients[!is.na(a$coefficients)])[c(-1)]

Resources