How to summarize the elements in a complex object? - r

I have a very complex S4 object (output from a lavaan model) which has slots within slots within slots and variables ($) at the deepest level of each of the deepest slots. How do I extract and store the object.size(and potentially other functions like length and dim and the object name) of every element within this object so that I can compare it to another object of the same class?
I have tried storing the output from str(obj) and unclass(obj) and then manipulating the output to extract the information I want, but it's turning out to be very tedious. Looping over names is equally difficult. Is there a way to "flatten" the object into a list? Is there a recursive function anyone can think of to repeatedly dig into each slot?
Edit
Here's an example, using the lavaan package I referenced above, though ideally the solution shouldn't be dependent on the specific object class and could work across classes:
library(lavaan)
model <- '
# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'
fit <- sem(model, data=PoliticalDemocracy)
The object fit contains many slots and objects inside. I can, of course, extract information from a particular element like object.size(fit#Data#X[[1]]), but I'm looking for a generalized solution. The challenge is that I want to extract the same information about each element, regardless of its "depth".
Thanks!

seems like purrr package might be of help here, especially functions like flatten, transpose, map / at_depth - combined with char vectors as input you can easily extract stuff from deeply nested lists. for example, you could write down the "extractor" functions that you need separatly, then store them all in a list and use invoke (also from purrr) with your object as sole arg or invoke_map on many such objects.
Edit
Here is some code to help you extract object.size(fit#Data#X[[1]]) from one or many lavaan objects. Since the slots you are interested in, in practice are most probably at different depths, my guess is that there is no easy general solution.
The idea is that once you know the exact elements you are interested in - it is fairly straightforward to code up some helper functions to manipulate single/multiple such objects. The functions that I mentioned above provide friendly shortcuts for achieving this.
Let me know if i can be of further help.
library("lavaan")
library("tidyverse")
model <- '
# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'
# say you have 3 different models
fit1 <- sem(model, data=PoliticalDemocracy)
fit2 <- sem(model, data=PoliticalDemocracy)
fit3 <- sem(model, data=PoliticalDemocracy)
# S4 objects have function slot() function for accessing so
object.size(fit#Data#X[[1]])
# becomes
slot(slot(fit, "Data"), "X")[[1]]
# since fit is an S4 object - you have to wrap it up
# in a list to manipulate it easier.
# above code becomes :
list(fit1) %>%
map(~ slot(., "Data")) %>%
map(~ slot(., "X")) %>%
flatten %>%
map(object.size)
# wrap up the above code in a helper function ...
extr_obj_size <- function(lavaan_fit) {
list(lavaan_fit) %>%
map(~ slot(., "Data")) %>%
map(~ slot(., "X")) %>%
map(object.size)
}
extr_obj_size(fit1)
# which you can further wrap up in a function that operates
# on a vector of such objects
extr_multiple_obj_size <- function(vec_lavaan_fits) {
vec_lavaan_fits %>%
map(extr_obj_size) %>%
flatten
}
c(fit,fit2,fit3) %>% extr_multiple_obj_size
Edit2
I don't know how helpful would the following code be in general but i jumbled up something that, given that you know the name of the slot you are interested in - would check at depth 1 and 2 and return you the corresponding value.
fit <- sem(model, data=PoliticalDemocracy)
slot_of_interest <- "eq.constraints"
# slot names at depth 1
depth1 <- names(getSlots("lavaan"))
# slot names at depth 2
depth2 <- depth1 %>% map(~ slotNames(slot(fit,.)))
# helper fun to check if a slot name of interest is inside a slot
in_slot <- function(x) slot_of_interest %in% x
# so if its at depth 1 - then just map slot()-function with that name
if (slot_of_interest %in% depth1) {
list(fit1) %>% map(~slot(., slot_of_interest))
} else {
# otherwise you would need to detect at which index at depth2 does this name appear
index1 <- depth2 %>% detect_index(in_slot)
# and map first the slot-name at that index - then the corresponding slot of interest
list(fit1) %>% map(~ slot(., depth1[index1])) %>% map(~ slot(., slot_of_interest))
}

Related

Error Message in Francis Huangs Article on Multilevel Confirmatory Factor Analysis

I am running a Mulitlevel Confirmatory Factorial Analysis in R based on the paper of Francis Huang. Iam working with the syntax from the article and the example data set used in the article.
Here is the Article: http://web.missouri.edu/~huangf/data/mcfa/MCFAinRHUANG.pdf
Watch out! The webadress for downloading the data and the R function is obsolete in the paper. In the syntax below the webadresses are correct.
load lavaan package
Library(lavaan)
That’s an r function
source('http://web.missouri.edu/huangf/data/mcfa/mcfa.R')
dataset containing only the grouping variable and the variables of interest.
raw <- read.csv("http://web.missouri.edu/huangf/data/mcfa/raw.csv")
the function is applied to the dataset specifying that sid (school id) is the clustering variable
x <- mcfa.input("sid", raw)
STEP 1
one factor model and two factor model with results
onefactor <- 'f1 =~ x1 + x2 + x3 + x4 + x5 + x6'
twofactor <- 'f1 =~ x1 + x2 + x4; f2= ~x3 + x5 + x6'
results1 <- cfa(onefactor, sample.cov = x$pw.cov, sample.nobs = x$n - x$G)
summary(results1, fit.measures = T, standardized = T)
results2 <- cfa(twofactor, sample.cov = x$pw.cov, sample.nobs = x$n - x$G)
summary(results2, fit.measures = T, standardized = T)
STEP 2
A null model is specified where both the SPW and SB matrices are used in a
multigroup setup using the factor structure defined at step 1 on both matrices with all equality constraints set to be equal. We do not really have two groups but the multigroup setup will be used to analyze both the 'within group' and the 'between group' matrices simultaneously. In lavaan, multiple input covariance matrices and the sample sizes for each are stored in a list object:
combined.cov <- list(within = x$pw.cov, between = x$b.cov)
combined.n <- list(within = x$n - x$G, between = x$G)
The first object in the list refers to group one and the second object refers to group two and we create two new objects (i.e., combined.cov and combined.n) that contain the two covariance matrices (i.e., SPW and SB) and the sample size for each (n - G and G, respectively).
Next, a model imposing the equality constraints must be specified. In this step, the model specification expands quite a bit. In lavaan, the equality constrains are imposed for the par-
ticular variable by indicating c(a,a)*variable where c() is the concatenate function, a is
a label assigned by the user to indicate that loading a for group one is set to be equal for
loading a in group two. The same label names instruct lavaan to use the same estimates be-
tween groups or in other words, specify equality constraints. To specify equal factor loadings
for both factors for the within and between models, we indicate: f1 =~ x1 + c(a,a)*x2 +
c(b,b)*x4; f2 =~ x3 + c(c,c)*x5 + c(d,d)*x6. The loadings for x1 and x3 are auto matically set to 1 so do not need to be specified.
nullmodel <- '
+ f1 =~ x1 + c(a,a)*x2 + c(b,b)*x4
+ f2 =~ x3 + c(c,c)*x5 + c(d,d)*x6
+ x1 ~~ c(e,e)*x1
+ x2 ~~ c(f,f)*x2
+ x3 ~~ c(g,g)*x3
+ x4 ~~ c(h,h)*x4
+ x5 ~~ c(i,i)*x5
+ x6 ~~ c(j,j)*x6
+ f1 ~~ c(k,k)*f1
+ f2 ~~ c(l,l)*f2
+ f1 ~~ c(m,m)*f2
+ '
results3 <- cfa(nullmodel, sample.cov = combined.cov,
+ sample.nobs = combined.n)
Thats the error message
Error: Unexpected '=' in:
"results3 <- cfa(nullmodel, sample.cov = combined.cov, + sample.nobs ="
With my data I am receiving the same error message.
Any ideas? Help is appreciated.
Best regards Konstantin

How to do rolling regression against multiple independent variables using the roll_lm function

I'm trying to regress returns against FF 3-factors with a rolling window.
To do so, I have found the function roll_lm in R, but the function is only producing regression output for one of the 3 variables.
The code is described here:
Y <- as.matrix(Portfolio_returns[,2])
X1 <- as.matrix(Mydata[,2])
X2 <- as.matrix(Mydata[,3])
X3 <- as.matrix(Mydata[,4])
Five_years_Rolling_reg <- roll_lm(X1 + X2 + X3,Y,60)
When I apply the coef function, I only get output for X1 and not X2 nor X3.
What am I doing wrong?
You problem seems to be a basic misunderstanding of how the function works. Looking at ?roll_lm
Arguments
x
matrix or xts object. Rows are observations and columns are the independent variables.
Currently it seems like you are trying to use a formula = X1 + X2 + X3 style of input, which is not what the help page is saying. As such it is adding the columns together as if it was: x1 = 2; x2 = 3; x1 + x2 = 5
Instead you should bind the rows together.
Y <- as.matrix(Portfolio_returns[,2])
X <- as.matrix(Mydata[,2:4]
roll_lm(X, Y, 60)
Or alternatively use the model.frame, model.response, model.matrix functions from base-R, which gives you the familiarity of the formula settings.
names(Mydata)[1:4] <- c("Y", "X1", "X2", "X3")
frame <- model.frame(Y ~ X1 + X2 + X3, data = Mydata)
X <- model.matrix(Y ~ X1 + X2 + X3, data = Mydata)
roll_lm(X, model.response(frame), 60)

Mathematical functions in lavaan's lavTestWald function

I'm using the lavaan package in R and want to use lavaan::lavTestWald to test the fit of a model under linear constraints. This test is part of a loop, so there are a large number of models where I want to test these constraints.
As part of the test, I want to set the absolute value of two quantities to be equal. Is there a way to do this? I know about R's abs() function but haven't been able to figure out how to incorporate abs() into lavaan::lavTestWald.
Here's a reproducible example.
HS.model <- ' visual =~ x1 + a*x2 + b*x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9'
fit <- cfa(HS.model, data = HolzingerSwineford1939)
lavTestWald(fit, "abs(a) == abs(b)") # I want something like this
I don't know why, but the reason that your code is not working as expected seems related to abs(). This worked for me:
lavTestWald(fit, "sqrt(a^2) == sqrt(b^2)")
Note that you could also define new parameters in your model statement and test those, but I don't know if that makes any difference in your situation.
HS.model <- ' visual =~ x1 + a*x2 + b*x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
# new1 := abs(a) - abs(b)
new2 := sqrt(a^2) - sqrt(b^2)
'
fit <- cfa(HS.model, data = HolzingerSwineford1939)
lavTestWald(fit, "new2 == 0")

arguments transposed by R paste command within loop

Because of a bug in the neuralnet command in R, I am building a formula manually instead of using the '.' notation for all variables. Inside of a loop, the paste function is transposing the "~" and "y" as shown below.
for(i in 1:3)
{
f <- as.formula(paste(c("y",i,"~", paste(c("x1","x2"), collapse = " + ")), collapse=""))
message(f)
}
produces:
~y1x1 + x2
~y2x1 + x2
~y3x1 + x2
I tried reversing the order of the "~" and "y", but that gives an error "unexpected symbol". So the question is, how do I get:
y1~x1 + x2
y2~x1 + x2
y3~x1 + x2
Thanks!
This would be a method of producing 5 formula-objects with an sapply-loop. Note: Your current for-loop will over-write the f-values because you did not index the assignment:
sapply( paste("y",1:5,"~", paste(c("x1","x2"), collapse = " + "),
sep="") , as.formula)
$`y1~x1 + x2`
y1 ~ x1 + x2
<environment: 0x121e1b668>
$`y2~x1 + x2`
y2 ~ x1 + x2
<environment: 0x121e1b668>
$`y3~x1 + x2`
y3 ~ x1 + x2
<environment: 0x121e1b668>
$`y4~x1 + x2`
y4 ~ x1 + x2
<environment: 0x121e1b668>
$`y5~x1 + x2`
y5 ~ x1 + x2
<environment: 0x121e1b668>
There is really no way to have any other structure than a list-object, since formulas are language constructs and typically need to be inside list or list like structures and use "[[" to gain access to their values.

R add1 function, scope argument to reference all variables

When using the add1 function to consider new variables, I would like to reference all variables (either in some dataframe or global environment), but I can not figure out how to use the scope argument to do this.
I am aware I can use it like this
X = data.frame(replicate(4,rnorm(20))) ; y = rnorm(20)
lm1 = lm(y ~ 1)
out = add1(lm1, scope= ~X$X1 + X$X2 + X$X3)
but I want to avoid manually writing in every variable.
As I have seen in other questions, I know the . symbol will not work but I am not sure why. It stands for what is already there, so if I do
x1 = rnorm(20) ; x2 = rnorm(20) ; x3 = rnorm(20) ; x4 = rnorm(20) ; y = rnorm(20)
out = add1(lm1, scope= ~ . )
it does not use what is already in the global environment.
I know the documentation says that scope must be "a formula giving the terms to be considered", but that is usually where . can be used to reference all variables.
Thanks in advance.
Also note I have read Chp 7 of MASS, and these related threads
scope from add1()-command in R
http://tolstoy.newcastle.edu.au/R/help/02b/3588.html
This is an even simpler answer, which I found after browsing this question
http://r.789695.n4.nabble.com/glm-formula-vs-character-td2543061.html
x1 = rnorm(100)
x2 = rnorm(100)
x3 = rnorm(100)
y = rnorm(100)
BaseReg = lm(y ~ 1)
newdf = data.frame(x1,x2,x3)
out = add1(BaseReg, names(newdf))
It is baffling that such a simple way to get this was not stated in the documentation for add1.
As the help page for add1 says the formula ~. means "what's already there". It is not any simpler to use as.formula for small numbers of names but this approach can be using in a function or script. (Generally one would expect to put the X's and Y in the same dataframe.)
as.formula(paste("~", paste(names(YX)[-c(1,5)],collapse="+")))
#~X1 + X2 + X3
YX <- cbind(y,X)
form <- as.formula(paste("~", paste(names(YX)[-c(1,5)],collapse="+")))
add1(lm1, form)
You appear to have stumbled across a more efficient strategy. If using a data object with column names: "y" "X1" "X2" "X3"
"X4:
> formula(YX)
y ~ X1 + X2 + X3 + X4
> formula(YX)[-2]
~X1 + X2 + X3 + X4
> as.list(formula(YX))
[[1]]
`~`
[[2]]
y
[[3]]
X1 + X2 + X3 + X4
> names(YX)
[1] "y" "X1" "X2" "X3" "X4"
You can see that a formula object has as its first element the formula-defining tilde which is really an R function. The second element is the LHS expression and the third elemtn is the RHS expression.
Here is something I found that works:
X = data.frame(replicate(4,rnorm(20)))
lm1 = lm(X1 ~ 1 ,data=X)
add1(lm1, scope=formula(X)[-2])
Granted, I have no idea why this is the case
formula(X)[-2]
# ~X2 + X3 + X4
I just found it by accident. Other things like formula(X)[-1] and formula(X)[-3] also return other things which are equally bizarre to me.

Resources