I am helping another researcher with their coding in R. I did not work with them during the planning of the experiment design and now I could really use some help with this tricky design. I have four fixed factor: FactorA, FactorB, FactorC, and FactorD. The experiment is not a fully factorial design. There are missing cells (combinantions of factors that are not available) in addition to umbalaced number of samples. For the combinations FactorA:FactorB, FactorA:FactorC, and FactorB:FactorC, I have the proper amount of cells (treatment combinations). I also have a random factor: Block, which is nested within FactorD. In my field, it is common for people (even in high impact journals) just to run different ANOVAs for each factor to avoid dealing with this type of problem, but I wonder if I could write a model that comprises all those factors.
Please, could I use something like this?
lmerTest::lmer(Response ~ FactorA + FactorB + FactorC + FactorD +
FactorA:FactorB + FactorA:FactorC + FactorB:FactorC +
(1|FactorD/Block),indexes)
I appreciate any suggestions you may have!
Assuming that what you're missing from the design are some combinations of factor D with the other factors, this is close.
You can express this a little more compactly as
Response ~ (FactorA + FactorB + FactorC)^2 + FactorD + (1|FactorD:Block)
You shouldn't use (1|FactorD/Block), because that will expand to (1|FactorD) + (1|FactorD:Block) and give you a redundant term (FactorD will be specified as both a fixed and a random effect)
Unbalanced numbers of observations don't matter as long as a factor combination is not completely missing/has at least one observation.
Related
I have a dataset here:
'''dataset
I want to perform linear and multiple regression.MoralRelationship and SkeletalP are both dependent variables while others are independent. I tried all the various method of Transformation I know but it did not yield any meaningful result from my diagnostic plot
I did this:
lm1<- lm(MoralRelationship ~ RThumb + RTindex + RTmid + RTFourth + RTFifth + Lthumb + Lindex
+ LTMid + LTFourth + LTfifth + BldGRP1 + BlDGR2, data=data)
I did same for SkeletalP
I did adiagnostic plot for both. then Tried to normalize the variables because there is correlation nor linearity. I took square term, log ,Sqrtof all independent variables also,log,1/x but no better output.
I also did
`lm(SkeletalP ~ RThumb + I(RThumb^2), data=data)`
if i will get a better result with one variable.
The independent variables are right skewed except for ANB which is normally distributed.
is there method I can use to transform my data? most importantly, to be uniformly distributed so that i can perform other statistical test.
Your dataset is kind of small. You can try dimensionality reduction like PCA, but I don't think it's appropriate here. It's also harder to interpret.
Have you tried other models? Tuning might help the fit of your regression models (e.g. Lasso/Ridge L1/L2 regulation)
Title basically explains it but I'm trying to build a bifactor model with psychopathy as one factor and subtypes as the other. I believe that I have everything constrained properly but that might be the issue.
Current code:
BifactorModel <- 'psychopathyBi =~ YPIS_1 + YPIS_2 + YPIS_3 + YPIS_4 + YPIS_5 + YPIS_6 + YPIS_7 + YPIS_8 + YPIS_9 +YPIS_10 + YPIS_11 + YPIS_12 + YPIS_13 + YPIS_14 + YPIS_15 + YPIS_16 + YPIS_17 + YPIS_18
GMbi =~ YPIS_4 + YPIS_5 + YPIS_8 + YPIS_9 + YPIS_14 + YPIS_16
CUbi =~ YPIS_3 + YPIS_6 + YPIS_10 + YPIS_15 + YPIS_17 + YPIS_18
DIbi =~ YPIS_1 + YPIS_2 + YPIS_7 + YPIS_11 + YPIS_12 + YPIS_13
psychopathyBi ~~ 0*GMbi
psychopathyBi ~~ 0*CUbi
psychopathyBi ~~ 0*DIbi
GMbi ~~ 0*CUbi
GMbi ~~ 0*DIbi
CUbi ~~ 0*DIbi
'
#fit bifactor model
bifactorFit <- cfa(BifactorModel, data = YPIS_Data)
#get summary of bifactor model
summary(bifactorFit, fit.measures = TRUE, standardized = TRUE)
This produces the following:
lavaan 0.6-9 did NOT end normally after 862 iterations
this is what the model should ultimately look like once converged
Any suggestions or comments would be greatly appreciated. Thanks in advance.
The variances of several of your latent variables are very small. For example, Dlbi appears to be effectively zero. That's the source of the issue here.
There are two things you can to try to remedy this.
First, it may work better to identify the model by fixing the latent variable variances to 1, rather than fixing the first indicator factor loadings to 1. Do this by specifying std.lv = TRUE.
Even then, it will likely be the case that loadings onto one or more of the group factors will have very small loadings. This indicates that there really isn't much of a distinct group factor in your data for this items that is distinct from the general factor. You should consider estimating a model that drops that group factor (as well as comparing with models dropping the other group factors one at a time). We discuss this issue some here: https://psyarxiv.com/q356f/
Additionally, you should constrain item loadings so that they are in the theoretically expected direction (e.g., all positive with a lower bound of 0). It is common for bifactor models to overextract variance in items and produce uninterpretable group factors that have a mix of positive and negative loadings. This can also cause convergence issues.
In general, this sort of unconstrained bifactor model tends to be overly flexible and tends to overfit to a similar degree as exploratory factor analysis. You should be sure to evaluate the bifactor model based not only on global model fit statistics, but also on whether the factor loadings actually resemble a true bifactor model--do the items each show substantial loadings on both the general factor and their group factor in the expected directions, or do items tend to load on only one or the other? See some examples in the paper linked above about this issue.
Another option would be to switch to exploratory bifactor modeling. This is implemented in R in the fungible package in the fungible::BiFAD() function. This approach is discussed here:
https://www.sciencedirect.com/science/article/pii/S0001879120300555
Exploratory bifactor models are useful because they rely on targeted EFA rotation to estimate loadings. This makes convergence much more likely and can help to diagnose when a group factor is too weak to identify in the data.
I want to perform a multiple group CFA with lavaan in R.
I have several categorical variables and some variables contains 11 categories. So these variables will have 10 thresholds. In the results below you can see thatthe 10th threshold is smaller than the 9th, i.e., it is not in the creasing order.
Several variables with 11 categories have the same problem.
Question:
Why are the thresholds distorted?
R-code:
model2<-'range = ~ NA*gvjbevn + gvhlthc + gvslvol + gvslvue + gvcldcr + gvpdlwk
goals = ~ NA*sbprvpv + sbeqsoc + sbcwkfm
range~~1*range
goals~~1*goals
gvhlthc ~~ gvslvol
gvcldcr ~~ gvpdlwk
'
cfa.model2<-cfa(model2, ordered=varcat, estimator="WLSMV",data=sub)
summary(cfa.model2,fit.measures=TRUE,standardized=TRUE, modindices=TRUE)
Label assignation of the thresholds was sorted alphabetically, aka c('t1','t10','t2','t3'....) but summary() sorts it ""properly"".
You can try to add additional factors to check if your scale corresponds to:
c('t1','t10','t11','t12',...,'t2','t3'....)
Not much you can do on your side, except understand which row is each of your factors.
Well, it seems like I cannot add a comment due to not having enough reputation, so I can only reply with an answer, although this is not a proper answer (it will definitely not solve your issue, though I hope it points in the right direction).
For your example to be reproducible, you should provide the community with the data to fit the model.
On the other side, I guess your problem must have to do with the nature of the category: it's possible that your 11th category does not mean "the most level of agreement" with the item, or that the response categories are not ordered from 1 to 11, or something similar. Given that the rest of the thresholds seem to accurately represent a continuous, monotonically increasing scale, and that this same problem precisely happens in the same category in different variables (at least the two that you are showing), there must be something with the response scale in those items.
In summary, it seems to be more of a problem of interpretation of the parameters of the model rather than a statistical issue.
I'm having a huge problem with a nested model I am trying to fit in R.
I have response time experiment with 2 conditions with 46 people each and 32 measures each. I would like measures to be nested within people and people nested within conditions, but I can't get it to work.
The code I thought should make sense was:
nestedmodel <- lmer(responsetime ~ 1 + condition +
(1|condition:person) + (1|person:measure), data=dat)
However, all I get is an error:
Error in checkNlevels(reTrms$flist, n = n, control) :
number of levels of each grouping factor must be < number of observations
Unfortunately, I do not even know where to start looking what the problem is here.
Any ideas? Please, please, please? =)
Cheers!
This might be more appropriate on CrossValidated, but: lme4 is trying to tell you that one or more of your random effects is confounded with the residual variance. As you've described your data, I don't quite see why: you should have 2*46*32=2944 total observations, 2*46=92 combinations of condition and person, and 46*32=1472 combinations of measure and person.
If you do
lf <- lFormula(responsetime ~ 1 + condition +
(1|condition:person) + (1|person:measure), data=dat)
and then
lapply(lf$reTrms$Ztlist,dim)
to look at the transposed random-effect design matrices for each term, what do you get? You should (based on your description of your data) see that these matrices are 1472 by 2944 and 92 by 2944, respectively.
As #MrFlick says, a reproducible example would be nice. Other things you could show us are:
fit the model anyway, using lmerControl(check.nobs.vs.nRE="ignore") to ignore the test, and show us the results (especially the random effects variances and the statement of the numbers of groups)
show us the results of with(dat,table(table(interaction(condition,person))) to give information on the number of replicates per combination (and similarly for measure)
I have data of various companies' financial information organized by company ticker. I'd like to regress one of the columns' values against the others while keeping the company constant. Is there an easy way to write this out in lm() notation?
I've tried using:
reg <- lmList(lead2.dDA ~ paudit1 + abs.d.GINDEX + logcapx + logmkvalt +
logmkvalt2|pp, data=reg.df)
where pp is a vector of company names, but this returns coefficients as though I regressed all the data at once (and did not separate by company name).
A convenient and apparently little-known syntax for estimating separate regression coefficients by group in lm() involves using the nesting operator, /. In this case it would look like:
reg <- lm(lead2.dDA ~ 0 + pp/(paudit1 + abs.d.GINDEX + logcapx +
logmkvalt + logmkvalt2), data=reg.df)
Make sure that pp is a factor and not a numeric. Also notice that the overall intercept must be suppressed for this to work; in the new formulation, we have a different "intercept" for each group.
A couple comments:
Although the regression coefficients obtained this way will match those given by lmList(), it should be noted that with lm() we estimate only a single residual variance across all the groups, whereas lmList() would estimate separate residual variances for each group.
Like I mentioned in my earlier comment, the lmList() syntax that you gave looks like it should have worked. Since you say it didn't, this leads me to expect that really the problem is something else (although it's hard to tell what without a reproducible example), and so it seems likely that the solution I posted will fail for you as well, for the same unknown reasons. If you want more detailed guidance, please provide more information; help us help you.