R - plotting the predictions from a mixed model with more than two predictors (continuous and factor) - r

I found this answer by Ben Bolker to a post and it is really helpful (How to plot random intercept and slope in a mixed model with multiple predictors?). However, suppose my model looks more like this:
mod <- lmer(resp ~ pred1 + pred2 + factor(pred3) + (1|RF1), data=d)
If I also want to plot the factor's influence on the response while keeping the other two predictors constant, how would I create the nd data frame instead? Also, how would I go about plotting random slopes? Thank you very much in advance!
EDIT: Ben, thank you very much for the answer and I apologize, of course it makes sense to give a reproducible example.
So, the first question: how can I plot the influence of a predictor keeping the others constant (as described in your answer to the above linked question) if I have a factor variable in my model?
Here is my example data: https://www.dropbox.com/s/ytlocw868fsnpu7/realdatasample.csv?dl=0, please treat confidentially :).
So the model would be:
moddata <- lmer(meanQUALNEW ~ meanDBH + meanCRRATIO + richn_tar + (1|region),data=realdatasample)
From what I understand, the example given in the link above constructs a plot for one predictor while keeping the other constant (and then vice versa), taking the random effect into account. But how do I extend that code to three predictors, especially when one of them is a factor?
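A minimal sketch of one way to build nd when the model contains a factor, assuming richn_tar is the factor. Because the real data are confidential, the code simulates a hypothetical data set with the same column names. It uses nlme::lme (a recommended package that ships with R) in place of lme4::lmer so it runs without extra installs; with lmer, the corresponding population-level call would be predict(moddata, newdata = nd, re.form = NA), and vcov(moddata) gives the fixed-effect covariance matrix. The idea is that nd gets one row per factor level, the continuous predictors are fixed at their means, and the factor column must carry the same levels as in the fitting data.

```r
library(nlme)  # lme(); stands in for lme4::lmer in this sketch

set.seed(1)
# Hypothetical simulated data standing in for the confidential realdatasample
n <- 240
d <- data.frame(
  region      = factor(rep(paste0("R", 1:8), each = 30)),
  meanDBH     = rnorm(n, mean = 30, sd = 5),
  meanCRRATIO = runif(n, 0.2, 0.8),
  richn_tar   = factor(sample(1:3, n, replace = TRUE))
)
region_eff <- rnorm(8, sd = 1)
d$meanQUALNEW <- 2 + 0.05 * d$meanDBH + 1.5 * d$meanCRRATIO +
  c(0, 0.8, 1.6)[as.integer(d$richn_tar)] +
  region_eff[as.integer(d$region)] + rnorm(n, sd = 0.5)

# Random-intercept model with the same fixed part as moddata
fit <- lme(meanQUALNEW ~ meanDBH + meanCRRATIO + richn_tar,
           random = ~ 1 | region, data = d)

# Factor effect: one row per level, continuous predictors held at their means
nd <- data.frame(
  meanDBH     = mean(d$meanDBH),
  meanCRRATIO = mean(d$meanCRRATIO),
  richn_tar   = factor(levels(d$richn_tar), levels = levels(d$richn_tar))
)
nd$pred <- predict(fit, newdata = nd, level = 0)  # level = 0: fixed effects only

# Approximate 95% intervals from the covariance matrix of the fixed effects
# (ignores uncertainty in the variance components)
X <- model.matrix(~ meanDBH + meanCRRATIO + richn_tar, data = nd)
se <- sqrt(diag(X %*% fit$varFix %*% t(X)))
nd$lwr <- nd$pred - 1.96 * se
nd$upr <- nd$pred + 1.96 * se

x <- seq_len(nrow(nd))
plot(x, nd$pred, ylim = range(nd$lwr, nd$upr), xaxt = "n", pch = 19,
     xlab = "richn_tar", ylab = "Predicted meanQUALNEW")
axis(1, at = x, labels = levels(nd$richn_tar))
arrows(x, nd$lwr, x, nd$upr, angle = 90, code = 3, length = 0.05)

# Continuous predictor: vary meanDBH, hold meanCRRATIO at its mean and
# richn_tar at its first (reference) level
nd2 <- data.frame(
  meanDBH     = seq(min(d$meanDBH), max(d$meanDBH), length.out = 50),
  meanCRRATIO = mean(d$meanCRRATIO),
  richn_tar   = factor(levels(d$richn_tar)[1], levels = levels(d$richn_tar))
)
nd2$pred <- predict(fit, newdata = nd2, level = 0)
plot(nd2$meanDBH, nd2$pred, type = "l",
     xlab = "meanDBH", ylab = "Predicted meanQUALNEW")
```

Holding the factor at its reference level in nd2 is an arbitrary choice; any level (or one line per level) works the same way.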
The second question:
How can I visualize the random slopes in a model like this?
moddata1 <- lmer(meanQUALNEW ~ meanDBH + meanCRRATIO + richn_tar + (richn_tar-1|region),data=realdatasample)
As far as I understand, the packages visreg and effects provide ways to visualize the fixed part of such models in the accepted way (change in one predictor while keeping the others constant). But, as far as I know, they don't produce nice visualizations of the random-effects variance components.
I realize that there is probably a lot of information about this out there, but I like the clear code example from above very much and would like to understand how to do these things "by hand".
Thanks so much for any help!
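For the random-slope model, one way to draw the slopes "by hand" is to add each region's estimated deviation to the fixed slope and plot one line per region. The sketch below assumes richn_tar is numeric (for example a species count); a factor slope would need a different display. As before, the data are simulated and hypothetical, and nlme::lme stands in for lmer; with lme4, coef(moddata1)$region and ranef(moddata1)$region hold the same per-region quantities.

```r
library(nlme)  # lme(); stands in for lme4::lmer in this sketch

set.seed(2)
n_reg <- 8
n_per <- 30
d <- data.frame(
  region      = factor(rep(paste0("R", 1:n_reg), each = n_per)),
  meanDBH     = rnorm(n_reg * n_per, mean = 30, sd = 5),
  meanCRRATIO = runif(n_reg * n_per, 0.2, 0.8),
  richn_tar   = sample(1:5, n_reg * n_per, replace = TRUE)  # assumed numeric
)
slope_dev <- rnorm(n_reg, sd = 0.4)
d$meanQUALNEW <- 1 + 0.05 * d$meanDBH + 1.5 * d$meanCRRATIO +
  (0.3 + slope_dev[as.integer(d$region)]) * d$richn_tar +
  rnorm(nrow(d), sd = 0.5)

# Random slope for richn_tar without a random intercept, as in moddata1
fit1 <- lme(meanQUALNEW ~ meanDBH + meanCRRATIO + richn_tar,
            random = ~ richn_tar - 1 | region, data = d)

fe <- fixef(fit1)
re <- ranef(fit1)                # per-region deviations from the fixed slope
cf <- as.data.frame(coef(fit1))  # fixed effects + deviations, one row per region

# Intercept of each line with the other predictors held at their means
# (identical for all regions because there is no random intercept)
icpt <- cf[, "(Intercept)"] + cf[, "meanDBH"] * mean(d$meanDBH) +
  cf[, "meanCRRATIO"] * mean(d$meanCRRATIO)

plot(d$richn_tar, d$meanQUALNEW, col = "grey60",
     xlab = "richn_tar", ylab = "meanQUALNEW")
for (i in seq_len(nrow(cf))) {
  abline(a = icpt[i], b = cf[i, "richn_tar"], col = i + 1)  # one line per region
}
# Population-level (fixed-effects) line
abline(a = fe[["(Intercept)"]] + fe[["meanDBH"]] * mean(d$meanDBH) +
         fe[["meanCRRATIO"]] * mean(d$meanCRRATIO),
       b = fe[["richn_tar"]], lwd = 3)

VarCorr(fit1)  # estimated SD of the random slopes and the residual SD
```

The spread of the thin lines around the thick one is a direct picture of the random-slope variance component that VarCorr() reports as a single number.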

Related

Having issues transforming my data for further analysis in R

I have a dataset here:
I want to perform linear and multiple regression. MoralRelationship and SkeletalP are both dependent variables, while the others are independent. I tried all the transformation methods I know, but none of them gave a meaningful improvement in my diagnostic plots.
I did this:
lm1<- lm(MoralRelationship ~ RThumb + RTindex + RTmid + RTFourth + RTFifth + Lthumb + Lindex
+ LTMid + LTFourth + LTfifth + BldGRP1 + BlDGR2, data=data)
I did same for SkeletalP
I made diagnostic plots for both, then tried to normalize the variables because there was neither correlation nor linearity. I tried squared terms and log, square-root, and 1/x transformations of all the independent variables, but the output did not improve.
I also did
`lm(SkeletalP ~ RThumb + I(RThumb^2), data=data)`
to see whether I would get a better result with a single variable.
The independent variables are right-skewed, except for ANB, which is normally distributed.
Is there a method I can use to transform my data? Most importantly, I want the variables to be uniformly distributed so that I can perform other statistical tests.
Your dataset is kind of small. You can try dimensionality reduction like PCA, but I don't think it's appropriate here. It's also harder to interpret.
Have you tried other models? Tuning might help the fit of your regression models (e.g. Lasso/Ridge, i.e. L1/L2 regularization).
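As a sketch of the L2 (Ridge) option: MASS::lm.ridge (MASS is a recommended package that ships with R) fits ridge regression over a grid of penalties and reports generalised cross-validation (GCV) scores. The data and the four predictors used here are simulated, hypothetical stand-ins for the real measurements, and the lambda grid is an arbitrary choice. Lasso (L1) is not in base R; a package such as glmnet would be needed for that.

```r
library(MASS)  # lm.ridge()

set.seed(3)
n <- 40
# Hypothetical right-skewed predictors standing in for the finger measurements
dat <- data.frame(RThumb = rlnorm(n), RTindex = rlnorm(n),
                  Lthumb = rlnorm(n), Lindex = rlnorm(n))
dat$SkeletalP <- 1 + 0.5 * log(dat$RThumb) + 0.3 * log(dat$Lthumb) +
  rnorm(n, sd = 0.5)

# Fit ridge regression over a grid of penalties (lambda = 0 is ordinary least squares)
lambdas <- seq(0, 20, by = 0.1)
ridge <- lm.ridge(SkeletalP ~ RThumb + RTindex + Lthumb + Lindex,
                  data = dat, lambda = lambdas)

# Pick the penalty that minimises GCV
best <- which.min(ridge$GCV)
lambdas[best]
coef(ridge)[best, ]  # intercept and slopes on the original scale at that penalty
```

With a small sample, the GCV curve can be fairly flat, so the chosen lambda is worth checking against a plot of ridge$GCV before relying on it.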

Correlation coefficient between nominal and cardinal scale variables

I have to describe the correlation between a variable "Average passes completed per game" (cardinal scale) and a variable "Position" (nominal scale) and measure the strength of the correlation. For that I have to choose the correlation coefficient correctly considering the scales. Does anyone know what the best way to do that would be? I am not sure what to use since the two variables are on different scales. The full dataset consists of the following variables:
PLAYER: Name of the player
COUNTRY: Country of origin
BIRTHDATE: Date of birth
HEIGHT_IN_CM: Height of the player
POSITION: Position of the player
PASSES_COMPLETED: Passes completed by the player
DISTANCE_COVERED: Distance covered by the player in km
MINUTES_PLAYED: Minutes played
AVG_PASSES_COMPLETED: Average passes completed by the player
I would very much appreciate if someone could give me some advice on this.
Thank you!
OK, so you need to redefine your question somewhat. Without two continuous variables, a standard (Pearson) correlation cannot be used to "describe" the relationship in the way I think you are asking. You can, however, test whether there are statistically significant differences in pass rates between positions. As for the statistical questions, I agree with Maurtis: Cross Validated (CV) is the best place for them. As for the code to run the tests, try this:
First, make sure you have the right packages installed. You will definitely need ggplot2 and ggfortify, plus dplyr for glimpse(), and maybe others if you have to manipulate the data. Then load the libraries:
library(ggplot2)
library(ggfortify)
library(dplyr)
Next, make sure that your data is tidy, i.e. one variable per column.
Then import your data into R:
#find file
data.location = file.choose()
#Import data
curr.data <- read.csv(data.location)
#Check data import
glimpse(curr.data)
Then plot using ggplot:
ggplot(curr.data, aes(x = POSITION, y = AVG_PASSES_COMPLETED)) +
geom_boxplot() +
theme_bw()
Then model using the linear model function (lm()) to see whether there is a significant difference in pass rates with regard to position.
passrate_model <- lm(AVG_PASSES_COMPLETED ~ POSITION, data = curr.data)
Before you test your hypothesis, you need to check that the model is appropriate:
autoplot(passrate_model, smooth.colour = NA)
If the residual plots look fine, then we are ready to test. If not, you will have to use another type of model (which I won't go into here).
The appropriate test for this (I think) would be a Tukey test, which in R requires an ANOVA object. The ANOVA summary should show you whether position explains a significant part of the variance:
passrate_av <- aov(passrate_model)
summary(passrate_av)
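If a single number for the strength of the association is still wanted, one commonly used option is eta squared from the same one-way ANOVA: the between-group sum of squares divided by the total sum of squares, i.e. the share of variance in AVG_PASSES_COMPLETED explained by POSITION. Its square root is known as the correlation ratio. The sketch uses hypothetical simulated data in place of curr.data.

```r
set.seed(4)
# Hypothetical data standing in for curr.data
pos_means <- c(Defender = 35, Midfielder = 45, Forward = 25)
curr.data <- data.frame(POSITION = rep(names(pos_means), each = 20),
                        stringsAsFactors = FALSE)
curr.data$AVG_PASSES_COMPLETED <- rnorm(nrow(curr.data),
                                        mean = pos_means[curr.data$POSITION],
                                        sd = 6)

passrate_av <- aov(AVG_PASSES_COMPLETED ~ POSITION, data = curr.data)
ss <- summary(passrate_av)[[1]][["Sum Sq"]]  # between-group SS, residual SS

eta_sq <- ss[1] / sum(ss)  # share of total variance explained by POSITION
eta <- sqrt(eta_sq)        # correlation ratio
eta_sq

# For a one-way model this equals the R-squared of the equivalent lm()
r2 <- summary(lm(AVG_PASSES_COMPLETED ~ POSITION, data = curr.data))$r.squared
r2
```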
This will perform the Tukey test and give pair-wise comparisons including difference in means, 95% confidence intervals, and adjusted p-values:
tukey.test <- TukeyHSD(passrate_av)
tukey.test
And it can even do a nice plot for you too:
plot(tukey.test)

Lavaan - CFA - categorical variables - the last threshold is strange

I want to perform a multiple group CFA with lavaan in R.
I have several categorical variables, and some of them contain 11 categories, so these variables have 10 thresholds. In the results below you can see that the 10th threshold is smaller than the 9th, i.e. the thresholds are not in increasing order.
Several variables with 11 categories have the same problem.
Question:
Why are the thresholds distorted?
R-code:
model2<-'range =~ NA*gvjbevn + gvhlthc + gvslvol + gvslvue + gvcldcr + gvpdlwk
goals =~ NA*sbprvpv + sbeqsoc + sbcwkfm
range~~1*range
goals~~1*goals
gvhlthc ~~ gvslvol
gvcldcr ~~ gvpdlwk
'
cfa.model2<-cfa(model2, ordered=varcat, estimator="WLSMV",data=sub)
summary(cfa.model2,fit.measures=TRUE,standardized=TRUE, modindices=TRUE)
The threshold labels were assigned in alphabetical order, i.e. c('t1','t10','t2','t3', ...), but summary() sorts them "properly" (t1, t2, ..., t10).
You can try adding additional factors to check whether your scale follows the order:
c('t1','t10','t11','t12', ..., 't2','t3', ...)
There is not much you can do on your side, except work out which row corresponds to each of your factors.
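The label mix-up described above is easy to reproduce with plain string sorting in base R. This sketch does not call lavaan; it only shows how alphabetical ordering places "t10" right after "t1", and one way to restore numeric order from the labels.

```r
labs <- paste0("t", 1:10)  # t1 ... t10, the labels for 10 thresholds

# Alphabetical (C-locale) sort: "t10" lands between "t1" and "t2"
alpha <- sort(labs, method = "radix")
alpha
# "t1"  "t10" "t2"  "t3"  "t4"  "t5"  "t6"  "t7"  "t8"  "t9"

# Restore numeric order by extracting the number after "t"
numeric_order <- alpha[order(as.integer(sub("^t", "", alpha)))]
numeric_order
```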
Well, it seems like I cannot add a comment due to not having enough reputation, so I can only reply with an answer, although this is not a proper answer (it will definitely not solve your issue, though I hope it points in the right direction).
For your example to be reproducible, you should provide the community with the data to fit the model.
On the other hand, I guess your problem has to do with the nature of the category: it's possible that your 11th category does not mean "the highest level of agreement" with the item, or that the response categories are not ordered from 1 to 11, or something similar. Given that the rest of the thresholds seem to accurately represent a continuous, monotonically increasing scale, and that this same problem happens in precisely the same category in different variables (at least the two that you are showing), there must be something about the response scale in those items.
In summary, it seems to be more of a problem of interpretation of the parameters of the model rather than a statistical issue.

Multiple comparisons using glht with repeated-measures ANOVA

I'm using the following code to try to get at post-hoc comparisons for my cell means:
result.lme3<-lme(Response~Pressure*Treatment*Gender*Group, mydata, ~1|Subject/Pressure/Treatment)
aov.result<-aov(result.lme3, mydata)
TukeyHSD(aov.result, "Pressure:Treatment:Gender:Group")
This gives me a result, but most of the adjusted p-values are incredibly small - so I'm not convinced the result is correct.
Alternatively I'm trying this:
summary(glht(result.lme3, linfct = mcp(???? = "Tukey")))
I don't know how to get the Pressure:Treatment:Gender:Group in the glht code.
Help is appreciated - even if it is just a link to a question I didn't find previously.
I have 504 observations, Pressure has 4 levels and is repeated in each subject, Treatment has 2 levels and is repeated in each subject, Group has 3 levels, and Gender is obvious.
Thanks
I solved a similar problem by creating an interaction variable with the interaction() function, which contains all combinations of the levels of your four variables.
I ran many tests: the estimates shown for the levels of this variable represent the joint effect of the active levels plus the interaction effect.
For example if:
temperature ~ interaction(infection(y/n), acetaminophen(y/n))
(I put the possible levels in parentheses for clarity.) The interaction variable will have a level such as "y.y" (infection = y, acetaminophen = y), whose coefficient shows the effect on temperature of infection, acetaminophen, and their interaction, compared with the intercept (where both variables are n).
Instead if the model was:
temperature ~ infection(y/n) * acetaminophen(y/n)
To get the same coefficient for the case where both variables are y, you would have to add the two main effects plus the interaction effect. The result is the same, but I prefer using interaction() since it is cleaner and more elegant.
Then in glht you use:
summary(glht(model, linfct = mcp(interaction_var = "Tukey")))
to run your post-hoc comparisons, where interaction_var <- interaction(infection, acetaminophen).
Note: I never tested this approach with nested or mixed models, so beware!
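The equivalence described in this answer can be checked on simulated data in base R. The variable names follow the temperature example and the data are hypothetical; multcomp/glht is left out so the sketch needs no extra packages. The coefficient for the "y.y" level of the interaction() variable should match the sum of the two main effects and the interaction term from the factorial model.

```r
set.seed(5)
# Hypothetical 2 x 2 design with 25 replicates per cell
d <- expand.grid(infection = c("n", "y"), acetaminophen = c("n", "y"),
                 rep = 1:25)
d$temperature <- 37 + 1.5 * (d$infection == "y") -
  0.6 * (d$acetaminophen == "y") -
  0.4 * (d$infection == "y" & d$acetaminophen == "y") +
  rnorm(nrow(d), sd = 0.3)

# One factor holding all level combinations: n.n, y.n, n.y, y.y
d$interaction_var <- interaction(d$infection, d$acetaminophen)

m_int <- lm(temperature ~ interaction_var, data = d)
m_fac <- lm(temperature ~ infection * acetaminophen, data = d)

b <- coef(m_fac)
both_int <- coef(m_int)[["interaction_vary.y"]]
both_fac <- b[["infectiony"]] + b[["acetaminopheny"]] +
  b[["infectiony:acetaminopheny"]]
c(interaction = both_int, factorial = both_fac)
```

Both models fit the same four cell means, so the two numbers agree up to floating-point rounding.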

Regression coefficients by group in dataframe R

I have data of various companies' financial information organized by company ticker. I'd like to regress one of the columns' values against the others while keeping the company constant. Is there an easy way to write this out in lm() notation?
I've tried using:
reg <- lmList(lead2.dDA ~ paudit1 + abs.d.GINDEX + logcapx + logmkvalt +
logmkvalt2|pp, data=reg.df)
where pp is a vector of company names, but this returns coefficients as though I regressed all the data at once (and did not separate by company name).
A convenient and apparently little-known syntax for estimating separate regression coefficients by group in lm() involves using the nesting operator, /. In this case it would look like:
reg <- lm(lead2.dDA ~ 0 + pp/(paudit1 + abs.d.GINDEX + logcapx +
logmkvalt + logmkvalt2), data=reg.df)
Make sure that pp is a factor and not a numeric. Also notice that the overall intercept must be suppressed for this to work; in the new formulation, we have a different "intercept" for each group.
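A small simulated check of the nesting syntax, with three hypothetical company tickers standing in for pp and two made-up predictors: the per-company intercepts and slopes from the single lm() call match separate per-company fits, while the residual standard deviation is pooled in lm() but estimated separately in the individual fits.

```r
set.seed(6)
# Hypothetical panel: three company tickers, two predictors
reg.df <- data.frame(pp = factor(rep(c("AAA", "BBB", "CCC"), each = 20)),
                     x1 = rnorm(60), x2 = rnorm(60))
slopes <- c(AAA = 0.5, BBB = -1, CCC = 2)
reg.df$y <- 1 + slopes[as.character(reg.df$pp)] * reg.df$x1 +
  0.3 * reg.df$x2 + rnorm(60)

# One intercept and one set of slopes per company, in a single lm() call
nested <- lm(y ~ 0 + pp/(x1 + x2), data = reg.df)
coef(nested)

# Separate regressions for each company, for comparison
sep <- t(sapply(split(reg.df, reg.df$pp),
                function(s) coef(lm(y ~ x1 + x2, data = s))))
sep

# The coefficients agree, but lm() pools the residual variance ...
sigma(nested)
# ... while the separate fits each estimate their own
sapply(split(reg.df, reg.df$pp), function(s) sigma(lm(y ~ x1 + x2, data = s)))
```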
A couple comments:
Although the regression coefficients obtained this way will match those given by lmList(), it should be noted that with lm() we estimate only a single residual variance across all the groups, whereas lmList() would estimate separate residual variances for each group.
Like I mentioned in my earlier comment, the lmList() syntax that you gave looks like it should have worked. Since you say it didn't, this leads me to expect that really the problem is something else (although it's hard to tell what without a reproducible example), and so it seems likely that the solution I posted will fail for you as well, for the same unknown reasons. If you want more detailed guidance, please provide more information; help us help you.

Resources