compare different multi level regressions - r

I am struggling with the following:
My idea is to analyse the development (slope) of an outcome across different multilevel regressions.
The outcome is matched in my data with 2 different timepoints.
I have 3 predictors (senseofhumor, seriousness, friendlyness).
These predictors are measured for many people and groups.
I assume here that SenseofhumorHIGH (a special value variable derived from "senseofhumor") might have an impact on the outcome if it is high within a group. I also assume the slope might first increase dramatically and then increase more slowly.
How can I compare slopes from different regressions with each other?
What is the best way to visualize these slopes?
The code would look something like this:
RandomslopeEC_t1 <- lme(criteriatimepoint1 ~ senseofhumor + seriousness + friendlyness, data = DATA, random = ~ SenseofhumorHIGH | group)
RandomslopeEC_t2 <- lme(criteriatimepoint2 ~ senseofhumor + seriousness + friendlyness, data = DATA, random = ~ SenseofhumorHIGH | group)
RandomslopeEC_t3 <- lme(criteriatimepoint3 ~ senseofhumor + seriousness + friendlyness, data = DATA, random = ~ SenseofhumorHIGH | group)
Thanks a lot in advance

It worked out after changing the format from wide to long.
I used:
DATAlong <- DATA %>%
  gather(`criteriatimepoint1`, `criteriatimepoint2`, `criteriatimepoint3`, key = "timepoint", value = "criteriavalue")
for that.
Afterwards I used:
RandomslopeEC <- lme(criteriavalue ~ senseofhumor*timepoint + seriousness*timepoint + friendlyness*timepoint, data = DATAlong, random = ~ 1|group/timepoint)
I hope this might help others as well.
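One way to read the pooled long-format model: when a predictor is fully interacted with timepoint, the interaction coefficient is the difference between the slopes that separate per-timepoint regressions would give. The sketch below checks this on simulated data. It uses plain lm() instead of lme() to stay self-contained, and the data, effect sizes and the "t1"/"t2" labels are made up for illustration.

```r
# Simulated wide data (hypothetical values): one predictor, two timepoints
set.seed(42)
n <- 200
wide <- data.frame(senseofhumor = rnorm(n))
wide$criteriatimepoint1 <- 0.2 * wide$senseofhumor + rnorm(n)
wide$criteriatimepoint2 <- 0.8 * wide$senseofhumor + rnorm(n)

# Same data in long format: one row per person and timepoint
long <- data.frame(
  senseofhumor  = rep(wide$senseofhumor, 2),
  timepoint     = factor(rep(c("t1", "t2"), each = n)),
  criteriavalue = c(wide$criteriatimepoint1, wide$criteriatimepoint2)
)

# Slopes from two separate regressions
slope_t1 <- coef(lm(criteriatimepoint1 ~ senseofhumor, data = wide))[["senseofhumor"]]
slope_t2 <- coef(lm(criteriatimepoint2 ~ senseofhumor, data = wide))[["senseofhumor"]]

# One pooled regression; the interaction term is the change in slope from t1 to t2
pooled <- lm(criteriavalue ~ senseofhumor * timepoint, data = long)
slope_diff <- coef(pooled)[["senseofhumor:timepointt2"]]

print(c(separate = slope_t2 - slope_t1, pooled = slope_diff))
```

The advantage of the pooled version is that summary(pooled) reports a standard error and test for the slope difference directly. With the grouping structure added back via lme(), the interpretation of the interaction term stays the same, though the estimates will generally differ from plain lm().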

Related

Permutation test / create a function for event dummy for different dates

This question will be very difficult to formulate, but essentially I'm doing an event study for my bachelor thesis and would like to examine how significant the coefficient for my interaction effect is.
What I find difficult is that I want to run a fixed effects regression in which the interaction effect, treatment * event, uses different dates for the event window. I want to create a function that lets the event dummy take on different dates (the window should be 7 consecutive days), so that the start date of these seven days changes.
The equation looks something like this:
return = intercept + ROA + Tangibility + ... + Event + Treatment group + Event * Treatment group
Then I would like to run the fixed effects regression with these different dummies and extract the estimates for the interaction with the dummy. This would enable me to compare the coefficient estimate obtained for the actual event date with those for the "fake" event days.
Thank you in advance if you decide to help me :)
I have tried to find different functions and I have only found permutation tests that test the means of the two groups.
I've tried to make a function and have this:
set.seed(1000)
N <- 10^3-1 # Number of permutations
resultX <- numeric(N)
for(i in 1:N){
  index1 <- sample(length(merged_df), size = 7, replace = FALSE)
  df_c$eventdummy <- ifelse(merged_df$day == index1, 1, 0)
  resultX[i] <- plm(ret ~ sse*eventdummy +
                      sse +
                      eventdummy +
                      roa +
                      leverage +
                      mtb +
                      tangibility,
                    data = merged_df,
                    index = c("entity"),
                    model = "within")
}
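A few things stand out in the attempt above: length(merged_df) is the number of columns rather than the number of days, sample() draws 7 scattered values rather than 7 consecutive ones, == compares element by element where %in% is probably intended, the dummy is written to df_c while the model uses merged_df, and a whole plm object cannot be stored in one slot of a numeric vector. Below is a minimal sketch of the placebo-window idea on simulated data. It is not the asker's model: lm() with entity dummies stands in for plm(..., model = "within") (the two give the same slope estimates), and the panel, the true start day and all values are hypothetical.

```r
# Simulated panel (hypothetical): 20 entities observed over 60 days
set.seed(1000)
n_entity <- 20
n_day <- 60
panel <- expand.grid(entity = seq_len(n_entity), day = seq_len(n_day))
panel$sse <- as.integer(panel$entity <= n_entity / 2)  # treatment-group dummy
panel$roa <- rnorm(nrow(panel))
panel$ret <- 0.1 * panel$roa + rnorm(nrow(panel))

# Interaction estimate for a window of `width` consecutive days starting at `start`
window_coef <- function(start, data, width = 7) {
  data$eventdummy <- as.integer(data$day %in% seq(start, length.out = width))
  # sse is constant within entity, so lm() reports NA for one aliased
  # coefficient; the sse:eventdummy interaction is still identified
  fit <- lm(ret ~ sse * eventdummy + roa + factor(entity), data = data)
  unname(coef(fit)["sse:eventdummy"])
}

actual_start <- 30  # hypothetical start day of the real event window
placebo_starts <- setdiff(seq_len(n_day - 6), actual_start)

actual <- window_coef(actual_start, panel)
placebo <- vapply(placebo_starts, window_coef, numeric(1), data = panel)

# Share of placebo windows with an estimate at least as extreme as the actual one
p_value <- mean(abs(placebo) >= abs(actual))
print(p_value)
```

Looping over every admissible start day, rather than sampling start days at random, keeps the comparison deterministic. Note that placebo windows next to the actual start overlap the real event window; whether to exclude them is a design choice.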

How do I add variables to my regression solely when another variable is a specific value?

My dataset contains R&D expenses for the same company for several periods: -3 to +3. I use this fixed effects linear model:
m5a <- felm(ihs(RD_expenses) ~ merger_it_pre1 + merger_it + merger_it_post1 + merger_it_post2 + merger_it_post3 + factor(year),
data = TandC,
subset = RD_expenses > pcts[1] & RD_expenses < pcts[2])
Now I want to add a covariate for RD expenses but only when the period is -1. The period variable is another column in the same dataset.
Any ideas how I can do this? I tried to add ihs(RD_expenses)if(period = -1)) to the model but this did not work. I get the error: unexpected if.
I'm quite new to R so if someone could help me out that would be much appreciated!
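One possible approach is to multiply the covariate by a logical condition inside I(), so that the term equals the covariate when period is -1 and zero otherwise. The sketch below shows this with lm() on simulated data; the names d, x and y are hypothetical, and whether the same term is appropriate in the felm() model above is an assumption to check against your data.

```r
# Simulated data (hypothetical): x only affects y in period -1
set.seed(1)
d <- data.frame(period = rep(-3:3, times = 30), x = rnorm(210))
d$y <- 1 + 0.5 * d$x * (d$period == -1) + rnorm(210, sd = 0.1)

# (period == -1) is TRUE/FALSE and is treated as 1/0 when multiplied by x
fit <- lm(y ~ I(x * (period == -1)), data = d)
est <- coef(fit)[[2]]
print(est)
```

Note the double == for comparison: a single = inside a formula or a call is read as an argument assignment.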

R Syntax Simple Slopes MEM

Question regarding the syntax of a mixed effects model on R.
I have run the following code to examine the simple slope to determine the effect of one of my variables (variability) within another one of my variables (ambiguity):
lmer.E1.v2 <- lmer(logRT ~ Variability.c / Ambiguity.c + (Variability.c + Ambiguity.c|ID),
data=data %>% filter(Experiment == "E1"),
control=lmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))
summary(lmer.E1.v2)
When I reverse these two variables, so that the code looks like this:
lmer.E1.v2 <- lmer(logRT ~ Ambiguity.c / Variability.c + (Ambiguity.c + Variability.c|ID),
data=data %>% filter(Experiment == "E1"),
control=lmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))
summary(lmer.E1.v2)
... I get different output from the first version of the code than from the second. What is the difference in interpretation when I reverse the order of my two variables in the syntax?
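The primary issue is that the / operator is not commutative (i.e. a/b != b/a): a/b expands to a + a:b, while b/a expands to b + a:b. If both predictors are factors, you should get the same overall fit (predictions, likelihood, etc.), at least up to some degree of numeric fuzz, but the model parameterization will be different. If the predictors are numeric, the two formulas omit different main effects (b in the first, a in the second), so they are different models and the fit itself can differ.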
There do exist cases where (a+b|g) gives different answers from (b+a|g) (see here, but this is unusual).
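The expansion can be inspected with terms(), without fitting anything. The sketch below also compares fitted values from lm() on simulated data, once with factor predictors and once with numeric ones; the data frame sim and its columns are made up for illustration, and lm() is used instead of lmer() to keep the example self-contained.

```r
# Term labels produced by the nesting operator
labels_ab <- attr(terms(y ~ a / b), "term.labels")
labels_ba <- attr(terms(y ~ b / a), "term.labels")
print(labels_ab)
print(labels_ba)

# Simulated data (hypothetical): two crossed factors and two numeric predictors
set.seed(1)
sim <- data.frame(fa = gl(2, 50), fb = gl(2, 25, 100),
                  a = rnorm(100), b = rnorm(100))
sim$y <- sim$a + 2 * sim$b + rnorm(100)

# Factors: both orders span the same cell means, so the fitted values agree
same_fit_factors <- isTRUE(all.equal(
  unname(fitted(lm(y ~ fa / fb, data = sim))),
  unname(fitted(lm(y ~ fb / fa, data = sim)))))

# Numeric predictors: each order leaves out a different main effect
same_fit_numeric <- isTRUE(all.equal(
  unname(fitted(lm(y ~ a / b, data = sim))),
  unname(fitted(lm(y ~ b / a, data = sim)))))

print(c(factors = same_fit_factors, numeric = same_fit_numeric))
```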

lavaan WARNING: some observed variances are (at least) a factor 1000 times larger than others; use varTable(fit) to investigate

I am trying to evaluate an SEM model on a dataset. Some of the variables are on a Likert scale (1-5), and some are counts generated from the computer log for certain activities.
When I fit the model, lavaan gives me this warning:
lavaan WARNING: some observed variances are (at least) a factor 1000 times larger than others; use varTable(fit) to investigate
To get rid of this warning I want to scale some of the variables, but I couldn't work out how to do that.
Log_And_SurveyResult <- read_excel("C:/Users/Aakash/Desktop/analysis/Log-And-SurveyResult.xlsx")
model <- '
Reward =~ REW1 + REW2 + REW3 + REW4
ECA =~ ECA1 + ECA2 + ECA3
Feedback =~ FED1 + FED2 + FED3 + FED4
Motivation =~ Reward + ECA + Feedback
Satisfaction =~ a*MaxTimeSpentInAWeek + a*TotalTimeSpent + a*TotalLearningActivityView
Motivation ~ Satisfaction'
fit <- sem(model,data = Log_And_SurveyResult)
summary(fit, standardized=T, std.lv = T)
fitMeasures(fit, c("cfi", "rmsea", "srmr"))
I want to scale some of the variables like MaxTimeSpentInAWeek and TotalTimeSpent
Could you please help me figure out how to scale the variables? Thank you very much.
As Elias pointed out, the difference in the magnitude between the variables is huge and it is suggested to scale the variables.
The warning itself gives a hint: varTable(fit) returns summary information (including the variance) for each observed variable in a fitted lavaan object, so you can see which variables are on a much larger scale.
Rather than running scale() separately on each column, you could use apply() on a subset or on your whole data.frame:
## Scale the variables in the 4th and 7th column
Log_And_SurveyResult[, c(4, 7)] <- apply(Log_And_SurveyResult[, c(4, 7)], 2, scale)
## Scale the whole data.frame (apply() returns a matrix, so convert it back)
Log_And_SurveyResult <- as.data.frame(apply(Log_And_SurveyResult, 2, scale))
You can just use scale(MaxTimeSpentInAWeek). This will scale your variable to mean = 0 and variance = 1. E.g:
Log_And_SurveyResult$MaxTimeSpentInAWeek <-
scale(Log_And_SurveyResult$MaxTimeSpentInAWeek)
Log_And_SurveyResult$TotalTimeSpent <-
scale(Log_And_SurveyResult$TotalTimeSpent)
Or did I misunderstand your question?
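As a small variation on the answers above, the columns can also be selected by name, which avoids depending on column positions. The sketch below uses a made-up data frame called survey with two count-like columns; as.numeric() is applied because scale() returns a one-column matrix.

```r
# Simulated data (hypothetical): one Likert item and two log-based counts
set.seed(7)
survey <- data.frame(
  REW1 = sample(1:5, 50, replace = TRUE),
  MaxTimeSpentInAWeek = rpois(50, 4000),
  TotalTimeSpent = rpois(50, 90000)
)

# Standardise only the count columns, selected by name
count_cols <- c("MaxTimeSpentInAWeek", "TotalTimeSpent")
survey[count_cols] <- lapply(survey[count_cols],
                             function(x) as.numeric(scale(x)))

# Variances are now on a comparable scale
variances <- sapply(survey, var)
print(variances)
```

Keep in mind that standardising changes the units of the affected variables, so unstandardised estimates involving them are then expressed per standard deviation.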

Visualising a three way interaction between two continuous variables and one categorical variable in R

I have a model in R that includes a significant three-way interaction between two continuous independent variables (IVContinuousA, IVContinuousB) and one categorical variable (IVCategorical, with two levels: Control and Treatment). The dependent variable is continuous (DV).
model <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical)
You can find the data here
I am trying to find out a way to visualise this in R to ease my interpretation of it (perhaps in ggplot2?).
Somewhat inspired by this blog post I thought that I could dichotomise IVContinuousB into high and low values (so it would be a two-level factor itself):
IVContinuousBHigh <- mean(IVContinuousB) + sd(IVContinuousB)
IVContinuousBLow <- mean(IVContinuousB) - sd(IVContinuousB)
I then planned to plot the relationship between DV and IVContinuousA and fit lines representing the slopes of this relationship for different combinations of IVCategorical and my new dichotomised IVContinuousB:
IVCategoricalControl and IVContinuousBHigh
IVCategoricalControl and IVContinuousBLow
IVCategoricalTreatment and IVContinuousBHigh
IVCategoricalTreatment and IVContinuousBLow
My first question is: does this sound like a viable solution for producing an interpretable plot of this three-way interaction? I want to avoid 3D plots if possible as I don't find them intuitive... Or is there another way to go about it? Maybe facet plots for the different combinations above?
If it is an OK solution, my second question is: how do I generate the data to predict the fit lines that represent the different combinations above?
Third question: does anyone have any advice as to how to code this up in ggplot2?
I posted a very similar question on Cross Validated but because it is more code related I thought I would try here instead (I will remove the CV post if this one is more relevant to the community :) )
Thanks so much in advance,
Sarah
Note that there are NAs (left as blanks) in the DV column and the design is unbalanced - with slightly different numbers of datapoints in the Control vs Treatment groups of the variable IVCategorical.
FYI I have the code for visualising a two-way interaction between IVContinuousA and IVCategorical:
A <- ggplot(data = data,
            aes(x = AOTAverage, y = SciconC, group = MisinfoCondition,
                shape = MisinfoCondition, col = MisinfoCondition)) +
  geom_point(size = 2) +
  geom_smooth(method = 'lm', formula = y ~ x)
But what I want is to plot this relationship conditional on IVContinuousB....
Here are a couple of options for visualizing the model output in two dimensions. I'm assuming that the goal here is to compare Treatment to Control.
library(tidyverse)
library(readxl)  # read_excel() is not attached by library(tidyverse)
theme_set(theme_classic() +
            theme(panel.background=element_rect(colour="grey40", fill=NA)))
dat = read_excel("Some Data.xlsx") # I downloaded your data file
mod <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data=dat)
# Function to create prediction grid data frame
make_pred_dat = function(data=dat, nA=20, nB=5) {
  nCat = length(unique(data$IVCategorical))
  # Every combination of nA values of A, nB values of B and nCat categories
  d = with(data,
           data.frame(IVContinuousA=rep(seq(min(IVContinuousA), max(IVContinuousA), length=nA), nB*nCat),
                      IVContinuousB=rep(rep(seq(min(IVContinuousB), max(IVContinuousB), length=nB), each=nA), nCat),
                      IVCategorical=rep(unique(IVCategorical), each=nA*nB)))
  d$DV = predict(mod, newdata=d)
  return(d)
}
IVContinuousA vs. DV by levels of IVContinuousB
The roles of IVContinuousA and IVContinuousB can of course be switched here.
ggplot(make_pred_dat(), aes(x=IVContinuousA, y=DV, colour=IVCategorical)) +
  geom_line() +
  facet_grid(. ~ round(IVContinuousB,2)) +
  ggtitle("IVContinuousA vs. DV, by Level of IVContinuousB") +
  labs(colour="")
You can make a similar plot without faceting, but it gets difficult to interpret as the number of IVContinuousB levels increases:
ggplot(make_pred_dat(nB=3),
       aes(x=IVContinuousA, y=DV, colour=IVCategorical, linetype=factor(round(IVContinuousB,2)))) +
  geom_line() +
  #facet_grid(. ~ round(IVContinuousB,2)) +
  ggtitle("IVContinuousA vs. DV, by Level of IVContinuousB") +
  labs(colour="", linetype="IVContinuousB") +
  scale_linetype_manual(values=c("1434","11","62")) +
  guides(linetype=guide_legend(reverse=TRUE))
Heat map of the model-predicted difference, DV treatment - DV control, on a grid of IVContinuousA and IVContinuousB values
Below, we look at the difference between treatment and control at each pair of IVContinuousA and IVContinuousB.
ggplot(make_pred_dat(nA=100, nB=100) %>%
         group_by(IVContinuousA, IVContinuousB) %>%
         arrange(IVCategorical) %>%
         summarise(DV = diff(DV)),
       aes(x=IVContinuousA, y=IVContinuousB)) +
  geom_tile(aes(fill=DV)) +
  scale_fill_gradient2(low="red", mid="white", high="blue") +
  labs(fill=expression(Delta*DV~(Treatment - Control)))
If you really want to avoid 3-d plotting, you could indeed turn one of the continuous variables into a categorical one for visualization purposes.
For the purpose of the answer, I used the Duncan data set from the package car, as it is of the same form as the one you described.
library(car)
library(ggplot2)
# the data
data("Duncan")
# the fitted model; education and income are continuous, type is categorical
lm0 <- lm(prestige ~ education * income * type, data = Duncan)
# turning education into high and low values (you can extend this to more
# levels)
edu_high <- mean(Duncan$education) + sd(Duncan$education)
edu_low <- mean(Duncan$education) - sd(Duncan$education)
# the values below should be used for predictions, each combination of the
# categories must be represented (shorter columns are recycled by data.frame):
prediction_mat <- data.frame(income = Duncan$income,
                             education = rep(c(edu_high, edu_low), each = nrow(Duncan)),
                             type = rep(levels(Duncan$type), each = nrow(Duncan)*2))
predicted <- predict(lm0, newdata = prediction_mat)
# rearranging the fitted values and the values used for predictions
df <- data.frame(predicted,
                 income = Duncan$income,
                 edu_group = rep(c("edu_high", "edu_low"), each = nrow(Duncan)),
                 type = rep(levels(Duncan$type), each = nrow(Duncan)*2))
# plotting the fitted regression lines
ggplot(df, aes(x = income, y = predicted, group = type, col = type)) +
  geom_line() +
  facet_grid(. ~ edu_group)

Resources