I have a repeated-measures dataset of 24 stroke patients in which I want to assess the effect of three different types of rehabilitation (Group) on functional recovery scores (Barthel_index). Each patient's functional ability was measured weekly (Time_num) for 8 weeks.
The data looks as follows:
library(dplyr)
library(magrittr)
library(nlme)
library(lme4)
mydata <-
structure(list(Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L,
13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 16L,
16L, 16L, 16L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 18L, 18L,
18L, 18L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L, 19L, 19L, 19L,
19L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 21L,
21L, 21L, 21L, 21L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 23L,
23L, 23L, 23L, 23L, 23L, 23L, 23L, 24L, 24L, 24L, 24L, 24L, 24L,
24L, 24L), Group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"),
Time_num = c(1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7,
8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2,
3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5,
6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8,
1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3,
4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6,
7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1,
2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4,
5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7,
8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2,
3, 4, 5, 6, 7, 8), Barthel_index = c(45L, 45L, 45L, 45L,
80L, 80L, 80L, 90L, 20L, 25L, 25L, 25L, 30L, 35L, 30L, 50L,
50L, 50L, 55L, 70L, 70L, 75L, 90L, 90L, 25L, 25L, 35L, 40L,
60L, 60L, 70L, 80L, 100L, 100L, 100L, 100L, 100L, 100L, 100L,
100L, 20L, 20L, 30L, 50L, 50L, 60L, 85L, 95L, 30L, 35L, 35L,
40L, 50L, 60L, 75L, 85L, 30L, 35L, 45L, 50L, 55L, 65L, 65L,
70L, 40L, 55L, 60L, 70L, 80L, 85L, 90L, 90L, 65L, 65L, 70L,
70L, 80L, 80L, 80L, 80L, 30L, 30L, 40L, 45L, 65L, 85L, 85L,
85L, 25L, 35L, 35L, 35L, 40L, 45L, 45L, 45L, 45L, 45L, 80L,
80L, 80L, 80L, 80L, 80L, 15L, 15L, 10L, 10L, 10L, 20L, 20L,
20L, 35L, 35L, 35L, 45L, 45L, 45L, 50L, 50L, 40L, 40L, 40L,
55L, 55L, 55L, 60L, 65L, 20L, 20L, 30L, 30L, 30L, 30L, 30L,
30L, 35L, 35L, 35L, 40L, 40L, 40L, 40L, 40L, 35L, 35L, 35L,
40L, 40L, 40L, 45L, 45L, 45L, 65L, 65L, 65L, 80L, 85L, 95L,
100L, 45L, 65L, 70L, 90L, 90L, 95L, 95L, 100L, 25L, 30L,
30L, 35L, 40L, 40L, 40L, 40L, 25L, 25L, 30L, 30L, 30L, 30L,
35L, 40L, 15L, 35L, 35L, 35L, 40L, 50L, 65L, 65L)), row.names = c(NA,
-192L), class = c("tbl_df", "tbl", "data.frame"))
head(mydata)
# A tibble: 6 x 4
Subject Group Time_num Barthel_index
<int> <fct> <dbl> <int>
1 1 A 1 45
2 1 A 2 45
3 1 A 3 45
4 1 A 4 45
5 1 A 5 80
6 1 A 6 80
To see if and how intercepts and slopes vary per patient, I want to plot the intercepts and slopes using the lmList() and intervals() functions.
Question 1: I don't understand why calling lmList() from lme4 gives me 48 warnings while the same function in nlme does not:
lmlist <-
lme4::lmList(Barthel_index ~ Time_num | Subject,
data=mydata)
> There were 48 warnings (use warnings() to see them)
lmlist <-
nlme::lmList(Barthel_index ~ Time_num | Subject,
data=mydata)
# Works fine
Question 2: I am trying to extract the confidence intervals for each regression slope, but this gives a warning and NaN for certain values:
lmlist <-
nlme::lmList(Barthel_index ~ Time_num | Subject,
data=mydata)
coefs <- coef(lmlist)
names(coefs) <- c("Intercepts", "Slopes")
intervals(lmlist)
> Warning message:
In summary.lm(el) : essentially perfect fit: summary may be unreliable
Question 3: Now that I have my new list of coefficients with confidence intervals, I'd like to plot them to see if and how much intercepts and slopes vary amongst patients. I'm trying to achieve something like the following:
Any help? Thanks.
Q1. The warnings are occurring in lme4::lmList because you're using a tibble as input: no warnings from
lme4::lmList(Barthel_index ~ Time_num | Subject,
data=as.data.frame(mydata))
(this is a harmless "infelicity" or buglet in lme4 ...)
Q2. If you look at the list of coefficients, you'll see that subject 5 is the problematic one. The data for this subject all have the same response value: thus it's not surprising that we can't compute confidence intervals on a linear regression fit ...
mydata[mydata$Subject=="5",]
# A tibble: 8 × 4
Subject Group Time_num Barthel_index
<int> <fct> <dbl> <int>
1 5 A 1 100
2 5 A 2 100
3 5 A 3 100
4 5 A 4 100
5 5 A 5 100
6 5 A 6 100
7 5 A 7 100
8 5 A 8 100
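If you want intervals() to run cleanly for the remaining subjects, one option is to drop any subject whose response never varies before refitting lmList. This is just a sketch (dplyr and nlme are already loaded above; mydata_var and lmlist_var are names I'm introducing here):
library(dplyr)
library(nlme)

# keep only subjects whose Barthel_index actually varies over the 8 weeks
# (this drops the constant-response subject, subject 5 here)
mydata_var <- mydata %>%
  group_by(Subject) %>%
  filter(var(Barthel_index) > 0) %>%
  ungroup()

lmlist_var <- nlme::lmList(Barthel_index ~ Time_num | Subject, data = mydata_var)
intervals(lmlist_var)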
Q3. plot(intervals(lmlist))
For Q3, you could use the dotplot function in the lattice package:
require(lattice)
library(lme4)  # needed for lmer() and the sleepstudy data
m0 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
dotplot(ranef(m0, condVar = TRUE))
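Applied to the stroke data above, a minimal sketch along the same lines (assuming a random intercept and slope for Time_num per Subject is what you want to visualize, with lme4 and lattice loaded as above; m_lmm is a name I'm introducing here) would be:
# random intercept and weekly slope for each patient
m_lmm <- lmer(Barthel_index ~ Time_num + (Time_num | Subject), data = mydata)

# caterpillar plot of the conditional modes with approximate 95% intervals
dotplot(ranef(m_lmm, condVar = TRUE))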
I would like to set the thickness of geom_line to the proportion of data that follows that path, in the same way that geom_count sets the size of points based on the proportion of data that overlaps at that point, or find a function that will allow me to do this.
I would also be happy if I could do this as a count rather than a proportion; either would work. I have attached the graph: the grey lines represent connections within the same ID (i.e. the same individual in different categories). If I could set the thickness of the lines, I could show the most common connection pathways.
My current code is:
library(ggplot2)
ggplot(dat, aes(x = Category, y = Metric, group = IDD)) +
geom_line(aes(group = IDD), colour = "gray59") +
geom_count(aes(size = ..prop.., group = 1), colour = "gray59") +
scale_size_area(max_size = 5) +
theme_bw() +
geom_smooth(method = "lm", se = F, colour = "black",
aes(group = 1), linetype = "dotdash") +
xlab("Category") +
ylab("Metric") +
theme(text = element_text(size = 16))
This is the resulting graph; point size shows the proportion of data that overlaps at that point. I would like to do the same with line thickness if possible:
My searching has so far turned up nothing helpful but maybe I am searching the wrong terms. Any help would be much appreciated!
Here is the data (I'm unsure how to upload it as a file):
dat <- structure(list(IDD = structure(c(1L, 1L, 1L, 1L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 2L, 2L, 2L, 2L, 7L, 7L, 7L,
8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 11L, 11L, 12L,
12L, 13L, 13L, 13L, 13L, 14L, 14L, 15L, 15L, 15L, 15L, 16L, 16L,
16L, 16L, 17L, 17L, 18L, 18L, 18L, 18L, 19L, 19L, 20L, 20L, 21L,
21L, 21L, 22L, 22L, 23L, 23L, 24L, 24L, 25L, 25L, 25L, 26L, 26L,
26L, 26L, 27L, 27L, 28L, 28L, 29L, 29L, 29L, 30L, 30L, 30L, 31L,
31L, 31L, 31L, 32L, 32L, 33L, 33L, 33L, 34L, 34L, 34L, 34L, 35L,
35L, 36L, 36L, 36L, 37L, 37L, 37L, 37L, 38L, 38L, 38L, 39L, 39L,
39L, 40L, 40L, 40L, 41L, 41L, 42L, 42L, 43L, 43L, 44L, 44L, 44L,
44L, 45L, 45L, 45L, 46L, 46L, 46L, 47L, 47L, 47L, 48L, 48L, 49L,
49L, 50L, 50L, 51L, 51L, 51L, 51L, 52L, 52L, 53L, 53L, 54L, 54L,
55L, 55L, 56L, 56L, 57L, 57L, 57L, 58L, 58L, 59L, 59L, 59L, 59L
), .Label = c("ID005", "ID040", "ID128", "ID131", "ID133", "ID134",
"ID147", "ID149", "ID166", "ID167", "ID175", "ID181", "ID191",
"ID198", "ID213", "ID235", "ID254", "ID257", "ID259", "ID273",
"ID279", "ID287", "ID292", "ID299", "ID300", "ID321", "ID334",
"ID348", "ID349", "ID354", "ID359", "ID377", "ID379", "ID383",
"ID390", "ID395", "ID409", "ID445", "ID467", "ID469", "ID482",
"ID492", "ID496", "ID524", "ID526", "ID527", "ID534", "ID535",
"ID538", "ID545", "ID564", "ID576", "ID578", "ID579", "ID600",
"ID610", "ID622", "ID631", "ID728"), class = "factor"), Category = c(2L,
4L, 5L, 5L, 2L, 4L, 1L, 3L, 3L, 4L, 4L, 2L, 4L, 5L, 5L, 5L, 2L,
5L, 5L, 5L, 3L, 2L, 5L, 4L, 5L, 5L, 4L, 4L, 5L, 5L, 3L, 4L, 5L,
5L, 2L, 4L, 2L, 5L, 3L, 4L, 5L, 5L, 4L, 5L, 3L, 4L, 5L, 5L, 3L,
4L, 5L, 5L, 5L, 5L, 2L, 3L, 4L, 4L, 5L, 5L, 5L, 5L, 4L, 4L, 5L,
5L, 5L, 3L, 4L, 5L, 5L, 4L, 5L, 5L, 1L, 3L, 4L, 4L, 3L, 5L, 3L,
5L, 2L, 3L, 4L, 3L, 4L, 4L, 3L, 3L, 4L, 4L, 3L, 5L, 3L, 4L, 4L,
3L, 3L, 4L, 5L, 2L, 3L, 2L, 3L, 4L, 2L, 2L, 3L, 4L, 4L, 5L, 5L,
2L, 3L, 4L, 2L, 3L, 4L, 3L, 4L, 4L, 5L, 3L, 4L, 1L, 2L, 3L, 4L,
1L, 3L, 4L, 1L, 3L, 4L, 1L, 3L, 4L, 3L, 4L, 3L, 3L, 2L, 3L, 2L,
2L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 4L, 3L, 4L, 3L, 4L, 1L, 2L, 3L,
2L, 3L, 1L, 3L, 4L, 4L), Metric = c(2, 2, 3.5, 4, 2, 1.5, 2,
2, 3, 3, 2, 2, 2, 2, 3.5, 3.5, 2, 3, 3.5, 4, 2, 2, 3, 2, 3, 3,
2, 3, 3, 2.5, 1.5, 3, 3.5, 4, 2, 2, 1.5, 2, 1.5, 2, 2, 2, 2.5,
3, 2.5, 3.5, 3.5, 3.5, 1.5, 2, 2.5, 2.5, 3.5, 4, 2, 2, 1.5, 3,
3.5, 3, 3, 3, 3.5, 2.5, 3, 3, 3, 2, 3, 2.5, 2.5, 2, 2, 2, 2,
2, 2, 2, 2.5, 2.5, 2, 3, 2.5, 2, 2.5, 2, 2.5, 2.5, 2, 2, 2.5,
3.5, 2, 2.5, 2.5, 2.5, 2.5, 2, 2, 2, 2.5, 2, 2, 1.5, 2, 2, 2.5,
2, 2, 2.5, 2, 2, 2.5, 2.5, 2.5, 3, 2.5, 2.5, 2.5, 2, 2, 2.5,
2.5, 2, 2, 2, 2, 1.5, 2, 1.5, 2, 2, 2, 1.5, 2, 2, 2.5, 2.5, 1.5,
1.5, 2, 2.5, 2, 2, 2, 2, 2.5, 2, 1.5, 2, 2.5, 2, 1.5, 1.5, 1.5,
2, 2, 2, 2, 2, 1.5, 2, 2.5, 2, 2, 2.5, 2.5)), .Names = c("IDD",
"Category", "Metric"), class = "data.frame", row.names = c(NA,
-167L))
I am a bit confused about how you want to scale different line segments, but I was able to create a proportional variable within dat and then plot that as an argument to geom_line():
dat$thickness <- with(dat, ave(Category, Metric, FUN = prop.table))
ggplot(dat, aes(x = Category, y = Metric, group = IDD)) +
geom_line(aes(group = IDD), colour = "gray59", size = dat$thickness) +
geom_count(aes(size = ..prop.., group = 1), colour = "gray59") +
scale_size_area(max_size = 5) +
theme_bw() +
geom_smooth(method = "lm", se = F, colour = "black",
aes(group = 1), linetype = "dotdash") +
xlab("Category") +
ylab("Metric") +
theme(text = element_text(size = 16))
Which yields this plot:
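Not part of the original answer, but if the goal is literally to make each grey line's width proportional to how many individuals follow the same path, another option is to count identical segments first and draw them with geom_segment(). This is just a sketch; it assumes dplyr and a ggplot2 version >= 3.4 (for the linewidth aesthetic), and the other layers from your plot can be added back unchanged:
library(dplyr)
library(ggplot2)

# one row per consecutive Category-to-Category step within each individual,
# counting how many individuals share exactly the same segment
segs <- dat %>%
  arrange(IDD, Category) %>%
  group_by(IDD) %>%
  mutate(Category_next = lead(Category),
         Metric_next   = lead(Metric)) %>%
  ungroup() %>%
  filter(!is.na(Category_next)) %>%
  count(Category, Metric, Category_next, Metric_next, name = "n_paths")

ggplot(dat, aes(x = Category, y = Metric)) +
  geom_segment(data = segs,
               aes(xend = Category_next, yend = Metric_next, linewidth = n_paths),
               colour = "gray59") +
  scale_linewidth(range = c(0.3, 2)) +
  theme_bw()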
Here's a snippet of randomly selected data from my full dataframe:
canopy<-structure(list(Stage = structure(c(6L, 5L, 3L, 6L, 7L, 5L, 4L,
7L, 2L, 7L, 5L, 1L, 1L, 4L, 3L, 6L, 5L, 7L, 4L, 4L), .Label = c("milpa",
"robir", "jurup che", "pak che kor", "mehen che", "nu kux che",
"tam che"), class = c("ordered", "factor")), ID = c(44L, 34L,
18L, 64L, 54L, 59L, 28L, 51L, 11L, 56L, 33L, 1L, 7L, 25L, 58L,
48L, 36L, 51L, 27L, 66L), Sample = c(4L, 2L, 2L, 10L, 6L, 9L,
4L, 3L, 3L, 8L, 1L, 1L, 7L, 1L, 10L, 8L, 4L, 3L, 3L, 10L), Subsample = c(2L,
3L, 4L, 3L, 2L, 1L, 3L, 2L, 4L, 3L, 1L, 3L, 2L, 4L, 1L, 1L, 3L,
1L, 1L, 4L), Size..ha. = c(0.5, 0.5, 0.5, 0.5, 6, 0.5, 0.5, 0.25,
0.5, 6, 1, 1, 0.5, 2, 1, 0.5, 1, 0.25, 0.5, 2), Avg.Subsample.Canopy = c(94.8,
94.8, 97.92, 96.88, 97.14, 92.46, 93.24, 97.4, 25.64, 97.4, 94.8,
33.7, 13.42, 98.18, 85.44, 96.36, 97.4, 95.58, 85.7, 92.2), dec = c(0.948,
0.948, 0.9792, 0.9688, 0.9714, 0.9246, 0.9324, 0.974, 0.2564,
0.974, 0.948, 0.337, 0.1342, 0.9818, 0.8544, 0.9636, 0.974, 0.9558,
0.857, 0.922)), .Names = c("Stage", "ID", "Sample", "Subsample",
"Size..ha.", "Avg.Subsample.Canopy", "dec"), row.names = c(693L,
537L, 285L, 1017L, 853L, 929L, 441L, 805L, 173L, 889L, 513L,
9L, 101L, 397L, 913L, 753L, 569L, 801L, 417L, 1053L), class = "data.frame")
I am trying to code a GLMM of dec as a function of Stage and Size..ha.
The GLMM is necessary because each row represents a point Subsample measured within a larger Sample area. I am also using a binomial distribution given that dec is proportion data.
I tried the model:
library(lme4)
canopy.binomial.mod <- glmer(dec ~ Stage * Size..ha. + (1 | Sample), family = "binomial", data = canopy)
summary(canopy.binomial.mod)
but get the error:
Error in pwrssUpdate(pp, resp, tol = tolPwrss, GQmat = GQmat, compDev
= compDev, : (maxstephalfit) PIRLS step-halvings failed to reduce deviance in pwrssUpdate
I've seen online that this can be a result of needing to scale a predictor variable, so I tried:
cs. <- function(x) scale(x,scale=TRUE,center=TRUE)
canopy.binomial.mod <- glmer(dec ~ Stage * cs.(Size..ha.) + (1 | Sample), family = "binomial", data = canopy)
summary(canopy.binomial.mod)
Which doesn't seem to help. I also thought that maybe I'm asking too much of the model and it's not converging due to too many predictor variables, so let's remove the Size variable, which is of less interest to me.
canopy.binomial.mod <- glmer(dec ~ Stage + (1 | Sample), family = "binomial", data = canopy)
summary(canopy.binomial.mod)
Still no luck. Any ideas how to address this?
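One hedged suggestion, not something tried above: because dec is a continuous proportion (an averaged canopy reading) rather than a count of successes out of a known number of trials, a binomial glmer is an awkward fit, and a beta GLMM is a commonly suggested alternative, e.g. via the glmmTMB package. On this 20-row excerpt such a model may not be estimable, but on the full dataset it would look something like:
library(glmmTMB)

# beta regression (logit link by default) with Sample as a random intercept, as before
canopy.beta.mod <- glmmTMB(dec ~ Stage + (1 | Sample),
                           family = beta_family(),
                           data = canopy)
summary(canopy.beta.mod)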
I'm posting this question because the very similar question here has not yet been answered.
I have been asked to plot the mean +/- SEM of my whole cohort of patients over the xyplot() that depicts the values of all patients. The data used represents intraoperative cardiovascular findings from patients undergoing surgery.
This is my data.frame called df
dput(df)
structure(list(Name = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L), .Label = c("DE", "JS", "KG", "MK", "TG", "WT"), class = "factor"),
Time = structure(c(1L, 2L, 3L, 4L, 5L, 7L, 8L, 1L, 2L, 3L,
4L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 7L, 8L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 2L, 3L, 4L, 5L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L), .Label = c("T1", "T2", "T3", "T4", "T5", "T6", "T7",
"T8"), class = "factor"), Dobut = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L), .Label = c("No", "Yes"
), class = "factor"), DobutDose = c(NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
4L, 6L, 8L, 8L, 8L, 8L, 8L, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 5L, 5L, NA), CI = c(1.4, 2.3, 1.3, 1.8, 2.1,
2, 2.1, 2.1, 2.3, 1.9, 1.6, 2, 2.4, 2.7, 2.6, 2.7, 2.6, 2.3,
2.4, 2.6, 0.9, 2.5, 2.1, 1.6, 1.5, 1.8, 2, 2, 1.9, 2.1, 2.3,
2, 2.4, 2.3, 2.6, 2.4, 2, 2.2, 1.6, 2.1, 2.5, 2.8), SvO2 = c(57L,
65L, 47L, 45L, 51L, 60L, 56L, 70L, 85L, 75L, 79L, 82L, 73L,
77L, 78L, 73L, 71L, 73L, 80L, 74L, 41L, 66L, 51L, 51L, 49L,
54L, 68L, 48L, 80L, 70L, 71L, 69L, 74L, 79L, 77L, 77L, 75L,
74L, 70L, 79L, 80L, 79L), SVRI = c(4000L, 1983L, 4000L, 2444L,
1981L, 2120L, 2514L, 2971L, 2157L, 3747L, 4300L, 3200L, 2867L,
1778L, 1169L, 1215L, 1262L, 1461L, 1600L, 1692L, 4978L, 1760L,
2019L, 2650L, 2827L, 2356L, 1800L, 2840L, 2063L, 2248L, 1948L,
2160L, 1733L, 2296L, 2677L, 2100L, 2640L, 2655L, 3950L, 2210L,
2848L, 2543L), MAP = c(80L, 65L, 86L, 74L, 67L, 65L, 74L,
90L, 70L, 90L, 96L, 94L, 100L, 82L, 60L, 61L, 62L, 62L, 69L,
71L, 70L, 71L, 77L, 73L, 75L, 77L, 61L, 85L, 65L, 74L, 70L,
67L, 69L, 74L, 92L, 71L, 88L, 93L, 89L, 79L, 97L, 97L), CVP = c(10L,
8L, 21L, 19L, 15L, 12L, 8L, 12L, 8L, 11L, 10L, 14L, 14L,
22L, 22L, 20L, 21L, 20L, 21L, 16L, 14L, 16L, 24L, 20L, 22L,
24L, 16L, 14L, 16L, 15L, 14L, 13L, 17L, 8L, 5L, 8L, 22L,
20L, 20L, 21L, 8L, 8L), PAP = c(23L, 22L, 36L, 36L, 34L,
32L, 22L, 33L, 28L, 36L, 36L, 40L, 37L, 37L, 40L, 35L, 35L,
34L, 38L, 36L, 45L, 43L, 55L, 49L, 52L, 54L, 43L, 47L, 27L,
25L, 23L, 22L, 28L, 21L, 20L, 25L, 33L, 33L, 38L, 35L, 33L,
29L), PCWP = c(15L, 11L, 28L, 26L, 23L, 21L, 11L, 26L, NA,
NA, 25L, 25L, NA, 27L, NA, NA, NA, NA, NA, NA, 30L, NA, NA,
NA, NA, NA, NA, NA, 19L, NA, NA, NA, NA, NA, 16L, NA, NA,
NA, NA, NA, NA, NA)), .Names = c("Name", "Time", "Dobut",
"DobutDose", "CI", "SvO2", "SVRI", "MAP", "CVP", "PAP", "PCWP"
), class = "data.frame", row.names = c(NA, -42L))
Now the first xyplot I made for the variable CI looks like this
require(lattice)
xyplot(CI~Time, groups=Name, data=df, ty=c("l", "p"),
       xlab="Measurement Time Point",
       ylab=expression("CI"~(l/min/m^"2")), main="Cardiac Index")
Now I was able to add the mean (black line) of the whole cohort, by doing the following
xyplot(CI~Time, groups=Name, data=df, ty=c("l", "p"),
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.linejoin(x, y, horizontal = FALSE, ..., col="black", lty=1, lwd=4)
       },
       xlab="Measurement Time Point",
       ylab=expression("CI"~(l/min/m^"2")), main="Cardiac Index")
Now I'd like to add +/- SE to the mean as a line above/below the mean, but nowhere can I find how to do this.
What I can do, using the latticeExtra package, is add the loess line +/- SE, as below, but that's not the mathematical function I'm looking for. I've left the mean line in there to illustrate the difference between the two.
require(latticeExtra)
xyplot(CI~Time, groups=Name, data=df, ty=c("l", "p"),
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.linejoin(x, y, horizontal = FALSE, ..., col="black", lty=1, lwd=4)
         panel.smoother(x, y, se=TRUE, col.se="grey")
       },
       xlab="Measurement Time Point",
       ylab=expression("CI"~(l/min/m^"2")), main="Cardiac Index")
I have performed an extensive search through SO and the internet, but I haven't been able to find the right function to do this.
Help is very much appreciated! Thanks.
You could create your own panel function to plot a +/- SE region. For example
# new panel function: shaded mean +/- SE region at each level of x
panel.se <- function(x, y, col.se=plot.line$col, alpha.se=.25, ...) {
  plot.line <- trellis.par.get("plot.line")  # lazy evaluation lets the col.se default pick this up
  # x coordinates of the polygon outline: out along x, then back in reverse
  xs <- if(is.factor(x)) {
    factor(c(levels(x), rev(levels(x))), levels=levels(x))
  } else {
    xx <- sort(unique(x))
    c(xx, rev(xx))
  }
  means <- tapply(y, x, mean, na.rm=TRUE)  # mean of y at each x
  vars  <- tapply(y, x, var, na.rm=TRUE)   # variance of y at each x
  Ns    <- tapply(!is.na(y), x, sum)       # number of non-missing y at each x
  ses   <- sqrt(vars/Ns)                   # standard error of the mean
  panel.polygon(xs, c(means+ses, rev(means-ses)), col=col.se, alpha=alpha.se)
}
and then you can use it like
#include new panel function
xyplot(CI~Time, groups=Name, data=df, ty=c("l", "p"),
       panel = function(x, y, ...) {
         panel.se(x, y, col.se="grey")
         panel.xyplot(x, y, ...)
         panel.linejoin(x, y, horizontal = FALSE, ..., col="black", lty=1, lwd=4)
       },
       xlab="Measurement Time Point",
       ylab=expression("CI"~(l/min/m^"2")), main="Cardiac Index")
which results in
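If you would rather show the mean +/- SE as lines above and below the mean (as described in the question) instead of a shaded band, a small variant of the same idea (a sketch; it assumes x is a factor, as Time is here, and panel.se.lines is a name I'm introducing) is:
# draw mean +/- SE as dashed lines instead of a polygon
panel.se.lines <- function(x, y, col.se = "grey40", ...) {
  means <- tapply(y, x, mean, na.rm = TRUE)            # mean per time point
  ns    <- tapply(!is.na(y), x, sum)                   # non-missing observations per time point
  ses   <- sqrt(tapply(y, x, var, na.rm = TRUE) / ns)  # standard error of the mean
  xs    <- seq_along(means)                            # factor levels plot at 1, 2, ... in lattice
  panel.lines(xs, means + ses, col = col.se, lty = 2, lwd = 2)
  panel.lines(xs, means - ses, col = col.se, lty = 2, lwd = 2)
}
Call panel.se.lines(x, y) inside the panel function, in place of (or alongside) panel.se().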
I am running a logit mixed-effects model using glmer() in package lme4.
The experiment used a within-subjects within-items design with Subjects and Items as crossed random effects.
My problem: different versions of R and lme4 (run on different OS X) produce different standard errors estimates for the fixed effects, and consequently, different significance results.
Here is a subset of my data (data from the last two subjects):
structure(list(SubjN = c(87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L,
87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L,
87L, 87L, 87L, 88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L,
88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L,
88L), Items = structure(c(3L, 10L, 11L, 5L, 1L, 12L, 2L, 6L,
9L, 6L, 3L, 4L, 8L, 11L, 12L, 7L, 8L, 2L, 7L, 10L, 9L, 5L, 1L,
4L, 10L, 3L, 5L, 11L, 12L, 1L, 2L, 6L, 9L, 6L, 3L, 4L, 8L, 11L,
12L, 7L, 2L, 8L, 10L, 7L, 9L, 5L, 1L, 4L), .Label = c("a", "c",
"k", "f", "g", "i", "d", "l", "e", "j", "b", "h"), class = "factor"),
IV1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("N", "L", "P"
), class = "factor"), DV = c(0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1,
0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
IV1.h = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), contrasts = structure(c(-1,
0.5, 0.5, 0, -0.5, 0.5), .Dim = c(3L, 2L), .Dimnames = list(
c("N", "L", "P"), c("N_vs_L&P", "L_vs_P"))), .Label = c("N",
"L", "P"), class = "factor"), N_vs_LP = c(-1, -1, -1, -1,
-1, -1, -1, -1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, -1, -1, -1, -1, -1, -1,
-1, -1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5), L_vs_P = c(0, 0, 0, 0, 0,
0, 0, 0, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0, 0, 0, 0, 0, 0,
0, 0, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, 0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5)), .Names = c("SubjN",
"Items", "IV1", "DV", "IV1.h", "N_vs_LP", "L_vs_P"), row.names = c("3099",
"3100", "3101", "3102", "3103", "3104", "3119", "3120", "3107",
"3108", "3109", "3110", "3097", "3098", "3105", "3106", "3115",
"3116", "3117", "3118", "3111", "3112", "3113", "3114", "3147",
"3148", "3149", "3150", "3151", "3152", "3167", "3168", "3155",
"3156", "3157", "3158", "3145", "3146", "3153", "3154", "3163",
"3164", "3165", "3166", "3159", "3160", "3161", "3162"), class = "data.frame")
Each subject was tested on 24 trials in 3 different conditions (factor IV1, levels: N, L, P).
I recorded whether they produced a target linguistic structure (DV == 1) or not (DV == 0).
In the analysis, I only included those subjects who produced the target structure at least once.
Nonetheless, most of them produced the target structure only on very few occasions. This is the proportion of DV == 1 produced by each subject in each condition:
library(plyr)
#dput(ddply(mydata, .(SubjN, IV1), summarise, l = length(DV), y = round(mean(DV),2)))
structure(list(SubjN = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 9L,
9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 13L, 13L,
13L, 14L, 14L, 14L, 15L, 15L, 15L, 16L, 16L, 16L, 17L, 17L, 17L,
18L, 18L, 18L, 19L, 19L, 19L, 20L, 20L, 20L, 21L, 21L, 21L, 22L,
22L, 22L, 23L, 23L, 23L, 24L, 24L, 24L, 25L, 25L, 25L, 26L, 26L,
26L, 27L, 27L, 27L, 28L, 28L, 28L, 29L, 29L, 29L, 30L, 30L, 30L,
31L, 31L, 31L, 32L, 32L, 32L, 33L, 33L, 33L, 34L, 34L, 34L, 35L,
35L, 35L, 36L, 36L, 36L, 37L, 37L, 37L, 38L, 38L, 38L, 39L, 39L,
39L, 40L, 40L, 40L, 41L, 41L, 41L, 42L, 42L, 42L, 43L, 43L, 43L,
44L, 44L, 44L, 45L, 45L, 45L, 46L, 46L, 46L, 47L, 47L, 47L, 48L,
48L, 48L, 49L, 49L, 49L, 50L, 50L, 50L, 51L, 51L, 51L, 52L, 52L,
52L, 53L, 53L, 53L, 54L, 54L, 54L, 55L, 55L, 55L, 56L, 56L, 56L,
57L, 57L, 57L, 58L, 58L, 58L, 59L, 59L, 59L, 60L, 60L, 60L, 61L,
61L, 61L, 62L, 62L, 62L, 63L, 63L, 63L, 64L, 64L, 64L, 65L, 65L,
65L, 66L, 66L, 66L, 67L, 67L, 67L, 68L, 68L, 68L, 69L, 69L, 69L,
70L, 70L, 70L, 71L, 71L, 71L, 72L, 72L, 72L, 73L, 73L, 73L, 74L,
74L, 74L, 75L, 75L, 75L, 76L, 76L, 76L, 77L, 77L, 77L, 78L, 78L,
78L, 79L, 79L, 79L, 80L, 80L, 80L, 81L, 81L, 81L, 82L, 82L, 82L,
83L, 83L, 83L, 84L, 84L, 84L, 85L, 85L, 85L, 86L, 86L, 86L, 87L,
87L, 87L, 88L, 88L, 88L), IV1 = structure(c(1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L), .Label = c("N", "L", "P"), class = "factor"), l = c(8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 7L, 8L, 7L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 7L, 8L, 8L, 8L, 8L, 8L, 8L,
7L, 8L, 6L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 7L, 8L, 8L, 7L, 7L, 8L, 7L, 8L,
8L, 7L, 8L, 8L, 7L, 8L, 8L, 7L, 8L, 8L, 7L, 8L, 8L, 7L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 6L, 8L, 4L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 7L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 7L,
8L, 8L, 7L, 8L, 8L, 7L, 8L, 8L, 7L, 8L, 8L, 7L, 8L, 8L, 7L, 8L,
8L, 7L, 8L, 8L, 7L, 8L, 8L, 7L, 8L, 8L, 7L, 8L, 7L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L), y = c(1, 0.88, 1, 0.5, 0.25, 0.62,
0, 0, 0.25, 0, 0.25, 0, 0.12, 0, 0, 0, 0.12, 0, 0, 0.12, 0.12,
0, 0, 0.12, 0.38, 0, 0.25, 0, 0.12, 0, 0.12, 0, 0.25, 0, 0, 0.12,
0.5, 0.25, 0.5, 0, 0, 0.12, 0, 0.25, 0.12, 0, 0, 0.12, 0, 0.12,
0, 0, 0.12, 0.12, 0.12, 0.62, 0, 0, 0.5, 0.25, 1, 0.88, 1, 0,
0, 0.12, 0, 0.12, 0.12, 0.12, 0.12, 0, 0.62, 0.62, 0.38, 0.5,
0.88, 0.12, 0.12, 0, 0, 0.12, 0.12, 0, 0, 0.12, 0, 0, 0.12, 0,
0, 0.12, 0, 0, 0.25, 0, 0, 0.14, 0, 0.5, 0.57, 0.29, 0, 0.12,
0, 0, 0.12, 0, 0.25, 0.5, 0.25, 0, 0.12, 0.12, 0.25, 0, 0.38,
0, 0, 0.12, 0, 0, 1, 0.25, 0.12, 0.25, 0, 0.12, 0.12, 0, 0, 0.12,
0, 0, 0.12, 0.12, 0, 0, 0.12, 0, 0.14, 0.14, 0.12, 0, 0.12, 0,
0, 0.12, 0.12, 0, 1, 0.88, 1, 0, 0.12, 0, 0.12, 0, 0, 0.12, 0,
0.12, 0, 0, 0.12, 0.12, 0.12, 0.12, 1, 1, 1, 0.12, 0, 0, 0.12,
0.38, 0, 0, 0.12, 0, 0, 0, 0.5, 0.5, 0, 0.25, 0, 0.12, 0.29,
0, 0, 0.38, 0, 0, 0.62, 0.5, 0, 0.12, 0, 0.12, 0.12, 0.25, 0.12,
0.25, 0.12, 0, 0.12, 0, 0, 0.12, 0, 0, 0.12, 0, 0.12, 0.12, 0,
0.12, 0.12, 0, 0, 0.12, 0.12, 0.12, 0, 0.38, 0.12, 0.57, 0, 0.12,
0, 0, 0.12, 0, 0, 0.12, 0, 0, 0.12, 0.14, 0.88, 0.88, 0.86, 0,
0, 0.14, 0, 0.12, 0.14, 0, 0.12, 0, 0, 0, 0.12, 0, 0, 0.12, 0.38,
0, 0, 0.5, 0.12, 0)), .Names = c("SubjN", "IV1", "l", "y"), row.names = c(NA,
-264L), class = "data.frame")
I ran the following model including IV1 as a fixed effect with Helmert contrast coding;
first contrast: N vs. L & P, second contrast: L vs. P.
m1 <- glmer(DV ~ IV1.h + (1 + IV1.h|SubjN) + (1|Items) + (0 + N_vs_LP|Items) + (0 + L_vs_P|Items), family = 'binomial', data = mydata)
The model does not allow for correlation between the by-Items random effects (I did this by creating separate slopes for the two contrasts), since when correlation was allowed they were perfectly correlated (which I interpreted as a sign of over-parameterization).
1) Results using
os x 10.8.5 mountain lion
R version 3.0.2 (2013-09-25)
lme4_1.0-5
(the original analysis I ran)
Generalized linear mixed model fit by maximum likelihood ['glmerMod']
Family: binomial ( logit )
Formula: DV ~ IV1.h + (1 + N_vs_LP + L_vs_P | SubjN) + (1 | Items) + (0 + N_vs_LP | Items) + (0 + L_vs_P | Items)
Data: mydata
AIC BIC logLik deviance
1492.5408 1560.2050 -734.2704 1468.5408
Random effects:
Groups Name Variance Std.Dev. Corr
SubjN (Intercept) 2.3885505 1.54549
N_vs_LP 0.4394195 0.66289 -0.69
L_vs_P 1.9287559 1.38880 0.04 0.08
Items (Intercept) 0.0531518 0.23055
Items.1 N_vs_LP 0.0001950 0.01396
Items.2 L_vs_P 0.0003619 0.01902
Number of obs: 2077, groups: SubjN, 88; Items, 12
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.2998 0.1964 -11.710 < 2e-16 ***
IV1.hN_vs_L&P 0.3704 0.1378 2.689 0.00717 **
IV1.hL_vs_P 0.2060 0.2320 0.888 0.37459
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) IV1.N_
IV1.hN_vs_L&P -0.388
IV1.hL_vs_P 0.014 0.019
2) Results using:
OS X 10.9.4 Mavericks
R version 3.1.1 (2014-07-10)
lme4_1.1-7
optimizer 'bobyqa'
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: DV ~ IV1.h + (1 + N_vs_LP + L_vs_P | SubjN) + (1 | Items) + (0 +
N_vs_LP | Items) + (0 + L_vs_P | Items)
Data: mydata
Control: glmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
1492.5 1560.2 -734.3 1468.5 2065
Scaled residuals:
Min 1Q Median 3Q Max
-2.4174 -0.3364 -0.2595 -0.1706 4.6028
Random effects:
Groups Name Variance Std.Dev. Corr
SubjN (Intercept) 2.38791 1.5453
N_vs_LP 0.43935 0.6628 -0.69
L_vs_P 1.92629 1.3879 0.04 0.07
Items (Intercept) 0.05319 0.2306
Items.1 N_vs_LP 0.00000 0.0000
Items.2 L_vs_P 0.00000 0.0000
Number of obs: 2077, groups: SubjN, 88; Items, 12
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.2998 0.2095 -10.975 <2e-16 ***
IV1.hN_vs_L&P 0.3703 0.1892 1.958 0.0503 .
IV1.hL_vs_P 0.2063 0.2679 0.770 0.4413
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) IV1.N_
IV1.hN__L&P -0.379
IV1.hL_vs_P -0.001 0.003
I really don't know which outcome I should trust. Any help would be very much appreciated.
Ps. Sorry if something is not clear - it's my first post :)
Thanks very much!
From lme4's NEWS file, for version 1.1-4
Standard errors of fixed effects are now computed from the approximate Hessian by default (see the use.hessian argument in vcov.merMod); this gives better (correct) answers when the estimates of the random- and fixed-effect parameters are correlated (Github #47)
The description of the problem is here
You should be able to retrieve the old standard errors from the newer (1.1-7) model by sqrt(diag(vcov(fitted_model,use.hessian=FALSE))), but the new version is more likely to be correct.
For more precise confidence intervals/p values, you can do a likelihood ratio test (use anova to compare nested models) and/or compute the profile confidence intervals with confint(fitted_model,which="beta_").
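For example, a sketch assuming m1 is the fitted model from the question (profiling a model of this size can take a while; m1_noIV1 is a name introduced here):
# likelihood-ratio test for the fixed effect of IV1.h, comparing nested models
m1_noIV1 <- update(m1, . ~ . - IV1.h)
anova(m1_noIV1, m1)

# profile confidence intervals for the fixed-effect coefficients only
confint(m1, parm = "beta_", method = "profile")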