r geom_bar reorder layers of bars by values - r

I have produced a bar chart that shows cumulative totals over periods of months for various programs using the following data structure and code:
library(dplyr)
data_totals <- data_long %>%
group_by(Period, Program) %>%
arrange(Period, Program) %>%
ungroup() %>%
group_by(Program) %>%
mutate(Running_Total = cumsum(Value))
dput(data_totals)
structure(list(Period = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L,
8L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L,
11L, 11L, 12L, 12L, 12L, 12L, 12L), .Label = c("2018-04", "2018-05",
"2018-06", "2018-07", "2018-08", "2018-09", "2018-10", "2018-11",
"2018-12", "2019-01", "2019-02", "2019-03", "Apr-Mar 2019"), class = "factor"),
Program = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L), .Label = c("A",
"B", "C", "D",
"E"), class = "factor"), Value = c(5597,
0, 0, 0, 1544, 0, 0, 0, 0, 1544, 0, 0, 0, 0, 1544, 0, 0,
850, 0, 1544, 0, 0, 0, 0, 1544, 0, 0, 0, 0, 1544, 0, 0, 0,
0, 1544, 0, 0, 0, 0, 1544, 0, 0, 0, 0, 1544, 0, 0, 0, 0,
1544, 0, 0, 0, 0, 1544, 0, 0, 0, 0, 1544), Running_Total = c(5597,
0, 0, 0, 1544, 5597, 0, 0, 0, 3088, 5597, 0, 0, 0, 4632,
5597, 0, 850, 0, 6176, 5597, 0, 850, 0, 7720, 5597, 0, 850,
0, 9264, 5597, 0, 850, 0, 10808, 5597, 0, 850, 0, 12352,
5597, 0, 850, 0, 13896, 5597, 0, 850, 0, 15440, 5597, 0,
850, 0, 16984, 5597, 0, 850, 0, 18528)), .Names = c("Period",
"Program", "Value", "Running_Total"), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -60L), vars = "Program", labels = structure(list(
Program = structure(1:5, .Label = c("A",
"B", "C", "D",
"E"), class = "factor")), class = "data.frame", row.names = c(NA,
-5L), vars = "Program", drop = TRUE, .Names = "Program"), indices = list(
c(0L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L
), c(1L, 6L, 11L, 16L, 21L, 26L, 31L, 36L, 41L, 46L, 51L,
56L), c(2L, 7L, 12L, 17L, 22L, 27L, 32L, 37L, 42L, 47L, 52L,
57L), c(3L, 8L, 13L, 18L, 23L, 28L, 33L, 38L, 43L, 48L, 53L,
58L), c(4L, 9L, 14L, 19L, 24L, 29L, 34L, 39L, 44L, 49L, 54L,
59L)), drop = TRUE, group_sizes = c(12L, 12L, 12L, 12L, 12L
), biggest_group_size = 12L)
# reorder the groups descending so that the lowest total will be on layers from front to back
reorder(data_totals$Program, -data_totals$Running_Total)
ggplot(data = data_totals, aes(x = Period, y = Running_Total)) +
geom_bar(aes(color = Program, group = Program, fill = Program),
stat = "identity", position = "identity", alpha = 1.0)
It works in that it creates the graph with all the proper data, but the smaller Running_Totals are obscured by the larger ones.
I also get the following warning message:
Warning message:
The plyr::rename operation has created duplicates for the following name(s): (`colour`)
This appears even though I do not have the plyr package loaded.
I can see all the Running_Totals if I set alpha to 0.5.
Running_Total for each Program by Period, alpha = 0.5:
How can I get the layers ordered so that the smallest values are on the frontmost layers, working back toward the highest values?

The way I was trying to represent the data in the original question was flawed.
There is no advantage to having the Program with the maximum value for each Period be the top of the bar.
A more illustrative solution is to have a stacked bar, with labels indicating the contribution of each Program to the overall value of each Period:
ggplot(data = data_totals[which(data_totals$Running_Total > 0),], aes(x = Period, y = Running_Total, fill = Program)) +
geom_bar(aes(color = Program, group = Program, fill = Program), stat = "identity", position = "stack", alpha = 1.0) +
geom_text(aes(label = Running_Total), position = position_stack(vjust = 0.5))
I used [which(data_totals$Running_Total > 0),] to eliminate any "0" bars and labels.
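The zero filter can be tried on a small subset first. This is only a minimal sketch: `toy_totals` is a hypothetical stand-in for `data_totals`, with values copied from the first two periods of programs A, B and E in the dput() output, and the plotting part is skipped if ggplot2 is not installed.

```r
# Toy stand-in for data_totals (hypothetical name), using the first two
# periods of programs A, B and E from the dput() output in the question.
toy_totals <- data.frame(
  Period = rep(c("2018-04", "2018-05"), each = 3),
  Program = rep(c("A", "B", "E"), times = 2),
  Running_Total = c(5597, 0, 1544, 5597, 0, 3088)
)

# which() returns the positions of the TRUE entries, so rows whose
# Running_Total is 0 (or NA) are dropped before plotting.
nonzero <- toy_totals[which(toy_totals$Running_Total > 0), ]

# Draw the stacked, labelled bars only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(nonzero, aes(x = Period, y = Running_Total, fill = Program)) +
    geom_col(position = "stack") +
    geom_text(aes(label = Running_Total),
              position = position_stack(vjust = 0.5))
  print(p)
}
```

Wrapping the condition in which() rather than indexing with the bare logical vector also means that an NA in Running_Total drops the row instead of producing a row of NAs.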

Make ggplot2 heatmap with different colors for values over/under thresholds

I want to make a table with cells highlighted according to their value, Perc_Diff in this case. I want values between -100 and +100 to follow a scale_fill_gradient2 pattern (see code below), but values <= -100 to be blue, and values of >= 100 to be yellow. Here is the data (some of it has been changed to illustrate my question).
plot30 <- structure(list(Station = structure(c(20L, 20L, 20L, 20L, 20L,
20L, 15L, 15L, 15L, 15L, 15L, 15L, 25L, 25L, 25L, 25L, 25L, 25L,
6L, 6L, 6L, 6L, 6L, 6L, 36L, 36L, 36L, 36L, 36L, 36L, 13L, 13L,
13L, 13L, 13L, 13L, 18L, 18L, 18L, 18L, 18L, 18L, 45L, 45L, 45L,
45L, 45L, 45L, 29L, 29L, 29L, 29L, 29L, 29L, 7L, 7L, 7L, 7L,
7L, 7L, 39L, 39L, 39L, 39L, 39L, 39L, 33L, 33L, 33L, 33L, 33L,
33L, 24L, 24L, 24L, 24L, 24L, 24L, 41L, 41L, 41L, 41L, 41L, 41L,
22L, 22L, 22L, 22L, 22L, 22L, 28L, 28L, 28L, 28L, 28L, 28L, 32L,
32L, 32L, 32L, 32L, 32L, 23L, 23L, 23L, 23L, 23L, 23L, 3L, 3L,
3L, 3L, 3L, 3L, 31L, 31L, 31L, 31L, 31L, 31L, 34L, 34L, 34L,
34L, 34L, 34L, 8L, 8L, 8L, 8L, 8L, 8L, 27L, 27L, 27L, 27L, 27L,
27L, 37L, 37L, 37L, 37L, 37L, 37L, 5L, 5L, 5L, 5L, 5L, 5L, 19L,
19L, 19L, 19L, 19L, 19L, 44L, 44L, 44L, 44L, 44L, 44L, 17L, 17L,
17L, 17L, 17L, 17L, 43L, 43L, 43L, 43L, 43L, 43L, 40L, 40L, 40L,
40L, 40L, 40L, 9L, 9L, 9L, 9L, 9L, 9L, 4L, 4L, 4L, 4L, 4L, 4L,
30L, 30L, 30L, 30L, 30L, 30L, 38L, 38L, 38L, 38L, 38L, 38L, 12L,
12L, 12L, 12L, 12L, 12L, 35L, 35L, 35L, 35L, 35L, 35L, 14L, 14L,
14L, 14L, 14L, 14L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L,
11L, 11L, 11L, 42L, 42L, 42L, 42L, 42L, 42L, 26L, 26L, 26L, 26L,
26L, 26L, 21L, 21L, 21L, 21L, 21L, 21L, 16L, 16L, 16L, 16L, 16L,
16L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("WTES2",
"WRIS2", "WEBS2", "VGAS2", "UNIS2", "TIMS2", "STMS2", "SHSS2",
"SFWS2", "SFSS2", "RDFS2", "RBMS2", "PSTS2", "PRFS2", "ORRS2",
"OCMS2", "OAKS2", "NISS2", "MHTS2", "MCTS2", "MCMS2", "MATS2",
"LMNS2", "LLAS2", "JWLS2", "HMLS2", "HIHS2", "GTBS2", "GOTS2",
"FLNS2", "FLKS2", "EGBS2", "EDTS2", "CTWS2", "CTNS2", "CPTS2",
"CLRS2", "BWLS2", "BTNS2", "BTCS2", "BSNS2", "BKMS2", "BFMS2",
"AURS2", "ATRS2"), class = "factor"), Comp_Data = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L,
6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L,
4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L,
2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L,
6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L,
4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L,
2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L,
6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L,
4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L,
2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L,
6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L,
4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L,
2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L,
6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L,
4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L,
2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L,
6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("Total_QPE",
"Stn_PP", "Diff_in", "Perc_Diff", "Frz_Days", "Miss_Days"), class = "factor"),
stuff = c(3.831, 3.07, -0.761, -24.79, 0, 0, 3.075, 1.81,
-1.265, -69.89, 0, 5, 2.941, 2.2, -0.741, -33.68, 0, 0, 2.907,
2.33, -0.577, -24.76, 0, 0, 2.319, 0.36, -1.959, -100, 0,
19, 2.241, 1.24, -1.001, -80.73, 0, 0, 1.926, 1.23, -0.696,
-56.59, 0, 0, 1.91, 1.07, -0.84, -78.5, 0, 0, 1.877, 1.47,
-0.407, -27.69, 0, 0, 1.867, 1.35, -0.517, -38.3, 0, 0, 1.773,
1.22, -0.553, -45.33, 0, 0, 1.773, 1.43, -0.343, -23.99,
0, 0, 1.717, 1.35, -0.367, -27.19, 0, 0, 1.659, 0.71, -0.949,
-100, 0, 0, 1.481, 0.5, -0.981, -100, 0, 0, 1.401, 0.23,
-1.171, -100, 0, 2, 1.377, 0.08, -1.297, -100, 0, 0, 1.296,
0.97, -0.326, -33.61, 0, 0, 1.263, 0.8, -0.463, -57.88, 0,
0, 1.255, 1.06, -0.195, -18.4, 0, 0, 1.212, 0.63, -0.582,
-92.38, 0, 0, 1.203, 0.71, -0.493, -69.44, 0, 0, 1.189, 0.01,
-1.179, -100, 0, 0, 1.18, 0.53, -0.65, -100, 0, 0, 1.144,
0.42, -0.724, -100, 0, 0, 1.105, 0.65, -0.455, -70, 0, 0,
1.062, 0.62, -0.442, -71.29, 0, 0, 1.043, 0.45, -0.593, -100,
0, 0, 1.032, 0.68, -0.352, -51.76, 0, 13, 0.99, 0.66, -0.33,
-50, 0, 0, 0.985, 0.67, -0.315, -47.01, 0, 0, 0.972, 0.7,
-0.272, -38.86, 0, 0, 0.946, 0.5, -0.446, -89.2, 0, 0, 0.916,
0.63, -0.286, -45.4, 0, 0, 0.87, 0.55, -0.32, -58.18, 0,
5, 0.854, 0.6, -0.254, -42.33, 0, 0, 0.825, 0.56, -0.265,
-47.32, 0, 0, 0.816, 0.74, -0.076, -10.27, 0, 0, 0.808, 0.24,
-0.568, -100, 0, 6, 0.765, 0.577, -0.188, -32.58, 0, 4, 0.723,
0.79, 0.067, 8.48, 0, 0, 0.713, 0.66, -0.053, -8.03, 0, 0,
0.647, 0.79, 0.143, 18.1, 0, 0, 0.452, 0.4, -0.052, -13,
0, 0, 0.328, 0.5, 0.172, 34.4, 0, 0), Perc_Diff = c(0, 0,
0, -24.79, 0, 0, 0, 0, 0, -69.89, 0, 0, 0, 0, 0, -33.68,
0, 0, 0, 0, 0, -24.76, 0, 0, 0, 0, 0, -100, 0, 0, 0, 0, 0,
-80.73, 0, 0, 0, 0, 0, -56.59, 0, 0, 0, 0, 0, -78.5, 0, 0,
0, 0, 0, -27.69, 0, 0, 0, 0, 0, -38.3, 0, 0, 0, 0, 0, -45.33,
0, 0, 0, 0, 0, -23.99, 0, 0, 0, 0, 0, -27.19, 0, 0, 0, 0,
0, -100, 0, 0, 0, 0, 0, -100, 0, 0, 0, 0, 0, -100, 0, 0,
0, 0, 0, -100, 0, 0, 0, 0, 0, -33.61, 0, 0, 0, 0, 0, -57.88,
0, 0, 0, 0, 0, -18.4, 0, 0, 0, 0, 0, -92.38, 0, 0, 0, 0,
0, -69.44, 0, 0, 0, 0, 0, -100, 0, 0, 0, 0, 0, -100, 0, 0,
0, 0, 0, -100, 0, 0, 0, 0, 0, -70, 0, 0, 0, 0, 0, -71.29,
0, 0, 0, 0, 0, -100, 0, 0, 0, 0, 0, -51.76, 0, 0, 0, 0, 0,
-50, 0, 0, 0, 0, 0, 100, 0, 0, 0, 0, 0, -38.86, 0, 0,
0, 0, 0, -89.2, 0, 0, 0, 0, 0, -45.4, 0, 0, 0, 0, 0, -58.18,
0, 0, 0, 0, 0, -42.33, 0, 0, 0, 0, 0, 100, 0, 0, 0, 0,
0, -10.27, 0, 0, 0, 0, 0, -100, 0, 0, 0, 0, 0, -32.58, 0,
0, 0, 0, 0, 8.48, 0, 0, 0, 0, 0, -8.03, 0, 0, 0, 0, 0, 18.1,
0, 0, 0, 0, 0, -13, 0, 0, 0, 0, 0, 34.4, 0, 0)), row.names = c(NA,
-270L), class = c("tbl_df", "tbl", "data.frame"))
This is my working code to make the plot, but without the values above or below 100 specially colored:
library(ggplot2)
ggplot(plot30, aes(Comp_Data, Station)) + geom_tile(aes(fill = Perc_Diff),color='black') + geom_text(aes(label = stuff)) +
scale_fill_gradient2(low = "green", mid = 'white', high = "red",limits=c(min(plot30$Perc_Diff,na.rm=T), max(plot30$Perc_Diff,na.rm=T))) +
ggtitle(paste('30 Day Precipitation Comparison (Inches) for',date_30,'to',date_1,'\nCell Values Represent Differences (SD Mesonet minus QPE; Inches)')) +
theme(legend.key.height = unit(3, "cm")) +
theme(axis.title = element_blank()) + theme(plot.title = element_text(hjust = 0.5)) + theme(panel.background = element_blank()) +
theme(axis.ticks = element_blank()) + theme(axis.text.y = element_text(margin = margin(r = 0))) + theme(legend.title = element_blank()) +
theme(legend.text = element_text(colour="black", size = 14, face = "bold")) +
scale_y_discrete(labels = parse(text = levels(plot30$Station))) +
theme(axis.text = element_text(size = 12, colour = "black", face='bold'))
I have tried putting ifelse statements in the fill statement like this (as seen in a couple other questions and online sources), but it doesn't work for me as coded.
geom_tile(aes(fill = if (Perc_Diff >= 100) {'yellow'} else if (Perc_Diff) <= -100 {'blue'} else {'Perc_Diff')), color = 'black')
Do I need to switch to a manual scale here to get this to work? I'd really like to avoid that if possible, to keep the continuous scale between -100 and +100. Any help would be wonderful. Thank you.
This approach is a potential solution, but the figure looks a little odd because the fill colors (Perc_Diff) and the labels (stuff) sometimes match and sometimes don't.
Regardless, here is how I would split the dataset into 3 'groups' for plotting (<= -100 , > -100 & < 100, >= 100):
library(dplyr)   # for %>% and filter()
library(ggplot2)
ggplot(plot30) +
geom_tile(data = plot30 %>% filter(Perc_Diff < 100 & Perc_Diff > -100),
aes(x = Comp_Data,
y = factor(Station, levels = unique(Station)),
fill = Perc_Diff), color='black') +
geom_tile(data = plot30 %>% filter(Perc_Diff <= -100),
aes(x = Comp_Data,
y = factor(Station, levels = unique(Station))),
fill = "blue", color='black') +
geom_tile(data = plot30 %>% filter(Perc_Diff >= 100),
aes(x = Comp_Data,
y = factor(Station, levels = unique(Station))),
fill = "yellow", color='black') +
geom_text(aes(x = Comp_Data,
y = factor(Station, levels = unique(Station)),
label = stuff)) +
scale_fill_gradient2(low = "green", mid = 'white', high = "red",
limits=c(min(plot30$Perc_Diff,na.rm=T),
max(plot30$Perc_Diff,na.rm=T))) +
ggtitle(paste('30 Day Precipitation Comparison (Inches) for', "date_30",'to',"date_1",'\nCell Values Represent Differences (SD Mesonet minus QPE; Inches)')) +
theme(legend.key.height = unit(3, "cm")) +
theme(axis.title = element_blank()) +
theme(plot.title = element_text(hjust = 0.5)) +
theme(panel.background = element_blank()) +
theme(axis.ticks = element_blank()) +
theme(axis.text.y = element_text(margin = margin(r = 0))) +
theme(legend.title = element_blank()) +
theme(legend.text = element_text(colour="black", size = 14, face = "bold")) +
theme(axis.text = element_text(size = 12, colour = "black", face='bold'))
png(filename = "E:\\Precip_QC_Folder\\New_Output\\30_day_pp.png",height = 3000, width = 2250, res = 250)
ggplot(plot30) +
geom_tile(data = plot30 %>% filter(Perc_Diff > -100 & Perc_Diff < 100),
aes(x = Comp_Data, y = Station, fill = Perc_Diff), color='black') +
geom_tile(data = plot30 %>% filter(Perc_Diff <= -100),
aes(x = Comp_Data, y = Station), fill = 'lightblue', color= 'black') +
geom_tile(data = plot30 %>% filter(Perc_Diff >= 100),
aes(x = Comp_Data, y = Station), fill = 'yellow', color= 'black') +
geom_text(aes(x = Comp_Data, y = Station, label = stuff)) +
scale_fill_gradient2(low = "lightgreen", mid = 'white', high = "orange",limits=c(min(plot30$Perc_Diff,na.rm=T), max(plot30$Perc_Diff,na.rm=T))) +
ggtitle(paste('30 Day Precipitation Comparison (Inches) for',date_30,'to',date_1,'\nCell Values Represent Differences (SD Mesonet minus QPE; Inches)')) +
theme(legend.key.height = unit(3, "cm")) +
theme(axis.title = element_blank()) + theme(plot.title = element_text(hjust = 0.5)) + theme(panel.background = element_blank()) +
theme(axis.ticks = element_blank()) + theme(axis.text.y = element_text(margin = margin(r = 0))) + theme(legend.title = element_blank()) +
theme(legend.text = element_text(colour="black", size = 14, face = "bold")) +
scale_y_discrete(labels = parse(text = levels(plot30$Station))) +
theme(axis.text = element_text(size = 12, colour = "black", face='bold'))
dev.off()
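The three-way split used in the code above can be checked on a toy data frame before applying it to plot30. This is a hedged sketch: `toy`, `low_rows`, `mid_rows` and `high_rows` are hypothetical names, the values are picked only to hit each band, and base subset() stands in for dplyr's filter().

```r
# Toy stand-in for plot30 (hypothetical values chosen to hit each band).
toy <- data.frame(
  Station = c("S1", "S2", "S3", "S4"),
  Comp_Data = "Perc_Diff",
  Perc_Diff = c(-100, -24.79, 34.4, 100)
)

low_rows  <- subset(toy, Perc_Diff <= -100)                   # fixed blue tiles
mid_rows  <- subset(toy, Perc_Diff > -100 & Perc_Diff < 100)  # gradient tiles
high_rows <- subset(toy, Perc_Diff >= 100)                    # fixed yellow tiles

# Every row should land in exactly one band.
stopifnot(nrow(low_rows) + nrow(mid_rows) + nrow(high_rows) == nrow(toy))

# One geom_tile() layer per band; only the middle band maps fill to the data.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mapping = aes(x = Comp_Data, y = Station)) +
    geom_tile(data = mid_rows, aes(fill = Perc_Diff), color = "black") +
    geom_tile(data = low_rows, fill = "blue", color = "black") +
    geom_tile(data = high_rows, fill = "yellow", color = "black") +
    scale_fill_gradient2(low = "green", mid = "white", high = "red",
                         limits = c(-100, 100))
  print(p)
}
```

Because the blue and yellow tiles are set outside aes(), they do not appear in the legend; the legend only describes the continuous scale between -100 and +100.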

complex ggplot in R - half circular bar plot

Okay so here is the challenge. How do I recreate this chart?
The numbers and so on do not have to match; what I am really trying to do is create a circular bar chart in a gauge-type layout with the gap. Headers and text are optional. It is more about the idea of a 3/4 circular bar chart.
Here is some example code that I am playing with:
library(ggplot2)
fixed_income.df <- data.frame(name = c("total","US Gov't Debt","US Municipal Debt",
"US IG Corp","US HY Corp","Int'l Developed",
"Emerging Market"),
allocation = c(3,1,4,3,4,2,3),
x_ax = c(1:7))
ggplot(fixed_income.df,aes(x = as.numeric(x_ax), y = allocation)) +
geom_bar(stat = "identity") +
ylim(-5,5) +
coord_polar(
theta = "x",
start = -3
) + coord_flip()
which returns:
Any help will earn a cookie! No really, any help would be so appreciated, I am stuck.
Sody
The code for the basic plot is fairly simple (at least, without the annotations)
library(ggplot2)
ggplot(df, aes(xvals, yvals, fill = cols)) +
geom_col(width = 1) +
scale_y_continuous(limits = c(-2, 3)) +
scale_fill_manual(values = rev(c("#e9cbc1", "#b54649", "gray90",
"gray50", "#8ba55d", "#e2e4d6",
"white", "#c3a891", "#37959d",
"#5c7890", "#dcad3c", "#55a3b9",
"#f39068"))) +
theme_void() +
geom_vline(colour = "white", xintercept = c(0.5, 1.5, 8.5, 15.5, 16.5, 17.5),
size = 3) +
geom_segment(data = data.frame(x = 0.5 + 1:23, y = 0, yend = 1),
aes(x = x, y = 0, yend = 1, xend = x), colour = "white",
inherit.aes = FALSE) +
scale_x_continuous(expand = c(0.2, 1)) +
coord_polar(start = -pi) +
theme(legend.position = "none")
It's getting your data in the correct format that's going to be difficult:
df <- structure(list(xvals = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L,
13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L,
14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 17L, 17L, 17L,
17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 18L, 18L, 18L,
18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 19L, 19L, 19L,
19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 20L, 20L, 20L,
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 21L, 21L, 21L,
21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 22L, 22L, 22L,
22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 23L, 23L, 23L,
23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L), yvals = c(0.45,
0, 0, 0.1, 0, 0.45, 0.5, 1, 0, 0, 0, 0, 0, 0.45, 0, 0.05, 0,
0.2, 0.3, 0.5, 0, 1, 0, 0, 0, 0, 0.3, 0.15, 0.05, 0, 0, 0.5,
0.5, 0, 1, 0, 0, 0, 0, 0.3, 0.15, 0.05, 0, 0, 0.5, 0.5, 0, 1,
0, 0, 0, 0, 0.45, 0, 0.05, 0, 0.2, 0.3, 0.5, 0, 1, 0, 0, 0, 0,
0.45, 0, 0.05, 0, 0.2, 0.3, 0.5, 0, 1, 0, 0, 0, 0, 0.45, 0, 0,
0.1, 0, 0.45, 0.5, 0, 1, 0, 0, 0, 0, 0.45, 0, 0, 0.1, 0, 0.45,
0.5, 0, 1, 0, 0, 0, 0, 0.3, 0.15, 0.05, 0, 0, 0.5, 0.5, 0, 0,
1, 0, 0, 0, 0.15, 0.3, 0.05, 0, 0, 0.5, 0.5, 0, 0, 1, 0, 0, 0,
0.45, 0, 0.05, 0, 0.2, 0.3, 0.5, 0, 0, 1, 0, 0, 0, 0.45, 0, 0,
0.1, 0, 0.45, 0.5, 0, 0, 1, 0, 0, 0, 0.45, 0, 0.05, 0, 0.2, 0.3,
0.5, 0, 0, 1, 0, 0, 0, 0.3, 0.15, 0.05, 0, 0, 0.5, 0.5, 0, 0,
1, 0, 0, 0, 0.45, 0, 0, 0.1, 0, 0.45, 0.5, 0, 0, 1, 0, 0, 0,
0.45, 0, 0, 0.1, 0, 0.45, 0.5, 0, 0, 0, 1, 0, 0, 0.45, 0, 0,
0.1, 0, 0.45, 0.5, 0, 0, 0, 0, 1, 0, 0.45, 0, 0.05, 0, 0.2, 0.3,
0.5, 0, 0, 0, 0, 0, 1, 0.3, 0.15, 0.05, 0, 0, 0.5, 0.5, 0, 0,
0, 0, 0, 1, 0.45, 0, 0, 0.1, 0, 0.45, 0.5, 0, 0, 0, 0, 0, 1,
0.45, 0, 0, 0.1, 0, 0.45, 0.5, 0, 0, 0, 0, 0, 1, 0.45, 0, 0,
0.1, 0, 0.45, 0.5, 0, 0, 0, 0, 0, 1, 0.45, 0, 0, 0.1, 0, 0.45,
0.5, 0, 0, 0, 0, 0, 1), cols = structure(c(13L, 12L, 11L, 10L,
9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 13L, 12L, 11L, 10L, 9L, 8L,
7L, 6L, 5L, 4L, 3L, 2L, 1L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L,
5L, 4L, 3L, 2L, 1L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L,
3L, 2L, 1L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L,
1L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 13L,
12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 13L, 12L,
11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 13L, 12L, 11L,
10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 13L, 12L, 11L, 10L,
9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 13L, 12L, 11L, 10L, 9L, 8L,
7L, 6L, 5L, 4L, 3L, 2L, 1L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L,
5L, 4L, 3L, 2L, 1L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L,
3L, 2L, 1L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L,
1L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 13L,
12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 13L, 12L,
11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 13L, 12L, 11L,
10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 13L, 12L, 11L, 10L,
9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 13L, 12L, 11L, 10L, 9L, 8L,
7L, 6L, 5L, 4L, 3L, 2L, 1L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L,
5L, 4L, 3L, 2L, 1L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L,
3L, 2L, 1L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L,
1L), .Label = c("Nesting Variable 6", "Nesting Variable 5",
"Nesting Variable 4",
"Nesting Variable 3", "Nesting Variable 2", "Nesting Variable 1",
"blank", "mint", "green", "darkgray", "lightgray", "red", "pink"
), class = "factor")), class = "data.frame", row.names = c(NA,
-299L))
OMG blonde moment, the answer is so simple.. How did I miss it..
xlim()
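For readers wondering how xlim() produces the gap, here is a minimal sketch, not necessarily what the poster ended up with. `gauge_df` and `x_upper` are hypothetical names, and the allocations are the ones from the question. The idea is to stretch the x limits beyond the bars so that part of the circle stays empty.

```r
# Toy allocations (same values as fixed_income.df in the question).
gauge_df <- data.frame(x_ax = 1:7, allocation = c(3, 1, 4, 3, 4, 2, 3))

# Bars sit at x = 1..n, so they occupy roughly the span 0..(n + 1).
# Stretching the x limits so that this span is 3/4 of the total
# leaves the last quarter of the circle empty.
n <- nrow(gauge_df)
x_upper <- (n + 1) / 0.75

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(gauge_df, aes(x = x_ax, y = allocation)) +
    geom_col() +
    xlim(0, x_upper) +
    ylim(-5, 5) +   # negative lower limit leaves a hole in the middle
    # start rotates the circle so the empty quarter is centred at the bottom
    coord_polar(theta = "x", start = -3 * pi / 4)
  print(p)
}
```

The start value is in radians from 12 o'clock; with the empty span being the last quarter of the sweep, -3 * pi / 4 places it between roughly 4:30 and 7:30 on a clock face.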

Error in is.pbalanced.default(x) : argument "y" is missing, with no default, due to stata imported labels

For the following panel data set:
panel <- structure(list(uurwerk = structure(c(40, 40, 40, 40, 36, 1, 32,
36, 32, 32, 36, 36, 40, 40, 40, 40, 40, 38, 38, 38, 38, 60, 55,
40, 42, 42, 42), label = "hours/week work in fact (on average)", class = c("labelled",
"numeric")), loon_c = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), label = "pay/salary [gross] mean int", class = c("labelled",
"numeric")), mtr_loon = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0.4195, 0.42, 0, 0.404, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0), label = "pay/salary [gross] mean int", class = c("labelled",
"numeric")), ntot = structure(c(33084.2085, 31810.0215, 21653.7235,
21961.9788, 21535.29225, 24139.039, 22988.2945, 22183.3175, 84427.88258,
21729.72304, 24248.3388, 23044.16914, 24783.0759660205, 24955.49,
26060.0875, 29328.0404, 30407.6135, 39047.7663137553, 24467.7521372549,
37826.93, 25963.83683, 24516.76866, 24941.179175, 27549.5975,
32690.0255, 25200.10125, 23777.335), label = "total net income", class = c("labelled",
"numeric")), INC = structure(c(6L, 6L, 3L, 3L, 5L, 4L, 5L, 5L,
6L, 5L, 4L, 4L, 3L, 3L, 3L, 3L, 4L, 4L, 6L, 5L, 1L, 4L, 4L, 6L,
5L, 5L, 5L), .Label = c("[ 0, 3010)", "[ 3010, 20300)",
"[20300, 27189)", "[27189, 34020)", "[34020, 40767)", "[40767, 50961)",
"[50961,1165420]"), class = c("labelled", "factor"), label = "total netto income household 2013"),
year = structure(c(1L, 2L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 4L, 5L, 6L, 7L, 1L, 3L, 4L, 6L, 5L, 8L, 1L, 2L,
3L, 4L), .Label = c("2011", "2012", "2013", "2014", "2015",
"2016", "2017", "2018"), class = "factor"), nohhold = structure(c(1L,
1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
4L, 5L, 5L, 5L, 5L, 6L, 6L, 7L, 7L, 7L, 7L), .Label = c("6",
"21", "38", "106", "116", "175", "262"), class = "factor")), row.names = c("6-2011",
"6-2012", "6-2015", "6-2016", "38-2011", "38-2012", "38-2013",
"38-2014", "38-2015", "38-2016", "38-2017", "38-2018", "106-2011",
"106-2014", "106-2015", "106-2016", "106-2017", "116-2011", "116-2013",
"116-2014", "116-2016", "175-2015", "175-2018", "262-2011", "262-2012",
"262-2013", "262-2014"), class = c("pdata.frame", "data.frame"
), index = structure(list(nohhold = structure(c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
4L, 5L, 5L, 6L, 6L, 6L, 6L), .Label = c("6", "38", "106", "116",
"175", "262"), class = "factor"), year = structure(c(1L, 2L,
5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 4L, 5L, 6L, 7L, 1L,
3L, 4L, 6L, 5L, 8L, 1L, 2L, 3L, 4L), .Label = c("2011", "2012",
"2013", "2014", "2015", "2016", "2017", "2018"), class = "factor")), row.names = c(1L,
2L, 5L, 6L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 28L,
29L, 30L, 31L, 33L, 35L, 36L, 38L, 43L, 46L, 47L, 48L, 49L, 50L
), class = c("pindex", "data.frame")))
I would like to run a(ny) random effects model:
library(plm)
summary(plm(uurwerk ~ loon_c + ntot, model="random", data=panel))
But I get the error:
Error in is.pbalanced.default(x) : argument "y" is missing, with no default
I cannot for the life of me figure out how this can be the case.
Any suggestions?
The problem is coming from your dependent variable:
class(panel$uurwerk)
[1] "pseries" "labelled" "numeric"
From your formula I guess you are treating it as continuous, so you can do:
plm(as.numeric(uurwerk) ~ loon_c + ntot, model="random",data=panel)
Model Formula: as.numeric(uurwerk) ~ loon_c + ntot
Coefficients:
(Intercept) loon_c ntot
4.0427e+01 -9.8477e-02 -9.2704e-06
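If several Stata-imported columns carry the "labelled" class, the same conversion can be applied to all of them before the data is turned into a pdata.frame. This is a sketch under assumptions: `strip_labelled()` and `toy_panel` are hypothetical names (not part of plm), the input is assumed to be a plain data.frame, and only numeric columns are touched.

```r
# Toy data frame with a Stata-style labelled column (hypothetical values).
toy_panel <- data.frame(nohhold = c(6, 6, 38), year = c(2011, 2012, 2011))
toy_panel$uurwerk <- structure(c(40, 40, 36),
                               label = "hours/week work in fact (on average)",
                               class = c("labelled", "numeric"))

# strip_labelled() is a hypothetical helper, not part of plm.
# as.numeric() drops the attributes (label and class) of numeric columns;
# labelled factors and all other columns are left unchanged.
strip_labelled <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (inherits(x, "labelled") && is.numeric(x)) {
      df[[col]] <- as.numeric(x)
    }
  }
  df
}

clean_panel <- strip_labelled(toy_panel)
class(clean_panel$uurwerk)
```

The cleaned data frame could then be passed to plm::pdata.frame() and plm() without wrapping each variable in as.numeric() inside the formula.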

Inconsistent predictions from predict.gbm() 2.1.4 vs 2.1.3

This question is related to my earlier post here.
I have tracked down the problem, and it seems to be related to which version of gbm I use. The latest version, 2.1.4, exhibits the problem on my system (R 3.4.4 and also 3.5; both on Ubuntu 18.04), whereas version 2.1.3 works as expected:
mydata <- structure(list(Count = c(1L, 3L, 1L, 4L, 1L, 0L, 1L, 2L, 0L, 0L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 0L, 2L, 3L, 1L, 4L, 3L, 0L, 4L, 1L, 2L, 1L, 1L, 0L, 2L, 1L, 4L, 1L, 5L, 3L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 2L, 0L, 0L, 1L, 1L, 1L, 0L, 3L, 1L, 1L, 0L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 2L, 2L, 0L, 0L, 3L, 5L, 1L, 2L, 1L, 1L, 0L, 0L, 1L, 2L, 1L, 3L, 1L, 1L, 0L, 2L, 2L, 1L, 3L, 3L, 2L, 0L, 0L, 1L, 2L, 1L, 0L, 2L, 0L, 0L, 4L, 4L, 2L), Treat1 = structure(c(10L, 14L, 8L, 2L, 3L, 12L, 1L, 10L, 6L, 2L, 11L, 11L, 15L, 1L, 8L, 3L, 13L, 9L, 9L, 11L, 1L, 8L, 14L, 5L, 10L, 8L, 15L, 11L, 7L, 6L, 13L, 11L, 7L, 1L, 1L, 2L, 7L, 12L, 5L, 1L, 8L, 1L, 9L, 8L,12L, 14L, 12L, 7L, 8L, 14L, 3L, 3L, 5L, 1L, 1L, 11L, 6L, 5L, 5L, 13L, 9L, 3L, 8L, 9L, 13L, 9L, 7L, 9L, 2L, 6L, 10L, 3L, 11L, 4L, 3L, 15L, 12L, 6L, 4L, 3L, 8L, 8L, 11L, 1L, 11L, 2L, 11L, 5L, 12L, 6L, 8L, 14L, 1L, 9L, 9L, 10L, 10L, 5L, 14L, 3L), .Label = c("D", "U", "R", "E", "C", "Y", "L", "O", "G", "T", "N", "J", "V", "X", "A"), class = "factor"), Treat2 = structure(c(15L, 13L, 7L, 8L, 2L, 5L, 15L, 4L, 2L, 7L, 6L, 2L, 3L, 14L, 10L, 7L, 7L, 14L, 11L, 7L, 6L, 1L, 5L, 13L, 11L, 6L, 10L, 5L, 3L, 1L, 7L, 9L, 6L, 10L, 5L, 11L, 15L, 9L, 7L, 11L, 10L, 2L, 3L, 3L, 5L, 11L, 8L, 6L,4L, 5L, 15L, 8L, 8L, 2L, 2L, 10L, 4L, 1L, 10L, 11L, 10L, 8L, 7L, 7L, 8L, 14L, 16L, 11L, 10L, 9L, 3L, 15L, 13L, 1L, 11L, 11L, 9L, 7L, 10L, 9L, 3L, 7L, 5L, 13L, 3L, 14L, 10L, 10L, 15L, 13L, 15L, 12L, 14L, 11L, 5L, 4L, 2L, 3L, 11L, 10L), .Label = c("B", "X", "R", "H", "L", "D", "U", "Q", "K", "C", "T", "V", "J", "E", "F", "A"), class = "factor"), Near = c(0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0), Co1 = c(2, 5, 1, 1, 0, 1, 1, 2, 1, 2, 5, 2, 1, 0, 1, 2, 6, 3, 3, 1, 2, 2, 3, 0, 1, 0, 1, 0, 2, 1, 0, 1, 2, 3, 1, 2, 
2, 0, 0, 2, 3, 3, 1, 1, NA, 2, 0, 2, 1, NA, 1, 1, 0, 1, 2, 0, 2, 1, 1, 1, 2, 3, 1, 0, 4, 0, 0, 0, 2, 2, 1, 1,2, 0, 1, 2, 1, 0, 0, 0, 0, 2, 1, 2, 2, 2, 2, 1, 0, 1, 1, 1, 1, 1, 0, 2, 0, 0, 5, 1), Co2 = c(1, 1, 2, 2, 4, 1, 3, 0, 5, 2, 2, 4, 1, 1, 2, 1, 2, 3, 0, 2, 3, 3, 0, 3, 1, 0, 1, 1, 1, 2, 0, 1, 1, 1, 2, 3, 2, 2, 3, 0, 0, 0, 1, 2, NA, 1, 1, 1, 0, 2, 1, 1, 2, 5, 0, 2, 1, 4, 1, 1, 3, 0, 1, 1, 1, 1, NA, 0, 2, 1, 1, 3, 2, 1, 2, 1, 3, 1, 2, 0, 1, 5, 2, 2, 1, 2, 3, 4, 3, 1, 1, 0, 5, 1, 1, 0, 1, 1, 2, 0)), .Names = c("Count", "Treat1", "Treat2", "Near", "Co1", "Co2"), row.names = c(1759L, 959L, 1265L, 1504L, 630L, 1905L, 1885L, 1140L, 1187L, 1792L, 1258L, 1125L, 756L, 778L, 1718L, 1797L, 388L, 715L, 63L, 311L, 1492L, 1128L, 629L, 536L, 503L, 651L, 1684L, 1893L, 721L, 1440L, 1872L, 1444L, 1593L, 143L, 1278L, 1558L, 1851L, 1168L, 1829L, 386L, 365L, 849L, 429L, 155L, 11L, 1644L, 101L, 985L, 72L, 459L, 1716L, 844L, 1313L, 77L, 1870L, 744L, 219L, 513L, 644L, 831L, 338L, 284L, 211L, 1096L,243L, 1717L, 1881L, 1784L, 1017L, 992L, 45L, 707L, 489L, 1267L, 1152L, 1819L, 995L, 510L, 1350L, 1700L, 56L, 1754L, 725L, 1625L, 319L, 1818L, 1287L, 1634L, 953L, 1351L, 1787L, 923L, 917L, 484L, 886L, 390L, 1531L, 679L, 1811L, 1736L), class = "data.frame")
detach("package:gbm", unload = TRUE )
remove.packages("gbm")
require(devtools)
install_version("gbm", version = "2.1.3")
set.seed(12345)
require(gbm)
n.trees <- 10000
m1.gbm <- gbm(Count ~ Treat1 + Treat2 + Near + Co1 + Co2, data = mydata, distribution = "poisson", n.trees = n.trees)
head(predict(m1.gbm, newdata = mydata, n.trees = n.trees, type = "response"))
[1] 0.8620154 2.8210216 0.8800267 3.7808341 0.4749737 0.3716022
predict(m1.gbm, newdata = head(mydata), n.trees = n.trees, type = "response")
[1] 0.8620154 2.8210216 0.8800267 3.7808341 0.4749737 0.3716022
...as expected. However,
detach("package:gbm", unload = TRUE )
remove.packages("gbm")
install.packages("gbm", dependencies = TRUE)
# I had to restart R after this, otherwise the following line failed with:
# Loading required package: gbm
# Error: package or namespace load failed for ‘gbm’ in get(method, envir = home):
# lazy-load database '/home/container/R/x86_64-pc-linux-gnu-library/3.5/gbm/R/gbm.rdb' is corrupt
require(gbm)
m1.gbm <- gbm(Count ~ Treat1 + Treat2 + Near + Co1 + Co2, data = mydata, distribution = "poisson", n.trees = n.trees)
head(predict(m1.gbm, newdata = mydata, n.trees = n.trees, type = "response"))
[1] 0.7524109 2.8789957 0.7843470 4.1724821 0.4525449 0.2036923
predict(m1.gbm, newdata = head(mydata), n.trees = n.trees, type = "response")
[1] 2.2216079 1.2806235 0.9109426 2.2842149 2.4828922 0.6124778
...which exhibits the problem in my earlier post.
I find this quite surprising since gbm is a well-known package, although I see that the vignette was updated last month, so perhaps the latest version was only recently released. I was unable to find the exact date from here. What is the best way to proceed here?
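The comparison made by hand above (predictions on the full data versus predictions on head(mydata)) can be wrapped in a small check. This is a sketch only: `same_on_subset()` is a hypothetical helper, and an lm fit on the built-in mtcars data stands in for the gbm model so that the block runs without gbm installed.

```r
# same_on_subset() is a hypothetical helper: it compares predictions for the
# first n_rows rows taken from a full-data call with predictions made on only
# those rows. Extra arguments in ... are passed on to predict().
same_on_subset <- function(model, data, n_rows = 6, ...) {
  from_full <- head(predict(model, newdata = data, ...), n_rows)
  from_head <- predict(model, newdata = head(data, n_rows), ...)
  isTRUE(all.equal(unname(from_full), unname(from_head)))
}

# Stand-in model (lm on the built-in mtcars data) so the sketch runs
# without gbm; for the model in the question one would pass m1.gbm,
# mydata, n.trees = n.trees and type = "response".
fit <- lm(mpg ~ wt + hp, data = mtcars)
same_on_subset(fit, mtcars)
```

Running the same check under each installed gbm version would give a single TRUE/FALSE per version, which is easier to include in a bug report than two printed vectors.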

Different versions of R, lme4 and OS X give different fixed-effects significance results in glmer

I am running a logit mixed-effects model using glmer() in package lme4.
The experiment used a within-subjects within-items design with Subjects and Items as crossed random effects.
My problem: different versions of R and lme4 (run on different versions of OS X) produce different standard error estimates for the fixed effects and, consequently, different significance results.
Here is a subset of my data (data from the last two subjects):
structure(list(SubjN = c(87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L,
87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L,
87L, 87L, 87L, 88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L,
88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L, 88L,
88L), Items = structure(c(3L, 10L, 11L, 5L, 1L, 12L, 2L, 6L,
9L, 6L, 3L, 4L, 8L, 11L, 12L, 7L, 8L, 2L, 7L, 10L, 9L, 5L, 1L,
4L, 10L, 3L, 5L, 11L, 12L, 1L, 2L, 6L, 9L, 6L, 3L, 4L, 8L, 11L,
12L, 7L, 2L, 8L, 10L, 7L, 9L, 5L, 1L, 4L), .Label = c("a", "c",
"k", "f", "g", "i", "d", "l", "e", "j", "b", "h"), class = "factor"),
IV1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("N", "L", "P"
), class = "factor"), DV = c(0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1,
0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
IV1.h = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), contrasts = structure(c(-1,
0.5, 0.5, 0, -0.5, 0.5), .Dim = c(3L, 2L), .Dimnames = list(
c("N", "L", "P"), c("N_vs_L&P", "L_vs_P"))), .Label = c("N",
"L", "P"), class = "factor"), N_vs_LP = c(-1, -1, -1, -1,
-1, -1, -1, -1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, -1, -1, -1, -1, -1, -1,
-1, -1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5), L_vs_P = c(0, 0, 0, 0, 0,
0, 0, 0, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0, 0, 0, 0, 0, 0,
0, 0, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, 0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5)), .Names = c("SubjN",
"Items", "IV1", "DV", "IV1.h", "N_vs_LP", "L_vs_P"), row.names = c("3099",
"3100", "3101", "3102", "3103", "3104", "3119", "3120", "3107",
"3108", "3109", "3110", "3097", "3098", "3105", "3106", "3115",
"3116", "3117", "3118", "3111", "3112", "3113", "3114", "3147",
"3148", "3149", "3150", "3151", "3152", "3167", "3168", "3155",
"3156", "3157", "3158", "3145", "3146", "3153", "3154", "3163",
"3164", "3165", "3166", "3159", "3160", "3161", "3162"), class = "data.frame")
Each subject was tested on 24 trials across 3 different conditions (factor IV1, levels: N, L, P).
I recorded whether they produced a target linguistic structure (DV == 1) or not (DV == 0).
In the analysis, I only included those subjects who produced the target structure at least once.
Nonetheless, most of them produced the target structure on very few occasions. This is the proportion of DV == 1 produced by each subject in each condition:
library(plyr)
#dput(ddply(mydata, .(SubjN, IV1), summarise, l = length(DV), y = round(mean(DV),2)))
structure(list(SubjN = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 9L,
9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 13L, 13L,
13L, 14L, 14L, 14L, 15L, 15L, 15L, 16L, 16L, 16L, 17L, 17L, 17L,
18L, 18L, 18L, 19L, 19L, 19L, 20L, 20L, 20L, 21L, 21L, 21L, 22L,
22L, 22L, 23L, 23L, 23L, 24L, 24L, 24L, 25L, 25L, 25L, 26L, 26L,
26L, 27L, 27L, 27L, 28L, 28L, 28L, 29L, 29L, 29L, 30L, 30L, 30L,
31L, 31L, 31L, 32L, 32L, 32L, 33L, 33L, 33L, 34L, 34L, 34L, 35L,
35L, 35L, 36L, 36L, 36L, 37L, 37L, 37L, 38L, 38L, 38L, 39L, 39L,
39L, 40L, 40L, 40L, 41L, 41L, 41L, 42L, 42L, 42L, 43L, 43L, 43L,
44L, 44L, 44L, 45L, 45L, 45L, 46L, 46L, 46L, 47L, 47L, 47L, 48L,
48L, 48L, 49L, 49L, 49L, 50L, 50L, 50L, 51L, 51L, 51L, 52L, 52L,
52L, 53L, 53L, 53L, 54L, 54L, 54L, 55L, 55L, 55L, 56L, 56L, 56L,
57L, 57L, 57L, 58L, 58L, 58L, 59L, 59L, 59L, 60L, 60L, 60L, 61L,
61L, 61L, 62L, 62L, 62L, 63L, 63L, 63L, 64L, 64L, 64L, 65L, 65L,
65L, 66L, 66L, 66L, 67L, 67L, 67L, 68L, 68L, 68L, 69L, 69L, 69L,
70L, 70L, 70L, 71L, 71L, 71L, 72L, 72L, 72L, 73L, 73L, 73L, 74L,
74L, 74L, 75L, 75L, 75L, 76L, 76L, 76L, 77L, 77L, 77L, 78L, 78L,
78L, 79L, 79L, 79L, 80L, 80L, 80L, 81L, 81L, 81L, 82L, 82L, 82L,
83L, 83L, 83L, 84L, 84L, 84L, 85L, 85L, 85L, 86L, 86L, 86L, 87L,
87L, 87L, 88L, 88L, 88L), IV1 = structure(c(1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L), .Label = c("N", "L", "P"), class = "factor"), l = c(8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 7L, 8L, 7L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 7L, 8L, 8L, 8L, 8L, 8L, 8L,
7L, 8L, 6L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 7L, 8L, 8L, 7L, 7L, 8L, 7L, 8L,
8L, 7L, 8L, 8L, 7L, 8L, 8L, 7L, 8L, 8L, 7L, 8L, 8L, 7L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 6L, 8L, 4L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 7L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 7L,
8L, 8L, 7L, 8L, 8L, 7L, 8L, 8L, 7L, 8L, 8L, 7L, 8L, 8L, 7L, 8L,
8L, 7L, 8L, 8L, 7L, 8L, 8L, 7L, 8L, 8L, 7L, 8L, 7L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L), y = c(1, 0.88, 1, 0.5, 0.25, 0.62,
0, 0, 0.25, 0, 0.25, 0, 0.12, 0, 0, 0, 0.12, 0, 0, 0.12, 0.12,
0, 0, 0.12, 0.38, 0, 0.25, 0, 0.12, 0, 0.12, 0, 0.25, 0, 0, 0.12,
0.5, 0.25, 0.5, 0, 0, 0.12, 0, 0.25, 0.12, 0, 0, 0.12, 0, 0.12,
0, 0, 0.12, 0.12, 0.12, 0.62, 0, 0, 0.5, 0.25, 1, 0.88, 1, 0,
0, 0.12, 0, 0.12, 0.12, 0.12, 0.12, 0, 0.62, 0.62, 0.38, 0.5,
0.88, 0.12, 0.12, 0, 0, 0.12, 0.12, 0, 0, 0.12, 0, 0, 0.12, 0,
0, 0.12, 0, 0, 0.25, 0, 0, 0.14, 0, 0.5, 0.57, 0.29, 0, 0.12,
0, 0, 0.12, 0, 0.25, 0.5, 0.25, 0, 0.12, 0.12, 0.25, 0, 0.38,
0, 0, 0.12, 0, 0, 1, 0.25, 0.12, 0.25, 0, 0.12, 0.12, 0, 0, 0.12,
0, 0, 0.12, 0.12, 0, 0, 0.12, 0, 0.14, 0.14, 0.12, 0, 0.12, 0,
0, 0.12, 0.12, 0, 1, 0.88, 1, 0, 0.12, 0, 0.12, 0, 0, 0.12, 0,
0.12, 0, 0, 0.12, 0.12, 0.12, 0.12, 1, 1, 1, 0.12, 0, 0, 0.12,
0.38, 0, 0, 0.12, 0, 0, 0, 0.5, 0.5, 0, 0.25, 0, 0.12, 0.29,
0, 0, 0.38, 0, 0, 0.62, 0.5, 0, 0.12, 0, 0.12, 0.12, 0.25, 0.12,
0.25, 0.12, 0, 0.12, 0, 0, 0.12, 0, 0, 0.12, 0, 0.12, 0.12, 0,
0.12, 0.12, 0, 0, 0.12, 0.12, 0.12, 0, 0.38, 0.12, 0.57, 0, 0.12,
0, 0, 0.12, 0, 0, 0.12, 0, 0, 0.12, 0.14, 0.88, 0.88, 0.86, 0,
0, 0.14, 0, 0.12, 0.14, 0, 0.12, 0, 0, 0, 0.12, 0, 0, 0.12, 0.38,
0, 0, 0.5, 0.12, 0)), .Names = c("SubjN", "IV1", "l", "y"), row.names = c(NA,
-264L), class = "data.frame")
I ran the following model, with IV1 as a fixed effect using Helmert contrast coding
(first contrast: N vs. L & P; second contrast: L vs. P):
m1 <- glmer(DV ~ IV1.h + (1 + IV1.h|SubjN) + (1|Items) + (0 + N_vs_LP|Items) + (0 + L_vs_P|Items), family ='binomial', mydata)
The model does not allow correlations between the by-Items random effects: I entered the two contrasts as separate numeric slopes (N_vs_LP and L_vs_P) in their own random-effect terms. I did this because, when the correlations were estimated, the by-Items effects came out perfectly correlated, which I interpreted as a sign of over-parametrization.
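As a minimal sketch of how numeric contrast columns such as N_vs_LP and L_vs_P can be derived from a contrast-coded factor, the example below uses a small made-up data frame (toy is hypothetical and is not the poster's data). The contrast matrix is copied from the contrasts attribute visible in the dput() output above; whether the original columns were built this way is an assumption.

```r
# Toy factor with the same three levels as IV1 (illustrative data only).
toy <- data.frame(IV1 = factor(rep(c("N", "L", "P"), each = 2),
                               levels = c("N", "L", "P")))

# Contrast matrix matching the one stored on IV1.h in the dput() output.
helmert_like <- cbind("N_vs_L&P" = c(-1, 0.5, 0.5),
                      "L_vs_P"   = c(0, -0.5, 0.5))
rownames(helmert_like) <- levels(toy$IV1)

# Attach the contrasts to a copy of the factor.
toy$IV1.h <- toy$IV1
contrasts(toy$IV1.h) <- helmert_like

# model.matrix() expands the factor into an intercept column plus one
# numeric column per contrast; columns 2 and 3 are the two contrasts.
mm <- model.matrix(~ IV1.h, data = toy)
toy$N_vs_LP <- unname(mm[, 2])
toy$L_vs_P  <- unname(mm[, 3])

print(toy)
```

Putting each numeric column in its own term, as in (0 + N_vs_LP|Items) + (0 + L_vs_P|Items), is what keeps lme4 from estimating a correlation between the two by-Items slopes.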
1) Results using:
OS X 10.8.5 Mountain Lion
R version 3.0.2 (2013-09-25)
lme4_1.0-5
(the original analysis I ran)
Generalized linear mixed model fit by maximum likelihood ['glmerMod']
Family: binomial ( logit )
Formula: DV ~ IV1.h + (1 + N_vs_LP + L_vs_P | SubjN) + (1 | Items) + (0 + N_vs_LP | Items) + (0 + L_vs_P | Items)
Data: mydata
AIC BIC logLik deviance
1492.5408 1560.2050 -734.2704 1468.5408
Random effects:
Groups Name Variance Std.Dev. Corr
SubjN (Intercept) 2.3885505 1.54549
N_vs_LP 0.4394195 0.66289 -0.69
L_vs_P 1.9287559 1.38880 0.04 0.08
Items (Intercept) 0.0531518 0.23055
Items.1 N_vs_LP 0.0001950 0.01396
Items.2 L_vs_P 0.0003619 0.01902
Number of obs: 2077, groups: SubjN, 88; Items, 12
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.2998 0.1964 -11.710 < 2e-16 ***
IV1.hN_vs_L&P 0.3704 0.1378 2.689 0.00717 **
IV1.hL_vs_P 0.2060 0.2320 0.888 0.37459
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) IV1.N_
IV1.hN_vs_L&P -0.388
IV1.hL_vs_P 0.014 0.019
2) Results using:
OS X 10.9.4 Mavericks
R version 3.1.1 (2014-07-10)
lme4_1.1-7
optimizer 'bobyqa'
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: DV ~ IV1.h + (1 + N_vs_LP + L_vs_P | SubjN) + (1 | Items) + (0 +
N_vs_LP | Items) + (0 + L_vs_P | Items)
Data: mydata
Control: glmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
1492.5 1560.2 -734.3 1468.5 2065
Scaled residuals:
Min 1Q Median 3Q Max
-2.4174 -0.3364 -0.2595 -0.1706 4.6028
Random effects:
Groups Name Variance Std.Dev. Corr
SubjN (Intercept) 2.38791 1.5453
N_vs_LP 0.43935 0.6628 -0.69
L_vs_P 1.92629 1.3879 0.04 0.07
Items (Intercept) 0.05319 0.2306
Items.1 N_vs_LP 0.00000 0.0000
Items.2 L_vs_P 0.00000 0.0000
Number of obs: 2077, groups: SubjN, 88; Items, 12
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.2998 0.2095 -10.975 <2e-16 ***
IV1.hN_vs_L&P 0.3703 0.1892 1.958 0.0503 .
IV1.hL_vs_P 0.2063 0.2679 0.770 0.4413
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) IV1.N_
IV1.hN__L&P -0.379
IV1.hL_vs_P -0.001 0.003
I really don't know which outcome I should trust. Any help would be very much appreciated.
P.S. Sorry if something is not clear; it's my first post :)
Thanks very much!
From lme4's NEWS file, for version 1.1-4:
Standard errors of fixed effects are now computed from the approximate Hessian by default (see the use.hessian argument in vcov.merMod); this gives better (correct) answers when the estimates of the random- and fixed-effect parameters are correlated (Github #47)
The problem is described in the GitHub issue cited above (#47).
You should be able to retrieve the old standard errors from the newer (1.1-7) fit with sqrt(diag(vcov(fitted_model, use.hessian = FALSE))), but the new standard errors are more likely to be correct.
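To see the two sets of standard errors side by side, something like the following sketch may help. It fits a stand-in model to the cbpp data set that ships with lme4 (not the poster's data or model), so the numbers are illustrative only; fit, se_hessian, se_old and se_compare are names chosen for this example.

```r
library(lme4)

# Stand-in binomial GLMM on lme4's bundled cbpp data (not the poster's model).
fit <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)

# Default covariance matrix of the fixed effects (Hessian-based since 1.1-4).
se_hessian <- sqrt(diag(vcov(fit)))

# Covariance matrix from the older calculation, requested explicitly.
se_old <- sqrt(diag(vcov(fit, use.hessian = FALSE)))

# One row per fixed effect: estimate plus both standard errors.
se_compare <- cbind(estimate = fixef(fit), se_hessian, se_old)
print(se_compare)
```

Applied to the model in the question, the se_old column would be expected to sit close to the lme4 1.0-5 output and se_hessian close to the 1.1-7 output.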
For more reliable confidence intervals and p-values, you can run a likelihood ratio test (use anova to compare nested models) and/or compute profile confidence intervals with confint(fitted_model,which="beta_").
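A rough sketch of both suggestions, again on lme4's bundled cbpp data rather than the poster's data. The model names full and reduced are made up for this example. Note that current lme4 documents the fixed-effects selector for confint() as parm = "beta_", which is what the sketch uses.

```r
library(lme4)

# Full model and a nested model without the fixed effect of interest.
full <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
              data = cbpp, family = binomial)
reduced <- glmer(cbind(incidence, size - incidence) ~ 1 + (1 | herd),
                 data = cbpp, family = binomial)

# Likelihood ratio test: compares the two fits via a chi-squared statistic.
lrt <- anova(reduced, full)
print(lrt)

# Profile-likelihood confidence intervals for the fixed effects only.
ci_profile <- confint(full, parm = "beta_")
print(ci_profile)
```

Neither result relies on the Wald standard errors, so neither should be affected by the change in how lme4 computes them.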
