HSD.test row names error. How do I check row names? - r

I have a dataframe for which I did a two-way ANOVA.
dput(m3)
structure(list(Delta = c(-40, -40, -40, -40, -31.7, -29.3, -27.8,
-26.7, -26.2, -25.4, -24.7, -23.1, -23, -22.9, -22.4, -22.2,
-21.4, -21, -20.8, -15.1, -14.9, -14.1, -6.2, -6.2, -6, -5.3,
-4.9), Location = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 3L, 2L,
3L, 3L, 3L), .Label = c("int", "pen + int", "ter + pen"), class = "factor"),
Between = c(0L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 0L, 2L, 1L, 0L,
1L, 0L, 2L, 0L, 2L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L
), Relative = structure(c(5L, 6L, 6L, 7L, 8L, 3L, 3L, 4L,
5L, 4L, 3L, 5L, 3L, 5L, 7L, 5L, 4L, 6L, 3L, 3L, 6L, 2L, 1L,
2L, 1L, 1L, 1L), .Label = c("1&2", "2&3", "2&4", "2&5", "3&4",
"3&5", "3&6", "4&6"), class = "factor")), class = "data.frame", row.names = c(NA,
-27L))
library(agricolae)
aov.2sum=aov(Delta.~Location*X.between, data=m3)
I want to analyze the data using a HSD.test as I have for another dataframe using the same features.
I am following the code format in the package manual as below.
tx <- with(m3, interaction(Location, X.between))
amod <-aov(Delta~tx, data=m3)
test=HSD.test(amod, "tx", group=TRUE)
Then I receive the following error
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘int.0’, ‘pen + int.1’, ‘pen + int.2’, ‘te + int.0’, ‘te + int.1’
Upon further analysis I see that my duplicate row names error is related to my X.between feature. When I use the following code I get the same duplicate row names error:
HSD.test(amod, "X.between", group=TRUE)
>> Error in data.frame(row.names = means[, 1], means[, 2:6]) :
duplicate row.names: 0, 1, 2
How are row names chosen for the HSD.test?
Then how can I change my row names? Or just avoid this duplication error?
Thank you for all and any help.
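A minimal base-R sketch of how one might inspect row names and the group labels that interaction() produces. It uses a small hypothetical data frame (`toy`, loosely modelled on the columns in the dput above) and does not reproduce what HSD.test does internally; it only shows where to look for duplicated or unused labels.

```r
# Hypothetical toy data, not the asker's m3
toy <- data.frame(
  Delta    = c(-40, -31.7, -29.3, -6.2, -6.0, -5.3),
  Location = factor(c("int", "int", "pen + int", "pen + int",
                      "ter + pen", "ter + pen")),
  Between  = c(0L, 1L, 1L, 0L, 0L, 0L)
)

# Row names of the data frame itself; anyDuplicated() returns 0 if all are unique
rn <- rownames(toy)
dup_rows <- anyDuplicated(rn)

# Group labels built by interaction(): every Location x Between combination
# becomes a level, including combinations that never occur in the data
tx <- interaction(toy$Location, toy$Between)

# Drop the combinations that do not occur, then count observations per group
tx_used <- droplevels(tx)
counts <- table(tx_used)
print(counts)
```

Comparing `levels(tx)` with `levels(tx_used)` shows which combinations are empty, which is one thing worth ruling out before passing the factor to a post-hoc test.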

Related

How to create two y-axis labels with a grid of facets with a single x-axis label

I have been struggling with ggplot to display these plots how I would like. My data have 2 factors, quarter and species. Station will be on the x-axis, value on the y-axis, and the constituent will be used with the facet_wrap. I want quarter differentiated with shapes, and species with colors.
The issue is that I'm trying to replicate a figure made in SigmaPlot. It is a 4x4 grid of plots in which the first two rows of the first column are empty, to leave room for the legend. My original plan was to make two separate facets with facet_wrap and combine them; however, this doesn't maintain the 4x4 arrangement. It turns it into a 1x2 layout, which ruins the alignment of the plots and shrinks the larger faceted grid.
My next thought was to create each plot individually, then arrange them in a grid using cowplot. This presents the plots how I'd like them arranged, but I can't figure out how to have two y-axis labels, due to different units. One label would be centered on the two leftmost plots, and one centered on the left of the next column of 4 plots.
I'm trying to use this code (just copy the example data below, and run):
library(ggplot2)
library(gridExtra)
test.data1 <- test.data[1:95, ]
test.data2 <- test.data[96:111, ]
testplot1 <- ggplot(test.data1, aes(Station, value)) +
geom_point(aes(shape = factor(quarter), fill = Species)) +
scale_shape_manual(values = c(21, 22)) +
labs(x = "Station", y = "Unit a", shape = "Sampling Quarter", fill = "Species") +
theme(legend.position = "none", legend.title = element_blank()) +
guides(fill = guide_legend(override.aes = list(shape = 21), nrow = 2, byrow = TRUE), shape = guide_legend(nrow = 2, byrow = TRUE)) +
facet_wrap( ~ constituent, ncol = 3, scales = "free_y")
testplot2 <- ggplot(test.data2, aes(Station, value)) +
geom_point(aes(shape = factor(quarter), fill = Species)) +
scale_shape_manual(values = c(21, 22)) +
labs(x = "Station", y = "Unit b", shape = "Sampling Quarter", fill = "Species") +
theme(legend.position = "top", legend.title = element_blank()) +
guides(fill = guide_legend(override.aes = list(shape = 21), nrow = 2, byrow = TRUE), shape = guide_legend(nrow = 2, byrow = TRUE)) +
facet_wrap( ~ constituent, ncol = 1, scales = "free_y")
grid.arrange(testplot2, testplot1, ncol = 2)
Which generates this:
But I want it to be arranged like this, where the XX and YY plots from above are normalized in size with the other plots (this was done using individual plots, and using plot_grid):
Example data from a larger set:
test.data <- structure(list(Station = structure(c(1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("StA", "StB"), class = "factor"),
CollectionDate = structure(c(3L, 2L, 3L, 1L, 3L, 1L, 3L,
1L, 3L, 2L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 2L, 3L, 1L, 3L, 1L,
3L, 1L, 3L, 2L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 2L, 3L, 1L, 3L,
1L, 3L, 1L, 3L, 2L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 2L, 3L, 1L,
3L, 1L, 3L, 1L, 3L, 2L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 2L, 3L,
1L, 3L, 1L, 3L, 1L, 3L, 2L, 3L, 1L, 3L, 1L, 1L, 3L, 2L, 3L,
1L, 3L, 1L, 3L, 1L, 3L, 2L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 2L,
3L, 1L, 3L, 1L, 3L, 1L, 3L, 2L, 3L, 1L, 3L, 1L, 3L, 1L), .Label = c("10/1/2017",
"10/16/2017", "4/1/2017"), class = "factor"), Species = structure(c(1L,
2L, 2L, 3L, 1L, 2L, 2L, 3L, 1L, 2L, 2L, 3L, 1L, 2L, 2L, 3L,
1L, 2L, 2L, 3L, 1L, 2L, 2L, 3L, 1L, 2L, 2L, 3L, 1L, 2L, 2L,
3L, 1L, 2L, 2L, 3L, 1L, 2L, 2L, 3L, 1L, 2L, 2L, 3L, 1L, 2L,
2L, 3L, 1L, 2L, 2L, 3L, 1L, 2L, 2L, 3L, 1L, 2L, 2L, 3L, 1L,
2L, 2L, 3L, 1L, 2L, 2L, 3L, 1L, 2L, 2L, 3L, 1L, 2L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 2L, 3L, 1L, 2L, 2L, 3L, 1L, 2L, 2L, 3L,
1L, 2L, 2L, 3L, 1L, 2L, 2L, 3L, 1L, 2L, 2L, 3L, 1L, 2L, 2L,
3L, 1L, 2L, 2L, 3L), .Label = c("SpA", "SpB", "SpC"), class = "factor"),
quarter = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("2017 Q2",
"2017 Q4"), class = "factor"), constituent = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L,
13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L
), .Label = c("A", "B", "C", "D", "E", "F", "G", "H", "I",
"J", "K", "L", "XX", "YY"), class = "factor"), value = c(16,
35, 46, 23, 40, 19, 9, 50, 0.2, 1, 0.5698, 0.322, 1, 0.45,
0.322, 0.5, 16, 9, 6, 19, 14, 13, 16, 9, 0, 0.004, 0, 0.004,
1, 0.32, 1, 0.678, 0, 0.39, 0.23, 0, 0, 1.1, 0.5, 0.5, 9,
4.9, 7, 4.768, 9, 8.65, 4.768, 6.54, 195, 195, 46, 46, 124,
124, 218, 218, 2, 1, 1, 1, 1, 2, 1, 1, 0.1, 0.4, 0.22, 0.4,
0.22, 0.4, 0.22, 0.1, 0.99, 0.99, 1.2, 0.45, 0.765, 0.99,
0.99, 0.99, 0.99, 1.2, 4.3, 0.98, 0.99, 1.2, 1.2, 34, 34,
65, 98, 150, 34, 65, 65, 2, 0, 4, 1.3, 5, 3.3, 1.56, 1, 9,
0.36, 4, 4, 11, 2, 2.22, 11)), class = "data.frame", row.names = c(NA,
-111L))

How do I reduce this data frame by groups?

I have the following
t <- structure(list(name = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Alice", "Bob",
"Jane Doe", "John Doe"), class = "factor"), school = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Alice School",
"Bob School", "Someother School", "Someschool College"), class = "factor"),
group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("A", "B"), class = "factor"),
question = structure(c(2L, 4L, 6L, 8L, 1L, 3L, 5L, 7L, 2L,
4L, 6L, 8L, 1L, 3L, 5L, 7L, 2L, 4L, 6L, 8L, 1L, 3L, 5L, 7L,
2L, 4L, 6L, 8L, 1L, 3L, 5L, 7L), .Label = c("q1", "q2", "q3",
"q4", "q5", "q6", "q7", "q8"), class = "factor"), mark = c(0L,
0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L,
1L), subject = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("C", "M"), class = "factor")), .Names = c("name",
"school", "group", "question", "mark", "subject"), row.names = c(7L,
15L, 23L, 31L, 3L, 11L, 19L, 27L, 8L, 16L, 24L, 32L, 4L, 12L,
20L, 28L, 6L, 14L, 22L, 30L, 2L, 10L, 18L, 26L, 5L, 13L, 21L,
29L, 1L, 9L, 17L, 25L), class = "data.frame")
and I need to produce a data frame in which each student has one combined mark for each subject. The combination is simply a sum of the marks on each question. So, for example, Jane Doe will have 3 on subject C and 2 on subject M. I've been banging my head for long enough with Reduce and other approaches. I could possibly solve this in a very procedural way, but if I could do that with a one-liner (or close approximation), I'd be happier. I'm sure it can be done...
You said it in your question; you want to group_by student and subject and compute the sum
library(tidyverse)
t %>%
group_by(name, subject) %>%
summarise(score = sum(mark))
Here is a data.table solution:
library(data.table)
setDT(t)[, sum(mark), by = list(name, subject)]
And just for completeness, base R:
aggregate(mark ~ name + subject, data=t, sum)
This says "aggregate the response variable mark by the grouping variables name and subject, using sum as the aggregation function".
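To make the formula reading concrete, here is a small self-contained sketch with hypothetical data (`marks` is made up for illustration and is not the asker's `t`). It sums `mark` within each name/subject pair and then looks up one student's total.

```r
# Hypothetical marks: one row per question answered
marks <- data.frame(
  name    = c("Jane Doe", "Jane Doe", "Jane Doe", "Bob", "Bob", "Bob"),
  subject = c("C", "C", "M", "C", "M", "M"),
  mark    = c(1L, 1L, 0L, 0L, 1L, 1L)
)

# Left of ~ : the variable to aggregate; right of ~ : the grouping variables
totals <- aggregate(mark ~ name + subject, data = marks, FUN = sum)
print(totals)

# Look up a single combined mark
jane_c <- totals$mark[totals$name == "Jane Doe" & totals$subject == "C"]
```

The result has one row per name/subject combination that occurs in the data, so it can be filtered or merged like any other data frame.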

Print a-priori contrasts with type III sums of squares using Anova() in R

I am trying to print a-priori contrasts with type III sums of squares results. (Please don't speak about type I vs. type III. That's not the point of my question.) I can print the contrasts like I need using summary.aov(), however that uses type I SS. When I use the Anova() function from library(car) to get type III SS, it doesn't print the contrasts. I have also tried using drop1() with the lm() model, but this just prints the same results as Anova() (without the contrasts).
Please advise on a way to print the results of the contrasts with type III SS. An example follows.
Sample data:
DF <- structure(list(Code = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 9L, 9L,
9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L), .Label = c("A",
"B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L"), class =
"factor"), GzrTreat = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), contrasts = structure(c(1,
-2, 1, 1, 0, -1), .Dim = c(3L, 2L), .Dimnames = list(c("I",
"N", "R"), NULL)), .Label = c("I", "N", "R"), class = "factor"),
BugTreat = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label =
c("Immigration", "Initial", "None"), class = "factor"), TempTreat =
structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = c("Not Warm", "Warmed"), class =
"factor"), ShadeTreat = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L,
2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Light",
"Shaded"), class = "factor"), EpiChla = c(0.268482353, 0.423119608,
0.579507843, 0.738839216, 0.727856863, 0.523960784, 0.405801961,
0.335964706, 0.584441176, 0.557543137, 0.436456863, 0.563909804,
0.432398039, 0.344956863, 0.340309804, 0.992884314, 0.938390196,
0.663270588, 0.239833333, 0.62875098, 0.466011765, 0.536182353,
0.340309804, 0.721172549, 0.752082353, 0.269372549, 0.198180392,
1.298882353, 0.298354902, 0.913139216, 0.846129412, 0.922317647,
0.727033333, 1.187662745, 0.35622549, 0.073547059), log_EpiChla =
c(0.10328443, 0.153241402, 0.198521787, 0.240259426, 0.237507762,
0.182973791, 0.147924145, 0.125794985, 0.19987612, 0.192440084,
0.157292589, 0.194211702, 0.156063718, 0.128708355, 0.127205194,
0.299482089, 0.287441205, 0.220962908, 0.093363308, 0.21185469,
0.166137456, 0.186442772, 0.127205194, 0.235824411, 0.243554515,
0.103589102, 0.078522208, 0.361516746, 0.113393422, 0.281746574,
0.266262141, 0.283825153, 0.23730072, 0.339980371, 0.132331903,
0.030821087), MeanZGrowthAFDM_g = c(0.00665, 0.003966667, 0.004466667,
0.01705, 0.0139, 0.0129, 0.0081, 0.003833333, 0.00575, 0.011266667,
0.0103, 0.009, 0.0052, 0.00595, 0.0105, 0.0091, 0.00905, 0.0045, 0.0031,
0.006466667, 0.0053, 0.009766667, 0.0181, 0.00725, 0, 0.0012, 5e-04,
0.0076, 0.00615, 0.0814, NA, 0.0038, 0.00165, 0.0046, 0, 0.0015)),
.Names = c("Code", "GzrTreat", "BugTreat", "TempTreat", "ShadeTreat",
"EpiChla", "log_EpiChla", "MeanZGrowthAFDM_g"), class = "data.frame",
row.names = c(NA, -36L))
Code:
## a-priori contrasts
library(stats)
contrasts(DF$GzrTreat) <- cbind(c(1,-2,1), c(1,0,-1))
round(crossprod(contrasts(DF$GzrTreat)))
c_labels <- list(GzrTreat=list('presence'=1, 'immigration'=2))
## model
library(car)
EpiLM <- lm(log_EpiChla~TempTreat*GzrTreat*ShadeTreat, DF)
summary.aov(EpiLM, split=c_labels) ### MUST USE summary.aov(), to get
#contrast results, but sadly this uses Type I SS
Anova(EpiLM, split=c_labels, type="III") # Uses Type III SS, but NO
#CONTRASTS!!!!!
drop1(EpiLM, ~., test="F") # again, this does not print contrasts
# I need contrast results like from summary.aov(), AND Type III SS
# like from Anova()
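As a side note on the crossprod() line in the code above: it is a quick orthogonality check for the contrast matrix. A minimal sketch on a hypothetical three-level factor `g` (base R only, and it does not address the type III question itself) shows the arithmetic.

```r
# Hypothetical three-level factor with the same two contrasts as in the question
g <- factor(c("I", "N", "R"))
contrasts(g) <- cbind(c(1, -2, 1), c(1, 0, -1))

# crossprod(M) computes t(M) %*% M: the diagonal holds each contrast's sum of
# squares, the off-diagonal holds the dot product between the two contrasts
cp <- unname(crossprod(contrasts(g)))
print(cp)

# The contrasts are orthogonal when the off-diagonal entries are zero:
# 1*1 + (-2)*0 + 1*(-1) = 0
orthogonal <- all(cp[upper.tri(cp)] == 0)
```

Here the diagonal is 6 and 2 (the sums of squared coefficients) and the off-diagonal is 0, so the two contrasts are orthogonal.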

Combining new lines and italics in facet labels with ggplot2

I have a problem getting some words used in facet labels in italics. I use the following code to create new lines for the labels:
levels(length_subject$CONSTRUCTION) <-
c("THAT \n Extraposed", "THAT \n Post-predicate", "TO \n Extraposed \n for-subject", "TO \n Post-predicate \n for-subject", "THAT \n Extraposed \n that-omission", "THAT \n Post-predicate \n that-omission")
However, I want the words "that" and "for" to appear in italics. I've tried something like
"TO \n Extraposed \n (italics(for))-subject"
but it doesn't work.
This is what the plots look like:
produced with the following code:
ggplot( length_subject, aes( x = SUBJECT ) ) +
geom_histogram(binwidth=.6, colour="black", fill="grey") +
ylab("Frequency") +
xlab("Subject length") +
scale_x_discrete(breaks=c(2,4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30)) + #
facet_grid( SUBJECT_TYPE~CONSTRUCTION, scales="free_x", space="free") +
theme(strip.text.x = element_text(size = 8))
Here is a reduced variant of the data:
structure(list(ID = structure(1:86, .Label = c("A05_122_01",
"A05_253_01", "A05_277_07", "A05_400_01", "A05_99_01", "A06_1076_01",
"A06_1261_01", "A06_1283_01", "A06_1283_02", "A06_1317_01", "A06_1326_01",
"A06_1389_01", "A06_1390_01", "A06_1437_01", "A06_1441_02", "A06_1441_03",
"A06_1442_03", "A06_1456_01", "A06_1461_01", "A06_830_01", "A06_868_01",
"A06_884_01", "A06_884_03", "A0K_1057_02", "A0K_1144_07", "A0K_1177_01",
"A0K_1190_03", "A0K_1214_03", "A0K_1216_01", "A0K_950_02", "A0K_986_01",
"A1A_102_02", "A1A_163_01", "A1A_199_01", "A1A_45_01", "A1A_97_01",
"A1B_1008_02", "A1B_1013_01", "A1B_1028_02", "A1B_1042_01", "A1B_1064_01",
"A1B_1126_03", "A1B_1152_01", "A1B_1174_01", "A1B_1271_01", "A1B_997_01",
"A1J_487_01", "A1J_544_02", "A1J_555_03", "A1J_569_01", "A1J_601_01",
"A1N_422_04", "A1N_70_02", "A1S_191_01", "A1S_329_01", "A1S_330_01",
"A1S_465_04", "A1Y_248_01", "A1Y_278_02", "A1Y_292_01", "A1Y_466_01",
"A1Y_521_01", "A1Y_612_01", "A1Y_634_01", "A26_139_03", "A26_142_01",
"A26_148_01", "A26_289_01", "A26_345_02", "A26_439_01", "A26_441_02",
"A26_463_01", "A28_171_01", "A28_244_01", "A28_245_01", "A28_30_01",
"A28_341_01", "A28_42_01", "A28_494_03", "A2A_301_01", "A2A_396_01",
"A2A_599_01", "A2A_637_01", "A2A_676_01", "A2E_22_01", "A2E_25_03"
), class = "factor"), SUBJECT = c(3L, 2L, 6L, 2L, 2L, 1L, 1L,
1L, 1L, 2L, 4L, 1L, 4L, 2L, 3L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 7L, 1L, 3L, 2L, 2L, 1L, 6L, 7L, 4L, 1L, 5L, 4L, 2L, 9L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 5L, 3L, 4L, 1L, 1L, 1L, 1L, 5L,
2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 5L, 2L, 1L, 2L, 2L, 1L, 7L, 1L,
4L, 1L, 2L, 1L, 1L, 3L, 1L, 13L, 2L, 1L, 1L, 1L, 3L, 1L, 1L),
CONSTRUCTION = structure(c(1L, 3L, 1L, 1L, 1L, 4L, 4L, 1L,
1L, 5L, 5L, 1L, 1L, 5L, 1L, 3L, 5L, 1L, 5L, 4L, 3L, 3L, 1L,
5L, 3L, 5L, 1L, 1L, 2L, 3L, 1L, 1L, 3L, 1L, 1L, 1L, 3L, 1L,
4L, 3L, 1L, 3L, 1L, 1L, 1L, 1L, 4L, 2L, 4L, 1L, 1L, 3L, 2L,
5L, 1L, 1L, 1L, 3L, 1L, 1L, 4L, 4L, 3L, 1L, 2L, 3L, 3L, 1L,
3L, 1L, 1L, 1L, 6L, 1L, 1L, 2L, 4L, 4L, 3L, 5L, 3L, 3L, 3L,
3L, 5L, 1L), .Label = c("THAT_EXT", "THAT_EXT_NT", "THAT_POST",
"THAT_POST_NT", "TO_EXT_FOR", "TO_POST_FOR"), class = "factor"),
SUBJECT_TYPE = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L,
1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 3L, 1L, 1L,
2L, 3L, 1L, 2L, 2L, 3L, 1L, 3L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
1L, 1L, 1L, 2L, 2L, 3L, 2L, 2L, 2L, 3L, 1L, 1L, 2L, 1L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L,
1L, 3L, 3L), .Label = c("NP", "PRO", "PROPER"), class = "factor")), .Names = c("ID",
"SUBJECT", "CONSTRUCTION", "SUBJECT_TYPE"), class = "data.frame", row.names = c(NA,
-86L))
To get italics, you need the formatting described in plotmath (and then for that to be parsed as an expression). However, the plotmath syntax does not have a line break operation. You can get something similar with atop, though. With your given example, you can set the labels to
levels(length_subject$CONSTRUCTION) <-
c("atop(textstyle('THAT'),textstyle('Extraposed'))",
"atop(textstyle('THAT'),textstyle('Post-predicate'))",
"atop(atop(textstyle('TO'),textstyle('Extraposed')),italic('for')*textstyle('-subject'))",
"atop(atop(textstyle('TO'),textstyle('Post-predicate')),italic('for')*textstyle('-subject'))",
"atop(atop(textstyle('THAT'),textstyle('Extraposed')),italic('that')*textstyle('-omission'))",
"atop(atop(textstyle('THAT'),textstyle('Post-predicate')),italic('that')*textstyle('-omission'))")
and then adding labeller=label_parsed to the facet_grid call
ggplot( length_subject, aes( x = SUBJECT ) ) +
geom_histogram(binwidth=.6, colour="black", fill="grey") +
ylab("Frequency") +
xlab("Subject length") +
scale_x_discrete(breaks=c(2,4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30)) + #
facet_grid( SUBJECT_TYPE~CONSTRUCTION, scales="free_x", space="free",
labeller=label_parsed) +
theme(strip.text.x = element_text(size = 8))
gives
It's not perfect (the spacing between lines is not the same, and the disparity would only get worse the more lines there are), but that is the only way I've found to combine the two (newlines in plotmath expressions).
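Because label_parsed interprets each label as a plotmath expression, a quick way to sanity-check a label string before plotting is to see whether base R's parse() accepts it. The sketch below is only an illustration: the helper name `parses_ok` is made up, and the third label is a deliberately invalid example.

```r
labels <- c(
  "atop(textstyle('THAT'),textstyle('Extraposed'))",
  "atop(atop(textstyle('TO'),textstyle('Extraposed')),italic('for')*textstyle('-subject'))",
  "TO Extraposed"  # two bare words side by side: not valid R/plotmath syntax
)

# TRUE where the string parses as an R expression, FALSE where parse() errors
parses_ok <- vapply(labels, function(s) {
  !inherits(try(parse(text = s), silent = TRUE), "try-error")
}, logical(1), USE.NAMES = FALSE)
print(parses_ok)

# The outermost call of a valid label is atop(), which stacks its two arguments
outer_call <- as.character(parse(text = labels[1])[[1]][[1]])
```

A parsed label can also be previewed without ggplot2, for example with plot.new() followed by text(0.5, 0.5, parse(text = labels[2])) in base graphics.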
Edit (2016)
With the new facet labelling system, this solution does not work anymore. The trick of inheriting from element_blank to make a custom grob is now explicitly disabled. I guess the lesson is to accept that some things cannot be done in ggplot2, by design, and not waste too much energy with workarounds that may get broken at any time in the future.
Original answer
You could try to create a suitable custom element to place in the theme settings. The theme design does not make it very easy, unfortunately,
require(ggplot2)
require(gridExtra) # tableGrob
element_grob.element_custom <- function(element, label="", ...) {
mytheme <- ttheme_minimal(core = list(fg_params = list(parse=TRUE)))
disect <- strsplit(label, "\\n")[[1]]
g1 <- tableGrob(as.matrix(disect), theme=mytheme)
# wrapping into a gTree only because grobHeight.gtable would be too tight
# cf. absolute.units() squashing textGrobs
gTree(children=gList(g1), height=sum(g1$heights),
cl = "custom_strip")
}
# gTrees don't know their size and ggplot would squash it, so give it room
grobHeight.custom_strip = heightDetails.custom_axis = function(x, ...)
x$height
# silly wrapper to fool ggplot2's inheritance check...
facet_custom <- function(...){
structure(
list(...), # this ... information is not used, btw
class = c("element_custom","element_blank", "element") # inheritance test workaround
)
}
title <- c("First~line \n italic('wait, a second')",
"this~is~boring",
"integral(f(x)*dx, a, b)")
iris2 <- iris
iris2$Species <- factor(iris$Species, labels=title)
ggplot(iris2, aes(Sepal.Length, Sepal.Width)) +
geom_line() + facet_grid(.~Species) +
theme(strip.text.x = facet_custom())
As several of you were looking for how to fix the spacing, I have found a solution.
For a three-line label, add a line with atop(scriptscriptstyle(""), before the last line (making four lines in total), do the same for any following lines, and don't forget to add the closing ) afterwards.
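One possible reading of this spacing fix, sketched on the TO/Extraposed label from the earlier answer. This is an assumption about what the author meant, not a confirmed recipe: the last line is wrapped in an extra atop() whose first argument is an empty scriptscriptstyle string. The code only builds the strings and checks that they parse; the spacing itself has to be judged on the rendered plot.

```r
# Three-line label as given in the earlier answer
three_line <- paste0(
  "atop(atop(textstyle('TO'),textstyle('Extraposed')),",
  "italic('for')*textstyle('-subject'))"
)

# Assumed fix: pad the last line with an empty, very small line above it
padded <- paste0(
  "atop(atop(textstyle('TO'),textstyle('Extraposed')),",
  "atop(scriptscriptstyle(''),italic('for')*textstyle('-subject')))"
)

# Both strings should parse as single plotmath calls
exprs <- lapply(c(three_line, padded), function(s) parse(text = s)[[1]])
uses_padding <- "scriptscriptstyle" %in% all.names(exprs[[2]])
```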

increasing the legend items in ggplot2

I have this data frame:
head(x)
Date Company Region Units
1 1/1/2012 Gateway America 0
2 1/1/2012 Gateway Europe 0
3 1/1/2012 Gateway America 0
4 1/1/2012 Gateway Americas 0
5 1/1/2012 Gateway Europe 0
6 1/1/2012 Gateway Pacific 0
x
dput(x)
structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1/1/2012",
"1/12/2012", "1/2/2012"), class = "factor"), Company = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("Gateway", "HP", "IBM"), class = "factor"),
Region = structure(c(1L, 3L, 1L, 2L, 3L, 4L, 2L, 1L, 3L,
1L, 2L, 3L, 4L, 2L, 1L, 3L, 1L, 2L, 3L, 4L, 2L, 1L, 3L, 1L,
2L, 3L, 4L, 2L, 1L, 3L, 1L, 2L, 3L, 4L, 2L, 1L, 3L, 1L, 2L,
3L, 4L, 2L, 1L, 3L, 1L, 2L, 3L, 4L, 2L, 1L, 3L, 1L, 2L, 3L,
4L, 2L, 1L, 3L, 1L, 2L, 3L, 4L, 2L, 1L, 3L, 1L, 2L, 3L, 4L,
2L, 1L, 3L, 1L, 2L, 3L, 4L, 2L, 1L, 3L, 1L, 2L, 3L, 4L, 2L
), .Label = c("America", "Americas", "Europe", "Pacific"), class = "factor"),
Units = c(1L, 3L, 1L, 6L, 20L, 2L, 2L, 10L, 2L, 1L, 2L, 4L,
6L, 30L, 2L, 15L, 10L, 3L, 4L, 7L, 9L, 12L, 34L, 50L, 3L,
2L, 4L, 3L, 1L, 3L, 3L, 1L, 4L, 0L, 1L, 0L, 0L, 1L, 0L, 4L,
0L, 0L, 0L, 0L, 5L, 0L, 8L, 0L, 0L, 0L, 0L, 0L, 9L, 0L, 56L,
10L, 0L, 0L, 5L, 7L, 0L, 0L, 8L, 0L, 2L, 0L, 4L, 0L, 5L,
7L, 0L, 0L, 8L, 10L, 0L, 6L, 0L, 4L, 4L, 0L, 2L, 0L, 5L,
0L)), .Names = c("Date", "Company", "Region", "Units"), class = "data.frame", row.names = c(NA,
-84L))
I would like to create a heat map:
ggplot(x, aes(Date, Company, fill=Units)) + geom_tile(aes(fill=Units)) + facet_grid(~Region) + scale_fill_gradient(low="white", high="red")
This command works, but I need to be able to use colors other than white and red and to increase the scale on the legend. Right now, by default, there are 5 legend entries. I would like to increase that to 10. 0 would be white, and the others should be distinctly different from white so that users will notice them.
How would I increase the number of legend values using ggplot and assign different color to each legend?
I find it very informative to use quantiles to plot heatmaps as done here in this blog. This helps to generate skewed color sets (as shown in the blog). Suppose that the data is like yours (quite high amount of 0's), then by calculating appropriate quantiles, we could create a skewed color-map, which with appropriate labels, would be visually excellent and informative. I've modified the code from the blog plot already linked for this problem and added a bit more explanation. The blog post must get all the credit for the idea and implementation.
Before going into the code, we'll have to do some analysis with quantiles of your data to see which quantiles to use. By doing:
quantile(x$Units, seq(0, 1, length.out = 25))
# 0% 4.166667% 8.333333% 12.5% 16.66667% 20.83333% 25% 29.16667% 33.33333%
# 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
# 37.5% 41.66667% 45.83333% 50% 54.16667% 58.33333% 62.5% 66.66667% 70.83333%
# 1.00000 1.00000 2.00000 2.00000 3.00000 3.00000 4.00000 4.00000 5.00000
# 75% 79.16667% 83.33333% 87.5% 91.66667% 95.83333% 100%
# 6.00000 7.00000 8.00000 9.62500 10.16667 25.41667 56.00000
You see that the 0% quantile corresponds to Units = 0 in your data, and the quantiles stay at 0 up to 33% (33.33% to be precise). So maybe we choose 38% as the next quantile, then, say, 60%, 75%, 90%, and finally finish with 100%. Now we have a set of levels placed at points that make sense for your data.
We'll need the zoo package to accomplish this. Let's construct the data now:
require(zoo) # for rollapply
# the quantiles we just decided to categorise the data into classes.
qtiles <- quantile(x$Units, probs = c(0, 38, 60, 75, 90, 100)/100)
# a color palette
c_pal <- colorRampPalette(c("#3794bf", "#FFFFFF",
"#df8640"))(length(qtiles)-1)
# since we are using quantile classes for fill levels,
# we'll have to generate the appropriate labels
labels <- rollapply(round(qtiles, 2), width = 2, by = 1,
FUN = function(i) paste(i, collapse = " : "))
# added the quantile interval in which the data falls,
# which will be used for fill
x$q.units <- findInterval(x$Units, qtiles, all.inside = TRUE)
# Now plot
library(ggplot2)
p <- ggplot(data = x, aes(x = Date, y = Company, fill = factor(q.units)))
p <- p + geom_tile(color = "black")
p <- p + scale_fill_manual(values = c_pal, name = "", labels = labels)
p <- p + facet_grid( ~ Region)
p <- p + theme(axis.text.x = element_text(angle = 90, hjust = 1))
p
You get this:
Hope this helps.
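To see what the binning step does in isolation, here is a small base-R sketch with made-up numbers (`units` and `breaks` are hypothetical, not the quantiles of the data above). It also builds the interval labels with head()/tail() instead of zoo's rollapply, which is a swapped-in alternative rather than the method used in the code above.

```r
# Hypothetical values and class breaks
units  <- c(0, 0, 1, 3, 6, 10, 56)
breaks <- c(0, 1, 4, 8, 10, 56)

# Index of the interval [breaks[i], breaks[i + 1]) each value falls into;
# all.inside = TRUE keeps the maximum (56) in the last interval instead of
# returning an index one past the end
bin <- findInterval(units, breaks, all.inside = TRUE)

# One label per interval: "lower : upper"
bin_labels <- paste(head(breaks, -1), tail(breaks, -1), sep = " : ")

print(data.frame(units, bin, label = bin_labels[bin]))
```

With these numbers no value lands in the fourth interval (8 to 10). A class that ends up empty like this is worth checking for before assigning colors, since a manual fill scale's values and labels may not line up with the classes that actually appear.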
Edit: You can also visit colorbrewer2.org to get nice palettes and set the colors yourself. For example:
# try out these colors:
c_pal <- c("#EDF8FB", "#B3CDE3", "#8C96C6", "#8856A7", "#810F7C")
c_pal <- c("#FFFFB2", "#FECC5C", "#FD8D3C", "#F03B20", "#BD0026")
Also, try setting alpha in the code: geom_tile(color = "black", alpha = 0.5).

Resources