I'm trying to use facet_grid_sc to set the y-axis limits per panel, but with the panels arranged column-wise instead of row-wise. I have the following data frame:
test2 <- structure(list(stream = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("Feed", "Cells 1-4",
"Cells 5-8", "Cells 9-12", "Totalconcentrate", "Tailings"), class = "factor"),
mineral = c("Calcite", "Calcite", "Calcite", "Calcite", "Scheelite",
"Scheelite", "Scheelite", "Scheelite", "Calcite", "Calcite",
"Calcite", "Calcite", "Scheelite", "Scheelite", "Scheelite",
"Scheelite", "Calcite", "Calcite", "Calcite", "Calcite",
"Scheelite", "Scheelite", "Scheelite", "Scheelite", "Calcite",
"Calcite", "Calcite", "Calcite", "Scheelite", "Scheelite",
"Scheelite", "Scheelite"), shapefactor = structure(c(3L,
1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L,
3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L,
4L), .Label = c("Angularity", "Circularity", "Formfactor",
"Roundness"), class = "factor"), mean = c(0.191074267258554,
1.57871188864644, 4.98640988695014, 0.709748496492633, 0.255307602333514,
1.41318627525434, 4.48236746482907, 0.787906844284224, 0.2370993275776,
1.59011418196729, 5.00866589220356, 0.708099932389451, 0.379279621962832,
1.41798512797767, 4.49174029724501, 0.803054249581329, 0.188107140488459,
1.58446664800185, 4.99394785197469, 0.720664938740251, 0.261663000285933,
1.33457686608134, 4.2649277507168, 0.809433325901688, 0.204386468447994,
1.55129002878455, 4.88754754288822, 0.761051008277419, 0.432222746956355,
1.22012862228623, 3.87276933395819, 0.861599941934953)), .Names = c("stream",
"mineral", "shapefactor", "mean"), row.names = c(73L, 74L, 75L,
76L, 93L, 94L, 95L, 96L, 125L, 126L, 127L, 128L, 145L, 146L,
147L, 148L, 177L, 178L, 179L, 180L, 197L, 198L, 199L, 200L, 281L,
282L, 283L, 284L, 301L, 302L, 303L, 304L), class = "data.frame")
I plot it using the following code:
scales_y <- list(
"Angularity" = scale_y_continuous(limits = c(0.5,2)),
"Circularity" = scale_y_continuous(limits = c(2,5.5)),
"Formfactor" = scale_y_continuous(limits = c(0,0.5)),
"Roundness" = scale_y_continuous(limits = c(0.6,0.9))
)
g <- ggplot(test2, aes(x=stream, y=mean, color=mineral, group=mineral))
g <- g + geom_point()
g <- g + geom_line()
g <- g + theme_bw()
g <- g + theme(axis.text.x = element_text(size =8),
axis.ticks.x=element_blank(),
legend.position="bottom")
g <- g + scale_color_brewer(palette = "Paired")
g <- g + facet_grid_sc(rows = vars(shapefactor), scales = list(y = scales_y))
print(g)
This works fine. However, if I want to plot the shapefactor in columns instead of rows (so writing facet_grid_sc(cols = vars(shapefactor), scales = list(y = scales_y))), then I get this error message:
Error in .subset2(x, i, exact = exact) : attempt to select less
than one element in get1index
I'm probably writing this wrong, but I can't find in the help of the package how to write it properly. Can anyone help me please?
Thanks in advance!
Nath
I did not get your fancy facet_grid_sc to work, but here is an alternative, slightly hacky solution using cowplot:
library(tidyverse)
library(cowplot)
# split, not group by for the labels
out <- test2 %>% split(.,.$shapefactor) %>%
map( ~ggplot(.,aes(x=stream, y=mean, color=mineral, group=mineral)) +
geom_point() +
geom_line() +
theme_bw() +
theme(axis.text.x = element_text(size =8),
axis.ticks.x=element_blank(),
legend.position='none') +
scale_color_brewer(palette = "Paired") +
scales_y[[as.character(.$shapefactor[1])]])
# create Dummy for legend
dummy <- ggplot(test2,aes(x=stream, y=mean, color=mineral, group=mineral)) +
geom_point() +
geom_line()+
scale_color_brewer(palette = "Paired") +
theme(legend.position = 'bottom',legend.justification = 'center')
# add legend to list
out$' ' <- cowplot::get_legend(dummy)
cowplot::plot_grid(plotlist = out, ncol = 1,labels = names(out),axis = 'r', align = 'h')
It obviously needs a bit of formatting, but you get the gist.
plot_grid offers a lot of customizability for its labels; the legend has to be changed via the dummy plot.
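A note on the per-panel scale lookup inside the map() call, as a minimal sketch: R indexes a list by a factor's integer codes, not by its labels. In test2 the levels of shapefactor happen to be in the same order as the entries of scales_y, so both give the same result, but converting with as.character() makes the lookup go by name. The list `limits` and the factor `f` below are made up for illustration only.

```r
# Hypothetical lookup table, ordered differently from the factor levels below.
limits <- list(Angularity = c(0.5, 2), Roundness = c(0.6, 0.9))

# A factor whose first level is "Roundness", so "Angularity" has integer code 2.
f <- factor("Angularity", levels = c("Roundness", "Angularity"))

by_code <- limits[[f]]                # indexes by code 2: the Roundness entry
by_name <- limits[[as.character(f)]]  # indexes by label: the Angularity entry

print(by_code)
print(by_name)
```

If the list and the factor levels ever get out of step, the by-name lookup is the one that still picks the intended scale.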
I have a data.frame (see below) and I would like to build a scatterplot where the colour of the dots is based on a factor column (replicate). I also want to add a line that represents the mean of y for each x. The problem is that when I define the stat_summary, it uses the colours I requested for grouping, and hence I get three mean lines (one for each colour) instead of one. Trying to redefine the groups in either the ggplot() or the stat_summary() call did not work.
If I disable colours, I get what I want (a single mean line).
How do I have colors (plot # 1), yet still have a single mean line (plot # 2)?
structure(list(conc = c(10L, 10L, 10L, 25L, 25L, 25L, 50L, 50L,
50L, 75L, 75L, 75L, 100L, 100L, 100L, 200L, 200L, 200L, 300L,
300L, 300L, 400L, 400L, 400L, 500L, 500L, 500L, 750L, 750L, 750L,
1000L, 1000L, 1000L), citric_acid = c(484009.63, 409245.09, 303193.26,
426427.47, 332657.35, 330875.96, 447093.71, 344837.39, 302873.98,
435321.69, 359146.09, 341760.28, 378298.37, 342970.87, 323146.92,
362396.98, 361246.41, 290638.14, 417357.82, 351927.66, 323611.37,
416280.3, 359430.65, 327950.99, 431167.14, 361429.91, 291901.43,
340166.41, 353640.91, 341839.08, 393392.69, 311375.19, 342103.54
), MICIT = c(20771.28, 18041.97, 12924.35, 49814.13, 38683.32,
38384.72, 106812.16, 82143.12, 72342.43, 156535.39, 128672.12,
119397.14, 187208.46, 167814.92, 159418.62, 350813.47, 357227.48,
295948.31, 505553.77, 523282.46, 489652.3, 803544.84, 704431.61,
654753.29, 1030485.41, 895451.64, 717698.52, 1246839.19, 1309712.63,
1212111.53, 1930503.38, 1499838.89, 1642091.64), replicate = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L
), .Label = c("1", "2", "3"), class = "factor"), MICITNorm = c(0.0429150139016862,
0.0440859779160698, 0.0426274317575529, 0.116817357005636, 0.116285781751102,
0.116009395182412, 0.238903293897827, 0.238208275500519, 0.238853235263062,
0.359585551549246, 0.358272367659634, 0.34935932285636, 0.494869856298879,
0.489297881187402, 0.493331701877276, 0.968036405822146, 0.98887482369721,
1.01827072661558, 1.21131974956166, 1.48690347328766, 1.51308744189056,
1.93029754230503, 1.95985403582026, 1.99649737297637, 2.38999059622215,
2.47752500616233, 2.45870162403795, 3.6653801002868, 3.70350995307641,
3.54585417793659, 4.90731889298706, 4.81682207885606, 4.79998435561351
)), class = "data.frame", row.names = c(NA, -33L))
ggplot(xx, aes (conc, MICIT, colour = replicate)) + geom_point () +
stat_summary(geom = "line", fun = mean)
Use aes(group = 1):
ggplot(xx, aes(conc, MICIT, colour = replicate)) +
geom_point() +
geom_line() +
stat_summary(aes(group = 1), geom = "line", fun = mean)
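What aes(group = 1) asks for is one summary value per x, pooled over all replicates. As a rough sanity check outside ggplot2, the same means can be computed by hand. The sketch below uses only the first two concentrations from the question's data; `xx_small` and `mean_line` are names chosen for this illustration.

```r
# First two concentrations from the question's data (three replicates each).
xx_small <- data.frame(
  conc      = c(10, 10, 10, 25, 25, 25),
  MICIT     = c(20771.28, 18041.97, 12924.35, 49814.13, 38683.32, 38384.72),
  replicate = factor(c(1, 2, 3, 1, 2, 3))
)

# One mean per conc, ignoring replicate: the points a single mean line connects.
mean_line <- aggregate(MICIT ~ conc, data = xx_small, FUN = mean)
print(mean_line)
```

With colour mapped to replicate and no group override, the same summary is instead computed once per replicate, which is why three lines appear.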
This seems like a common plot, but I have not been able to figure it out. Can I produce a plot that shows the percentage of occurrences of each Blocked.Lanes value within each Durations category? My data is:
data<- structure(list(Lanes.Cleared.Duration = c(48, 55, 20, 38, 22,
32, 52, 21, 39, 14, 69, 13, 14, 13, 25), Blocked.Lanes = c(1L,
2L, 1L, 2L, 5L, 3L, 3L, 1L, 3L, 2L, 2L, 2L, 2L, 3L, 1L), Durations = structure(c(3L,
3L, 2L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 4L, 2L, 2L, 2L, 2L), .Label = c("<10",
"<30", "<60", "<90", "<120", ">120"), class = "factor")), row.names = c(NA,
-15L), na.action = structure(c(`17` = 17L, `26` = 26L, `28` = 28L,
`103` = 103L, `146` = 146L, `166` = 166L, `199` = 199L, `327` = 327L,
`368` = 368L, `381` = 381L, `431` = 431L, `454` = 454L, `462` = 462L,
`532` = 532L, `554` = 554L, `703` = 703L, `729` = 729L, `768` = 768L,
`769` = 769L, `785` = 785L, `970` = 970L, `1043` = 1043L, `1047` = 1047L,
`1048` = 1048L, `1081` = 1081L, `1125` = 1125L), class = "omit"), class = "data.frame")
I tried the following code, but it plots the actual durations rather than percentages. Here is my code:
data %>%
ggplot(aes(fill=factor(Blocked.Lanes), y=Lanes.Cleared.Duration, x=Durations)) +
geom_bar(position="dodge", stat="identity")
My result should show the percentage of occurrence of each Blocked lane inside each Duration.
I tried to group by Durations but it did not work.
Not quite elegant, but you can do a tally by duration and blocked lane first, and then do a percentage with grouped duration.
df1 <- data %>% group_by(Durations, Blocked.Lanes) %>% tally()
df1 <- df1 %>% ungroup %>% group_by(Durations) %>% mutate(perc = n/sum(n))
ggplot(df1, aes(fill=factor(Blocked.Lanes), y=perc, x=Durations)) +
geom_bar(position="dodge", stat="identity")
You can do:
library(tidyverse)
data %>%
count(Durations, Blocked.Lanes) %>%
group_by(Durations) %>%
mutate(n = prop.table(n) * 100) %>%
ggplot(aes(fill = factor(Blocked.Lanes), y = n, x = Durations)) +
geom_bar(position = "dodge", stat = "identity") +
ylab("Percentage of Blocked Lane") +
guides(fill = guide_legend(title = "Blocked Lane"))
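To check the percentages independently of the plot, the same within-Durations proportions can be computed in base R with table() and prop.table(). This is a minimal sketch using the Durations and Blocked.Lanes columns from the question; the unused factor levels are dropped here (an assumption made for this example) so that empty rows do not produce NaN.

```r
# Durations and Blocked.Lanes from the question, keeping only the levels in use.
durations <- factor(
  c("<60", "<60", "<30", "<60", "<30", "<60", "<60", "<30",
    "<60", "<30", "<90", "<30", "<30", "<30", "<30"),
  levels = c("<30", "<60", "<90")
)
blocked <- c(1, 2, 1, 2, 5, 3, 3, 1, 3, 2, 2, 2, 2, 3, 1)

# Count each combination, then convert each Durations row to percentages.
counts <- table(Durations = durations, Blocked.Lanes = blocked)
pct <- prop.table(counts, margin = 1) * 100
print(round(pct, 1))
```

Each row sums to 100, which is the property the grouped mutate() step in the answers above is meant to guarantee.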
I want to add the p-values to each panel of a faceted ggplot. If the p-value is larger than 0.05, I want to display it as it is. If the p-value is smaller than 0.05, I want to display it in scientific notation (e.g., 0.0032 -> 3.20e-3; 0.0000425 -> 4.25e-5).
The code I wrote to do this is:
p1 <- ggplot(data = CD3, aes(location, value, color = factor(location),
fill = factor(location))) +
theme_bw(base_rect_size = 1) +
geom_boxplot(alpha = 0.3, size = 1.5, show.legend = FALSE) +
geom_jitter(width = 0.2, size = 2, show.legend = FALSE) +
scale_color_manual(values=c("#4cdee6", "#e47267", "#13ec87")) +
scale_fill_manual(values=c("#4cdee6", "#e47267", "#13ec87")) +
ylab(expression(paste("Density of clusters, ", mm^{-2}))) +
xlab(NULL) +
stat_compare_means(comparisons = list(c("CT", 'N'), c("IF","N")),
aes(label = ifelse(..p.format.. < 0.05, formatC(..p.format.., format = "e", digits = 2),
..p.format..)),
method = 'wilcox.test', show.legend = FALSE, size = 10) +
#ylab(expression(paste('Density, /', mm^2, )))+
theme(axis.text = element_text(size = 10),
axis.title = element_text(size = 20),
legend.text = element_text(size = 38),
legend.title = element_text(size = 40),
strip.background = element_rect(colour="black", fill="white", size = 2),
strip.text = element_text(margin = margin(10, 10, 10, 10), size = 40),
panel.grid = element_line(size = 1.5))
plot(p1)
This code runs without error; however, the format of the numbers isn't changed. What am I doing wrong?
I attached the data to reproduce the plot: download data here
EDIT
structure(list(value = c(0.931966449207829, 3.24210526315789,
3.88811650210901, 0.626860993574675, 4.62085308056872, 0.477508650519031,
0.111900110501359, 3.2495164410058, 4.06626506024096, 0.21684918139434,
1.10365086026018, 4.66666666666667, 0.174109967855698, 0.597625869832174,
2.3758865248227, 0.360751947840548, 1.00441501103753, 3.65168539325843
), Criteria = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Density", "Density of cluster",
"nodular count", "Elongated count"), class = "factor"), Case = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L,
6L), .Label = c("Case 1A", "Case 1B", "Case 2", "Case 3", "Case 4",
"Case 5"), class = "factor"), Mark = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("CD3",
"CD4", "CD8", "CD20", "FoxP3"), class = "factor"), location = structure(c(3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L), .Label = c("CT", "IF", "N"), class = "factor")), row.names = c(91L,
92L, 93L, 106L, 107L, 108L, 121L, 122L, 123L, 136L, 137L, 138L,
151L, 152L, 153L, 166L, 167L, 168L), class = "data.frame")
I think your issue comes from stat_compare_means and the use of the comparisons argument.
I'm not totally sure, but my guess is that the p-value output of stat_compare_means differs from that of compare_means, so you can't use it in the aes for the label.
To illustrate with your example, you can modify the display of the p-value like this:
library(ggplot2)
library(ggpubr)
ggplot(df, aes(x = location, y = value, color = location))+
geom_boxplot()+
stat_compare_means(ref.group = "N", aes(label = ifelse(p < 0.05,sprintf("p = %2.1e", as.numeric(..p.format..)), ..p.format..)))
You get the correct display of the p-value, but you lose the comparison bars. If you use the comparisons argument instead, you get:
library(ggplot2)
library(ggpubr)
ggplot(df, aes(x = location, y = value, color = location))+
geom_boxplot()+
stat_compare_means(comparisons = list(c("CT","N"), c("IF","N")), aes(label = ifelse(p < 0.05,sprintf("p = %2.1e", as.numeric(..p.format..)), ..p.format..)))
So now you get the bars, but not the correct p-value format.
To circumvent this issue, you can perform the statistics outside of ggplot2 using the compare_means function and use the ggsignif package to draw the labels in the format you want.
Here, I'm using dplyr and its mutate function to create the new columns, but you can do this just as easily in base R.
library(dplyr)
library(magrittr)
c <- compare_means(value~location, data = df, ref.group = "N")
c %<>% mutate(y_pos = c(5,5.5), labels = ifelse(p < 0.05, sprintf("%2.1e",p),p))
# A tibble: 2 x 10
.y. group1 group2 p p.adj p.format p.signif method y_pos labels
<chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr>
1 value N CT 0.00866 0.017 0.0087 ** Wilcoxon 5 8.7e-03
2 value N IF 0.00866 0.017 0.0087 ** Wilcoxon 5.5 8.7e-03
Then, you can plot it:
library(ggplot2)
library(ggpubr)
library(ggsignif)
ggplot(df, aes(x = location, y = value))+
geom_boxplot(aes(colour = location))+
ylim(0,6)+
geom_signif(data = as.data.frame(c), aes(xmin=group1, xmax=group2, annotations=labels, y_position=y_pos),
manual = TRUE)
Does this look like what you are trying to plot?
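The labelling rule used in the mutate() step can be pulled out into a small helper. This is only a sketch: `format_p` is a hypothetical name, and it works on the numeric p column, since the tibble printed above shows p.format as a character column. Note that R prints at least two exponent digits, so the result reads 3.20e-03 rather than 3.20e-3.

```r
# Hypothetical helper: scientific notation below 0.05, plain value otherwise.
format_p <- function(p) {
  ifelse(p < 0.05,
         formatC(p, format = "e", digits = 2),  # e.g. 0.0032 -> "3.20e-03"
         as.character(signif(p, 2)))            # keep two significant digits
}

print(format_p(c(0.0032, 0.0000425, 0.12)))
```

The helper is vectorised, so it can be applied directly to the p column returned by compare_means before passing the labels to geom_signif.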
So I had this script working yesterday on a different data set, and it actually worked once on this data set, but when I tried to combine it with another figure using plot_grid, I got this error:
Error:
T_SHOW_BACKTRACE environmental variable.
Error in grid.Call(L_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
polygon edge not found
Now when I try to construct the boxplot itself, I get the same error...
Here is my data:
dput(SUICMass)
structure(list(ChillTime = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), .Label = c("2", "4", "6", "24",
"27", "29", "31"), class = "factor"), Mass = c(1.2687, 1.5417,
1.6898, 1.7655, 2.413, 2.0333, 2.0824, 1.2676, 1.4916, 2.1585,
2.2453, 1.3624, 1.2951, 2.4209, 2.0804, 1.9227, 1.9032, 2.1063,
1.7601, 1.9905, 1.9837, 1.6312, 1.8567, 1.4433, 1.9369, 2.1029,
2.0265, 1.3212, 1.2971, 1.5823, 1.4759, 1.2745, 0.714, 1.5693,
1.7906, 1.8607, 1.8851, 1.9192, 1.6307, 1.4269, 1.7011, 0.8249,
1.7198, 1.3939, 1.394, 2.1527, 1.288, 1.4724, 1.5264, 1.6562,
1.5796, 1.4982, 1.2794, 1.6021, 0.6345, 2.4041, 2.0246, 1.8398,
1.349, 2.0156, 1.1563, 2.0462)), .Names = c("ChillTime", "Mass"
), row.names = c(NA, -62L), class = "data.frame")
Here is my code:
library(ggplot2)
library(multcompView)
library(plyr)
library(gridExtra)
library(cowplot)
## Box plot for Susans WMA population
SUICMass <- read.csv('SUICMass_Test_June_28_2017.csv', header = TRUE)
SUICMass$ChillTime <- factor(SUICMass$ChillTime, levels=c("2", "4", "6", "24", "27", "29", "31"))
generate_label_df <- function(SUICMassTUKEY, variable){
# Extract labels and factor levels from Tukey post-hoc
Tukey.levels <- SUICMassTUKEY[[variable]][,4]
Tukey.labels <- data.frame(multcompLetters(Tukey.levels)['Letters'])
#I need to put the labels in the same order as in the boxplot :
Tukey.labels$treatment=rownames(Tukey.labels)
Tukey.labels=Tukey.labels[order(Tukey.labels$treatment) , ]
return(Tukey.labels)
}
SUICMassmodel=lm(SUICMass$Mass~SUICMass$ChillTime )
SUICMassANOVA=aov(SUICMassmodel)
# Tukey test to study each pair of treatment :
SUICMassTUKEY <- TukeyHSD(x=SUICMassANOVA, 'SUICMass$ChillTime', conf.level=0.95)
labels<-generate_label_df(SUICMassTUKEY , "SUICMass$ChillTime")#generate labels using function
names(labels)<-c('Letters','ChillTime')#rename columns for merging
SUICMassyvalue<-aggregate(.~ChillTime, data=SUICMass, max)# obtain letter position for y axis using the per-group maximum
SUICMassfinal<-merge(labels,SUICMassyvalue) #merge dataframes
SUICMassPlot <- ggplot(SUICMass, aes(x = ChillTime, y = Mass)) +
stat_boxplot(geom ='errorbar', width=.2) +
geom_blank() +
theme_bw() +
theme(panel.border = element_rect(fill=NA, colour = "black", size=0.75)) +
theme(axis.text.x = element_text(face="bold")) +
theme(axis.text.y = element_text(face="bold")) +
labs(x = 'Time (weeks)', y = 'Mass (g)') +
ggtitle(expression(atop(bold("Fresh Mass"), atop(italic("(Sarah's - UIC Colony)"))))) +
theme(plot.title = element_text(hjust = 0.5, vjust = -0.6, face='bold')) +
geom_boxplot(fill = 'dodgerblue1', stat = "boxplot") +
geom_text(data = SUICMassfinal, aes(x = ChillTime, y = Mass, label = Letters),vjust=-2,hjust=.5) +
scale_y_continuous(limit = c(0, 3.5))
I can't figure out what the issue is here, because sometimes I can get the script to work and other times not.
I have this ggplot:
ggplot(dt1, aes(x=pctOAC,y=NoP, fill=Age)) +
geom_bar(stat="identity",position=position_dodge()) +
geom_smooth(aes(x=pctOAC,y=NoP, colour=Age), se=F, method="loess",show.legend = FALSE,lwd=0.7) +
theme(legend.position=c(.2,0.8))
dt1 <- structure(list(Age = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("o80", "u80"), class = "factor"), NoP = c(47L, 5L, 33L, 98L, 287L, 543L, 516L, 222L, 67L, 14L, 13L, 30L, 1L, 6L, 17L, 30L, 116L, 390L, 612L, 451L, 146L, 52L), pctOAC = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)), .Names = c("Age", "NoP", "pctOAC"), row.names = c(NA, -22L), class = "data.frame")
I would like to have the smooth lines constrained to lie above zero, perhaps something similar to a kernel density. In fact, if I had the underlying data, I expect a kernel density is exactly what I would want, but I only have the aggregated data. Is there any way to do this? I tried using a different method= in geom_smooth, but the small dataset seems to prevent it. I wondered about using stat_function, but I don't know how to go about finding a suitable function to plot.
Another possibility is to use method="glm" with a spline curve and a log link (I also tried method="gam", but its automatic complexity adjustment reduced the wiggliness too much):
library(splines)
ggplot(dt1, aes(x=pctOAC,y=NoP, fill=Age)) +
geom_bar(stat="identity",position=position_dodge()) +
# in current ggplot2 the family is passed to glm() through method.args
geom_smooth(aes(colour=Age), se=FALSE,
method="glm",
formula=y~ns(x,8),
method.args=list(family=gaussian(link="log")),
show.legend = FALSE,lwd=0.7) +
theme(legend.position=c(.2,0.8))
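The reason the log link keeps the curve above zero is that the model is linear on the log scale, and predictions are mapped back with exp(), which is always positive. A minimal base R sketch of the same fit outside ggplot2, using only the o80 rows of dt1; the names `d`, `fit`, `grid` and `pred` are chosen for this illustration.

```r
library(splines)

# The o80 rows of dt1 from the question.
d <- data.frame(
  x = seq(0, 100, by = 10),
  y = c(47, 5, 33, 98, 287, 543, 516, 222, 67, 14, 13)
)

# Natural spline on the log scale; the Gaussian family keeps least-squares errors.
fit <- glm(y ~ ns(x, 8), family = gaussian(link = "log"), data = d)

# Predict on a fine grid; type = "response" back-transforms with exp().
grid <- data.frame(x = seq(0, 100, by = 1))
pred <- predict(fit, newdata = grid, type = "response")
print(range(pred))
```

With only 11 points per group, 8 spline degrees of freedom is close to interpolation, so a smaller value may be worth trying if the curve looks too wiggly.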
How about geom_density()?
ggplot(dt1, aes(x=pctOAC,y=NoP, colour=Age, fill=Age)) +
geom_bar(stat="identity",position=position_dodge()) +
geom_density(stat="identity", fill=NA) +
theme(legend.position=c(.2,0.8))