R - boxplot log scale problem with 0 values

R - boxplot log scale problem with 0 values - r

I have a dataset that consists of 0 values. I want to use log scale but because of the 0 values, it is returning an error. I tried to replace 0s with 1s and it returned something that did not seem right.
As you can see in the figure, I have very small values for the 16k case but to show it clearly, I want to use log scale. Also, I want the order to be 8k_B, 8k_S, 16k_B, 16k_S. I tried factor and levels but still it didn't change the order.
Can someone please help? I can post the data if necessary. Thank you.
Here is the code I used.
data_freq <- data.frame(name=c( rep("8K_B",24), rep("8K_S",24), rep("16_B",24), rep("16K_S",24)),sines=c(rep("B",24),rep("S",24),rep("B",24),rep("S",24)),
value_freq=c( r1B$Frequency, r1S$Frequency, r2B$Frequency, r2S$Frequency)
)
p <- ggplot(data_freq, aes(x=name, y=value_freq, fill=name)) +
geom_boxplot()
Here is the data:
data_freq <- structure(list(name = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("16K_S", "16_B",
"8K_B", "8K_S"), class = "factor"), sines = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("B",
"S"), class = "factor"), value_freq = c(6.269822e-05, 5.494403e-05,
5.84888e-05, 5.727028e-05, 7.300023e-05, 6.502448e-05, 6.568913e-05,
5.771338e-05, 5.638409e-05, 5.693796e-05, 5.527635e-05, 6.103661e-05,
5.660564e-05, 6.269822e-05, 5.594099e-05, 6.978778e-05, 5.571945e-05,
6.258745e-05, 6.779384e-05, 6.668609e-05, 6.048274e-05, 5.826725e-05,
5.671641e-05, 6.070429e-05, 9.433902e-05, 8.037108e-05, 8.203393e-05,
8.591391e-05, 9.633444e-05, 9.123503e-05, 8.946133e-05, 8.447278e-05,
7.638024e-05, 8.103622e-05, 8.15905e-05, 8.480535e-05, 7.527167e-05,
8.779847e-05, 8.192307e-05, 9.7443e-05, 7.649109e-05, 8.425106e-05,
9.134589e-05, 9.555844e-05, 8.724419e-05, 7.881908e-05, 7.771052e-05,
8.358592e-05, 1.1077e-07, 1.1077e-07, 0, 0, 1.1077e-07, 0, 0,
1.1077e-07, 1.1077e-07, 0, 0, 0, 0, 0, 3.3232e-07, 0, 2.2155e-07,
4.431e-07, 1.1077e-07, 1.1077e-07, 1.1077e-07, 0, 2.2155e-07,
0, 5.5428e-07, 5.5428e-07, 6.6514e-07, 6.6514e-07, 7.64911e-06,
6.6514e-07, 6.6514e-07, 1.1086e-07, 5.5428e-07, 6.6514e-07, 6.6514e-07,
6.6514e-07, 3.3257e-07, 6.6514e-07, 0, 6.6514e-07, 3.87998e-06,
6.6514e-06, 1.1086e-07, 1.1086e-07, 1.1086e-07, 3.3257e-07, 3.3257e-07,
1.10857e-06)), class = "data.frame", row.names = c(NA, -96L))

You could try to do log(x+n) transformation instead.
p <- data_freq %>%
mutate(value_freq = log(value_freq + 0.000001)) %>% # your numbers are really small so I am adding a small number
ggplot(aes(x=name, y=value_freq, fill=name)) +
geom_boxplot()
Alternatively, you can try square root transformation.
p <- data_freq %>%
mutate(value_freq = value_freq^(1/2)) %>%
ggplot(aes(x=name, y=value_freq, fill=name)) +
geom_boxplot()
Or do the transformation using ggplot:
p <- data_freq %>%
ggplot(aes(x=name, y=value_freq, fill=name)) +
geom_boxplot() +
scale_y_log10()

Related

Running ggplot inside foreach is slower than running it with just a for loop

I am making a map of a dataset with 20 million data points in ggplot. A single map (with facetting) takes 10-15 mins, so I checked if using multiple cores in parallel mode would work better. Using foreach, maps took more time to make. Some foreach runs were running for 30min-1 hour without producing any results. I know that the answer to this question (Why is the parallel package slower than just using apply?) shows that in parallel computation, sometimes it takes more time to combine the result from the separate parallel processes than running the task itself. But I saw some run examples when I was searching that minute-runs can be improved. Do you think it is possible?
Due to the sheer amount of data I have, I sampled my dataset:
structure(list(lat = c(46.791667, 52.958333, 57.375, 62.625,
74.041667, 60.208333, 30.208333, 56.791667, 57.375, 40.958333,
56.541667, 38.958333, 35.291667, 43.625, 71.375, 66.875, 74.375,
66.458333, 47.791667, 48.041667, 41.541667, 40.875, 57.208333,
64.375, 42.625, 43.958333, 69.958333, 72.375, 36.875, 66.958333,
39.791667, 36.625, 52.625, 65.708333, 42.208333, 53.708333, 35.458333,
58.625, 34.875, 57.291667, 59.708333, 61.708333, 72.041667, 59.958333,
32.208333, 43.625, 39.541667, 62.625, 41.208333, 32.291667, 48.958333,
47.291667, 60.375, 49.458333, 37.208333, 65.708333, 57.958333,
31.041667, 63.875, 43.625, 54.541667, 55.541667, 45.458333, 72.375,
54.708333, 37.958333, 32.375, 60.125, 59.041667, 37.875, 42.958333,
44.375, 59.791667, 49.208333, 34.375, 53.208333, 59.458333, 53.375,
45.458333, 72.125, 66.208333, 60.958333, 47.625, 60.291667, 41.125,
67.541667, 54.625, 55.541667, 37.541667, 44.291667, 44.458333,
40.041667, 49.458333, 39.625, 73.375, 41.458333, 71.375, 31.041667,
66.791667, 42.541667), lon = c(-6.125, -19.541667, -29.291667,
-9.2083333, -11.541667, -6.625, -25.708333, -14.458333, -48.291667,
-63.541667, -41.291667, -12.541667, -48.291667, -58.708333, 6.625,
-2.7083333, -69.375, -19.291667, -27.208333, -36.625, -17.791667,
-50.541667, -38.708333, 9.375, -56.208333, -44.958333, -59.041667,
8.875, -21.125, -24.791667, -40.375, -26.208333, -31.875, -11.875,
-60.958333, -39.125, -32.458333, -54.791667, -44.541667, -37.958333,
-48.625, -10.541667, 3.5416667, -17.791667, -16.041667, -9.9583333,
-32.708333, 0.875, -18.625, -54.208333, -63.125, -56.458333,
-55.125, -22.708333, -40.958333, -56.208333, -33.625, -69.125,
-58.125, -17.541667, -23.541667, -17.041667, -18.458333, -64.791667,
-25.208333, -35.875, -44.791667, -11.291667, -58.291667, -46.208333,
-41.208333, -5.0416667, -38.208333, -13.875, -55.291667, -17.291667,
-48.625, -38.791667, -59.375, -19.291667, 5.5416667, -19.625,
-41.375, -66.291667, -17.625, -14.208333, -39.291667, -48.875,
-16.541667, -21.375, -46.375, 2.625, -60.291667, -40.375, 2.4583333,
-16.458333, 4.875, -66.291667, -4.2916667, -36.458333), entity = structure(c(2L,
2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L,
2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,
2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L,
2L, 1L, 1L), .Label = c("Cc", "Li"), class = "factor"), watcon = structure(c(1L,
2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L,
2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L,
1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L,
2L, 2L, 2L), .Label = c("calm", "stormy"), class = "factor"),
step = structure(c(1L, 3L, 2L, 4L, 4L, 2L, 3L, 3L, 3L, 4L,
2L, 2L, 3L, 2L, 1L, 3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L, 2L,
4L, 1L, 2L, 2L, 4L, 2L, 4L, 4L, 3L, 4L, 2L, 2L, 1L, 2L, 2L,
1L, 4L, 3L, 2L, 3L, 3L, 2L, 4L, 1L, 3L, 2L, 2L, 3L, 4L, 4L,
2L, 2L, 3L, 4L, 1L, 3L, 2L, 1L, 4L, 2L, 3L, 1L, 3L, 1L, 2L,
1L, 4L, 4L, 3L, 1L, 1L, 1L, 3L, 1L, 2L, 2L, 4L, 3L, 4L, 4L,
4L, 1L, 2L, 1L, 4L, 3L, 4L, 2L, 4L, 1L, 4L, 1L, 2L, 2L, 4L
), .Label = c("abundance", "enccomb", "adscomb", "infcomb"
), class = "factor")), row.names = c(NA, -100L), class = "data.frame")
The code I am using is:
library(oceanmap)
library(sf)
library(ggmap)
library(rnaturalearth)
library(rnaturalearthdata)
library(rgeos)
library(tidyverse)
world <- ne_countries(scale = "medium", returnclass = "sf")
library(doParallel)
library(foreach)
cl <- makeCluster(5)
doParallel::registerDoParallel(cl)
entities <- unique(sample$entity)
foreach(i=1:length(entities), .packages = c("tidyverse", "dplyr")) %dopar% {
ggplot (data=world) + geom_sf(color="white", fill="white") + coord_sf(xlim = c(-34,-30), ylim = c(53,62) , expand = FALSE) + geom_point(subset(sample, sample$entity==entities[i]), mapping=aes(x = lon, y = lat, color = log10(value))) + facet_grid(watcon~step2) +
ggtitle(entities[i])
ggsave(filename = paste0(i,".png"))
}
stopCluster(cl)
P.S. I am using ggplot because of the opportunities for editing the map itself. Lattice is faster but lacks the customization that I am looking for.
Any help on how to improve my foreach code or if it's even possible is greatly appreciated!

Plot linear regression analysis with error bar for variability

I wanted to make plots that look like figure 1 (source: link)
In figure 1, they have plotted the regression analysis with one-year yield variability. In my case, I would like to plot variability between two locations and 4 blocks for each treatment group. So the plot I wanted would have three facets for factors B.glucosidase, Protein, POX.C of variable and four colors for treatments factors. Also, in my current plot I have legend for block and treatment. I should only have treatment because the block should be used for making error bar for variability.
I tried with this code, which obviously doesn't work for what I want. (Data for df.melted included below.)
ggplot(df.melted, aes(x = value, y = yield, color = as.factor(treatment))) +
geom_point(aes(shape= as.factor(block))) +
stat_smooth(method = "lm", formula = y ~ x, col = "darkslategrey", se=F) +
stat_poly_eq(formula = y~x,
# aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
aes(label = ..rr.label..),
parse = TRUE) +
theme_classic() +
geom_errorbar(aes(ymax = df.melted$yield+sd(df.melted$yield), ymin = df.melted$yield-sd(df.melted$yield)), width = 0.05)+
facet_wrap(~variable)
Data:
df.melted <- structure(list(Location = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("M", "U"), class = "factor"),
treatment = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("CC",
"CCS", "CS", "SCS"), class = "factor"), block = c(1L, 2L,
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L,
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L), yield = c(5156L, 5157L, 5551L, 5156L, 4804L,
4720L, 4757L, 5021L, 4826L, 4807L, 4475L, 4596L, 4669L, 4588L,
4542L, 4592L, 5583L, 5442L, 5693L, 5739L, 5045L, 4902L, 5006L,
5086L, 4639L, 4781L, 4934L, 4857L, 4537L, 4890L, 4842L, 4608L,
5156L, 5157L, 5551L, 5156L, 4804L, 4720L, 4757L, 5021L, 4826L,
4807L, 4475L, 4596L, 4669L, 4588L, 4542L, 4592L, 5583L, 5442L,
5693L, 5739L, 5045L, 4902L, 5006L, 5086L, 4639L, 4781L, 4934L,
4857L, 4537L, 4890L, 4842L, 4608L, 5156L, 5157L, 5551L, 5156L,
4804L, 4720L, 4757L, 5021L, 4826L, 4807L, 4475L, 4596L, 4669L,
4588L, 4542L, 4592L, 5583L, 5442L, 5693L, 5739L, 5045L, 4902L,
5006L, 5086L, 4639L, 4781L, 4934L, 4857L, 4537L, 4890L, 4842L,
4608L), variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("B.glucosidase",
"Protein", "POX.C"), class = "factor"), value = c(1.600946,
1.474084, 1.433078, 1.532492, 1.198667, 1.193193, 1.214941,
1.360981, 1.853056, 1.690117, 1.544357, 1.825132, 1.695409,
1.764123, 1.903743, 1.538684, 0.845077, 1.011463, 0.857032,
0.989803, 0.859022, 0.919467, 1.01717, 0.861689, 0.972332,
0.952922, 0.804431, 0.742634, 1.195837, 1.267285, 1.08571,
1.20097, 6212.631579, 5641.403509, 4392.280702, 7120.701754,
5305.964912, 4936.842105, 5383.157895, 6077.894737, 5769.122807,
5016.842105, 5060.350877, 5967.017544, 5576.842105, 5174.035088,
5655.438596, 5468.77193, 7933.333333, 7000, 6352.982456,
8153.684211, 6077.894737, 4939.649123, 5002.807018, 6489.122807,
4694.035088, 5901.052632, 4303.859649, 6768.421053, 6159.298246,
6090.526316, 4939.649123, 5262.45614, 810.3024, 835.5242,
856.206, 759.8589, 726.2298, 792.6472, 724.7165, 699.3266,
500.9153, 634.8698, 637.9536, 648.8814, 641.0357, 623.3822,
555.2834, 520.8119, 683.3528, 595.9173, 635.4315, 672.4234,
847.2944, 745.5665, 778.3548, 735.8141, 395.2647, 570.4148,
458.0383, 535.3851, 678.0293, 670.7419, 335.2923, 562.5674
)), row.names = c(NA, -96L), class = "data.frame")

library(dplyr)
library(ggplot2)
library(ggpmisc)
Summarize data frame (this could also be done with stat_summary(), but it's often clearer/more transparent to do it explicitly up front). (I think that because your data set is balanced you could collapse/average over the block structure first, and then do your whole plot with the reduced data set - it shouldn't change the outcome of the linear regressions at all, at least not the mean values ... and any statistical comparisons should probably done on block-level summaries anyway ...)
df.sum <- (df.melted
%>% group_by(Location,treatment,variable)
%>% summarise(value=mean(value),yield_sd=sd(yield),
## collapse yield to mean *after* computing sd!
yield=mean(yield))
)
Plot:
(ggplot(df.melted,
aes(x = value, y = yield, color = treatment))
+ stat_smooth(method = "lm", col = "darkslategrey", se=FALSE)
+ stat_poly_eq(
formula = y ~ x,
## aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
aes(group=1, label = ..rr.label..),
parse = TRUE)
+ theme_classic()
+ scale_shape(guide=FALSE)
+ geom_point(data=df.sum)
+ geom_errorbar(data=df.sum,
aes(ymax = yield+yield_sd, ymin = yield-yield_sd),
width = 0.05)
+ facet_wrap(~variable,scale="free_x")
)
(adding group=1 to the stat_poly_eq() aesthetics means we only plot a single R^2 value per facet)
Since you're no longer using the shape aesthetic for anything, you could consider using it to show the Location variable ...

Geom Bar Plot, sum of count not visible

I have a dataframe tag, with 51X5 structure
structure(list(Tagging = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CIRCLE CAMPIAGN",
"NATIONAL CAMPIAGN"), class = "factor"), Status = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("Negative", "Positive"), class = "factor"),
Month = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L), .Label = c("JUL",
"JUN", "MAY"), class = "factor"), Category = structure(c(1L,
4L, 6L, 1L, 2L, 4L, 6L, 1L, 2L, 4L, 5L, 6L, 1L, 2L, 4L, 5L,
6L, 1L, 2L, 4L, 5L, 6L, 1L, 2L, 4L, 6L, 1L, 4L, 6L, 2L, 3L,
4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L,
3L, 4L, 5L, 6L, 6L), .Label = c("Data", "Other", "Roaming",
"Unlimited", "VAS", "Voice"), class = "factor"), count = c(3L,
2L, 1L, 4L, 5L, 2L, 1L, 2L, 6L, 7L, 2L, 3L, 4L, 9L, 6L, 2L,
3L, 3L, 3L, 10L, 2L, 5L, 5L, 5L, 4L, 3L, 1L, 1L, 1L, 2L,
1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 4L, 1L, 1L, 3L, 3L, 2L,
1L, 1L, 1L, 3L, 4L, 2L)), class = "data.frame", row.names = c(NA,
-51L))
I want to create a bar plot (ggplot) to show bar graph with label on bar as sum of count of category month wise I am using below code
ggplot(data = tag, aes(x = Tagging, y = count, fill = Status)) +
geom_col() +
labs(x = "Tagging", y = "Count", title = "FlyTxt ROI", subtitle = "Statistics") +
geom_text(aes(label = count), color = "white", size = 3, position = position_stack(vjust = 0.5)) +
theme_minimal()+facet_wrap(~Month)
But I am getting split count values:
Help as I want only sum of count for each status

The problem is, that the information you show in the bar is accumulated by geom_col over all categories but the geom_text doesn't do that.
On option is to pre-summarize the data (to get rid of the category split) and then plot the graph.
library(tidyverse)
tag_sum <- tag %>%
group_by(Tagging, Status, Month) %>%
summarise(count_sm = sum(count))
ggplot(data = tag_sum, aes(x = Tagging, y = count_sm, fill = Status)) +
geom_col() +
geom_text(aes(label = count_sm), color = "white", size = 3,
position = position_stack(vjust = 0.5)) +
facet_wrap(~Month) +
labs(x = "Tagging", y = "Count", title = "FlyTxt ROI", subtitle = "Statistics") +
theme_minimal()

How to add significance letters (anova) to ggplot geom_bar with facet wrap

When I make a barplot with significance letters from anova above the bars, I use following code:
anova_NDW_geel<-aov(nodule_dry_weight~treatment,inoculatieproef_geel_variety2)
HSD_NDW_geel <- HSD.test(anova_NDW_geel,"treatment",alpha=0.05,group=TRUE)$groups
HSD_NDW_means_geel <- HSD.test(anova_NDW_geel,"treatment",alpha=0.05,group=TRUE)$means
HSD_NDW_means_geel <- HSD_NDW_means_geel[order(-HSD_NDW_means_geel$nodule_dry_weight),]
p_HSD_NDW_geel <- ggplot(aes(x=treatment, y=NDW_mean_geel, width=0.6), data=inoculatieproef_mean_geel)+
geom_bar(stat="identity", data=HSD_NDW_geel, aes(x=trt, y=means), fill="gray40")+
geom_text(data=HSD_NDW_geel, aes(x=trt, y=means, label=M), size=5, vjust=-1, hjust=1)+
ggtitle("Zand")+
ylab("Droog gewicht wortelknolletjes (g)")+
xlab("Behandeling")+
geom_errorbar(aes(ymin=NDW_mean_geel-NDW_sd_geel,ymax=NDW_mean_geel+NDW_sd_geel),
position=position_dodge(width=0.5),width=0.1,size=0.3)+
theme_bw() +
theme(axis.line = element_line(colour="black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())+
scale_y_continuous(expand = c(0, 0))+
theme(axis.text.x = element_text(angle = 0, hjust = 1, vjust = 0.5))+
theme(text = element_text(size=12))
which results in following graph: http://i.stack.imgur.com/bZidZ.png
This is probably not the best way to do this and when I want to add the letters to the barplots with facet wrap.
Here is a sample of the data I want to make a facet wrap with significance letters with:
structure(list(treatment = structure(c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L), .Label = c("1", "2",
"3", "4", "5", "6", "7", "8"), class = "factor"), block = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("I",
"II", "III", "IV"), class = "factor"), position = structure(c(2L,
1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L,
2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("b",
"gem(ab)"), class = "factor"), variety = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1",
"2"), class = "factor"), location = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Geel",
"Merelbeke"), class = "factor"), year = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2014",
"2015"), class = "factor"), nodule_dry_weight = c(0, 0.0467,
0.0328, 0.0885, 0.0081, 0.1086, 0.0788, 0.0267, 0, 0.0128, 0.0143,
0.0333, 0.006, 0.098, 0.0286, 0.011, 0, 0.0627, 0.0769, 0.0784,
0.023, 0.1504, 0.1026, 0.0254, 0, 0.0597, 0.0158, 0.0354, 0.0226,
0.3261, 0.0436, 0, 0, 0.0203, 0.0469, 0.0904, 0.1593, 0.0836,
0.056, 0.0037, 0, 0.0534, 0.0901, 0.0435, 0.0248, 0.0435, 0.0279,
0.0029, 0, 0.0545, 0.038, 0.0991, 0.0099, 0.1453, 0.1096, 0.0272,
0, 0.0319, 0.0624, 0.0508, 0.0415, 0.11, 0.0079, 0, 0, 0.1257,
0.1242, 0.2899, 0.024, 0.2175, 0.2979, 0.0396, 0, 0.1583, 0.2935,
0.2541, 0.1027, 0.4196, 0.2059, 0.0396, 0, 0.0891, 0.167, 0.0907,
0.2153, 0.3063, 0.2921, 0.0528, 0, 0.0928, 0.2109, 0.1514, 0.0821,
0.3607, 0.0996, 0.0069, 0, 0.0685, 0.3109, 0.1862, 0.0393, 0.286,
0.3418, 0.0459, 0, 0.0765, 0.3486, 0.3988, 0.1155, 0.6341, 0.3653,
0.039, 0, 0.0766, 0.3112, 0.1988, 0.05, 0.2856, 0.34, 0.0862,
0, 0.2621, 0.1146, 0.393, 0.1644, 0.3415, 0.1343, 0.019, 0, 0.0976,
0.1853, 0.0691, 0.0248, 0.1764, 0.1244, 0.1525, 0, 0.1529, 0.1069,
0.2833, 0.0204, 0.2966, 0.2371, 0.1464, 0, 0.0691, 0.2094, 0.1633,
0.0264, 0.1344, 0.0694, 0.1175, 0, 0.1783, 0.1434, 0.2136, 0.0873,
0.19, 0.1683, 0.1927, 0, 0.0571, 0.0599, 0.1061, 0.0244, 0.1256,
0.0894, 0.0123, 0, 0.1696, 0.1046, 0.2164, 0.0939, 0.1552, 0.2942,
0.1652, 0, 0.0844, 0.102, 0.0227, 0.025, 0.0654, 0.1234, 0.0702,
0, 0.0979, 0.1246, 0.0958, 0.0867, 0.1104, 0.1969, 0.227, 0,
0.3704, 0.4727, 0.2527, 0.2078, 0.3377, 0.308, 0.1293, 0, 0.2417,
0.3744, 0.2916, 0.1773, 0.433, 0.2446, 0.1382, 0, 0.4718, 0.4271,
0.4882, 0.1799, 0.4178, 0.518, 0.3915, 0, 0.3421, 0.3804, 0.2112,
0.4292, 0.3829, 0.1315, 0.2719, 0, 0.3197, 0.6867, 0.414, 0.3112,
0.2914, 0.4994, 0.369, 0.0256, 0.1494, 0.5577, 0.2538, 0.3854,
0.4151, 0.544, 0.4009, 0, 0.5208, 0.2962, 0.4175, 0.2689, 0.3374,
0.5075, 0.3601, 0, 0.704, 0.4631, 0.4573, 0.154, 0.5087, 0.4319,
0.4155)), .Names = c("treatment", "block", "position", "variety",
"location", "year", "nodule_dry_weight"), row.names = c(NA, -256L
), class = "data.frame")
I use following code for my graph with facet wrap:
inoculatieproef <- inoculatieproef %>%
group_by(treatment, location, variety, year) %>%
mutate(NDW_mean = mean(nodule_dry_weight),
NDW_sd = sd(nodule_dry_weight))
ggplot(data=inoculatieproef,aes(x=treatment, y=NDW_mean))+
facet_wrap(~location*variety*year,ncol=2)+
geom_bar(position="dodge", stat="identity")+
geom_errorbar(aes(ymin = NDW_mean - NDW_sd,
ymax = NDW_mean + NDW_sd),
width=0.1,size=0.3,
color = "darkgrey")+
theme_bw() +
theme(axis.line = element_line(colour="black"),
panel.grid.minor = element_blank(),
panel.background = element_blank())
How do I add on each barplot the significance letters (anova) in de the facet wrap graph?

No idea if the test fits your data distribution, but you can start with that:
library(tidyverse)
stat_pvalue <- dd %>%
group_by(location, variety, year) %>%
rstatix::t_test(nodule_dry_weight~treatment) %>%
filter(p < 0.05) %>%
group_by(location, variety, year) %>%
rstatix::add_significance("p") %>%
rstatix::add_y_position() %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n())*1.1) %>%
ungroup()
ggplot(data=dd,aes(x=treatment, y=nodule_dry_weight))+
geom_boxplot() +
facet_wrap(~location + variety + year,ncol=2, scales = "free_y") +
ggpubr::stat_pvalue_manual(stat_pvalue, label = "p")

Bar chart for factorial designs in R

I'm currently trying to create a clustered bar chart using ggplot2. It's basically just mean response times for a 2x2x2 factorial design. The three factors are load, compatibility and salience. I'm having a hard time jamming the third factor (salience) in there though. It shouldn't be a stacked graph though
This is what I currently have
bar+stat_summary(fun.y = mean, geom = "bar", position = "dodge") +
+ stat_summary(fun.data = mean_cl_normal, geom = "errorbar", position = position_dodge(width = 0.90), width = 0.2)+
+ labs(x = "Compatibility", y = "Mean RT", fill = "Load")
Here's a small sample of the data I'm trying to graph:
ID load comp sal rt
1 1 High Incompatible Non_Salient 787
2 1 Low Compatible Salient 754
3 2 High Incompatible Salient 654
I've seen graphs like these numerous times before but I have no idea how to get ggplot2 to display three independent variables at the same time.
I've tried splitting the graphs by adding
+ facet_wrap( ~ sal)
but that doesn't work either. It just says "Invalid argument to unary operator"
Any help would be appreciated.

Is this the kind of plot you are looking for?
I used the Wii data from the book "Discovering Statistics Using R", which is in a similar format to yours.
structure(list(athlete = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("Athlete", "Non-Athlete"), class = "factor"),
stretch = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("No Stretching", "Stretching"
), class = "factor"), wii = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Playing Wii",
"Watching Wii"), class = "factor"), injury = c(2L, 2L, 1L,
2L, 0L, 1L, 2L, 0L, 2L, 2L, 2L, 1L, 4L, 2L, 2L, 0L, 0L, 3L,
3L, 3L, 2L, 1L, 0L, 2L, 2L, 3L, 2L, 2L, 3L, 1L, 2L, 4L, 1L,
2L, 2L, 2L, 1L, 4L, 4L, 1L, 2L, 3L, 3L, 3L, 3L, 2L, 3L, 2L,
2L, 2L, 1L, 0L, 3L, 3L, 2L, 1L, 2L, 4L, 1L, 2L, 5L, 5L, 3L,
6L, 4L, 3L, 4L, 5L, 5L, 2L, 6L, 4L, 4L, 4L, 3L, 4L, 3L, 2L,
1L, 4L, 3L, 2L, 2L, 1L, 3L, 1L, 1L, 3L, 4L, 2L, 7L, 8L, 6L,
9L, 4L, 7L, 5L, 9L, 6L, 4L, 8L, 5L, 4L, 7L, 10L, 1L, 3L,
2L, 1L, 3L, 3L, 2L, 3L, 4L, 2L, 0L, 1L, 3L, 2L, 0L)),
.Names = c("athlete", "stretch", "wii", "injury"),
class = "data.frame", row.names = c(NA, -120L))
Here is how to produce the plot.
library(ggplot2)
library(Hmisc)
ggplot(data=Wii, aes(x=stretch, y=injury, fill=wii)) +
facet_wrap(~athlete) +
stat_summary(fun.y = mean, geom = "bar", position = "dodge") +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", position = position_dodge(width = 0.90), width = 0.2)

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

R - boxplot log scale problem with 0 values - r

Related

Running ggplot inside foreach is slower than running it with just a for loop

Plot linear regression analysis with error bar for variability

Geom Bar Plot, sum of count not visible

How to add significance letters (anova) to ggplot geom_bar with facet wrap

Bar chart for factorial designs in R

Categories

Resources