I used facet_wrap to split the plot into two panels, one for each letter. I would like to add asterisks and bars to show the level of significance between two groups (m and n).
My pseudo data looks like this:
my_data <- data.frame(Letter = c("a", "a", "a", "a", "a",
"b", "b", "b", "b", "b"),
value = c(19,13.5, 6.4, 17.5, 14.2,
0.3, 0.4, 0.7, 0.8,0.9),
group = c("m", "n"))
To test for significant differences between groups, I am using ANOVA, and I will plot boxplots for visualization. To meet the assumptions of ANOVA, I transformed the data using different transformations (e.g., log10, Box-Cox).
My first visualization looks like:
library(ggplot2)
library(ggh4x)  # provides facetted_pos_scales()
ggplot(my_data, aes(group, value, colour = group, shape = group)) +
geom_boxplot(width = .5, alpha = 0.2) +  # alpha must lie between 0 and 1
ylab(NULL) +
xlab("") +
facet_wrap(~Letter, scales = "free", ncol = 4, strip.position = "left") +
facetted_pos_scales(
y = list(
Letter == "a" ~ scale_y_log10(),
Letter == "b" ~ scale_y_log10()))
I would like to add the asterisks and bars manually to each panel, even though the panels have different y scales. Any suggestions? Thank you in advance!
My desired output looks like:
P.S. I drew an additional single boxplot because in some cases there could be more than three groups.
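A minimal sketch of one way to add the bars and asterisks, assuming the stars come from a one-way ANOVA on log10-transformed values run separately in each panel. The significance thresholds, the 1.3 and 1.1 height multipliers, and the names `p_vals`, `stars` and `ann` are illustrative choices, not part of the question; if you already have the labels, you can type them into `ann$label` by hand. Because both panels use a log10 y axis, a plain `scale_y_log10()` stands in for `facetted_pos_scales()` here. The annotation data frame carries a `Letter` column, so facet_wrap() sends each bar to its own panel at a height based on that panel's data.

```r
library(ggplot2)

my_data <- data.frame(Letter = rep(c("a", "b"), each = 5),
                      value  = c(19, 13.5, 6.4, 17.5, 14.2,
                                 0.3, 0.4, 0.7, 0.8, 0.9),
                      group  = c("m", "n"))

# One-way ANOVA per panel on log10(value); keep the F-test p-value
panels <- split(my_data, my_data$Letter)
p_vals <- sapply(panels, function(d) {
  anova(lm(log10(value) ~ group, data = d))[["Pr(>F)"]][1]
})

# Map p-values to significance symbols (assumed conventional thresholds)
stars <- as.character(cut(p_vals,
                          breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                          labels = c("***", "**", "*", "ns")))

# One row per facet: the bar spans group m (x = 1) to group n (x = 2),
# placed 30% above the largest value in that panel
ann <- data.frame(Letter = names(panels),
                  x = 1, xend = 2,
                  y = sapply(panels, function(d) max(d$value)) * 1.3,
                  label = stars)
ann$y_label <- ann$y * 1.1  # label sits slightly above the bar

p <- ggplot(my_data, aes(group, value, colour = group, shape = group)) +
  geom_boxplot(width = 0.5) +
  # inherit.aes = FALSE: ann has no 'group' column for colour/shape
  geom_segment(data = ann, aes(x = x, xend = xend, y = y, yend = y),
               inherit.aes = FALSE) +
  geom_text(data = ann, aes(x = 1.5, y = y_label, label = label),
            inherit.aes = FALSE) +
  scale_y_log10() +
  facet_wrap(~Letter, scales = "free", strip.position = "left") +
  labs(x = NULL, y = NULL)
p
```

With more than two groups, add one row to `ann` per comparison, with `x` and `xend` set to the positions of the two boxes being compared.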
I'm new to R and I'm trying to create a single plot with data from 2 melted dataframes.
Ideally I would have a legend for each of the data frames with its own title; however, I get only a single legend with the title of the first aesthetic.
My starting point is:
library(reshape2)  # melt() assumed to come from reshape2
library(ggplot2)
aerobic_melt <- melt(aerobic, id.vars = 'Distance', variable.name = 'Aerobic')
anaerobic_melt <- melt(anaerobic, id.vars = 'Distance', variable.name = 'Anaerobic')
plot <- ggplot() +
geom_line(data = aerobic_melt, aes(Distance, value, col = Aerobic)) +
geom_line(data = anaerobic_melt, aes(Distance, value, col = Anaerobic)) +
xlim(0, 125) +
ylab('Energy (J/kg)') +
xlab('Distance (m)')
This results in a plot with only one legend, titled 'Aerobic'.
I've searched, but with my limited ability I haven't been able to find a way to do it.
My question is:
How do I create separate legends with titles 'Aerobic' and 'Anaerobic' which should respectively refer to A,B,C,F,G,L and E,H,I,J,K?
Any help is appreciated.
Obviously we don't have your data, but I have created some sample data that should have the same names and structure as your own data frames, since it works with your own plot code. See the end of the answer for the data used here.
You can use the package ggnewscale if you want two color scales on the same plot. Just add in a new_scale_color() call between your geom_line calls. I have left the rest of your code as-is.
library(ggplot2)
library(ggnewscale)
plot <- ggplot() +
geom_line(data = aerobic_melt, aes(Distance, value, col=Aerobic)) +
new_scale_color() +
geom_line(data = anaerobic_melt, aes(Distance, value, col= Anaerobic)) +
xlim(0, 125) +
ylab('Energy (J/kg )') +
xlab('Distance (m)')
plot
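Because every scale added after new_scale_color() is independent of the ones before it, each legend can also get its own title and palette. A sketch reusing the sample data from this answer; the Brewer palettes "Set1" and "Dark2" are arbitrary choices (they have enough colours for six and five lines respectively).

```r
library(ggplot2)
library(ggnewscale)

# Same sample data as in the "Data" section of this answer
set.seed(1)
aerobic_melt <- data.frame(
  Aerobic = rep(c("A", "B", "C", "F", "G", "L"), each = 120),
  value = as.numeric(replicate(6, cumsum(rnorm(120)))),
  Distance = rep(1:120, 6))
anaerobic_melt <- data.frame(
  Anaerobic = rep(c("E", "H", "I", "J", "K"), each = 120),
  value = as.numeric(replicate(5, cumsum(rnorm(120)))),
  Distance = rep(1:120, 5))

p <- ggplot() +
  geom_line(data = aerobic_melt, aes(Distance, value, colour = Aerobic)) +
  # This scale applies to the layer(s) added before new_scale_colour()
  scale_colour_brewer(name = "Aerobic", palette = "Set1") +
  new_scale_colour() +
  geom_line(data = anaerobic_melt, aes(Distance, value, colour = Anaerobic)) +
  # ...and this one only to the layer(s) added after it
  scale_colour_brewer(name = "Anaerobic", palette = "Dark2") +
  xlim(0, 125) +
  labs(x = "Distance (m)", y = "Energy (J/kg)")
p
```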
Data
set.seed(1)
aerobic_melt <- data.frame(
Aerobic = rep(c("A", "B", "C", "F", "G", "L"), each = 120),
value = as.numeric(replicate(6, cumsum(rnorm(120)))),
Distance = rep(1:120, 6))
anaerobic_melt <- data.frame(
Anaerobic = rep(c("E", "H", "I", "J", "K"), each = 120),
value = as.numeric(replicate(5, cumsum(rnorm(120)))),
Distance = rep(1:120, 5))
Let's say I have data like the following example:
dat1 <- data.frame(group = c("a", "a","a", "a", "a", "b", "b", "b","b","b","b","b","c","c","c"),
subgroup = c(paste0("R", rep(1:5)),paste0("R", rep(1:7)),paste0("R", rep(1:3))),
value = c(5,6,0,8,2,3,4,5,2,4,7,0,3,4,0),
pp = c("AT","BT","CT","AA","AT","TT","RT","CC","SE","DN","AA","MM","XT","QQ","HH"))
I also want to apply a cutoff, dat1 = dat1[dat1$value > 2, ]. My code:
pl <- ggplot(dat1, aes(y = as.character(pp), x = as.factor(subgroup))) +
geom_point(aes(size = as.numeric(value))) +
facet_grid(cols = vars(group), scales = "free", space = "free") +
ylab("names") + xlab(" ")
pl
But I want to see every subgroup in each panel. For example, the first panel has five subgroups, and I want all five to appear even if a value is zero or below the cutoff. The second panel has seven subgroups; after the cutoff only five columns remain, but I want all seven to appear even if a value is zero.
How can I modify my code to make this kind of plot?
We can use the scales and space arguments in facet_grid() and plot the data without the cutoff, so every subgroup keeps its column in its own panel.
ggplot(dat1, aes(subgroup, pp)) +
geom_point(aes(size = value)) +
facet_grid(cols = vars(group), scales = "free", space = "free")
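A sketch of an alternative that keeps the cutoff but still shows every subgroup: geom_blank() draws nothing but trains the facet scales on the unfiltered data, while geom_point() draws only the rows above the cutoff. The names `cutoff` and `kept` are illustrative. Note that the y axis also keeps the pp labels of the filtered-out rows, which may or may not be what you want.

```r
library(ggplot2)

dat1 <- data.frame(group = rep(c("a", "b", "c"), times = c(5, 7, 3)),
                   subgroup = c(paste0("R", 1:5), paste0("R", 1:7), paste0("R", 1:3)),
                   value = c(5, 6, 0, 8, 2, 3, 4, 5, 2, 4, 7, 0, 3, 4, 0),
                   pp = c("AT", "BT", "CT", "AA", "AT", "TT", "RT", "CC",
                          "SE", "DN", "AA", "MM", "XT", "QQ", "HH"))

cutoff <- 2
kept <- dat1[dat1$value > cutoff, ]  # only these rows get a point

p <- ggplot(dat1, aes(x = subgroup, y = pp)) +
  # Trains the x and y scales on all rows, so no subgroup column disappears
  geom_blank() +
  geom_point(data = kept, aes(size = value)) +
  facet_grid(cols = vars(group), scales = "free", space = "free") +
  labs(x = NULL, y = "names")
p
```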
Using ggplot2, I've made the plot below, which contains a boxplot, the observations, and the weighted mean.
I got the legend I want by using three different aesthetics (fill, color, and size), but I would like to produce the legend without relying on those aesthetics. Using them makes it impossible to customize the colors, fills, and sizes of the plot, and if I ever need to plot a fourth element, I will run out of aesthetics.
Is there any way to treat legends individually on a "geom-basis" using the same aesthetics for all geoms?
More specifically, I want the edges of the boxplot to have the same color as its fill, the observations to have the same color as the boxplot, and the weighted average to be black, but if I specify these colors outside of aes(), the legend is deleted or altered.
library(dplyr)
library(tidyr)
library(ggplot2)
study <- c(1:10)
observations <- c(seq(10, 100, by = 10))
type <- c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B")
rate <- c(runif(10, 0, 1))
data1 <- data.frame(study, type, observations, rate)
average <- data1 %>%
group_by(type) %>%
summarise(rate = weighted.mean(rate, observations))
data1 %>%
ggplot() +
geom_boxplot(aes(x = type, y = rate, fill = type), alpha = 0.2) +
geom_point(aes(x = type, y = rate, size = "Observations")) +
geom_point(data = average,
aes(x = type, y = rate, color = "Weighted mean"),
shape = 18, size = 5) +
guides(fill = guide_legend(title = "Legend"),
color = guide_legend(title = ""),
size = guide_legend(title = ""))
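One possible approach uses the ggnewscale package: reuse the colour aesthetic for the boxplot edges and the points, then start a fresh colour scale for the weighted mean so that it gets its own black legend entry. A sketch under these assumptions: the hex colours in `pal` are placeholders, `set.seed()` is added only for reproducibility, and the separate "Observations" entry is dropped because the points now share the type colours. Giving the fill and colour scales the same name and breaks should merge them into a single legend.

```r
library(dplyr)
library(ggplot2)
library(ggnewscale)

set.seed(1)  # added so the random rates are reproducible
data1 <- data.frame(study = 1:10,
                    type = rep(c("A", "B"), each = 5),
                    observations = seq(10, 100, by = 10),
                    rate = runif(10, 0, 1))

average <- data1 %>%
  group_by(type) %>%
  summarise(rate = weighted.mean(rate, observations))

# Hypothetical palette: one colour per type, reused for box fill, box edges and points
pal <- c(A = "#1B9E77", B = "#D95F02")

p <- ggplot(data1, aes(x = type, y = rate)) +
  geom_boxplot(aes(fill = type, colour = type), alpha = 0.2) +
  geom_point(aes(colour = type)) +
  # Same name and breaks, so fill and colour are combined into one legend
  scale_fill_manual(name = "Legend", values = pal) +
  scale_colour_manual(name = "Legend", values = pal) +
  # Layers added after this line get a new, independent colour scale
  new_scale_colour() +
  geom_point(data = average, aes(colour = "Weighted mean"),
             shape = 18, size = 5) +
  scale_colour_manual(name = NULL, values = c("Weighted mean" = "black"))
p
```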
I would like to draw lines between different elements in a stacked bar plot using ggplot2.
I have plotted a stacked bar chart using ggplot2 (first figure), but would like to get something like the second figure.
library(tibble)  # tribble()
library(ggplot2)
dta <- tribble(
~colA, ~colB, ~colC,
"A", "a", 1,
"A", "b", 3,
"B", "a", 4,
"B", "b", 2)
dta
ggplot(dta, aes(x = colA, y = colC, fill = colB)) +
geom_bar(stat = "identity")
The fastest way would probably be to add the lines manually to the exported image. However, I would prefer to avoid this.
This Stack Overflow entry (especially Henrik's answer) gives a potential solution. However, I was wondering whether there is a more generic solution, i.e. one that does not require manually defining all the start and end points of the segments/lines.
You could use the "factor as numbers" trick to draw lines between the bar centers (shown, e.g., here).
In your case this needs to be combined with stacking in geom_line().
ggplot(dta, aes(x = colA, y = colC, fill = colB)) +
geom_bar(stat = "identity") +
geom_line( aes(x = as.numeric(factor(colA))),
position = position_stack())
Getting the lines to the edges instead of the centers takes some manual work. That's fine if you really have only two stacks like this, but it would be difficult to scale to more.
In this case you'd add .45 to the group that comes first on the x axis and subtract .45 from the second. This might seem magical, but the default bar width is 90% of the resolution of the data, so I used half of 0.9.
dta = transform(dta, colA_num = ifelse(colA == "A",
as.numeric(factor(colA)) + .45,
as.numeric(factor(colA)) - .45) )
ggplot(dta, aes(x = colA, y = colC, fill = colB)) +
geom_bar(stat = "identity") +
geom_line( aes(x = colA_num),
position = position_stack())
This doesn't add a line at 0 because those values aren't in the dataset. This could be added as a segment along the lines of
annotate(geom = "segment", y = 0, yend = 0, x = 1.45, xend = 1.55)
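For more than two bars, the edge positions can be computed instead of hard-coded. A sketch, assuming every fill level appears in every bar, all values are positive, and the default width of 0.9 is used: sum the values in stacking order to get each segment's top edge, then join that edge from the right side of one bar (x + 0.45) to the left side of the next (x + 1 - 0.45). The names `w`, `edges` and `segs` are illustrative.

```r
library(dplyr)
library(ggplot2)
library(tibble)

dta <- tribble(
  ~colA, ~colB, ~colC,
  "A",   "a",   1,
  "A",   "b",   3,
  "B",   "a",   4,
  "B",   "b",   2)

w <- 0.9  # default bar width: 90% of the data resolution

# Top edge of every segment. position_stack() puts the first fill level
# on top, so sort colB in descending order to go from bottom to top.
edges <- dta %>%
  mutate(x = as.numeric(factor(colA))) %>%
  arrange(x, desc(colB)) %>%
  group_by(x) %>%
  mutate(top = cumsum(colC)) %>%
  ungroup()

# Connect each fill level's top edge to the same level in the next bar
segs <- edges %>%
  group_by(colB) %>%
  arrange(x, .by_group = TRUE) %>%
  mutate(xend = lead(x) - w / 2,  # left edge of the next bar
         yend = lead(top),
         x = x + w / 2) %>%       # right edge of the current bar
  filter(!is.na(xend)) %>%
  ungroup()

p <- ggplot(dta, aes(x = colA, y = colC, fill = colB)) +
  geom_col(width = w) +
  geom_segment(data = segs, aes(x = x, xend = xend, y = top, yend = yend),
               inherit.aes = FALSE)
p
```

The line at 0 can still be added with the annotate() call shown in the answer.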
Suppose my data has two columns, "Condition" and "Stars":
food <- data.frame(Condition = c("A", "B", "A", "B", "A"), Stars=c('good','meh','meh','meh','good'))
How can I make a bar plot of the frequency of "Stars" grouped by "Condition"?
I read here but would like to expand that answer to include groups.
For now I have:
q <- ggplot(food, aes(x = Stars))
q + geom_bar(aes(y = after_stat(count / sum(count))))
but that gives the proportion of the full data set.
How can I make a plot with four bars, grouped by 'Condition'?
E.g., 'Condition A' would have 'good' as 0.67 and 'meh' as 0.33.
I guess this is what you are looking for:
food <- data.frame(Condition = c("A", "B", "A", "B", "A"), Stars=c('good','meh','meh','meh','good'))
library(ggplot2)
library(dplyr)
data <- food %>% group_by(Condition, Stars) %>% summarize(n = n(), .groups = "drop_last") %>% mutate(freq = n / sum(n))  # proportions within each Condition
ggplot(data, aes(x = Stars, fill = Condition, group = Condition)) + geom_bar(aes(y = freq), stat = "identity", position = "dodge")
First, I calculated the frequencies with the dplyr package; they are used as the y aesthetic in geom_bar(). Then I used fill = Condition in ggplot() to split the bars by Condition. Additionally, I set position = "dodge" to place the bars next to each other, and stat = "identity" because the frequencies are already calculated.
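A quick base-R cross-check of the target numbers: prop.table() with margin = 1 gives the share of each Stars value within each Condition, so Condition A comes out as 2/3 good and 1/3 meh, and Condition B as all meh. These are the values the plotted frequencies should take.

```r
food <- data.frame(Condition = c("A", "B", "A", "B", "A"),
                   Stars = c("good", "meh", "meh", "meh", "good"))

# margin = 1: each row (Condition) sums to 1
tab <- prop.table(table(food$Condition, food$Stars), margin = 1)
tab
```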
I used the computed prop variable, the group aesthetic, and facet_wrap(). With the group aesthetic, proportions are computed within each group, and facet_wrap() plots each condition separately.
library(ggplot2)
food <- data.frame(Condition = c("A", "B", "A", "B", "A"),
Stars=c('good','meh','meh','meh','good'))
ggplot(food) +
geom_bar(aes(x = Stars, y = after_stat(prop), group = Condition)) +
facet_wrap(~ Condition)