Show mean values in boxplots in R


time_pic <- ggplot(data_box, aes(x=Kind, y=TimeTotal, fill=Sitting_Position)) +
geom_boxplot()
print(time_pic)
time_pic+labs(title="", x="", y = "Time (Sec)")
I ran the code above to get the following image, but I don't know how to add the average value for each boxplot to it.
Update: I tried this.
means <- aggregate(TimeTotal ~ Sitting_Position*Kind, data_box, mean)
ggplot(data=data_box, aes(x=Kind, y=TimeTotal, fill=Sitting_Position)) +
geom_boxplot() +
stat_summary(fun=mean, colour="darkred", geom="point", shape=18, size=3, show.legend = FALSE) +
geom_text(data = means, aes(label = TimeTotal, y = TimeTotal + 0.08))
This is what it looks like now. Two dots are on the same line. And two values are overlapping with each other.

As others said, you can share your dataset for more specific help, but in this case I think the point can be made using a dummy dataset. I'm creating one that looks pretty similar to your own in terms of naming, so theoretically you can just plug in this code and it could work.
The main thing you need here is to control how ggplot2 separates the boxplots for each data_box$Sitting_Position that share the same data_box$Kind. The process of spreading the boxes around a shared x axis value is called "dodging". When you supply a fill= or color= (or similar) aesthetic in aes() for a geom, ggplot2 assumes you also want to group the data by that variable. Your initial ggplot() call has fill=Sitting_Position in aes(), which is why geom_boxplot() "works": it creates separate boxes that are colored differently and dodged properly.
When you create the points and the text, ggplot2 has no idea that you want to dodge them, nor on what basis, since the fill= aesthetic doesn't make sense for a text or point geom. How to fix this? The answer is to:
1. Supply a group= aesthetic, which can override the grouping implied by a fill= or color= aesthetic, and which also serves as the basis for dodging in geoms that have no similar aesthetic.
2. Specify explicitly how you want to dodge. This matters for positioning everything accurately; otherwise the layers may be dodged, but not by the same distance.
Here's how I combined all that:
library(ggplot2)
# the datasets
set.seed(1234)
data_box <- data.frame(
Kind=c(rep('Model-free AR',100),rep('Real-world',100)),
TimeTotal=c(rnorm(50,5.5,1),rnorm(50,5.43,1.1),rnorm(50,4.9,1),rnorm(50,4.7,0.2)),
Sitting_Position=rep(c(rep('face to face',50),rep('side by side',50)),2)
)
means <- aggregate(TimeTotal ~ Sitting_Position*Kind, data_box, mean)
# the plot
ggplot(data_box, aes(x=Kind, y=TimeTotal)) + theme_bw() +
# specifying dodge here and width to avoid overlapping boxes
geom_boxplot(
aes(fill=Sitting_Position),
position=position_dodge(0.6), width=0.5
) +
# note group aesthetic and same dodge call for next two objects
stat_summary(
aes(group=Sitting_Position),
position=position_dodge(0.6),
fun=mean,
geom='point', color='darkred', shape=18, size=3,
show.legend = FALSE
) +
geom_text(
data=means,
aes(label=round(TimeTotal,2), y=TimeTotal + 0.18, group=Sitting_Position),
position=position_dodge(0.6)
)
Giving you this:
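As a variation on the code above, the labels can also be computed by ggplot2 itself rather than from a separate means data frame. This is only a sketch, assuming ggplot2 3.3.0 or later (for fun= and after_stat()); the dummy data is the same as above, and the shared dodge object and the vjust value are illustrative choices, not part of the original answer.

```r
library(ggplot2)

# Same dummy data as in the answer above
set.seed(1234)
data_box <- data.frame(
  Kind = rep(c("Model-free AR", "Real-world"), each = 100),
  TimeTotal = c(rnorm(50, 5.5, 1), rnorm(50, 5.43, 1.1),
                rnorm(50, 4.9, 1), rnorm(50, 4.7, 0.2)),
  Sitting_Position = rep(rep(c("face to face", "side by side"), each = 50), 2)
)

# One dodge specification shared by every layer (assumed width of 0.6)
dodge <- position_dodge(width = 0.6)

p <- ggplot(data_box, aes(x = Kind, y = TimeTotal)) +
  geom_boxplot(aes(fill = Sitting_Position), position = dodge, width = 0.5) +
  # mean as a point, grouped explicitly so it can be dodged
  stat_summary(aes(group = Sitting_Position), fun = mean, geom = "point",
               position = dodge, color = "darkred", shape = 18, size = 3,
               show.legend = FALSE) +
  # mean as a text label, computed by the same stat and dodged the same way
  stat_summary(aes(group = Sitting_Position,
                   label = after_stat(round(y, 2))),
               fun = mean, geom = "text", position = dodge, vjust = -0.8,
               show.legend = FALSE)

p
```

Because the point and text layers use the same stat, grouping, and dodge, they end up at the same horizontal positions.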

Related

I want to add the count percentage for each category as a label to my ggplot pie chart

I am using the code:
age_pie_chart <- ggplot(data = data , aes(x = "", fill = `Q1: How old are you?`))+
geom_bar(position = "fill", width = 1) +
coord_polar(theta = "y") + xlab("") + ylab("") + blank_theme + scale_fill_grey()+
theme(axis.text.x=element_blank())
age_pie_chart
I want the percentage of each category to be added inside the chart.
From searching, I understand that I need to use the geom_text function, but first I need to construct a frequency table with the percentage of the count of each category.

Here's an example with some dummy data. First, here's the dummy data. I'm using sample() to select values and then I'm ensuring we have a "small" slice by adding only 2 values for Q1 = "-15".
set.seed(1234)
data <- data.frame(
Q1 = c(sample(c('15-7','18-20','21-25'), 98, replace=TRUE), rep('-15', 2))
)
For the plot, it's basically the same as your plot, although you don't need the width argument for geom_bar().
To get the text positioned correctly, you'll want to use geom_text(). You could calculate the frequency before plotting, or you could have ggplot calculate that on the fly much like it did for geom_bar by using stat="count" for your geom_text() layer.
library(ggplot2)
ggplot(data=data, aes(x="", fill=Q1)) +
geom_bar(position="fill") +
geom_text(
stat='count',
aes(y=after_stat(count),
label=after_stat(scales::percent(count/sum(count), accuracy=1))),
position=position_fill(vjust=0.5)
) +
coord_polar(theta="y") +
labs(x=NULL, y=NULL) +
scale_fill_brewer(palette="Pastel1") +
theme_void()
So to be clear about what's going on in that geom_text() layer:
- stat="count". We want to access the counting statistic within the plot, so we need to tell the layer to use that stat.
- The computed count inside the aesthetics (written ..count.. in older ggplot2 code, or simply count inside after_stat()). To access it, the stat has to be computed first. Wrapping the expression in after_stat() tells ggplot2 to evaluate the mapping after the stat has run, which makes the count available within aes().
- Percent calculation for the label. This is the count divided by the sum of all counts; scales::percent() formats the result as a percent.
- Position = fill. As with the bar geom, we need a "fill" position, but written as position_fill() rather than position="fill", since that gives access to the vjust argument. A vjust of 0.5 places each text value in the middle of its slice; plain position="fill" puts the text at the edge of each slice.
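For comparison, the other route mentioned above (building the frequency table before plotting) could look roughly like this. It is a sketch on the same dummy data; the column names pct and label are made up for illustration, and geom_col() with position_stack(vjust = 0.5) stands in for the stat="count" layer.

```r
library(ggplot2)

# Same dummy data as above
set.seed(1234)
data <- data.frame(
  Q1 = c(sample(c('15-7', '18-20', '21-25'), 98, replace = TRUE), rep('-15', 2))
)

# Frequency table: one row per category, with share and formatted label
counts <- as.data.frame(table(Q1 = data$Q1))
counts$pct <- counts$Freq / sum(counts$Freq)
counts$label <- sprintf("%.0f%%", 100 * counts$pct)

# Pie chart from the pre-computed shares; labels centred in each slice
p <- ggplot(counts, aes(x = "", y = pct, fill = Q1)) +
  geom_col() +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  coord_polar(theta = "y") +
  theme_void()

p
```

Pre-computing the table makes the numbers easy to inspect or reuse, at the cost of one extra step before plotting.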

How to set background color for each panel in grouped boxplot?

I plotted a grouped boxplot and am trying to change the background color for each panel. I can use the panel.background theme element to change the whole plot background, but how can this be done for an individual panel? I found a similar question here, but I failed to adapt the code to my plot.
Top few lines of my input data look like
Code
p<-ggplot(df, aes(x=Genotype, y=Length, fill=Treatment)) + scale_fill_manual(values=c("#69b3a2", "#CF7737"))+
geom_boxplot(width=2.5)+ theme(text = element_text(size=20),panel.spacing.x=unit(0.4, "lines"),
axis.title.x=element_blank(),axis.text.x=element_blank(),axis.ticks.x=element_blank(),axis.text.y = element_text(angle=90, hjust=1,colour="black")) +
labs(x = "Genotype", y = "Petal length (cm)")+
facet_grid(~divide,scales = "free", space = "free")
p+theme(panel.background = element_rect(fill = "#F6F8F9", colour = "#E7ECF1"))

Unfortunately, like the other theme elements, the fill argument of element_rect() cannot be mapped to data. Nor can you pass a vector of colors to fill to create your own mapping of sorts. In the end, the simplest solution is probably going to be very similar to the answer you linked to in your question, with a bit of a twist.
I'll use mtcars as an example. Note that I'm converting some of the continuous variables in the dataset to factors so that we can create some more discrete values.
It's important to note, the rect geom is drawn before the boxplot geom, to ensure the boxplot appears on top of the rect.
ggplot(mtcars, aes(factor(carb), disp)) +
geom_rect(
aes(fill=factor(carb)), alpha=0.5,
xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf) +
geom_boxplot() +
facet_grid(~factor(carb), scales='free_x') +
theme_bw()
All done... but not quite. Something is wrong and you might notice this if you pay attention to the boxes on the legend and the gridlines in the plot panels. It looks like the alpha value is incorrect for some facets and okay for others. What's going on here?
Well, this has to do with how geom_rect works. It's drawing a box on each plot panel, but just like the other geoms, it's mapped to the data. Even though the x and y aesthetics are not actually used to draw the rectangle, the data still determine how many rectangles are drawn: one per row of the dataset that falls in that facet. If 3 observations exist for a facet, 3 rectangles are drawn there; if 20 observations exist, 20 rectangles are drawn, and so on.
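To make the over-plotting concrete, here is a small base R sketch that counts the rows per carb level in mtcars, which is how many rectangles each facet receives. The opacity formula is an assumption (standard alpha compositing of n identical layers), not something taken from ggplot2 itself.

```r
# Rows per carb level: each row becomes one rectangle in its facet
n_rects <- lengths(split(mtcars$disp, mtcars$carb))

# Assumed model: n stacked layers with alpha a combine to 1 - (1 - a)^n
alpha <- 0.5
effective_alpha <- 1 - (1 - alpha)^n_rects

data.frame(
  carb = names(n_rects),
  rectangles = as.integer(n_rects),
  effective_alpha = round(as.numeric(effective_alpha), 3)
)
```

Facets with a single row keep the intended transparency, while facets with many rows end up almost opaque, which matches the uneven look described above.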
So, the fix is to supply a dataframe that contains one observation each for every facet. We have to then make sure that we supply any and all other aesthetics (x and y here) that are included in the ggplot call, or we will get an error indicating ggplot cannot "find" that particular column. Remember, even if geom_rect doesn't use these for drawing, they are used to determine how many observations exist (and therefore how many to draw).
rect_df <- data.frame(carb=unique(mtcars$carb)) # supply one of each type of carb
# have to give something to disp
rect_df$disp <- 0
ggplot(mtcars, aes(factor(carb), disp)) +
geom_rect(
data=rect_df,
aes(fill=factor(carb)), alpha=0.5,
xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf) +
geom_boxplot() +
facet_grid(~factor(carb), scales='free_x') +
theme_bw()
That's better.

Grouped scatterplot over grouped barplot

I am trying to make a grouped barplot with scatterplots colored for individual data points overlaid on the bars (I'm aware of the advantages of boxplots, but they are not standard in my field).
As I am new to R, I am mostly working by cutting and pasting bits of code in a semi-logical trial-and-error process, and here is the closest I have been able to come. It uses the sample dataset "Males".
p <- ggplot(Males, aes(factor=year, fill=year, y=wage, x=ethn))
p + stat_summary(fun.y = "mean", geom = "bar", position="dodge") +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", mult=1,
color="yellow", position="dodge") +
geom_jitter(aes(size=.05, col=industry),position=position_dodge(width = 0.8))
ymax not defined: adjusting position using y instead
Unfortunately I do not have enough reputation points to post an image of the output, but basically the dots are all the same color as the column they overlay, instead of having a mix of different industries (colors) over each column.
My understanding is that the ymax error I got has nothing to do with the problem I'm having. Anyway, I'd be glad of any suggestions people can offer.

The problem you are having, I think, is that some geometries, such as geom_point and geom_jitter do not use or allow the factor aesthetic for grouping along the x-axis.
Thus when you plot other geoms on top of a chart where factor is not ignored, such as geom_bar, the factor setting is ignored for some layers but not for bar, and you don't get year-resolved columns of points.
To solve the problem, I would try using facet_grid or facet_wrap to indirectly get the x-axis groupings that you want.
For example:
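library(ggplot2)
require(Ecdat)  # provides the Males dataset
data(Males)
# quartz() opens a macOS-only graphics device; use another device elsewhere
quartz(height=6, width=12)
ggplot(Males, aes(x=year, y=wage)) +
  facet_grid(.~ethn) +
  # fun= replaces the deprecated fun.y= (ggplot2 >= 3.3.0)
  stat_summary(mapping=aes(fill=year), fun=mean, geom='bar') +
  # mean_sdl requires the Hmisc package
  stat_summary(fun.data = mean_sdl, geom='errorbar', color='yellow') +
  geom_jitter(aes(color=industry),
              position=position_jitter(width=0.2), alpha=0.8)
quartz.save('SO_29610340.png')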

How to make dodge in geom_bar agree with dodge in geom_errorbar, geom_point

I have a dataset where measurements are made for different groups at different days.
I want to have side by side bars representing the measurements at the different days for the different groups with the groups of bars spaced according to day of measurement with errorbars overlaid to them.
I'm having trouble with making the dodging in geom_bar agree with the dodge on geom_errorbar.
Here is a simple piece of code:
days = data.frame(day=c(0,1,8,15));
groups = data.frame(group=c("A","B","C","D", "E"), means=seq(0,1,length=5));
my_data = merge(days, groups);
my_data$mid = exp(my_data$means+rnorm(nrow(my_data), sd=0.25));
my_data$sigma = 0.1;
png(file="bar_and_errors_example.png", height=900, width=1200);
plot(ggplot(my_data, aes(x=day, weight=mid, ymin=mid-sigma, ymax=mid+sigma, fill=group)) +
geom_bar (position=position_dodge(width=0.5)) +
geom_errorbar (position=position_dodge(width=0.5), colour="black") +
geom_point (position=position_dodge(width=0.5), aes(y=mid, colour=group)));
dev.off();
In the plot, each error segment appears at a fixed offset from its bar (sorry, no plots allowed for newbies even if ggplot2 is the subject).
When binwidth is adjusted in geom_bar, the offset is no longer fixed and changes from day to day.
Notice, that geom_errorbar and geom_point dodge in tandem.
How do I get geom_bar to agree with the other two?
Any help appreciated.

The alignment problems are due, in part, to your bars not representing the data you intend. The following lines up correctly:
ggplot(my_data, aes(x=day, weight=mid, ymin=mid-sigma, ymax=mid+sigma, fill=group)) +
geom_bar (position=position_dodge(), aes(y=mid), stat="identity") +
geom_errorbar (position=position_dodge(width=0.9), colour="black") +
geom_point (position=position_dodge(width=0.9), aes(y=mid, colour=group))

This is an old question, but since I ran into the problem today, I want to add the following:
In
geom_bar(position = position_dodge(width=0.9), stat = "identity") +
geom_errorbar( position = position_dodge(width=0.9), colour="black")
the width-argument within position_dodge controls the dodging width of the things to dodge from each other. However, this produces whiskers as wide as the bars, which is possibly undesired.
An additional width-argument outside the position_dodge controls the width of the whiskers (and bars):
geom_bar(position = position_dodge(width=0.9), stat = "identity", width=0.7) +
geom_errorbar( position = position_dodge(width=0.9), colour="black", width=0.3)
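Putting the two width arguments together, a complete sketch on the question's data might look like this. It assumes day is converted to a factor and uses geom_col() (equivalent to geom_bar(stat = "identity")); the shared dodge object and the seed are just conveniences for the example.

```r
library(ggplot2)

# Data from the question, with day as a factor and a fixed seed
set.seed(1)
my_data <- merge(
  data.frame(day = factor(c(0, 1, 8, 15))),
  data.frame(group = c("A", "B", "C", "D", "E"),
             means = seq(0, 1, length.out = 5))
)
my_data$mid <- exp(my_data$means + rnorm(nrow(my_data), sd = 0.25))
my_data$sigma <- 0.1

# Dodge width (spacing between groups) is shared by all layers
dodge <- position_dodge(width = 0.9)

p <- ggplot(my_data, aes(x = day, y = mid, ymin = mid - sigma,
                         ymax = mid + sigma, fill = group)) +
  geom_col(position = dodge, width = 0.7) +          # bar width
  geom_errorbar(position = dodge, width = 0.3,       # whisker width
                colour = "black") +
  geom_point(position = dodge, show.legend = FALSE)

# Horizontal centres after dodging, per layer
bars <- layer_data(p, 1)
errs <- layer_data(p, 2)

p
```

The dodge width sets where each group's element is centred, while the geom's own width only changes how wide the bar or whisker is drawn around that centre.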

The first change: I reformatted the code according to the Advanced R style guide.
days <- data.frame(day=c(0,1,8,15))
groups <- data.frame(
group=c("A","B","C","D", "E"),
means=seq(0,1,length=5)
)
my_data <- merge(days, groups)
my_data$mid <- exp(my_data$means+rnorm(nrow(my_data), sd=0.25))
my_data$sigma <- 0.1
Now when we look at the data we see that day is numeric and everything else is the same.
str(my_data)
To remove blank space from the plot I converted the day column to factors. CHECK that the levels are in the proper order before proceeding.
my_data$day <- as.factor(my_data$day)
levels(my_data$day)
The next change I made was defining y in your aes arguments, which lets ggplot know where to look for y values. Then I changed the position argument to "dodge" and added the stat="identity" argument, which tells ggplot to plot the y values as given instead of counting rows. Each layer has its own position setting, so geom_errorbar and geom_point do not pick up the dodge from geom_bar; give them the same dodge width as the bars. For bars, the default dodge is equivalent to position_dodge(.9).
ggplot(data = my_data,
       aes(x=day,
           y=mid,
           ymin=mid-sigma,
           ymax=mid+sigma,
           fill=group)) +
  geom_bar(position="dodge", stat = "identity") +
  geom_errorbar(position = position_dodge(.9), colour="black") +
  geom_point(position=position_dodge(.9), aes(y=mid, colour=group))

Sometimes you put aes(x=tasks, y=val, fill=group) in geom_bar() rather than in ggplot(). This causes the problem, because the other layers then do not inherit the fill grouping that determines where each group sits along x.

label a dodged bar chart

Maybe it's because of the dark outside, but I can't get this
Position geom_text on dodged barplot
to work on my fairly simple dataframe
fs <- data.frame(productcategory=c("c2","c2"), product=c("p4", "p5"), ms1=c(2,1))
plot <- ggplot(data=NULL)
plot +
geom_bar(data=fs, aes(x=productcategory, y=ms1, weight=ms1, fill=product),stat="identity", position="dodge") +
geom_text(data=fs, aes(label = ms1, x = productcategory, y=ms1+0.2), position=position_dodge(width=1))
My plot still shows the labels in the "middle" of the product category and not above the proper product.
It looks like this. It seems very simple, but I'm totally stuck on it.
So any hints on how to get the labels above the proper bars are very much appreciated.
Tom

Because you have the aesthetics defined for each geom individually, geom_text isn't picking up on the fact that you're subdividing the x variable productcategory by the fill variable product.
You can get the graph you want by adding fill=product to the aes() call for geom_text, or you can try to define as many aesthetics as possible in the original ggplot() call, so that all the geoms pick up on those aesthetics automatically and you only have to define them if they're specific to that particular geom.
plot2 <- ggplot(data=fs, aes(x=productcategory, y=ms1, fill=product)) +
geom_bar(stat="identity", position="dodge") +
geom_text(aes(label=ms1, y =ms1 + 0.2), position=position_dodge(width=1))
print(plot2)
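A slightly tighter variant, offered as a sketch: give the bars and the labels the same explicit dodge width (0.9 is assumed here, matching the default bar width) and lift the labels with vjust instead of adding an offset to y. geom_col() is used in place of geom_bar(stat="identity").

```r
library(ggplot2)

fs <- data.frame(productcategory = c("c2", "c2"),
                 product = c("p4", "p5"),
                 ms1 = c(2, 1))

plot3 <- ggplot(fs, aes(x = productcategory, y = ms1, fill = product)) +
  geom_col(position = position_dodge(width = 0.9)) +
  # same dodge width as the bars, so each label sits over its own bar
  geom_text(aes(label = ms1),
            position = position_dodge(width = 0.9),
            vjust = -0.5)

print(plot3)
```

With identical dodge widths in both layers, each label is centred on the bar it belongs to.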
