Grouped scatterplot over grouped barplot - r

I am trying to make a grouped barplot with scatterplots colored for individual data points overlaid on the bars (I'm aware of the advantages of boxplots, but they are not standard in my field).
As I am new to R, I am mostly working by cutting and pasting bits of code in a semi-logical trial and error process, and here is the closest I have been able to come. It using the sample dataset "Males".
p <- ggplot(Males, aes(factor=year, fill=year, y=wage, x=ethn))
p + stat_summary(fun.y = "mean", geom = "bar", position="dodge") +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", mult=1,
color="yellow", position="dodge") +
geom_jitter(aes(size=.05, col=industry),position=position_dodge(width = 0.8))
ymax not defined: adjusting position using y instead
Unfortunately I do not have enough reputation points to post an image of the output, but basically the dots are all the same color as the column they overlay, instead of having a mix of different industries (colors) over each column.
My understanding is that the ymax error I got has nothing to do with the problem I'm having. Anyway, I'd be glad of any suggestions people can offer.

The problem you are having, I think, is that some geometries, such as geom_point and geom_jitter do not use or allow the factor aesthetic for grouping along the x-axis.
Thus when you plot other geoms on top of a chart where factor is not ignored, such as geom_bar, the factor setting is ignored for some layers but not for bar, and you don't get year-resolved columns of points.
To solve the problem, I would try using facet_grid or facet_wrap to indirectly get the x-axis groupings that you want.
For example:
require(Ecdat)
data(Males)
quartz(height=6, width=12)
ggplot(Males, aes(x=year, y=wage)) +
facet_grid(.~ethn) +
stat_summary(mapping=aes(fill=year), fun.y=mean, geom='bar') +
stat_summary(fun.data = mean_sdl, geom='errorbar', color='yellow') +
geom_jitter(aes(color=industry),
position=position_jitter(width=0.2), alpha=0.8)
quartz.save('SO_29610340.png')

Related

How to set background color for each panel in grouped boxplot?

I plotted a grouped boxplot and trying to change the background color for each panel. I can use panel.background function to change whole plot background. But how this can be done for individual panel? I found a similar question here. But I failed to adopt the code to my plot.
Top few lines of my input data look like
Code
p<-ggplot(df, aes(x=Genotype, y=Length, fill=Treatment)) + scale_fill_manual(values=c("#69b3a2", "#CF7737"))+
geom_boxplot(width=2.5)+ theme(text = element_text(size=20),panel.spacing.x=unit(0.4, "lines"),
axis.title.x=element_blank(),axis.text.x=element_blank(),axis.ticks.x=element_blank(),axis.text.y = element_text(angle=90, hjust=1,colour="black")) +
labs(x = "Genotype", y = "Petal length (cm)")+
facet_grid(~divide,scales = "free", space = "free")
p+theme(panel.background = element_rect(fill = "#F6F8F9", colour = "#E7ECF1"))
Unfortunately, like the other theme elements, the fill aesthetic of element_rect() cannot be mapped to data. You cannot just send a vector of colors to fill either (create your own mapping of sorts). In the end, the simplest solution probably is going to be very similar to the answer you linked to in your question... with a bit of a twist here.
I'll use mtcars as an example. Note that I'm converting some of the continuous variables in the dataset to factors so that we can create some more discrete values.
It's important to note, the rect geom is drawn before the boxplot geom, to ensure the boxplot appears on top of the rect.
ggplot(mtcars, aes(factor(carb), disp)) +
geom_rect(
aes(fill=factor(carb)), alpha=0.5,
xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf) +
geom_boxplot() +
facet_grid(~factor(carb), scales='free_x') +
theme_bw()
All done... but not quite. Something is wrong and you might notice this if you pay attention to the boxes on the legend and the gridlines in the plot panels. It looks like the alpha value is incorrect for some facets and okay for others. What's going on here?
Well, this has to do with how geom_rect works. It's drawing a box on each plot panel, but just like the other geoms, it's mapped to the data. Even though the x and y aesthetics for the geom_rect are actually not used to draw the rectangle, they are used to indicate how many of each rectangle are drawn. This means that the number of rectangles drawn in each facet corresponds to the number of lines in the dataset which exist for that facet. If 3 observations exist, 3 rectangles are drawn. If 20 observations exist for one facet, 20 rectangles are drawn, etc.
So, the fix is to supply a dataframe that contains one observation each for every facet. We have to then make sure that we supply any and all other aesthetics (x and y here) that are included in the ggplot call, or we will get an error indicating ggplot cannot "find" that particular column. Remember, even if geom_rect doesn't use these for drawing, they are used to determine how many observations exist (and therefore how many to draw).
rect_df <- data.frame(carb=unique(mtcars$carb)) # supply one of each type of carb
# have to give something to disp
rect_df$disp <- 0
ggplot(mtcars, aes(factor(carb), disp)) +
geom_rect(
data=rect_df,
aes(fill=factor(carb)), alpha=0.5,
xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf) +
geom_boxplot() +
facet_grid(~factor(carb), scales='free_x') +
theme_bw()
That's better.

Show mean values in boxplots in R

time_pic <- ggplot(data_box, aes(x=Kind, y=TimeTotal, fill=Sitting_Position)) +
geom_boxplot()
print(time_pic)
time_pic+labs(title="", x="", y = "Time (Sec)")
I ran the above codes to get the following image. But I don't know how to add average value for each boxplot on this image.
updated.
I tried this.
means <- aggregate(TimeTotal ~ Sitting_Position*Kind, data_box, mean)
ggplot(data=data_box, aes(x=Kind, y=TimeTotal, fill=Sitting_Position)) +
geom_boxplot() +
stat_summary(fun=mean, colour="darkred", geom="point", shape=18, size=3,show_guide = FALSE) +
geom_text(data = means, aes(label = TimeTotal, y = TimeTotal + 0.08))
This is what it looks like now. Two dots are on the same line. And two values are overlapping with each other.
As others said, you can share your dataset for more specific help, but in this case I think the point can be made using a dummy dataset. I'm creating one that looks pretty similar to your own in terms of naming, so theoretically you can just plug in this code and it could work.
The biggest thing you need here is to control how ggplot2 is separating the separate boxplots for the data_box$Sitting_Position that share the same data_box$Kind. The process of separating and spreading the boxes around that x= axis value is called "dodging". When you supply a fill= or color= (or other) aesthetic in aes() for that geom, ggplot2 knows enough that it will assume you also want to group the data according to that value. So, your initial ggplot() call has in aes() that fill=Sitting_Position, which means that geom_boxplot() "works" - it creates the separate boxes that are colored differently and which are "dodged" properly.
When you create the points and the text, ggplot2 has no idea that you want to "dodge" this data, and even if you did want to dodge, on what basis to use for the dodge, since the fill= aesthetic doesn't make sense for a text or point geom. How to fix this? The answer is to:
Supply a group= aesthetic, which can override the grouping of a fill= or color= aesthetic, but which also can serve as a basis for the dodging for geoms that do not have a similar aesthetic.
Specify more clearly how you want to dodge. This will be important for accurate positioning of all things you want to dodge. Otherwise, you will have things dodged, but maybe not the same distance.
Here's how I combined all that:
# the datasets
set.seed(1234)
data_box <- data.frame(
Kind=c(rep('Model-free AR',100),rep('Real-world',100)),
TimeTotal=c(rnorm(50,5.5,1),rnorm(50,5.43,1.1),rnorm(50,4.9,1),rnorm(50,4.7,0.2)),
Sitting_Position=rep(c(rep('face to face',50),rep('side by side',50)),2)
)
means <- aggregate(TimeTotal ~ Sitting_Position*Kind, data_box, mean)
# the plot
ggplot(data_box, aes(x=Kind, y=TimeTotal)) + theme_bw() +
# specifying dodge here and width to avoid overlapping boxes
geom_boxplot(
aes(fill=Sitting_Position),
position=position_dodge(0.6), width=0.5
) +
# note group aesthetic and same dodge call for next two objects
stat_summary(
aes(group=Sitting_Position),
position=position_dodge(0.6),
fun=mean,
geom='point', color='darkred', shape=18, size=3,
show.legend = FALSE
) +
geom_text(
data=means,
aes(label=round(TimeTotal,2), y=TimeTotal + 0.18, group=Sitting_Position),
position=position_dodge(0.6)
)
Giving you this:

How to get boxplots using ggplot to overlap instead of faceting

I am new to ggplot, and using ggplot to show box plots of my data corresponding to different types like this. There are four types. I found that I can use facet_wrap to generate four different graphs.
ggplot(o.xp.sample, aes(power, reduction, fill=interaction(type,power), dodge=type)) +
stat_boxplot(geom ='errorbar')+
geom_boxplot() +
facet_wrap(~type)
My question is, I want to combine all the four graphs into one graph such that each type has a different color (and slightly transparent to show other plots through). Is this possible?
Here is the data https://gist.github.com/anonymous/9589729
Try this:
library(ggplot2)
o.xp.sample = read.csv("C:\\...\\data.csv",sep=",")
ggplot(o.xp.sample, aes(factor(power), reduction, fill=interaction(type,power), dodge=type)) +
stat_boxplot(geom ='errorbar') +
geom_boxplot() +
theme_bw() +
guides(fill = guide_legend(ncol = 3)) #added line as suggested by Paulo Cardoso

How to make dodge in geom_bar agree with dodge in geom_errorbar, geom_point

I have a dataset where measurements are made for different groups at different days.
I want to have side by side bars representing the measurements at the different days for the different groups with the groups of bars spaced according to day of measurement with errorbars overlaid to them.
I'm having trouble with making the dodging in geom_bar agree with the dodge on geom_errorbar.
Here is a simple piece of code:
days = data.frame(day=c(0,1,8,15));
groups = data.frame(group=c("A","B","C","D", "E"), means=seq(0,1,length=5));
my_data = merge(days, groups);
my_data$mid = exp(my_data$means+rnorm(nrow(my_data), sd=0.25));
my_data$sigma = 0.1;
png(file="bar_and_errors_example.png", height=900, width=1200);
plot(ggplot(my_data, aes(x=day, weight=mid, ymin=mid-sigma, ymax=mid+sigma, fill=group)) +
geom_bar (position=position_dodge(width=0.5)) +
geom_errorbar (position=position_dodge(width=0.5), colour="black") +
geom_point (position=position_dodge(width=0.5), aes(y=mid, colour=group)));
dev.off();
In the plot, the errorsegments appears with a fixed offset from its bar (sorry, no plots allowed for newbies even if ggplot2 is the subject).
When binwidth is adjusted in geom_bar, the offset is not fixed and changes from day to day.
Notice, that geom_errorbar and geom_point dodge in tandem.
How do I get geom_bar to agree with the other two?
Any help appreciated.
The alignment problems are due, in part, to your bars not representing the data you intend. The following lines up correctly:
ggplot(my_data, aes(x=day, weight=mid, ymin=mid-sigma, ymax=mid+sigma, fill=group)) +
geom_bar (position=position_dodge(), aes(y=mid), stat="identity") +
geom_errorbar (position=position_dodge(width=0.9), colour="black") +
geom_point (position=position_dodge(width=0.9), aes(y=mid, colour=group))
This is an old question, but since i ran into the problem today, i want to add the following:
In
geom_bar(position = position_dodge(width=0.9), stat = "identity") +
geom_errorbar( position = position_dodge(width=0.9), colour="black")
the width-argument within position_dodge controls the dodging width of the things to dodge from each other. However, this produces whiskers as wide as the bars, which is possibly undesired.
An additional width-argument outside the position_dodge controls the width of the whiskers (and bars):
geom_bar(position = position_dodge(width=0.9), stat = "identity", width=0.7) +
geom_errorbar( position = position_dodge(width=0.9), colour="black", width=0.3)
The first change I reformatted the code according to the advanced R style guide.
days <- data.frame(day=c(0,1,8,15))
groups <- data.frame(
group=c("A","B","C","D", "E"),
means=seq(0,1,length=5)
)
my_data <- merge(days, groups)
my_data$mid <- exp(my_data$means+rnorm(nrow(my_data), sd=0.25))
my_data$sigma <- 0.1
Now when we look at the data we see that day is a factor and everything else is the same.
str(my_data)
To remove blank space from the plot I converted the day column to factors. CHECK that the levels are in the proper order before proceeding.
my_data$day <- as.factor(my_data$day)
levels(my_data$day)
The next change I made was defining y in your aes arguments. As I'm sure you are aware, this lets ggplot know where to look for y values. Then I changed the position argument to "dodge" and added the stat="identity" argument. The "identity" argument tells ggplot to plot y at x. geom_errorbar inherits the dodge position from geom_bar so you can leave it unspecified, but geom_point does not so you must specify that value. The default dodge is position_dodge(.9).
ggplot(data = my_data,
aes(x=day,
y= mid,
ymin=mid-sigma,
ymax=mid+sigma,
fill=group)) +
geom_bar(position="dodge", stat = "identity") +
geom_errorbar( position = position_dodge(), colour="black") +
geom_point(position=position_dodge(.9), aes(y=mid, colour=group))
sometimes you put aes(x=tasks,y=val,fill=group) in geom_bar rather than ggplot. This causes the problem since ggplot looks forward x and you specify it by the location of each group.

label a dodged bar chart

Maybe it's because of the dark outside, but I can't get this
Position geom_text on dodged barplot
to work on my fairly simple dataframe
fs <- data.frame(productcategory=c("c2","c2"), product=c("p4", "p5"), ms1=c(2,1))
plot <- ggplot(data=NULL)
plot +
geom_bar(data=fs, aes(x=productcategory, y=ms1, weight=ms1, fill=product),stat="identity", position="dodge") +
geom_text(data=fs, aes(label = ms1, x = productcategory, y=ms1+0.2), position=position_dodge(width=1)))
My plot still shows the labels in the "middle" of the product category and not above of the proper product.
Looks like this even it seems very simple, but I'm totally stuck on this
So any hints are very much appreciated how to get labels above the proper bars.
Tom
Because you have the aesthetics defined for each geom individually, geom_text isn't picking up on the fact that you're subdividing the x variable productcategory by the fill variable product.
You can get the graph you want by adding fill=product to the aes() call for geom_text, or you can try to define as many aesthetics as possible in the original ggplot() call, so that all the geoms pick up on those aesthetics automatically and you only have to define them if they're specific to that particular geom.
plot2 <- ggplot(data=fs, aes(x=productcategory, y=ms1, fill=product)) +
geom_bar(stat="identity", position="dodge") +
geom_text(aes(label=ms1, y =ms1 + 0.2), position=position_dodge(width=1))
print(plot2)

Resources