Boxplot overlaid on dot plot + means, means in wrong position

Using the advice given in "Overdraw mean points in grouped boxplot with ggplot2", I tried:
library(ggplot2)
FEV <- expand.grid(sex=c('female', 'male'), smoke=c('no', 'yes'),
                   reps=1:5)
set.seed(1)
FEV$fev <- runif(nrow(FEV), 1, 4)
ggplot(FEV, aes(x=smoke, y=fev, color=sex)) +
  geom_boxplot(alpha=.5, width=.2) +  # remove width to overlay boxes on pts
  # 'fun' replaces the deprecated 'fun.y' (ggplot2 >= 3.3.0; older versions use fun.y)
  stat_summary(fun=mean, geom="point", shape=5, size=2,
               position=position_dodge(width=.2)) +
  geom_dotplot(binaxis='y', stackdir='center', position='dodge') +
  xlab('') + ylab(expression(FEV[1])) + coord_flip()
The means are not placed quite correctly in the vertical sense. Guidance welcomed. Note: I like having the box plots between the two sets of dots; that's not the problem.

The help for ?position_dodge only says that dodging things with different widths can be tricky; I usually tweak this manually. Trying a few values, it looks like you want the points to take a dodge width that is 3/4 of the boxplot width, but I don't know why, or whether that holds for other geoms. I would try changing the dodge width in the stat_summary call to 0.15.
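The offsets involved can be made explicit. As far as I can tell from ggplot2's internal dodging code, position_dodge() puts the i-th of n groups at x + width * ((i - 0.5)/n - 0.5). The function dodge_centres() below is a hypothetical helper that only writes that formula out; it is not part of ggplot2. With two groups the means sit at ±width/4 from the category centre, so a dodge width of 0.15 instead of 0.2 moves each mean 25% closer to the centre line.

```r
# Hypothetical helper (not a ggplot2 function): centre of each of n dodged
# groups around position x, assuming the position_dodge() formula
# x + width * ((i - 0.5) / n - 0.5)
dodge_centres <- function(x, n, width) {
  i <- seq_len(n)
  x + width * ((i - 0.5) / n - 0.5)
}

dodge_centres(1, n = 2, width = 0.2)   # 0.95 1.05
dodge_centres(1, n = 2, width = 0.15)  # 0.9625 1.0375
```

To line the means up with something whose offset from the centre you know, solve for the width: with two groups, width = 4 × offset.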

Related

How to set background color for each panel in grouped boxplot?

I plotted a grouped boxplot and am trying to change the background color of each panel. I can use the panel.background theme element to change the background of the whole plot, but how can this be done for individual panels? I found a similar question here, but I failed to adapt the code to my plot.
The top few lines of my input data look like this:
Code
library(ggplot2)
p <- ggplot(df, aes(x=Genotype, y=Length, fill=Treatment)) +
  scale_fill_manual(values=c("#69b3a2", "#CF7737")) +
  geom_boxplot(width=2.5) +
  theme(text = element_text(size=20), panel.spacing.x=unit(0.4, "lines"),
        axis.title.x=element_blank(), axis.text.x=element_blank(), axis.ticks.x=element_blank(),
        axis.text.y = element_text(angle=90, hjust=1, colour="black")) +
  labs(x = "Genotype", y = "Petal length (cm)") +
  facet_grid(~divide, scales = "free", space = "free")
p + theme(panel.background = element_rect(fill = "#F6F8F9", colour = "#E7ECF1"))

Unfortunately, as with the other theme elements, the fill of element_rect() cannot be mapped to data. You cannot just pass a vector of colors to fill either (to create your own mapping of sorts). In the end, the simplest solution is probably going to be very similar to the answer you linked to in your question, with a bit of a twist here.
I'll use mtcars as an example. Note that I'm converting some of the continuous variables in the dataset to factors so that we can create some more discrete values.
It's important to note that the rect geom is added before the boxplot geom, so that the boxplot is drawn on top of the rect.
library(ggplot2)
ggplot(mtcars, aes(factor(carb), disp)) +
  geom_rect(
    aes(fill=factor(carb)), alpha=0.5,
    xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf) +
  geom_boxplot() +
  facet_grid(~factor(carb), scales='free_x') +
  theme_bw()
All done... but not quite. Something is wrong and you might notice this if you pay attention to the boxes on the legend and the gridlines in the plot panels. It looks like the alpha value is incorrect for some facets and okay for others. What's going on here?
Well, this has to do with how geom_rect works. It draws a rectangle on each plot panel, but like the other geoms it draws one rectangle per row of its data. Even though the inherited x and y aesthetics are not used to draw the rectangle, every row of the data in a facet still produces a rectangle. This means that the number of rectangles drawn in each facet equals the number of rows in the dataset for that facet: if 3 observations exist, 3 rectangles are drawn; if 20 observations exist for one facet, 20 rectangles are drawn, and so on. Stacked semi-transparent rectangles look more opaque, which is why alpha seems wrong in some facets.
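The effect can be quantified with base R. The row counts come straight from mtcars; the opacity formula 1 - (1 - alpha)^n is the standard result for n identical layers composited on top of each other, used here as a worked example rather than something ggplot2 reports.

```r
# Rows of mtcars in each carb facet = rectangles drawn in that facet
n_rect <- table(mtcars$carb)
n_rect  # carb 1:7, 2:10, 3:3, 4:10, 6:1, 8:1

# n stacked rectangles with alpha = 0.5 have an effective opacity of 1 - 0.5^n
effective_alpha <- 1 - (1 - 0.5)^n_rect
round(effective_alpha, 3)
# Only carb 6 and 8 (one row each) show the intended 0.5;
# the facets with 7 or 10 rows are almost fully opaque.
```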
So the fix is to supply a data frame that contains exactly one row for every facet. We then have to make sure that it also contains every column used by the aesthetics inherited from the ggplot() call (x and y here), or we will get an error saying that ggplot cannot "find" that column. Remember, even though geom_rect doesn't use these for drawing, the layer still evaluates them.
rect_df <- data.frame(carb=unique(mtcars$carb))  # supply one of each type of carb
# have to give something to disp, because the layer inherits y = disp from ggplot()
rect_df$disp <- 0
ggplot(mtcars, aes(factor(carb), disp)) +
  geom_rect(
    data=rect_df,
    aes(fill=factor(carb)), alpha=0.5,
    xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf) +
  geom_boxplot() +
  facet_grid(~factor(carb), scales='free_x') +
  theme_bw()
That's better.
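To confirm the fix without eyeballing alpha, one can count how many rectangles each version draws per panel. This sketch assumes ggplot2's layer_data(), which returns the computed data of one layer, including its PANEL column; the two plots mirror the answer's code without theme_bw().

```r
library(ggplot2)

# One row per facet; disp is present only because y = disp is inherited
rect_df <- data.frame(carb = unique(mtcars$carb), disp = 0)

p_all_rows <- ggplot(mtcars, aes(factor(carb), disp)) +
  geom_rect(aes(fill = factor(carb)), alpha = 0.5,
            xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf) +
  geom_boxplot() +
  facet_grid(~factor(carb), scales = "free_x")

p_one_row <- ggplot(mtcars, aes(factor(carb), disp)) +
  geom_rect(data = rect_df, aes(fill = factor(carb)), alpha = 0.5,
            xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf) +
  geom_boxplot() +
  facet_grid(~factor(carb), scales = "free_x")

# Rectangles drawn per panel by layer 1 (the rect layer) of each plot
table(layer_data(p_all_rows, 1)$PANEL)  # counts: 7 10 3 10 1 1
table(layer_data(p_one_row, 1)$PANEL)   # counts: 1 1 1 1 1 1
```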

How I can correctly overlap bar and linechart together

I am using the code below:
library(ggplot2)
library(dplyr)  # for filter()
p <- ggplot() +
  geom_bar(data=filter(df, variable=="LA"), aes(x=Gen, y=Mean, fill=Leaf),
           stat="identity", position="dodge") +
  geom_point(data=filter(df, variable=="TT"), aes(x=Gen, y=Mean, colour=Leaf)) +
  geom_line(data=filter(df, variable=="TT"), aes(x=Gen, y=Mean, group=Leaf)) +
  ggtitle("G") + xlab("Genotypes") + ylab("Canopy temperature") +
  scale_fill_hue(name="", labels=c("Leaf-1", "Leaf-2", "Leaf-3")) +
  scale_y_continuous(sec.axis=sec_axis(~./20, name="2nd Y-axis")) +
  theme(axis.text.x=element_text(angle=90, hjust=1), legend.position="top")
[Figure: graph produced by the code above]
I want a graph like this:
[Figure: desired graph]
Data:
https://docs.google.com/spreadsheets/d/1Fjmg-l0WTL7jhEqwwtC4RXY_9VQV9GOBliFq_3G1f8I/edit#gid=0
From the data, I want variable LA on the left y-axis and TT on the right y-axis.
The part above is resolved.
Now I am trying to add error bars to the bar graph with the code below, but it causes an error. Can someone have a look and suggest a solution?
p + geom_errorbar(aes(ymin=Mean-se, ymax=Mean+se), width=0.5,
                  position=position_dodge(0.9), colour="black", size=.7)

For this you need to understand that even though you have a second y-axis, it is only a relabelling: everything drawn on the graph is still positioned on the main (left) y-axis.
So you need to do two things:
1. Convert anything that should refer to the second y-axis to the scale of the left one, in this case the bar scale (the LA variable), whose maximum is 15. So you need to divide the TT values by 20.
2. Label the second axis correctly: it is the main y-axis multiplied by 20.
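The two steps are inverse operations, which plain arithmetic makes visible. The numbers are illustrative assumptions (bars topping out at 15, TT values up to 300), not the asker's data; k is the single factor used both to shrink TT for plotting and to relabel the secondary axis.

```r
# Illustrative values only, not taken from the question's spreadsheet
la_max <- 15
tt     <- c(120, 210, 300)

k <- max(tt) / la_max         # 20: how much TT must shrink to fit the LA axis
tt_plotted <- tt / k          # step 1: what geom_line()/geom_point() receive as y
tt_labels  <- tt_plotted * k  # step 2: what sec_axis(~ . * k) prints on the right

tt_plotted  # 6.0 10.5 15.0, i.e. inside the 0 to 15 range of the bars
```

Using one variable for k in both places keeps the data transformation and the axis labels from drifting apart when the factor is changed.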
library(ggplot2)
library(dplyr)  # for filter()
p <- ggplot() +
  geom_bar(data=filter(df, variable=="LA"), aes(x=Gen, y=Mean, fill=Leaf),
           stat="identity", position="dodge") +
  # values are divided by 20 to fall in the same range as the bars
  geom_point(data=filter(df, variable=="TT"), aes(x=Gen, y=Mean/20, colour=Leaf)) +
  geom_line(data=filter(df, variable=="TT"), aes(x=Gen, y=Mean/20, group=Leaf)) +
  ggtitle("G") + xlab("Genotypes") + ylab("Canopy temperature") +
  scale_fill_hue(name="", labels=c("Leaf-1", "Leaf-2", "Leaf-3")) +
  # the second axis is multiplied by 20 to show the actual values of the lines & points
  scale_y_continuous(
    sec.axis=sec_axis(~ . * 20, name="2nd Y-axis",
                      breaks = c(0, 100, 200, 300))) +
  theme(axis.text.x=element_text(angle=90, hjust=1), legend.position="top")
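For the error bars, here is a very basic version; you will need to adjust the theme and the graph to make it look good. Your call fails because p was built with ggplot() and no default data, so geom_errorbar() has no data in which to find Mean and se; it needs its own data and x, and its ymin and ymax need the same division by 20 as the line.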
p + geom_errorbar(data = filter(df, variable=="TT"),
                  aes(x = Gen, y=Mean/20, ymin=(Mean-se)/20,
                      ymax=(Mean+se)/20), width=0.5,
                  position=position_dodge(0.9), colour="black", size=.7)
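The question asked for error bars on the bars (LA), while the answer attaches them to the TT points. A sketch covering both, on made-up data in the same long format (the Gen and Leaf levels, the values, and k = 20 are assumptions for illustration): the LA error bars use the LA rows, the same dodge as the bars and no rescaling, while the TT error bars are divided by k like the line. Every layer gets its own data and x because ggplot() is called without defaults.

```r
library(ggplot2)

# Hypothetical toy data in the question's long format (not the real values)
df <- data.frame(
  Gen      = rep(rep(c("G1", "G2"), each = 3), times = 2),
  Leaf     = rep(c("Leaf-1", "Leaf-2", "Leaf-3"), times = 4),
  variable = rep(c("LA", "TT"), each = 6),
  Mean     = c(10, 12, 15, 9, 11, 14, 250, 260, 280, 240, 255, 270),
  se       = c(1, 1.2, 1.5, 0.9, 1.1, 1.4, 5, 6, 7, 5, 6, 7)
)
k  <- 20                       # left axis = right axis / k
la <- df[df$variable == "LA", ]
tt <- df[df$variable == "TT", ]
dodge <- position_dodge(width = 0.9)

p <- ggplot() +
  # bars and their error bars: same data, same grouping, same dodge
  geom_col(data = la, aes(x = Gen, y = Mean, fill = Leaf), position = dodge) +
  geom_errorbar(data = la,
                aes(x = Gen, ymin = Mean - se, ymax = Mean + se, group = Leaf),
                position = dodge, width = 0.3) +
  # TT: every y value is divided by k to live on the left axis
  geom_line(data = tt, aes(x = Gen, y = Mean / k, colour = Leaf, group = Leaf)) +
  geom_point(data = tt, aes(x = Gen, y = Mean / k, colour = Leaf)) +
  geom_errorbar(data = tt,
                aes(x = Gen, ymin = (Mean - se) / k, ymax = (Mean + se) / k,
                    colour = Leaf),
                width = 0.1) +
  # the secondary axis undoes the division so its labels show TT values
  scale_y_continuous(sec.axis = sec_axis(~ . * k, name = "TT"))
p
```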
One final note: please read the error message, try to understand what it says, and consult the help pages of the packages and functions you use, so you can learn to write this kind of code yourself.

Overlay points (and error bars) over bar plot with position_dodge

I have been trying to find an answer to my particular problem without success, so I have made an MWE to post here.
I tried the answers here with no success.
The task I want to do seems easy enough, but I cannot figure it out, and the results I get are raising some fundamental questions for me...
I just want to overlay points and error bars on a bar plot, using ggplot2.
I have a long format data frame that looks like the following:
mydf <- data.frame(cell=paste0("cell", rep(1:3, each=12)),
                   scientist=paste0("scientist", rep(rep(rep(1:2, each=3), 2), 3)),
                   timepoint=paste0("time", rep(rep(1:2, each=6), 3)),
                   rep=paste0("rep", rep(1:3, 12)),
                   value=runif(36)*100)
I have attempted to get the plot I want the following way:
library(ggplot2)
library(RColorBrewer)  # for brewer.pal()
myPal <- brewer.pal(3, "Set2")[1:2]
myPal2 <- brewer.pal(3, "Set1")
outfile <- "test.pdf"
pdf(file=outfile, height=10, width=10)
print(  # or ggsave()
  ggplot(mydf, aes(cell, value, fill=scientist)) +
    geom_bar(stat="identity", position=position_dodge(.9)) +
    geom_point(aes(cell, color=rep), position=position_dodge(.9), size=5) +
    facet_grid(timepoint~., scales="free_x", space="free_x") +
    scale_y_continuous("% of total cells") +
    scale_fill_manual(values=myPal) +
    scale_color_manual(values=myPal2)
)
dev.off()
But I obtain this:
The problem is, there should be 3 "rep" values per "scientist" bar, but the values are ordered by "rep" instead (they should be 1,2,3,1,2,3, instead of 1,1,2,2,3,3).
Besides, I would like to add error bars with geom_errorbar but I didn't manage to get a working example...
Furthermore, overlaying the actual values as points on the bars makes me wonder what is actually being plotted here: whether the values are taken properly for each bar, and why the maximum value (or so it seems) is plotted by default.
The way I think this should be properly plotted is with the median (or mean), adding the error bars like the whiskers in a boxplot (min and max value).
Any idea how to...
... have the "rep" value points appear in proper order?
... change the value shown by the bars from max to median?
... add error bars with max and min values?

I restructured your plotting code a little to make things easier.
The secret is to use proper grouping (which is otherwise inferred from fill and color). Also, since you're dodging on multiple levels, position_dodge2() has to be used.
When you are unsure about "what is plotted where" in bar/column charts, it's always helpful to add the option color="black", which here reveals that things are still drawn on top of each other because of your use of position_dodge() instead of position_dodge2().
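This also explains why the original bars seem to show the maximum. With stat = "identity" every row becomes its own bar, and the three reps of each cell/scientist/timepoint combination land at the same dodged position, so they overlap and only the tallest is visible. A base R check on the question's data (seeded here, which the question does not do):

```r
set.seed(1)  # the question's mydf is unseeded
mydf <- data.frame(cell = paste0("cell", rep(1:3, each = 12)),
                   scientist = paste0("scientist", rep(rep(rep(1:2, each = 3), 2), 3)),
                   timepoint = paste0("time", rep(rep(1:2, each = 6), 3)),
                   rep = paste0("rep", rep(1:3, 12)),
                   value = runif(36) * 100)

# Rows that end up drawn at the same dodged bar position
n_per_bar <- aggregate(value ~ cell + scientist + timepoint, data = mydf, FUN = length)
# Height actually visible for each bar: the tallest of the overlapping rows
visible <- aggregate(value ~ cell + scientist + timepoint, data = mydf, FUN = max)

merge(n_per_bar, visible, by = c("cell", "scientist", "timepoint"),
      suffixes = c(".n", ".visible"))
```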
library(ggplot2)
p = ggplot(mydf, aes(x=cell, y=value, group=paste(scientist, rep))) +
  geom_col(aes(fill=scientist), position=position_dodge2(.9)) +
  geom_point(aes(cell, color=rep), position=position_dodge2(.9), size=5) +
  facet_grid(timepoint~., scales="free_x", space="free_x") +
  scale_y_continuous("% of total cells") +
  scale_fill_brewer(palette = "Set2") +
  scale_color_brewer(palette = "Set1")
ggsave(filename = outfile, plot=p, height = 10, width = 10)
gives:
Regarding error bars
Since there are only three replicates, I would show the original data points and maybe a violin plot. For completeness' sake I also added a geom_errorbar() (with stat="summary", which defaults to the mean ± standard error).
ggplot(mydf, aes(x=cell, y=value, group=paste(cell, scientist))) +
  geom_violin(aes(fill=scientist), position=position_dodge(), color="black") +
  geom_point(aes(cell, color=rep), position=position_dodge(0.9), size=5) +
  geom_errorbar(stat="summary", position=position_dodge()) +
  facet_grid(timepoint~., scales="free_x", space="free_x") +
  scale_y_continuous("% of total cells") +
  scale_fill_brewer(palette = "Set2") +
  scale_color_brewer(palette = "Set1")
gives
Update after comment
As I mentioned in my comment below, the stacking of the percentages leads to an undesirable outcome.
ggplot(mydf, aes(x=paste(cell, scientist), y=value)) +
  geom_bar(aes(fill=rep), stat="identity", position=position_stack(), color="black") +
  geom_point(aes(color=rep), position=position_dodge(.9), size=3) +
  facet_grid(timepoint~., scales="free_x", space="free_x") +
  scale_y_continuous("% of total cells") +
  scale_fill_brewer(palette = "Set2") +
  scale_color_brewer(palette = "Set1")
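For the second and third questions (bars at the median, whiskers from min to max), one option not used in the answer is to summarise first and plot the summary, with the raw points on top. A sketch assuming base R's aggregate() for the summary and a seeded copy of the question's mydf; the column names med, lo and hi are illustrative choices.

```r
library(ggplot2)

set.seed(1)  # the question's mydf is unseeded
mydf <- data.frame(cell = paste0("cell", rep(1:3, each = 12)),
                   scientist = paste0("scientist", rep(rep(rep(1:2, each = 3), 2), 3)),
                   timepoint = paste0("time", rep(rep(1:2, each = 6), 3)),
                   rep = paste0("rep", rep(1:3, 12)),
                   value = runif(36) * 100)

# One row per bar: median for the height, min and max for the whiskers
stats <- aggregate(value ~ cell + scientist + timepoint, data = mydf,
                   FUN = function(v) c(med = median(v), lo = min(v), hi = max(v)))
summ <- cbind(stats[1:3], as.data.frame(stats$value))

dodge <- position_dodge(width = 0.9)
p <- ggplot(summ, aes(x = cell, group = scientist)) +
  geom_col(aes(y = med, fill = scientist), position = dodge) +
  geom_errorbar(aes(ymin = lo, ymax = hi), position = dodge, width = 0.3) +
  # raw values, dodged by scientist so each set sits on its own bar
  geom_point(data = mydf, aes(y = value, colour = rep), position = dodge, size = 2) +
  facet_grid(timepoint ~ .) +
  scale_y_continuous("% of total cells")
p
```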

ggplot2 density histogram with width=.5, vline and centered bar positions

I want a nice density histogram (one that sums to 1) for some discrete data. I have tried a couple of ways to do this, but none were entirely satisfactory.
Generate some data:
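#data
set.seed(-999)
d.test = data.frame(score = round(rnorm(100,1)))
mean.score = mean(d.test[,1])
# naming the table dimension keeps the column called d.test, which the code below uses
d1 = as.data.frame(prop.table(table(d.test = d.test$score)))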
The first gives the right placement of bars (centered on top of the number) but the wrong placement of the vline(). This is because the x-axis is discrete (a factor), so the mean is placed on the scale of the level positions, not of the values. The mean value is .89.
ggplot(data=d1, aes(x=d.test, y=Freq)) +
  geom_bar(stat="identity", width=.5) +
  geom_vline(xintercept=mean.score, color="blue", linetype="dashed")
The second gives the correct vline() placement (because the x-axis is continuous), but the wrong placement of bars, and the width parameter does not appear to be adjustable when the x-axis is continuous (see here). I also tried the size parameter, which also has no effect. Ditto for hjust.
ggplot(d.test, aes(x=score)) +
  geom_histogram(aes(y=..count../sum(..count..)), width=.5) +
  geom_vline(xintercept=mean.score, color="blue", linetype="dashed")
Any ideas? My bad idea is to rescale the mean so that it fits with the factor levels and use the first solution. This won't work well if some of the factor levels are 'missing', e.g. 1, 2, 4 with no level for 3 because no data point had that value. If the mean is 3.5, rescaling it is odd (the x-axis is no longer an interval scale).
Another idea is this:
ggplot(d.test, aes(x=score)) +
  stat_bin(binwidth=.5, aes(y= ..density../sum(..density..)), hjust=-.5) +
  scale_x_continuous(breaks = -2:5) +  # add ticks back
  geom_vline(xintercept=mean.score, color="blue", linetype="dashed")
But this requires adjusting the breaks, and the bars are still in the wrong positions (not centered). Unfortunately, hjust does not appear to work.
How do I get everything I want?
density sums to 1
bars centered above values
vline() at the correct number
width=.5
With base graphics, one could perhaps solve this problem by plotting twice on the x-axis. Is there some similar way here?

It sounds like you just want to make sure that your x-axis values are numeric rather than factors:
ggplot(data=d1, aes(x=as.numeric(as.character(d.test)), y=Freq)) +
  geom_bar(stat="identity", width=.5) +
  geom_vline(xintercept=mean.score, color="blue", linetype="dashed") +
  scale_x_continuous(breaks=-2:3)
which gives
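The as.character() step matters: as.numeric() applied directly to a factor returns the internal level codes, which is exactly the rescaling problem described in the question when a value such as 3 never occurs. A base R illustration:

```r
# A factor built from values where 3 never occurs, as in the question's example
f <- factor(c(1, 2, 4, 4))

as.numeric(f)                # 1 2 3 3: the level codes, not the values
as.numeric(as.character(f))  # 1 2 4 4: the original values

# The density scaling is unaffected: proportions from prop.table() sum to 1
sum(prop.table(table(f)))
```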

How to make dodge in geom_bar agree with dodge in geom_errorbar, geom_point

I have a dataset where measurements are made for different groups at different days.
I want to have side by side bars representing the measurements at the different days for the different groups with the groups of bars spaced according to day of measurement with errorbars overlaid to them.
I'm having trouble with making the dodging in geom_bar agree with the dodge on geom_errorbar.
Here is a simple piece of code:
library(ggplot2)
days = data.frame(day=c(0,1,8,15))
groups = data.frame(group=c("A","B","C","D", "E"), means=seq(0,1,length=5))
my_data = merge(days, groups)
my_data$mid = exp(my_data$means+rnorm(nrow(my_data), sd=0.25))
my_data$sigma = 0.1
png(file="bar_and_errors_example.png", height=900, width=1200)
plot(ggplot(my_data, aes(x=day, weight=mid, ymin=mid-sigma, ymax=mid+sigma, fill=group)) +
       geom_bar(position=position_dodge(width=0.5)) +
       geom_errorbar(position=position_dodge(width=0.5), colour="black") +
       geom_point(position=position_dodge(width=0.5), aes(y=mid, colour=group)))
dev.off()
In the plot, the error segments appear with a fixed offset from their bars (sorry, no plots allowed for newbies even if ggplot2 is the subject).
When binwidth is adjusted in geom_bar, the offset is not fixed and changes from day to day.
Notice, that geom_errorbar and geom_point dodge in tandem.
How do I get geom_bar to agree with the other two?
Any help appreciated.

The alignment problems are due, in part, to your bars not representing the data you intend. The following lines up correctly:
ggplot(my_data, aes(x=day, weight=mid, ymin=mid-sigma, ymax=mid+sigma, fill=group)) +
  geom_bar(position=position_dodge(), aes(y=mid), stat="identity") +
  geom_errorbar(position=position_dodge(width=0.9), colour="black") +
  geom_point(position=position_dodge(width=0.9), aes(y=mid, colour=group))
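Alignment can be checked numerically instead of by eye. This sketch assumes ggplot2's layer_data(), which returns each layer's computed data after position adjustments; it rebuilds the question's data (seeded here) and the answer's three layers, then compares their dodged x positions. The bars use position_dodge() with no width, so they dodge by their own default width of 0.9, which is why 0.9 is the right value for the other two layers.

```r
library(ggplot2)

set.seed(1)  # the question's data are unseeded
days    <- data.frame(day = c(0, 1, 8, 15))
groups  <- data.frame(group = c("A", "B", "C", "D", "E"), means = seq(0, 1, length = 5))
my_data <- merge(days, groups)
my_data$mid   <- exp(my_data$means + rnorm(nrow(my_data), sd = 0.25))
my_data$sigma <- 0.1

p <- ggplot(my_data, aes(x = day, ymin = mid - sigma, ymax = mid + sigma, fill = group)) +
  geom_bar(aes(y = mid), stat = "identity", position = position_dodge()) +
  geom_errorbar(position = position_dodge(width = 0.9), colour = "black") +
  geom_point(aes(y = mid, colour = group), position = position_dodge(width = 0.9))

# Dodged x positions of each layer (one value per bar, error bar and point)
bar_x   <- layer_data(p, 1)$x
err_x   <- layer_data(p, 2)$x
point_x <- layer_data(p, 3)$x
all.equal(sort(bar_x), sort(err_x))    # TRUE when bars and error bars line up
all.equal(sort(bar_x), sort(point_x))  # TRUE when bars and points line up
```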

This is an old question, but since I ran into the problem today, I want to add the following:
In
geom_bar(position = position_dodge(width=0.9), stat = "identity") +
geom_errorbar( position = position_dodge(width=0.9), colour="black")
the width argument within position_dodge() controls how far apart the dodged elements are placed. However, this produces whiskers as wide as the bars, which is possibly undesired.
An additional width argument outside position_dodge() controls the width of the whiskers (and bars):
geom_bar(position = position_dodge(width=0.9), stat = "identity", width=0.7) +
geom_errorbar( position = position_dodge(width=0.9), colour="black", width=0.3)
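The two width arguments can be told apart in the computed layer data. This sketch uses a small hypothetical data set (day, group, mid, sigma as in the question, values made up) and ggplot2's layer_data(): with the same position_dodge(width = 0.9) in both layers, bars and whiskers share their centres, while the width outside position_dodge() only changes how wide each element is drawn.

```r
library(ggplot2)

# Hypothetical data: two groups measured on three days
d <- data.frame(day   = rep(c(0, 1, 8), each = 2),
                group = rep(c("A", "B"), times = 3),
                mid   = c(1, 1.5, 1.2, 1.8, 1.4, 2.1),
                sigma = 0.1)

p <- ggplot(d, aes(x = day, y = mid, ymin = mid - sigma, ymax = mid + sigma, fill = group)) +
  # width inside position_dodge(): spacing of the groups
  # width outside it: size of each drawn element
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.7) +
  geom_errorbar(position = position_dodge(width = 0.9), colour = "black", width = 0.3)

bars <- layer_data(p, 1)
errs <- layer_data(p, 2)
all.equal(sort(bars$x), sort(errs$x))           # same centres
range(errs$xmax - errs$xmin) < min(bars$xmax - bars$xmin)  # narrower whiskers
```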

As a first change, I reformatted the code according to the Advanced R style guide.
days <- data.frame(day=c(0,1,8,15))
groups <- data.frame(
group=c("A","B","C","D", "E"),
means=seq(0,1,length=5)
)
my_data <- merge(days, groups)
my_data$mid <- exp(my_data$means+rnorm(nrow(my_data), sd=0.25))
my_data$sigma <- 0.1
Now when we look at the data with str(), we see that day is numeric, so ggplot2 treats it as a continuous x-axis, and everything else is as expected.
str(my_data)
To remove blank space from the plot I converted the day column to factors. CHECK that the levels are in the proper order before proceeding.
my_data$day <- as.factor(my_data$day)
levels(my_data$day)
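Why the level check matters: as.factor() on a numeric column orders the levels numerically, but if day has become character along the way, the levels are sorted as text and "15" comes before "8". A base R illustration, with an explicit levels argument as one way to pin the order:

```r
day <- c(0, 1, 8, 15)

levels(as.factor(day))             # "0" "1" "8" "15": numeric order
levels(factor(as.character(day)))  # "0" "1" "15" "8": text order

# Fixing the order explicitly from the sorted numeric values
levels(factor(as.character(day), levels = as.character(sort(day))))
```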
The next change I made was defining y in the aes() arguments, which tells ggplot where to look for the y values. Then I changed the position argument to "dodge" and added stat="identity", which tells geom_bar() to use the y values as bar heights instead of counting rows. Each layer has its own position adjustment: geom_errorbar() does not inherit the dodge from geom_bar(), and neither does geom_point(), so both need their own position_dodge(). position_dodge() without a width dodges by the width of the layer's own elements (0.9 by default for bars), so it is safest to give every layer the same explicit width, e.g. position_dodge(.9).
library(ggplot2)
ggplot(data = my_data,
       aes(x=day,
           y= mid,
           ymin=mid-sigma,
           ymax=mid+sigma,
           fill=group)) +
  geom_bar(position="dodge", stat = "identity") +
  geom_errorbar(position = position_dodge(.9), colour="black") +
  geom_point(position=position_dodge(.9), aes(y=mid, colour=group))

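Sometimes you put aes(x=tasks, y=val, fill=group) in geom_bar() rather than in ggplot(). This causes the problem: the other layers, such as geom_errorbar(), then do not inherit fill, so they have no grouping to dodge by and stay at the centre of each x position while the bars are dodged by group.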
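A small demonstration on hypothetical data (tasks, group, val and se are made-up columns matching the names above), reading the computed positions with ggplot2's layer_data(): when fill lives only in geom_col(), the error bars are not dodged; when it lives in ggplot(), every layer inherits it and is dodged the same way.

```r
library(ggplot2)

# Hypothetical data: two groups measured on three tasks
d <- data.frame(tasks = rep(c("t1", "t2", "t3"), each = 2),
                group = rep(c("A", "B"), times = 3),
                val   = c(3, 4, 5, 6, 4, 2),
                se    = 0.5)

# fill (and hence the grouping) only in geom_col(): error bars know nothing about group
p_local <- ggplot(d) +
  geom_col(aes(x = tasks, y = val, fill = group), position = position_dodge(0.9)) +
  geom_errorbar(aes(x = tasks, ymin = val - se, ymax = val + se),
                position = position_dodge(0.9), width = 0.2)

# fill in ggplot(): every layer inherits it
p_global <- ggplot(d, aes(x = tasks, y = val, fill = group)) +
  geom_col(position = position_dodge(0.9)) +
  geom_errorbar(aes(ymin = val - se, ymax = val + se),
                position = position_dodge(0.9), width = 0.2)

x_local  <- unclass(layer_data(p_local, 2)$x)   # stays at the task centres 1, 2, 3
x_global <- unclass(layer_data(p_global, 2)$x)  # shifted to the dodged bar centres
x_local
x_global
```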
