ggplot2: Issues with changing binwidth of stacked histogram

I'm having issues changing the binwidth of a stacked histogram created with ggplot2.
The code does not error out, but it seems to ignore the binwidth setting.
ggplot(trade.a, aes(x=variable1,y=value ,fill=category)) +
geom_bar(stat = "identity", binwidth=c(0,300),position ='fill') +
xlim(0, 300) +
xlab("Variable1") +
ylab("Count") +
ggtitle("Category") +
scale_y_continuous(labels = percent_format()) +
theme_grey(base_size = 20)
Any ideas?

Using stat="identity" inside geom_bar() means that the data in trade.a has already been binned and counted (which is also implied by the y aesthetic pointing at a column of trade.a). binwidth is an argument to stat_bin, which does the aggregation for you; stat_bin was the default stat for geom_bar in older ggplot2 releases and is the stat behind geom_histogram in current ones. With stat="identity" no binning takes place, so the setting is ignored. (Additionally, binwidth takes only a single value; the breaks argument can take a vector of breakpoints.) Thus, to change the bin width for the trade.a data, you need to go back to the step where you did the binning. Alternatively, start with unbinned data and let the stat do the binning with binwidth specified.
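A minimal sketch of the second option, assuming raw (unbinned) observations are available. The data frame raw and its values are made up for illustration and are not the asker's trade.a. geom_histogram() is used here because it applies stat_bin, so binwidth takes effect.

```r
library(ggplot2)

# Hypothetical unbinned data: one row per observation (not the asker's trade.a)
set.seed(1)
raw <- data.frame(
  variable1 = runif(500, min = 0, max = 300),
  category  = sample(c("a", "b", "c"), 500, replace = TRUE)
)

# stat_bin does the counting, so binwidth is honoured;
# position = "fill" rescales each bin's stack to sum to 1
p <- ggplot(raw, aes(x = variable1, fill = category)) +
  geom_histogram(binwidth = 30, boundary = 0, position = "fill") +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(x = "Variable1", y = "Share", title = "Category") +
  theme_grey(base_size = 20)

# Inspect the bins ggplot2 actually computed
ld <- layer_data(p)
unique(round(ld$xmax - ld$xmin, 8))
```

Changing binwidth in this sketch changes the computed bins, which is not the case when the counts are precomputed and drawn with stat="identity".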

Related

I want to add the count percentage for each category as a label to my ggplot pie chart

I am using the code:
age_pie_chart <- ggplot(data = data , aes(x = "", fill = `Q1: How old are you?`))+
geom_bar(position = "fill", width = 1) +
coord_polar(theta = "y") + xlab("") + ylab("") + blank_theme + scale_fill_grey()+
theme(axis.text.x=element_blank())
age_pie_chart
I want the percentage of each category to be added inside the chart.
From searching, I understand that I need to use the geom_text() function, but before that I need to construct a frequency table with the percentage of the count of each category.
Here's an example with some dummy data. First, here's the dummy data. I'm using sample() to select values and then I'm ensuring we have a "small" slice by adding only 2 values for Q1 = "-15".
set.seed(1234)
data <- data.frame(
Q1 = c(sample(c('15-7','18-20','21-25'), 98, replace=TRUE), rep('-15', 2))
)
For the plot, it's basically the same as your plot, although you don't need the width argument for geom_bar().
To get the text positioned correctly, you'll want to use geom_text(). You could calculate the frequency before plotting, or you could have ggplot calculate that on the fly much like it did for geom_bar by using stat="count" for your geom_text() layer.
library(ggplot2)
ggplot(data=data, aes(x="", fill=Q1)) +
  geom_bar(position="fill") +
  geom_text(
    stat='count',
    # after_stat() exposes the computed count (ggplot2 >= 3.3.0)
    aes(y=after_stat(count),
        label=after_stat(scales::percent(count/sum(count),1))),
    position=position_fill(0.5)
  ) +
  coord_polar(theta="y") +
  labs(x=NULL, y=NULL) +
  scale_fill_brewer(palette="Pastel1") +
  theme_void()
So, to be clear about what's going on in that geom_text() call:
I'm using stat="count". We want to access that counting statistic within the plot, so we need to tell the text layer to use that stat.
We are using the computed count variable within the aesthetics (written ..count.. in the older dot notation, or plain count inside after_stat()). The key here is that the stat has to compute before count exists. The after_stat() function tells ggplot2 to evaluate the mapping after the stat has run, which opens count up for use within aes().
Percent calculation for the label. This happens via count/sum(count). I'm using scales::percent() to format the label as a percent.
Position = fill. Like the bar geom, the text layer also needs the "fill" position. You need to call the actual position_fill() function, since it gives access to the vjust argument; a vjust of 0.5 positions each text value in the middle of its slice. If you just set position="fill", the text is positioned at the edge of each slice.
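The question also mentions building a frequency table first. A minimal sketch of that alternative, reusing the dummy data from above; the column names share and label and the use of geom_col() are choices made for this illustration, not part of the original answer.

```r
library(ggplot2)

# Same dummy data as in the answer above
set.seed(1234)
data <- data.frame(
  Q1 = c(sample(c('15-7', '18-20', '21-25'), 98, replace = TRUE), rep('-15', 2))
)

# Frequency table: one row per category with count, share and a formatted label
freq <- as.data.frame(table(Q1 = data$Q1))
freq$share <- freq$Freq / sum(freq$Freq)
freq$label <- scales::percent(freq$share, accuracy = 1)

# geom_col() draws the precomputed shares; position_stack(vjust = 0.5)
# centres each label within its slice
p <- ggplot(freq, aes(x = "", y = share, fill = Q1)) +
  geom_col() +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  coord_polar(theta = "y") +
  labs(x = NULL, y = NULL) +
  theme_void()
p
```

Precomputing the table makes the numbers easy to inspect before plotting, at the cost of one extra step compared with letting stat="count" do the work.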

Percentages in faceted histogram with scale_y_continuous()

I am trying to use scale_y_continuous() with a faceted histogram and running into an issue. I am hoping to get each count to be a percentage instead. My code is:
ggplot(d, aes(x = likely_att)) +
geom_histogram(binwidth = 0.5, color = "black") +
facet_wrap(~married, scales = "free_y") +
theme_classic() +
scale_y_continuous(labels = percent_format())
It looks like the distributions themselves are accurate, but the scaling is off: the percentages are "200 000%", "5 000%", etc. and that seems wrong, but I'm not quite sure why it's happening.
There are many more "yes" than "no" or "separated" married values in my dataset, which is why I use scales = "free_y" and why I'm hoping to just have percentages shown and only need one axis value shown.
I can't share this exact data for privacy reasons, but the likely_att variable is just a 1-5 numeric var, and married is a character var with 3 values: yes, no, separated.
In case it's helpful, I basically want it to look just like this image, but with percentages instead of counts, so I can just have one single y axis on the far left with 0 - 100 %
The problem is that percent_format() changes the way the labels are printed, but it doesn't actually rescale the numbers. To do that, you could use the computed density variable and multiply it by the bin width, which gives the proportion of observations in each bin, and then apply the percent formatting.
ggplot(d, aes(x = likely_att)) +
stat_bin(aes(y=..density..*.5, group = married),
binwidth = 0.5, color = "black") +
facet_wrap(~married, scales = "free_y") +
theme_classic() +
scale_y_continuous(labels = percent_format())
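A quick way to check this idea on dummy data, written with the after_stat() notation (which assumes ggplot2 3.3.0 or later). The data frame below is invented for illustration, since the asker's data is not available. Within each facet, the bar heights should add up to 1, that is 100%.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data
set.seed(42)
d <- data.frame(
  likely_att = sample(1:5, 300, replace = TRUE),
  married    = sample(c("yes", "no", "separated"), 300,
                      replace = TRUE, prob = c(0.8, 0.15, 0.05))
)

# density * binwidth = proportion of the panel's observations in each bin
p <- ggplot(d, aes(x = likely_att)) +
  geom_histogram(aes(y = after_stat(density * 0.5)),
                 binwidth = 0.5, color = "black") +
  facet_wrap(~married) +
  theme_classic() +
  scale_y_continuous(labels = scales::percent_format())

# Sum of bar heights per facet
ld <- layer_data(p)
panel_totals <- tapply(ld$y, ld$PANEL, sum)
panel_totals
```

Because every panel is now on the same 0 to 100% scale, scales = "free_y" is no longer needed and a single shared y axis can be used.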

Show mean values in boxplots in R

time_pic <- ggplot(data_box, aes(x=Kind, y=TimeTotal, fill=Sitting_Position)) +
geom_boxplot()
print(time_pic)
time_pic+labs(title="", x="", y = "Time (Sec)")
I ran the above code to get the following image, but I don't know how to add the average value for each boxplot to it.
Update: I tried this.
means <- aggregate(TimeTotal ~ Sitting_Position*Kind, data_box, mean)
ggplot(data=data_box, aes(x=Kind, y=TimeTotal, fill=Sitting_Position)) +
geom_boxplot() +
stat_summary(fun=mean, colour="darkred", geom="point", shape=18, size=3,show_guide = FALSE) +
geom_text(data = means, aes(label = TimeTotal, y = TimeTotal + 0.08))
This is what it looks like now. Two dots are on the same line. And two values are overlapping with each other.
As others said, you can share your dataset for more specific help, but in this case I think the point can be made using a dummy dataset. I'm creating one that looks pretty similar to your own in terms of naming, so theoretically you can just plug in this code and it could work.
The main thing you need here is to control how ggplot2 separates the boxplots for the different data_box$Sitting_Position values that share the same data_box$Kind. The process of spreading the boxes around that x axis value is called "dodging". When you supply a fill= or color= (or similar) aesthetic in aes() for a geom, ggplot2 assumes you also want to group the data according to that variable. Your initial ggplot() call has fill=Sitting_Position in aes(), which is why geom_boxplot() "works": it creates separate, differently colored boxes that are dodged properly.
When you create the points and the text, ggplot2 has no idea that you want to dodge them, nor on what basis to dodge, since the fill= aesthetic doesn't make sense for a text or point geom. How to fix this? The answer is to:
Supply a group= aesthetic, which can override the grouping of a fill= or color= aesthetic, but which also can serve as a basis for the dodging for geoms that do not have a similar aesthetic.
Specify more clearly how you want to dodge. This will be important for accurate positioning of all things you want to dodge. Otherwise, you will have things dodged, but maybe not the same distance.
Here's how I combined all that:
library(ggplot2)
# the datasets
set.seed(1234)
data_box <- data.frame(
Kind=c(rep('Model-free AR',100),rep('Real-world',100)),
TimeTotal=c(rnorm(50,5.5,1),rnorm(50,5.43,1.1),rnorm(50,4.9,1),rnorm(50,4.7,0.2)),
Sitting_Position=rep(c(rep('face to face',50),rep('side by side',50)),2)
)
means <- aggregate(TimeTotal ~ Sitting_Position*Kind, data_box, mean)
# the plot
ggplot(data_box, aes(x=Kind, y=TimeTotal)) + theme_bw() +
# specifying dodge here and width to avoid overlapping boxes
geom_boxplot(
aes(fill=Sitting_Position),
position=position_dodge(0.6), width=0.5
) +
# note group aesthetic and same dodge call for next two objects
stat_summary(
aes(group=Sitting_Position),
position=position_dodge(0.6),
fun=mean,
geom='point', color='darkred', shape=18, size=3,
show.legend = FALSE
) +
geom_text(
data=means,
aes(label=round(TimeTotal,2), y=TimeTotal + 0.18, group=Sitting_Position),
position=position_dodge(0.6)
)

ggplot, ggplotly, scale_y_continuous, ylim and percentage

I would like to plot a graph where the y axis is in percentage:
p = ggplot(test, aes(x=creation_date, y=value, color=type)) +
geom_line(aes(group=type)) +
scale_colour_manual(values=c("breach"="red","within_promise"="green","before_promise"="blue")) +
geom_vline(xintercept=c(as.numeric(as.Date('2016-05-14'))),linetype="dotted") +
scale_y_continuous(labels=percent)
ggplotly()
Now I would like to set the y axis superior limit to be 100%
p = ggplot(test, aes(x=creation_date, y=value, color=type)) +
geom_line(aes(group=type)) +
scale_colour_manual(values=c("breach"="red","within_promise"="green","before_promise"="blue")) +
geom_vline(xintercept=c(as.numeric(as.Date('2016-05-14'))),linetype="dotted") +
scale_y_continuous(labels=percent) +
ylim(0, 1)
ggplotly()
But result is the same as the previous plot, the y axis limits are the same.
It works when I don't put the y axis to be in percent:
p = ggplot(test, aes(x=creation_date, y=value, color=type)) +
geom_line(aes(group=type)) +
scale_colour_manual(values=c("breach"="red","within_promise"="green","before_promise"="blue")) +
geom_vline(xintercept=c(as.numeric(as.Date('2016-05-14'))),linetype="dotted") +
ylim(0, 1)
ggplotly()
Moreover using ggplotly when I set the y axis to be in percent when I put my mouse on a point of the graph the value is not in percent:
I'm aware it's been a while since you asked, but you could use limits inside scale_y_continuous(), like this:
scale_y_continuous(labels = scales::percent, limits=c(0,1))
Minor suggested edit to the response above:
It seems that you have to specify the limits within the scale_y_continuous call prior to setting the values as percentages:
scale_y_continuous(limits=c(0,1), labels = scales::percent)
As you have not given the dataset, I am making my best guess.
You need to pass the limits argument to scale_y_continuous(). As you can see, ylim() does not combine with the settings made by scale_y_continuous(): ylim() is itself a shorthand for a y scale with limits, and a plot keeps only one y scale. Use a single function to change the y axis, either ylim() or scale_y_continuous().
I had a similar issue here and neither solution worked for me. It's clear that we can't combine scale_y_continuous with ylim. Setting the limits parameter within scale_y_continuous caused some errors. However, as suggested in the docs, we can use the function coord_cartesian() in combination with scale_y_continuous. The final code would be something like this:
...+
coord_cartesian(ylim=c(0.50, 0.75)) +
scale_y_continuous(labels = scales::percent)
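The two approaches in these answers are not equivalent, which a small sketch can show. The data frame df is made up for illustration. Limits set on the scale turn out-of-range values into NA before drawing (the default out-of-bounds behaviour), while coord_cartesian() only zooms the view and keeps all the data.

```r
library(ggplot2)

# Hypothetical data with one value above 100%
df <- data.frame(x = 1:5, value = c(0.2, 0.4, 0.6, 0.8, 1.2))

# Limits on the scale: percent labels and limits live in one y scale
p_scale <- ggplot(df, aes(x, value)) +
  geom_line() +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1))

# Zoom with the coordinate system: the scale keeps every value
p_zoom <- ggplot(df, aes(x, value)) +
  geom_line() +
  scale_y_continuous(labels = scales::percent) +
  coord_cartesian(ylim = c(0, 1))

# Compare what each plot hands to the geom
y_scale <- layer_data(p_scale)$y
y_zoom  <- layer_data(p_zoom)$y
data.frame(y_scale, y_zoom)
```

If values outside the range should still influence the line (or a smoother, or a summary), coord_cartesian() is usually the safer choice; if they should be discarded, limits on the scale do that.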

How to make dodge in geom_bar agree with dodge in geom_errorbar, geom_point

I have a dataset where measurements are made for different groups at different days.
I want to have side by side bars representing the measurements at the different days for the different groups with the groups of bars spaced according to day of measurement with errorbars overlaid to them.
I'm having trouble with making the dodging in geom_bar agree with the dodge on geom_errorbar.
Here is a simple piece of code:
days = data.frame(day=c(0,1,8,15));
groups = data.frame(group=c("A","B","C","D", "E"), means=seq(0,1,length=5));
my_data = merge(days, groups);
my_data$mid = exp(my_data$means+rnorm(nrow(my_data), sd=0.25));
my_data$sigma = 0.1;
png(file="bar_and_errors_example.png", height=900, width=1200);
plot(ggplot(my_data, aes(x=day, weight=mid, ymin=mid-sigma, ymax=mid+sigma, fill=group)) +
geom_bar (position=position_dodge(width=0.5)) +
geom_errorbar (position=position_dodge(width=0.5), colour="black") +
geom_point (position=position_dodge(width=0.5), aes(y=mid, colour=group)));
dev.off();
In the plot, each error segment appears with a fixed offset from its bar (sorry, no plots allowed for newbies even if ggplot2 is the subject).
When binwidth is adjusted in geom_bar, the offset is no longer fixed and changes from day to day.
Notice that geom_errorbar and geom_point dodge in tandem.
How do I get geom_bar to agree with the other two?
Any help appreciated.
The alignment problems are due, in part, to your bars not representing the data you intend. The following lines up correctly:
ggplot(my_data, aes(x=day, weight=mid, ymin=mid-sigma, ymax=mid+sigma, fill=group)) +
geom_bar (position=position_dodge(), aes(y=mid), stat="identity") +
geom_errorbar (position=position_dodge(width=0.9), colour="black") +
geom_point (position=position_dodge(width=0.9), aes(y=mid, colour=group))
This is an old question, but since I ran into the problem today, I want to add the following:
In
geom_bar(position = position_dodge(width=0.9), stat = "identity") +
geom_errorbar( position = position_dodge(width=0.9), colour="black")
the width-argument within position_dodge controls the dodging width of the things to dodge from each other. However, this produces whiskers as wide as the bars, which is possibly undesired.
An additional width-argument outside the position_dodge controls the width of the whiskers (and bars):
geom_bar(position = position_dodge(width=0.9), stat = "identity", width=0.7) +
geom_errorbar( position = position_dodge(width=0.9), colour="black", width=0.3)
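A minimal sketch of this idea that also checks the alignment numerically. The data is a reduced, made-up version of the question's example, and sharing one position object between the layers is a convenience chosen for this illustration. The dodged x positions depend on the dodge width and the number of groups, not on each layer's own width, so the two layers should report identical x values.

```r
library(ggplot2)

# Reduced dummy data in the shape of the question's example
set.seed(1)
my_data <- merge(
  data.frame(day = factor(c(0, 1, 8, 15))),
  data.frame(group = c("A", "B", "C"))
)
my_data$mid   <- runif(nrow(my_data), min = 1, max = 2)
my_data$sigma <- 0.1

# One dodge specification, reused by every layer
dodge <- position_dodge(width = 0.9)

p <- ggplot(my_data, aes(x = day, y = mid, fill = group)) +
  geom_col(position = dodge, width = 0.7) +
  geom_errorbar(aes(ymin = mid - sigma, ymax = mid + sigma),
                position = dodge, width = 0.3, colour = "black")

# Centres of the bars and of the whiskers after dodging
bars     <- layer_data(p, 1)
whiskers <- layer_data(p, 2)
aligned  <- isTRUE(all.equal(as.numeric(bars$x), as.numeric(whiskers$x)))
aligned
```

Keeping the dodge in a single variable avoids the common mistake of changing the width in one layer and forgetting the others.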
As a first change, I reformatted the code according to the Advanced R style guide.
days <- data.frame(day=c(0,1,8,15))
groups <- data.frame(
group=c("A","B","C","D", "E"),
means=seq(0,1,length=5)
)
my_data <- merge(days, groups)
my_data$mid <- exp(my_data$means+rnorm(nrow(my_data), sd=0.25))
my_data$sigma <- 0.1
Now when we look at the data, we see that day is numeric and everything else is the same.
str(my_data)
To remove blank space from the plot I converted the day column to factors. CHECK that the levels are in the proper order before proceeding.
my_data$day <- as.factor(my_data$day)
levels(my_data$day)
The next change I made was defining y in your aes arguments, which lets ggplot know where to look for the y values. Then I changed the position argument to "dodge" and added the stat="identity" argument, which tells ggplot to plot the y values as given instead of counting rows. Positions are set per layer and are not inherited from geom_bar, so geom_errorbar and geom_point each need the same dodge as the bars. With position="dodge" the bars are dodged by their own width, which defaults to 0.9, hence position_dodge(.9) in the other two layers.
ggplot(data = my_data,
       aes(x=day,
           y=mid,
           ymin=mid-sigma,
           ymax=mid+sigma,
           fill=group)) +
  geom_bar(position="dodge", stat="identity") +
  geom_errorbar(position=position_dodge(.9), colour="black") +
  geom_point(position=position_dodge(.9), aes(y=mid, colour=group))
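Sometimes the mapping aes(x=tasks,y=val,fill=group) is placed in geom_bar() rather than in ggplot(). This causes the problem, because the other layers then do not inherit x and the grouping variable that determine where each group is placed. Put the shared mapping in ggplot() so that every layer sees it.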

Resources