This question already has answers here:
Bar plot with log scales
(2 answers)
Closed 2 years ago.
I'm making the following bar plot with ggplot:
df %>% ggplot( aes(x= group,y= cases,fill=color ) ) +
geom_bar(stat="identity") +
theme_minimal()
Which gives the following result:
The issue is that the smaller colors are not visible, hence I tried to use a log scale:
df %>% ggplot( aes(x= group,y= cases,fill=color ) ) +
geom_bar(stat="identity") +
scale_y_log10(labels = comma) +
theme_minimal()
But this completely broke the scales: now I'm getting a 10 MM value from nowhere and the bar sizes are wrong.
The data I'm using for this is the following:
index,group,color,cases
1,4,4,9
2,4,3,61
3,1,1,5000
4,4,2,138
5,4,1,246
6,3,1,359
7,2,1,2000
8,3,2,57
9,1,2,153
10,2,2,130
11,2,3,15
12,1,3,23
13,3,3,11
14,2,4,1
TL;DR: You cannot and should not use a log scale with a stacked barplot. If you want a log scale, use a "dodged" barplot instead. You'll also have better luck using geom_col instead of geom_bar here, and setting your fill= variable as a factor.
Geom_col vs. geom_bar
Try using geom_col in place of geom_bar. You can use coord_flip() if the direction is not to your liking. See here for reference, but the gist of the issue is that geom_bar should be used when you want to plot against "count", and geom_col should be used when you want to plot against "values". Here, your y-axis is "cases" (a value), so use geom_col.
The Problem with log scales and Stacked Barplots
With that being said, u/Dave2e is absolutely correct. The plot you are getting makes sense, because the underlying math being done to calculate the y-axis values is: log10(x) + log10(y) + log10(z) instead of what you expected, which was log10(x + y + z).
Let's use the numbers in your actual data frame for comparison here. In "group 1", you have the following:
index group color cases
3 1 1 5000
9 1 2 153
12 1 3 23
Without a log scale, the total height of a stacked bar on the y-axis is the sum of its parts. In other words:
> 5000 + 153 + 23
[1] 5176
This means that each of the bars represents the correct relative size, and when you add them up (or stack them up), the total size of the bar is equivalent to the total sum. Makes sense.
Now consider the same case, but for a log10 scale:
> log10(5000) + log10(153) + log10(23)
[1] 7.245389
That is 10^7.245389 = 5000 * 153 * 23 = 17,595,000, or about 17.6 million, which is where the "10 MM value from nowhere" comes from. The total height of the bar is still the sum of all individual bars (because that's what a stacked barplot is), and you can still compare the relative sizes, but the sum of the individual logs does not equal the log of the sum:
> log10(5000 + 153 + 23)
[1] 3.713994
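To check the arithmetic above in R (a small sketch; the variable names are just for illustration): summing logs is the same as taking the log of the product, which is why the stacked bar tops out in the millions instead of at 5176.

```r
# Group 1 values from the question's data
cases <- c(5000, 153, 23)

# What a stacked bar shows on a log10 axis: the sum of the logs...
stacked_height <- sum(log10(cases))

# ...which equals the log of the PRODUCT, not the log of the sum
log_of_product <- log10(prod(cases))
log_of_sum     <- log10(sum(cases))

print(stacked_height)   # same value as log_of_product
print(log_of_product)
print(log_of_sum)       # the height you actually wanted
```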
Suggested Way to Change your Plot
Moral of the story: you can still use a log scale to "stretch out" the small bars, but don't stack them. Use position='dodge':
df %>% ggplot( aes(x= group,y= log10(cases),fill=as.factor(color) ) ) +
geom_col(position='dodge') +
theme_minimal()
Finally, position='dodge' (or position=position_dodge(width=...)) does not work with fill=color, since df$color is not a factor (it's numeric). This is also why your legend is showing a gradient for a categorical variable. That's why I used as.factor(color) in the ggplot call here, although you can also just apply that to the original dataset with df$color <- as.factor(df$color) and do the same thing.
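Here is a self-contained version of the dodged plot, as a sketch. The data frame is rebuilt from the CSV in the question, and scale_y_log10() is swapped in for y = log10(cases) so that the axis labels stay in original units. That swap is a variation on the code above, not something the question requires.

```r
library(ggplot2)

# Rebuild the question's data so the example runs on its own
df <- read.csv(text = "index,group,color,cases
1,4,4,9
2,4,3,61
3,1,1,5000
4,4,2,138
5,4,1,246
6,3,1,359
7,2,1,2000
8,3,2,57
9,1,2,153
10,2,2,130
11,2,3,15
12,1,3,23
13,3,3,11
14,2,4,1")

# fill must be categorical, otherwise dodging has no groups to separate
df$color <- as.factor(df$color)

p <- ggplot(df, aes(x = group, y = cases, fill = color)) +
  geom_col(position = "dodge") +  # side by side, not stacked
  scale_y_log10() +               # log axis, labels in original units
  theme_minimal()
print(p)
```

As far as I can tell, ggplot2 anchors bars at 10^0 = 1 on a log axis, so the row with cases = 1 ends up with a zero-height bar in either version.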
Related
I have a dataframe:
>picard
count reads
1 20681318
2 3206677
3 674351
4 319173
5 139411
6 117706
How do I plot log10(count) vs log10(reads) on a ggplot (barplot)?
I tried:
ggplot(picard) + geom_bar(aes(x=log10(count),y=log10(reads)))
But it is not accepting y=log10(reads). How do I plot my y values?
You can do something like this, but plotting the x axis, which is not continuous, on a log10 scale doesn't make sense to me:
ggplot(picard) +
geom_bar(aes(x=count,y=reads),stat="identity") +
scale_y_log10() +
scale_x_log10()
If you only want a y axis with a log10 scale, just do:
ggplot(picard) +
geom_bar(aes(x=count,y=reads),stat="identity") +
scale_y_log10()
Use stat="identity":
ggplot(picard) + geom_bar(aes(x=log10(count),y=log10(reads)), stat="identity")
You will actually get a warning with your approach:
Mapping a variable to y and also using stat="bin".
With stat="bin", it will attempt to set the y value to the count of cases in each group.
This can result in unexpected behavior and will not be allowed in a future version of ggplot2.
If you want y to represent counts of cases, use stat="bin" and don't map a variable to y.
If you want y to represent values in the data, use stat="identity".
See ?geom_bar for examples. (Deprecated; last used in version 0.9.2)
There's a direct way to do this, i.e. by using the geom_col() function. Just make a tiny adjustment to your code:
ggplot(picard) + geom_col(aes(x=log10(count), y=log10(reads)))
and it will give the same output as setting the stat argument to identity with geom_bar(). The thing is, geom_bar() uses count as the default stat, so it does not take a variable for the y-axis. It simply uses the count, i.e. the number of occurrences of each x value, for its y-axis. I hope this answers your question.
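A self-contained sketch of the geom_col() route, with the picard data frame rebuilt from the question. Treating count as a factor and putting the log on the axis with scale_y_log10() is one reasonable choice here, not the only one.

```r
library(ggplot2)

# Data from the question
picard <- data.frame(
  count = 1:6,
  reads = c(20681318, 3206677, 674351, 319173, 139411, 117706)
)

# geom_col() plots the y values as given,
# the same as geom_bar(stat = "identity")
p <- ggplot(picard, aes(x = factor(count), y = reads)) +
  geom_col() +
  scale_y_log10() +   # log10 axis, labels stay in read counts
  labs(x = "count")
print(p)
```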
I've been trying to standardise multiple bar plots so that the bars are all identical in width regardless of the number of bars. Note that this is over multiple distinct plots - faceting is not an option. It's easy enough to scale the plot area so that, for instance, a plot with 6 bars is 1.5* the width of a plot with 4 bars. This would work perfectly, except that each plot has an expanded x axis by default, which I would like to keep.
"The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables."
https://ggplot2.tidyverse.org/reference/scale_discrete.html
My problem is that I can't for the life of me work out what '0.6 units' actually means. I've manually measured the distance between the bars and the y axis in various design tools and gotten inconsistent answers, so I can't factor '0.6 units' into my calculations when working out what size the panel windows should be. Additionally I can't find any answers on how many 'units' long a discrete x axis is - I assumed at first it would be 1 unit per category but that doesn't fit with the visuals at all. I've included an image that hopefully shows what I mean - the two graphs
In this image, the top graph has a plot area exactly 1.5* that of the bottom graph. Seeing as it has 6 bars compared with 4, that would mean each bar is the same width, except that that extra space between the axis and the first bar messes this up. Setting expand = expansion(add = c(0, 0)) clears this up but results in not-so-pretty graphs. What I'd like is for the bars to be identical in width between the two plots, accounting for this extra space. I'm specifically looking for a general solution that I can use for future plots, not for the individual solution for this sample. As such, what I'd really like to know is how many 'units' long are these two x axes? Many thanks for any and all help!
Instead of using expansion for the axis, I would probably use the fact that categorical variables are actually plotted on the positive integers on Cartesian co-ordinates. This means that, provided you know the maximum number of columns you are going to use in your plots, you can set this as the range in coord_cartesian. There is a little arithmetic involved to keep the bars centred, but it should give consistent results.
We start with some reproducible data:
library(ggplot2)
set.seed(1)
df <- data.frame(group = letters[1:6], value = 100 * runif(6))
Now we set the value for the maximum number of bars we will need:
MAX_BARS <- 6
And the only thing "funny" about the plot code is the calculation of the x axis limits in coord_cartesian:
ggplot(df, aes(group, value)) +
geom_col() +
coord_cartesian(xlim = c(1 -(MAX_BARS - length(unique(df$group)))/2,
MAX_BARS - (MAX_BARS - length(unique(df$group)))/2))
Now let us remove one factor level and run the exact same plot code:
df <- df[-1,]
ggplot(df, aes(group, value)) +
geom_col() +
coord_cartesian(xlim = c(1 -(MAX_BARS - length(unique(df$group)))/2,
MAX_BARS - (MAX_BARS - length(unique(df$group)))/2))
And again:
df <- df[-1,]
ggplot(df, aes(group, value)) +
geom_col() +
coord_cartesian(xlim = c(1 -(MAX_BARS - length(unique(df$group)))/2,
MAX_BARS - (MAX_BARS - length(unique(df$group)))/2))
And again:
df <- df[-1,]
ggplot(df, aes(group, value)) +
geom_col() +
coord_cartesian(xlim = c(1 -(MAX_BARS - length(unique(df$group)))/2,
MAX_BARS - (MAX_BARS - length(unique(df$group)))/2))
You will see the bars remain constant width and centralized, yet the panel size remains fixed.
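Since the same limit calculation is repeated in every plot above, it could be wrapped in a small helper. This is only a sketch; bar_xlim is a hypothetical name, not a ggplot2 function.

```r
# Hypothetical helper: x limits that keep n_bars centred in a panel
# sized for max_bars (categories sit at integer positions 1..n_bars)
bar_xlim <- function(n_bars, max_bars) {
  stopifnot(n_bars <= max_bars)
  pad <- (max_bars - n_bars) / 2
  c(1 - pad, max_bars - pad)
}

bar_xlim(6, 6)  # 1 6
bar_xlim(4, 6)  # 0 5
bar_xlim(3, 6)  # -0.5 4.5
```

It would then be used as coord_cartesian(xlim = bar_xlim(length(unique(df$group)), MAX_BARS)). The limits always span max_bars - 1 units and are centred on the bars, which is what keeps the bar width constant.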
Created on 2021-11-06 by the reprex package (v2.0.0)
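On the literal question of how many "units" long the axis is: assuming the documented default (0.6 units added on each side) and categories placed at integer positions 1..n as described above, the arithmetic can be sketched like this. axis_units is a made-up helper name.

```r
# Length of a default discrete x axis in data units, assuming
# categories at 1..n_bars and `add` units of expansion on each side
axis_units <- function(n_bars, add = 0.6) {
  lower <- 1 - add
  upper <- n_bars + add
  upper - lower
}

axis_units(6)                  # 6.2
axis_units(4)                  # 4.2
axis_units(6) / axis_units(4)  # about 1.476, not 1.5
```

Under those assumptions, a 6-bar panel would need to be about 1.476 times as wide as a 4-bar panel for the bars to come out the same width, which may explain why scaling by exactly 1.5 looked slightly off.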
I'm trying to overlay two bar plots on top of each other, not beside.
The data is from the same dataset. I want 'Block' on the x-axis and 'Start' and 'End' as overlaying bar plots.
Block Start End
1 P1L 76.80 0.0
2 P1S 68.87 4.4
3 P2L 74.00 0.0
4 P2S 74.28 3.9
5 P3L 82.22 7.7
6 P3S 80.82 17.9
My script is
ggplot(data=NULL,aes(x=Block))+
geom_bar(data=my_data$Start,stat="identity",position ="identity",alpha=.3,fill='lightblue',color='lightblue4')+
geom_bar(data=my_data$End,stat="identity",position ="identity",alpha=.8,fill='pink',color='red')
I get Error: ggplot2 doesn't know how to deal with data of class numeric
I've also tried
ggplot(my_data,aes(x=Block,y=Start))+
geom_bar(data=my_data$End, stat="identity",position="identity",...)
Anyone know how I can make it happen? Thank you.
Edit:
How to get dodge overlaying bars?
I'm editing this post because my next question is related: it's the opposite of my original problem.
@P.merkle
I had to change my plot into four bars showing the mean values of all Blocks labeled L and S. The L stands for littoral, and S for sublittoral. They were exposed to two treatments: normal and reduced.
I've calculated the means, and their standard deviation.
I need four bars with their respective error bars:
Normal/Littoral , Reduced/Littoral , Normal/Sublittoral , Reduced/Sublittoral.
Problem is when I plot it, both the littoral bars and both the sublittoral bars overlay each other! So now I want them not to overlap!
How can I make it happen? I've tried all sorts of position = 'dodge' and position = position_dodge(newdata$Force), without luck...
My newdata contain this information:
Zonation Force N mean sd se
1 Litoral Normal 6 0.000000 0.000000 0.000000
2 Litoral Redusert 6 5.873333 3.562868 1.454535
3 Sublitoral Normal 6 7.280000 2.898903 1.183472
4 Sublitoral Redusert 6 21.461667 4.153535 1.695674
My script is this:
ggplot(data=cdata,aes(x=newdata$Force,y=newdata$mean))+
geom_bar(stat="identity",position ="dodge",
alpha=.4,fill='red', color='lightblue4',width = .6)+
geom_errorbar(aes(ymin=newdata$mean-sd,ymax=newdata$mean+sd),
width=.2, position=position_dodge(.9))
The outcome is unfortunately this
Judging by the error bars, there are clearly four bars, but they overlap. How can I solve this?
If you don't need a legend, Solution 1 might work for you. It is simpler because it keeps your data in wide format.
If you need a legend, consider Solution 2. It requires your data to be converted from wide format to long format.
Solution 1: Without legend (keeping wide format)
You can refine your aesthetics specification on the level of individual geoms (here, geom_bar):
ggplot(data=my_data, aes(x=Block)) +
geom_bar(aes(y=Start), stat="identity", position ="identity", alpha=.3, fill='lightblue', color='lightblue4') +
geom_bar(aes(y=End), stat="identity", position="identity", alpha=.8, fill='pink', color='red')
Solution 2: Adding a legend (converting to long format)
To add a legend, first use reshape2::melt to convert your data frame from wide format into long format.
This gives you two columns,
the variable column ("Start" vs. "End"),
and the value column
Now use the variable column to define your legend:
library(reshape2)
my_data_long <- melt(my_data, id.vars = c("Block"))
ggplot(data=my_data_long, aes(x=Block, y=value, fill=variable, color=variable, alpha=variable)) +
geom_bar(stat="identity", position ="identity") +
scale_colour_manual(values=c("lightblue4", "red")) +
scale_fill_manual(values=c("lightblue", "pink")) +
scale_alpha_manual(values=c(.3, .8))
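For the follow-up in the question's edit (bars that should sit side by side but overlap), a likely cause is that no aesthetic tells ggplot2 which bars belong to different groups at the same x position, so position = "dodge" has nothing to separate. The sketch below rebuilds newdata from the table in the question and maps Force to fill. Putting Zonation on x and Force on fill is an assumption about the desired layout.

```r
library(ggplot2)

# Summary table from the question (N and se omitted)
newdata <- data.frame(
  Zonation = c("Litoral", "Litoral", "Sublitoral", "Sublitoral"),
  Force    = c("Normal", "Redusert", "Normal", "Redusert"),
  mean     = c(0, 5.873333, 7.28, 21.461667),
  sd       = c(0, 3.562868, 2.898903, 4.153535)
)

# Use the same dodge for bars and error bars so they line up
dodge <- position_dodge(width = 0.9)

p <- ggplot(newdata, aes(x = Zonation, y = mean, fill = Force)) +
  geom_col(position = dodge, alpha = 0.6) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                position = dodge, width = 0.2)
print(p)
```

Note that the data frame is passed once to ggplot() and the columns are referenced by bare name inside aes(), rather than as newdata$mean.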
This question already has answers here:
Is there an equivalent in ggplot to the varwidth option in plot?
(2 answers)
Closed 8 years ago.
I have a boxplot that I made in R with ggplot2 analogous to the sample boxplot below.
The problem is, for the values on the y axis (in this sample, the number of cylinders in the car) I have very different frequencies -- I may have included 2 8 cylinder cars, but 200 4 cylinder cars. Because of this, I'd like to be able to resize the boxplots (in this case, change the height along the y axis) so that the 4 cylinder boxplot is a larger portion of the chart than the 8 cylinder boxplot. Does someone know how to do this?
As @aosmith mentioned, varwidth is the argument you want. It looks like it may have been accidentally removed from ggplot2 at some point (https://github.com/hadley/ggplot2/blob/master/R/geom-boxplot.r). If you look at the commit title, it is adding the varwidth parameter back in. I'm not sure if that ever made it into the CRAN package, but you might want to check your version. It works with my version, ggplot2 v1.0.0; I'm not sure how recently the feature was added.
Here is an example:
library(ggplot2)
set.seed(1234)
df <- data.frame(cond = factor( c(rep("A",200), rep("B",150), rep("C",200), rep("D",10)) ),
rating = c(rnorm(200),rnorm(150, mean=0.2), rnorm(200, mean=.8), rnorm(10, mean=0.6)))
head(df, 5)
tail(df, 5)
p <- ggplot(df, aes(x=cond, y=rating, fill=cond)) +
guides(fill="none") + coord_flip()
p + geom_boxplot()
Gives:
p + geom_boxplot(varwidth=TRUE)
Gives:
For a couple more options, you can also use a violin plot with scaled widths (the scale="count" argument):
p+ geom_violin(scale="count")
Or combine violin and boxplots to maximize your information.
p+ geom_violin(scale="count") + geom_boxplot(fill="white", width=0.2, alpha=0.3)
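For a sense of how much wider the large groups get: the ggplot2 documentation describes varwidth = TRUE as drawing boxes with widths proportional to the square roots of the group sizes. A quick check with the group sizes from the example above (the variable names are just for illustration):

```r
# Group sizes used to build df in the example
n <- c(A = 200, B = 150, C = 200, D = 10)

# Box widths relative to the largest group, under sqrt(n) scaling
rel_width <- sqrt(n) / max(sqrt(n))
round(rel_width, 2)
#    A    B    C    D
# 1.00 0.87 1.00 0.22
```

So the 10-observation group is still drawn at roughly a fifth of the full width rather than a twentieth, which keeps small groups readable.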
This question already has answers here:
Creating a vertical color gradient for a geom_bar plot
(4 answers)
Closed 9 months ago.
How to plot a bar chart with continuous coloring, similar to scale_fill_brewer() but more continuous?
ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar()
I would comment on this, but not enough rep yet...
Your problem with the Diamonds data set is that the data is discrete, meaning each value / observation belonging to it is distinct and separate. In order to do a continuous fill you need continuous data (values / observations belonging to it may take on any value within a finite or infinite interval).
When you have continuous data, you can use a continuous ggplot2 scale such as scale_fill_gradient (the fill counterpart of scale_colour_gradient, since bars take their color from the fill aesthetic).
EDIT
You can try this: ggplot(diamonds, aes(clarity, fill=..count..)) + geom_bar(), but you lose the cut information (recent ggplot2 versions prefer after_stat(count) over ..count..):
Color Brewer has sequential color schemes built in - check colorbrewer2.org. For instance,
ggplot(diamonds, aes(clarity, fill=cut)) +
geom_bar() +
scale_fill_brewer(palette="PuBu")
yields
Update:
Per OP's comment below, it appears that OP wants to map alpha to the factor levels of cut, with alpha decreasing as factor levels increase (i.e., alpha for 'Fair' should be higher than alpha for 'Ideal'). We can manually assign alpha using scale_alpha_manual, but a simpler solution is to use scale_alpha_discrete with the range argument defined 'in reverse':
ggplot(diamonds, aes(clarity, alpha=cut)) + geom_bar(fill="darkblue") +
scale_alpha_discrete(range=c(1, 0.2)) #Note that range is ordered 'backwards'
Of course you can adjust the color using the fill argument to geom_bar (darkblue comes out looking fairly purple).
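The scale_alpha_manual route mentioned above could look something like this. It is a sketch: the evenly spaced alpha values are an arbitrary choice, and alphas is just an illustrative name.

```r
library(ggplot2)

# One alpha per level of cut, from opaque ("Fair") to faint ("Ideal")
alphas <- seq(1, 0.2, length.out = nlevels(diamonds$cut))
names(alphas) <- levels(diamonds$cut)

p <- ggplot(diamonds, aes(clarity, alpha = cut)) +
  geom_bar(fill = "darkblue") +
  scale_alpha_manual(values = alphas)  # named values are matched to levels
print(p)
```

Naming the vector by level makes the mapping explicit, so the values can be changed per level without relying on their order.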