I'm trying to create a nice barchart from the following data:
> counts$counts_16
[1] 46921 1546 248 78 31 15 1 3 2 2 0
> counts$score
[1] 0 1 2 3 4 5 6 7 8 9 10
With the following code:
ggplot(data = counts, aes(x=score, y=counts_16)) + geom_bar(stat="identity", width=bar.width) + scale_y_continuous(trans=log2_trans())
Unfortunately, the result looks a bit odd. First of all, the bars do not start from the x axis, but are located too high.
Then, there is no bar for the 6th value, which should be 1.
For zero, there is a bar, although there should not be one.
Here's an example:
Now, I understand why it behaves oddly for values of 0 on a log scale, but how can I work around it? And how do I fix the other issues?
After a log transformation, the default "baseline" of the bar graph will be 1 rather than zero. The bars are drawn from 0 on the transformed scale, and log(1) = 0; a true zero baseline is impossible because log(0) is -Inf. So when you have a count of 1, there's no bar to display, since both the bottom and the top of the bar are equal to 1. On the other hand, because log(0) = -Inf, the bar with a count of zero will extend downward beyond the bottom of the y-range of the graph for any lower y-limit less than 1.
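Here is a small base-R sketch (no plotting) of the arithmetic behind this, assuming the question's log2 transform. The data are transformed first, and each bar then spans from 0 on the transformed scale (2^0 = 1 on the data scale) to the transformed count. The variable names are only illustrative.

```r
# Counts from the question, for scores 0..10
counts_16 <- c(46921, 1546, 248, 78, 31, 15, 1, 3, 2, 2, 0)

# The scale transformation is applied to the data before the bars are built
bar_top <- log2(counts_16)

# Bars then start at 0 on the transformed scale, i.e. 2^0 = 1 on the data scale
bar_bottom <- 0
bar_height <- bar_top - bar_bottom

print(data.frame(score = 0:10, count = counts_16, height = bar_height))
# A count of 1 gives height 0 (no visible bar);
# a count of 0 gives height -Inf (the bar runs downward off the panel).
```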
UPDATE: Regarding your comment, another option is to add points to the plot, so that you get a point where the y-value equals 1. ggplot also includes the top half of the point for y=0, which sort of marks the zero count. For example:
library(ggplot2)
counts = data.frame(score=0:6, counts_16=c(11000,10000,0:4))
ggplot(data = counts, aes(x=score, y=counts_16)) +
geom_bar(stat="identity", width=0.1, fill="grey50") +
geom_point(pch=21, fill="red", size=4) +
scale_y_log10(limits=c(1e-1,2e4), breaks=10^seq(-1,4,1),
labels=c(0.1, sprintf("%1.0f", 10^seq(0,4,1)))) +
scale_x_continuous(breaks=0:6)
You can, of course, just go with points (and perhaps a connecting line to guide the eye) and eliminate the bars, which avoids the awkward baseline issue with a bar plot on a log scale.
I'm making a simple bar chart but I can't seem to figure it out. I've got my data as laid out here:
Candidate SkinTone Elected
        1        7       1
        2        4       0
        3        3       0
        4        2       1
Skin tone refers to a person's skin tone (obvs) and elected is a dummy variable that denotes whether a candidate was elected or not. What I want to do is have every skin tone value (it goes from 1-11) as a tick on my x-axis and my y-axis should be the percentage of those candidates that have a "1" as their elected value. So, for example, this tiny data set should generate a chart that looks like this:
[Image: Final Bar Graph]
The problem I encounter is that I can't figure out how to get this graph's y-axis right. Using the code below, I get a graph that looks like this:
ggplot(data=data, mapping = aes(x=Tone, y=Elected)) +
geom_bar(stat='identity',
fill="yellow",
col="black",
width=1,
alpha=.2) +
coord_cartesian(xlim = c(0.5,11.5)) +
scale_x_continuous(breaks = 1*1:11,
expand = expansion(add = .5)) +
labs(title="Skin Tone Electoral Success Barplot", x="Skin Tone", y="Percentage of Candidates Elected")
[Image: Incorrect Bar Graph]
However, this doesn't work for me, as the y-axis shows the count of candidates who had a 1 in the Elected variable instead of the percentage. In addition, I'm getting these black blocks in between each observation, which I haven't gotten before when using col=. Lastly, I also have trouble adding a density line, as geom_density() gives me an error saying I'm missing my y aesthetic.
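A minimal sketch of one way to get the percentages, assuming the columns are named as in the table above (SkinTone, Elected). Since Elected is a 0/1 dummy, its mean per tone is the proportion elected, so aggregating first leaves one row per tone. That probably also explains the black blocks: geom_bar(stat="identity") stacks one outlined rectangle per candidate, so col= draws a border around every observation. The plotting part only runs if ggplot2 is installed.

```r
# Toy data from the question (column names assumed to match the table)
data <- data.frame(Candidate = 1:4,
                   SkinTone  = c(7, 4, 3, 2),
                   Elected   = c(1, 0, 0, 1))

# The mean of a 0/1 dummy is the proportion of 1s; scale it to a percentage
pct <- aggregate(Elected ~ SkinTone, data = data, FUN = mean)
pct$Percent <- 100 * pct$Elected
print(pct)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # One row per tone, so each bar is a single outlined rectangle
  p <- ggplot(pct, aes(x = SkinTone, y = Percent)) +
    geom_col(fill = "yellow", colour = "black", width = 1, alpha = .2) +
    coord_cartesian(xlim = c(0.5, 11.5)) +
    scale_x_continuous(breaks = 1:11) +
    labs(title = "Skin Tone Electoral Success Barplot",
         x = "Skin Tone", y = "Percentage of Candidates Elected")
  print(p)
}
```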
I'm making the following bar plot with ggplot:
df %>% ggplot( aes(x= group,y= cases,fill=color ) ) +
geom_bar(stat="identity") +
theme_minimal()
Which gives the following result:
The issue is that the smaller colors are not visible, hence I tried to use a log scale:
df %>% ggplot( aes(x= group,y= cases,fill=color ) ) +
geom_bar(stat="identity") +
scale_y_log10(labels = comma) +
theme_minimal()
But this completely broke the scales: now I'm getting a 10 MM value from nowhere, and the bar sizes are wrong.
The data I'm using for this is the following:
index,group,color,cases
1,4,4,9
2,4,3,61
3,1,1,5000
4,4,2,138
5,4,1,246
6,3,1,359
7,2,1,2000
8,3,2,57
9,1,2,153
10,2,2,130
11,2,3,15
12,1,3,23
13,3,3,11
14,2,4,1
TL;DR: You cannot and should not use a log scale with a stacked barplot. If you want to use a log scale, use a "dodged" barplot instead. You'll also have better luck using geom_col instead of geom_bar here and setting your fill= variable as a factor.
Geom_col vs. geom_bar
Try using geom_col in place of geom_bar. You can use coord_flip() if the direction is not to your liking. See here for reference, but the gist of the issue is that geom_bar should be used when you want to plot against "count", and geom_col should be used when you want to plot against "values". Here, your y-axis is "cases" (a value), so use geom_col.
The Problem with log scales and Stacked Barplots
With that being said, u/Dave2e is absolutely correct. The plot you are getting makes sense, because the underlying math being done to calculate the y-axis values is: log10(x) + log10(y) + log10(z) instead of what you expected, which was log10(x + y + z).
Let's use the numbers in your actual data frame for comparison here. In "group 1", you have the following:
index group color cases
3 1 1 5000
9 1 2 153
12 1 3 23
So on a linear y-axis (without a log scale), the total height of a stacked bar is the sum of all its segments. In other words:
> 5000 + 153 + 23
[1] 5176
This means that each of the bars represents the correct relative size, and when you add them up (or stack them up), the total size of the bar is equivalent to the total sum. Makes sense.
Now consider the same case, but for a log10 scale:
> log10(5000) + log10(153) + log10(23)
[1] 7.245389
On the original scale that is 10^7.245, or about 17.6 million: adding logs multiplies the underlying values, and 5000 × 153 × 23 = 17,595,000. The total height of the bar is still the sum of all the individual bars (because that's what a stacked barplot is), and you can still compare the relative sizes, but the sum of the individual logs does not equal the log of the sum:
> log10(5000 + 153 + 23)
[1] 3.713994
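The same comparison as a short R snippet, using the group 1 values; it also checks the back-transformed stacked total against the product of the three cases.

```r
# Group 1 of the question's data
cases <- c(5000, 153, 23)

sum_of_logs <- sum(log10(cases))  # what stacking on a log scale adds up
log_of_sum  <- log10(sum(cases))  # what was expected: the log of the total

print(c(sum_of_logs = sum_of_logs, log_of_sum = log_of_sum))

# Back-transforming the stacked total gives the product, not the sum
back_transformed <- 10^sum_of_logs
print(c(back_transformed = back_transformed, product = prod(cases)))
```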
Suggested Way to Change your Plot
Moral of the story: you can still use a log scale to "stretch out" the small bars, but don't stack them. Use position='dodge':
df %>% ggplot( aes(x= group,y= log10(cases),fill=as.factor(color) ) ) +
geom_col(position='dodge') +
theme_minimal()
Finally, position='dodge' (or position=position_dodge(width=...)) does not work with fill=color, since df$color is not a factor (it's numeric). This is also why your legend is showing a gradient for a categorical variable. That's why I used as.factor(color) in the ggplot call here, although you can also just apply that to the original dataset with df$color <- as.factor(df$color) and do the same thing.
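If you would rather keep the y-axis labelled in cases instead of log10 units, here is a sketch of the same dodged plot using scale_y_log10() (it assumes ggplot2 is installed; the CSV is the data from the question). Keep in mind that on a log scale the bars start at 1, so group 2's single case in colour 4 shows up as a zero-height bar.

```r
# The question's data, read from the CSV text
df <- read.csv(text = "index,group,color,cases
1,4,4,9
2,4,3,61
3,1,1,5000
4,4,2,138
5,4,1,246
6,3,1,359
7,2,1,2000
8,3,2,57
9,1,2,153
10,2,2,130
11,2,3,15
12,1,3,23
13,3,3,11
14,2,4,1")

# A factor gives a discrete fill scale and one bar per colour within each group
df$color <- as.factor(df$color)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Dodged (not stacked) bars; the log scale is applied to the axis, not to y
  p <- ggplot(df, aes(x = group, y = cases, fill = color)) +
    geom_col(position = "dodge") +
    scale_y_log10() +
    theme_minimal()
  print(p)
}
```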
I'm trying to overlay two bar plots on top of each other, not beside.
The data is from the same dataset. I want 'Block' on the x-axis and 'Start' and 'End' as overlaying bar plots.
Block Start End
1 P1L 76.80 0.0
2 P1S 68.87 4.4
3 P2L 74.00 0.0
4 P2S 74.28 3.9
5 P3L 82.22 7.7
6 P3S 80.82 17.9
My script is
ggplot(data=NULL,aes(x=Block))+
geom_bar(data=my_data$Start,stat="identity",position ="identity",alpha=.3,fill='lightblue',color='lightblue4')+
geom_bar(data=my_data$End,stat="identity",position ="identity",alpha=.8,fill='pink',color='red')
I get Error: ggplot2 doesn't know how to deal with data of class numeric
I've also tried
ggplot(my_data,aes(x=Block,y=Start))+
geom_bar(data=my_data$End, stat="identity",position="identity",...)
Anyone know how I can make it happen? Thank you.
Edit:
How to get dodged (side-by-side) bars instead of overlaying ones?
I'm editing this post because my next question is related: it's the opposite of my original problem.
@P.merkle
I had to change my plot to four bars showing the mean values of all Blocks labeled L and S. L stands for littoral and S for sublittoral. They were exposed to two treatments: Normal and Reduced.
I've calculated the means and their standard deviations.
I need four bars with their respective error bars:
Normal/Littoral , Reduced/Littoral , Normal/Sublittoral , Reduced/Sublittoral.
The problem is that when I plot it, both the littoral bars and both the sublittoral bars overlay each other! So now I want them not to overlap!
How can I make it happen? I've tried all sorts of position = 'dodge' and position = position_dodge(newdata$Force), without luck...
My newdata contain this information:
Zonation Force N mean sd se
1 Litoral Normal 6 0.000000 0.000000 0.000000
2 Litoral Redusert 6 5.873333 3.562868 1.454535
3 Sublitoral Normal 6 7.280000 2.898903 1.183472
4 Sublitoral Redusert 6 21.461667 4.153535 1.695674
My script is this:
ggplot(data=cdata,aes(x=newdata$Force,y=newdata$mean))+
geom_bar(stat="identity",position ="dodge",
alpha=.4,fill='red', color='lightblue4',width = .6)+
geom_errorbar(aes(ymin=newdata$mean-sd,ymax=newdata$mean+sd),
width=.2, position=position_dodge(.9))
The outcome is unfortunately this
Judging by the error bars, there are clearly four bars, but they overlap. Please, how can I solve this?
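One common way to get four separate bars, sketched under the assumption that newdata has the columns shown above: map the columns by name rather than with newdata$..., put Zonation on the x-axis and Force on fill (which also sets the grouping the dodge needs), and use the same position_dodge() for the bars and the error bars. The ggplot2 part only runs if the package is installed.

```r
# The summary table from the question (se omitted; only mean and sd are used)
newdata <- data.frame(
  Zonation = c("Litoral", "Litoral", "Sublitoral", "Sublitoral"),
  Force    = c("Normal", "Redusert", "Normal", "Redusert"),
  N        = 6,
  mean     = c(0.000000, 5.873333, 7.280000, 21.461667),
  sd       = c(0.000000, 3.562868, 2.898903, 4.153535)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # One dodge object shared by bars and error bars keeps them aligned
  dodge <- position_dodge(width = 0.9)
  # fill = Force gives each Zonation two groups to place side by side
  p <- ggplot(newdata, aes(x = Zonation, y = mean, fill = Force)) +
    geom_col(position = dodge, width = 0.6, alpha = .4, colour = "lightblue4") +
    geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                  width = .2, position = dodge)
  print(p)
}
```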
If you don't need a legend, Solution 1 might work for you. It is simpler because it keeps your data in wide format.
If you need a legend, consider Solution 2. It requires your data to be converted from wide format to long format.
Solution 1: Without legend (keeping wide format)
You can refine your aesthetics specification on the level of individual geoms (here, geom_bar):
ggplot(data=my_data, aes(x=Block)) +
geom_bar(aes(y=Start), stat="identity", position ="identity", alpha=.3, fill='lightblue', color='lightblue4') +
geom_bar(aes(y=End), stat="identity", position="identity", alpha=.8, fill='pink', color='red')
Solution 2: Adding a legend (converting to long format)
To add a legend, first use reshape2::melt to convert your data frame from wide format into long format.
This gives you two new columns: the variable column ("Start" vs. "End") and the value column.
Now use the variable column to define your legend:
library(reshape2)
my_data_long <- melt(my_data, id.vars = c("Block"))
ggplot(data=my_data_long, aes(x=Block, y=value, fill=variable, color=variable, alpha=variable)) +
geom_bar(stat="identity", position ="identity") +
scale_colour_manual(values=c("lightblue4", "red")) +
scale_fill_manual(values=c("lightblue", "pink")) +
scale_alpha_manual(values=c(.3, .8))
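To see what the long format looks like without an extra package, here's a base-R sketch using stack(), assuming my_data is the six-row table from the question. The variable and value names are chosen to mirror melt()'s output, and the factor levels are set explicitly so that Start comes first, matching the order of the manual scales above.

```r
my_data <- data.frame(
  Block = c("P1L", "P1S", "P2L", "P2S", "P3L", "P3S"),
  Start = c(76.80, 68.87, 74.00, 74.28, 82.22, 80.82),
  End   = c(0.0, 4.4, 0.0, 3.9, 7.7, 17.9)
)

# stack() puts all Start values first, then all End values,
# with "ind" recording which column each value came from
stacked <- stack(my_data[, c("Start", "End")])

my_data_long <- data.frame(
  Block    = rep(my_data$Block, times = 2),
  variable = factor(stacked$ind, levels = c("Start", "End")),
  value    = stacked$values
)
print(my_data_long)
```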
Main issue: I want to display the data from 0 to 1.0 as upward bars (starting from 0), but with log-spaced rather than equally spaced intervals.
I am trying to display the column labeled "mean" in the dataset below as a bar plot in ggplot, but as the numbers are very small, I would like to show the y-axis on a log scale rather than log-transform the data itself. In other words, I want upright bars with y-axis labels 0, 1e-8, 1e-6, 1e-4, 1e-2 and 1e-0 (i.e. from 0 to 1.0, but with log-scaled intervals).
The solution below does not work as the bars are inverted.
> print(df)
type mean sd se snp
V7 outer 1.596946e-07 2.967432e-06 1.009740e-08 A
V8 outer 7.472417e-07 6.598652e-06 2.245349e-08 B
V9 outer 1.352327e-07 2.515771e-06 8.560512e-09 C
V10 outer 2.307726e-07 3.235821e-06 1.101065e-08 D
V11 outer 4.598375e-06 1.653457e-05 5.626284e-08 E
V12 outer 5.963164e-07 5.372226e-06 1.828028e-08 F
V71 middle 2.035414e-07 3.246161e-06 1.104584e-08 A
V81 middle 9.000131e-07 7.261463e-06 2.470886e-08 B
V91 middle 1.647716e-07 2.875840e-06 9.785733e-09 C
V101 middle 3.290817e-07 3.886779e-06 1.322569e-08 D
V111 middle 6.371170e-06 1.986268e-05 6.758752e-08 E
V121 middle 8.312429e-07 6.329386e-06 2.153725e-08 F
The code below properly generates the grouped barplot with error bars
ggplot(data=df, aes(x=snp,y=mean,fill=type))+
geom_bar(stat="identity",position=position_dodge(),width=0.5) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se),width=.3, position=position_dodge(.45))
However, I want to make the y-axis log scaled and so I add in scale_y_log10() as follows:
ggplot(data=df, aes(x=snp,y=mean,fill=type))+
geom_bar(stat="identity",position=position_dodge(),width=0.5) + scale_y_log10() +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se),width=.3, position=position_dodge(.45))
But strangely, the bars hang down from the top instead of rising from the bottom as usual, and I don't know what I am doing wrong.
Thank you
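Your means are all below 1, so their logs are negative, and ggplot draws each bar from 0 on the transformed scale (y = 1 on the original scale) down to that negative value; that's why the bars hang from the top. Here's a bit of hacking to show what happens if you try to get bars that start at zero on a log scale. I've used geom_segment for illustration, so that I can create "bars" (wide line segments, actually) extending over arbitrary ranges. To make this work, I've also had to do all the dodging manually, which is why the x mapping looks weird.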
In the example below, the scale goes from y=1e-20 to y=1. The y-axis intervals are log scaled, meaning that the physical distance from, say 1e-20 to 1e-19 is the same as the physical distance from, say, 1e-8 to 1e-7, even though the magnitudes of those intervals differ by a factor of one trillion.
Bars that go down to zero can't be displayed, because zero on the log scale is an infinite distance below the bottom of the graph. We could get closer to zero by, for example, changing 1e-20 to 1e-100 in the code below. But that will just make the already-small physical distances between the data values even smaller and thus even harder to distinguish.
The bars are also misleading in another way, because, as @hrbrmstr pointed out, our brains treat distance along the bar linearly, but the magnitude represented by each increment of distance along the bar changes by a factor of 10 every few millimeters in the example below. The bars simply aren't encoding meaningful information about the data.
# factor() makes the manual positions work whether snp/type are factors or character columns
ggplot(data=df, aes(x=as.numeric(factor(snp)) + 0.3*(as.numeric(factor(type)) - 1.5),
y=mean, colour=type)) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.3) +
geom_segment(aes(xend=as.numeric(factor(snp)) + 0.3*(as.numeric(factor(type)) - 1.5),
y=1e-20, yend=mean), size=5) +
scale_y_log10(limits=c(1e-20, 1), breaks=10^(-100:0), expand=c(0,0)) +
scale_x_continuous(breaks=1:6, labels=LETTERS[1:6])
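To put rough numbers on why pushing the lower limit toward zero doesn't help, here is a base-R sketch that computes what fraction of the axis height the twelve means in df actually span. axis_share() is a hypothetical helper written just for this illustration.

```r
# The twelve group means from the question's df
means <- c(1.596946e-07, 7.472417e-07, 1.352327e-07, 2.307726e-07,
           4.598375e-06, 5.963164e-07, 2.035414e-07, 9.000131e-07,
           1.647716e-07, 3.290817e-07, 6.371170e-06, 8.312429e-07)

# Hypothetical helper: share of a log10 axis from `lower` to 1 covered by `values`
axis_share <- function(lower, values) {
  axis_decades <- log10(1) - log10(lower)
  data_decades <- diff(range(log10(values)))
  data_decades / axis_decades
}

axis_share(1e-20, means)   # roughly 8% of a 20-decade axis
axis_share(1e-100, means)  # a lower "zero" squeezes the data even further
log10(0)                   # -Inf: a true zero baseline is infinitely far down
```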
If you want to stick with a log scale, maybe plotting points would be a better approach:
pd <- position_dodge(.5)
ggplot(data=df, aes(x=snp,y=mean,fill=type))+
geom_errorbar(aes(ymin=mean-se, ymax=mean+se, colour=type), width=.3, position=pd) +
geom_point(aes(colour=type), position=pd) +
scale_y_log10(limits=c(1e-7, 1e-5), breaks=10^(-10:0)) +
annotation_logticks(sides="l")
I am looking for a way to connect data points from top to bottom to visualize a ranking, where the y-axis represents the rank and the x-axis the attributes. By default, the line connects the points from left to right, so the points are connected in the wrong order.
With the data below, the line should run from (6,1) to (4,2), then to (5,3), etc. Ideally, the rank scale should also be inverted so that rank 1 is at the top.
data <- read.table(header=TRUE, text='
attribute rank
1 6
2 5
3 4
4 2
5 3
6 1
7 7
8 11
9 10
10 8
11 9
')
plot(data$attribute,data$rank,type="l")
Is there a way to change the line drawing direction? My second idea would be to rotate the graph or maybe you have better ideas.
The graph I am trying to achieve is somewhat similar to this one:
[Image: example vertical line chart]
You can do this with ggplot:
library(ggplot2)
ggplot(data, aes(y = attribute, x = rank)) +
geom_line() +
coord_flip() +
scale_x_reverse()
It solves the problem exactly the way you suggested. The first part of the command (ggplot(...) + geom_line()) creates an "ordinary" line plot. Note that I have already switched x- and y-coordinates; because geom_line() connects points in order of the x variable, which is now rank, the line runs from rank 1 to rank 11. The next command (coord_flip()) flips the x- and y-axes, and the last one (scale_x_reverse) changes the ordering of the x-axis (which is plotted as the y-axis) such that 1 is in the top left corner.
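For comparison, the original base-R plot() call can also work: type = "l" (or "b") joins points in row order, so sorting the rows by rank first and reversing the y-limits gives the same top-down ranking line. A sketch using the question's data frame:

```r
data <- read.table(header = TRUE, text = '
attribute rank
1 6
2 5
3 4
4 2
5 3
6 1
7 7
8 11
9 10
10 8
11 9
')

# Base graphics connect points in row order, so sort by rank first
by_rank <- data[order(data$rank), ]

# Reversed ylim puts rank 1 at the top
plot(by_rank$attribute, by_rank$rank, type = "b",
     ylim = rev(range(by_rank$rank)),
     xlab = "attribute", ylab = "rank")
```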
Just to show you that something like the example you linked in your question can be done with ggplot2, I add the following example:
library(tidyr)
data$attribute2 <- sample(data$attribute)
data$attribute3 <- sample(data$attribute)
plot_data <- pivot_longer(data, cols = -"rank")
ggplot(plot_data, aes(y = value, x = rank, colour = name)) +
geom_line() +
geom_point() +
coord_flip() +
scale_x_reverse()
If you intend to do your plots with R, learning ggplot2 is really worthwhile. You can find many examples in the Cookbook for R.