Overlay two bar plots with geom_bar() - r

I'm trying to overlay two bar plots on top of each other, not beside.
The data is from the same dataset. I want 'Block' on the x-axis and 'Start' and 'End' as overlaying bar plots.
Block Start End
1 P1L 76.80 0.0
2 P1S 68.87 4.4
3 P2L 74.00 0.0
4 P2S 74.28 3.9
5 P3L 82.22 7.7
6 P3S 80.82 17.9
My script is
ggplot(data=NULL,aes(x=Block))+
geom_bar(data=my_data$Start,stat="identity",position ="identity",alpha=.3,fill='lightblue',color='lightblue4')+
geom_bar(data=my_data$End,stat="identity",position ="identity",alpha=.8,fill='pink',color='red')
I get Error: ggplot2 doesn't know how to deal with data of class numeric
I've also tried
ggplot(my_data,aes(x=Block,y=Start))+
geom_bar(data=my_data$End, stat="identity",position="identity",...)
Anyone know how I can make it happen? Thank you.
Edit:
How to get dodge overlaying bars?
I'm editing this post because my follow-up question is related: it is the opposite of the original problem.
@P.merkle
I had to change my plot into four bars showing the mean values of all Blocks labeled L and S. L stands for littoral and S for sublittoral. They were exposed to two treatments: normal and reduced.
I've calculated the means and their standard deviations.
I need four bars with their respective error bars:
Normal/Littoral, Reduced/Littoral, Normal/Sublittoral, Reduced/Sublittoral.
The problem is that when I plot it, the two littoral bars overlay each other, and so do the two sublittoral bars. Now I want them not to overlap.
How can I make that happen? I've tried all sorts of position = 'dodge' and position = position_dodge(newdata$Force), without luck...
My newdata contains this information:
Zonation Force N mean sd se
1 Litoral Normal 6 0.000000 0.000000 0.000000
2 Litoral Redusert 6 5.873333 3.562868 1.454535
3 Sublitoral Normal 6 7.280000 2.898903 1.183472
4 Sublitoral Redusert 6 21.461667 4.153535 1.695674
My script is this:
ggplot(data=cdata,aes(x=newdata$Force,y=newdata$mean))+
geom_bar(stat="identity",position ="dodge",
alpha=.4,fill='red', color='lightblue4',width = .6)+
geom_errorbar(aes(ymin=newdata$mean-sd,ymax=newdata$mean+sd),
width=.2, position=position_dodge(.9))
The outcome is unfortunately this
Judging by the error bars, there are clearly four bars, but they overlap. Please, how can I solve this?

If you don't need a legend, Solution 1 might work for you. It is simpler because it keeps your data in wide format.
If you need a legend, consider Solution 2. It requires your data to be converted from wide format to long format.
Solution 1: Without legend (keeping wide format)
You can refine your aesthetics specification on the level of individual geoms (here, geom_bar):
ggplot(data=my_data, aes(x=Block)) +
geom_bar(aes(y=Start), stat="identity", position ="identity", alpha=.3, fill='lightblue', color='lightblue4') +
geom_bar(aes(y=End), stat="identity", position="identity", alpha=.8, fill='pink', color='red')
Solution 2: Adding a legend (converting to long format)
To add a legend, first use reshape2::melt to convert your data frame from wide format into long format.
This gives you two columns: the variable column ("Start" vs. "End") and the value column.
Now use the variable column to define your legend:
library(reshape2)
my_data_long <- melt(my_data, id.vars = c("Block"))
ggplot(data=my_data_long, aes(x=Block, y=value, fill=variable, color=variable, alpha=variable)) +
geom_bar(stat="identity", position ="identity") +
scale_colour_manual(values=c("lightblue4", "red")) +
scale_fill_manual(values=c("lightblue", "pink")) +
scale_alpha_manual(values=c(.3, .8))
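For the follow-up question in the edit (four dodged bars with error bars), here is a minimal sketch. It assumes the summary table is stored in a data frame called newdata with the columns shown in the question. Mapping Zonation to x and Force to fill is an assumption about the intended layout. The main point is that both geoms take bare column names from the same data frame and share one position_dodge() width.

```r
library(ggplot2)

# Summary table from the question, retyped so the sketch is self-contained
newdata <- data.frame(
  Zonation = c("Litoral", "Litoral", "Sublitoral", "Sublitoral"),
  Force    = c("Normal", "Redusert", "Normal", "Redusert"),
  N        = 6,
  mean     = c(0.000000, 5.873333, 7.280000, 21.461667),
  sd       = c(0.000000, 3.562868, 2.898903, 4.153535),
  se       = c(0.000000, 1.454535, 1.183472, 1.695674)
)

# Use the same dodge object for bars and error bars so they line up
dodge <- position_dodge(width = 0.9)

# fill = Force gives ggplot the grouping it needs to dodge within each Zonation
p <- ggplot(newdata, aes(x = Zonation, y = mean, fill = Force)) +
  geom_col(position = dodge, alpha = 0.4, color = "lightblue4") +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                width = 0.2, position = dodge)
p
```

Note that column names inside aes() are written without the newdata$ prefix, so ggplot looks them up in the data frame passed to ggplot().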

Related

Bar chart - bars jumped to y-axis

I was plotting a bar chart with the code which worked perfectly well until some of the data had a value of 0.
barwidth = 0.35
df1:
norms_number R2.c
1 0.011
2 0
3 0.015
4 0.011
5 0
6 0.012
df2:
norms_number R2.c
1 0.001
2 0
3 0.012
4 0.006
5 0
6 0.004
test <- ggplot()+
geom_bar(data=df1, aes(x=norms_number, y=R2.c),stat="identity", position="dodge", width = barwidth)+
geom_bar(data=df2, aes(x=norms_number+barwidth+0.03, y=R2.c),
stat="identity", position="dodge",width = barwidth)
my result was:
and I got a warning that position stack requires non-overlapping x intervals (but they are not overlapping?)
I looked into it and changed the DV to a factor (from numeric), which half helped, because now the graph looks like this:
Why are the bars on the y-axis? How else can I get around this weird error with values of 0?
First of all, you are intending to plot a bar chart where the heights are represented by a value rather than by number of cases. See here for more details, but you should be using geom_col instead of geom_bar.
That said, the warning and the resulting plot come from x=norms_number+barwidth+0.03: it looks like you are trying to specify the exact position of the second set of data (df2) relative to the first set (df1).
In order for ggplot to dodge, it has to know what to use as the basis for the dodge; it then separates (or "dodges") the observations that share the same x= aesthetic according to that grouping. Under normal circumstances, you would specify something like fill= in aes(), and ggplot is smart enough to know that whatever you set as fill= will also be the basis for position='dodge'. In the absence of that (or if you wanted to override it), you would need to specify a group= aesthetic to be used for dodging.
Ultimately, this means that you need to combine your datasets and provide ggplot a way of deciding how to dodge. This makes sense, since both of your dataframes are intended to be placed in the same plot, and both have identical x and y aesthetics. If you leave them as separate dataframes, you can overplot them in the same plot, but there is no good way to have ggplot use position='dodge', because it needs to see all the data in the geom_col call in order to know what to use as the basis for the dodge.
With all that being said, here's what I would recommend:
# combine datasets, but first make a marker called "origin"
# this will be used as a basis for the dodge and fill aesthetics
df1$origin <- 'df1'
df2$origin <- 'df2'
df <- rbind(df1, df2)
# need to change norms_number to a factor to allow for discrete axis
df$norms_number <- as.factor(df$norms_number)
You then use only one call to geom_col to get your plot. In the first case, I will use only the group= aesthetic to show you how ggplot uses this for the dodge mechanism:
ggplot(df, aes(x=norms_number, y=R2.c)) +
geom_col(position='dodge', width=0.35, aes(group=origin), color='black')
As mentioned, you can also just supply a fill= aesthetic, and ggplot will know to use that as the mechanism for dodging:
ggplot(df, aes(x=norms_number, y=R2.c)) +
geom_col(position='dodge', width=0.35, aes(fill=origin), color='black')
I'm not sure whether you are trying to draw something more complicated, like one bar over another. If so, one way is to use geom_rect() and set the rectangle positions yourself:
ggplot()+
geom_rect(data=df1,
aes(xmin=norms_number-barwidth,xmax=norms_number,
ymin=0,ymax=R2.c))+
geom_rect(data=df2,
aes(xmin=norms_number,xmax=norms_number+barwidth,
ymin=0,ymax=R2.c))+
scale_x_continuous(breaks=1:6)

Log scale on bar plot breaks axis values [duplicate]

I'm making the following bar plot with ggplot:
df %>% ggplot( aes(x= group,y= cases,fill=color ) ) +
geom_bar(stat="identity") +
theme_minimal()
Which gives the following result:
The issue is that the smaller colors are not visible, hence I tried to use a log scale:
df %>% ggplot( aes(x= group,y= cases,fill=color ) ) +
geom_bar(stat="identity") +
scale_y_log10(labels = comma) +
theme_minimal()
But this completely broke the scales: now I'm getting a 10 MM value from nowhere and the bar sizes are wrong.
The data I'm using for this is the following:
index,group,color,cases
1,4,4,9
2,4,3,61
3,1,1,5000
4,4,2,138
5,4,1,246
6,3,1,359
7,2,1,2000
8,3,2,57
9,1,2,153
10,2,2,130
11,2,3,15
12,1,3,23
13,3,3,11
14,2,4,1
TL;DR: You cannot and should not use a log scale with a stacked barplot. If you want to use a log scale, use a "dodged" barplot instead. You'll also have better luck using geom_col instead of geom_bar here and setting your fill= variable as a factor.
Geom_col vs. geom_bar
Try using geom_col in place of geom_bar. You can use coord_flip() if the direction is not to your liking. See here for reference, but the gist of the issue is that geom_bar should be used when you want to plot against "count", and geom_col should be used when you want to plot against "values". Here, your y-axis is "cases" (a value), so use geom_col.
The Problem with log scales and Stacked Barplots
With that being said, u/Dave2e is absolutely correct. The plot you are getting makes sense, because the underlying math being done to calculate the y-axis values is: log10(x) + log10(y) + log10(z) instead of what you expected, which was log10(x + y + z).
Let's use the numbers in your actual data frame for comparison here. In "group 1", you have the following:
index group color cases
3 1 1 5000
9 1 2 153
12 1 3 23
So on the y-axis what's happening is the total value of a stacked barplot (without a log scale) will be the sum of all. In other words:
> 5000 + 153 + 23
[1] 5176
This means that each of the bars represents the correct relative size, and when you add them up (or stack them up), the total size of the bar is equivalent to the total sum. Makes sense.
Now consider the same case, but for a log10 scale:
> log10(5000) + log10(153) + log10(23)
[1] 7.245389
Or, back-transformed, 10^7.245389, which is about 17.6 million. The total height of the bar is still the sum of all individual bars (because that's what a stacked barplot is), and you can still compare the relative sizes, but the sum total of the individual logs does not equal the log of the sum:
> log10(5000 + 153 + 23)
[1] 3.713994
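The arithmetic above can be checked directly. This is a small sketch that follows the answer's explanation of the stacking; the variable names are only for illustration. It shows that adding logs is the same as taking the log of the product, which is where the value in the tens of millions comes from.

```r
cases <- c(5000, 153, 23)  # group 1 from the question's data

sum_of_logs <- sum(log10(cases))  # what stacking adds up on a log10 axis
log_of_sum  <- log10(sum(cases))  # what one would expect the bar top to be

# Back-transform both to the original units
10^sum_of_logs  # equals prod(cases) = 5000 * 153 * 23 = 17595000
10^log_of_sum   # equals sum(cases)  = 5176
```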
Suggested Way to Change your Plot
Moral of the story: you can still use a log scale to "stretch out" the small bars, but don't stack them. Use position='dodge':
df %>% ggplot( aes(x= group,y= log10(cases),fill=as.factor(color) ) ) +
geom_col(position='dodge') +
theme_minimal()
Finally, position='dodge' (or position=position_dodge(width=...)) does not work with fill=color, since df$color is not a factor (it's numeric). This is also why your legend is showing a gradient for a categorical variable. That's why I used as.factor(color) in the ggplot call here, although you can also just apply that to the original dataset with df$color <- as.factor(df$color) and do the same thing.

How to add legend to plot with data from multiple data frames

I have scripted a ggplot compiled from two separate data frames, but as it stands there is no legend as the colours aren't included in aes. I'd prefer to keep the two datasets separate if possible, but can't figure out how to add the legend. Any thoughts?
I've tried adding the colours directly to the aes function, but then colours are just added as variables and listed in the legend instead of colouring the actual data.
Plotting this with base r, after creating the plot I would've used:
legend("top",c("Delta 18O","Delta 13C"),fill=c("red","blue"))
and gotten what I needed, but I'm not sure how to replicate this in ggplot.
The following code currently plots exactly what I want, it's just missing the legend... which ideally should match what the above line would produce, except the "18" and "13" need to be superscripted.
Examples of an old plot using base r (with a correct legend, except lacking superscripted 13 and 18) and the current plot missing the legend can be found here:
Old: https://imgur.com/xgd9e9C
New, missing legend: https://imgur.com/eGRhUzf
Background data
head(avar.data.x)
time av error
1 1.015223 0.030233604 0.003726832
2 2.030445 0.014819145 0.005270609
3 3.045668 0.010054801 0.006455241
4 4.060891 0.007477541 0.007453974
5 5.076113 0.006178282 0.008333912
6 6.091336 0.004949045 0.009129470
head(avar.data.y)
time av error
1 1.015223 0.06810001 0.003726832
2 2.030445 0.03408136 0.005270609
3 3.045668 0.02313839 0.006455241
4 4.060891 0.01737148 0.007453974
5 5.076113 0.01405144 0.008333912
6 6.091336 0.01172788 0.009129470
The following avarn function produces a data frame with three columns and several thousand rows (see header above). These are then graphed over time on a log/log plot.
avar.data.x <- avarn(data3$"d Intl. Std:d 13C VPDB - Value",frequency)
avar.data.y <- avarn(data3$"d Intl. Std:d 18O VPDB-CO2 - Value",frequency)
Create allan deviation plot
ggplot()+
geom_line(data=avar.data.y,aes(x=time,y=sqrt(av)),color="red")+
geom_line(data=avar.data.x,aes(x=time,y=sqrt(av)),color="blue")+
scale_x_log10()+
scale_y_log10()+
labs(x=expression(paste("Averaging Time ",tau," (seconds)")),y="Allan Deviation (per mil)")
The above plot is only missing a legend to show the name of the two plotted datasets and their respective colours. I would like the legend in the top centre of the graph.
How to superscript legend titles?:
ggplot()+
geom_line(data=avar.data.y,aes(x=time,y=sqrt(av),
color =expression(paste("Delta ",18^,"O"))))+
geom_line(data=avar.data.xmod,aes(x=time,y=sqrt(av),
color=expression(paste("Delta ",13^,"C"))))+
scale_color_manual(values = c("blue", "red"),name=NULL) +
scale_x_log10()+
scale_y_log10()+
labs(
x=expression(paste("Averaging Time ",tau," (seconds)")),
y="Allan Deviation (per mil)") +
theme(legend.position = c(0.5, 0.9))
Setting color inside aes and adding a scale_color_ function to your plot should do the trick.
ggplot()+
geom_line(data=avar.data.y,aes(x=time,y=sqrt(av), color = "a"))+
geom_line(data=avar.data.x,aes(x=time,y=sqrt(av), color="b"))+
scale_color_manual(
values = c("red", "blue"),
labels = expression(avar.data.x^2, "b")
) +
scale_x_log10()+
scale_y_log10()+
labs(
x=expression(paste("Averaging^2 Time ",tau," (seconds)")),
y="Allan Deviation (per mil)") +
theme(legend.position = c(0.5, 0.9))
You can make better use of ggplot's aesthetics by combining both data sets into one. This is particularly easy when your data frames have the same structure. Here, you could then for example use color.
This way you only need one call to geom_line, and it is easier to control the legend(s). You could even write a function to automate your labels.
Also note that white spaces in column names are not great (you're making your own life very difficult) and that you may want to think about automating your avarn calls, e.g. with lapply, which would result in a list of data frames and makes the binding of the data frames even easier.
avar.data.x <- readr::read_table("0 time av error
1 1.015223 0.030233604 0.003726832
2 2.030445 0.014819145 0.005270609
3 3.045668 0.010054801 0.006455241
4 4.060891 0.007477541 0.007453974
5 5.076113 0.006178282 0.008333912
6 6.091336 0.004949045 0.009129470")
avar.data.y <- readr::read_table("0 time av error
1 1.015223 0.06810001 0.003726832
2 2.030445 0.03408136 0.005270609
3 3.045668 0.02313839 0.006455241
4 4.060891 0.01737148 0.007453974
5 5.076113 0.01405144 0.008333912
6 6.091336 0.01172788 0.009129470")
library(tidyverse)
combine_df <- bind_rows(list(a = avar.data.x, b = avar.data.y), .id = 'ID')
# a is avar.data.x (13C, blue) and b is avar.data.y (18O, red), as in the question
ggplot(combine_df)+
geom_line(aes(x = time, y = sqrt(av), color = ID))+
scale_color_manual(values = c(a = "blue", b = "red"),
labels = expression("Delta "^13*"C", "Delta "^18*"O"))
Created on 2019-11-11 by the reprex package (v0.2.1)
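The lapply suggestion could look like the sketch below, written in base R only. fake_avarn is a hypothetical stand-in that returns a data frame with the same three columns as the question's output; it is not the real avarn, and the input numbers and series names are made up for illustration.

```r
# Hypothetical stand-in for avarn(): returns time/av/error columns.
# Swap in the real call, e.g. avarn(values, frequency).
fake_avarn <- function(values, frequency) {
  n <- 6
  data.frame(time  = seq_len(n) / frequency,
             av    = var(values) / seq_len(n),
             error = 1 / sqrt(seq_len(n)))
}

# Made-up input series, one list element per isotope
series <- list(d13C = c(1.2, 1.4, 1.1, 1.3),
               d18O = c(2.0, 2.5, 1.9, 2.2))

# One call per series, giving a named list of data frames
results <- lapply(series, fake_avarn, frequency = 1)

# Stack the data frames and record which series each row came from
combined <- do.call(rbind, Map(cbind, ID = names(results), results))
```

The ID column of combined can then be mapped to color in a single geom_line call, in the same way as the ID column created by bind_rows.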

vertical line chart - change line plotting direction to top-down in R

I am looking for a way to connect data points from top to bottom in order to visualize a ranking. The y-axis represents the rank and the x-axis the attributes. With the default settings, the line connects the points from left to right, so the points are connected in the wrong order.
With the data below, the line should go from (6,1) to (4,2), then to (5,3), and so on. Ideally the rank scale should also be inverted so that rank one is at the top.
data <- read.table(header=TRUE, text='
attribute rank
1 6
2 5
3 4
4 2
5 3
6 1
7 7
8 11
9 10
10 8
11 9
')
plot(data$attribute,data$rank,type="l")
Is there a way to change the line drawing direction? My second idea would be to rotate the graph or maybe you have better ideas.
The graph I am trying to achieve is somewhat similar to this one:
example vertical line chart
You can do this with ggplot:
library(ggplot2)
ggplot(data, aes(y = attribute, x = rank)) +
geom_line() +
coord_flip() +
scale_x_reverse()
It solves the problem exactly the way you suggested. The first part of the command (ggplot(...) + geom_line()) creates an "ordinary" line plot. Note that I have already switched x- and y-coordinates. The next command (coord_flip()) flips x- and y-axis, and the last one (scale_x_reverse) changes the ordering of the x-axis (which is plotted as the y-axis) such that 1 is in the top left corner.
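An alternative sketch, if you would rather not flip the coordinates: geom_path() connects points in the order of the rows, so sorting the data by rank first gives the same top-down line. The name ranked is only for illustration, and the data is retyped from the question so the block runs on its own.

```r
library(ggplot2)

data <- read.table(header = TRUE, text = "
attribute rank
1 6
2 5
3 4
4 2
5 3
6 1
7 7
8 11
9 10
10 8
11 9
")

# geom_path() follows row order, so sort the rows by rank first
ranked <- data[order(data$rank), ]

p <- ggplot(ranked, aes(x = attribute, y = rank)) +
  geom_path() +
  geom_point() +
  scale_y_reverse(breaks = 1:11)  # puts rank 1 at the top
p
```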
Just to show you that something like the example you linked in your question can be done with ggplot2, I add the following example:
library(tidyr)
data$attribute2 <- sample(data$attribute)
data$attribute3 <- sample(data$attribute)
plot_data <- pivot_longer(data, cols = -"rank")
ggplot(plot_data, aes(y = value, x = rank, colour = name)) +
geom_line() +
geom_point() +
coord_flip() +
scale_x_reverse()
If you intend to do your plots with R, learning ggplot2 is really worthwhile. You can find many examples on Cookbook for R.

How do you plot two vectors on x-axis and another on y-axis in ggplot2

I am trying to plot two vectors with different values, but equal length on the same graph as follows:
a<-23.33:52.33
b<-33.33:62.33
days<-1:30
df<-data.frame(a,b,days)
a b days
1 23.33 33.33 1
2 24.33 34.33 2
3 25.33 35.33 3
4 26.33 36.33 4
5 27.33 37.33 5
etc..
I am trying to use ggplot2 to plot a and b on the x-axis and the days on the y-axis. However, I can't figure out how to do it. I am able to plot them individually and combine the graphs, but I want just one graph with both a and b vectors (different colors) on x-axis and number of days on y-axis.
What I have so far:
X<-ggplot(df, aes(x=a,y=days)) + geom_line(color="red")
Y<-ggplot(df, aes(x=b,y=days)) + geom_line(color="blue")
Is there any way to define the x-axis for both a and b vectors? I have also tried using the melt long function, but got stuck afterwards.
Any help is much appreciated. Thank you
I think the best way to do it is by melting the data (as you have mentioned), especially if you are going to add more vectors. This is the code:
library(reshape2)
library(ggplot2)
a<-23:52
b<-33:62
days<-1:30
df<-data.frame(x=a,y=b,days)
df_molten=melt(df,id.vars="days")
ggplot(df_molten) + geom_line(aes(x=value,y=days,color=variable))
You can also change the colors manually via scale_color_manual.
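As a sketch of the scale_color_manual option, the block below repeats the melt step and assigns the question's colors (red for a, blue for b) by name. It keeps the original column names a and b instead of renaming them to x and y, which is a small deviation from the code above.

```r
library(reshape2)
library(ggplot2)

df <- data.frame(a = 23:52, b = 33:62, days = 1:30)

# Long format: one row per (days, variable) pair
df_molten <- melt(df, id.vars = "days")

# Naming the values ties each color to a level of `variable`
p <- ggplot(df_molten, aes(x = value, y = days, color = variable)) +
  geom_line() +
  scale_color_manual(values = c(a = "red", b = "blue"))
p
```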
A simpler solution is to use only ggplot. The following code will work in your case
a<-23.33:52.33
b<-33.33:62.33
days<-1:30
df<-data.frame(a,b,days)
ggplot(data = df)+
geom_line(aes(x = a,y = days), color = "blue")+
geom_line(aes(x = b,y = days), color = "red")
I added the colors, you might want to use them to differentiate between your variables.
