R ggplot horizontal bar chart with thousands of data points - r

I am trying to produce a bar graph with thousands of data points.
I have a size problem with ggplot.
Code:
ggplot(data = df, aes(x=extension, y=duration)) +
geom_bar(stat="identity", width=10,fill="steelblue")+
ggtitle("Chart") +
xlab("Number") +
ylab("Duration") +
theme(legend.position = "none")+
theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))+
coord_flip()
Output:
[image: chart output]
The data frame is loaded from MongoDB.
Data frame:
1 36952 7158803
2 36110 7068360
3 36080 4736043
4 36509 4726630
5 36890 4699026
6 36051 4698594
7 36783 4677233
8 36402 4672623
9 36880 4672093
10 36513 4655583
11 36522 4630962
12 36116 4628046
13 36746 4593291
....

From your sample chart I would infer that your x-axis variable (extension) is probably a factor. If it were numeric, ggplot would scale the axis correctly.
I would recommend checking the class of each column in your dataset and making sure that both are numeric.
Alternatively, you would have to come up with an appropriate scaling of your x-axis yourself.
In the first plot (made with some quick fake data that mimics yours) the flipped x-axis is a factor: ggplot tries to render every level of the factor separately, and with so many levels the labels overlap.
In the second plot extension is numeric, and ggplot scales the axis neatly.
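The check and conversion described above could look roughly like the sketch below. The data are made up (the real data frame comes from MongoDB and is not available here), and the column names extension and duration are taken from the question's aes() call.

```r
library(ggplot2)

# Fake data (an assumption): 1000 extensions stored as a factor
set.seed(42)
df <- data.frame(
  extension = as.character(sample(30000:39999, 1000)),
  duration  = round(runif(1000, 1e5, 7e6)),
  stringsAsFactors = TRUE
)

# Inspect the column classes; extension is a factor here
sapply(df, class)

# Convert via character so that the labels, not the internal
# level codes, become the numeric values
df$extension <- as.numeric(as.character(df$extension))

# With a numeric x variable ggplot picks a continuous axis
p <- ggplot(df, aes(x = extension, y = duration)) +
  geom_col(fill = "steelblue") +
  coord_flip()
```

Going through as.character() matters: calling as.numeric() directly on a factor returns the level codes rather than the original numbers.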

Related

Partially "free_y" Facet Wrap with ggplot

My goal is to produce a column graph showing different element concentrations.
The concentrations span a very wide range, so I want to customise the scale of my faceted graph into 3 groups.
That way the graphs can show the variation between samples for each element and still be comparable between elements,
so ideally I would have 3 different scales for groups 1, 2, and 3 in the graph.
This is the code used to make the graph:
ggplot(binded)+
aes(y=mean,
x=sample,
group=id)+
geom_col(aes(fill=element))+
geom_errorbar(aes(ymin = mean - sd,
ymax = mean + sd))+
facet_wrap(rang~element)+
scale_x_continuous(breaks = seq(1,15,by=1),
name = "Sample ID")+
scale_y_continuous(name="Elemental Conc. (mg/kg)",labels = comma)+
theme(legend.position = "none")
The data used is below.
If I switch the faceting to facet_wrap(rang~element, scales = "free_y"), then every panel gets its own y scale.
Is there any way to make the scales free only within each group of rang?
I suspect I'm going to have to just create 3 separate graphs.
Thanks to Danlooo for suggesting the patchwork package: creating 3 separate graphs, plus another one for the y-axis label, proved successful.
I produced several graphs with the original code, each from the data frame filtered for a different concentration range, and combined them with the following patchwork code:
p5<-(p1 | p2) / p3+ plot_layout(heights=c(1,2))
(p4+p5)+plot_layout(widths = c(1, 25))
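Since the original data are not shown, here is a rough, self-contained sketch of how p1 to p5 might be built. The toy data, the element names, and the helper plot_group() are all assumptions made for illustration; only the two patchwork lines at the end come from the answer above.

```r
library(ggplot2)
library(patchwork)

# Toy data (hypothetical): six elements in three concentration groups
set.seed(1)
binded <- expand.grid(sample = 1:5,
                      element = c("Ca", "Fe", "Zn", "Cu", "Pb", "Cd"))
binded$rang <- rep(c(1, 2, 3), each = 10)
binded$mean <- runif(nrow(binded)) * 10^(4 - binded$rang)

# Hypothetical helper: one plot per group, y scale shared within the group
plot_group <- function(g) {
  ggplot(binded[binded$rang == g, ],
         aes(x = sample, y = mean, fill = element)) +
    geom_col() +
    facet_wrap(~element) +
    labs(y = NULL) +
    theme(legend.position = "none")
}
p1 <- plot_group(1)
p2 <- plot_group(2)
p3 <- plot_group(3)

# A plot that contains nothing but the rotated y-axis label
p4 <- ggplot() +
  annotate("text", x = 0, y = 0,
           label = "Elemental Conc. (mg/kg)", angle = 90) +
  theme_void()

# Layout as in the answer
p5 <- (p1 | p2) / p3 + plot_layout(heights = c(1, 2))
combined <- (p4 + p5) + plot_layout(widths = c(1, 25))
```

Because each group is a separate ggplot with the default fixed facet scales, panels within a group share one y-axis while the three groups remain independent of each other.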

How to add legend to plot with data from multiple data frames

I have scripted a ggplot compiled from two separate data frames, but as it stands there is no legend as the colours aren't included in aes. I'd prefer to keep the two datasets separate if possible, but can't figure out how to add the legend. Any thoughts?
I've tried adding the colours directly to the aes function, but then colours are just added as variables and listed in the legend instead of colouring the actual data.
Plotting this with base r, after creating the plot I would've used:
legend("top",c("Delta 18O","Delta 13C"),fill=c("red","blue"))
and gotten what I needed, but I'm not sure how to replicate this in ggplot.
The code below currently plots exactly what I want; it's just missing the legend, which ideally should match what the above line would produce, except that the "18" and "13" need to be superscripted.
Examples of an old plot using base r (with a correct legend, except lacking superscripted 13 and 18) and the current plot missing the legend can be found here:
Old: https://imgur.com/xgd9e9C
New, missing legend: https://imgur.com/eGRhUzf
Background data
head(avar.data.x)
time av error
1 1.015223 0.030233604 0.003726832
2 2.030445 0.014819145 0.005270609
3 3.045668 0.010054801 0.006455241
4 4.060891 0.007477541 0.007453974
5 5.076113 0.006178282 0.008333912
6 6.091336 0.004949045 0.009129470
head(avar.data.y)
time av error
1 1.015223 0.06810001 0.003726832
2 2.030445 0.03408136 0.005270609
3 3.045668 0.02313839 0.006455241
4 4.060891 0.01737148 0.007453974
5 5.076113 0.01405144 0.008333912
6 6.091336 0.01172788 0.009129470
The following avarn function produces a data frame with three columns and several thousand rows (see header above). These are then graphed over time on a log/log plot.
avar.data.x <- avarn(data3$"d Intl. Std:d 13C VPDB - Value",frequency)
avar.data.y <- avarn(data3$"d Intl. Std:d 18O VPDB-CO2 - Value",frequency)
Create allan deviation plot
ggplot()+
geom_line(data=avar.data.y,aes(x=time,y=sqrt(av)),color="red")+
geom_line(data=avar.data.x,aes(x=time,y=sqrt(av)),color="blue")+
scale_x_log10()+
scale_y_log10()+
labs(x=expression(paste("Averaging Time ",tau," (seconds)")),y="Allan Deviation (per mil)")
The above plot is only missing a legend to show the name of the two plotted datasets and their respective colours. I would like the legend in the top centre of the graph.
How to superscript legend titles?:
ggplot()+
geom_line(data=avar.data.y,aes(x=time,y=sqrt(av),
color =expression(paste("Delta ",18^,"O"))))+
geom_line(data=avar.data.xmod,aes(x=time,y=sqrt(av),
color=expression(paste("Delta ",13^,"C"))))+
scale_color_manual(values = c("blue", "red"),name=NULL) +
scale_x_log10()+
scale_y_log10()+
labs(
x=expression(paste("Averaging Time ",tau," (seconds)")),
y="Allan Deviation (per mil)") +
theme(legend.position = c(0.5, 0.9))
Setting color inside aes() and adding a scale_color_ function to your plot should do the trick.
ggplot()+
geom_line(data=avar.data.y,aes(x=time,y=sqrt(av), color = "a"))+
geom_line(data=avar.data.x,aes(x=time,y=sqrt(av), color="b"))+
scale_color_manual(
name = NULL,
values = c(a = "red", b = "blue"),
labels = expression("Delta "^18*"O", "Delta "^13*"C")
) +
scale_x_log10()+
scale_y_log10()+
labs(
x=expression(paste("Averaging Time ",tau," (seconds)")),
y="Allan Deviation (per mil)") +
theme(legend.position = c(0.5, 0.9))
You can make better use of ggplot's aesthetics by combining both data sets into one. This is particularly easy when your data frames have the same structure. Here, you could then for example use color.
This way you only need one call to geom_line and it is easier to control the legend(s). You could even make some fancy function to automate your labels. etc.
Also note that white spaces in column names are not great (you're making your own life very difficult) and that you may want to think about automating your avarn calls, e.g. with lapply, which would result in a list of data frames and makes the binding of the data frames even easier.
avar.data.x <- readr::read_table("0 time av error
1 1.015223 0.030233604 0.003726832
2 2.030445 0.014819145 0.005270609
3 3.045668 0.010054801 0.006455241
4 4.060891 0.007477541 0.007453974
5 5.076113 0.006178282 0.008333912
6 6.091336 0.004949045 0.009129470")
avar.data.y <- readr::read_table("0 time av error
1 1.015223 0.06810001 0.003726832
2 2.030445 0.03408136 0.005270609
3 3.045668 0.02313839 0.006455241
4 4.060891 0.01737148 0.007453974
5 5.076113 0.01405144 0.008333912
6 6.091336 0.01172788 0.009129470")
library(tidyverse)
combine_df <- bind_rows(list(a = avar.data.x, b = avar.data.y), .id = 'ID')
ggplot(combine_df)+
geom_line(aes(x = time, y = sqrt(av), color = ID))+
# a = avar.data.x (13C, blue), b = avar.data.y (18O, red), as in the question
scale_color_manual(values = c(a = "blue", b = "red"),
labels = expression("Delta "^13*"C", "Delta "^18*"O"))
Created on 2019-11-11 by the reprex package (v0.2.1)
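The lapply suggestion above might be sketched as follows. The real avarn() and data3 are not available here, so fake_avarn(), the made-up data3, and the short column names d13C and d18O are all hypothetical stand-ins that only mimic the shape of the real output (columns time, av, error).

```r
library(dplyr)

# Hypothetical stand-in for avarn(): returns a data frame with the
# same three columns as the real function's output
fake_avarn <- function(values, frequency) {
  n <- 6
  data.frame(
    time  = seq_len(n) / frequency,
    av    = var(values) / seq_len(n),
    error = 1 / sqrt(seq_len(n))
  )
}

# Made-up measurements with space-free column names
set.seed(1)
data3 <- data.frame(d13C = rnorm(100), d18O = rnorm(100))
frequency <- 1

# One call per column; lapply keeps the column names as list names
results <- lapply(data3[c("d13C", "d18O")], fake_avarn, frequency = frequency)

# The list names end up in the ID column, ready to map to colour
combined <- bind_rows(results, .id = "ID")
```

The ID column can then be mapped to color in a single geom_line call, as in the combined-data example above.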

ggplot2: facets: different axis limits and free space

I want to display two dimensions in my data: (1) the reporting entity, in different facets, and (2) the country associated with each data point, on the x-axis. The problem is that the country dimension includes a "total", which is a lot higher than any of the individual values, so it needs its own axis limit.
My solution was to try another faceting dimension, but I could not get it to work and look good at the same time. Consider the following dummy data:
id <- c(1,1,1,1,1,1,2,2,2,2,2,2)
country <- c("US","US","UK","World","World","World","US","US","UK","World","World","World")
value <- c(150,40,100,1000,1100,1500,5,10,20,150,200,120)
# + some other dimensions
mydat <- data.frame(id,country,value)
id country value
1 1 US 150
2 1 US 40
3 1 UK 100
4 1 World 1000
5 1 World 1100
6 1 World 1500
7 2 US 5
8 2 US 10
9 2 UK 20
10 2 World 150
11 2 World 200
12 2 World 120
If I use a facet grid to display a world total, the axis limit is forced for the other countries as well:
mydat$breakdown <- mydat$country == "World"
ggplot(mydat) + aes(x=country,y=value) + geom_point() +
facet_grid(id ~ breakdown,scales = "free",space = "free_x") +
theme(strip.text.x = element_blank() , strip.background = element_blank(),
plot.margin = unit( c(0,0,0,0) , units = "lines" ) )
(The theme() call at the end just removes the additional strip.)
If I use a facet wrap, it does give me different axis limits for each plot, but then I cannot pass the space = "free_x" argument, meaning that the single column for the total will consume the same space as the entire country overview, which looks ugly for data sets with many countries:
ggplot(mydat) + aes(x=country,y=value) + geom_point() +
facet_wrap(id ~ breakdown,scales = "free")
There are several threads here which ask similar questions, but none of the answers helped me to achieve this yet.
Different axis limits per facet in ggplot2
Is it yet possible to have different axis breaks / limits for individual facets in ggplot with free scale?
Setting individual axis limits with facet_wrap and scales = "free" in ggplot2
Maybe try gridExtra::grid.arrange or cowplot::plot_grid:
lst <- split(mydat, list(mydat$breakdown, mydat$id))
plots <- lapply(seq(lst), function(x) {ggplot(lst[[x]]) +
aes(x=country,y=value) +
geom_point() +
ggtitle(names(lst)[x]) + labs(x=NULL, y=NULL)
})
do.call(gridExtra::grid.arrange,
c(plots, list(ncol=2, widths=c(2/3, 1/3)),
left="Value", bottom="country"))
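The answer also mentions cowplot::plot_grid without showing it. A minimal sketch of that variant, rebuilding the dummy data from the question so that it runs on its own, might look like this (the relative widths are the same ones used in the grid.arrange call):

```r
library(ggplot2)

# Dummy data from the question
mydat <- data.frame(
  id      = rep(1:2, each = 6),
  country = rep(c("US", "US", "UK", "World", "World", "World"), 2),
  value   = c(150, 40, 100, 1000, 1100, 1500, 5, 10, 20, 150, 200, 120)
)
mydat$breakdown <- mydat$country == "World"

# One data frame, and then one plot, per breakdown/id combination
lst <- split(mydat, list(mydat$breakdown, mydat$id))
plots <- lapply(names(lst), function(nm) {
  ggplot(lst[[nm]], aes(x = country, y = value)) +
    geom_point() +
    ggtitle(nm) +
    labs(x = NULL, y = NULL)
})

# rel_widths gives the country panels more room than the World panels
grid <- cowplot::plot_grid(plotlist = plots, ncol = 2,
                           rel_widths = c(2/3, 1/3))
```

Each plot is built independently, so every panel gets its own axis limits, while rel_widths plays the role that space = "free_x" plays in facet_grid.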

Create a grouped barplot in R using ggplot

I'm trying to create a grouped barplot using ggplot because its output is more aesthetically pleasing. I have a data frame, together, containing the values and the name of each value, but I can't manage to create the plot. The data frame is as follows:
USperReasons USperReasonsNY USuniquNegR
1 0.198343304187759 0.191304347826087 Late Flight
2 0.35987114588127 0.321739130434783 Customer Service Issue
3 0.0667280257708237 0.11304347826087 Lost Luggage
4 0.0547630004601933 0.00869565217391304 Flight Booking Problems
5 0.109065807639208 0.121739130434783 Can't Tell
6 0.00460193281178095 0 Damaged Luggage
7 0.0846755637367694 0.0782608695652174 Cancelled Flight
8 0.0455591348366314 0.0521739130434783 Bad Flight
9 0.0225494707777266 0.0347826086956522 longlines
10 0.0538426138978371 0.0782608695652174 Flight Attendant Complaints
I tried different methods, all of which gave errors; one such example is below:
ggplot(together,aes(USuniquNegR, USperReasons,USperReasonsNY))+ geom_bar(position = "dodge")
Thanks,
Alan.
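# reshape to long format, keeping column 3 (USuniquNegR) as the id variable
df <- reshape2::melt(together, 3)
# plot the already-melted data; melting it a second time would break the columns
ggplot(df,
aes(USuniquNegR, value, fill = variable)) +
geom_bar(stat = 'identity', position = 'dodge') +
coord_flip() +
theme(legend.position = 'top')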
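The key step in the code above is reshaping the data from wide to long, so that the two value columns become one value column plus a grouping column that can be mapped to fill. As an alternative sketch (not the answerer's method), the same reshape can be done with tidyr::pivot_longer. The data frame below contains only the first three rows of the question's data, with values rounded for brevity.

```r
library(tidyr)
library(ggplot2)

# First three rows of the question's data, rounded (for illustration only)
together <- data.frame(
  USperReasons   = c(0.198, 0.360, 0.067),
  USperReasonsNY = c(0.191, 0.322, 0.113),
  USuniquNegR    = c("Late Flight", "Customer Service Issue", "Lost Luggage")
)

# Wide to long: one row per reason and series
long <- pivot_longer(together,
                     cols = c(USperReasons, USperReasonsNY),
                     names_to = "variable",
                     values_to = "value")

# fill = variable creates the groups; position = "dodge" puts them side by side
p <- ggplot(long, aes(USuniquNegR, value, fill = variable)) +
  geom_col(position = "dodge") +
  coord_flip() +
  theme(legend.position = "top")
```

geom_col() is equivalent to geom_bar(stat = "identity"), so the resulting chart has the same structure as the one produced by the melt-based code.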

R - Secondary y-axis line not showing up when using gtable_add_grob

I'm trying to overlay precipitation data over water quality data I've been collecting. I've made the water quality data and precipitation plots separately and am now trying to combine them using gtable_add_grob (a la http://rpubs.com/kohske/dual_axis_in_ggplot2). I've got the plot almost finished and looking good, but am running into a problem with the secondary y-axis not displaying. My code is as follows (for example):
y=(1:12)
y2=(12:1)
x=seq(as.Date("2014-01-01"), as.Date("2014-12-31"), by="months")
df=data.frame(x,y)
df2=data.frame(x,y2)
#plot1
g<-ggplot(df,aes(x,y))
g<-g+geom_bar(stat="identity",alpha=0.4)
g<-g+scale_y_reverse()
g<-g+theme(panel.grid = element_blank())
g<-g+theme(panel.background = element_blank())
g<-g+scale_x_date(labels = date_format("%b-%y"),breaks = date_breaks("months"))
g<-g+theme(axis.text.x = element_text(angle=45,hjust=1,color="black"))
g<-g+theme(axis.text.y = element_text(color="black"))
g<-g+theme(panel.grid = element_blank())
g<-g+theme(axis.line=element_line(colour="black"))
#print(g) #looks fine with axes lines
#plot2
g2<-ggplot(df,aes(x,y))
g2<-g2+geom_line()
g2<-g2+theme(panel.grid = element_blank())
g2<-g2+theme(panel.background = element_blank())
g2<-g2+scale_x_date(labels = date_format("%b-%y"),breaks = date_breaks("months"))
g2<-g2+theme(axis.text.x = element_text(angle=45,hjust=1,color="black"))
g2<-g2+theme(axis.text.y = element_text(color="black"))
g2<-g2+theme(panel.grid = element_blank())
g2<-g2+theme(axis.line=element_line(colour="black"))
#print(g2) #looks fine with axes lines
#combining them
gnew1<-ggplot_gtable(ggplot_build(g))
gnew2<-ggplot_gtable(ggplot_build(g2))
gg<-c(subset(gnew1$layout,name=="panel",se=t:r))
gnew<-gtable_add_grob(gnew2,gnew1$grobs[[which(gnew1$layout$name=="panel")]],pp$t,pp$l,pp$b,pp$l)
#attempted secondary axis
ia<-which(gnew1$layout$name=="axis-l")
ga<-gnew1$grobs[[ia]]
ax<-ga$children[[2]]
ax$widths<-rev(ax$widths)
ax$grobs<-rev(ax$grobs)
ax$grobs[[1]]$x<-ax$grobs[[1]]$x - unit(1,"npc") + unit(0.15, "cm")
gnew<-gtable_add_cols(gnew,p1$widths[p1$layout[ia, ]$l], length(g1$widths) - 1)
gnew<-gtable_add_grob(gnew, ax, pp$t,length(gnew$widths)-1)
grid.draw(gnew)
This gives me a plot that looks like so:
My problem is that I want the secondary y-axis line to show up as well - you can see it's missing here. My original suspicion was that it had something to do with my making the panels grids and backgrounds blank for both graphs, but the axes lines on the independent graphs plot fine after use of
axis.line=element_line(colour="black")
Additionally, I need the clear backgrounds for the way this data will be displayed (so if it is this, is there a work around?). I went through the combining graphs portion of the code step by step and it seems to be working as intended. My output for the combined graph is
> gnew
TableGrob (6 x 6) "layout": 10 grobs
z cells name grob
1 0 (1-6,1-6) background rect[plot.background.rect.1102]
2 3 (3-3,3-3) axis-l absoluteGrob[GRID.absoluteGrob.1094]
3 1 (4-4,3-3) spacer zeroGrob[NULL]
4 2 (3-3,4-4) panel gTree[GRID.gTree.1080]
5 4 (4-4,4-4) axis-b absoluteGrob[GRID.absoluteGrob.1087]
6 5 (5-5,4-4) xlab text[axis.title.x.text.1096]
7 6 (3-3,2-2) ylab text[axis.title.y.text.1098]
8 7 (2-2,4-4) title text[plot.title.text.1100]
9 8 (3-3,4-4) layout gTree[GRID.gTree.1048]
10 9 (3-3,5-5) layout gtable[axis]
This is similar to the output of the combined graph from my own data. Any thoughts on why the secondary y-axis line will not display?
So I found a work-around for the right y-axis line when combining plots, in case anyone looks at this later and is curious.
Using the gridExtra package, I drew a border along the right side of the first plot with borderGrob (see http://rgm3.lab.nig.ac.jp/RGM/R_rdfile?f=gridExtra/man/borderGrob.Rd&d=R_CC). The first plot may look a little "silly" when plotted on its own after applying the border, but my goal was to combine the plots, so how the first plot looks on its own doesn't really concern me. I also noticed a few typos in the example code, where I hadn't fully adapted what I copied and pasted from the code for my specific work, so sorry if anyone was trying to help and couldn't reproduce the example! The corrected code is as follows:
##apologies for not adding these in the question
library(ggplot2)
library(scales)
library(gtable)
library(gridExtra)
y=(1:12)
y2=(12:1)
x=seq(as.Date("2014-01-01"), as.Date("2014-12-31"), by="months")
df=data.frame(x,y)
df2=data.frame(x,y2)
#plot1
g<-ggplot(df,aes(x,y))
g<-g+geom_bar(stat="identity",alpha=0.4)
g<-g+scale_y_reverse()
g<-g+theme(panel.grid = element_blank())
g<-g+theme(panel.background = element_blank())
g<-g+scale_x_date(labels = date_format("%b-%y"),breaks = date_breaks("months"))
g<-g+theme(axis.text.x = element_text(angle=45,hjust=1,color="black"))
g<-g+theme(axis.text.y = element_text(color="black"))
##creating a border on the right side (type=3) ##make sure colour is spelled with a u!
gg<-borderGrob(type=3,colour="black",lwd=1)
##adding the new border
g<-g+annotation_custom(gg)
g<-g+theme(axis.line=element_line(colour="black"))
#print(g) ##now plotted with 3 axis lines (left, bottom, right)
#plot2
g2<-ggplot(df2,aes(x,y2)) ##second plot uses df2 and y2
g2<-g2+geom_line()
g2<-g2+theme(panel.grid = element_blank())
g2<-g2+theme(panel.background = element_blank())
g2<-g2+scale_x_date(labels = date_format("%b-%y"),breaks = date_breaks("months"))
g2<-g2+theme(axis.text.x = element_text(angle=45,hjust=1,color="black"))
g2<-g2+theme(axis.text.y = element_text(color="black"))
g2<-g2+theme(panel.grid = element_blank())
g2<-g2+theme(axis.line=element_line(colour="black"))
#print(g2)
#combining them
gnew1<-ggplot_gtable(ggplot_build(g))
gnew2<-ggplot_gtable(ggplot_build(g2))
gg<-c(subset(gnew1$layout,name=="panel",se=t:r))
gnew<-gtable_add_grob(gnew2,gnew1$grobs[[which(gnew1$layout$name=="panel")]],gg$t,gg$l,gg$b,gg$l) ##fixed pp->gg
#extracting the axis from plot1
ia<-which(gnew1$layout$name=="axis-l")
ga<-gnew1$grobs[[ia]]
ax<-ga$children[[2]]
ax$widths<-rev(ax$widths)
ax$grobs<-rev(ax$grobs)
ax$grobs[[1]]$x<-ax$grobs[[1]]$x - unit(1,"npc") + unit(0.15, "cm")
gnew<-gtable_add_cols(gnew,gnew1$widths[gnew1$layout[ia, ]$l], length(gnew2$widths) - 1) ##fixed p1->gnew1 (twice) ##fixed g1->gnew2
gnew<-gtable_add_grob(gnew, ax, gg$t,length(gnew$widths)-1) ##fixed pp->gg
grid.draw(gnew)
This produces the combined plot with the right-hand axis line.
