ggplot2: plotting error bars for groups without overlap

I wish to show the effect of two pollutants on the same outcome, and I was happy with the plot when there were no groups. Now that I want to plot the same data for the whole year and stratified by season, I either get overlapping error bars or three separate panels, neither of which suits my needs.
Sample data can be accessed here:
https://drive.google.com/file/d/0B_4NdfcEvU7LV2RrMjVyUmpoSDg/edit?usp=sharing
As an example, the following code creates a plot for the all-year data:
library(ggplot2)
# Keep only the all-year estimates
ally <- subset(df, seas == "allyear")
ggplot(ally, aes(x = set, y = pinc, ymin = lcinc, ymax = ucinc, color = pair, shape = pair)) +
  # Same dodge width for points and ranges so each pollutant gets its own slot
  geom_point(position = position_dodge(width = 0.5), size = 2.5) +
  geom_linerange(position = position_dodge(width = 0.5), size = 0.5) +
  theme_bw() +
  geom_hline(yintercept = 0) +
  labs(colour = "Pollutant", shape = "Pollutant", y = "Percent Increase", x = "") +
  # Typeset the pollutant names with subscripts on the x axis
  scale_x_discrete(labels = c(NO2 = expression(NO[2]),
                              NOx = expression(NO[x]),
                              Coarse = expression(Coarse),
                              PM25 = expression(PM[2.5]),
                              PM10 = expression(PM[10]))) +
  theme(plot.title = element_text(size = 12, face = "bold"),
        axis.title = element_text(size = 12),
        axis.text = element_text(size = 12))
But when I add facet_grid(. ~ seas), I get three separate panels. How can I display the all-year data and the seasonal data in a single panel?

One of color or shape needs to represent season rather than pollutant.
Then this should come close to what you want:
library(ggplot2)
# Colour encodes season and shape encodes pollutant; dodging separates
# every season/pollutant combination at each x position
ggplot(df, aes(x = set, y = pinc, ymin = lcinc, ymax = ucinc,
               color = seas, shape = pair)) +
  geom_point(position = position_dodge(width = 0.5), size = 2.5) +
  geom_linerange(position = position_dodge(width = 0.5), size = 0.5) +
  theme_bw() +
  geom_hline(yintercept = 0) +
  labs(colour = "Season", shape = "Pollutant", y = "Percent Increase", x = "") +
  scale_x_discrete(labels = c(NO2 = expression(NO[2]),
                              NOx = expression(NO[x]),
                              Coarse = expression(Coarse),
                              PM25 = expression(PM[2.5]),
                              PM10 = expression(PM[10]))) +
  theme(plot.title = element_text(size = 12, face = "bold"),
        axis.title = element_text(size = 12),
        axis.text = element_text(size = 12))
That said, I do think faceting gives you better graphs here.
If you want to focus attention on the comparison between seasons for each pollutant, use facet_grid(~pair, labeller = label_both).
If you want to focus attention on the comparison between pollutants for each season, use facet_grid(~seas, labeller = label_both).
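As a sketch of the first faceting option, here is a self-contained version with made-up data. The data frame only mimics the column names used in the question (set, pair, seas, pinc, lcinc, ucinc); its values, the pollutant labels "Pollutant 1"/"Pollutant 2", and the season names other than "allyear" are hypothetical. Colour now encodes season inside each pollutant panel, and points and ranges share one position_dodge() object so they stay aligned.

```r
library(ggplot2)
set.seed(1)
# Hypothetical stand-in for the question's df: same column names, made-up values
df <- expand.grid(set = c("NO2", "NOx", "Coarse", "PM25", "PM10"),
                  pair = c("Pollutant 1", "Pollutant 2"),
                  seas = c("allyear", "warm", "cold"),
                  stringsAsFactors = FALSE)
df$pinc  <- rnorm(nrow(df), mean = 1, sd = 0.5)
df$lcinc <- df$pinc - 0.8
df$ucinc <- df$pinc + 0.8
# One dodge object shared by points and ranges keeps them aligned
dodge <- position_dodge(width = 0.5)
p <- ggplot(df, aes(x = set, y = pinc, ymin = lcinc, ymax = ucinc, colour = seas)) +
  geom_hline(yintercept = 0) +
  geom_point(position = dodge, size = 2.5) +
  geom_linerange(position = dodge) +
  facet_grid(~pair, labeller = label_both) +
  labs(colour = "Season", y = "Percent Increase", x = "") +
  theme_bw()
print(p)
```

Swapping facet_grid(~pair, ...) for facet_grid(~seas, ...) and mapping colour to pair gives the second layout.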

Related

How to add percentages on top of a histogram when data is grouped

This is not my data (for confidentiality reasons), but I have tried to create a reproducible example using a dataset included in the ggplot2 package. I have a histogram summarizing the values of a variable by group (a factor with 2 levels). First, I wanted proportions of the total rather than counts, so I used this code:
library(ggplot2)
library(dplyr)
df_example <- diamonds %>% as.data.frame() %>% filter(cut=="Premium" | cut=="Ideal")
ggplot(df_example,aes(x=z,fill=cut)) +
geom_histogram(aes(y=after_stat(width*density)),binwidth=1,center=0.5,col="black") +
facet_wrap(~cut) +
scale_x_continuous(breaks=seq(0,9,by=1)) +
scale_y_continuous(labels=scales::percent_format(accuracy=2,suffix="")) +
scale_fill_manual(values=c("#CC79A7","#009E73")) +
labs(x="Depth (mm)",y="Count") +
theme_bw() + theme(legend.position="none")
It gave me one histogram of proportions per group, side by side.
The issue is that I would like to print the numeric percentages on top of the bins, and I haven't found a way to do so.
Having seen counts printed this way elsewhere, I tried printing them with stat_bin(), using the same y as in geom_histogram (and the same expression times 100 as the label), expecting it to print the right numbers:
ggplot(df_example,aes(x=z,fill=cut)) +
geom_histogram(aes(y=after_stat(width*density)),binwidth=1,center=0.5,col="black") +
stat_bin(aes(y=after_stat(width*density),label=after_stat(width*density*100)),geom="text",vjust=-.5) +
facet_wrap(~cut) +
scale_x_continuous(breaks=seq(0,9,by=1)) +
scale_y_continuous(labels=scales::percent_format(accuracy=2,suffix="")) +
scale_fill_manual(values=c("#CC79A7","#009E73")) +
labs(x="Depth (mm)",y="%") +
theme_bw() + theme(legend.position="none")
However, it prints far more values than there are bins, the values do not match the bar heights, and they do not respect vjust = -.5, which should place them slightly above the bars.
What am I missing here? I know that if there were no grouping variable/facet_wrap, I could use after_stat(count/sum(count)) instead of after_stat(width*density), and it seems that would fix my issue. But I need the histograms for both groups to appear next to each other. Thanks in advance!
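When adding your labels, you have to pass stat_bin the same binning arguments as the histogram (binwidth = 1, center = 0.5), so that both layers use the same bins and the labels line up with the bars. Without them, stat_bin falls back to its default of 30 bins, which is why you got more labels than bars: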
library(ggplot2)
library(dplyr)
df_example <- diamonds %>%
as.data.frame() %>%
filter(cut == "Premium" | cut == "Ideal")
ggplot(df_example, aes(x = z, fill = cut)) +
geom_histogram(aes(y = after_stat(width * density)),
binwidth = 1, center = 0.5, col = "black"
) +
stat_bin(
aes(
y = after_stat(width * density),
label = scales::number(after_stat(width * density), scale = 100, accuracy = 1)
),
geom = "text", binwidth = 1, center = 0.5, vjust = -.25
) +
facet_wrap(~cut) +
scale_x_continuous(breaks = seq(0, 9, by = 1)) +
scale_y_continuous(labels = scales::number_format(scale = 100)) +
scale_fill_manual(values = c("#CC79A7", "#009E73")) +
labs(x = "Depth (mm)", y = "%") +
theme_bw() +
theme(legend.position = "none")
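To confirm that the two layers really share their bins, one option is to compare their computed data with layer_data(). This sketch keeps only the histogram and label layers from the answer, drops the styling, and selects the two cuts with base subset() instead of dplyr.

```r
library(ggplot2)
# Same two cuts as in the question, selected with base R
df_example <- subset(diamonds, cut %in% c("Premium", "Ideal"))
p <- ggplot(df_example, aes(x = z, fill = cut)) +
  geom_histogram(aes(y = after_stat(width * density)),
                 binwidth = 1, center = 0.5, col = "black") +
  stat_bin(aes(y = after_stat(width * density),
               label = after_stat(round(100 * width * density))),
           geom = "text", binwidth = 1, center = 0.5, vjust = -0.25) +
  facet_wrap(~cut)
# Computed data of each layer, sorted by panel and bin centre
bars <- layer_data(p, 1)
bars <- bars[order(bars$PANEL, bars$x), ]
txt <- layer_data(p, 2)
txt <- txt[order(txt$PANEL, txt$x), ]
# With identical binwidth/center there is one label row per bar
print(c(bars = nrow(bars), labels = nrow(txt)))
```

If either binning argument is removed from stat_bin(), the row counts of the two layers no longer have to agree.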

How to rescale y axis to see proportional amounts?

I have a dataset of around 70k observations, and I want to plot them with 5 (or more) factor levels on the x axis, faceted by three severity levels.
The main problem is that most observations fall into one level (severity = 3), so I can't even read the other two. ylim doesn't help, because it changes the results completely instead of turning them into percentages.
Should I do the separation myself, or is there a function that can do it for me?
I attached an image to make my problem clearer.
I want to judge each factor level within each severity level.
Here is a sample of the code.
acc.10 <- read.csv("Accidents2010.csv")
# install.packages(c("ggplot2", "stringr"))  # run once if the packages are missing
library(ggplot2)
library(stringr)
acc.10$Road_Type <- as.factor(acc.10$Road_Type)
acc.10$X1st_Road_Class <- as.factor(acc.10$X1st_Road_Class)
ggplot(acc.10, aes(x = Road_Type )) +
geom_bar(width = 0.4) +
ggtitle("Accidents based on Road Type") +
xlab("Road Type")
ggplot(acc.10, aes(x = X1st_Road_Class)) +
geom_bar(width = 0.4) +
ggtitle("Accidents based on 1st Road Class") +
xlab("1st Road Class")
data.10 <- acc.10[which(acc.10$X1st_Road_Class == 3),]
#we will check light conditions in order to
data.10$Light_Conditions <- as.factor(data.10$Light_Conditions)
#we plot to see the distribution
ggplot(data.10, aes(x = Light_Conditions)) +
geom_bar(width = 0.5) +
ggtitle("Accidents based on Light Conditions") +
xlab("Light Conditions")
ggplot(data.10[which(as.numeric(data.10$Accident_Severity) == 3),]
, aes(x = Light_Conditions)) +
geom_bar(width = 0.5) +
ggtitle("Accidents based on Light Conditions") +
xlab("Light Conditions")
# Drill down further to see whether light conditions relate to survivability
data.10$Accident_Severity <- as.factor(data.10$Accident_Severity)
ggplot(data.10, aes(x = Light_Conditions, fill = Accident_Severity)) +
geom_bar(width = 0.5) +
ggtitle("Accidents based on Light Conditions and Survivability") +
xlab("Light Conditions")
# Facet by severity instead of using a single stacked bar chart
ggplot(data.10, aes (x = Light_Conditions)) +
geom_bar(width = 0.5) +
ggtitle("Accidents by Light Conditions, separated by severity") +
facet_wrap(~Accident_Severity) +
xlab("Light Conditions") +
ylab("Total Count")
The data file is here: https://data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data/datafile/4c03ef8d-992d-44df-8543-412d23f3661b/preview
Thanks a lot to @Peter K; his solution worked.
The y axis is not in percentages, but that does not really matter because the data are now clearly readable.
Here is the sample code:
ggplot(data.10, aes (x = Light_Conditions)) +
geom_bar(width = 0.5) +
ggtitle("Accidents by Light Conditions, separated by severity") +
facet_wrap(~Accident_Severity, scales = 'free_y') +
xlab("Light Conditions") +
ylab("Total Count")
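Setting scales = 'free_y' in facet_wrap(~Accident_Severity, scales = 'free_y') solved the problem: each severity panel now gets its own y-axis range, so the smaller groups are no longer squashed by the largest one.
The resulting plot is at https://i.imgur.com/gyXV1EZ.png (I don't have enough reputation to embed it). Thanks a lot again.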
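If percentages within each severity level are wanted instead of free scales, one option (a sketch, not what this thread used) is stat_count's computed prop variable together with group = 1, so the bars in every panel add up to 100%. The data frame below is a hypothetical stand-in for data.10: the light-condition codes and the severity proportions are made up.

```r
library(ggplot2)
set.seed(1)
# Hypothetical stand-in for data.10: made-up light-condition codes and a
# severity distribution dominated by level 3, as described in the question
acc <- data.frame(
  Light_Conditions = factor(sample(c(1, 4, 5, 6, 7), 1000, replace = TRUE)),
  Accident_Severity = factor(sample(1:3, 1000, replace = TRUE,
                                    prob = c(0.02, 0.15, 0.83)))
)
# group = 1 makes stat_count compute prop over the whole panel,
# so the bars in each severity panel sum to 100%
p <- ggplot(acc, aes(x = Light_Conditions, y = after_stat(prop), group = 1)) +
  geom_bar(width = 0.5) +
  facet_wrap(~Accident_Severity) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Light Conditions", y = "Share of accidents within severity level")
print(p)
```

Because each panel is normalised separately, the shared y axis stays readable without scales = 'free_y'.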

How to draw a table using ggplot2

I have a data frame of La Liga football matches, and I want to make a table in which each row and each column is a team and each tile shows the result of the game between the row team and the column team.
I've tried using geom_tile in many different ways, but the closest I've come is the code below:
library(dplyr)
library(engsoccerdata)
spain = as.data.frame(spain)
library(ggplot2)
game1 = filter(spain, Season == "2012")
ggplot(game1, aes(home, reorder(visitor,desc(visitor)), fill = FT)) +
geom_tile(color="white", size=1.5, stat="identity", height=1, width=1) +
scale_fill_brewer(palette = rep(c("blue","white"),30)) +
geom_text(data=game1, aes(home, visitor, label = FT), size=rel(3)) +
scale_x_discrete(position="top") + scale_y_discrete(expand = c(0, 0)) +
xlab("Home") + ylab("Visitor") +
ggtitle("Laliga 2012")
I need the rows to be colored by parity (odd rows white and even rows blue).
I also want the team names to be inside the tiles.
Overall, I want my table to look like the first photo here, but with striped rows.
Can anyone help me modify my code?
You can change the row colors by creating a new factor that is used only for fill. Consider, e.g., this:
# Position of each visitor on the reordered y axis, then 0/1 by parity
fillframe = as.numeric(reorder(game1$visitor, desc(game1$visitor)))
fillframe = as.factor(ifelse(fillframe %% 2 == 0, 1, 0))
ggplot(game1, aes(home, reorder(visitor, desc(visitor)), fill = fillframe)) +
  geom_tile(color = "white", size = 1.5, stat = "identity", height = 1, width = 1) +
  # Level "0" (odd rows) -> white, level "1" (even rows) -> light blue
  scale_fill_manual(values = c("white", "lightblue")) +
  geom_text(data = game1, aes(home, visitor, label = FT), size = rel(3)) +
  scale_x_discrete(position = "top") + scale_y_discrete(expand = c(0, 0)) +
  xlab("Home") + ylab("Visitor") +
  ggtitle("Laliga 2012") +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 315))
To put the axis labels inside the tiles, you'd have to expand the axis (since it is categorical, again by specifying additional factor levels), but at that point you'd be better off just using R Markdown, HTML, or similar.
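A rough sketch of the "additional factor level" idea: add a pseudo-column called "Team" to the x axis whose tiles carry the team names as text. The four teams and the "1-0" results are hypothetical placeholders rather than engsoccerdata values, and the diagonal (a team against itself) is simply left empty.

```r
library(ggplot2)
# Hypothetical teams and placeholder results, standing in for game1
teams <- c("Barcelona", "Real Madrid", "Valencia", "Sevilla")
games <- expand.grid(home = teams, visitor = teams, stringsAsFactors = FALSE)
games <- games[games$home != games$visitor, ]  # no team plays itself
games$FT <- "1-0"
# Extra x level "Team" whose tiles hold the team names as text
names_col <- data.frame(home = "Team", visitor = teams, FT = teams,
                        stringsAsFactors = FALSE)
tab <- rbind(names_col, games)
tab$home <- factor(tab$home, levels = c("Team", teams))
tab$visitor <- factor(tab$visitor, levels = rev(teams))  # first team on top
# Stripe by row parity: "1" for odd-numbered levels, "0" for even-numbered ones
tab$stripe <- factor(as.integer(tab$visitor) %% 2)
p <- ggplot(tab, aes(home, visitor, fill = stripe)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = FT), size = 3) +
  scale_fill_manual(values = c("white", "lightblue")) +
  scale_x_discrete(position = "top") +
  labs(x = "Home", y = NULL) +
  theme_minimal() +
  theme(legend.position = "none",
        axis.text.y = element_blank(),
        panel.grid = element_blank())
print(p)
```

This stays workable for a small table; for a full 20-team season a dedicated table tool is likely less fiddly, as noted above.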

R: how to plot a line plot with obvious distinction between different time periods (line with dots)

I have data covering 14 different time periods, and I would like to plot it so that the viewer can see where each of the 14 periods lies. I used to achieve this with different colors:
library(ggplot2)
library(RColorBrewer)  # for brewer.pal()
mycolors = c(brewer.pal(name="Set2", n = 7), brewer.pal(name="Set2", n = 7))
ggplot(derv, aes(x=Date, y=derv, colour = Season)) +
geom_point() +
geom_abline(intercept = 0, slope = 0) +
geom_abline(intercept = neg.cut, slope = 0) +
geom_abline(intercept = pos.cut, slope = 0) +
scale_color_manual(values = mycolors) + ggtitle(" Derivative")+ylab("Derivative")
I used the code above to produce such a plot, but in a new report I can only use a black-and-white scheme, so I am wondering how to produce this kind of plot in R. I have thought of using alternating line types for the 14 time periods, but I do not know how to achieve this with ggplot. I tried the following code, but it does not work: the line type stayed the same.
ggplot(derv, aes(x=Date, y=derv)) +
geom_line() +
geom_abline(intercept = 0, slope = 0) +
geom_abline(intercept = neg.cut, slope = 0) +
geom_abline(intercept = pos.cut, slope = 0) +
#scale_color_manual(values = mycolors) + ggtitle("S&P 500 (Smoothed) Derivative") + ylab("Derivative")+
scale_linetype_manual(values = c("dashed","solid","dashed","solid","dashed","solid","dashed",
"solid","dashed","solid","dashed","solid","dashed","solid"))
If you need to show where the season changes, couldn't you just use an alternating linetype or alternating point marker? Your attempt leaves the line type unchanged because linetype is never mapped inside aes(), so scale_linetype_manual() has nothing to apply its values to. See below for two examples. You can play around with different point markers and linetypes to get the look you want. For more on creating linetypes, see this SO answer. For more on additional point markers (beyond the standard ones available through pch), see, for example, here and here. I've also included a way to add the three horizontal lines with less code.
library(ggplot2)
# Fake data: 14 periods of 20 points each; group is the period index 0-13
x = seq(0, 2*pi, length.out = 20*14)
dat = data.frame(time = x, y = sin(x) + sin(5*x) + cos(2*x) + cos(7*x),
                 group = 0:(length(x)-1) %/% 20)
# Option 1: alternate the point marker (and size) between consecutive periods
ggplot(dat, aes(time, y)) +
  geom_hline(yintercept = c(-0.5, 0, 0.5), colour = "grey50") +
  geom_point(aes(shape = factor(group), size = factor(group))) +
  scale_shape_manual(values = rep(c(3, 15), 7)) +
  scale_size_manual(values = rep(c(2, 1.5), 7)) +
  theme_bw() + guides(shape = "none", size = "none")
# Option 2: alternate solid and dashed lines between consecutive periods
ggplot(dat, aes(time, y, linetype = factor(group))) +
  geom_hline(yintercept = c(-0.5, 0, 0.5), colour = "grey50") +
  geom_line(size = 0.8) +
  scale_linetype_manual(values = rep(1:2, 7)) +
  theme_bw() + guides(linetype = "none")
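Another black-and-white option, not taken from the answer above, is to draw one plain line and shade every other period with a light grey background band. This sketch rebuilds the same fake data (group is the period index 0-13); the grey level and the choice to shade the even-numbered periods are arbitrary.

```r
library(ggplot2)
# Same fake data as above: 14 periods of 20 points, group = period index 0-13
x <- seq(0, 2 * pi, length.out = 20 * 14)
dat <- data.frame(time = x,
                  y = sin(x) + sin(5 * x) + cos(2 * x) + cos(7 * x),
                  group = 0:(length(x) - 1) %/% 20)
# Start and end time of every period
bands <- data.frame(group = sort(unique(dat$group)),
                    start = as.vector(tapply(dat$time, dat$group, min)),
                    end = as.vector(tapply(dat$time, dat$group, max)))
# Shade only the even-numbered periods so shaded and unshaded ones alternate
bands <- bands[bands$group %% 2 == 0, ]
p <- ggplot(dat, aes(time, y)) +
  geom_rect(data = bands, aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf),
            inherit.aes = FALSE, fill = "grey85") +
  geom_hline(yintercept = c(-0.5, 0, 0.5), colour = "grey50") +
  geom_line() +
  theme_bw()
print(p)
```

Each band spans from the first to the last observation of its period, so there is a small unshaded gap between adjacent periods; widening start/end by half a time step would close it.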

Annotate x-axis with N in faceted plot

I'm trying to produce a boxplot of some numeric outcome broken down by treatment condition and visit number, with the number of observations in each box placed under the plot, and the visit numbers labeled as well. Here's some fake data that will serve to illustrate, and I give two examples of things I've tried that didn't quite work.
library(ggplot2)
library(plyr)
trt <- factor(rep(LETTERS[1:2],150),ordered=TRUE)
vis <- factor(c(rep(1,150),rep(2,100),rep(3,50)),ordered=TRUE)
val <- rnorm(300)
data <- data.frame(trt,vis,val)
data.sum <- ddply(data, .(vis, trt), summarise,
N=length(na.omit(val)))
mytheme <- theme_bw() + theme(panel.spacing = unit(0, "lines"), strip.background = element_blank())
The code below produces a plot that has N labels where I want them. It does this by grabbing summary data from an auxiliary dataset I created. However, I couldn't figure out how to also label visit on the x-axis (ideally below the individual box labels), or how to delineate visits visually in other ways (e.g. lines separating them into panels).
plot1 <- ggplot(data) +
geom_boxplot(aes(x=vis:trt,y=val,group=vis:trt,colour=trt), show.legend=FALSE) +
scale_x_discrete(labels=paste(data.sum$trt,data.sum$N,sep="\n")) +
labs(x="Visit") + mytheme
The plot below is closer to what I want than the one above, in that it has a nice hierarchy of treatments and visits, and a pretty format delineating the visits. However, for each panel it grabs the Ns from the first row in the summary data that matches the treatment condition, because it doesn't "know" that each facet needs to use the row corresponding to that visit.
plot2 <- ggplot(data) + geom_boxplot(aes(x=trt,y=val,group=trt,colour=trt), show.legend=FALSE) +
facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1) +
scale_x_discrete(labels=paste(data.sum$trt,data.sum$N,sep="\n")) +
labs(x="Visit") + mytheme
One workaround is to manipulate your dataset so your x variable is the interaction between trt and N.
Working off what you already have, you can add N to the original dataset via a merge.
test = merge(data, data.sum)
Then make a new variable that is the combination of trt and N.
test = transform(test, trt2 = paste(trt, N, sep = "\n"))
Now make the plot, using the new trt2 variable on the x axis and using scales = "free_x" in facet_wrap to allow for the different labels per facet.
ggplot(test) +
geom_boxplot(aes(x = trt2, y = val, group = trt, colour = trt), show.legend = FALSE) +
facet_wrap(~ vis, drop = FALSE, switch="x", nrow = 1, scales = "free_x") +
labs(x="Visit") +
mytheme
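The same workaround can be written without plyr (which is retired): base R's ave() attaches the non-missing count of each visit/treatment cell directly to the data. This sketch regenerates the question's fake data with a seed added for reproducibility, and uses strip.position = "bottom" plus strip.placement = "outside" to put the visit labels under the box labels; treat the theme settings as a starting point.

```r
library(ggplot2)
set.seed(42)  # added so the fake data are reproducible
trt <- factor(rep(LETTERS[1:2], 150), ordered = TRUE)
vis <- factor(c(rep(1, 150), rep(2, 100), rep(3, 50)), ordered = TRUE)
val <- rnorm(300)
data <- data.frame(trt, vis, val)
# Non-missing count per visit/treatment cell, repeated on every row of that cell
data$N <- ave(data$val, data$vis, data$trt, FUN = function(v) sum(!is.na(v)))
# x-axis label combining treatment and N on two lines
data$trt2 <- paste(data$trt, data$N, sep = "\n")
p <- ggplot(data, aes(x = trt2, y = val, colour = trt)) +
  geom_boxplot(show.legend = FALSE) +
  facet_wrap(~ vis, nrow = 1, scales = "free_x", strip.position = "bottom") +
  labs(x = "Visit") +
  theme_bw() +
  theme(strip.placement = "outside",
        strip.background = element_blank(),
        panel.spacing = unit(0, "lines"))
print(p)
```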
Since this functionality isn't built in, a good workaround is gridExtra:
library(gridExtra)
# Labels of the form "<trt>\n<N>", in data.sum order (visit 1 A/B, visit 2 A/B, visit 3 A/B)
lb <- paste(data.sum$trt, data.sum$N, sep = "\n")
# One boxplot per visit, each with its own pair of labels
p1 <- ggplot(data[data$vis == 1, ]) +
  geom_boxplot(aes(x = trt, y = val, group = trt, colour = trt), show.legend = FALSE) +
  scale_x_discrete(labels = lb[1:2]) +
  labs(x = "Visit 1") + mytheme
p2 <- ggplot(data[data$vis == 2, ]) +
  geom_boxplot(aes(x = trt, y = val, group = trt, colour = trt), show.legend = FALSE) +
  scale_x_discrete(labels = lb[3:4]) +
  labs(x = "Visit 2") + mytheme
p3 <- ggplot(data[data$vis == 3, ]) +
  geom_boxplot(aes(x = trt, y = val, group = trt, colour = trt), show.legend = FALSE) +
  scale_x_discrete(labels = lb[5:6]) +
  labs(x = "Visit 3") + mytheme
grid.arrange(p1, p2, p3, nrow = 1, ncol = 3)  # fully customizable
Related:
Varying axis labels formatter per facet in ggplot/R
You can also make the labels vertical or apply other transformations.
