Equal bar widths in ggplot2 histogram using facet_wrap() - r

I have data which looks similar to example data below and I am attempting to draw a histogram of the measurement column faceted on the Genotype column. Ultimately I would like the colours of the bars to be conditional on the Genotype and Condition columns.
Crucially Genotype B individuals were never measured under condition L.
This is what the data looks like:
library(ggplot2)
library(dplyr)
set.seed(123)
DF <- data.frame(Genotype = rep(c("A", "B"), 500),
Condition = sample(c("E", "L"), 1000, replace = T),
Measurment = round(rnorm(500,10,3), 0))
DF <- anti_join(DF, filter(DF, Genotype == "B" & Condition != "E"))
head(DF)
Genotype Condition Measurment
1 A L 18
2 A L 2
3 B E 18
4 B E 18
5 B E 16
6 B E 16
Now, to specify the colours of the bars, I thought it easiest to create a new column of hex codes such that all individuals of Genotype B are one colour, and individuals of Genotype A are a second colour if measured under Condition E and a third colour if measured under Condition L.
DF <- DF %>% mutate(colr = ifelse(Genotype == "B", "#409ccd",
ifelse(Condition == "E", "#43cd80", "#ffc0cb")))
I can then draw a histogram faceted on the Genotype column like so:
ggplot(data=DF, aes(Measurment, fill = Condition)) +
geom_histogram(aes(y=..count.., fill = colr), position='dodge', binwidth = 1) +
facet_wrap(~Genotype, nrow=2) +
scale_fill_manual(values = c("#409ccd","#ffc0cb","#43cd80")) +
theme(legend.position="none")
and it looks like this:
However, as you can see, the bars for Genotype B are twice as wide as those for Genotype A. How can I shrink the Genotype B bars to the same width as the Genotype A bars?
I considered adding dummy entries to my data where Genotype B has Condition L entries but the binning function then counts these as Measurements which is misleading. I also have a version of this using geom_bar() but that results in a similar problem. ggplot must have a way of doing this.
Any help appreciated.

Something like this, maybe?
ggplot(data=DF, aes(Measurment, fill = Condition)) +
geom_histogram(data=subset(DF, Genotype!="B"),aes(y=..count.., fill = colr), position='dodge', binwidth = 1) +
geom_histogram(data=subset(DF, Genotype=="B"),aes(x = Measurment, y=..count.., fill = colr), position=position_nudge(x=0.25), binwidth = 0.5) +
facet_wrap(~Genotype, nrow=2) +
scale_fill_identity() +
theme(legend.position="none")

Do you want something like the following? I assumed by size of the column you meant bar width.
library(grid)
library(gridExtra)
p1 <- ggplot(data=DF[DF$Genotype=='A',], aes(Measurment, fill = Condition)) +
geom_histogram(aes(y=..count.., fill = colr), position='dodge', binwidth = 1) +
scale_fill_manual(values = c("#43cd80","#ffc0cb")) +
theme(legend.position="none")
p2 <- ggplot(data=DF[DF$Genotype=='B',], aes(Measurment, fill = Condition)) +
geom_histogram(aes(y=..count.., fill = colr), binwidth = 0.5, boundary = 1) +
scale_fill_manual(values = c("#409ccd")) +
theme(legend.position="none")
grid.arrange(p1, p2)
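Another option, offered only as a sketch and not as something from the answers above, is to count the measurements first and draw the counts with geom_col(). If Condition is a factor with both levels, table() reports a count of 0 for Genotype B under Condition L, so every x position has two dodge slots in both facets and the bars end up the same width, without adding fake measurements. The data below is rebuilt in the spirit of the question's example (the exact draws are an assumption), and the colours are matched by name to the levels of interaction(Genotype, Condition).

```r
library(ggplot2)

set.seed(123)
# Example data similar to the question's (the exact draws are an assumption)
DF <- data.frame(Genotype   = rep(c("A", "B"), 500),
                 Condition  = sample(c("E", "L"), 1000, replace = TRUE),
                 Measurment = round(rnorm(1000, 10, 3)))
# Genotype B was never measured under Condition L
DF <- DF[!(DF$Genotype == "B" & DF$Condition == "L"), ]

# Keep both Condition levels so that table() reports explicit zero counts
DF$Condition <- factor(DF$Condition, levels = c("E", "L"))
counts <- as.data.frame(table(Genotype   = DF$Genotype,
                              Condition  = DF$Condition,
                              Measurment = DF$Measurment))
# table() turns Measurment into a factor; convert it back to numbers
counts$Measurment <- as.numeric(as.character(counts$Measurment))

# Named colours, one per Genotype/Condition combination
cols <- c("A.E" = "#43cd80", "A.L" = "#ffc0cb",
          "B.E" = "#409ccd", "B.L" = "#409ccd")

p <- ggplot(counts, aes(Measurment, Freq,
                        fill = interaction(Genotype, Condition))) +
  geom_col(position = "dodge") +   # zero-height bars still reserve a slot
  facet_wrap(~Genotype, nrow = 2) +
  scale_fill_manual(values = cols) +
  theme(legend.position = "none")
p
```

The trade-off is that the binning is done by table() on the rounded values, not by geom_histogram(), so this only suits data that is already discrete, as it is here.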

Related

Grouping box plot based on time and coloring based on categories

I am making a box plot for the data I have. Here is the data frame. I wrote the code and got a nice box plot, as in picture 1. However, I think there should be a box plot for each Land type at every time step (2, 3, 4). For example, at time step 2 in the first panel (Extremely Saline) there should be box plots for all Land types, and so on. I may be missing a grouping by time; please see picture 2. I have also tried to group them but couldn't get the graph I intended. Any help will be appreciated. Thanks
library(dplyr)
library(ggplot2)
set.seed(123)
ID = 1:5
Time = rep (c(1,2,3,4,5), each = 20)
Type = 1:25
data <- data.frame( IDn = rep(ID,20), Time, Land = rep(Type, 40), y = rnorm(100,0,1), x = runif(100,0,1))
data$Land= ifelse (data$Land > 15,"large farmers", ifelse(data$Land <=5, "small farmers", "medium-farmers"))
data<- data %>% mutate(xtype = case_when(x> 0.8~ 'Extremely Saline',
x > 0.6 & x<=0.8~ 'Severely Saline',
x > 0.5 & x<=0.6~ 'Highly Saline',
x > 0.3 & x<=0.5~ 'Moderately Saline',
x > 0.2 & x<=0.3~ 'Slightly Saline',
x <= 0.2~ 'Non saline' ))
## Box Plot
ggplot(data, aes(x=Time, y =x)) +
geom_boxplot(aes(color = Land), size = 0.5, alpha = 0.6) +
facet_wrap(~xtype, nrow = 1) + theme_bw()
#box plot grouping
ggplot(data, aes(x=Time, y =x, group=Time)) +
geom_boxplot(aes(color = Land), size = 0.5, alpha = 0.6) +
facet_wrap(~xtype, nrow = 1) + theme_bw()
Picture 2
Edit: I tried the suggested solution on my real data set, which the reproducible example here is based on. The data is somewhat large, and in the resulting graph the time values overlap. I am not sure what's happening.
Is this what you are looking for?
I moved the colour mapping into the top-level aes() and made Time a factor.
## Box Plot
ggplot(data, aes(x=factor(Time), y =x, color = Land)) +
geom_boxplot(size = 0.5, alpha = 0.6) +
facet_wrap(~xtype, nrow = 1) + theme_bw()
#box plot grouping
ggplot(data, aes(x=factor(Time), y =x, group=Time, color = Land)) +
geom_boxplot(size = 0.5, alpha = 0.6) +
facet_wrap(~xtype, nrow = 1) + theme_bw()

How can I know the code for the colours I used in a ggplot to add a vertical line with this same colour?

I have a df like this:
set.seed(123)
df <- data.frame(Delay=rep(-5:6, times=8, each=1),
ID= rep(c("A","B","C","D"), times=1, each=24),
variable=rep(c("R2","SE"), times=4, each=12),
value=c(0.3,0.4,0.51,0.58,0.64,0.78,0.68,0.63,0.54,0.45,0.32,0.22,0.78,0.68,0.59,0.55,0.47,0.35,0.28,0.41,0.50,0.58,0.63,0.73,0.3,0.4,0.51,0.58,0.64,0.78,0.68,0.63,0.54,0.45,0.32,0.22,0.78,0.68,0.59,0.55,0.47,0.35,0.28,0.41,0.50,0.58,0.63,0.73,0.3,0.4,0.51,0.58,0.64,0.78,0.68,0.63,0.54,0.45,0.32,0.22,0.78,0.68,0.59,0.55,0.47,0.35,0.28,0.41,0.50,0.58,0.63,0.73,0.3,0.4,0.51,0.58,0.64,0.78,0.68,0.63,0.54,0.45,0.32,0.22,0.78,0.68,0.59,0.55,0.47,0.35,0.28,0.41,0.50,0.58,0.63,0.73))
df$ID <- as.factor(df$ID)
df$variable <- as.factor(df$variable)
library(ggplot2)
library(ggthemes)  # theme_hc()
library(ggsci)     # scale_color_jco()
Plot<- ggplot(df[df$ID=="B",], aes(x=Delay, y=value, group=variable, colour=variable)) +
geom_point(size=1) +
geom_line () +
theme_hc() +
theme(legend.position="right") +
labs(x= '\nDelay',y=expression(R^{2})) +
guides(color=guide_legend(override.aes=list(fill=NA))) +
scale_x_continuous(breaks=seq(-5,5,1)) +
scale_color_jco()
Plot
I am plotting just data of B.
I would like to add a vertical line at the minimum value of SE and a vertical line at the maximum value of R2. I would like the lines to have the same colour as the corresponding variable. However, I don't know how to do it. The vertical lines are black, as you can see below, and I don't know how to specify the colours I used previously.
Plot <- Plot + geom_vline(xintercept = 0)
Plot
Does anyone know how to add both vertical lines using the same colours as the variables?
You don't need to find the color to instruct ggplot2 to reuse it: you can supply "new data" with your desired x-intercept lines, and identify each v-line as belonging to a particular variable to use that variable's color.
I don't have your original Plot object or call, so my colors/theme will be different.
library(ggplot2)
ggplot(df, aes(Delay, value, color = variable)) +
geom_line() +
geom_vline(aes(xintercept = Delay, color = variable),
data = data.frame(Delay = 0, variable = "R2"))
Or with multiple v-lines:
ggplot(df, aes(Delay, value, color = variable)) +
geom_line() +
geom_vline(aes(xintercept = Delay, color = variable),
data = data.frame(Delay = c(-1, 1, 2), variable = c("R2", "SE", "R2")))
This edit might answer this and your other question:
mins <- do.call(rbind, by(df, df[,c("ID", "variable")], function(z) z[which.min(z$value),]))
mins
# Delay ID variable value
# 12 6 A R2 0.22
# 36 6 B R2 0.22
# 60 6 C R2 0.22
# 84 6 D R2 0.22
# 19 1 A SE 0.28
# 43 1 B SE 0.28
# 67 1 C SE 0.28
# 91 1 D SE 0.28
ggplot(df[df$ID == "B",], aes(Delay, value, color = variable)) +
geom_line() +
geom_vline(aes(xintercept = Delay, color = variable), data = mins)
Or if you want to see multiple IDs, you can facet,
ggplot(df, aes(Delay, value, color = variable)) +
geom_line() +
geom_vline(aes(xintercept = Delay, color = variable), data = mins) +
facet_wrap("ID")
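The question asked for the maximum of R2 and the minimum of SE, whereas mins above holds the minimum of both variables. A minimal sketch of the same "new data" idea with one extreme per variable, restricted to ID B, could look like this. The names b, r2, se and extremes are made up for illustration, and the data frame is rebuilt compactly from the question's values (the 24 values repeat for each ID).

```r
library(ggplot2)

# The question's 24 values (12 for R2, then 12 for SE), repeated per ID
vals <- c(0.3, 0.4, 0.51, 0.58, 0.64, 0.78, 0.68, 0.63, 0.54, 0.45, 0.32, 0.22,
          0.78, 0.68, 0.59, 0.55, 0.47, 0.35, 0.28, 0.41, 0.50, 0.58, 0.63, 0.73)
df <- data.frame(Delay    = rep(-5:6, times = 8),
                 ID       = rep(c("A", "B", "C", "D"), each = 24),
                 variable = rep(c("R2", "SE"), times = 4, each = 12),
                 value    = rep(vals, times = 4))

b  <- df[df$ID == "B", ]
r2 <- b[b$variable == "R2", ]
se <- b[b$variable == "SE", ]

# One row per variable: where R2 peaks and where SE bottoms out
extremes <- rbind(r2[which.max(r2$value), ],
                  se[which.min(se$value), ])

# Each v-line inherits the colour of the variable named in its row
p <- ggplot(b, aes(Delay, value, colour = variable)) +
  geom_line() +
  geom_vline(aes(xintercept = Delay, colour = variable), data = extremes)
p
```

Because the lines are mapped through the same colour scale, they follow whatever palette is added later (for example scale_color_jco() from the question) without looking up any hex codes.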
I think @r2evans's approach to your specific problem is the correct one. However, to answer the more general question of how to retrieve the colours from an applied colour scale (e.g. if you want to modify a colour), you can get them without going through ggplot_build(), using the following:
Plot$scales$get_scales("colour")$palette(2)
[1] "#0073C2FF" "#EFC000FF"
So we could do:
# Get colours
my_blue <- Plot$scales$get_scales("colour")$palette(2)[1]
my_yellow <- Plot$scales$get_scales("colour")$palette(2)[2]
# Get index of max R2 and min SE
maxR2 <- which.max(df$value[df$ID == "B" & df$variable == "R2"])
minSE <- which.min(df$value[df$ID == "B" & df$variable == "SE"])
# Get value of Delay at maxR2 and minSE
D_R2 <- df$Delay[df$ID == "B" & df$variable == "R2"][maxR2]
D_SE <- df$Delay[df$ID == "B" & df$variable == "SE"][minSE]
# Plot lines at the correct positions and with the desired colours
Plot + geom_vline(aes(xintercept = D_R2), colour = my_blue) +
geom_vline(aes(xintercept = D_SE), colour = my_yellow)

Color-coded PMF with legend in ggplot2

My goal is to produce two overlapping PMFs of binomial distributions using ggplot2, color-coded according to colors that I specify, with a legend at the bottom.
So far, I think I have set up the data frame right.
successes <- c(seq(0,10,1),seq(0,10,1))
freq <- c(dbinom(seq(0,10,1),10,0.2),dbinom(seq(0,10,1),10,0.8))
class <- c(rep(' A ',11),rep(' B ',11))
df1 <- data.frame(cbind(successes,freq,class))
However, this gives the wrong result.
library(ggplot2)
g <- ggplot(df1, aes(successes),y=freq)
g + geom_bar(aes(fill = class))
I feel like I'm following an example yet getting a totally different result. This (almost) does what I want: it would be exact if it gave relative frequencies.
g <- ggplot(mpg, aes(class))
g + geom_bar(aes(fill = drv))
A couple of questions:
1) Where am I going wrong in my block of code?
2) Is there a better way to show two PMFs in one graph? I'm not determined to use a histogram or bar chart.
3) How can I set this up to give me the ability to choose the colors?
4) How do I order the values on the x-axis? They aren't categories. They are the numbers 0-10 and have a natural order that I want to preserve.
Thanks!
UPDATE
The following two blocks worked.
successes <- c(seq(0,10,1),seq(0,10,1))
freq <- c(dbinom(seq(0,10,1),10,0.2),dbinom(seq(0,10,1),10,0.8))
class <- c(rep(' A ',11),rep(' B ',11))
df1 <- data.frame(successes,freq,class)
ggplot(df1, aes(successes ,y=freq, fill = class)) +
geom_bar(stat = "identity") +
scale_x_continuous(breaks = seq(0,10,1)) +
scale_fill_manual(values = c("blue", "green")) + theme_bw()
AND
successes <- c(seq(0,10,1),seq(0,10,1))
freq <- c(dbinom(seq(0,10,1),10,0.2),dbinom(seq(0,10,1),10,0.8))
class <- c(rep(' A ',11),rep(' B ',11))
df1 <- data.frame(successes,freq,class)
ggplot(df1, aes(x=successes,y=freq)) +
geom_col(aes(fill = class)) +
scale_x_continuous(breaks = seq(0,10,1)) +
scale_fill_manual(values = c("blue", "green")) + theme_bw()
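I think your issue is that successes and freq are being changed to factors (or to character strings in R 4.0 and later) when you create df1: cbind() combines the numeric and character vectors into a single character matrix, so data.frame() never sees numeric columns.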
Maybe this is what you're thinking of?
successes <- c(seq(0,10,1),seq(0,10,1))
freq <- c(dbinom(seq(0,10,1),10,0.2),dbinom(seq(0,10,1),10,0.8))
class <- c(rep(' A ',11),rep(' B ',11))
df1 <- data.frame(successes = as.numeric(successes), freq = as.numeric(freq), class)
ggplot(df1, aes(x = successes, y = freq)) +
geom_bar(stat = "identity", aes(fill = class))
If not, happy to answer any further questions!
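To make the coercion concrete, here is a small base R check. It is only an illustration; the names m, df_bad and df_good are made up for it.

```r
successes <- 0:10
freq      <- dbinom(0:10, 10, 0.2)
class     <- rep("A", 11)

# cbind() on mixed numeric/character vectors yields a character matrix
m <- cbind(successes, freq, class)
is.character(m)

# Built from that matrix, no column is numeric any more
df_bad <- data.frame(m)
sapply(df_bad, is.numeric)

# Passing the vectors directly keeps their types
df_good <- data.frame(successes, freq, class)
sapply(df_good, is.numeric)
```

With df_good, successes and freq stay numeric, so the x-axis keeps its natural 0 to 10 order and the bar heights are real probabilities.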
Is this what you're looking for?
library(ggplot2)
g <- ggplot(df1, aes(successes ,y=freq, fill = class))
g + geom_bar(stat = "identity") +
scale_fill_manual(values = c("blue", "green"))
Of course, keeping in mind you'd indeed change your dataframe creation to:
successes <- c(seq(0,10,1),seq(0,10,1))
freq <- c(dbinom(seq(0,10,1),10,0.2),dbinom(seq(0,10,1),10,0.8))
class <- c(rep(' A ',11),rep(' B ',11))
df1 <- data.frame(successes,freq,class)
as suggested in the comments.

How to create two barplots with different x and y axis in the same plot in R?

I need to plot two grouped bar plots from two data frames that have different numbers of rows (6 and 5).
I tried many approaches in R, but I don't know how to fix it.
Here are my data frames. The Freq column must be on the y axis and the inter and intra columns must be on the x axis.
> freqinter
inter Freq
1 0.293040975264367 17
2 0.296736775990729 2
3 0.297619926364764 4
4 0.587377012109561 1
5 0.595245125315916 4
6 0.597022018595893 2
> freqintra
intra Freq
1 0 3
2 0.293040975264367 15
3 0.597022018595893 4
4 0.598809552335782 2
5 0.898227748764939 6
I expect to plot both bar plots in the same plot and to distinguish the inter and intra values by colour.
I want a picture like this one:
You probably want a histogram. Use the raw data if possible. For example:
library(tidyverse)
freqinter <- data.frame(x = c(
0.293040975264367,
0.296736775990729,
0.297619926364764,
0.587377012109561,
0.595245125315916,
0.597022018595893), Freq = c(17,2,4,1,4,2))
freqintra <- data.frame(x = c(
0 ,
0.293040975264367,
0.597022018595893,
0.598809552335782,
0.898227748764939), Freq = c(3,15,4,2,6))
df <- bind_rows(freqinter, freqintra, .id = "id") %>%
uncount(Freq)
ggplot(df, aes(x, fill = id)) +
geom_histogram(binwidth = 0.1, position = 'dodge', col = 1) +
scale_fill_grey() +
theme_minimal()
With the data you posted I don't think you can have this graph to look good. You can't have bars thin enough to differentiate 0.293 and 0.296 when your data ranges from 0 to 0.9.
Maybe you could try to treat it as a factor just to illustrate what you want to do:
freqinter <- data.frame(x = c(
0.293040975264367,
0.296736775990729,
0.297619926364764,
0.587377012109561,
0.595245125315916,
0.597022018595893), Freq = c(17,2,4,1,4,2))
freqintra <- data.frame(x = c(
0 ,
0.293040975264367,
0.597022018595893,
0.598809552335782,
0.898227748764939), Freq = c(3,15,4,2,6))
df <- bind_rows(freqinter, freqintra, .id = "id")
ggplot(df, aes(x = as.factor(x), y = Freq, fill = id)) +
geom_bar(stat = "identity", position = position_dodge2(preserve = "single")) +
theme(axis.text.x = element_text(angle = 90)) +
scale_fill_discrete(labels = c("inter", "intra"))
You can also check the problem by not treating your x variable as a factor:
ggplot(df, aes(x = x, y = Freq, fill = id)) +
geom_bar(stat = "identity", width = 0.05, position = "dodge") +
theme(axis.text.x = element_text(angle = 90)) +
scale_fill_discrete(labels = c("inter", "intra"))
Either the bars must be very thin (small width), or you'll get overlapping x intervals breaking the plot.

Side-by-side stacked bar charts with two color scales in ggplot2

I have a data set where I need to represent a stacked bar chart for two cohorts over three time periods. Currently, I am faceting by year and filling based on probability values for my DV (the number of times, t, that someone goes to a nursing home; the probability that t=0, t=1, ... t >= 5). I am trying to figure out if it is possible to introduce another color scale, so that each of the "Comparison" bars would be filled with a yellow gradient and the "Treatment" bars would be filled with a blue gradient. I figure the best way to do this may be to overlay the two plots, but I'm not sure if it is possible to do this in ggplot (or some other package). Code and screenshot are below:
library(ggplot2)
library(scales)  # percent_format()
tempPlot <- ggplot(tempDF,aes(x = HBPCI, y = margin, fill=factor(prob))) +
scale_x_continuous(breaks=c(0,1), labels=c("Comparison", "Treatment"))+
scale_y_continuous(labels = percent_format())+
ylab("Prob snf= x")+
xlab("Program Year")+
ggtitle(tempFlag)+
geom_bar(stat="identity")+
scale_fill_brewer(palette = "Blues")+ #can change the color scheme here.
theme(axis.title.y =element_text(vjust=1.5, size=11))+
theme(axis.title.x =element_text(vjust=0.1, size=11))+
theme(axis.text.x = element_text(size=10,angle=-45,hjust=.5,vjust=.5))+
theme(axis.text.y = element_text(size=10,angle=0,hjust=1,vjust=0))+
facet_grid(~yearQual, scales="fixed")
You may want to consider using interaction() -- here's a reproducible solution:
year <- c("BP", "PY1", "PY2")
type <- c("comparison", "treatment")
df <- data.frame(year = sample(year, 100, T),
type = sample(type, 100, T),
marg = abs(rnorm(100)),
fact = sample(1:5, 100, T))
head(df)
# year type marg fact
# 1 BP comparison 0.2794279 3
# 2 PY2 comparison 1.6776371 1
# 3 BP comparison 0.8301721 2
# 4 PY1 treatment 0.6900511 1
# 5 PY2 comparison 0.6857421 3
# 6 PY1 treatment 1.4835672 3
library(ggplot2)
blues <- RColorBrewer::brewer.pal(5, "Blues")
oranges <- RColorBrewer::brewer.pal(5, "Oranges")
ggplot(df, aes(x = type, y = marg, fill = interaction(factor(fact), type))) +
geom_bar(stat = "identity") +
facet_wrap(~ year) +
scale_fill_manual(values = c(blues, oranges))
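With unnamed values, scale_fill_manual() assigns colours in the order of the interaction levels, in which the first factor varies fastest (1.comparison, 2.comparison, ..., then 1.treatment, ...). A sketch of a variant that names each colour explicitly, so that comparison bars get the yellow gradient and treatment bars the blue one as requested in the question, might look like this. The gradient endpoints and the names grid, lv, yellows and pal are illustrative assumptions, and colorRampPalette() from base R stands in for RColorBrewer.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(year = sample(c("BP", "PY1", "PY2"), 100, TRUE),
                 type = sample(c("comparison", "treatment"), 100, TRUE),
                 marg = abs(rnorm(100)),
                 fact = sample(1:5, 100, TRUE))

# All fact/type combinations, in the order interaction() uses for its levels
grid <- expand.grid(fact = 1:5, type = c("comparison", "treatment"))
lv   <- levels(interaction(factor(grid$fact), grid$type))

# Two five-step gradients (the endpoint colours are arbitrary choices)
yellows <- colorRampPalette(c("lightyellow", "gold"))(5)
blues   <- colorRampPalette(c("lightblue", "navy"))(5)

# Named palette: each level is paired with its colour explicitly
pal <- setNames(c(yellows, blues), lv)

p <- ggplot(df, aes(x = type, y = marg,
                    fill = interaction(factor(fact), type))) +
  geom_bar(stat = "identity") +
  facet_wrap(~ year) +
  scale_fill_manual(values = pal)
p
```

Naming the values means the mapping no longer depends on level order, which helps if a combination happens to be absent from the data.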

Resources