I have a huge file, and I don't know what small test dataset would reproduce the same problem in the plot, so I'm not providing one. I've only attached the plot image to show the problem.
My code:
ggplot(tgc, aes(x=Week, y=MuFreq)) +
theme_gray(base_size=18) +
theme(plot.title=element_text(hjust=.5),
axis.title.x = element_text(face="bold"),
axis.title.y = element_text(face="bold")) +
geom_errorbar(aes(ymin=MuFreq-(1.96*se), ymax=MuFreq+(1.96*se)), width=3) +
geom_line() +
geom_point(aes(size= N), color="blue")+
scale_x_continuous(breaks=c(68,98,188), labels=c("Wk68", "Wk98", "Wk188")) +
scale_y_continuous(limits=c(0,0.15)) +
scale_size( breaks = unique(tgc$N))
The problem is that I'm sizing the dots by the sample size for each week. The middle dot does have an error bar, but the dot is covering it. I tried using a horizontal error bar, but it didn't work because my x-axis is customized to be non-numerical.
What can I do to show the error bar that's being covered?
Also is there any way to make the background vertical grid lines spaced evenly?
The Q asks to improve two things in the ggplot2 chart:
Show error bars that are being covered
Make the background vertical grid lines spaced evenly
Data
As the OP didn't supply any data, we need a dummy data set. This is easily done by reading values from the plot:
tgc <- data.frame(Week = c(68, 98, 188),
MuFreq = c(0.08, 0.09, 0.091),
se = c(0.003, 0.001, 0.019)/1.96,
N = c(91, 835, 7))
This reproduces the original plot quite nicely:
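A note on the dummy data: the division by 1.96 is there because the values read off the plot are error-bar half-widths (1.96 * se), not standard errors. A minimal check, assuming the half-widths were read correctly from the image:

```r
# Half-widths of the 95% error bars as read from the plot (approximate values).
half_width <- c(0.003, 0.001, 0.019)

# Convert to standard errors, as in the dummy data frame.
se <- half_width / 1.96

# The plotting code multiplies by 1.96 again, so the bars come back
# at the half-widths that were read from the image.
recovered <- 1.96 * se
print(all.equal(recovered, half_width))
```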
Variant 1
This one is picking up Nick Criswell's comments:
Change order in which layers are plotted, so that error bars are plotted on top
Change colour and alpha
plus
Remove all vertical grid lines except those which are explicitly specified as breaks. The distances between major grid lines are still uneven, but they reflect the difference in time
With this code
library(ggplot2)
ggplot(tgc, aes(x = Week, y = MuFreq)) +
theme_gray(base_size = 18) +
theme(plot.title = element_text(hjust = .5),
axis.title = element_text(face = "bold")) +
geom_line() +
geom_point(aes(size = N), color = "dodgerblue1", alpha = 0.5) +
geom_errorbar(aes(ymin = MuFreq - (1.96 * se),
ymax = MuFreq + (1.96 * se)), width = 3) +
scale_x_continuous(
breaks = c(68, 98, 188),
labels = c("Wk68", "Wk98", "Wk188"),
minor_breaks = NULL
) +
scale_y_continuous(limits = c(0, 0.15)) +
scale_size(breaks = unique(tgc$N))
we do get:
Variant 2
To get evenly spaced data points on the x-axis, we can turn the weeks into a factor. This requires telling ggplot2 that the data belong to one group (so that the lines are still plotted) and adding a custom x-axis label.
In addition, theme_bw is used instead of theme_gray:
library(ggplot2)
ggplot(tgc, aes(x = factor(Week, labels = c("Wk68", "Wk98", "Wk188")),
y = MuFreq, group = 1)) +
theme_bw(base_size = 18) +
theme(plot.title = element_text(hjust = .5),
axis.title = element_text(face = "bold")) +
geom_line() +
geom_point(aes(size = N), color = "dodgerblue1", alpha = 0.5) +
geom_errorbar(aes(ymin = MuFreq - (1.96 * se),
ymax = MuFreq + (1.96 * se)), width = 0.05 ) +
scale_y_continuous(limits = c(0, 0.15)) +
scale_size(breaks = unique(tgc$N)) +
xlab("Week")
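Why the factor version spaces the points evenly: a discrete axis places each level at consecutive integer positions, regardless of the underlying week numbers. A small base-R sketch of that mapping (the variable names are illustrative):

```r
week <- c(68, 98, 188)

# Turning the weeks into a factor discards the numeric distances;
# the levels are simply numbered 1, 2, 3 in sorted order.
week_f <- factor(week, labels = c("Wk68", "Wk98", "Wk188"))

positions <- as.integer(week_f)
print(data.frame(week, label = week_f, position = positions))
```

This is also why the error bar width drops from 3 to 0.05 in Variant 2: width is measured in axis units, and neighbouring points are now one unit apart instead of 30 or 90 weeks.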
Related
I have a gganimate sketch in R and I would like to have the percentages of my bar chart appear as labels.
But for some bizarre reason, I am getting seemingly random colours in place of the labels that I'm requesting.
If I run the ggplot part without animating then it's a mess (as it should be), but it's obvious that the percentages are appearing correctly.
Any ideas? The colour codes don't correspond to the colours of the bars which I have chosen separately. The codes displayed also cycle through about half a dozen different codes, at a rate different to the frame rate that I selected. And while the bars are the same height (they grow until they reach the chosen height displayed in the animation) then they display the same code until they stop and it gets frozen.
Code snippet:
df_new <- data.frame(index, rate, year, colour)
df_new$rate_label <- ifelse(round(df_new$rate, 1) %% 1 == 0,
paste0(round(df_new$rate, 1), ".0%"), paste0(round(df_new$rate, 1), "%"))
p <- ggplot(df_new, aes(x = year, y = rate, fill = year)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = colour) +
#geom_text(aes(y = rate, label = paste0(rate, "%")), vjust = -0.7) +
geom_shadowtext(aes(y = rate, label = rate_label),
bg.colour='white',
colour = 'black',
size = 9,
fontface = "bold",
vjust = -0.7,
alpha = 1
) +
coord_cartesian(clip = 'off') +
ggtitle("% population belonging to 'No religion', England and Wales census") +
theme_minimal() +
xlab("") + ylab("") +
theme(legend.position = "none") +
theme(plot.title = element_text(size = 18, face = "bold")) +
theme(axis.text = element_text(size = 14)) +
scale_y_continuous(limits = c(0, 45), breaks = 10*(0:4))
p
p <- p + transition_reveal(index) + view_follow(fixed_y = T)
animate(p, renderer = gifski_renderer(), nframes = 300, fps = frame_rate, height = 500, width = 800,
end_pause = 0)
anim_save("atheism.gif")
I think you have missed some delicate points about ggplot2, and I will try my best to describe them. First of all, discrete values need to be passed as a factor or an integer, so you can use as.factor() before plotting or just factor() inside the aesthetic. You should also round the percentages to the precision you want. Here is an example:
set.seed(2023)
df_new <- data.frame(index=1:10, rate=runif(10), year=2001:2010, colour=1:10)
df_new$rate_label <- ifelse(round(df_new$rate, 1) %% 1 == 0,
paste0(round(df_new$rate, 1), ".0%"),
paste0(round(df_new$rate, 1), "%"))
The ggplot for this data is:
library(ggplot2)
p <- ggplot(df_new, aes(x = factor(year), y = rate, fill = factor(colour))) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(y = rate, label = paste0(round(rate,2), "%")), vjust = -0.7) +
coord_cartesian(clip = 'off') +
ggtitle("% population belonging to 'No religion', England and Wales census") +
theme_minimal() +
xlab("") + ylab("") +
theme(legend.position = "none",
plot.title = element_text(size = 18, face = "bold"),
axis.text = element_text(size = 14))
p
You can also combine all theme elements in one theme() call (as I did). The output is:
And you can easily animate the plot using the following code:
library(gganimate)
p + transition_reveal(index)
And the output is as below:
Hope it helps.
So it was answered here although I don't know why the fix works.
For some reason, labels need to go into gganimate as factors, using as.factor().
I just had to add the line:
df_new$rate_label <- as.factor(df_new$rate_label)
and it works fine.
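For reference, a minimal sketch of building the label column and converting it to a factor before animating. The data values here are made up, and sprintf() is swapped in for the ifelse()/paste0() construction from the question; it always prints one decimal place, so whole numbers get the trailing ".0".

```r
# Made-up rates for illustration only (not the census figures).
df_new <- data.frame(index = 1:3, rate = c(14.8, 25, 37.2))

# One decimal place plus a percent sign, e.g. 25 becomes "25.0%".
df_new$rate_label <- sprintf("%.1f%%", df_new$rate)

# Convert the labels to a factor, which is the fix reported above.
df_new$rate_label <- as.factor(df_new$rate_label)

print(df_new)
```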
I am trying to create a circular plot showing the number of movements fish have during each hour. It works fine if the code is like this:
ggplot(aragx, aes(x = eventhour, y = Changes, group=eventhour, col=Family)) +
geom_boxplot(position=position_dodge()) +
scale_x_continuous(breaks = seq(0, 23), labels = seq(0, 23)) +
coord_polar(start = 0) + theme_minimal() +
scale_fill_brewer() + ylab("Count") +
ggtitle("Daily Section Changes per Hour") +ylim(0,20) +facet_wrap(.~Family) +
theme(legend.position = "none") + theme(axis.title.x = element_blank())
However, the 0-12 line doesn't quite run straight through the middle, the angle between 23 and 0 isn't straight, and it just doesn't look nice. So I modified scale_x_continuous as follows:
ggplot(aragx, aes(x = eventhour, y = Changes, group=eventhour, col=Family)) +
geom_boxplot(position=position_dodge()) +
scale_x_continuous(limits = c(0,24), breaks = seq(0, 23), labels = seq(0, 23)) +
coord_polar(start = 0) + theme_minimal() +
scale_fill_brewer() + ylab("Count") +
ggtitle("Daily Section Changes per Hour") +ylim(0,20) +facet_wrap(.~Family) +
theme(legend.position = "none") + theme(axis.title.x = element_blank())
This fixes the cosmetic issue, but the data from eventhour=0 is all screwed up, like so:
Does anyone know how to help me? It would be much appreciated, I've been banging my head against the wall over this small thing.
The issue is that by setting limits = c(0, 24) the parts of your box plot which range to the left of 0 (or to the right of 24) are clipped off. Hence, for the boxplot at the zero position only the whiskers and the right segment of the box are drawn.
To prevent that, you have to adjust the limits to account for the width of the boxplot, which by default is .75. Hence, to get a full boxplot at the zero position, set limits = c(-width_bp / 2, 24 - width_bp / 2). However, doing so rotates the circular plot slightly, which we can compensate for by setting start in coord_polar equal to -width_bp / 8. (Note: I found that value by trial and error, but there is surely a reason why it has to be one eighth. Sigh, I was always better at algebra than at geometry. (; )
Using some random fake example data:
library(ggplot2)
aragx <- data.frame(
eventhour = rep(0:23, 100),
Changes = runif(24 * 100)
)
width_bp <- .75
ggplot(aragx, aes(x = eventhour, y = Changes, group = eventhour)) +
geom_boxplot(position = position_dodge()) +
scale_x_continuous(
limits = c(-width_bp / 2, 24 - width_bp / 2),
breaks = seq(0, 23), labels = seq(0, 23)
) +
coord_polar(start = -width_bp / 8) +
theme_minimal() +
scale_fill_brewer() +
ylab("Count") +
ggtitle("Daily Section Changes per Hour") +
theme(legend.position = "none") +
theme(axis.title.x = element_blank())
Created on 2022-02-03 by the reprex package (v2.0.1)
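The "one eighth" found by trial and error can be reasoned out, assuming coord_polar() maps the x limits linearly onto the full circle and takes start in radians (which is how its documentation describes it). Shifting the lower limit by width_bp / 2 moves hour 0 away from 12 o'clock by that fraction of the 24-unit range, so the compensating offset would be width_bp * pi / 24, which is close to width_bp / 8:

```r
width_bp <- 0.75
x_range  <- 24  # the limits span 24 hours in total

# Fraction of the circle by which hour 0 is shifted, converted to radians.
start_exact  <- -(width_bp / 2) / x_range * 2 * pi
start_eighth <- -width_bp / 8

print(c(exact = start_exact, eighth = start_eighth))
```

The two values differ by well under one degree, which would explain why the trial-and-error value looks right.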
I am making a set of scorecards where I am generating a set of graphs that show the distribution of responses from a survey and also where the response for a specific company falls. I need to modify the formatting of a graph, a stacked barchart, and add a few features I’ve outlined below. I’ve already spent a few hours getting my chart to where it is now and would appreciate your help with the features I outline below.
Data is
Data<-data.frame(Reviewed = c("Annually", "Annually", "Hourly", "Monthly", "Weekly","Monthly","Weekly","Other","Other","Monthly","Weekly"),Company=c("a","b","c","d","e","f","g","h","i","j","k"),Question="Q1")
So far I’ve developed this
ggplot(Data, aes(x="Question", fill=Reviewed)) + geom_bar(position='fill' ) +
coord_flip()
I would like to do the following:
Order the variables so they are arranged on plot as follows: Annually,Monthly,Weekly,Hourly,Other
Express the y axis in terms of percent. I.e. 0.25 turns into 25%
Move y-axis directly underneath the bar.
Remove the legend but move the terms underneath the respective part of the graph on a diagonal slant.
Add a black line that cuts down the 50% mark
Add a dot in at the midpoint of the stack for the value of company “e”.
Remove gray background
This is what I'm hoping the finished graph will look like.
There's a lot to unpack here, so I'll break it down bit by bit:
Order the variables so they are arranged on plot as follows: Annually,Monthly,Weekly,Hourly,Other
Assign "Reviewed" as an ordered factor. I'm reversing the order here since it wants to plot the "lowest" factor first (to the left).
Data$Reviewed <- factor(Data$Reviewed,
levels = rev(c('Annually', 'Monthly', 'Weekly', 'Hourly', 'Other')),
ordered = T)
ggplot(Data, aes(x="Question", fill=Reviewed)) + geom_bar(position='fill' ) +
coord_flip()
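To see what the reversed level order looks like, here is a quick check in base R (same Reviewed values as in the question; the variable names are illustrative):

```r
reviewed <- c("Annually", "Annually", "Hourly", "Monthly", "Weekly", "Monthly",
              "Weekly", "Other", "Other", "Monthly", "Weekly")

wanted <- c("Annually", "Monthly", "Weekly", "Hourly", "Other")
reviewed_f <- factor(reviewed, levels = rev(wanted), ordered = TRUE)

# Levels are stored in reversed order: "Other" first, "Annually" last.
print(levels(reviewed_f))

# Counts per level, in level order.
print(table(reviewed_f))
```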
Express the y axis in terms of percent. I.e. 0.25 turns into 25%
Use scale_y_continuous(labels = scales::percent) to adjust the labels. The scales package is a dependency of ggplot2, so it should already have been installed along with it.
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
scale_y_continuous(labels = scales::percent) +
coord_flip()
Move y-axis directly underneath the bar.
Remove gray background
These are done all at once by adding expand = F to coord_flip.
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
scale_y_continuous(labels = scales::percent) +
coord_flip(expand = F)
Remove the legend...
Add theme(legend.position = 'none').
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
scale_y_continuous(labels = scales::percent) +
coord_flip(expand = F) +
theme(legend.position = 'none')
but move the terms underneath the respective part of the graph on a diagonal slant.
This is tougher and takes a good amount of fiddling.
Use geom_text to make the labels
Calculate the position along the bar using the 'count' stat
Move the labels to the bottom of the plot by providing a fake x coordinate
Align the labels in the center of the bars using position_stack, and make them abut the x axis using hjust.
Add angle.
Use clip = 'off' in coord_flip to make sure that these values are not cut out since they're outside the plotting area.
Fiddle with the x limits to crop out empty plotting area.
Adjust the plot margin in theme to make sure everything can be seen.
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
geom_text(aes(label = Reviewed, x = 0.45,
y = stat(..count../sum(..count..))), stat = 'count',
position = position_stack(0.5),
hjust = 0,
angle = 45) +
scale_y_continuous(labels = scales::percent) +
coord_flip(xlim = c(0.555, 1.4), clip = 'off',expand = F) +
theme(plot.margin = margin(0, 0, 35, 10),
legend.position = 'none')
Add a black line that cuts down the 50% mark
Use geom_hline(yintercept = 0.5); remember that it's a "horizontal" line since the coordinates are flipped.
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
geom_text(aes(label = Reviewed, x = 0.45,
y = stat(..count../sum(..count..))), stat = 'count',
position = position_stack(0.5),
hjust = 0,
angle = 45) +
geom_hline(yintercept = 0.5) +
scale_y_continuous(labels = scales::percent) +
coord_flip(xlim = c(0.555, 1.4), clip = 'off',expand = F) +
theme(plot.margin = margin(0, 0, 20, 10),
legend.position = 'none')
Add a dot in at the midpoint of the stack for the value of company “e”.
This is pretty hack-y. Using the same y values as in geom_text, use geom_point to plot a point for every value of Reviewed, then use position_stack(0.5) to nudge them to the center of the bar. Then use scale_color_manual to only color "Weekly" values (which is the corresponding value of Reviewed for Company "e"). I'm sure there's a way to do this more programmatically.
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
geom_text(aes(label = Reviewed, x = 0.45,
y = stat(..count../sum(..count..))), stat = 'count',
position = position_stack(0.5),
hjust = 0,
angle = 45) +
geom_hline(yintercept = 0.5) +
geom_point(aes(y = stat(..count../sum(..count..)),
color = Reviewed), stat = 'count',
position = position_stack(0.5), size = 5) +
scale_color_manual(values = 'black', limits = 'Weekly') +
scale_y_continuous(labels = scales::percent) +
coord_flip(xlim = c(0.555, 1.4), clip = 'off',expand = F) +
theme(plot.margin = margin(0, 0, 20, 10),
legend.position = 'none')
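One way the dot position could be derived from the data rather than hard-coding "Weekly": look up the answer given by company "e" and compute the midpoint of that segment from the cumulative proportions. This is a base-R sketch, assuming the segments are laid out left to right in the requested order (Annually, Monthly, Weekly, Hourly, Other); the names display_order, props, mid and dot_position are made up for illustration.

```r
Data <- data.frame(
  Reviewed = c("Annually", "Annually", "Hourly", "Monthly", "Weekly", "Monthly",
               "Weekly", "Other", "Other", "Monthly", "Weekly"),
  Company = letters[1:11],
  Question = "Q1",
  stringsAsFactors = FALSE
)

# Assumed left-to-right order of the stacked segments.
display_order <- c("Annually", "Monthly", "Weekly", "Hourly", "Other")

# Share of responses per category, in display order.
counts <- table(factor(Data$Reviewed, levels = display_order))
props  <- setNames(as.vector(counts) / sum(counts), names(counts))

# Midpoint of each stacked segment: right edge minus half the segment width.
mid <- cumsum(props) - props / 2

# The category chosen by company "e" and where its dot would sit (0 to 1).
target <- Data$Reviewed[Data$Company == "e"]
dot_position <- unname(mid[target])
print(dot_position)
```

The resulting value could then be used as the y position of a single point, in place of the colour-scale trick.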
Prettying things up:
ggplot(Data, aes(x="Question", fill = Reviewed)) +
geom_bar(position = 'fill') +
geom_text(aes(label = Reviewed, x = 0.45,
y = stat(..count../sum(..count..))), stat = 'count',
position = position_stack(0.5),
hjust = 0,
angle = 45) +
geom_hline(yintercept = 0.5) +
geom_point(aes(y = stat(..count../sum(..count..)),
color = Reviewed), stat = 'count',
position = position_stack(0.5), size = 5) +
scale_color_manual(values = 'black', limits = 'Weekly') +
scale_y_continuous(labels = scales::percent) +
coord_flip(xlim = c(0.555, 1.4), clip = 'off', expand = F) +
labs(x = NULL, y = NULL) +
theme_minimal() +
theme(plot.margin = margin(0, 0, 35, 10),
legend.position = 'none')
This is my first question here so hope this makes sense and thank you for your time in advance!
I am trying to generate a scatterplot with the data points being the log2 expression values of genes from 2 treatments from an RNA-Seq data set. With this code I have generated the plot below:
ggplot(control, aes(x=log2_iFGFR1_uninduced, y=log2_iFGFR4_uninduced)) +
geom_point(shape = 21, color = "black", fill = "gray70") +
ggtitle("Uninduced iFGFR1 vs Uninduced iFGFR4 ") +
xlab("Uninduced iFGFR1") +
ylab("Uninduced iFGFR4") +
scale_y_continuous(breaks = seq(-15,15,by = 1)) +
scale_x_continuous(breaks = seq(-15,15,by = 1)) +
geom_abline(intercept = 1, slope = 1, color="blue", size = 1) +
geom_abline(intercept = 0, slope = 1, colour = "black", size = 1) +
geom_abline(intercept = -1, slope = 1, colour = "red", size = 1) +
theme_classic() +
theme(plot.title = element_text(hjust=0.5))
Current scatterplot:
However, I would like to change the background of the plot below the red line to a lighter red and above the blue line to a lighter blue, but still being able to see the data points in these regions. I have tried so far by using polygons in the code below.
pol1 <- data.frame(x = c(-14, 15, 15), y = c(-15, -15, 14))
pol2 <- data.frame(x = c(-15, -15, 14), y = c(-14, 15, 15))
ggplot(control, aes(x=log2_iFGFR1_uninduced, y=log2_iFGFR4_uninduced)) +
geom_point(shape = 21, color = "black", fill = "gray70") +
ggtitle("Uninduced iFGFR1 vs Uninduced iFGFR4 ") +
xlab("Uninduced iFGFR1") +
ylab("Uninduced iFGFR4") +
scale_y_continuous(breaks = seq(-15,15,by = 1)) +
scale_x_continuous(breaks = seq(-15,15,by = 1)) +
geom_polygon(data = pol1, aes(x = x, y = y), color ="pink1") +
geom_polygon(data = pol2, aes(x = x, y = y), color ="powderblue") +
geom_abline(intercept = 1, slope = 1, color="blue", size = 1) +
geom_abline(intercept = 0, slope = 1, colour = "black", size = 1) +
geom_abline(intercept = -1, slope = 1, colour = "red", size = 1) +
theme_classic() +
theme(plot.title = element_text(hjust=0.5))
New scatterplot:
However, these polygons hide my data points in this area and I don't know how to keep the polygon color but see the data points as well. I have also tried adding "fill = NA" to the geom_polygon code but this makes the area white and only keeps a colored border. Also, these polygons shift my axis limits so how do I change the axes to begin at -15 and end at 15 rather than having that extra unwanted length?
Any help would be massively appreciated as I have struggled with this for a while now and asked friends and colleagues who were unable to help.
Thanks,
Liv
Your question has two parts, so I'll answer each in turn using a dummy dataset:
df <- data.frame(x=rnorm(20,5,1), y=rnorm(20,5,1))
Stop geom_polygon from hiding geom_point
Stefan had commented with the answer to this one. Here's an illustration. Order of operations matters in ggplot. The plot you create is a result of each geom (drawing operation) performed in sequence. In your case, you have geom_polygon after geom_point, so it means that it will plot on top of geom_point. To have the points plotted on top of the polygons, just have geom_point happen after geom_polygon. Here's an illustrative example:
p <- ggplot(df, aes(x,y)) + theme_bw()
p + geom_point() + xlim(0,10) + ylim(0,10)
Now if we add a geom_rect after, it hides the points:
p + geom_point() +
geom_rect(ymin=0, ymax=5, xmin=0, xmax=5, fill='lightblue') +
xlim(0,10) + ylim(0,10)
The way to prevent that is to just reverse the order of geom_point and geom_rect. It works this way for all geoms.
p + geom_rect(ymin=0, ymax=5, xmin=0, xmax=5, fill='lightblue') +
geom_point() +
xlim(0,10) + ylim(0,10)
Removing whitespace between the axis and limits of the axis
The second part of your question asks how to remove the white space between the edges of your geom_polygon and the axes. Notice how I have been using xlim and ylim to set limits? They are shortcuts for scale_x_continuous(limits=...) and scale_y_continuous(limits=...). The scale_... functions also take an expand= argument, which controls how far the plot is padded beyond the limits before reaching the axis. In its short form it is a two-component numeric vector: a multiplicative and an additive expansion constant, both applied to each end of the axis, so expand=c(0,0) removes the padding entirely. To pad the lower and upper ends differently, the value can be built with expansion().
Here's how to remove that whitespace:
p + geom_rect(ymin=0, ymax=5, xmin=0, xmax=5, fill='lightblue') +
geom_point() +
scale_x_continuous(limits=c(0,10), expand=c(0,0)) +
scale_y_continuous(limits=c(0,10), expand=c(0,0))
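Applied to the polygons from the question, the same two ideas might look like the sketch below. It uses random stand-in data (the real control data frame is not available), and it sets fill = rather than color =, since color only affects the polygon outline. The alpha value is an arbitrary choice.

```r
library(ggplot2)

set.seed(1)
# Stand-in for the expression data; column names are shortened for the sketch.
control <- data.frame(x = rnorm(200, 0, 3), y = rnorm(200, 0, 3))

# Triangles below the red line (y = x - 1) and above the blue line (y = x + 1).
pol1 <- data.frame(x = c(-14, 15, 15), y = c(-15, -15, 14))
pol2 <- data.frame(x = c(-15, -15, 14), y = c(-14, 15, 15))

p <- ggplot(control, aes(x = x, y = y)) +
  # Shaded regions first, so they end up underneath the points.
  geom_polygon(data = pol1, fill = "pink1", alpha = 0.5) +
  geom_polygon(data = pol2, fill = "powderblue", alpha = 0.5) +
  geom_point(shape = 21, color = "black", fill = "gray70") +
  geom_abline(intercept = 1, slope = 1, color = "blue") +
  geom_abline(intercept = 0, slope = 1, color = "black") +
  geom_abline(intercept = -1, slope = 1, color = "red") +
  # No padding, so the axes run exactly from -15 to 15.
  scale_x_continuous(limits = c(-15, 15), expand = c(0, 0)) +
  scale_y_continuous(limits = c(-15, 15), expand = c(0, 0)) +
  theme_classic()

p
```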
What I'm trying to do is overlay circles with a dark outline on top of the ones I have, but I'm not sure how to size them since the existing points already vary in size. Also, is there any way to change the legend labels to something like $1M, $2M?
mikebay_usergraph <-
ggplot(mikebay_movies_dt, aes(y = tomatoUserMeter, x = Released, label = Title)) +
geom_point(aes(size = BoxOffice)) + (aes(color = tomatoImage)) +
geom_text(hjust = .45, vjust = -.75, family = "Futura", size = 5, colour = "#535353") +
ggtitle("The Fall of Bayhem: How Michael Bay movies have declined") +
theme(plot.title = element_text(size = 15, vjust = 1, family = "Futura"),
axis.text.x = element_text(size = 12.5, family = "Futura"),
axis.text.y = element_text(size = 12.0, family = "Futura"),
panel.background = element_rect(fill = '#F0F0F0'),
panel.grid.major=element_line(colour ="#D0D0D0",size=.75)) +
scale_colour_manual(values = c('#336333', '#B03530')) +
geom_hline(yintercept = 0,size = 1.2, colour = "#535353") +
scale_x_date(limits = c(as.Date("1994-1-1"),as.Date("2017-1-1"))) +
theme(axis.ticks = element_blank())
I offer two possible solutions for adding a circle or outline around size-scaled points in a scatterplot. For the first solution, I propose using plotting symbols that allow separate fill and outline colors. The drawback here was that you could not control the thickness of the outline (recent ggplot2 versions provide a stroke aesthetic for these shapes). For the second solution, I propose adding an extra layer of slightly larger black points positioned under the primary geom_point layer. In this case, the thickness of the outline can be manually adjusted by setting thickness to a value between 0 and 1.
Finally, dollar legend formatting can be added by loading the scales package, and adding scale_size_continuous(labels=dollar) to your ggplot call.
library(ggplot2)
library(scales) # Needed for dollar labelling.
dat = data.frame(rating=c(80, 60, 40),
date=as.Date(c("1995-1-1", "2005-1-1", "2015-1-1")),
boxoffice=c(3e7, 1e8, 7e7),
tomato=c("fresh", "rotten", "rotten"))
p1 = ggplot(dat, aes(x=date, y=rating, size=boxoffice, fill=tomato)) +
geom_point(shape=21, colour="black") +
scale_fill_manual(values = c(fresh="green", rotten="red")) +
scale_size_continuous(labels=dollar, range=c(8, 22))
thickness = 0.35
p2 = ggplot(dat, aes(x=date, y=rating)) +
geom_point(colour="black",
aes(size=boxoffice + (thickness * mean(boxoffice)))) +
geom_point(aes(colour=tomato, size=boxoffice)) +
scale_colour_manual(values = c(fresh="green", rotten="red")) +
scale_size_continuous(labels=dollar, range=c(8, 22), name="Box Office")
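For legend labels in the "$1M" style the question asks about, the labels argument also accepts a function. A minimal base-R sketch, with label_millions being a made-up helper name:

```r
# Hypothetical helper: format raw dollar amounts as whole millions.
label_millions <- function(x) {
  paste0("$", round(x / 1e6), "M")
}

print(label_millions(c(3e7, 1e8, 7e7)))
```

It could replace dollar in the calls above, for example scale_size_continuous(labels = label_millions, range = c(8, 22)).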