Overlay circles in ggplot2 - r

What I'm trying to do is overlay circles that have a dark outline on the points I already have, but I'm not sure how to size them since the existing points already vary in size. Also, is there any way to change the legend labels to something like $1M, $2M?
mikebay_usergraph <-
ggplot(mikebay_movies_dt, aes(y = tomatoUserMeter, x = Released, label = Title)) +
geom_point(aes(size = BoxOffice)) + (aes(color = tomatoImage)) +
geom_text(hjust = .45, vjust = -.75, family = "Futura", size = 5, colour = "#535353") +
ggtitle("The Fall of Bayhem: How Michael Bay movies have declined") +
theme(plot.title = element_text(size = 15, vjust = 1, family = "Futura"),
axis.text.x = element_text(size = 12.5, family = "Futura"),
axis.text.y = element_text(size = 12.0, family = "Futura"),
panel.background = element_rect(fill = '#F0F0F0'),
panel.grid.major=element_line(colour ="#D0D0D0",size=.75)) +
scale_colour_manual(values = c('#336333', '#B03530')) +
geom_hline(yintercept = 0,size = 1.2, colour = "#535353") +
scale_x_date(limits = c(as.Date("1994-1-1"),as.Date("2017-1-1"))) +
theme(axis.ticks = element_blank())

I offer two possible solutions for adding an outline around size-scaled points in a scatterplot. The first uses a plotting symbol (shape 21) that has separate fill and outline colours. In older versions of ggplot2 the drawback was that the outline thickness could not be controlled; since ggplot2 2.0.0 the stroke aesthetic sets it. The second adds an extra layer of slightly larger black points underneath the primary geom_point layer. Here the thickness of the outline is adjusted manually through the thickness variable, set to a value between 0 and 1.
Finally, dollar formatting for the legend can be added by loading the scales package and adding scale_size_continuous(labels=dollar) to your ggplot call.
library(ggplot2)
library(scales) # Needed for dollar labelling.
dat = data.frame(rating=c(80, 60, 40),
date=as.Date(c("1995-1-1", "2005-1-1", "2015-1-1")),
boxoffice=c(3e7, 1e8, 7e7),
tomato=c("fresh", "rotten", "rotten"))
p1 = ggplot(dat, aes(x=date, y=rating, size=boxoffice, fill=tomato)) +
geom_point(shape=21, colour="black") +
scale_fill_manual(values = c(fresh="green", rotten="red")) +
scale_size_continuous(labels=dollar, range=c(8, 22))
thickness = 0.35
p2 = ggplot(dat, aes(x=date, y=rating)) +
geom_point(colour="black",
aes(size=boxoffice + (thickness * mean(boxoffice)))) +
geom_point(aes(colour=tomato, size=boxoffice)) +
scale_colour_manual(values = c(fresh="green", rotten="red")) +
scale_size_continuous(labels=dollar, range=c(8, 22), name="Box Office")
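For legend labels such as $30M rather than $30,000,000, one option is a label function from scales. The sketch below assumes scales 1.1.0 or later (for label_dollar()) and ggplot2 2.0.0 or later (for the stroke aesthetic on shape 21). It repeats the toy data from above, and the name dollar_millions is made up for illustration.

```r
library(ggplot2)
library(scales)

# Toy data, same shape as the example above.
dat <- data.frame(rating = c(80, 60, 40),
                  date = as.Date(c("1995-1-1", "2005-1-1", "2015-1-1")),
                  boxoffice = c(3e7, 1e8, 7e7),
                  tomato = c("fresh", "rotten", "rotten"))

# Label function: rescale dollars to millions and append "M".
dollar_millions <- label_dollar(scale = 1e-6, suffix = "M", accuracy = 1)

p3 <- ggplot(dat, aes(x = date, y = rating, size = boxoffice, fill = tomato)) +
  geom_point(shape = 21, colour = "black", stroke = 1.5) +  # stroke = outline width
  scale_fill_manual(values = c(fresh = "green", rotten = "red")) +
  scale_size_continuous(labels = dollar_millions, range = c(8, 22),
                        name = "Box Office")
```

Calling dollar_millions(c(3e7, 1e8)) directly is a quick way to check the labels before plotting.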

Related

Why are colours appearing in the labels of my gganimate sketch?

I have a gganimate sketch in R and I would like to have the percentages of my bar chart appear as labels.
But for some bizarre reason, I am getting seemingly random colours in place of the labels that I'm requesting.
If I run the ggplot part without animating then it's a mess (as it should be), but it's obvious that the percentages are appearing correctly.
Any ideas? The colour codes don't correspond to the colours of the bars, which I chose separately. The displayed codes cycle through about half a dozen different values, at a rate different from the frame rate I selected. While the bars are at the same height (they grow until they reach their final height in the animation) they display the same code, until they stop and the code freezes.
Code snippet:
df_new <- data.frame(index, rate, year, colour)
df_new$rate_label <- ifelse(round(df_new$rate, 1) %% 1 == 0,
paste0(round(df_new$rate, 1), ".0%"), paste0(round(df_new$rate, 1), "%"))
p <- ggplot(df_new, aes(x = year, y = rate, fill = year)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = colour) +
#geom_text(aes(y = rate, label = paste0(rate, "%")), vjust = -0.7) +
geom_shadowtext(aes(y = rate, label = rate_label),
bg.colour='white',
colour = 'black',
size = 9,
fontface = "bold",
vjust = -0.7,
alpha = 1
) +
coord_cartesian(clip = 'off') +
ggtitle("% population belonging to 'No religion', England and Wales census") +
theme_minimal() +
xlab("") + ylab("") +
theme(legend.position = "none") +
theme(plot.title = element_text(size = 18, face = "bold")) +
theme(axis.text = element_text(size = 14)) +
scale_y_continuous(limits = c(0, 45), breaks = 10*(0:4))
p
p <- p + transition_reveal(index) + view_follow(fixed_y = T)
animate(p, renderer = gifski_renderer(), nframes = 300, fps = frame_rate, height = 500, width = 800,
end_pause = 0)
anim_save("atheism.gif")
I think you have missed a few subtle points about ggplot2, which I will try to describe. First, discrete values need to be supplied as factors (or integers), so either call as.factor() before plotting or wrap the variable in factor() inside the aesthetic. You should also round the percentages as you wish. Here is an example:
set.seed(2023)
df_new <- data.frame(index=1:10, rate=runif(10), year=2001:2010, colour=1:10)
df_new$rate_label <- ifelse(round(df_new$rate, 1) %% 1 == 0,
paste0(round(df_new$rate, 1), ".0%"),
paste0(round(df_new$rate, 1), "%"))
The ggplot for this data is:
library(ggplot2)
p <- ggplot(df_new, aes(x = factor(year), y = rate, fill = factor(colour))) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(y = rate, label = paste0(round(rate,2), "%")), vjust = -0.7) +
coord_cartesian(clip = 'off') +
ggtitle("% population belonging to 'No religion', England and Wales census") +
theme_minimal() +
xlab("") + ylab("") +
theme(legend.position = "none",
plot.title = element_text(size = 18, face = "bold"),
axis.text = element_text(size = 14))
p
You can also combine all theme elements in a single theme() call, as I did above.
The plot can then be animated with the following code:
library(gganimate)
p + transition_reveal(index)
Hope it helps.
This was answered elsewhere, although I don't know why the fix works. For some reason, labels need to go into gganimate as factors (as.factor()). I just had to add the line:
df_new$rate_label <- as.factor(df_new$rate_label)
and it works fine.
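A minimal sketch of that fix in context, assuming gganimate is installed. The data values are placeholders rather than the census figures, and sprintf() is used here as a shorter stand-in for the ifelse() label construction in the question.

```r
library(ggplot2)
library(gganimate)

# Placeholder data standing in for the data frame in the question.
df_new <- data.frame(index = 1:3,
                     year = c(2001, 2011, 2021),
                     rate = c(10, 20.5, 30))

# One decimal place plus a percent sign, e.g. "10.0%".
df_new$rate_label <- sprintf("%.1f%%", df_new$rate)

# The reported fix: hand the labels to gganimate as a factor, not as character.
df_new$rate_label <- as.factor(df_new$rate_label)

p <- ggplot(df_new, aes(x = factor(year), y = rate, fill = factor(year))) +
  geom_col() +
  geom_text(aes(label = rate_label), vjust = -0.7) +
  transition_reveal(index)

# animate(p) would render the frames (it needs a renderer such as gifski).
```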

Formatting GGplot stacked barplot

I am making a set of scorecards, for which I am generating graphs that show the distribution of responses from a survey and where the response for a specific company falls. I need to modify the formatting of one graph, a stacked bar chart, and add a few features. I've already spent a few hours getting my chart to where it is now and would appreciate your help with the features outlined below.
Data is
Data<-data.frame(Reviewed = c("Annually", "Annually", "Hourly", "Monthly", "Weekly","Monthly","Weekly","Other","Other","Monthly","Weekly"),Company=c("a","b","c","d","e","f","g","h","i","j","k"),Question="Q1")
So far I’ve developed this
ggplot(Data, aes(x="Question", fill=Reviewed)) + geom_bar(position='fill' ) +
coord_flip()
I would like to do the following:
1. Order the variables so they are arranged on plot as follows: Annually,Monthly,Weekly,Hourly,Other
2. Express the y axis in terms of percent. I.e. 0.25 turns into 25%
3. Move y-axis directly underneath the bar.
4. Remove the legend but move the terms underneath the respective part of the graph on a diagonal slant.
5. Add a black line that cuts down the 50% mark
6. Add a dot in at the midpoint of the stack for the value of company “e”.
7. Remove gray background
This is what I'm hoping the finished graph will look like.
There's a lot to unpack here, so I'll break it down bit by bit:
Order the variables so they are arranged on plot as follows: Annually,Monthly,Weekly,Hourly,Other
Assign "Reviewed" as an ordered factor. I'm reversing the order here because stacking places the first factor level at the far end of the bar (the right-hand side after coord_flip), so the last level ends up on the left.
Data$Reviewed <- factor(Data$Reviewed,
levels = rev(c('Annually', 'Monthly', 'Weekly', 'Hourly', 'Other')),
ordered = T)
ggplot(Data, aes(x="Question", fill=Reviewed)) + geom_bar(position='fill' ) +
coord_flip()
Express the y axis in terms of percent. I.e. 0.25 turns into 25%
Use scale_y_continuous(labels = scales::percent) to adjust the labels. The scales package is a dependency of ggplot2, so it was installed along with it.
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
scale_y_continuous(labels = scales::percent) +
coord_flip()
Move y-axis directly underneath the bar.
Remove gray background
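Both are done at once by adding expand = F to coord_flip: without the padding around the bar, the axis sits directly under it and no gray panel background remains visible.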
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
scale_y_continuous(labels = scales::percent) +
coord_flip(expand = F)
Remove the legend...
Add theme(legend.position = 'none').
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
scale_y_continuous(labels = scales::percent) +
coord_flip(expand = F) +
theme(legend.position = 'none')
but move the terms underneath the respective part of the graph on a diagonal slant.
This is tougher and takes a good amount of fiddling:
- Use geom_text to make the labels.
- Calculate the position along the bar using the 'count' stat. (In ggplot2 3.3.0 and later, after_stat(count / sum(count)) is the current spelling of stat(..count../sum(..count..)), which is deprecated in recent versions.)
- Move the labels to the bottom of the plot by providing a fake x coordinate.
- Align the labels in the center of the bars using position_stack, and make them abut the x axis using hjust.
- Add angle.
- Use clip = 'off' in coord_flip to make sure that these values are not cut out since they're outside the plotting area.
- Fiddle with the x limits to crop out empty plotting area.
- Adjust the plot margin in theme to make sure everything can be seen.
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
geom_text(aes(label = Reviewed, x = 0.45,
y = stat(..count../sum(..count..))), stat = 'count',
position = position_stack(0.5),
hjust = 0,
angle = 45) +
scale_y_continuous(labels = scales::percent) +
coord_flip(xlim = c(0.555, 1.4), clip = 'off',expand = F) +
theme(plot.margin = margin(0, 0, 35, 10),
legend.position = 'none')
Add a black line that cuts down the 50% mark
Use geom_hline(yintercept = 0.5); remember that it's a "horizontal" line since the coordinates are flipped.
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
geom_text(aes(label = Reviewed, x = 0.45,
y = stat(..count../sum(..count..))), stat = 'count',
position = position_stack(0.5),
hjust = 0,
angle = 45) +
geom_hline(yintercept = 0.5) +
scale_y_continuous(labels = scales::percent) +
coord_flip(xlim = c(0.555, 1.4), clip = 'off',expand = F) +
theme(plot.margin = margin(0, 0, 20, 10),
legend.position = 'none')
Add a dot in at the midpoint of the stack for the value of company “e”.
This is pretty hack-y. Using the same y values as in geom_text, use geom_point to plot a point for every value of Reviewed, then use position_stack(0.5) to nudge them to the center of the bar. Then use scale_color_manual to only color "Weekly" values (which is the corresponding value of Reviewed for Company "e"). I'm sure there's a way to do this more programmatically.
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
geom_text(aes(label = Reviewed, x = 0.45,
y = stat(..count../sum(..count..))), stat = 'count',
position = position_stack(0.5),
hjust = 0,
angle = 45) +
geom_hline(yintercept = 0.5) +
geom_point(aes(y = stat(..count../sum(..count..)),
color = Reviewed), stat = 'count',
position = position_stack(0.5), size = 5) +
scale_color_manual(values = 'black', limits = 'Weekly') +
scale_y_continuous(labels = scales::percent) +
coord_flip(xlim = c(0.555, 1.4), clip = 'off',expand = F) +
theme(plot.margin = margin(0, 0, 20, 10),
legend.position = 'none')
This is what I'm hoping the finished graph will look like.
Prettying things up:
ggplot(Data, aes(x="Question", fill = Reviewed)) +
geom_bar(position = 'fill') +
geom_text(aes(label = Reviewed, x = 0.45,
y = stat(..count../sum(..count..))), stat = 'count',
position = position_stack(0.5),
hjust = 0,
angle = 45) +
geom_hline(yintercept = 0.5) +
geom_point(aes(y = stat(..count../sum(..count..)),
color = Reviewed), stat = 'count',
position = position_stack(0.5), size = 5) +
scale_color_manual(values = 'black', limits = 'Weekly') +
scale_y_continuous(labels = scales::percent) +
coord_flip(xlim = c(0.555, 1.4), clip = 'off', expand = F) +
labs(x = NULL, y = NULL) +
theme_minimal() +
theme(plot.margin = margin(0, 0, 35, 10),
legend.position = 'none')
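One possible way to place the dot more programmatically is to compute the segment midpoints outside ggplot and add a single point with annotate(). This is only a sketch: the names lvl, prop, mid and mid_e are made up for illustration, and it relies on the stacking order described earlier (last factor level at the start of the bar).

```r
library(ggplot2)

Data <- data.frame(
  Reviewed = c("Annually", "Annually", "Hourly", "Monthly", "Weekly", "Monthly",
               "Weekly", "Other", "Other", "Monthly", "Weekly"),
  Company = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"),
  Question = "Q1")

# Left-to-right order of the segments in the flipped bar.
lvl <- c("Annually", "Monthly", "Weekly", "Hourly", "Other")

# Share of each category, then the midpoint of each segment along the bar.
prop <- as.numeric(table(factor(Data$Reviewed, levels = lvl))) / nrow(Data)
mid <- setNames(cumsum(prop) - prop / 2, lvl)

# Midpoint of the segment that contains company "e".
mid_e <- mid[[as.character(Data$Reviewed[Data$Company == "e"])]]

# Reverse the levels so that "Annually" is stacked first (leftmost after the flip).
Data$Reviewed <- factor(Data$Reviewed, levels = rev(lvl))

p <- ggplot(Data, aes(x = "Question", fill = Reviewed)) +
  geom_bar(position = "fill") +
  geom_hline(yintercept = 0.5) +
  annotate("point", x = 1, y = mid_e, size = 5) +  # x = 1 is the single bar
  scale_y_continuous(labels = scales::percent) +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "none")
```

Changing "e" to another company moves the dot without touching any scale.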

Dotplot: How to change dot sizes of dotplot based on a value in data and make all x axis values into whole numbers

I have made a dotplot for my data but need help with the finishing touches. I've looked around Stack Overflow a bit and haven't seen any posts that directly answer my questions.
My code for my dotplot is:
ggplot()+
geom_dotplot(mapping = aes(x= reorder(Description, -p.adjust), y=Count, fill=-p.adjust),
data = head(X[which(X$p.adjust < 0.05),], n = 15), binaxis = 'y', dotsize = 2,
method = 'dotdensity', binpositions = 'all', binwidth = NULL)+
scale_fill_continuous(low="black", high="light grey") +
labs(y = "Associated genes", x = "wikipathways", fill = "p.adjust") +
theme(axis.text=element_text(size=8)) +
ggtitle('') +
theme(plot.title = element_text(2, face = "bold", hjust = 1),
legend.key.size = unit(2, "line")) +
theme(panel.background = element_rect(fill = 'white', colour = 'black'))+
coord_fixed(ratio = 0.5)+
coord_flip()
Let's say the X is something along the lines of:
Description p.adjust Count GeneRatio
1 DescriptionA 0.001 3 3/20
2 DescriptionB 0.002 2 2/20
3 DescriptionC 0.003 5 5/20
4 DescriptionD 0.004 10 10/20
To complete this plot I need two edits.
I would like to base the size of the dots on the GeneRatio, and add a second legend key for this size. Is this possible with ggplot2 dotplots?
Next, I would like to keep the X axis values as integers. I want to avoid something like scale_x_continuous(limits = c(2, 10)), as this plot code is part of a function used for multiple data sets of various sizes, so hard-coding the limits/scale would not work well.
Help would be most appreciated.
If you can switch from geom_dotplot to geom_point, it is easy to adjust the dot size according to a variable. Conveniently, it also seems to have corrected your axis issue.
ggplot(X) +
geom_point(mapping = aes(x = reorder(Description, -p.adjust), y = Count, fill = -p.adjust, size = GeneRatio),
data = head(X[which(X$p.adjust < 0.05), ], n = 15),
shape = 21) + # shape 21 has a fill colour; the geom_dotplot-only arguments are dropped
scale_fill_continuous(low="black", high="light grey") +
labs(y = "Associated genes", x = "wikipathways", fill = "p.adjust") +
theme(axis.text=element_text(size=8)) +
ggtitle('') +
theme(plot.title = element_text(2, face = "bold", hjust = 1),
legend.key.size = unit(2, "line")) +
theme(panel.background = element_rect(fill = 'white', colour = 'black'))+
coord_fixed(ratio = 0.5)+
coord_flip()
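Two details may still need handling: GeneRatio is text such as "3/20", and the Count axis can show fractional breaks. The sketch below is one way to address both, using toy data that mirrors the table in the question. The column GeneRatioNum and the helper integer_breaks are hypothetical names introduced here.

```r
library(ggplot2)

# Toy data mirroring the table in the question.
X <- data.frame(Description = paste0("Description", c("A", "B", "C", "D")),
                p.adjust = c(0.001, 0.002, 0.003, 0.004),
                Count = c(3, 2, 5, 10),
                GeneRatio = c("3/20", "2/20", "5/20", "10/20"))

# GeneRatio is text such as "3/20"; turn it into a number for the size scale.
parts <- strsplit(as.character(X$GeneRatio), "/", fixed = TRUE)
X$GeneRatioNum <- vapply(parts,
                         function(p) as.numeric(p[1]) / as.numeric(p[2]),
                         numeric(1))

# Keep only the whole-number candidates among R's "pretty" breaks,
# so no limits need to be hard-coded.
integer_breaks <- function(limits) {
  b <- pretty(limits)
  b[abs(b - round(b)) < 1e-8]
}

p <- ggplot(X, aes(x = reorder(Description, -p.adjust), y = Count)) +
  geom_point(aes(fill = -p.adjust, size = GeneRatioNum), shape = 21) +
  scale_y_continuous(breaks = integer_breaks) +  # Count is horizontal after the flip
  scale_fill_continuous(low = "black", high = "light grey") +
  labs(x = "wikipathways", y = "Associated genes", size = "GeneRatio") +
  coord_flip()
```

Because the breaks are derived from whatever limits the data produce, the same function can be reused across data sets of different sizes.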

Position dodge does not work with geom_point and geom_errorbar

I have this overplotting issue going on. Even after reading a lot of posts on dodge, jitter and jitter dodge in all kinds of implementations I can't figure it out.
Here you can get my data: http://pastebin.com/embed_js.php?i=uPXN7nPt
library(dplyr)
library(gdata)
library(ggplot2)
library(directlabels)
all<-read.xls('all_auto_bio_adjusted_c.xls')
all$size.new<-sqrt(all$size.new)
all$station<-as.factor(all$station)
all$group.new<-factor(all$group, levels=c('C. hyperboreus','C. glacialis','Special Calanus','M. longa','Pseudocalanus sp.','Copepoda'))
pd <- position_dodge(w = 50)
allp <- ggplot(data = all, aes(y = averagebiol, x = automatic, colour = group.new, group=group.new)) +
geom_abline(intercept = 0, slope = 1) +
geom_point(aes(size = size.new), show_guide=TRUE, position=pd) +
scale_size_identity()+
geom_errorbar(aes(ymin = averagebiol - stdevbiol, ymax = averagebiol + stdevbiol),colour = "grey", width = 0.1, position=pd) +
facet_grid(group.new~station, scales="free") +
xlab("Automatic identification") + ylab("Manual identification") +
ggtitle("Comparison of automatic vs manual identification") +
theme_bw() +
theme(plot.title = element_text(lineheight=.8, face="bold", size=20,vjust=1), axis.text.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=.5,face="bold"), axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="bold"), axis.title.x = element_text(colour="grey20",size=20,angle=0,hjust=.5,vjust=0,face="bold"), axis.title.y = element_text(colour="grey20",size=20,angle=90,hjust=.5,vjust=1,face="bold"), legend.position="none", strip.text.x = element_text(size = 12, face="bold", colour = "black", angle = 0), strip.text.y = element_text(size = 12, face="bold", colour = "black"))
allp
Which produces this nice plot
But as you can see a lot of the points and error bars are cramped together. Shouldn't my implementation of position dodge work?
If I understood correctly, position dodge works in the units of the axis, so with a dodge of 50 I should see some effect. I also tried putting the dodge argument directly into the geom, but that had no effect either.
Any ideas?
If you leave out position = pd in both geom_errorbar() and geom_point() you get the same plot. The reason the data look 'cramped' is because of the spread of the x-values. As far as I know, dodging will only happen if two points 'overlap', which I interpret as having the same x-value, e.g. data on a categorical x-axis like in the case of a bar plot. Your x-axis is continuous so the points will not be dodged.
To deal with the overplotting you could try logarithmic scales:
library(ggplot2)
tmp <- tempfile()
download.file("http://pastebin.com/raw.php?i=uPXN7nPt", tmp)
all <- read.csv(tmp)
all$size.new <- sqrt(all$size.new)
all$station <- as.factor(all$station)
all$group.new <- factor(all$group, levels = c("C. hyperboreus", "C. glacialis",
"Special Calanus", "M. longa",
"Pseudocalanus sp.", "Copepoda"))
# explicitly remove missing data
all <- all[complete.cases(all), ]
allp <- ggplot(data = all, aes(y = averagebiol, x = automatic, colour = group.new,
group = group.new, ymin = averagebiol - stdevbiol,
ymax = averagebiol + stdevbiol)) +
theme_bw() +
geom_abline(intercept = 0, slope = 1) +
geom_errorbar(colour = "grey", width = 0.1) +
geom_point(aes(size = size.new)) +
scale_size_area() + # Just so I could see all the points on my monitor :)
xlab("Automatic identification") +
ylab("Manual identification") +
ggtitle("Comparison of automatic vs manual identification")
allp + scale_x_log10() +
scale_y_log10() +
facet_grid(group.new ~ station, scales = "fixed")
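To illustrate the point about dodging, here is a small sketch with a categorical x-axis, where position_dodge does separate points and error bars that share an x value. The data frame and its column names are hypothetical.

```r
library(ggplot2)

# Hypothetical data: two groups measured at the same two categorical x values.
d <- data.frame(station = factor(c("S1", "S1", "S2", "S2")),
                group = c("a", "b", "a", "b"),
                avg = c(10, 12, 20, 19),
                sdev = c(1, 2, 1.5, 1))

# The same dodge object is shared so points and error bars stay aligned.
pd <- position_dodge(width = 0.5)

p <- ggplot(d, aes(x = station, y = avg, colour = group, group = group)) +
  geom_errorbar(aes(ymin = avg - sdev, ymax = avg + sdev), width = 0.2,
                position = pd) +
  geom_point(size = 3, position = pd)

# x positions actually used for the points after dodging.
point_x <- as.numeric(layer_data(p, 2)$x)
```

Here the two groups at each station end up at different x positions, which is the behaviour that does not occur on a continuous x-axis where the x values already differ.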

Changing line color when I have a geom_errorbar with ggplot

I have the following code:
library(ggplot2)
library(gridExtra)
data = data.frame(fit = c(9.8,15.4,17.6,21.6,10.8), lower = c(7.15,12.75,14.95,18.95,8.15), upper = c(12.44,18.04,20.24,24.24,13.44), factors = c(15,20,25,30,35), var = rep("Fator", 5))
gp <- ggplot(data, aes(x=factors, y=fit, ymax=upper, ymin=lower))
gp <- gp + geom_line(aes(group=var),size=1.2) +
geom_errorbar(width=.8, size=1, aes(colour='red')) +
geom_point(size=4, shape=21, fill="grey") +
labs(x = paste("\n",data$var[1],sep=""), y =paste("Values","\n",sep="")) +
theme(legend.position = 'none', axis.text = element_text(size = 11), plot.margin=unit(c(0.4,0.4,0.4,0.4), "cm"), axis.text.x = element_text(angle=45, hjust = 1, vjust = 1)) +
ylim((min(data$lower)), (max(data$upper)))
I want to change the line color after I have the ggplot object. I'm trying:
gp + scale_color_manual(values = "green")
but it changes the error bar color and not the line color.
1) What should I do to change the line color?
2) How can I change the point color?
Thanks!
Try this:
gp$layers[[1]] <- NULL
gp + geom_line(aes(group = var),color = "green",size = 1.2)
A similar technique should work for the points layer. Technique was dredged up from my memories of a similar question.
I just looked at the contents of gp$layers manually to see which was which. I presume that the order will be the order in which they appear in your code, but I wouldn't necessarily rely on that.
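As a variation on the same idea, the layer can be looked up by its geom class instead of by position and swapped in place, which keeps the drawing order (line, error bars, points) unchanged. This is a sketch assuming ggplot2 3.4.0 or later (for linewidth); find_layer is a hypothetical helper, not part of ggplot2, and the point fill "blue" is just an example value.

```r
library(ggplot2)

data <- data.frame(fit = c(9.8, 15.4, 17.6, 21.6, 10.8),
                   lower = c(7.15, 12.75, 14.95, 18.95, 8.15),
                   upper = c(12.44, 18.04, 20.24, 24.24, 13.44),
                   factors = c(15, 20, 25, 30, 35),
                   var = rep("Fator", 5))

gp <- ggplot(data, aes(x = factors, y = fit, ymax = upper, ymin = lower)) +
  geom_line(aes(group = var), linewidth = 1.2) +
  geom_errorbar(width = 0.8, linewidth = 1, aes(colour = "red")) +
  geom_point(size = 4, shape = 21, fill = "grey")

# Hypothetical helper: index of the first layer drawn with a given geom class.
find_layer <- function(plot, geom_class) {
  hits <- vapply(plot$layers, function(l) inherits(l$geom, geom_class), logical(1))
  which(hits)[1]
}

gp2 <- gp
# Replace the layers in place so the drawing order is preserved.
gp2$layers[[find_layer(gp2, "GeomLine")]] <-
  geom_line(aes(group = var), colour = "green", linewidth = 1.2)
gp2$layers[[find_layer(gp2, "GeomPoint")]] <-
  geom_point(size = 4, shape = 21, fill = "blue")
```

Because whole layers are replaced rather than modified, the original gp object is left as it was.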

Resources