Why is the resolution on my geom_points so poor - r

I came across an alternative to grouped bar charts in ggplot that Rebecca Barter posted on her blog and wanted to give it a try. It produces a slick Cleveland dot plot:
The code for my attempt follows:
ggplot() +
# remove axes and superfluous grids
theme_classic() +
theme(axis.ticks.y = element_blank(),
text = element_text(family = "Roboto Condensed"),
axis.text = element_text(size = rel(1.5)),
plot.title = element_text(size = 30, color = "#000000"),
plot.subtitle = element_text(size = 15, color = "#Ec111A"),
plot.caption = element_text(size = 15, color = "grey25"),
plot.margin = margin(20,20,20,20),
panel.background = element_rect(fill = "white"),
axis.line = element_blank(),
axis.text.x = element_text(vjust= + 15)) +
# add a dummy point for scaling purposes
geom_point(aes(x = 15, y = P),
size = 0, col = "white") +
# add the horizontal discipline lines
geom_hline(yintercept = 1:9, color = "grey80") +
# add a point for each male success rate
geom_point(aes(x = Male, y = P),
size = 15, col = "#00b0f0") +
# add a point for each female success rate
geom_point(aes(x = Female, y = P),
size = 15, col = "#Ec111A") +
# add the text (%) for each male success rate
geom_text(aes(x = Male, y = P,
label = paste0(round(Male, 1))),
col = "black", fontface = "bold") +
# add the text (%) for each female success rate
geom_text(aes(x = Female, y = P,
label = paste0(round(Female, 1))),
col = "white", fontface = "bold") +
# add a label above the first two points
geom_text(aes(x = x, y = y, label = label, col = label),
data.frame(x = c(21.8 - 0, 24.6 - 0), y = 7.5,
label = c("Male", "Female")), size = 6) +
scale_color_manual(values = c("#Ec111A", "#00b0f0"), guide = "none") +
# manually specify the x-axis
scale_x_continuous(breaks = c(0, 10, 20, 30),
labels = c("0%","10%", "20%", "30%")) +
# manually set the spacing above and below the plot
scale_y_discrete(expand = c(0.15, 0)) +
labs(
x = NULL,
y = NULL,
title= "Move Percentage By Gender",
subtitle = "What Percentage Of Moves Are Tops",
caption = "Takeaway: Males have fewer Tops and more Xs compared to Females.")
But my plot has very jagged (poor-resolution) points, and I can't figure out the cause.
Has anyone come across this problem and knows how to fix it?

Resolution depends on how you save the plot and on your graphics device. In other words: how are you saving your plot? Because it depends so much on your personal setup and parameters, your mileage will vary. One of the more dependable ways of saving plots from ggplot2 in R is to use ggsave(), where you can specify these parameters and maintain some consistency. Here is some example plot code:
ggplot(mtcars, aes(disp, mpg)) +
geom_point(size=10, color='red1') +
geom_text(aes(label=cyl), color='white')
This creates a plot similar to what you show using mtcars. If I copy and paste the graphic output directly from R or use export (I'm using RStudio) this is what you get:
Not sure if you can tell, but the edges are jagged and it does not look clean on close inspection. Definitely not OK for me. However, here's the same plot saved using ggsave():
ggsave('myplot.png', width = 9, height = 6)
You should be able to tell that it's a lot cleaner, because it is saved with a higher resolution. File size on the first is 9 KB, whereas it's 62 KB on the second.
In the end, just play with the settings in ggsave() until you find a resolution that works for you. If you just call ggsave('myplotfile.png'), you'll get width/height settings that match your plot viewport in RStudio. You can get an idea of the aspect ratio and size from that and adjust accordingly. One more point: text does not scale the same way as geoms, so your circles will change size differently than the text.
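To make the ggsave() settings concrete, here is a minimal sketch that sets the size and resolution explicitly rather than relying on the viewport. The output path (a temporary file) and the dpi value of 300 are my own choices for illustration, not something from the question.

```r
library(ggplot2)

# Same example plot as above, stored in an object so it can be passed to ggsave()
p <- ggplot(mtcars, aes(disp, mpg)) +
  geom_point(size = 10, color = "red1") +
  geom_text(aes(label = cyl), color = "white")

# Assumed output location: a temporary file, so nothing is written to the project
out_file <- file.path(tempdir(), "myplot.png")

# width and height are in inches here; at 300 dpi the PNG is 2700 x 1800 pixels
ggsave(out_file, plot = p, width = 9, height = 6, units = "in", dpi = 300)
```

Raising dpi increases the pixel count without changing the physical size, so points and text keep their relative proportions while the edges get smoother.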

Related

Why are colours appearing in the labels of my gganimate sketch?

I have a gganimate sketch in R and I would like to have the percentages of my bar chart appear as labels.
But for some bizarre reason, I am getting seemingly random colours in place of the labels that I'm requesting.
If I run the ggplot part without animating then it's a mess (as it should be), but it's obvious that the percentages are appearing correctly.
Any ideas? The colour codes don't correspond to the colours of the bars which I have chosen separately. The codes displayed also cycle through about half a dozen different codes, at a rate different to the frame rate that I selected. And while the bars are the same height (they grow until they reach the chosen height displayed in the animation) then they display the same code until they stop and it gets frozen.
Code snippet:
df_new <- data.frame(index, rate, year, colour)
df_new$rate_label <- ifelse(round(df_new$rate, 1) %% 1 == 0,
paste0(round(df_new$rate, 1), ".0%"), paste0(round(df_new$rate, 1), "%"))
p <- ggplot(df_new, aes(x = year, y = rate, fill = year)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = colour) +
#geom_text(aes(y = rate, label = paste0(rate, "%")), vjust = -0.7) +
geom_shadowtext(aes(y = rate, label = rate_label),
bg.colour='white',
colour = 'black',
size = 9,
fontface = "bold",
vjust = -0.7,
alpha = 1
) +
coord_cartesian(clip = 'off') +
ggtitle("% population belonging to 'No religion', England and Wales census") +
theme_minimal() +
xlab("") + ylab("") +
theme(legend.position = "none") +
theme(plot.title = element_text(size = 18, face = "bold")) +
theme(axis.text = element_text(size = 14)) +
scale_y_continuous(limits = c(0, 45), breaks = 10*(0:4))
p
p <- p + transition_reveal(index) + view_follow(fixed_y = T)
animate(p, renderer = gifski_renderer(), nframes = 300, fps = frame_rate, height = 500, width = 800,
end_pause = 0)
anim_save("atheism.gif")
I think you have missed a few subtle points about ggplot2, which I will try to describe. First, discrete values need to be supplied as a factor or integer, so you can call as.factor() before plotting or just use factor() inside the aesthetic. You should also round the percentages to the precision you want. Here is an example:
set.seed(2023)
df_new <- data.frame(index=1:10, rate=runif(10), year=2001:2010, colour=1:10)
df_new$rate_label <- ifelse(round(df_new$rate, 1) %% 1 == 0,
paste0(round(df_new$rate, 1), ".0%"),
paste0(round(df_new$rate, 1), "%"))
The ggplot for this data is:
library(ggplot2)
p <- ggplot(df_new, aes(x = factor(year), y = rate, fill = factor(colour))) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(y = rate, label = paste0(round(rate,2), "%")), vjust = -0.7) +
coord_cartesian(clip = 'off') +
ggtitle("% population belonging to 'No religion', England and Wales census") +
theme_minimal() +
xlab("") + ylab("") +
theme(legend.position = "none",
plot.title = element_text(size = 18, face = "bold"),
axis.text = element_text(size = 14))
p
You can also combine all theme elements in a single theme() call (as I did). The output is:
And you can easily animate the plot using the following code:
library(gganimate)
p + transition_reveal(index)
And the output is as below:
Hope it helps.
So it was answered here, although I don't know why the fix works.
For some reason, labels need to go into gganimate as factors, via as.factor().
I just had to add the line:
df_new$rate_label <- as.factor(df_new$rate_label)
and it works fine.
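For reference, here is a small sketch of the label preparation with the factor conversion included. The rates and years are made-up values, and sprintf() is swapped in for the ifelse()/paste0() construction; it is intended to produce the same one-decimal strings, but treat that as an assumption to check against your own data.

```r
# Made-up example values standing in for the real census figures
df_new <- data.frame(index = 1:3,
                     rate = c(25, 32.3, 37.2),
                     year = c(2001, 2011, 2021))

# "%.1f" always prints one decimal place, "%%" is a literal percent sign
df_new$rate_label <- sprintf("%.1f%%", df_new$rate)

# The fix from above: hand the labels to gganimate as a factor
df_new$rate_label <- as.factor(df_new$rate_label)
```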

GGplotly Doesn't Use Entire Plot Area When Height and Width Are Defined

I have a ggplot that I spent a fair amount of time formatting and getting the way I like it and discovered plotly/ggplotly to add in hovertext functionality, zoom, selection, etc. However, when I first tried ggplotly, the plot is very squashed especially on the x-axis (I'm looking at time-series data over the course of several days). I found the height/width parameters, defined them in a way that seemed to make sense, but on run all that seems to happen is the plot area gets larger, but the plot itself doesn't really fill the area defined by height and width. My code is below, along with screenshots of what I'm seeing. How do I get the plot to draw over the entire plotting area?
TimeSeries_plot <- ggplot(data = data_clean, aes(x = timestamp,
y = metrics_speed_download_mbps)) +
geom_line(color = '#9e2f7f') +
geom_point(aes(color=metrics_remote_location)) +
geom_hline(yintercept=DataAvg, color = "#414487") +
geom_text(aes(data_clean$timestamp[[150]],DataAvg, label = paste("average =", DataAvg, "Mbps")),
nudge_y = 30, color = "#414487", size = 5) +
scale_y_continuous(name = "Download Speed (Mbps)") +
scale_x_datetime(name = "Day in YYYY-MM-DD", date_breaks = "1 day", date_labels = "%Y-%m-%d",guide = guide_axis(angle=90)) +
geom_hline(yintercept = round(mean(data_clean$metrics_speed_download_mbps), 0), color = '#f1605d') +
geom_text(aes(data_clean$timestamp[[150]],round(mean(data_clean$metrics_speed_download_mbps)), label = paste("average =", round(mean(data_clean$metrics_speed_download_mbps)),"Mbps")),
nudge_y = 30, color = "#f1605d", size = 5) +
labs(title = "Figure 1: cURL 250M Download Test",
subtitle = "Boundary") +
scale_color_viridis_d() +
theme(legend.position = "bottom",
axis.text.x = element_text(angle = 90),
panel.background = element_rect(fill = '#fcfdbf', color = '#fcfdbf'),
panel.grid.major = element_line(color = '#feca8d', linetype = 'dashed'),
panel.grid.minor = element_line(color = '#feca8d', linetype = 'dotted')
)
ggplotly(TimeSeries_plot)
The ggplotly output looks like this:
ggplotly output with height and width defined:
Finally, ggplot2 output before adding ggplotly and what I'm trying to approximately get back to (using RMarkdown fig.height and fig.width parameters for the R chunk):

Kernel Density Estimate (Probability Density Function) is wrong?

I've created a histogram to show the density of the age at which serial killers first killed and have tried to superimpose a probability density function on this. However, when I use the geom_density() function in ggplot2, I get a density function that looks far too small (area < 1). What is strange is that when I change the bin width of the histogram, the density function also appears to change (the smaller the bin width, the better the density function seems to fit). Does anyone have guidance on how to make this function fit better, and on why its area is so far below 1?
#Histograms for Age of First Kill:
library(ggplot2)
AFKH <- ggplot(df, aes(AgeFirstKill,fill = cut(AgeFirstKill, 100))) +
geom_histogram(aes(y=..count../sum(..count..)), show.legend = FALSE, binwidth = 3) + # density wasn't working, so had to use ..count../sum(..count..)
scale_fill_discrete(h = c(200, 10), c = 100, l = 60) + # h = c() sets the hue range (the color gradient), c sets the chroma, and l sets the luminance (brightness)
theme(axis.title=element_text(size=22,face="bold"),
plot.title = element_text(size=30, face = "bold"),
axis.text.x = element_text(face="bold", size=14),
axis.text.y = element_text(face="bold", size=14)) +
labs(title = "Age of First kill",x = "Age of First Kill", y = "Density")+
geom_density(aes(AgeFirstKill, y = ..density..), alpha = 0.7, fill = "white",lwd =1, stat = "density")
AFKH
We don't have your data set, so let's make one that's reasonably close to it:
set.seed(3)
df <- data.frame(AgeFirstKill = rgamma(100, 3, 0.2) + 10)
The first thing to notice is that the density curve doesn't change. Look carefully at the y axis on your plot. You will notice that the peak of the density curve doesn't change, but remains at about 0.06. It's the height of the histogram bars that change, and the y axis changes accordingly.
The reason for this is that you aren't dividing the height of the histogram bars by their width to preserve their area. Your y aesthetic should be ..count../sum(..count..)/binwidth to keep this constant.
To show this, let's wrap your plotting code in a function that allows you to specify the bin width but also takes the binwidth into account when plotting:
draw_it <- function(bw) {
ggplot(df, aes(AgeFirstKill,fill = cut(AgeFirstKill, 100))) +
geom_histogram(aes(y=..count../sum(..count..)/bw), show.legend = FALSE,
binwidth = bw) +
scale_fill_discrete(h = c(200, 10), c = 100, l = 60) +
theme(axis.title=element_text(size=22,face="bold"),
plot.title = element_text(size=30, face = "bold"),
axis.text.x = element_text(face="bold", size=14),
axis.text.y = element_text(face="bold", size=14)) +
labs(title = "Age of First kill",x = "Age of First Kill", y = "Density") +
geom_density(aes(AgeFirstKill, y = ..density..), alpha = 0.7,
fill = "white",lwd =1, stat = "density")
}
And now we can do:
draw_it(bw = 1)
draw_it(bw = 3)
draw_it(bw = 7)
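As a cross-check on the area argument, here is a minimal sketch that lets ggplot2 do the division by bin width itself through after_stat(density) (available in ggplot2 3.3.0 and later), then sums the bar areas from the built plot. The fill and colour choices are arbitrary, and the styling from the question is left out to keep it short.

```r
library(ggplot2)

# Same stand-in data as above
set.seed(3)
df <- data.frame(AgeFirstKill = rgamma(100, 3, 0.2) + 10)

p <- ggplot(df, aes(AgeFirstKill)) +
  # after_stat(density) is count / sum(count) / binwidth, computed by stat_bin
  geom_histogram(aes(y = after_stat(density)), binwidth = 3,
                 fill = "grey70", colour = "white") +
  geom_density()

# Pull the computed bars out of the first layer and add up height * width
bars <- layer_data(p, 1)
bar_area <- sum(bars$y * (bars$xmax - bars$xmin))
bar_area
```

With the bars on the density scale, their total area is 1 whatever the bin width, which is why the histogram and the density curve then line up.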

Dotplot: How to change dot sizes of dotplot based on a value in data and make all x axis values into whole numbers

I have made a dotplot for my data but need help with the finishing touches. I've been around stackoverflow a bit and haven't seen any posts that directly answer my queries yet.
My code for my dotplot is:
ggplot()+
geom_dotplot(mapping = aes(x= reorder(Description, -p.adjust), y=Count, fill=-p.adjust),
data = head(X[which(X$p.adjust < 0.05),], n = 15), binaxis = 'y', dotsize = 2,
method = 'dotdensity', binpositions = 'all', binwidth = NULL)+
scale_fill_continuous(low="black", high="light grey") +
labs(y = "Associated genes", x = "wikipathways", fill = "p.adjust") +
theme(axis.text=element_text(size=8)) +
ggtitle('') +
theme(plot.title = element_text(2, face = "bold", hjust = 1),
legend.key.size = unit(2, "line")) +
theme(panel.background = element_rect(fill = 'white', colour = 'black'))+
coord_fixed(ratio = 0.5)+
coord_flip()
Let's say the X is something along the lines of:
Description p.adjust Count GeneRatio
1 DescriptionA 0.001 3 3/20
2 DescriptionB 0.002 2 2/20
3 DescriptionC 0.003 5 5/20
4 DescriptionD 0.004 10 10/20
To complete this plot I need two edits.
I would like to base the size of the dots on the GeneRatio, and make a secondary key based around this size. Is this possible with ggplot2 dotplots?
Next, I would like to keep the X axis values as integers. I'd want to avoid using something like scale_x_continuous(limits = c(2, 10)), as this plot code is part of a function for multiple data sets of various sizes, so hard-coding the limits/scale would not work well.
Help would be most appreciated.
If you can switch to a geom_point chart instead of geom_dotplot it's easy to adjust the dot size according to a variable. It also seems to have corrected your axis issue luckily enough.
ggplot(data = head(X[which(X$p.adjust < 0.05), ], n = 15)) +
  # shape = 21 is a fillable circle, so the fill aesthetic is honoured;
  # the geom_dotplot-only arguments (binaxis, method, binpositions, binwidth) are dropped.
  # GeneRatio must be numeric for a continuous size scale (convert text such as "3/20" first).
  geom_point(mapping = aes(x = reorder(Description, -p.adjust), y = Count,
                           fill = -p.adjust, size = GeneRatio),
             shape = 21) +
  scale_fill_continuous(low = "black", high = "light grey") +
  labs(y = "Associated genes", x = "wikipathways", fill = "p.adjust") +
  theme(axis.text = element_text(size = 8)) +
  ggtitle('') +
  theme(plot.title = element_text(2, face = "bold", hjust = 1),
        legend.key.size = unit(2, "line")) +
  theme(panel.background = element_rect(fill = 'white', colour = 'black')) +
  coord_fixed(ratio = 0.5) +
  coord_flip()
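If GeneRatio is stored as text like "3/20", and you want whole-number breaks without hard-coding limits, something along these lines may help. The data frame is the sample from the question; ratio_to_number() and integer_breaks() are hypothetical helpers I made up for this sketch, and integer_breaks() assumes the axis range contains at least one whole number.

```r
library(ggplot2)

# Sample data copied from the question
X <- data.frame(
  Description = c("DescriptionA", "DescriptionB", "DescriptionC", "DescriptionD"),
  p.adjust = c(0.001, 0.002, 0.003, 0.004),
  Count = c(3, 2, 5, 10),
  GeneRatio = c("3/20", "2/20", "5/20", "10/20"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "3/20" into 0.15
ratio_to_number <- function(x) {
  vapply(strsplit(x, "/", fixed = TRUE),
         function(parts) as.numeric(parts[1]) / as.numeric(parts[2]),
         numeric(1))
}
X$GeneRatioNum <- ratio_to_number(X$GeneRatio)

# Hypothetical helper: whole-number breaks inside whatever limits the data produce
integer_breaks <- function(limits) {
  seq(ceiling(limits[1]), floor(limits[2]), by = 1)
}

p <- ggplot(X, aes(x = reorder(Description, -p.adjust), y = Count,
                   fill = -p.adjust, size = GeneRatioNum)) +
  geom_point(shape = 21) +
  # breaks accepts a function of the scale limits, so no limits are hard-coded
  scale_y_continuous(breaks = integer_breaks) +
  coord_flip()
```

Because the breaks are computed from the limits of each data set, the same code can sit inside a function that is reused across data sets of different sizes.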

knitr - How to make a plot fill the whole available A4 page space in pdf?

I've just found out about the knitr tool. The first thing I want to use it for is to produce a simple report appendix consisting of one page with a content description and a series of pages with plots, one plot per page, with the plots generated in a loop.
However, I've encountered a problem with the size of a single plot: it doesn't fill the whole available space on the page. I've tried different settings of fig.width and fig.height, and googled around a little, but nothing has worked so far.
Here is how it looks now:
The red rectangle is an approximation of the desired size of the plot.
Here is the code:
\documentclass{article}
\begin{document}
<< echo =FALSE >>=
suppressMessages(suppressWarnings(library("ggplot2")))
suppressMessages(suppressWarnings(library("RColorBrewer")))
colours <- brewer.pal(11, "RdYlGn")[3:9]
@
<< echo=FALSE, fig.width = 8.3, fig.height = 11.7, fig.align = 'center' >>=
name.percentage <- data.frame(name = paste0(LETTERS[1:30], letters[1:30], sample(LETTERS[1:30], size = 30, replace = TRUE )),
percentage = 0.85 + runif(30, 0, 0.15))
name.percentage <- rbind(
transform(name.percentage, type = 1, fill = cut(percentage, breaks = c(-Inf,(1:6 * 3 + 81)/100, Inf), right = T, labels = colours)),
transform(name.percentage, percentage = 1 - percentage, type = 2, fill = "#EEEEEE")
)
plot <- ggplot(data = name.percentage,
aes( x = name, y = percentage, fill = fill)) +
geom_bar(stat = "identity", position = "stack", width = 0.75) +
scale_fill_identity(guide = "none") +
labs(x = NULL, y = NULL) +
scale_y_continuous(expand = c(0,0)) +
scale_x_discrete(expand = c(0,0)) +
coord_flip() +
theme_classic() +
theme(axis.ticks.y = element_blank(),
axis.text.y = element_text(size = 11, colour = "black" ),
axis.text.x = element_text(size = 11, colour = "black" ),
axis.line = element_blank(),
plot.margin = unit(c(0,5,0,0),"mm"),
aspect.ratio = 1.45)
print(plot)
@
\end{document}
Any suggestions will be much appreciated!
The LaTeX package geometry did the trick. As @CL pointed out, the problem was that the default LaTeX margins were bigger than I thought they were. It was sufficient to add this line:
\usepackage[a4paper, total={6in, 8in}]{geometry}
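A rough way to see why the chunk settings alone could not fill the page: as far as I understand it, knitr's default LaTeX output caps a figure at the line width, so a figure wider than the text block is shrunk and its height shrinks with it. The sketch below only does that arithmetic. scaled_size() is a hypothetical helper, and the 6in x 8in text block is taken from the geometry line above.

```r
# Hypothetical helper: the size a figure ends up with when it is shrunk,
# keeping its aspect ratio, to fit a given text width (all values in inches)
scaled_size <- function(fig_width, fig_height, text_width) {
  scale <- min(1, text_width / fig_width)
  c(width = fig_width * scale, height = fig_height * scale)
}

# The original chunk asked for a full A4 sheet (8.3 x 11.7), which is wider
# than the 6in text block, so it gets scaled down
scaled_size(8.3, 11.7, text_width = 6)

# Asking for exactly the text block needs no scaling
fitted <- scaled_size(6, 8, text_width = 6)

# A chunk header matching the geometry settings might then look like:
# << echo=FALSE, fig.width = 6, fig.height = 8, fig.align = 'center' >>=
```

Note that aspect.ratio = 1.45 in the theme still constrains the panel shape, so the plot may not use every bit of a 6 x 8 figure.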
