distance to legend text ggplot - r

I am trying to get a little more distance between the legend box (indicator) and the legend text. My code is adapted from this amazing page. Here is my MWE:
library(openxlsx) # for reading in Excel data
library(dplyr) # for data manipulation
library(tidyr) # for data manipulation
library(magrittr) # for easier syntax in one or two areas
library(gridExtra) # for generating some comparison plots
library(ggplot2) # for generating the visualizations
mwedata <- data.frame(Metro=c(rep("Dayton,OH",6)))
mwedata$class <- as.character(c("Lower","Middle","Upper","Lower","Middle","Upper"))
mwedata$year <- as.numeric(c(rep(2000,3),rep(2014,3)))
mwedata$value <- as.numeric(c(0.221,0.580,0.199,0.269,0.527,0.204))
mwedata <- mwedata %>%
mutate(y_label = paste0(round(value*100, 1), "%"))
plot <- ggplot(mwedata, aes(x = class, y = value, fill = factor(year))) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = c("#29ABE2", "#217693")) +
geom_text(aes(label = y_label), position = position_dodge(0.9),
vjust = 1.5, color = "white", family = "Georgia")
plot <- plot +
scale_y_continuous(labels = scales::percent) +
scale_x_discrete(labels = c("Lower" = "Lower Class",
"Middle" = "Middle Class", "Upper" = "Upper Class")) +
labs(title = "Distribution of Adults by Income in Dayton, OH",
subtitle = "The percentage of adults in the middle class eroded by 5.3% from 2000 to 2014. Although a small \nfraction of these individuals moved into the upper class (+0.5%), the majority of these middle class \nindividuals moved into the lower income class (+4.8%).",
caption = "Source: Pew Research Center analysis of the \n2000 decennial census and 2014 American \nCommunity Survey (IPUMS)")
plot +
theme_minimal() +
theme(axis.title = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank(),
legend.position = c(1,1), legend.justification = c(1,1),
legend.background = element_blank(),
legend.direction="vertical",
text = element_text(family = "Georgia"),
plot.title = element_text(size = 18, margin = margin(b = 10)),
plot.subtitle = element_text(size = 10, color = "darkslategrey", margin = margin(b = 25)),
plot.caption = element_text(size = 8, margin = margin(t = 10), color = "grey70", hjust = 0),
legend.title = element_blank(),
legend.text.align = 2)
The last line of code, legend.text.align, is supposed to move the text away from the coloured legend boxes, but it only seems to apply to the lower of the two labels. See the image below. Can anyone help me?
EDIT 1:
I totally forgot to include the definition of the data.frame. I have now updated the MWE so that it really is a working example, by adding this line of code:
mwedata <- data.frame(Metro=c(rep("Dayton,OH",6)))
I'm sorry for the confusion.

This helps resolve the issue:
Remove legend.title = element_blank() and legend.text.align = 2 from theme()
Add fill = "" to labs()
Curious observation while debugging: using your original code, just changing the font family, e.g. from "Georgia" to "Open Sans", removes the discrepancy in alignment between the two labels in the legend.
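The two changes listed above can be sketched as follows. This is a trimmed version of the question's plot (fonts, titles and most theme settings are left out for brevity, which is a simplification made here and not part of the original answer): legend.title and legend.text.align are dropped from theme(), and an empty fill label is supplied through labs() instead.

```r
library(ggplot2)

# Data from the question (the Metro column is omitted here for brevity)
mwedata <- data.frame(
  class = rep(c("Lower", "Middle", "Upper"), times = 2),
  year  = rep(c(2000, 2014), each = 3),
  value = c(0.221, 0.580, 0.199, 0.269, 0.527, 0.204)
)

p <- ggplot(mwedata, aes(x = class, y = value, fill = factor(year))) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("#29ABE2", "#217693")) +
  # An empty fill label takes the place of legend.title = element_blank()
  labs(fill = "") +
  theme_minimal() +
  # Note: no legend.title and no legend.text.align entries here
  theme(legend.background = element_blank(),
        legend.direction = "vertical")

# print(p) draws the plot
```

The remaining theme() entries from the question can be added back unchanged; only the two legend settings need to go.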

Related

Adjust grid lines in ggplot+geom_tile (heatmap) or geom_raster

This heatmap has a built-in grid, which I cannot find a way to customize.
I want to preserve the horizontal lines in the grid, increase their thickness if possible, and disable the vertical lines. Each row should look like a continuous time series where data is present and be blank where it is not.
Adding vertical or horizontal lines on top would probably cover some data, so grid lines, or controlled gaps between the tiny rectangles, are preferable.
Alternatively, geom_raster doesn't show any grid at all, in which case I would need to add the horizontal grid lines myself.
I tried changing linetype, a geom_tile argument, which does change the line type and allows disabling the grid completely with linetype=0, but it does not let me keep only the horizontal grid lines. I didn't see any change when modifying the size argument.
This is the code generating the plot as above:
ggplot( DF, aes( x=rows, y=name, fill = value) ) +
#geom_raster( ) +
geom_tile( colour = 'white' ) +
scale_fill_gradient(low="steelblue", high="black",
na.value = "white")+
theme_minimal() +
theme(
legend.position = "none",
plot.margin=margin(grid::unit(0, "cm")),
#line = element_blank(),
#panel.grid = element_blank(),
panel.border = element_blank(),
panel.grid = element_blank(),
panel.spacing = element_blank(),
#panel.grid = element_line(color="black"),
#panel.grid.minor = element_blank(),
plot.caption = element_text(hjust=0, size=8, face = "italic"),
plot.subtitle = element_text(hjust=0, size=8),
plot.title = element_text(hjust=0, size=12, face="bold")) +
labs( x = "", y = "",
#caption= "FUENTE: propia",
fill = "Legend Title",
#subtitle = "Spaces without any data (missing, filtered, etc)",
title = "Time GAPs"
)
I tried to attach the output of DF %>% dput, but I get "Body is limited to 30000 characters; you entered 203304". If anyone is familiar with a similar dataset, please advise.
Additionally:
There are two gaps at the left and right of the plot area: one is visible next to the y-axis, and on the right the x-axis extends past the data. Neither is controlled by the plot.margin argument.
I would like the grid line to be thicker where the month changes.
The following data set has the same names and essential structure as your own, and will suffice for an example:
set.seed(1)
DF <- data.frame(
name = rep(replicate(35, paste0(sample(0:9, 10, T), collapse = "")), 100),
value = runif(3500),
rows = rep(1:100, each = 35)
)
Let us recreate your plot with your own code, using the geom_raster version:
library(ggplot2)
p <- ggplot( DF, aes( x=rows, y=name, fill = value) ) +
geom_raster( ) +
scale_fill_gradient(low="steelblue", high="black",
na.value = "white") +
theme_minimal() +
theme(
legend.position = "none",
plot.margin=margin(grid::unit(0, "cm")),
panel.border = element_blank(),
panel.grid = element_blank(),
panel.spacing = element_blank(),
plot.caption = element_text(hjust=0, size=8, face = "italic"),
plot.subtitle = element_text(hjust=0, size=8),
plot.title = element_text(hjust=0, size=12, face="bold")) +
labs( x = "", y = "", fill = "Legend Title", title = "Time GAPs")
p
The key here is to realize that discrete axes are "actually" numeric axes "under the hood", with the discrete ticks being placed at integer values, and factor level names being substituted for those integers on the axis. That means we can draw separating white lines using geom_hline, with values at 0.5, 1.5, 2.5, etc:
p + geom_hline(yintercept = 0.5 + 0:35, colour = "white", size = 1.5)
To change the thickness of the lines, simply change the size parameter (in ggplot2 3.4.0 and later, the line thickness argument is named linewidth).
Created on 2022-08-01 by the reprex package (v2.0.1)
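As a small sketch of the same idea, the line positions can be computed from the data instead of hard-coding 35. The data frame DF_small and the object names below are stand-ins invented for this illustration; the only assumption is that each distinct name occupies one integer position on the discrete axis, as described above.

```r
library(ggplot2)

set.seed(1)
# Small stand-in data set (hypothetical; three rows of tiles instead of 35)
DF_small <- data.frame(
  name  = rep(c("a", "b", "c"), each = 4),
  rows  = rep(1:4, times = 3),
  value = runif(12)
)

# Discrete levels sit at 1, 2, ..., n, so the boundaries between rows
# (plus the two outer edges) are at 0.5, 1.5, ..., n + 0.5
n_levels   <- length(unique(DF_small$name))
boundaries <- 0.5 + 0:n_levels

p_small <- ggplot(DF_small, aes(x = rows, y = name, fill = value)) +
  geom_raster() +
  # linewidth needs ggplot2 >= 3.4.0; on older versions use size instead
  geom_hline(yintercept = boundaries, colour = "white", linewidth = 1.5)

# print(p_small) draws the plot
```

Because the boundaries are an ordinary numeric vector, a subset of them could be passed to a second geom_hline call with a larger thickness if some separators should stand out.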

Plot a raster stack with values above a certain threshold in R ggplot

I have a raster stack of 8 TIF images, which I am plotting with gplot. As you can see, the raster values range from 0 to 8, but it's impossible for the pH of groundwater to be anywhere near 0.
So, what I want to do is filter the raster values and plot only the regions with pH above 5.
I also want to show the regions with pH values below 5 in a different color (say, grey) and add a corresponding 'No Data Available' entry to the legend. Is that possible?
I tried something like pfiles <- pfiles1 > 5 and then plotting the result, but it throws an error: Error: Discrete value supplied to continuous scale. I want to use a gradient color scale.
The code I am using:
library(raster)
library(rasterVis)
library(viridis)
library(ggplot2)
library(ggpubr)
shp = broom::tidy(shapefile("G:/WB_dist_utm.shp"))
rfiles = list.files(path = "G:/GW_IDW", pattern = "*.tif", full.names = TRUE)
atr = c(paste("Year", sep = "_", seq(2005, 2019, 2)))
pfiles1 = stack(rfiles[c(33:40)])
names(pfiles1) = atr
ph = gplot(pfiles1) +
geom_raster(aes(fill = value)) +
geom_path(data=shp, aes(long, lat, group=group), color = 'black') +
facet_wrap(~ variable, ncol = 4) +
scale_fill_gradientn(colours = rev(magma(30)), na.value = "transparent", n.breaks = 6) +
theme_bw() +
theme(axis.text = element_blank(), legend.position = "right", legend.direction = "vertical",
axis.title = element_blank(),
legend.key.height = unit(2, "cm"), legend.key.width = unit(1, "cm"),
legend.title = element_blank(), legend.text = element_text(size = 26),
strip.text = element_text(size = 28, face = "bold"),
plot.title = element_text(size = 23, face = "bold"),
plot.subtitle = element_text(size = 15), plot.caption = element_text(size = 10)) +
labs(title = "Ground Water Quality in West Bengal: 2005 - 2019",
subtitle = "pH values\n",
caption = "Data source: ENVIS Centre on Control of Pollution Water, Air and Noise.
http://www.cpcbenvis.nic.in/water_quality_data.html
Prepared by: Akhilesh Kumar") +
coord_equal()
This is the plot I am getting. I want to plot only the pH values > 5 and show the rest of the region in grey with a legend entry as "No Data Available".
Thank you.
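One possible approach, sketched on a synthetic data frame rather than on the raster stack from the question (which is not available here): keep the values numeric, replace everything at or below the threshold with NA, and let na.value paint those cells grey. A comparison such as pfiles1 > 5 returns TRUE/FALSE layers instead of pH values, which is the likely reason a continuous fill scale complains about discrete input. The names grid_df and ph_masked and the colour choices are made up for this illustration, and the sketch does not add the 'No Data Available' legend entry.

```r
library(ggplot2)

set.seed(42)
# Synthetic stand-in for one raster layer converted to a data frame
grid_df <- expand.grid(x = 1:20, y = 1:20)
grid_df$value <- runif(nrow(grid_df), min = 0, max = 8)

# Keep values above the threshold; everything else becomes NA (still numeric)
threshold <- 5
grid_df$ph_masked <- ifelse(grid_df$value > threshold, grid_df$value, NA_real_)

p_ph <- ggplot(grid_df, aes(x = x, y = y, fill = ph_masked)) +
  geom_raster() +
  # NA cells are drawn in grey; the gradient only spans values above 5
  scale_fill_gradientn(colours = c("yellow", "red", "black"),
                       na.value = "grey70") +
  coord_equal()

# print(p_ph) draws the plot
```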

In R, ggplot for a population pyramid: how to align labels near to the axis with geom_bar geom_label after flipping the coordinates

I am making a sort of population pyramid using ggplot (plotrix doesn't allow me to do fancy labels etc.). I start with a geom_bar with labels and then flip the coordinates. Sadly, the labels can barely be seen. I would like to move the labels next to the "y-axis" in the middle, which now shows the age groups.
Data is here:
d <- data.frame(age.grp2 = c("1-10", "11-20", "21-30", "31-40", "41-50", "1-10", "11-20", "21-30", "31-40", "41-50"),
sex = c("Female","Female","Female","Female","Female","Male","Male","Male","Male","Male" ),
n.enroll = c(288,500,400,300,200,300,460,300,200,300),
proportion = c(17.1,29.6,23.7,17.8,11.8,51,47.9,42.9,40,60),
proportion2 = c(-17.1,-29.6,-23.7,-17.8,-11.8,51,47.9,42.9,40,60))
My code is this one:
ggplot(d, aes(x = age.grp2, y = proportion2, fill = sex)) +
geom_bar(position = position_dodge(width=1), stat='identity') +
geom_label(aes(label = paste(n.enroll," (",proportion,"%)", sep=""), group = factor(sex)),
fill="white", colour = "black",
position= position_dodge(width=1),
size = 3) +
scale_fill_manual(values=c("#BFD5E3", "grey")) +
facet_share(~sex, dir = "h", scales = "free", reverse_num = TRUE) +
coord_flip() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
#panel.border = element_blank(),
panel.background = element_blank(),
legend.position = "none",
#axis.line.x = element_line(color = "black"),
axis.ticks.y = element_blank(),
axis.text.x = element_text(colour = "black", size = 8, face = "bold", angle=0, hjust=0.5),
axis.text.y = element_text(colour = "black", size = 8, face = "bold"),
axis.title.x = element_text(size = 14, face="bold", margin = margin(t = 30, r = 20, b = 10, l = 20)),
plot.margin = unit(c(1,1,1,1),"cm")) +
labs(y = "Enrollment percentage within sex",x="")
I am also attaching the plot, where we can see that for females the label of the 11-20 age group is cut off. I would like all labels to sit next to the age group labels, within each bar: female labels moved to the right and male labels moved to the left. I would also like each x-axis extended to 100%, or at least to the same range; currently it goes up to 30% for females and 60% for males. Thanks for all the comments.
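Here's a minimal solution using the base ggplot2 package (with dplyr and stringr for the data preparation), without most of your formatting. The key part is to add a conditional y = ... into the geom_label(aes()) section:
library(dplyr)
library(stringr)
library(ggplot2)
d %>%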
mutate(
label = str_c(n.enroll, " (", proportion, "%)"),
label_loc = if_else(sex == "Female", -9.5, 3),
proportion_for_chart = if_else(sex == "Female", -proportion, proportion)
) %>%
ggplot(aes(x = age.grp2, y = proportion_for_chart, fill = sex)) +
geom_col(show.legend = FALSE) +
geom_label(aes(y = label_loc, label = label), size = 3, fill = "white", hjust = 0) +
coord_flip() +
facet_wrap(~ sex, scales = "free") +
theme(
axis.title = element_blank()
)
Whenever possible, I try to reshape data and use geom_col rather than try to get lucky with geom_bar. You should be able to play around with different hard-coded values of y in the geom_label call to fix the proper location for your labels based on your formatting and image size/scale.
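As a variation on the hard-coded positions, the label anchor and its justification can both be made conditional on sex, so that female labels end just left of zero and male labels start just right of it. This is only a sketch: the columns label_loc and label_hjust and the offsets of -1 and 1 are assumptions chosen for illustration, and they will likely need tuning for a given image size.

```r
library(dplyr)
library(ggplot2)

# Data from the question (proportion2 is recomputed below instead)
d <- data.frame(
  age.grp2   = rep(c("1-10", "11-20", "21-30", "31-40", "41-50"), times = 2),
  sex        = rep(c("Female", "Male"), each = 5),
  n.enroll   = c(288, 500, 400, 300, 200, 300, 460, 300, 200, 300),
  proportion = c(17.1, 29.6, 23.7, 17.8, 11.8, 51, 47.9, 42.9, 40, 60)
)

d_lab <- d %>%
  mutate(
    label = paste0(n.enroll, " (", proportion, "%)"),
    proportion_for_chart = if_else(sex == "Female", -proportion, proportion),
    # Anchor labels close to zero, i.e. close to the shared age-group axis
    label_loc   = if_else(sex == "Female", -1, 1),
    # hjust = 1 right-aligns (text grows leftwards), hjust = 0 left-aligns
    label_hjust = if_else(sex == "Female", 1, 0)
  )

p_pyr <- ggplot(d_lab, aes(x = age.grp2, y = proportion_for_chart, fill = sex)) +
  geom_col(show.legend = FALSE) +
  geom_label(aes(y = label_loc, label = label, hjust = label_hjust),
             size = 3, fill = "white") +
  coord_flip() +
  facet_wrap(~ sex, scales = "free") +
  theme(axis.title = element_blank())

# print(p_pyr) draws the plot
```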

Mix different font sizes/faces in plot caption

I am using the code below to generate a heat map in R, and it works well. The first section computes the metric that I use to color the map. My question is how to make the caption read as shown below. I know that the first line would go before the other in the section that begins with caption = paste("Source..."). However, how to make the first line bigger and bold is escaping me.
map50<- merge(us50, pop1)
breaks <- seq(-.01, .05, by = .01)
map50$c1<- cut(map50$growth, breaks, label=c("-1% to 0%", "0% to 1%", "1% to 2%", "2% to 3%", "3% to 4%","4% to 5%"))
library(ggplot2)
b= ggplot(data= map50, aes(x=long, y=lat, group=group))
d= b+geom_polygon(aes(fill=c1), colour=alpha("black"),size=.05)+scale_fill_brewer(palette="YlOrRd",name="y/y growth rates")+coord_equal()
d= d+labs(x = NULL, y = NULL, fill = NULL,
title = "Average Employment Growth By State Q3 2018",
subtitle = "For Private, All Industries",
caption = paste("Source: BLS Quarterly Census of Employment and Wages\nProduced By: #NVlabormarket"))
d= d+theme_void()
d=d+theme(text = element_text(family = "NimbusSan", size = 10),
plot.title = element_text(size = 20, face = "bold"),
plot.margin = unit(c(0, 0.25, 0.0, 0.25), "in"),
panel.border = element_rect(fill = NA, colour = "#cccccc"),
legend.text = element_text(size = 8),
legend.position=c(.93, 0.2))
ggsave("Q32018EmploymentGrowthHeat-YlOrRdu.pdf")
You can plot the first caption line (the bold one) using the caption argument and the second line using the tag argument of the labs function. Then you have to specify the tag position manually using plot.tag.position.
library(ggplot2)
ggplot(mtcars, aes(cyl, mpg)) +
geom_point() +
labs(caption = "Source: BLS Quarterly Census of Employment and Wages",
tag = "Produced By: #NVlabormarket") +
theme(plot.caption = element_text(vjust = 4, size = 9, face = "bold"),
plot.tag = element_text(size = 9),
plot.tag.position = c(0.89, 0))

Automated calculation of a plot margin, dependent on the size of a label

For a rather long report, I am trying to unify a number of bar-plots. The plots in general look like this:
The goal is that all vertical axes start at the same position (e.g. 2 cm from the left plot border), no matter how long the labels in front of the axis are.
The data that goes into the plot is generated as follows:
vector_bar <- as.character(c("Bar1","Bar1","Bar1","Bar1",
"Bar2","Bar2","Bar2","Bar2",
"thatincrediblylonglabel",
"thatincrediblylonglabel",
"thatincrediblylonglabel",
"thatincrediblylonglabel"))
vector_position <- as.numeric(c(1,1,1,1,2,2,2,2,3,3,3,3))
vector_bar_section <- c("section1","section2","section3","section4","section1","section2","section3","section4","section1","section2","section3","section4")
vector_percent <- as.numeric(c(1,0,0,0,0,1,0,0,0,0,1,0))
vector_yposition <- as.numeric(c(1.05, 1.15, 1.25, 1.35,1.05, 1.15, 1.25, 1.35,1.05, 1.15, 1.25, 1.35))
df <- data.frame(cbind(vector_bar,vector_position,vector_bar_section,vector_percent,vector_yposition))
#Formating
df$vector_percent <- as.numeric(as.character(df$vector_percent))
df$vector_yposition <- as.numeric(as.character(df$vector_yposition))
df$vector_bar <- as.character(df$vector_bar)
Now the ggplot-code:
ggplot(df, aes(x = vector_bar, y = vector_percent, fill = factor(vector_bar_section, levels = rev(c("section1", "section2", "section3", "section4"))))) +
geom_label(data = df, aes(x = vector_bar, y = vector_yposition, label = vector_percent),
colour = "white", fontface = "bold", size = 7.75, show.legend = FALSE) +
geom_bar(stat = "identity", data = subset(df), width = 0.65, colour = "white", lwd = 1.3) +
coord_flip() +
ggtitle("") +
theme(plot.title = element_text(size = 40, face = "bold"),
legend.title = element_text(size = 19),
legend.text = element_text(size = 19, color = "#587992"),
legend.key.size = unit(1.4, "line"),
legend.key.width = unit(3.4, "line"),
axis.text.x = element_text(size = 19, color = "#587992"),
axis.text.y = element_text(size = 19, color = "#587992"),
axis.ticks.y = element_blank(),
axis.ticks.x = element_blank(),
axis.title.y = element_blank(),
legend.position = "top",
legend.direction = "horizontal",
panel.border = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "white"),
panel.background = element_rect(fill = "white"),
plot.margin = unit(c(0,0,0,Autoplotmargin(df$vector_bar)), units = "in")) +
scale_y_continuous(labels = percent_format(), position = "top",breaks = seq(0,1,0.2)) +
scale_fill_manual("", values = c("section1"= "#FF0000",
"section2" = "#595959",
"section3" = "#A6A6A6",
"section4" = "#0D0D0D"), guide = guide_legend(reverse = TRUE, nrow = 1)) +
scale_x_discrete(limits = c(unique(df$vector_bar)), labels = addline_format(rev(c(unique(df$vector_bar))))) +
geom_segment(aes(x = 0.5, xend = length(unique(df$vector_bar)) + 0.5, y = 0, yend = 0),color="#587992", size = 1.5) +
geom_segment(aes(x = length(unique(df$vector_bar)) + 0.5, xend = length(unique(df$vector_bar)) + 0.5, y = 1, yend = 0),color = "#587992", size = 1.5) +
labs(y = "", x = "")
with:
addline_format <- function(x,...){
gsub('\\s ','\n',x)
}
Now the interesting part is the function "Autoplotmargin" which I have defined as follows:
Autoplotmargin <- function(x) {
y <- as.numeric(Marginkonstante)-as.numeric(unit(strwidth(strsplit(x[which.max(nchar(x))], " ", "[")[[1]][1],7.75, units = "in"), units= "in"))
y
}
whereas:
Marginkonstante <- unit(c(20), units = "in")
The idea behind this function is that I first search for the longest label in df$vector_bar and measure its length in inches:
as.numeric(unit(strwidth(strsplit(x[which.max(nchar(x))], " ", "[")[[1]][1],7.75, units = "in"), units= "in"))
Ignore the "strsplit" section. It is needed because I have line breaks inside the labels, and I split the string so that only the characters before the first line break are considered.
So this basically gives me the length of the longest label. I now set the Marginkonstante to a value, 20 in the example.
Now the idea is that Autoplotmargin is defined as those 20 inches minus the length of the longest string. Across multiple plots, this should set the margin so that the vertical axis is positioned at the same place in every plot.
The problem is that this does not happen. The tendency is right, though: for longer labels, Autoplotmargin gives me lower values, and for shorter labels, higher values. But the axes are far from being in the same position in all plots.
What is wrong in my way of thinking?
Important side-notes:
I do set fig.width in the rmarkdown chunk options, so that all figures are the same width.
I know there is a solution to this problem by using grid and/or grob functions (see here for example). I have looked into that, but can not use these solutions for a number of reasons (not explaining that in detail here, too long).
Thank you for your assistance in advance!
Best,
Fabian
Problem solved!
It looks like my way of thinking was correct and I just had to change one thing to make it work. In the Autoplotmargin function, you choose the font size at which the string is measured (this number is passed to strwidth as its cex argument, a magnification factor relative to the device's default text size). In my case, I started with 7.75:
Autoplotmargin <- function(x) {
y <- as.numeric(Marginkonstante)-as.numeric(unit(strwidth(strsplit(x[which.max(nchar(x))], " ", "[")[[1]][1],7.75, units = "in"), units= "in"))
y
}
Now, after playing around with that 7.75 value (in my case decreasing it), everything works fine!
In my plots I get an almost perfect result with 1.3:
Autoplotmargin <- function(x) {
y <- as.numeric(Marginkonstante)-as.numeric(unit(strwidth(strsplit(x[which.max(nchar(x))], " ", "[")[[1]][1],1.3, units = "in"), units= "in"))
y
}
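The measurement step can be isolated into a small helper to make the role of the tuning value easier to see. The sketch below is not the function from the post: label_width_in and auto_left_margin are hypothetical names, the labels are split on "\n" (an assumption about where the line breaks are), and an off-screen pdf(NULL) device is opened so that strwidth has a device to measure against. The measured width depends on that device's font, so it only approximates what ggplot2 renders, which may be why the value had to be tuned by hand.

```r
# Hypothetical helper: width in inches of the widest label (first line only),
# measured with base graphics at a given magnification factor `cex`
label_width_in <- function(labels, cex = 1) {
  grDevices::pdf(NULL)              # off-screen device, writes no file
  on.exit(grDevices::dev.off())
  graphics::plot.new()
  first_lines <- vapply(strsplit(labels, "\n", fixed = TRUE),
                        `[`, character(1), 1)
  max(graphics::strwidth(first_lines, units = "inches", cex = cex))
}

# Hypothetical helper: left margin so that label width + margin == target_in
auto_left_margin <- function(labels, target_in = 2, cex = 1) {
  target_in - label_width_in(labels, cex = cex)
}

short_labels <- c("Bar1", "Bar2")
long_labels  <- c("Bar1", "thatincrediblylonglabel")

w_short <- label_width_in(short_labels, cex = 1.3)
w_long  <- label_width_in(long_labels, cex = 1.3)

m_short <- auto_left_margin(short_labels, cex = 1.3)
m_long  <- auto_left_margin(long_labels, cex = 1.3)
```

Measuring every label and taking the maximum avoids relying on which.max(nchar(x)), since the label with the most characters is not necessarily the widest in a proportional font.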
