I’d like to use ggplot to draw a grid plot of the following scenario which I’ve attempted to depict in the picture below... I could use some guidance on how to logically think about the approach. Thank you for the guidance.
--
Each aisle in the example picture below has an odd number side—and an even number side
The spaces on the odd-side are listed ascending from 1… K where K is odd
The spaces on the even-side are listed ascending from 2…N where N is even
This pattern exists for each aisle in the parking lot
If a car is parked in a space—we track that spot in a database.
How can I reproduce a grid-level ggplot to indicate with a symbol on the plot all spaces where a car is parked?
The listing of occupied spaces would be “fed” into the ggplot logic via a .csv file: the format of the .csv would look something like this:
A01
A04
A05
A08
A09
A15
A20
A33
B07
B31
B44
C01
C04
C36
...
Image credit: Michael Layefsky, 2010, Google Images
My experience with direct use of grid is limited, so I can't say how hard this would be with grid functions, but it seems reasonably straightforward in ggplot2. Here's a simple example that is (I hope) not too far off from what you're looking for:
library(ggplot2)
# Set up grid of space identifiers
df = data.frame(y=1:10, x=rep(c(0:1, 3:4, 6:7), each=10),
space=paste0(rep(c("A","B","C"), each=20),
rep(c(seq(2,20,2),seq(1,20,2)), 3)),
stringsAsFactors=FALSE)
# Assume we have a vector of occupied spaces
set.seed(194)
occupied = sample(df$space, 30)
# Mark occupied spaces in data frame
df$status = ifelse(df$space %in% occupied, "Occupied", "Available")
ggplot(df) +
geom_segment(aes(x=x - 0.5, xend=x + 0.5, y=y, yend=y - 1)) +
geom_label(aes(label=space, x=x, y=y, fill=status), colour="blue", label.size=0) +
annotate(geom="segment", x=seq(0.5,6.5,3), xend=seq(0.5,6.5,3),
y=rep(0,3), yend=rep(10,3), lty="11") +
theme_bw(base_size=14) +
scale_fill_manual(values=c(hcl(c(105,15),100,65))) +
#scale_fill_manual(values=c(NA, hcl(15,100,65))) + # Color only occupied spaces
theme(axis.text = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank()) +
labs(x="",y="",fill="")
If you are taking a list of only the occupied spots as input in the form that you showed, and then you want to produce a visualization of occupied spots using ggplot2, this approach will work. First, I process the input, turning it into something that I can give ggplot easily.
# the provided example data
d <- read.table(text="
A01
A04
A05
A08
A09
A15
A20
A33
B07
B31
B44
C01
C04
C36", stringsAsFactors=FALSE)
Split the "spaces" into meaningful coordinates. I kept the original space names around for later labeling. What follows is all manipulation used to get the plot set up correctly.
cars <- strsplit(d[,1], "(?<=[A-Z])", perl=TRUE) # split the raw data
# turn resulting list into data.frame and give it names
cars <- setNames(do.call(rbind.data.frame, cars), c("aisle","spot.num"))
# convert the from factors to numeric,
# and turn the aisle letter into numeric data for plotting
# retain the original spot id for labeling the plot
cars <- with(cars, data.frame(
spot.num = as.numeric(as.character(spot.num)),
aisle = aisle, # keep this around for faceting
aisle.coord = 2 * (utf8ToInt(paste(as.character(aisle), collapse="")) - utf8ToInt("A")),
spot.id = d[,1]))
I multiplied the aisle by 2 after converting A to 1, B to 2, and so on, to make a new variable called aisle.coord. The reason for multiplying by 2 is to set up a variable where each aisle can be composed of two lines:
# if the spot number is even, increment aisle by 1 (put it on the right).
# This is possible because I multiplied by 2 earlier
cars$aisle.coord[cars$spot.num %% 2 == 0] <- cars$aisle.coord[cars$spot.num %% 2 == 0] + 1
# We need to adjust the spot numbers to real row numbers
# i.e. A02 is in row 1, not row 2, A10 is in row 5, etc.
cars$spot <- ceiling(cars$spot.num / 2)
Now, the plotting:
library(ggplot2)
library(grid) # for unit()
ggplot(cars, aes(x = aisle.coord %% 2, y = spot)) +
geom_tile(width = 0.5, height = 0.8) +
facet_grid(~aisle) +
geom_text( aes(x = aisle.coord %% 2, y = spot, label = spot.id), color = "white")
That is a bare-bones attempt at the graph. Lots of room for you to improve and adjust it. Here is another attempt with a little more effort. Still, plenty of room for adjustment (e.g. you could adjust the plot so that a the full lot appears, not just the part of the lot up to the maximum spot: B44):
ggplot(cars, aes(x = aisle.coord %% 2, y = spot)) +
geom_tile(width = 0.5, height = 0.8, fill = "orange") +
facet_grid(~aisle) +
geom_text( aes(x = aisle.coord %% 2, y = spot, label = spot.id), color = "white", size = 4) +
annotate("rect", ymin = 0, ymax = max(cars$spot)+0.5, xmin = 0.3, xmax = 0.7, fill = "grey40") +
theme(panel.margin.x = unit(0.05, "lines"),
plot.background = element_rect("grey40"),
panel.background = element_rect("grey40"),
panel.grid.minor = element_blank(),
axis.title = element_blank(),
axis.text = element_blank(),
strip.text = element_blank(),
strip.background = element_blank()) +
scale_y_continuous(breaks = seq(0.5, (max(cars$spot) + 0.5), 1)) +
scale_x_continuous(breaks = c(-0.3, 1.3)) +
geom_text(data=data.frame(x = 0.5, y = 10, aisle = LETTERS[1:length(unique(cars$aisle))]),
aes(x = x, y = y, label = aisle), inherit.aes = FALSE, color = "white")
Related
I want to write multiple pieces of information in each y-axis label of a ggplot bar chart (or any similar kind of plot). The problem is having everything aligned nicely.
It's probably best explained with an example for what I want to have:
My primary issue is the formatting on the left side of the figure.
What I've tried so far includes using monospace fonts to write the labels. This basically works but I want to try and avoid the use of monospace fonts for aesthetic purposes.
I've also tried making several ggplots where the idea was to remove everything in two initial plots, except for the y-axis labels (so these "plots" would only be the y-axis labels). Then align the plots next to each other using grid.align. The problem I have here is that there doesn't seem to be a way to remove the plot part of a ggplot (or is there?). It also requires some tweaking since removing x-axis labels in one of the "empty" plots would result in the labels moving down (since no space is occupied by the x-axis labels/title anymore).
I've also tried an approach using geom_text and setting the appropriate distances using the hjust parameter. However, for some reason, the spacing does not seem to be equal for the different size labels (for example distances for the "Red" and "Turquoise" labels are different for the same hjust). As the real data has many more variations in label sizes this variation makes the table look very messy...
I'm not too concerned about the headers since they are easy to add to the figure manually. The values on the right are also not too much of a problem since they have a fixed width and I can use geom_text to set them. So my main problem is with the y-axis (left) labels.
Here's an example data set:
dt = data.frame(shirt = c('Red','Turquoise','Red','Turquoise','Red','Turquoise','Red','Turquoise'),
group = c('Group alpha','Group alpha','Group beta','Group beta','Group delta','Group delta','Group gamma','Group gamma'),
n = c(22,21,15,18,33,34,20,19),
mean = c(1, 4, 9, 2, 4, 5 , 1, 2),
p = c(0.1, 0.09, 0.2, 0.03, 0.05, 0.99, 0.81, 0.75))
The closest I could come to is to use guide_axis_nested() from ggh4x for formatting the left part. (Disclaimer: I'm the author of ggh4x). With this axis, you can't align spanning categories (e.g group) to the top, nor have titles for the different levels.
library(ggplot2)
library(ggh4x)
# Create some dummy data
df <- expand.grid(
group = paste("Group", c("alpha", "beta", "delta", "gamma")),
shirt = c("Red", "Turquoise")
)
df$N <- sample(1:100, nrow(df))
df$mean <- rlnorm(nrow(df), meanlog = 1)
df$pvalue <- runif(nrow(df))
ggplot(df, aes(x = mean, y = interaction(N, shirt, group, sep = "&"))) +
geom_col() +
guides(
y = guide_axis_nested(delim = "&"),
y.sec = guide_axis_manual(
breaks = interaction(df$N, df$shirt, df$group, sep = "&"),
labels = scales::number(df$pvalue, 0.001)
)
) +
theme(
axis.text.y.left = element_text(margin = margin(r = 5, l = 5)),
ggh4x.axis.nesttext.y = element_text(margin = margin(r = 5, l = 5)),
ggh4x.axis.nestline = element_blank()
)
Created on 2021-11-16 by the reprex package (v1.0.0)
I think #teunbrand provided a very neat solution and code-wise a lot cleaner than mine. However, I also tried another approach using annotation_custom() (based on this answer in another question). The result is quite nice and it should be fairly easy to customize.
dt = data.frame(shirt = c('Red','Turquoise','Red','Turquoise','Red','Turquoise','Red','Turquoise'),
group = c('Group alpha','Group alpha','Group beta','Group beta','Group delta','Group delta','Group gamma','Group gamma'),
n = c(22,21,15,18,33,34,20,19),
lvls = c(1,2,3,4,5,6,7,8),
mean = c(1, 4, 9, 2, 4, 5 , 1, 2),
p = c(0.1, 0.09, 0.2, 0.03, 0.05, 0.99, 0.81, 0.75))
dt$groups = paste(dt$group, dt$shirt)
dt$groups = factor(dt$groups, levels=rev(dt$groups))
p2 = ggplot(dt) +
geom_col(aes(x=groups, y=mean)) +
coord_flip(clip='off') +
theme_bw() +
theme(axis.text.y = element_blank(),
axis.title.y = element_blank(),
plot.margin = unit(c(0.5,1,0,3.5), "in") # top, right, bottom, left
)
# Compute the position on the X axis for each information column
# I wanted fixed widths for the margins, so I basically compute what the X value
# would be on a specific location of the figure.
x_size = ggplot_build(p2)$layout$panel_params[[1]]$x.range[2] - ggplot_build(p2)$layout$panel_params[[1]]$x.range[1] # length of x-axis
p_width = par()$din[1] - 4.5 # width of plot minus the margins as defined above in: plot.margin = unit(c(0.5,1,0,3.5), "in")
rel_x_size = p_width / x_size # size of one unit X in inch
col1_x = ggplot_build(p2)$layout$panel_params[[1]]$x.range[1] - (3 / rel_x_size) # the Group column, 3 inch left of the start of the plot
col2_x = ggplot_build(p2)$layout$panel_params[[1]]$x.range[1] - (1.5 / rel_x_size) # the Shirt column, 1.5 inches left of the start of the plot
col3_x = ggplot_build(p2)$layout$panel_params[[1]]$x.range[1] - (0.25 / rel_x_size) # the N column, 0.25 inches left of the start of the plot
col4_x = ggplot_build(p2)$layout$panel_params[[1]]$x.range[2] + (0.2 / rel_x_size) # the P-val column, 0.2 inches right of the end of the plot
# Set the values for each "row"
i_range = 1:nrow(dt)
i_range_rev = rev(i_range) # Because we reversed the order of the groups
for (i in i_range) {
if(i %% 2 == 0) {
# Group
p2 = p2 + annotation_custom(grob = textGrob(label = dt$group[i_range_rev[i]], hjust = 0, gp = gpar()),
ymin=col1_x, ymax=col1_x,
xmin=i,xmax=i)
}
# Shirt
p2 = p2 + annotation_custom(grob = textGrob(label = dt$shirt[i_range_rev[i]], hjust = 0, gp = gpar()),
ymin=col2_x, ymax=col2_x,
xmin=i,xmax=i)
# N
p2 = p2 + annotation_custom(grob = textGrob(label = dt$n[i_range_rev[i]], hjust = 0, gp = gpar()),
ymin=col3_x, ymax=col3_x,
xmin=i,xmax=i)
# P-val
p2 = p2 + annotation_custom(grob = textGrob(label = dt$p[i_range_rev[i]], hjust = 0, gp = gpar()),
ymin=col4_x, ymax=col4_x,
xmin=i,xmax=i)
}
# Add the headers
i = i+1
p2 = p2 + annotation_custom(grob = textGrob(label = expression(bold('Group')), hjust = 0, gp = gpar()),
ymin=col1_x, ymax=col1_x,
xmin=i,xmax=i)
p2 = p2 + annotation_custom(grob = textGrob(label = expression(bold('Shirt')), hjust = 0, gp = gpar()),
ymin=col2_x, ymax=col2_x,
xmin=i,xmax=i)
p2 = p2 + annotation_custom(grob = textGrob(label = expression(bold('N')), hjust = 0, gp = gpar()),
ymin=col3_x, ymax=col3_x,
xmin=i,xmax=i)
p2 = p2 + annotation_custom(grob = textGrob(label = expression(bold('P-val')), hjust = 0, gp = gpar()),
ymin=col4_x, ymax=col4_x,
xmin=i,xmax=i)
p2
Output:
What is basically done, is that margins for the figure are set in plot.margin in the initial plot. Some computation is then performed to determine the correct location for each column of information. Subsequently we loop through the data set and set the values in each column using annotation_custom(). Finally, we can add the headers in a similar manner.
Note: if you resize the plot window (in RStudio for example), you need to re-run the code otherwise the layout will be messed up.
I am plotting a map of my study area and I am having problems to edit the legend title.
I need it to be "Projected fruit productivity in fallows in 40 yrs (fruits ha^-1) written in four lines. I could use bquote() to plot the -1 as a superscript. But it created an extra space that I cannot figure it out how to take it off. The extra space only appears when the title is divided into multiple lines.
Also, expression(atop()) creates the superscript but once I tried to divide it into more than two lines it does not show lines three and four.
This is the Map with the extra space using bquote()
This is the Map with the four line title using expression(atop())
I did try different solutions found on the internet, including this post. But they all plot the fourth line with the extra space or only plot the first or second line.
Bellow is the code I am using. Any help is welcomed.
The comments are different tries.
Data = spatial_dist_fallows.csv
library(sf) #sf = simple feature
library(ggplot2)
library(dplyr)
PAECM_fallows <-read.csv("spatial_dist_fallows.csv")
PAECM_fallows_sp <- st_as_sf(PAECM_fallows,coords = c("X", "Y"),crs = "+proj=longlat +datum=WGS84 +no_defs")
custom_bins_fruit = c(0,60,120,180,240,1400)
PAECM_fallows_fruit <- PAECM_fallows_sp %>%
mutate(prod_cat_fallow = cut(prod_40, breaks= custom_bins_fruit),
age_cat_fallow = cut(age, breaks = c(11,17,22,29,60)))
prod_map_PAECM_fruit<-ggplot()+
geom_sf(data = PAECM_fallows_fruit,aes(size = prod_cat_fallow), shape = 18, show.legend = "point")+
scale_size_manual(values= c(2,3,4,5,6),
# name = "Projected fruit\nproductivity in\nfallows in 40 yrs \n(fruits ha^-1)",
name = bquote("Projected fruit\nproductivity in\nfallows in 40 yrs \n( fruits"*ha^-1*")"),
# name = expression(paste("Projected fruit productivity\nin fallows in 40 yrs\n"),bquote(paste("("*fruits~ha^-1*")"))),#(Fruits/ha)
name = expression(atop("Projected fruit",
"productivity in",
"fallows in 40 yrs",
"( fruits ha"^-1,")")),
breaks= c(NA,"(0,60]","(60,120]","(120,180]","(180,240]","(240,1.4e+03]"),
labels= c("NA","\u2264 60","60 - 120","120 - 180","180 - 240","> 240"),
guide = guide_legend(override.aes = list(linetype = "blank", shape = 18, fill = NA)))+
# labs(size = expression(atop("Projected fruit\nproductivity in\nfallows in 40 yrs\n(fruits"*ha^-1*")", sep="")))+ #comment name line at the scale_size_manual
# labs(size = bquote("Projected fruit productivity \nin fallows in 40 yrs \n( fruits"*ha^-1*")"))+ #comment name line at the scale_size_manual
ggplot2::theme_minimal()+
ggplot2::theme(legend.text.align=0.5,
legend.title.align = 0.5,
plot.background = element_blank(),
panel.grid = element_line(colour = "white"),
panel.background = element_rect(fill = "grey87", color = "white"))+#,
coord_sf(xlim = c(-68.45,-68.2), ylim = c(-11.05,-10.8))
prod_map_PAECM_fruit
Extra question. Once I started to use the bquote I could not align the title text using theme(legend.title.align = 0.5), any other ideas?
After some other tries, I did come up with the following solution for the legend title.
name = expression(atop("",
atop(textstyle("Projected fruit"),
atop(textstyle("productivity in"),
atop(textstyle("fallows in 40 yrs"),
atop(textstyle("(fruits ha"^-1*")"))))))),
I used textstyle() to plot all text with the same size, otherwise it would be plotted smaller every time atop() was called. Atop() creates a space between the first and second line, that is why the first line of the code is atop("", so the first line will be a blank.
This is the final code with the map below.
library(sf) #sf = simple feature
library(ggplot2)
library(dplyr)
PAECM_fallows <-read.csv("spatial_dist_fallows.csv")
PAECM_fallows_sp <- st_as_sf(PAECM_fallows,coords = c("X", "Y"),crs = "+proj=longlat +datum=WGS84 +no_defs")
custom_bins_fruit = c(0,60,120,180,240,1400)
PAECM_fallows_fruit <- PAECM_fallows_sp %>%
mutate(prod_cat_fallow = cut(prod_40, breaks= custom_bins_fruit),
age_cat_fallow = cut(age, breaks = c(11,17,22,29,60)))
prod_map_PAECM_fruit_legend_test<-ggplot()+
geom_sf(data = PAECM_fallows_fruit,aes(size = prod_cat_fallow), shape = 18, show.legend = "point")+
scale_size_manual(values= c(2,3,4,5,6),
name = expression(atop("",
atop(textstyle("Projected fruit"),
atop(textstyle("productivity in"),
atop(textstyle("fallows in 40 yrs"),
atop(textstyle("(fruits ha"^-1*")"))))))),
breaks= c(NA,"(0,60]","(60,120]","(120,180]","(180,240]","(240,1.4e+03]"),
labels= c("NA","\u2264 60","60 - 120","120 - 180","180 - 240","> 240"),
guide = guide_legend(override.aes = list(linetype = "blank", shape = 18, fill = NA)))+
ggplot2::theme_minimal()+
ggplot2::theme(legend.text.align=0.5,
legend.title.align = 0.5,
plot.background = element_blank(),
panel.grid = element_line(colour = "white"),
panel.background = element_rect(fill = "grey87", color = "white"))+#,
coord_sf(xlim = c(-68.45,-68.2), ylim = c(-11.05,-10.8))
prod_map_PAECM_fruit_legend_test
Alternatively, you could use the annotation functions cowplot::draw_label() or ggplot2::annotation_custom(). I think that the explanations about these approaches given in ggplot2 two-line label with expression are helpful here as well.
1) Solution with cowplot::draw_label()
library(ggplot2)
library(cowplot)
#> Warning: package 'cowplot' was built under R version 3.5.2
#>
#> Attaching package: 'cowplot'
#> The following object is masked from 'package:ggplot2':
#>
#> ggsave
# If needed, revert to default theme (cowplot modifies the theme);
theme_set(theme_grey())
# Build a simple plot as example
p <- ggplot(mtcars, aes(x = wt, y = mpg, size = factor(gear))) +
geom_point() +
labs(size = element_blank()) + # remove default legend title
# Make enough space for the custom legend title by tweaking the right margin
theme(legend.margin = margin(t = 0, r = 26, b = 0, l = 0, unit = "mm"))
# Adjust further theme elements if needed, like text size, font, etc
# The lines of text and expression that constitute your custom legend title
lines <- list(
"Projected fruit",
"productivity in",
"fallows in 40 yrs",
expression("(fruits ha" ^-1 ~ ")")
)
# Using relative coordinates ranging from 0 to 1 (relative to the entire canvas).
# There is some guesswork with the coordinates until we get them right.
min_y <- 0.6
step <- 0.04 # dictates the line spacing; need to play with it until you get it right
ys <- seq(from = min_y + step * 4, to = min_y, by = -step)
x <- 0.87
# Add the annotations that will actually constitute the legend title.
gg <- ggdraw(p)
#> Warning: Using size for a discrete variable is not advised.
# Neglect the warning in this example.
for (i in 1:4){
gg <- gg + draw_label(lines[[i]], x = x, y = ys[i])
}
gg
Note that, cowplot::draw_label() can also be used in combination with setting the clipping off, coord_cartesian(clip = "off"), which allows plotting anywhere on the canvas (see next example with ggplot2::annotation_custom()). In such a case, we do not use the relative coordinates anymore, but the ones from the plot/data (the absolute coordinates).
2) Solution with ggplot2::annotation_custom()
Note that, cowplot::draw_label() uses ggplot2::annotation_custom() under the hood, so it is more or less the same annotation technique, but bit more verbose. We need to set clipping off. This time we do not use the relative coordinates anymore, but the ones from the plot/data (the absolute coordinates).
Building upon the p plot example from above:
min_y <- 24
step <- 1 # dictates the line spacing; need to play with it until you get it right
ys <- seq(from = min_y + step * 4, to = min_y, by = -step)
x <- 6.2
# set clipping off - allows plotting anywhere on the canvas
pp <- p + coord_cartesian(clip = "off")
for (i in 1:4){
pp <- pp + annotation_custom(grid::textGrob(lines[[i]]),
xmin = x, xmax = x, ymin = ys[i], ymax = ys[i])
}
pp
#> Warning: Using size for a discrete variable is not advised.
Created on 2019-01-15 by the reprex package (v0.2.1)
I have a dataframe like so:
set.seed(453)
year= as.factor(c(rep("1998", 20), rep("1999", 16)))
lepsp= c(letters[seq(from = 1, to = 20 )], c('a','b','c'),letters[seq(from =8, to = 20 )])
freq= c(sample(1:15, 20, replace=T), sample(1:18, 16,replace=T))
df<-data.frame(year, lepsp, freq)
df<-
df %>%
group_by(year) %>%
mutate(rank = dense_rank(-freq))
Frequencies freq of each lepsp within each year are ranked in the rank column. Larger freq values correspond to the smallest rank value and smaller freq values have the largest rank values. Some rankings are repeated if levels of lepsp have the same abundance.
I would like to split the df into multiple subsets by year. Then I would like to plot each subsetted dataframe in a multipanel figure. Essentially this is to create species abundance curves. The x-axis would be rank and the yaxis needs to be freq.
In my real dataframe I have 22 years of data. I would prefer the graphs to be displayed as 2 columns of 4 rows for a total of 8 graphs per page. Essentially I would have to repeat the solution offered here 3 times.
I also need to demarcate the 25%, 50% and 75% quartiles with vertical lines to look like this (desired result):
It would be great if each graph specified the year to which it belonged, but since all axis are the same name, I do not want x and y labels to be repeated for each graph.
I have tried to plot multiple lines on the same graph but it gets messy.
year.vec<-unique(df$year)
plot(sort(df$freq[df$year==year.vec[1]],
decreasing=TRUE),bg=1,type="b", ylab="Abundance", xlab="Rank",
pch=21, ylim=c(0, max(df$freq)))
for (i in 2:22){
points(sort(df$freq[df$year==year.vec[i]], decreasing=TRUE), bg=i,
type="b", pch=21)
}
legend("topright", legend=year.vec, pt.bg=1:22, pch=21)
I have also tried a loop, however it does not produce an output and is missing some of the arguments I would like to include:
jpeg('pract.jpg')
par(mfrow = c(6, 4)) # 4 rows and 2 columns
for (i in unique(levels(year))) {
plot(df$rank,df$freq, type="p", main = i)
}
dev.off()
Update
(Attempted result)
I found the following code after my post which gets me a little closer, but is still missing all the features I would like:
library(reshape2)
library(ggplot2)
library (ggthemes)
x <- ggplot(data = df2, aes(x = rank, y = rabun)) +
geom_point(aes(fill = "dodgerblue4")) +
theme_few() +
ylab("Abundance") + xlab("Rank") +
theme(axis.title.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.text.x = element_text(size = 15),
axis.text.y = element_text(size = 15),
plot.title = element_blank(), # we don't want individual plot titles as the facet "strip" will give us this
legend.position = "none", # we don't want a legend either
panel.border = element_rect(fill = NA, color = "darkgrey", size = 1.25, linetype = "solid"),
axis.ticks = element_line(colour = 'darkgrey', size = 1.25, linetype = 'solid')) # here, I just alter to colour and thickness of the plot outline and tick marks. You generally have to do this when faceting, as well as alter the text sizes (= element_text() in theme also)
x
x <- x + facet_wrap( ~ year, ncol = 4)
x
I prefer base R to modify graph features, and have not been able to find a method using base R that meets all my criteria above. Any help is appreciated.
Here's a ggplot approach. First off, I made some more data to get the 3x2 layout:
df = rbind(df, mutate(df, year = year + 4), mutate(df, year = year + 8))
Then We do a little manipulation to generate the quantiles and labels by group:
df_summ =
df %>% group_by(year) %>%
do(as.data.frame(t(quantile(.$rank, probs = c(0, 0.25, 0.5, 0.75)))))
names(df_summ)[2:5] = paste0("q", 0:3)
df_summ_long = gather(df_summ, key = "q", value = "value", -year) %>%
inner_join(data.frame(q = paste0("q", 0:3), lab = c("Common", "Rare-75% -->", "Rare-50% -->", "Rare-25% -->"), stringsAsFactors = FALSE))
With the data in good shape, plotting is fairly simple:
library(ggthemes)
library(ggplot2)
ggplot(df, aes(x = rank, y = freq)) +
geom_point() +
theme_few() +
labs(y = "Abundance (% of total)", x = "Rank") +
geom_vline(data = df_summ_long[df_summ_long$q != "q0", ], aes(xintercept = value), linetype = 4, size = 0.2) +
geom_text(data = df_summ_long, aes(x = value, y = Inf, label = lab), size = 3, vjust = 1.2, hjust = 0) +
facet_wrap(~ year, ncol = 2)
There's some work left to do - mostly in the rarity text overlapping. It might not be such an issue with your actual data, but if it is you could pull the max y values into df_summ_long and stagger them a little bit, actually using y coordinates instead of just Inf to get it at the top like I did.
I'm looking to replicate the design of the first plot seen in this post:
http://stats.blogoverflow.com/2011/12/andyw-says-small-multiples-are-the-most-underused-data-visualization/
This was taken from [1], but in that book there is no code shown for the figure.
This figure in the link has no legend, and instead shows the labels 'IN', 'MO', ect. in a staggered fashion positioned at the height of the respective line they represent. I know how to make plots using ggplot2, but the specific issue I'm having is writing code to make the labels on the right side of the figure like that. Could someone demonstrate how to do this?
1: Carr, Daniel & Linda Pickle. 2009. Visualizing Data Patterns with Micromaps. Boca Rotan, FL. CRC Press.
This is no easy task, especially to do programmatically. It involves working with grobs, nothing hard, but usually done manually on finished products, rather than generated procedurally.
Data
df <- read.table(text = 'Samples Day.1 Day.2 Day.3 Day.4
Seradigma335 1.2875 1.350 1.850 1.6125
Seradigma322 0.9375 2.400 1.487 1.8125
Sigma 1.1250 1.962 1.237 2.0500
Shapiro_red 0.7750 1.575 1.362 1.0125
Shapiro_w/red 0.7750 1.837 0.975 0.8250', header = T)
(taken from another question here on SO)
Code
library(tidyr)
library(ggplot2)
tidy_df <- df %>%
gather('Day','Value', -Samples)
Simple plot
g <- ggplot(tidy_df) +
geom_line(aes(Day, Value, group = Samples), show.legend = F) +
scale_x_discrete(expand = c(.02,0)) +
scale_y_continuous(limits = c(0,3)) +
theme_grey() +
theme(plot.margin = unit(c(0,.2,0,0), 'npc'),
panel.border = element_rect(color = 'black', fill = NA),
axis.ticks = element_blank()
)
Note the right margin (set with plot.margin = unit(c(0,.2,0,0), 'npc'))
Labels
library(cowplot)
g <- g +
draw_label(label = df$Samples[1], x = 4.1, y = df$Day.4[1], hjust = 0) +
draw_label(label = df$Samples[2], x = 4.1, y = df$Day.4[2], hjust = 0) +
draw_label(label = df$Samples[3], x = 4.1, y = df$Day.4[3], hjust = 0) +
draw_label(label = df$Samples[4], x = 4.1, y = df$Day.4[4], hjust = 0) +
draw_label(label = df$Samples[5], x = 4.1, y = df$Day.4[5], hjust = 0)
We added the labels directly to the plot (as opposed to a grob, see documentation), with an x of 4.1, so it's actually just outside the panel (Day.4 has an x of 4) and covered by the margin (it's called clipping).
Remove clipping
# transform to grob
gt <- ggplotGrob(g)
# set panel's clipping to off
gt$layout$clip[gt$layout$name == "panel"] <- "off"
# draw the grob
ggdraw(gt)
Notes
We can kind of speed up the label creation with a for loop:
for (i in 1:nrow(df)) {
g <- g + draw_label(label = df$Samples[i], x = 4.1, y = df$Day.4[i], hjust = 0)
}
But we can't really map to vars as we are used to with aesthetics.
As I said at the beginning these methods require quite a bit of work to find the right balance and positioning, that can't certainly be expected for a quick plot, but if something has to be published it may be worth.
You can use geom_label with the nudge_x argument like this:
library(data.table) # I always use data.table but not required
library(ggplot2)
a <- c(1:10)
b <- c(seq(1,20,2))
c <- c(seq(1,30,3))
d <- c(1:10)
aa <- data.table(a,b,c,d)
ggplot(aa)+
geom_line(aes(a,b))+
geom_line(aes(a,c))+
geom_line(aes(a,d))+
geom_label(aes(10,10), label = "line 1", nudge_x = 1)+
geom_label(aes(10,19), label = "line 2", nudge_x = 1)+
geom_label(aes(10,29), label = "line 3", nudge_x = 1)
You can directly input the ordered pair that corresponds to the desired location of your label. The plot looks like this:
I am trying to recreate the basic temperature trend of this Paleotemperature figure in R. (Original image and data.)
The scale interval of the x-axis changes from 100s of millions of years to 10s of millions to millions, and then to 100s of thousands, and so on, but the ticks marks are evenly spaced. The original figure was carefully laid out in five separate graphs in Excel to achieve the spacing. I am trying to get the same x-axis layout in R.
I have tried two basic approaches. The first approach was to use par(fig=c(x1,x2,y1,y2)) to make five separate graphs placed side by side. The problem is that the intervals among tick marks is not uniform and labels overlap.
#1
par(fig=c(0,0.2,0,0.5), mar=c(3,4,0,0))
plot(paleo1$T ~ paleo1$Years, col='red3', xlim=c(540,60), bty='l',type='l', ylim=c(-6,15), ylab='Temperature Anomaly (°C)')
abline(0,0,col='gray')
#2
par(fig=c(0.185,0.4,0,0.5), mar=c(3,0,0,0), new=TRUE)
plot(paleo2$T ~ paleo2$Year, col='forestgreen', axes=F, type='l', xlim=c(60,5), ylab='', ylim=c(-6,15))
axis(1, xlim=c(60,5))
abline(0,0,col='gray')
#etc.
The second approach (and my preferred approach, if possible) is to plot the data in a single graph. This causes non-uniform distance among tick marks because they follow their "natural" order. (Edit: example data added as well as link to full data set.).
years <- c(500,400,300,200,100,60,50,40,30,20,10,5,4,3,2,1)
temps <- c(13.66, 8.6, -2.16, 3.94, 8.44, 5.28, 12.98, 8.6, 5, 5.34, 3.66, 2.65, 0.78, 0.25, -1.51, -1.77)
test <- data.frame(years, temps)
names(test) <- c('Year','T')
# The full csv file can be used with this line instead of the above.
# test <- read.csv('https://www.dropbox.com/s/u0dfmlvzk0ztpkv/paleo_test.csv?dl=1')
plot(test$T ~ test$Year, type='l', xaxt='n', xlim=c(520,1), bty='l', ylim=c(-5,15), xlab="", ylab='Temperature Anomaly (°C)')
ticklabels = c(500,400,300,200,100,60,50,40,30,20,10,5,4,3,2,1)
axis(1, at=ticklabels)
Adding log='x' to plot comes closest but the intervals between ticks are still not even and the actual scale is, of course, not a log scale.
My examples only go down to 1 million years because I am trying to solve the problem first but my the goal is to match the original figure above. I am open to ggplot solutions although I am only fleetingly familiar with it.
I will strike a different note by saying: don't. In my experience, the harder something is to do in ggplot2 (and to a lesser extent, base graphics), the less likely it is to be a good idea. Here, the problem is that consistently changing the scales like is more likely to lead the viewer astray.
Instead, I recommend using a log scale and manually setting your cutoffs.
First, here is some longer data, just to cover the full likely scale of your question:
longerTest <-
data.frame(
Year = rep(1:9, times = 6) * rep(10^(3:8), each = 9)
, T = rnorm(6*9))
Then, I picked some cutoffs to place the labels at in the plot. These can be adjusted to whatever you want, but are at least a starting point for reasonably spaced ticks:
forLabels <-
rep(c(1,2,5), times = 6) * rep(10^(3:8), each = 3)
Then, I manually set some things to append to the labels. Thus, instead of having to say "Thousands of years" under part of the axis, you can just label those with a "k". Each order of magnitude gets a value. Nnote that the names are just to help keep things straight: below I just use the index to match. So, if you skip the first two, you will need to adjust the indexing below.
toAppend <-
c("1" = "0"
, "2" = "00"
, "3" = "k"
, "4" = "0k"
, "5" = "00k"
, "6" = "M"
, "7" = "0M"
, "8" = "00M")
Then, I change my forLabels into the text versions I want to use by grabbing the first digit, and concatenating with the correct suffix from above.
myLabels <-
paste0(
substr(as.character(forLabels), 1, 1)
, toAppend[floor(log10(forLabels))]
)
This gives:
[1] "1k" "2k" "5k" "10k" "20k" "50k" "100k" "200k" "500k" "1M" "2M"
[12] "5M" "10M" "20M" "50M" "100M" "200M" "500M"
You could likely use these for base graphics, but getting the log scale to do what you want is sometimes tricky. Instead, since you said you are open to a ggplot2 solution, I grabbed this modified log scale from this answer to get a log scale that runs from big to small:
library("scales")
reverselog_trans <- function(base = exp(1)) {
trans <- function(x) -log(x, base)
inv <- function(x) base^(-x)
trans_new(paste0("reverselog-", format(base)), trans, inv,
log_breaks(base = base),
domain = c(1e-100, Inf))
}
Then, just pass in the data, and set the scale with the desired breaks:
ggplot(longerTest
, aes(x = Year
, y = T)) +
geom_line() +
scale_x_continuous(
breaks = forLabels
, labels = myLabels
, trans=reverselog_trans(10)
)
Gives:
Which has a consistent scale, but is labelled far more uniformly.
If you want colors, you can do that using cut:
ggplot(longerTest
, aes(x = Year
, y = T
, col = cut(log10(Year)
, breaks = c(3,6,9)
, labels = c("Thousands", "Millions")
, include.lowest = TRUE)
, group = 1
)) +
geom_line() +
scale_x_continuous(
breaks = forLabels
, labels = myLabels
, trans=reverselog_trans(10)
) +
scale_color_brewer(palette = "Set1"
, name = "How long ago?")
Here is a version using facet_wrap to create different scales. I used 6 here, but you can set whatever thresholds you want instead.
longerTest$Period <-
cut(log10(longerTest$Year)
, breaks = c(3, 4, 5, 6, 7, 8, 9)
, labels = paste(rep(c("", "Ten", "Hundred"), times = 2)
, rep(c("Thousands", "Millions"), each = 3) )
, include.lowest = TRUE)
longerTest$Period <-
factor(longerTest$Period
, levels = rev(levels(longerTest$Period)))
newBreaks <-
rep(c(2,4,6,8, 10), times = 6) * rep(10^(3:8), each = 5)
newLabels <-
paste0(
substr(as.character(newBreaks), 1, 1)
, toAppend[floor(log10(newBreaks))]
)
ggplot(longerTest
, aes(x = Year
, y = T
)) +
geom_line() +
facet_wrap(~Period, scales = "free_x") +
scale_x_reverse(
breaks = newBreaks
, labels = newLabels
)
gives:
Here is a start:
#define the panels
breaks <- c(-Inf, 8, 80, Inf)
test$panel <- cut(test$Year, breaks, labels = FALSE)
test$panel <- ordered(test$panel, levels = unique(test$panel))
#for correct scales
dummydat <- data.frame(Year = c(0, 8, 8, 80, 80, max(test$Year)),
T = mean(test$T),
panel = ordered(rep(1:3, each = 2), levels = levels(test$panel)))
library(ggplot2)
ggplot(test, aes(x = Year, y = T, color = panel)) +
geom_line() +
geom_blank(data = dummydat) + #for correct scales
facet_wrap(~ panel, nrow = 1, scales = "free_x") +
theme_minimal() + #choose a theme you like
theme(legend.position = "none", #and customize it
panel.spacing.x = unit(0, "cm"),
strip.text = element_blank() ,
strip.background = element_blank()) +
scale_x_reverse(expand = c(0, 0))
Here's a basic example of doing it with separate plots using gridExtra. This may be useful to combine with extra grobs, for instance to create the epoch boxes across the top (not done here). If so desired, this might be best combined with Roland's solution.
# ggplot with gridExtra
library('ggplot2')
library('gridExtra')
library('grid')
d1 <- test[1:5, ]
d2 <- test[6:11, ]
d3 <- test[12:16, ]
plot1 <- ggplot(d1, aes(y = T, x = seq(1:nrow(d1)))) +
geom_line() +
ylim(c(-5, 15)) +
theme_minimal() +
theme(axis.title.x = element_blank(),
plot.margin = unit(c(1,0,1,1), "cm")) +
scale_x_continuous(breaks=)
plot2 <- ggplot(d2, aes(y = T, x = seq(1:nrow(d2)))) +
geom_line() +
ylim(c(-5, 15)) +
theme_minimal() +
theme(axis.text.y = element_blank(),
axis.title.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.x = element_blank(),
plot.margin = unit(c(1,0,1,0), "cm"))
plot3 <- ggplot(d3, aes(y = T, x = seq(1:nrow(d3)))) +
geom_line() +
theme_minimal() +
theme(axis.text.y = element_blank(),
axis.title.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.x = element_blank(),
plot.margin = unit(c(1,0,1,0), "cm")) +
ylim(c(-5, 15))
# put together
grid.arrange(plot1, plot2, plot3, nrow = 1,
widths = c(1.5,1,1)) # allow extra width for first plot which has y axis