Programmatically position ggplot labels - r

I am creating a lot of charts programmatically in R using ggplot2 and have everything working perfectly except the position of bar labels.
This requires inputs of the plot height, y axis scale and text size.
Example (stripped down) plot code:
testInput <- data.frame("xAxis" = c("first", "second", "third"), "yAxis" = c(20, 200, 60))
# Changeable variables
yMax <- 220
plotHeight <- 5
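textSize <- 4
yInterval <- 20 # Assumed value: used below in scale_y_continuous() but never defined in the original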
# Set up labels
geomTextList <- {
textHeightRatio <- textSize / plotHeight # was `height`, which is never defined
maxHeightRatio <- yMax / plotHeight
values <- testInput[["yAxis"]]
### THIS IS THE FORMULA NEEDING UPDATING
testInput[["labelPositions"]] <- values + 5 # # Should instead be a formula eg. (x * height) + (y * textSize) + (z * yMax)?
list(
ggplot2::geom_text(data = testInput, ggplot2::aes_string(x = "xAxis", y = "labelPositions", label = "yAxis"), hjust = 0.5, size = textSize)
)
}
# Create plot
outputPlot <- ggplot2::ggplot(testInput) +
ggplot2::geom_bar(data = testInput, ggplot2::aes_string(x = "xAxis", y = "yAxis"), stat = "identity", position = "dodge", width = 0.5) +
geomTextList +
ggplot2::scale_y_continuous(breaks = seq(0, yMax, yInterval), limits = c(0, yMax))
ggplot2::ggsave(filename = "test.png", plot = outputPlot, width = 4, height = plotHeight, device = "png")
I have tried various combinations of coefficients for the formula, but suspect that at least one of the factors isn't linear. If this is purely a statistical problem, I could take it to Cross Validated, but I wondered whether anyone had already solved this?

If your problem is offsetting the text so that it does not overlap the bar while dealing with varying text sizes, just use vjust, which is already proportional to the text size. A value of 0 will make the bottom of the text touch the bar, and a small negative value will give you some space between them:
testInput <- data.frame("xAxis" = c("first", "second", "third"), "yAxis" = c(20, 200, 60))
# Changeable variables
yMax <- 220
plotHeight <- 5
textSize <- 4
# Set up labels
geomTextList <- {
values <- testInput[["yAxis"]]
testInput[["labelPositions"]] <- values # Use the exact value
list(
ggplot2::geom_text(
data = testInput,
# vjust provides proportional offset
ggplot2::aes_string(x = "xAxis", y = "labelPositions", label = "yAxis"),
hjust = 0.5, vjust = -0.15, size = textSize
)
)
}
# Create plot
outputPlot <- ggplot2::ggplot(testInput) +
ggplot2::geom_bar(data = testInput, ggplot2::aes_string(x = "xAxis", y = "yAxis"), stat = "identity", position = "dodge", width = 0.5) +
geomTextList +
ggplot2::scale_y_continuous(limits = c(0, yMax))
ggplot2::ggsave(filename = "test.png", plot = outputPlot, width = 4, height = plotHeight, device = "png")
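Because vjust is measured in units of the text's own height, the same value should keep working when the text size changes. A minimal sketch of that idea follows; the helper name makeLabelledPlot and the list of sizes are illustrative assumptions, not part of the original code, and aes() is used here in place of aes_string():

```r
library(ggplot2)

testInput <- data.frame(xAxis = c("first", "second", "third"),
                        yAxis = c(20, 200, 60))

# Hypothetical helper: one bar chart per text size, all sharing the same vjust.
makeLabelledPlot <- function(data, textSize, yMax = 220) {
  ggplot(data, aes(x = xAxis, y = yAxis)) +
    geom_col(width = 0.5) +
    # The label sits at the bar's own value; vjust lifts it by a fraction
    # of the text height, so no per-size position formula is needed.
    geom_text(aes(label = yAxis), vjust = -0.15, size = textSize) +
    scale_y_continuous(limits = c(0, yMax))
}

# Assumed sizes, chosen only to show that nothing else has to change.
plots <- lapply(c(3, 4, 6), function(s) makeLabelledPlot(testInput, s))
```

Each element of plots can then be passed to ggsave() with whatever height is required.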

Related

ggplot, geom_density_ridges2 and row height

With ggplot2 and geom_density_ridges2, I try to plot two graphs. One with 2 rows and one with 9 rows.
On the two graphs I would like to keep the same height for each row. So the second graph should have the same width but it should be more than 4 times taller.
Unfortunately, RStudio and ggsave both give me graphs with the same dimensions (same width, same height).
Code
data_df = data.frame(text = character(), position = numeric())
# Plot
theme_set(theme_bw())
g = data_df %>%
ggplot( aes(y=text, x=position, fill=text) ) +
coord_cartesian(xlim = c(0, max_position)) +
geom_density_ridges2(alpha=1, stat="binline", scale=0.95, bins=200, show.legend = FALSE) +
theme_ridges(font_size = 8, grid = TRUE, font_family = "",line_size = 0.5) +
labs(x = "positions", y = author)
# Save image
image = paste0(author, ".png")
unlink(image)
ggsave(
image,
plot = g,
device = "png",
path = "graphs/",
units = "mm",
width = 100,
scale = 1,
dpi = 320,
limitsize = FALSE
)
Is it possible to fix the height of the rows ?
Maybe this or another approach with patchwork could solve the problem?
library(patchwork)
p1 = ggplot(mtcars, aes(x = mpg, y =cyl)) +
geom_point()
p1 + (p1 / plot_spacer() / plot_spacer() / plot_spacer())
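One possible approach, sketched here under stated assumptions: the ggsave() call above sets width but not height, and when height is omitted ggsave() falls back to the size of the current graphics device, which would explain why both images come out the same height. Passing a height that grows with the number of rows keeps the rows roughly equal. The helper name and the values of row_mm and margin_mm below are hypothetical and need tuning by eye.

```r
# Hypothetical helper: image height in mm for a ridgeline plot with n_rows rows.
# row_mm is the assumed height given to each row; margin_mm is the assumed
# fixed space taken by the axis, title and plot margins.
ridge_plot_height <- function(n_rows, row_mm = 15, margin_mm = 20) {
  margin_mm + n_rows * row_mm
}

h2 <- ridge_plot_height(2)  # 50
h9 <- ridge_plot_height(9)  # 155

# Possible usage with the call from the question:
# ggsave(image, plot = g, device = "png", path = "graphs/", units = "mm",
#        width = 100, height = ridge_plot_height(length(unique(data_df$text))),
#        dpi = 320, limitsize = FALSE)
```

The rows will only be approximately the same height, because the axis and margins do not scale with the number of rows; that is what margin_mm is meant to absorb.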

How is the line width (size) defined in ggplot2?

The line width (size) aesthetics in ggplot2 seems to print approximately 2.13 pt wider lines to a pdf (the experiment was done in Adobe Illustrator with a Mac):
library(ggplot2)
dt <- data.frame(id = rep(letters[1:5], each = 3), x = rep(seq(1:3), 5), y = rep(seq(1:5), each = 3), s = rep(c(0.05, 0.1, 0.5, 1, 72.27/96*0.5), each = 3))
lns <- split(dt, dt$id)
ggplot() + geom_line(data = lns[[1]], aes(x = x, y = y), size = unique(lns[[1]]$s)) +
geom_text(data = lns[[1]], y = unique(lns[[1]]$y), x = 3.5, label = paste("Width in ggplot =", unique(lns[[1]]$s))) +
geom_line(data = lns[[2]], aes(x = x, y = y), size = unique(lns[[2]]$s)) +
geom_text(data = lns[[2]], y = unique(lns[[2]]$y), x = 3.5, label = paste("Width in ggplot =", unique(lns[[2]]$s))) +
geom_line(data = lns[[3]], aes(x = x, y = y), size = unique(lns[[3]]$s)) +
geom_text(data = lns[[3]], y = unique(lns[[3]]$y), x = 3.5, label = paste("Width in ggplot =", unique(lns[[3]]$s))) +
geom_line(data = lns[[4]], aes(x = x, y = y), size = unique(lns[[4]]$s)) +
geom_text(data = lns[[4]], y = unique(lns[[4]]$y), x = 3.5, label = paste("Width in ggplot =", unique(lns[[4]]$s))) +
geom_line(data = lns[[5]], aes(x = x, y = y), size = unique(lns[[5]]$s)) +
geom_text(data = lns[[5]], y = unique(lns[[5]]$y), x = 3.5, label = paste("Width in ggplot =", unique(lns[[5]]$s))) +
xlim(1,4) + theme_void()
ggsave("linetest.pdf", width = 8, height = 2)
# Device size does not affect line width:
ggsave("linetest2.pdf", width = 10, height = 6)
I read that one should multiply the line width by 72.27/96 to get a line width in pt, but the experiment above gives me a line width of 0.8 pt, when I try to get 0.5 pt.
As @Pascal points out, the line width does not seem to follow the pt to mm conversion that works for fonts and was defined by @hadley in one of the comments. I.e. the line width does not appear to be defined by "the magic number" 1/0.352777778.
What is the equation behind line width for ggplot2?
You had all the pieces in your post already. First, ggplot2 multiplies the size setting by ggplot2::.pt, which is defined as 72.27/25.4 = 2.845276 (line 165 in geom-.r):
> ggplot2::.pt
[1] 2.845276
Then, as you state, you need to multiply the resulting value by 72.27/96 to convert from R pixels to points. Thus the conversion factor is:
> ggplot2::.pt*72.27/96
[1] 2.141959
As you can see, ggplot2 size = 1 corresponds to approximately 2.14pt, and similarly 0.8 pt corresponds to 0.8/2.141959 = 0.3734899 in ggplot2 size units.
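The arithmetic above can be wrapped in a pair of small helpers. This is only a sketch of the conversion described in this answer; the function names are made up for illustration, and the factor is written out rather than read from ggplot2::.pt so the block runs without ggplot2 loaded.

```r
# Points per ggplot2 size unit, following the answer above:
# (72.27 / 25.4) is the value of ggplot2::.pt, and 72.27 / 96 is the
# pixel-to-point conversion mentioned in the question.
pt_per_size <- (72.27 / 25.4) * (72.27 / 96)

# Hypothetical helpers for converting in both directions.
size_to_pt <- function(size) size * pt_per_size
pt_to_size <- function(pt) pt / pt_per_size

size_to_pt(1)    # about 2.14 pt
pt_to_size(0.8)  # about 0.373 ggplot2 size units
pt_to_size(0.5)  # the size to request for a 0.5 pt line under this conversion
```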

How do I control geom_errorbar width by symbol size?

In the example below I have a simple plot of mean values with standard deviation error bars for both the X and Y axes. I would like to control the error bar width so that the bars on both axes always plot at the same size.
Ideally I would like the bar width/height to be the same size as the symbols (i.e. in this case cex = 3), irrespective of the final plot dimensions. Is there a way to do this?
# Load required packages:
library(ggplot2)
library(plyr)
# Create dataset:
DF <- data.frame(
group = rep(c("a", "b", "c", "d"),each=10),
Ydata = c(seq(1,10,1),seq(5,50,5),seq(20,11,-1),seq(0.3,3,0.3)),
Xdata = c(seq(1,10,1),seq(5,50,5),seq(20,11,-1),seq(0.3,3,0.3)))
# Summarise data:
subDF <- ddply(DF, .(group), summarise,
X = mean(Xdata), Y = mean(Ydata),
X_sd = sd(Xdata, na.rm = T), Y_sd = sd(Ydata))
# Plot data with error bars:
ggplot(subDF, aes(x = X, y = Y)) +
geom_errorbar(aes(x = X,
ymin = (Y-Y_sd),
ymax = (Y+Y_sd)),
width = 1, size = 0.5) +
geom_errorbarh(aes(x = X,
xmin = (X-X_sd),
xmax = (X+X_sd)),
height = 1, size = 0.5) +
geom_point(cex = 3)
Looks fine when plotted at 1:1 ratio (500x500):
but the error bar width/height look different when plotted at 600x200
I'd use a size-variable so you can control all of the 3 plot elements at the same time
geom_size <- 3
# Plot data with error bars:
ggplot(subDF, aes(x = X, y = Y)) +
geom_errorbar(aes(x = X,
ymin = (Y-Y_sd),
ymax = (Y+Y_sd)),
width = 1, size = geom_size) +
geom_errorbarh(aes(x = X,
xmin = (X-X_sd),
xmax = (X+X_sd)),
height = 1, size = geom_size) +
geom_point(cex = geom_size)
This just builds on brettljausn's earlier answer. You can control the ratio of your plot with a variable as well. This only works when you actually save the file with ggsave(), not in a preview. I also used size to control the size of the points, because it scaled more nicely with the error bar ends.
plotheight = 100
plotratio = 3
geomsize = 3
plot = ggplot(subDF, aes(x = X, y = Y)) +
geom_errorbar(aes(x = X,
ymin = (Y-Y_sd),
ymax = (Y+Y_sd)),
width = .5 * geomsize / plotratio, size = 0.5) +
geom_errorbarh(aes(x = X,
xmin = (X-X_sd),
xmax = (X+X_sd)),
height = .5 * geomsize, size = 0.5) +
geom_point(size = geomsize)
ggsave(filename = "~/Desktop/plot.png", plot = plot,
width = plotheight * plotratio, height = plotheight, units = "mm")
Change plotheight, plotratio, and geomsize to whatever looks good for your plot. You will also have to change the filename in the second-to-last line to save the file in the folder of your choice.

Dynamic data point label Positioning in ggmap

I'm working with the ggmap package in R and I am relatively new to geospatial data visualizations. I have a data frame of eleven latitude and longitude pairs that I would like to plot on a map, each with a label. Here is the dummy data:
lat<- c(47.597157,47.656322,47.685928,47.752365,47.689297,47.628128,47.627071,47.586349,47.512684,47.571232,47.562283)
lon<-c(-122.312187,-122.318039,-122.31472,-122.345345,-122.377045,-122.370117,-122.368462,-122.331734,-122.294395,-122.33606,-122.379745)
labels<-c("Site 1A","Site 1B","Site 1C","Site 2A","Site 3A","Site 1D","Site 2C","Site 1E","Site 2B","Site 1G","Site 2G")
df<-data.frame(lat,lon,labels)
Now I use annotate to create the data point labels and plot these on a map:
library(ggmap)
map.data <- get_map(location = c(lon=-122.3485,lat=47.6200),
maptype = 'roadmap', zoom = 11)
# The original referenced uniqueReach, which is not defined here; df (the dummy data above) is assumed
pointLabels<-annotate("text",x=df$lon,y=df$lat,size=5,font=3,fontface="bold",family="Helvetica",label=as.vector(df$labels))
dataPlot <- ggmap(map.data) +
geom_point(data = df,aes(x = lon, y = lat), alpha = 1,fill="red",pch=21,size = 6) + labs(x = 'Longitude', y = 'Latitude')+pointLabels
This produces a plot of the data points
As you can see, there are two data points that overlap around (-122.37, 47.63), and their labels also overlap. Now I can manually add a shift to each label point to keep the labels from overlapping (see this post), but this is not a great technique when I need to produce many of these plots for different sets of latitude and longitude pairs.
Is there a way I can automatically keep data labels from overlapping? I realize whether the labels overlap is dependent on the actual figure size, so I'm open to fixing the figure size at certain dimensions if need be. Thank you in advance for any insights!
EDIT
The following is modified code using the answer given by Sandy Muspratt.
# Defining function to draw text boxes
draw.rects.modified <- function(d,...){
if(is.null(d$box.color))d$box.color <- NA
if(is.null(d$fill))d$fill <- "grey95"
for(i in 1:nrow(d)){
with(d[i,],{
grid.rect(gp = gpar(col = box.color, fill = fill,alpha=0.7),
vp = viewport(x, y, w, h, "cm", c(hjust, vjust=0.25), angle=rot))
})
}
d
}
# Defining function to determine text box borders
enlarge.box.modified <- function(d,...){
if(!"h"%in%names(d))stop("need to have already calculated height and width.")
calc.borders(within(d,{
w <- 0.9*w
h <- 1.1*h
}))
}
Generating the plot:
dataplot<-ggmap(map.data) +
geom_point(data = df,aes(x = df$lon, y = df$lat),
alpha = 1, fill = "red", pch = 21, size = 6) +
labs(x = 'Longitude', y = 'Latitude') +
geom_dl(data = df,
aes(label = labels),
list(dl.trans(y = y + 0.3), "boxes", cex = .8, fontface = "bold"))
This is a MUCH more readable plot, but with one outstanding issue. You'll note that the label "Site 1E" begins to overlap the data point associated with "Site 1A". Does directlabels have a way of dealing with labels that overlap data points belonging to another label?
A final question I have regarding this is how can I plot several duplicate labels using this method. Suppose the labels for data.frame are all the same:
df$labels<-rep("test",dim(df)[1])
When I use the same code, directlabels removes the duplicate label names:
But I want each data point to have a label of "test". Any suggestions?
Edit 11 Jan 2016: using ggrepel package with ggplot2 v2.0.0 and ggmap v2.6
ggrepel works well. In the code below, geom_label_repel() shows some of the available parameters.
lat <- c(47.597157,47.656322,47.685928,47.752365,47.689297,47.628128,47.627071,
47.586349,47.512684,47.571232,47.562283)
lon <- c(-122.312187,-122.318039,-122.31472,-122.345345,-122.377045,-122.370117,
-122.368462,-122.331734,-122.294395,-122.33606,-122.379745)
labels <- c("Site 1A","Site 1B","Site 1C","Site 2A","Site 3A","Site 1D",
"Site 2C","Site 1E","Site 2B","Site 1G","Site 2G")
df <- data.frame(lat,lon,labels)
library(ggmap)
library(ggrepel)
library(grid)
map.data <- get_map(location = c(lon = -122.3485, lat = 47.6200),
maptype = 'roadmap', zoom = 11)
ggmap(map.data) +
geom_point(data = df, aes(x = lon, y = lat),
alpha = 1, fill = "red", pch = 21, size = 5) +
labs(x = 'Longitude', y = 'Latitude') +
geom_label_repel(data = df, aes(x = lon, y = lat, label = labels),
fill = "white", box.padding = unit(.4, "lines"),
label.padding = unit(.15, "lines"),
segment.color = "red", segment.size = 1)
Original answer but updated for ggplot v2.0.0 and ggmap v2.6
If there is only a small number of overlapping points, then using the "top.bumpup" or "top.bumptwice" method from the direct labels package can separate them. In the code below, I use the geom_dl() function to create and position the labels.
lat <- c(47.597157,47.656322,47.685928,47.752365,47.689297,47.628128,47.627071,
47.586349,47.512684,47.571232,47.562283)
lon <- c(-122.312187,-122.318039,-122.31472,-122.345345,-122.377045,-122.370117,
-122.368462,-122.331734,-122.294395,-122.33606,-122.379745)
labels <- c("Site 1A","Site 1B","Site 1C","Site 2A","Site 3A","Site 1D",
"Site 2C","Site 1E","Site 2B","Site 1G","Site 2G")
df <- data.frame(lat,lon,labels)
library(ggmap)
library(directlabels)
map.data <- get_map(location = c(lon = -122.3485, lat = 47.6200),
maptype = 'roadmap', zoom = 11)
ggmap(map.data) +
geom_point(data = df, aes(x = lon, y = lat),
alpha = 1, fill = "red", pch = 21, size = 6) +
labs(x = 'Longitude', y = 'Latitude') +
geom_dl(data = df, aes(label = labels), method = list(dl.trans(y = y + 0.2),
"top.bumptwice", cex = .8, fontface = "bold", family = "Helvetica"))
Edit: Adjusting for underlying labels
A couple of methods spring to mind, but neither is entirely satisfactory. But I don't think you will find a solution that will apply to all situations.
Adding a background colour to each label
This is a bit of a workaround, but directlabels has a "box" function (i.e., the labels are placed inside a box). It looks like one should be able to modify background fill and border colour in the list in geom_dl, but I can't get it to work. Instead, I take two functions (draw.rects and enlarge.box) from the directlabels website; modify them; and combine the modified functions with the "top.bumptwice" method.
draw.rects.modified <- function(d,...){
if(is.null(d$box.color))d$box.color <- NA
if(is.null(d$fill))d$fill <- "grey95"
for(i in 1:nrow(d)){
with(d[i,],{
grid.rect(gp = gpar(col = box.color, fill = fill),
vp = viewport(x, y, w, h, "cm", c(hjust, vjust=0.25), angle=rot))
})
}
d
}
enlarge.box.modified <- function(d,...){
if(!"h"%in%names(d))stop("need to have already calculated height and width.")
calc.borders(within(d,{
w <- 0.9*w
h <- 1.1*h
}))
}
boxes <-
list("top.bumptwice", "calc.boxes", "enlarge.box.modified", "draw.rects.modified")
ggmap(map.data) +
geom_point(data = df,aes(x = lon, y = lat),
alpha = 1, fill = "red", pch = 21, size = 6) +
labs(x = 'Longitude', y = 'Latitude') +
geom_dl(data = df, aes(label = labels), method = list(dl.trans(y = y + 0.3),
"boxes", cex = .8, fontface = "bold"))
Add an outline to each label
Another option is to use this method to give each label an outline, although it is not immediately clear how it would work with directlabels. It would therefore need either a manual adjustment of the coordinates, or a search of the data frame for coordinates that lie within a given threshold of each other, followed by an adjustment. Here, however, I use the pointLabel function from the maptools package to position the labels. There is no guarantee that it will work every time, but I got a reasonable result with your data. There is a random element built into it, so you can run it a few times until you get a reasonable result. Also, note that it positions labels in a base plot. The label locations then have to be extracted and loaded into the ggplot/ggmap.
lat<- c(47.597157,47.656322,47.685928,47.752365,47.689297,47.628128,47.627071,47.586349,47.512684,47.571232,47.562283)
lon<-c(-122.312187,-122.318039,-122.31472,-122.345345,-122.377045,-122.370117,-122.368462,-122.331734,-122.294395,-122.33606,-122.379745)
labels<-c("Site 1A","Site 1B","Site 1C","Site 2A","Site 3A","Site 1D","Site 2C","Site 1E","Site 2B","Site 1G","Site 2G")
df<-data.frame(lat,lon,labels)
library(ggmap)
library(maptools) # pointLabel function
# Get map
map.data <- get_map(location = c(lon=-122.3485,lat=47.6200),
maptype = 'roadmap', zoom = 11)
bb = t(attr(map.data, "bb")) # the map's bounding box
# Base plot to plot points and using pointLabels() to position labels
plot(df$lon, df$lat, pch = 20, cex = 5, col = "red", xlim = bb[c(2,4)], ylim = bb[c(1,3)])
new = pointLabel(df$lon, df$lat, df$labels, pos = 4, offset = 0.5, cex = 1)
new = as.data.frame(new)
new$labels = df$labels
## Draw the map
map = ggmap(map.data) +
geom_point(data = df, aes(x = lon, y = lat),
alpha = 1, fill = "red", pch = 21, size = 5) +
labs(x = 'Longitude', y = 'Latitude')
## Draw the label outlines
theta <- seq(pi/16, 2*pi, length.out=32)
xo <- diff(bb[c(2,4)])/400
yo <- diff(bb[c(1,3)])/400
for(i in theta) {
map <- map + geom_text(data = new,
aes_(x = new$x + .01 + cos(i) * xo, y = new$y + sin(i) * yo, label = labels),
size = 3, colour = 'black', vjust = .5, hjust = .8)
}
# Draw the labels
map +
geom_text(data = new, aes(x = x + .01, y = y, label=labels),
size = 3, colour = 'white', vjust = .5, hjust = .8)
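The threshold search mentioned above (looking through the data frame for coordinates that lie close to each other) could be sketched in base R as follows. The threshold of 0.005 degrees is an assumed value, and plain Euclidean distance on longitude/latitude is only a rough approximation that seems adequate at this map scale.

```r
lat <- c(47.597157,47.656322,47.685928,47.752365,47.689297,47.628128,47.627071,
         47.586349,47.512684,47.571232,47.562283)
lon <- c(-122.312187,-122.318039,-122.31472,-122.345345,-122.377045,-122.370117,
         -122.368462,-122.331734,-122.294395,-122.33606,-122.379745)
labels <- c("Site 1A","Site 1B","Site 1C","Site 2A","Site 3A","Site 1D",
            "Site 2C","Site 1E","Site 2B","Site 1G","Site 2G")
df <- data.frame(lat, lon, labels)

# Assumed threshold, in degrees; tune it to the zoom level and label size.
threshold <- 0.005

# Pairwise distances between all points (approximate, in degrees).
d <- as.matrix(dist(df[, c("lon", "lat")]))

# Keep each pair once (upper triangle) where the points are closer than the threshold.
close <- which(d < threshold & upper.tri(d), arr.ind = TRUE)

# Table of the pairs whose labels are likely to collide.
crowded <- data.frame(a = df$labels[close[, "row"]],
                      b = df$labels[close[, "col"]],
                      distance = d[close])
crowded
```

The rows of crowded identify the points whose label coordinates would need an offset before being passed to geom_text().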

textGrob placement relative to changing plot size

I'm producing a whole pile of graphs of changing sizes. I want each graph to display a symbol (say, asterisk) at a specific point on the graph margin (top y-axis value), regardless of plot size. Right now I do it manually by defining x/y for each textGrob, but there has got to be a better way.
Plot size is determined by number of categories in the dataset (toy data below). Ideally, the output plots would have identical panel sizes (I'm assuming that can be controlled through defining margin sizes in inches and adding that value to the height parameter?). Widths don't usually change, but it would be nice to automate both x and y placements based on the defined device width (and plot margins).
Thanks so much!
library(ggplot2)
library(gridExtra)
set.seed(123)
df <- data.frame(x = rnorm(20, 0, 1), y = rnorm(20, 0, 1), category = rep(c("a", "b"), each = 10))
## plot 1
sub <- df[df$category == "a",]
height = 2*length(unique(sub$category))
p <- ggplot(sub) +
geom_point(aes(x = x, y = y)) +
facet_grid(category ~ .)
jpeg(filename = "fig1.jpg",
width = 6, height = height, units = "in", pointsize = 12, res = 900,
quality = 100)
g <- arrangeGrob(p, sub = textGrob("*", x = 0.07, y = 10.15, hjust = 0, vjust=0, #### puts the top discharge value; might need to be adjusted manually in following years
gp = gpar(fontsize = 15)))
grid.draw(g)
dev.off()
## plot 2
height = 2*length(unique(df$category))
p <- ggplot(df) +
geom_point(aes(x = x, y = y)) +
facet_grid(category ~ .)
jpeg(filename = "fig2.jpg",
width = 6, height = height, units = "in", pointsize = 12, res = 900,
quality = 100)
g <- arrangeGrob(p, sub = textGrob("*", x = 0.07, y = 23.1, hjust = 0, vjust=0, #### puts the top discharge value; might need to be adjusted manually in following years
gp = gpar(fontsize = 15)))
grid.draw(g)
dev.off()
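One way to avoid retuning y for every plot height, offered as a sketch rather than a tested answer: grid units can be combined, so the marker can be anchored a fixed physical distance below the top edge of the device instead of at a hand-picked coordinate. The helper name and the default offsets below are assumptions to adjust once for your margins.

```r
library(grid)

# Hypothetical helper: an asterisk placed a fixed distance (in inches) below
# the top edge, so its position does not depend on the device height.
# top_offset_in and x_npc are assumed values.
make_marker <- function(top_offset_in = 0.3, x_npc = 0.07) {
  textGrob("*",
           x = unit(x_npc, "npc"),
           y = unit(1, "npc") - unit(top_offset_in, "in"),
           hjust = 0, vjust = 0,
           gp = gpar(fontsize = 15))
}

marker <- make_marker()

# Possible usage inside an open device, after the plot has been drawn:
# jpeg(filename = "fig1.jpg", width = 6, height = height, units = "in", res = 900)
# print(p)
# grid.draw(marker)
# dev.off()
```

The x position could be expressed in inches in the same way if the device width also changes.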
