Dynamic data point label Positioning in ggmap - r

I'm working with the ggmap package in R and I am relatively new to geospatial data visualizations. I have a data frame of eleven latitude and longitude pairs that I would like to plot on a map, each with a label. Here is the dummy data:
lat<- c(47.597157,47.656322,47.685928,47.752365,47.689297,47.628128,47.627071,47.586349,47.512684,47.571232,47.562283)
lon<-c(-122.312187,-122.318039,-122.31472,-122.345345,-122.377045,-122.370117,-122.368462,-122.331734,-122.294395,-122.33606,-122.379745)
labels<-c("Site 1A","Site 1B","Site 1C","Site 2A","Site 3A","Site 1D","Site 2C","Site 1E","Site 2B","Site 1G","Site 2G")
df<-data.frame(lat,lon,labels)
Now I use annotate() to create the data point labels and plot them on a map:
library(ggmap)
map.data <- get_map(location = c(lon = -122.3485, lat = 47.6200),
maptype = 'roadmap', zoom = 11)
# Labels come from the same data frame (df) as the points
pointLabels <- annotate("text", x = df$lon, y = df$lat, size = 5, fontface = "bold", family = "Helvetica", label = as.vector(df$labels))
dataPlot <- ggmap(map.data) +
geom_point(data = df, aes(x = lon, y = lat), alpha = 1, fill = "red", pch = 21, size = 6) + labs(x = 'Longitude', y = 'Latitude') + pointLabels
This produces a plot of the data points
As you can see, there are two data points that overlap around (-122.37, 47.63), and their labels also overlap. I can manually add a shift to each label point to keep the labels from overlapping (see this post), but this is not a great technique when I need to produce many of these plots for different sets of latitude and longitude pairs.
Is there a way I can automatically keep data labels from overlapping? I realize whether the labels overlap is dependent on the actual figure size, so I'm open to fixing the figure size at certain dimensions if need be. Thank you in advance for any insights!
EDIT
The following is modified code using the answer given by Sandy Muspratt:
# Defining function to draw text boxes
draw.rects.modified <- function(d,...){
if(is.null(d$box.color))d$box.color <- NA
if(is.null(d$fill))d$fill <- "grey95"
for(i in 1:nrow(d)){
with(d[i,],{
grid.rect(gp = gpar(col = box.color, fill = fill,alpha=0.7),
vp = viewport(x, y, w, h, "cm", c(hjust, vjust=0.25), angle=rot))
})
}
d
}
# Defining function to determine text box borders
enlarge.box.modified <- function(d,...){
if(!"h"%in%names(d))stop("need to have already calculated height and width.")
calc.borders(within(d,{
w <- 0.9*w
h <- 1.1*h
}))
}
Generating the plot:
dataplot <- ggmap(map.data) +
geom_point(data = df, aes(x = lon, y = lat),
alpha = 1, fill = "red", pch = 21, size = 6) +
labs(x = 'Longitude', y = 'Latitude') +
# The positioning method must be passed by name via method =
geom_dl(data = df,
aes(label = labels),
method = list(dl.trans(y = y + 0.3), "boxes", cex = .8, fontface = "bold"))
This is a MUCH more readable plot, but with one outstanding issue. You'll note that the label "Site 1E" begins to overlap the data point associated with "Site 1A". Does directlabels have a way of dealing with labels that overlap data points belonging to another label?
A final question: how can I plot several duplicate labels using this method? Suppose the labels in the data frame are all the same:
df$labels<-rep("test",dim(df)[1])
When I use the same code, directlabels removes the duplicate label names:
But I want each data point to have a label of "test". Any suggestions?

Edit 11 Jan 2016: using ggrepel package with ggplot2 v2.0.0 and ggmap v2.6
ggrepel works well. In the code below, geom_label_repel() shows some of the available parameters.
lat <- c(47.597157,47.656322,47.685928,47.752365,47.689297,47.628128,47.627071,
47.586349,47.512684,47.571232,47.562283)
lon <- c(-122.312187,-122.318039,-122.31472,-122.345345,-122.377045,-122.370117,
-122.368462,-122.331734,-122.294395,-122.33606,-122.379745)
labels <- c("Site 1A","Site 1B","Site 1C","Site 2A","Site 3A","Site 1D",
"Site 2C","Site 1E","Site 2B","Site 1G","Site 2G")
df <- data.frame(lat,lon,labels)
library(ggmap)
library(ggrepel)
library(grid)
map.data <- get_map(location = c(lon = -122.3485, lat = 47.6200),
maptype = 'roadmap', zoom = 11)
ggmap(map.data) +
geom_point(data = df, aes(x = lon, y = lat),
alpha = 1, fill = "red", pch = 21, size = 5) +
labs(x = 'Longitude', y = 'Latitude') +
geom_label_repel(data = df, aes(x = lon, y = lat, label = labels),
fill = "white", box.padding = unit(.4, "lines"),
label.padding = unit(.15, "lines"),
segment.color = "red", segment.size = 1)
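On the duplicate-label question above, ggrepel draws one label per row of the data, so identical label text should not be collapsed. A minimal sketch without the map tiles, using three made-up points (the `pts` data frame is an assumption for illustration, not the question's data):

```r
library(ggplot2)
library(ggrepel)

# Made-up coordinates; every row carries the same label text.
pts <- data.frame(
  lon = c(-122.370, -122.368, -122.312),
  lat = c(47.628, 47.627, 47.597),
  labels = rep("test", 3)
)

# One repelled label is drawn per row, duplicates included.
p <- ggplot(pts, aes(x = lon, y = lat)) +
  geom_point(fill = "red", pch = 21, size = 5) +
  geom_label_repel(aes(label = labels), fill = "white")
p
```

The same geom_label_repel() layer can be added to ggmap(map.data) in place of ggplot(pts, ...), as in the code above.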
Original answer but updated for ggplot2 v2.0.0 and ggmap v2.6
If there is only a small number of overlapping points, then using the "top.bumpup" or "top.bumptwice" method from the directlabels package can separate them. In the code below, I use the geom_dl() function to create and position the labels.
lat <- c(47.597157,47.656322,47.685928,47.752365,47.689297,47.628128,47.627071,
47.586349,47.512684,47.571232,47.562283)
lon <- c(-122.312187,-122.318039,-122.31472,-122.345345,-122.377045,-122.370117,
-122.368462,-122.331734,-122.294395,-122.33606,-122.379745)
labels <- c("Site 1A","Site 1B","Site 1C","Site 2A","Site 3A","Site 1D",
"Site 2C","Site 1E","Site 2B","Site 1G","Site 2G")
df <- data.frame(lat,lon,labels)
library(ggmap)
library(directlabels)
map.data <- get_map(location = c(lon = -122.3485, lat = 47.6200),
maptype = 'roadmap', zoom = 11)
ggmap(map.data) +
geom_point(data = df, aes(x = lon, y = lat),
alpha = 1, fill = "red", pch = 21, size = 6) +
labs(x = 'Longitude', y = 'Latitude') +
geom_dl(data = df, aes(label = labels), method = list(dl.trans(y = y + 0.2),
"top.bumptwice", cex = .8, fontface = "bold", family = "Helvetica"))
Edit: Adjusting for underlying labels
A couple of methods spring to mind, but neither is entirely satisfactory, and I don't think you will find a solution that applies to all situations.
Adding a background colour to each label
This is a bit of a workaround, but directlabels has a "box" function (i.e., the labels are placed inside a box). It looks like one should be able to modify background fill and border colour in the list in geom_dl, but I can't get it to work. Instead, I take two functions (draw.rects and enlarge.box) from the directlabels website; modify them; and combine the modified functions with the "top.bumptwice" method.
library(grid) # grid.rect(), gpar(), viewport()
draw.rects.modified <- function(d,...){
if(is.null(d$box.color))d$box.color <- NA
if(is.null(d$fill))d$fill <- "grey95"
for(i in 1:nrow(d)){
with(d[i,],{
grid.rect(gp = gpar(col = box.color, fill = fill),
vp = viewport(x, y, w, h, "cm", c(hjust, vjust=0.25), angle=rot))
})
}
d
}
enlarge.box.modified <- function(d,...){
if(!"h"%in%names(d))stop("need to have already calculated height and width.")
calc.borders(within(d,{
w <- 0.9*w
h <- 1.1*h
}))
}
boxes <-
list("top.bumptwice", "calc.boxes", "enlarge.box.modified", "draw.rects.modified")
ggmap(map.data) +
geom_point(data = df,aes(x = lon, y = lat),
alpha = 1, fill = "red", pch = 21, size = 6) +
labs(x = 'Longitude', y = 'Latitude') +
geom_dl(data = df, aes(label = labels), method = list(dl.trans(y = y + 0.3),
"boxes", cex = .8, fontface = "bold"))
Add an outline to each label
Another option is to use this method to give each label an outline, although it is not immediately clear how it would work with directlabels. It would therefore need either a manual adjustment of the coordinates, or a search of the data frame for coordinates that lie within a given threshold of each other, followed by an adjustment. Here, however, I use the pointLabel function from the maptools package to position the labels. There is no guarantee that it will work every time, but I got a reasonable result with your data. There is a random element built into it, so you can run it a few times until you get a reasonable result. Also, note that it positions labels in a base plot. The label locations then have to be extracted and loaded into the ggplot/ggmap.
lat<- c(47.597157,47.656322,47.685928,47.752365,47.689297,47.628128,47.627071,47.586349,47.512684,47.571232,47.562283)
lon<-c(-122.312187,-122.318039,-122.31472,-122.345345,-122.377045,-122.370117,-122.368462,-122.331734,-122.294395,-122.33606,-122.379745)
labels<-c("Site 1A","Site 1B","Site 1C","Site 2A","Site 3A","Site 1D","Site 2C","Site 1E","Site 2B","Site 1G","Site 2G")
df<-data.frame(lat,lon,labels)
library(ggmap)
library(maptools) # pointLabel function
# Get map
map.data <- get_map(location = c(lon=-122.3485,lat=47.6200),
maptype = 'roadmap', zoom = 11)
bb = t(attr(map.data, "bb")) # the map's bounding box
# Base plot to plot points and using pointLabels() to position labels
plot(df$lon, df$lat, pch = 20, cex = 5, col = "red", xlim = bb[c(2,4)], ylim = bb[c(1,3)])
new = pointLabel(df$lon, df$lat, df$labels, pos = 4, offset = 0.5, cex = 1)
new = as.data.frame(new)
new$labels = df$labels
## Draw the map
map = ggmap(map.data) +
geom_point(data = df, aes(x = lon, y = lat),
alpha = 1, fill = "red", pch = 21, size = 5) +
labs(x = 'Longitude', y = 'Latitude')
## Draw the label outlines
theta <- seq(pi/16, 2*pi, length.out=32)
xo <- diff(bb[c(2,4)])/400
yo <- diff(bb[c(1,3)])/400
for(i in theta) {
map <- map + geom_text(data = new,
aes_(x = new$x + .01 + cos(i) * xo, y = new$y + sin(i) * yo, label = labels),
size = 3, colour = 'black', vjust = .5, hjust = .8)
}
# Draw the labels
map +
geom_text(data = new, aes(x = x + .01, y = y, label=labels),
size = 3, colour = 'white', vjust = .5, hjust = .8)
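The threshold search mentioned above (finding points that sit close enough for their labels to collide) could be sketched in base R as follows. The cutoff of 0.005 degrees is a hypothetical value that would need tuning to the zoom level and figure size, and plain Euclidean distance in degrees is only a rough proxy at this scale.

```r
# Same coordinates and labels as the question's dummy data.
lat <- c(47.597157, 47.656322, 47.685928, 47.752365, 47.689297, 47.628128,
         47.627071, 47.586349, 47.512684, 47.571232, 47.562283)
lon <- c(-122.312187, -122.318039, -122.31472, -122.345345, -122.377045,
         -122.370117, -122.368462, -122.331734, -122.294395, -122.33606,
         -122.379745)
labels <- c("Site 1A", "Site 1B", "Site 1C", "Site 2A", "Site 3A", "Site 1D",
            "Site 2C", "Site 1E", "Site 2B", "Site 1G", "Site 2G")

threshold <- 0.005  # hypothetical cutoff, in degrees

# Pairwise distances; keep each pair once via the upper triangle.
d <- as.matrix(dist(cbind(lon, lat)))
near_pairs <- which(d < threshold & upper.tri(d), arr.ind = TRUE)

# Table of label pairs that are closer than the cutoff.
overlaps <- data.frame(a = labels[near_pairs[, "row"]],
                       b = labels[near_pairs[, "col"]],
                       dist_deg = d[near_pairs])
overlaps
```

With this cutoff the only pair flagged is Site 1D and Site 2C, which are the two points that overlap in the plot.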

Related

combining land-only maps and contour plots using ggplot

I have developed a genetic algorithm for estimating the probability of observing an animal, given its genotype, across a regular grid of locations, here in south-east England. Using ggplot2 I can easily generate either a probability contour plot or a land-only (polygon-filled) map, but what I want is a map where the contour plot is restricted to land:
The desired outcome is generated by adding a black mask to the contour plot in Powerpoint, a tedious procedure that is impractical for generating the hundreds I need. I am sure there must be a simple way to do this.
I generate the contour plot using:
v <- ggplot(data, aes(Lng, Lat, z = P)) # longitude on x, latitude on y
v + geom_contour(bins = 20)
and the map using:
ggplot(data = world) +
geom_sf(color = "black", fill = "gray") +
coord_sf(xlim = c(-2.3, 1.9), ylim = c(50.9, 53.5), expand = FALSE)
My input file comprises all locations in 0.05 increments of longitude and latitude in the intervals specified. It is large but I would happily add it if this helps. I have looked online and cannot see any examples that match what I want.
I have tried adding one component to the other as an extra layer, but I struggle to understand what is needed and what the syntax is. For example:
layer(geom = "contour", stat = "identity", data = data, mapping = aes(Lng,Lat,P))
Error: Attempted to create layer with no position.
but even if this works it does not mask the sea area.
Here's a worked example with some made-up data:
library(rnaturalearth)
library(ggplot2)
sea <- ne_download(scale = 10, type = 'ocean', category = "physical",
returnclass = "sf")
ggplot(data) +
geom_contour_filled(aes(Lng, Lat, z = P), bins = 20, color = "black") +
guides(fill = "none") +
geom_sf(data = sea, fill = "black") +
coord_sf(ylim = c(51, 53.5), xlim = c(-2.2, 1.8), expand = FALSE)
Data used
set.seed(1)
a <- MASS::kde2d(rnorm(100), rnorm(100, 53), n = 100,
lims = c(-2.2, 1.8, 51, 53.5))
b <- MASS::kde2d(rnorm(25, 0.5), rnorm(25, 52), n = 100,
lims = c(-2.2, 1.8, 51, 53.5))
a$z <- b$z - a$z + max(a$z)
data <- cbind(expand.grid(Lng = a$x, Lat = a$y), P = c(a$z))
Created on 2023-01-02 with reprex v2.0.2
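As a side note on the layer() error in the question: a bare layer() call needs an explicit position (and a suitable stat), whereas geom_contour() fills those in itself and can be added to any plot with its own data. A minimal sketch, assuming a made-up surface on a regular 0.05-degree grid (the name `grid_data` and the formula for P are illustrative only):

```r
library(ggplot2)

# Made-up stand-in for the question's input: a regular 0.05-degree grid
# over the extent used in its coord_sf() call.
grid_data <- expand.grid(
  Lng = seq(-2.3, 1.9, length.out = 85),  # 85 points, 0.05 degrees apart
  Lat = seq(50.9, 53.5, length.out = 53)  # 53 points, 0.05 degrees apart
)
# Illustrative surface peaking near (0.5, 52); not real probabilities.
grid_data$P <- exp(-((grid_data$Lng - 0.5)^2 + (grid_data$Lat - 52)^2))

# geom_contour() supplies the stat and position that a bare layer() call lacks.
p <- ggplot() +
  geom_contour(data = grid_data, aes(x = Lng, y = Lat, z = P), bins = 20)
p
```

The sea mask from the worked example (geom_sf(data = sea, fill = "black") plus coord_sf()) can then be added on top of this layer in the same way.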

geom_raster() produces a whitish surface on top of the map

I am trying to plot a heatmap on top of a geographical map to show the geographic distribution of a variable. The minimum working code, with absurd data, is the following:
library(ggmap)
library(osmdata)
box <- c(left = 2.075, bottom = 41.325, right = 2.25, top = 41.47)
map <- get_stamenmap(bbox = box, maptype = "terrain-lines", zoom = 13)
lon_grid <- seq(2.075, 2.25, length.out = 30)
lat_grid <- seq(41.325, 41.47, length.out = 30)
grid <- expand.grid(lon_grid, lat_grid)
z <- c(rep(NA, 30^2/2), rnorm(30^2/2))
dataset <- cbind(grid, z)
ggmap(map) ### Plot 1
ggmap(map) + ### Plot 2
geom_raster(data = dataset, aes(x = Var1, y = Var2, fill = z), alpha = 0.5, interpolate = TRUE) +
scale_fill_viridis_c(option = "magma", na.value = "transparent") +
coord_equal()
The first map looks perfect: neat and clean, with well-defined lines.
The second one, with the geom_raster layer added, looks (besides being wider) slightly blurred and not as crisp. Notice that the geom_raster layer adds a whitish layer on top of the map (if you look closely, it does not even cover it completely). It looks awful and I would like to remove it; in other words, I would like a tile produced by geom_raster to be transparent when it has an NA value.
Any ideas?
If I understood the issue correctly, it seems that the NA cells in the raster are not completely transparent. See if changing to na.value = NA in scale_fill_viridis_c does what you're looking for.
library(ggmap)
library(osmdata)
box <- c(left = 2.075, bottom = 41.325, right = 2.25, top = 41.47)
map <- get_stamenmap(bbox = box, maptype = "terrain-lines", zoom = 13)
lon_grid <- seq(2.075, 2.25, length.out = 30)
lat_grid <- seq(41.325, 41.47, length.out = 30)
grid <- expand.grid(lon_grid, lat_grid)
z <- c(rep(NA, 30^2/2), rnorm(30^2/2))
dataset <- cbind(grid, z)
ggmap(map) + ### Plot 2
geom_raster(data = dataset, aes(x = Var1, y = Var2, fill = z), alpha = 0.5, interpolate = TRUE) +
scale_fill_viridis_c(option = "magma", na.value = NA) +
coord_equal()
This is what it outputs:
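One plausible reading of why na.value = "transparent" still leaves a whitish film, not confirmed in the answer above, is that "transparent" is stored as white with zero alpha, and a layer-level alpha = 0.5 then overwrites that zero. The base R sketch below only illustrates the colour arithmetic; it does not call ggplot2.

```r
# "transparent" is white (255, 255, 255) with an alpha channel of 0.
rgba <- col2rgb("transparent", alpha = TRUE)
rgba[, 1]

# Overwriting the alpha channel with roughly 0.5 (128 of 255), as a
# layer-level alpha would, leaves half-opaque white.
half_white <- rgb(rgba["red", 1], rgba["green", 1], rgba["blue", 1],
                  alpha = 128, maxColorValue = 255)
half_white
```

An NA colour has no channels to overwrite, which would be consistent with na.value = NA leaving those tiles undrawn.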

Divide the y axis so that the part with scores <25 occupies most of the plot in ggplot

I want to divide the y axis of the attached figure so that the part with scores <25 occupies the majority of the figure, while the remaining scores occupy a minor upper part.
I searched for this and I am aware that I should use scale_y_discrete(limits . I tried p<- p+scale_y_continuous(breaks = 1:20, labels = c(1:20,"//",40:100)) but it doesn't work yet.
I used the attached data and this is my code
Code
library(ggpubr) # ggscatter()
p<-ggscatter(data, x = "Year" , y = "Score" ,
color = "grey", shape = 21, size = 3, # Points color, shape and size
add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
add = "loess", #reg.line
conf.int = T,
cor.coef = F, cor.method = "pearson",
xlab = "Year" , ylab= "Score")
p<-p+ coord_cartesian(xlim = c(1980, 2020));p
Here is as close as I could get to a fake axis break with a resized upper area of the plot. I still think it's a bad idea, and if this were my plot I'd much prefer a more straightforward axis transform.
First, we'd need a function that generates a transform that squeezes all values above some threshold:
library(ggplot2)
library(scales)
# Define new transform
my_transform <- function(threshold = 25, squeeze_factor = 10) {
force(threshold)
force(squeeze_factor)
my_transform <- trans_new(
name = "trans_squeeze",
transform = function(x) {
ifelse(x > threshold,
((x - threshold) * (1 / squeeze_factor)) + threshold,
x)
},
inverse = function(x) {
ifelse(x > threshold,
((x - threshold) * squeeze_factor) + threshold,
x)
}
)
return(my_transform)
}
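To see what the squeeze does to actual values, the same arithmetic can be written as two plain functions and checked on a few scores. This is only a sketch of the transform/inverse pair above with threshold = 25 and squeeze_factor = 10; the helper names `squeeze` and `unsqueeze` are made up for illustration.

```r
threshold <- 25
squeeze_factor <- 10

# Same arithmetic as the transform/inverse pair, as standalone functions.
squeeze <- function(x) {
  ifelse(x > threshold, (x - threshold) / squeeze_factor + threshold, x)
}
unsqueeze <- function(x) {
  ifelse(x > threshold, (x - threshold) * squeeze_factor + threshold, x)
}

scores <- c(10, 25, 35, 75)
squeeze(scores)             # 10 25 26 30
unsqueeze(squeeze(scores))  # 10 25 35 75
```

Values up to 25 are left alone, while each 10 units above 25 take up the axis space of a single unit below it.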
Next we apply that transformation to the y-axis and add a fake axis break. I've used vanilla ggplot2 code as I find the ggscatter() approach confusing.
ggplot(data, aes(Year, Score)) +
geom_point(color = "grey", shape = 21, size = 3) +
geom_smooth(method = "loess", fill = "lightgray") +
# Add fake axis lines
annotate("segment", x = -Inf, xend = -Inf,
y = c(-Inf, Inf), yend = c(24.5, 25.5)) +
# Apply transform to y-axis
scale_y_continuous(trans = my_transform(25, 10),
breaks = seq(0, 80, by = 10)) +
scale_x_continuous(limits = c(1980, 2020), oob = oob_keep) +
theme_classic() +
# Turn real y-axis line off
theme(axis.line.y = element_blank())
You might find it informative to read Hadley Wickham's view on discontinuous axes. People sometimes mock weird y-axes.

ggplot geom_point: how to set font of custom plotting symbols?

With ggplot2::geom_point we are able to set any character as a plotting symbol using scale_shape_manual. Here is an example that demonstrates the purpose: using triangles to make a heatmap with two values in each cell:
require(ggplot2)
data <- data.frame(
val = rnorm(40),
grp = c(rep('a', 20), rep('b', 20)),
x = rep(letters[1:4], 5),
y = rep(letters[1:5], 4)
)
p <- ggplot(data, aes(x = x, y = y, color = val, shape = grp)) +
geom_point(size = 18) +
scale_shape_manual(values=c("\u25E4","\u25E2")) +
theme_minimal() +
theme(panel.grid = element_blank())
ggsave('triangle-tiles.pdf', device = cairo_pdf, width = 4.1, height = 3.5)
This works fine if the font used for the symbols has these special characters. Otherwise it apparently fails. I am aware that we can explicitly define the font and get the same result with geom_text:
require(dplyr)
data <- data %>% mutate(sym = ifelse(grp == 'a', "\u25E4", "\u25E2"))
p <- ggplot(data, aes(x = x, y = y, color = val, label = sym)) +
geom_text(size = 18, family = 'DejaVu Sans') +
theme_minimal() +
theme(
panel.grid = element_blank()
)
ggsave('triangle-tiles-2.pdf', device = cairo_pdf, width = 4.1, height = 3.5)
However, the fact that in geom_point these are also characters coming from a typeface makes me really curious about how to override the default font.
I checked the grob of this plot trying to follow this example and found the points this way:
gr <- ggplotGrob(p)
gr[['grobs']][[6]]$children$geom_point
Here we have x, y, size, lwd (used in the example above) etc., but no typeface. I am also wondering how to find the grob of the points directly and automatically from the root grob, e.g. with grid::getGrob; in the cited example, grid::grid.edit finds them.
I found some promising code here which uses editGtable, a method I could not find in any package; maybe it is an old one. Then I tried editGrob with no success:
font <- gpar(fontfamily = 'DejaVu Sans', fontsize = 14)
editGrob(gr[['grobs']][[6]], 'geom_point.points', grep = TRUE, global = TRUE, gp = font)
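On locating the points grob without a hard-coded index: one possible approach, sketched below on a throwaway mtcars plot rather than the triangle heatmap, is to look the panel up by name in the gtable layout and then match the child whose name starts with geom_point. The child naming pattern is an assumption about how ggplot2 currently names its grobs and may change between versions.

```r
library(ggplot2)

# Throwaway example plot with a single point layer.
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
gr <- ggplotGrob(p)

# Find the panel by its layout name instead of a position such as [[6]].
panel <- gr$grobs[[which(gr$layout$name == "panel")]]

# The point layer's grob is a child named like "geom_point.points.<id>".
pt_name <- grep("^geom_point", names(panel$children), value = TRUE)
pts <- panel$children[[pt_name]]
class(pts)
```

For a faceted plot there would be several panels, so the which() lookup would need to handle more than one match.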
The following is a bit of a hack, but you can wrap a point layer in a new class that assigns the fontfamily to the graphical parameters of the points. In the example below, the new class calls the parent's methods for drawing the points in the layer and in the key, and then assigns the fontfamily to the graphical parameters.
require(ggplot2)
#> Loading required package: ggplot2
point_with_family <- function(layer, family) {
old_geom <- layer$geom
new_geom <- ggproto(
NULL, old_geom,
draw_panel = function(self, data, panel_params, coord, na.rm = FALSE) {
pts <- ggproto_parent(GeomPoint, self)$draw_panel(
data, panel_params, coord, na.rm = na.rm
)
pts$gp$fontfamily <- family
pts
},
draw_key = function(self, data, params, size) {
pts <- ggproto_parent(GeomPoint, self)$draw_key(
data, params, size
)
pts$gp$fontfamily <- family
pts
}
)
layer$geom <- new_geom
layer
}
data <- data.frame(
val = rnorm(40),
grp = c(rep('a', 20), rep('b', 20)),
x = rep(letters[1:4], 5),
y = rep(letters[1:5], 4)
)
p <- ggplot(data, aes(x = x, y = y, color = val, shape = grp)) +
point_with_family(geom_point(size = 18), "DejaVu Sans") +
scale_shape_manual(values=c("\u25E4","\u25E2")) +
theme_minimal() +
theme(panel.grid = element_blank())
ggsave('triangle-tiles-2.pdf', plot = p, device = cairo_pdf, width = 4.1, height = 3.5)
Created on 2021-08-29 by the reprex package (v2.0.0)
With a fontfamily that doesn't support the unicode characters:

Plotting points and lines separately in R with ggplot

I'm trying to plot 2 sets of data points and a single line in R using ggplot.
The issue I'm having is with the legend.
As can be seen in the attached image, the legend applies the lines to all 3 data sets even though only one of them is plotted with a line.
I have melted the data into one long frame, but this still requires me to filter the data sets for each individual call to geom_point() and geom_path().
I want to graph the melted data, plotting a line based on one data set, and points on the remaining two, with a complete legend.
Here is the sample script I wrote to produce the plot:
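library(ggplot2)
library(reshape2) # melt()
library(dplyr) # filter()
library(grid) # pushViewport(), viewport(), grid.layout()
xseq <- 1:100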
x <- rnorm(n = 100, mean = 0.5, sd = 2)
x2 <- rnorm(n = 100, mean = 1, sd = 0.5)
x.lm <- lm(formula = x ~ xseq)
x.fit <- predict(x.lm, newdata = data.frame(xseq = 1:100), type = "response", se.fit = TRUE)
my_data <- data.frame(x = xseq, ypoints = x, ylines = x.fit$fit, ypoints2 = x2)
## Now try and plot it
melted_data <- melt(data = my_data, id.vars = "x")
p <- ggplot(data = melted_data, aes(x = x, y = value, color = variable, shape = variable, linetype = variable)) +
geom_point(data = filter(melted_data, variable == "ypoints")) +
geom_point(data = filter(melted_data, variable == "ypoints2")) +
geom_path(data = filter(melted_data, variable == "ylines"))
pushViewport(viewport(layout = grid.layout(1, 1))) # One on top of the other
print(p, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
You can set them manually like this:
The legend keys follow the factor levels of variable, which here are ypoints, ylines, ypoints2 (the column order of my_data), so the override.aes values have to be given in that order. We set linetype = "solid" for ylines and "blank" for the others (no line).
Similarly, ylines gets no shape (NA) and for the others we set whatever shape we need (I just put 7 and 8 there as an example). See e.g. http://www.r-bloggers.com/how-to-remember-point-shape-codes-in-r/ to help you choose the correct shapes for your needs.
If you are happy with dots then you can use shape 16 for both point series.
# Named vector, so each shape is matched to its series by name
my_shapes = c(ypoints = 7, ylines = NA, ypoints2 = 8)
ggplot(data = melted_data, aes(x = x, y = value, color=variable, shape=variable )) +
geom_path(data = filter(melted_data, variable == "ylines") ) +
geom_point(data = filter(melted_data, variable %in% c("ypoints", "ypoints2"))) +
scale_colour_manual(values = c(ypoints = "red", ylines = "green", ypoints2 = "blue"),
guide = guide_legend(override.aes = list(
# Positional: same order as the legend keys (ypoints, ylines, ypoints2)
linetype = c("blank", "solid", "blank"),
shape = unname(my_shapes)))) +
scale_shape_manual(values = my_shapes)
But I am very curious whether there is a more automated way. Hopefully someone can post a better answer.
This post relied quite heavily on this answer: ggplot2: Different legend symbols for points and lines
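One way to make this a little less manual, offered only as a sketch, is to derive the legend overrides from the series names instead of typing them out. The vector `series` stands in for levels(melted_data$variable), and `line_series` is a hypothetical name for the series that should appear as a line.

```r
# Stand-in for levels(melted_data$variable); order follows the columns of my_data.
series <- c("ypoints", "ylines", "ypoints2")
line_series <- "ylines"  # hypothetical: the series drawn as a line

is_line <- series %in% line_series

# A solid line only for the line series; a dot only for the point series.
legend_linetypes <- ifelse(is_line, "solid", "blank")
legend_shapes <- ifelse(is_line, NA, 16)
names(legend_shapes) <- series  # named, so values can be matched by level

legend_linetypes
legend_shapes
```

These vectors could then be passed as override.aes = list(linetype = legend_linetypes, shape = unname(legend_shapes)) and scale_shape_manual(values = legend_shapes).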

Resources