I am making a density map in R using ggmap and stat_density2d. The code looks like this:
riverside <- get_map('Riverside, IL', zoom = 14 , color = 'bw' )
RiversideMap <- ggmap(riverside, extent = 'device', legend = 'topleft')
# make the map:
RiversideMap +
stat_density2d(aes(x = lon, y = lat,
fill = ..level.. , alpha = ..level..),size = .01, bins = 16,
data = myData, geom = 'polygon') +
scale_fill_gradient(low = "yellow", high = "blue") +
scale_alpha(range = c(.0, 0.3), guide = FALSE)
The density shown in the map's color legend is normalized in stat_density2d so that its integral over area equals 1.
In the map, the units of the x and y axes are decimal degrees. (For example, a point is specified by the coordinates lat = 41.81888 and lon = -87.84147).
For ease of interpretation, I'd like to make two changes to the values of the density as displayed in the map legend.
First, I'd like the integral of the density to be N (the number of data points - or addresses - in the data set) rather than 1. So the values displayed in the legend need to be multiplied by N = nrow(myData).
Second, I'd like the unit of distance to be kilometers rather than decimal degrees. For the latitudes and longitudes that I am plotting, this requires dividing the values displayed in the legend by 9203.
With the default normalization of density in stat_density2d, I get these numbers in the legend: c(2000,1500,1000,500).
Taking N = 1600 and performing the above re-scalings, this becomes c(348, 261, 174, 87) (= 1600/9203 * 2000 etc). Obviously, these are not nice round numbers, so it would be even better if the legend numbers were say c(400,300,200,100) with their locations in the legend color bar adjusted accordingly.
The advantage of making these re-scalings is that the density in the map becomes easy to interpret: it is just the number of people per square km (rather than the probability density of people per square degree).
Is there an easy way to do this? I am new to ggmap and ggplot2. Thanks in advance.
In brief, use:
scale_fill_continuous(labels = scales::unit_format(unit = "k", scale = 1e-3))
This link is great help for managing scales, axes and labels: https://ggplot2-book.org/scales.html
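The two rescalings from the question (multiply by N, divide by 9203) can also be pushed into the fill scale itself, so the legend shows round numbers of points per square km. What follows is only a minimal sketch under stated assumptions: simulated points stand in for myData, the ggmap base layer is left out so the example runs offline, the factor 9203 is taken from the question as given, and the helper names to_per_km2 and from_per_km2 are made up for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for myData: simulated points near Riverside, IL
set.seed(1)
myData <- data.frame(lon = rnorm(1600, -87.82, 0.008),
                     lat = rnorm(1600, 41.83, 0.008))

N <- nrow(myData)
km2_per_deg2 <- 9203  # square km per square degree, value taken from the question

# Convert a density per square degree (integrating to 1) to points per square km
to_per_km2 <- function(d) d * N / km2_per_deg2
# Inverse: where a "nice" per-km^2 value sits on the original density scale
from_per_km2 <- function(v) v * km2_per_deg2 / N

nice <- c(100, 200, 300, 400)  # desired legend values

p <- ggplot(myData, aes(x = lon, y = lat)) +
  stat_density_2d(aes(fill = after_stat(level), alpha = after_stat(level)),
                  geom = "polygon", bins = 16) +
  scale_fill_gradient(low = "yellow", high = "blue",
                      name = "points per km^2",
                      breaks = from_per_km2(nice),               # tick positions
                      labels = function(b) round(to_per_km2(b))) + # tick labels
  scale_alpha(range = c(0, 0.3), guide = "none")
```

The breaks are placed on the original density scale and only the labels are converted, so the colors themselves do not change. Breaks that fall outside the range of the computed levels are simply not drawn in the legend.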
I am trying to do species distribution mapping in R for invasive oyster species (marine).
With the code that I am using, it will plot points only on land (ext= geographic.extent). I cannot find other "extent" options for example marine.extent or a way to do -geographic.extent so that it would be everything BUT the currently plotted area.
# Randomly sample points (same number as our observed points)
background <- randomPoints(mask = mask, # Provides resolution of sampling points
n = nrow(obs.data), # Number of random points
ext = geographic.extent, # Spatially restricts sampling
extf = 1.25) # Expands sampling a little bit
# Plot the base map
plot(wrld_simpl,
xlim = c(0, 30), #north and baltic sea
ylim = c(50, 70),
axes = TRUE,
col = "grey95",
main = "Presence and pseudo-absence points")
# Add the background points
points(background, col = "grey30", pch = 1, cex = 0.75)
geographic.extent is a variable that you supply. It is probably created in your own code, or else by a package you load. Can you edit your question and show what it is (print it)?
To only sample points from certain areas, use the mask argument (as you do). In your case all land areas should be NA and all marine areas should not be NA.
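As a rough illustration of the mask idea, the sketch below builds a toy raster in which the eastern half plays the role of land, inverts it so that only sea cells are non-NA, and samples background points from that. It assumes the raster and dismo packages are installed; the toy land layer and the object names land and marine are hypothetical and would be replaced by a real land raster covering the study area.

```r
library(raster)
library(dismo)

# Toy grid over the North Sea / Baltic window used in the question
r <- raster(xmn = 0, xmx = 30, ymn = 50, ymx = 70, resolution = 0.5)

# Hypothetical land layer: cells east of 15 degrees are "land" (1), the rest NA
land <- r
values(land) <- ifelse(xFromCell(r, 1:ncell(r)) > 15, 1, NA)

# Invert it: marine cells get a value, land cells become NA
marine <- land
values(marine) <- ifelse(is.na(values(land)), 1, NA)

# randomPoints() only samples from cells of the mask that are not NA
set.seed(1)
background <- randomPoints(mask = marine, n = 200)
```

With a real land raster the inversion step is the same; only the construction of land changes.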
I am trying to recreate an image found in a textbook in R, the original of which was built in MATLAB:
I have generated each of the graphs separately, but what would be the best practice for combining them into an image like this in ggplot2?
Edit: Provided code used. This is just a transformation of normally distributed data.
library(ggplot2)
mean <- 6
sd <- 1
X <- rnorm(100000, mean = mean, sd = sd)
Y <- dnorm(X, mean = mean, sd = sd)
Y_p <- pnorm(X, mean = mean, sd = sd)
# Logistic transform, vectorized over X
ch_vars <- function(X) {
  1 / (1 + exp(-X + 5))
}
nu_X <- ch_vars(X)
nu_Y <- ch_vars(Y)
data <- data.frame(X = X, Y = Y, Y_p = Y_p, nu_X = nu_X, nu_Y = nu_Y)
# Cumulative distribution
ggplot(data = data) +
geom_line(aes(x = X, y = Y_p))
# Distribution of initial data
ggplot(data = data, aes(x = X)) +
geom_histogram(aes(y = ..density..), bins = 25, fill = "red", color = "black")
# Distribution of transformed data
ggplot(data = data, aes(x = nu_X)) +
geom_histogram(aes(y = ..density..), bins = 25, fill = "green", color = "black")
In short, you can't, or rather, you shouldn't.
ggplot is a high-level plotting package. More than a system for drawing shapes and lines, it's fairly "opinionated" about how data should be represented, and one of its opinions is that a plot should express a clear relationship between its axes and marks (points, bars, lines, etc.). The axes essentially define a coordinate space, and the marks are then plotted onto the space in a straightforward and easily interpretable manner.
The plot you show breaks that relationship -- it's a set of essentially arbitrary histograms all drawn onto the same box, where the axis values become ambiguous. The x-axis represents the values of 1 histogram and the y-axis represents another (and thus neither axis represents the histograms' heights).
It is of course technically possible to force ggplot to render something like your example, but it would require pre-computing the histograms, normalizing their values and bin heights to a common coordinate space, converting these into suitable coordinates for use with geom_rect, and then re-labeling the plot axes. It would be a very large amount of manual effort and ultimately defeats the point of using a high-level plotting grammar like ggplot.
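To give a sense of the manual work involved, here is a minimal sketch of that approach, not a recommendation. It assumes the same logistic transform as in the question, a smaller sample, and arbitrary choices that are not from the original: each histogram is scaled to occupy at most 30% of the panel, and the names rect_x, rect_y and curve_df are hypothetical.

```r
library(ggplot2)

set.seed(42)
X <- rnorm(10000, mean = 6, sd = 1)
nu_X <- 1 / (1 + exp(-X + 5))  # transformed data

# Pre-compute both histograms without plotting them
hx <- hist(X, breaks = 25, plot = FALSE)
hy <- hist(nu_X, breaks = 25, plot = FALSE)

# Histogram of X along the bottom: bin edges on x, heights rescaled into 0..0.3
rect_x <- data.frame(xmin = head(hx$breaks, -1), xmax = tail(hx$breaks, -1),
                     ymin = 0, ymax = 0.3 * hx$density / max(hx$density))

# Histogram of nu_X along the left edge: bin edges on y, heights extend along x
x0 <- min(hx$breaks)
width <- 0.3 * diff(range(hx$breaks))
rect_y <- data.frame(ymin = head(hy$breaks, -1), ymax = tail(hy$breaks, -1),
                     xmin = x0, xmax = x0 + width * hy$density / max(hy$density))

# The transformation curve itself is the only mark that truly uses both axes
curve_df <- data.frame(x = seq(min(hx$breaks), max(hx$breaks), length.out = 200))
curve_df$y <- 1 / (1 + exp(-curve_df$x + 5))

p <- ggplot() +
  geom_rect(data = rect_x, fill = "red", colour = "black",
            aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax)) +
  geom_rect(data = rect_y, fill = "green", colour = "black",
            aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax)) +
  geom_line(data = curve_df, aes(x = x, y = y)) +
  labs(x = "x", y = "transformed x")
```

Note that the bar heights no longer correspond to either axis, which is exactly the ambiguity described above.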
I'd like to use something like ggplot2 and ggmap to produce a heat map of arbitrary values such as property prices per metre squared over a geographic area at a street level (with a high resolution).
Unfortunately, the task appears to be rather difficult because while ggplot2 can produce a great density plot, it seems unable to visualise spatial data like this without prior interpolation.
For this, I've used libraries akima (gridded bivariate interpolation for irregular data) and mgcv (generalised additive models with integrated smoothness estimation), however my knowledge of interpolation methods is mediocre at best and the results I've been able to produce aren't satisfactory enough.
Consider the following example:
Data
library(ggplot2)
library(ggmap)
library(tibble)  # provides tibble()
## data simulation
set.seed(1945)
df <- tibble(x = rnorm(500, -0.7406, 0.03),
y = rnorm(500, 51.9976, 0.03),
z = abs(rnorm(500, 2000, 1000)))
Map, scatterplot, density plot
## ggmap
map <- get_map("Bletchley Park, Bletchley, Milton Keynes", zoom = 13, source = "stamen", maptype = "toner-background")
q <- ggmap(map, extent = "device", darken = .5)
## scatterplot over map
q + geom_point(aes(x, y, colour = z), data = df)
## classic density heat map
q +
stat_density2d(aes(x=x, y=y, fill=..level..), data=df, geom="polygon", alpha = .2) +
geom_density_2d(aes(x=x, y=y), data=df, colour = "white", alpha = .4) +
scale_fill_distiller(palette = "Spectral")
As you can see, the data are rather dense over the chosen area and the density heat map looks great with round edges and closed curves (except for some of the outermost layers).
Interpolation and plotting using akima
## akima interpolation
library(akima)
df_akima <-interp2xyz(interp(x=df$x, y=df$y, z=df$z, duplicate="mean", linear = T,
xo=seq(min(df$x), max(df$x), length=200),
yo=seq(min(df$y), max(df$y), length=200)), data.frame=TRUE)
## akima plot
q +
geom_tile(aes(x = x, y = y, fill = z), data = df_akima, alpha = .4) +
stat_contour(aes(x = x, y = y, z = z, fill = ..level..), data = df_akima, geom = 'polygon', alpha = .4) +
geom_contour(aes(x = x, y = y, z = z), data = df_akima, colour = 'white', alpha = .4) +
scale_fill_distiller(palette = "Spectral", na.value = NA)
This produces a dense grid of interpolated values (to ensure a sufficient resolution) and while the tile plot underneath is acceptable, the contour plots are too ragged and many of the curves aren't closed.
Non-linear interpolation using linear = F is smoother, but apparently sacrifices resolution and goes wild with the numbers (negative values of z).
Interpolation and plotting using mgcv
## mgcv interpolation
library(mgcv)
gam <- gam(z ~ s(x, y, bs = 'sos'), data = df)
df_mgcv <- data.frame(expand.grid(x = seq(min(df$x), max(df$x), length=200),
y = seq(min(df$y), max(df$y), length=200)))
resp <- predict(gam, df_mgcv, type = "response")
df_mgcv$z <- resp
## mgcv plot
q +
geom_tile(aes(x = x, y = y, fill = z), data = df_mgcv, alpha = .4) +
stat_contour(aes(x = x, y = y, z = z, fill = ..level..), data = df_mgcv, geom = 'polygon', alpha = .4) +
geom_contour(aes(x = x, y = y, z = z), data = df_mgcv, colour = 'white', alpha = .4) +
scale_fill_distiller(palette = "Spectral", na.value = NA)
The same process using mgcv results in a nice and smooth plot, but the resolution is much lower and practically none of the curves are closed.
Questions
Could you please suggest a better method or modify my attempt to obtain a plot similar to the first one (clean, connected, and smooth lines with high resolution)?
Is it possible to close the curves, e.g. in the last plot (the shaded area should be computed beyond the image boundaries)?
Thank you for your time!
The problem with your maps is not the interpolation method you're using, but the way ggplot displays density lines. Here's an answer to this: Remove gaps in a stat_density2d ggplot chart without modifying XY limits.
The density lines go beyond the map, so any polygon that goes outside the plot area is rendered inappropriately (ggplot will close the polygon using the next point of the corresponding level). This does not show up much on your first map because the interpolation resolution is low.
The trick proposed by Andrew is to first expand the plot area, so that the density lines are rendered correctly, then cut off the display area to hide the extra space. Since I tested his solution with your first example, here's the code:
q +
stat_density2d(
aes(x = x, y = y, fill = ..level..),
data = df,
geom = "polygon",
alpha = .2,
color = "white",
bins = 20
) +
scale_fill_distiller(
palette = "Spectral"
) +
xlim(
min(df$x) - 10^-5,
max(df$x) + 10^-5
) +
ylim(
min(df$y) - 10^-3,
max(df$y) + 10^-3
) +
coord_equal(
expand = FALSE,
xlim = c(-.778, -.688),
ylim = c(51.965, 52.03)
)
The only differences are that I used min() - / max() + instead of fixed numbers, and coord_equal to ensure the map wasn't distorted. In addition, I manually specified a greater number of levels (using bins), since with a larger plot area stat_density2d automatically chooses a lower resolution.
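The same expand-then-crop idea can be tried without a base map, which makes it easier to experiment with the padding. This is a minimal sketch assuming simulated data like the question's; the padding value pad is a hypothetical choice, and coord_cartesian is used here instead of coord_equal only to keep the example simple.

```r
library(ggplot2)

set.seed(1945)
df <- data.frame(x = rnorm(500, -0.7406, 0.03),
                 y = rnorm(500, 51.9976, 0.03))

pad <- 0.05  # hypothetical padding, in degrees, added around the data

p <- ggplot(df, aes(x = x, y = y)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon",
                  alpha = 0.2, colour = "white", bins = 20) +
  scale_fill_distiller(palette = "Spectral") +
  # Scale limits: widen the area over which the density is estimated,
  # so the contour polygons can close before reaching the edge
  xlim(min(df$x) - pad, max(df$x) + pad) +
  ylim(min(df$y) - pad, max(df$y) + pad) +
  # Coordinate limits: crop the view afterwards without dropping any data
  coord_cartesian(xlim = c(-0.778, -0.688), ylim = c(51.965, 52.03),
                  expand = FALSE)
```

The distinction that matters is that xlim() and ylim() change the scale, and therefore what the stat computes over, while the limits given to the coord only zoom the finished drawing.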
As for the best interpolation method, this depends on your objective and the type of data you have. The question is not what is the best method for your map, but what is the best method for your data. This is a very broad issue, out of scope for this space. But here's a good guide: http://www.rspatial.org/analysis/rst/4-interpolation.html
For general ideas on how to make good maps in R using ggplot: http://spatial.ly/r/
Sorry, I can't run your example at the moment to provide details. But try autoKrige() from automap package.
Kriging is a great method for interpolation. Just be sure that your data meets its requirements. Here's a good guide:
https://gisgeography.com/kriging-interpolation-prediction/
I have been trying to create a map of membership locations from postcodes across the UK as a project in learning R. I have achieved nearly the result I wanted, but it's proving very frustrating getting the glitches sorted. This image is my current best effort:
I still want to change:
get rid of the extraneous legend (the "0.16", "0.5" squares), which are coming from the size arg to geom_point. If I remove the size=0.16 arg the guide/legend disappears, but the geom size returns to the default too. This also happens for the "black" guide -- coming from a colour obviously -- but why?
properly clip the stat_density2d polygons, which are exhibiting undesirable behaviour when clipped (see bottom-right plot near the top)
have control over the line-width of the geom_path that includes the county boundaries: it's currently too thick (would like about 1/2 thickness shown) but all I can achieve by including 'size' values is to make the lines stupidly thick - so thick that they obscure the whole map.
The R code uses revgeocode() to find the placename closest to the centre point but I don't know how to include the annotation on the map. I would like to include it in a text-box over the North Sea (top right of UK maps), maybe with a line/arrow to the point itself. A simpler option could just be some text beneath the UK map, below the x-axis ... but I don't know how to do that. geom_rect/geom_text seem fraught in this context.
Finally, I wanted to export the map to a high-res image, but when I do that everything changes again, see:
which shows the high-res (~1700x1800px) image on the left and the Rstudio version (~660x720px) on the right. The proportions of the maps have changed and the geom_text and geom_point for the centre point are now tiny. I would be happy if the gap between the two map rows was always fairly small, too (rather than just small at high res).
Code
The basics: read a list of members' postcodes, join with the mySociety table of postcode<>OSGB locations, convert locations to lat/long with spTransform, calculate binhex and density layers, plot with ggmap.
The code for all this is somewhat lengthy so I have uploaded it as a Gist:
https://gist.github.com/rivimey/ee4ab39a6940c0092c35
but for reference the 'guts' of the mapping code is here:
# Get a stylised base map for the whole-of-uk maps.
map.bbox = c(left = -6.5, bottom = 49.5, right = 2, top = 58)
basemap.uk <- get_stamenmap(bb = map.bbox, zoom=calc_zoom(map.bbox), maptype="watercolor")
# Calculate the density plot - a continuous approximation.
smap.den <- stat_density2d(aes(x = lat, y = lon, fill = ..level.., alpha = ..level..),
data = membs.wgs84.df, geom = "polygon",
breaks=2/(1.5^seq(0,12,by=1)), na.rm = TRUE)
# Create a point on the map representing the centroid, and label it.
cmap.p <- geom_point(aes(x = clat, y = clon), show_guide = FALSE, data = centroid.df, alpha = 1)
cmap.t1 <- geom_text(aes(x = clat, y = clon+0.22, label = "Centre", size=0.16), data = centroid.df)
cmap.t2 <- geom_text(aes(x = clat, y = clon+0.1, label = "Centre", size=0.25), data = centroid.df)
# Create an alternative presentation, as binned hexagons, which is more true to the data.
smap.bin <- geom_hex(aes(x = lat, y = lon),
data = membs.wgs84.df, binwidth = c(0.15, 0.1), alpha = 0.7, na.rm = TRUE)
# Create a path for the county and country boundaries, to help identify map regions.
bounds <- geom_path(aes(x = long, y = lat, group = group, colour = "black"), show_guide = FALSE,
data = boundaries.subset, na.rm = TRUE)
# Create the first two actual maps: a whole-uk binned map, and a whole-uk density map.
map.bin <- ggmap(basemap.uk) + smap.bin + grad + cmap.p + cmap.t1
map.den <- ggmap(basemap.uk) + smap.den + alpha + cmap.p + cmap.t1
# Create a zoomed-in map for the south-east, to show greater detail. I would like to use this
# bbox but google maps don't respect it :(
map.lon.bbox = c(left = -1, bottom = 51, right = 1, top = 52)
# Get a google terrain map for the south-east, bbox roughly (-1.7,1.7, 50.1, 53)
basemap.lon <- get_map(location = c(0,51.8), zoom = 8, maptype="terrain", color = "bw")
# Create a new hexbin with more detail than earlier.
smap.lon.bin <- geom_hex(aes(x = lat, y = lon),
data = membs.wgs84.df, bins=26, alpha = 0.7, na.rm = TRUE)
# Now create the last two maps: binned and density maps for London and the SE.
lonmap.bin <- ggmap(basemap.lon) + bounds + smap.lon.bin + grad + cmap.p + cmap.t2
lonmap.den <- ggmap(basemap.lon) + bounds + smap.den + alpha + cmap.p + cmap.t2
# Arrange the maps in 2x2 grid, and tell the grid code to let the first row be taller than the second.
multiplot(map.bin, lonmap.bin, map.den, lonmap.den, heights = unit( c(10,7), "null"), cols=2 )
I have some longitude position data and I want to show its variation over time for each of the several different individuals of study. I also want to do a marginal tile-like density plot on top of it but I need it to show where this density is on a map so I need a geographical map overlaid on top of it.
My data looks something like this:
SO <- data.frame(date = rep(seq(as.Date("2000/1/1"), by = "day", length.out = 365), 3),
julian = rep(seq(1,365,1),3),
ind = c(rep(1,365), rep(2,365), rep(3,365)),
longitude = c(rnorm(365, mean = 90, sd =5), rnorm(365, mean = 85, sd =2), rnorm(365, mean = 92, sd =3)))
So far I have managed to plot the longitudes fitted with smoothed lines and flip coordinates to leave longitude in the x-axis. My code currently looks like this:
ggplot(SO, aes(x = julian, y = longitude)) +
geom_point(aes(color=factor(ind)), size = 0.1) +
stat_smooth(aes(group= factor(ind)), se = FALSE)+
stat_smooth(aes(color=factor(ind))) +
coord_flip() + scale_x_reverse()
However I have got stuck with plotting the tile density and overlaying a map to it. The result should look like this.
If you can just come up with how to overlay the map to the density plot that would already be of great help. Thank you very much.