Creating a connectivity histogram in R using shapefiles

While working in GeoDa on a data set of the US Census Shapefiles I can quickly create a connectivity histogram shown below:
Assuming that my data is sourced in the following manner:
# Download and read US state shapefiles
tmp_shps <- tempfile(); tmp_dir <- tempdir()
download.file("http://www2.census.gov/geo/tiger/GENZ2014/shp/cb_2014_us_state_20m.zip",
tmp_shps)
unzip(tmp_shps, exdir = tmp_dir)
# Libs
require(rgdal); require(ggplot2)
# Read
us_shps <- readOGR(dsn = tmp_dir, layer = "cb_2014_us_state_20m")
How can I arrive at a similar connectivity histogram in R? Additionally, I would be interested in creating a meaningful histogram derived from a distance matrix created in the following manner:
require(geospacom)
dzs_distmat <- DistanceMatrix(poly = us_shps, id = "GEOID",
unit = 1000, longlat = TRUE, fun = distHaversine)
In practice, I'm interested in achieving the following objectives:
Summarising how often geographies border one another, ideally through a connectivity histogram shown above
Summarising information on distances amongst geographies

I played around with it a bit. This seems to be a start.
For your second point. Can you be more specific? I guess a simple histogram or density plot would summarise just fine? I.e. something like:
dists <- dzs_distmat[lower.tri(dzs_distmat)]
hist(dists, xlab = "Dist",
main = "Histogram of distances",
col = "grey")
abline(v = mean(dists), col = "red", lwd = 2)
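To show why lower.tri() is the right selector here, below is a minimal sketch that uses a toy symmetric matrix in place of dzs_distmat. The points and the names toy_xy, toy_dist and pair_dists are made up for illustration only.

```r
# Toy stand-in for dzs_distmat: pairwise distances between 5 made-up points
set.seed(1)
toy_xy <- matrix(runif(10), ncol = 2,
                 dimnames = list(paste0("r", 1:5), c("x", "y")))
toy_dist <- as.matrix(dist(toy_xy))

# A symmetric n x n matrix holds every pair twice plus a zero diagonal;
# the lower triangle keeps each of the n * (n - 1) / 2 pairs exactly once
pair_dists <- toy_dist[lower.tri(toy_dist)]
length(pair_dists)   # 10 pairs for n = 5

# Numeric summary to accompany the histogram
summary(pair_dists)
```

Taking the full matrix instead would count every distance twice and add a spike of zeros from the diagonal, which distorts both the histogram and the mean.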
Regarding your first point, the following should be a very non-fancy version of the histogram you present. (But it doesn't look like it very much?!) It should be a histogram of the number of touching neighbours following this post.
library("rgeos")
# Get adjacency matrix
adj <- gTouches(us_shps, byid = TRUE)
# Add names
tmp <- as.data.frame(us_shps)$STATEFP
dimnames(adj) <- list(tmp, tmp)
# Check names
stopifnot(all(rownames(adj) == rownames(dzs_distmat))) # Sanity check
hist(rowSums(adj), col = "grey", main = "Number of neighbours",
breaks = seq(-0.5, 8.5, by = 1))
I guess the fancy colours can be added relatively easily.
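For reference, here is a minimal base R sketch of the counting step on its own, assuming nothing more than a logical adjacency matrix. The five regions and their borders are invented; with real data the matrix would come from a touches/adjacency computation like the one above.

```r
# Toy logical adjacency matrix for 5 hypothetical regions (TRUE = shared border)
regions <- c("A", "B", "C", "D", "E")
adj_toy <- matrix(FALSE, 5, 5, dimnames = list(regions, regions))
borders <- rbind(c("A", "B"), c("A", "C"), c("B", "C"), c("C", "D"))
adj_toy[borders] <- TRUE
adj_toy[borders[, 2:1]] <- TRUE   # make the relation symmetric

# Number of neighbours per region; E is an island with none
n_nb <- rowSums(adj_toy)

# Tabulate over every possible count so empty bins are kept
nb_counts <- table(factor(n_nb, levels = 0:max(n_nb)))
barplot(nb_counts, xlab = "Number of neighbours", ylab = "Regions",
        col = "grey")
```

Because neighbour counts are integers, a bar plot of the tabulated counts avoids having to pick histogram breaks by hand and still shows regions with zero neighbours.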

Using spdep, you could identify the spatial neighbors of the regions with the poly2nb function and then plot a histogram of the neighbor counts returned by the card function. For example:
nb_q <- poly2nb(us_shps, queen = TRUE)
hist(card(nb_q), col = "grey", main = "Number of neighbours", breaks = seq(-0.5, 8.5, by = 1))

Related

Adjust plot margins to show figure legend

How do I adjust my plot size to make the heatmap legend visible?
I tried par(oma=c(0,0,1,0)+1, mar=c(0,0,0,0)+1) but it completely truncated my plot.
# Correlation Matrix
dat.cor <- cor(samp.matrix, method="pearson", use="pairwise.complete.obs")
cx <- redgreen(50)
# Correlation plot - heatmap
png("Heatmap_cor.matrix.png")
#par(oma=c(0,0,1,0), mar=c(0,0,0,0))
leg <- seq(min(dat.cor, na.rm=T), max(dat.cor, na.rm=T), length=10)
image(dat.cor, main="Correlation between Glioma vs Non-Tumor\n Gene Expression", col=cx, axes=F)
axis(1,at=seq(0,1,length=ncol(dat.cor)),label=dimnames(dat.cor)[[2]], cex.axis=0.9,las=2)
axis(2,at=seq(0,1,length=ncol(dat.cor)),label=dimnames(dat.cor)[[2]], cex.axis=0.9,las=2)
dev.off()
It would be a lot easier to help you with your problem if you included a minimal reproducible example. Please see https://stackoverflow.com/help/how-to-ask to get tips on improving your questions and improve your chances of getting an answer.
In order to replicate your issue, I downloaded a subset of the GEO dataset and used the mean affy intensities to create an approximation of your heatmap:
# Load libraries
library(tidyverse)
#BiocManager::install("affyio")
library(affyio)
# GSE data downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE4290
list_of_files <- fs::dir_ls("~/Desktop/GSE4290_RAW/")
# Load the CEL files
CEL_list <- list()
for (f in seq_along(list_of_files)) {
CEL_list[[f]] <- read.celfile(list_of_files[[f]],
intensity.means.only = TRUE)
}
# Rename each element of the list with the corresponding sample name
names(CEL_list) <- gsub(x = basename(list_of_files),
pattern = ".CEL.gz",
replacement = "")
# Create a matrix of the mean intensities for all genes
samp.matrix <- map(CEL_list, pluck, "INTENSITY", "MEAN") %>%
bind_cols() %>%
as.matrix()
# Calculate correlations between samples
dat.cor <- cor(samp.matrix, method = "pearson",
use = "pairwise.complete.obs")
# Specify a colour palette (green/red is NOT colourblind friendly)
cx <- colorRampPalette(viridis::inferno(50))(50)
# Plot the heatmap
png("Heatmap_cor.matrix.png")
par(oma=c(0,0,1,0), mar=c(6,6,4,7), xpd = TRUE)
leg <- seq(from = 0.1, to = 1, length.out = 10)
image(dat.cor, main="Correlation between Glioma vs Non-Tumor\n Gene Expression", col=cx, axes=F)
axis(1,at=seq(0,1,length=ncol(dat.cor)),label=dimnames(dat.cor)[[2]], cex.axis=0.9,las=2)
axis(2,at=seq(0,1,length=ncol(dat.cor)),label=dimnames(dat.cor)[[2]], cex.axis=0.9,las=2)
legend(1.1, 1.1, title = "Correlation", legend = leg,
fill = colorRampPalette(viridis::inferno(50))(10))
dev.off()
Does this solve your problem?
Also, one of the great things about R is that people create packages to make these types of tasks easier; one example is the pheatmap package which makes clustering samples and annotating sample groups a lot more straightforward and I've found that the final image can be 'nicer' than creating the plot from scratch. E.g.
library(pheatmap)
pheatmap(mat = dat.cor, color = cx, border_color = "white", legend = TRUE,
main = "Correlation between Glioma vs Non-Tumor\n Gene Expression")
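If you want to stay with base graphics, the margin idea can be isolated in a small sketch like the one below. It uses made-up data in place of samp.matrix, writes to a temporary PDF, and the margin and inset values are guesses that would need tuning for real sample names.

```r
# Made-up data standing in for samp.matrix (6 samples, 100 features)
set.seed(42)
toy <- matrix(rnorm(600), ncol = 6,
              dimnames = list(NULL, paste0("S", 1:6)))
toy_cor <- cor(toy)
pal <- hcl.colors(10, "Inferno")

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 7, height = 6)
# A wide right margin leaves room for the legend; xpd = TRUE allows
# drawing outside the plot region
old_par <- par(mar = c(5, 5, 4, 9), xpd = TRUE)
image(toy_cor, col = pal, axes = FALSE, main = "Toy correlation matrix")
ticks <- seq(0, 1, length.out = ncol(toy_cor))
axis(1, at = ticks, labels = colnames(toy_cor), las = 2)
axis(2, at = ticks, labels = colnames(toy_cor), las = 2)
# inset is a fraction of the plot region; a negative x inset pushes the
# legend into the right margin
brks <- seq(min(toy_cor), max(toy_cor), length.out = 11)
legend("topright", inset = c(-0.35, 0), title = "Correlation",
       legend = sprintf("%.2f", brks[-1]), fill = pal, bty = "n")
par(old_par)
dev.off()
```

The key point is that mar must be large enough on the side where the legend goes. Setting all margins close to zero, as in the question, leaves no room for axes, title or legend.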

Formatting phylogeny to map projection (`phylo.to.plot`, or alternate method) in R

I am hoping someone can help me with formatting the output of phylo.to.map(), or suggest another method that can produce a similar output.
I have followed tutorial(s) here to produce an output but it seems difficult to alter the resulting figures.
Briefly these are my questions. I will expand further below.
How to plot a subregion of a "WorldHires" map, not entire region?
Change the shape of the points on the map, but maintain the colour?
Add gradient of continuous variable to map
Reproducible example:
Here is a very basic tree with some randomly assigned geographic locations
myTree <- ape::read.tree(text='((A, B), ((C, D), (E, F)));')
plot(myTree)
# It needs to be rooted for `phylo.to.map()` to work
myTree$branch.length = NULL
rooted_cladogram = ape::compute.brlen(myTree)
# Sample information
Sample <- c("A","B","C","D","E","F")
coords <- matrix(c(56.001966,57.069417,50.70228, 51.836213, 54.678997, 54.67831,-5.636926,-2.47805,-3.8975018, -2.235444,-3.4392211, -1.751833), nrow=6, ncol=2)
rownames(coords) <- Sample
head(coords)
## Plot phylo.to.map
obj<-phylo.to.map(rooted_cladogram,coords,database="worldHires", regions="UK",plot=FALSE,xlim=c(-11,3), ylim=c(49,59),direction="rightwards")
plot(obj,direction="rightwards",fsize=0.5,cex.points=c(0,1), lwd=c(3,1),ftype="i")
Plot output here:
Question 1: How do I plot a subregion of a "WorldHires" map, not the entire region?
I would like to only have mainland Britain which is a subregion of the "UK" in the WorldHires database. To access it normally I would do:
map1 <- ggplot2::map_data(map = "worldHires", region = c("UK"),xlim=c(-11,3), ylim=c(49,59))
GB <- subset(map1, subregion=="Great Britain")
# Plot
GB_plot<- ggplot(GB )+
geom_polygon(aes(x = long, y = lat, group = group), fill = "white", colour = "black")+
theme_classic()+
theme(axis.line=element_blank(),
axis.text=element_blank(),
axis.ticks=element_blank(),
axis.title=element_blank(),
panel.border = element_blank())
Which looks like this:
I have tried the following, but it ignores the subregion argument:
obj<-phylo.to.map(rooted_cladogram,coords,database="worldHires", regions="UK", subregion="Great Britain",plot=FALSE,xlim=c(-11,3), ylim=c(49,59),direction="rightwards")
Is there a way to provide it directly with a map instead of using WorldHires?
Question 2: How do I change the shape of the points on the map but maintain the colour?
I want to use shapes on the map to indicate the 3 major clades on my tree geographically. However, when I add a pch argument, it correctly changes the shapes but the points then become black instead of keeping the colour they had before. The lines from the tree to the map maintain the colour; it is just the points themselves that turn black.
This is how I have tried to change the shape of the points:
# Original code - points
cols <-setNames(colorRampPalette(RColorBrewer::brewer.pal(n=6, name="Dark2"))(Ntip(myTree)),myTree$tip.label)
obj<-phylo.to.map(rooted_cladogram,coords,database="worldHires", regions="UK",plot=FALSE,xlim=c(-11,3), ylim=c(49,59),direction="rightwards")
plot(obj,direction="rightwards",fsize=0.5,cex.points=c(0,1), colors=cols,lwd=c(3,1),ftype="i")
Point and lines are coloured. I would like to change the shape of points
# Code to change points = but points are no longer coloured
shapes <- c(rep(2,2),rep(1,2),rep(0,2))
obj<-phylo.to.map(rooted_cladogram,coords,database="worldHires", regions="UK",plot=FALSE,xlim=c(-11,3), ylim=c(49,59),direction="rightwards")
plot(obj,direction="rightwards",fsize=0.5,cex.points=c(0,1), colors=cols,pch=shapes,lwd=c(3,1),ftype="i")
Output: The shapes are changed but they are no longer coloured in:
Question 3: How do I add a gradient to the map?
Given this fake dataset, how do I create a smoothed gradient of the value variable?
Any help and advice on this would be very much appreciated.
It would also be useful to know how to change the size of points
Thank you very much in advance,
Eve
I improved (somewhat) on my comments by using the map you made in your question. Here's the code:
library(mapdata)
library(phytools)
library(ggplot2)
myTree <- ape::read.tree(text='((A, B), ((C, D), (E, F)));')
plot(myTree)
# It needs to be rooted for `phylo.to.map()` to work
myTree$branch.length = NULL
rooted_cladogram = ape::compute.brlen(myTree)
# Sample information
Sample <- c("A","B","C","D","E","F")
coords <- matrix(
c(56.001966,
57.069417,
50.70228,
51.836213,
54.678997,
54.67831,
-5.636926,
-2.47805,
-3.8975018,
-2.235444,
-3.4392211,
-1.751833),
nrow=6,
ncol=2)
rownames(coords) <- Sample
head(coords)
obj <- phylo.to.map(
rooted_cladogram,
coords,
database="worldHires",
regions="UK",
plot=FALSE,
xlim=c(-11,3),
ylim=c(49,59),
direction="rightwards")
# Disable default map
obj2 <- obj
obj2$map$x <- obj$map$x[1]
obj2$map$y <- obj$map$y[1]
# Set plot parameters
cols <- setNames(
colorRampPalette(
RColorBrewer::brewer.pal(n=6, name="Dark2"))(Ntip(myTree)),myTree$tip.label)
shapes <- c(rep(2,2),rep(1,2),rep(0,2))
sizes <- c(1, 2, 3, 4, 5, 6)
# Plot phylomap
plot(
obj2,
direction="rightwards",
fsize=0.5,
cex.points=0,
colors=cols,
pch=shapes,
lwd=c(3,1),
ftype="i")
# Plot new map area that only includes GB
uk <- map_data(
map = "worldHires",
region = "UK")
gb <- uk[uk$subregion == "Great Britain",]
points(x = gb$long,
y = gb$lat,
cex = 0.001)
# Plot points on map
points(
x = coords[,2],
y = coords[,1],
pch = shapes,
col = cols,
cex = sizes)
Edit: Use an sf object instead of points to illustrate GB. It is tough to provide more advice beyond this on how to add symbology for your spatially varying variable, but sf is popular and very well documented, e.g. https://r-spatial.github.io/sf/articles/sf5.html. Let me know if you have any other questions!
Edit 2: Added lines to plot name and symbol on tips.
Edit 3: Added gradient dataset to map.
library(phytools)
library(mapdata)
library(ggplot2)
library(sf)
myTree <- ape::read.tree(text='((A, B), ((C, D), (E, F)));')
plot(myTree)
# It needs to be rooted for `phylo.to.map()` to work
myTree$branch.length = NULL
rooted_cladogram = ape::compute.brlen(myTree)
# Sample information
Sample <- c("A","B","C","D","E","F")
coords <- matrix(c(56.001966,57.069417,50.70228, 51.836213, 54.678997, 54.67831,-5.636926,-2.47805,-3.8975018, -2.235444,-3.4392211, -1.751833), nrow=6, ncol=2)
rownames(coords) <- Sample
head(coords)
obj <- phylo.to.map(
rooted_cladogram,
coords,
database="worldHires",
regions="UK",
plot=FALSE,
xlim=c(-11,3),
ylim=c(49,59),
direction="rightwards")
# Disable default map
obj2 <- obj
obj2$map$x <- obj$map$x[1]
obj2$map$y <- obj$map$y[1]
## Plot tree portion of map
# Set plot parameters
cols <- setNames(
colorRampPalette(
RColorBrewer::brewer.pal(n=6, name="Dark2"))(Ntip(myTree)),myTree$tip.label)
shapes <- c(rep(2,2),rep(1,2),rep(0,2))
sizes <- c(1, 2, 3, 4, 5, 6)
# Plot phylomap
plot(
obj2,
direction="rightwards",
fsize=0.5,
cex.points=0,
colors=cols,
pch=shapes,
lwd=c(3,1),
ftype="i")
tiplabels(pch=shapes, col=cols, cex=0.7, offset = 0.2)
tiplabels(text=myTree$tip.label, col=cols, cex=0.7, bg = NA, frame = NA, offset = 0.2)
## Plot GB portion of map
# Plot new map area that only includes GB
uk <- map_data(map = "worldHires", region = "UK")
gb <- uk[uk$subregion == "Great Britain",]
# Convert GB to sf object
gb_sf <- st_as_sf(gb, coords = c("long", "lat"))
# Convert to polygon
gb_poly <- st_sf(
aggregate(
x = gb_sf$geometry,
by = list(gb_sf$region),
FUN = function(x){st_cast(st_combine(x), "POLYGON")}))
# Add polygon to map
plot(gb_poly, col = NA, add = TRUE)
## Load and format gradient data as sf object
# Load data
g <- read.csv("gradient_data.txt", sep = " ", na.strings = c("NA", " "))
# Check for, then remove NAs
table(is.na(g))
g2 <- g[!is.na(g$Lng),]
# For demonstration purposes, make dataset easier to manage
# Delete this sampling line to use the full dataset
g2 <- g2[sample(1:nrow(g2), size = 1000),]
# Create sf point object
gpt <- st_as_sf(g2, coords = c("Lng", "Lat"))
## Set symbology and plot
# Cut data into 5 groups based on "value" (5 groups need 6 break points)
groups <- cut(gpt$value,
breaks = seq(min(gpt$value), max(gpt$value), length.out = 6),
include.lowest = TRUE)
# Set colors
gpt$colors <- colorRampPalette(c("yellow", "red"))(5)[groups]
# Plot
plot(gpt$geometry, pch = 16, col = gpt$colors, add = TRUE)
## Optional legend for gradient data
# Order labels and colors for the legend
lev <- levels(groups)
# Used rev() here to make colors in correct order
fil <- rev(levels(as.factor(gpt$colors)))
legend("topright", legend = lev, fill = fil)
## Plot sample points on GB
# Plot points on map
points(
x = coords[,2],
y = coords[,1],
pch = shapes,
col = cols,
cex = sizes)
see here for more info on gradient symbology and legends: R: Gradient plot on a shapefile
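The binning and colouring step can also be tried on its own, without any spatial packages. This is only a sketch: value below is random data standing in for the "value" column, and n_bins, brks and point_cols are names I made up.

```r
# Made-up continuous variable standing in for the "value" column
set.seed(7)
value <- runif(200, min = 0, max = 50)

n_bins <- 5
# k bins need k + 1 break points
brks <- seq(min(value), max(value), length.out = n_bins + 1)
groups <- cut(value, breaks = brks, include.lowest = TRUE)

# One colour per bin, indexed by the integer bin code
pal <- colorRampPalette(c("yellow", "red"))(n_bins)
point_cols <- pal[as.integer(groups)]

# Taking legend entries from levels() and the palette keeps labels and
# colours in the same order without sorting colour strings
plot(seq_along(value), value, pch = 16, col = point_cols)
legend("topright", legend = levels(groups), fill = pal, cex = 0.7)
```

Building the legend directly from levels(groups) and pal means the label order and the colour order cannot drift apart, which is easier to reason about than reordering the colour strings afterwards.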

Questions regarding the stplanr package in R

I would like your help with the route_local function of the stplanr package (https://cran.r-project.org/web/packages/stplanr/stplanr.pdf), which is on page 89.
The example for that function generates a map showing the path between two points (the code and the generated image are below). I would like to do the same thing: show the path between two points along my own roads. Both the roads and the points are shapefiles. I managed to plot the roads (code below), but I would like to show the route between any two points on these roads. Can someone help me?
I left it at the following site https://github.com/JovaniSouza/JovaniSouza5/blob/master/Example.zip to download the shapefiles.
library(geosphere)
library(sf)
library(stplanr)
roads<-st_read("C:/Users/Jose/Downloads/Example/Roads/Roads.shp")
p <- SpatialLinesNetwork(roads, uselonglat = FALSE, tolerance = 0)
plot(p)
Map generated by code
Example
from <- c(-1.535181, 53.82534)
to <- c(-1.52446, 53.80949)
sln <- SpatialLinesNetwork(route_network_sf)
r <- route_local(sln, from, to)
plot(sln)
plot(r$geometry, add = TRUE, col = "red", lwd = 5)
plot(cents[c(3, 4), ], add = TRUE)
r2 <- route_local(sln = sln, cents_sf[3, ], cents_sf[4, ])
plot(r2$geometry, add = TRUE, col = "blue", lwd = 3)
Try this. To adapt the example to your case you have to convert the coordinate system of the roads to the points shapefile (or the other way around):
library(geosphere)
library(sf)
library(stplanr)
roads <- st_read("Example/Roads/Roads.shp")
points <- st_read("Example/Points/Points.shp")
# Convert roads to coordinate system of points
roads_trf <- st_transform(roads, st_crs(points))
# Convert points to SpatialPointsDataFrame
points_sp <- as(points, "Spatial")
from <- c(-49.95058, -24.77502) # Feature 1
to <- c(-49.91084, -24.75200) # Feature 9
p <- SpatialLinesNetwork(roads_trf, uselonglat = FALSE, tolerance = 0)
r <- route_local(p, from, to)
plot(p)
plot(r$geometry, add = TRUE, col = "red", lwd = 5)
plot(points_sp[c(3, 4), ], add = TRUE)
r2 <- route_local(sln = p, points[3, ], points[4, ])
plot(r2$geometry, add = TRUE, col = "blue", lwd = 3)

Species distribution model to produce pseudo-absence data using the randomPoints() function in the dismo package in R: Error Message

Aim:
My aim is to build a species distribution prediction model using the function randomPoints() in the dismo package, with the ultimate aim of generating pseudo-absence points and plotting them on a map. These points will be converted into a raster file in order to extract metadata (i.e. sea surface salinity, chlorophyll levels etc.) from MODIS files to act as important ecological predictors to determine how they affect the distribution of blue whales.
The idea is to plug both presence and pseudo-absence data with associated metadata values into general linear mixed models (GLMs), which will ultimately make my models more balanced and accurate.
Problem Outline:
I am attempting to follow this species distribution exercise to generate pseudo-absence points (desired output: see image 1) using the randomPoints() function. However, after running my R-code (see below), I am experiencing this R error message (see below). My r-code also produces a map with the GPS points plotted (image 2).
Error Message:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘nlayers’ for signature ‘"standardGeneric"’
I have tried to find the solution. However, I am relatively new to working with maps in R, and I am really confused as to what the problem is here!
My data frame contains 918 rows and I would like to produce the same number of pseudo-absence points as presence points. Unfortunately, I cannot publish my data but I have provided a mini data frame to use as an example.
If anyone can help me, I would be deeply appreciative!
Many thanks in advance!
R-code:
###Open Packages
library("sp")
library("rgdal")
library("raster")
library("maptools")
library("rgdal")
library("dismo")
library("spatialEco")
library("ggplot2")
library("dplyr")
library("maps")
library("ggspatial")
library("GADMTools")
library("maps")
##Mini Dataframe
Blue.whale_New <- data.frame(longitude = c(80.5, 80.5, 80.5, 80.5, 80.4, 80.4, 80.5, 80.5, 80.4),
latitude = c(5.84, 5.82, 5.85, 5.85, 5.89, 5.82, 5.82, 5.84, 5.83))
####World Bioclim Data + GADM Object
##Creating an array object with just longitude and latitude decimal coordinates
##Upload the maps
##Plotting the map of Sri Lanka
###GADM OBJECT
dev.new()
bioclim1.data <- getData('GADM', country='LKA', level=1)
#####Worldclim raster layers
bioclim.data <- getData(name = "worldclim",
var = "bio",
res = 2.5,
path = "./")
####Get bounding box of Sri Lanka shape file
bb=bioclim1.data@bbox
# Determine geographic extent of our data
max.lat <- ceiling(max(Blue.whale_New$latitude))
min.lat <- floor(min(Blue.whale_New$latitude))
max.lon <- ceiling(max(Blue.whale_New$longitude))
min.lon <- floor(min(Blue.whale_New$longitude))
geographic.extent <- extent(x = c(min.lon, max.lon, min.lat, max.lat))
#####Plot map
dev.new()
plot(bioclim1.data,
xlim = c(min(c(min.lon,bb[1,1])), max(c(max.lon,bb[1,2]))),
ylim = c(min(c(min.lat,bb[2,1])), max(c(max.lat,bb[2,2]))),
axes = TRUE,
col = "grey95")
# Add the points for individual observation
points(x = Blue.whale_New$longitude,
y = Blue.whale_New$latitude,
col = "olivedrab",
pch = 15,
cex = 0.50)
###Building a model and visualising results
##Crop bioclim data to geographic extent of blue whales GADM Map
bioclim.data.blue.whale_1<-crop(x=bioclim1.data, y=geographic.extent)
##Crop bioclim data to geographic extent of blue whales World Clim
bioclim.data.blue.whale_2<-crop(x = bioclim.data, y = geographic.extent)
#Build distribution model using the WorldClim data
bw.model <- bioclim(x = bioclim.data, p = Blue.whale_New)
# Predict presence from model
predict.presence <- dismo::predict(object = bw.model, x = bioclim.data, ext = geographic.extent)
# Plot base map
dev.new()
plot(bioclim1.data,
xlim = c(min(c(min.lon,bb[1,1])), max(c(max.lon,bb[1,2]))),
ylim = c(min(c(min.lat,bb[2,1])), max(c(max.lat,bb[2,2]))),
axes = TRUE,
col = "grey95")
# Add model probabilities
plot(predict.presence, add = TRUE)
# Redraw those country borders
plot(bioclim1.data, add = TRUE, border = "grey5")
# Add original observations
points(Blue.whale_New$longitude,
Blue.whale_New$latitude,
col = "olivedrab", pch = 20, cex = 0.75)
##Pseudo Absence Points
# Use the bioclim data files for sampling resolution
bil.files <- list.files(path = "data/wc2-5",
pattern = "*.bil$",
full.names = TRUE)
# Randomly sample points (same number as our observed points)
##Mask = provides resolution of sampling points
##n = number of random points
##ext = Spatially restricts sampling
##extf = expands sampling a little bit
background <- randomPoints(mask = mask,
n = nrow(Blue.whale_New),
ext = geographic.extent,
extf = 1.25)
# Error in (function (classes, fdef, mtable) :
# unable to find an inherited method for function ‘nlayers’ for signature ‘"standardGeneric"’
Code I plan to run once I have solved my problem
Visualize the pseudo-absence points on a map:
# Plot the base map
dev.new()
plot(bioclim1.data,
xlim = c(min(c(min.lon,bb[1,1])), max(c(max.lon,bb[1,2]))),
ylim = c(min(c(min.lat,bb[2,1])), max(c(max.lat,bb[2,2]))),
axes = TRUE,
col = "grey95")
# Add the background points
points(background, col = "grey30", pch = 1, cex = 0.75)
# Add the observations
points(x = Blue.whale_New$longitude,
y = Blue.whale_New$latitude,
col = "olivedrab",
pch = 20,
cex = 0.75)
box()
# Arbitrarily assign group 1 as the testing data group
testing.group <- 1
# Create vector of group memberships
group.presence <- kfold(x = Blue.whale_New, k = 5) # kfold is in dismo package
Image 1 (Desired Output)
Image 2:
I think your problem is that you are not passing a proper mask (e.g. a raster layer) to the randomPoints function, but rather you are passing the mask function itself, which is a standardGeneric, hence the error message.
You can generate a mask from the predictor raster you want to use and then pass it to the randomPoints function
mask_r <- bioclim.data[[1]] #just one RasterLayer from the WorldClim stack
mask_r[!is.na(mask_r)] <- 1 #set all non-NA values to 1
background <- randomPoints(mask = mask_r, #note the proper mask layer
n = nrow(Blue.whale_New),
ext = geographic.extent,
extf = 1.25)
I haven't checked, but this solution should work.
Best,
Emilio
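The name clash described above can be reproduced in base R without raster or dismo. The sketch below is only an illustration: assert_not_function is a hypothetical helper, not part of any package, and df plays the role that mask plays in the question.

```r
# Hypothetical guard, not part of dismo: fail early with a readable message
# when an argument is a function rather than a data object
assert_not_function <- function(x, arg_name) {
  if (is.function(x)) {
    stop("`", arg_name, "` is a function, not a data object; ",
         "was a variable of that name ever created?", call. = FALSE)
  }
  invisible(x)
}

# `df` is never assigned here, so the name resolves to stats::df(),
# the F-distribution density, just as `mask` resolved to raster::mask()
msg <- tryCatch(assert_not_function(df, "df"),
                error = function(e) conditionMessage(e))
msg

# Once a real object is assigned to the name, the check passes
df <- data.frame(x = 1:3)
assert_not_function(df, "df")
```

A quick class(mask) at the console before calling randomPoints() would show the same thing: if it reports a generic function rather than a raster class, no mask object was ever created.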

Too many legend items making it impossible to read

I have a SpatialPolygonsDataFrame with 213 Ecoregions to plot.
My issue is that I'm not able to organize the legend in a way that I could indeed read the legend. I'm new to r and I've been trying this for 2 days now, I feel really stupid... I wonder if anyone could give me some hint on how to achieve this goal.
#### Download and unzip ecoregions ####
#the reference for this ecoregions data: https://doi.org/10.1093/biosci/bix014
#Don't forget to change the path to a path of your own
dir_eco<-"C:/Users/thai/Desktop/Ecologicos/w2"
download.file("https://storage.googleapis.com/teow2016/Ecoregions2017.zip",
file.path(dir_eco, "Ecoregions2017.zip"))
unzip(file.path(dir_eco, "Ecoregions2017.zip"))
#Read this shapefile
#install.packages("rgdal")
library(rgdal)
ecoreg_shp<- readOGR("Ecoregions2017.shp")
#Crop to a smaller extent
xmin=-120; xmax=-35; ymin=-60; ymin2=-40; ymax=35
limits2 <- c(xmin, xmax, ymin2, ymax) # Just from mexico to Uruguay.
ecoreg_shp<-raster::crop(ecoreg_shp, raster::extent(limits2))
# Color palette - one color for each attribute level
n <- 213
color = grDevices::colors()[grep('gr(a|e)y', grDevices::colors(), invert = T)]
# pie(rep(1,n), col=sample(color, n)) #just to take a look at the colors
col_samp<-sample(color, n)
ecoreg_shp@data$COLOR<-col_samp #put the colors in the polygons data frame
#Plot
png(file="29_ecoreg2.png", width=3000, height=3000, units="px", res=300)
par(mar=c(50,0.3,1.5,0),pty="s")
spplot(ecoreg_shp, zcol = "ECO_NAME", col.regions = ecoreg_shp@data$COLOR,
colorkey = list(space = "bottom", height = 1))
dev.off()
This is what the plot looks like:
I've managed to put the legend to the right of the map, but it still overlaps too much... I've also tried setting colorkey = FALSE and drawing a separate legend...
#Plot the map with no legend
spplot(ecoreg_shp, zcol = "ECO_NAME", col.regions = ecoreg_shp@data$COLOR,
colorkey = FALSE)
#Now, just the legend
legend("bottom",legend=ecoreg_shp@data$ECO_NAME,fill=ecoreg_shp@data$COLOR, ncol=3)
But that doesn't work: I get a message that plot.new has not been called yet.
I've managed to do a lot of things with the legend, but I can't make it good... Like the legend item below the map in 2 or 3 columns in a long figure... Actually doesn't matter the format at all, I just wanted to be able to make a good figure. Can anyone point me in some direction? I'm trying to learn ggplot2, but I don't know r enough yet for using such a difficult package.
Thank you in advance, any tip is much appreciated.
As said in the comments, you will not really be able to distinguish between colors. You should define a classification with multiple levels and choose similar colors for similar ecoregions.
Nevertheless, you can create an image only for this long legend as follows. I used a reproducible example as I do not have your dataset but I use the same names as yours so that you can directly use the script:
library(sp)
library(rgdal)
n <- 213
dsn <- system.file("vectors", package = "rgdal")[1]
ecoreg_shp <- readOGR(dsn = dsn, layer = "cities")
ecoreg_shp <- ecoreg_shp[1:n,]
# Color palette - one color for each attribute level
color <- grDevices::colors()[grep('gr(a|e)y', grDevices::colors(), invert = T)]
col_samp <- sample(color, n)
ecoreg_shp@data$COLOR <- col_samp #put the colors in the polygons data frame
ecoreg_shp@data$ECO_NAME <- ecoreg_shp@data$NAME
# Define a grid to plot the legend
grid.dim <- c(45, 5)
ecoreg_shp@data$ROW <- rep(rev(1:grid.dim[1]), by = grid.dim[2], length.out = n)
ecoreg_shp@data$COL <- rep(1:grid.dim[2], each = grid.dim[1], length.out = n)
# Plot the legend
png(file = "legend.png",
width = 21, height = 29.7,
units = "cm", res = 300)
par(mai = c(0, 0, 0, 0))
plot(ecoreg_shp@data$COL,
ecoreg_shp@data$ROW,
pch = 22, cex = 2,
bg = ecoreg_shp@data$COLOR,
xlim = c(0.8, grid.dim[2] + 1),
xaxs = "i")
text(ecoreg_shp@data$COL,
ecoreg_shp@data$ROW,
ecoreg_shp@data$ECO_NAME,
pos = 4, cex = 0.75)
dev.off()
The result:
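The grid layout used for the legend can be written as a small helper, sketched below. legend_grid is a hypothetical name and is not from any package; it just reproduces the ROW/COL logic above (fill each column from top to bottom) with integer arithmetic.

```r
# Hypothetical helper: place n legend entries in a grid, filling each
# column from top to bottom before moving to the next column
legend_grid <- function(n, n_row, n_col) {
  stopifnot(n <= n_row * n_col)
  idx <- seq_len(n) - 1L
  data.frame(
    col = idx %/% n_row + 1L,     # column index, left to right
    row = n_row - idx %% n_row    # top entry of a column gets the largest y
  )
}

pos <- legend_grid(n = 213, n_row = 45, n_col = 5)
head(pos, 3)
range(pos$col)
```

The stopifnot() at the top catches the case where the grid is too small for the number of entries, which would otherwise make labels overlap silently.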
