geom_map fails with GeoJSON map simplified with gSimplify

I'm constructing world maps with countries color-filled with the (continuous) value depending on a column in a data frame called temp.sp. I want to put several of these maps in a graph. I construct each map using ggplot with geom_map and then construct and display the graphs using multiplot() which uses grid code.
I'm using a GeoJSON map (world <- readOGR(dsn = "ne_50m_admin_0_countries.geojson", layer = "OGRGeoJSON")). The resulting SpatialPolygonsDataFrame is 4.1 Mb and the dataframe that results from worldMap <- broom::tidy(world, region = "iso_a3") has 93391 rows. So when I run multiplot with 4 plot files, it takes a long time.
I thought that I could speed up the printing by simplifying the world map with gSimplify using code like world.simp <- gSimplify(world, tol = .1, topologyPreserve = TRUE). The resulting data frame, worldMap.simp only has 27033 rows but when I use this map I get the error message Error in unit(x, default.units) : 'x' and 'units' must have length > 0.
The error message is generated when I run this code with worldMap.simp. When I use worldMap I have no problems.
gg <- ggplot(temp.sp, aes(map_id = id))
gg <- gg + geom_map(aes(fill = temp.sp$value), map = worldMap.simp, color = "white")
I tried converting temp.sp$value to factor but it made no difference.
To summarize, using a gSimplified map causes the displaying of a graph produced with ggplot and geom_map to fail.
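One possible cause, which is an assumption and not something confirmed in the post, is that the ids in the simplified map no longer match the map_id values in temp.sp (for example, if simplification returns geometry only and the iso_a3 attribute is lost before tidying). A quick check that needs no spatial packages, using small hypothetical stand-ins for temp.sp and worldMap.simp:

```r
# Hypothetical stand-ins: the real temp.sp and worldMap.simp come from the question.
temp.sp <- data.frame(id = c("USA", "CAN", "MEX"), value = c(1.2, 3.4, 5.6))
worldMap.simp <- data.frame(
  long = c(0, 1, 0, 1, 0, 1),
  lat  = c(0, 0, 1, 1, 2, 2),
  id   = c("0", "0", "1", "1", "2", "2")  # numeric-style ids instead of ISO codes
)

# ids used by the data that have no polygon in the map
unmatched <- setdiff(temp.sp$id, unique(worldMap.simp$id))
print(unmatched)
```

If every id comes back as unmatched on the real objects, geom_map has nothing to draw, which would be consistent with the zero-length error.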

Rather than try to figure out what was going wrong with gSimplify, I found and downloaded a lower resolution map from http://geojson.xyz. The one I'm currently using is
https://d2ad6b4ur7yvpq.cloudfront.net/naturalearth-3.3.0/ne_110m_admin_0_countries.geojson
Note that it has a similar filename, but with 110m instead of 50m.

Related

Error in axis(side = side, at = at, labels = labels, ...) : invalid value specified for graphical parameter "pch"

I applied the DBSCAN algorithm to the built-in iris dataset in R, but I get an error when I try to visualise the output using plot().
Following is my code.
library(fpc)
library(dbscan)
data("iris")
head(iris,2)
data1 <- iris[,1:4]
head(data1,2)
set.seed(220)
db <- dbscan(data1,eps = 0.45,minPts = 5)
table(db$cluster,iris$Species)
plot(db,data1,main = 'DBSCAN')
Error: Error in axis(side = side, at = at, labels = labels, ...) :
invalid value specified for graphical parameter "pch"
How to rectify this error?
I have a suggestion below, but first I see two issues:
You're loading two packages, fpc and dbscan, both of which have different functions named dbscan(). This could create tricky bugs later (e.g. if you change the order in which you load the packages, different functions will be run).
It's not clear what you're trying to plot, either what the x- or y-axes should be or the type of plot. The function plot() generally takes a vector of values for the x-axis and another for the y-axis (although not always, consult ?plot), but here you're passing it a data.frame and a dbscan object, and it doesn't know how to handle it.
Here's one way of approaching it, using ggplot() to make a scatterplot, and dplyr for some convenience functions:
# load our packages
# note: only loading dbscan, not loading fpc since we're not using it
library(dbscan)
library(ggplot2)
library(dplyr)
# run dbscan::dbscan() on the first four columns of iris
db <- dbscan::dbscan(iris[,1:4],eps = 0.45,minPts = 5)
# create a new data frame by binding the derived clusters to the original data
# this keeps our input and output in the same dataframe for ease of reference
data2 <- bind_cols(iris, cluster = factor(db$cluster))
# make a table to confirm it gives the same results as the original code
table(data2$cluster, data2$Species)
# using ggplot, make a point plot with "jitter" so each point is visible
# x-axis is species, y-axis is cluster, also coloured according to cluster
ggplot(data2) +
geom_point(mapping = aes(x=Species, y = cluster, colour = cluster),
position = "jitter") +
labs(title = "DBSCAN")
Here's the image it generates:
If you're looking for something else, please be more specific about what the final plot should look like.

Displaying counts instead of "levels" using stat_density2d

My objective is to portray the locations with varying numbers of traffic conflicts in a road intersection. My data consists of all the conflicts that we observed in a given time period at an intersection coded into a .CSV file with the following fields "time of conflict", "TTC" (means Time to Collision), "Lat", "Lon" and "Conflict Type". I figured the best way to do so would be using the 'ggmap+stat_density2d' function in R. I am using the following code:
df = read.csv(filename, header = TRUE)
int.map = get_map(location = c(mean.long, mean.lat), zoom = 20, maptype = "satellite")
int.map = ggmap(int.map, extent = "device", legend = "right")
int.map +
  stat_density2d(data = new_xdf, aes(x, y, fill = ..level.., alpha = ..level..),
                 geom = "polygon") +
  scale_fill_gradientn(guide = "colourbar", colours = rev(brewer.pal(7, "Spectral")),
                       name = "Conflict Density")
The output is a very nice map (Safety Heat Map) that correctly portrays the conflict hotspots. My problem is that the legend shows the "level" values calculated automatically by the stat_density2d() function. I tried searching for a way to display, say, the count of conflict points inside each level on the legend bar, but to no avail.
I did find the link below, which handles a similar question, but the problem is that it creates a new data frame (new_xdf) with many more points than the original data. The counts determined in that program therefore seem to be of no use to me, as I want the exact number of conflict points in my original data to be displayed in the legend.
How to find points within contours in R?
Thanks in advance.
Edit: Link to a sample data file
https://docs.google.com/spreadsheets/d/11vc3lOhzQ-tgEiAXe-MNw2v3fsAqnadweVrvBdNyNuo/edit?usp=sharing
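A minimal sketch of one way to count the original points per density band, without generating extra points. It assumes MASS::kde2d as the density estimator and breaks chosen with pretty(); the data frame pts is hypothetical, and the bands will not necessarily coincide with the levels stat_density2d draws.

```r
library(MASS)  # kde2d()

set.seed(1)
# Hypothetical conflict locations standing in for the Lon/Lat columns
pts <- data.frame(x = rnorm(200), y = rnorm(200))

# 2D kernel density estimate on a 100 x 100 grid spanning the data
dens <- kde2d(pts$x, pts$y, n = 100)

# Look up the density of the grid cell that contains each original point
ix <- findInterval(pts$x, dens$x, all.inside = TRUE)
iy <- findInterval(pts$y, dens$y, all.inside = TRUE)
pts$density <- dens$z[cbind(ix, iy)]

# Cut the densities into bands (assumed breaks) and count original points per band
breaks <- pretty(range(dens$z), n = 7)
pts$level <- cut(pts$density, breaks = breaks, include.lowest = TRUE)
counts <- table(pts$level)
print(counts)
```

The counts could then be pasted into the legend labels, for example through the labels argument of the fill scale.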

Trying to plot in tmap shapefile with attribute

I am trying to work with municipality data in Norway, and I'm totally new to QGIS, shapefiles and plotting this in R. I download the municipalities from here:
Administrative enheter kommuner / Administrative units municipalities
Reproducible files are here:
Joanna's github
I have downloaded QGIS, so I can open the GeoJSON file there and convert it to a shapefile. I am able to do this, and read the data into R:
library(sf)
test=st_read("C:/municipality_shape.shp")
head(test)
I have assigned each municipality a value/rank of my own, which I call faktor, and stored this classification in a data frame called df_new. I want to merge this classification onto the test object above and plot the map coloured by the classification attribute:
test33=merge(test, df_new[,c("Kommunekode_str","faktor")],
by=c("Kommunekode_str"), all.x=TRUE)
This works, but when I try to plot it with tmap,
library(tmap)
tmap_mode("view")
tm_shape(test33) +
tm_fill(col="faktor", alpha=0.6, n=20, palette=c("wheat3","red3")) +
tm_borders(col="#000000", lwd=0.2)
it throws this error:
Error in object[-omit, , drop = FALSE] : incorrect number of dimensions
If I just use base plot,
plot(test33)
I get the picture:
You see I get three plots. Does this have something to do with my error above?
I think the main issue here is that the shapes you are trying to plot are too complex so tmap is struggling to load all of this data. ggplot also fails to load the polygons.
You probably don't need so much accuracy in your polygons if you are making a choropleth map so I would suggest first simplifying your polygons. In my experience the best way to do this is using the package rmapshaper:
# keep = 0.02 will keep just 2% of the points in your polygons.
test_33_simple <- rmapshaper::ms_simplify(test33, keep = 0.02)
I can now use your code to produce the following:
tmap_mode("view")
tm_shape(test_33_simple) +
tm_fill(col="faktor", alpha=0.6, n=20, palette=c("wheat3","red3")) +
tm_borders(col="#000000", lwd=0.2)
This produces an interactive map and the colour scheme is not ideal to tell differences between municipalities.
static version
Since you say in the comments that you are not sure if you want an interactive map or a static one, I will give an example with a static map and some example colour schemes.
The code below uses the classInt package to set up breaks for your map. A popular break scheme is 'fisher', which uses the Fisher-Jenks algorithm. Make sure you research the various options and pick one that suits your scenario:
library(ggplot2)
library(dplyr)
library(sf)
library(classInt)
breaks <- classIntervals(test_33_simple$faktor, n = 6, style = 'fisher')
#label breaks
lab_vec <- vector(length = length(breaks$brks)-1)
rounded_breaks <- round(breaks$brks,2)
lab_vec[1] <- paste0('[', rounded_breaks[1],' - ', rounded_breaks[2],']')
for(i in 2:(length(breaks$brks) - 1)){
lab_vec[i] <- paste0('(',rounded_breaks[i], ' - ', rounded_breaks[i+1], ']')
}
test_33_simple <- test_33_simple %>%
mutate(faktor_class = factor(cut(faktor, breaks$brks, include.lowest = T), labels = lab_vec))
# map
ggplot(test_33_simple) +
geom_sf(aes(fill = faktor_class), size= 0.2) +
scale_fill_viridis_d() +
theme_minimal()
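The label-building loop above can also be written without a loop. A minimal base R sketch, assuming quantile breaks in place of the classInt 'fisher' breaks and a hypothetical numeric vector faktor:

```r
set.seed(42)
faktor <- runif(50, 0, 10)  # hypothetical values standing in for test_33_simple$faktor

# Assumed break scheme: 6 quantile classes (the answer uses classIntervals instead)
brks <- unname(quantile(faktor, probs = seq(0, 1, length.out = 7)))
rounded_breaks <- round(brks, 2)
n_class <- length(brks) - 1

# First class is closed on the left, the rest are half-open
lab_vec <- paste0(
  c("[", rep("(", n_class - 1)),
  rounded_breaks[1:n_class], " - ", rounded_breaks[2:(n_class + 1)], "]"
)

# cut() accepts the labels directly
faktor_class <- cut(faktor, breaks = brks, include.lowest = TRUE, labels = lab_vec)
print(table(faktor_class))
```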

How to convert point data collected at grid interval to a georeferenced dataset in r?

I have this dataset: https://www.dropbox.com/s/k06n9l05t25r6x2/newdata.csv?dl=0
(Sample)
"","row","col","flagrv"
"1",2361,530,2
"2",2378,531,2
"3",2360,531,2
"4",2355,531,2
"5",2363,532,2
"6",2359,532,2
"7",2368,533,2
"8",2367,533,2
"10",2359,533,2
And if I plot using this code:
gs.pal <- colorRampPalette(c("blue", "green","yellow","orange","red"),bias=1,space="rgb")
ggplot(data=ndata,aes(x=col,y=row,color=flagrv)) +
geom_point(size = 0.01)+
scale_colour_gradientn(name = "Scale",colours = gs.pal(5))+
xlab('Longitude')+
ylab('Latitude')+
theme_bw()+
theme(line = element_blank())+
theme(legend.position = c(.93,.20),panel.grid.major = element_line(colour = "#854440"))
ggsave("test.png",width=10, height=8,dpi=300)
We get this figure:
Now, the problem is that I don't have Lat-Long values. I want to overlay the state boundaries but can't use the maps package. Someone suggested I use gdal, but I don't know how. Could you please tell me how I can map this into the Lat-Long domain so that I can easily manipulate it?
Edit:
I learnt from someone else that I can use this:
gdal_translate -a_srs EPSG:4269 FILE.asc FILE.tif
Errors from answer 1:
Error: unexpected ']' in "spdf = SpatialPointsDataFrame(coords, all_data[, c("flagrv"]"
Then I changed the code to:
spdf = SpatialPointsDataFrame(coords, all_data[, c("flagrv")])
But now I have this error:
Error in validObject(.Object) : invalid class “SpatialPointsDataFrame” object: invalid object for slot "data" in class "SpatialPointsDataFrame": got class "integer", should be or extend class "data.frame"
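The second error comes from how data frames are subset: selecting a single column with [ , "name"] drops the result to a plain vector, while the data slot needs a data.frame. A small base R illustration, using a hypothetical two-row stand-in for all_data:

```r
# Hypothetical stand-in for all_data
all_data <- data.frame(row = c(2361, 2378), col = c(530, 531), flagrv = c(2L, 2L))

as_vector <- all_data[, c("flagrv")]               # single column is dropped to a vector
as_frame  <- all_data[, "flagrv", drop = FALSE]    # drop = FALSE keeps a data.frame

print(class(as_vector))
print(class(as_frame))
```

Passing the drop = FALSE form (or the whole data frame) as the data argument avoids the class error.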
Without knowing at least the projection and datum of the dataset (but hopefully more info such as resolution and extent), there is no easy way to do this. If this is a derived map, try to find what was used to generate it.
With this information you can then use the projection function in the raster package to define the projection of the dataset.
EDIT (based on additional info provided, there is a working solution):
Here is a working solution, given that the lower left corner of the dataset is at latitude 24.55, longitude -130, the spacing between rows/columns is 0.01 degrees, and the projection is NAD83. Note that the metadata provided was wrong: the minimum latitude was not 20 degrees, but it could be estimated from the southernmost point (Key West) as 24.55.
#load dataset
all_data=(read.csv('new_data.csv',header=T, stringsAsFactors=F))
res=0.01 #spacing of row and col coords pre-specified
#origin_col_row=c(0, 0)
origin_lat_lon=c(24.55, -130)
all_data$row=(all_data$row)*res+origin_lat_lon[1]
all_data$col=(all_data$col)*res+origin_lat_lon[2]
#now that we have real lat/lon, we can just create a spatial dataframe
library(rgdal)
library(sp)
coords = cbind(all_data$col, all_data$row)
spdf = SpatialPointsDataFrame(coords, data=all_data) #sp = SpatialPoints(coords)
proj4string(spdf) <- CRS("+init=epsg:4269")
R seems to choke trying to plot that many points, so to check whether the answer made sense, I saved the dataset as a shapefile and plotted it in ArcGIS:
writeOGR(spdf,"D:/tmp_shapefile4.shp", "flagrv", driver="ESRI Shapefile")
I managed to plot it using ggplot2 with the code below, just be patient as it takes a while to plot it:
df=as.data.frame(spdf)
library(ggplot2)
ggplot(data=df,aes(x=col,y=row,color=flagrv))+
geom_point(size = 0.01)+
xlab('Longitude')+
ylab('Latitude')
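The row/col conversion used above can be checked on a few rows of the sample data without any spatial packages. This is a sketch only; to_lonlat is a hypothetical helper name, and the origin and spacing are the values assumed in the answer:

```r
# Hypothetical helper wrapping the arithmetic from the answer:
# lat = row * res + origin_lat, lon = col * res + origin_lon
to_lonlat <- function(row, col, res = 0.01, origin_lat = 24.55, origin_lon = -130) {
  data.frame(lon = origin_lon + col * res,
             lat = origin_lat + row * res)
}

# First three rows of the sample shown in the question
sample_pts <- to_lonlat(row = c(2361, 2378, 2360), col = c(530, 531, 531))
print(sample_pts)
```

For the first sample row (row 2361, col 530) this gives a latitude of 48.16 and a longitude of -124.7.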

ggmap with geom_map superimposed

library(sp)
library(spdep)
library(ggplot2)
library(ggmap)
library(rgdal)
Get and fiddle with data:
nc.sids <- readShapePoly(system.file("etc/shapes/sids.shp", package="spdep")[1],ID="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66"))
nc.sids=spTransform(nc.sids,CRS("+init=epsg:4326"))
Get background map from stamen.com, plot, looks nice:
ncmap = get_map(location=as.vector(bbox(nc.sids)),source="stamen",maptype="toner",zoom=7)
ggmap(ncmap)
Create a data frame with long,lat,Z, and plot over the map and a blank plot:
ncP = data.frame(coordinates(nc.sids),runif(nrow(nc.sids)))
colnames(ncP)=c("long","lat","Z")
ggmap(ncmap)+geom_point(aes(x=long,y=lat,col=Z),data=ncP)
ggplot()+geom_point(aes(x=long,y=lat,col=Z),data=ncP)
give it some unique ids called 'id' and fortify (with vitamins and iron?)
nc.sids@data[,1]=1:nrow(nc.sids)
names(nc.sids)[1]="id"
ncFort = fortify(nc.sids)
Now, my map and my limits, I want to plot the 74 birth rate:
myMap = geom_map(aes(fill=BIR74,map_id=id),map=ncFort,data=nc.sids@data)
Limits = expand_limits(x=ncFort$long,y=ncFort$lat)
and on a blank plot I can:
ggplot() + myMap + Limits
but on a ggmap I can't:
ggmap(ncmap) + myMap + Limits
# Error in eval(expr, envir, enclos) : object 'lon' not found
Some versions:
> packageDescription("ggplot2")$Version
[1] "0.9.0"
> packageDescription("ggmap")$Version
[1] "2.0"
I can add geom_polygon to ggplot or ggmap and it works as expected. So something is up with geom_map....
The error message is, I think, the result of an inheritance issue. Typically, it comes about when different data frames are used in subsequent layers.
In ggplot2, every layer inherits default aes mappings set globally in the initial call to ggplot. For instance, ggplot(data = data, aes(x = x, y = y)) sets x and y mappings globally so that all subsequent layers expect to see x and y in whatever data frame has been assigned to them. If x and y are not present, an error message similar to the one you got results. See here for a similar problem and a range of solutions.
In your case, it's not obvious because the first call is to ggmap - you can't see the mappings nor how they are set because ggmap is all nicely wrapped up. Nevertheless, ggmap calls ggplot somewhere, and so default aesthetic mappings must have been set somewhere in the initial call to ggmap. It follows then that ggmap followed by geom_map without taking account of inheritance issues results in the error.
So, Kohske's advice in the earlier post applies - "you need to nullify the lon aes in geom_map when you use a different dataset". Without knowing too much about what has been set or how they've been set, it's probably simplest to clobber the lot by adding inherit.aes = FALSE to the second layer - the call to geom_map.
Note that you don't get the error message with ggplot() + myMap + Limits because no aesthetics have been set in the ggplot call.
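The inheritance behaviour can be reproduced without ggmap. A minimal sketch with two hypothetical data frames (base_df and other_df), where the global mapping refers to a column that the second layer's data does not have:

```r
library(ggplot2)

# Hypothetical data: only base_df has the grp column used in the global mapping
base_df  <- data.frame(lon = c(0, 1), lat = c(0, 1), grp = c("a", "b"))
other_df <- data.frame(x = c(0.2, 0.8), y = c(0.5, 0.5))

p <- ggplot(base_df, aes(x = lon, y = lat, colour = grp)) + geom_point()

# This layer inherits colour = grp, which other_df lacks
broken <- p + geom_point(aes(x = x, y = y), data = other_df)

# Same layer, but with inheritance switched off
fixed <- p + geom_point(aes(x = x, y = y), data = other_df, inherit.aes = FALSE)

# Errors only surface when the plot is built
broken_fails <- inherits(try(ggplot_build(broken), silent = TRUE), "try-error")
fixed_fails  <- inherits(try(ggplot_build(fixed), silent = TRUE), "try-error")
print(c(broken = broken_fails, fixed = fixed_fails))
```

The only difference between the two layers is inherit.aes, which mirrors the one-argument change made to geom_map in the code that follows.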
In what follows, I'm using R version 2.15.0, ggplot2 version 0.9.1, and ggmap version 2.1. I use your code almost exactly, except for the addition of inherit.aes = FALSE in the call to geom_map. That one small change allows ggmap and geom_map to be superimposed:
library(sp)
library(spdep)
library(ggplot2)
library(ggmap)
library(rgdal)
#Get and fiddle with data:
nc.sids <- readShapePoly(system.file("etc/shapes/sids.shp", package="spdep")[1],ID="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66"))
nc.sids=spTransform(nc.sids,CRS("+init=epsg:4326"))
#Get background map from stamen.com, plot, looks nice:
ncmap = get_map(location=as.vector(bbox(nc.sids)),source="stamen",maptype="toner",zoom=7)
ggmap(ncmap)
#Create a data frame with long,lat,Z, and plot over the map and a blank plot:
ncP = data.frame(coordinates(nc.sids),runif(nrow(nc.sids)))
colnames(ncP)=c("long","lat","Z")
ggmap(ncmap)+geom_point(aes(x=long,y=lat,col=Z),data=ncP)
ggplot()+geom_point(aes(x=long,y=lat,col=Z),data=ncP)
#give it some unique ids called 'id' and fortify (with vitamins and iron?)
nc.sids@data[,1]=1:nrow(nc.sids)
names(nc.sids)[1]="id"
ncFort = fortify(nc.sids)
#Now, my map and my limits, I want to plot the 74 birth rate:
myMap = geom_map(inherit.aes = FALSE, aes(fill=BIR74,map_id=id), map=ncFort,data=nc.sids@data)
Limits = expand_limits(x=ncFort$long,y=ncFort$lat)
# and on a blank plot I can:
ggplot() + myMap + Limits
# and now on a ggmap as well:
ggmap(ncmap) + myMap + Limits
The result from the last line of code is:
