Mapping with ggplot2 geom_polygon goes crazy after merging data

I am trying to make a grid of maps of megaregions in the US. I create a SpatialPolygonsDataFrame from a shapefile, then convert it into a data.frame to use with ggplot2. As soon as I add the data to the frame, the polygon plots break.
The file containing the SpatialPolygonsDataFrame and the data frame is here:
https://drive.google.com/open?id=1kGPZ3CENJbHva0s558vWU24-erbqWUGo
The code is as follows:
library(sp)
library(ggplot2)
load("./data.rda")
prop.test <- proptest.result[which(proptest.result$variable=="Upward N"),]
# transforming the data
# add to the data a new column, "id", holding the row names of the data
shape@data$id <- rownames(shape@data)
# add the test results to our spatial object's attribute table
shape@data <- data.frame(merge(x = shape@data, y = prop.test, by.x='Name', by.y="megaregion"))
# create a data.frame from our spatial object
mega.prop <- fortify(shape)
# merge the "fortified" data with the data from our spatial object
mega.prop.test <- merge(mega.prop, shape@data, by="id")
Plotting the first one (mega.prop) works fine:
ggplot(data = mega.prop, aes(x=long, y=lat, group=group)) +
geom_polygon(fill="blue")
but plotting after adding the analytics data does not:
ggplot(data = mega.prop.test, aes(x=long, y=lat, group=group)) +
geom_polygon(fill="blue")
In the new plot:
The filling of the polygons is messed up. (Is it about the order of the points? If so, how?)
Two of the polygons are missing entirely.
What is the problem?
Thank you very much for your help.

Use geom_map() so you don't have to do the merge/left join at all. It needs a slight tweak of the fortified shapefile: the code below renames its id column to region.
Also, you merged in several rows per megaregion (one per season, for instance), and I'm not sure which ones you want to plot.
Finally, the coastlines probably don't need that level of detail. rgeos::gSimplify() should speed things up considerably, and since you're already distorting areas with the projection, a little extra distortion from simplifying won't change the results.
library(ggplot2)
library(tidyverse)
# fortify(..., region = ) needs rgeos (or maptools) installed
shape_map <- as_tibble(fortify(shape, region="Name"))
# geom_map() looks the regions up by name, so call the id column "region"
shape_map <- rename(shape_map, region = id)
prop.test <- proptest.result[which(proptest.result$variable=="Upward N"),]
ggplot() +
# base layer: outlines of every megaregion
geom_map(data=shape_map, map=shape_map, aes(long, lat, map_id=region)) +
# fill layer: one row per megaregion for a single season, no merge needed
geom_map(
data=filter(prop.test, season=="DJF"),
map=shape_map, aes(fill=prop.mega, map_id=megaregion)
)
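On the "is it about the order of points?" question: yes, and merge() has two more side effects. A toy base R sketch of what merge()'s defaults do to fortified data (the data frames poly and stats are made up for illustration): the result is sorted by the key rather than by drawing order, a region with several attribute rows (one per season, say) gets its points duplicated, and the default inner join drops regions with no match, which could explain polygons going missing.

```r
# Toy "fortified" polygon data: three regions, points stored in drawing order
poly <- data.frame(
  id    = c("b", "b", "b", "a", "a", "a", "c", "c", "c"),
  order = 1:9,
  long  = c(5, 6, 6, 0, 1, 1, 9, 10, 10),
  lat   = c(0, 0, 1, 0, 0, 1, 0, 0, 1)
)
# Toy attribute table: "b" has two rows (two seasons), "c" has none
stats <- data.frame(
  id     = c("a", "b", "b"),
  season = c("DJF", "DJF", "JJA"),
  value  = c(0.2, 0.5, 0.7)
)

# Default merge(): inner join, rows sorted by the key column
m <- merge(poly, stats, by = "id")
table(m$id)            # "b" now has 6 points (each duplicated), "c" is gone
is.unsorted(m$order)   # TRUE: rows are no longer in drawing order

# Safer: one attribute row per region, keep all points, then restore order
one_season <- stats[stats$season == "DJF", ]
m2 <- merge(poly, one_season, by = "id", all.x = TRUE)
m2 <- m2[order(m2$order), ]
nrow(m2)               # 9: every point kept exactly once ("c" gets NA values)
```

The geom_map() approach above avoids all three problems because the attribute data never touches the coordinate rows.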

Related

ggplot, ggsave & coord_map/quickmap: how to save large spatial objects and get the projection right?

I have a largish polyline shapefile (Bavarian rivers, which can be accessed here) which I would like to plot and save via ggplot. This can easily be done via e.g. this code:
library(ggplot2)
library(rgdal)
library(sp)
library(rgeos)
riv <- readOGR(dsn = file.path(getwd(), "rivers_bavaria"), layer = "rivers_bavaria")
riv1 <- subset(riv, WDM=="1310" | WDM=="1320")
riv2 <- subset(riv, WDM=="1330")
# geom_path() joins points in data order; geom_line() would re-sort them by x
p <- ggplot() +
geom_path(data=riv1, aes(x=long, y=lat, group=group), color="dodgerblue", size=1) +
geom_path(data=riv2, aes(x=long, y=lat, group=group), color="dodgerblue")
ggsave(file.path(getwd(), "riv.tiff"), p, device="tiff", units="cm", dpi=300)
This is not exactly efficient, due to the large file size, but it works. However, without further specifying the aspect ratio or projection, the dimensions of the output file are defined by the plot window, which is not desirable for maps. This can be remedied by using coord_quickmap().
p1 <- ggplot() +
geom_path(data=riv1, aes(x=long, y=lat, group=group), color="dodgerblue", size=1) +
geom_path(data=riv2, aes(x=long, y=lat, group=group), color="dodgerblue") +
coord_quickmap()
p1
Unfortunately, the projection is completely off. I have tried coord_map() for a better result, but due to the large file size it takes forever and is therefore not a realistic option. Simplifying the polyline via gLineMerge() produces a much smaller object, but ggplot can't handle it because it is a SpatialLines object. Using fortify() or data.frame() to coerce it into a ggplot-friendly data frame also fails with: Error: ggplot2 doesn't know how to deal with data of class SpatialLines.
I'm therefore desperately looking for a workflow that will allow me to plot and save this kind of spatial data in good quality with ggplot. Any suggestions will be much appreciated!
Here's a quick walkthrough with sf. I recommend the sf vignettes and docs to see more details of any of the functions. I'm first reading the shapefile in as an sf object using sf::st_read, then filtering, mutating, and selecting the same as you would in dplyr to get a smaller version of the shape.
library(tidyverse)
library(sf)
rivers_sf <- st_read("rivers_bavaria/rivers_bavaria.shp") %>%
filter(WDM %in% c("1310", "1320", "1330")) %>%
mutate(name2 = ifelse(WDM == "1330", "river 2", "river 1")) %>%
select(name2, NAM, geometry)
The object is pretty big, and will be very slow to plot, so I simplified it by uniting the geometries by name, then using st_simplify. There's also rmapshaper::ms_simplify, which uses Mapshaper and which I prefer for better control over how much information you keep. Then to show a CRS transformation, I picked a projection from Spatial Reference for Germany.
riv_simple <- rivers_sf %>%
group_by(name2, NAM) %>%
summarise(geometry = st_union(geometry)) %>%
ungroup() %>%
st_simplify(preserveTopology = TRUE, dTolerance = 1e6) %>%
st_transform(31493)
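st_simplify() hands the work to GEOS: with preserveTopology = FALSE it runs the classic Douglas-Peucker algorithm, and with TRUE a topology-preserving variant of it. To give a feel for what dTolerance controls (it is expressed in the units of the layer's coordinates), here is a minimal base R sketch of plain Douglas-Peucker. The function names are hypothetical and this is an illustration, not the GEOS implementation.

```r
# Perpendicular distance from each row of `pts` to the line through a and b
perp_dist <- function(pts, a, b) {
  d <- b - a
  len <- sqrt(sum(d^2))
  if (len == 0) {
    # Degenerate segment: fall back to the distance from a
    return(sqrt((pts[, 1] - a[1])^2 + (pts[, 2] - a[2])^2))
  }
  abs(d[2] * pts[, 1] - d[1] * pts[, 2] + b[1] * a[2] - b[2] * a[1]) / len
}

# Keep the endpoints; if the farthest interior point is more than `tol` away
# from the chord, keep it too and recurse on both halves
douglas_peucker <- function(pts, tol) {
  n <- nrow(pts)
  if (n <= 2) return(pts)
  dists <- perp_dist(pts[2:(n - 1), , drop = FALSE], pts[1, ], pts[n, ])
  i <- which.max(dists) + 1  # index of the farthest point within pts
  if (dists[i - 1] > tol) {
    left  <- douglas_peucker(pts[1:i, , drop = FALSE], tol)
    right <- douglas_peucker(pts[i:n, , drop = FALSE], tol)
    rbind(left[-nrow(left), , drop = FALSE], right)  # don't repeat point i
  } else {
    pts[c(1, n), , drop = FALSE]
  }
}

# A wiggly line with one big spike at x = 5
wiggle <- cbind(x = 0:10, y = c(0, 0.1, 0, 0.1, 0, 5, 0, 0.1, 0, 0.1, 0))
douglas_peucker(wiggle, tol = 0.5)  # small wiggles removed, spike kept
douglas_peucker(wiggle, tol = 10)   # only the two endpoints survive
```

The larger the tolerance relative to the size of the features, the more vertices disappear, so it's worth checking dTolerance against the units of the data before simplifying.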
geom_sf, a function for plotting different types of sf objects, has been part of ggplot2 on CRAN since version 3.0.0; before that release it was only in the dev version on GitHub (devtools::install_github("tidyverse/ggplot2")).
geom_sf has some quirks and works a little differently from other geoms, but it's pretty versatile. It comes with corresponding stat_sf and coord_sf. By default, it plots graticule lines; to turn those off, add coord_sf(ndiscr = F).
ggplot(riv_simple) +
geom_sf(aes(size = name2), color = "dodgerblue", show.legend = "line") +
scale_size_manual(values = c("river 1" = 1, "river 2" = 0.5)) +
theme_minimal() +
coord_sf(ndiscr = FALSE)
Hope that helps you get started!
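On the coord_quickmap() part of the question: it is fast because it doesn't project anything. It only fixes the plot's aspect ratio, which is approximately 1/cos(latitude) at the centre of the map (ggplot2 actually derives it from great-circle distances there). A rough sketch of that approximation, with hypothetical helper names and rough, assumed extents for Bavaria. Note it only makes sense when long/lat are in degrees; if a layer is stored in a projected CRS (metres), that is worth checking first.

```r
# Approximate y/x scale factor at a given latitude (in degrees)
quickmap_ratio <- function(lat_deg) 1 / cos(lat_deg * pi / 180)

# Approximate height/width aspect ratio for a long/lat bounding box
quickmap_aspect <- function(lon_range, lat_range) {
  diff(lat_range) / diff(lon_range) * quickmap_ratio(mean(lat_range))
}

quickmap_ratio(0)                     # 1 at the equator
quickmap_ratio(49)                    # about 1.52 around Bavaria
quickmap_aspect(c(9, 14), c(47, 51))  # about 1.22 for a rough Bavarian bbox
```

Because this is a single linear rescaling, it preserves straight lines and costs almost nothing, whereas coord_map() transforms every vertex.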

Incorrect polygon for New York State in ggplot2 maps

I am trying to plot data on a New York State map, using map_data(). But if you look at the polygon, it shows an extra piece that isn't actually part of New York State. Any ideas how I can filter the map data to remove it?
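library(ggplot2)
library(maps)       # provides the "state" database used by map_data()
library(gridExtra)  # for grid.arrange()
ny <- map_data("state", region="new york")
s1 <- ggplot() + geom_polygon(data=ny, aes(x=long, y=lat))
s2 <- ggplot() + geom_point(data=ny, aes(x=long, y=lat))
grid.arrange(s1, s2, ncol=2)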
Output: geom_point shows the correct boundary, but geom_polygon does not.
The state is actually composed of multiple polygons which are not connected. You just need to tell ggplot which points go with which groups. This is done by mapping your data to the group aesthetic in aes(). See the ggplot2 documentation on grouping, although it would be nicer if it had a map example.
So how do you know which points go with which groups? The data frame returned by map_data() contains a group column. See:
head(ny)
ny$group
To plot the map correctly, use:
ggplot() + geom_polygon(data = ny, aes(x = long, y = lat, group = group))
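To see concretely what goes wrong without group, here is a toy base R sketch. The data frame ny_toy is hypothetical, shaped loosely like map_data() output, and holds two separate pieces. Treating every row as one polygon joins the last point of piece 1 to the first point of piece 2; splitting by group gives one ring per piece, which is what aes(group = group) asks geom_polygon() to draw.

```r
# Toy stand-in for map_data() output: two disconnected pieces of one "state"
ny_toy <- data.frame(
  long  = c(0, 1, 1, 0, 0,   3, 4, 4, 3, 3),
  lat   = c(0, 0, 1, 1, 0,   0, 0, 1, 1, 0),
  group = c(1, 1, 1, 1, 1,   2, 2, 2, 2, 2),
  order = 1:10
)
ny_toy <- ny_toy[order(ny_toy$order), ]

# Consecutive rows that belong to different groups: drawing all rows as one
# polygon turns each of these into a spurious edge between the pieces
n <- nrow(ny_toy)
bridges <- which(ny_toy$group[-n] != ny_toy$group[-1])
bridges  # 5: the edge from the end of piece 1 to the start of piece 2

# One ring per group avoids the bridge
rings <- split(ny_toy[, c("long", "lat")], ny_toy$group)
length(rings)  # 2
```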

R post-merge ggplot/qmap plots zipcode polygons incorrectly (jagged)

I have spent days searching this site and others for a solution, and haven't found it yet. If there is another page with my solution, and I missed it, I apologize.
I found this, but reloading ggplot2 and rgdal (after detaching them) didn't fix it.
I am using demographic data at the ZCTA (ZIP Code Tabulation Area) level to overlay polygons on a Google terrain map. I am able to get the polygons plotted correctly using qmap, but after I merge in the demographic data, the plots are all wrong. I've tried specifying the order and playing with the merge. (Heck, I've tried all sorts of things.) I'd love some help with this.
Screenshots: a working plot before the merge, and the broken plot after it.
Here's my code:
library(rgdal)    # readOGR()
library(foreign)  # read.dta()
library(plyr)     # join()
library(ggmap)    # qmap(); also loads ggplot2
# shapefile from Census
fips34 <-readOGR(".", "zt34_d00", stringsAsFactors = FALSE)
# zip code areas, 1 row per ZCTA with nonmissing Census data
ptInd <-read.dta("ptIndzcta.dta")
keepzips <- fips34
keepzips@data$id <-rownames(keepzips@data) # create idvar to remerge
keepzipsdat <- fortify(keepzips, region="id") # fortify
keepzipsdat <- keepzipsdat[order(keepzipsdat$order),] # clarify order
keepzipsdat <- join(keepzipsdat, keepzips@data, by="id") # remerge for zcta
qmap("new jersey", zoom = 8, maptype="terrain", color="bw") +
geom_polygon(aes(x=long, y=lat, group=group),
data=keepzipsdat) + coord_equal() # this map plots fine
# now merge in data to create choropleth
zip2 <- merge(keepzipsdat, ptInd, by.y="zcta5", by.x="ZCTA", all.x = TRUE)
zip2[order(zip2$order),] # reestablish order, is this necessary?
qmap("new jersey", zoom = 8, maptype="terrain", color="bw") +
geom_polygon(aes(x=long, y=lat, group=group),
data=zip2) + coord_equal() # this looks crazy
ggplot(data=zip2, aes(x=long, y=lat, group=group)) + geom_polygon()
# also crazy
# and this is before assigning a fill variable to the polygons
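One likely culprit: merge() returns its rows sorted by the key (ZCTA here), and the zip2[order(zip2$order),] line only prints a sorted copy; it never assigns it back, so zip2 is still in merge order when it is plotted. The toy sketch below (hypothetical helper in_drawing_order() and toy data zip_toy) shows the difference, plus a quick check that points are in drawing order within each group.

```r
# Hypothetical helper: TRUE if, within every polygon group, points appear in
# increasing `order` (the sequence geom_polygon() will connect them in)
in_drawing_order <- function(df) {
  all(tapply(df$order, df$group, function(o) !is.unsorted(o)))
}

zip_toy <- data.frame(group = c("z.1", "z.1", "z.1", "z.2", "z.2"),
                      order = c(3L, 1L, 2L, 2L, 1L))

zip_toy[order(zip_toy$order), ]             # prints a sorted copy only
in_drawing_order(zip_toy)                   # FALSE: zip_toy is unchanged
zip_toy <- zip_toy[order(zip_toy$order), ]  # assign the result back
in_drawing_order(zip_toy)                   # TRUE
```

Sorting on order alone interleaves rows from different groups, but that is harmless: geom_polygon() splits the data by group and only the order within each group matters.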

Plotting OpenStreetMap with ggmap

I'm trying to get the districts of Warsaw and draw them on a Google map. Using this code, where 2536107 is the OpenStreetMap relation ID of a single Warsaw district, gives me almost what I want, but with a few bugs. There is the general outline, but also lines between points which shouldn't be connected. What am I doing wrong?
library(ggmap)
library(osmar)
map <- get_googlemap('warsaw', zoom =10)
warszawa <- get_osm(relation(2536107), full = TRUE)
warszawa.sp <- as_sp(warszawa, what='lines')
warsawfort <- fortify(warszawa.sp)
mapa_polski <- ggmap(map, extent='device', legend="bottomleft")
warsawfort2 <- geom_polygon(aes(x = long, y = lat),
data = warsawfort, fill="blue", colour="black",
alpha=0.0, size = 0.3)
base <- mapa_polski + warsawfort2
base
Edit: I figured it must somehow be connected with the order in which each point/line is plotted, but I have no idea how to fix this.
There is a way to generate your map without using external packages: don't use osmar...
This link, to the excellent Mapzen website, provides a set of shapefiles of administrative areas in Poland. If you download and unzip it, you will see a shapefile set called warsaw.osm-admin.*. This is a polygon shapefile of all the districts in Poland, conveniently indexed by osm_id(!!). The code below assumes you have downloaded the file and unzipped it into the "directory with your shapefiles".
library(ggmap)
library(ggplot2)
library(rgdal)
setwd(" <directory with your shapefiles> ")
pol <- readOGR(dsn=".",layer="warsaw.osm-admin")
spp <- pol[pol$osm_id==-2536107,]
wgs.84 <- "+proj=longlat +datum=WGS84"
spp <- spTransform(spp,CRS(wgs.84))
map <- get_googlemap('warsaw', zoom =10)
spp.df <- fortify(spp)
ggmap(map, extent='device', legend="bottomleft") +
geom_polygon(data = spp.df, aes(x = long, y=lat, group=group),
fill="blue", alpha=0.2) +
geom_path(data=spp.df, aes(x=long, y=lat, group=group),
color="gray50", size=0.3)
Two nuances: (1) The osm IDs are stored as negative numbers, so you have to use, e.g.,
spp <- pol[pol$osm_id==-2536107,]
to extract the relevant district, and (2) the shapefile is not projected in WGS84 (long/lat). So we have to reproject it using:
spp <- spTransform(spp,CRS(wgs.84))
The reason osmar doesn't work is that the paths are in the wrong order. Your warszawa.sp is a SpatialLinesDataFrame, made up of a set of paths (12 in your case), each of which is made up of a set of line segments. When you use fortify(...) on this, ggplot tries to combine them into a single sequence of points. But since the paths are not stored in order around the boundary, ggplot tries, for example, to connect a path that ends in the northeast to a path that begins in the southwest. This is why you're getting all the extra lines. You can see this by coloring the segments:
xx=coordinates(warszawa.sp)
colors=rainbow(11)
plot(t(bbox(warszawa.sp)))
lapply(1:11,function(i)lines(xx[[i]][[1]],col=colors[i],lwd=2))
The colors are in "rainbow" order (red, orange, yellow, green, etc.). Clearly, the lines are not in that order.
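The ordering problem can be illustrated without osmar or rgeos. Below is a minimal base R sketch of a different technique from gPolygonize(): greedily chaining paths end to start, reversing a path when its end (rather than its start) touches the current endpoint. The helper chain_paths() is hypothetical; it assumes the paths share exactly matching endpoints, form one closed boundary, and contain no duplicates (a duplicated path, like the extra one in warszawa.sp, would get appended a second time).

```r
# Greedily chain a list of paths (each an n x 2 coordinate matrix) into one ring
chain_paths <- function(paths, tol = 1e-9) {
  ring <- paths[[1]]
  remaining <- paths[-1]
  while (length(remaining) > 0) {
    end_pt <- ring[nrow(ring), ]
    found <- FALSE
    for (k in seq_along(remaining)) {
      p <- remaining[[k]]
      if (all(abs(p[1, ] - end_pt) < tol)) {
        # Path starts where the ring ends: append it (minus the shared point)
        ring <- rbind(ring, p[-1, , drop = FALSE])
      } else if (all(abs(p[nrow(p), ] - end_pt) < tol)) {
        # Path ends where the ring ends: reverse it, then append
        rev_p <- p[nrow(p):1, , drop = FALSE]
        ring <- rbind(ring, rev_p[-1, , drop = FALSE])
      } else {
        next
      }
      remaining <- remaining[-k]
      found <- TRUE
      break
    }
    if (!found) stop("no remaining path continues from the current end point")
  }
  ring
}

# Four edges of a unit square, shuffled, with one stored backwards
paths <- list(
  rbind(c(0, 0), c(1, 0)),  # bottom
  rbind(c(0, 1), c(0, 0)),  # left
  rbind(c(1, 1), c(1, 0)),  # right, stored in reverse
  rbind(c(1, 1), c(0, 1))   # top
)
ring <- chain_paths(paths)
ring  # rows (0,0) (1,0) (1,1) (0,1) (0,0): a closed ring in boundary order
```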
EDIT: Response to @ako's comment.
There is a way to "fix" the SpatialLines object, but it's not trivial. The function gPolygonize(...) in the rgeos package takes a list of SpatialLines and converts it to a SpatialPolygons object, which can be used in ggplot with fortify(...). One huge problem (which, frankly, I don't understand) is that the OP's warszawa.sp object has 12 lines, two of which seem to be duplicates; this causes gPolygonize(...) to fail. So if you create a SpatialLines list with just the first 11 paths, you can convert warszawa.sp to a polygon. This is not general, however, as I can't predict how or whether it would work with other SpatialLines objects converted from OSM. Here's the code, which leads to the same map as above.
library(rgeos)
coords <- coordinates(warszawa.sp)
sll <- lapply(coords[1:11],function(x) SpatialLines(list(Lines(list(Line(x[[1]])),ID=1))))
spp <- gPolygonize(sll)
spp.df <- fortify(spp)
ggmap(map, extent='device', legend="bottomleft") +
geom_polygon(data = spp.df, aes(x = long, y=lat, group=group),
fill="blue", alpha=0.2) +
geom_path(data=spp.df, aes(x=long, y=lat, group=group),
color="gray50", size=0.3)
I am not sure this is a general hang-up; I can reproduce your example and see the issue. My first thought was that you hadn't supplied group=id, which is typically used for polygons made of many lines, but you have lines, so that should not be needed.
The only way I could get it to display properly was by converting your lines into a polygon outside R. QGIS's line-to-polygon tool didn't get this "right" (it produced a large donut hole), so I used ArcMap, which produced a full polygon. If this is a one-off, that may work for your workflow. Odds are it is not. In that case, perhaps rgdal can transform lines to polygons, assuming this is indeed a general problem.
Upon reading the polygon shapefile and fortifying that, your code ran without problems.

How to change ggplot legend labels and names with two layers?

I am plotting the longitude and latitude coordinates of two different data frames on a São Paulo map using the ggmap and ggplot2 packages, and I want to manually label each legend layer:
Update: I edited the code below to make it fully reproducible (I was using the geocode function instead of get_map).
Update: I would like to do this without combining the data frames.
library(ggmap)
sp <- get_map('sao paulo', zoom=11, color='bw')
restaurants <- data.frame(lon=c(-46.73147, -46.65389, -46.67610),
lat=c(-23.57462, -23.56360, -23.53748))
suppliers <- data.frame(lon=c(-46.70819,-46.68155, -46.74376),
lat=c(-23.53382, -23.53942, -23.56630))
ggmap(sp) +
geom_point(data=restaurants, aes(x=lon, y=lat), color='blue', size=4) +
geom_point(data=suppliers, aes(x=lon, y=lat), color='red', size=4)
I have looked at several questions and tried different approaches without success. Does anyone know how I can add a legend and label the blue points as restaurants and the red points as suppliers?
Now that your code is reproducible (thanks!):
dat <- rbind(restaurants,suppliers)
dat$grp <- rep(c('Restaurants','Suppliers'),each = 3)
ggmap(sp) +
geom_point(data=dat, aes(x=lon, y=lat,colour = grp),size = 4) +
scale_colour_manual(values = c(Restaurants = 'blue', Suppliers = 'red'))  # named, so each group keeps its colour
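For the "without combining the data frames" update, a common idiom is to map colour to a constant label inside each layer's aes(); the manual scale then turns each label into a colour and builds a single legend. A minimal sketch on a plain ggplot() (so it runs without fetching map tiles); adding the same layers and scale to ggmap(sp) should behave the same way.

```r
library(ggplot2)

restaurants <- data.frame(lon = c(-46.73147, -46.65389, -46.67610),
                          lat = c(-23.57462, -23.56360, -23.53748))
suppliers <- data.frame(lon = c(-46.70819, -46.68155, -46.74376),
                        lat = c(-23.53382, -23.53942, -23.56630))

# Each layer maps colour to a fixed label; the named values vector decides the
# actual colour per label, and name = NULL drops the legend title
p <- ggplot() +
  geom_point(data = restaurants, aes(x = lon, y = lat, colour = "Restaurants"), size = 4) +
  geom_point(data = suppliers, aes(x = lon, y = lat, colour = "Suppliers"), size = 4) +
  scale_colour_manual(name = NULL,
                      values = c(Restaurants = "blue", Suppliers = "red"))
p
```

Naming the values vector also avoids relying on the alphabetical order of the labels, which is what an unnamed vector is matched against.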
