Calculating percent cover from shapefiles - r

I normally do not work with shapefiles, so I am a bit lost here. I have two shapefiles, each with multiple objects. The first is a set of 32 polygons (each one is a plot). The second shapefile has >10,000 objects representing vegetation clusters of different sizes within each plot. I am trying to figure out:
1) How do I calculate the percent cover of total vegetation within each site?
2) What percentage of the vegetation cover in each plot comes from clusters less than 5 square meters in area?
This is what my data looks like in ArcGIS for a single plot.

The following code will do what you want, I think.
NB: This uses the area information stored in the shapefile polygons (as explained below). It does not use the Area column in your vegetation shapefile data section. In most cases, your Area is identical to the area stored in the shapefile, but in some cases your Area is much larger. Since I don't know where your Area data came from, it seemed safer to use the information stored with the shapefile polygons.
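If you want to quantify that discrepancy yourself, a quick comparison along these lines works once veg.data has been built by the code below (a sketch, assuming your attribute column is literally named Area):
# Sketch (assumes a column named "Area"; run after veg.data is built below)
chk <- merge(veg.data,
             data.frame(id=rownames(veg.map@data), Area.attr=veg.map@data$Area),
             by="id")
summary(chk$Area.attr - chk$area)  # large differences flag suspect rows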
library(rgdal)
library(ggplot2)
setwd("<directory containing all your shapefiles>")
plt.map <- readOGR(dsn=".",layer="plots")
veg.map <- readOGR(dsn=".",layer="veg_in_plots")
# associate LocCode with polygon IDs
plt.data <- cbind(id=rownames(plt.map@data), LocCode=plt.map@data$LocCode)
veg.data <- cbind(id=rownames(veg.map@data), LocCode=veg.map@data$LocCode)
# function to extract area from polygon data
get.area <- function(polygon) {
  row <- data.frame(id=polygon@ID, area=polygon@area, stringsAsFactors=F)
  return(row)
}
# area of each plot polygon
plt.areas <- do.call(rbind,lapply(plt.map@polygons, get.area))
plt.data <- merge(plt.data,plt.areas, by="id") # append area column to plt.data
# area of each vegetation polygon
veg.areas <- do.call(rbind,lapply(veg.map@polygons, get.area))
veg.data <- merge(veg.data,veg.areas, by="id") # append area column to veg.data
# total area of vegetation polygons by LocCode
veg.smry <- aggregate(area~LocCode,data=veg.data,sum)
smry <- merge(plt.data,veg.smry,by="LocCode")
smry$coverage <- with(smry,100*area.y/area.x) # coverage percentage
# total area of vegetation objects with A < 5 msq
veg.lt5 <- aggregate(area~LocCode,data=veg.data[veg.data$area<5,],sum)
smry <- merge(smry, veg.lt5, by="LocCode")
# fraction of covered area coming from veg. obj. with A < 5 msq
smry$pct.lt5 <- with(smry, 100*area/area.y)
Produces this:
# head(smry)
#   LocCode id   area.x   area.y coverage      area  pct.lt5
# 1       1  3 1165.916 259.2306 22.23408  60.98971 23.52720
# 2      10 11 1242.770 366.3222 29.47626  88.21827 24.08216
# 3      11 12 1181.366 213.2105 18.04779 129.21612 60.60496
# 4      12 13 1265.352 577.6037 45.64767 236.83946 41.00380
# 5      13 14 1230.662 226.2686 18.38593  48.09509 21.25575
# 6      14 15 1274.538 252.0577 19.77640  46.94874 18.62619
Explanation:
Shapefiles can be imported into R using readOGR(...) in the rgdal package. When importing a polygon shapefile, the result is a "SpatialPolygonsDataFrame" object. These objects basically have two sections: a polygons section, which has the coordinates needed to plot each polygon, and a data section, which has data for each polygon (so, one row per polygon). If the shapefile is imported as, e.g., map,
map <- readOGR(dsn=".",layer="myShapeFile")
then the polygons and data sections can be accessed as map@polygons and map@data. It turns out that the polygon areas are stored in the polygons section. To get the areas, we define a function, get.area(...), that extracts the area and polygon ID from a polygon. Then we call that function for all polygons using lapply(...), and bind all the returned values together row-wise using rbind(...):
plt.areas <- do.call(rbind,lapply(plt.map@polygons, get.area))
veg.areas <- do.call(rbind,lapply(veg.map@polygons, get.area))
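If you want to see this structure for yourself, inspecting the slots is a quick check (my sketch, not part of the original answer):
slotNames(plt.map)  # "data", "polygons", "plotOrder", "bbox", "proj4string"
str(plt.map@polygons[[1]], max.level=2)  # one Polygons object, with ID and area slots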
Now we need to associate vegetation areas with plot polygons. This is done through the column LocCode, which is present in the data section of each shapefile. So we first associate polygon IDs with LocCode for both plots and vegetation areas:
plt.data <- cbind(id=rownames(plt.map@data), LocCode=plt.map@data$LocCode)
veg.data <- cbind(id=rownames(veg.map@data), LocCode=veg.map@data$LocCode)
Then we append the area column based on polygon ID:
plt.data <- merge(plt.data,plt.areas, by="id") # append area column to plt.data
veg.data <- merge(veg.data,veg.areas, by="id") # append area column to veg.data
Then we need to sum the vegetation areas by LocCode:
veg.smry <- aggregate(area~LocCode,data=veg.data,sum)
And finally merge this with the plot polygon areas:
smry <- merge(plt.data,veg.smry,by="LocCode")
In the smry dataframe, area.x is the area of the plot, and area.y is the total area covered by vegetation in that plot. Since, for both shapefiles, the projection is:
+proj=utm +zone=13 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0
the units are in meters and the areas are in msq. To determine how much of the vegetation is coming from areas < 5 msq, we total the vegetation areas with area < 5 and merge that result with smry:
veg.lt5 <- aggregate(area~LocCode,data=veg.data[veg.data$area<5,],sum)
smry <- merge(smry, veg.lt5, by="LocCode")
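As a quick aside (my addition), you can confirm that both layers really carry this metre-based CRS before trusting the area numbers:
proj4string(plt.map)  # should include +units=m
identical(proj4string(plt.map), proj4string(veg.map))  # both layers must match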
Finally, with the data we have it's straightforward to render maps for each plot area:
cols <- c("id","LocCode")
plt.df <- fortify(plt.map)
plt.df <- merge(plt.df,plt.data[cols],by="id")
veg.df <- fortify(veg.map)
veg.df <- merge(veg.df,veg.data[cols],by="id")
ggp <- ggplot(plt.df, aes(x=long, y=lat, group=group))
ggp <- ggp + geom_path()
ggp <- ggp + geom_polygon(data=veg.df, fill="green")
ggp <- ggp + facet_wrap(~LocCode,scales="free")
ggp <- ggp + theme(axis.text=element_blank())
ggp <- ggp + labs(x="",y="")
ggp <- ggp + coord_fixed()
ggp

Related

Issue with country/province name from coordinates when points are on country coastal limits (in R) [duplicate]

I have a set of coordinates, most of which fall within or near the Iberian Peninsula.
I'm interested in finding the country and province names for a large set of coordinates. However, some problematic coordinates on country boundaries (points near the coast) are not detected as falling within any country or province. I first tried to find the province name and then the country name, and I run into the same issue with both approaches.
I'm aware that some coordinates could be typos, but I have manually checked that some of them do fall within the country limits.
Any advice on how to overcome this?
Here is a minimal reproducible example.
First, I create a dataframe with the problematic coordinates. The last coordinate (36.76353, -4.425162) is a "correct" one, included to show that the function works.
lat <- c(36.81973, 43.69739, 36.51103, 38.50000, 42.25167, 42.25177, 39.31316, 36.76353)
long <- c(-2.411557, -5.919138, -4.635675, -0.100000, -8.804174, -8.790300, 2.995276, -4.425162)
coords <-data.frame(long, lat)
Then, by adapting a colleague's answer to Latitude Longitude Coordinates to State Code in R with the addition of library(mapSpain), I create this function, which should return the province in which each point falls.
#I create a function that should tell the provinces of Spain from coordinates
library(sf)
library(spData)
library(mapSpain)
## pointsDF: A data.frame whose first column contains longitudes and
## whose second column contains latitudes.
##
## states: An sf MULTIPOLYGON object with the Spanish provinces.
##
## name_col: Name of a column in `states` that supplies the
## province names.
lonlat_to_state <- function(pointsDF,
                            states = mapSpain::esp_get_prov(),
                            name_col = "ine.prov.name") {
    ## Convert points data.frame to an sf POINTS object
    pts <- st_as_sf(pointsDF, coords = 1:2, crs = 4326)
    ## Transform spatial data to some planar coordinate system
    ## (e.g. Web Mercator) as required for geometric operations
    states <- st_transform(states, crs = 3857)
    pts <- st_transform(pts, crs = 3857)
    ## Find names of province (if any) intersected by each point
    state_names <- states[[name_col]]
    ii <- as.integer(st_intersects(pts, states))
    state_names[ii]
}
lonlat_to_state(coords)
[1] NA NA NA NA NA NA NA "Málaga"
And again I have the same issue if I use
library(maps)
map.where(x = coords$long, y = coords$lat)
[1] NA NA NA NA NA NA NA "Spain"
Any advice would be more than welcome, thanks for your time!
st_intersects is working as expected, because the points (except one) lie outside the polygons. We can see this by plotting them, zooming in on each (here pts and states are the transformed objects built inside the function above):
library(ggplot2)
for (i in seq_len(nrow(pts))) {
  xlim = unlist(st_geometry(pts[i,]))[1] + c(-5000, 5000)
  ylim = unlist(st_geometry(pts[i,]))[2] + c(-5000, 5000)
  g = ggplot() +
    geom_sf(data=states) +
    geom_sf(data=pts) +
    coord_sf(xlim = xlim, ylim = ylim)
  print(g)
}
A few examples of the plots:
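As for overcoming the issue itself: one common workaround (my suggestion, not part of the original answer) is to fall back to the nearest province for points that intersect nothing, e.g. with st_nearest_feature:
# Sketch: assign each point to its closest province polygon. Assumes the
# transformed pts/states objects from above; "ine.prov.name" as in the question.
nearest <- st_nearest_feature(pts, states)  # index of nearest polygon per point
states[["ine.prov.name"]][nearest]
In practice you would probably also want to cap the allowed distance (e.g. via st_distance) so that genuinely wrong coordinates still come back NA.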

Filter shapefile polygons by area

I have the following boundary dataset for the United Kingdom, which shows all the counties:
library(raster)
library(sp)
library(ggplot2)
# Download the data
GB <- getData('GADM', country="gbr", level=2)
Using the subset function it is really easy to filter the shapefile polygons by an attribute in the data. For example, if I want to exclude Northern Ireland:
GB_sub <- subset(GB, NAME_1 != "Northern Ireland")
However, there are lots of small islands which distort the scale data range, as shown in the maps below:
Any thoughts on how to elegantly subset the dataset by a minimum size? It would be ideal to have something consistent with the subset syntax. For example:
GB_sub <- subset(GB, Area > 20) # specify minimum area in km^2
Here is another potential solution. Because your data is in a lat-long projection, calculating area directly from latitude and longitude would be biased; it is better to calculate the area with functions from the geosphere package.
install.packages("geosphere")
library(geosphere)
# Calculate the area
GB$poly_area <- areaPolygon(GB) / 10^6
# Filter GB based on area > 20 km2
GB_filter <- subset(GB, poly_area > 20)
poly_area contains the area in km2 for all polygons. We can filter the polygons by a threshold, such as 20 in your example. GB_filter is the final output.
This is one potential solution:
GB_sub = GB[sapply(GB@polygons, function(x) x@area>0.04),] # select min size
map.df <- fortify(GB_sub)
ggplot(map.df, aes(x=long, y=lat, group=group)) + geom_polygon()
Note that the @area slot is in squared map units (squared degrees here, since the data are unprojected), not km2. Check this link for specifics on accessing the slot: Getting a slot's value of S4 objects?
I compared both as well but they don't seem to differ:
out1 = sapply(GB@polygons, function(x) x@area)
out2 = rgeos::gArea(GB, byid=TRUE)
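If you want those slot values on a physical scale, one option (my sketch, assuming rgdal is installed so that spTransform works) is to project to an equal-area CRS first and then take gArea in km2:
GB.proj <- spTransform(GB, CRS("+init=epsg:3035"))  # an equal-area CRS covering Europe
km2 <- rgeos::gArea(GB.proj, byid=TRUE) / 10^6      # polygon areas in km^2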

Maps doesn't register weird shapes

I'm working with one of my professors on some research aimed toward improving the current methods of carbon accounting. We noticed that, when there was no data on a point source's location, many locations were defaulted to the centroid of the county they were in (this is specific to the US at the moment, though it will be applied globally).
So I'm using R to address the uncertainty associated with these locations. My code takes the range of longitude and latitude for a county and plots 10,000 points. It then weeds out the points that are not in the county and takes the average of the leftover points to locate the centroid. My goal is to ultimately take the difference between these points and the centroid to find the spatial uncertainty for point sources that were placed at the centroid.
However, I'm running into problems with coastal regions. My first problem is that the maps package ignores islands (the barrier islands, for example) as well as other disjointed county shapes, so the centroid is not accurately reproduced when the points are averaged. My second problem is specific to Currituck County (North Carolina). maps seems to recognize parts of the barrier islands contained in this county, but since it is not continuous, the entire function goes all wonky and produces a bunch of NAs and FALSEs that don't correspond with the actual border of the county at all.
(The data for the centroid is going to be used in other areas of the research which is why it's important we can accurately access all counties.)
Is there any way around the errors I'm running into? A different data set that could be read in, or anything of the sort? Your help would be greatly appreciated. Let me know if there are any questions about what I'm asking, and I'll be happy to clarify.
CODE:
# ggplot2 helps
# SOME TROUBLE COUNTIES: north carolina,currituck & massachusetts,dukes
library(ggplot2)
library(maps) # package has maps
library(mapproj) # projections
library(sp)
WC <- map_data('county','north carolina,currituck') #calling on county
p <- ggplot(data = WC, aes(x = long, y = lat)) #calling on latitude and longitude
p1 <- p + geom_polygon(fill = "lightgreen") + theme_bw() +
coord_map("polyconic") + coord_fixed() #+ labs(title = "Watauga County")
p1
### range for the long and lat
RLong <- range(WC$long)
RLong
RLat <- range(WC$lat)
RLat
### Add some random points
n <- 10000
RpointsLong <- sample(seq(RLong[1], RLong[2], length = 100), n, replace = TRUE)
RpointsLat <- sample(seq(RLat[1], RLat[2], length = 100), n, replace = TRUE)
DF <- data.frame(RpointsLong, RpointsLat)
head(DF)
p2<-p1 + geom_point(data = DF, aes(x = RpointsLong, y = RpointsLat))
p2
# Source:
# http://www.nceas.ucsb.edu/scicomp/usecases/GenerateConvexHullAndROIForPoints
inside <- map.where('county',RpointsLong,RpointsLat)=="north carolina,currituck"
inside[which(nchar(inside)==2)] <- FALSE # nchar(NA) is 2: replace NAs with FALSE
inside
g <- inside*DF # zeroes out the coordinates of points outside the county
g1 <- subset(g, g$RpointsLong!=0)
g1
CentLong<-mean(g1$RpointsLong)
CentLat<-mean(g1$RpointsLat)
Centroid<-data.frame(CentLong,CentLat)
Centroid
p1+geom_point(data=g1, aes(x=RpointsLong,y=RpointsLat)) #this maps all the points inside county
p1+geom_point(data=Centroid, aes(x=CentLong,y=CentLat))
First, given your description of the problem, I would probably invest a lot of effort to avoid this business of locations defaulting to the county centroids; that is the right way to solve this problem.
Second, if this is a research project, I would not use the built-in maps in R. The USGS National Atlas website has excellent county maps of the US. Below is an example using Currituck County in NC.
library(ggplot2)
library(rgdal) # for readOGR(...)
library(rgeos) # for gIntersection(...)
setwd("< directory contining shapefiles >")
map <- readOGR(dsn=".",layer="countyp010")
NC <- map[map$COUNTY=="Currituck County" & !is.na(map$COUNTY),]
NC.df <- fortify(NC)
bbox <- bbox(NC)
x <- seq(bbox[1,1],bbox[1,2],length=50) # longitude
y <- seq(bbox[2,1],bbox[2,2],length=50) # latitude
all <- SpatialPoints(expand.grid(x,y),proj4string=CRS(proj4string(NC)))
pts <- gIntersection(NC,all) # points inside the polygons
pts <- data.frame(pts@coords) # ggplot wants a data.frame
centroid <- data.frame(x=mean(pts$x),y=mean(pts$y))
ggplot(NC.df)+
geom_path(aes(x=long,y=lat, group=group), colour="grey50")+
geom_polygon(aes(x=long,y=lat, group=group), fill="lightgreen")+
geom_point(data=pts, aes(x,y), colour="blue")+
geom_point(data=centroid, aes(x,y), colour="red", size=5)+
coord_fixed()
Finally, another way to do this (which I'd recommend, actually), is to just calculate the area weighted centroid. This is equivalent to what you are approximating, is more accurate, and much faster.
polys <- do.call(rbind,lapply(NC@polygons[[1]]@Polygons,
                              function(x) c(x@labpt, x@area)))
polys <- data.frame(polys)
colnames(polys) <- c("long","lat","area")
polys$area <- with(polys,area/sum(area))
centr <- with(polys,c(x=sum(long*area),y=sum(lat*area)))
centr # area weighted centroid
#         x         y
# -76.01378  36.40105
centroid # point weighted centroid (start = 50 X 50 points)
#           x        y
# 1 -76.01056 36.39671
You'll find that as you increase the number of points in the points-weighted centroid the result gets closer to the area-weighted centroid.
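For reference (my addition, not part of the original answer), rgeos can also compute the geometric centroid directly, which gives you a third value to compare against:
# rgeos is already loaded above for gIntersection(...)
gCentroid(NC)  # exact area-weighted centroid of the county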

Plot spatial area defined by multiple polygons

I have a SpatialPolygonsDataFrame with 11589 spatial objects of class "polygons". 10699 of those objects consist of exactly 1 polygon. However, the rest of those spatial objects consist of multiple polygons (2 to 22).
If an object consists of multiple polygons, three scenarios are possible:
1) The additional polygons could describe a "hole" in the spatial area described by the first polygon.
2) The additional polygons could also describe additional geographic areas, i.e. the shape of the region is quite complex and is described by putting together multiple parts.
3) Often it is a mix of both 1) and 2).
My question is: How to plot such a spatial object which is based on multiple polygons?
I have been able to extract and plot the information of the first polygon, but I have not figured out how to plot all polygons of such a complex spatial object at once.
Below you find my code. The problem is in the 15th line from the end.
# Load packages
# ---------------------------------------------------------------------------
library(maptools)
library(rgdal)
library(ggmap)
library(rgeos)
# Get data
# ---------------------------------------------------------------------------
# Download shape information from the internet
URL <- "http://www.geodatenzentrum.de/auftrag1/archiv/vektor/vg250_ebenen/2012/vg250_2012-01-01.utm32s.shape.ebenen.zip"
td <- tempdir()
setwd(td)
temp <- tempfile(fileext = ".zip")
download.file(URL, temp)
unzip(temp)
# Get shape file
shp <- file.path(tempdir(),"vg250_0101.utm32s.shape.ebenen/vg250_ebenen/vg250_gem.shp")
# Read in shape file
x <- readShapeSpatial(shp, proj4string = CRS("+init=epsg:25832"))
# Transform the geocoding from UTM to Longitude/Latitude
x <- spTransform(x, CRS("+proj=longlat +datum=WGS84"))
# Extract relevant information
att <- attributes(x)
Poly <- att$polygons
# Pick an geographic area which consists of multiple polygons
# ---------------------------------------------------------------------------
# Output a frequency table of areas with N polygons
order.of.polygons.in.shp <- sapply(x@polygons, function(x) x@plotOrder)
length.vector <- unlist(lapply(order.of.polygons.in.shp, length))
table(length.vector)
# Get geographic area with the most polygons
polygon.with.max.polygons <- which(length.vector==max(length.vector))
# Check polygon
# x@polygons[polygon.with.max.polygons]
# Get shape for the geographic area with the most polygons
### HERE IS THE PROBLEM ###
### ONLY information for the first polygon is extracted ###
Poly.coords <- data.frame(slot(Poly[[polygon.with.max.polygons]]@Polygons[[1]], "coords"))
# Plot
# ---------------------------------------------------------------------------
## Calculate centroid for the first polygon of the specified area
coordinates(Poly.coords) <- ~X1+X2
proj4string(Poly.coords) <- CRS("+proj=longlat +datum=WGS84")
center <- gCentroid(Poly.coords)
# Download a map which is centered around this centroid
al1 = get_map(location = c(lon=center@coords[1], lat=center@coords[2]), zoom = 10, maptype = 'roadmap')
# Plot map
ggmap(al1) +
geom_path(data=as.data.frame(Poly.coords), aes(x=X1, y=X2))
I may be misinterpreting your question, but it's possible that you are making this much harder than necessary.
(Note: I had trouble dealing with the .zip file in R, so I just downloaded and unzipped it in the OS).
library(rgdal)
library(ggplot2)
setwd("< directory with shapefiles >")
map <- readOGR(dsn=".", layer="vg250_gem", p4s="+init=epsg:25832")
map <- spTransform(map, CRS("+proj=longlat +datum=WGS84"))
nPolys <- sapply(map@polygons, function(x)length(x@Polygons))
region <- map[which(nPolys==max(nPolys)),]
plot(region, col="lightgreen")
# using ggplot...
region.df <- fortify(region)
ggplot(region.df, aes(x=long,y=lat,group=group))+
geom_polygon(fill="lightgreen")+
geom_path(colour="grey50")+
coord_fixed()
Note that ggplot does not deal with the holes properly: geom_path(...) works fine, but geom_polygon(...) fills the holes. I've had this problem before (see this question), and based on the lack of response it may be a bug in ggplot. Since you are not using geom_polygon(...), this does not affect you...
In your code above, you would replace the line:
ggmap(al1) + geom_path(data=as.data.frame(Poly.coords), aes(x=X1, y=X2))
with:
ggmap(al1) + geom_path(data=region.df, aes(x=long,y=lat,group=group))
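Regarding the hole-filling note above: if you ever do need filled polygons with the holes respected, one workaround (my sketch, not from the original answer; it needs a version of ggplot2 that provides geom_sf) is to convert to sf and let geom_sf draw the rings:
library(sf)
region_sf <- st_as_sf(region)  # convert the sp object selected above
ggplot(region_sf) +
  geom_sf(fill="lightgreen", colour="grey50")  # holes are left unfilled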

Check if point is in spatial object which consists of multiple polygons/holes

I have a SpatialPolygonsDataFrame with 11589 objects of class "polygons". 10699 of those objects consist of exactly 1 polygon; however, the rest consist of multiple polygons (2 to 22).
If an object consists of multiple polygons, three scenarios are possible:
Sometimes, those additional polygons describe a "hole" in the geographic area described by the first polygon in the object of class "polygons".
Sometimes, those additional polygons describe additional geographic areas, i.e. the shape of the region is quite complex and is described by putting together multiple parts.
Sometimes, it might be a mix of both, 1) and 2).
Stack Overflow helped me to plot such a spatial object properly (Plot spatial area defined by multiple polygons).
However, I am still not able to determine whether a point (defined by longitude/latitude) is in such a polygon.
Below is my code. I tried to apply the function point.in.polygon from the sp package, but found no way for it to handle such an object, which consists of multiple polygons/holes.
# Load packages
# ---------------------------------------------------------------------------
library(maptools)
library(rgdal)
library(rgeos)
library(ggplot2)
library(sp)
# Get data
# ---------------------------------------------------------------------------
# Download shape information from the internet
URL <- "http://www.geodatenzentrum.de/auftrag1/archiv/vektor/vg250_ebenen/2012/vg250_2012-01-01.utm32s.shape.ebenen.zip"
td <- tempdir()
setwd(td)
temp <- tempfile(fileext = ".zip")
download.file(URL, temp)
unzip(temp)
# Get shape file
shp <- file.path(tempdir(),"vg250_0101.utm32s.shape.ebenen/vg250_ebenen/vg250_gem.shp")
# Read in shape file
map <- readShapeSpatial(shp, proj4string = CRS("+init=epsg:25832"))
# Transform the geocoding from UTM to Longitude/Latitude
map <- spTransform(map, CRS("+proj=longlat +datum=WGS84"))
# Pick an geographic area which consists of multiple polygons
# ---------------------------------------------------------------------------
# Output a frequency table of areas with N polygons
nPolys <- sapply(map@polygons, function(x)length(x@Polygons))
# Get geographic area with the most polygons
polygon.with.max.polygons <- which(nPolys==max(nPolys))
# Get shape for the geographic area with the most polygons
Poly.coords <- map[which(nPolys==max(nPolys)),]
# Plot
# ---------------------------------------------------------------------------
# Plot region without Google maps (ggplot2)
plot(Poly.coords, col="lightgreen")
# Find if a point is in a polygon
# ---------------------------------------------------------------------------
# Define points
points_of_interest <- data.frame(long=c(10.5,10.51,10.15,10.4),
                                 lat =c(51.85,51.72,51.81,51.7),
                                 id  =c("A","B","C","D"), stringsAsFactors=F)
# Plot points
points(points_of_interest$long, points_of_interest$lat, pch=19)
You can do this simply with gContains(...) in the rgeos package.
gContains(sp1,sp2)
returns a logical depending on whether sp2 is contained within sp1. The only nuance is that sp2 has to be a SpatialPoints object, and it has to have the same projection as sp1. To do that, you would do something like this:
point <- data.frame(lon=10.2, lat=51.7)
sp2 <- SpatialPoints(point,proj4string=CRS(proj4string(sp1)))
gContains(sp1,sp2)
Here is a working example based on the answer to your previous question.
library(rgdal) # for readOGR(...)
library(rgeos) # for gContains(...)
library(ggplot2)
setwd("< directory with all your files >")
map <- readOGR(dsn=".", layer="vg250_gem", p4s="+init=epsg:25832")
map <- spTransform(map, CRS("+proj=longlat +datum=WGS84"))
nPolys <- sapply(map@polygons, function(x)length(x@Polygons))
region <- map[which(nPolys==max(nPolys)),]
region.df <- fortify(region)
points <- data.frame(long=c(10.5,10.51,10.15,10.4),
                     lat =c(51.85,51.72,51.81,51.7),
                     id  =c("A","B","C","D"), stringsAsFactors=F)
ggplot(region.df, aes(x=long,y=lat,group=group))+
geom_polygon(fill="lightgreen")+
geom_path(colour="grey50")+
geom_point(data=points,aes(x=long,y=lat,group=NULL, color=id), size=4)+
coord_fixed()
Here, point A is in the main polygon, point B is in a lake (hole), point C is on an island, and point D is completely outside the region. So this code checks all of the points using gContains(...)
sapply(1:4, function(i)
  list(id=points[i,]$id,
       gContains(region, SpatialPoints(points[i,1:2], proj4string=CRS(proj4string(region))))))
#    [,1]  [,2]   [,3]  [,4]
# id "A"   "B"    "C"   "D"
#    TRUE  FALSE  TRUE  FALSE
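The per-point sapply loop can also be vectorised (a sketch, my addition): gContains() accepts byid=TRUE, so all four points can be tested in one call:
pts.sp <- SpatialPoints(points[,c("long","lat")],
                        proj4string=CRS(proj4string(region)))
gContains(region, pts.sp, byid=TRUE)  # one logical per point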
Since you can use the "point in polygon" routine, and this apparently isn't already suitably designed to handle the multi-polygon case in R (which I find a bit odd, actually), you are left with having to cycle through each of the multiple polygons. Now the trick is: if you are inside an odd number of polygons, you are inside the multi-polygon; if you are inside an even number of polygons, then you are actually outside of the shape.
Point-in-polygon testing that uses ray crossings should ALREADY be able to handle this, just by making sure you pass all the vertices to the original point.in.polygon test, but I am not sure which mechanism R is using, so I can only give you the even/odd advice above (a sketch of it follows).
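For what it's worth, here is a minimal sketch of that even/odd idea using sp::point.in.polygon() on each ring (my code, assuming the region object from the answer above):
library(sp)
# Count how many rings of the multi-polygon contain the point; an odd
# count means the point is inside the shape (each hole flips it back out).
ring_count <- function(lon, lat, polys) {
  sum(sapply(polys@Polygons, function(p)
    point.in.polygon(lon, lat, p@coords[,1], p@coords[,2]) > 0))
}
ring_count(10.5, 51.85, region@polygons[[1]]) %% 2 == 1  # TRUE for point A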
I also found this code, not sure if it will help:
require(sp)
require(rgdal)
require(maps)
# read in bear data, and turn it into a SpatialPointsDataFrame
bears <- read.csv("bear-sightings.csv")
coordinates(bears) <- c("longitude", "latitude")
# read in National Parks polygons
parks <- readOGR(".", "10m_us_parks_area")
# tell R that bear coordinates are in the same lat/lon reference system
# as the parks data -- BUT ONLY BECAUSE WE KNOW THIS IS THE CASE!
proj4string(bears) <- proj4string(parks)
# combine is.na() with over() to do the containment test; note that we
# need to "demote" parks to a SpatialPolygons object first
inside.park <- !is.na(over(bears, as(parks, "SpatialPolygons")))
# what fraction of sightings were inside a park?
mean(inside.park)
## [1] 0.1720648
# use 'over' again, this time with parks as a SpatialPolygonsDataFrame
# object, to determine which park (if any) contains each sighting, and
# store the park name as an attribute of the bears data
bears$park <- over(bears, parks)$Unit_Name
# draw a map big enough to encompass all points (but don't actually plot
# the points yet), then add in park boundaries superimposed upon a map
# of the United States
plot(coordinates(bears), type="n")
map("world", region="usa", add=TRUE)
plot(parks, border="green", add=TRUE)
legend("topright", cex=0.85,
c("Bear in park", "Bear not in park", "Park boundary"),
pch=c(16, 1, NA), lty=c(NA, NA, 1),
col=c("red", "grey", "green"), bty="n")
title(expression(paste(italic("Ursus arctos"),
" sightings with respect to national parks")))
# now plot bear points with separate colors inside and outside of parks
points(bears[!inside.park, ], pch=1, col="gray")
points(bears[inside.park, ], pch=16, col="red")
# write the augmented bears dataset to CSV
write.csv(bears, "bears-by-park.csv", row.names=FALSE)
# ...or create a shapefile from the points
writeOGR(bears, ".", "bears-by-park", driver="ESRI Shapefile")
