geopandas plotting - Identify locations that fall outside of the map - plot

I have a shapefile that shows the map of Pakistan at district level. I also have a geodataframe that has information about polling stations in Pakistan.
I have mapped the geodataframe on to the shapefile, but noticed that some lat/lon values from the geodataframe are wrong i.e. they lie outside Pakistan.
I want to identify which polling stations these are. (I want to select those rows from the geodataframe) Is there a way to do this?
Please see below for reference - the black dots indicate polling stations, and the colourful map is the map of Pakistan at district levels:
image_pakistan_map_pollingstations
edit:
So I'm trying this and it seems to work, however it's taking a very long time to run (been running it for 5+ hrs now) - for reference, the geodataframe has about 50,000 rows and it's called ours_NA_gdf.
for i in range(len(ours_NA_gdf)):
    if ours_NA_gdf['geometry'][i].within(pakistan['geometry'][0]):
        ours_NA_gdf.at[i, 'loc_validity'] = 'T'
    else:
        ours_NA_gdf.at[i, 'loc_validity'] = 'F'
ours_NA_gdf[ours_NA_gdf['loc_validity']=='F']

I suspect that the Pakistan geometries you are using are the problem: they are too complex and detailed for this kind of test. For your use case, the simplified geometry provided by naturalearth_lowres should give better performance. Here is runnable code that uses the simple Pakistan geometry for the contains() test and assigns a color to each point before plotting it on the map.
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from cartopy import crs as ccrs
# create a geoDataFrame of points locations across Pakistan areas
pp = 40
lons = np.linspace(60, 80, pp)
lats = np.linspace(22, 39, pp)
# create point geometry
# points will be plotted across Pakistan in red (outside) and green (inside)
points = [Point(xy) for xy in zip(lons, lats)]
# create a dataframe of 3 columns
mydf = pd.DataFrame({'longitude': lons, 'latitude': lats, 'point': points})
# manipulate dataframe geometry
gdf = mydf.drop(['longitude', 'latitude'], axis=1)
gdf = gpd.GeoDataFrame(gdf, crs="EPSG:4326", geometry=gdf.point)
fig, ax = plt.subplots(figsize=(6,7), subplot_kw={'projection': ccrs.PlateCarree()})
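# note: gpd.datasets was removed in GeoPandas 1.0; on newer versions, download the
# Natural Earth countries file and pass its path to gpd.read_file() instead
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))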
parki = world[(world.name == "Pakistan")] #take a country of interest
# grab the geometry of Pakistan
# can .simplify() it if need be
pg = parki['geometry']
newcol = []
for index, row in gdf.iterrows(): # Looping over all points
    res = pg.contains(row.geometry).values[0]
    newcol.append(res)
# add a new column ('insideQ') to the geodataframe
gdf['insideQ'] = newcol
# add a new column ('color') to the geodataframe
gdf.loc[:, 'color'] = 'green' #set color='green'
# this set color='red' to selected rows
gdf.loc[gdf['insideQ']==False, 'color'] = 'red'
# plot Pakistan
ax.add_geometries(parki['geometry'], crs=ccrs.PlateCarree(), color='lightpink', label='Pakistan')
# plot all points features of `gdf`
gdf.plot(ax=ax, zorder=20, color=gdf.color)
ax.set_extent([60, 80, 22, 39]) #zoomed-in to Pakistan
LegendElement = [
    mpatches.Patch(color='lightpink', label='Pakistan')
]
ax.legend(handles = LegendElement, loc='best')
plt.show()
The output plot:
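The row-by-row loops above can usually be replaced by a single vectorized call, which tends to matter at 50,000 rows. What follows is only a minimal sketch under stated assumptions: the two boxes standing in for districts, the four points standing in for polling stations, and the names districts, stations and station_id are all made up for illustration. It dissolves the district polygons into one outline and tests every point against it in one go.

```python
import geopandas as gpd
from shapely.geometry import Point, box
from shapely.ops import unary_union

# Hypothetical stand-ins for the real data: two adjacent "districts"
# and four "polling stations" (two of them deliberately far away).
districts = gpd.GeoDataFrame(
    {"district": ["A", "B"]},
    geometry=[box(60, 24, 70, 37), box(70, 24, 77, 37)],
    crs="EPSG:4326",
)
stations = gpd.GeoDataFrame(
    {"station_id": [1, 2, 3, 4]},
    geometry=[Point(65, 30), Point(75, 35), Point(10, 10), Point(90, 30)],
    crs="EPSG:4326",
)

# Merge all districts into a single outline so each point is tested
# against the whole country, not against one district only.
country = unary_union(list(districts.geometry))

# Vectorized point-in-polygon test: one boolean per row, no Python loop.
stations["loc_validity"] = stations.within(country)

# Rows whose coordinates fall outside the outline.
outside = stations[~stations["loc_validity"]]
print(outside["station_id"].tolist())
```

With real data, districts would come from gpd.read_file() on the shapefile, and both layers would need to share the same CRS before the test.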

Related

finding no data values inside the extent of a shapefile and discarding the values outside the extent

I have Randolph Glacier Inventory boundary shapefiles of glaciers in Himachal Pradesh. I clipped three different rasters with these shapefiles and then stacked them together. The clipped rasters contain no-data values, and I have to find both the no-data values and the pixel values inside these rasters. But when I extract the raster, I get more values than there should be.
For example, the area of one glacier (shapefile) is 8.719 sq. km and the raster resolution is 10 m (100 sq. m per pixel), so the raster should contain about 87,190 pixels, but I am getting approx. 333,233 (that may be because of the bounding box). So I decided to create a binary mask so that I can get only the values inside the boundary, but I am still getting a lot more values than 87,190.
The stacked rasters all have the same resolution and the same extent, but when I extract them and multiply the extracted array by the mask given below, the number of pixels extracted is different for two of the bands.
The code I used for making the binary mask is given below.
I want to write code that extracts only the pixel values inside the boundary of the shapefile, including the no-data values present within it.
this is the mask I created
this is the shapefile of the same glacier
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
import rasterio
from rasterio.plot import reshape_as_image
import rasterio.mask
from rasterio.features import rasterize
from shapely.geometry import mapping, Point, Polygon
from shapely.ops import unary_union  # replaces the deprecated cascaded_union
# raw string so the backslashes in the Windows path are not read as escapes
shape_path = r"E:\semester_4\glaciers of lahaul spiti clipped\RGIId_RGI60-14.11841/11841.shp"
glacier_df = gpd.read_file(shape_path)
raster_path = 'E:/semester_4/glaciers of lahaul spiti clipped/RGIId_RGI60-14.11841/stack9/raster_stack_sv_srtm.tif'
with rasterio.open(raster_path, "r") as src:
    raster_img = src.read()
    raster_meta = src.meta
    print("CRS Vector: {} , CRS Raster: {}".format(glacier_df.crs, src.crs))

def poly_from_utm(polygon, transform):
    # convert the polygon vertices from map coordinates to pixel coordinates
    poly_pts = []
    poly = unary_union(polygon)
    for i in np.array(poly.exterior.coords):
        poly_pts.append(~transform * tuple(i))
    new_poly = Polygon(poly_pts)
    return new_poly

poly_shp = []
im_size = (raster_meta['height'], raster_meta['width'])
for num, row in glacier_df.iterrows():
    if row['geometry'].geom_type == 'Polygon':
        poly = poly_from_utm(row['geometry'], raster_meta['transform'])
        poly_shp.append(poly)
    else:
        # multi-part geometry: handle each part separately
        for p in row['geometry'].geoms:
            poly = poly_from_utm(p, raster_meta['transform'])
            poly_shp.append(poly)
mask_stack_sv_srtm = rasterize(shapes=poly_shp,
                               out_shape=im_size)
plt.figure(figsize=(5, 5))
plt.imshow(mask_stack_sv_srtm)

Calculate slope over a gridded latitude/longitude coordinate area with corresponding depths in r

I have built a gridded area in the Gulf of Alaska with a resolution of 0.02 decimal degrees (~1nm);
library(sp)
library(rgdal)
library(dplyr) # provides %>%
# Set interval for grid cells.
my.interval=0.02 #If 1 is 1 degree, which is 60nm, then 0.1 is every 6nm, and 0.05 is every 3nm, so 0.0167 is every 1nm
# Select range of coordinates for grid boundaries (UTM to maintain constant grid cell area regardless of geographic location).
lonmin = -140.5083
lonmax = -131.2889
latmin = 53.83333
latmax = 59.91667
LON = seq(lonmin, lonmax, by=my.interval)
LAT = seq(latmin, latmax, by=my.interval)
# Compile series of points for grid:
mygrd = expand.grid(
Longitude = seq(lonmin, lonmax, by=my.interval),
Latitude = seq(latmin, latmax, by=my.interval)) %>%
#mutate(z=1:n()) %>%
data.frame
I exported that grid as a .csv file and brought it into ArcGIS where I used a few bathymetry rasters to extract the bottom depth at the midpoint of each cell. I then exported that from GIS back into R as a .csv file data frame. So now it has another column called "Depth" on it.
For now, I'll just add a column with random "depth" numbers in it:
mygrd$Depth<-NA
mygrd$Depth<-runif(nrow(mygrd), min=100, max=1000)
I would like to calculate the slope at the midpoint of each cell (between points).
I've been trying to do this with the slope() function in SDMTools package, which requires you to have a SpatialGridDataFrame in the sp package.
I can't get this to work; I am also not sure if this is the easiest way to do that?
I have a data frame with 3 columns: Longitude, Latitude, and Depth. I'd like to calculate slope. If anyone knows any better way to do this, let me know! Any help is much appreciated!
Here is some of the code I've been trying to use:
library(SDMTools)
proj <- CRS('+proj=longlat +datum=WGS84')
coords <- mygrd[,1:2]
t2 <- SpatialPointsDataFrame(coords=coords, data=mygrd, proj4string=proj)
t2<-SpatialPixelsDataFrame(points=t2[c("Longitude","Latitude")], data=t1[,c(1,2)])
t3 <- SpatialGridDataFrame(grid=NULL, data=t2, proj4string=CRS("+proj=longlat +datum=WGS84"))
class(t3)
slope.test<-slope(t3, latlon=TRUE)
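For what it is worth, the slope on a regular lon/lat grid can also be sketched with plain finite differences, without a SpatialGridDataFrame. The following is a rough Python illustration, not a drop-in replacement for SDMTools::slope(): it assumes the long table has already been reshaped into a 2-D array depth[lat_index, lon_index], it uses an approximate 111,320 m per degree of latitude, and the helper name slope_degrees is made up.

```python
import numpy as np

M_PER_DEG = 111_320.0  # rough metres per degree of latitude (assumption)

def slope_degrees(depth, lat, step_deg):
    """Slope in degrees for depth[lat_index, lon_index] on a regular grid."""
    dy = step_deg * M_PER_DEG                             # north-south spacing, m
    dx = step_deg * M_PER_DEG * np.cos(np.radians(lat))   # east-west spacing per row, m
    dz_dy = np.gradient(depth, axis=0) / dy
    dz_dx = np.gradient(depth, axis=1) / dx[:, None]
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Tiny synthetic grid using the question's spacing and lower-left corner.
step = 0.02
lat = 53.83333 + step * np.arange(6)
lon = -140.5083 + step * np.arange(5)

# Synthetic depth: a plane that deepens 1 m for every 100 m travelled north.
north_m = (lat - lat[0]) * M_PER_DEG
depth = 100.0 + 0.01 * north_m[:, None] + 0.0 * lon[None, :]

slope = slope_degrees(depth, lat, step)
print(slope.round(4))
```

The east-west spacing shrinks with the cosine of latitude, which is why dx is computed per row rather than once for the whole grid.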

rworldmap coordinates, how to match NetCDF data to the map?

Rworldmap looks like exactly what I need for mapping climate data, but I'm having a problem lining up the base map with the climate data. What I am mapping is ocean temperature data from JAMSTEC for August, 2015 from here:
http://www.jamstec.go.jp/ARGO/argo_web/ancient/MapQ/Mapdataset_e.html
The dataset name is TS_201508_GLB.nc. The R script I'm using is below. The country outlines are fine, but the data, which covers the oceans only, does not line up with the oceans; it is offset somehow. Can you tell me how to align the data to the map?
I've read lots of articles but I cannot tell how to align the two. The data I have is in longitude and latitude. South latitude is negative and west longitude is negative, so I don't see how they could be confused. How is the map laid out? Is there some sort of special convention for the lat/longs?
Thanks for any help you can provide.
The code:
library(RNetCDF)
library(sp)
library(rworldmap)
library(rgeos)
library(RColorBrewer)
library (classInt)
library(grid)
library(spam)
library(maps)
library(maptools)
library(fields)
library(methods)
library(rgdal)
library(rworldxtra)
fname <- "G:/Climate_Change/Ocean_Warming/MOAA_GPV_Jamstec_Temperature/TS_201508_GLB.nc"
moaa <- open.nc(fname)
# moaa
print.nc(moaa)
file.inq.nc(moaa)
#TOI is the temperature array extracted from the NCDF file
TOI = var.get.nc(moaa,"TOI",start=c(1,1,1),count=c(360,132,25))
TOI[1,1,1]
Long = var.get.nc(moaa,"LONGITUDE")
Lat = var.get.nc(moaa, "LATITUDE")
Pres = var.get.nc(moaa,"PRES")
# create grid
offset=c(-179.5,-60.50)
cellsize = c(abs(Long[1]-Long[2]),abs(Lat[1]-Lat[2]))
cells.dim = c(dim(Long), dim(Lat))
# create gt
gt <- GridTopology(cellcentre.offset=offset,cellsize=cellsize,cells.dim=cells.dim)
# create map window
mapDevice()
# Create a color palette
colourPalette=c('blue','lightblue','white',brewer.pal(9,'YlOrRd'))
# Values at 2000 decibar for August 2015
ncMatrix <- TOI[,,25]
# Gridvalues
gridVals <-data.frame(att=as.vector(ncMatrix))
# create a spatialGridDataFrame
sGDF <-SpatialGridDataFrame(gt,data=gridVals)
# Vector to classify data
catMethod=seq(from=0,to=4,by=.33)
# plotting the map and getting params for legend
mapParams <- mapGriddedData( sGDF, nameColumnToPlot='att',catMethod=catMethod,colourPalette=colourPalette,addLegend=FALSE)
I finally figured it out. rworldmap wants the data organized from the upper left of the map (northwest corner), that is Long = -180, Lat = 90. The NetCDF data starts at Long = 0 and Lat = -90 (the middle of the map and the south edge). So we have to reverse the values in the north-south direction:
#
# Flip the Latitude values so south is last
ncMatrix2 <- ncMatrix[,dim(Lat):1]
Then switch the values for east longitude and west longitude:
#
#Longitude values need to be from -180 to 0 then 0 to 180
# So we divide into East and West, then recombine with rbind
East_Long_values <-ncMatrix2[1:180,]
West_Long_Values <-ncMatrix2[181:360,]
ncMatrix3 <- rbind(West_Long_Values,East_Long_values)
Then everything else works.
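The same two reordering steps can be illustrated on a toy array. This is only a sketch in Python with NumPy, using an invented 4 x 3 grid (longitudes 0, 90, 180, 270 and latitudes -60, 0, 60) rather than the real 360 x 132 one, so the effect of each step is easy to check by eye.

```python
import numpy as np

# Hypothetical tiny grid: rows are longitudes 0..360 E, columns are
# latitudes running south to north, as in the NetCDF file described above.
lon = np.array([0, 90, 180, 270])
lat = np.array([-60, 0, 60])
data = np.arange(12).reshape(4, 3)  # data[i, j] belongs to lon[i], lat[j]

# Step 1: flip the latitude axis so that north comes first.
data_ns = data[:, ::-1]
lat_ns = lat[::-1]

# Step 2: move the western half (180..360 E) in front of the eastern half,
# so longitude runs from -180 to 180 instead of 0 to 360.
half = len(lon) // 2
data_fixed = np.concatenate([data_ns[half:], data_ns[:half]], axis=0)
lon_fixed = np.concatenate([lon[half:] - 360, lon[:half]])

print(lon_fixed, lat_ns)
print(data_fixed)
```

After both steps, the first element of data_fixed is the value that originally sat at 180 E, 60 N, i.e. the northwest corner of a -180..180 map.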

Label a point depending on which polygon contains it (NYC civic geospatial data)

I have the longitude and latitude of 5449 trees in NYC, as well as a shapefile for 55 different Neighborhood Tabulation Areas (NTAs). Each NTA has a unique NTACode in the shapefile, and I need to append a third column to the long/lat table telling me which NTA (if any) each tree falls under.
I've made some progress already using other point-in-polygon threads on stackoverflow, especially this one that looks at multiple polygons, but I'm still getting errors when trying to use gContains and don't know how I could check/label each tree for different polygons (I'm guessing some sort of sapply or for loop?).
Below is my code. Data/shapefiles can be found here: http://bit.ly/1BMJubM
library(rgdal)
library(rgeos)
library(ggplot2)
#import data
setwd("< path here >")
xy <- read.csv("lonlat.csv")
#import shapefile
map <- readOGR(dsn="CPI_Zones-NTA", layer="CPI_Zones-NTA", p4s="+init=epsg:25832")
map <- spTransform(map, CRS("+proj=longlat +datum=WGS84"))
#generate the polygons, though this doesn't seem to be generating all of the NTAs
nPolys <- sapply(map@polygons, function(x)length(x@Polygons))
region <- map[which(nPolys==max(nPolys)),]
plot(region, col="lightgreen")
#setting the region and points
region.df <- fortify(region)
points <- data.frame(long=xy$INTPTLON10,
lat =xy$INTPTLAT10,
id =c(1:5449),
stringsAsFactors=F)
#drawing the points / polygon overlay; currently only the points are appearing
ggplot(region.df, aes(x=long,y=lat,group=group))+
geom_polygon(fill="lightgreen")+
geom_path(colour="grey50")+
geom_point(data=points,aes(x=long,y=lat,group=NULL, color=id), size=1)+
xlim(-74.25, -73.7)+
ylim(40.5, 40.92)+
coord_fixed()
#this should check whether each tree falls into **any** of the NTAs, but I need it to specifically return **which** NTA
sapply(1:5449,function(i)
list(id=points[i,]$id, gContains(region,SpatialPoints(points[i,1:2],proj4string=CRS(proj4string(region))))))
#this is something I tried earlier to see if writing a new column using the over() function could work, but I ended up with a column of NAs
pts = SpatialPoints(xy)
nyc <- readShapeSpatial("< path to shapefile here >")
xy$nrow=over(pts,SpatialPolygons(nyc@polygons), returnlist=TRUE)
The NTAs we're checking for are these ones (visualized in GIS): http://bit.ly/1A3jEcE
Try simply:
ShapeFile <- readShapeSpatial("Shapefile.shp")
points <- data.frame(long=xy$INTPTLON10,
lat =xy$INTPTLAT10,
stringsAsFactors=F)
dimnames(points)[[1]] <- seq(1, length(xy$INTPTLON10), 1)
points <- SpatialPoints(points)
df <- over(points, ShapeFile)
I omitted transformation of shapefile because this is not the main subject here.
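For comparison, the same point-in-polygon labelling can be sketched in Python with a geopandas spatial join. This is an illustration under assumptions, not the method from the answer above: the two square "NTAs", the three "trees", the NTACode values and the tree_id column are all invented, and both layers are assumed to be in the same CRS already.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, box

# Hypothetical polygons standing in for the NTA shapefile.
ntas = gpd.GeoDataFrame(
    {"NTACode": ["BK01", "MN02"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Hypothetical tree locations; the third lies outside every polygon.
trees = gpd.GeoDataFrame(
    {"tree_id": [1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)

# Left join keeps every tree; trees outside all polygons get NaN as NTACode.
labelled = gpd.sjoin(trees, ntas[["NTACode", "geometry"]],
                     how="left", predicate="within")

codes = labelled.set_index("tree_id")["NTACode"]
print(codes)
```

The result keeps one row per tree with the code of the containing polygon, which is the "which NTA, if any" column the question asks for.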

How do I select the subset of my lat/lon data that is highway travel using R?

I have some data which consists of time-stamped lat/lon pairs, a subset of which I've mapped out below using ggmap. If I wanted to select only the data consisting of travel along the highways, which you can kinda see on the map - the 280 running NW-SE between the green mountains and the grey flat area & the 101 cutting through the middle of the grey flat area (where the red is dense) - how would I select only that data?
What I'd ultimately like to achieve is a dataframe which contains only the highway/interstate travel. I've seen this question, which is a brief sketch of a possible solution in javascript, & suggests to use the Directions API to return the nearest road for any given point. I could then filter those results, but I'm wondering if anyone has found a cleaner solution.
Here's some sample data (CSV)
Here's the code to plot the above:
require(ggmap)
map<-get_googlemap(center="Palo Alto", zoom = 10)
ggmap(map) + geom_point(data = sample, aes(x = lon, y = lat),size = 3, color = "red")
You don't need an API key to run the above.
I just found this post and thought this is an interesting question. I wanted to download your sample data file. Unfortunately, the link was not working any more. Therefore, I could not try the whole process I had in my mind. However, I believe the following will let you move forward if you still try to do this task.
I recently noticed that Natural Earth offers road data. That is, you can get lon/lat for the roads in the US, for example. If you compare the lon/lat in your data set with the lon/lat of the road data and identify the matching data points, you can get the data you want. My concern is how accurate your data points are. If the lon/lat values stay exactly on the road you are interested in, you will be OK. But if they are off by some margin, you may have to think about how to filter your data.
What I would like to leave here is evidence that the road data and the Google map match pretty well. Judging from the output, the road data is reliable, so you can subset your data using the road data. Here is my code.
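library(maptools) # readShapeLines()
library(dplyr)    # %>%
library(ggplot2)  # fortify()
library(ggmap)    # get_map(), ggmap()
### Step 1: shapefile becomes SpatialLinesDataFrame.
foo <- readShapeLines("ne_10m_roads_north_america.shp")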
### Step 2: If necessary, subset data before I use fortify().
### dplyr does not work with SpatialLinesDataFrame at this point.
CA <- foo %>%
subset(.,country == "United States" & state == "California")
### Step 3: I need to convert CA to a data frame so that I can use the data
### with ggplot2.
ana <- fortify(CA)
### Step 4: Get a map using ggmap package
longitude <- c(-122.50, -121.85)
latitude <- c(37.15, 37.70)
map <- get_map(location = c(lon = mean(longitude), lat = mean(latitude)),
zoom = 12, source = "google",
maptype = "satellite")
ggmap(map) +
geom_path(aes(x = long, y = lat, group = group), data = ana)
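The "margin" concern mentioned above could be handled with a distance tolerance rather than exact coordinate matches. Here is a minimal sketch in Python with shapely, assuming coordinates already projected to metres; the straight road, the three GPS points and the 15 m tolerance are all hypothetical values chosen for illustration.

```python
from shapely.geometry import LineString, Point

# Hypothetical highway centreline in projected coordinates (metres).
highway = LineString([(0, 0), (1000, 0)])

# Hypothetical GPS fixes: two close to the road, one well away from it.
track = [Point(100, 3), Point(500, 40), Point(900, -4)]

# Assumed GPS margin: keep points within 15 m of the centreline.
tolerance_m = 15.0

# Indices of the points that count as "on the highway".
kept = [i for i, p in enumerate(track) if p.distance(highway) <= tolerance_m]
print(kept)
```

With lon/lat data the points and the road lines would first need to be reprojected to a metric CRS, since a distance in degrees is not a fixed distance on the ground.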
