County average from latitude and longitude - r

I have a big data frame (832k rows) with latitude and longitude in a gridded format plus one variable. I would like to plot the average of this variable per county. The problem is that I do not have the identification of county or state by point, only the coordinates.
Sorry, I am not sure how to include a reproducible example.

Two approaches:
1) Average all the lat/lon grid points. This skews the county centre towards areas where the grid points are denser.
2) Compute the bounds (min and max lat/lon) of the grid points and average the bounds. This places the county centre exactly at the centre of the grid span.

You will need to obtain the county (or state) data and then spatially join it with your dataframe. One possible source for such data is the TIGER shapefile published by the U.S. Census (see e.g. https://catalog.data.gov/dataset/tiger-line-shapefile-2016-nation-u-s-current-county-and-equivalent-national-shapefile).
You can then use the sf package to read the shapefile into R, join it with your data, and then use regular summary functions to summarize your data by county.
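library(sf)
# Download and unzip the 2016 TIGER county shapefile into a temporary directory
filename <- 'https://www2.census.gov/geo/tiger/TIGER2016/COUNTY/tl_2016_us_county.zip'
tmpfile <- tempfile(fileext = '.zip')
tmpdir <- tempfile()  # a fresh sub-directory that is safe to delete afterwards
dir.create(tmpdir)
download.file(filename, tmpfile, mode = 'wb')
unzip(zipfile = tmpfile, exdir = tmpdir)
county_data <- st_read(file.path(tmpdir, 'tl_2016_us_county.shp'))
# Clean up; recursive = TRUE is required to remove a directory
unlink(tmpfile)
unlink(tmpdir, recursive = TRUE)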
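The join and the per-county summary are described above but not shown. A minimal sketch of those two steps follows, using two toy squares in place of the TIGER polygons and a tiny data frame in place of the 832k-row grid. The column names `lon`, `lat`, and `value` are assumptions, and `GEOID` is used only as a stand-in county identifier.

```r
library(sf)

# Toy "counties": two unit squares side by side (stand-ins for the TIGER polygons)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
counties <- st_sf(GEOID = c("A", "B"),
                  geometry = st_sfc(sq(0), sq(1), crs = 4326))

# Toy gridded data: assumed column names lon, lat, value
grid_df <- data.frame(lon   = c(0.25, 0.75, 1.25, 1.75, 5),
                      lat   = c(0.5, 0.5, 0.5, 0.5, 5),
                      value = c(10, 20, 30, 50, 99))

# Turn the coordinates into point geometries in the same CRS as the polygons.
# With real data, set the CRS the coordinates were recorded in and then
# st_transform() the points to st_crs(county_data).
pts <- st_as_sf(grid_df, coords = c("lon", "lat"), crs = st_crs(counties))

# Attach the county attributes to each point that falls within a polygon
joined <- st_join(pts, counties, join = st_within)

# Average the variable per county; points outside every polygon have GEOID = NA
# and are dropped by the formula interface
county_means <- aggregate(value ~ GEOID, data = st_drop_geometry(joined), FUN = mean)
print(county_means)
```

The resulting table can be merged back onto the polygon layer by the identifier column to plot the per-county averages.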

Related

Use netCDF file in R as panel data sf object

I have a netCDF file from:
Kummu, Matti; Taka, Maija; Guillaume, Joseph H. A. (2020), Data from: Gridded global datasets for Gross Domestic Product and Human Development Index over 1990-2015, Dryad, Dataset, https://doi.org/10.5061/dryad.dk1j0
More specifically, I want to use this dataset:
https://drive.google.com/file/d/1-PUa0R0ZcUCJbA1VIMe0KKdElHX2C3Cc/view?usp=sharing
which is in netCDF format.
I would like to import it as a panel-dataset sf object so that I can spatially join it with a different sf object.
I tried using
ppp <- stars::read_stars('~/GDP_per_capita_PPP_1990_2015_v2.nc', var="GDP_per_capita_PPP")
ppp <- sf::st_as_sf(ppp)
The result is a panel-dataset with over 9 million observations but the geometry seems to be off. I am only interested in African countries, maybe this helps reduce the dataset, but from the sf-object I get I do not know how to proceed.
I used the code below to combine the gridded GDP with the centroids of administrative regions (an sf object, represented by obj2 in the snippet).
If you need to aggregate the grid cells over an administrative region (say, by averaging over the region), have a look at exactextractr.
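library(sf)
library(raster)
# Read the netCDF as a multi-layer raster (one layer per time step)
obj1 <- stack("./doi_10.5061_dryad.dk1j0__v2/GDP_per_capita_PPP_1990_2015_v2.nc")
# obj2 is the sf object of region centroids; obj1 is a raster-package object,
# so use raster::extract(), which is also the function that takes df = TRUE
extract.df <- raster::extract(obj1, obj2, df = TRUE)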
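For the region-averaging case, one possible route to a long "panel" table is sketched below with terra rather than exactextractr. The raster, the two regions, and the layer names `gdp_1990` and `gdp_1991` are all toy assumptions, not the structure of the actual netCDF file.

```r
library(terra)

# Toy raster: 4 x 4 cells, two layers standing in for two years of the grid
r <- rast(nrows = 4, ncols = 4, xmin = 0, xmax = 4, ymin = 0, ymax = 4,
          nlyrs = 2, crs = "EPSG:4326")
values(r) <- cbind(rep(1, 16), rep(2, 16))
names(r) <- c("gdp_1990", "gdp_1991")

# Toy administrative regions given as WKT polygons
regions <- vect(c("POLYGON ((0 0, 0 4, 2 4, 2 0, 0 0))",
                  "POLYGON ((2 0, 2 4, 4 4, 4 0, 2 0))"), crs = "EPSG:4326")
regions$name <- c("west", "east")

# Mean of every layer within every region; the first column is the polygon ID
means <- extract(r, regions, fun = mean, na.rm = TRUE)
wide <- cbind(region = regions$name, means[, -1])

# Reshape from one column per year to one row per region-year
panel <- reshape(wide, direction = "long", varying = names(r), v.names = "gdp",
                 timevar = "year", times = c(1990, 1991), idvar = "region")
print(panel)
```

Restricting the polygon layer to the countries of interest before extracting keeps the result small, since only cells under those polygons are read.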

Trying to convert Numerical Values of Lat/Long into Spatial Data

I am working with a dataset of chemical analyses from different locations within a cave, with each analysis identified by a site number and that site's latitude and longitude. This first image is what I had done originally, simply using ggplot.
[Image: Concentrations mapped by color over map]
What I want to do is use the shapefile of the cave system the data comes from and do something similar: plot the points over the system and color them by concentration. Below is the shapefile that I uploaded.
[Image: Cave system shapefile]
So basically I want to map the chemical data from the dataset used for the first figure, but on the map from the shapefile. Initially it kept saying that it could not plot on top of it, so I figured I had to convert the latitude and longitude into spatial coordinates that could then be mapped on the shapefile.
Master_Cave_data <- Master_cave_data %>%
st_as_sf(MastMaster_cave_data, agr = "identity", coord = Lat_DD)
This is what I thought I could use to convert the numerical latitude coordinates into spatial data.
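For comparison, a minimal sketch of how `st_as_sf()` is usually called for this kind of conversion: `coords` takes the names of both coordinate columns, x (longitude) first, and `crs` states the reference system the numbers are in. The data frame, the column name `Lon_DD`, the concentration column, and the toy outline are all made up for illustration; only `Lat_DD` comes from the question.

```r
library(sf)

# Toy stand-in for the cave data (values are invented)
cave_df <- data.frame(site   = 1:3,
                      Lon_DD = c(-86.10, -86.11, -86.12),
                      Lat_DD = c(37.18, 37.19, 37.20),
                      conc   = c(1.2, 3.4, 2.1))

# Longitude first, then latitude; EPSG:4326 assumes WGS84 decimal degrees
cave_pts <- st_as_sf(cave_df, coords = c("Lon_DD", "Lat_DD"), crs = 4326)

# Toy stand-in for the cave shapefile; with real data this would come from st_read()
cave_outline <- st_sfc(st_polygon(list(rbind(
  c(-86.13, 37.17), c(-86.09, 37.17), c(-86.09, 37.21),
  c(-86.13, 37.21), c(-86.13, 37.17)))), crs = 4326)

# Put the points in the same CRS as the outline before plotting them together
cave_pts <- st_transform(cave_pts, st_crs(cave_outline))

plot(cave_outline)
plot(cave_pts["conc"], add = TRUE, pch = 16)
```

Once both layers are sf objects in the same CRS, they can also be drawn with ggplot2 using two `geom_sf()` layers.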

Crop, change values, and merge rasters with overlapping extent

I am trying to take a raster of soils data for one state, crop it by county, change the cell values in each county (to the county fips code), and then re-merge the county rasters back into the state raster.
Here I read in the state soils raster (which by default has the map unit key associated with each soil type as the cell value) and a polygon layer of US counties. I then select the polygons of just one state, transform them to the same coordinate system as the soils raster, and then select the soils rasters and polygons for two example counties.
library(raster)
library(rgdal)  # provides readOGR()
state_soils_raster <- raster("MapunitRaster_IL_10m.tif")
us_county_polygons <- readOGR("cb_2016_us_county_500k/cb_2016_us_county_500k.shp")
IL_county_polygons <- us_county_polygons[us_county_polygons$STATEFP == 17,]
IL_county_polygons <- spTransform(IL_county_polygons, CRS = crs(state_soils_raster))
county1 <- "Douglas"
county2 <- "Coles"
county1_polygon <- IL_county_polygons[IL_county_polygons$NAME %in% county1,]
county2_polygon <- IL_county_polygons[IL_county_polygons$NAME %in% county2,]
county1_raster <- crop(state_soils_raster, county1_polygon)
county2_raster <- crop(state_soils_raster, county2_polygon)
If I plot each county by itself, you can see that the extent of the cropped region is rectangular and extends beyond the area of the county itself. The coloring is crazy because the mukey values are all over the place (although typically grouped by county). County1 lies just to the north of County2.
plot(county1_raster)
plot(county1_polygon, add = T)
plot(county2_raster)
plot(county2_polygon, add = T)
If I leave the values as is and merge the two county rasters back together, everything is fine. Even though the extents of the two rasters do overlap, the cell values are identical regardless of which raster merge is pulling from. I'm not actually sure which raster merge pulls from in this case, but it doesn't really matter. Everything fits back together nicely and the cell values are correct.
both_counties_raster <- merge(county1_raster, county2_raster)
plot(both_counties_raster)
plot(county1_polygon, add = T)
plot(county2_polygon, add = T)
However, what I want to do is to change the cell values by county prior to recombining the county rasters.
values(county1_raster) <- 1
values(county2_raster) <- 2
both_counties_raster_new <- merge(county1_raster, county2_raster)
Everything merges just fine, but when I plot the new combined raster it is clear that, for cells contained in both county rasters, merge just took the cell values from one of the rasters. Clearly merge prioritizes the first input raster by default.
plot(both_counties_raster_new)
plot(county1_polygon, add = T)
plot(county2_polygon, add = T)
What I'm looking for is to just change the cell values within the boundaries of each county and then merge all the counties back together again.
I am aware of the raster::mask function, which can turn anything outside of the county boundary to NA, but with a 10 m cell resolution (described here) this takes an insane amount of time!
I have also tried an alternative approach using the raster::rasterize function to turn the county boundary polygons into a raster with the same cell size and extent of the state soils raster. Again, with a 10m cell resolution this takes forever. I was able to process one county on each of my 8 cores in 1.5 hours. And I've got a whole country to do!
I am not aware of any 10m raster US county dataset, although that would be amazing if someone pointed me to that.
The soils data is gSSURGO data. I'm also not aware of whether gSSURGO has a county attribute somewhere in its many tables. If it's there, I can't find it. That would also be an easy solution.
It may not be quicker, but have you tried raster::cellFromPolygon?
Here is a simple example:
library(raster)  # also attaches sp, which provides SpatialPolygons()
# Create a raster with zero values
r <- raster(ncols=30, nrows=30, res = 1/3)
values(r) <- 0
# Create polygons
cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20))
cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0))
pols <- SpatialPolygons(list(Polygons(list(Polygon(cds1)), 1), Polygons(list(Polygon(cds2)), 2)))
plot(r)
plot(pols, add = TRUE)
r2 <- r
# Find which cells are in which polygons
cellpol <- cellFromPolygon(r, pols)
# Assign each polygon's index to the cells it covers
for (i in seq_along(cellpol)) r2[cellpol[[i]]] <- i
plot(r2)
plot(pols, add = TRUE)
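Another option that may be worth timing on a single county is to burn a polygon attribute straight into a grid that matches the soils raster, using `terra::rasterize()` with its `field` argument. The sketch below uses a toy 4 x 4 raster and two toy rectangles; the `fips` values are illustrative only, and no speed comparison has been made here.

```r
library(terra)

# Toy "soils" raster standing in for the state raster
soil <- rast(nrows = 4, ncols = 4, xmin = 0, xmax = 4, ymin = 0, ymax = 4,
             crs = "EPSG:4326")
values(soil) <- 1:16

# Two toy "counties": the northern and southern halves of the raster
counties <- vect(c("POLYGON ((0 2, 0 4, 4 4, 4 2, 0 2))",
                   "POLYGON ((0 0, 0 2, 4 2, 4 0, 0 0))"), crs = "EPSG:4326")
counties$fips <- c(17041, 17029)  # illustrative codes

# Each cell receives the fips value of the polygon covering its centre;
# the output has the same extent and resolution as `soil`
fips_r <- rasterize(counties, soil, field = "fips")

plot(fips_r)
plot(counties, add = TRUE)
```

Because every cell is assigned from the polygon that covers its centre, there is no rectangular overlap between neighbouring counties to resolve afterwards.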

Label a point depending on which polygon contains it (NYC civic geospatial data)

I have the longitude and latitude of 5449 trees in NYC, as well as a shapefile for 55 different Neighborhood Tabulation Areas (NTAs). Each NTA has a unique NTACode in the shapefile, and I need to append a third column to the long/lat table telling me which NTA (if any) each tree falls under.
I've made some progress already using other point-in-polygon threads on stackoverflow, especially this one that looks at multiple polygons, but I'm still getting errors when trying to use gContains and don't know how I could check/label each tree for different polygons (I'm guessing some sort of sapply or for loop?).
Below is my code. Data/shapefiles can be found here: http://bit.ly/1BMJubM
library(rgdal)
library(rgeos)
library(ggplot2)
#import data
setwd("< path here >")
xy <- read.csv("lonlat.csv")
#import shapefile
map <- readOGR(dsn="CPI_Zones-NTA", layer="CPI_Zones-NTA", p4s="+init=epsg:25832")
map <- spTransform(map, CRS("+proj=longlat +datum=WGS84"))
#generate the polygons, though this doesn't seem to be generating all of the NTAs
nPolys <- sapply(map@polygons, function(x)length(x@Polygons))
region <- map[which(nPolys==max(nPolys)),]
plot(region, col="lightgreen")
#setting the region and points
region.df <- fortify(region)
points <- data.frame(long=xy$INTPTLON10,
lat =xy$INTPTLAT10,
id =c(1:5449),
stringsAsFactors=F)
#drawing the points / polygon overlay; currently only the points are appearing
ggplot(region.df, aes(x=long,y=lat,group=group))+
geom_polygon(fill="lightgreen")+
geom_path(colour="grey50")+
geom_point(data=points,aes(x=long,y=lat,group=NULL, color=id), size=1)+
xlim(-74.25, -73.7)+
ylim(40.5, 40.92)+
coord_fixed()
#this should check whether each tree falls into **any** of the NTAs, but I need it to specifically return **which** NTA
sapply(1:5449,function(i)
list(id=points[i,]$id, gContains(region,SpatialPoints(points[i,1:2],proj4string=CRS(proj4string(region))))))
#this is something I tried earlier to see if writing a new column using the over() function could work, but I ended up with a column of NAs
pts = SpatialPoints(xy)
nyc <- readShapeSpatial("< path to shapefile here >")
xy$nrow=over(pts,SpatialPolygons(nyc@polygons), returnlist=TRUE)
The NTAs we're checking for are these ones (visualized in GIS): http://bit.ly/1A3jEcE
Try simply:
library(sp)
library(maptools)  # provides readShapeSpatial()
ShapeFile <- readShapeSpatial("Shapefile.shp")
points <- data.frame(long = xy$INTPTLON10,
lat = xy$INTPTLAT10,
stringsAsFactors = FALSE)
rownames(points) <- seq_along(xy$INTPTLON10)
points <- SpatialPoints(points)
# over() returns one row of polygon attributes per point (NA where no polygon contains it)
df <- over(points, ShapeFile)
I omitted the transformation of the shapefile because it is not the main subject here.
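To make the behaviour of `over()` concrete, here is a small self-contained sketch with two toy squares in place of the NTA shapefile. The polygon IDs, the `NTACode` values, and the point coordinates are invented; no CRS is set on either layer, whereas real data needs both layers in the same CRS.

```r
library(sp)

# Build one toy square polygon (clockwise ring) with a given ID
sq <- function(x0, id) {
  ring <- rbind(c(x0, 0), c(x0, 1), c(x0 + 1, 1), c(x0 + 1, 0), c(x0, 0))
  Polygons(list(Polygon(ring, hole = FALSE)), ID = id)
}

# Two toy "NTAs" with an attribute table; row names must match the polygon IDs
nta <- SpatialPolygonsDataFrame(
  SpatialPolygons(list(sq(0, "a"), sq(1, "b"))),
  data = data.frame(NTACode = c("XX01", "XX02"), row.names = c("a", "b"))
)

# Three toy "trees": one in each square and one outside both
trees <- data.frame(long = c(0.5, 1.5, 3.0), lat = c(0.5, 0.5, 0.5))
tree_pts <- SpatialPoints(trees)

# over() returns a data frame with one row per point; pick the code column
trees$NTACode <- over(tree_pts, nta)$NTACode
print(trees)
```

Passing the full SpatialPolygonsDataFrame, rather than only its geometry, is what makes `over()` return the attribute row instead of a polygon index.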

How to get count of non-NA raster cells within polygon

I've been running into all sorts of issues using ArcGIS ZonalStats and thought R could be a good alternative. That said, I'm fairly new to R, though I have a coding background.
The situation is that I have several rasters and a polygon shape file with many features of different sizes (though all features are bigger than a raster cell and the polygon features are aligned to the raster).
I've figured out how to get the mean value for each polygon feature using the raster library with extract:
#load packages required
require(rgdal)
require(sp)
require(raster)
require(maptools)
# ---Set the working directory-------
datdir <- "/test_data/"
#Read in a ESRI grid of water depth
ras <- readGDAL("test_data/raster/pl_sm_rp1000/w001001.adf")
#convert it to a format recognizable by the raster package
ras <- raster(ras)
#read in polygon shape file
proxNA <- readShapePoly("test_data/proxy/PL_proxy_WD_NA_test")
#plot raster and shp
plot(ras)
plot(proxNA)
#calc mean depth per polygon feature
#unweighted - only assigns grid to district if centroid is in that district
proxNA@data$RP1000 <- extract(ras, proxNA, fun = mean, na.rm = TRUE, weights = FALSE)
#check results
head(proxNA)
#plot depth values
spplot(proxNA[,'RP1000'])
The issue I have is that I also need an area-based ratio between the area of the polygon and all non-NA cells in the same polygon. I know the cell size of the raster and I can get the area of each polygon, but the missing link is the count of all non-NA cells in each feature. I managed to get the cell numbers of all the cells in each polygon with proxNA@data$Cnumb1000 <- cellFromPolygon(ras, proxNA), and I'm sure there is a way to get the actual value of each raster cell, which would then require a loop to count the non-NA cells, etc.
BUT, I'm sure there is a much better and quicker way to do that! If any of you has an idea or can point me in the right direction, I would be very grateful!
I do not have access to your files, but based on what you described, this should work:
library(raster)
mask_layer=shapefile(paste0(shapedir,"AOI.shp"))
original_raster=raster(paste0(template_raster_dir,"temp_raster_DecDeg250.tif"))
nonNA_raster=!is.na(original_raster) # 1 where the cell has a value, 0 where it is NA
masked_img=mask(nonNA_raster,mask_layer) #based on centroid location of cells
nonNA_count=cellStats(masked_img, sum) # a single total over all features in mask_layer
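The snippet above yields one total for the whole mask layer. If a separate count per polygon feature is needed, one possible approach is to let `extract()` return the raw cell values for each polygon and count the non-NA entries. The sketch below uses a toy 4 x 4 raster and two toy polygons; all names and values are assumptions.

```r
library(raster)  # also attaches sp

# Toy raster with three NA cells (cells are numbered row-wise from the top left)
r <- raster(nrows = 4, ncols = 4, xmn = 0, xmx = 4, ymn = 0, ymx = 4)
values(r) <- 1:16
r[c(1, 4, 6)] <- NA

# Two toy polygons: the western and eastern halves of the raster
half <- function(x0, id) {
  ring <- rbind(c(x0, 0), c(x0, 4), c(x0 + 2, 4), c(x0 + 2, 0), c(x0, 0))
  Polygons(list(Polygon(ring, hole = FALSE)), ID = id)
}
polys <- SpatialPolygons(list(half(0, "west"), half(2, "east")))

# Without `fun`, extract() returns a list with the cell values of each polygon
vals <- extract(r, polys)

# Count the non-NA cells per polygon
nonNA_per_poly <- sapply(vals, function(v) sum(!is.na(v)))
print(nonNA_per_poly)
```

Multiplying each count by the area of one cell and dividing by the polygon area then gives the ratio described in the question.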
