Use netCDF file in R as panel data sf object

I have a netCDF file from:
Kummu, Matti; Taka, Maija; Guillaume, Joseph H. A. (2020), Data from: Gridded global datasets for Gross Domestic Product and Human Development Index over 1990-2015, Dryad, Dataset, https://doi.org/10.5061/dryad.dk1j0
More specifically, I want to use this dataset:
https://drive.google.com/file/d/1-PUa0R0ZcUCJbA1VIMe0KKdElHX2C3Cc/view?usp=sharing
which is in netCDF format.
I would like to import it as a panel-data sf object so that I can spatially join it with a different sf object.
I tried using
ppp <- stars::read_stars('~/GDP_per_capita_PPP_1990_2015_v2.nc', var="GDP_per_capita_PPP")
ppp <- sf::st_as_sf(ppp)
The result is a panel dataset with over 9 million observations, but the geometry seems to be off. I am only interested in African countries, which might help reduce the dataset, but I do not know how to proceed from the sf object I get.

The code below combines the gridded GDP with the centroid of an administrative region (an sf object, represented by obj2 in the snippet).
If you need to aggregate the grids over an administrative region (say by averaging across the region), have a look at exactextractr; a sketch follows the extraction code below.
library(sf)
library(raster)
# obj2 is the sf object of administrative-region centroids
obj1 <- stack("./doi_10.5061_dryad.dk1j0__v2/GDP_per_capita_PPP_1990_2015_v2.nc")
extract.df <- raster::extract(obj1, obj2, df = TRUE)  # raster::extract handles a RasterStack directly
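For instance, a minimal sketch with exactextractr (assuming regions is an sf polygon layer of the areas you care about; the object and column names are illustrative):
library(raster)
library(exactextractr)
gdp <- stack("GDP_per_capita_PPP_1990_2015_v2.nc")
# area-weighted mean of every annual layer within each polygon
gdp_means <- exact_extract(gdp, regions, 'mean')
regions <- cbind(regions, gdp_means)  # one mean column per year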

Related

County average from latitude and longitude

I have a big data frame (832k rows) with latitude and longitude in a gridded format plus one variable. I would like to plot the average of this variable per county. The problem is that I do not have the county or state identified for each point, only the coordinates.
Sorry, I am not sure how to include a reproducible example.
Two approaches:
1) Calculate the average of all the lat/lon grid points. This approach skews your county centre towards higher-density grid areas.
2) Calculate the bounds (min/max lat/lon) of the grid points and average those bounds. This approach places the county centre at exactly the centre of the grid span.
You will need to obtain the county (or state) boundaries and then spatially join them with your data frame. One possible source for such data is the TIGER/Line shapefile published by the U.S. Census (see e.g. https://catalog.data.gov/dataset/tiger-line-shapefile-2016-nation-u-s-current-county-and-equivalent-national-shapefile).
You can then use the sf package to read the shapefile into R, join it with your data, and use regular summary functions to summarise your variable by county (a sketch of the join follows the download code below).
library(sf)
# download and unzip the TIGER county shapefile
filename <- 'https://www2.census.gov/geo/tiger/TIGER2016/COUNTY/tl_2016_us_county.zip'
tmpfile <- tempfile(fileext = '.zip')
tmpdir <- tempdir()
download.file(filename, tmpfile, mode = 'wb')  # binary mode so the zip is not corrupted
unzip(zipfile = tmpfile, exdir = tmpdir)
county_data <- st_read(file.path(tmpdir, 'tl_2016_us_county.shp'))
unlink(tmpfile)
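As a sketch of the join-and-summarise step (assuming your points sit in a data frame pts with columns lon, lat and value; these names are illustrative):
library(sf)
library(dplyr)
pts_sf <- st_as_sf(pts, coords = c('lon', 'lat'), crs = 4326)
pts_sf <- st_transform(pts_sf, st_crs(county_data))  # match the TIGER CRS
joined <- st_join(pts_sf, county_data['NAME'])       # attach a county name to every point
county_means <- joined %>%
  st_drop_geometry() %>%
  group_by(NAME) %>%
  summarise(mean_value = mean(value, na.rm = TRUE))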

Any workaround to construct a temperature distribution over a multi-layer raster in R

I found a very interesting blog post, critical threshold in temperature effects, and its empirical approach appeals to me, so I want to implement its idea in R. However, I have multi-layer raster data of Germany's historical daily temperatures (15 years of historical daily mean temperature) in a large RasterBrick object. Following the empirical approach discussed in the inspiring post, I need to construct the distribution of temperature from my multi-layer raster data.
Update 2: reproducible shapefile:
I am aware that downloading a shapefile from a third-party website is not practical, so here is a reproducible shapefile to try:
library(sf)
library(maps)
# build a Germany polygon from the maps package's "world" database
germany <- st_as_sf(map("world", "Germany", plot = FALSE, fill = TRUE))
write_sf(germany, "germany.shp")
To make my post easy to follow, I created reproducible raster data to work with in R. I also provide Germany's shapefile, taken from the eurostat website (the link is safe and the file is very small): eurostat shapefile. And here is the reproducible raster data:
library(raster)
library(lubridate)
library(tidyverse)
# simulate 15 years (5479 days) of daily mean temperature on a 0.25-degree grid over Germany
r <- raster(xmn = 5.75, xmx = 15, ymn = 47.25, ymx = 55, res = c(0.25, 0.25))
Deu_crop <- do.call(stack, lapply(1:5479, function(i)
  setValues(r, round(runif(n = ncell(r), min = -10, max = 25)))))
names(Deu_crop) <- paste0('X', gsub('-', '.', ymd('1980.01.01') + days(1:5479)))
# extract the daily values for every NUTS-3 polygon
shp <- shapefile('eurostat_NUTS3/deu_adm_2006.shp')
e <- raster::extract(Deu_crop, shp)
names(e) <- shp$NUTS_ID
To test the workflow presented in the inspiring post, I need to set up several global variables that its helper functions rely on. But I don't understand how to construct some of the critical globals used in that workflow; the post recommends defining global variables such as w (weather data), tempDat (a particular aggregation of the weather data), Trows (the span of the aggregated grid data), and T (a vector of integer temperatures); details can be found in the linked post.
I want to estimate the distribution of temperatures over time from the gridded daily weather data. But I am having difficulty following the empirical steps presented in the inspiring post, because it does not cover the case of multi-layer raster data, so I don't know how to adapt its idea on my own in R.
Here is my attempt to aggregate the multi-layer raster data for each polygon of the shapefile (eurostat shapefile) before using the helper functions from the post.
Initial attempt to manipulate the multi-layer raster:
rasterHelper <- function(ix, e){
  gather(data.frame(e[[ix]], stringsAsFactors = FALSE), 'colname', 'temp') %>%
    group_by(colname) %>%
    summarise(temp = mean(temp)) %>%  # spatial mean over the polygon's cells
    ungroup() %>%
    mutate(year = sub('X(\\d{4}).+', '\\1', colname)) %>%
    group_by(year) %>%
    summarise_all(funs(sum)) %>%
    mutate(NUTS_ID = names(e)[ix])
}
do.call(rbind, lapply(seq_along(e), function(ix) rasterHelper(ix, e)))
But my attempt above does not work. The intention is to aggregate the temperature raster data for each polygon. The implementation in the inspiring post is very useful, but it is hard to follow for multi-layer raster data. I assume I should work on each raster layer and construct the temperature distribution over time, but I really don't have a solid idea of how to do that in R. Any idea?
Update:
Here is the paper that inspired me: nonlinear temperature effect ...; implementing the proposed method is still challenging for me, even though I followed the workflow presented in the respective blog post: searching critical threshold in temperature effect.
Can anyone point out how to apply this empirical approach to multi-layer raster data in R? How can I estimate the distribution of temperature over time? How can I make this happen in R? Any more thoughts? Thanks
I am not quite sure what you want to do.
To set up a smaller example:
library(raster)
# small example: Luxembourg polygons and a 12-layer stack
lux <- shapefile(system.file("external/lux.shp", package="raster"))
r <- raster(lux)
s <- stack(lapply(1:12, function(i) setValues(r, 1:ncell(r))))
e <- extract(s, lux)  # list with one cells-by-layers matrix per polygon
Now you say that you want to aggregate; that is a bit ambiguous, but perhaps what you want is
x <- lapply(e, function(i) apply(i, 2, mean))  # per-polygon mean of each layer
which is equivalent to
y <- extract(s, lux, fun = mean)
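If what you are after is the distribution of temperatures rather than the mean, a minimal sketch (assuming e holds the cells-by-layers matrices extracted from your temperature stack, and a -10 to 25 degree range as in your simulated data) is to tabulate cell-days into 1-degree bins per polygon:
bins <- seq(-10, 25, by = 1)
dist_list <- lapply(e, function(m) table(cut(as.vector(m), breaks = bins)))
dist_list[[1]]  # count of cell-days in each bin for the first polygon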

Dealing with unordered XY points to create a polygon shapefile in R

I've inherited a geodatabase of polygons of lakes for which I am trying to create sampling grids on each lake. My current strategy is to export spatial data to CSV, use R to run a loop to create the grids on each lake, and then write to a new shapefile. However, here is my problem, when exporting to a CSV the WKT strings get messed up and put onto different lines. Okay, no problem, I moved on to exporting just the geometry to CSV so that I get X-Y values. When I simply plot the points they look perfect (using plot(y~x)), but the points are not in order. So, when I transform the data to a SpatialPolygon in the sp package in R using the following sequence:
XY-points -> Polygon -> Polygons -> SpatialPolygon
and then plot the SpatialPolygon, the edges cross each other and the shape is wrong (plot not shown).
I know this is an artifact of incorrectly ordered points, because when I order the points by X and then by Y and run the same procedure, I get a different, still incorrect, shape (plot not shown).
The correct plot should trace the lake outline (X-Y data plotted with open circles; plot not shown).
Here is a short reproducible example of what I am trying to deal with:
library(sp)
# correct polygon: points already in ring order
data <- data.frame(x = c(1:10, 10:1), y = c(5:1, 1:10, 10:6))
# plot(y ~ x, data = data)
correct.data.points <- rbind(data, data[1, ])  # close the ring for a polygon
correct.data.coords <- as.matrix(correct.data.points)
correct.data.poly <- Polygon(correct.data.coords, hole = FALSE)
correct.data.poly <- Polygons(list(correct.data.poly), ID = 0)
correct.data.poly.sp <- SpatialPolygons(list(correct.data.poly))
plot(correct.data.poly.sp)
# incorrect polygon: same points in scrambled order
scr.data <- data[sample(1:20), ]
# plot(y ~ x, data = scr.data)
scr.data.points <- rbind(scr.data, scr.data[1, ])  # close the ring for a polygon
scr.data.coords <- as.matrix(scr.data.points)
scr.data.poly <- Polygon(scr.data.coords, hole = FALSE)
scr.data.poly <- Polygons(list(scr.data.poly), ID = 0)
scr.data.poly.sp <- SpatialPolygons(list(scr.data.poly))
plot(scr.data.poly.sp)
Any thoughts? Thanks for any help or insight anyone can provide. Also, for reference I am using QGIS 2.6.0 and the MMQGIS Python plugin to do the geometry exporting.
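One possible workaround (a sketch; it assumes each lake boundary is convex, or at least star-shaped around its centroid, so it will not recover every outline): order the scrambled points by their angle around the centroid before building the polygon:
# order points counter-clockwise around the centroid, then rebuild the ring
ctr <- colMeans(scr.data[, c("x", "y")])
ord <- order(atan2(scr.data$y - ctr["y"], scr.data$x - ctr["x"]))
ring <- scr.data[ord, ]
ring <- rbind(ring, ring[1, ])  # close the ring
fixed.poly <- SpatialPolygons(list(Polygons(list(Polygon(as.matrix(ring), hole = FALSE)), ID = 0)))
plot(fixed.poly)  # reproduces the correct shape for this example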

Subset SpatialPolygonsDataFrame

I have a SpatialPolygonsDataFrame that I created by reading in a shapefile using readOGR from the rgdal package. I am attempting to use it to generate a sampling grid using spsample from the sp package, to be used for interpolation from survey data collected in the area. However, the SpatialPolygonsDataFrame encompasses a much larger area than the survey, and as a result the interpolation predicts values far away from where any surveys were conducted. The survey data and the shapefile were both projected using the same proj4string.
I would like to subset the SpatialPolygonsDataFrame using the coordinates set by the survey stations, but I'm not sure where the relevant values are stored in the object.
I'm afraid that I can't provide the relevant data as the shapefile is not hosted online. However, I will borrow some code from Paul Hiemstra's response to this post for the Netherlands:
library(ggplot2)
library(sp)
library(automap)
library(rgdal)
#get the spatial data for the Netherlands
con <- url("http://gadm.org/data/rda/NLD_adm0.RData")
print(load(con))
close(con)
class(gadm)
bbox(gadm)
          min      max
r1   3.360782  7.29271
r2  50.755165 53.55458
Let's say that the surveys were conducted in this area:
bbox(surveys)
       min    max
r1   4.000  7.000
r2  51.000 53.000
How can I crop out that area of the SpatialPolygonsDataFrame?
EDIT: This question appears to answer mine. Apologies for not searching hard enough (although the comments did give me some sense of where to turn with rgeos). However, gIntersection causes R to crash...
Depending upon the size of the polygons you could do something like
# keep polygons whose centroids fall inside the survey bounding box
range <- cbind(c(4, 7), c(51, 53))  # column 1: lon min/max, column 2: lat min/max
centroids <- coordinates(spdf)
spdf.subset <- spdf[centroids[, 1] > range[1, 1] &
                    centroids[, 1] < range[2, 1] &
                    centroids[, 2] > range[1, 2] &
                    centroids[, 2] < range[2, 2], ]
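If you need the polygons actually clipped to the box rather than selected whole, raster::crop also works on Spatial objects (a sketch; note it uses rgeos internally, which you report crashes for you, so your mileage may vary):
library(raster)
survey_extent <- extent(4, 7, 51, 53)  # xmin, xmax, ymin, ymax
gadm_crop <- crop(gadm, survey_extent)
plot(gadm_crop)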

How to block a part of the level plot in R made using lattice package?

I have made a level plot in R of a variable using the lattice package. This grid corresponds to South Asia. I am only interested in viewing the values of this variable (aerosol optical depth) for certain countries in South Asia. I have a dummy variable that takes the value 1 for the countries I am interested in and 0 otherwise. Is it possible for me to colour this part of the grid black or any other colour?
I cannot show the level plot as I am low on reputation on Stack Overflow.
Here is my R code:
levelplot(aod ~ longitude + latitude | factor(day), data = aod_Jan,
          aspect = "iso", contour = TRUE, layout = c(1, 1))
Since you are using geographical data, maybe the raster package is useful for you. For example, let's display the altitude of France (download this zip file or use the raster::getData function). After you unzip the file:
library(raster)
fraAlt <- raster('FRA_alt')
plot(fraAlt) ## Not only France is displayed...
If you want to display only the altitude of France, you need the information of the boundaries: download this RData file (or use the raster::getData function). This RData contains a SpatialPolygonsDataFrame (named gadm) which can be converted to a Raster with:
mk <- rasterize(gadm, fraAlt)
Now you can mask the altitude raster with the boundaries:
fraAltMask <- mask(fraAlt, mk)
plot(fraAltMask) ## Now only France is displayed
Finally, if you want to use lattice methods you need the rasterVis package:
library(rasterVis)
levelplot(fraAlt)
levelplot(fraAltMask)
Now, all together with the boundaries superimposed:
s <- stack(fraAlt, fraAltMask)
names(s) <- c('Alt', 'AltMask')
boundaries <- as(gadm, 'SpatialLines')
levelplot(s) + layer(sp.lines(boundaries))
Use the subset argument to levelplot. Perhaps:
levelplot(aod ~ longitude + latitude | factor(day), data = aod_Jan,
          subset = dummy == 1, aspect = "iso", contour = TRUE, layout = c(1, 1))
