Calculate distance to nearest neighbor in very large raster - r

I have a large raster (145,927,240 cells) with categorical data. The data can be found here:
https://developers.google.com/earth-engine/datasets/catalog/ESA_GLOBCOVER_L4_200901_200912_V2_3
For each cell, I would like to calculate the distance to the nearest cell of each class. What is the most efficient (i.e. feasible) way to do this? I've looked for suitable packages, but so far I haven't found one that does what I want with a raster of this size.
To give some context:
I would like to combine several raster files, convert them to a data table to use them as input in different models and then convert the result back to a raster file.
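A minimal sketch of one possible approach with the terra package (an assumption; the question does not name a package): build one binary layer per class and let terra::distance() compute, for every other cell, the distance to the nearest cell of that class. The toy 4 x 4 raster and its UTM CRS are hypothetical stand-ins for the GlobCover file. On the real lon/lat data terra computes geodesic distances, and it spills to temporary files when a layer does not fit in memory; one pass over the raster is needed per class.

```r
library(terra)

# Hypothetical toy raster standing in for GlobCover (planar CRS, 1 m cells)
r <- rast(nrows = 4, ncols = 4, xmin = 0, xmax = 4, ymin = 0, ymax = 4,
          crs = "+proj=utm +zone=33 +datum=WGS84 +units=m",
          vals = c(1, 1, 2, 2,
                   1, 1, 2, 2,
                   3, 3, 3, 3,
                   3, 3, 3, 3))

# Class codes present in the raster (freq() works without loading all values)
classes <- freq(r)$value

dist_layers <- lapply(classes, function(k) {
  # Keep only cells of class k; all other cells become NA
  target <- ifel(r == k, 1, NA)
  # Each NA cell gets the distance to the nearest non-NA (class k) cell;
  # class k cells themselves get 0. For the full dataset, consider writing
  # each result to disk with filename=.
  distance(target)
})

d <- do.call(c, dist_layers)  # one layer per class
names(d) <- paste0("dist_class_", classes)
```

The resulting layers can be stacked with the other rasters and converted to a data table with as.data.frame(), as described in the question.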

Related

How to create a binomial field in a raster from a polygon overlay?

I have a raster where each attribute field represents a different input in a fire risk model. I'm trying to test my model's skill at predicting fire locations. I have polygons representing two years of fire perimeters (years that weren't used to train the original model).
I'd like to create a binomial presence/absence field in my raster based on the fire polygons, where presence represents cells overlaid by a fire polygon and absence represents areas that did not burn in the study period. I'd like to include cells that are both completely and partially overlaid by a fire polygon as presences. Is there a good way to do this in either ArcMap or R?
I've considered rasterizing my polygons but that seems clunky. I'd like to avoid losing resolution if at all possible.
You say two contradictory things. First, "I'd like to create a presence/absence raster based on the fire polygons", and then, "I've considered rasterizing my polygons but that seems clunky".
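Based on your goal (the first statement), you could start with
r <- rasterize(x, y, field=1, getCover=TRUE)
(in the raster package; in terra the corresponding argument is cover=TRUE). Cells with a cover fraction greater than zero are at least partially overlaid by a fire polygon.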
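A minimal sketch of the cover-fraction route in terra, on a hypothetical 10 x 10 template grid and a hypothetical square "fire perimeter" whose edges cut through cells. Cells that are fully or partially covered become presences, everything else absences, without changing the grid resolution.

```r
library(terra)

# Hypothetical template grid (1 m cells) standing in for the model raster
y <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10,
          crs = "+proj=utm +zone=33 +datum=WGS84 +units=m")

# Hypothetical fire perimeter; its edges run through the middle of cells
fire <- vect("POLYGON ((2.5 2.5, 6.5 2.5, 6.5 6.5, 2.5 6.5, 2.5 2.5))",
             crs = crs(y))

# Fraction of each cell covered by the polygon (NA where not covered)
cov <- rasterize(fire, y, cover = TRUE)

# Presence = any overlap (full or partial); absence = no overlap
cov[is.na(cov)] <- 0
presence <- cov > 0
```

Here the polygon touches a 5 x 5 block of cells (the inner 4 x 4 fully covered cells plus the half- and quarter-covered ring around them cut down to the touched cells), so 25 cells are presences.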

How to get covariate data from a geographic raster for `ppm`?

I want to fit a Poisson point-process model with spatstat::ppm and I'm unsure what is the best way to feed covariate data to the function. I understand that spatstat expects planar coordinates, so I have transformed my point location data to a planar crs before creating a ppp point pattern object. The covariate data are in a raster stack with unprojected geographic coordinates and I understand that projecting rasters is generally ill-advised. I extracted covariate values for the point locations from the raster using the points' original geographic coordinates and raster::extract. So far so good. The issue is ...
"it is not sufficient to have observed the covariate only at the points of the data point pattern; the covariate must also have been observed at other locations in the window." (ppm help file)
I appear to have two options for providing the covariate data to the data argument:
1. A pixel image. This seems ill-advised because of raster projection issues.
2. A list of functions (one per covariate) that can be evaluated at any location (x,y) to obtain the corresponding covariate values. This seems like the way to go, but my attempt at writing such a function turns out to be ridiculously slow. It calls raster::extract for each coordinate pair after transforming the coordinates to the raster's crs. While raster::extract is reasonably fast when given a large number of points, there appears to be a substantial overhead for each call. According to microbenchmark, the coordinate transformation takes about 4 ms and the extraction about 582 ms for a single covariate, or about 4 seconds per point to get all 7 covariates. I don't know how many times ppm will want to call this, but if it's even once per point in the pattern, it'll take too long.
Is there some way I can find out what is the complete set of points that ppm will query for covariate data so that I can extract those beforehand with a single call?
It seems like my use case (covariates in a geographic raster) should be pretty common, so I'm guessing there's an established way to do this right. What is it?
Thanks for a well-written question that clearly identifies your need. It would have been even better with a simple reproducible example using, e.g., built-in data from raster and spatstat or artificially generated data. Lacking a reproducible example, my answer will not contain any code but will outline what you could do.
The first step in ppm is to make a quadrature scheme of class quad or logiquad, depending on which maximum likelihood approximation ppm uses. These can also be generated directly by the user via quadscheme or quadscheme.logi. The quadrature scheme contains all the points where ppm will evaluate the covariates. You can extract the coordinates of the quadrature scheme using the function coords. If you construct a data.frame with all covariates evaluated at these points, you can supply it as the data argument to ppm, with the quadrature scheme as the first argument. To understand this better, read the Details section of help(ppm.quad).
Another approach, which may make optimal use of your data, is to extract the grid points of your current raster stack together with all the covariate values and project this point data. Then convert it to a simple data.frame with columns x, y, covar1, covar2, etc. You can then use x and y together with your point observations of interest to create a quadrature scheme manually, and supply the remaining columns as data to ppm.
It would be interesting to compare the results from both these approaches as well as the results from just projecting the raster stack and converting it to a list of im objects.
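A minimal sketch of the first approach on simulated data. covar_fun is a hypothetical, vectorised stand-in for the raster lookup (e.g. transforming all quadrature coordinates back to the raster's crs and calling raster::extract once). union.quad() is used here to get the quadrature points in the order ppm expects; coords() on the quadrature scheme should give the same coordinates.

```r
library(spatstat)

set.seed(42)
X <- rpoispp(100)  # toy point pattern in the unit square

# Hypothetical covariate lookup; must accept vectors so it is called only once
covar_fun <- function(x, y) x + 2 * y

Q <- quadscheme(X)  # data points plus dummy points
U <- union.quad(Q)  # all quadrature points, in the same order as Q

# Evaluate the covariate once for every quadrature point
df <- data.frame(z = covar_fun(U$x, U$y))

# Row i of df must hold the covariate value at the i-th point of Q
fit <- ppm(Q, ~ z, data = df)
```

The expensive extraction then happens in a single call covering every location ppm will evaluate, instead of once per function call.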

Randomly sampling an irregular raster extent in R

Is there a function in the R raster package that is analogous to sampleRandom but which extracts n random pixel values from within an irregularly shaped polygon feature rather than a rectangular extent object?
I know there are alternative approaches, such as generating random points within a polygon and then using the extract() function to get pixel values, but I am wondering if there is a more direct path I have missed.
Thanks
No, there is not a single function for this.
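A two-step workaround, sketched with terra rather than the raster package the question asks about (swapped in as an assumption): mask the raster to the polygon, then sample random cells while skipping the NA cells outside it. The raster and triangular polygon are hypothetical.

```r
library(terra)

# Hypothetical raster with cell values 1..100 (row-major from the top left)
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10,
          crs = "+proj=utm +zone=33 +datum=WGS84 +units=m", vals = 1:100)

# Hypothetical irregular polygon (a triangle)
p <- vect("POLYGON ((0 0, 10 0, 0 10, 0 0))", crs = crs(r))

# Step 1: cells outside the polygon become NA
inside <- mask(crop(r, p), p)

# Step 2: draw random cells, dropping the NA cells outside the polygon
set.seed(1)
s <- spatSample(inside, size = 5, method = "random", na.rm = TRUE)
```

With the raster package, the equivalent would be sampleRandom(mask(crop(r, p), p), size, na.rm = TRUE).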

Create stage height raster using least cost path and r

I have a point shapefile of station IDs and stage heights. I would like to create a raster where each cell has the stage height value (in meters) of the closest in situ station to that cell.
I want this raster to match up with another raster. So I would like it if I could input both a raster I have created (dataset 3 described below) and my point shapefile (1).
Datasets:
1) Point Shapefile with stage heights of a river delta
2) Shapefile of the river delta extent
3) Raster of the delta where NAs represent land (they could also be zeros if need be) and 1s are water. There are two versions, at 10 m and 30 m resolution.
One conceptual issue I am having is the number of small streams.
For example (pictured in the image below), station 1 (circled in blue) is technically closer to the black x region than station 2 (circled in red), but the stage height value in red is more representative of point x. There are NAs between the two streams; does that mean the value will not jump across streams?
How can I reassign the values in my raster (all the 1s) to the stage height of the nearest station, and make sure these values do not jump from stream to stream? Do I need to use a least cost path? What is the best way to do this?
I would like to use R, but can use ArcMap if I must.
So I'm not sure what tools you have available to you but I think this answer may be useful:
Calculating attribute for network distance between multiple points in ArcGIS Desktop?
Here the questioner was looking to calculate distances on roads to some points, but your problem seems similar. I think the main point I would make here is that you should do your network distance classification prior to worrying about the raster layer. You may have to convert from polygon to lines or some workaround to get your data into a format that works, but this is the kind of job the tool is designed to do.
After you have reclassified your river shapefile based on network distance to a given point, convert the polygons to raster and use this to classify your original raster. You could do this in R or ArcMap; ArcMap will probably be faster.
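As a raster-only alternative to the network tools suggested above (a different technique, swapped in as an assumption), terra::gridDist() measures distance by moving between neighbouring non-NA cells, so land cells (NA) act as barriers and values cannot jump between streams. A minimal sketch on a hypothetical 5 x 5 delta with two channels joined along the top row and two hypothetical stations:

```r
library(terra)

# Hypothetical delta: 1 = water, NA = land; channels down the left and
# right edges, joined by the top row
w <- rast(nrows = 5, ncols = 5, xmin = 0, xmax = 5, ymin = 0, ymax = 5,
          crs = "+proj=utm +zone=33 +datum=WGS84 +units=m", vals = NA)
w[c(1:5, 6, 11, 16, 21, 10, 15, 20, 25)] <- 1

# Hypothetical stations: cell numbers and stage heights (m)
station_cell <- c(21, 3)
stage <- c(1.2, 3.4)

# One layer per station: distance to that station through water cells only
d <- do.call(c, lapply(station_cell, function(cl) {
  s <- w
  s[cl] <- 0                # mark this station's cell as the target
  gridDist(s, target = 0)   # land (NA) cells stay NA and block paths
}))

nearest <- which.min(d)     # index of the closest station along the water
stage_r <- subst(nearest, seq_along(stage), stage)
```

Cell 25 (bottom right) is closer in a straight line to station 1 but closer along the water to station 2, so it receives station 2's stage height. With real data, the station points could be converted to cell numbers with cellFromXY(w, crds(points)).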

Not able to export the desired results in R

I have a square mesh of latitudes and longitudes for a geographical region, but I only know the latitude and longitude values of its 4 corners. From these I need to calculate the lat/long values at all the crosshairs. So I wrote two separate nested-loop programs in R, one for latitudes and one for longitudes.
tllong <- 67.481961
sink("output_long.txt")
for (i in c(1:11447)) {
  for (j in c(1:10335)) {
    tllong <- 67.481961 + (j-1)*0.0030769
    print(tllong)
  }
}
sink()
The above program calculates longitudes. tllong is the longitude of the top-left corner of the mesh. 11447 is the number of latitudes (rows) and 10335 is the number of longitudes (columns).
Similarly I created a program for calculating latitudes.
tllat <- 36.348639
sink("output_lat_again.txt")
for (i in c(1:11447)) {
  for (j in c(1:10335)) {
    print(tllat)
  }
  # step one row south (a fixed decrement, not (i-1)*0.002508)
  tllat <- tllat - 0.002508
}
sink()
tllat is the value of latitude of the top left corner mesh square.
As you can see, the loop first calculates all the lat/long values for the first row, then moves to the second row, then the third, and so on. However, the exported text files from both programs contain a single column with all the values, which is not much use to me. I tried to export the results in xlsx format using sink("output_long.xlsx"), but when I get the Excel file (after 4-5 hours of the loop running), I cannot open it: the error message says the file is either corrupted or in a different format. I have tried this 3-4 times, in vain.
So how do I export the results of these two programs in an excel file such that I do not get all the values in a single column but in an appropriate matrix form (i.e. the values of lat,long in each cell corresponds to the values of lat,long in the corresponding cross hair of the mesh).
Also, it would be nice if someone can tell me how to run these two programs together so that I can get the lat-long values in a single run in the same file.
Seems like you want to create 10335*11447=118304745 pairs of lat/lon values. It's a pretty big number. Is that correct? However, I will show the procedure applied to a smaller example. Try this:
#setting the values of parameters
tllong<-67.481961
tllat<-36.348639
deltalong<-0.0030769
deltalat<-0.002508
#small example: you can set the following to the real values
nlong<-10
nlat<-10
#create vectors of values without loops
#latitude decreases going south from the top-left corner, so step by -deltalat
lat<-seq(tllat,by=-deltalat,length.out=nlat)
lon<-seq(tllong,by=deltalong,length.out=nlong)
#now we build every possible pair of lat/lon values
latlong<-expand.grid(lon=lon,lat=lat)
#we export it to a csv file
write.csv(latlong,"somefile.csv",row.names=FALSE,quote=FALSE)
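This creates somefile.csv, with one lon/lat pair per row, in a single run. Keep in mind that, with your values, the file will be very large: about 118 million rows, far more than the 1,048,576 rows an Excel worksheet can hold, so it cannot be opened in Excel.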
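On the xlsx attempt: sink() only redirects printed text, so naming the file output_long.xlsx still produces a plain text file, which is why Excel reports it as corrupted or in a different format. If the matrix layout from the question is what you need (one row per mesh row, one column per mesh column), a minimal sketch is to build two matrices and write them as CSV with write.table. The small nlat/nlong values and the file names are illustrative assumptions; with the real 11447 x 10335 sizes, each file holds about 118 million values.

```r
# Parameters from the question; small sizes for illustration
tllong <- 67.481961
tllat <- 36.348639
deltalong <- 0.0030769
deltalat <- 0.002508
nlong <- 10
nlat <- 10

lat <- seq(tllat, by = -deltalat, length.out = nlat)   # north to south
lon <- seq(tllong, by = deltalong, length.out = nlong) # west to east

# Cell [i, j] corresponds to crosshair i (row) and j (column) of the mesh
lat_m <- matrix(lat, nrow = nlat, ncol = nlong)               # constant along each row
lon_m <- matrix(lon, nrow = nlat, ncol = nlong, byrow = TRUE) # constant down each column

lat_file <- file.path(tempdir(), "lat_matrix.csv")
lon_file <- file.path(tempdir(), "lon_matrix.csv")
write.table(lat_m, lat_file, sep = ",", row.names = FALSE, col.names = FALSE)
write.table(lon_m, lon_file, sep = ",", row.names = FALSE, col.names = FALSE)
```

Excel opens such CSV files directly; for a genuine .xlsx file, a package such as writexl or openxlsx would be needed.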
