Proper workflow to manipulate a raster to match the extent, origin, and resolution of another - r

I'm working with two rasters that differ in their origin, extent, and resolution. I have a bathymetry raster with a higher resolution (x=0.0008333333, y=0.0008333333) and a MUCH greater spatial extent. I also have a sea surface temperature raster, which has a much coarser resolution (x=0.04166667, y=0.04166667). Both rasters have the same projection (longlat, datum=WGS84).
I would like to manipulate the bathymetry raster to match the extent, origin, and resolution of the sea surface temperature raster. However, I have very little experience and I am uncertain of the 'best practices.'
I have tried two different methods, and I would like to know which is better, and maybe an explanation of how they differ in terms of the underlying processes. I'm also open to other methods that might be better at preserving the data.
Method 1:
1) first, I aggregated the bathymetry raster to make it as similar to the SST raster as possible
bathycoarse<-aggregate(bathymetry, fact=c(48,50), fun=mean)
2) second, I cropped the bathymetry raster by the SST raster
3) third, I resampled the bathymetry raster using the SST raster, resulting in the same origin and extent.
bathyresample<-resample(bathycoarsecrop, sst, method="bilinear")
Method 2: I used the function projectRaster()
bathy2<-projectRaster(bathymetry, sst, method="bilinear")
Obviously, method 2 is much simpler. But I don't really understand what the function is doing, so I want to make sure I am accomplishing my goal in the correct way.

The "projectRaster" function uses the same resampling as the "resample" function. The resampling method is set by the "method" argument, which you set to "bilinear" for bilinear interpolation; that is probably what you want when you're dealing with continuous numeric datasets.
So using the function should work just fine for you.
If you want to speed things up, you can easily use parallel processing with the "projectRaster" function: start a cluster with the "beginCluster" function, and "projectRaster" will then run in parallel automatically.
library(raster)
beginCluster(4) # use the number of cores you want to use
bathy2 <- projectRaster(bathymetry, sst, method="bilinear")
endCluster() # shut the cluster down when finished
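For readers who want to check a result, here is a minimal sketch of Method 1 on synthetic rasters. The grids, values, and object names (`bathy_m1` and so on) are made up for illustration and are not the asker's data. `compareRaster()` then confirms that the output shares the target's extent, dimensions, CRS, resolution, and origin.

```r
library(raster)

set.seed(1)
wgs84 <- "+proj=longlat +datum=WGS84"

# Coarse target grid standing in for the SST raster (synthetic values)
sst <- raster(xmn = 0, xmx = 2, ymn = 0, ymx = 2,
              resolution = 0.5, crs = wgs84)
values(sst) <- runif(ncell(sst), 10, 25)

# Finer, larger, slightly offset grid standing in for the bathymetry
bathymetry <- raster(xmn = -1.02, xmx = 2.98, ymn = -1.02, ymx = 2.98,
                     resolution = 0.05, crs = wgs84)
values(bathymetry) <- -runif(ncell(bathymetry), 0, 5000)

# Method 1: aggregate by the resolution ratio, crop, then resample
fact <- round(res(sst) / res(bathymetry))   # 10 in each direction here
bathy_coarse <- aggregate(bathymetry, fact = fact, fun = mean)
bathy_crop   <- crop(bathy_coarse, sst)
bathy_m1     <- resample(bathy_crop, sst, method = "bilinear")

# TRUE if extent, rows/columns, CRS, resolution and origin all match
same_grid <- compareRaster(bathy_m1, sst, extent = TRUE, rowcol = TRUE,
                           crs = TRUE, res = TRUE, orig = TRUE)
```

The same `compareRaster()` call can be pointed at the output of `projectRaster()` to confirm that both methods land on the same grid.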


How to get covariate data from a geographic raster for `ppm`?

I want to fit a Poisson point-process model with spatstat::ppm and I'm unsure what is the best way to feed covariate data to the function. I understand that spatstat expects planar coordinates, so I have transformed my point location data to a planar crs before creating a ppp point pattern object. The covariate data are in a raster stack with unprojected geographic coordinates and I understand that projecting rasters is generally ill-advised. I extracted covariate values for the point locations from the raster using the points' original geographic coordinates and raster::extract. So far so good. The issue is ...
it is not sufficient to have observed the covariate only at the points
of the data point pattern; the covariate must also have been observed
at other locations in the window. -ppm helpfile
I appear to have two options for providing the covariate data to the data argument.
A pixel image; seems ill-advised because of raster projection issues.
A list of functions (one per covariate) that can be evaluated at any location (x,y) to obtain corresponding covariate values. This seems like the way to go, but my attempt at writing such a function turns out to be ridiculously slow. It calls raster::extract for each coordinate pair after transforming the coordinates to the raster's crs. While raster::extract is reasonably fast when given a large number of points, there appears to be a substantial overhead for each call. According to microbenchmark, the coordinate transformation takes about 4ms and the extraction takes about 582ms for a single covariate, or about 4 seconds for each point to get all 7 covariates. I don't know how many times ppm will want to call this, but if it's even once per point in the pattern, it'll take too long.
Is there some way I can find out what is the complete set of points that ppm will query for covariate data so that I can extract those beforehand with a single call?
It seems like my use case (covariates in a geographic raster) should be pretty common, so I'm guessing there's an established way to do this right. What is it?
Thanks for a well-written question that clearly identifies your need. It would have been even better with a simple reproducible example using e.g. built-in data from raster and spatstat or artificially generated data. Lacking a reproducible example, my answer will not contain any code but will outline what you could do.
The first step in ppm is to make a quadrature scheme of class quad or logiquad, depending on which maximum likelihood approximation is used. These can be generated directly by the user via quadscheme or quadscheme.logi. The quadrature scheme contains all the points where ppm will evaluate the covariates. You can extract the coordinates of the quadrature scheme using the function coords. If you construct a data.frame with all covariates evaluated at these points, you can supply it as the data argument to ppm, with the quadrature scheme as the first argument. To understand things better, read the Details section of help(ppm.quad).
Another approach, which may make the best use of your data, is to extract the grid points of your current raster stack together with all the covariate values and project these points. Then convert the result to a simple data.frame with columns x, y, covar1, covar2, etc. You can then use x and y together with your point observations of interest to create a quadrature scheme manually, and the remaining columns can be supplied as data to ppm.
It would be interesting to compare the results from both these approaches as well as the results from just projecting the raster stack and converting it to a list of im objects.
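The quadrature-scheme route described in the first approach could be sketched as follows. This is a toy example under stated assumptions: the point pattern is simulated, and `lookup_covariates()` is a hypothetical stand-in for whatever single vectorised call (for example one `raster::extract` on back-transformed coordinates) returns the covariates at a set of planar locations. It uses `union.quad()` to pull out all quadrature points, data points first and dummy points after, which is the row order `ppm` expects when the covariates come as a data frame.

```r
library(spatstat)

set.seed(1)
X <- rpoispp(100)      # simulated data pattern in the unit square
Q <- quadscheme(X)     # data points plus default dummy points
U <- union.quad(Q)     # all quadrature points as one point pattern

# Hypothetical lookup: replace with one vectorised extraction from the raster
lookup_covariates <- function(x, y) {
  data.frame(elev = x + y, temp = sin(2 * pi * x))
}

# One row per quadrature point, obtained with a single call
covs <- lookup_covariates(U$x, U$y)

# The quadrature scheme goes first; the data frame supplies the covariates
fit <- ppm(Q, ~ elev + temp, data = covs)
```

Because the extraction happens once for all quadrature points, the per-call overhead described in the question is paid only once.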

Randomly sampling an irregular raster extent in R

Is there a function in the R raster package that is analogous to sampleRandom but which extracts n random pixel values from within an irregularly shaped polygon feature rather than a rectangular extent object?
I know there are alternative approaches, such as generating random points within a polygon and then using the extract() function to get pixel values, but I am wondering if there is a more direct path I have missed.
No, there is not a single function for this.
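A two-step workaround is still short, though. The sketch below is one possible way, not an official recipe: it masks the raster with the polygon so that cells outside become NA, then lets `sampleRandom()` draw from the remaining cells. The raster, the polygon coordinates, and the sample size are all made up for illustration.

```r
library(raster)

set.seed(42)
wgs84 <- "+proj=longlat +datum=WGS84"

# Toy raster whose values are simply the cell numbers
r <- raster(nrows = 50, ncols = 50, xmn = 0, xmx = 10, ymn = 0, ymx = 10,
            crs = wgs84)
values(r) <- seq_len(ncell(r))

# Irregular polygon (hypothetical coordinates)
poly <- spPolygons(rbind(c(1, 1), c(8, 2), c(6, 9), c(2, 6), c(1, 1)),
                   crs = wgs84)

# Step 1: cells outside the polygon become NA
r_in <- mask(r, poly)

# Step 2: draw n values from the non-NA cells only
vals <- sampleRandom(r_in, size = 20, na.rm = TRUE)
```

For very large rasters it may be worth calling `crop(r, poly)` before `mask()` so that only the polygon's bounding box is processed.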

Create buffer around spatial data in R

I have a spatial dataset of shopping centers that I would like to create buffers around in R.
I think these packages will be useful:
I was able to do so for a set of coordinates, but not for spatial data. The code looks like this:
coordinates(locs) <- c("Longitude", "Latitude") # set spatial coordinates
fivekm <- cbind(coordinates(locs), X=rowSums(distm (coordinates(locs)[,1:2], fun = distHaversine) / 1000 <= 5)) # number of points within 5 km
But I don't know what function/package to use for a set of polygons. Can someone please advise on the function (or code) and I will go from there?
In library rgeos, there is the gBuffer function that works with SpatialPoints or SpatialPolygons.
The width parameter sets the distance to which you want to buffer. However, be careful: this distance is in the units of the coordinate system used, so it is in degrees and not in meters for non-projected data. As suggested by @Ege Rubak, you will have to project your data with spTransform first (be sure to use the appropriate CRS for your location).
For now, the rgeos library works with library sp, but not (yet?) with the recent sf.
I think the only option at the moment is to project your longitude and latitude points to a flat map and then do everything there. As far as I know there are no packages for doing polygonal geometry on the sphere yet (I'm working on one, but there is no ETA).
Projection used to be done with spTransform from the sp package, but now it may be more convenient to use the more modern simple features package sf which has the function st_transform. The vignette has a section called "Coordinate reference systems and transformations" to help you with this part. The buffering is described in the section "Geometrical operations".
The two previous posts have covered the details, but I thought it might be helpful to provide a workflow. This assumes you are using points of lat and long. What is your original spatial data format?
1) Convert your coordinates into a SpatialPointsDataFrame and assign it a geographic CRS (proj4) that matches your coordinate data (probably WGS84)
2) Change the projection to a local projected CRS with preferred units
3) Apply the buffer to the SpatialPointsDataFrame; the width will now be in more usable units
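The three steps above can be sketched with the sf package mentioned earlier in the thread, swapped in here for sp and rgeos. The locations, the column names, and the choice of UTM zone 31N (EPSG:32631) are assumptions made for this example only; pick a projected CRS that suits your own study area.

```r
library(sf)

# Hypothetical shopping-centre locations (longitude/latitude, WGS84)
locs <- data.frame(
  name      = c("A", "B", "C"),
  Longitude = c(2.29, 2.35, 2.41),
  Latitude  = c(48.86, 48.85, 48.83)
)

# 1) Build point features and declare the geographic CRS
pts <- st_as_sf(locs, coords = c("Longitude", "Latitude"), crs = 4326)

# 2) Reproject to a metric CRS (UTM 31N fits these example points only)
pts_utm <- st_transform(pts, 32631)

# 3) Buffer; dist is now interpreted in metres
buf_5km <- st_buffer(pts_utm, dist = 5000)

# Sanity check: each buffer should cover roughly pi * 5^2 square kilometres
areas_km2 <- as.numeric(st_area(buf_5km)) / 1e6
```

The same `st_transform()` and `st_buffer()` calls accept polygon features, so the workflow is unchanged if the shopping centers are stored as polygons rather than points.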

Create stage height raster using least cost path and r

I have a point shapefile of Station IDs and stageheights. I would like to create a raster where each cell has the stage height value (in meters) of the closest in situ station to that cell.
I want this raster to match up with another raster. So I would like it if I could input both a raster I have created (dataset 3 described below) and my point shapefile (1).
1) Point Shapefile with stage heights of a river delta
2) Shapefile of the river delta extent
3) Raster of the delta where NAs represent land (they could also be zeros if need be) and 1s are water. There are two datasets: one at 10 meter resolution and one at 30 meter resolution.
One conceptual issue I am having is with the number of small streams I have.
For example (pictured in image below), station 1 (circled in blue) is technically closer to the black x region than station 2 (circled in red), but the stage height value in red is more representative of point x. There are NA's in between the two streams, does that mean that the value will not jump across streams?
How can I reassign the values in my Raster (all the 1's) to the stage height of the nearest station and make sure that these values are not jumping from stream to stream? Do I need to use least cost path? What is the best way to do this?
I would like to use R, but can use ArcMap if I must.
So I'm not sure what tools you have available to you but I think this answer may be useful:
Calculating attribute for network distance between multiple points in ArcGIS Desktop?
Here the questioner was looking to calculate distances on roads to some points, but your problem seems similar. I think the main point I would make here is that you should do your network distance classification prior to worrying about the raster layer. You may have to convert from polygon to lines or some workaround to get your data into a format that works, but this is the kind of job the tool is designed to do.
After you have reclassified your river shapefile based on network distance to a given point, convert the polygons to raster and use this to classify your original raster. You could do this in R or ArcMap. ArcMap will probably be faster.

How to use the function r.cost to get the least-cost path between two polygons?

I am a beginner in GRASS but I would like to get the least-cost path between two polygons. More exactly, I would like to get the smallest cost from any point situated at the edge of one polygon (polygon A) to any point situated at the edge of another polygon (polygon B).
Until now, I used the functions CostDistance and CostPath of ArcGIS with a cost raster where each cell had a cost value, a shapefile for the first polygon, and a shapefile for the second polygon. I would like to do the same thing with GRASS. I think that the function r.cost can do this, but I don't know how to specify the two polygons as parameters in GRASS.
Have you got an example of how to use r.cost with two polygons in R with package spgrass6?
Thanks very much for your help.
If the use of GRASS is not mandatory and sticking with R is sufficient, you should check the marmap package. Section 2.4 of the vignette (vignette("marmap")) is entitled:
2.4 Using bathymetric data for least-cost path analysis
The marmap package can compute least-cost paths, constrained within a range of depth/altitude, between any number of points. The two key functions are trans.mat(), which creates a transition matrix similar to the cost raster you mention, and lc.dist(), which computes the least-cost distance and lets you plot the path between points.
Detailed examples are provided in the marmap vignette.
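As a rough illustration of those two functions, the sketch below uses the `hawaii` and `hawaii.sites` example datasets that ship with marmap. It works between points rather than polygon edges, so for the two-polygon case you would need to supply points sampled along each polygon boundary yourself, which is an extra step not shown here. The transition matrix can take a little while to build.

```r
library(marmap)

# Example data shipped with marmap
data(hawaii)         # bathymetry of class "bathy"
data(hawaii.sites)   # longitude/latitude of a few locations

# Keep a subset of locations, as the package examples do
sites <- hawaii.sites[-c(1, 4), ]

# Transition matrix with the default depth constraint
trans <- trans.mat(hawaii)

# Least-cost distances (in km) between all pairs of sites
d <- lc.dist(trans, sites, res = "dist")
```

Calling `lc.dist()` with `res = "path"` instead returns the paths themselves, which can be drawn on top of a bathymetric map.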
