I am using the sf package to work with spatial data in R. At this stage, I want to perform a spatial join so that the tax lots in my area of study inherit the attributes of the floodplain on which they are located. For example, tax lots may be located in a floodplain categorized as X, VE, A, A0, or V (codes that relate to the intensity of flooding in each area).
To do this, I tested the sf function st_join, which by default relies on st_intersects to determine the spatial join for each feature of my tax lots.
However, I am trying to figure out the criteria the function uses when a tax lot intersects two different floodplain areas (as in the photo below, in which several lots intersect both an A floodplain and an AE floodplain). Does it take the value of the zone that covers the largest share of the lot, or is it a matter of which zone appears first in the dataframe?
Note that I am not interested in partitioning the intersecting lots so as to divide them according to their areas intersecting one or another floodplain zone.
Photo of tax lots intersecting with more than one floodplain category
By default, st_join(x, y, join = st_intersects) duplicates all features in x that intersect more than one feature of y.
If you set the argument largest = TRUE, st_join() returns the x features augmented with the fields of y that have the largest overlap with each of the features of x.
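For example, with placeholder layer names taxlots and floodplain (both sf objects; the names are only illustrative), the two behaviours look like this:

library(sf)
# default: a lot that intersects two flood zones appears twice in the result,
# once with the attributes of each zone
joined_all <- st_join(taxlots, floodplain, join = st_intersects)
# largest = TRUE: each lot appears once, with the attributes of the zone
# that has the largest overlap with it
joined_largest <- st_join(taxlots, floodplain, join = st_intersects, largest = TRUE)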
See https://r-spatial.github.io/sf/reference/st_join.html and https://github.com/r-spatial/sf/issues/578 for more details.
I have two shapefiles of the quartiers of Paris, one from before 1860 (available here) and one from after (available here).
Neighborhood boundaries changed over time, and I want to do an analysis using the pre-1860 geography.
To do so, I need to compute how the new neighborhoods overlap the old ones by computing weights. Several new neighborhoods can overlap a single old neighborhood. In the end, I want a crosswalk matrix with the weights of all new neighborhoods relative to the old neighborhoods.
Here is a figure to illustrate (old neighborhoods in orange, new ones in red).
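To make the goal concrete, here is a sketch of the weight computation I have in mind (old_quartiers, new_quartiers, old_id and new_id are placeholder object and column names; I assume both layers are sf objects in the same projected CRS):

library(sf)
library(dplyr)
# overlay the two sets of boundaries and measure each resulting piece
pieces <- st_intersection(old_quartiers, new_quartiers)
pieces$piece_area <- as.numeric(st_area(pieces))
# weight = share of each new quartier's area falling in each old quartier
weights <- pieces |>
  st_drop_geometry() |>
  group_by(new_id, old_id) |>
  summarise(piece_area = sum(piece_area), .groups = "drop") |>
  group_by(new_id) |>
  mutate(w = piece_area / sum(piece_area)) |>
  ungroup()
# crosswalk matrix: rows = new quartiers, columns = old quartiers
crosswalk <- xtabs(w ~ new_id + old_id, data = weights)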
I have blocks of census data (a shapefile whose column of interest is pop20) and polygons of areas of interest (a shapefile whose column of interest is site). I am trying to get the sum of the population within each area of interest (see the example of one area of interest and the census blocks below). I don't know how to join the population estimates (column pop20) to the areas of interest while accounting for polygons that are only partially within the areas of interest.
Hence I am interested in the following:
What is the population of each census block within each area of interest, accounting for blocks that are only partially inside (so if half of a block is within the area of interest, assume the population is half of the value in pop20)?
Then, what is the sum over all blocks within each area of interest, weighting the blocks that are only partially within it as in part 1?
I have imported my shapefiles using the sf package, but beyond that I don't know what to do (do I use st_intersection or st_join or something else)?
library(sf)
pop  <- st_read("...\\pop_census2020_prj.shp")
buff <- st_read("...\\trap_mcpbuff_prj.shp")
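From reading the sf documentation, I wonder whether area-weighted interpolation is what I need; here is a sketch of what I have in mind, using the pop and buff objects above (assuming both layers are valid and share the same projected CRS):

# extensive = TRUE treats pop20 as a count, so each block's population is
# split in proportion to the share of its area falling inside each polygon
pop_by_site <- st_interpolate_aw(pop["pop20"], buff, extensive = TRUE)
pop_by_site$pop20   # area-weighted population total for each area of interest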
Thank you for your help.
I have presence points of a certain species all over the United States. I completed a spatial join between the US and these points. However, I am unsure how to normalize the data. There is a "percent of total" option, but I am unsure whether it is the appropriate one. Or is it as simple as just normalizing by the counts themselves?
It depends on what comparison you're trying to make with the normalized data.
If you want to look at the occurrence of that species by state, you could do a spatial join on a US States layer, then calculate a new field where the value is the species count for each state divided by the total area of the state. That would give you the normalized 'count per square mile' (or whatever unit you want).
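A sketch of that in R with sf (assuming points and states are sf layers in the same projected CRS, and that STATE_NAME is the state identifier column; all of these names are illustrative):

library(sf)
library(dplyr)
# attach the state each occurrence falls in
joined <- st_join(points, states)
# occurrences per state
counts <- joined |>
  st_drop_geometry() |>
  count(STATE_NAME, name = "n_points")
# normalized density: occurrences per square kilometre of state area
states$area_km2 <- as.numeric(st_area(states)) / 1e6
states_norm <- states |>
  left_join(counts, by = "STATE_NAME") |>
  mutate(density = n_points / area_km2)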
I am trying to generate randomly distributed points in a rectangle.
To create 50 random points in a rectangle, I used
library(spatstat)
i <- 50
pp <- runifpoint(i, win = owin(c(0, 19.5), c(0, 3.12)))
If I were to add conditions on the coordinates before randomly generating the points, e.g. 0.24 < x < 19.26 and 0.24 < y < 2.64, and then generate the random points, what code should I use?
The ultimate goal is to generate points in the rectangle except for the grey shaded area shown in the image below.
This is a question about the R package spatstat.
The argument win specifies the spatial region in which the points will be generated. In your example you have specified this region to be a rectangle. You just need to replace this rectangle by the region in which you want the points to be generated.
You can construct spatial regions (objects of class owin) in many ways. See help(owin), or help(spatstat) for an overview.
In your example, you could build up the shape by forming the union of several rectangles. For example to make a simple cross shape, I could just write
require(spatstat)
A <- owin(c(-1,1), c(-4, 4))
B <- owin(c(-4,4), c(-1,1))
U <- union.owin(A, B)
plot(U)
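For the original goal (a rectangle with a sub-region excluded), a set difference also works; here the excluded rectangle Grey is only a placeholder, since I don't know the exact extent of the grey area:

Rect <- owin(c(0, 19.5), c(0, 3.12))   # the full rectangle from the question
Grey <- owin(c(5, 8), c(0, 1.5))       # placeholder for the grey shaded area
W <- setminus.owin(Rect, Grey)         # rectangle with the grey area removed
pp <- runifpoint(50, win = W)          # random points avoid the excluded area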
Another way would be to specify the corners of the polygon shape and use W <- owin(poly=p) where p = list(x, y) contains the coordinates of the corners, listed in anticlockwise order without repetition. See help(owin).
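For instance, an L-shaped window specified by its corners (the coordinates are made up purely for illustration):

p <- list(x = c(0, 2, 2, 1, 1, 0),
          y = c(0, 0, 1, 1, 2, 2))   # corners listed anticlockwise
W <- owin(poly = p)
plot(W)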
This is also covered in Section 3.5 of the spatstat book. You can download Chapter 3 for free.
Problem: Suppose you have a collection of points in the 2D plane. I want to know whether this set of points sits on a regular grid (whether they are a subset of a 2D lattice). I would like some ideas on how to do this.
For now, let's say I'm only interested in whether the points form an axis-aligned rectangular grid (the underlying lattice is rectangular and aligned with the x and y axes), and that it is a complete rectangle (the subset of the lattice has a rectangular boundary with no holes). Any solution must be quite efficient (better than O(N^2)), since N can be hundreds of thousands or millions.
Context: I wrote a 2D vector field plot generator which works for an arbitrarily sampled vector field. In the case that the sampling is on a regular grid, there are simpler/more efficient interpolation schemes for generating the plot, and I would like to know when I can use this special case. The special case is sufficiently better that it merits doing. The program is written in C.
This might be dumb, but if your points were to lie on a regular grid, wouldn't the peaks in the Fourier transform of the coordinates all be exact multiples of the grid resolution? You could do a separate Fourier transform of the X and Y coordinates. If there are no holes in the grid, then the FT would be a delta function, I think. The FFT is O(n log n).
p.s. I would have left this as a comment but my rep is too low..
Not quite sure if this is what you are after, but for a collection of 2D points in a plane you can always fit them on a rectangular grid (down to the precision of your points, anyway); the problem is that the grid they fit may be too sparsely populated by the points to provide any benefit to your algorithm.
To find a rectangular grid that fits a set of points, you essentially need to find the GCD of all the x coordinates and the GCD of all the y coordinates, with the origin at (xmin, ymin). This should be O(n (log n)^2), I think.
How you decide whether this grid is then too sparse is not clear, however.
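A rough sketch of that idea in R, assuming the coordinates are integers (or have already been scaled and rounded to integers); everything here is illustrative only:

gcd2 <- function(a, b) if (b == 0) a else gcd2(b, a %% b)   # Euclid's algorithm
grid_pitch <- function(coords) {
  offsets <- coords - min(coords)      # put the origin at the minimum
  offsets <- offsets[offsets > 0]      # drop the zero offsets
  if (length(offsets) == 0) return(NA) # all points share this coordinate
  Reduce(gcd2, offsets)                # candidate grid spacing along this axis
}
# pitch_x <- grid_pitch(x); pitch_y <- grid_pitch(y)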
If the points all come from intersections on the grid, then the Hough transform of your set of points might help you. If you find that two mutually perpendicular sets of lines occur most often (meaning you find peaks at four values of theta, all 90 degrees apart) and you find repeating peaks in gamma space, then you have a grid. Otherwise not.
Here's a solution that works in O(ND log N), where N is the number of points and D is the number of dimensions (2 in your case).
Allocate D arrays with space for N numbers: X, Y, Z, etc. (Time: O(ND))
Iterate through your point list and add the x-coordinate to list X, the y-coordinate to list Y, etc. (Time: O(ND))
Sort each of the new lists. (Time: O(ND log N))
Count the number of unique values in each list and make sure the difference between successive unique values is the same across the whole list. (Time: O(ND))
If the unique values in each dimension are equally spaced, and the product of the numbers of unique values in each coordinate equals the number of original points (length(uniq(X)) * length(uniq(Y)) * ... == N), then the points are in a regular rectangular grid.
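A compact sketch of that check in R (the small floating-point tolerance is my own addition, not part of the steps above):

is_regular_grid <- function(x, y, tol = 1e-9) {
  ux <- sort(unique(x))                      # unique values in each dimension
  uy <- sort(unique(y))
  equally_spaced <- function(u) {
    if (length(u) < 2) return(TRUE)
    d <- diff(u)
    all(abs(d - d[1]) <= tol * max(abs(u)))  # successive gaps all (nearly) equal
  }
  equally_spaced(ux) && equally_spaced(uy) &&
    length(ux) * length(uy) == length(x)     # complete rectangle: counts multiply to N
}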
Let's say a grid is defined by an orientation Or (between 0 and 90 degrees) and a resolution Res. You could compute a cost function that evaluates how well a grid (Or, Res) fits your points. For example, you could compute the average distance of each point to its closest point on the grid.
Your problem is then to find the (Or, Res) pair that minimizes the cost function. To narrow the search space and speed up the search, a heuristic that proposes "good" candidate grids could be used.
This approach is the same as the one used in the Hough transform proposed by jilles. The (Or, Res) space is comparable to Hough's gamma space.
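As an illustration, a possible cost function of that kind in R (assuming, for simplicity, that the candidate grid passes through the origin; Or is the orientation in radians and Res the grid spacing):

grid_cost <- function(x, y, Or, Res) {
  # rotate the points by -Or so the candidate grid becomes axis-aligned
  xr <-  cos(Or) * x + sin(Or) * y
  yr <- -sin(Or) * x + cos(Or) * y
  # distance of each coordinate to the nearest grid line
  dx <- abs(xr - Res * round(xr / Res))
  dy <- abs(yr - Res * round(yr / Res))
  mean(sqrt(dx^2 + dy^2))   # average distance to the closest grid point
}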