I have a dataset of species and their rough locations in a 100 x 200 meter area. The location part of the data frame is not in a format that I find to be usable. In this 100 x 200 meter rectangle, there are two hundred 10 x 10 meter squares named A through CV. Within each 10 x 10 square there are four 5 x 5 meter squares named 1, 2, 3, and 4, respectively (1 is south of 2 and west of 3. 4 is east of 2 and north of 3). I want to let R know that A is the square with corners at (0 ,0), (10,0), (0,0), and (0,10), that B is just north of A and has corners (0,10), (0,20), (10,10), and (10,20), and K is just east of A and has corners at (10,0), (10,10), (20,0), and (20,10), and so on for all the 10 x 10 meter squares. Additionally, I want to let R know where each 5 x 5 meter square is in the 100 x 200 meter plot.
So, my data frame looks something like this
10x10 5x5 Tree Diameter
A 1 tree1 4
B 1 tree2 4
C 4 tree3 6
D 3 tree4 2
E 3 tree5 3
F 2 tree6 7
G 1 tree7 12
H 2 tree8 1
I 2 tree9 2
J 3 tree10 8
K 4 tree11 3
L 1 tree12 7
M 2 tree13 5
Eventually, I want to be able to plot the 100 x 200 meter area and have each 10 x 10 meter square show up with the number of trees, or number of species, or total biomass
What is the best way to turn the data I have into spatial data that R can use for graphing and perhaps analysis?
Here's a start.
## set up a vector of all 10x10 position tags
tags10 <- c(LETTERS,
paste0("A",LETTERS),
paste0("B",LETTERS),
paste0("C",LETTERS[1:22]))
A function to convert (e.g.) {"J",3} to the center of the corresponding sub-square.
convpos <- function(pos10,pos5) {
## convert letters to major (x,y) positions
p1 <- as.numeric(factor(pos10,levels=tags10)) ## or use match()
p1.x <- ((p1-1) %% 10) *10+5 ## %% is modulo operator
p1.y <- ((p1-1) %/% 10)*10+5 ## %/% is integer division
## sort out sub-positions
p2.x <- ifelse(pos5 <=2,2.5,7.5) ## {1,2} vs {3,4} values
p2.y <- ifelse(pos5 %%2 ==1 ,2.5,7.5) ## odd {1,3} vs even {2,4} values
c(p1.x+p2.x,p1.y+p2.y)
}
usage:
convpos("J",2)
convpos(mydata$tenbytenpos,mydata$fivebyfivepos)
Important notes:
this is a proof of concept, I can pretty much guarantee I haven't got the correspondence of x and y coordinates quite right. But you should be able to trace through this line-by-line and see what it's doing ...
it should work correctly on vectors (see second usage example above): I switched from switch to ifelse for that reason
your column names (10x10) are likely to get mangled into something like X10.10 when reading data into R: see ?data.frame and ?check.names
Similar to what #Ben Bolker has done, here's a lookup function (though you may need to transpose something to make the labels match what you describe).
tenbyten <- c(LETTERS[1:26],
paste0("A",LETTERS[1:26]),
paste0("B",LETTERS[1:26]),
paste0("C",LETTERS[1:22]))
tenbyten <- matrix(rep(tenbyten, each = 2), ncol = 10)
tenbyten <- t(apply(tenbyten, 1, function(x){rep(x, each = 2)}))
# the 1234 squares
squares <- matrix(c(rep(c(1,2),10),rep(c(4,3),10)), nrow = 20, ncol = 20)
# stick together into a reference grid
my.grid <- matrix(paste(tenbyten, squares, sep = "-"), nrow = 20, ncol = 20)
# a lookup function for the site grid
coordLookup <- function(tbt, fbf, .my.grid = my.grid){
x <- col(.my.grid) * 5 - 2.5
y <- row(.my.grid) * 5 - 2.5
marker <- .my.grid == paste(tbt, fbf, sep = "-")
list(x = x[marker], y = y[marker])
}
coordLookup("BB",2)
$x
[1] 52.5
$y
[1] 37.5
If this isn't what you're looking for, then maybe you'd prefer a SpatialPolygonsDataFrame, which has proper polygon IDs, and you attach data to, etc. In that case just Google around for how to make one from scratch, and manipulate the row() and col() functions to get your polygon corners, similar to what's given in this lookup function, which only returns centroids.
Edit: getting SPDF started:
This is modified from the function example and can hopefully be a good start:
library(sp)
# really you have a 20x20 grid, counting the small ones.
# c(2.5,2.5) specifies the distance in any direction from the cell center
grd <- GridTopology(c(1,1), c(2.5,2.5), c(20,20)))
grd <- as.SpatialPolygons.GridTopology(grd)
# get centroids
coords <- coordinates(polys)
# make SPDF, with an extra column for your grid codes, taken from the above.
# you can add further columns to this data.frame(), using polys#data
polys <- SpatialPolygonsDataFrame(grd,
data=data.frame(x=coords[,1], y=coords[,2], my.ID = as.vector(my.grid),
row.names=getSpPPolygonsIDSlots(grd)))
Related
I have data frame, for example
df <- data.frame(x = 1:1e3, y = rnorm(1e3))
I need to split points on N (in my case N = 6, 12 and 24) rectangles with equal number of points. How to split my df using R-tree algorithm?
For uniformely distributed data on the x axis, kmeans clustering works (without surprise) well:
library(dplyr)
library(ggplot2)
set.seed(1)
df <- data.frame(x = 1:1e3, y = rnorm(1e3))
N <- 10
df$cluster <- kmeans(df,N)$cluster
cluster_rectangles <- df %>% group_by(cluster) %>%
summarize(xmin = min(x),
xmax = max(x),
ymin = min(y),
ymax = max(y),
n = n())
ggplot() + geom_rect(data = cluster_rectangles, mapping=aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax, fill=cluster)) +
geom_point(data = df,mapping=aes(x,y),color='white')
It also works if x distribution is normal :
df <- data.frame(x = rnorm(1e3), y = rnorm(1e3))
Drawback is that the number of points for each rectangle varies :
> cluster_rectangles %>% select(cluster,n)
# A tibble: 10 x 2
cluster n
<int> <int>
1 1 137
2 2 58
3 3 121
4 4 61
5 5 72
6 6 184
7 7 78
8 8 70
9 9 126
10 10 93
For an uniform distribution, the result is quite good (with N=9):
In case that all the points have different x coordinates, as it is the case in your example, sort the points increasingly according to the x coordinate. Note that, in this case, your problem of finding a covering with rectangles (with equal number of points) for the 2d points can be simplified to finding a covering with segments for 1d points (i.e. you can ignore the height of the rectangles).
Here how you can find the points in each rectangle:
num_rect <- 7 # In your example 6, 12 or 24
num_points <- 10 # In your example 1e3
# Already ordered according to x
df <- data.frame(x = 1:num_points, y = rnorm(num_points))
# Minimum number of points in the rectangles to cover all of them
points_in_rect <- ceiling(num_points/num_rect)
# Cover the first points using non-overlaping rectangles
breaks <- seq(0,num_points, by=points_in_rect)
cover <- split(seq(num_points), cut(seq(num_points), breaks))
names(cover) <- paste0("rect", seq(length(cover)))
# Cover the last points using overlaping rectangles
cur_num <- length(cover)
if (num_points < num_rect*points_in_rect ) {
# To avoid duplicate rectangles
last <- num_points
if (num_points %% 1 == 0)
last <- last -1
while (cur_num < num_rect) {
cur_num <- cur_num + 1
new_rect <- list(seq(last-points_in_rect+1, last))
names(new_rect) <- paste0("rect", cur_num)
cover <- c(cover,new_rect)
last <- last - points_in_rect
}
}
The points in the rectangles are:
$rect1
[1] 1 2
$rect2
[1] 3 4
$rect3
[1] 5 6
$rect4
[1] 7 8
$rect5
[1] 9 10
$rect6
[1] 8 9
$rect7
[1] 6 7
The minimum bounding rectangles (parallel to the axes) that enclose those set of points are the ones that you are finding.
Duplicated coordinate values in both axes
Randomly rotate the points (save the rotation angle) and check if there are not duplicate x (or y) coordinates. If this is the case, use the above strategy with the rotated coordinates (remember to sort before the rotated points according to the new x coordinates), and then rotate back the obtained rectangles in the opposite direction. If duplicated coordinates remain in both axes, rotate the points again with a different (random) angle. Since you have a finite number of points, you can always find a rotation angle that separates de x (or y) coordinates.
I am trying to apply the Simpson's Diversity Index across a number of different datasets with a variable number of species ('nuse') captured. As such I am trying to construct code which can cope with this automatically without needing to manually construct a formula each time I do it. Example dataset for a manual formula is below:
diverse <- data.frame(nuse1=c(0,20,40,20), nuse2=c(5,5,3,20), nuse3=c(0,2,8,20), nuse4=c(5,8,2,20), total=c(10,35,53,80))
simp <- function(x) {
total <- x[,"total"]
nuse1 <- x[,"nuse1"]
nuse2 <- x[,"nuse2"]
nuse3 <- x[,"nuse3"]
nuse4 <- x[,"nuse4"]
div <- round(((1-(((nuse1*(nuse1 - 1)) + (nuse2*(nuse2 - 1)) + (nuse3*(nuse3 - 1)) + (nuse4*(nuse4 - 1)))/(total*(total - 1))))),digits=4)
return(div)
}
diverse$Simpson <- simp(diverse)
diverse
As you can see this works fine. However, how would I be able to create a function which could automatically adjust to, for example, 9 species (so up to nuse9)?
I have experimented with the paste function + as.formula as indicated here Formula with dynamic number of variables; however it is the expand form of (nuse1 * (nuse1 - 1)) that I'm struggling with. Does anyone have any suggestions please? Thanks.
How about something like:
diverse <- data.frame(nuse1=c(0,20,40,20), nuse2=c(5,5,3,20), nuse3=c(0,2,8,20), nuse4=c(5,8,2,20), total=c(10,35,53,80))
simp <- function(x, species) {
spcs <- grep(species, colnames(x)) # which column names have "nuse"
total <- rowSums(x[,spcs]) # sum by row
div <- round(1 - rowSums(apply(x[,spcs], 2, function(s) s*(s-1))) / (total*(total - 1)), digits = 4)
return(div)
}
diverse$Simpson2 <- simp(diverse, species = "nuse")
diverse
# nuse1 nuse2 nuse3 nuse4 total Simpson2
# 1 0 5 0 5 10 0.5556
# 2 20 5 2 8 35 0.6151
# 3 40 3 8 2 53 0.4107
# 4 20 20 20 20 80 0.7595
All it does is find out which columns start with "nuse" or any other species you have in your dataset. It constructs the "total" value within the function and does not require a total column in the dataset.
I have a parameter space given by (x,y) with x values from 1:5 and y values from 1:8. Let's say my current point p is located at (2,5) (it is colored in red). My goal is to try to pull all the points within one unit distance away from point p (the points in blue).
I was wondering if there was an efficient way to do this. Let's say my variables are stored in the following way:
xrange <- 1:5
yrange <- 1:8
grid <- expand.grid(xrange,yrange)
p <- data.frame(x=2,y=5)
I would like to store the other points below p in this fashion:
res <- data.frame(x=c(1,1,1,2,2,3,3,3),y=c(4,6,4,5,6,4,5,6))
res <- rbind(p,res)
> res
x y
1 2 5
2 1 4
3 1 6
4 1 4
5 2 5
6 2 6
7 3 4
8 3 5
9 3 6
The ultimate goal is to have a parameter space that is more than 2 dimensional. So I would eventually like to find all points that are some euclidean distance s away and similarly have a resulting dataframe with each column being a parameter in the parameter space and each row being a point with coordinates (x,y,z,..,etc) from its columns.
EDIT I have tried the following implementation if I wanted a circle or euclidean distance s and this seems to work. I am not sure how efficient the solution is though.
eucdist <- function(z,p){
return(dist(rbind(z, p)))
}
# in this case s=1 since that is the <= condition
res <- do.call(rbind,lapply(1:nrow(grid),function(m) if(eucdist(as.numeric(grid[m,]),as.numeric(p[1,])) <= 1){return(grid[m,])}))
More information: for now, my parameter space is discretized like the one in the picture above. Eventually some parameters will be continuous mixed in with discrete parameters as well. Thank you so much!
The euclidean distance of each point on the grid from the target point p can be efficiently computed with:
dist <- sqrt(rowSums(mapply(function(x,y) (x-y)^2, grid, p)))
Basically the inner mapply call will result in a matrix of the same size as grid but that has the squared distance of that point from the target point in that dimension; rowSums and sqrt efficiently then compute the euclidean distance.
In this case you are including anything with sqrt(2) Euclidean distance from the target point:
grid[dist < 1.5,]
# Var1 Var2
# 16 1 4
# 17 2 4
# 18 3 4
# 21 1 5
# 22 2 5
# 23 3 5
# 26 1 6
# 27 2 6
# 28 3 6
The use of mapply (operating over dimensions) and rowSums makes this much more efficient than an approach that loops through individual points on the grid, computing the distance to the target point. To see this, consider a slightly larger example with 1000 randomly distributed points in three dimensions:
set.seed(144)
grid <- data.frame(x=rnorm(1000), y=rnorm(1000), z=rnorm(1000))
p <- data.frame(x=rnorm(1), y=rnorm(1), z=rnorm(1))
lim <- 1.5
byrow <- function(grid, p, lim) grid[apply(grid, 1, function(x) sqrt(sum((x-p)^2))) < lim,]
vectorized <- function(grid, p, lim) grid[sqrt(rowSums(mapply(function(x,y) (x-y)^2, grid, p))) < lim,]
identical(byrow(grid, p, lim), vectorized(grid, p, lim))
[1] TRUE
library(microbenchmark)
# Unit: microseconds
# expr min lq mean median uq max neval
# byrow(grid, p, lim) 446792.71 473428.137 500680.0431 495824.7765 521185.093 579999.745 10
# vectorized(grid, p, lim) 855.33 881.981 954.1773 907.3805 1081.658 1108.679 10
The vectorized approach is 500 times faster than the approach that loops through the rows.
This approach can be used in cases where you have many more points (1 million in this example):
set.seed(144)
grid <- data.frame(x=rnorm(1000000), y=rnorm(1000000), z=rnorm(1000000))
p <- data.frame(x=rnorm(1), y=rnorm(1), z=rnorm(1))
lim <- 1.5
system.time(vectorized(grid, p, lim))
# user system elapsed
# 3.466 0.136 3.632
Here's how to do it with package FNN. The result is different from what you have because your solution has (1 4) and (2 5) twice. The solution also works with border data. You will only have 6 nearest neighbors if your x or y is 1 or on the edge of your matrix.
library(FNN)
x <-2
y <- 5
pt <-grid[grid$Var1==x & grid$Var2==y ,] #target point
distance <-knnx.dist(grid,pt,k=9) #distance from pt
k <-length(distance[distance<2]) #distance is less than 2. Useful for border data
nearest <-knnx.index(grid,pt,k=k) #find index of k nearest neighbors
grid[nearest,]
Var1 Var2
22 2 5
23 3 5
27 2 6
21 1 5
17 2 4
26 1 6
28 3 6
18 3 4
16 1 4
I see that you also have asked for higher dimensions. It would still work witht he following changes:
x <-2
y <- 5
z <-3
pt <-grid[grid$Var1==x & grid$Var2==y & grid$Var3==z ,] #3-dimensional point
distance <-knnx.dist(grid,pt,k=27) #increase to k=27
k <-length(distance[distance<2])
nearest <-knnx.index(grid,pt,k=k)
grid[nearest,]
On extracting values of a raster to points I find that I have several NA's, and rather than use a buffer and fun arguments of extract function, instead I'd like to extract the nearest non-NA Pixel to a point that overlaps NA.
I am using the basic extract function:
data.extr<-extract(loc.thr, data[,11:10])
Here's a solution without using the buffer. However, it calculates a distance map separately for each point in your dataset, so it might be ineffective if your dataset is large.
set.seed(2)
# create a 10x10 raster
r <- raster(ncol=10,nrow=10, xmn=0, xmx=10, ymn=0,ymx=10)
r[] <- 1:10
r[sample(1:ncell(r), size = 25)] <- NA
# plot the raster
plot(r, axes=F, box=F)
segments(x0 = 0, y0 = 0:10, x1 = 10, y1 = 0:10, lty=2)
segments(y0 = 0, x0 = 0:10, y1 = 10, x1 = 0:10, lty=2)
# create sample points and add them to the plot
xy = data.frame(x=runif(10,1,10), y=runif(10,1,10))
points(xy, pch=3)
text(x = xy$x, y = xy$y, labels = as.character(1:nrow(xy)), pos=4, cex=0.7, xpd=NA)
# use normal extract function to show that NAs are extracted for some points
extracted = extract(x = r, y = xy)
# then take the raster value with lowest distance to point AND non-NA value in the raster
sampled = apply(X = xy, MARGIN = 1, FUN = function(xy) r#data#values[which.min(replace(distanceFromPoints(r, xy), is.na(r), NA))])
# show output of both procedures
print(data.frame(xy, extracted, sampled))
# x y extracted sampled
#1 5.398959 6.644767 6 6
#2 2.343222 8.599861 NA 3
#3 4.213563 3.563835 5 5
#4 9.663796 7.005031 10 10
#5 2.191348 2.354228 NA 2
#6 1.093731 9.835551 2 2
#7 2.481780 3.673097 3 3
#8 8.291729 2.035757 9 9
#9 8.819749 2.468808 9 9
#10 5.628536 9.496376 6 6
This is a raster-based solution, by first filling the NA pixels with the nearest non-NA pixel value.
Note however, that this does not take into account the position of a point within a pixel. Instead, it calculates the distances between pixel centers to determine the nearest non-NA pixel.
First, it calculates for each NA raster pixel the distance and direction to the nearest non-NA pixel. The next step is to calculate the coordinates of this non-NA cell (assumes projected CRS), extract its value and to store this value at the NA location.
Starting data: a projected raster, with identical values as in the answer from koekenbakker:
set.seed(2)
# set projected CRS
r <- raster(ncol=10,nrow=10, xmn=0, xmx=10, ymn=0,ymx=10, crs='+proj=utm +zone=1')
r[] <- 1:10
r[sample(1:ncell(r), size = 25)] <- NA
# create sample points
xy = data.frame(x=runif(10,1,10), y=runif(10,1,10))
# use normal extract function to show that NAs are extracted for some points
extracted <- raster::extract(x = r, y = xy)
Calculate the distance and direction from all NA pixels to the nearest non-NA pixel:
dist <- distance(r)
# you can also set a maximum distance: dist[dist > maxdist] <- NA
direct <- direction(r, from=FALSE)
Retrieve coordinates of NA pixels
# NA raster
rna <- is.na(r) # returns NA raster
# store coordinates in new raster: https://stackoverflow.com/a/35592230/3752258
na.x <- init(rna, 'x')
na.y <- init(rna, 'y')
# calculate coordinates of the nearest Non-NA pixel
# assume that we have a orthogonal, projected CRS, so we can use (Pythagorean) calculations
co.x <- na.x + dist * sin(direct)
co.y <- na.y + dist * cos(direct)
# matrix with point coordinates of nearest non-NA pixel
co <- cbind(co.x[], co.y[])
Extract values of nearest non-NA cell with coordinates 'co'
# extract values of nearest non-NA cell with coordinates co
NAVals <- raster::extract(r, co, method='simple')
r.NAVals <- rna # initiate new raster
r.NAVals[] <- NAVals # store values in raster
Fill the original raster with the new values
# cover nearest non-NA value at NA locations of original raster
r.filled <- cover(x=r, y= r.NAVals)
sampled <- raster::extract(x = r.filled, y = xy)
# compare old and new values
print(data.frame(xy, extracted, sampled))
# x y extracted sampled
# 1 5.398959 6.644767 6 6
# 2 2.343222 8.599861 NA 3
# 3 4.213563 3.563835 5 5
# 4 9.663796 7.005031 10 10
# 5 2.191348 2.354228 NA 3
# 6 1.093731 9.835551 2 2
# 7 2.481780 3.673097 3 3
# 8 8.291729 2.035757 9 9
# 9 8.819749 2.468808 9 9
# 10 5.628536 9.496376 6 6
Note that point 5 takes another value than the answer of Koekenbakker, since this method does not take into account the position of the point within a pixel (as mentioned above). If this is important, this solution might not be appropriate. In other cases, e.g. if the raster cells are small compared to the point accuracy, this raster-based method should give good results.
For a raster stack, use #koekenbakker's solution above, and turn it into a function. A raster stack's #layers slot is a list of rasters, so, lapply it across and go from there.
#new layer
r2 <- raster(ncol=10,nrow=10, xmn=0, xmx=10, ymn=0,ymx=10)
r2[] <- 1:10
r2[sample(1:ncell(r2), size = 25)] <- NA
#make the stack
r_stack <- stack(r, r2)
#a function for sampling
sample_raster_NA <- function(r, xy){
apply(X = xy, MARGIN = 1,
FUN = function(xy) r#data#values[which.min(replace(distanceFromPoints(r, xy), is.na(r), NA))])
}
#lapply to get answers
lapply(r_stack#layers, function(a_layer) sample_raster_NA(a_layer, xy))
Or to be fancy (speed improvements?)
purrr::map(r_stack#layers, sample_raster_NA, xy=xy)
Which makes me wonder if the whole thing can be sped up even more using dplyr...
I'd like to snap single points to other points based on their distance in R.
In detail, I have a bunch of points defined by X and Y coordinate pairs.
In addition I have single different points, that I want to snap to
the closest neighbors (euclidean distance).
# target points
df <- data.frame(X=c(1,2,2,2,3,4),Y=c(1,2,2,3,3,4))
# points that need snapping
point1 <- data.frame(X=2.1, Y=2.3)
point2 <- data.frame(X=2.5, Y=2.5)
plot(df)
points(point1$X,point1$Y,pch=20,col="red")
points(point2$X,point2$Y,pch=20,col="blue")
But how to proceed with the snapping of the points?
How can I snap the points and assign new coordinate pairs to the single points?
Is there a simple function available in R? Or do I need to apply
the dist() function to obtain a distance matrix and search for
the closest distance? Maybe there is a more straight forward way.
Thats how it should look like:
1) snap to the closest (euclidean distance) point (clear solution for point 1)
point1$X_snap <- 2
point1$Y_snap <- 2
2) if two or more points similarily close than
snap to that which is more "north-east"
a) snap first to the one which is more north (Y direction)
b) if there are more than one that a similarly distant in Y direction snap
to the one that is more east
point2$X_snap <- 3
point2$Y_snap <- 3
For a graphical illustration how the results should look like
#plot snapped points:
points(point1$X_snap,point1$Y_snap,pch=8,col="red")
points(point2$X_snap,point2$Y_snap,pch=8,col="blue")
It's possible using dist:
dist(rbind(point1,df))
1 2 3 4 5
2 1.7029386
3 0.3162278 1.4142136
4 0.3162278 1.4142136 0.0000000
5 0.7071068 2.2360680 1.0000000 1.0000000
6 1.1401754 2.8284271 1.4142136 1.4142136 1.0000000
7 2.5495098 4.2426407 2.8284271 2.8284271 2.2360680
6
2
3
4
5
6
7 1.4142136
So the row with the minimum value (distance) in the first column identifies the point in df which is closest to point1 . In your example, you have a repeated location. Repeat for each of your point_x .
I found another solution using the matchpt() function
from Biobase (Bioconductor):
# target points
df <- data.frame(X=c(1,2,2,2,3,4),Y=c(1,2,2,3,3,4))
# points that need snapping
point1 <- data.frame(X=2.1, Y=2.3)
point2 <- data.frame(X=2.5, Y=2.5)
snap <- function(df,point){
require(Biobase)
d <- matchpt(as.matrix(df),
as.matrix(data.frame(X=point$X+0.0001,Y=point$Y+0.0001))) # to the "northwest" criteria correct
min_row <- as.numeric(rownames(d[d$distance==min(d$distance),]))
point$X_snap <- unique(df[min_row,"X"])
point$Y_snap <- unique(df[min_row,"Y"])
point
}
snap(df,point2)
You might also want to try the RANN package for Fast Nearest Neighbour Search:
# your data
df <- data.frame(X=c(1,2,2,2,3,4),Y=c(1,2,2,3,3,4))
pts <- data.frame(X=c(2.1, 2.5), Y=c(2.3, 2.5))
library(RANN)
# for each point in pts, find the nearest neighbor from df
closest <- RANN::nn2(data = df, query = pts, k = 1)
# argument k sets the number of nearest neighbours, here 1 (the closest)
closest
# $nn.idx
# [,1]
# [1,] 3
# [2,] 5
#
# $nn.dists
# [,1]
# [1,] 0.3162278
# [2,] 0.7071068
# Get coordinates of nearest neighbor
pts$X_snap <- df[closest$nn.idx, "X"]
pts$Y_snap <- df[closest$nn.idx, "Y"]
pts
# X Y X_snap Y_snap
# 1 2.1 2.3 2 2
# 2 2.5 2.5 3 3
I would put the criteria (distance, "southness", "westness") in a dataframe and then sort this dataframe along these criteria:
# input data
df <- data.frame(X=c(1,2,2,2,3,4),Y=c(1,2,2,3,3,4))
point1 <- data.frame(X=2.1, Y=2.3)
point2 <- data.frame(X=2.5, Y=2.5)
df.res[with(df.res, order(dst, dy, dx)), ]
# function that sorts all potential snapping points according to distance, "westness", "southness"
snap.xy <- function(point, other.points) {
df.res <- data.frame(X = other.points$X, # to later access the coordinates to snap to
Y = other.points$Y, # dto
dx <- point$X - other.points$X, # "westness" (the higher, the more "west")
dy <- point$Y - other.points$Y, # "southness"
dst = sqrt(dx^2 + dy^2)) # distance
# print(df.res[with(df.res, order(dst, dy, dx)), ]) # just for checking the results
return(df.res[with(df.res, order(dst, dy, dx)), ][1,c("X", "Y")]) # return only the X/Y coordinates
}
# examples
snap.xy(point1, df) # 2/2
snap.xy(point2, df) # 3/3
snap.xy(point2, df)$X # 3
snap.xy(point2, df)$Y # 3