I have a data frame with two columns of coordinates in meters (about 45,000 locations).
What I want to do is calculate the minimum and maximum distances between the locations. I have tried to calculate the minimum distance as follows:
library(sf)
xco<-c(320963.6,421813.6,315423.6,405733.6,365603.6)
yco<-c(172137.7,165287.7,232197.7,138917.7,183697.7)
mydata<-data.frame(xco,yco)
mydata_sf<-st_as_sf(mydata, coords = c("coords.x1", "coords.x2"), crs = 2100)
dist_df<-as.data.frame(st_distance(mydata_sf))
min(dist_df[dist_df> 0])
However, that gives me a value that I cannot find in my data.
Can anyone suggest a faster and better way to do that?
Thank you!
You have an error in your code. The fifth line should be
mydata_sf <- st_as_sf(mydata, coords = c("xco", "yco"), crs = 2100)
Then
dist_df <- as.data.frame(st_distance(mydata_sf))
gives you a labeled distance matrix. You just need the upper or lower triangle:
range(dist_df[lower.tri(dist_df)])
# [1] 30885.97 129834.72
The first value is the minimum distance and the second is the maximum.
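With about 45,000 locations the full matrix from st_distance has roughly two billion entries, so memory may become the real problem. As a rough sketch in base R (not part of the answer above, and assuming the coordinates are planar metres as in the question), dist() stores each pair only once, and the maximum can be found from the convex hull vertices alone, because the farthest pair of a point set always lies on its hull:

```r
xco <- c(320963.6, 421813.6, 315423.6, 405733.6, 365603.6)
yco <- c(172137.7, 165287.7, 232197.7, 138917.7, 183697.7)
xy <- cbind(xco, yco)

# dist() keeps each pair once, so there is no zero diagonal to filter out
d <- dist(xy)
dist_range <- range(d)

# The farthest pair always consists of convex hull vertices,
# so the maximum only needs the hull points
hull <- chull(xy)
max_dist <- max(dist(xy[hull, , drop = FALSE]))

print(dist_range)
print(max_dist)
```

Note that dist() is still quadratic in the number of points, so only the hull shortcut for the maximum scales comfortably to the full data set.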
I would like to group polygons together based on a distance criterion:
Any polygons within a certain distance (1200 metres or less) of an origin polygon are grouped together
If other polygons are within the same distance (1200 metres or less) of these 'neighbouring' polygons they are added to this same group
The process for this group continues until no further polygons are added (because they are all further than 1200 metres away).
The next ungrouped polygon is selected and the process repeats for a new grouping
Polygons with no neighbour within 1200 metres are assigned to be in a group by themselves
A polygon should only belong to one group
The final output would be a table with the single polygon ID (UID) and the group ID it belongs to (GrpID) and the average distance between the polygons in that group
I am sure a distance matrix with st_distance means this is possible, but I'm just not getting it.
library(sf)
library(dplyr)
download.file("https://drive.google.com/uc?export=download&id=1-I4F2NYvFWkNqy7ASFNxnyrwr_wT0lGF" , destfile="ProximityAreas.zip")
unzip("ProximityAreas.zip")
Proximity_Areas <- st_read("Proximity_Areas.gpkg")
Dist_Matrix <- Proximity_Areas %>%
st_distance(. , by_element = FALSE)
This function uses sf and igraph package functions:
group_polygons <- function(polys, distance){
## get distance matrix
dist_matrix = st_distance(polys, by_element = FALSE)
## this object has units, so get rid of them:
class(dist_matrix) = NULL
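## make a binary 0/1 matrix where 1 if two polys are within the distance threshold
## ("1200 metres or less", hence <=)
connected = 1 * (dist_matrix <= distance)
## make an undirected graph from the adjacency matrix
g = igraph::graph_from_adjacency_matrix(connected, mode = "undirected")
## components() comes from igraph, so call it with the namespace
return(igraph::components(g)$membership)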
}
You can use it like this:
Proximity_Areas$Group = group_polygons(Proximity_Areas, 1200)
Let's make a category for mapping:
Proximity_Areas$FGroup = factor(Proximity_Areas$Group)
plot(Proximity_Areas[,"FGroup"])
There are three clusters here, the big one, one with 3 regions on the right, and one singleton region on the left. All the orange regions could be connected together by bridges that are less than 1200m long.
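As a cross-check that does not need igraph, the same grouping rule can be sketched with single-linkage clustering from base R: cutting a single-linkage tree at height h joins exactly those items that are linked by a chain of steps no longer than h. This is a swapped-in technique rather than the function above, and the toy positions are made up for illustration:

```r
# Made-up 1-D positions in metres standing in for polygon-to-polygon distances
pos <- c(0, 1000, 2000, 5000, 9000)
d <- dist(pos)

# Single linkage cut at 1200 m: chains of neighbours <= 1200 m share a group
tree <- hclust(d, method = "single")
groups <- cutree(tree, h = 1200)
print(groups)
```

With real polygons, d could presumably be built by passing the unit-free st_distance matrix to as.dist().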
If you want to compute the average distance without re-computing the distance matrix, you can do this within the function by subsetting according to the membership value from the components function. The key here is computing the binary 0/1 matrix and using igraph to compute the connectivity of that as an adjacency matrix.
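A minimal sketch of that subsetting step, assuming the distance matrix has already had its units dropped. The helper name mean_group_distance and the toy data are hypothetical, and singleton groups are given NA because they contain no pairs:

```r
# Hypothetical helper: mean pairwise distance within each group.
# `d` is a plain numeric distance matrix, `membership` the group id per row
# (for example the membership vector returned by igraph::components).
mean_group_distance <- function(d, membership) {
  sapply(split(seq_along(membership), membership), function(idx) {
    if (length(idx) < 2) return(NA_real_)  # singleton: no pairs to average
    sub <- d[idx, idx, drop = FALSE]
    mean(sub[lower.tri(sub)])              # each pair counted once
  })
}

# Toy example: five points on a line, grouped by hand
pos <- c(0, 1000, 2000, 5000, 9000)
d <- as.matrix(dist(pos))
membership <- c(1, 1, 1, 2, 3)
avg <- mean_group_distance(d, membership)
print(avg)
```

The result is named by group id, so it can be joined back to the polygons with something like avg[as.character(Proximity_Areas$Group)].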
I am trying to find the circle with a radius of 300 meters that gathers the highest total amount, and I am looking for the coordinates of its center. Note that the center of the area with the highest total amount does not have to be one of the points in the observations data frame.
I have the following data:
observations <- spatialrisk::insurance %>%
dplyr::select(amount, lon, lat)
The function spatialrisk::concentration determines the concentration for all target points (i.e. sub):
spatialrisk::concentration(sub = observations,
full = observations,
value = amount, radius = 300)
The function is written in C++ (Rcpp), and is therefore fast. However, the approach is not 'smart'.
Any ideas for a faster solution with the raster (or velox) package, or with a kernel density approach?
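For reference, the quantity being computed can be sketched as a brute-force loop in base R. This only illustrates the definition and is not how spatialrisk implements it; the function name concentration_naive, the local equirectangular projection, and the toy data are assumptions made for this sketch:

```r
# Hypothetical brute-force version; not the spatialrisk implementation
concentration_naive <- function(sub, full, radius = 300, earth_radius = 6371000) {
  # Project lon/lat to local planar metres (equirectangular approximation);
  # approximate, with error growing as the latitude spread of the data grows
  lat0 <- mean(full$lat) * pi / 180
  to_xy <- function(df) {
    cbind(x = earth_radius * df$lon * pi / 180 * cos(lat0),
          y = earth_radius * df$lat * pi / 180)
  }
  s <- to_xy(sub)
  f <- to_xy(full)
  # For each target point, sum the amounts of all points within the radius
  vapply(seq_len(nrow(s)), function(i) {
    d2 <- (f[, "x"] - s[i, "x"])^2 + (f[, "y"] - s[i, "y"])^2
    sum(full$amount[d2 <= radius^2])
  }, numeric(1))
}

# Made-up data: two points about 70 m apart and one several km away
toy <- data.frame(amount = c(10, 20, 5),
                  lon = c(5.000, 5.001, 5.100),
                  lat = c(52, 52, 52))
conc <- concentration_naive(toy, toy)
print(conc)
```

Because sub only needs lon and lat columns, a regular grid of candidate centers could be passed instead of the observations, which would allow the best center to fall between observed points.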
The figure is a plot of x,y pairs from an Excel file, 8760 pairs in total. I want to remove the noisy data pairs in the red circled area and write the remaining pairs to a new Excel file. How can I do this in R?
Using @G5W's example:
Make up data:
set.seed(2017)
x = runif(8760, 0,16)
y = c(abs(rnorm(8000, 0, 1)), runif(760,0,8))
XY = data.frame(x,y)
Fit a quantile regression to the 90th percentile:
library(quantreg)
library(splines)
qq <- rq(y~ns(x,20),tau=0.9,data=XY)
Compute and draw the predicted curve:
xvec <- seq(0,16,length.out=101)
pp <- predict(qq,newdata=data.frame(x=xvec))
plot(y~x,data=XY)
lines(xvec,pp,col=2,lwd=2)
Keep only points below the predicted line:
XY2 <- subset(XY,y<predict(qq,newdata=data.frame(x)))
plot(y~x,data=XY2)
lines(xvec,pp,col=2,lwd=2)
You can make the line less wiggly by lowering the number of knots, e.g. y~ns(x,10)
Both R and Excel read and write .csv files, so you can use those to transfer the data back and forth.
You do not provide any data so I made some junk data to produce a similar problem.
DATA
set.seed(2017)
x = runif(8760, 0,16)
y = c(abs(rnorm(8000, 0, 1)), runif(760,0,8))
XY = data.frame(x,y)
One way to identify noise points is by looking at the distance to the nearest neighbors. In dense areas, nearest neighbors will be closer. In non-dense areas, they will be further apart. The package dbscan provides a nice function to get the distance to the k nearest neighbors. For this problem, I used k=6, but you may need to tune for your data. Looking at the distribution of distances to the 6th nearest neighbor we see that most points have 6 neighbors within a distance of 0.2
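library(dbscan)
## all = TRUE returns a matrix with the distances to neighbors 1..k
## (recent dbscan versions return only the k-th distance by default)
XY6 = kNNdist(XY, 6, all = TRUE)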
plot(density(XY6[,6]))
So I will assume that points whose 6th nearest neighbor is further away than 0.2 are noise points. Just changing the color to see which points are affected, we get
TYPE = rep(1,8760)
TYPE[XY6[,6] > 0.2] = 2
plot(XY, col=TYPE)
Of course, if you wish to restrict to the non-noise points, you can use
## non-noise points are those whose 6th nearest neighbor is within 0.2
NonNoise = XY[XY6[,6] <= 0.2,]
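The same idea can be sketched without dbscan for small data sets, together with the CSV export the question asks about. This is a minimal illustration on made-up points; the cutoff of 0.5, k = 3, and the file name are arbitrary choices for this example:

```r
# Made-up points: a tight cluster plus one isolated point
XY <- data.frame(x = c(0, 0.1, 0.2, 0.1, 0, 0.2, 0.1, 5),
                 y = c(0, 0, 0, 0.1, 0.1, 0.1, 0.2, 5))
k <- 3

# Full distance matrix; fine for small data, quadratic in memory otherwise
D <- as.matrix(dist(XY))
# Sort each row: position 1 is the point itself (distance 0),
# so position k + 1 is the distance to the k-th nearest neighbor
kth <- apply(D, 1, function(row) sort(row)[k + 1])

# Keep points whose k-th neighbor is close; 0.5 is an arbitrary cutoff here
NonNoise <- XY[kth <= 0.5, ]

# Write the cleaned data to a CSV file that Excel can open
out_file <- file.path(tempdir(), "XY_clean.csv")
write.csv(NonNoise, out_file, row.names = FALSE)
```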
I’m ashamed to bother you with a stupid (but very necessary to me) question. I have a bunch of lat/lon points distributed almost randomly within a rectangle of ca. two by three degrees (latitude x longitude).
I need to calculate the maximum distance to the second nearest neighbor as well as the maximum distance to the farthest neighbor. I calculated these using package spatstat,
d2 <- max(nndist(data[,2:3], k = 2))
dn <- max(nndist(data[,2:3], k=(nrow(data))-1))
, respectively, and the distances obtained were 0.3 to 4.2.
I need these distances in kilometers.
So I supposed that the distances returned by nndist were expressed in radians.
If θ = a/r, where θ is the subtended angle in radians, a is the arc length, and r is the Earth's radius, then the arc length is a = θr.
However, the distances transformed in such a way ranged from:
a = 6371 * 0.3 = 1911.3, and
a = 6371 * 4.2 = 26758.2
This is evidently wrong, since the maximum distance between the farthest points, measured for example in QGIS, is just 480 km.
Can anybody tell me where I am mistaken?
Thanks a lot in advance!!!
nndist simply calculates the Euclidean distance. It does no unit conversion. You have given it values in degrees, so it returns a value whose units are degrees (not radians).
Thus
6371*0.3*pi/180 = 33.36
will give an approximation of the distance between these points.
A better approach would be to use great circle distances (e.g. with the geosphere or gstat packages), or to project the lat/long coordinates onto an appropriate map projection (sf::st_transform, or the older rgdal/sp spTransform, will do this), after which nndist will calculate your distances in metres.
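The conversion above can be sketched as a small helper applied to both values from the question. The name deg_to_km is made up for this example, and the radius of 6371 km is the one used in the question:

```r
earth_radius_km <- 6371

# Convert an angle in degrees to arc length in km: a = theta * r,
# with theta converted from degrees to radians first
deg_to_km <- function(deg, r = earth_radius_km) deg * pi / 180 * r

d2_km <- deg_to_km(0.3)
dn_km <- deg_to_km(4.2)
print(round(c(d2_km, dn_km), 2))
```

The second value comes out at roughly 467 km, which is in the neighbourhood of the 480 km measured in QGIS. It remains an approximation, because it treats a degree of longitude as being as long as a degree of latitude.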
I need to make a 2D plot of distance travelled versus my value at that point ("intensity").
My data is formatted as:
lon lat intensity
1. -85.01478 37.99030 -68.3167
2. -85.00752 37.97601 -68.0247
3. -85.00027 37.96172 -67.9565
4. -84.99302 37.94743 -67.8917
and it continues for 282 rows like this. I was looking at a few packages that calculate distance between longitude (lon) and latitude (lat) points (such as geosphere), but I couldn't understand how to get my data into the format that it wanted. I know the total distance travelled in degrees should be 4.01538, evenly spaced out between the 282 points, but I don't know how I could make a column in R with this in mind.
dfrm$dist<- cumsum(c(0, with(dfrm, sqrt( (lat[-1]-lat[-nrow(dfrm)])^2+
(lon[-1]-lon[-nrow(dfrm)])^2
))) )
with(dfrm, plot(dist, intensity, type="b"))
Or choose a more "geographic" distance measure with the lagged column values. But given the increments, I doubt the error from using a naive distance measure can be that much.
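As a sketch of the "geographic" variant on lagged values, the haversine formula can be written in a few lines of base R. The function name haversine_km is made up here, and a spherical Earth with a radius of 6371 km is assumed; the data are the four rows shown in the question:

```r
dfrm <- data.frame(
  lon = c(-85.01478, -85.00752, -85.00027, -84.99302),
  lat = c(37.99030, 37.97601, 37.96172, 37.94743),
  intensity = c(-68.3167, -68.0247, -67.9565, -67.8917)
)

# Haversine great-circle distance in km (spherical Earth, radius assumed 6371 km)
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

n <- nrow(dfrm)
# Distance from each row to the next one, then the running total from the start
step_km <- haversine_km(dfrm$lon[-n], dfrm$lat[-n], dfrm$lon[-1], dfrm$lat[-1])
dfrm$dist_km <- cumsum(c(0, step_km))
print(dfrm)
```

Plotting then works as before, with dist_km in place of dist.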
From here I found some packages to calculate distance between coordinates. Assuming your data is called dtf and using the RSEIS package:
dtf <- data.frame(rbind(c(-85.01478,37.99030,-68.3167),
c(-85.00752,37.97601,-68.0247),c(-85.00027,37.96172,-67.9565),
c(-84.99302,37.94743,-67.8917)))
names(dtf) <- c('lon','lat','int')
library(RSEIS)
## use the data argument rather than the global dtf
travelint <- function(i,data){
ddeg <- GreatDist(data$lon[i],data$lat[i],data$lon[i+1],data$lat[i+1])$ddeg
dint <- data$int[i+1] - data$int[i]
return(list(ddeg,dint))}
out <- sapply(1:(nrow(dtf)-1),data=dtf,travelint)
out <- data.frame(matrix(as.numeric(out),ncol=2,byrow=TRUE))
out$X1 <- cumsum(out$X1)
This will take your data, calculate the distance traveled between points and the intensity change between them. After that it can be plotted like this:
library(ggplot2)
ggplot(out,aes(X1,X2)) + geom_line() +
labs(x="Distance (Degrees)",y="Intensity Change")
If you instead want the intensity itself rather than the change, you can use cumsum again to get the cumulative change in intensity and then add it to the first intensity:
out2 <- out
out2 <- rbind(c(0,0),out2)
out2$X2 <- cumsum(out2$X2) + dtf$int[1]
ggplot(out2,aes(X1,X2)) + geom_line() +
labs(x="Distance (Degrees)",y="Intensity")
As mentioned by DWin, you can use a naive measure or a geographic distance measure. Here I am using the gdist function from the Imap package, which calculates the great-circle distance.
library(Imap)
library(lattice)
#Dummy data
longlat <- read.table(text="lon lat intensity
1. -85.01478 37.99030 -68.3167
2. -85.00752 37.97601 -68.0247
3. -85.00027 37.96172 -67.9565
4. -84.99302 37.94743 -67.8917", header=TRUE)
ll <- lapply(seq(nrow(longlat)-1), function(x){
start <- longlat[x,]
end <- longlat[x+1,]
cbind(distance = gdist(start$lon, start$lat, end$lon, end$lat,units = "m"),
intensity = end$intensity - start$intensity)
})
dd <- as.data.frame(do.call(rbind,ll))
xyplot(intensity~distance,dd,type= c('p','l'),pch=20,cex=2)