I am looking at some dispersal data and would like to get the distance between points and also the angle between those points. So far, I have only been able to achieve the first part. Using the teal data from the adehabitatLT package, I have done this:
require("adehabitatLT")
require("sp")
data("teal")
teal <- teal[1:10 ,]
capsd <- SpatialPointsDataFrame(
  coords = SpatialPoints(
    coords = teal[, c("x", "y")],
    proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
  ),
  data = teal
)
capdistance <- as.data.frame(pointDistance(capsd))
capdistance is then a 10 x 10 data.frame holding the pairwise distances between the first 10 points of the teal dataset.
Does anyone know how I would calculate the angle between these points to create a similar matrix to the capdistance data.frame? I have searched, but so far I have not found anything that would calculate the angle between two set locations. Any help would be greatly appreciated.
EDIT
So I have been looking around and it would seem that the bearing function from the geosphere package would be useful for this, but I am still (at least) a step away from working this all the way through:
require("geosphere")
capbearing1 <- bearing(capsd[1:10 ,], capsd[1 ,])
capbearing2 <- bearing(capsd[1:10 ,], capsd[2 ,])
I could repeat this ten times to obtain ten vectors, each giving the bearings from all ten points (itself and the nine others) to one of the points; however, I would really like this to run in a single step and return all ten at once as one matrix; again, any help is very appreciated.
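For reference, one way to collapse those ten calls into a single 10 x 10 matrix (just a sketch, using the capsd object built above):
capbearing <- sapply(1:10, function(i) bearing(capsd, capsd[i, ]))
# capbearing[i, j] is the initial bearing from point i towards point j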
cygps gave some good code if you are utilizing UTMs in a single zone and have limited data points, so try that out if you have those parameters.
foo <- function(df) {
  x1 <- x2 <- df$x
  y1 <- y2 <- df$y
  Xpair <- merge(x1, x2)   # all pairwise combinations of x
  names(Xpair) <- c("x1", "x2")
  Ypair <- merge(y1, y2)   # all pairwise combinations of y
  names(Ypair) <- c("y1", "y2")
  dist <- sqrt((Xpair$x1 - Xpair$x2)^2 + (Ypair$y1 - Ypair$y2)^2)
  dx <- Xpair$x1 - Xpair$x2
  dy <- Ypair$y1 - Ypair$y2
  abs.angle <- ifelse(dist < 1e-07, NA, atan2(dy, dx))   # NA where points coincide
  n <- nrow(df)
  list(dist = matrix(dist, n, n), abs.angle = matrix(abs.angle, n, n))
}
I adapted this function from adehabitatLT::as.ltraj. It produces your distance and absolute-angle matrices (assuming you wanted distances and angles between all points, not time-ordered distances and angles).
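For instance, a hypothetical call on the ten teal relocations used above (note that foo treats x and y as planar coordinates, unlike pointDistance above, which used great-circle distances):
res <- foo(teal)   # teal was already cut down to its first 10 rows above
res$dist           # 10 x 10 matrix of pairwise distances
res$abs.angle      # 10 x 10 matrix of angles in radians (atan2 convention)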
Related
I have a bunch of points in 2D space and have calculated a convex hull for them. I would now like to "tighten" the hull so that it no longer necessarily encompasses all points. In the typical nails-in-board-with-rubber-band analogy, what I'd like to achieve is to be able to tune the elasticity of the rubber band and allow nails to bend at pressure above some limit. That's just an analogy; there is no real physics here.

This would be loosely related to the reduction in hull area if a given point were removed, but not quite, because there could be two points that are very close to each other. This is not necessarily related to outlier detection, because you could imagine a pattern where a large fraction of the nails would bend if they are on a narrow line (imagine a hammer shape, for example). All of this has to be reasonably fast for thousands of points. Any hints where I should look in terms of algorithms? An implementation in R would be perfect, but is not needed.
EDIT AFTER COMMENT: The three points I've labelled are those with the largest potential for reducing the hull area if they are excluded. In the plot there is no other set of three points that would result in a larger area reduction. A naive implementation of what I'm looking for might be to randomly sample some fraction of the points, calculate the hull area, remove each point on the hull in turn, recalculate the area, repeat many times, and remove the points that tend to lead to large area reductions. Maybe this could be implemented in some random forest variant? It's not quite right, though, because I would like the points to be removed one by one so that you get the following result. If you looked at all points in one go, it would possibly be best to trim from the edges of the "hammer head".
Suppose I have a set of points like this:
set.seed(69)
x <- runif(20)
y <- runif(20)
plot(x, y)
Then it is easy to find the subset of points that sit on the convex hull by doing:
ss <- chull(x, y)
This means we can plot the convex hull by doing:
lines(x[c(ss, ss[1])], y[c(ss, ss[1])], col = "red")
Now we can randomly remove one of the points that sits on the convex hull (i.e. "bend a nail") by doing:
bend <- sample(ss, 1)   # pick one hull point at random
x <- x[-bend]
y <- y[-bend]
And we can then repeat the process of finding the convex hull of this new set of points:
ss <- chull(x, y)
lines(x[c(ss, ss[1])], y[c(ss, ss[1])], col = "blue", lty = 2)
To get the point which will, on removal, cause the greatest reduction in area, one option would be the following function:
library(sp)
shrink <- function(coords)
{
    ss <- chull(coords[, 1], coords[, 2])
    outlier <- ss[which.min(sapply(seq_along(ss),
        function(i) Polygon(coords[ss[-i], ], hole = FALSE)@area))]
    coords[-outlier, ]
}
So you could do something like:
coords <- cbind(x, y)
new_coords <- shrink(coords)
new_chull <- new_coords[chull(new_coords[, 1], new_coords[, 2]),]
new_chull <- rbind(new_chull, new_chull[1,])
plot(x, y)
lines(new_chull[,1], new_chull[, 2], col = "red")
Of course, you could do this in a loop so that new_coords is fed back into shrink multiple times.
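For example (a sketch that removes three hull points in total):
for (i in 1:3) new_coords <- shrink(new_coords)   # each pass drops the single "worst" hull point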
Calculate a robust center and covariance using cov.mcd in MASS and the Mahalanobis distance of each point from it (using mahalanobis from base R's stats package). We then show a quantile plot of the Mahalanobis distances using PlotMD from modi and also show the associated outliers in red in the second plot. (There are other functions in modi that may be of interest as well.)
library(MASS)
library(modi)
set.seed(69)
x <- runif(20)
y <- runif(20)
m <- cbind(x, y)
mcd <- cov.mcd(m)
md <- mahalanobis(m, mcd$center, mcd$cov)
stats <- PlotMD(md, 2, alpha = 0.90)
giving:
(quantile plot screenshot omitted)
and we show the convex hull using lines and the outliers in red:
plot(m)
ix <- chull(m)
lines(m[c(ix, ix[1]), ])
wx <- which(md > stats$halpha)
points(m[wx, ], col = "red", pch = 20)
Thank you both! I've tried various methods for outlier detection, but they're not quite what I was looking for; they worked badly due to the weird shapes of my clusters. I know I talked about convex hull area, but I think filtering on segment lengths yields better results and is closer to what I really wanted. Then it would look something like this:
shrink <- function(xy, max_length = 30){
  to_keep <- 1:(dim(xy)[1])
  centroid <- c(mean(xy[, 1]), mean(xy[, 2]))
  while (TRUE){
    ss <- chull(xy[, 1], xy[, 2])
    ss <- c(ss, ss[1])
    # squared lengths of the hull segments (so max_length is on the squared scale)
    lengths <- sapply(1:(length(ss) - 1), function(i) sum((xy[ss[i + 1], ] - xy[ss[i], ])^2))
    # This gets the point with the longest convex hull segment. chull returns points
    # in clockwise order, so the point to remove is either this one or the one
    # after it. Remove the one furthest from the centroid.
    max_point <- which.max(lengths)
    if (lengths[max_point] < max_length) return(to_keep)
    if (sum((xy[ss[max_point], ] - centroid)^2) > sum((xy[ss[max_point + 1], ] - centroid)^2)){
      xy <- xy[-ss[max_point], ]
      to_keep <- to_keep[-ss[max_point]]
    } else {
      xy <- xy[-ss[max_point + 1], ]
      to_keep <- to_keep[-ss[max_point + 1]]
    }
  }
}
It's not optimal because it factors in the distance to the centroid, which I would have liked to avoid, and there is a max_length parameter that should be calculated from the data instead of being hard-coded.
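One way max_length could be derived from the data rather than hard-coded (purely a sketch of one possible heuristic, not something from my actual analysis) is to tie it to the typical squared hull-segment length before filtering:
ss <- chull(xy[, 1], xy[, 2])
ss <- c(ss, ss[1])
seg2 <- sapply(1:(length(ss) - 1), function(i) sum((xy[ss[i + 1], ] - xy[ss[i], ])^2))
max_length <- 5 * median(seg2)   # e.g. only break up segments much longer than a typical one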
No filter:
It looks like this because there are 500 000 cells in here, and there are many that end up "wrong" when projecting from ~20 000 dimensions to 2.
Filter:
Note that it filters out points at tips of some clusters. This is less-than-optimal but ok. The overlap between some clusters is true and should be there.
I apologize in advance if my code looks very amateurish.
I'm trying to assign quadrants to 4 measurement stations approximately located on the edges of a town.
I have the coordinates of these 4 stations:
a <- c(13.2975,52.6556)
b <- c(14.0083,52.5583)
c <- c(13.3722,52.3997)
d <- c(12.7417,52.6917)
Now my idea was to create lines connecting the north-south and east-west stations:
line.1 <- matrix(c(d[1],b[1],d[2],b[2]),ncol=2)
line.2 <- matrix(c(a[1],c[1],a[2],c[2]),ncol=2)
Plotting all the stations and the connecting lines looks all right, but it is not very helpful for analysing things on a computer.
So I calculated the Euclidean vectors for the two lines:
vec.1 <- as.vector(c((b[1]-d[1]),(b[2]-d[2])))
vec.2 <- as.vector(c((c[1]-a[1]),(c[2]-a[2])))
which allowed me to calculate the angle between the two lines in degrees:
alpha <- acos((vec.1 %*% vec.2) /
              (sqrt(vec.1[1]^2 + vec.1[2]^2) * sqrt(vec.2[1]^2 + vec.2[2]^2))) * 180/pi
The angle I get for alpha is 67.7146°. This looks fairly good. From this angle I can easily calculate the other 3 angles of the intersection, however I need values relative to the grid so I can assign values from 0°-360° for the wind directions.
Now my next planned step was to find the point where the two lines intersect, add a horizontal and vertical abline through that point and then calculate the angle relative to the grid. However I can't find a proper example that does that and I don't think I have a nice linear equation system I could solve.
Is my code way off? Or maybe anyone knows of a package which could help me? It feels like my whole approach is a bit wrong.
Okay I managed to calculate the intersection point, using line equations. Here is how.
The basic equation for two points is like this:
y - y1 = ((y2 - y1) / (x2 - x1)) * (x - x1)
If you make one for each of the two lines, you can just substitute the fractions.
k.1 <- ((c[2]-a[2])/(c[1]-a[1]))
k.2 <- ((b[2]-d[2])/(b[1]-d[1]))
Rearranging the two equations and solving for y gives:
y <- ((-k.1/k.2)*d[2] + k.1*d[1] - k.1*c[1] + c[2]) / (1 - k.1/k.2)
This one you can now use to calculate the x-value:
x <- ((y-d[2])+d[1]*k.2)/k.2
In my case I get
y = 52.63245
x = 13.30426
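As a quick cross-check (just a sketch), the same intersection can be found by writing each line as k*x - y = k*x0 - y0 and solving the resulting 2 x 2 linear system:
A <- rbind(c(k.1, -1),
           c(k.2, -1))
rhs <- c(k.1 * c[1] - c[2],
         k.2 * d[1] - d[2])
solve(A, rhs)   # first element is x, second is y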
I'm starting to really enjoy this program!
Wikipedia has a good article on finding the intersection between two line segments with an explicit formula. However, you don't need to know the point of intersection to calculate the angle to the grid (or the axes of the coordinate system). Just compute the angles from your vec.1 and vec.2 to the basis vectors:
e1 <- c(1, 0)
e2 <- c(0, 1)
as you have done.
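For example, a minimal sketch using atan2 (which handles the quadrant sign automatically) to express each line's direction relative to the x-axis on a 0-360 degree scale:
angle.1 <- (atan2(vec.1[2], vec.1[1]) * 180/pi) %% 360   # direction of line d -> b
angle.2 <- (atan2(vec.2[2], vec.2[1]) * 180/pi) %% 360   # direction of line a -> c
angle.1 - angle.2                                        # roughly 67.7, matching alpha above
Note these are mathematical angles measured counter-clockwise from east; converting them to meteorological wind directions (clockwise from north) needs an extra step.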
I have occurrence points for a species, and I'd like to remove potential sampling bias (where some regions might have a much greater density of points than others). One way to do this would be to keep the largest subset of points that are no less than a certain distance X from each other. Essentially, I would prevent points from being too close to each other.
Are there any existing R functions to do this? I've searched through various spatial packages, but haven't found anything, and can't figure out exactly how to implement this myself.
An example occurrence point dataset can be downloaded here.
Thanks!
I've written a new version of this function that no longer really follows rMaternII.
The input can either be a SpatialPoints, SpatialPointsDataFrame or matrix object.
Seems to work well, but suggestions welcome!
filterByProximity <- function(xy, dist, mapUnits = FALSE) {
  # xy can be either a SpatialPoints or SPDF object, or a matrix
  # dist is in km if mapUnits = FALSE, in map units otherwise
  if (!mapUnits) {
    d <- spDists(xy, longlat = TRUE)
  }
  if (mapUnits) {
    d <- spDists(xy, longlat = FALSE)
  }
  diag(d) <- NA
  close <- (d <= dist)
  diag(close) <- NA
  closePts <- which(close, arr.ind = TRUE)
  discard <- matrix(nrow = 2, ncol = 2)
  if (nrow(closePts) > 0) {
    while (nrow(closePts) > 0) {
      if ((!paste(closePts[1, 1], closePts[1, 2], sep = '_') %in%
           paste(discard[, 1], discard[, 2], sep = '_')) &
          (!paste(closePts[1, 2], closePts[1, 1], sep = '_') %in%
           paste(discard[, 1], discard[, 2], sep = '_'))) {
        discard <- rbind(discard, closePts[1, ])
        closePts <- closePts[-union(which(closePts[, 1] == closePts[1, 1]),
                                    which(closePts[, 2] == closePts[1, 1])),
                             , drop = FALSE]   # drop = FALSE keeps the matrix shape
      }
    }
    discard <- discard[complete.cases(discard), , drop = FALSE]
    return(xy[-discard[, 1], ])
  }
  if (nrow(closePts) == 0) {
    return(xy)
  }
}
Let's test it:
require(rgeos)
require(sp)
pts <- readWKT("MULTIPOINT ((3.5 2), (1 1), (2 2), (4.5 3), (4.5 4.5), (5 5), (1 5))")
pts2 <- filterByProximity(pts,dist=2, mapUnits=T)
plot(pts)
axis(1)
axis(2)
apply(as.data.frame(pts),1,function(x) plot(gBuffer(SpatialPoints(coords=matrix(c(x[1],x[2]),nrow=1)),width=2),add=T))
plot(pts2,add=T,col='blue',pch=20,cex=2)
There is also an R package called spThin that performs spatial thinning on point data. It was developed for reducing the effects of sampling bias for species distribution models, and it does multiple iterations for optimization. The functions are quite easy to use; the vignette can be found here. There is also a paper in Ecography with details about the technique.
Following Josh O'Brien's advice, I looked at spatstat's rMaternI function, and came up with the following. It seems to work pretty well.
The distance is in map units. It would be nice to incorporate one of R's distance functions that always returns distances in meters, rather than input units, but I couldn't figure that out...
require(spatstat)
require(maptools)
occ <- readShapeSpatial('occurrence_example.shp')
filterByProximity <- function(occ, dist) {
pts <- as.ppp.SpatialPoints(occ)
d <- nndist(pts)
z <- which(d > dist)
return(occ[z,])
}
occ2 <- filterByProximity(occ,dist=0.2)
plot(occ)
plot(occ2,add=T,col='blue',pch=20)
Rather than removing data points, you might consider spatial declustering. This involves giving points in clusters a lower weight than outlying points. The two simplest ways to do this involve a polygonal segmentation, like a Voronoi diagram, or some arbitrary grid. Both methods will weight points in each region according to the area of the region.
For example, if we take the points in your test (1,1),(2,2),(4.5,4.5),(5,5),(1,5) and apply a regular 2-by-2 mesh, where each cell is three units on a side, then the five points fall into three cells. The points ((1,1),(2,2)) falling into the cell [0,3] x [0,3] would each have weight 1 / (no. of points in the current cell * total no. of occupied cells) = 1 / (2 * 3). The same goes for the points ((4.5,4.5),(5,5)) in the cell (3,6] x (3,6]. The "outlier", (1,5), would have a weight 1 / (1 * 3). The nice thing about this technique is that it is a quick way to generate a density-based weighting scheme.
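A minimal sketch of that calculation (the 3-unit cells and grid origin come from the example above; assigning points to cells with ceiling() is my own shorthand):
pts <- cbind(x = c(1, 2, 4.5, 5, 1), y = c(1, 2, 4.5, 5, 5))
cell <- paste(ceiling(pts[, "x"] / 3), ceiling(pts[, "y"] / 3))   # which cell each point falls in
w <- 1 / (table(cell)[cell] * length(unique(cell)))               # 1 / (pts in cell * occupied cells)
round(as.numeric(w), 3)                                           # 0.167 0.167 0.167 0.167 0.333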
A polygonal segmentation involves drawing a polygon around each point and using the area of that polygon to calculate the weight. Generally, the polygons completely cover the entire region, and each point's weight is taken proportional to the area of its polygon, so clustered points end up with small polygons and hence small weights. A Voronoi diagram is usually used for this, but polygonal segmentations may be calculated using other techniques, or may be specified by hand.
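And a corresponding sketch of the polygonal (Voronoi) version using the deldir package; the bounding window rw is an assumption, and edge tiles are clipped to it, which affects the outermost weights:
library(deldir)
vor <- deldir(c(1, 2, 4.5, 5, 1), c(1, 2, 4.5, 5, 5), rw = c(0, 6, 0, 6))
w <- vor$summary$dir.area / sum(vor$summary$dir.area)   # weight proportional to tile area
round(w, 3)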
I need to make a 2D plot of distance travelled versus my value at that point ("intensity").
My data is formatted as:
lon lat intensity
1. -85.01478 37.99030 -68.3167
2. -85.00752 37.97601 -68.0247
3. -85.00027 37.96172 -67.9565
4. -84.99302 37.94743 -67.8917
and it continues for 282 rows like this. I was looking at a few packages that calculate distance between longitude (lon) and latitude (lat) points (such as geosphere), but I couldn't understand how to get my data into the format that it wanted. I know the total distance travelled in degrees should be 4.01538, evenly spaced out between the 282 points, but I don't know how I could make a column in R with this in mind.
dfrm$dist <- cumsum(c(0, with(dfrm, sqrt((lat[-1] - lat[-nrow(dfrm)])^2 +
                                         (lon[-1] - lon[-nrow(dfrm)])^2))))
with(dfrm, plot(dist, intensity, type="b"))
Or choose a more "geographic" distance measure with the lagged column values. But given the increments, I doubt the error from using a naive distance measure can be that much.
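For instance, a great-circle variant of the same cumulative distance (a sketch assuming the geosphere package; distHaversine returns metres):
library(geosphere)
dfrm$dist_m <- cumsum(c(0, distHaversine(as.matrix(dfrm[-nrow(dfrm), c("lon", "lat")]),
                                         as.matrix(dfrm[-1, c("lon", "lat")]))))
with(dfrm, plot(dist_m, intensity, type = "b"))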
From here I found some packages to calculate distance between coordinates. Assuming your data is called dtf and using the RSEIS package:
dtf <- data.frame(rbind(c(-85.01478,37.99030,-68.3167),
c(-85.00752,37.97601,-68.0247),c(-85.00027,37.96172,-67.9565),
c(-84.99302,37.94743,-67.8917)))
names(dtf) <- c('lon','lat','int')
library(RSEIS)
travelint <- function(i, data){
  ddeg <- GreatDist(data$lon[i], data$lat[i], data$lon[i+1], data$lat[i+1])$ddeg
  dint <- data$int[i+1] - data$int[i]   # intensity change between consecutive points
  list(ddeg, dint)
}
out <- sapply(1:(nrow(dtf)-1), travelint, data = dtf)
out <- data.frame(matrix(as.numeric(out), ncol = 2, byrow = TRUE))
out$X1 <- cumsum(out$X1)
This will take your data, calculate the distance traveled between points and the intensity change between them. After that it can be plotted like this:
library(ggplot2)
ggplot(out, aes(X1, X2)) + geom_line() +
  labs(x = "Distance (Degrees)", y = "Intensity Change")
If instead you want the intensity itself rather than the change, you can use cumsum again to get the cumulative change in intensity and then add it to the first intensity:
out2 <- out
out2 <- rbind(c(0,0),out2)
out2$X2 <- cumsum(out2$X2) + dtf$int[1]
ggplot(out2,aes(X1,X2)) + geom_line() +
labs(x="Distance (Degrees)",y="Intensity")
As mentioned by DWin, you can use a naive measure or a geographic distance measure. Here I am using the gdist function from the Imap package, which calculates great-circle distance.
library(Imap)
library(lattice)
#Dummy data
longlat <- read.table(text="lon lat intensity
1. -85.01478 37.99030 -68.3167
2. -85.00752 37.97601 -68.0247
3. -85.00027 37.96172 -67.9565
4. -84.99302 37.94743 -67.8917", header=TRUE)
ll <- lapply(seq(nrow(longlat)-1), function(x){
start <- longlat[x,]
end <- longlat[x+1,]
cbind(distance = gdist(start$lon, start$lat, end$lon, end$lat,units = "m"),
intensity = end$intensity - start$intensity)
})
dd <- as.data.frame(do.call(rbind,ll))
xyplot(intensity~distance,dd,type= c('p','l'),pch=20,cex=2)
I have multiple matrices filled with the x and y coordinates of multiple points in 2D space that make up a graph. The matrices look something like this
x1 x2 x3 x4 ...
y1 y2 y3 y4 ...
A possible graph looks something like this
What I want to do is rotate the graph around point A so that the line between points A and B is parallel to the x-axis.
My idea was to treat the line AB as the hypotenuse of a right triangle, calculate α (the angle at point A), and rotate the matrix for this graph by it using a rotation matrix.
What I did so far is the following
#df is the subset of my data that describes the graph we're handling right now,
#df has 2 or more rows
beginx=df[1,]$xcord #get the x coordinate of point A
beginy=df[1,]$ycord #get the y coordinate of point A
endx=df[nrow(df)-1,]$xcord #get the x coordinate of point B
endy=df[nrow(df)-1,]$ycord #get the y coordinate of point B
xnow=df$xcord
ynow=df$ycord
xdif=abs(beginx-endx)
ydif=abs(beginy-endy)
if((xdif != 0) & (ydif!=0)){
direct=sqrt(abs((xdif^2)-(ydif^2))) #calculate the length of the hypothenuse
sinang=abs(beginy-endy)/direct
angle=1/sin(sinang)
if(beginy>endy){
angle=angle
}else{
angle=360-angle
}
rotmat=rot(angle) # use the function rot(angle) to get the rotation matrix for
# the calculated angle
A = matrix(c(xnow,ynow),nrow=2,byrow = TRUE) # matrix containing the graph coords
admat=rotmat%*%A #multiply the matrix with the rotation matrix
}
This approach fails because it isn't flexible enough to always calculate the needed angle, with the result that the graph is rotated by the wrong angle and/or in the wrong direction.
Thanks in advance for reading; hopefully some of you can bring some fresh ideas to this.
Edit: Data to reproduce this can be found here
X-Coordinates
Y-Coordinates
Not sure how to provide the data you've asked for, I'll gladly provide it in another way if you specify how you'd like it
Like this?
#read in X and Y as vectors
M <- cbind(X,Y)
#plot data
plot(M[,1],M[,2],xlim=c(0,1200),ylim=c(0,1200))
#calculate rotation angle
alpha <- -atan((M[1,2]-tail(M,1)[,2])/(M[1,1]-tail(M,1)[,1]))
#rotation matrix
rotm <- matrix(c(cos(alpha),sin(alpha),-sin(alpha),cos(alpha)),ncol=2)
#shift, rotate, shift back
M2 <- t(rotm %*% (t(M) - c(M[1,1], M[1,2])) + c(M[1,1], M[1,2]))
#plot
plot(M2[,1],M2[,2],xlim=c(0,1200),ylim=c(0,1200))
Edit:
I'll break down the transformation to make it easier to understand. However, it's just basic linear algebra.
plot(M,xlim=c(-300,1200),ylim=c(-300,1200))
#shift points, so that turning point is (0,0)
M2.1 <- t(t(M)-c(M[1,1],M[1,2]))
points(M2.1,col="blue")
#rotate
M2.2 <- t(rotm %*% (t(M2.1)))
points(M2.2,col="green")
#shift back
M2.3 <- t(t(M2.2)+c(M[1,1],M[1,2]))
points(M2.3,col="red")
Instead of a data frame, it looks like your data is better served as a matrix (via as.matrix).
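For example, something along these lines (the xcord/ycord column names are taken from the question):
M <- as.matrix(df[, c("xcord", "ycord")])   # two-column coordinate matrix for the graph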
This answer is very similar to Roland's, but breaks things down into more steps and has some special-case handling when the angle is a multiple of pi/2.
#sample data
set.seed(1) #for consistency of random-generated data
d <- matrix(c(sort(runif(50)),sort(runif(50))),ncol=2)
#rotation about point A
rotA <- function(d) {
  d.offset <- apply(d, 2, function(z) z - z[1])   # offset data
  endpoint <- d.offset[nrow(d.offset), ]          # gets difference
  rot <- function(angle) matrix(
    c(cos(angle), -sin(angle), sin(angle), cos(angle)), nrow = 2)   # CCW rotation matrix
  if (endpoint[2] == 0) {
    return(d)       # if y-diff is 0, then no action required
  } else if (endpoint[1] == 0) {
    rad <- pi/2     # if x-diff is 0, then rotate by a right angle
  } else {
    rad <- atan(endpoint[2]/endpoint[1])
  }
  d.offset.rotate <- d.offset %*% rot(-rad)                            # rotation
  d.rotate <- sapply(1:2, function(z) d.offset.rotate[, z] + d[1, z])  # undo offset
  d.rotate
}
#results and plotting to check visually
d.rotate <- rotA(d)
plot(d.rotate)
abline(h=d[1,2])