I'm having trouble finding a solution to this simple problem. I have been searching the forums, and although I have gotten closer to an answer, it is not exactly what I need.
I'm trying to find, from a set of x,y points, which point is the furthest away from any other point, i.e. not the maximum distance between points, but the point furthest from the rest.
I've tried
x <-c(x1,x2,x3....)
y <-c(y1,y2,y3...)
dist(cbind(x,y))
This gives me a matrix of the distance from each point to every other point. I can interrogate the data in MS Excel and find the answer: find the minimum value in each column, then take the maximum across those minima.
If I were to plot the data, I would like to have as output the distance of either the red or blue line (depending on which is longer).
Starting from this example data set:
set.seed(100)
x <- rnorm(150)
y <- rnorm(150)
coord <- cbind(x,y)
dobj <- dist(coord)
Now dobj is a distance object, but you can't examine that directly. You'll have to convert that to a matrix first, and make sure you don't take zero distances between a point and itself into account:
dmat <- as.matrix(dobj)
diag(dmat) <- NA
The latter line replaces the diagonal values in the distance matrix with NA.
Now you can use the solution of amonk:
dmax <- max(apply(dmat,2,min,na.rm=TRUE))
This gives you the maximum distance to the nearest point. If you want to know which points these are, you can take an extra step:
which(dmat == dmax, arr.ind = TRUE)
# row col
# 130 130 59
# 59 59 130
So point 130 and 59 are the two points fulfilling your conditions. Plotting this gives you:
id <- which(dmat == dmax, arr.ind = TRUE)
plot(coord)
lines(coord[id[1,],], col = 'red')
Note how you get this info twice, as Euclidean distances between two points are symmetric (A -> B is as long as B -> A).
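If you only want each pair reported once, a small sketch reusing the dmat and id objects from above keeps just the row whose row index is smaller than its column index:
id[id[, "row"] < id[, "col"], , drop = FALSE]
#    row col
# 59  59 130
dmat[59, 130]   # the distance itself, equal to dmax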
It looks to me like you have spatial points in some projection. One could argue that the point furthest away from the rest is the one which lies furthest from the center (the mean coordinates):
library(raster)
set.seed(21)
# create fake points
coords <- data.frame(x=sample(438000:443000,10),y=sample(6695000:6700000,10))
# calculate center
center <- matrix(colMeans(coords),ncol=2)
# red = center, magenta = furthest point (Nr.2)
plot(coords)
# furthest point #2
ix <- which.max(pointDistance(coords,center,lonlat = F))
points(center,col='red',pch='*',cex=3)
points(coords[ix,],col='magenta',pch='*',cex=3)
segments(coords[ix,1],coords[ix,2],center[1,1],center[1,2],col='magenta')
To find the point(s) farthest from the rest of the points, you could do something like this. I opted for the median distance since you asked for the point(s) farthest from the rest of the data. If you have a group of points very close to each other, the median should remain robust to this.
There is probably also a way to do this with hierarchical clustering but it is escaping me at the moment.
set.seed(1234)
mat <- rbind(matrix(rnorm(100), ncol=2), c(-5,5), c(-5.25,4.75))
d <- dist(mat)
sort(apply(as.matrix(d), 1, median), decreasing = T)[1:5]
# 51 52 20 12 4
# 6.828322 6.797696 3.264315 2.806263 2.470919
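To see where those two points sit, a quick sketch reusing the mat and d objects (and the indices 51 and 52 from the output above):
idx <- as.integer(names(sort(apply(as.matrix(d), 1, median), decreasing = TRUE)[1:2]))
plot(mat)
points(mat[idx, , drop = FALSE], col = "red", pch = 19)   # the two most isolated points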
I wrote up a handy little function you can use for picking from the largest of line distances. You can specify if you want the largest, second largest, and so forth with the n argument.
getBigSegment <- function(x, y, n = 1){
  a <- cbind(x,y)
  d <- as.matrix(dist(a, method = "euclidean"))
  sorted <- order(d, decreasing = T)
  # each distance appears twice in the symmetric matrix, so keep every other sorted index
  sub <- (1:length(d))[as.logical(1:length(sorted) %% 2)]
  # locate the pair of points separated by the n-th largest distance
  s <- which(d == d[sorted[sub][n]], arr.ind = T)
  t(cbind(a[s[1],], a[s[2],]))
}
With some example data similar to your own you can see:
set.seed(100)
mydata <- data.frame(x = runif(10, 438000, 445000) + rpois(10, 440000),
                     y = runif(10, 6695000, 6699000) + rpois(10, 6996000))
# The function
getBigSegment(mydata$x, mydata$y)
# x y
#[1,] 883552.8 13699108
#[2,] 881338.8 13688458
Below you can see how I would use such a function:
# easy plotting function
pointsegments <- function(z, ...) {
  segments(z[1,1], z[1,2], z[2,1], z[2,2], ...)
  points(z, pch = 16, col = c("blue", "red"))
}
plot(mydata$x, mydata$y) # points
top3 <- lapply(1:3, getBigSegment, x = mydata$x, y = mydata$y) # top3 longest lines
mycolors <- c("black","blue","green") # 3 colors
for(i in 1:3) pointsegments(top3[[i]], col = mycolors[i]) # plot lines
legend("topleft", legend = round(unlist(lapply(top3, dist))), lty = 1,
col = mycolors, text.col = mycolors, cex = .8) # legend
This approach first uses chull to identify extreme_points, the points that lie on the convex hull of the given points. Then, for each of the extreme_points, it calculates the centroid of the remaining extreme_points (excluding that particular point). Finally it selects the point from extreme_points that is furthest away from its corresponding centroid.
foo = function(X = all_points){
  plot(X)
  chull_inds = chull(X)
  extreme_points = X[chull_inds,]
  points(extreme_points, pch = 19, col = "red")
  centroid = t(sapply(1:NROW(extreme_points), function(i)
    c(mean(extreme_points[-i,1]), mean(extreme_points[-i,2]))))
  distances = sapply(1:NROW(extreme_points), function(i)
    dist(rbind(extreme_points[i,], centroid[i,])))
  points(extreme_points[which.max(distances),], pch = 18, cex = 2)
  points(X[chull_inds[which.max(distances)],], cex = 5)
  return(X[chull_inds[which.max(distances)],])
}
set.seed(42)
all_points = data.frame(x = rnorm(25), y = rnorm(25))
foo(X = all_points)
# x y
#18 -2.656455 0.7581632
So, with df as your initial data frame, you can do the following:
df <- NULL # initialize object
for(i in 1:10) # create 10 vectors with 10 pseudorandom numbers each
  df <- cbind(df, runif(10)) # fill the data frame
cordf <- cor(df); diag(cordf) <- NA # create correlation matrix and set diagonal values to NA
Hence:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] NA -0.03540916 -0.29183703 0.49358124 0.79846794 0.29490246 0.47661166 -0.51181482 -0.04116772 -0.10797632
[2,] -0.03540916 NA 0.47550478 -0.24284088 -0.01898357 -0.67102287 -0.46488410 0.01125144 0.13355919 0.08738474
[3,] -0.29183703 0.47550478 NA -0.05203104 -0.26311149 0.01120055 -0.16521411 0.49215496 0.40571893 0.30595246
[4,] 0.49358124 -0.24284088 -0.05203104 NA 0.60558581 0.53848638 0.80623397 -0.49950396 -0.01080598 0.41798727
[5,] 0.79846794 -0.01898357 -0.26311149 0.60558581 NA 0.33295170 0.53675545 -0.54756131 0.09225002 -0.01925587
[6,] 0.29490246 -0.67102287 0.01120055 0.53848638 0.33295170 NA 0.72936185 0.09463988 0.14607018 0.19487579
[7,] 0.47661166 -0.46488410 -0.16521411 0.80623397 0.53675545 0.72936185 NA -0.46348644 -0.05275132 0.47619940
[8,] -0.51181482 0.01125144 0.49215496 -0.49950396 -0.54756131 0.09463988 -0.46348644 NA 0.64924510 0.06783324
[9,] -0.04116772 0.13355919 0.40571893 -0.01080598 0.09225002 0.14607018 -0.05275132 0.64924510 NA 0.44698207
[10,] -0.10797632 0.08738474 0.30595246 0.41798727 -0.01925587 0.19487579 0.47619940 0.06783324 0.44698207 NA
Finally by executing:
max(apply(cordf,2,min,na.rm=TRUE),na.rm = TRUE)#avoiding NA's
one can get:
[1] -0.05275132
the maximum value of the local minima.
Edit:
In order to get the index within the matrix:
> which(cordf==max(apply(cordf,2,min,na.rm=TRUE),na.rm = TRUE))
[1] 68 77
or in order to get the coordinates:
> which(cordf==max(apply(cordf,2,min,na.rm=TRUE),na.rm = TRUE), arr.ind = TRUE)
row col
[1,] 8 7
[2,] 7 8
I would like to present the probabilities of some joint events as a raster using the ggplot2 package, and I wonder how geom_raster decides which value to keep when a cell has more than one value. For various reasons I have cases where these events can have more than one probability. In the code below I illustrate the point of my question at coordinate (10, 10). Does geom_raster consider only the last value? Does it sample?
library(ggplot2)
# Normal raster
r <- data.frame(x = 1:10, y = rep(10, 10), value = 1:10)
p1 <- ggplot(r, aes(x, y, fill=value))+
geom_raster()+
coord_equal()+
theme(legend.position = 'bottom')+
labs(title = 'Normal raster: every cell has one value')
p1
# Assuming that coordinate (10, 10) have values 10 and 0
r <- rbind(r, c(10, 10, 0))
p2 <- ggplot(r, aes(x, y, fill=value))+
geom_raster()+
coord_equal()+
theme(legend.position = 'bottom')+
labs(title = 'Raster having 2 different values (10 then 0) at coordinates (10, 10)')
p2
It appears that just the last value for the cell is used. The logic can be found in the source code, in the draw_panel function of GeomRaster. We see this code:
x_pos <- as.integer((data$x - min(data$x)) / resolution(data$x, FALSE))
y_pos <- as.integer((data$y - min(data$y)) / resolution(data$y, FALSE))
nrow <- max(y_pos) + 1
ncol <- max(x_pos) + 1
raster <- matrix(NA_character_, nrow = nrow, ncol = ncol)
raster[cbind(nrow - y_pos, x_pos + 1)] <- alpha(data$fill, data$alpha)
So what it does is make a matrix with a row and column for every value, then it does an assignment using matrix indexing. When you do this, only the last assignment survives. For example:
(m <- matrix(1:9, nrow=3))
# [,1] [,2] [,3]
# [1,] 1 4 7
# [2,] 2 5 8
# [3,] 3 6 9
(rowcols <- cbind(c(2,3,2), c(3,1,3)))
# [,1] [,2]
# [1,] 2 3
# [2,] 3 1
# [3,] 2 3
m[rowcols] <- 10:12
m
# [,1] [,2] [,3]
# [1,] 1 4 7
# [2,] 2 5 12
# [3,] 11 6 9
What we are doing is creating a matrix, then changing the value of cell (2,3), then (3,1), then (2,3) again. Only the last assignment to (2,3) is preserved (the 10 value is overwritten by 12). So the value kept depends on the order in which your data is passed to the ggplot object.
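If you would rather control this explicitly than rely on row order, one option (a sketch, assuming you want to keep the maximum probability per cell) is to collapse the duplicates yourself before plotting:
# resolve duplicate (x, y) cells up front, e.g. by keeping the maximum value
r_agg <- aggregate(value ~ x + y, data = r, FUN = max)
p3 <- ggplot(r_agg, aes(x, y, fill = value)) +
  geom_raster() +
  coord_equal() +
  theme(legend.position = 'bottom') +
  labs(title = 'Duplicates collapsed with max() before plotting')
p3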
I can make one pseudo-random matrix with the following:
nc=14
nr=14
set.seed(111)
M=matrix(sample(
c(runif(58,min=-1,max=0),runif(71, min=0,max=0),
runif(nr*nc-129,min=0,max=+1))), nrow=nr, nc=nc)
The more important question: I need 1000 matrices with the same number of negative, positive, and zero values; only the locations within the matrices need to vary.
I can make matrices one by one, but I want to do this task faster.
The less important question: once I have the 1000 matrices, I need to identify, for every position in the matrices, how many positive, negative, or zero values occurred there, for example:
MATRIX_A
[,1]
[9,] -0.2
MATRIX_B
[,1]
[9,] -0.5
MATRIX_C
[,1]
[9,] 0.1
MATRIX_D
[,1]
[9,] 0.0
MATRIX_E
[,1]
[9,] 0.9
What I need:
FINAL_MATRIX_positive
[,1]
[9,] (2/5*100)=40% or 0.4 or 2
because out of the 5 matrices, 2 had a positive value at this position; I also need this for negative and zero values.
If it isn't possible to do this in R, I can compare them "manually" in Excel.
Thank you for your help!
Actually you are almost there!
You can try the code below, where replicate repeats the random-matrix generation 1000 times and Reduce collects the statistics for each position:
nc <- 14
nr <- 14
N <- 1000
lst <- replicate(
  N,
  matrix(sample(
    c(
      runif(58, min = -1, max = 0),
      runif(71, min = 0, max = 0),
      runif(nr * nc - 129, min = 0, max = +1)
    )
  ), nrow = nr, nc = nc),
  simplify = FALSE
)
pos <- Reduce(`+`,lapply(lst,function(M) M > 0))/N
neg <- Reduce(`+`,lapply(lst,function(M) M < 0))/N
zero <- Reduce(`+`,lapply(lst,function(M) M == 0))/N
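As a quick sanity check (a small sketch using the objects above), the three proportion matrices should sum to 1 in every cell, and each cell's values should hover around the theoretical shares 58/196, 71/196 and 67/196:
all(abs(pos + neg + zero - 1) < 1e-12)         # TRUE: every cell is classified exactly once
c(neg = 58, zero = 71, pos = 67) / (nr * nc)   # expected proportions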
I use a function for your simulation scheme:
my_sim <- function(n_neg = 58, n_0 = 71, n_pos = 67){
  res <- c(runif(n_neg, min = -1, max = 0),
           rep(0, n_0),
           runif(n_pos, min = 0, max = +1))
  return(sample(res))
}
Then, I simulate your matrices (I store them in a list):
N <- 1000
nr <- 14
nc <- nr
set.seed(111)
my_matrices <- list()
for(i in 1:N){
  my_matrices[[i]] <- matrix(my_sim(), nrow = nr, ncol = nc)
}
Finally, I compute the proportion of positive numbers for the position row 1 and column 9:
sum(sapply(my_matrices, function(x) x[1,9]) > 0)/N
# [1] 0.366
However, if you are interested in all the positions, these lines will do the job:
aux <- lapply(my_matrices, function(x) x > 0)
FINAL_MATRIX_positive <- 0
for(i in 1:N){
  FINAL_MATRIX_positive <- FINAL_MATRIX_positive + aux[[i]]
}
FINAL_MATRIX_positive <- FINAL_MATRIX_positive/N
# row 1, column 9
FINAL_MATRIX_positive[1, 9]
# [1] 0.366
I want to find the minimum value in my distance matrix in order to program the single linkage algorithm for cluster analysis in R. But the output doesn't show the coordinates (row number and column number) that identify the minimum.
I tried the "which" command to solve this.
This seems to be the right approach:
> x <- matrix(c(1, 2, 0, 4), nrow=2, ncol=2)
> which(x == min(x), arr.ind=TRUE)
row col
[1,] 1 2
I tried it with my own data, but I don't get the right output:
> which(distance.matrix.euc==min(distance.matrix.euc), arr.ind=TRUE)
row col
I expect that R shows me the coordinates where the minimum value is in the distance matrix, but it shows nothing.
Do you have an idea what's wrong?
If you create the distance.matrix.euc with the dist function in R, then its class will be dist, not a matrix.
set.seed(2)
x <- matrix(sample(1:10, 6, replace = FALSE), nrow=3)
x
# [,1] [,2]
# [1,] 5 1
# [2,] 6 10
# [3,] 9 7
distance_matrix <- dist(x)
distance_matrix
# 1 2
# 2 9.055385
# 3 7.211103 4.242641
class(distance_matrix)
# [1] "dist"
As #akrun suggested, you can convert your distance matrix into the matrix class. Then, the which command should return the closest points.
min_dist <- min(distance_matrix)
distance_matrix <- as.matrix(distance_matrix)
which(distance_matrix==min_dist, arr.ind=TRUE)
# row col
# 3 3 2
# 2 2 3
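As a side note, if the end goal is single-linkage clustering rather than programming the algorithm yourself, a minimal sketch is to pass the dist object straight to R's built-in hclust:
hc <- hclust(dist(x), method = "single")   # single-linkage clustering on the same points
plot(hc)                                   # dendrogram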
I am trying to get some summary statistics (mean, variance and quantiles) from a data vector with tied values. In particular, it is stored as a frequency distribution table: unique data values var and their counts frequency.
I know I could use the rep function to first expand the vector to its full form:
xx <- rep(mydata$var, mydata$frequency)
then do standard
mean(xx)
var(xx)
quantile(xx)
But the frequency is really large and I have many unique values, which makes the program really slow. Is there a way to compute these statistics directly from var and frequency?
set.seed(0)
x <- runif(10) ## unique data values
k <- sample.int(5, 10, TRUE) ## frequency
n <- sum(k)
xx <- rep.int(x, k) ## "expanded" data
#################
## sample mean ##
#################
mean(xx) ## using `xx`
#[1] 0.6339458
mu <- c(crossprod(x, k)) / n ## using `x` and `k`
#[1] 0.6339458
#####################
## sample variance ##
#####################
var(xx) * (n - 1) / n ## using `xx`
#[1] 0.06862544
v <- c(crossprod(x ^ 2, k)) / n - mu * mu ## using `x` and `k`
#[1] 0.06862544
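Note that var(xx) itself is the unbiased sample variance; the lines above rescale it to the population version. If you want the unbiased value directly from x and k, a small follow-up sketch:
var(xx)            ## unbiased sample variance from the expanded data
v * n / (n - 1)    ## the same value recovered from `x` and `k`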
Computing quantiles is much more involved, but doable. We first need to understand how quantiles are computed in the standard way.
xx <- sort(xx)
pp <- seq(0, 1, length = n)
plot(pp, xx); abline(v = pp, col = 8, lty = 2)
The standard quantile computation is a linear interpolation problem. However, when data have ties, we can clearly see that there are "runs" (of the same value) and "jumps" (between two values) in the plot. Linear interpolation is only needed on "jumps", while on "runs" the quantiles are just the run values.
The following function finds quantiles only using x and k. For demonstration purpose there is an argument verbose. If TRUE it will produce a plot and a data frame containing information of "runs" (and "jumps").
find_quantile <- function (x, k, prob = seq(0, 1, length = 5), verbose = FALSE) {
  if (is.unsorted(x)) {
    ind <- order(x); x <- x[ind]; k <- k[ind]
  }
  m <- length(x)    ## number of unique values
  n <- sum(k)       ## number of data
  d <- 1 / (n - 1)  ## break [0, 1] into (n - 1) intervals
  ## the right and left end of each run
  r <- (cumsum(k) - 1) * d
  l <- r - (k - 1) * d
  if (verbose) {
    breaks <- seq(0, 1, d)
    plot(r, x, "n", xlab = "prob (p)", ylab = "quantile (xq)", xlim = c(0, 1))
    abline(v = breaks, col = 8, lty = 2)
    ## sketch each run
    segments(l, x, r, x, lwd = 3)
    ## sketch each jump
    segments(r[-m], x[-m], l[-1], x[-1], lwd = 3, col = 2)
    ## sketch `prob`
    abline(v = prob, col = 3)
    print( data.frame(x, k, l, r) )
  }
  ## initialize the vector of quantiles
  xq <- numeric(length(prob))
  run <- rbind(l, r)
  i <- findInterval(prob, run, rightmost.closed = TRUE)
  ## odd integers in `i` mean that `prob` lies on runs
  ## quantiles on runs are just run values
  on_run <- (i %% 2) != 0
  run_id <- (i[on_run] + 1) / 2
  xq[on_run] <- x[run_id]
  ## even integers in `i` mean that `prob` lies on jumps
  ## quantiles on jumps are linear interpolations
  on_jump <- !on_run
  jump_id <- i[on_jump] / 2
  xl <- x[jump_id]      ## x-value to the left of the jump
  xr <- x[jump_id + 1]  ## x-value to the right of the jump
  pl <- r[jump_id]      ## percentile to the left of the jump
  pr <- l[jump_id + 1]  ## percentile to the right of the jump
  p <- prob[on_jump]    ## probability on the jump
  ## evaluate the line `(pl, xl) -- (pr, xr)` at `p`
  xq[on_jump] <- (xr - xl) / (pr - pl) * (p - pl) + xl
  xq
}
Applying the function to the example data above with verbose = TRUE gives:
result <- find_quantile(x, k, prob = seq(0, 1, length = 5), TRUE)
# x k l r
#1 0.2016819 4 0.0000000 0.1111111
#2 0.2655087 2 0.1481481 0.1851852
#3 0.3721239 1 0.2222222 0.2222222
#4 0.5728534 4 0.2592593 0.3703704
#5 0.6291140 2 0.4074074 0.4444444
#6 0.6607978 5 0.4814815 0.6296296
#7 0.8966972 1 0.6666667 0.6666667
#8 0.8983897 3 0.7037037 0.7777778
#9 0.9082078 2 0.8148148 0.8518519
#10 0.9446753 4 0.8888889 1.0000000
Each row of the data frame is a "run". x gives the run values, k is the run length, and l and r are the left and right percentile of the run. In the figure, "runs" are drawn in black horizontal lines.
Information of "jumps" is implied by the r, x values of a row and the l, x values of the next row. In the figure, "jumps" are drawn in red lines.
The vertical green lines signals the prob values we give.
The computed quantiles are
result
#[1] 0.2016819 0.5226710 0.6607978 0.8983897 0.9446753
which are identical to
quantile(xx, names = FALSE)
#[1] 0.2016819 0.5226710 0.6607978 0.8983897 0.9446753
I am very lost with Euclidean distance calculation. I have found the functions dist2 {SpatialTools} and rdist {fields} to do this, but they don't work as I expected.
I suppose that one point has two coordinates in the Cartesian system, so [x,y].
To measure the distance between 2 points (each defined by a row), I need 4 coordinates for the 2 points, so
point A: [x1,y1]
point B: [x2,y2]
Point coordinates:
A[0,1]
B[0,0]
C[1,1]
D[1,1]
I have two matrices: x1 (containing A and C, defined by rows) and x2 (containing B and D). Written as matrices:
library("SpatialTools")
x1<-matrix(c(0,1,1,1), nrow = 2, ncol=2, byrow=TRUE)
x2<-matrix(c(0,0,1,1), nrow = 2, ncol=2, byrow=TRUE)
so I obtain
> x1
[,1] [,2]
[1,] 0 1 #(as xy coordinates of A point)
[2,] 1 1 #(same for C point)
> x2
[,1] [,2]
[1,] 0 0 #(same for B point)
[2,] 1 1 #(same for D point)
To calculate euclidean distance between
A <-> B # same as x1[1,] <-> x2[1,]
C <-> D # same as x1[2,] <-> x2[2,]
I expect to obtain EuclidDist:
   x1                     x2                    EuclidDist
     [,1] [,2]                 [,1] [,2]
[1,]    0    1  #A        [1,]    0    0  #B             1
[2,]    1    1  #C        [2,]    1    1  #D             0
I would just like to obtain a vector of distances between pairs of points identified by their [x,y] coordinates; however, using dist2 I obtain a matrix:
> dist2(x1,x2)
[,1] [,2]
[1,] 1.000000 1
[2,] 1.414214 0
My question is: which numbers in this matrix describe the real Euclidean distances A-B and C-D? Am I misunderstanding something? Thank you very much for any advice or explanation.
If you just want a vector, something like this will work for you:
euc.dist <- function(x1, x2) sqrt(sum((x1 - x2) ^ 2))
library(foreach)
foreach(i = 1:nrow(x1), .combine = c ) %do% euc.dist(x1[i,],x2[i,])
This will work for any dimensions.
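With the x1 and x2 from the question, this returns the two corresponding distances:
# [1] 1 0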
If you don't want to use foreach, you can use a simple loop:
dist <- NULL
for(i in 1:nrow(x1)) dist[i] <- euc.dist(x1[i,],x2[i,])
dist
Although, I would recommend foreach (because it's very easy to use for various tasks like this). Read more about it in the package documentation.
The diagonal is what you're looking for. The output matrix of dist2 shows the distance between all points. The row number in the output corresponds to the row in the first input, and the column of the output corresponds to the row in the second input. Here's a diagram, hope it makes sense (this is the kind of thing I wish Stack Overflow supported MathJax for):
       ( A_x A_y     C_x C_y )     ( AC AD )
dist2  ( B_x B_y  ,  D_x D_y )  =  ( BC BD )
dist2  (   x1     ,    x2    )  =   result
In your case, you want the distance from the first point of x1 to the first point of x2, then the second point of x1 to the second point of x2, hence the diagonal.
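So for the matrices above, a quick sketch that pulls out just the A-B and C-D distances from the dist2 output is:
diag(dist2(x1, x2))
# [1] 1 0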
If you have a lot of data, and you only care about the corresponding pairs, you'll be much better off calculating this directly:
> x1 <- matrix(c(0, 1, 1, 1), ncol = 2, byrow = T)
> x2 <- matrix(c(0, 0, 1, 1), ncol = 2, byrow = T)
> sqrt(rowSums((x1 - x2)^2))
[1] 1 0
If you've got a whole lot of data (millions of points), it might be worth using foreach like #Shambho suggests.
library(rgdal)
library(sp)
## COORDINATES: A DATA FRAME THAT CONTAINS THE LATITUDE (LAT) AND LONGITUDE
## (LON) IN THE COORDINATE REFERENCE SYSTEM (CRS) WGS84.
coordinates(COORDINATES) <- ~ LON + LAT
proj4string(COORDINATES) <- CRS("+proj=longlat +datum=WGS84") #ASSIGN THE CRS
Zone <- input$Zone #UTM ZONE FOR YOUR COUNTRY
COORDINATES <- spTransform(COORDINATES, CRS(paste("+proj=utm", " +zone=", Zone,
                 " +ellps=WGS84", " +datum=WGS84", " +units=m",
                 sep = ""))) #REPROJECT THE CRS
COORDINATES <- as.data.frame(COORDINATES)
X <- COORDINATES$LON #EXTRACT THE LONGITUDE VECTOR
Y <- COORDINATES$LAT #EXTRACT THE LATITUDE VECTOR
MX1 <- X %*% t(X) #CREATE A MATRIX FROM THE LONGITUDE VECTOR
MX2 <- matrix(rep(t(X), nrow(COORDINATES)), ncol = nrow(COORDINATES),
              nrow = nrow(COORDINATES)) #CREATE A MATRIX OF THE REPEATED LONGITUDE VECTOR
MX <- MX1/MX2 #DEFINITIVE MATRIX FOR THE LONGITUDE VECTORS
MX <- abs((MX-MX2)**2) #SQUARED DIFFERENCES OF THE LONGITUDES
colnames(MX) <- paste(COORDINATES$STATION) #ASSIGN COLNAMES
rownames(MX) <- paste(COORDINATES$STATION) #ASSIGN ROWNAMES
MY1 <- Y %*% t(Y) #CREATE A MATRIX FROM THE LATITUDE VECTOR
MY2 <- matrix(rep(t(Y), nrow(COORDINATES)), ncol = nrow(COORDINATES),
              nrow = nrow(COORDINATES)) #CREATE A MATRIX OF THE REPEATED LATITUDE VECTOR
MY <- MY1/MY2 #DEFINITIVE MATRIX FOR THE LATITUDE VECTORS
MY <- abs((MY-MY2)**2) #SQUARED DIFFERENCES OF THE LATITUDES
colnames(MY) <- paste(COORDINATES$STATION) #ASSIGN COLNAMES
rownames(MY) <- paste(COORDINATES$STATION) #ASSIGN ROWNAMES
EUCLIDEAND <- round((sqrt(MX+MY)/1000), digits = 0) #EUCLIDEAN DISTANCE (KM) FOR THESE COORDINATES
EUCLIDEAND <- as.data.frame(EUCLIDEAND)
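For comparison, once the coordinates have been reprojected to UTM (so the units are metres), the same pairwise matrix in kilometres can be obtained more directly with dist(); a sketch assuming the reprojected COORDINATES data frame from above:
#PAIRWISE EUCLIDEAN DISTANCES (KM) STRAIGHT FROM THE PROJECTED COORDINATES
D <- as.matrix(dist(COORDINATES[, c("LON", "LAT")])) / 1000
D <- round(D, digits = 0)
colnames(D) <- rownames(D) <- paste(COORDINATES$STATION)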
You can always just apply the true equation (written for the sqldf package, but it can easily be converted):
sum(SQRT(power(a.LONG-b.lon,2)+power(a.LAT-b.lat,2))) AS DISTANCE
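The same equation in base R (a sketch assuming a and b are data frames with matching rows and the column names used in the query) would be:
distance <- sum(sqrt((a$LONG - b$lon)^2 + (a$LAT - b$lat)^2))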