Distances between two lists of position vectors - r

I am trying to get a matrix that contains the distances between the points in two lists.
The vector of points contain the latitude and longitude, and the distance can be calculated between any two points using the function distCosine in the geosphere package.
> Points_a
lon lat
1 -77.69271 45.52428
2 -79.60968 43.82496
3 -79.30113 43.72304
> Points_b
lon lat
1 -77.67886 45.48214
2 -77.67886 45.48214
3 -77.67886 45.48214
4 -79.60874 43.82486
I would like to get a matrix out that would look like:
d_11 d_12 d_13
d_21 d_22 d_23
d_31 d_32 d_33
d_41 d_42 d_43
I am struggling to think of a way to generate the matrix without just looping over Points_a and Points_b and calculating each combination, can anyone suggest a more elegant solution?

You can use this:
outer(seq(nrow(Points_a)),
seq(nrow(Points_b)),
Vectorize(function(i, j) distCosine(Points_a[i,], Points_b[j,]))
)
(based on a tip by @CarlWitthoft)
Judging by the desired output you posted, you may want the transpose t() of this, or simply swap _a and _b above.
EDIT: some explanation:
seq(nrow(Points_x)): creates a sequence from 1 to the number of rows of Points_x;
distCosine(Points_a[i,], Points_b[j,]): expression to compute the distance between points given by row i of Points_a and row j of Points_b;
function(i, j): makes the above an unnamed function in two parameters;
Vectorize(...): ensures that, given inputs i and j of length greater than one, the unnamed function above is called once for each pair of corresponding elements of the vectors (see this for more info);
outer(x, y, f): creates "expanded" vectors x and y such that all combinations of their elements are present, and calls f using this input (see link above). The result is then reassembled into a nice matrix.
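To try the outer/Vectorize pattern without installing anything, here is a minimal sketch. It assumes a hand-written stand-in, dist_cos, for geosphere's distCosine (spherical law of cosines, radius 6378137 m); that helper is hypothetical and only there to make the example self-contained. The arguments are swapped relative to the code above so that the result has the 4 x 3 layout shown in the question.

```r
# Hypothetical stand-in for geosphere::distCosine: great-circle distance in
# metres via the spherical law of cosines; points are (lon, lat) in degrees.
dist_cos <- function(p1, p2, r = 6378137) {
  to_rad <- pi / 180
  lon1 <- p1[[1]] * to_rad; lat1 <- p1[[2]] * to_rad
  lon2 <- p2[[1]] * to_rad; lat2 <- p2[[2]] * to_rad
  cos_angle <- sin(lat1) * sin(lat2) +
    cos(lat1) * cos(lat2) * cos(lon1 - lon2)
  # Clamp to [-1, 1] to guard against rounding error before acos().
  acos(pmin(pmax(cos_angle, -1), 1)) * r
}

Points_a <- data.frame(lon = c(-77.69271, -79.60968, -79.30113),
                       lat = c(45.52428, 43.82496, 43.72304))
Points_b <- data.frame(lon = c(-77.67886, -77.67886, -77.67886, -79.60874),
                       lat = c(45.48214, 45.48214, 45.48214, 43.82486))

# Rows index Points_b, columns index Points_a.
d <- outer(seq_len(nrow(Points_b)), seq_len(nrow(Points_a)),
           Vectorize(function(i, j) dist_cos(Points_b[i, ], Points_a[j, ])))
dim(d)
```

The first three rows of d come out identical here because the first three rows of Points_b are the same point.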

Related

How to add a constant in a for loop by keeping the original matrix in each iteration?

For example,
x<-matrix(c(1,2,3,4),2,2)
1 2
3 4
I want to add the constant "c" to each element of the matrix separately like this.
Iteration 1
1+c 2
3 4
Iteration 2
1 2+c
3 4
Iteration 3
1 2
3+c 4
Iteration 4
1 2
3 4+c
I have tried the following R code, but it keeps the updated value when it performs the next iteration.
x= matrix of order nxm
for(i in 1:r)
{
for(j in 1:c)
{
x[i,j]=x[i,j]+c
print(x)
}
}
In this code the values keep getting updated, so each iteration prints a matrix that still contains the earlier additions.
Please help me... Thanks in Advance.
R prefers array operations.
Any matrix x is just an array of its entries, laid out column by column. You may successively add the constant c to the first, second, third, ... entry to copies of x, so that the original x remains unchanged. Do this by constructing arrays of the same length as x with all zero entries except for c in the desired location. The code shown at the end of this post does this by concatenating a bunch of zeros, c, and more zeros so that c appears in position i:
c(rep(0,i-1), cnst, rep(0,n-i))
If you loop with i=1, 2, 3, etc, the results will work down through each column of x, moving left to right. To do the operations in the order presented in the question, which works through each row, moving top to bottom, simply apply the procedure to the transpose of x and transpose the outputs.
Even for large matrices, this approach of adding an entire array is at least twice as fast on my system as adding c just to the i position of a copy of x.
Here is R code for the general procedure. It works on any non-empty matrix x. Beware: the output consists of length(x) copies of x and therefore can be quite large. In this example--which takes about a second to run on my system--x has 10,000 entries and therefore the output has 100,000,000 entries. You might want to test it on smaller matrices first!
x <- matrix(1:(100^2), 100) # Any nonempty matrix
cnst <- 1 # Value to add successively to each term in `x`
#
# The algorithm begins here.
#
n <- length(x)
lapply(1:n, function(i) matrix(as.vector(x)+c(rep(0,i-1),cnst,rep(0,n-i)), nrow(x)))
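As a small illustration of the transpose trick described above, here is a sketch on the 2 x 2 matrix from the question. It assumes the matrix is meant to print as 1 2 / 3 4 (hence byrow = TRUE) and uses an arbitrary constant of 10; both choices are assumptions made only for this example.

```r
x <- matrix(c(1, 2, 3, 4), 2, 2, byrow = TRUE)  # prints as 1 2 / 3 4
cnst <- 10                                      # arbitrary constant to add

xt <- t(x)        # work on the transpose so positions run along rows of x
n <- length(xt)
res <- lapply(seq_len(n), function(i) {
  # Add cnst at position i only, then transpose back to x's orientation.
  t(matrix(as.vector(xt) + c(rep(0, i - 1), cnst, rep(0, n - i)), nrow(xt)))
})

res[[2]]  # second iteration: only the (1, 2) entry is changed
```

The original x is never modified, so each element of res differs from x in exactly one entry.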
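You just need to make a copy of the matrix and restore it before each update:
x_safely_stored <- x # untouched copy of the original n x m matrix
for (i in seq_len(nrow(x))) {
  for (j in seq_len(ncol(x))) {
    x <- x_safely_stored # start again from the original
    x[i, j] <- x[i, j] + cnst # cnst is the constant to add; the name avoids clashing with c()
    print(x)
  }
}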

Convert a one column matrix to n x c matrix

I have a (nxc+n+c) by 1 matrix. I want to drop the last n+c rows and convert the rest into an nxc matrix. Below is what I've tried, but it returns a matrix in which every element of a row is the same. I'm not sure why this is. Could someone help me out please?
tmp=x[1:n*c,]
Membership <- matrix(tmp, nrow=n, ncol=c)
x holds n*c + n + c values in a single column, so you can index it like a vector; the trailing comma in your extraction is not needed.
You should do tmp=x[1:(n*c)].
Notice the importance of the parentheses: the colon operator binds more tightly than multiplication, so tmp=x[1:n*c] takes the range from 1 to n, multiplies it by c to give a new set of indices, and then extracts based on those.
For example, you want to avoid:
(1:100)[1:5*5]
[1] 5 10 15 20 25
You can also avoid the index arithmetic altogether:
matrix(head(x, n*c), ncol=c)
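A quick way to see the precedence issue is to run both forms on small, made-up sizes. In this sketch n = 3 and c_cols = 2 are hypothetical values (c_cols is used instead of c only to keep the name distinct from the c() function), and x is filled with consecutive integers so the result is easy to read.

```r
n <- 3
c_cols <- 2
x <- matrix(seq_len(n * c_cols + n + c_cols), ncol = 1)  # 11 x 1 matrix

wrong_idx <- 1:n * c_cols    # (1:n) * c_cols, i.e. 2 4 6
right_idx <- 1:(n * c_cols)  # 1 2 3 4 5 6

# Keep the first n * c_cols values and reshape them column by column.
Membership <- matrix(x[right_idx], nrow = n, ncol = c_cols)
Membership
```

If the values in x are stored row by row rather than column by column, matrix() also accepts byrow = TRUE.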

R looping over two vectors

I have created two vectors in R, using statistical distributions to build the vectors.
The first is a vector of locations on a string of length 1000. That vector has around 10 values and is called mu.
The second vector is a list of numbers, each one representing the number of features at each location mentioned above. This vector is called N.
What I need to do is generate a random distribution for all features (N) at each location (mu)
After some fiddling around, I found that this code works correctly:
for (i in 1:length(mu)){
a <- rnorm(N[i],mu[i],20)
feature.location <- c(feature.location,a)
}
This produces the right output - a list of numbers of length sum(N), and each number is a location figure which correlates with the data in mu.
I found that this only worked when I used concatenate to get the values into a vector.
My question is; why does this code work? How does R know to loop sum(N) times but for each position in mu? What role does concatenate play here?
Thanks in advance.
To try and answer your question directly, c(...) is not "concatenate", it's "combine". That is, it combines its argument list into a vector. So c(1,2,3) is a vector with 3 elements.
Also, rnorm(n,mu,sigma) is a function that returns a vector of n random numbers sampled from the normal distribution. So at each iteration, i,
a <- rnorm(N[i],mu[i],20)
creates a vector a containing N[i] random numbers sampled from Normal(mu[i],20). Then
feature.location <- c(feature.location,a)
appends the elements of that vector to the vector built up in the previous iterations. So at the end, you have a vector with sum(N) elements.
I guess you're sampling from a series of locations, each a variable number of times.
I'm guessing your data looks something like this:
set.seed(1) # make reproducible
N <- ceiling(10*runif(10))
mu <- sample(seq(1000), 10)
> N;mu
[1] 3 4 6 10 3 9 10 7 7 1
[1] 206 177 686 383 767 496 714 985 377 771
Now you want to take a sample from rnorm of length N[i], with mean mu[i] and sd=20, and store all the results in a vector.
The method you're using (growing the vector) is not recommended, as the vector is re-copied in memory each time elements are added. (See Circle 2 of The R Inferno, although for small examples like this, it's not so important.)
First, initialize the storage vector:
f.l <- NULL
for (i in 1:length(mu)){
a <- rnorm(n=N[i], mean=mu[i], sd=20)
f.l <- c(f.l, a)
}
Then, each time, a stores your sample of length N[i] and c() combines it with the existing f.l by adding it to the end.
A more efficient approach is
unlist(mapply(rnorm, N, mu, MoreArgs=list(sd=20)))
This replaces the explicit loop with a single call. unlist() is needed because mapply returns a list of vectors of varying lengths.
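Since rnorm itself accepts a vector of means, another option is to repeat each mean N[i] times and make one call. The sketch below uses small made-up values for N and mu (assumptions for illustration only) and compares the result with the loop; with the same seed the two should produce the same draws, because each value is generated in the same order.

```r
N  <- c(3, 4, 2)        # hypothetical number of features per location
mu <- c(100, 500, 900)  # hypothetical locations

# Loop version, as in the question.
set.seed(123)
loop_result <- NULL
for (i in seq_along(mu)) {
  loop_result <- c(loop_result, rnorm(N[i], mean = mu[i], sd = 20))
}

# Single call: rep() expands mu so that each draw gets its own mean.
set.seed(123)
vec_result <- rnorm(sum(N), mean = rep(mu, times = N), sd = 20)

length(vec_result)  # sum(N)
```

This avoids both the growing vector and the intermediate list that mapply creates.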

How to create a list from an array of z-scores in R?

I have an array of z-scores that is structured like num [1:27, 1:11, 1:467], so there are 467 entries with 27 rows and 11 columns. Is there a way that I can make a list from this array? For example a list of entries which contain a z-score over 2.0 (not just a list of z scores, a list which identifies which 1:467 entries have z > 2).
Say that your array is called z in your R session. The function you are looking for is which with the argument arr.ind set to TRUE.
m <- which(z > 2, arr.ind=TRUE)
This will give you a selection matrix, i.e. a matrix with three columns, each row holding the three indices of an entry with a Z-score greater than 2. To know the number of Z-scores greater than 2 you can do
nrow(m)
# Note that 'sum(z > 2)' is easier.
and to get the values
z[m]
# Note that 'z[z > 2]' is easier
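To answer the "which of the 1:467 entries" part, the third column of m holds the index along the third dimension. Here is a minimal sketch on a small made-up array (the 3 x 2 x 5 shape and the planted values are assumptions, standing in for the real 27 x 11 x 467 data):

```r
# Small stand-in array: all zeros except a few planted high scores.
z <- array(0, dim = c(3, 2, 5))
z[2, 1, 4] <- 3.5
z[1, 2, 2] <- 2.4
z[3, 2, 2] <- 2.1

m <- which(z > 2, arr.ind = TRUE)

# Entries (slices along the third dimension) with at least one z > 2.
entries <- sort(unique(m[, 3]))

# The same thing without the index matrix.
entries_alt <- which(apply(z > 2, 3, any))

entries  # 2 4
```

If a list is wanted, split(as.data.frame(m), m[, 3]) groups the row and column positions by entry.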

A more generalized expand.grid function?

expand.grid(a,b,c) produces all the combinations of the values in a,b, and c in a matrix - essentially filling the volume of a three-dimensional cube. What I want is a way of getting slices or lines out of that cube (or higher dimensional structure) centred on the cube.
So, given that a,b, c are all odd-length vectors (so they have a centre), and in this case let's say they are of length 5. My hypothetical slice.grid function:
slice.grid(a,b,c,dimension=1)
returns a matrix of the coordinates of points along the three central lines. Almost equivalent to:
rbind(expand.grid(a[3],b,c[3]),
expand.grid(a,b[3],c[3]),
expand.grid(a[3],b[3],c))
almost, because it has the centre point repeated three times. Furthermore:
slice.grid(a,b,c,dimension=2)
should return a matrix equivalent to:
rbind(expand.grid(a,b,c[3]), expand.grid(a,b[3],c), expand.grid(a[3],b,c))
which is the three intersecting axis-aligned planes (with repeated points in the matrix at the intersections).
And then:
slice.grid(a,b,c,dimension=3)
is the same as expand.grid(a,b,c).
This isn't so bad with three parameters, but ideally I'd like to do this with N parameters passed to the function, e.g. slice.grid(a,b,c,d,e,f,dimension=4), although it's unlikely I'd ever want dimension greater than 3.
It could be done by doing expand.grid and then extracting those points that are required, but I'm not sure how to build that criterion. And I always have the feeling that this function exists tucked in some package somewhere...
[Edit] Right, I think I have the criterion figured out now: it's to do with how many times the central value appears in each row. If it's less than or equal to your dimension+1...
But generating the full matrix gets big quickly. It'll do for now.
Assuming a, b and c each have length 3 (and if there are 4 variables then they each have length 4 and so on) try this. It works by using 1:3 in place of each of a, b and c and then counting how many 3's are in each row. If there are four variables then it uses 1:4 and counts how many 4's are in each row, etc. It uses this for the index to select out the appropriate rows from expand.grid(a, b, c) :
slice.expand <- function(..., dimension = 1) {
L <- lapply(list(...), seq_along)
n <- length(L)
ix <- rowSums(do.call(expand.grid, L) == n) >= (n-dimension)
expand.grid(...)[ix, ]
}
# test
a <- b <- c <- LETTERS[1:3]
slice.expand(a, b, c, dimension = 1)
slice.expand(a, b, c, dimension = 2)
slice.expand(a, b, c, dimension = 3)
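The function above selects rows by counting how often the index equals the number of variables. For odd-length vectors of any length, as in the question, one possible variant compares each index with the middle position of its own vector instead. This is only a sketch: the name slice_grid_centred is hypothetical, and it assumes every argument has odd length.

```r
slice_grid_centred <- function(..., dimension = 1) {
  args <- list(...)
  idx <- lapply(args, seq_along)
  # Middle position of each vector (assumes odd lengths).
  centre <- vapply(args, function(v) (length(v) + 1) / 2, numeric(1))
  grid_idx <- as.matrix(do.call(expand.grid, idx))
  # TRUE where a coordinate sits at the centre of its own axis.
  at_centre <- sweep(grid_idx, 2, centre, "==")
  # Keep rows in which at most `dimension` coordinates are off-centre.
  keep <- rowSums(at_centre) >= (length(args) - dimension)
  expand.grid(...)[keep, , drop = FALSE]
}

a <- b <- cc <- 1:5
nrow(slice_grid_centred(a, b, cc, dimension = 1))  # 13: three lines sharing one centre
nrow(slice_grid_centred(a, b, cc, dimension = 2))  # 61: three planes, overlaps counted once
nrow(slice_grid_centred(a, b, cc, dimension = 3))  # 125: the full grid
```

Unlike the rbind of expand.grid calls in the question, this returns each point once. It still builds the full index grid first, so the memory concern raised in the question applies.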

Resources