R: pairwise Euclidean distance between columns of two matrices

The following loop takes too long to run (about 2 minutes per iteration).
tumor_signals is 950000 x 422.
normal_signals is 950000 x 772.
Any ideas for how to speed it up?
for(i in 1:ncol(tumor_signals)){
x <- as.vector(tumor_signals[,i])
print("Assigned x")
y <- t((t(normal_signals) - x)^2)
print("assigned y")
y <- t(sqrt(colSums(y)))
print("done")
#all_distance <- cbind(all_distance,matrix(distance))
print(i)
}

There's a bug in your code: you don't need to take the transpose of normal_signals, because normal_signals - x already recycles x down each column. As I understand it, you are trying to compute, for all i = 1,2,...,422 and j = 1,2,...,772, the Euclidean distance between tumor_signals[,i] and normal_signals[,j]. You would probably want the results in a 422 x 772 matrix. There's a function rdist() in the package fields that will do this for you:
require(fields)
result <- rdist(t(tumor_signals), t(normal_signals))
Incidentally, a Google search for [R Euclidean distance] would have easily found this package.
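If installing a package is not an option, the same kind of matrix can be sketched in base R with the identity |a - b|^2 = |a|^2 + |b|^2 - 2 a.b. The helper name pairwise_col_dist and the small random matrices below are made up for illustration; this is not a claim about how rdist() works internally, just one way to avoid the loop.

```r
# Hypothetical helper: Euclidean distances between every column of A and
# every column of B (both matrices must have the same number of rows).
pairwise_col_dist <- function(A, B) {
  sqA <- colSums(A^2)                            # squared norm of each column of A
  sqB <- colSums(B^2)                            # squared norm of each column of B
  d2 <- outer(sqA, sqB, "+") - 2 * crossprod(A, B)
  sqrt(pmax(d2, 0))                              # guard against tiny negative round-off
}

# Toy stand-ins for tumor_signals and normal_signals
set.seed(1)
tumor <- matrix(rnorm(50 * 4), nrow = 50)
normal <- matrix(rnorm(50 * 6), nrow = 50)

D <- pairwise_col_dist(tumor, normal)            # 4 x 6 matrix
```

Here D[i, j] holds the distance between tumor[, i] and normal[, j]. Because the subtraction of large, nearly equal numbers can lose precision, this form is best treated as a fast approximation to check against a direct computation on a few columns.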

foreach doesn't change value of raster cell in R

I'm trying to simulate herding behavior in R.
Here's the code
library(raster)
library(sp)
library(foreach)
K=100
sig=0.2
G=0.3
x <- raster(ncol=2000,nrow=2000)
values(x) <- sign(rnorm(4000000,mean=0,sd=0.3))
y <- raster(ncol=2000,nrow=2000)
values(y) <- sign(rnorm(4000000,mean=0,sd=0.3))
#plot(x)
ei <- rnorm(4000000)
j=0
while(j < 30) {
for(i in 1:4000000){
ad <- adjacent(x,cell=c(i))[,2]
y[i] <- sign(K*sum(x[ad])+sig*ei[i]+G)
}
x <- y
plot(x)
j = j+1
}
The classic loop approach is too slow.
If I use a foreach loop instead of a classic for loop, the values of y are not changed in each iteration.
I can't figure out how to fix it.
Can someone please help with this?
Thank you
You have a dynamic model in which the output of each (time) step is input for the next step. It is not possible to do that in parallel. But that does not mean you cannot make the model run faster.
Looping over raster cells in R is always going to be slow, so we need to avoid that. Normally a problem like this could be solved with focal (see the code at the bottom), but in this case it is difficult because you effectively use two rasters (x and ei). I will look at implementing multi-layer focal operations in the terra package.
Here is an approach with getValuesFocal. It is much faster (and I use Sys.sleep to slow it down a bit).
library(raster)
set.seed(0)
x <- raster(ncol=200, nrow=200)
values(x) <- sign(rnorm(ncell(x),mean=0,sd=0.3))
y <- raster(x)
values(y) <- sign(rnorm(ncell(x),mean=0,sd=0.3))
ei <- rnorm(ncell(x))
K=100
sig=0.2
G=0.3
for (j in 1:29) {
# with large rasters, you may need to do the below in chunks
v <- getValuesFocal(x, 1, nrow(x), c(3,3))
# only keep the rook neighbors
v <- v[, c(2,4,6,8)]
v <- rowSums(v, na.rm=TRUE)
values(x) <- sign(K*v+sig*ei+G)
plot(x)
Sys.sleep(0.1)
}
This is how you could use focal in similar cases:
w <- matrix(c(0,1,0,1,0,1,0,1,0), 3, 3)
y <- focal(x, w, fun=function(i)sign(K*sum(i)+sig+G))
Also see the cellular automata examples in ?focal
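The idea of updating every cell at once from the previous state can also be sketched without the raster package, on a plain matrix. The helper rook_sum is a made-up name, and treating cells outside the grid as 0 is an assumption; raster may handle the borders of a global longitude/latitude grid differently.

```r
# Hypothetical helper: sum of the four rook neighbours of every cell in a
# plain matrix. Cells outside the grid count as 0 (an assumption).
rook_sum <- function(m) {
  nr <- nrow(m)
  nc <- ncol(m)
  p <- matrix(0, nr + 2, nc + 2)          # zero-padded copy of m
  p[2:(nr + 1), 2:(nc + 1)] <- m
  p[1:nr, 2:(nc + 1)] +                   # neighbour above
    p[3:(nr + 2), 2:(nc + 1)] +           # neighbour below
    p[2:(nr + 1), 1:nc] +                 # neighbour to the left
    p[2:(nr + 1), 3:(nc + 2)]             # neighbour to the right
}

set.seed(0)
n <- 200
x <- matrix(sign(rnorm(n * n, mean = 0, sd = 0.3)), n, n)
ei <- matrix(rnorm(n * n), n, n)
K <- 100
sig <- 0.2
G <- 0.3

for (j in 1:29) {
  # every cell is computed from the previous state in one vectorized step
  x <- sign(K * rook_sum(x) + sig * ei + G)
}
```

The time steps still run one after another, as the model requires; only the work within a step is vectorized.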

Is there any way I can optimize this R code?

This is code I'm trying to run in RStudio. I know the loop has far too many iterations. Is there a faster way to do this? It has been running for 4+ hours and doesn't look like it will finish any time soon.
I'm trying to make a distance matrix between 415 cities and 3680126 monuments. To optimize, I am only comparing those monuments with cities which are present in the same country.
for(x in 1:3680126){
for(y in 1:415){
if(list2_cities$Country[y]==list1_POI$Country[x]){
distance_matrix [x,y] <- ({POI$Longitude[x]-cities$Longitude[y]}^2)+({POI$Latitude[x]-cities$Latitude[y]}^2)
}
else{
distance_matrix [x,y] <- 0
}
}
}
Maybe you can try distm from package geosphere
library(geosphere)
d <- distm(list1_POI[c("Longitude","Latitude")],list2_cities[c("Longitude","Latitude")])
m <- +(outer(list1_POI$Country,list2_cities$Country,`==`))
res <- d*m
where
the distm part gives all pairwise distances between the POIs and the cities
the outer part provides a mask such that values for pairs in different countries are set to 0
If your desired matrix is sparse, here is another option
common <- intersect(list1_POI$Country,list2_cities$Country)
rl <- match(common,list1_POI$Country)
cl <- match(common,list2_cities$Country)
d <- diag(distm(list1_POI[rl,c("Longitude","Latitude")],list2_cities[cl,c("Longitude","Latitude")]))
res <- matrix(0,length(list1_POI$Country),length(list2_cities$Country))
res[cbind(rl,cl)] <- d
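where you only need to locate the matched entries and calculate their distances. Note that match returns only the first matching row for each country, so this fills one POI/city pair per common country.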
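Another way to avoid the full 3680126 x 415 matrix is to build one small block per country. The sketch below uses only base R; haversine_m is a hypothetical helper that uses the haversine formula on a sphere (an assumption, and not necessarily the method distm uses by default), and the toy data frames are made up, reusing the column names from the question.

```r
# Great-circle distance in metres (haversine formula on a sphere of radius r).
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Made-up toy data
poi <- data.frame(Country = c("FR", "FR", "DE"),
                  Longitude = c(2.29, 4.83, 13.40),
                  Latitude = c(48.86, 45.76, 52.52),
                  stringsAsFactors = FALSE)
cities <- data.frame(Country = c("FR", "DE"),
                     Longitude = c(2.35, 13.40),
                     Latitude = c(48.85, 52.52),
                     stringsAsFactors = FALSE)

common <- intersect(poi$Country, cities$Country)
blocks <- lapply(common, function(cc) {
  p <- poi[poi$Country == cc, ]
  k <- cities[cities$Country == cc, ]
  # rows = POIs of this country, columns = cities of this country
  outer(seq_len(nrow(p)), seq_len(nrow(k)), function(i, j)
    haversine_m(p$Longitude[i], p$Latitude[i], k$Longitude[j], k$Latitude[j]))
})
names(blocks) <- common
```

Each element of blocks is a distance matrix for one country, so pairs from different countries are never computed or stored.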

How to efficiently apply a two variable function to data.frame( or matrix) elements - to fill an output matrix?

I am trying to find more efficient way(s) to fill an output matrix by applying a function to elements in a data.frame. I experimented with the apply() family functions and the outer() function but couldn't make them work.
Maybe someone here might be able to help? Here's a simplified version of my script. Thanks!
set.seed(192)
n = 1000
distMatrix <- matrix(nrow=n,ncol=n)
# Co-ordinates
coord <- data.frame(x = runif(n,min=0,max=n),
y = runif(n,min=0,max=n))
# Distance Function
distance <- function(A,B) { sqrt( (A['x']-B['x'])^2 + (A['y']-B['y'])^2 ) }
# Fill distMatrix -- this part could use better programming. Note that I am only
# filling the upper triangular part of distMatrix.
for (r in 1:(n-1)) {
for (c in (r+1):n) {
distMatrix[[r,c]] <- distance(coord[r,],coord[c,])
}
}
You can use:
distFun <- function(A,B)
sqrt(
(coord[A, "x"] - coord[B, "x"]) ^ 2 +
(coord[A, "y"] - coord[B, "y"]) ^ 2
)
distMatrix <- outer(1:nrow(coord), 1:nrow(coord), distFun)
Notice that we need to pass outer two vectors. Here we use the indices of the rows of the data frame. outer then produces two new vectors that together represent every possible combination of our original vectors, and passes those to our function. Our function then pulls the relevant coordinates for our calculations (here coord is assumed to be defined ahead of the function).
One key thing to understand when using outer is that our function is called only once. outer just computes the vector inputs assuming our function is vectorized, and then applies the corresponding dimensions to the result.
Also, look at ?dist:
dist(coord)
Try it with a smaller matrix (maybe 10 by 10) to see the result.
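A small check of both points, on a 10 by 10 case: the counter calls is added purely for illustration, to show that outer invokes the function a single time, and the result is compared with dist.

```r
set.seed(192)
n <- 10
coord <- data.frame(x = runif(n, min = 0, max = n),
                    y = runif(n, min = 0, max = n))

calls <- 0   # illustration only: counts how often distFun is invoked
distFun <- function(A, B) {
  calls <<- calls + 1
  sqrt((coord[A, "x"] - coord[B, "x"])^2 +
       (coord[A, "y"] - coord[B, "y"])^2)
}

distMatrix <- outer(seq_len(n), seq_len(n), distFun)

# dist() returns a compact "dist" object; as.matrix() expands it to n x n
same <- all.equal(distMatrix, as.matrix(dist(coord)), check.attributes = FALSE)
```

After running this, calls is 1 and same is TRUE, which is why the function passed to outer has to be vectorized.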

R : how to use variables for vector indices?

I'm a new user of R, and I'm trying to generate a k-moving-average graph of a sine function with added random noise (in the range [-0.5,+0.5]).
So what I have to do is calculate the mean of (2*k+1) consecutive elements of the noisy sine vector. However, the line marked "HELP" in the code below is not working as I expected... :(
The code seems to calculate the mean of 1 through (i-k)th element.
What's wrong with it? Help please!
set.seed(1)
x = seq(0,2*pi,pi/50)
sin_graph <- sin(x)
noise <- runif(101, -0.5, 0.5)
sin_noise <- sin_graph + noise
plot(x,sin_noise, ylim=c(-2,2))
lines(x,sin_graph, col="red")
k<-1
MA<-0
while (k<=1){
i <- k+1
MA_vector <- rep(NA, times=101)
while (i<=101-k){
MA_vector[i] <- mean(sin_noise[i-k:i+k]) #HELP!
i <- i+1
}
print(MA_vector)
plot(x, MA_vector, ylim=c(-2,2))
lines(x,sin_graph, col="red")
k<-k+1
}
As it stands, it's subtracting the vector k:i from i and then adding k, because : takes precedence over the arithmetic operators + and -. With parentheses (see code below), i-k and i+k are evaluated first, and : then creates the sequence between those two values. With that change, I get a smooth function.
MA_vector[i] <- mean(sin_noise[(i-k):(i+k)])
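The precedence issue is easy to see on small numbers. The second half of this sketch is an optional alternative that was not part of the original answer: stats::filter computes the same centered moving average without an explicit loop.

```r
i <- 5
k <- 1
i - k:i + k        # parsed as i - (k:i) + k, giving 5 4 3 2 1
(i - k):(i + k)    # the intended window, giving 4 5 6

# Same noisy sine as in the question
set.seed(1)
x <- seq(0, 2 * pi, pi / 50)
sin_noise <- sin(x) + runif(length(x), -0.5, 0.5)

# Centered moving average over 2*k+1 points; the first and last k
# entries are NA because the window does not fit there.
w <- 2 * k + 1
ma <- as.numeric(stats::filter(sin_noise, rep(1 / w, w), sides = 2))
```

With k = 1, ma[2] equals mean(sin_noise[1:3]), matching what the corrected loop stores in MA_vector[2].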

What is R's crossproduct function?

I feel stupid asking, but what is the intent of R's crossprod function with respect to vector inputs? I wanted to calculate the cross-product of two vectors in Euclidean space and mistakenly tried using crossprod .
One definition of the vector cross-product is N = |A|*|B|*sin(theta) where theta is the angle between the two vectors. (The direction of N is perpendicular to the A-B plane). Another way to calculate it is N = Ax*By - Ay*Bx .
base::crossprod clearly does not do this calculation, and in fact produces the vector dot-product of the two inputs sum(Ax*Bx, Ay*By).
So, I can easily write my own vectorxprod(A,B) function, but I can't figure out what crossprod is doing in general.
See also R - Compute Cross Product of Vectors (Physics)
According to R's help, crossprod(X, Y) computes t(X) %*% Y, but is a faster implementation than evaluating that expression directly. It is a function of two matrices; if you pass two vectors, the result corresponds to the dot product. @Hong-Ooi's comment explains why it is called crossproduct.
Here is a short code snippet which works whenever the cross product makes sense: the 3D version returns a vector and the 2D version returns a scalar. If you just want simple code that gives the right answer without pulling in an external library, this is all you need.
# Compute the vector cross product between x and y, and return the components
# indexed by i.
CrossProduct3D <- function(x, y, i=1:3) {
# Project inputs into 3D, since the cross product only makes sense in 3D.
To3D <- function(x) head(c(x, rep(0, 3)), 3)
x <- To3D(x)
y <- To3D(y)
# Indices should be treated cyclically (i.e., index 4 is "really" index 1, and
# so on). Index3D() lets us do that using R's convention of 1-based (rather
# than 0-based) arrays.
Index3D <- function(i) (i - 1) %% 3 + 1
# The i'th component of the cross product is:
# (x[i + 1] * y[i + 2]) - (x[i + 2] * y[i + 1])
# as long as we treat the indices cyclically.
return (x[Index3D(i + 1)] * y[Index3D(i + 2)] -
x[Index3D(i + 2)] * y[Index3D(i + 1)])
}
CrossProduct2D <- function(x, y) CrossProduct3D(x, y, i=3)
Does it work?
Let's check a random example I found online:
> CrossProduct3D(c(3, -3, 1), c(4, 9, 2)) == c(-15, -2, 39)
[1] TRUE TRUE TRUE
Looks pretty good!
Why is this better than previous answers?
It's 3D (Carl's was 2D-only).
It's simple and idiomatic.
Nicely commented and formatted; hence, easy to understand
The downside is that the number '3' is hardcoded several times. Actually, this isn't such a bad thing, since it highlights the fact that the vector cross product is purely a 3D construct. Personally, I'd recommend ditching cross products entirely and learning Geometric Algebra instead. :)
The help ?crossprod explains it quite clearly. Take linear regression for example, for a model y = XB + e you want to find X'X, the product of X transpose and X. To get that, a simple call will suffice: crossprod(X) is the same as crossprod(X,X) is the same as t(X) %*% X. Also, crossprod can be used to find the dot product of two vectors.
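These equivalences can be checked on small inputs. The matrices and vectors below are arbitrary examples chosen for illustration.

```r
X <- matrix(1:6, nrow = 3)
Y <- matrix(7:12, nrow = 3)

# crossprod(X, Y) matches t(X) %*% Y
same_as_product <- all.equal(crossprod(X, Y), t(X) %*% Y)

# crossprod(X) is X'X, a symmetric 2 x 2 matrix here
gram <- crossprod(X)

# For two plain vectors the result is a 1 x 1 matrix holding the dot product
a <- c(1, 2, 3)
b <- c(4, 5, 6)
dot <- drop(crossprod(a, b))   # 1*4 + 2*5 + 3*6 = 32
```

drop() is used only to turn the 1 x 1 matrix into a plain number.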
In response to @Bryan Hanson's request, here's some quick-and-dirty code to calculate a vector crossproduct for two vectors in the plane. It's a bit messier to calculate the general 3-space vector crossproduct, or to extend to N-space. If you need those, you'll have to go to Wikipedia :-) .
crossvec <- function(x,y){
if(length(x)!=2 || length(y)!=2) stop('bad vectors')
cv <- x[1]*y[2]-x[2]*y[1]
return(invisible(cv))
}
Here is a minimalistic implementation for 3D vectors:
vector.cross <- function(a, b) {
if(length(a)!=3 || length(b)!=3){
stop("Cross product is only defined for 3D vectors.");
}
i1 <- c(2,3,1)
i2 <- c(3,1,2)
return (a[i1]*b[i2] - a[i2]*b[i1])
}
If you want to get the scalar "cross product" of 2D vectors u and v, you can do
vector.cross(c(u,0),c(v,0))[3]
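A quick sanity check for any cross product routine is that the result is perpendicular to both inputs and changes sign when the arguments are swapped. The sketch below uses a compact helper named cross3 (a made-up name, using the same index trick) so that it runs on its own.

```r
# Hypothetical compact 3D cross product using cyclic index vectors
cross3 <- function(a, b) {
  i1 <- c(2, 3, 1)
  i2 <- c(3, 1, 2)
  a[i1] * b[i2] - a[i2] * b[i1]
}

a <- c(3, -3, 1)
b <- c(4, 9, 2)
n <- cross3(a, b)      # -15 -2 39

sum(n * a)             # 0: n is perpendicular to a
sum(n * b)             # 0: n is perpendicular to b
cross3(b, a)           # 15 2 -39: swapping the arguments flips the sign

# Scalar 2D "cross product": pad with a zero and take the third component
u <- c(1, 0)
v <- c(0, 1)
cross3(c(u, 0), c(v, 0))[3]   # u[1]*v[2] - u[2]*v[1] = 1
```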
There is a useful math operations package named pracma (https://rdrr.io/rforge/pracma/api/ or CRAN https://cran.r-project.org/web/packages/pracma/index.html).
Easy to use and quick. The cross product is given directly by pracma::cross(x, y) for any two 3D vectors.
