I have a list like this:
list1<- list(c(12,45,12,0,0),c(12,45,12,0,1),c(14,45,12,0,2),c(12,15,12,0,3),c(12,45,17,0,4))
I want to iterate through this list using foreach in R. The goal is to compare a random vector like c(1,1,2,0,6) with the vectors in the list. By "compare", I mean I need to calculate the Euclidean distance between the random vector and each vector in the list and find the closest one.
The distances can be calculated efficiently with the dist function.
# a random vector
rvec <- c(1,1,2,0,6)
# a list of coordinates
list1 <- list(c(12,45,12,0,0),
c(12,45,12,0,1),
c(14,45,12,0,2),
c(12,15,12,0,3),
c(12,45,17,0,4))
# calculate distances between the random vector and the list elements;
# t(matrix(unlist(list1), length(list1))) stacks the list vectors as rows,
# and the first length(list1) entries of the dist output are the distances
# from rvec to each of those rows
dist(rbind(rvec, t(matrix(unlist(list1), length(list1)))))[seq_along(list1)]
[1] 46.82948 46.71188 47.12749 20.63977 47.81213
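To recover the closest vector itself rather than just the distances, which.min can index back into the list; a small follow-up sketch:
# position and value of the list element nearest to rvec
d <- dist(rbind(rvec, t(matrix(unlist(list1), length(list1)))))[seq_along(list1)]
list1[[which.min(d)]]
[1] 12 15 12  0  3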
I would like to extract, from a set of reference coordinates, the one closest to a test coordinate.
The task is very similar to a previously posted one (Find the approximate value in the vector), but adapted to n-dimensional cases and with multiple inputs.
In other words, given:
test=t(data.frame(
c(0.9,1.1,1),
c(7.5,7.4,7.3),
c(11,11,11.2)
))
reference=t(data.frame(
c(1,0,0.5),
c(2,2,2),
c(3.3,3.3,3.3),
c(9,9,9),
c(10,11,12)
))
result <- approximate(test, reference)
# desired result, one row per test point:
#  1  0  0.5
#  9  9  9
# 10 11 12
I programmed a function using Euclidean distances and old-school loops, but when the input data frames are big it results in long execution times.
Can anyone figure out a more efficient way of doing it?
Thank you in advance.
PS: This is the function I created; it works but takes a while (in case someone finds it useful):
approximate_function <- function(approximate, reference){
  # Function that returns, for each entry of approximate, the closest value of reference.
  # It uses Euclidean distance.
  # Each entry must be a row in the data frame;
  # the number of columns of the df indicates the dimension of the points.

  # Sub-function to calculate the Euclidean distance
  distance_function <- function(a, b){
    squaresum <- 0
    for(id in seq_along(a)){
      squaresum <- squaresum + (a[id] - b[id])^2
    }
    result <- sqrt(squaresum)
    return(result)
  }

  result <- data.frame()
  # Choose one item to approximate at a time
  for(id_approximate in 1:nrow(approximate)){
    distance <- c()
    # Compare the value to approximate with the reference points and choose the one with the smallest distance
    for(id_reference in 1:nrow(reference)){
      distance[id_reference] <- distance_function(approximate[id_approximate,], reference[id_reference,])
    }
    result <- rbind(
      result,
      reference[which.min(distance),]
    )
  }
  return(result)
}
This way the calculation is done instantly.
approximate_function <- function(approximate, reference){
  # Function that returns, for each entry of approximate, the closest value of reference.
  # It uses Euclidean distance.
  # Each entry must be a row in the data frame;
  # the number of columns of the df indicates the dimension of the points.
  results <- data.frame()
  # Choose one item to approximate at a time
  for(id in 1:nrow(approximate)){
    # calculate Euclidean distances regardless of the dimension
    sumsquares <- rep(0, nrow(reference))
    for(dim in 1:ncol(approximate)){
      sumsquares <- sumsquares + (approximate[id,dim] - reference[,dim])^2
    }
    distances <- sqrt(sumsquares)
    results <- rbind(
      results,
      reference[which.min(distances),]
    )
  }
  return(results)
}
You've got a few calculations that will be slow.
First:
test=t(data.frame(
c(0.9,1.1,1),
c(7.5,7.4,7.3),
c(11,11,11.2)
))
This one probably doesn't matter, but it would be better as
test=rbind(
c(0.9,1.1,1),
c(7.5,7.4,7.3),
c(11,11,11.2)
)
Same for setting up reference.
Second and third: You set up result as a data frame, then add rows to it one at a time. Data frames are much slower for row operations than matrices, and gradually growing a structure in R is slow. So set it up as a matrix of the right size from the beginning, and assign results into specific rows, as sketched below.
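For instance (a sketch, assuming the result should have the same shape as test):
# preallocate the full result matrix once...
result <- matrix(NA_real_, nrow = nrow(test), ncol = ncol(test))
# ...then assign each answer into its own row inside the loop:
# result[id, ] <- reference[which.min(distances), ]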
EDITED to add:
Fourth: there's no need for the inner loop. You can calculate all the squared differences in one big matrix, then use rowSums or colSums to get the squared distances. This is easiest if you're working with matrix columns instead of rows, because vectors will be properly replicated automatically.
Fifth: There's no need to take the square root; if the squared distance is minimized, so is the distance.
Here's the result:
approximate <- function(test, reference){
  # transpose the reference so each column is one reference point
  reference <- t(reference)
  # set up the result, not transposed
  result <- test*NA
  # choose one item to approximate at a time
  for(id in seq_len(nrow(test))){
    squareddist <- colSums((test[id,] - reference)^2)
    result[id,] <- reference[, which.min(squareddist)]
  }
  return(result)
}
Suppose I have a vector of length 10
vec <- c(10,9,8,7,6,5,4,3,2,1)
and I wanted to create a function that takes in a subset length (say 3) and computes the sum of the first elements up to that length, each divided by the square of its index. I would like to compute:
10+(9/(2^2))+(8/(3^2))
which would be
vec[1]+(vec[2]/(2^2))+(vec[3]/(3^2))
but with a function that can take input of the subset length.
The only solution I can think of is a for loop; is there a faster, more elegant solution in R?
Yes, you can use the fact that most operations in R are vectorised to do this without a loop:
vec <- c(10,9,8,7,6,5,4,3,2,1)
cum_inverse_square <- function(vec, n) {
  sum(vec[1:n] / (1:n)^2)
}
cum_inverse_square(vec, 3) == 10+(9/(2^2))+(8/(3^2)) # TRUE
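If you need the value for every subset length at once, the same idea extends with cumsum (a small sketch):
all_n <- cumsum(vec / seq_along(vec)^2)
all.equal(all_n[3], cum_inverse_square(vec, 3)) # TRUE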
So I have a rather complex (at least for me) problem in R.
I want to calculate distances between pairs of distributions, for nearly 10k pairs.
I have a distance function from the package philentropy, which takes two vectors x and y and calculates the distance between them, such as:
d <- distance(x, y, method="desired_method")
Another option is to create a matrix with each row representing a distribution, so that the function will calculate all pairwise distances among all distributions in the matrix:
d <- distance(x, method="desired_method")
I have two correlation matrices, a and b, with nearly 10k rows each, corresponding to 10k correlation distributions. Both matrices have the same number of rows, and my goal is to contrast the first row of matrix a with the first row of matrix b, the second row of a with the second row of b, and so on.
I can select the desired rows and use the first form of distance, or I can merge the two matrices with rbind and compute all pairwise distances with the second form.
The problem is that, with the first approach, I do not know how to write a for loop that iteratively gets the nth row of each matrix, performs the distance calculation, and stores the result in a vector.
Additionally, if I take the second option, I do not want all pairwise distances, but just the distances corresponding to:
d[i,i+nrow(a)]
and to do so iteratively, generating a corresponding vector of nrow(a) values.
Any help?
If you have two matrices, mat_x and mat_y, each with the same number of rows, then the for loop would be:
answer <- vector(mode = 'numeric', length = nrow(mat_x))
for (i in seq_len(nrow(mat_x))){
  answer[[i]] <- distance(mat_x[i,], mat_y[i,], method="desired_method")
}
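If the desired method happens to be plain Euclidean distance, a vectorised base-R alternative avoids the loop entirely; a sketch with small placeholder matrices:
# row-wise Euclidean distances between paired rows, no loop needed
mat_x <- matrix(rnorm(50), nrow = 10)
mat_y <- matrix(rnorm(50), nrow = 10)
answer <- sqrt(rowSums((mat_x - mat_y)^2))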
How can I create a vector of matrices of different dimensions in R? For example, say I have two matrices
M1=array(0,dim=c(2,2))
M2=array(0,dim=c(3,3))
Then I can make a vector C containing these matrices such that
C[1]=M1
and
C[2]=M2.
I know that I can create a 3-dimensional array
C=array(NA,dim=c(2,3,3))
but the only way I know how to do this forces the C[1,,] slice of the array to take up more space than necessary.
Use a list
C <- list()
C[[1]] <- array(0,dim=c(2,2))
C[[2]] <- array(0,dim=c(3,3))
C[[1]][1,1] <- 5
C[[1]]
C[[2]]
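Lists also play well with the apply family; for example, to confirm that the two matrices kept their different shapes:
lapply(C, dim)
[[1]]
[1] 2 2

[[2]]
[1] 3 3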
For example, how would I calculate, for each vertex, the percentage of ties directed outward toward males?
g <- erdos.renyi.game(20, .3, type=c("gnp"), directed = TRUE)
V(g)$male <- rbinom(20,1,.5)
V(g)$male[10] <- NA
A possible (not necessarily optimal) solution is as follows (this is a single expression; I just break it across lines for the sake of readability):
unlist(lapply(get.adjlist(g, mode="out"),
              function (neis) {
                sum(V(g)[neis]$male, na.rm=T)
              }
)) / degree(g, mode="out")
Now let's break it up into smaller pieces. First, we get the adjacency list of the graph using get.adjlist(g, mode="out"). This gives you a list of vectors, each vector containing the out-neighbors of a vertex. Then we apply a function to each vector in this list using lapply. The function being applied is as follows:
function (neis) {
  sum(V(g)[neis]$male, na.rm=T)
}
The function simply takes the neighbors of a node in neis and uses that to select a subset of vertices from the entire vertex set V(g). Then the male attribute is retrieved for this vertex subset and the values are summed, removing NA values on the fly. Essentially, this function gives you the number of males in neis.
Now, returning to our original expression, we have applied this function to the adjacency list of the graph using lapply, obtaining a list of numbers, each giving the number of male out-neighbors of a given vertex. We convert this list into a single R vector using unlist and divide it elementwise by the out-degrees of the vertices to obtain the ratios.
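A sketch that combines the pieces and stores the ratios as a vertex attribute (the name pct_male_out is introduced here for illustration); note that vertices with no outgoing ties would produce 0/0 = NaN, which this version maps to NA:
out_males <- unlist(lapply(get.adjlist(g, mode="out"),
                           function (neis) sum(V(g)[neis]$male, na.rm=T)))
out_deg <- degree(g, mode="out")
V(g)$pct_male_out <- ifelse(out_deg > 0, out_males / out_deg, NA)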