Vectorized vector construction in R through indexing - r

I would like to construct an atomic vector X using values from a vector A, such that length(X)>=length(A). Furthermore, the values of X are indexed by a third vector B such that length(B)=length(X). The mapping to construct X is as follows:
X[i] <- A[B[i]]
Now, it is clear to me how I would construct the vector X in a for loop. My question is: since X is due to be quite large (length(X) ~ 30,000), is there a way to vectorize the construction of X? That is, to apply a blanket function that avoids element-by-element calculation. I looked into functions such as sapply and mapply, but I didn't see how I could incorporate the indexing of vector B into those.
For example, if:
A <- c(20,31,17,110,87)
B <- c(1,1,2,1,1,3,4,3,5)
I would expect X to be:
X <- c(20,20,31,20,20,17,110,17,87)

That's very simple to vectorise, so you can avoid overcomplicating it with *apply functions or loops: simply use B as a numeric index vector into A.
In your case, A[B] translates to A[c(1,1,2,1,...,5)], which is basically saying "return the 1st element of A, the 1st element of A, the 2nd element of A, the 1st element of A, ..., the 5th element of A".
A <- c(20,31,17,110,87)
B <- c(1,1,2,1,1,3,4,3,5)
A[B]
## > A[B]
## [1] 20 20 31 20 20 17 110 17 87
X <- A[B]
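As a quick sanity check (reusing the data from the question), the indexed lookup gives exactly the same result as the explicit loop, in a single vectorized step:

```r
A <- c(20, 31, 17, 110, 87)
B <- c(1, 1, 2, 1, 1, 3, 4, 3, 5)

# Explicit element-by-element loop, as described in the question
X_loop <- numeric(length(B))
for (i in seq_along(B)) {
  X_loop[i] <- A[B[i]]
}

# Vectorized indexing: one lookup for the whole vector
X_vec <- A[B]

identical(X_loop, X_vec)
# [1] TRUE
```

For length(X) ~ 30,000 the indexed version does all the work in R's internal subsetting code, so no *apply wrapper is needed.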

Related

Calculate product of all following vector elements in R

I have some arbitrary vector such as
v <- seq(1,5)
For every index of v I now need to compute the product of all following elements of v.
Here, the result would be a vector w=(5*4*3*2,5*4*3,5*4,5,1) but I need a general algorithm for this. I am trying to avoid a loop (which is the obvious solution).
You can use cumprod with rev:
c(rev(cumprod(rev(v[-1]))), 1)
#[1] 120 60 20 5 1
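Wrapped into a small helper (the function name here is mine, purely illustrative) and checked against a naive prod()-based computation:

```r
# For each index i, the product of all elements after position i,
# with 1 for the last position. Helper name is illustrative only.
prod_after <- function(v) {
  c(rev(cumprod(rev(v[-1]))), 1)
}

v <- seq(1, 5)
prod_after(v)
# [1] 120  60  20   5   1

# Naive O(n^2) check using prod() on the trailing elements
naive <- sapply(seq_along(v), function(i) prod(v[-seq_len(i)]))
all.equal(prod_after(v), naive)
# [1] TRUE
```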

Counting, for each column of a dataframe, the number of elements larger than each element of a second vector in R

Finding number of elements in one vector that are less than an element in another vector
This post has a very similar question, but now b is a dataframe instead of a vector. How do we do the same comparison if a has a different length than each column of b?
sapply(b, function(x) sum(a < x))
Not sure I understand your question, but:
If a is a scalar, then this same sapply statement returns the number of elements that are larger than a in each column. If a is a vector, you may be interested in sapply(b, function(x) sum(max(a) < x)), to count the number of elements in each column of b that are greater than all the elements in a.
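A toy example (the data here is mine, not from the question) showing both counts:

```r
a <- c(2, 5, 7)
b <- data.frame(x = 1:6, y = c(3, 3, 8, 8, 9, 10))

# Per column of b: how many elements exceed ALL elements of a,
# i.e. exceed max(a) = 7
sapply(b, function(x) sum(max(a) < x))
# x y
# 0 4

# With a scalar threshold instead (here 3):
sapply(b, function(x) sum(3 < x))
# x y
# 3 4
```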

Convert a one column matrix to n x c matrix

I have an (n*c + n + c) by 1 matrix. I want to drop the last n + c rows and convert the rest into an n by c matrix. Below is what I've tried, but it returns a matrix where every element in each row is the same. I'm not sure why this is. Could someone help me out please?
tmp=x[1:n*c,]
Membership <- matrix(tmp, nrow=n, ncol=c)
You have a matrix x with n*c + n + c rows, and the problem is operator precedence in your extraction.
You should do tmp = x[1:(n*c)].
Notice the importance of the parentheses: if you do tmp = x[1:n*c], R takes the range from 1 to n, multiplies it by c (giving a new set of indices), and then extracts based on those indices.
For example, you want to avoid:
(1:100)[1:5*5]
[1] 5 10 15 20 25
You can also skip the index arithmetic altogether:
matrix(head(x, n*c), ncol=c)
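A worked example with concrete dimensions (n = 3, c = 2; the values are mine). Note that head() on a matrix returns its first rows, and since x has a single column here, head(x, n*c) yields exactly the first n*c values:

```r
n <- 3
c <- 2  # shadows base::c as a variable; c(...) calls still find the function
x <- matrix(1:(n * c + n + c), ncol = 1)  # an 11 x 1 matrix

Membership <- matrix(head(x, n * c), nrow = n, ncol = c)
Membership
#      [,1] [,2]
# [1,]    1    4
# [2,]    2    5
# [3,]    3    6
```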

R looping over two vectors

I have created two vectors in R, using statistical distributions to build the vectors.
The first is a vector of locations on a string of length 1000. That vector has around 10 values and is called mu.
The second vector is a list of numbers, each one representing the number of features at each location mentioned above. This vector is called N.
What I need to do is generate a random distribution for all features (N) at each location (mu).
After some fiddling around, I found that this code works correctly:
for (i in 1:length(mu)) {
  a <- rnorm(N[i], mu[i], 20)
  feature.location <- c(feature.location, a)
}
This produces the right output - a list of numbers of length sum(N), and each number is a location figure which correlates with the data in mu.
I found that this only worked when I used concatenate to get the values into a vector.
My question is; why does this code work? How does R know to loop sum(N) times but for each position in mu? What role does concatenate play here?
Thanks in advance.
To try and answer your question directly, c(...) is not "concatenate", it's "combine". That is, it combines its argument list into a vector. So c(1,2,3) is a vector with 3 elements.
Also, rnorm(n,mu,sigma) is a function that returns a vector of n random numbers sampled from the normal distribution. So at each iteration, i,
a <- rnorm(N[i],mu[i],20)
creates a vector a containing N[i] random numbers sampled from Normal(mu[i],20). Then
feature.location <- c(feature.location,a)
adds the elements of that vector to the end of the vector from the previous iteration. So at the end, you have a vector with sum(N) elements.
I guess you're sampling from a series of locations, each a variable number of times.
I'm guessing your data looks something like this:
set.seed(1) # make reproducible
N <- ceiling(10*runif(10))
mu <- sample(seq(1000), 10)
> N;mu
[1] 3 4 6 10 3 9 10 7 7 1
[1] 206 177 686 383 767 496 714 985 377 771
Now you want to take a sample from rnorm of length N(i), with mean mu(i) and sd=20 and store all the results in a vector.
The method you're using (growing the vector) is not recommended, as the vector will be re-copied in memory each time an element is added. (See Circle 2 of The R Inferno, although for small examples like this it's not so important.)
First, initialize the storage vector:
f.l <- NULL
for (i in 1:length(mu)) {
  a <- rnorm(n=N[i], mean=mu[i], sd=20)
  f.l <- c(f.l, a)
}
Then, each time, a stores your sample of length N[i] and c() combines it with the existing f.l by adding it to the end.
A more efficient approach is
unlist(mapply(rnorm, N, mu, MoreArgs=list(sd=20)))
which vectorizes the loop. unlist() is used because mapply() returns a list of vectors of varying lengths.
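With a fixed seed, the loop and the mapply() version draw the same random numbers in the same order, so the results are identical (the toy values for N and mu below are mine):

```r
N  <- c(2, 3, 1)
mu <- c(100, 500, 900)

# Loop version: grow the vector one sample batch at a time
set.seed(42)
f.loop <- NULL
for (i in seq_along(mu)) {
  f.loop <- c(f.loop, rnorm(N[i], mu[i], 20))
}

# Vectorized version: mapply over (N, mu), fixed sd via MoreArgs
set.seed(42)
f.vec <- unlist(mapply(rnorm, N, mu, MoreArgs = list(sd = 20)))

identical(f.loop, f.vec)
# [1] TRUE
```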

R: smallest distance between an element of vector a and an element of vector b

a and b are two vectors of real numbers.
They do not necessarily have the same length.
The distance between the ith element of a and the jth element of b is defined as abs(a[i] - b[j])
How would you compute the smallest distance between any element of a and any element of b without explicit loops?
Here is what I did: min(sapply(X=1:length(b), FUN=function(x) abs(a - b[x]))).
However, I have the feeling there is something better to do...
I'd use the dist function to create a distance matrix, and then find the minimum distance in that. This is probably much faster than an explicit loop in R (including sapply). Note that dist computes the pairwise distances between all elements of the combined vector c(a, b), including a-to-a and b-to-b pairs, so we subset the block that compares elements of a with elements of b:
a <- runif(23)
b <- runif(10)
d_matrix <- as.matrix(dist(c(a, b)))
ab_block <- d_matrix[seq_along(a), length(a) + seq_along(b)]
min(ab_block)
This computes some redundant distances (the a-to-a and b-to-b blocks), so it is not optimal, but it avoids an explicit loop entirely.
And to find which pair of elements had this distance:
which(ab_block == min(ab_block), arr.ind = TRUE)
Here's an attempt:
a <- c(9,5,6); b <- c(6,9)
# a
#[1] 9 5 6
# b
#[1] 6 9
combos <- sapply(b,function(x) abs(x-a))
# or an alternative
combos <- abs(outer(a,b,FUN="-"))
You could then get the minimum distance with:
min(combos)
If you wanted to get the respective indexes of the minimum values you could do:
which(combos==min(combos),arr.ind=TRUE)
# each matrix row has the 2 indexes for the minimums
# first column is 'a' index, second is 'b' index
# row col
# [1,] 3 1
# [2,] 1 2
One-liner should work here: min(abs(outer(a, b, "-")))
