R: Can we sum a vector with a condition?

Is it possible to sum all the elements at even indices of an R vector without iterating over every element? Something like sum(vectorx[i*2]) for i in 1:5.

Multiply the vector by c(0, 1) and then add the elements. Because of vector recycling, the elements at odd indices are multiplied by 0 and those at even indices by 1:
x = 1:10
sum(x * c(0, 1))
#[1] 30
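One caveat, sketched below with a made-up odd-length vector x_odd: when the length of x is not a multiple of 2, the arithmetic still gives the right sum, but R warns that the longer object length is not a multiple of the shorter one. Subsetting with a recycled logical index, as in the next answer, selects the same elements without a warning.

```r
x_odd <- 1:9   # odd length, so c(0, 1) does not divide it evenly

# Arithmetic recycling: correct sum (2 + 4 + 6 + 8 = 20), but R warns
s1 <- withCallingHandlers(
  sum(x_odd * c(0, 1)),
  warning = function(w) {
    message("warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Logical-index recycling: same elements, no warning
s2 <- sum(x_odd[c(FALSE, TRUE)])

c(s1, s2)
#[1] 20 20
```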

There are multiple ways to do this
set.seed(1234)
i <- sample(5)
i
#[1] 4 5 2 3 1
1) Use logical-index recycling
sum(i[c(FALSE, TRUE)])
#[1] 8
2) Create a sequence of the even indices and subset with it
sum(i[seq(2, length(i), 2)])
3) Use the modulo operator
sum(i[seq_along(i) %% 2 == 0])
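A quick check that the three approaches agree. The vector v below is the sample i printed above, typed in by hand so the result does not depend on the random number generator. The second half shows a caveat for very short vectors: with a length-1 vector, the c(FALSE, TRUE) index asks for a second element that does not exist and gives NA, seq() fails because it cannot count from 2 up to 1 in steps of 2, and only the modulo version returns 0.

```r
v <- c(4, 5, 2, 3, 1)   # the values of i printed above

# All three approaches select positions 2 and 4, i.e. 5 + 3 = 8
sum(v[c(FALSE, TRUE)])
sum(v[seq(2, length(v), 2)])
sum(v[seq_along(v) %% 2 == 0])

# Edge case: a vector of length 1 has no even positions
w <- 7
sum(w[c(FALSE, TRUE)])             # NA: w[2] does not exist
sum(w[seq_along(w) %% 2 == 0])     # 0
try(sum(w[seq(2, length(w), 2)]))  # error: wrong sign in 'by' argument
```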

We can use seq.int
x <- 1:10
sum(x[seq.int(2, length(x), 2)])

Related

Remove isolated elements of a vector

I have a vector of integers and I want to filter it by eliminating the components that are "isolated".
What do I mean by "isolated"? Those components that do not lie in a 4-neighbourhood of any other component.
The components in the vector are ordered increasingly, and there are no repetitions.
For example, if I have c(1,2,3,8,15,16,17), then I need to eliminate 8 because it is not in a 4-neighbourhood of any other element.
I've tried applying
for (p in 1:(length(index)-2))
if((index[p+1]>3+index[p])&(index[p+2]>3+index[p+1])){index[p+1]<-0}
index<-index[index!=0]
where index is my vector of interest, but there's some problem with the logical condition.
Could you please give me some hints?
Thanks in advance.
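You can achieve it with a combination of outer and colSums. A value is isolated when its absolute difference from every other value is at least 4, so its column contains length(x) - 1 TRUE values (the only FALSE is the comparison with itself):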
x[colSums(abs(outer(x, x, `-`)) >= 4) == length(x)-1]
#[1] 8
To eliminate those values instead:
i1 <- colSums(outer(x, x, FUN = function(i, j) abs(i - j) >= 4)) == length(x) - 1
x[!i1]
#[1] 1 2 3 15 16 17
where,
x <- c(1,2,3,8,15,16,17)
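Alternatively, keep the values whose difference from the preceding or the next element is at most 4. (Note that this counts a gap of exactly 4 as a neighbour, whereas the outer() answer above treats it as isolated; pick the comparison that matches your definition.)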
v <- c(1,2,3,8,15,16,17)
v[c(FALSE, diff(v) <= 4) | c(diff(v) <= 4, FALSE)]
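The diff() answer can be wrapped in a small helper with the neighbourhood size as a parameter (the name drop_isolated and the argument k are my own, not from the answer). Because the vector is sorted, each element's nearest neighbour is always adjacent to it, so comparing adjacent gaps is enough, and this avoids the length(x) by length(x) matrix that outer() builds. It also handles empty and length-1 vectors without a special case, returning an empty vector.

```r
# Hypothetical helper: keep the elements of a sorted, duplicate-free vector
# that have at least one other element within distance k.
drop_isolated <- function(v, k = 4) {
  near <- diff(v) <= k                # is the gap to the next element small enough?
  v[c(FALSE, near) | c(near, FALSE)]  # close to the previous OR the next element
}

drop_isolated(c(1, 2, 3, 8, 15, 16, 17))
#[1]  1  2  3 15 16 17
drop_isolated(c(1, 2, 3, 8, 15, 16, 17), k = 5)
#[1]  1  2  3  8 15 16 17
```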

How to exclude a set from a larger set in R

Suppose that I have a set of 10 elements, and that my code can only choose 3 elements at a time. Then I would like it to choose another 3 elements, without selecting the elements that were already selected.
x <- c(4,3,5,6,-2,7,-4,10,22,-12)
Now suppose that my condition is to select 3 elements that are less than 5. Then
new_x <- c(4, 3, -2)
Next, I would like to select another 3 elements that are less than 5 but were not selected the first time. If fewer than 3 such elements remain, the missing positions should be filled with zero.
Hence,
new_xx <- c(-4,-12,0)
Any help, please?
Here is an option using split
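f <- function(x, max = 5, n = 3) {
  # keep only the entries below the threshold
  x <- x[x < max]
  # group consecutive entries into chunks of n: 1,1,1,2,2,2,...
  ret <- split(x, ceiling(seq_along(x) / n))
  # zero-pad chunks shorter than n (seq_along() is safe for empty input)
  lapply(ret, function(w) replace(rep(0, n), seq_along(w), w))
}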
f(x)
#$`1`
#[1] 4 3 -2
#
#$`2`
#[1] -4 -12 0
Explanation: We define a custom function that first selects entries < 5, then splits the resulting vector into chunks of length 3 and stores the result in a list, and finally 0-pads those list elements that are vectors of length < 3.
Sample data
x <- c(4,3,5,6,-2,7,-4,10,22,-12)
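If a matrix is more convenient than a list, the same idea can be sketched without split(): pad the filtered values with zeros up to a multiple of n and fill a matrix column by column, so each column is one draw of n elements. The function name chunk_pad and the argument name threshold are my own; this is an alternative to f() above, not part of the original answer.

```r
# Hypothetical alternative to f(): one column per draw of n elements.
chunk_pad <- function(x, threshold = 5, n = 3) {
  x <- x[x < threshold]
  pad <- (-length(x)) %% n            # zeros needed to reach a multiple of n
  matrix(c(x, rep(0, pad)), nrow = n) # filled column by column
}

chunk_pad(c(4, 3, 5, 6, -2, 7, -4, 10, 22, -12))
#      [,1] [,2]
# [1,]    4   -4
# [2,]    3  -12
# [3,]   -2    0
```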

Angle between vector and list of vectors in R

When comparing two vectors it is simple to calculate the angle between them, but in R it is noticeably harder to calculate the angle between a vector and a matrix of vectors efficiently.
Say you have a 2D vector A=(2, 0) and then a matrix B={(1,3), (-2,4), (-3,-3), (1,-4)}. I am interested in working out the smallest angle between A and the vectors in B.
If I try to use
min(acos( sum(a%*%b) / ( sqrt(sum(a %*% a)) * sqrt(sum(b %*% b)) ) ))
it fails as they are non-conformable arguments.
Is there any code similar to that of above which can handle a vector and matrix?
Note: At the risk of this being marked as a duplicate, the solutions I found in several sources do not apply in this case.
Edit: The reason for this is that I have a large matrix X, and A is just one row of it. I am reducing the number of elements based solely on the angle of each vector. The first element of B is the first element of X; then, if the angle between any element in B and the next element X[,2] (here A) is greater than a certain tolerance, that element is added to B. I am just using B<-rbind(B,X[,2]) to do this, so B ends up being a matrix.
You don't describe the format of A and B in detail, so I assume A is a vector and the vectors in B are stored as the rows of a matrix.
(A <- c(2, 0))
# [1] 2 0
(B <- rbind(c(1,3), c(-2,4), c(-3,-3), c(1,-4)))
# [,1] [,2]
# [1,] 1 3
# [2,] -2 4
# [3,] -3 -3
# [4,] 1 -4
Solution 1 with apply():
apply(B, 1, FUN = function(x){
acos(sum(x*A) / (sqrt(sum(x*x)) * sqrt(sum(A*A))))
})
# [1] 1.249046 2.034444 2.356194 1.325818
Solution 2 with sweep(). Here FUN receives the whole matrix B together with A expanded to the same shape, so sum() from Solution 1 is replaced by rowSums():
sweep(B, 2, A, FUN = function(x, y){
acos(rowSums(x*y) / (sqrt(rowSums(x*x)) * sqrt(rowSums(y*y))))
})
# [1] 1.249046 2.034444 2.356194 1.325818
Solution 3 with split() and mapply:
mapply(function(x, y){
acos(sum(x*y) / (sqrt(sum(x*x)) * sqrt(sum(y*y))))
}, split(B, row(B)), list(A))
# 1 2 3 4
# 1.249046 2.034444 2.356194 1.325818
The vector of dot products between the rows of B and the vector A is B %*% A. The vector lengths of the rows of B are sqrt(rowSums(B^2)).
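Because acos() is decreasing on [-1, 1], the smallest angle corresponds to the largest cosine, so you don't actually need to compute the angle. And since dividing by the length of A scales every cosine by the same positive constant, the length of A doesn't change which row wins.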
Thus the row with the smallest angle will be given by row <- which.max((B %*% A)/sqrt(rowSums(B^2))). With Darren's data, that's row 1.
If you really do need the smallest angle, then you can apply the formula for two vectors to B[row,] and A. If you need all of the angles, then the formula would be
acos((B %*% A)/sqrt(rowSums(B^2))/sqrt(sum(A^2)))
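A vectorised sketch that combines these pieces into one helper (the name row_angles is my own). It computes all the cosines with B %*% A and rowSums(), then clamps them to [-1, 1] before calling acos(), because floating-point rounding can push the cosine of two (nearly) parallel vectors slightly past 1, where acos() returns NaN. which.min() then picks the closest row.

```r
# Hypothetical helper: angles (in radians) between vector a and each row of B.
row_angles <- function(B, a) {
  cosines <- drop(B %*% a) / (sqrt(rowSums(B^2)) * sqrt(sum(a^2)))
  # clamp so rounding error cannot push acos() outside its domain
  acos(pmin(pmax(cosines, -1), 1))
}

A <- c(2, 0)
B <- rbind(c(1, 3), c(-2, 4), c(-3, -3), c(1, -4))
ang <- row_angles(B, A)
ang
#[1] 1.249046 2.034444 2.356194 1.325818
which.min(ang)   # row with the smallest angle
#[1] 1
```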

Select random and unique elements from a vector

Say I have a simple vector with repeated elements:
a <- c(1,1,1,2,2,3,3,3)
Is there a way to randomly select one element from each group of repeated elements? That is, one random draw of the indices to keep could be:
1,4,6 ## here I selected the first 1, the first 2 and the first 3
And another:
1,5,8 ## here I selected the first 1, the second 2 and the third 3
I could do this with a loop over each group of repeated elements, but surely there is a faster way?
EDIT:
Ideally, the solution should also always select an element that appears only once. For example, my vector could also be:
b <- c(1,1,1,2,2,3,3,3,4) ## The number four is unique and should always be drawn
Using base R's ave(), we could do something like:
unique(ave(seq_along(a), a, FUN = function(x) if(length(x) > 1) head(sample(x), 1) else x))
#[1] 3 5 6
unique(ave(seq_along(a), a, FUN = function(x) if(length(x) > 1) head(sample(x), 1) else x))
#[1] 3 4 7
This generates an index for every value of a, grouped by a and then selects one random index value in each group.
Using the same logic with sapply and split:
sapply(split(seq_along(a), a), function(x) if(length(x) > 1) head(sample(x), 1) else x)
And it would also work with tapply
tapply(seq_along(a), a, function(x) if(length(x) > 1) head(sample(x), 1) else x)
The reason we need to check the length (if(length(x) > 1)) is this passage from ?sample:
If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x.
Hence, when sample() is given a single number n, it samples from 1:n (not just n), so we need to check its length.
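Another way around this pitfall, sketched here with a helper name of my own (pick_one), is the resample() idiom shown in the Examples section of ?sample: indexing x with sample.int(length(x), 1) always draws from x itself, so a group with a single element needs no special case. With b from the question, the group for 4 always returns index 9.

```r
# Hypothetical helper based on the resample() idiom from ?sample:
# sample.int() draws a position, so x is never reinterpreted as 1:x.
pick_one <- function(x) x[sample.int(length(x), 1L)]

a <- c(1, 1, 1, 2, 2, 3, 3, 3)
b <- c(1, 1, 1, 2, 2, 3, 3, 3, 4)
set.seed(1)
vapply(split(seq_along(a), a), pick_one, integer(1))  # one index per value
vapply(split(seq_along(b), b), pick_one, integer(1))  # element "4" is always 9
```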

The number of data points in matrix and vector forms

Suppose that X contains 1000 rows and m columns, where m equals 3:
set.seed(5)
X <- cbind(rnorm(1000,0,0.5), rnorm(1000,0,0.5), rnorm(1000,0,0.5))
Variable selection is then performed, and the following condition is checked before the next operation:
if(nrow(X) < 1000){print(a+b)}
where a is 5 and b is 15, so if nrow(X) < 1000 is TRUE, 20 is printed.
However, if only one column is selected, X becomes a plain vector, nrow(X) returns NULL, and the if() fails with "argument is of length zero".
How can I check the number of data points when X can be either a matrix or a vector?
What I can think of is that
if(is.matrix(X)){
n <- nrow(X)
} else {
n <- length(X)}
if(n < 1000){print(a+b)}
Does anyone have a better idea?
Thank you
You can use NROW for both cases. From ?NROW
nrow and ncol return the number of rows or columns present in x. NCOL and NROW do the same treating a vector as 1-column matrix.
So even if the subset is dropped down to a vector, NROW() treats it as a one-column matrix and returns its length. This works for arrays, vectors, and data frames alike.
sub1 <- X[,2:3]
is.matrix(sub1)
# [1] TRUE
NROW(sub1)
# [1] 1000
sub2 <- X[,1]
is.matrix(sub2)
# [1] FALSE
NROW(sub2)
# [1] 1000
So if(NROW(X) < 1000L) a + b should work whether X is a matrix or a vector. I use <= below only because X has exactly 1000 rows in your example, so < would print nothing.
a <- 5; b <- 15
if(NROW(sub1) <= 1000L) a + b
# [1] 20
if(NROW(sub2) <= 1000L) a + b
# [1] 20
A second option is to use drop = FALSE when you make the variable selection. The subset then stays a matrix even when only one column is selected, so nrow() works without any special case. For example:
X[, 1, drop = FALSE]
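A small sketch of the difference, reusing the X from the question: without drop = FALSE, nrow() of a one-column selection is NULL, which is what breaks the original if() check; with drop = FALSE the selection stays a 1000 x 1 matrix.

```r
set.seed(5)
X <- cbind(rnorm(1000, 0, 0.5), rnorm(1000, 0, 0.5), rnorm(1000, 0, 0.5))

sub_vec <- X[, 1]                # one column, dropped to a plain vector
sub_mat <- X[, 1, drop = FALSE]  # one column, kept as a 1000 x 1 matrix

nrow(sub_vec)   # NULL: a plain vector has no dim attribute
nrow(sub_mat)   # 1000
dim(sub_mat)    # 1000 1
```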
