Remove isolated elements of a vector - r

I have a vector of integers and I want to filter it by eliminating the components that are "isolated".
What do I mean by "isolated"? Those components that do not lie in a 4-neighbourhood of any other component.
The components in the vector are in increasing order, and there are no repetitions.
For example, if I have c(1,2,3,8,15,16,17) then I need to eliminate 8 because it is not in a 4-neighbourhood of any other element.
I've tried applying
for (p in 1:(length(index)-2))
if((index[p+1]>3+index[p])&(index[p+2]>3+index[p+1])){index[p+1]<-0}
index<-index[index!=0]
where index is my vector of interest, but there's some problem with the logical condition.
Could you please give me some hints?
Thanks in advance.

You can achieve it with a combination of outer and colSums, i.e.
x[colSums(abs(outer(x, x, `-`)) >= 4) == length(x)-1]
#[1] 8
To eliminate the values, we can do,
i1 <- colSums(outer(x, x, FUN = function(i, j) abs(i - j) >= 4)) == length(x) - 1
x[!i1]
#[1] 1 2 3 15 16 17
where,
x <- c(1,2,3,8,15,16,17)
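To see why the outer() approach works, it can help to look at the distance matrix directly. The sketch below counts, for each element, how many other elements lie closer than 4. The names d and n_close are only illustrative and are not part of the answer above.

```r
x <- c(1, 2, 3, 8, 15, 16, 17)

# Pairwise absolute distances: d[i, j] is |x[i] - x[j]|
d <- abs(outer(x, x, "-"))

# Number of *other* elements closer than 4 (subtract 1 for the zero diagonal)
n_close <- rowSums(d < 4) - 1

# An element is isolated when it has no such neighbour, so keep the rest
x[n_close > 0]
#[1] 1 2 3 15 16 17
```

Note that outer() builds a length(x) by length(x) matrix, so this is convenient for short vectors but memory-hungry for long ones.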

We keep the values whose difference to the preceding or the next element is less than or equal to 4:
v <- c(1,2,3,8,15,16,17)
v[c(FALSE, diff(v) <= 4) | c(diff(v) <= 4, FALSE)]
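The diff() one-liner can be wrapped in a small function so the threshold becomes a parameter. This is only a sketch: drop_isolated and its argument k are hypothetical names, and the input is assumed to be sorted and free of repetitions, as in the question.

```r
# Hypothetical helper: keep elements whose gap to the previous or next
# element is at most k (v is assumed sorted, without repetitions)
drop_isolated <- function(v, k = 4) {
  close <- diff(v) <= k                  # gap i sits between v[i] and v[i + 1]
  v[c(FALSE, close) | c(close, FALSE)]   # close to predecessor OR successor
}

drop_isolated(c(1, 2, 3, 8, 15, 16, 17))
#[1] 1 2 3 15 16 17
```

With k = 3 the rule matches the strict `> 3` test used in the question's loop, where a gap of exactly 4 already counts as isolating.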

Related

R: Can we Sum an vector with condition?

Is it possible to sum all the elements with an even index in an R vector without iterating over all the elements? Something like sum(vectorx[i*2]) for i in 1:5.
Multiply the vector by c(0, 1) and then add the elements. Due to vector recycling, the elements with odd indices will be multiplied by 0 and the ones with even indices will be multiplied by 1.
x = 1:10
sum(x * c(0, 1))
#[1] 30
There are multiple ways to do this
set.seed(1234)
i <- sample(5)
i
#[1] 4 5 2 3 1
1) Use recycling method
sum(i[c(FALSE, TRUE)])
#[1] 8
2) Create a sequence of alternating index to subset
sum(i[seq(2, length(i), 2)])
3) Use modulo division
sum(i[seq_along(i) %% 2 == 0])
We can use seq.int
x <- 1:10
sum(x[seq.int(2, length(x), 2)])
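As a quick check, the approaches above can be compared side by side on a vector of odd length, where recycling does not divide evenly. This is a minimal sketch; the variable names are illustrative.

```r
x <- c(10, 20, 30, 40, 50)   # odd length on purpose

# Logical index recycled along x: positions 2 and 4 are selected
by_recycling <- sum(x[c(FALSE, TRUE)])

# Explicit sequence of even positions
by_seq <- sum(x[seq(2, length(x), by = 2)])

# Modulo test on the positions
by_modulo <- sum(x[seq_along(x) %% 2 == 0])

c(by_recycling, by_seq, by_modulo)
#[1] 60 60 60
```

One difference worth keeping in mind: seq(2, length(x), by = 2) raises an error when x has fewer than two elements, while the logical-index and modulo versions simply return 0.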

How to exclude a set from a larger set in R

Suppose that I have a set of 10 elements. Suppose that my code is able to choose only 3 elements at a time. Then, I would like it to choose another 3 elements, however, without selecting the elements that are already selected.
x <- c(4,3,5,6,-2,7,-4,10,22,-12)
Then, suppose that my condition is to select 3 elements that are less than 5. Then,
new_x <- c(4, 3, -2)
Then, I would like to select another 3 elements that are less than 5 but were not selected the first time. If fewer than 3 such elements remain, the missing positions should be filled with zero.
Hence,
new_xx <- c(-4,-12,0)
Any help, please?
Here is an option using split
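f <- function(x, max = 5, n = 3) {
  # keep only the entries below the threshold
  x <- x[x < max]
  # split into consecutive chunks of (at most) n elements
  ret <- split(x, rep(1:(length(x) / n + 1), each = n)[1:length(x)])
  # pad chunks shorter than n with zeros
  lapply(ret, function(w) replace(rep(0, n), 1:length(w), w))
}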
f(x)
#$`1`
#[1] 4 3 -2
#
#$`2`
#[1] -4 -12 0
Explanation: We define a custom function that first selects entries < 5, then splits the resulting vector into chunks of length 3 and stores the result in a list, and finally 0-pads those list elements that are vectors of length < 3.
Sample data
x <- c(4,3,5,6,-2,7,-4,10,22,-12)
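The same filter, chunk, and pad steps can also be written with ceiling(seq_along(x) / n) as the grouping variable, which additionally copes with the case where no element passes the filter. This is an alternative sketch, not the answer's code; chunk_pad is a hypothetical name.

```r
# Hypothetical variant of the split-and-pad idea
chunk_pad <- function(x, max = 5, n = 3) {
  x <- x[x < max]                         # filter
  groups <- ceiling(seq_along(x) / n)     # 1,1,1,2,2,2,... chunk labels
  # split into chunks, then append zeros up to length n
  lapply(split(x, groups), function(w) c(w, rep(0, n - length(w))))
}

res <- chunk_pad(c(4, 3, 5, 6, -2, 7, -4, 10, 22, -12))
res[["1"]]
#[1]  4  3 -2
res[["2"]]
#[1]  -4 -12   0
```

If nothing is below max, the function returns an empty list rather than failing.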

Select random and unique elements from a vector

Say I have a simple vector with repeated elements:
a <- c(1,1,1,2,2,3,3,3)
Is there a way to randomly select a unique element from each of the repeated elements? I.e. one random draw pointing which elements to keep would be:
1,4,6 ## here I selected the first 1, the first 2 and the first 3
And another:
1,5,8 ## here I selected the first 1, the second 2 and the third 3
I could do this with a loop for each repeated elements, but I am sure there must be a faster way to do this?
EDIT:
Ideally the solution should also always select a particular element if it is already a unique element. I.e. my vector could also be:
b <- c(1,1,1,2,2,3,3,3,4) ## The number four is unique and should always be drawn
Using base R ave we could do something like
unique(ave(seq_along(a), a, FUN = function(x) if(length(x) > 1) head(sample(x), 1) else x))
#[1] 3 5 6
unique(ave(seq_along(a), a, FUN = function(x) if(length(x) > 1) head(sample(x), 1) else x))
#[1] 3 4 7
This generates an index for every value of a, grouped by a and then selects one random index value in each group.
Using same logic with sapply and split
sapply(split(seq_along(a), a), function(x) if(length(x) > 1) head(sample(x), 1) else x)
And it would also work with tapply
tapply(seq_along(a), a, function(x) if(length(x) > 1) head(sample(x), 1) else x)
The reason why we need to check the length (if(length(x) > 1)) is because from ?sample
If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x.
Hence, when there is only one number (n) passed to sample(), it samples from 1:n (and not just n), so we need to check its length.
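A way to avoid the explicit length check is to sample a position with sample.int() and then index into the group, which behaves the same for groups of length 1. The sketch below applies that to the vector b from the question's edit; pick_one is a hypothetical helper name.

```r
# Hypothetical helper: pick one element of idx at random,
# safe even when idx has length 1
pick_one <- function(idx) idx[sample.int(length(idx), 1)]

b <- c(1, 1, 1, 2, 2, 3, 3, 3, 4)

set.seed(42)   # only to make the draw reproducible
picked <- vapply(split(seq_along(b), b), pick_one, integer(1))

# One index per distinct value; the unique value 4 always yields index 9
b[picked]
#[1] 1 2 3 4
```

The random indices themselves depend on the seed and the R version, so only the values they point to are shown.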

How can I find out what proportion of values fall outside range?

v <- c(1,2,3,4,5,6)
And I specify a range with max = 4 and min = 2.
So, I want to know how many values fall outside this range.
I can do this (v < 2 & v > 4)
But not sure how to do the count...
After that I will simply create a percentage with respect to total number of values (here 6).
You can create and sum a logical vector. TRUE elements count as 1 and FALSE as 0, so this will give you the number of elements matching a particular condition.
v <- c(1, 2, 3, 4, 5, 6)
sum(v < 2 | v > 4)
The latter returns 3 because there are three values less than 2 or greater than 4. The comparisons are vectorized, so v < 2 tests whether each element of v in turn is less than 2. The OR operator is given by | in R.
To get the proportion of values beyond the range, you can divide the sum by the length of the vector, or alternatively use mean(), since the mean is the sum divided by the length.
mean(v < 2 | v > 4)
You can simply do:
sum(v < 2 | v > 4) / length(v)
#[1] 0.5
You want to use | instead of & because no number will be both less than 2 and greater than 4.
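Putting the pieces together, the proportion can be computed by a small function that takes the bounds as arguments. This is a sketch only: prop_outside and its parameters lo, hi and na.rm are hypothetical names, with na.rm added on the assumption that real data may contain missing values.

```r
# Hypothetical helper: share of values outside the closed range [lo, hi]
prop_outside <- function(v, lo, hi, na.rm = FALSE) {
  # the logical vector is TRUE outside the range; its mean is the proportion
  mean(v < lo | v > hi, na.rm = na.rm)
}

prop_outside(c(1, 2, 3, 4, 5, 6), lo = 2, hi = 4)
#[1] 0.5

# Missing values are skipped only when asked for
prop_outside(c(1, NA, 3, 6), lo = 2, hi = 4, na.rm = TRUE)
#[1] 0.6666667
```

Multiply the result by 100 to get a percentage.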

How many values of a vector are divisible by 2? Use R

I have an exercise where I have to find out how many values of a vector are divisible by 2. I have this random sample:
set.seed(1)
y <- sample(c(0:99, NA), 400, replace=TRUE)
I created a new variable d to see which of the values are or aren't divisible by 2:
d <- y/2 ; d
What I want to do is to create a logical vector where all whole numbers give TRUE and the rest give FALSE (e.g. 22.0 -> TRUE and 24.5 -> FALSE).
I used this command, but I believe that the answer is wrong since it would only give me the numbers that are in the sample:
sum(d %in% y, na.rm=T)
I also tried this (I found on the internet, but I don't really understand it)
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol
sum(is.wholenumber(d),na.rm = T)
Are there other ways that I could use the operator "%%"?
You can sum over the mod operator like so: sum(1 - y %% 2, na.rm = TRUE) or sum(y %% 2 == 0, na.rm = TRUE). Note that y %% 2 is the remainder after dividing by two, which is why this solution works; na.rm = TRUE is needed because y contains NA.
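Here are three different ways. Because y contains NA, the first one wraps the condition in which() so that the NA entries are not counted:
length(y[which(y %% 2 == 0)])
length(subset(y, y %% 2 == 0))
length(Filter(function(x) x %% 2 == 0, y))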
Since we're talking about division by 2, I would actually take it to the bit level and check whether the last bit of the number is 0 or 1 (0 means the number is divisible by 2).
Going out on a limb here (I'm not sure how R handles this division by 2 internally), but I think that would likely be more optimized than a division, which is typically fairly expensive.
To do this at the bit level, you can do an AND operation between the number itself and 1; if the result is 1, the number is not divisible by 2:
bitwAnd(a, b)
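Applied to the counting problem, the bitwise test could look like the sketch below. It assumes the values are integers (bitwAnd() works on integer vectors and coerces numeric input to integer), and it uses a short hand-written vector instead of the random sample so the result is easy to verify; is_even is an illustrative name.

```r
y <- c(0L, 3L, 4L, NA, 7L, 10L)

# Last bit is 0 for even numbers; NA inputs stay NA
is_even <- bitwAnd(y, 1L) == 0

sum(is_even, na.rm = TRUE)
#[1] 3
```

Whether this is actually faster than y %% 2 == 0 in R would need to be measured; the two give the same logical vector here.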
