Cut elements from the beginning and end of an R vector

For time series analysis I handle data that often contains leading and trailing zero elements. In this example, there are 3 zeros at the beginning and 2 at the end. I want to get rid of these elements and keep the contents in the middle (which may also contain zeros).
vec <- c(0, 0, 0, 1, 2, 0, 3, 4, 0, 0)
I did this by looping from the beginning and end, and masking out the unwanted elements.
mask <- rep(TRUE, length(vec))
# from begin
i <- 1
while(i <= length(vec) && vec[i] == 0) {
mask[i] <- FALSE
i <- i+1
}
# from end
i <- length(vec)
while(i >= 1 && vec[i] == 0) {
mask[i] <- FALSE
i <- i-1
}
cleanvec <- vec[mask]
cleanvec
[1] 1 2 0 3 4
This works, but I wonder if there is a more efficient way to do this, avoiding the loops.

vec[ min(which(vec != 0)) : max(which(vec != 0)) ]
Basically the which(vec != 0) part gives the positions of the numbers that are different from 0, and then you take the min and max of them.
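One caveat worth noting: if vec contains only zeros, which(vec != 0) is empty, so min and max warn and return Inf and -Inf, and the `:` call fails. A minimal sketch of a guarded version follows; the helper name trim_zeros is hypothetical and not part of the original answer.

```r
# Hypothetical helper (an assumption for illustration, not from the answer):
# trim leading and trailing zeros, returning an empty vector for all-zero input.
trim_zeros <- function(v) {
  nz <- which(v != 0)                   # positions of the non-zero elements
  if (length(nz) == 0L) return(v[0])    # nothing but zeros (or empty input)
  v[nz[1]:nz[length(nz)]]               # which() is sorted: first = min, last = max
}

trim_zeros(c(0, 0, 0, 1, 2, 0, 3, 4, 0, 0))
## [1] 1 2 0 3 4
trim_zeros(c(0, 0, 0))
## numeric(0)
```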

We could use the range and Reduce to get the sequence
vec[Reduce(`:`, range(which(vec != 0)))]
#[1] 1 2 0 3 4
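To make the intermediate values visible, the same expression can be split into steps. This is only an illustration using the vec from the question; the variable name r is made up here.

```r
vec <- c(0, 0, 0, 1, 2, 0, 3, 4, 0, 0)

r <- range(which(vec != 0))   # first and last non-zero positions
r
## [1] 4 8
idx <- Reduce(`:`, r)         # applies `:` to the two endpoints, i.e. 4:8
idx
## [1] 4 5 6 7 8
vec[idx]
## [1] 1 2 0 3 4
```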

Take the cumsum of abs(vec) forward and backward and keep only the elements where both are > 0. If it were known that all elements of vec were non-negative, as in the question, then we could optionally omit abs.
vec[cumsum(abs(vec)) > 0 & rev(cumsum(rev(abs(vec)))) > 0]
## [1] 1 2 0 3 4
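As a rough illustration of why this works, the two masks can be inspected separately. The second half uses a made-up vector v with a negative value to show what can go wrong if abs is omitted; the names fwd, bwd and v are assumptions for this sketch.

```r
vec <- c(0, 0, 0, 1, 2, 0, 3, 4, 0, 0)

fwd <- cumsum(abs(vec)) > 0             # FALSE until the first non-zero element
bwd <- rev(cumsum(rev(abs(vec)))) > 0   # FALSE after the last non-zero element
vec[fwd & bwd]
## [1] 1 2 0 3 4

# Without abs, positive and negative values can cancel in the running sum:
v <- c(0, 1, -1, 2, 0)
v[cumsum(v) > 0 & rev(cumsum(rev(v))) > 0]
## [1] 1 2
v[cumsum(abs(v)) > 0 & rev(cumsum(rev(abs(v)))) > 0]
## [1]  1 -1  2
```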

Related

Locating the row and col number with the entry number in R

I am trying to find the row and column numbers of a matrix once I have the entry number. For example if I am trying to find a 0 in a matrix full of numbers I would try something like this
test_array <- array(1,c(30,30))
test_array[200] <- 0
counter <- 0
for(i in test_array){
counter <- counter + 1
if(i == 0){
print(counter)
}
}
[1] 200
So now I know that position 200 holds a 0, but how do I find where it is in terms of its row and column?
Something like (15, 8)
You can use the arr.ind argument in which:
which(test_array == 0, arr.ind = TRUE)
#> row col
#> [1,] 20 7
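If only the linear position is known (200 here), base R's arrayInd converts it to subscripts directly. The sketch below also redoes the arithmetic by hand, relying on R storing matrices in column-major order; the variable names idx, nr, row_i and col_i are just for illustration.

```r
test_array <- array(1, c(30, 30))
idx <- 200                                # linear (column-major) position

arrayInd(idx, dim(test_array))            # one row holding (row, col)
##      [,1] [,2]
## [1,]   20    7

# The same conversion by hand:
nr    <- nrow(test_array)
row_i <- (idx - 1) %% nr + 1              # position within the column
col_i <- (idx - 1) %/% nr + 1             # number of full columns before it, plus one
c(row = row_i, col = col_i)
## row col
##  20   7
```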

Counting leading zeros in a vector

Let's say we have a few vectors containing some numeric data.
For example:
vect1<- c(0, 0, 0, 60, 40, 80, 0)
vect2<- c(100, 0, 0, 80, 40)
vect3<- c(0,0,1,2,3)
I would like to count the number of "leading zeros" in each vector. In this case I would like a value of 3 returned for vect1, 0 for vect2, and 2 for vect3. How would I go about implementing this in R?
Thanks in advance!
You can use a simple for loop with a counter here.
counter <- 0
for (i in vect1) {  # substitute any of the vectors above
  if (i == 0) {
    counter <- counter + 1
  } else {
    break  # stop at the first non-zero element
  }
}
print(counter)
You could easily wrap this in a function for reuse and just replace the print with return.
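A minimal sketch of such a wrapper, assuming numeric input without NA values; the function name count_leading_zeros is hypothetical.

```r
# Hypothetical wrapper around the loop above (name chosen for illustration).
count_leading_zeros <- function(v) {
  counter <- 0
  for (i in v) {
    if (i == 0) {
      counter <- counter + 1
    } else {
      break                     # first non-zero element ends the leading run
    }
  }
  counter                       # value of the last expression is returned
}

count_leading_zeros(c(0, 0, 0, 60, 40, 80, 0))
## [1] 3
count_leading_zeros(c(100, 0, 0, 80, 40))
## [1] 0
count_leading_zeros(c(0, 0, 1, 2, 3))
## [1] 2
```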
There are several ways to do this. One option is to get the cumulative sum where the vector is not 0, then, convert it to a logical and get the sum
f1 <- function(x) sum(cumsum(x != 0) == 0)
Testing:
> f1(vect1)
[1] 3
> f1(vect2)
[1] 0
> f1(vect3)
[1] 2
Or we may also use rle, returning 0 when the first run is not a run of zeros:
f2 <- function(x) with(rle(x == 0), if (isTRUE(values[1])) lengths[1] else 0L)

For and If in R data programming

I want to evaluate the distance between non-zero data. So if I have 50 data points and only the first and last are non-zero, I want the result to be 49.
For example, my data is:
1. 0
2. 0
3. 5
4. 6
5. 0
6. 1
7. 0
Based on the data above, I want to get 4 variables:
v0 = 3 (because the distance from the 0th to the 3rd data point is 3 jumps)
v1 = 1 (because the distance from the 3rd to the 4th data point is 1 jump)
v2 = 2 (because the distance from the 4th to the 6th data point is 2 jumps)
v3 = 1 (because the distance from the 6th to the 7th data point is 1 jump)
This is my code:
data=c(0,0,5,6,0,1,0)
t=1
for (i in data) {
if (i == 0) {
t[i]=t+1
}
else {
t[i]=1
}
}
t
The result is:
[1] 1 NA NA NA 1 1
Could you help me in figuring out this problem? I also hope that the code is using some kind of loop, so that it can be applied to any other data.
The general rule is not clear from the question, but if x is the input we assume that:
the input is non-negative
the first element of the output is the position of the first positive element in x
subsequent elements of the output are the distances between successive positive elements of x
if that results in a vector whose sum is less than length(x), the remainder is appended
To do that determine the positions of the positive elements of c(1, x), calculate the differences between successive elements in that reduced vector using diff and then if they don't sum to length(x) append the remainder.
dists <- function(x) {
d <- diff(which(c(1, x) > 0))
if (sum(d) < length(x)) c(d, length(x) - sum(d)) else d
}
# distance to 5 is 3 and then to 6 is 1 and then to 1 is 2 and 1 is left
x1 <- c(0, 0, 5, 6, 0, 1, 0)
dists(x1)
## [1] 3 1 2 1
# distance to first 1 is 1 and from that to second 1 is 3
x2 <- c(1, 0, 0, 1)
dists(x2)
## [1] 1 3
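For readers who want to see what dists computes internally, here are the intermediate values for x1, written out step by step. This is only an illustration; the names pos, d and remainder are made up.

```r
x1 <- c(0, 0, 5, 6, 0, 1, 0)

pos <- which(c(1, x1) > 0)          # the prepended 1 acts as a "position 0" marker
pos
## [1] 1 4 5 7
d <- diff(pos)                      # gaps between successive positive elements
d
## [1] 3 1 2
remainder <- length(x1) - sum(d)    # elements left over after the last positive one
remainder
## [1] 1
c(d, remainder)
## [1] 3 1 2 1
```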
Here it is redone using a loop:
dists2 <- function(x) {
pos <- 0
out <- numeric(0)
for(i in seq_along(x)) {
if (x[i]) {
out <- c(out, i - pos)
pos <- i
}
}
if (sum(out) < length(x)) out <- c(out, length(x) - sum(out))
out
}
dists2(x1)
## [1] 3 1 2 1
dists2(x2)
## [1] 1 3

Remove zeros in the start and end of a vector

I have a vector like this:
x <- c(0, 0, 0, 0, 4, 5, 0, 0, 3, 2, 7, 0, 0, 0)
I want to keep only the elements from position 5 to 11. I want to delete the zeroes in the start and end. For this vector it is quite easy since it is small.
I have very large data and need something in general for all vectors.
Try this:
x[ min( which ( x != 0 )) : max( which( x != 0 )) ]
Find the indices of all values that are not zero, then take the first (min) and the last (max) of them to subset x.
You can try something like:
x=c(0,0,0,0,4,5,0,0,3,2,7,0,0,0)
rl <- rle(x)
if(rl$values[1] == 0)
x <- tail(x, -rl$lengths[1])
if(tail(rl$values,1) == 0)
x <- head(x, -tail(rl$lengths,1))
x
## [1] 4 5 0 0 3 2 7
This would also work :
x[cumsum(x) & rev(cumsum(rev(x)))]
# [1] 4 5 0 0 3 2 7
I would probably define two functions, and compose them:
trim_leading <- function(x, value=0) {
w <- which.max(cummax(x != value))
x[seq.int(w, length(x))]
}
trim_trailing <- function(x, value=0) {
w <- which.max(cumsum(x != value))
x[seq.int(w)]
}
And then pipe your data through (the %>% operator comes from the magrittr package):
library(magrittr)
x %>% trim_leading %>% trim_trailing

How to prevent NULL from killing my ifelse vectorising?

I would like to vectorize a function like this:
if (i > 0)
fun1(a[i])
else
fun2(i)
where fun1, fun2 are already vectorized, and a is a vector.
My attempt:
ifelse(i > 0, fun1(a[i]), fun2(i))
However it is wrong!
> a <- c(1,2,3,4,5)
> i<-c(0,1,2,3)
> ifelse(i > 0, a[i], 0)
[1] 0 2 3 1 # expected 0 1 2 3
Do I have to use sapply? Is there any simple alternative that works?
There is nothing wrong. a[i] evaluates to c(1,2,3) since the zero index is dropped. This is recycled to c(1,2,3,1) to match length(i). So you get 0 2 3 1 from your ifelse: the first element is 0 because i > 0 is FALSE there, and the others come from the recycled a[i].
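The recycling can be reproduced by hand. The sketch below mimics what ifelse does with its yes argument using rep_len; the variable names yes and recycled are just for illustration.

```r
a <- c(1, 2, 3, 4, 5)
i <- c(0, 1, 2, 3)

yes <- a[i]                           # the zero index contributes nothing
yes
## [1] 1 2 3
recycled <- rep_len(yes, length(i))   # stretched to the length of the test
recycled
## [1] 1 2 3 1
recycled[i > 0]                       # the values ifelse picks where the test is TRUE
## [1] 2 3 1
```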
There is a workaround though: you can replace non-positive indices with NA:
> ifelse(i > 0, a[ifelse(i > 0, i, NA)], 0)
[1] 0 1 2 3
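Another option, sketched here under the assumption that fun1 and fun2 are vectorized as stated in the question, is to skip ifelse and fill a result vector with logical indexing. In this illustration the identity and a constant 0 stand in for fun1 and fun2; pos and out are made-up names.

```r
a <- c(1, 2, 3, 4, 5)
i <- c(0, 1, 2, 3)

pos <- i > 0                   # which elements take the "yes" branch
out <- numeric(length(i))      # preallocated result
out[pos]  <- a[i[pos]]         # stand-in for fun1(a[i]) on the positive indices only
out[!pos] <- 0                 # stand-in for fun2(i) on the remaining elements
out
## [1] 0 1 2 3
```

Each branch is evaluated only on the elements it applies to, so no index is dropped and nothing gets recycled.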
