So, I have a vector full of 1s and 0s. I need to plot a graph that starts at (0, 0) and rises by 1 for every 1 in the vector and dips by 1 for every 0 in the vector. For example, if my vector is [ 1, 1, 1, 0, 1, 0, 1, 1 ], the graph should pass through the heights 0, 1, 2, 3, 2, 3, 2, 3, 4.
I thought about creating another vector that would hold the sum of the first i elements of the original vector at index i (from the example: [ 1, 2, 3, 3, 4, 4, 5, 6 ]) but that would not account for the dips at 0s. Also, I cannot use loops to solve this.
I would convert the zeros to -1, add a zero at the very beginning so the path starts at (0, 0), and then plot the cumulative sum:
# starting vector
myvec <- c(1, 1, 1, 0, 1, 0, 1, 1)
# convert 0 to -1, so every 0 becomes a step down
myvec[myvec == 0] <- -1
# add a zero at the beginning so the path starts at height 0
myvec <- c(0, myvec)
# plot the cumulative sum; x starts at 0, so the first point is (0, 0)
plot(seq_along(myvec) - 1, cumsum(myvec), type = "l", xlab = "step", ylab = "height")
# points(seq_along(myvec) - 1, cumsum(myvec))  # if you also want the points on top of the line
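For comparison, the same idea (map 1 to +1 and 0 to -1, prepend the starting height 0, take the running sum) can be sketched in Python with `itertools.accumulate`, which avoids an explicit loop. The function name `walk_heights` is hypothetical, and plotting is left out.

```python
from itertools import accumulate


def walk_heights(bits):
    """Return the y-coordinates of the up/down path, starting at height 0.

    Each 1 becomes a +1 step and each 0 a -1 step (2*b - 1); the initial
    value 0 makes the path start at (0, 0).
    """
    return list(accumulate((2 * b - 1 for b in bits), initial=0))


heights = walk_heights([1, 1, 1, 0, 1, 0, 1, 1])
xs = list(range(len(heights)))  # x runs 0, 1, ..., len(bits)
print(list(zip(xs, heights)))
```

If matplotlib is installed, `plt.plot(xs, heights)` would draw the path.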
I am trying to remove consecutive rows from a data frame when all the values in those rows are less than 1 and the run is longer than a threshold, e.g. 4 rows.
Let's say we have the column [0.1, 0, 5, 4, 0.2, 0.1, 0, 0, 0, 4, 9, 10]. Then I would like to remove only the middle part [0.2, 0.1, 0, 0, 0] and be left with [0.1, 0, 5, 4, 4, 9, 10]. I can easily do this with a for loop, but I am dealing with over 3 million data points and it takes far too long. Therefore I am looking for a vectorized solution in R. Does anyone know which function I can use?
Thanks in advance!
You can try to perform a convolution/correlation over your dataset. First flag every "target row", i.e. every row whose elements are all less than 1. (Comparing the row sum with the number of columns m is not enough: a row can sum to less than m and still contain a value of 1 or more.) Then correlate this 0/1 flag vector with a kernel of 4 ones: a window sum of exactly 4 means 4 consecutive target rows. Finally, expand each detected window back to all of the rows it covers. Here is a complete example with a NumPy array (which you can easily extract from your DataFrame with df.to_numpy()):
import numpy as np
"""
Notation: a row whose elements are all < 1 will be called a "target row"
Task: Remove every target row in a cluster of at least 4 consecutive target rows
Input: 11 x 5 dataset with target rows [0, 1, 2, 3, 4, 7]
Output: pruned dataset with rows [5, 6, 7, 8, 9, 10]
(Note that target row 7 must be kept because it's separated from the others)
"""
# Input
n, m = 11, 5
ar = np.random.rand(n, m)
ar[[5, 6, 8, 9, 10]] += 1.
min_rows = 4
# Find all target rows (1.0 where every element of the row is < 1).
# A row-sum test is not sufficient: [0.1, 0.1, 2.0, 0.1, 0.1] sums to
# less than m = 5 but is not a target row.
sums = (ar < 1).all(axis=1).astype(np.float32)
print(f" Sums: {sums}")
# Find centers of clusters with 4 consecutive target rows
kernel = np.ones((min_rows,))
output = np.correlate(sums, kernel, mode="same")
print(f" Output: {output}")
mask = output == min_rows
print(f" Mask: {mask.astype(np.float32)}")
# Find all elements in the clusters
mask_ids = np.nonzero(mask)[0]
center = min_rows // 2
rng = np.arange(-center, center + (min_rows % 2 != 0), dtype=np.int32)
ids = (rng + mask_ids.reshape(-1, 1)).ravel()
mask[ids] = True
print(f"New Mask: {mask.astype(np.float32)}")
# mask the dataset
ar = ar[~mask]
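An alternative that avoids the kernel-alignment bookkeeping is to compute run lengths directly: pad the boolean target-row flag with False, take `np.diff` to find where runs start and end, and keep only runs of at least `min_rows` rows. This is a sketch; the function name `drop_long_target_runs` and its `threshold` parameter are hypothetical, and a run of exactly `min_rows` rows is treated as removable, as with the `output == min_rows` test in the correlation code.

```python
import numpy as np


def drop_long_target_runs(ar, min_rows=4, threshold=1.0):
    """Drop every run of >= min_rows consecutive rows whose values are all < threshold.

    Returns (pruned_array, drop_mask). A 1-D input is treated as a single column.
    """
    ar = np.asarray(ar, dtype=float)
    if ar.ndim == 1:
        ar = ar.reshape(-1, 1)
    # True for "target rows": every element below the threshold
    target = (ar < threshold).all(axis=1)
    # Pad with 0 so each run has a detectable start (+1) and end (-1)
    padded = np.concatenate(([0], target.astype(np.int8), [0]))
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)  # first row of each run
    ends = np.flatnonzero(edges == -1)  # one past the last row of each run
    long_runs = (ends - starts) >= min_rows
    # Difference array: +1 at each long run's start, -1 just after its end;
    # the cumulative sum is then positive exactly on rows inside long runs
    marks = np.zeros(len(target) + 1, dtype=np.int64)
    marks[starts[long_runs]] += 1
    marks[ends[long_runs]] -= 1
    drop = np.cumsum(marks[:-1]) > 0
    return ar[~drop], drop


column = [0.1, 0, 5, 4, 0.2, 0.1, 0, 0, 0, 4, 9, 10]
kept, drop = drop_long_target_runs(column, min_rows=4)
print(kept.ravel())  # keeps 0.1, 0, 5, 4, 4, 9, 10
```

With a pandas DataFrame, the returned `drop` mask can be applied as `df.loc[~drop]`.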
I am working on the following task.
Suppose that I have the following vector:
(1,1,0,0,0,1,1,1,1,0,0,1,1,1,0)
I need to extract the following information:
the number of sets (runs) of consecutive zeros
the mean length of those runs.
For instance, in the vector above:
the number of sets is 3, because I have 000, 00 and 0.
The mean number of consecutive zeros is 2.
I ask because I need to do the same for several observations, so I plan to implement this inside an apply function.
We could use rle for this. As there are only binary values, we can apply rle to the entire vector and then extract the lengths that correspond to 0 (!values returns TRUE for 0 and FALSE otherwise):
out <- with(rle(v1), lengths[!values])
And get the length and the mean from the output
> length(out)
[1] 3
> mean(out)
[1] 2
data
v1 <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)
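Outside R, the same run-length idea can be sketched in Python with `itertools.groupby`, which groups adjacent equal elements much like `rle()` does. The helper name `zero_run_stats` is hypothetical; it returns the number of zero runs and their mean length (None when there are no zeros).

```python
from itertools import groupby
from statistics import mean


def zero_run_stats(values):
    """Return (number of runs of zeros, mean run length) for a 0/1 sequence."""
    # groupby yields (value, group) for each run of equal adjacent elements
    zero_runs = [sum(1 for _ in group) for value, group in groupby(values) if value == 0]
    if not zero_runs:
        return 0, None
    return len(zero_runs), mean(zero_runs)


v1 = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
print(zero_run_stats(v1))
```

For several observations stored as rows, `list(map(zero_run_stats, rows))` plays the role of the apply call.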
You can try another option using regmatches
> v <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)
> s <- paste0(v, collapse = "")
> zeros <- unlist(regmatches(s, gregexpr("0+", s)))
> length(zeros)
[1] 3
> mean(nchar(zeros))
[1] 2
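The regular-expression route translates directly to Python's `re` module: join the digits into a string and find every maximal run of `0` characters with `re.findall`. A sketch, assuming the vector holds single-digit values (as with `paste0`, joining multi-digit numbers into one string would be ambiguous):

```python
import re

v = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
s = "".join(str(x) for x in v)  # "110001111001110"
zeros = re.findall(r"0+", s)  # ['000', '00', '0']
n_runs = len(zeros)
mean_len = sum(map(len, zeros)) / n_runs if zeros else None
print(n_runs, mean_len)
```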
Is there an efficient way to calculate the length of portions of a vector that repeat a specified value?
For instance, I want to calculate the length of rainless periods along a vector of daily rainfall values:
daily_rainfall=c(15, 2, 0, 0, 0, 3, 3, 0, 0, 10)
Besides using the obvious but clunky approach of looping through the vector, what cleaner way can I get to the desired answer of
rainless_period_length=c(3, 2)
given the vector above?
R has a built-in function rle: "run-length encoding":
daily_rainfall <- c(15, 2, 0, 0, 0, 3, 3, 0, 0, 10)
runs <- rle(daily_rainfall)
rainless_period_length <- runs$lengths[runs$values == 0]
rainless_period_length
output:
[1] 3 2
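A vectorized equivalent in Python/NumPy marks the rainless days, pads that mask with 0 on both sides, and uses `np.diff` to locate where each run of zeros starts and ends; the differences between those positions are the run lengths. A sketch; the function name `zero_run_lengths` is hypothetical:

```python
import numpy as np


def zero_run_lengths(x):
    """Lengths of maximal runs of exact zeros in x, in order of appearance."""
    is_zero = np.asarray(x) == 0
    # Pad so every run has both a start (+1) and an end (-1) in the diff
    edges = np.diff(np.concatenate(([0], is_zero.astype(np.int8), [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return ends - starts


daily_rainfall = [15, 2, 0, 0, 0, 3, 3, 0, 0, 10]
print(zero_run_lengths(daily_rainfall).tolist())
```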
Let's say I want to find the length of the longest run of consecutive non-zero numbers in a sequence in R.
Example: for (0, 2, 3, 0, 5) it should return 2.
The solution I came up with is as follows:
A1 <- c(1, 1, 0, 1, 1, 1)
run_lengths <- NULL
B <- rle(A1 == 0)
C <- B$lengths
D <- B$values
for (i in 1:length(C)) {
  if (D[i] == FALSE) { run_lengths[i] <- C[i] }
}
run_lengths <- run_lengths[!is.na(run_lengths)]
max(run_lengths)
[1] 3
How can I find the longest sequence of non-zero numbers in a vector in R?
We could use rle. A == 0 returns a logical vector, and rle computes the lengths and values of runs of identical adjacent elements in it. Extract the lengths of the runs whose value is FALSE (the non-zero elements) and take the max, after dropping the first and last runs so that non-zero runs at the start or end of the vector are not counted.
A <- c(0, 2, 3, 0, 5)
max(with(rle(A==0), lengths[-c(1, length(lengths))][
!values[-c(1, length(values))]]))
#[1] 2
Another example
A1 <- c(1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0,0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1)
max(with(rle(A1==0), lengths[-c(1, length(lengths))][
!values[-c(1, length(values))]]))
#[1] 4
Or
indx <- A1==0
max(with(rle(A1[which(indx)[1L] : tail(which(indx),1)]==0),
lengths[!values]))
#[1] 4
Update
Based on the new information, maybe you can try:
A1 <- c(1, 1, 0,1,1,1)
max(with(rle(A1==0), lengths[!values]))
#[1] 3
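The same logic can be sketched in Python with `itertools.groupby` on the non-zero flag. The hypothetical `interior_only` parameter mirrors the first answer, which drops the first and last runs before taking the maximum; leaving it False corresponds to the updated answer. Unlike R's `max()` on an empty vector, this sketch returns 0 when there is no qualifying run.

```python
from itertools import groupby


def longest_nonzero_run(values, interior_only=False):
    """Length of the longest run of consecutive non-zero values (0 if none)."""
    # One (is_nonzero, run_length) pair per run of equal adjacent flags
    runs = [(flag, sum(1 for _ in group))
            for flag, group in groupby(values, key=lambda x: x != 0)]
    if interior_only:
        # Like lengths[-c(1, length(lengths))] in R: drop the first and last runs
        runs = runs[1:-1]
    return max((length for flag, length in runs if flag), default=0)


A1 = [1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
print(longest_nonzero_run(A1, interior_only=True), longest_nonzero_run(A1))
```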
I have a numeric vector in R, which consists of both negative and positive numbers. I want to separate the numbers based on sign (ignoring zero for now) into two separate vectors:
a new vector containing only the negative numbers
another vector containing only the positive numbers
The documentation shows how to do this for selecting rows/columns/cells in a data frame, but this doesn't work with vectors AFAICT.
How can it be done (without a for loop)?
It is done very easily (added check for NaN):
d <- c(1, -1, 3, -2, 0, NaN)
positives <- d[d>0 & !is.nan(d)]
negatives <- d[d<0 & !is.nan(d)]
If you want to exclude both NA and NaN, use is.na(), which returns TRUE for both:
d <- c(1, -1, 3, -2, 0, NaN, NA)
positives <- d[d>0 & !is.na(d)]
negatives <- d[d<0 & !is.na(d)]
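In NumPy the equivalent is boolean-mask indexing. One difference worth knowing: in R, NaN > 0 gives NA, hence the explicit check above, whereas in NumPy a comparison involving NaN evaluates to False, so NaN drops out of both results without an `np.isnan` check. The data is illustrative; NumPy float arrays have no separate NA value, so only NaN is included.

```python
import numpy as np

d = np.array([1, -1, 3, -2, 0, np.nan])
# Comparisons involving NaN evaluate to False, so NaN is excluded from both
positives = d[d > 0]
negatives = d[d < 0]
# 0 satisfies neither condition, so it is excluded as well
print(positives, negatives)
```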
It can be done by using square brackets.
A comparison such as d_vector > 0 returns a logical (Boolean) vector; indexing the original vector with it in square brackets returns the matching numeric values.
d_vector <- c(1, 2, 3, -1, -2, -3)
new_vector <- d_vector > 0
pos_vector <- d_vector[new_vector]
new1_vector <- d_vector < 0
neg_vector <- d_vector[new1_vector]
The purrr package includes some useful functions for filtering vectors:
library(purrr)
test_vector <- c(-5, 7, 0, 5, -8, 12, 1, 2, 3, -1, -2, -3, NA, Inf, -Inf, NaN)
positive_vector <- keep(test_vector, function(x) x > 0)
positive_vector
# [1] 7 5 12 1 2 3 Inf
negative_vector <- keep(test_vector, function(x) x < 0)
negative_vector
# [1] -5 -8 -1 -2 -3 -Inf
You can also use the discard function.
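Python's built-ins offer a rough analogue: `filter()` keeps the elements for which a predicate is true, similar to purrr's `keep()`, and `itertools.filterfalse()` keeps those for which it is false, the opposite selection, similar in spirit to `discard()`. Note that "not positive" includes 0 and NaN, so discarding the positives is not the same as keeping the negatives. The values below are illustrative; Python has no NA, so it is left out.

```python
import math
from itertools import filterfalse

test_vector = [-5, 7, 0, 5, -8, 12, 1, 2, 3, -1, -2, -3, math.inf, -math.inf, math.nan]


def is_positive(x):
    return x > 0  # False for 0 and for NaN


positive_vector = list(filter(is_positive, test_vector))
# Everything that is not positive: negatives, 0, -inf and NaN
not_positive = list(filterfalse(is_positive, test_vector))
print(positive_vector)
print(not_positive)
```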