Summing vector elements until a positive element is encountered in R

I'm new to R and I'm looking through a book called "Discovering Statistics using R".
Although the book implies you don't need any statistical background, some of the content isn't covered/explained...
I'm trying to sum the elements of a vector starting from position 1 until a positive element is present.
I found this question which is very similar to what I'm trying to achieve. However when I implement it, it doesn't always seem to work (and it sometimes appears to include the first positive element)...
My program is:
vecA <- runif(10, -10, 10)
sumA <- sum(vecA[1:min(which(vecA < 0))])
Is there a more robust way to calculate this without using loops that works every time and doesn't add the positive element? I'm not at the looping stage of my books yet.
I also found this site which asks a similar question but their answer errors:
sum(vecA [seq_len(which.max(vecA > 0)]);

You can use the following code:
sum(vecA * !cumsum(vecA > 0))
This also works if the first element is positive or all elements are negative.
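To see why this one-liner works, it may help to look at the intermediate values. The following is only an illustrative sketch; the example vector is made up for the demonstration and is not from the question.

```r
# Hypothetical example vector (not from the question)
vecA <- c(-3, -1, 2, -5, 4)

is_pos <- vecA > 0         # FALSE FALSE  TRUE FALSE  TRUE
seen   <- cumsum(is_pos)   # 0 0 1 1 2: number of positives seen so far
keep   <- !seen            # TRUE TRUE FALSE FALSE FALSE: TRUE only before the first positive

# Multiplying by the logical mask zeroes out everything from the first positive onward
result <- sum(vecA * keep) # -4
```

Because the mask is all FALSE when the first element is positive and all TRUE when there is no positive element, both edge cases fall out without special handling.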

You want to use > not < to sum all elements until the first positive one is reached.
You're currently summing from 1 until the first negative value is reached (including the first negative value).
sum(vecA[1:min(which(vecA>0))-1])
The which() function returns the positions of all positive elements, so summing from position 1 up to the position of the first positive element minus 1 guarantees that you only sum the elements that come before the first positive one.
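One detail worth knowing about the expression above: in R the `:` operator binds more tightly than `-`, so `1:k-1` is evaluated as `(1:k) - 1`, not `1:(k-1)`. The result is still correct because R silently ignores a zero index. A small sketch, using a made-up vector and the hypothetical names idx_a and idx_b:

```r
# Hypothetical example vector (not from the question)
vecA <- c(-3, -1, 2, -5, 4)
k <- min(which(vecA > 0))   # position of the first positive element: 3

idx_a <- 1:k - 1            # parsed as (1:k) - 1, giving 0 1 2
idx_b <- seq_len(k - 1)     # 1 2; also yields an empty index when k is 1

sum_a <- sum(vecA[idx_a])   # the 0 index is dropped, so this is -4
sum_b <- sum(vecA[idx_b])   # -4
```

Both forms fail when the vector contains no positive element, because min(which(...)) is then Inf.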

The match function is usually the fastest way to find the first occurrence of an element in a vector, so another version could look as follows:
first.positive <- match(TRUE, vecA > 0)
sumA <- sum( vecA[ 1 : first.positive ] ) - vecA[first.positive]
This gives zero if the first element is positive.
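The match-based version returns NA from match() when the vector has no positive element, and 1:NA then raises an error. A minimal sketch of a wrapper that also covers that case is shown below; the function name sum_before_first_positive is hypothetical, and returning the sum of the whole vector when nothing is positive is an assumption about the desired behaviour.

```r
# Hypothetical helper: sum the elements that precede the first positive one
sum_before_first_positive <- function(x) {
  first_pos <- match(TRUE, x > 0)
  # Assumption: with no positive element, every element is summed
  if (is.na(first_pos)) {
    return(sum(x))
  }
  # seq_len(0) is empty, so a positive first element gives sum(numeric(0)) = 0
  sum(x[seq_len(first_pos - 1)])
}

sum_before_first_positive(c(-1, -2, 3, -4))  # -3
sum_before_first_positive(c(5, -1))          # 0
sum_before_first_positive(c(-1, -2))         # -3
```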

Related

Calculating the index of an element in non-repetitive permutation

The following question is about math: how do you calculate the index of an element in a permutation without repetition? For example,
A = {a,b,c}. The number of permutations is then 3! = 6, namely: (a,b,c);(a,c,b);(b,a,c);(b,c,a);(c,a,b);(c,b,a)
I searched for an algorithm to get the index of an element in this list of permutations, but on the internet I only found algorithms for permutations with repetition.
The index of (b,c,a) in this zero-based list is obviously 3. Is there an easy way to calculate the position directly with a formula?
I do not need Python's itertools, because I use very large permutations (for example, 120!). I once tried using itertools' permutations function to get the index of an element from the list iterator, but the results were weird. I need a mathematical solution to get the index directly.
Thanks for reading.
Some clues:
You have n! permutations. Note that (n-1)! permutations start from the first element (a), next (n-1)! permutations start from the second element (b) and so on.
So you can calculate the first term of the permutation rank as (n-1)! * Ord(P[0]), where Ord gives the zero-based position of the first element of the permutation in the initial sequence (0 for a, 1 for b, etc.).
Then continue with the second element using the (n-2)! multiplier, and so on.
Don't forget to exclude used elements from the ordering. In your example b is already used, so at the second stage c has index 1 rather than 0, and the rank is 2!*1 + 1!*1 + 0!*0 = 3.
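The clues above can be turned into a short function. This is only a sketch under stated assumptions: the name perm_rank and its arguments are hypothetical, the elements are assumed to be distinct, and factorial() works in double precision, so for very large n (such as 120) the result would not be exact without arbitrary-precision arithmetic.

```r
# Hypothetical helper: zero-based rank of `perm` among all permutations
# of `alphabet`, listed in the order implied by `alphabet`
perm_rank <- function(perm, alphabet) {
  remaining <- alphabet
  n <- length(perm)
  rank <- 0
  for (i in seq_len(n)) {
    # Zero-based position of the current element among the unused ones
    ord <- match(perm[i], remaining) - 1
    rank <- rank + ord * factorial(n - i)
    # Exclude the used element before moving to the next position
    remaining <- remaining[-(ord + 1)]
  }
  rank
}

perm_rank(c("b", "c", "a"), c("a", "b", "c"))  # 3
```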

Counting all the matchings of a pattern in a vector in R

I have a boolean vector in which I want to count the number of occurrences of some patterns.
For example, for the pattern "(1,1)" and the vector "(1,1,1,0,1,1,1)", the answer should be 4.
The only built-in function I found to help is grepRaw, which finds the occurrences of a particular string in a longer string. However, it seems to fail when the sub-strings matching the pattern overlap:
length(grepRaw("11","1110111",all=TRUE))
# [1] 2
Do you have any ideas to obtain the right answer in this case?
Edit 1
I'm afraid that Rich's answer works for the particular example I posted, but fails in a more general setting:
> sum(duplicated(rbind(c(FALSE,FALSE),embed(c(TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE),2))))
[1] 3
In this other example, the expected answer would be 0.
Using the rollapply function from the zoo package, you can apply a moving window of width 2 that sums the values. Then count the windows where the result equals 2, i.e. sum(c(1,1)):
library(zoo)
z <- c(1,1,1,0,1,1,1)
sum(rollapply(z, 2, sum) == 2)
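The window-sum trick relies on the pattern consisting only of ones. For an arbitrary pattern, such as the (FALSE, FALSE) case from the edit, a base R sketch that compares each overlapping window with the pattern might look like this. The function name count_pattern is hypothetical, and this is not the method from the answer above.

```r
# Hypothetical helper: count overlapping occurrences of `pattern` in `x`
count_pattern <- function(x, pattern) {
  k <- length(pattern)
  n <- length(x)
  if (k == 0 || n < k) {
    return(0L)
  }
  # Every possible start position of a window of length k
  starts <- seq_len(n - k + 1)
  matches <- vapply(
    starts,
    function(i) all(x[i:(i + k - 1)] == pattern),
    logical(1)
  )
  sum(matches)
}

z <- c(1, 1, 1, 0, 1, 1, 1)
count_pattern(z, c(1, 1))  # 4
```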

How to find the length of a list based on a condition in R

The problem
I would like to find the length of a list.
The expected output
I would like to find the length based on a condition.
Example
Suppose that I have a list of 4 elements as follows:
myve <- list(1, 2, 3, 0)
Here I have 4 elements, one of which is zero. How can I find the length excluding the zero values? Then, if the length is > 1, I would like to subtract one. That is:
If the length is 4, then I would like to have 4-1=3. So the output should be 3.
Note
Please note that the number of zero values may change from one case to another. For example, the first list may have only one zero value, while the second list may have 2 or 3 zero values.
The values are always positive or zero.
You just need to apply the condition to each element. This produces a logical vector, which you then sum to get the number of TRUE elements (i.e. the elements satisfying your condition).
In your case:
sum(myve != 0)
In a more complex case, where the condition is expressed by a function f:
sum(sapply(myve, f))
Use sapply to flag the elements that are different from zero, and sum to count them:
sum(sapply(myve, function(x) x!=0))
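As a small worked sketch of the same idea, the condition can be kept in a named predicate and applied with vapply, which checks that each result is a single logical value. The names is_nonzero, n_nonzero and other are hypothetical and used only for illustration.

```r
myve <- list(1, 2, 3, 0)

# Hypothetical predicate: TRUE for elements that should be counted
is_nonzero <- function(x) x != 0

n_nonzero <- sum(vapply(myve, is_nonzero, logical(1)))
n_nonzero  # 3

# The count adapts when a list holds several zeros
other <- list(0, 2, 0, 5)
sum(vapply(other, is_nonzero, logical(1)))  # 2
```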

Find n closest non-NA values to position t in vector

This is probably a simple question for those experienced in R, but it is something that I (a novice) am struggling with...
I have two examples of vectors that are common to the problem I am trying to solve, A and B:
A <- c(1,3,NA,3,NA,4,NA,1,7,NA,2,NA,9,9,10)
B <- c(1,3,NA,NA,NA,NA,NA,NA,NA,NA,2,NA,9)
#and three scalars
R <- 4
t <- 5
N <- 3
There is a fourth scalar, n, where 0<=n<=N. In general, N <= R.
I want to find the n closest non-NA values to t such that they fall within a radius R centered on t. That is, the search radius R comprises R+1 values. For example, for A the search radius sequence is (3,NA,3,NA,4,NA,1), where the value at position t is NA, the middle value of the sequence.
The expected answer can be one of two results for A:
answerA1 <- c(3,4,1)
OR
answerA2 <- c(3,4,3)
The expected answer for B:
answerB <- c(1,3)
How would I accomplish this task in the most time- and space-efficient manner? One liners, loops, etc. are welcome. If I have to choose a preference, it is for speed!
Thanks in advance!
Note:
For this case, I understand that finding the third closest non-NA value may involve choosing whether it falls to the right or to the left of t (as shown by the two possible answers above). I do not have a preference for whether this value falls to the left or the right of t, but if there is a way to leave it to random chance, that would be ideal (again, it is not a requirement).
A relatively short solution is:
orderedA <- A[order(abs(seq_len(length(A)) - t))][seq_len(R*2)]
n_obj <- min(N, sum(!is.na(orderedA)))
res <- na.omit(orderedA)[seq_len(n_obj)]
res
#[1] 3 4 3
Breaking this down a little more, the steps are:
1. Order A by the absolute distance from the position of interest, t.
Code is: A[order(abs(seq_len(length(A)) - t))]
2. Subset to the first R*2 elements (this gets the elements on either side of t within R).
Code is: [seq_len(R*2)]
3. Work out how many elements to return: the smaller of N and the number of non-NA values.
Code is: min(N, sum(!is.na(orderedA)))
4. Drop NA.
Code is: na.omit()
5. Take the number of elements determined in step 3.
Code is: [seq_len(n_obj)]
Something like this?
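thingfinder <- function(A, R, t, n) {
  # Neighbours of t, nearest first, R positions on each side
  # (assumes R >= 1, t - R >= 1 and t + R <= length(A))
  left <- A[(t - 1):(t - R)]
  right <- A[(t + 1):(t + R)]
  # Interleave left and right so values are ordered by distance from t
  leftrightmat <- cbind(left, right)
  raw_ans <- as.vector(t(leftrightmat))
  ans <- raw_ans[!is.na(raw_ans)]
  # head() avoids padding with NA when fewer than n values are found
  return(head(ans, n))
}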
thingfinder(A=c(1,3,NA,3,NA,4,NA,1,7,NA,2,NA,9,9,10), R=3, t=5, n=3)
## [1] 3 4 3
This would give priority to the left side, of course.
In case it is helpful to others, @Mike H. also provided me with a solution to return the index positions associated with the desired vector elements res:
A <- setNames(A, seq_len(length(A)))
orderedA <- A[order(abs(seq_len(length(A)) - t))][seq_len(R*2)]
n_obj <- min(N, sum(!is.na(orderedA)))
res <- na.omit(orderedA)[seq_len(n_obj)]
positions <- as.numeric(names(res))
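The question also asks whether ties between the left and right side can be left to chance. One possible sketch, not taken from the answers above, is to restrict the candidates to non-NA positions within distance R of t and break distance ties with a random secondary sort key. The function name closest_non_na is hypothetical, and it assumes that position t itself counts as a candidate if it is not NA.

```r
# Hypothetical helper: up to n non-NA values within radius R of position t,
# nearest first, with equal distances ordered at random
closest_non_na <- function(x, t, R, n) {
  idx <- seq_along(x)
  dist <- abs(idx - t)
  # Candidate positions: inside the radius and not NA
  candidates <- idx[dist <= R & !is.na(x)]
  # Primary key: distance from t; secondary key: random tie-breaker
  ord <- order(dist[candidates], runif(length(candidates)))
  x[head(candidates[ord], n)]
}

A <- c(1, 3, NA, 3, NA, 4, NA, 1, 7, NA, 2, NA, 9, 9, 10)
resA <- closest_non_na(A, t = 5, R = 3, n = 3)
```

For A the first two values are 3 and 4 in either order, and the third is either 3 or 1, depending on the random draw.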

array index difference notation Python <-> R

What is the Python notation a[i-j] translated to R? As far as I understand it, it should be the array element at position i-j. But in R it seems to be the array until the ith element subtracted by the element at position j.
R and Python index in broadly similar ways; the main difference is that indexing starts at 0 in Python and at 1 in R. In addition, Python supports negative indexing (counting from the end), whereas in R a negative index removes the element at that position from the result. In your case, a[i-j] means much the same thing in both languages when i - j is a positive integer, apart from the one-position offset. Otherwise, the two expressions do completely different things. The illustration below should help:
Python:
#Create a list
lst = [1,3,5,6,7,7]
#index element at 4-2 (which is 2)
lst[4-2] # returns 5
#index element at 2-4 (which is -2) or lst[len(lst)-2]
lst[2-4] # returns 7
R:
lst <- c(1,3,5,6,7,7)
#indexing element at 4-2 (which is 2)
lst[4-2] # returns 3 (because R indexing starts at 1, not 0)
[1] 3
#BUT indexing element at 2-4 (which is -2) does not work,
#because it means that you are removing the element at index 2, i.e. 3
lst[2-4] #returns the original list without element at index 2
[1] 1 5 6 7 7
These are the main differences in indexing a list that I could offer to help with your question. The differences in indexing become more prominent as you tackle more complicated data structures in both languages.
I hope this is helpful.
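If the goal is to reproduce Python's a[i-j] in R, one way is to shift the zero-based index by one and map negative indices to positions counted from the end. The following is a minimal sketch; the function name py_index is hypothetical and is not part of base R.

```r
# Hypothetical helper: element of `a` at Python-style index k
# (zero-based; a negative k counts from the end)
py_index <- function(a, k) {
  n <- length(a)
  pos <- if (k < 0) n + k + 1 else k + 1
  # Python raises IndexError outside this range; here we stop with an error
  stopifnot(pos >= 1, pos <= n)
  a[pos]
}

lst <- c(1, 3, 5, 6, 7, 7)
py_index(lst, 4 - 2)  # 5, same as Python's lst[2]
py_index(lst, 2 - 4)  # 7, same as Python's lst[-2]
```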
