Removing certain elements from a vector in R [duplicate] - r

I have a numeric vector called data with approximately 35000 elements, and another numeric vector A. I want to remove every occurrence of the elements of A from data. For example, if A is the vector [1,2], I want to remove every 1 and 2 from data. How can I do that? Is there a built-in function for this? I couldn't think of a way, and I assume a loop would take a long time. Thanks!

There is the handy %in% operator. Look it up; it is one of the best things I can think of in any programming language! It checks each element of one vector A against all elements of another vector B and returns a logical vector, the same length as A, that marks which elements of A can be found in B. It is what you need! If you are new to R it might seem a bit weird, but you will get used to it quickly.
OK, so how do you use it? Let's say datvec is your numeric vector:
datvec = c(1, 4, 1, 7, 5, 2, 8, 2, 10, -1, 0, 2)
elements_2_remove = c(1, 2)
datvec %in% elements_2_remove
## [1] TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE TRUE
So you get a logical vector that is TRUE at every position where datvec holds 1 or 2. Negate it to index the elements you want to retain:
datvec = datvec[!(datvec %in% elements_2_remove)]
And you are done!
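As a quick check of the steps above, here is a minimal sketch that reuses the same datvec and elements_2_remove and stores the result in a new variable (kept is a name introduced here for illustration). It also shows, as an aside, that setdiff() is not a drop-in replacement because it removes duplicates as well.

```r
datvec <- c(1, 4, 1, 7, 5, 2, 8, 2, 10, -1, 0, 2)
elements_2_remove <- c(1, 2)

# Keep only the elements that are NOT found in elements_2_remove
kept <- datvec[!(datvec %in% elements_2_remove)]
print(kept)  # 4 7 5 8 10 -1 0

# setdiff() also drops the unwanted values, but it collapses
# repeated values in the result, which %in% does not do
deduplicated <- setdiff(c(4, 4, 1), 1)
print(deduplicated)  # 4
```

If repeated values in the data must be preserved, the %in% form is the safer choice.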


Logical Indexing with NA in R - How to set to FALSE or exclude rather than return NA? [duplicate]

Apologies if this is a common question, but it has caused some unexpected frustration in a script I am running. I have a dataset which roughly looks like the following (though much larger in practice):
df <- data.frame(A = c(1, 2, 3, NA, NA, 6),
B = c(10, 20, 30, 40 , 50, 60))
My script cycles through a list of values from column A and is supposed to take action based on whether the values in B are larger than 25. However, the corresponding values of B for missing values in A are ALWAYS returned, whereas I want them to always be excluded. For example,
df$B[df$A == 6]
Gives the output
NA NA 60
Rather than the expected
60
Thus, the code
df$B[df$A == 6] > 25
returns
NA NA TRUE
rather than just
TRUE
Could someone explain the reason for this and any simple solutions? The immediate solution that came to mind is to remove any rows with NA values in column A, but I would prefer a solution which is robust to missingness in A and will only return the single desired logical value from B.
Whenever you ask whether a Not Available (NA) value is equal to a number or to anything else, you get the only possible answer: Not Available (NA).
NA might be equal to 6, or to John the Baptist, or to ⛄, or to any other object. It is simply impossible to say whether it is, since the value is not available.
To get the answer you want, you can use na.omit() or na.exclude() on the results. Or you can add another logical condition when subsetting:
with(df, B[A == 6 & !is.na(A)])
# [1] 60
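Two other common ways to get the same result are sketched below, using the df from the question. which() returns only the positions where the comparison is TRUE, so NA positions are dropped, and %in% returns FALSE rather than NA when the left-hand value is missing. The variable names via_which and via_in are introduced here for illustration.

```r
df <- data.frame(A = c(1, 2, 3, NA, NA, 6),
                 B = c(10, 20, 30, 40, 50, 60))

# which() keeps only the indices where the test is TRUE (NA is dropped)
via_which <- df$B[which(df$A == 6)]
print(via_which)  # 60

# %in% never returns NA, so missing values in A simply do not match
via_in <- df$B[df$A %in% 6]
print(via_in)  # 60

print(via_in > 25)  # TRUE
```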

Issue using a list of Booleans to index in R

I've had experience programming for a few years, but am relatively new to R. I ran into an unexpected result when trying to extract an entry from an array using an array of Boolean entries:
array = c(2, 3, 4, 5);
array[c(FALSE, FALSE, FALSE, TRUE)]
#output: [1] 5
array[c(0, 0, 0, 1)]
#output: [1] 2
This surprised me, since I thought FALSE and 0 were interchangeable (likewise for TRUE and 1) in this sort of process. I checked the following to make sure, and became even more confused:
T==1
#output: [1] TRUE
F==0
#output: [1] TRUE
c(0,0,0,1)==c(F,F,F,T)
#output: [1] TRUE TRUE TRUE TRUE
Can someone help explain why R is treating these indexing methods differently?
Much thanks,
as.logical(c(0,0,0,1))==c(F,F,F,T)
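Explanation
In R, a numeric index and a logical index mean different things. A logical index keeps the elements at the positions that are TRUE, while a numeric index selects elements by position.
In your scenario c(0, 0, 0, 1) is a numeric index: the zeros select nothing (there is no 0th element) and the 1 selects the first element, which is 2. Converting the index with as.logical(), as above, gives the behaviour you expected.
PS
Logical values are coerced to numeric only in certain operations (e.g. ==), not when indexing. Thanks to #IceCreamTouchan for adding this in the comments above.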
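The difference can be seen side by side in the short sketch below. It uses x instead of array as the variable name (an assumption made here to avoid masking the base function array()), and the names by_position, by_logical and by_comparison are illustrative.

```r
x <- c(2, 3, 4, 5)

# Numeric index: zeros are ignored, 1 selects the first element
by_position <- x[c(0, 0, 0, 1)]
print(by_position)  # 2

# Logical index: keeps the element at the TRUE position
by_logical <- x[as.logical(c(0, 0, 0, 1))]
print(by_logical)  # 5

# A comparison also yields a logical vector suitable for indexing
by_comparison <- x[c(0, 0, 0, 1) == 1]
print(by_comparison)  # 5
```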

Series vector for approximating pi

I've been set a question about Madhava's approximation of pi. The first part of it is to create a vector which contains the first 20 terms in the series. I know I could just type the first 20 terms into a vector, but that seems like a really long-winded way of doing things. Is there an easier way to create the vector?
Currently I have the vector
g = c((-3)^(-0)/(2*0+1), (-3)^(-1)/(2*1+1), (-3)^(-2)/(2*2+1), (-3)^(-3)/(2*3+1), (-3)^(-4)/(2*4+1), (-3)^(-5)/(2*5+1), (-3)^(-6)/(2*6+1), (-3)^(-7)/(2*7+1), (-3)^(-8)/(2*8+1), (-3)^(-9)/(2*9+1), (-3)^(-10)/(2*10+1), (-3)^(-11)/(2*11+1), (-3)^(-12)/(2*12+1), (-3)^(-13)/(2*13+1), (-3)^(-14)/(2*14+1), (-3)^(-15)/(2*15+1), (-3)^(-16)/(2*16+1), (-3)^(-17)/(2*17+1), (-3)^(-18)/(2*18+1), (-3)^(-19)/(2*19+1), (-3)^(-20)/(2*20+1))
And
h = sqrt(12)
So I have done g*h to get the approximation of pi. Surely there's an easier way of doing this?
Apologies if this is relatively basic, I am very new to R and still learning how to properly use stack overflow.
Thanks.
One of the best features of R is that it is vectorised. This means that we can apply operations element-wise to entire vectors rather than having to type out the operation for each element. For example, if you wanted to find the squares of the first five natural numbers (starting at one), you could do this:
(1:5)^2
which results in the output
[1] 1 4 9 16 25
instead of having to do this:
c(1^2, 2^2, 3^2, 4^2, 5^2)
which gives the same output.
Applying this amazing property of R to your situation, instead of having to manually construct the whole vector, we can just do this:
series <- sqrt(12) * c(1, -1) / 3^(0:19) / seq(from=1, by=2, length.out=20)
sum(series)
which gives the following output:
[1] 3.141593
and we can see more decimal places by doing this:
sprintf("%0.20f", sum(series))
[1] "3.14159265357140338182"
To explain a little further what I did in that line of code to generate the series:
We want to multiply the entire thing by the square root of 12, hence the sqrt(12), which will be applied to every element of the resulting vector
We need the signs of the series to alternate, which is accomplished via * c(1, -1); this is because of recycling, where R recycles elements of vectors when doing vector operations. It will multiply the first element by one, the second element by -1, then recycle and multiply the third element by 1, the fourth by -1, etc.
We need to divide each element by 1, 3, 9, etc., which is accomplished by / 3^(0:19) which gives / c(3^0, 3^1, ...)
Lastly, we also need to divide by 1, 3, 5, 7, etc. which is accomplished by seq(from=1, by=2, length.out=20) (see help(seq))
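The same series can also be written directly from the formula in the question, (-3)^(-k) / (2k + 1) for k = 0, ..., 19, which may be easier to compare with the hand-typed vector. This is a minimal sketch; the names k, terms and approx_pi are introduced here for illustration.

```r
k <- 0:19                          # indices of the first 20 terms

# Each term of the series, computed for all k at once
terms <- (-3)^(-k) / (2 * k + 1)

# Multiply the summed series by sqrt(12) to approximate pi
approx_pi <- sqrt(12) * sum(terms)
print(approx_pi)

# The approximation agrees with R's built-in pi to about ten decimals
print(abs(approx_pi - pi) < 1e-10)  # TRUE
```

Here the alternating sign comes from the negative base rather than from recycling c(1, -1), so the vector terms matches the question's g (up to its extra 21st element) without the sqrt(12) factor.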

Couldn't reduce the looping variable inside the "for" loop in R

I have a for loop that does a matrix manipulation in R. When certain checks are true I need to process the same row again, which means i needs to be reduced by 1.
for (i in 1:10)
{
  if (some_check)  # placeholder for the actual condition
  {
    i = i - 1
  }
}
However, i is not actually reduced. For example, if I reduce i to 4 while on the 5th row, the next iteration should use 5 again, but it uses 6.
Please advise.
My intention is:
I check the first-column values of a matrix. If I find a duplicate value, I take its second-column value, append it to the second column of the first row with that value, and remove the duplicate row. So when I remove a row, I do not want i to advance. (This is just a map-reduce approach: append the values that share a key.)
In an R for loop you cannot control the iteration by modifying the loop variable: an assignment to it inside the body lasts only until the next iteration, when it is overwritten with the next element of the sequence. What you have written would be solved completely differently in normal R code. The exact solution depends on the actual problem, and there isn't a generic, direct replacement (except by replacing the whole thing with a while loop, but this is both ugly and probably unnecessary).
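The behaviour of the loop variable, and the while-loop workaround mentioned above, can be sketched as follows. The variable names (seen, visits, retried) and the "retry once at i == 2" condition are made up for illustration.

```r
# In a for loop, changing i does not affect which value comes next
seen <- integer(0)
for (i in 1:3) {
  seen <- c(seen, i)
  i <- i - 1          # only changes i for the rest of this pass
}
print(seen)  # 1 2 3

# A while loop gives full control over the counter
i <- 1
visits <- integer(0)
retried <- FALSE
while (i <= 3) {
  visits <- c(visits, i)
  if (i == 2 && !retried) {
    retried <- TRUE   # hypothetical check: repeat this index once
    next              # skip the increment, so i stays the same
  }
  i <- i + 1
}
print(visits)  # 1 2 2 3
```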
To illustrate this, consider these two typical examples.
Assume you want to filter all duplicated elements from a list. Instead of looping over the list and copying the elements that are not duplicates, you can use the duplicated function, which tells you, for each element, whether it is a duplicate of an earlier one.
Secondly, you use standard R subsetting syntax to select just those elements which are not a duplicate:
x = x[! duplicated(x)]
(This example works on a one-dimensional vector or list, but it can be generalised to more dimensions.)
For a more complex case, let’s say that you have a vector of numbers and, for every even number in the vector, you want to double the preceding number (this is highly artificial but in signal processing you might face similar problems). In other words:
input = c(1, 3, 2, 5, 6, 7, 1, 8)
output = ???
output
# [1] 1 6 2 10 6 7 2 8
… we want to fill in ???. In the first step, we check which numbers are even:
even = input %% 2 == 0
# [1] FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE
Next, we shift the result down – because we want to know whether the next number is even – by removing the first element, and appending a dummy element (FALSE) at the end.
even = c(even[-1], FALSE)
# [1] FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE
And now we can multiply just these inputs by two:
output = input
output[even] = output[even] * 2
There, done.
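For the key/value merging described in the question, one loop-free possibility is to group the second column by the first and paste each group together. This is only a sketch under assumptions: the matrix m and its contents are invented here, and the question does not say how the appended values should be separated, so a comma is used.

```r
# Hypothetical two-column character matrix: key, value
m <- matrix(c("a", "b", "a", "c", "b",
              "1", "2", "3", "4", "5"), ncol = 2)

# Group the values by key and join each group into one string
merged <- tapply(m[, 2], m[, 1], paste, collapse = ",")
print(merged)

# Rebuild a two-column matrix with one row per key
result <- cbind(names(merged), as.vector(merged))
print(result)
```

Note that tapply() orders the result by the sorted keys rather than by first appearance, which may or may not matter for the original problem.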

What is the alternate to 'in' command we have in Python, for R? [duplicate]

Say I have a numeric vector in R and I want to see whether a particular integer is present in it. In Python we can do that easily with the 'in' operator, perhaps inside an if statement.
Do we have something similar in R as well? So that I don't have to use a for loop to check if the integer I want is present in a vector? I tried the following, but it does not seem to work. 'normal' is a dataframe and the second column has integers.
if (12069692 in normal[,2]) {print("yes")}
Says,
Error: unexpected 'in' in "if (12069692 in"
In R, it's called %in%:
> 1 %in% c(1, 2, 3)
[1] TRUE
> 4 %in% c(1, 2, 3)
[1] FALSE
It is vectorized on the left-hand side, so you can check multiple values at once:
> c(1, 4, 2, 1) %in% c(1, 2, 3)
[1] TRUE FALSE TRUE TRUE
(hat tip #Spacedman)
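Applied to the code from the question, the check might look like the sketch below. The contents of normal are invented here for illustration; only the fact that its second column holds integers comes from the question.

```r
# Hypothetical stand-in for the question's data frame
normal <- data.frame(name = c("x", "y", "z"),
                     id = c(5L, 12069692L, 7L))

# %in% returns a single TRUE/FALSE here, so it is safe inside if()
found <- 12069692 %in% normal[, 2]
if (found) {
  print("yes")
}

# any(... == ...) works too, but can return NA when the column has NAs
with_na <- any(c(NA, 1) == 2)
print(with_na)  # NA
```

Because if() fails on an NA condition, %in% tends to be the more robust choice when the column may contain missing values.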
