I just ran into an interesting nuance with the not (!) operator in subsetting while answering another question.
Check out:
y <- 1:10
y[!y]
integer(0)
y[4] <- NA
y[!y]
[1] NA
y[6] <- 0
y[!y]
[1] NA 0
From R documentation:
! indicates logical
negation (NOT)
How are 0 and NA both NOT y?
You're not subsetting using equality, you are coercing the numerics 1:10 to logical--and any numeric other than 0 is coerced to TRUE. Run, e.g.,
!(1:10)
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
You get 10 FALSEs, so when you subset any vector of length 10 with 10 FALSEs, you get nothing.
As documented in ?TRUE and ?NA, a logical operation on NA results in NA, so !NA is NA, and subsetting with an NA index returns NA.
And, of course, 0 is coerced to FALSE, so !0 evaluates to TRUE, so when you set the 6th element to 0,
!c(1:5, 0, 7:10)
# [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
#         1     2     3     4     5  ^^^6     7     8     9    10
You get a TRUE in the 6th position, so subsetting with that will return the 6th element.
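To see the three cases side by side, here is a small sketch that rebuilds the y from the question. The call to which() is an addition for illustration, shown only as one way to drop the NA; it is not part of the original answer.

```r
y <- 1:10
y[4] <- NA
y[6] <- 0

idx <- !y          # numeric is coerced to logical, then negated
idx
# [1] FALSE FALSE FALSE    NA FALSE  TRUE FALSE FALSE FALSE FALSE

y[idx]             # the NA index yields NA, the TRUE yields the 0
# [1] NA  0

y[which(idx)]      # which() keeps only the positions that are TRUE
# [1] 0
```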
How are 0 and NA both NOT y?
You might be looking for y[y != y]?
0s are interpreted as FALSE in logical operations (Boolean algebra).
!0 = !(FALSE) = TRUE.
Likewise, non-0 valid (i.e. non-NA) numerical values are interpreted as TRUE in logical operations.
The NAs are always tricky, see Frank's comment above.
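The coercion described above can be checked directly. This is a minimal sketch; the vector x is made up for illustration.

```r
x <- c(0, 1, -2.5, NA)

as.logical(x)                    # 0 -> FALSE, other numbers -> TRUE, NA stays NA
# [1] FALSE  TRUE  TRUE    NA

!x                               # negation of the coerced values
# [1]  TRUE FALSE FALSE    NA

identical(!x, !as.logical(x))    # ! performs the same coercion implicitly
# [1] TRUE
```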
Related
I am new to R. I have created an object a:
a <- c(2,4,6,8,10,12,14,16,18,20)
I have performed the following operation on the vector:
a[!c(10,0,8,6,0)]
and I get the output as 4 10 14 20
I do understand that !c(10,0,8,6,0) produces the output as FALSE TRUE FALSE FALSE TRUE
I don't understand how the final results comes out to be 4 10 14 20
Can someone help?
We obtain these results because the logical vector is recycled (its length is only 5, compared to length(a), which is 10) until it reaches the end of the 'a' vector, i.e.
i1 <- rep(!c(10,0,8,6,0), length.out = length(a))
i1
[1] FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE
If we use that vector
a[i1]
[1] 4 10 14 20
It is easier to understand if we pass a single TRUE: it is recycled to return all the elements, and a single FALSE is recycled to return none.
a[TRUE]
[1] 2 4 6 8 10 12 14 16 18 20
a[FALSE]
numeric(0)
The recycling is mentioned in the documentation of ?Extract
For [-indexing only: i, j, ... can be logical vectors, indicating elements/slices to select. Such vectors are recycled if necessary to match the corresponding extent. i, j, ... can also be negative integers, indicating elements/slices to leave out of the selection.
In many languages, 0 is treated as FALSE and other values as TRUE. So, when we negate a numeric vector, each 0 (FALSE) becomes TRUE and every other non-NA value becomes FALSE.
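As a quick check of the recycling described above, the sketch below lists which positions of a end up selected. The name idx is introduced here only for illustration.

```r
a <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
idx <- !c(10, 0, 8, 6, 0)                  # FALSE TRUE FALSE FALSE TRUE

# Positions selected once the length-5 index is stretched to length 10
which(rep(idx, length.out = length(a)))
# [1]  2  5  7 10

# Subsetting with the short index gives the same result as the explicit one
identical(a[idx], a[rep(idx, length.out = length(a))])
# [1] TRUE

# The same mechanism with a length-2 index picks every other element
a[c(TRUE, FALSE)]
# [1]  2  6 10 14 18
```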
Suppose I have a data.frame like
a <- data.frame(col1=1:6,
col2=c('a','b',1,'c',2,3),
stringsAsFactors=FALSE)
a
col1 col2
1 1 a
2 2 b
3 3 1
4 4 c
5 5 2
6 6 3
I want to have a vector saying which rows have col2 as a number. I'm trying something like
apply(a$col2,1,is.numeric)
or
apply(a$col2,FUN=is.numeric)
but it always says
Error in apply(a$col2, 1, is.numeric) :
dim(X) must have a positive length
If a$col2 (the X in apply) must be a matrix, then why does the help from the function say:
X: an array, including a matrix.
The help on arrays says:
An array in R can have one, two or more dimensions.
If an array can have only one dimension, then why can't a one-dimensional array be used in apply? What am I missing here?
(Beyond that, I still would like to know how to find the numeric rows in col2 without using a loop.)
First note that even the numbers in col2 are character: when they are combined with character elements in c(), they are coerced to character.
str(a)
## 'data.frame': 6 obs. of 2 variables:
## $ col1: int 1 2 3 4 5 6
## $ col2: chr "a" "b" "1" "c" ...
1) grepl Since col2 is character, we should use character processing like this:
grepl("^\\d+$", a$col2)
## [1] FALSE FALSE TRUE FALSE TRUE TRUE
grepl is already vectorized, so we don't need an apply or related function to iterate over the elements of col2.
2) (s)apply These also work but seem unnecessarily involved given that grepl alone works:
sapply(a$col2, grepl, pattern = "^\\d+$")
## a b 1 c 2 3
## FALSE FALSE TRUE FALSE TRUE TRUE
apply(array(a$col2), 1, grepl, pattern = "^\\d+$")
## [1] FALSE FALSE TRUE FALSE TRUE TRUE
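3) type.convert Another approach is to use type.convert, which converts a value to numeric if it can be represented as one. Then we can use is.numeric. (as.is = TRUE is given explicitly because recent versions of R warn when it is left unspecified.)
sapply(a$col2, function(x) is.numeric(type.convert(x, as.is = TRUE)))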
## a b 1 c 2 3
## FALSE FALSE TRUE FALSE TRUE TRUE
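On the apply error itself, the sketch below shows that a plain vector has no dim attribute, which is why array() was needed in 2). It also shows one more test that is not in the answer above and is offered only as an assumption about what "is a number" should mean: try as.numeric and see what fails to parse. Unlike the regular expression, this variant would also accept strings such as "1.5" or "-2".

```r
a <- data.frame(col1 = 1:6,
                col2 = c("a", "b", 1, "c", 2, 3),
                stringsAsFactors = FALSE)

dim(a$col2)          # a plain vector has no dim attribute, so apply() stops
# NULL
dim(array(a$col2))   # array() attaches a one-dimensional dim
# [1] 6

# Alternative test: anything as.numeric() can parse does not become NA
is_num <- !is.na(suppressWarnings(as.numeric(a$col2)))
is_num
# [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE
```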
I've started learning R and got a piece of code in which a statement is:
if(sum(C == C[i]) == 1)# C is simply a vector and i is index of a value in this vector which the user specifies in an argument.
How can you pass a conditional statement as an argument of a function? Also explain the meaning of this statement.
Thank you.
Let's take an example to understand
Consider C as a numeric vector from 1 to 10 and let's take i as 3
C <- 1:10
i <- 3
So when we do
C == C[i]
#[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
it compares every element of C with C[i], which is 3, and returns a logical vector of the same length that is TRUE only at the 3rd index.
When we sum this logical vector it returns count of all TRUE (as it considers FALSE as 0 and TRUE as 1) values which in this case is 1
sum(C == C[i])
#[1] 1
which is then compared to 1 again to make sure that there is only one C[i] in C
sum(C == C[i]) == 1
#[1] TRUE
The condition is FALSE if C[i] is repeated in C. For example,
C <- c(1:10, 3) #Adding an extra 3 in the end
C
#[1] 1 2 3 4 5 6 7 8 9 10 3
i <- 3
sum(C == C[i]) == 1
#[1] FALSE
The bottom line is the condition is TRUE if C[i] occurs only once in C.
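On the first part of the question: the expression inside if() is ordinary R code that evaluates to a single TRUE or FALSE, so it can be passed around like any other value. Here is a minimal sketch that wraps it in a function. The name occurs_once is hypothetical, and na.rm = TRUE is an added assumption so that an NA in C does not turn the result into NA.

```r
# Hypothetical helper: TRUE if the value at position i appears exactly once in C
occurs_once <- function(C, i) {
  sum(C == C[i], na.rm = TRUE) == 1
}

C <- c(1:10, 3)
occurs_once(C, 3)            # 3 appears twice
# [1] FALSE
occurs_once(C, 1)            # 1 appears once
# [1] TRUE
occurs_once(c(1, NA, 3), 1)  # without na.rm the sum would be NA
# [1] TRUE

if (occurs_once(C, 1)) message("C[1] is unique in C")
```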
I am getting some unexpected behavior using %in% c() versus == c() to filter data on multiple conditions. I get incomplete results when using the == c() method. Is there a logical explanation for this behavior?
df <- data.frame(region = as.factor(c(1,1,1,2,2,3,3,4,4,4)),
value = 1:10)
library(dplyr)
filter(df, region == c(1,2))
filter(df, region %in% c(1,2))
# using base syntax
df[df$region == c(1,2),]
df[df$region %in% c(1,2),]
The results do not change if I convert 'region' to numeric.
I get incomplete results when using the == c() method. Is there a logical explanation for this behavior?
That's kind of logical, let's see:
df$region == 1:2
# [1] TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
df$region %in% 1:2
# [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
The reason is that in the first form you're comparing vectors of different lengths. As @lukeA said in his comment, this form is the same as the following (see implementation-of-standard-recycling-rules):
# 1 1 1 2 2 3 3 4 4 4 ## df$region
# 1 2 1 2 1 2 1 2 1 2 ## c(1,2) recycled to the same length
# T F T T F F F F F F ## equality of the corresponding elements
df$region == c(1,2,1,2,1,2,1,2,1,2)
# [1] TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
Where each value on the left hand side of the operator is tested with the corresponding value on the right hand side of the operator.
However, when you use df$region %in% 1:2, it works more like this:
sapply(df$region, function(x) { any(x==1:2) })
# [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
That is, each value on the left is tested against the whole second vector, and TRUE is returned if there is at least one match.
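The two behaviours can be verified without a data frame. This is a small sketch using a plain numeric vector in place of df$region; the names region, targets and recycled are introduced here for illustration. The match() form is how ?match describes %in%.

```r
region  <- c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4)
targets <- c(1, 2)

# == recycles targets and compares position by position
recycled <- rep(targets, length.out = length(region))
identical(region == targets, region == recycled)
# [1] TRUE

# %in% asks whether each element appears anywhere in targets
identical(region %in% targets, match(region, targets, nomatch = 0L) > 0L)
# [1] TRUE

region[region %in% targets]
# [1] 1 1 1 2 2
```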
It is quite a surprise to me that I could not find a ready answer to this question on Stack Overflow.
In R, I have a vector of 0s and 1s, and I want to convert it to a logical vector, with 0 becoming FALSE and 1 becoming TRUE.
How could I do that?
Thanks
My comment should have been an answer. You can just do:
as.logical(c(0,1,1,0))
We can use !!:
rep(0:1, 5)
[1] 0 1 0 1 0 1 0 1 0 1
!!rep(0:1, 5)
[1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
All 0s will be converted to FALSE, any other numeric to TRUE.
We can use !=
c(0,1,1,0)!=0
#[1] FALSE TRUE TRUE FALSE
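The three approaches above agree with each other, including on NA and on values other than 0 and 1. The sketch below uses a made-up vector x to show this, and contrasts it with x == 1, which is stricter and is included only for comparison.

```r
x <- c(0, 1, NA, 2)

as.logical(x)
# [1] FALSE  TRUE    NA  TRUE

identical(as.logical(x), !!x)      # double negation gives the same result
# [1] TRUE
identical(as.logical(x), x != 0)   # so does comparing against 0
# [1] TRUE

x == 1                             # treats every value other than 1 as FALSE
# [1] FALSE  TRUE    NA FALSE
```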
If memory use matters (a logical vector takes 4 bytes per element, and object.size(TRUE) reports 56 bytes on a typical 64-bit build), a good option is the bit package, which stores a vector of TRUE/FALSE values at 1 bit per value.
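To get a rough idea of what a base logical vector costs, object.size can be used as below. This is only a sketch in base R; it does not use the bit package, and the exact figure may vary with the R build.

```r
x <- rep(c(TRUE, FALSE), 5e5)       # one million logical values

# Average storage per element, including the small fixed vector header
bytes_per_element <- as.numeric(object.size(x)) / length(x)
bytes_per_element                   # close to 4 on common builds
```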