Vector Indexing using Logical vector - r

I am new to R. I have created an object a:
a <- c(2,4,6,8,10,12,14,16,18,20)
I have performed the following operation on the vector:
a[!c(10,0,8,6,0)]
and I get the output as 4 10 14 20
I do understand that !c(10,0,8,6,0) produces the output as FALSE TRUE FALSE FALSE TRUE
I don't understand how the final result comes out to be 4 10 14 20
Can someone help?

We get this result because the logical vector is recycled: its length is only 5, while length(a) is 10, so it is repeated until it reaches the end of a, i.e.
i1 <- rep(!c(10,0,8,6,0), length.out = length(a))
i1
[1] FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE
If we use that vector
a[i1]
[1] 4 10 14 20
It is easier to see with a single TRUE, which is recycled to select every element; a single FALSE is recycled to select none
a[TRUE]
[1] 2 4 6 8 10 12 14 16 18 20
a[FALSE]
numeric(0)
The recycling is mentioned in the documentation of ?Extract
For [-indexing only: i, j, ... can be logical vectors, indicating elements/slices to select. Such vectors are recycled if necessary to match the corresponding extent. i, j, ... can also be negative integers, indicating elements/slices to leave out of the selection.
In R, as in many languages, 0 is treated as FALSE and every other number as TRUE when coerced to logical. So negation turns each 0 into TRUE and every other value into FALSE.
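The two steps described above (coercion plus negation, then recycling) can be traced one at a time. This is only a sketch; the names idx, neg and pos are introduced here for illustration and are not part of the original answer.

```r
a <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
idx <- c(10, 0, 8, 6, 0)

# Numeric-to-logical coercion: 0 becomes FALSE, any other number TRUE
as.logical(idx)
# [1]  TRUE FALSE  TRUE  TRUE FALSE

# Negation flips each value, so TRUE marks the positions where idx is 0
neg <- !idx
identical(neg, idx == 0)
# [1] TRUE

# Recycle the 5-element logical vector to the length of a, then
# convert it to integer positions with which()
pos <- which(rep(neg, length.out = length(a)))
pos
# [1]  2  5  7 10

# Indexing by those positions gives the same result as a[!idx]
identical(a[pos], a[!idx])
# [1] TRUE
```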

Related

How to use apply over a vector?

Suppose I have a data.frame like
a <- data.frame(col1=1:6,
col2=c('a','b',1,'c',2,3),
stringsAsFactors=F)
a
col1 col2
1 1 a
2 2 b
3 3 1
4 4 c
5 5 2
6 6 3
I want to have a vector saying which rows have col2 as a number. I'm trying something like
apply(a$col2,1,is.numeric)
or
apply(a$col2,FUN=is.numeric)
but it always says
Error in apply(a$col2, 1, is.numeric) :
dim(X) must have a positive length
If a$col2 (the X in apply) must be a matrix, then why does the help from the function say:
X: an array, including a matrix.
The help on arrays says:
An array in R can have one, two or more dimensions.
If an array can have only one dimension, then why can't a one-dimensional array be used in apply? What am I missing here?
(Beyond that, I still would like to know how to find the numeric rows in col2 without using a loop.)
First note that even the numbers in col2 are character: when they are combined in c() with elements that are character, they get coerced to character.
str(a)
## 'data.frame': 6 obs. of 2 variables:
## $ col1: int 1 2 3 4 5 6
## $ col2: chr "a" "b" "1" "c" ...
1) grepl. Since col2 is character, we should use character processing like this:
grepl("^\\d+$", a$col2)
## [1] FALSE FALSE TRUE FALSE TRUE TRUE
grepl is already vectorized so we don't need apply or a related function to iterate over the elements of col2.
2) (s)apply. These also work but seem unnecessarily involved given that grepl alone works:
sapply(a$col2, grepl, pattern = "^\\d+$")
## a b 1 c 2 3
## FALSE FALSE TRUE FALSE TRUE TRUE
apply(array(a$col2), 1, grepl, pattern = "^\\d+$")
## [1] FALSE FALSE TRUE FALSE TRUE TRUE
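3) type.convert. Another approach is to use type.convert, which converts a value to numeric if it can be represented as one. Then we can use is.numeric. (as.is = TRUE is passed explicitly because recent versions of R warn when it is omitted.)
sapply(a$col2, function(x) is.numeric(type.convert(x, as.is = TRUE)))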
## a b 1 c 2 3
## FALSE FALSE TRUE FALSE TRUE TRUE
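On the question of why apply rejects a$col2: a plain vector has no dim attribute at all, whereas array() attaches one, which is why the array(a$col2) call above works. The sketch below shows that, together with one more way to flag the numeric entries by attempting the conversion. The name is_num is introduced here for illustration. Note that as.numeric also accepts strings such as "1.5" or "1e3", which the digits-only regular expression rejects, so the two checks are not interchangeable in general.

```r
a <- data.frame(col1 = 1:6,
                col2 = c("a", "b", 1, "c", 2, 3),
                stringsAsFactors = FALSE)

# A plain vector has no dim attribute, which is what apply() complains about
dim(a$col2)
# NULL

# array() attaches a one-dimensional dim attribute, so apply() accepts it
dim(array(a$col2))
# [1] 6

# Alternative check: strings that cannot be parsed as numbers become NA
# (the coercion warning is expected here, so it is suppressed)
is_num <- !is.na(suppressWarnings(as.numeric(a$col2)))
is_num
# [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE
```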

Conditional statement in sum() function of R

I've started learning R and got a piece of code in which a statement is:
if(sum(C == C[i]) == 1)# C is simply a vector and i is index of a value in this vector which the user specifies in an argument.
How can you pass a conditional statement as an argument of a function? Also explain the meaning of this statement.
Thank you.
Let's take an example to understand
Consider C as a numeric vector from 1 to 10 and let's take i as 3
C <- 1:10
i <- 3
So when we do
C == C[i]
#[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
it compares every element of C with C[i], which is 3, and returns a logical vector of the same length that is TRUE only at the 3rd index.
When we sum this logical vector we get the number of TRUE values (sum treats FALSE as 0 and TRUE as 1), which in this case is 1
sum(C == C[i])
#[1] 1
which is then compared to 1 again to make sure that there is only one C[i] in C
sum(C == C[i]) == 1
#[1] TRUE
The condition is FALSE when C[i] is repeated in C. For example,
C <- c(1:10, 3) #Adding an extra 3 in the end
C
#[1] 1 2 3 4 5 6 7 8 9 10 3
i <- 3
sum(C == C[i]) == 1
#[1] FALSE
The bottom line is the condition is TRUE if C[i] occurs only once in C.
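As a rough sketch, the same test can be wrapped in a small function, and duplicated() can answer it for every position at once. The names occurs_once and unique_pos are hypothetical and do not come from the code in the question.

```r
# occurs_once() is a hypothetical helper name used only for illustration
occurs_once <- function(C, i) {
  # C == C[i] is a logical vector; sum() counts its TRUE values
  sum(C == C[i]) == 1
}

occurs_once(1:10, 3)
# [1] TRUE
occurs_once(c(1:10, 3), 3)
# [1] FALSE

# The same condition for every position at once: a value occurs exactly
# once if it is not a duplicate when scanning from either end
C <- c(1:10, 3)
unique_pos <- !(duplicated(C) | duplicated(C, fromLast = TRUE))
which(unique_pos)
# [1]  1  2  4  5  6  7  8  9 10
```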

subsetting with not operator `!`

I just ran into an interesting nuance with the not ! operator in subsetting while answering another question.
Check out:
y <- 1:10
y[!y]
integer(0)
y[4] <- NA
y[!y]
[1] NA
y[6] <- 0
y[!y]
[1] NA 0
From R documentation:
! indicates logical negation (NOT)
How is 0 and NA both NOT y?
You're not subsetting using equality, you are coercing the numerics 1:10 to logical--and any numeric other than 0 is coerced to TRUE. Run, e.g.,
!(1:10)
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
You get 10 FALSEs, so when you subset any vector of length 10 with 10 FALSEs, you get nothing.
As documented in ?TRUE and ?NA, a logical operation on NA results in NA, so !NA is NA, and an NA in a logical index returns NA for that position.
And, of course, 0 is coerced to FALSE, so !0 is TRUE. When you set the 6th element to 0,
!c(1:5, 0, 7:10)
# [1] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
# 1 2 3 4 5 ^^^6 7 8 9 10
You get a TRUE in the 6th position, so subsetting with that will return the 6th element.
How is 0 and NA both NOT y?
You might be looking for y[y != y]?
0s are interpreted as FALSE in logical operations (Boolean algebra).
!0 = !(FALSE) = TRUE.
Likewise, non-0 valid (i.e. non-NA) numerical values are interpreted as TRUE in logical operations.
The NAs are always tricky, see Frank's comment above.
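To make the NA case concrete, here is a small sketch that reuses the y from the question. The name neg is introduced here for illustration. Wrapping the index in which() is one common way to drop the NA, since which() returns only the positions that are TRUE.

```r
y <- 1:10
y[4] <- NA
y[6] <- 0

# !NA is NA, and !0 is TRUE
neg <- !y
neg
# [1] FALSE FALSE FALSE    NA FALSE  TRUE FALSE FALSE FALSE FALSE

# An NA in a logical index yields an NA in the result
y[neg]
# [1] NA  0

# which() keeps only the positions that are TRUE, so the NA is dropped
y[which(neg)]
# [1] 0
```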

how to find if all elements in a subset of a data.frame row are TRUE

I have a data.frame with a block of columns that are logicals, e.g.
> tmp <- data.frame(a=c(13, 23, 52),
+ b=c(TRUE,FALSE,TRUE),
+ c=c(TRUE,TRUE,FALSE),
+ d=c(TRUE,TRUE,TRUE))
> tmp
a b c d
1 13 TRUE TRUE TRUE
2 23 FALSE TRUE TRUE
3 52 TRUE FALSE TRUE
I'd like to compute a summary column (say: e) that is a logical AND over the whole range of logical columns. In other words, for a given row, if all b:d are TRUE, then e would be TRUE; if any b:d are FALSE, then e would be FALSE.
My expected result is:
> tmp
a b c d e
1 13 TRUE TRUE TRUE TRUE
2 23 FALSE TRUE TRUE FALSE
3 52 TRUE FALSE TRUE FALSE
I want to indicate the range of columns by indices, as I have a bunch of columns, and the names are cumbersome. The following code works, but I'd rather use a vectorized approach to improve performance.
> tmp$e <- NA
> for(i in 1:nrow(tmp)){
+ tmp[i,"e"] <- all(tmp[i,2:(ncol(tmp)-1)]==TRUE)
+ }
> tmp
a b c d e
1 13 TRUE TRUE TRUE TRUE
2 23 FALSE TRUE TRUE FALSE
3 52 TRUE FALSE TRUE FALSE
Any way to do this without using a for loop to step through the rows of the data.frame?
You can use rowSums to loop over rows... and some fancy footwork to make it quasi-automated:
# identify the logical columns
boolCols <- sapply(tmp, is.logical)
# sum each row of the logical columns and
# compare to the total number of logical columns
tmp$e <- rowSums(tmp[,boolCols]) == sum(boolCols)
Using rowSums inside ifelse, it can be achieved in one go:
tmp$e <- ifelse(rowSums(tmp[, 2:4] == TRUE) == 3, TRUE, FALSE)
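Two other ways to express the row-wise AND over a column index range are sketched below. They are alternatives to the rowSums answers, not a claim about which is fastest; the names cols, e_reduce and e_apply are introduced here for illustration.

```r
tmp <- data.frame(a = c(13, 23, 52),
                  b = c(TRUE, FALSE, TRUE),
                  c = c(TRUE, TRUE, FALSE),
                  d = c(TRUE, TRUE, TRUE))

# Index range of the logical columns (chosen by hand for this example)
cols <- 2:4

# Element-wise AND accumulated across the selected columns
e_reduce <- Reduce(`&`, tmp[cols])
e_reduce
# [1]  TRUE FALSE FALSE

# Row-wise all(); easy to read, but it still iterates over rows internally
e_apply <- apply(tmp[cols], 1, all)

tmp$e <- e_reduce
```

Unlike the rowSums comparison, Reduce with `&` never needs the number of columns, so the index range can change without touching the rest of the line.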

Finding All Positions for Multiple Elements in a Vector

Suppose I have the following vector:
x <- c(8, 6, 9, 9, 7, 3, 2, 5, 5, 1, 6, 8, 5, 2, 9, 3, 5, 10, 8, 2)
How can I find which elements are either 8 or 9?
This is one way to do it. First I get the indices at which x is either 8 or 9. Then we can verify that at those indices, x is indeed 8 or 9.
> inds <- which(x %in% c(8,9))
> inds
[1] 1 3 4 12 15 19
> x[inds]
[1] 8 9 9 8 9 8
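It may help to see why %in% is used here rather than ==. The sketch below compares the two on the same x: with ==, the shorter vector c(8, 9) is recycled along x, so each element is compared with only one of the two targets.

```r
x <- c(8, 6, 9, 9, 7, 3, 2, 5, 5, 1, 6, 8, 5, 2, 9, 3, 5, 10, 8, 2)

# %in% tests each element of x against the whole set
which(x %in% c(8, 9))
# [1]  1  3  4 12 15 19

# == recycles c(8, 9) along x: x[1] is compared with 8, x[2] with 9,
# x[3] with 8, and so on, so matches in the "wrong" slot are missed
which(x == c(8, 9))
# [1]  1  4 19
```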
You could try the | operator for short conditions
which(x == 8 | x == 9)
In this specific case you could also use grep:
# option 1
grep('[89]',x)
# option 2
grep('8|9',x)
which both give:
[1] 1 3 4 12 15 19
When you also want to detect numbers with more than one digit, the second option is preferred:
> grep('10|8',x)
[1] 1 12 18 19
However, I put emphasis on this specific case at the start of my answer for a reason. As @DavidArenburg mentioned, this could lead to unintended results. For example, grep('1|8',x) will detect both 1 and 10:
> grep('1|8',x)
[1] 1 10 12 18 19
To avoid that side effect, you will have to wrap the numbers to be detected in word boundaries:
> grep('\\b1\\b|8',x)
[1] 1 10 12 19
Now, the 10 isn't detected.
Here is a generalized solution to find the locations of all target values (only works for vectors and 1-dimensional arrays).
locate <- function(x, targets) {
results <- lapply(targets, function(target) which(x == target))
names(results) <- targets
results
}
This function returns a list because each target may have any number of matches, including zero. The list is sorted (and named) in the original order of the targets.
Here is an example in use:
sequence <- c(1:10, 1:10)
locate(sequence, c(2,9))
$`2`
[1] 2 12
$`9`
[1] 9 19
Alternatively, if you do not need to use the indices but just the elements you can do
> x <- sample(1:10,20,replace=TRUE)
> x
[1] 6 4 7 2 9 3 3 5 4 7 2 1 4 9 1 6 10 4 3 10
> x[8<=x & x<=9]
[1] 9 9
If you want to find the answer using loops, then the following script will do the job:
> req_nos <- c(8, 9)
> pos <- list()
> for (i in seq_along(req_nos)) {
+ pos[[i]] <- which(x == req_nos[i])
+ }
The output will look like this:
> pos
[[1]]
[1] 1 12 19
[[2]]
[1] 3 4 15
Here, pos[[1]] contains the positions of 8 and pos[[2]] contains the positions of 9. With the %in% method, changing the order of the input, i.e. c(9,8) instead of c(8,9), gives the same output, so you cannot tell which position belongs to which number. This method avoids that problem.
grepl may be a useful function. Note that grepl is available in R 2.9.0 and later. What's handy about grepl is that it returns a logical vector of the same length as x.
grepl(8, x)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
grepl(9, x)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
[13] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
To arrive at your answer, you could do the following
grepl(8,x) | grepl(9,x)
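One caveat with grepl on numbers: the pattern is matched against the printed form of each value, so an unanchored 8 also matches 18. The sketch below uses the x from the question with an extra 18 appended (an addition made here purely to show the effect) and anchors the pattern so that only whole values match.

```r
# x from the question, plus an 18 appended for illustration
x <- c(8, 6, 9, 9, 7, 3, 2, 5, 5, 1, 6, 8, 5, 2, 9, 3, 5, 10, 8, 2, 18)

# Unanchored: matches any value whose printed form contains an 8 or a 9,
# including the 18 in position 21
which(grepl("8|9", x))
# [1]  1  3  4 12 15 19 21

# Anchored: matches only values that are exactly 8 or exactly 9
which(grepl("^(8|9)$", x))
# [1]  1  3  4 12 15 19
```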

Resources