Why does filtering elements of a vector with '[]' result in NA, while the 'which' function does not return any NA?
Here is an example:
setor <- c('residencial','residencial',NA,'comercial')
setor[setor == 'residencial']
#"residencial" "residencial" NA`
setor[which(setor=='residencial')]
#[1] "residencial" "residencial"
Your help would be much appreciated!
Because when you use == for comparison, it returns NA for NA values.
setor == 'residencial'
#[1] TRUE TRUE NA FALSE
and subsetting with NA returns NA
setor[setor=='residencial']
#[1] "residencial" "residencial" NA
However, when we use which, it ignores NAs and returns the indices of only the TRUE values.
which(setor=='residencial')
#[1] 1 2
setor[which(setor=='residencial')]
#[1] "residencial" "residencial"
We could use %in%, which returns FALSE for NA elements:
setor %in% 'residencial'
#[1] TRUE TRUE FALSE FALSE
It also works when we need to match more than one value, e.g.
setor %in% c('residencial', 'comercial')
#[1] TRUE TRUE FALSE TRUE
and this can be directly used to subset
setor[setor %in% 'residencial']
#[1] "residencial" "residencial"
An alternative title to this question is:
When is an NA date not an NA?
Answer: when it is infinity formatted as a date.
Converting infinity to a date results in NA being displayed, but the value is not actually NA!
> BadDates <- as.Date(c(NA,Inf,-Inf))
> BadDates
[1] NA NA NA
> is.na(BadDates)
[1] TRUE FALSE FALSE
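Stripping the class with unclass() shows what is actually stored (a quick base R check):
> unclass(BadDates)
[1]   NA  Inf -Inf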
This causes confusion when trying to catch errors.
A work-around is to test for infinity as well as NA:
> is.na(BadDates) | is.infinite(BadDates)
[1] TRUE TRUE TRUE
Is there a better way to manage this quirk of the Date class?
is.finite(as.Date(NA)) and is.infinite(as.Date(NA)) are both FALSE, so is.finite will be FALSE for all the bad dates but TRUE for a good date.
BadDates <- as.Date(c(NA,Inf,-Inf))
is.finite(BadDates)
## [1] FALSE FALSE FALSE
GoodDate <- Sys.Date()
is.finite(GoodDate)
## [1] TRUE
It works the same without dates, so this behaviour is not specific to the Date class.
x <- c(NA, Inf, -Inf)
is.finite(x)
## [1] FALSE FALSE FALSE
is.finite(3)
## [1] TRUE
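If this check recurs, it can be wrapped in a small helper; is_good_date below is a hypothetical name, not a base R function:
# hypothetical helper: TRUE only for genuine, finite dates
is_good_date <- function(d) inherits(d, "Date") & is.finite(d)
is_good_date(BadDates)
## [1] FALSE FALSE FALSE
is_good_date(GoodDate)
## [1] TRUE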
I have example data as follows:
library(data.table)
dat <- fread("q1 q2 ...1 ..2 q3..1 ..1
NA response other else response other
1 4 NA NA 1 NA")
I want to filter out all columns that are automatically named when reading in an Excel file with missing column names; these get names like ..x. I thought that the following piece of code would work:
grepl("\\.+", names(dat))
[1] FALSE FALSE TRUE TRUE TRUE TRUE
But it also flags columns such as q3..1, which merely contain dots.
Although I do not know why the ..x part is added to such a column (it was not empty), I would like to adapt the grepl call so that the outcome is TRUE only when the name consists solely of ..x.
How should I do this?
Desired output:
grepl("\\.+", names(dat))
[1] FALSE FALSE TRUE TRUE FALSE TRUE
Use the anchor ^ to require that the dots are at the start of the string:
grepl("^\\.+", names(dat))
#[1] FALSE FALSE TRUE TRUE FALSE TRUE
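To actually drop those columns, the negated logical vector can be turned into column names and passed to data.table with with = FALSE; a minimal sketch using dat from the question:
# TRUE marks the columns to keep
keep <- !grepl("^\\.+", names(dat))
dat[, names(dat)[keep], with = FALSE]
#       q1       q2    q3..1
#1:     NA response response
#2:      1        4        1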
We may do
library(dplyr)
dat %>%
  select(!matches('^\\.+'))
       q1       q2    q3..1
    <int>   <char>   <char>
1:     NA response response
2:      1        4        1
I am trying to change the logical values (elements) of my list based on another list. Basically, where both lists are TRUE, I want to change the value in the main list to FALSE. Both lists have length 5. For example:
List_A <- list(c(TRUE,FALSE,TRUE),c(FALSE,TRUE,TRUE),c(FALSE,FALSE,FALSE),c(TRUE,TRUE,TRUE),c(TRUE,FALSE,TRUE))
List_B <-list(c(FALSE,FALSE,FALSE),c(TRUE,TRUE,FALSE),c(TRUE,TRUE,TRUE),c(FALSE,FALSE,TRUE),c(FALSE,TRUE,TRUE))
List_B has sequences as name attributes.
Desired output:
Output <-
list(c(TRUE,FALSE,TRUE),c(FALSE,FALSE,TRUE),c(FALSE,FALSE,FALSE),c(TRUE,TRUE,FALSE),c(TRUE,FALSE,FALSE))
In other words, elements in List_A remain the same unless they have matching TRUE values in both lists, in which case they are replaced with FALSE.
I've tried running the for loop below, but it doesn't work, and I don't know how I would capture the output if it did.
for(i in 1:length(List_A)) { List_A[[i]][List_B[[i]]] <- FALSE }
You can use the Map function: if both values are TRUE, change to FALSE; otherwise keep the value from List_A.
Output <- Map(function(x, y) replace(x, x & y, FALSE), List_A, List_B)
Output
#[[1]]
#[1] TRUE FALSE TRUE
#[[2]]
#[1] FALSE FALSE TRUE
#[[3]]
#[1] FALSE FALSE FALSE
#[[4]]
#[1] TRUE TRUE FALSE
#[[5]]
#[1] TRUE FALSE FALSE
data
List_A <- list(c(TRUE,FALSE,TRUE),c(FALSE,TRUE,TRUE),c(FALSE,FALSE,FALSE),c(TRUE,TRUE,TRUE),c(TRUE,FALSE,TRUE))
List_B <- list(c(FALSE,FALSE,FALSE),c(TRUE,TRUE,FALSE),c(TRUE,TRUE,TRUE),c(FALSE,FALSE,TRUE),c(FALSE,TRUE,TRUE))
We can use map2
library(purrr)
map2(List_A, List_B, ~ .x & !(.x & .y))
data
List_A <- list(c(TRUE,FALSE,TRUE),c(FALSE,TRUE,TRUE),c(FALSE,FALSE,FALSE),c(TRUE,TRUE,TRUE),c(TRUE,FALSE,TRUE))
List_B <- list(c(FALSE,FALSE,FALSE),c(TRUE,TRUE,FALSE),c(TRUE,TRUE,TRUE),c(FALSE,FALSE,TRUE),c(FALSE,TRUE,TRUE))
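The same can be written in base R with mapply, where SIMPLIFY = FALSE keeps the list structure (the list-in, list-out analogue of Map used above):
mapply(function(x, y) x & !(x & y), List_A, List_B, SIMPLIFY = FALSE)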
I don't understand what is going on here:
Set up:
> df = data.frame(x1= rnorm(10), x2= rnorm(10))
> df[3,1] <- "the"
> df[6,2] <- "NA"
## I want to create values that will be challenging to coerce to numeric
> df$x1.fixed <- as.numeric(df$x1)
> df$x2.fixed <- as.numeric(df$x2)
## Here is the DF
> df
x1 x2 x1.fixed x2.fixed
1 0.955965351551298 -0.320454533088042 0.9559654 -0.3204545
2 -1.87960909714257 1.61618672247496 -1.8796091 1.6161867
3 the -0.855930398468875 NA -0.8559304
4 -0.400879592905882 -0.698655375066432 -0.4008796 -0.6986554
5 0.901252404134257 -1.08020133150191 0.9012524 -1.0802013
6 0.97786920899034 NA 0.9778692 NA
...
> table(is.na(df[,c(3,4)]))
FALSE TRUE
18 2
I wanted to find the rows that were converted to NAs, so I wrote a complex apply call that did not work as expected. I then simplified and tried again...
Question:
Simpler call:
> apply(df, 1, function(x) (any(is.na(df[x,3]), is.na(df[x,4]))))
which unexpectedly yielded:
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Instead, I'd expected:
[1] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
to highlight the rows (3 & 6) where an NA existed. To verify that non-apply'ed functions would work, I tried:
> any(is.na(df[3,1]), is.na(df[3,2]))
[1] FALSE
> any(is.na(df[3,3]), is.na(df[3,4]))
[1] TRUE
as expected. To further my confusion on what apply is doing, I tried:
> apply(df, 1, function(x) is.na(df[x,1]))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[2,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[3,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[4,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Why is this traversing the entire data frame, when I have clearly indicated both (a) that I want to work in the row direction (I passed 1 as the second argument), and (b) that the value x is only used as the row index, not the column index?
I understand there are other, and perhaps better, ways to do what I am trying to do (find the rows that were changed to NAs in the new columns). But please don't supply that in the answer. Instead, please explain why apply did not work as I'd expected, and what I could do to fix it.
To find the columns that have NA's you can do:
sapply(df, function(x) any(is.na(x)))
# x1 x2 x1.fixed x2.fixed
# FALSE FALSE TRUE TRUE
A data.frame is a list of vectors, so the function inside sapply above evaluates any(is.na(...)) for each element of that list, i.e. each column.
As per the OP's edit - to get the rows that have NAs, use apply(df, 1, ...) instead:
apply(df, 1, function(x) any(is.na(x)))
# [1] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
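If the row numbers themselves are needed rather than a logical mask, which() converts one into the other:
which(apply(df, 1, function(x) any(is.na(x))))
# [1] 3 6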
apply is working exactly as it is supposed to. It is your expectations that are wrong.
apply(df, 1, function(x) is.na(df[x,1]))
The first thing that apply does (per the documentation) is coerce your data frame to a matrix. In the process, all numeric columns are coerced to character.
Next, each individual row of df is passed as the argument x to your function. In what sense is it meaningful to index df by the character values in the first row of df? So you just get a bunch of NAs. You can test this via:
> df[as.character(df[1,]),]
x1 x2 x1.fixed x2.fixed
NA <NA> <NA> NA NA
NA.1 <NA> <NA> NA NA
NA.2 <NA> <NA> NA NA
NA.3 <NA> <NA> NA NA
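The coercion step can be inspected directly; this quick check uses only base functions:
> m <- as.matrix(df)  # what apply() does internally to a data frame
> typeof(m)           # character columns x1/x2 force everything to character
[1] "character"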
You say you want to know which columns introduced NAs, and yet you are applying over rows. If you really wanted to use apply (I recommend @eddi's method) you could do:
apply(df,2,function(x) any(is.na(x)))
You could use
rowSums(is.na(df))>0
[1] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
to find the rows containing NAs.
I'm not sure, but I think this vectorized operation might be faster than apply if you are working with large data.
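The column-wise counterpart simply swaps rowSums for colSums:
colSums(is.na(df)) > 0
#      x1       x2 x1.fixed x2.fixed 
#   FALSE    FALSE     TRUE     TRUE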
I just wrote a function to test whether characters are lowercase letters, but I was wondering if one already exists in R.
Here's the function BTW (suggestions for improvement are welcome):
set.seed(50)
x <- sample(c(letters, LETTERS), 7)
is.lower <- function(x) {
  unlist(sapply(x, function(x2) x2 %in% letters))
}
is.lower(x)
grepl("[a-z]",x) for example?
> grepl("[a-z]",x)
[1] FALSE TRUE TRUE FALSE TRUE TRUE FALSE
And why make it difficult?
> x %in% letters
[1] FALSE TRUE TRUE FALSE TRUE TRUE FALSE
No need to make your own function.
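Note that the two answers differ once strings have more than one character: grepl() is TRUE for any string containing a lowercase letter, while %in% letters only matches exact single-character elements. A quick sketch with a made-up vector y:
> y <- c("a", "Ab", "AB")
> grepl("[a-z]", y)
[1]  TRUE  TRUE FALSE
> y %in% letters
[1]  TRUE FALSE FALSE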
Another approach, returning the values instead of a logical index, would be to name the letters as themselves and use "[" with x as the index:
names(letters) <- letters
letters[x]
#<NA> w k <NA> y c <NA>
# NA "w" "k" NA "y" "c" NA