R: grep NAs from a vector

How do I use grep() to get NAs from a vector?
For example, when I try grep(NA, c(1,NA))
I get [1] NA NA

You want is.na():
> vec <- c(1,NA)
> is.na(vec)
[1] FALSE TRUE
If you want the NA, try
> which(is.na(vec))
[1] 2
> vec[which(is.na(vec))]
[1] NA
> vec[is.na(vec)] # simpler, logical subscripting
[1] NA
If you want the non-missing values instead, negate the output of is.na():
> !is.na(vec)
[1] TRUE FALSE
> which(!is.na(vec))
[1] 1
> vec[which(!is.na(vec))]
[1] 1
> vec[!is.na(vec)] ## simpler, logical subscripting
[1] 1
One reason your code doesn't work is that you gave NA as the pattern. To R, this means the pattern is undefined, so whether either element of the vector matches it is also undefined. That is why both elements are NA in the output.
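The following base-R snippet checks both points: the NA pattern propagates to every element, and is.na() works the same way on numeric and character vectors. The vectors num_vec and chr_vec are made-up examples.

```r
# grep() with an NA pattern cannot decide whether anything matches,
# so it returns NA for every element, as in the question
res_grep <- grep(NA, c(1, NA))
print(res_grep)

# is.na() flags missing values in numeric and character vectors alike
num_vec <- c(1, NA, 3)
chr_vec <- c("a", NA, "NA")
print(which(is.na(num_vec)))   # position of the missing value
print(which(is.na(chr_vec)))   # only the real NA; the string "NA" is not missing
```

Note that the string "NA" is ordinary text, so is.na() does not flag it.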

grep is the wrong option here. Use the built-in function is.na instead.
> is.na(c(1,NA))
[1] FALSE TRUE
EDIT: if you want integer indices rather than TRUE/FALSE values (which is closer to what grep returns), use which(is.na(x)).

Don't; use which and is.na instead:
> which(is.na(c(1,NA)))
[1] 2
> which(is.na(c(NA,1,NA)))
[1] 1 3

Use is.na(c(1, NA)).

Related

String matching within a list of lists

I have a list like this:
map_tmp <- list("ABC",
c("EGF", "HIJ"),
c("KML", "ABC-IOP"),
"SIN",
"KMLLL")
> grep("ABC", map_tmp)
[1] 1 3
> grep("^ABC$", map_tmp)
[1] 1 # by using regex, I get the index of "ABC" in the list
> grep("^KML$", map_tmp)
[1] 5 # I wanted 3, but I got 5. Anchoring the end of the string with "$" didn't help in this case.
> grep("^HIJ$", map_tmp)
integer(0) # the regex does not return the index of a string inside a vector
How can I get the index of a string (exact match) in the list?
I'm OK with not using grep. Is there any way to get the index of a certain string (exact match) in the list? Thanks!
Using lapply:
which(lengths(lapply(map_tmp, function(x) grep("^HIJ$", x))) != 0)
lapply() runs grep() on each vector in the list separately and returns a list of match positions (integer(0) where there is no match). lengths() turns that into a match count per element, and which(... != 0) gives the elements of the list in which your string occurs.
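To see what grep() actually searches when x is a list, you can compare the coerced strings with a per-element test. The sketch below uses only base R. The names flat and hits are illustrative. It assumes grep() converts a non-character x with as.character(), which is what the printed output shows.

```r
map_tmp <- list("ABC", c("EGF", "HIJ"), c("KML", "ABC-IOP"), "SIN", "KMLLL")

# as.character() on a list deparses each multi-element vector into ONE string
flat <- as.character(map_tmp)
print(flat)

# So "^HIJ$" is tested against the whole string 'c("EGF", "HIJ")' and fails
print(grep("^HIJ$", map_tmp))   # integer(0)

# Testing each vector separately, then collapsing to one TRUE/FALSE per element
hits <- vapply(map_tmp, function(v) any(grepl("^HIJ$", v)), logical(1))
print(which(hits))              # 2
```

vapply() with logical(1) guarantees exactly one TRUE/FALSE per list element. This is why which() can return list positions directly.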
Use either mapply or Map with str_detect to find the position. I have run it for only one string, "KML", but you can run it the same way for the others. I hope this is helpful.
First, we pad the vectors in the list to the same length (the extra slots are filled with NA) so that we can process them easily:
library(stringr)
### Making the list even
map_tmp_1 <- lapply(map_tmp, `length<-`, max(lengths(map_tmp)))
val <- t(mapply(str_detect, map_tmp_1, "^KML$"))
> which(val[,1] == TRUE)
[1] 3
> which(val[,2] == TRUE)
integer(0)
In the case of the "ABC" string:
val <- t(mapply(str_detect, map_tmp_1, "ABC"))
> which(val[,1] == TRUE)
[1] 1
> which(val[,2] == TRUE)
[1] 3
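I had the same question. The anchored regex fails because grep() first converts the list with as.character(). That turns each multi-element vector into a single string such as 'c("EGF", "HIJ")', so ^ and $ apply to that whole string rather than to the individual elements. The simplest exact-match approach I found in base R is: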
map_tmp <- list("ABC",
c("EGF", "HIJ"),
c("KML", "ABC-IOP"),
"SIN",
"KMLLL")
sapply( map_tmp , match , 'ABC' )
It returns a list with the same structure as the input, containing NA or 1 depending on the result of the match test:
[[1]]
[1] 1
[[2]]
[1] NA NA
[[3]]
[1] NA NA
[[4]]
[1] NA
[[5]]
[1] NA
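If you need the list index itself rather than the nested match() results, you can test exact membership with %in% on each vector. The sketch below does this in base R. has_target is a hypothetical helper name, not part of any package.

```r
map_tmp <- list("ABC", c("EGF", "HIJ"), c("KML", "ABC-IOP"), "SIN", "KMLLL")

# Hypothetical helper: TRUE for each list element that contains `target`
# as a whole string (exact match, no regex involved)
has_target <- function(lst, target) {
  vapply(lst, function(v) target %in% v, logical(1))
}

which(has_target(map_tmp, "HIJ"))   # 2
which(has_target(map_tmp, "KML"))   # 3
which(has_target(map_tmp, "ABC"))   # 1 ("ABC-IOP" is not an exact match)
```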

Can extract_numeric deal with negative numbers?

Is there a way to use tidyr's extract_numeric() to extract negative numbers?
For example,
> extract_numeric("2%")
[1] 2
> extract_numeric("-2%")
[1] 2
I'd really like the second call to return -2.
Bill
PS: While it doesn't concern me today, I suspect cases such as "-$2.00" complicate any general solution.
extract_numeric is pretty simple:
> extract_numeric
function (x)
{
as.numeric(gsub("[^0-9.]+", "", as.character(x)))
}
<environment: namespace:tidyr>
It just removes every character that isn't a digit (0 to 9) or ".". So "-1" becomes 1, and there's nothing you can do about it... except maybe file an enhancement request with tidyr, or write your own. Something like this, which keeps digits, "." and "-":
extract_num = function(x){as.numeric(gsub("[^0-9.-]+","",as.character(x)))}
will sort of do it:
> extract_num("-$1200")
[1] -1200
> extract_num("$-1200")
[1] -1200
> extract_num("1-1200")
[1] NA
Warning message:
In extract_num("1-1200") : NAs introduced by coercion
but a regexp could probably do better, only allowing minus signs at the start...
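Here is one way to write such a regexp, sketched under two assumptions. First, each string holds at most one number of interest. Second, a "$" may sit between the minus sign and the digits, as in the "-$2.00" case from the PS. extract_signed is a hypothetical name, not a tidyr function.

```r
# Hypothetical helper: pull out the first number in each string, keeping a
# leading minus sign even when a "$" sits between the sign and the digits.
extract_signed <- function(x) {
  x <- as.character(x)
  # optional "-", optional "$", then digits with an optional decimal part
  m <- regexpr("-?\\$?[0-9]*\\.?[0-9]+", x, perl = TRUE)
  out <- rep(NA_real_, length(x))
  hit <- !is.na(m) & m != -1          # strings that contain a number
  tok <- substring(x[hit], m[hit], m[hit] + attr(m, "match.length")[hit] - 1)
  # drop the "$" so as.numeric() sees a plain signed number
  out[hit] <- as.numeric(sub("$", "", tok, fixed = TRUE))
  out
}

extract_signed(c("-$2.00", "-2%", "2%", "$-1200", "abc"))
```

Strings without any digits come back as NA without a coercion warning, because as.numeric() is only applied to the matched pieces.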
Just use sub if there's a single number in the string. Here's an approach:
The function:
myfun <- function(s) as.numeric(sub(".*?([-+]?\\d*\\.?\\d+).*", "\\1", s))
Examples:
> myfun("-2%")
[1] -2
> myfun("abc 2.3 xyz")
[1] 2.3
> myfun("S+3.")
[1] 3
> myfun(".5PPP")
[1] 0.5
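sub() keeps only the first number. If a string may contain several numbers, gregexpr() with regmatches() can collect all of them. The sketch below uses a similar signed/decimal pattern. extract_all_numbers is a hypothetical helper that handles one string at a time.

```r
# Hypothetical helper: return every signed/decimal number found in one string
extract_all_numbers <- function(s) {
  hits <- regmatches(s, gregexpr("[-+]?[0-9]*\\.?[0-9]+", s))[[1]]
  as.numeric(hits)
}

print(extract_all_numbers("from -2.5 to 3"))
print(extract_all_numbers("no digits here"))   # numeric(0)
```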

Vector-list comparison in R

I am currently trying to check whether a list (containing multiple vectors filled with values) is equal to a vector. Unfortunately, the following functions did not work for me: match(), any(), %in%. An example of what I am trying to achieve is given below:
Let's say:
lists=list(c(1,2,3,4),c(5,6,7,8),c(9,7))
vector=c(1,2,3,4)
answer=match(lists,vector)
When I execute this, it returns false values instead of a positive result. Comparing a vector with a vector works, but comparing a vector with a list does not seem to work properly.
I would use intersect, something like this:
lapply(lists,intersect,vector)
[[1]]
[1] 1 2 3 4
[[2]]
numeric(0)
[[3]]
numeric(0)
I'm not completely sure what you want the result to be (for example, do you care about vector order?), but regardless, you'll need to think about lapply. For example,
##Create some data
R> lists=list(c(1,2,3,4),c(5,6,7,8),c(9,7))
R> vector=c(1,2,3,4)
Then we use lapply to go through each list element and apply a function. In this case, I've used the match function (since you mentioned it in your question):
R> lapply(lists, function(i) all(match(i, vector)))
[[1]]
[1] TRUE
[[2]]
[1] NA
[[3]]
[1] NA
It's probably worth converting to a vector, so
R> unlist(lapply(lists, function(i) all(match(i, vector))))
[1] TRUE NA NA
and to change NA to FALSE, something like:
m = unlist(lapply(lists, function(i) all(match(i, vector))))
m[is.na(m)] = FALSE
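Depending on what "equal" should mean, here are three base-R variants that each return one TRUE/FALSE per list element without an NA clean-up step. They test exact equality, order-insensitive set equality, and "every value is found". The variable names are illustrative.

```r
lists  <- list(c(1, 2, 3, 4), c(5, 6, 7, 8), c(9, 7))
vector <- c(1, 2, 3, 4)

# Exact equality: same values, same order, same length and type
same_exact <- vapply(lists, identical, logical(1), vector)

# Order-insensitive: same set of values (duplicates ignored)
same_set <- vapply(lists, setequal, logical(1), vector)

# Every value of the list element appears somewhere in `vector`;
# %in% gives FALSE rather than NA for non-matches
all_found <- vapply(lists, function(i) all(i %in% vector), logical(1))

print(same_exact)          # TRUE FALSE FALSE
print(which(same_exact))   # 1
```

identical() is strict about type, so c(1L, 2L, 3L, 4L) would not count as equal to c(1, 2, 3, 4). setequal() or all.equal() are looser alternatives.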

excluding FALSE elements from a character vector by using logical vector

I managed to do the following:
stuff <- c("banana_fruit","apple_fruit","coin","key","crap")
fruits <- stuff[stuff %in% grep("fruit",stuff,value=TRUE)]
but I can't select the not-so-healthy stuff with the usual ideas, such as
no_fruit <- stuff[stuff %not in% grep("fruit",stuff,value=TRUE)]
#or
no_fruit <- stuff[-c(stuff %in% grep("fruit",stuff,value=TRUE))]
Neither works. The latter just ignores the "-".
> stuff[grep("fruit",stuff)]
[1] "banana_fruit" "apple_fruit"
> stuff[-grep("fruit",stuff)]
[1] "coin" "key" "crap"
You can only use negative subscripts with numeric/integer vectors, not logical ones, because the minus sign just turns TRUE into -1:
> -TRUE
[1] -1
If you want to negate a logical vector, use !:
> !TRUE
[1] FALSE
As Joshua mentioned: you can't use - to negate your logical index; use ! instead.
stuff[!(stuff %in% grep("fruit",stuff,value=TRUE))]
See also the stringr package for this kind of thing:
library(stringr)
stuff[!str_detect(stuff, "fruit")]
There is also a parameter called 'invert' in grep that does essentially what you're looking for:
> stuff <- c("banana_fruit","apple_fruit","coin","key","crap")
> fruits <- stuff[stuff %in% grep("fruit",stuff,value=TRUE)]
> fruits
[1] "banana_fruit" "apple_fruit"
> grep("fruit", stuff, value = TRUE)
[1] "banana_fruit" "apple_fruit"
> grep("fruit", stuff, value = TRUE, invert = TRUE)
[1] "coin" "key" "crap"
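Two more base-R options, plus one pitfall of the negative-index answer, are shown in the sketch below. grepl() gives a logical vector that ! can negate directly. A "not in" operator can be built with Negate(); the name %nin% is just a common convention, not a built-in. A negative grep() index misbehaves when nothing matches: -integer(0) is still integer(0), so you get nothing back instead of everything.

```r
stuff <- c("banana_fruit", "apple_fruit", "coin", "key", "crap")

# grepl() gives one TRUE/FALSE per element, so ! negates it directly
no_fruit <- stuff[!grepl("fruit", stuff)]

# A "not in" operator (hypothetical name %nin%) built from base R's Negate()
`%nin%` <- Negate(`%in%`)
fruits <- grep("fruit", stuff, value = TRUE)
no_fruit2 <- stuff[stuff %nin% fruits]

# Pitfall: with no match, stuff[-grep(...)] returns an empty vector
no_veg <- stuff[-grep("vegetable", stuff)]

print(no_fruit)    # "coin" "key" "crap"
print(no_fruit2)   # "coin" "key" "crap"
print(no_veg)      # character(0)
```

The logical-index forms (!grepl or invert = TRUE) avoid the empty-match pitfall, because they never rely on negative subscripts.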
