How to Apply String Vector to Logical Vector - r

I would like to replace every TRUE in a logical vector with the corresponding element of a string vector of the same length.
For example, I would like to combine:
my_logical <- c(TRUE, FALSE, TRUE)
my_string <- c("A", "B", "C")
to produce:
c("A", "", "C")
I know that:
my_string[my_logical]
gives:
"A" "C"
but can't seem to figure out how to return a vector of the same length. My first thought was to simply multiply the vectors together, but that raises the error "non-numeric argument to binary operator."

Another option with replace
replace(my_string, !my_logical, "")
#[1] "A" "" "C"

What about:
my_logical <- c(TRUE, FALSE, TRUE)
my_string <- c("A", "B", "C")
my_replace <- ifelse(my_logical==TRUE,my_string,'')
my_replace
[1] "A" "" "C"
Edit, thanks @www:
ifelse(my_logical, my_string, "")

Maybe:
my_string[ !my_logical ] <- ""
my_string
# [1] "A" "" "C"
Of course, this overwrites the existing object.
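If overwriting is a concern, the same assignment can be done on a copy. This is a minimal sketch; the name masked is my own choice and does not come from the answers here. It also checks that the result agrees with the replace and ifelse approaches.

```r
my_logical <- c(TRUE, FALSE, TRUE)
my_string  <- c("A", "B", "C")

# Work on a copy so that my_string itself is left untouched.
masked <- my_string
masked[!my_logical] <- ""

masked
# [1] "A" ""  "C"

# The other approaches give the same vector.
identical(masked, replace(my_string, !my_logical, ""))
# [1] TRUE
identical(masked, ifelse(my_logical, my_string, ""))
# [1] TRUE
```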

Use ifelse to build an index that is NA where my_logical is FALSE and TRUE otherwise, then subset with it.
new <- my_string[ifelse(!my_logical, NA, TRUE)]
new
[1] "A" NA "C"
If you want "" instead of NA, do this next.
new[is.na(new)] <- ""
new
[1] "A" "" "C"

Related

Match all elements of a pattern with a vector and in the same order

I created a function yes.seq that takes two arguments, a pattern pat and data dat. The function looks for the presence of a pattern in the data and in the same sequence
for example
dat <- letters[1:10]
dat
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
pat <- c('a',"c","g")
yes.seq(pat = pat,dat = dat)
# [1] TRUE
because every element of the pattern occurs in the data, in the same order
"a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
If, for example, 'dat' is reversed, then we get FALSE:
yes.seq(pat = pat, dat = rev(dat))
# [1] FALSE
Here is my function
yes.seq <- function(pat, dat) {
  lv <- rep(FALSE, length(pat))  # lv[k] becomes TRUE once pat[k] has been found
  k <- 1
  for (i in seq_along(dat)) {
    if (dat[i] == pat[k]) {
      lv[k] <- TRUE
      k <- k + 1
    }
    if (k == length(pat) + 1) break  # whole pattern found, stop early
  }
  return(all(lv))
}
Are there any more efficient solutions? This function is too slow for me.
We could paste them and use either grepl
grepl(paste(pat, collapse=".*"), paste(dat, collapse=""))
#[1] TRUE
or str_detect
library(stringr)
str_detect(paste(dat, collapse=""), paste(pat, collapse=".*"))
#[1] TRUE
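One caveat with the pasted-string approach, as far as I can tell: it treats the data as one long string, so it is only safe when every element is a single character and contains no regex metacharacters. The sketch below uses made-up inputs (dat2 and pat2 are hypothetical names) to show a false positive with multi-character elements.

```r
# "a" and "b" are not elements of dat2, but the pasted string "abc"
# still matches the regular expression "a.*b".
dat2 <- c("ab", "c")
pat2 <- c("a", "b")

pasted_hit <- grepl(paste(pat2, collapse = ".*"), paste(dat2, collapse = ""))
pasted_hit
# [1] TRUE

all(pat2 %in% dat2)
# [1] FALSE
```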
Another option:
yes.seq <- function(pat, dat) {
all(pat %in% dat) && all(diff(na.omit(match(pat, dat))) > 0)
}
yes.seq(pat, dat)
# [1] TRUE
yes.seq(c(pat, "ZZ"), dat)
# [1] FALSE
yes.seq(pat, rev(dat))
# [1] FALSE
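Note that match() reports only the first position of each value, so the version above can reject a pattern whose elements appear in order only through a later repeat in dat. If repeated values matter, a sketch along the following lines may help. The function name is_subsequence is hypothetical; it loops over the (usually short) pattern and uses a vectorised comparison over the data for each step.

```r
is_subsequence <- function(pat, dat) {
  pos <- 0L                      # position of the most recent match in dat
  for (p in pat) {
    hits <- which(dat == p)      # every position where this element occurs
    hits <- hits[hits > pos]     # keep only those after the previous match
    if (length(hits) == 0L) return(FALSE)
    pos <- hits[1L]
  }
  TRUE
}

is_subsequence(c("a", "c", "g"), letters[1:10])
# [1] TRUE
is_subsequence(c("a", "c", "g"), rev(letters[1:10]))
# [1] FALSE
is_subsequence(c("b", "a"), c("a", "b", "a"))  # needs the second "a"
# [1] TRUE
```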
Here is another base R option
yes.seq <- function(pat,dat) identical(order(match(pat, dat)), seq_along(pat))
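One thing to watch with the order/match one-liner: order() places NA last by default, so a pattern element that is missing from the data does not necessarily change the ordering. The sketch below reproduces the one-liner under the hypothetical name by_order and adds a guarded variant, by_order_strict, which is my own addition rather than part of the answer.

```r
dat <- letters[1:10]
pat <- c("a", "c", "g")

# Same body as the one-liner above, under a different name.
by_order <- function(pat, dat) identical(order(match(pat, dat)), seq_along(pat))

by_order(c(pat, "ZZ"), dat)   # "ZZ" is not in dat, yet the NA sorts last
# [1] TRUE

# Guarded variant: fail early if any pattern element is missing.
by_order_strict <- function(pat, dat) {
  m <- match(pat, dat)
  !anyNA(m) && identical(order(m), seq_along(pat))
}

by_order_strict(pat, dat)
# [1] TRUE
by_order_strict(c(pat, "ZZ"), dat)
# [1] FALSE
by_order_strict(pat, rev(dat))
# [1] FALSE
```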

Is it possible to remove variables with a certain pattern from a datatable or list?

For example if I have a list which contains: "a", "ab", "b", "c", "ad" as variables.
Is it possible to remove all variables which contain an "a", without writing every single variable down?
I think grep or grepl could help
> grep("a",v,value = TRUE, invert = TRUE)
[1] "b" "c"
or
> v[!grepl("a",v)]
[1] "b" "c"
Data
v <- c("a","ab","b","c","ad")
“variables” are conventionally called “names” in R.
So if you want to remove them from a list-like structure, you can manipulate its names, and then subset the list with the resulting vector of names.
x = x[grep('a', names(x), value = TRUE, invert = TRUE)]
Or, using grepl instead:
x = x[! grepl('a', names(x))]
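For a concrete picture, here is a small sketch with a made-up named list (the contents of x and the name kept are assumptions for illustration only). The same logical index should also work for selecting the columns of a data.frame.

```r
# Hypothetical list whose names match the question's example.
x <- list(a = 1, ab = 2, b = 3, c = 4, ad = 5)

# Keep only the elements whose name does not contain "a".
kept <- x[!grepl("a", names(x), fixed = TRUE)]

names(kept)
# [1] "b" "c"
```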
An option with str_subset
library(stringr)
str_subset(v, "a", negate = TRUE)
#[1] "b" "c"
data
v <- c("a","ab","b","c","ad")

Corresponding Vector to TRUE/FALSE Vector in R

I have two vectors
> filename
[1] "10021978_1909-07-21_ed-1_seq-4" "10021978_1910-01-19_ed-1_seq-31"
[3] "10021978_1910-01-19_ed-1_seq-31" "10021978_1910-01-19_ed-1_seq-31"
[5] "10021978_1910-01-19_ed-1_seq-31" "10021978_1911-06-07_ed-1_seq-12"
[7] "10021978_1911-07-05_ed-1_seq-11" "10021978_1911-07-12_ed-1_seq-11"
[9] "10021978_1911-07-12_ed-1_seq-11" "10021978_1911-09-27_ed-1_seq-4"
and
> dups = duplicated(filename)
> dups
[1] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE FALSE
I am dealing with exporting files but not overwriting files with duplicate file names. I have a few that are duplicates in this set of 10. What I need to do is make those filenames unique.
How can I create a new vector that would have nothing wherever the vector dups is FALSE, and then nonzero wherever TRUE? The tricky thing is that I need it to begin incrementing starting with 2 when there is a series of TRUE next to each other then reset when there is a FALSE. The vector I need for this set would be:
ans = c("", "", 2, 3, 4, "", "", "", 2, "")
so that I can append it to the filenames to deal with duplicates. The final filename vector I need would be:
[1] "10021978_1909-07-21_ed-1_seq-4" "10021978_1910-01-19_ed-1_seq-31"
[3] "10021978_1910-01-19_ed-1_seq-31-2" "10021978_1910-01-19_ed-1_seq-31-3"
[5] "10021978_1910-01-19_ed-1_seq-31-4" "10021978_1911-06-07_ed-1_seq-12"
[7] "10021978_1911-07-05_ed-1_seq-11" "10021978_1911-07-12_ed-1_seq-11"
[9] "10021978_1911-07-12_ed-1_seq-11-2" "10021978_1911-09-27_ed-1_seq-4"
Thank you very much in advance.
make.unique should be good enough, but if you need the numbering to start at 2, perhaps it is easier to use ave.
Here is an example of both so you can see the difference between the two approaches:
a <- c("a", "a", "a", "b", "c", "d", "b", "d", "e")
make.unique(a, sep = "-")
# [1] "a" "a-1" "a-2" "b" "c" "d" "b-1" "d-1" "e"
dups <- ave(a, a, FUN = seq_along)
a[duplicated(a)] <- paste(a[duplicated(a)], dups[duplicated(a)], sep = "-")
a
# [1] "a" "a-2" "a-3" "b" "c" "d" "b-2" "d-2" "e"
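Applied to the filenames from the question, the ave idea might look like the sketch below. The names idx, suffix, and unique_names are my own. It numbers occurrences per filename rather than per run of TRUE values, which gives the same result here because the duplicates are adjacent.

```r
filename <- c(
  "10021978_1909-07-21_ed-1_seq-4",  "10021978_1910-01-19_ed-1_seq-31",
  "10021978_1910-01-19_ed-1_seq-31", "10021978_1910-01-19_ed-1_seq-31",
  "10021978_1910-01-19_ed-1_seq-31", "10021978_1911-06-07_ed-1_seq-12",
  "10021978_1911-07-05_ed-1_seq-11", "10021978_1911-07-12_ed-1_seq-11",
  "10021978_1911-07-12_ed-1_seq-11", "10021978_1911-09-27_ed-1_seq-4"
)

# Running count of each filename: 1 for the first occurrence, 2 for the second, ...
idx <- ave(seq_along(filename), filename, FUN = seq_along)

# Empty suffix for first occurrences, "-2", "-3", ... for the duplicates.
suffix <- ifelse(duplicated(filename), paste0("-", idx), "")

unique_names <- paste0(filename, suffix)
unique_names[3:5]
# [1] "10021978_1910-01-19_ed-1_seq-31-2" "10021978_1910-01-19_ed-1_seq-31-3"
# [3] "10021978_1910-01-19_ed-1_seq-31-4"
```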

Use a dictionary to replace certain specific values

I have the following list
list <- c("AB", "G", "H")
Now I have certain letters that should be replaced. For example, B and H should be replaced.
What I have now is:
replace_letter <- c("B", "H")
for(letter in replace_letter){
for (i in list){
print(i)
print(letter)
if(grepl(letter, i)){
new_value <- gsub(letter,"XXX",i)
print("yes")
}
else{
print("no")
}
}
}
However, the XXX in my code should be replaced by certain lookup values.
So B should become B+, and H should become H**.
I need some kind of dictionary function to replace the XXX with something specific.
Does anybody have a suggestion for how I can include this in the code above?
Data and dictionary
dictionary <- data.frame(From = LETTERS,
To = LETTERS[c(2:length(LETTERS), 1)], stringsAsFactors = F)
set.seed(1234)
data <- LETTERS[sample(length(LETTERS), 10, replace = T)]
Here is the replace-function
replace <- function(input, dictionary){
dictionary[which(input == dictionary$From),]$To
}
Apply it to data:
sapply(data, replace, dictionary = dictionary)
# C Q P Q W Q A G R N
# "D" "R" "Q" "R" "X" "R" "B" "H" "S" "O"
You just have to adjust your dictionary according to your needs.
I use the function plyr::mapvalues to do this. The function takes three arguments, the strings to do the replacement on, and two vectors from and to that define the replacement.
e.g.
plyr::mapvalues(letters[1:3], c("b", "c"), c("x", "y"))
# [1] "a" "x" "y"
I switched to the newer dplyr library, so I'll add another answer here:
In an interactive session I would enter the replacements in dplyr::recode directly:
dplyr::recode(letters[1:3], "b"="x", "c"="y")
# [1] "a" "x" "y"
Using a pre-defined dictionary, you'll have to use UQS to unquote the dictionary due to the tidy-eval semantics of dplyr (in current versions of rlang, the !!! operator replaces UQS):
dict <- c("b"="x", "c"="y")
dict
# b c
# "x" "y"
dplyr::recode(letters[1:3], UQS(dict))
# [1] "a" "x" "y"
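The answers above replace whole elements. For the substring case in the question (the B inside "AB" becoming B+), a named character vector can serve as the dictionary. This is a minimal base R sketch; the names values and dict are my own, and fixed = TRUE is used so that the keys are matched literally rather than as regular expressions.

```r
values <- c("AB", "G", "H")

# Names are the substrings to find, values are their replacements.
dict <- c(B = "B+", H = "H**")

# Apply each dictionary entry in turn.
for (from in names(dict)) {
  values <- gsub(from, dict[[from]], values, fixed = TRUE)
}

values
# [1] "AB+" "G"   "H**"
```

Because the entries are applied one after another, the order of the dictionary can matter if one replacement contains another entry's key.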

Why does this grep exclusion fail to work in R?

I am trying to exclude certain characters when using grep in R, but I cannot get the result that I expect.
Here is the code:
x <- c("a", "ab", "b", "abc")
grep("[^b]", x, value=T)
> [1] "a" "ab" "abc"
I want to grab anything in vector x that does not contain b. It should not return "ab" or "abc".
Ultimately I want to pick up any element that contains "a" but not "b".
This is the result that I would expect:
grep("a[^b]", x, value=T)
> [1] "a"
How can I do that?
Try this:
grep("^[^b]*a[^b]*$", x, value=TRUE)
# [1] "a"
It looks for the start of the string, then allows any number of characters that are not "b", then an "a", then any number of characters that are not "b" again and then the end of the string is reached.
We can use the invert argument of grep, which returns the values that do not match the pattern. So here it returns those values which do not have "b" in them.
grep("b", x, value = TRUE, invert = TRUE)
#[1] "a"
I got the result you are looking for using this regular expression in grep:
grep("^[^b]*$", x, value=TRUE)
[1] "a"
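The last two patterns only exclude "b"; they do not require an "a", which happens not to matter for this particular x. Combining two grepl calls expresses "contains a but not b" directly. In this sketch, x2 is a hypothetical extension of x with an extra "c" element to make the difference visible.

```r
x2 <- c("a", "ab", "b", "abc", "c")

# Excluding "b" alone also keeps "c".
no_b <- x2[!grepl("b", x2)]
no_b
# [1] "a" "c"

# Requiring "a" and excluding "b" keeps only "a".
a_not_b <- x2[grepl("a", x2) & !grepl("b", x2)]
a_not_b
# [1] "a"
```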
