Use a dictionary to replace certain specific values - R

I have the following list:
list <- c("AB", "G", "H")
Now I have certain letters that should be replaced, e.g. B and H.
What I have now is:
replace_letter <- c("B", "H")
for (letter in replace_letter) {
  for (i in list) {
    print(i)
    print(letter)
    if (grepl(letter, i)) {
      new_value <- gsub(letter, "XXX", i)
      print("yes")
    } else {
      print("no")
    }
  }
}
However, the XXX in my code should be replaced by specific lookup values.
So instead, B -> B+ and H -> H**.
So I need some kind of dictionary function to replace the XXX with something specific.
Does anybody have a suggestion for how I can include this in the code above?

Data and dictionary:
dictionary <- data.frame(From = LETTERS,
                         To = LETTERS[c(2:length(LETTERS), 1)],
                         stringsAsFactors = FALSE)
set.seed(1234)
data <- LETTERS[sample(length(LETTERS), 10, replace = TRUE)]
Here is the replace function (note that it masks base::replace):
replace <- function(input, dictionary) {
  dictionary[which(input == dictionary$From), ]$To
}
Apply it to data:
sapply(data, replace, dictionary = dictionary)
# C Q P Q W Q A G R N
# "D" "R" "Q" "R" "X" "R" "B" "H" "S" "O"
You just have to adjust your dictionary according to your needs.
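Because the replace function above is called once per element through sapply, it returns character(0) for any value that is not in dictionary$From. A vectorized base-R sketch uses match() to look up all values in one call; the helper name lookup_or_keep and the choice to keep unmatched values unchanged are assumptions for illustration.

```r
# Dictionary as in the answer above: each letter maps to the next one
dictionary <- data.frame(From = LETTERS,
                         To = LETTERS[c(2:length(LETTERS), 1)],
                         stringsAsFactors = FALSE)

# Hypothetical helper: vectorized lookup that leaves unmatched values as they are
lookup_or_keep <- function(x, from, to) {
  idx <- match(x, from)            # position of each value in `from`, NA if absent
  ifelse(is.na(idx), x, to[idx])   # keep the original value where there is no match
}

result <- lookup_or_keep(c("C", "Q", "Z", "?"), dictionary$From, dictionary$To)
result
# [1] "D" "R" "A" "?"
```

Since match() is vectorized, there is no per-element function call, and the input length is always preserved.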

I use the function plyr::mapvalues to do this. It takes three arguments: the vector to do the replacement on, and two vectors, from and to, that define the replacement.
For example:
plyr::mapvalues(letters[1:3], c("b", "c"), c("x", "y"))
# [1] "a" "x" "y"

I switched to the newer dplyr package, so I'll add another answer here:
In an interactive session I would enter the replacements in dplyr::recode directly:
dplyr::recode(letters[1:3], "b"="x", "c"="y")
# [1] "a" "x" "y"
Using a predefined dictionary, you'll have to use UQS (now deprecated in favor of the equivalent !!! operator) to splice in the dictionary, due to the tidy-eval semantics of dplyr:
dict <- c("b"="x", "c"="y")
dict
# b c
# "x" "y"
dplyr::recode(letters[1:3], UQS(dict))
# [1] "a" "x" "y"
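The snippets above replace whole values, whereas the question replaces letters inside longer strings such as "AB". A minimal base-R sketch, assuming the dictionary is stored as a named character vector (names are the patterns, values the replacements; the name lookup is hypothetical), applies gsub() once per entry:

```r
# Hypothetical dictionary: names are what to find, values what to insert
lookup <- c(B = "B+", H = "H**")
values <- c("AB", "G", "H")

replaced <- values
for (pattern in names(lookup)) {
  # fixed = TRUE matches the pattern literally instead of as a regular expression
  replaced <- gsub(pattern, lookup[[pattern]], replaced, fixed = TRUE)
}
replaced
# [1] "AB+" "G"   "H**"
```

Because the entries are applied one after another, a replacement that itself contains a later pattern would be replaced again, so the order of the dictionary can matter.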

Related

Is it possible to remove variables with a certain pattern from a datatable or list?

For example, I have a list which contains "a", "ab", "b", "c", "ad" as variables.
Is it possible to remove all variables which contain an "a" without writing every single variable down?
I think grep or grepl could help
> grep("a",v,value = TRUE, invert = TRUE)
[1] "b" "c"
or
> v[!grepl("a",v)]
[1] "b" "c"
Data
v <- c("a","ab","b","c","ad")
“variables” are conventionally called “names” in R.
So if you want to remove them from a list-like structure, you can manipulate its names, and then subset the list with the resulting vector of names.
x = x[grep('a', names(x), value = TRUE, invert = TRUE)]
Or, using grepl instead:
x = x[! grepl('a', names(x))]
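The answer above assumes a named object x without defining it. A small sketch with a named list, plus the same logical index applied to the columns of a data.frame (the example objects are made up to mirror the question):

```r
# Named list whose names mirror the question
x <- list(a = 1, ab = 2, b = 3, c = 4, ad = 5)
x <- x[!grepl("a", names(x))]   # drop every element whose name contains "a"
names(x)
# [1] "b" "c"

# The same idea on data.frame columns; drop = FALSE keeps the result a data.frame
df <- data.frame(a = 1, ab = 2, b = 3, c = 4, ad = 5)
df <- df[, !grepl("a", names(df)), drop = FALSE]
names(df)
# [1] "b" "c"
```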
An option with str_subset
library(stringr)
str_subset(v, "a", negate = TRUE)
#[1] "b" "c"
data
v <- c("a","ab","b","c","ad")

How can I use an argument supplied to a user-defined function as both an input and a character string? [duplicate]

This question already has answers here:
In R, how to get an object's name after it is sent to a function?
I often find myself comparing two character vectors to see where they don't match up (typically columns in two different data frames). Because I do this often, I want to write a function to make it easier. This is what I've come up with so far:
x <- c("A", "B", "C")
y <- c("B", "C", "D", "X")
# %notin% is not part of base R; define it as the negation of %in%
`%notin%` <- Negate(`%in%`)
check_mismatch <- function(vec1, vec2) {
  vec1 <- unique(as.character(vec1))
  vec2 <- unique(as.character(vec2))
  missing_from_1 <- vec2[vec2 %notin% vec1]
  missing_from_2 <- vec1[vec1 %notin% vec2]
  print("Missing from vector 1")
  print(missing_from_1)
  print("Missing from vector 2")
  print(missing_from_2)
}
check_mismatch(x,y)
[1] "Missing from vector 1"
[1] "D" "X"
[1] "Missing from vector 2"
[1] "A"
What I would really like is "Missing from x" instead of "Missing from vector 1". I would like the function to output the name of the actual argument that was entered. Another example of how I would like the function to work:
check_mismatch(all_polygons_df$Plot, sb_year$Plot)
[1] "Missing from all_polygons_df$Plot"
[1] "KWI-1314B"
[1] "Missing from sb_year$Plot"
character(0)
Any suggestions on how I could do this? I'm open to other ways of displaying the output too - perhaps some kind of table. But the output needs to be flexible to different lengths of output.
Up front, deparse(substitute(...)) is what you're asking for, and that is what makes your initial question a duplicate.
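In isolation, the deparse(substitute()) idiom looks like this; arg_label is a hypothetical name used only for this sketch. substitute() returns the unevaluated expression the caller passed as the argument, and deparse() turns that expression into a string:

```r
# Hypothetical helper: return the expression the caller wrote for `v`
arg_label <- function(v) deparse(substitute(v))

x <- c("A", "B", "C")
label_x <- arg_label(x)
label_x
# [1] "x"

# Works for any expression, e.g. a data.frame column (it is never evaluated)
label_col <- arg_label(mtcars$mpg)
label_col
# [1] "mtcars$mpg"
```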
Some recommendations, however:
Printing things to the console is a little off (IMO), since it prepends [1] to everything you print. Consider message (or cat). Since many R environments color comments differently, I have found it useful to prepend # to some text to set it apart from the rest of the output.
Your function is operating solely in side-effect, printing something to the console and then losing it forever. The function does happen to return a single object (the value of missing_from_2, accidentally), but it might be more useful if the function returned the mismatches.
With that, I offer an alternative:
check_mismatch <- function(vec1, vec2) {
  nm1 <- deparse(substitute(vec1))
  nm2 <- deparse(substitute(vec2))
  vec1 <- unique(as.character(vec1))
  vec2 <- unique(as.character(vec2))
  missing_from_1 <- vec2[!vec2 %in% vec1]
  missing_from_2 <- vec1[!vec1 %in% vec2]
  setNames(list(missing_from_1, missing_from_2), c(nm1, nm2))
}
check_mismatch(x, y)
# $x
# [1] "D" "X"
# $y
# [1] "A"
One immediate benefit is that we can look for specific differences in one of the vectors immediately:
mis <- check_mismatch(x, y)
mis$x
# [1] "D" "X"
However, this uses the names of the variables presented to it. Realize that with non-standard evaluation comes responsibility and consequence. Consider:
mis <- check_mismatch(x, c("A", "B", "E"))
mis
# $x
# [1] "E"
# $`c("A", "B", "E")`
# [1] "C"
The name of the second element is atrocious. Fortunately, if all you care about is what the differences are for the second element, one can still use [[2]] to retrieve the character vector without issue. (This is mostly aesthetic.)
mis[[2]]
# [1] "C"
Also, one might want to repeat this for more than two vectors, so generalizing it might be useful (for "1 or more"):
check_mismatch_many <- function(...) {
  dots <- list(...)
  if (!length(dots)) {
    out <- list()
  } else {
    nms <- as.character(match.call()[-1])
    out <- lapply(seq_along(dots), function(i) {
      b <- unique(unlist(dots[-i]))
      b[!b %in% dots[[i]]]
    })
    out <- replace(out, sapply(out, is.null), list(dots[[1]][0]))
    names(out) <- nms
  }
  out
}
z <- c("Y","Z")
check_mismatch_many()
# list()
check_mismatch_many(x)
# $x
# character(0)
check_mismatch_many(x, y)
# $x
# [1] "D" "X"
# $y
# [1] "A"
check_mismatch_many(x, y, z)
# $x
# [1] "D" "X" "Y" "Z"
# $y
# [1] "A" "Y" "Z"
# $z
# [1] "A" "B" "C" "D" "X"
And finally, if you want to be a little "personal" with the presentation on the console, you can go overboard and class it with an additional print.myclass S3 method.
check_mismatch_many <- function(...) {
  dots <- list(...)
  if (!length(dots)) {
    out <- list()
  } else {
    nms <- as.character(match.call()[-1])
    out <- lapply(seq_along(dots), function(i) {
      b <- unique(unlist(dots[-i]))
      b[!b %in% dots[[i]]]
    })
    out <- replace(out, sapply(out, is.null), list(dots[[1]][0]))
    names(out) <- nms
  }
  class(out) <- c("mismatch", "list")
  out
}
print.mismatch <- function(x, ...) {
  cat("<Mismatch>\n")
  # str() prints on its own and returns NULL, so it needs no cat() around it
  str(x, give.attr = FALSE, no.list = TRUE)
  invisible(x)
}
mis <- check_mismatch_many(x, y)
mis
# <Mismatch>
# $ x: chr [1:2] "D" "X"
# $ y: chr "A"
(There are a lot more things you can do in the print.mismatch method, obviously. str is the major component of it, and it is the Swiss Army knife for depicting structure.)
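As one example of what else print.mismatch could do, here is a sketch of an alternative method that prints one line per vector with a count, using cat() and the # prefix recommended earlier. The exact layout is an assumption, and the mismatch object is built by hand so the block runs on its own:

```r
# Alternative print method (layout is illustrative only)
print.mismatch <- function(x, ...) {
  cat("<Mismatch>\n")
  for (nm in names(x)) {
    vals <- x[[nm]]
    shown <- if (length(vals)) paste(vals, collapse = ", ") else "<none>"
    cat("# missing from ", nm, " (", length(vals), "): ", shown, "\n", sep = "")
  }
  invisible(x)
}

# Hand-built object with the same shape as check_mismatch_many(x, y)
mis <- structure(list(x = c("D", "X"), y = "A"), class = c("mismatch", "list"))
print(mis)
# <Mismatch>
# # missing from x (2): D, X
# # missing from y (1): A
```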

How to Apply String Vector to Logical Vector

I would like to replace any instances of TRUE in a logical vector with the corresponding elements of a same-lengthed string vector.
For example, I would like to combine:
my_logical <- c(TRUE, FALSE, TRUE)
my_string <- c("A", "B", "C")
to produce:
c("A", "", "C")
I know that:
my_string[my_logical]
gives:
"A" "C"
but can't seem to figure out how to return a same-lengthed vector. My first thought was to simply multiply the vectors together, but that raises the error "non-numeric argument to binary operator."
Another option with replace
replace(my_string, !my_logical, "")
#[1] "A" "" "C"
What about:
my_logical <- c(TRUE, FALSE, TRUE)
my_string <- c("A", "B", "C")
my_replace <- ifelse(my_logical==TRUE,my_string,'')
my_replace
[1] "A" "" "C"
Edit, thanks @www:
ifelse(my_logical, my_string, "")
Maybe:
my_string[ !my_logical ] <- ""
my_string
# [1] "A" "" "C"
Of course, this overwrites the existing object.
Use ifelse to produce NA where my_logical is FALSE (TRUE otherwise), and use the result to subset.
new <- my_string[ifelse(!my_logical, NA, TRUE)]
new
[1] "A" NA "C"
If you want "" instead of NA, do this next:
new[is.na(new)] <- ""
new
[1] "A" "" "C"
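The approaches above agree when my_logical contains only TRUE and FALSE, but they can differ when it contains NA. A small sketch, with an NA value made up for illustration: ifelse() propagates the NA, while replace() (like any [<- assignment of a length-one value) simply skips NA positions.

```r
my_string <- c("A", "B", "C")
flags <- c(TRUE, NA, FALSE)   # made-up input containing NA

via_ifelse <- ifelse(flags, my_string, "")
via_ifelse
# [1] "A" NA  ""

via_replace <- replace(my_string, !flags, "")   # the NA index selects nothing
via_replace
# [1] "A" "B" ""
```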

R: Check if strings in a vector are present in other vectors, and return name of the match

I need a tool more selective than %in% or match(): code that matches a vector of strings against another vector and returns the names of the matches.
Currently I have the following:
test <- c("country_A", "country_B", "country_C", "country_D", "country_E", "country_F")
rating_3 <- c("country_B", "country_D", "country_G", "country_K")
rating_4 <- c("country_C", "country_E", "country_M", "country_F")
i <- 1
while (i <= 33) {
  print(i)
  print(test[[i]])
  if (grepl(test[[i]], rating_3) == TRUE) {
    print(grepl(test[[i]], rating_3))
  }
  i <- i + 1
}
This should check whether each element of test is present in rating_3, but for some reason it only returns the position, the name of the string, and a warning:
[1]
[country_A]
There were 6 warnings (use warnings() to see them)
I need to know why this piece of code fails, but eventually I'd like it to return the name only when it's inside another vector and, if possible, to test it against several vectors at once, printing the name of the vector in which it fits, something like:
[1]
[String]
[rating_3]
How could I get something like that?
Without a reproducible example, it is hard to determine what exactly you need, but I think this could be done using %in%:
# create reprex
test <- sample(letters, 10)
rating_3 <- sample(letters, 20)
print(rating_3[rating_3 %in% test])
# prints the elements of rating_3 that also occur in test
# (at most 10 letters; the exact result varies because no seed is set)
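As for why the original loop fails: grepl(test[[i]], rating_3) returns one logical value per element of rating_3, so if() receives a condition of length 4 (a warning in older R, an error since R 4.2.0), and because test has only six elements, test[[7]] stops the loop with a subscript-out-of-bounds error. A sketch of the requested "which vector contains it" lookup, assuming the rating vectors are collected in a named list (the list name ratings is hypothetical):

```r
test <- c("country_A", "country_B", "country_C", "country_D", "country_E", "country_F")

# Hypothetical container: one named entry per rating vector
ratings <- list(
  rating_3 = c("country_B", "country_D", "country_G", "country_K"),
  rating_4 = c("country_C", "country_E", "country_M", "country_F")
)

# Logical matrix: one row per element of test, one column per rating vector
hits <- sapply(ratings, function(r) test %in% r)

# For each element, paste together the names of the vectors that contain it
found_in <- apply(hits, 1, function(row) paste(names(ratings)[row], collapse = ", "))
names(found_in) <- test

# Keep only the elements that were found somewhere
found_in[found_in != ""]
```

Here country_A is in neither vector, so the final subset drops it; an element present in both vectors would show "rating_3, rating_4".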

Why does this grep exclusion fail in R?

I am trying to exclude certain characters when using grep in R, but I cannot get the result that I expect.
Here is the code:
x <- c("a", "ab", "b", "abc")
grep("[^b]", x, value=T)
> [1] "a" "ab" "abc"
I want to grab anything in vector x that does not contain b. It should not return "ab" or "abc".
Ultimately I want to pick up any element that contains "a" but not "b".
This is the result that I would expect:
grep("a[^b]", x, value=T)
> [1] "a"
How can I do that?
Try this:
grep("^[^b]*a[^b]*$", x, value=TRUE)
# [1] "a"
It looks for the start of the string, then allows any number of characters that are not "b", then an "a", then any number of characters that are not "b" again and then the end of the string is reached.
We can use the invert argument of grep, which returns the values that do not match. Here it returns those values which do not contain "b".
grep("b", x, value = TRUE, invert = TRUE)
#[1] "a"
I get the result you are looking for using this regular expression in grep:
grep("^[^b]*$", x, value=TRUE)
[1] "a"
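Note that grep("^[^b]*$", ...) and the invert answer return every string without "b", whether or not it contains "a"; they only happen to give "a" for this x. For the stated goal (contains "a" but not "b"), the two conditions can be kept as separate grepl() calls combined with &. A Perl-style negative lookahead (perl = TRUE) is another option. Both are sketches checked only against the example vector:

```r
x <- c("a", "ab", "b", "abc")

# Two simple conditions combined: contains "a" AND does not contain "b"
two_tests <- x[grepl("a", x) & !grepl("b", x)]
two_tests
# [1] "a"

# Single pattern: (?!.*b) rejects strings with a "b" anywhere, then require an "a"
lookahead <- grep("^(?!.*b).*a", x, value = TRUE, perl = TRUE)
lookahead
# [1] "a"
```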
