R: Is there a method in R to substitute the values of a vector using a dictionary (2 column dataframe with old and new value)?

Is there a method in R to substitute the values of a vector using a dictionary (2 column dataframe with old and new value)?
The only method I know is to extract the old values into a dataframe and merge it with what I call the dictionary (a two-column dataframe with old and new values), then assign the new values back in place of the old ones. However, it seems that merge does not maintain the order of x (at least since R v4.1), so I am now using join, which keeps the original order of dataframe x intact. I think there must be an easier way, but I have not found it. I hope this is understandable; I appreciate any help.
cheers Hermann

You could use a named character vector as a dict for replacement by splicing it with !!! inside of dplyr::recode. If you have your "dict" stored as a two-column dataframe, then tibble::deframe (loaded with the tidyverse) turns it into such a named vector.
library(tidyverse)
x <- c("a", "b", "c")
dict <- tribble(
~old, ~new,
"a", "d",
"b", "e",
"c", "f"
)
recode(x, !!!deframe(dict))
#> [1] "d" "e" "f"
Created on 2021-06-14 by the reprex package (v1.0.0)

You can use match to substitute the values of a vector using a dictionary:
D$new[match(x, D$old)]
#[1] "d" "e" "f"
You can also use the names to get the new values:
L <- setNames(D$new, D$old)
L[x]
#  a   b   c
#"d" "e" "f"
Data:
x <- c("a", "b", "c")
D <- data.frame(old = c("a", "b", "c"), new = c("d", "e", "f"))
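Both approaches above return NA for any value of x that has no entry in the dictionary. A minimal sketch, assuming you want such values to pass through unchanged (the "z" entry and the names idx and out are only illustrative):

```r
x <- c("a", "b", "z")   # "z" is deliberately missing from the dictionary
D <- data.frame(old = c("a", "b", "c"),
                new = c("d", "e", "f"),
                stringsAsFactors = FALSE)

# Position of each x in D$old; NA where there is no dictionary entry
idx <- match(x, D$old)

# Keep the original value wherever no replacement was found
out <- ifelse(is.na(idx), x, D$new[idx])
out
# out is c("d", "e", "z")
```

Whether unmatched values should be kept, set to NA, or treated as an error depends on the data; this sketch only shows the first option.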

Related

How to use the same R recode function on multiple variables without coding each?

From the recode examples, what if I have two variables where I want to apply the same recode?
factor_vec1 <- factor(c("a", "b", "c"))
factor_vec2 <- factor(c("a", "d", "f"))
How can I apply the same recode without writing a recode for each factor_vec? These don't work. Do I need to learn how to use purrr to do it, or is there another way?
Output 1: recode(c(factor_vec1, factor_vec2), a = "Apple")
Output 2: recode(c(factor_vec2, factor_vec2), a = "Apple", b = "Banana")
If only a few items need to be recoded, you can try a simple lookup table approach using base R.
v1 <- c("a", "b", "c")
v2 <- c("a", "d", "f")
# lookup table
lut <- c("a" ="Apple",
"b" = "Banana",
"c" = "c",
"d" = "d",
"f" = "f")
lut[v1]
lut[v2]
You can reuse the lookup table for any relevant variables. The results are:
> lut[v1]
       a        b        c
 "Apple" "Banana"      "c"
> lut[v2]
      a       d       f
"Apple"     "d"     "f"
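The lookup table above has to list every value, including the ones that stay the same. A minimal sketch that lists only the values to change and applies them to several vectors with lapply; recode_with is a hypothetical helper name, not a base R function:

```r
v1 <- c("a", "b", "c")
v2 <- c("a", "d", "f")

# Only the values that should change
lut <- c(a = "Apple", b = "Banana")

# Hypothetical helper: replace values found in the table, keep the rest
recode_with <- function(v, lut) {
  hit <- v %in% names(lut)
  v[hit] <- unname(lut[v[hit]])
  v
}

res <- lapply(list(v1 = v1, v2 = v2), recode_with, lut = lut)
# res$v1 is c("Apple", "Banana", "c")
# res$v2 is c("Apple", "d", "f")
```

This works on character vectors; for factors, convert with as.character first or recode the levels instead.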
Use a list to hold multiple vectors; then you can apply the same function to each of them using lapply/map.
library(dplyr)
list_fac <- lst(factor_vec1, factor_vec2)
list_fac <- purrr::map(list_fac, recode, a = "Apple", b = "Banana")
You can keep the vectors in the list itself (which is better) or copy the changed vectors into the global environment using list2env.
list2env(list_fac, .GlobalEnv)

How to merge only specific elements of a vector in R

This is probably pretty straightforward but I'm really stuck: Let's say that I have a vector=c("a", "b", "c","d","e"). How can I concatenate only some specific elements? For example, how do I merge "b" and "c", which will lead me to the vector=c("a","bc","d","e") ?
Thank you
We can do
i1 <- vector %in% c("b", "c")
c(vector[!i1], paste(vector[i1], collapse=""))
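Note that the snippet above appends the merged element at the end, giving "a" "d" "e" "bc". If the merged element should stay where the first match was, as in the desired output of the question, one possible sketch uses append() with its after argument. The vector is named vec here (my choice) to avoid masking base::vector:

```r
vec <- c("a", "b", "c", "d", "e")

i1 <- vec %in% c("b", "c")                 # TRUE for the elements to merge
merged <- paste(vec[i1], collapse = "")    # "bc"

# Insert the merged string at the position of the first matched element
res <- append(vec[!i1], merged, after = which(i1)[1] - 1)
res
# res is c("a", "bc", "d", "e")
```

This assumes at least one element matches; with no match, which(i1)[1] is NA and the call would fail.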

R how to find the intersection of a subset of vectors in a list

I have a list of vectors (characters). For example:
my_list <- list(c("a", "b", "c"),
c("a", "b", "c", "d"),
c("e", "d"))
For the intersection of all these three vectors, I could use: Reduce(intersect, my_list). But as you can see, there is no common element in all three vectors.
Then, what if I want to find the common element that appears "at least" a certain amount of times in the list? Such as: somefunction(my_list, time=2) would give me c("a", "b", "c", "d") because those elements appear two times.
Thanks.
We can convert this to a data.table and group by value to get the elements that occur in exactly two of the vectors
library(data.table)
setDT(stack(setNames(my_list, seq_along(my_list))))[,
if(uniqueN(ind)==2) values , values]$values
#[1] "a" "b" "c" "d"
A base R option would be to unlist the 'my_list', find the frequency count with the replicated sequence of 'my_list' using table, get the column sums, check whether it is equal to 2 and use that index to subset the names.
tblCount <- colSums(table(rep(seq_along(my_list), lengths(my_list)), unlist(my_list)))
names(tblCount)[tblCount==2]
#[1] "a" "b" "c" "d"
If you assume that each element will appear no more than once in a vector, you can "unlist" your vectors and count the frequency.
Here, using dplyr functions
library(dplyr)
my_list %>% unlist %>% tibble(v=.) %>% count(v) %>% filter(n>=2) %>% .[["v"]]
Or base functions
subset(as.data.frame(table(unlist(my_list))), Freq>=2)$Var1
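The counting idea can be wrapped in a small function along the lines of the somefunction(my_list, time=2) asked for in the question. This is a sketch only: common_at_least and its times argument are hypothetical names. Calling unique on each vector first drops the assumption that an element appears at most once per vector:

```r
my_list <- list(c("a", "b", "c"),
                c("a", "b", "c", "d"),
                c("e", "d"))

# Hypothetical helper: elements present in at least `times` of the vectors
common_at_least <- function(lst, times = 2) {
  # unique() per vector, so repeats inside one vector count only once
  counts <- table(unlist(lapply(lst, unique)))
  names(counts)[counts >= times]
}

common_at_least(my_list, times = 2)
# returns c("a", "b", "c", "d")
common_at_least(my_list, times = 3)
# returns character(0): nothing is shared by all three vectors
```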
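The following purrr pipeline is equivalent to Reduce(intersect, my_list), so it returns only the elements common to all vectors: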
my_list %>%
purrr::map(~ .) %>%
purrr::reduce(.f = dplyr::intersect, .x = .)

How to calculate how many times a vector appears in a list in R?

I have a list of 10,000 vectors, and each vector might have different elements and different lengths. I would like to know how many unique vectors I have and how often each unique vector appears in the list.
I guess the way to go is the function "unique", but I don't know how I could use it to also get the number of times each vector is repeated.
So what I would like to get is something like this:
"a" "b" "c" "d" 301
"a" 277
"b" "c" 49
where the letters are the contents of each unique vector and the numbers are how often each one is repeated.
I would really appreciate any possible help on this.
thank you very much in advance.
Tina.
Maybe you should look at table:
Some sample data:
myList <- list(A = c("A", "B"),
B = c("A", "B"),
C = c("B", "A"),
D = c("A", "B", "B", "C"),
E = c("A", "B", "B", "C"),
F = c("A", "C", "B", "B"))
Paste your vectors together and tabulate them.
table(sapply(myList, paste, collapse = ","))
#
#     A,B A,B,B,C A,C,B,B     B,A
#       2       2       1       1
You don't specify whether order matters (that is, is A, B the same as B, A?). If it doesn't, you can sort each vector before pasting, with something like:
table(sapply(myList, function(x) paste(sort(x), collapse = ",")))
#
#     A,B A,B,B,C
#       3       3
Wrap this in data.frame for a vertical output instead of horizontal, which might be easier to read.
Also, do be sure to read How to make a great R reproducible example? as already suggested to you.
As it is, I'm just guessing at what you're trying to do.
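As a sketch of the data.frame suggestion above, assuming order does not matter (the names keys and counts and the column names vector and n are my own choices):

```r
myList <- list(A = c("A", "B"),
               B = c("A", "B"),
               C = c("B", "A"),
               D = c("A", "B", "B", "C"),
               E = c("A", "B", "B", "C"),
               F = c("A", "C", "B", "B"))

# One string key per vector; sorting makes "B,A" and "A,B" identical
keys <- vapply(myList, function(x) paste(sort(x), collapse = ","), character(1))

# One row per unique vector with its frequency
counts <- as.data.frame(table(keys), stringsAsFactors = FALSE)
names(counts) <- c("vector", "n")
counts
```

nrow(counts) then gives the number of unique vectors. Drop the sort() call if order should matter.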

Subset a data frame using OR when the column contains a factor

I would like to make a subset of a data frame in R that is based on one OR another value in a column of factors but it seems I cannot use | with factor values.
Example:
# fake data
x <- sample(1:100, 9)
nm <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
fake <- cbind(as.data.frame(nm), as.data.frame(x))
# subset fake to only rows with name equal to a or b
fake.trunk <- fake[fake$nm == "a" | "b", ]
produces the error:
Error in fake$nm == "a" | "b" :
operations are possible only for numeric, logical or complex types
How can I accomplish this?
Obviously my actual data frame has more than 3 values in the factor column so just using != "c" won't work.
You need fake.trunk <- fake[fake$nm == "a" | fake$nm == "b", ]. A more concise way of writing that (especially with more than two conditions) is:
fake[ fake$nm %in% c("a","b"), ]
Another approach would be to use subset() and write
fake.trunk = subset(fake, nm %in% c('a', 'b'))
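The original attempt fails because | needs a logical value on each side, and "b" on its own is a character string rather than a comparison. A short sketch comparing the two working forms, with the factor column built explicitly (set.seed and droplevels are my additions):

```r
set.seed(1)  # only to make the fake data reproducible
fake <- data.frame(nm = factor(rep(c("a", "b", "c"), each = 3)),
                   x  = sample(1:100, 9))

# Each side of | is a full comparison that yields a logical vector
keep_or <- fake$nm == "a" | fake$nm == "b"
# %in% expresses the same test and scales to many values
keep_in <- fake$nm %in% c("a", "b")
identical(keep_or, keep_in)

# The subset still carries the unused level "c"; droplevels() removes it
fake.trunk <- droplevels(fake[keep_in, ])
levels(fake.trunk$nm)
```

Dropping unused levels is optional, but it avoids empty groups in later tables or plots.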
