How to merge only specific elements of a vector in R

This is probably pretty straightforward but I'm really stuck: Let's say that I have a vector=c("a", "b", "c","d","e"). How can I concatenate only some specific elements? For example, how do I merge "b" and "c", which will lead me to the vector=c("a","bc","d","e") ?
Thank you

We can build a logical index of the elements to merge and paste those together:
i1 <- vector %in% c("b", "c")
c(vector[!i1], paste(vector[i1], collapse=""))
#> [1] "a"  "d"  "e"  "bc"
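If the merged element should stay where the first merged element was, as in the requested c("a","bc","d","e"), one possible sketch uses base R's append. The names vec, pos, and merged are illustrative (vec is used instead of vector to avoid shadowing the base function), and the sketch assumes the elements to merge are adjacent.

```r
vec <- c("a", "b", "c", "d", "e")

# Logical index of the elements to merge
i1 <- vec %in% c("b", "c")

# Position of the first element being merged
pos <- which(i1)[1]

# Collapse the selected elements into one string
merged <- paste(vec[i1], collapse = "")

# Insert the merged string back at the original position
out <- append(vec[!i1], merged, after = pos - 1)
out
#> [1] "a"  "bc" "d"  "e"
```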

Related

R: Is there a method in R to substitute the values of a vector using a dictionary (2 column dataframe with old and new value)

Is there a method in R, to substitute the values of a vector using a dictionary (2 column dataframe with old and new value)
The only method I know is to extract the old values into a dataframe and merge it with what I call the dictionary (a two-column dataframe with old and new values), then assign the new values back over the original old values. However, it seems that when using merge (at least since R v4.1), the order of x is not maintained, so I now use join, which keeps the original order of dataframe x intact. I think there must be an easier way; I just have not found it. I hope this is understandable, and I appreciate any help.
cheers Hermann
You could use a named character vector as a dict for replacement by unquoting with !!! inside of dplyr::recode. If you have your "dict" stored as a two-column dataframe, then tidyr::deframe might be handy.
library(tidyverse)
x <- c("a", "b", "c")
dict <- tribble(
~old, ~new,
"a", "d",
"b", "e",
"c", "f"
)
recode(x, !!!deframe(dict))
#> [1] "d" "e" "f"
Created on 2021-06-14 by the reprex package (v1.0.0)
You can use match to substitute the values of a vector using a dictionary:
D$new[match(x, D$old)]
#[1] "d" "e" "f"
You can also use the names to get the new values:
L <- setNames(D$new, D$old)
unname(L[x])
#[1] "d" "e" "f"
Data:
x <- c("a", "b", "c")
D <- data.frame(old = c("a", "b", "c"), new = c("d", "e", "f"))
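Note that match returns NA for values that are missing from the dictionary, so D$new[match(x, D$old)] yields NA for them. A minimal sketch, assuming unmatched values should simply be kept as they are (the question does not say); the value "z" and the names idx and out are illustrative:

```r
x <- c("a", "b", "z")  # "z" has no entry in the dictionary
D <- data.frame(old = c("a", "b", "c"),
                new = c("d", "e", "f"),
                stringsAsFactors = FALSE)

# Position of each x in the dictionary, NA when absent
idx <- match(x, D$old)

# Use the new value where a match exists, otherwise keep the original
out <- ifelse(is.na(idx), x, D$new[idx])
out
#> [1] "d" "e" "z"
```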

Find vector of strings in list (R)

I have a list, in which each element is a vector of strings, as:
l <- list(c("a", "b"), c("c", "d"))
I want to find the index of the element in l that contains a specific vector of strings, as c("a", "b"). How do I do that? I thought which(l %in% c("a", "b")) should work, but it returns integer(0) instead of 1.
%in% checks presence of elements of the LHS among elements of the RHS. To treat c("a", "b") as a single element of the RHS, it needs to be in a list:
which(l %in% list(c("a", "b")))
Other possibilities are to go element-by-element through l with sapply, such as
which(sapply(l,function(x) all(c("a","b") %in% x)))
# order doesn't matter, other elements allowed
which(sapply(l, identical, c("a", "b"))) # exact match, in order
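The difference between the two sapply variants shows up once the list also holds the same strings in another order. A small sketch, assuming a third element c("b", "a") is added for illustration; vapply is used here in place of sapply only so that the result type is guaranteed to be logical:

```r
l <- list(c("a", "b"), c("c", "d"), c("b", "a"))
target <- c("a", "b")

# Elements containing every string in target, in any order
contains_all <- which(vapply(l, function(x) all(target %in% x), logical(1)))
contains_all
#> [1] 1 3

# Elements that are exactly target, same order and length
exact <- which(vapply(l, identical, logical(1), target))
exact
#> [1] 1
```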

Group together identical elements in a vector using r

A very simple question but couldn't find an answer.
I have a vector of characters (for example - "a" "a" "a" "c" "c" "c" "b" "b" "b").
I would like to group together the elements to "a" "c" "b".
Is there a specific function for that?
Thank you
You can use the sqldf library with GROUP BY:
require(sqldf)
vector<- data.frame(v=c("a", "a", "a", "c", "c", "c", "b", "b", "b"))
sqldf("SELECT v from vector group by v")
Here you go
vector <- c("a", "a", "a", "c", "c", "c", "b", "b", "b")
sorted <- sort(vector)
If you just want the unique elements then there is, well, unique.
> unique(c("a", "a", "a", "c", "c", "c", "b", "b", "b"))
[1] "a" "c" "b"
Update
With the new description of the problem, this would be my solution
shifted <- c(NA, vector[-length(vector)])
vector[is.na(shifted) | vector != shifted]
I shift the vector one position to the right, putting NA at the front because I have no better idea of what to put there, and then pick out the elements whose shifted counterpart is NA (that is, the first element) or that differ from the previous element.
If the vector contains NA, some additional checks will be needed. It is not obvious how to put something that isn't the first element in the first position of the shifted vector without knowing a bit more. For example, you could extract all the elements from the vector and pick one that isn't the first, but that would fail if the vector only contains identical elements.
Another question now: is there a smarter way to implement the shift operation? I couldn't think of one, but there might be a more canonical solution.
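One base R candidate for a more canonical approach is rle (run-length encoding), whose values component holds one entry per run of consecutive identical elements. This is offered as an alternative sketch, not as the answerer's method; the second vector is an illustrative input that shows how it differs from unique:

```r
vec <- c("a", "a", "a", "c", "c", "c", "b", "b", "b")

# One value per run of consecutive duplicates
runs <- rle(vec)$values
runs
#> [1] "a" "c" "b"

# Unlike unique(), a value that comes back later is kept again
vec2 <- c("a", "a", "b", "a")
runs2 <- rle(vec2)$values
runs2
#> [1] "a" "b" "a"
```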

R how to find the intersection of a subset of vectors in a list

I have a list of vectors (characters). For example:
my_list <- list(c("a", "b", "c"),
c("a", "b", "c", "d"),
c("e", "d"))
For the intersection of all these three vectors, I could use: Reduce(intersect, my_list). But as you can see, there is no common element in all three vectors.
Then, what if I want to find the common element that appears "at least" a certain amount of times in the list? Such as: somefunction(my_list, time=2) would give me c("a", "b", "c", "d") because those elements appear two times.
Thanks.
We can convert this to a data.table and do the group by action to get the elements
library(data.table)
setDT(stack(setNames(my_list, seq_along(my_list))))[,
if(uniqueN(ind)==2) values , values]$values
#[1] "a" "b" "c" "d"
A base R option would be to unlist the 'my_list', find the frequency count with the replicated sequence of 'my_list' using table, get the column sums, check whether it is equal to 2 and use that index to subset the names.
tblCount <- colSums(table(rep(seq_along(my_list), lengths(my_list)), unlist(my_list)))
names(tblCount)[tblCount==2]
#[1] "a" "b" "c" "d"
If you assume that each element will appear no more than once in a vector, you can "unlist" your vectors and count the frequency.
Here, using dplyr functions
library(dplyr)
my_list %>% unlist %>% tibble(v=.) %>% count(v) %>% filter(n>=2) %>% .[["v"]]
Or base functions
subset(as.data.frame(table(unlist(my_list))), Freq>=2)$Var1
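The counting approach can be wrapped in a function shaped like the somefunction(my_list, time=2) the question asks for. This is only a sketch: the name common_at_least and its arguments are hypothetical, and unique is applied to each vector first so that a value repeated inside one vector is counted once, which drops the assumption stated above.

```r
my_list <- list(c("a", "b", "c"),
                c("a", "b", "c", "d"),
                c("e", "d"))

# Hypothetical helper: elements present in at least `times` vectors
common_at_least <- function(lst, times = 2) {
  # Count each element once per vector it appears in
  counts <- table(unlist(lapply(lst, unique)))
  names(counts)[counts >= times]
}

res2 <- common_at_least(my_list, times = 2)
res2
#> [1] "a" "b" "c" "d"

res3 <- common_at_least(my_list, times = 3)  # nothing is in all three
```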
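Note that the following only computes the intersection of all the vectors (the same result as Reduce(intersect, my_list)), so for the example data it returns character(0) rather than the elements that appear at least twice: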
my_list %>%
purrr::map(~ .) %>%
purrr::reduce(.f = dplyr::intersect, .x = .)

How to calculate how many times vector appears in a list? in R

I have a list of 10,000 vectors, and each vector might have different elements and different lengths. I would like to know how many unique vectors I have and how often each unique vector appears in the list.
I guess the way to go is the function "unique", but I don't know how I could use it to also get the number of times each vector is repeated.
So what I would like to get is something like that:
"a" "b" "c" "d" 301
"a" 277
"b" "c" 49
where the letters are the contents of each unique vector and the numbers are how often each one is repeated.
I would really appreciate any possible help on this.
thank you very much in advance.
Tina.
Maybe you should look at table:
Some sample data:
myList <- list(A = c("A", "B"),
B = c("A", "B"),
C = c("B", "A"),
D = c("A", "B", "B", "C"),
E = c("A", "B", "B", "C"),
F = c("A", "C", "B", "B"))
Paste your vectors together and tabulate them.
table(sapply(myList, paste, collapse = ","))
#
# A,B A,B,B,C A,C,B,B B,A
# 2 2 1 1
You don't specify whether order matters (that is, is A, B the same as B, A). If it doesn't, you can try something like:
table(sapply(myList, function(x) paste(sort(x), collapse = ",")))
#
# A,B A,B,B,C
# 3 3
Wrap this in data.frame for a vertical output instead of horizontal, which might be easier to read.
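A sketch of that vertical layout, using the sample data above. The names keys and counts and the column names vector and n are illustrative choices, and the comma separator assumes that no element itself contains a comma:

```r
myList <- list(A = c("A", "B"),
               B = c("A", "B"),
               C = c("B", "A"),
               D = c("A", "B", "B", "C"),
               E = c("A", "B", "B", "C"),
               F = c("A", "C", "B", "B"))

# One string key per vector (order-sensitive)
keys <- vapply(myList, paste, character(1), collapse = ",")

# Tabulate the keys and reshape into one row per unique vector
counts <- as.data.frame(table(keys), stringsAsFactors = FALSE)
names(counts) <- c("vector", "n")

# Most frequent vectors first
counts <- counts[order(-counts$n), ]
counts
```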
Also, do be sure to read How to make a great R reproducible example? as already suggested to you.
As it is, I'm just guessing at what you're trying to do.
