R - List manipulation element concatenation

Assume I have a list with 5 elements:
list <- list("A", "B", "C", "D", c("E", "F"))
I am trying to turn this into a simple character vector using purrr, combining any list element that has two strings into a single string separated by a delimiter such as '-'. The output should look like this:
chr [1:5] "A" "B" "C" "D" "E-F"
I've tried a ton of approaches, including paste, paste0 and str_c. Where I am getting hung up is that map seems to apply the function to each individual string of a list element rather than to the group of strings in that element (when there is more than one). The closest I've gotten is:
list2 <- unlist(map(list, str_flatten))
str(list2)
This returns:
chr [1:5] "A" "B" "C" "D" "EF"
where I need a hyphen between E and F:
chr [1:5] "A" "B" "C" "D" "E-F"
When I try to pass extra arguments to str_flatten(), such as str_flatten(list, collapse = "-"), it doesn't work. The big problem is that I can't figure out what to pass as an argument to str_flatten so that it groups the strings of a given list element.

You almost had it. Try
library(purrr)
library(stringr)
unlist(map(lst, str_flatten, collapse = "-"))
#[1] "A" "B" "C" "D" "E-F"
You could also use map_chr
map_chr(lst, str_flatten, collapse = "-")
Without additional packages, and with thanks to @G.Grothendieck, you could do
sapply(lst, paste, collapse = "-")
data
lst <- list("A", "B", "C", "D", c("E", "F"))
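A base R variant that also checks the result type: vapply() works like sapply(), but you declare the shape of every result (here character(1), exactly one string), so it stops with an error instead of silently returning a list if some element ever produced zero or several strings. A minimal sketch:

```r
lst <- list("A", "B", "C", "D", c("E", "F"))

# vapply() hands each list element to paste() as a whole vector;
# collapse = "-" joins the strings inside that element with a hyphen.
# character(1) requires every call to return exactly one string.
res <- vapply(lst, paste, character(1), collapse = "-")
res
# "A" "B" "C" "D" "E-F"
```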

We can also use map_chr and paste.
library(purrr)
lst <- list("A", "B", "C", "D", c("E", "F"))
map_chr(lst, ~paste(.x, collapse = "-"))
# [1] "A" "B" "C" "D" "E-F"

Related

Complementary sequence using gsub

I'm trying to make the complementary sequence of a DNA chain stored in a vector.
It's supposed to change "A" to "T" and "C" to "G" and vice versa. I need this to happen to the first vector and print the complementary sequence correctly. This is what I tried, but I got stuck:
pilot_sequence <- c("C","G","A","T","C","C","T","A","T")
complement_sequence_display <- function(pilot_sequence){
complement_chain_Incom <- gsub("A", "T", pilot_sequence)
complement_chain <- paste(complement_chain_Incom, collapse = "")
cat("Complement sequence: ", complement_chain, "\n")
}
complement_chain_Incom <- gsub("A","T", pilot_sequence)
complement_chain <- paste(complement_chain_Incom, collapse= "")
complement_sequence_display(pilot_sequence)
I got CGTTCCTTT as the answer; only the Ts in the third and eighth positions (the ones that replaced an A) are correct. How do I handle the rest of the letters?
The pilot_sequence vector is of character type, and the function runs without errors.
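The gsub() attempt gets stuck because each call handles only one letter, and chaining A to T and then T to A would undo the first swap. If you want to stay with gsub(), one workaround (a sketch, not the only way) is to write each replacement in lowercase so later calls cannot touch it, then upper-case the result at the end. This assumes the input contains only uppercase A, C, G and T; the function name complement_with_gsub is made up for illustration.

```r
pilot_sequence <- c("C", "G", "A", "T", "C", "C", "T", "A", "T")

complement_with_gsub <- function(bases) {
  # Write each complement in lowercase: gsub() is case-sensitive by
  # default, so the later calls (which match uppercase) skip it.
  out <- gsub("G", "c", bases)
  out <- gsub("C", "g", out)
  out <- gsub("T", "a", out)
  out <- gsub("A", "t", out)
  # Restore uppercase once every letter has been swapped exactly once.
  toupper(out)
}

comp <- complement_with_gsub(pilot_sequence)
cat("Complement sequence:", paste(comp, collapse = ""), "\n")
# Complement sequence: GCTAGGATA
```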
This is an ideal use case for the chartr function:
chartr("ATGC","TACG",pilot_sequence)
output:
[1] "G" "C" "T" "A" "G" "G" "A" "T" "A"
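Dropped into the display function from the question, chartr() swaps all four letters in a single pass, because each character in the first argument is mapped to the character at the same position in the second. A minimal sketch; the name complement_sequence is hypothetical.

```r
pilot_sequence <- c("C", "G", "A", "T", "C", "C", "T", "A", "T")

complement_sequence <- function(bases) {
  # chartr() maps A->T, T->A, G->C and C->G at the same time, so a
  # letter that was just replaced is never replaced a second time.
  comp <- chartr("ATGC", "TACG", bases)
  paste(comp, collapse = "")
}

complement_chain <- complement_sequence(pilot_sequence)
cat("Complement sequence:", complement_chain, "\n")
# Complement sequence: GCTAGGATA
```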
You can do this with purrr::map_chr and dplyr::case_when:
pilot_sequence |> purrr::map_chr(~ dplyr::case_when(
.x == "T" ~ "A",
.x == "G" ~ "C",
.x == "A" ~ "T",
.x == "C" ~ "G"
))
#> [1] "G" "C" "T" "A" "G" "G" "A" "T" "A"
You can use recode from dplyr
library(dplyr)
recode(pilot_sequence, "C" = "G", "G" = "C", "A" = "T", "T" = "A")
Or, in base R, create a named vector of complements, use match() to find where each base sits among its values, and then call names() to get the complementary bases:
pilot_sequence <- c("C","G","A","T","C","C","T","A","T")
values = c("G" = "C", "C" = "G", "A" = "T", "T" = "A")
names(values[match(pilot_sequence, values)])
[1] "G" "C" "T" "A" "G" "G" "A" "T" "A"
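A slightly shorter base R variant of the same lookup idea: name the vector by the original base and index it with the sequence directly, which avoids the match()/names() round trip. unname() only drops the names that character indexing carries over. The object name complement is chosen for illustration.

```r
pilot_sequence <- c("C", "G", "A", "T", "C", "C", "T", "A", "T")

# Each name is a base; each value is its complement.
complement <- c(A = "T", T = "A", C = "G", G = "C")

# Character indexing looks each base up by name, in order.
comp <- unname(complement[pilot_sequence])
comp
# "G" "C" "T" "A" "G" "G" "A" "T" "A"
```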

Combine list components to a vector

Suppose, I have a list:
l = list(c("a", "b", "c"), c("d", "e", "f"))
[[1]]
[1] "a" "b" "c"
[[2]]
[1] "d" "e" "f"
I want to get a vector.
"ad" "be" "cf"
I can convert the list to a matrix, e.g., with sapply(l, c), and then concatenate the columns, but perhaps there is an easier way.
We can use Reduce with paste0
Reduce(paste0, l)
[1] "ad" "be" "cf"
Or with do.call
do.call(paste0, l)
[1] "ad" "be" "cf"
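do.call() works here because it spreads the list components out as separate arguments, so do.call(paste0, l) is the same call as paste0(l[[1]], l[[2]]). That also means extra arguments can be appended to the list, for example a separator for paste():

```r
l <- list(c("a", "b", "c"), c("d", "e", "f"))

# Equivalent to paste(l[[1]], l[[2]], sep = "-"); paste() works
# element-wise across its vector arguments.
res <- do.call(paste, c(l, sep = "-"))
res
# "a-d" "b-e" "c-f"
```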
Here is another option:
apply(list2DF(l), 1, paste0, collapse = "")
[1] "ad" "be" "cf"
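For comparison, the matrix route mentioned in the question could look like the sketch below: sapply(l, c) puts each list component in its own column, so pasting across each row gives the desired strings. It is more roundabout than Reduce() or do.call() and assumes all components have the same length.

```r
l <- list(c("a", "b", "c"), c("d", "e", "f"))

# 3 x 2 matrix: column 1 is l[[1]], column 2 is l[[2]].
m <- sapply(l, c)

# Paste the entries of each row together with no separator.
res <- apply(m, 1, paste0, collapse = "")
res
# "ad" "be" "cf"
```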

Is it possible to remove variables with a certain pattern from a datatable or list?

For example, suppose I have a list that contains "a", "ab", "b", "c", "ad" as variables.
Is it possible to remove all variables that contain an "a" without writing every single variable down?
I think grep or grepl could help:
grep("a", v, value = TRUE, invert = TRUE)
[1] "b" "c"
or
v[!grepl("a", v)]
[1] "b" "c"
Data
v <- c("a","ab","b","c","ad")
“Variables” are conventionally called “names” in R.
So if you want to remove them from a list-like structure, you can manipulate its names, and then subset the list with the resulting vector of names.
x = x[grep('a', names(x), value = TRUE, invert = TRUE)]
Or, using grepl instead:
x = x[! grepl('a', names(x))]
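The x used above is left undefined. As an assumption, here it is a small named list, plus a data frame, both with names matching the question's example. The same logical index works for both, because single-bracket indexing of a data frame selects columns.

```r
# Hypothetical example objects whose names match the question.
x <- list(a = 1, ab = 2, b = 3, c = 4, ad = 5)
df <- data.frame(a = 1, ab = 2, b = 3, c = 4, ad = 5)

# TRUE for every name that does not contain "a".
keep_x <- !grepl("a", names(x))
x <- x[keep_x]
names(x)
# "b" "c"

# Same idea on a data frame: keeps only columns b and c.
df <- df[!grepl("a", names(df))]
names(df)
# "b" "c"
```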
An option with str_subset
library(stringr)
str_subset(v, "a", negate = TRUE)
#[1] "b" "c"
data
v <- c("a","ab","b","c","ad")

How to Apply String Vector to Logical Vector

I would like to replace any instances of TRUE in a logical vector with the corresponding elements of a same-lengthed string vector.
For example, I would like to combine:
my_logical <- c(TRUE, FALSE, TRUE)
my_string <- c("A", "B", "C")
to produce:
c("A", "", "C")
I know that:
my_string[my_logical]
gives:
"A" "C"
but can't seem to figure out how to return a same-lengthed vector. My first thought was to simply multiply the vectors together, but that raises the error "non-numeric argument to binary operator."
Another option with replace
replace(my_string, !my_logical, "")
#[1] "A" "" "C"
What about:
my_logical <- c(TRUE, FALSE, TRUE)
my_string <- c("A", "B", "C")
my_replace <- ifelse(my_logical==TRUE,my_string,'')
my_replace
[1] "A" "" "C"
Edit, thanks @www:
ifelse(my_logical, my_string, "")
Maybe:
my_string[ !my_logical ] <- ""
my_string
# [1] "A" "" "C"
Of course, this overwrites the existing object.
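If the original vector should survive, one way around that is to do the assignment on a copy. R copies on modify, so changing out leaves my_string untouched (out is just an illustrative name):

```r
my_logical <- c(TRUE, FALSE, TRUE)
my_string <- c("A", "B", "C")

# Assigning into a copy leaves my_string itself unchanged.
out <- my_string
out[!my_logical] <- ""
out
# "A" "" "C"
my_string
# "A" "B" "C"
```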
Use ifelse to produce NA where my_logical is FALSE (and TRUE otherwise), then use the result to subset; a logical NA index returns NA.
new <- my_string[ifelse(!my_logical, NA, TRUE)]
new
[1] "A" NA "C"
If you want "" instead of NA, do this next:
new[is.na(new)] <- ""
new
[1] "A" "" "C"

Joining lists of different lengths in R

I am not sure if R has the capabilities to do this, but I'd like to join two different lists of different lengths so that it's like a nested list within a list (if that makes sense).
edit: I'd like to add values in x as an additional value in z.
z <- c("a", "b", "c")
x <- c("c", "g")
c(z, x)
[1] "a" "b" "c" "c" "g"
# what I'd really like to see
[1] "a" "b" "c" "c, g"
I think it would be something similar to doing the following in Python pandas:
self.z.append(x)
We can paste 'x' into a single string with toString() and concatenate it with 'z':
c(z, toString(x))
#[1] "a" "b" "c" "c, g"
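If what is wanted is literally a nested structure, closer to Python's list.append(), the result has to be a list, because an atomic character vector cannot hold another vector as a single element. A sketch, with res as an illustrative name:

```r
z <- c("a", "b", "c")
x <- c("c", "g")

# as.list() turns each string of z into its own list element;
# wrapping x in list() keeps it together as one (fourth) element.
res <- c(as.list(z), list(x))
length(res)
# 4
res[[4]]
# "c" "g"
```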
